Date: Nov 18, 2025
Location: Saudi Arabia
Company: King Abdullah University of Science & Technology
The Data Center Hardware & Equipment Specialist at KAUST's flagship supercomputer facility plays a crucial role in ensuring the security and operational continuity of the physical infrastructure. The Specialist oversees the installation, configuration, and maintenance of servers, including GPUs, within the CRAY supercomputer environment, as well as other hosted equipment. The Specialist also ensures compliance with hardware tracking processes and updates the asset register, collaborating with various teams to meet procurement and regulatory requirements while maintaining high standards of security and operational efficiency.
Working within the Scientific Computing Center (SCC) data center and other data centers that host HPC systems, including Shaheen III, the Data Center Hardware & Equipment Specialist manages data center hardware and equipment, ensuring that all components are properly installed and configured to support operations. The Specialist maintains detailed records of hardware assets, tracks equipment status, and ensures compliance with established protocols. This role is essential to maintaining the integrity and performance of the HPE/CRAY supercomputer's infrastructure.
Major Responsibilities:
- Oversee the installation, configuration, and maintenance of servers, including GPUs
- Ensure compliance with hardware tracking processes and update the asset register
- Perform regular inspections and maintenance of data center hardware
- Monitor physical conditions of servers and other IT infrastructure
- Manage network cables, server hardware and other equipment
- Collaborate with vendors to proactively manage backup part inventory to ensure uptime
- Work with compliance officers to ensure accurate accounting of controlled equipment
- Follow data center access protocols to ensure the security of the system and prevent unauthorized removal of parts
- Respond to alarms and incidents, providing immediate resolution
- Collaborate with procurement teams to acquire necessary equipment
- Ensure adherence to export control regulations and NIST SP 800-53 standards
- Maintain detailed records of hardware assets and track equipment status
- Oversee or perform complex software and hardware troubleshooting, patching, and re-installations
- Manage infrastructure capacity and performance, verifying application logs and monitoring activity
Personal Requirements:
Competencies
- Demonstrates expertise in managing and maintaining HPE/CRAY supercomputer hardware infrastructure, including servers, storage systems, and networking equipment
- Shows proficiency in handling GPUs and optimizing infrastructure performance within the supercomputer environment
- Exhibits strong understanding of HPC systems and architectures
- Communicates effectively with stakeholders
- Uses monitoring tools to optimize supercomputer infrastructure performance
- Demonstrates analytical skills to troubleshoot and resolve hardware issues
- Shows flexibility to adapt to new technologies and changing business needs
- Ensures compliance with hardware tracking processes
- Collaborates across teams to achieve goals
- Demonstrates the ability to manage the logistics of heavy equipment and to work in confined spaces
Experience
- Detailed knowledge of HPE/CRAY supercomputer hardware infrastructure and NVIDIA supercomputer GPUs, including installation, maintenance, and troubleshooting of the supporting infrastructure
- Experience working in data centers, managing large-scale hardware deployments, and ensuring uptime and reliability
- Proven track record in overseeing the installation, configuration, and maintenance of servers and data center equipment
- Familiarity with hardware tracking processes and asset register management
- Bachelor’s degree in Computer Science, Information Technology, Electrical Engineering, Electronics Engineering, or a related field
- Relevant certifications preferred (e.g., CDCTP, DCCA, CCNP Data Center, RCDD)
- Minimum of 7 years of experience in managing HPC hardware and data center equipment/infrastructure
- Knowledge of data center infrastructure and operations
- Understanding of IT asset management for controlled equipment