Qureos

Senior Infrastructure / HPC

Date Posted:
11 June 2026
Industry:
IT Services and IT Consulting
Company:
VaporVM

Job Description:

Senior HPC / Infrastructure Engineer

Location: Riyadh, Saudi Arabia

Employment Type: Full-Time

Experience: 10+ Years (Hands-On)

Position Overview

We are seeking a highly experienced Senior HPC / Infrastructure Engineer with proven expertise in designing, deploying, and operating enterprise-scale High-Performance Computing (HPC) and AI infrastructure environments. This role is ideal for a hands-on technical leader who has built and managed production-grade HPC platforms, GPU clusters, Kubernetes ecosystems, and AI infrastructure from the ground up.

The successful candidate will play a critical role in architecting, optimizing, and maintaining mission-critical compute environments that support advanced AI/ML, data science, and high-performance workloads.

Required Certifications

  • RHCE – Red Hat Certified Engineer (Active)
  • CKA – Certified Kubernetes Administrator (Active)

Core Technical Expertise

HPC & NVIDIA AI Ecosystem

  • NVIDIA Base Command Manager (BCM)
  • NVIDIA AI Enterprise
  • NVIDIA GPU Operator & Network Operator
  • NVIDIA NIM Inference Services
  • NVIDIA AI Blueprints
  • CUDA, GPU Drivers, and Performance Optimization

Compute & Container Platforms

  • Kubernetes (Architecture, Operations & Scaling)
  • Slurm Workload Manager
  • Distributed Computing Environments

Operating Systems

  • Red Hat Enterprise Linux (RHEL)
  • Ubuntu LTS (Canonical)

Automation & DevOps

  • CI/CD Pipeline Design & Implementation
  • Infrastructure Automation
  • Platform Lifecycle Management
  • Configuration Management & Orchestration

Key Responsibilities

  • Design, deploy, and operate large-scale HPC and AI infrastructure environments from bare metal through workload orchestration.
  • Architect and manage NVIDIA GPU platforms, including BCM, AI Enterprise, GPU Operator, and AI service enablement.
  • Configure, optimize, and maintain Slurm scheduling environments for high-throughput and GPU-intensive workloads.
  • Design and operate highly available Kubernetes clusters supporting AI/ML, analytics, and containerized workloads.
  • Enable and support NVIDIA NIM services and AI Blueprint deployments for enterprise AI initiatives.
  • Administer and optimize RHEL and Ubuntu environments, ensuring stability, security, and performance.
  • Develop and maintain infrastructure automation frameworks and CI/CD pipelines for platform and application deployment.
  • Optimize performance across compute, GPU, storage, networking, and cluster resources.
  • Implement monitoring, observability, alerting, capacity planning, and operational best practices.
  • Enforce security, patch management, access controls, and compliance standards across the infrastructure stack.
  • Lead troubleshooting, root cause analysis, and resolution of complex infrastructure and platform issues.

Candidate Profile

  • 10+ years of hands-on experience in HPC, Linux infrastructure, and enterprise platform engineering.
  • Proven track record of building and operating production-scale HPC, GPU, or AI infrastructure environments.
  • Deep expertise in Kubernetes, Slurm, Linux administration, and NVIDIA AI technologies.
  • Strong understanding of distributed systems, workload scheduling, cluster management, and performance optimization.
  • Experience supporting AI/ML, data science, and high-performance computing workloads at scale.
  • Strong analytical, troubleshooting, and problem-solving skills.
  • Ability to work across infrastructure, platform, automation, and AI enablement domains.
  • Demonstrated ownership mindset with a history of delivering reliable, scalable, and high-performing solutions.

© 2026 Qureos. All rights reserved.