Senior AI Infrastructure & Platform Engineer - Riyadh, KSA

Role Overview:
We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh. In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.

Key Responsibilities:

Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
Manage and operate GPU orchestration tools and platforms such as:
    Nvidia Base Command Manager (critical)
    Nvidia AI Enterprise Suite
    Nvidia GPU and Network Operators
    Nvidia NIMs and Blueprints
Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
    Slurm (critical)
    Vanilla Kubernetes
Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.

Required Skills & Experience:

Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
Hands-on experience with:
    Nvidia Base Command Manager
    Nvidia AI Enterprise Suite
    Nvidia GPU/Network Operators, NIMs, Blueprints
Strong experience with Slurm and/or Kubernetes orchestration.
Solid Linux system administration skills, preferably on Ubuntu or similar distributions.
Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
Excellent troubleshooting and performance-tuning skills.
Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
Strong understanding of networking, security, resource allocation, and cluster management best practices.

Preferred Qualifications:

Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
Familiarity with workload scheduling, job queuing, resource quotas, and shared-GPU environments.
