Qureos

AI Infrastructure Consultant

Job Title: AI Infrastructure Consultant

Job Type: Permanent

Job Location: Riyadh, Saudi Arabia

Job Summary:

We are seeking a seasoned AI Infrastructure Consultant to lead the design, implementation, and optimization of our high-performance computing environment. This role is critical for bridging the gap between raw hardware capabilities (GPUs) and scalable AI/ML model deployment. You will be responsible for ensuring our infrastructure is robust, cost-effective, and capable of supporting complex machine learning workloads at scale.

Roles and Responsibilities:
  1. Architecture & Design
    • Assess AI/ML workload requirements to design end-to-end compute, storage, and networking architectures.
    • Architect specialized GPU clusters (NVIDIA A100/H100 or similar) tailored for training and inference.
    • Define high-speed networking requirements (e.g., InfiniBand, RoCE) and low-latency storage solutions for massive datasets.
  2. Containerization & Orchestration
    • Implement and manage Docker containerization for consistent model environments.
    • Deploy and scale AI workloads using Kubernetes (or managed services like EKS/GKE/AKS), ensuring high availability and seamless resource scheduling.
  3. MLOps & CI/CD Integration
    • Build and maintain robust CI/CD pipelines specifically for AI models, automating the journey from code to production.
    • Integrate automated testing, versioning for models/data, and deployment strategies (Canary, Blue-Green).
  4. Monitoring & Cost Optimization
    • Establish comprehensive monitoring frameworks to track infrastructure utilization and GPU health.
    • Analyze performance bottlenecks and implement strategies to optimize cost-performance, ensuring maximum ROI on expensive compute resources.

Required Qualifications & Skills:
  • Total Experience: 10+ years in IT Infrastructure, Systems Engineering, or DevOps.
  • AI Specialization: 2-3 years of hands-on experience specifically in AI/ML infrastructure.
  • GPU Expertise: Proven track record in GPU setup, CUDA configurations, and managing hardware acceleration for deep learning.
  • Orchestration: Expert-level knowledge of Kubernetes and the CNCF ecosystem.
  • Cloud & Hybrid: Proficiency in major cloud providers (AWS/Azure/GCP) and on-premises data center environments.
  • Soft Skills: Strong consultancy mindset with the ability to translate complex technical requirements into actionable architectural roadmaps.
