AI Infrastructure Consultant

Job Title: AI Infrastructure Consultant

Job Type: Permanent

Job Location: Riyadh, Saudi Arabia

Job Summary:

We are seeking a seasoned AI Infrastructure Consultant to lead the design, implementation, and optimization of our high-performance computing environment. This role is critical for bridging the gap between raw hardware capabilities (GPUs) and scalable AI/ML model deployment. You will be responsible for ensuring our infrastructure is robust, cost-effective, and capable of supporting complex machine learning workloads at scale.

Roles and Responsibilities:

Architecture & Design
- Assess AI/ML workload requirements to design end-to-end compute, storage, and networking architectures.
- Architect specialized GPU clusters (NVIDIA A100/H100 or similar) tailored for training and inference.
- Define high-speed networking requirements (e.g., InfiniBand, RoCE) and low-latency storage solutions for massive datasets.

Containerization & Orchestration
- Implement and manage Docker containerization for consistent model environments.
- Deploy and scale AI workloads using Kubernetes (or managed services like EKS/GKE/AKS), ensuring high availability and seamless resource scheduling.

MLOps & CI/CD Integration
- Build and maintain robust CI/CD pipelines specifically for AI models, automating the journey from code to production.
- Integrate automated testing, versioning for models/data, and deployment strategies (Canary, Blue-Green).

Monitoring & Cost Optimization
- Establish comprehensive monitoring frameworks to track infrastructure utilization and GPU health.
- Analyze performance bottlenecks and implement strategies to optimize cost-performance, ensuring maximum ROI on expensive compute resources.

Required Qualifications & Skills:

Total Experience: 10+ years in IT Infrastructure, Systems Engineering, or DevOps.
AI Specialization: 2-3 years of hands-on experience specifically in AI/ML infrastructure.
GPU Expertise: Proven track record in GPU setup, CUDA configurations, and managing hardware acceleration for deep learning.
Orchestration: Expert-level knowledge of Kubernetes and the CNCF ecosystem.
Cloud & Hybrid: Proficiency with major cloud providers (AWS/Azure/GCP) and on-premises data center environments.
Soft Skills: Strong consultancy mindset with the ability to translate complex technical requirements into actionable architectural roadmaps.
