Date: May 19, 2026

Location: Saudi Arabia

Company: King Abdullah University of Science & Technology

Position Summary

The AI/ML Support Automation Analyst will be a key member of the KSL AI Support Team, focusing on MLOps infrastructure, container orchestration, and workflow automation at supercomputing scale. Working under the AI/ML Support Team Lead, this role is responsible for developing and maintaining secure, OCI-compliant container images, robust CI/CD pipelines, and cloud-native MLOps workflows that enable researchers to efficiently deploy and manage AI/ML workloads. The Analyst will bridge the gap between cutting-edge Kubernetes-based infrastructure and the diverse needs of the research community, contributing to governance, technical enablement, and community development initiatives.


Major Responsibilities

1 MLOps and Container Development

  • Provide timely and useful user support via telephone, walk-in, email, and ticketing system submissions for all types of inquiries.

  • Maintain high customer service standards in dealing with and responding to user issues and questions.
  • Develop and maintain secure, OCI-compliant, and HPC-ready AI/ML and data science software container images

  • Design and implement robust MLOps workflows and pipelines at supercomputing scale
  • Develop and maintain CI/CD pipelines for reproducible infrastructure and workflow deployment
  • Design and deploy APIs for AI/ML services and inference endpoints
  • Implement and manage Kubernetes-based orchestration, including CNI, CSI, and service mesh configurations and optimization

  • Deploy and maintain container registries (Harbor) and model registries (MLflow, Kubeflow Model Registry)

2 Governance and Compliance Support

  • Assist in computational readiness reviews for AI research projects
  • Assist in AI model and artifact control reviews to ensure compliance with institutional standards
  • Provide consultation to users on efficient resource usage for AI/ML and MLOps workflows
  • Ensure container images and workflows comply with security policies and best practices
  • Support the implementation of usage monitoring and reporting systems

3 Performance and Benchmarking

  • Perform performance debugging and tuning of MLOps and cloud-native workflows
  • Develop and maintain AI/ML and MLOps workload benchmarks for procuring new systems
  • Create and maintain regression testing workloads for existing clusters
  • Deploy and maintain observability and resource monitoring stacks using Prometheus, Grafana, NVIDIA DCGM, and Grafana Loki

  • Contribute to technology evaluation and benchmarking exercises for future infrastructure investments

4 Training and Documentation

  • Create comprehensive training content for users on MLOps platforms, Kubernetes, and containerization
  • Develop and maintain high-quality user documentation for automation tools and workflows
  • Support the delivery of workshops on CI/CD, container orchestration, and MLOps best practices
  • Contribute to knowledge transfer initiatives within the KAUST research community
  • Provide one-on-one consultation to researchers on efficient use of automation infrastructure

Personal Requirements

Competencies

Experience

  • Demonstrated experience developing robust and complex MLOps pipelines
  • Hands-on experience with API design and deployment
  • Experience developing robust and portable CI/CD pipelines for reproducible infrastructure and workflow deployment

  • Experience supporting researchers or working in academic/research computing settings preferred

Technical Skills - Essential

  • Kubernetes: Strong expertise in Kubernetes, Container Network Interface (CNI), Container Storage Interface (CSI), and service mesh

  • MLOps: Experience developing and maintaining MLOps pipelines and workflows
  • CI/CD: Proficiency in building CI/CD pipelines for infrastructure and application deployment
  • Containerization: Experience building secure, OCI-compliant container images
  • API Development: Experience in API design, development, and deployment
  • Programming: Proficiency in Python; experience with Go and Bash scripting
  • Linux: Strong Linux/Unix systems administration skills

Technical Skills - Desired

  • Experience with Argo CD, Airflow, Dask, and Spark for workflow orchestration
  • Experience with Kubeflow, KServe, and Seldon for ML serving and pipelines
  • Experience deploying and maintaining observability stacks (Prometheus, Grafana, NVIDIA DCGM, Grafana Loki)

  • Knowledge of Model Context Protocol (MCP) and agentic frameworks
  • Experience deploying inference services at scale
  • Experience deploying and maintaining container registries (Harbor) and model registries (MLflow, Kubeflow Model Registry, Artifact Hub)

  • Experience with GitOps practices and Infrastructure as Code (Terraform, Ansible)
  • Experience with HPC schedulers (Slurm) and HPC-cloud integration

Soft Skills

  • Strong problem-solving and analytical abilities
  • Excellent written and verbal communication skills in English
  • Customer service mindset with patience for supporting diverse skill levels
  • Ability to work independently and as part of a collaborative team
  • Strong documentation and knowledge-sharing practices
  • Cultural sensitivity for working in an international environment

Preferred Qualifications

  • Experience in national laboratories or major research computing facilities
  • Experience with GPU scheduling and resource management in Kubernetes
  • Background in DevOps or Site Reliability Engineering (SRE)
  • Contributions to open-source cloud-native or MLOps projects
  • Publications or presentations on MLOps, Kubernetes, or automation topics
  • Knowledge of Saudi Arabia's Vision 2030 and national AI initiatives
  • Additional certifications: AWS/Azure/GCP, Terraform, NVIDIA DLI

Qualifications

  • Bachelor's or Master's degree in Computer Science, Data Science, Computational Science, Artificial Intelligence, or a related field

  • Certifications such as CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), CKS (Certified Kubernetes Security Specialist), or CNPE (Certified Cloud Native Platform Engineer) are highly valued


Experience

  • Minimum of 2 years of relevant experience
