King Abdullah University of Science and Technology

AI/ML Support Automation Analyst

Date: May 19, 2026

Location: Saudi Arabia

Company: King Abdullah University of Science & Technology

Position Summary

The AI/ML Support Automation Analyst will be a key member of the KSL AI Support Team, focusing on MLOps infrastructure, container orchestration, and workflow automation at supercomputing scale. Working under the AI/ML Support Team Lead, this role is responsible for developing and maintaining secure, OCI-compliant container images, robust CI/CD pipelines, and cloud-native MLOps workflows that enable researchers to efficiently deploy and manage AI/ML workloads. The Analyst will bridge the gap between cutting-edge Kubernetes-based infrastructure and the diverse needs of the research community, contributing to governance, technical enablement, and community development initiatives.

Major Responsibilities

1 MLOps and Container Development

Provide timely and useful user support via telephone, walk-in, email, and ticketing system submissions for all types of inquiries.

Maintain high customer service standards in dealing with and responding to user issues and questions.

Develop and maintain secure, OCI-compliant, and HPC-ready AI/ML and data science software container images

Design and implement robust MLOps workflows and pipelines at supercomputing scale

Develop and maintain CI/CD pipelines for reproducible infrastructure and workflow deployment

Design and deploy APIs for AI/ML services and inference endpoints

Implement and manage Kubernetes-based orchestration, including CNI, CSI, and service mesh configurations and optimization

Deploy and maintain container registries (Harbor) and model registries (MLflow, Kubeflow Model Registry)

2 Governance and Compliance Support

Assist in computational readiness reviews for AI research projects

Assist in AI model and artifact control reviews to ensure compliance with institutional standards

Provide consultation to users on efficient resource usage for AI/ML and MLOps workflows

Ensure container images and workflows comply with security policies and best practices

Support the implementation of usage monitoring and reporting systems

3 Performance and Benchmarking

Perform performance debugging and tuning of MLOps and cloud-native workflows

Develop and maintain AI/ML and MLOps workload benchmarks for procuring new systems

Create and maintain regression testing workloads for existing clusters

Deploy and maintain observability and resource monitoring stacks using Prometheus, Grafana, NVIDIA DCGM, and Grafana Loki

Contribute to technology evaluation and benchmarking exercises for future infrastructure investments

4 Training and Documentation

Create comprehensive training content for users on MLOps platforms, Kubernetes, and containerization

Develop and maintain high-quality user documentation for automation tools and workflows

Support the delivery of workshops on CI/CD, container orchestration, and MLOps best practices

Contribute to knowledge transfer initiatives within the KAUST research community

Provide one-on-one consultation to researchers on efficient use of automation infrastructure

Personal Requirements

Competencies

Experience

Demonstrated experience developing robust and complex MLOps pipelines

Hands-on experience with API design and deployment

Experience developing robust and portable CI/CD pipelines for reproducible infrastructure and workflow deployment

Experience supporting researchers or working in academic/research computing settings preferred

Technical Skills - Essential

Kubernetes: Strong expertise in Kubernetes, Container Network Interface (CNI), Container Storage Interface (CSI), and Service Mesh

MLOps: Experience developing and maintaining MLOps pipelines and workflows

CI/CD: Proficiency in building CI/CD pipelines for infrastructure and application deployment

Containerization: Experience building secure, OCI-compliant container images

API Development: Experience in API design, development, and deployment

Programming: Proficiency in Python; experience with Go and Bash scripting

Linux: Strong Linux/Unix systems administration skills

Technical Skills - Desired

Experience with ArgoCD, Airflow, Dask, and Spark for workflow orchestration

Experience with Kubeflow, KServe, and Seldon for ML serving and pipelines

Experience deploying and maintaining observability stacks (Prometheus, Grafana, NVIDIA DCGM, Grafana Loki)

Knowledge of Model Context Protocol (MCP) and agentic frameworks

Experience deploying inference services at scale

Experience deploying and maintaining container registries (Harbor) and model registries (MLflow, Kubeflow Model Registry, Artifact Hub)

Experience with GitOps practices and Infrastructure as Code (Terraform, Ansible)

Experience with HPC schedulers (Slurm) and HPC-cloud integration

Soft Skills

Strong problem-solving and analytical abilities

Excellent written and verbal communication skills in English

Customer service mindset with patience for supporting diverse skill levels

Ability to work independently and as part of a collaborative team

Strong documentation and knowledge-sharing practices

Cultural sensitivity for working in an international environment

Preferred Qualifications

Experience in national laboratories or major research computing facilities

Experience with GPU scheduling and resource management in Kubernetes

Background in DevOps or Site Reliability Engineering (SRE)

Contributions to open-source cloud-native or MLOps projects

Publications or presentations on MLOps, Kubernetes, or automation topics

Knowledge of Saudi Arabia's Vision 2030 and national AI initiatives

Additional certifications: AWS/Azure/GCP, Terraform, NVIDIA DLI

Qualifications

Bachelor's or Master's degree in Computer Science, Data Science, Computational Science, Artificial Intelligence, or a related field

Certifications such as CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), CKS (Certified Kubernetes Security Specialist), or CNPE (Certified Cloud Native Platform Engineer) are highly valued

Experience

Minimum of 2 years of relevant experience
