We are seeking an experienced DevOps Engineer to support a high-impact, multi-workstream AI program focused on enabling the rapid, reliable, and secure delivery of AI/ML applications and services. This role is 100% dedicated to AI initiatives, covering the full lifecycle from early experimentation through full production deployment.
The DevOps Engineer will be responsible for building and operating CI/CD pipelines, cloud infrastructure, and runtime platforms that support data-intensive and model-driven workloads. Working closely with Product Teams, Data Engineers, and Data Scientists, this role plays a critical part in streamlining the AI model lifecycle while ensuring enterprise-grade reliability, scalability, security, and governance.
This is an excellent opportunity for a DevOps professional interested in working at the intersection of cloud engineering, automation, and AI delivery.
Key Responsibilities
Platform & CI/CD Pipelines
Design, implement, and maintain CI/CD pipelines for applications, APIs, batch jobs, platforms, and data pipelines.
Automate model build, testing, packaging, versioning, and deployment across development, test, and production environments.
Establish standardized and repeatable deployment patterns to support rapid iteration and reliable environment promotion.
Cloud & Infrastructure
Provision and manage cloud infrastructure (compute, networking, storage) optimized for AI workloads, including batch, containerized, and serverless runtimes.
Implement Infrastructure as Code (IaC) using Terraform (primary) and/or Bicep.
Design and support environment promotion strategies aligned with enterprise architecture standards.
Containers & Orchestration
Build, deploy, and optimize containerized services using Docker.
Operate workloads on managed container platforms such as OpenShift and AKS.
Implement autoscaling, resiliency, and operational hardening best practices.
Observability & Reliability
Establish and maintain monitoring, logging, and tracing across applications, AI models, and data pipelines.
Proactively troubleshoot platform, pipeline, and runtime issues using tools such as Datadog, Azure Monitor, and Application Insights.
Ensure high availability, performance, and operational stability of AI services.
Security, Compliance & Governance
Embed security controls into CI/CD pipelines, including automated code scanning (e.g., CodeQL), vulnerability detection, and secrets management.
Implement RBAC, identity management, and data protection aligned with enterprise security and compliance requirements.
Ensure auditability across source code, pipelines, and runtime environments.
Collaboration & Enablement
Partner closely with Data Scientists and ML Engineers to streamline the AI/ML lifecycle from experimentation to production.
Create and maintain documentation, runbooks, reusable templates, and reference architectures.
Promote DevOps and platform best practices across teams.
Required Qualifications
3–5+ years of experience in DevOps or Platform Engineering, including production support.
Strong experience with Windows Server administration and troubleshooting (Linux experience is a plus).
Hands-on expertise with CI/CD and automation tools such as Azure DevOps, GitHub Actions, and Ansible.
Solid experience with cloud platforms (Azure and AWS), including networking, identity, and storage.
Deep proficiency with Infrastructure as Code, with Terraform as the primary tool.
Strong scripting and automation skills (PowerShell required; Python is a plus).
Hands-on experience with Docker and Kubernetes, including deployments, scaling, upgrades, and operational hardening.
Experience implementing monitoring and observability solutions (e.g., Datadog, Azure Monitor, Application Insights).
Strong understanding of cost governance, resource design, and platform architecture best practices.
Proven experience implementing security best practices across pipelines and runtime environments (e.g., RBAC, secrets, artifact repositories).
Excellent communication skills with the ability to collaborate across product, data, security, and architecture teams.
Preferred Qualifications
Experience operating AI/ML workloads, including:
Model packaging and artifact registries
Model versioning
Controlled and staged rollouts
Familiarity with ML platforms and toolchains (e.g., MLflow, Azure ML, or equivalent).
Understanding of data engineering concepts, including batch and streaming pipelines, data quality, schemas, and version management.
Experience with GPU-aware deployments and cost/performance optimization for training and inference workloads.
Experience working in large enterprise or highly regulated environments.
Ways of Working
Core collaboration hours aligned with Houston (Central Time) are preferred.
Operate within Agile methodologies (Scrum, sprint planning, daily stand-ups).
Participate in daily and weekly ceremonies and be available during critical delivery windows.
Embrace a culture of transparency, accountability, and continuous improvement.
Success Measures
Reduced lead time for changes and faster, more reliable model-to-production cycles.
High service availability, scalability, and performance.
Strong security posture and auditability across platforms, pipelines, and runtime environments.
Positive feedback from Architecture, Data Science, Product, and Security stakeholders.
Top 3 Skills for Success
CI/CD Tooling
Infrastructure as Code (Terraform – primary)
Cloud Platforms (Azure & AWS)