Job Summary:
We’re looking for an experienced Senior DevOps Engineer who loves working with Kubernetes and AI-driven applications. In this role, you’ll be responsible for designing, implementing, and maintaining scalable cloud infrastructure while supporting MLOps pipelines for AI workloads.
What You’ll Be Doing:
- Building Scalable Infrastructure: You’ll design, implement, and maintain cloud infrastructure using Kubernetes to handle AI and non-AI workloads efficiently.
- Developing CI/CD & MLOps Pipelines: Help us automate AI/ML workflows using tools like Kubeflow, MLflow, or Argo Workflows, ensuring seamless deployment and monitoring of AI models.
- Optimizing AI Model Deployments: Work with ML engineers to fine-tune LLMs, AI-driven applications, and containerized environments for smooth operation.
- Monitoring & Performance Tuning: Keep an eye on Kubernetes clusters and AI workloads, using tools like Prometheus, Grafana, and Loki to ensure high availability and performance.
- Automating Everything: Whether it’s infrastructure provisioning (Terraform, Helm) or Kubernetes security best practices, you’ll help drive efficiency and compliance.
- Staying Ahead of the Curve: You’ll have the opportunity to explore and implement emerging AI infrastructure trends, including KServe, Ray, and Triton Inference Server.
What We’re Looking For:
- 8+ years of experience in a DevOps, SRE, or Platform Engineering role, with expertise in Kubernetes and cloud-native DevOps.
- Strong knowledge of Kubernetes fundamentals (deployments, services, ingress, storage, GPU scheduling, multi-cluster management).
- Proficiency in scripting & automation with Python, Bash, or Go, particularly for AI-related workflows.
- Hands-on experience with AWS, Azure, or GCP, especially in Kubernetes-based AI/ML infrastructure (e.g., Amazon SageMaker, GKE with AI, Azure ML).
- Hands-on experience with model deployment frameworks (NVIDIA Triton, vLLM, TGI, etc.).
- Experience with distributed computing and multi-GPU training on Kubernetes and on-prem GPU clusters.
- Experience managing resource allocation and autoscaling for large training/inference workloads (e.g., KEDA, HPA).
- Experience with CI/CD & MLOps tools such as Jenkins, Argo CD, Kubeflow, MLflow, or Tekton.
- Familiarity with GenAI model deployment, including fine-tuning, inference optimization, and A/B testing.
- Hands-on experience with managed ML services (AWS Bedrock, Vertex AI, etc.).
- Strong problem-solving skills and a mindset of automating repetitive tasks.
- Excellent communication skills to collaborate with ML engineers, data scientists, and software teams.
Bonus Points If You Have:
- Experience with LLMOps (Large Language Model Operations) and deploying LLM-based applications at scale.
- Knowledge of Vector Databases (FAISS, Weaviate, Qdrant) for AI-driven applications.