Key Responsibilities
1. ML Pipeline Development
- Build and automate end-to-end ML pipelines (data ingestion → preprocessing → training → deployment).
- Develop reusable workflows using tools such as Kubeflow, Airflow, MLflow, DVC, and Metaflow.
- Implement scalable model training on cloud GPUs/CPUs.
2. Model Deployment & Serving
- Deploy ML and LLM models to production using:
  - Docker, Kubernetes
  - AWS SageMaker, GCP Vertex AI, Azure ML
  - ONNX, Triton Inference Server, Ray Serve
- Implement high-availability model endpoints with autoscaling.
3. CI/CD for Machine Learning
- Build automated CI/CD pipelines for ML workflows using GitHub Actions, GitLab CI, or Jenkins.
- Integrate continuous training (CT) and continuous deployment (CD) for models.
- Manage versioning for code, data, and models.
4. Monitoring & Observability
- Monitor model performance and detect accuracy degradation, data drift, and concept drift.
- Set up logging, metrics, and alerting using Prometheus, Grafana, the ELK stack, or similar tools.
- Maintain dashboards for model health and prediction quality.
5. Data Engineering & Governance
- Collaborate with data engineers to maintain robust data pipelines.
- Ensure data quality, lineage, versioning, governance, and compliance.
- Manage feature stores (Feast, Tecton, Vertex AI Feature Store).
6. Infrastructure & Cloud Management
- Build and maintain cloud infrastructure for ML workloads.
- Optimize model serving costs and performance.
- Manage containerized environments using Docker & Kubernetes.
7. Collaboration & Best Practices
- Work closely with data scientists, engineers, and product teams.
- Translate ML requirements into scalable infrastructure solutions.
- Establish and enforce MLOps best practices across teams.
Required Skills & Qualifications
Technical Skills
- Strong experience in Python, ML libraries, and API development.
- Expertise with ML platforms and tooling: Airflow, MLflow, DVC, Kubeflow, Ray, Tecton.
- Hands-on experience with Docker, Kubernetes, Terraform/Ansible.
- Strong command of AWS/GCP/Azure cloud ML services.
- Working knowledge of CI/CD pipelines (e.g., GitHub Actions, GitLab CI, or Jenkins).
- Experience with monitoring tools (Prometheus, Grafana, ELK).
- Familiarity with data versioning, ETL pipelines, and feature stores.
- Understanding of LLMOps practices for deploying large language models.
Soft Skills
- Strong communication and documentation abilities.
- An analytical, problem-solving mindset.
- Ability to collaborate with cross-functional teams.
- Ability to take independent ownership of ML deployments and infrastructure decisions.
Preferred Qualifications
- Bachelor’s or Master’s in Computer Science, AI/ML, Data Science, or related fields.
- Certifications: AWS Certified Machine Learning Specialty, Google Cloud Professional Machine Learning Engineer, Kubernetes CKA/CKAD.
- Experience with LLM fine-tuning and vector databases.
- Prior production experience with NLP/CV/LLM models.
Key KPIs
- Uptime and availability of ML model APIs.
- Deployment frequency and automation success rate.
- Reduction in training/serving costs.
- Monitoring and alert efficiency (drift detection, performance stability).
- Reduction in lead time from model development to production.
Why Join Us
- Work with cutting-edge AI/ML technologies.
- Opportunity to build world-class MLOps infrastructure.
- Fast-paced environment with strong career growth.
- Work on impactful AI projects across industries.
Job Types: Full-time, Part-time, Freelance
Pay: ₹271,719.81 - ₹1,166,266.45 per year
Work Location: In person