Job Description – MLOps Engineer (Triton + GPU + Production AI)
Immediate joining.
Employment Type: Full-time
Project: OTRAS – Next-Gen AI-based Government Exam & Recruitment Platform
Role: MLOps Engineer
Experience: 5–10 Years
Location: Andhra Pradesh
Salary: ₹1,00,000 – ₹1,50,000 per month
About the Role
We are building OTRAS, India’s largest next-gen AI-based examination platform serving 250M+ candidates per year.
We need an experienced MLOps Engineer who can productionize large AI/ML models (OMR, OCR, face recognition, fraud detection) using NVIDIA Triton, ONNX, TensorRT, and GPU pipelines.
You will be responsible for deploying, scaling, monitoring, and optimizing AI workloads in a distributed Kubernetes environment.
Key Responsibilities
Model Deployment & Serving
- Deploy PyTorch/TensorFlow models on NVIDIA Triton Inference Server
- Convert models to ONNX and optimize using TensorRT
- Implement batching, dynamic batching, and GPU scheduling
- Build scalable inference APIs (HTTP/gRPC)
Infrastructure & Automation
- Deploy and manage AI workloads on Kubernetes (GPU node pools)
- Automate model CI/CD using GitHub Actions + ArgoCD
- Set up model versioning, canary deployments, and rollback workflows
- Manage the Triton model repository & configs
Monitoring & Optimization
- Implement inference metrics (latency, TPS, GPU utilization)
- Set up monitoring using Prometheus + Grafana
- Optimize inference speed and memory with TensorRT
- Run load tests for 10M+ inference events
Data & Pipelines
- Build ETL workflows for AI datasets
- Automate dataset cleaning and preprocessing
- Integrate with ClickHouse / S3 storage
- Create pipelines for:
  - OMR data ingestion
  - ID card OCR
  - Face detection & liveness scoring
Security & Reliability
- Ensure secure model access (token-based + mTLS)
- Handle production failures, logs, and distributed tracing
- Implement AI/ML model audit trails
Required Skills
- 4+ years of experience in MLOps or ML Engineering
- Strong hands-on experience with:
  - NVIDIA Triton Inference Server
  - ONNX / ONNX Runtime
  - TensorRT
  - PyTorch or TensorFlow
  - CUDA (basic understanding)
- Strong Docker & Kubernetes skills
- Experience with CI/CD
- Knowledge of GPU scaling, batching, and memory optimization
- Experience working with large-scale ML systems
Bonus Skills
- Experience with Airflow or Kubeflow
- Experience with model quantization
- Familiarity with computer vision
- Knowledge of message queues (Kafka)
- Prior work on AI for ID verification / OMR / OCR
Why Join OTRAS?
- Build India’s first AI-powered exam infrastructure
- Work with Go microservices + Kubernetes + Triton
- Massive impact (250M+ candidates per year)
- Fast-moving, high-performance engineering culture
- High-visibility role with strong growth
Job Types: Full-time, Permanent, Volunteer
Pay: ₹180,000.00 - ₹1,080,070.03 per year
Benefits:
- Health insurance
- Life insurance
- Provident Fund
Ability to commute/relocate:
- Guntur, Andhra Pradesh: Reliably commute or plan to relocate before starting work (Required)
Work Location: In person