maawaabro it solutions pvt ltd

MLOps Engineer (Triton + GPU + Production AI)

Job Description – MLOps Engineer (Triton + GPU + Production AI)

Immediate joining.
Employment Type: Full-time
Project: OTRAS – Next-Gen AI-based Government Exam & Recruitment Platform

Role: MLOps Engineer

Experience: 5–10 Years

Location: Andhra Pradesh

Salary: ₹1,00,000 – ₹1,50,000 per month

About the Role

We are building OTRAS, India’s largest next-gen AI-based examination platform serving 250M+ candidates per year.

We need an experienced MLOps Engineer who can productionize large AI/ML models (OMR, OCR, face recognition, fraud detection) using NVIDIA Triton, ONNX, TensorRT, and GPU pipelines.

You will be responsible for deploying, scaling, monitoring, and optimizing AI workloads in a distributed Kubernetes environment.

Key Responsibilities

Model Deployment & Serving

Deploy PyTorch/TensorFlow models on NVIDIA Triton Inference Server
Convert models to ONNX and optimize using TensorRT
Implement batching, dynamic batching, and GPU scheduling
Build scalable inference APIs (HTTP/gRPC)
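For candidates unfamiliar with Triton's dynamic batching, the sketch below renders a minimal `config.pbtxt` with batching enabled. The model name `omr_detector`, the batch sizes, and the queue delay are illustrative assumptions, not values from the OTRAS project; the field names (`max_batch_size`, `dynamic_batching`, `instance_group`) follow Triton's model configuration format.

```python
def render_triton_config(name, max_batch_size, preferred_batch_sizes,
                         max_queue_delay_us, gpu_instances=1):
    """Render a minimal Triton config.pbtxt with dynamic batching enabled.

    All argument values are supplied by the caller; nothing here is
    specific to any real deployment.
    """
    # Preferred batch sizes larger than max_batch_size make no sense,
    # so reject them early.
    for size in preferred_batch_sizes:
        if size < 1 or size > max_batch_size:
            raise ValueError(f"preferred batch size {size} is out of range")

    preferred = ", ".join(str(size) for size in preferred_batch_sizes)
    return (
        f'name: "{name}"\n'
        'platform: "onnxruntime_onnx"\n'
        f"max_batch_size: {max_batch_size}\n"
        "dynamic_batching {\n"
        f"  preferred_batch_size: [ {preferred} ]\n"
        f"  max_queue_delay_microseconds: {max_queue_delay_us}\n"
        "}\n"
        "instance_group [\n"
        "  {\n"
        f"    count: {gpu_instances}\n"
        "    kind: KIND_GPU\n"
        "  }\n"
        "]\n"
    )


if __name__ == "__main__":
    # Hypothetical model name and tuning values, for illustration only.
    print(render_triton_config("omr_detector", 32, [8, 16], 500))
```

A longer queue delay generally lets Triton form larger batches at the cost of added latency per request, so the right values depend on the model and the traffic pattern.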

Infrastructure & Automation

Deploy and manage AI workloads on Kubernetes (GPU node pools)
Automate model CI/CD using GitHub Actions + ArgoCD
Set up model versioning, canary deployments, and rollback workflows
Manage the Triton model repository & configs
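As a rough illustration of the canary and rollback idea, the sketch below routes a fixed percentage of requests to a new model version by hashing the request ID. This is only the underlying concept under assumed names (`pick_version`, version labels "1" and "2"); in a Kubernetes setup the traffic split would more likely be handled by the rollout tooling or the ingress layer, not by application code.

```python
import hashlib


def pick_version(request_id, stable_version, canary_version, canary_percent):
    """Choose a model version for a request.

    The same request_id always maps to the same version, so retries
    are not split between the stable and canary models.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")

    # Hash the ID into one of 100 buckets; buckets below the threshold
    # go to the canary version.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return canary_version if bucket < canary_percent else stable_version


if __name__ == "__main__":
    # Hypothetical version labels; 10% of traffic goes to version "2".
    for rid in ("req-001", "req-002", "req-003"):
        print(rid, "->", pick_version(rid, "1", "2", 10))
```

Rolling back in this scheme amounts to setting `canary_percent` to 0, which sends every request to the stable version again.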

Monitoring & Optimization

Implement inference metrics (latency, TPS, GPU utilization)
Set up monitoring using Prometheus + Grafana
Optimize inference speed and memory with TensorRT
Run load tests for 10M+ inference events
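The latency and TPS metrics named above can be summarized from load-test samples roughly as follows. This is a minimal sketch using the nearest-rank percentile method; the function names and the sample data are assumptions for illustration, and a production setup would normally rely on Prometheus histograms instead of in-memory lists.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples
    # at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, window_seconds):
    """Summarize one measurement window of per-request latencies."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return {
        "tps": len(latencies_ms) / window_seconds,
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
    }


if __name__ == "__main__":
    # Synthetic latencies (1 ms to 100 ms) observed over a 10-second window.
    print(summarize(list(range(1, 101)), 10))
```

Tail percentiles such as p95 and p99 are usually more informative than the mean for inference services, because a small share of slow requests can dominate the user experience.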

Data & Pipelines

Build ETL workflows for AI datasets
Automate dataset cleaning and preprocessing
Integrate with ClickHouse / S3 storage
Create pipelines for:
✔ OMR data ingestion
✔ ID card OCR
✔ Face detection & liveness scoring

Security & Reliability

Ensure secure model access (token-based + mTLS)
Handle production failures, logs, distributed tracing
Implement AI/ML model audit trails

Required Skills

4+ years of experience in MLOps or ML Engineering
Strong hands-on experience with:
✔ NVIDIA Triton Inference Server
✔ ONNX / ONNX Runtime
✔ TensorRT
✔ PyTorch or TensorFlow
✔ CUDA (basic understanding)
Strong Docker & Kubernetes skills
Experience with CI/CD
Knowledge of GPU scaling, batching, and memory optimization
Experience working with large-scale ML systems

Bonus Skills

Experience with Airflow or Kubeflow
Experience with model quantization
Familiarity with computer vision
Knowledge of message queues (Kafka)
Prior work on AI for ID verification, OMR, or OCR

Why Join OTRAS?

Build India’s first AI-powered exam infrastructure
Work with Go microservices + Kubernetes + Triton
Massive impact (250M candidates)
Fast-moving, high-performance engineering culture
High-visibility role with strong growth potential

Job Types: Full-time, Permanent, Volunteer

Pay: ₹180,000.00 - ₹1,080,070.03 per year

Benefits:

Health insurance
Life insurance
Provident Fund

Ability to commute/relocate:

Guntur, Andhra Pradesh: Reliably commute or planning to relocate before starting work (Required)

Work Location: In person
