Job Specification: AI Platform Engineer
About the Role
We are seeking an AI Platform Engineer to build and scale the infrastructure that powers our production AI services. You will take cutting-edge models, ranging from speech recognition (ASR) to large language models (LLMs), and expose them through highly available, developer-friendly APIs.
You will build the bridge between the R&D team, who train the models, and the applications that consume them. This means developing robust APIs, deploying and optimizing models on Triton Inference Server (or similar frameworks), and ensuring real-time, scalable inference.
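To give a concrete flavor of that bridge, below is a minimal illustrative sketch in Python, using FastAPI and the official tritonclient package, of an API endpoint that forwards audio to a model hosted on Triton Inference Server. The model name (asr_model), tensor names (AUDIO, TRANSCRIPT), and audio format are assumptions for illustration only; a real deployment would match the served model's actual configuration.

    # Illustrative sketch: a FastAPI endpoint fronting a Triton-hosted ASR model.
    # Assumptions (hypothetical, for illustration): Triton runs at localhost:8000
    # and serves a model named "asr_model" with an FP32 input tensor "AUDIO" and
    # a BYTES output tensor "TRANSCRIPT".
    import numpy as np
    import tritonclient.http as httpclient
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    triton = httpclient.InferenceServerClient(url="localhost:8000")

    class AudioRequest(BaseModel):
        samples: list[float]  # raw PCM samples, assumed 16 kHz mono

    @app.post("/transcribe")
    def transcribe(req: AudioRequest) -> dict:
        # Shape the audio as a [1, num_samples] batch for the model.
        audio = np.asarray(req.samples, dtype=np.float32)[np.newaxis, :]
        inp = httpclient.InferInput("AUDIO", list(audio.shape), "FP32")
        inp.set_data_from_numpy(audio)
        out = httpclient.InferRequestedOutput("TRANSCRIPT")
        result = triton.infer(model_name="asr_model", inputs=[inp], outputs=[out])
        text = result.as_numpy("TRANSCRIPT")[0]
        # Triton returns BYTES tensors as numpy bytes objects; decode to str.
        return {"transcript": text.decode("utf-8") if isinstance(text, bytes) else str(text)}

In production this sketch would grow streaming support (WebSockets or gRPC), request batching, and monitoring, which is exactly the work described in the responsibilities below.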
Responsibilities
● API Development
○ Design, build, and maintain production-ready APIs for speech, language, and
other AI models.
○ Provide SDKs and documentation to enable easy developer adoption.
● Model Deployment
○ Deploy models (ASR, LLM, and others) using Triton Inference Server or
similar systems.
○ Optimize inference pipelines for low-latency, high-throughput workloads.
● Scalability & Reliability
○ Architect infrastructure to handle large-scale, concurrent inference
requests.
○ Implement monitoring, logging, and auto-scaling for deployed services.
● Collaboration
○ Work with research teams to productionize new models.
○ Partner with application teams to deliver AI functionality seamlessly through
APIs.
● DevOps & Infrastructure
○ Automate CI/CD pipelines for models and APIs.
○ Manage GPU-based infrastructure in cloud or hybrid environments.
Requirements
● Core Skills
○ Strong programming experience in Python (FastAPI, Flask) and/or
Go/Node.js for API services.
○ Hands-on experience with model deployment using Triton Inference Server,
TorchServe, or similar.
○ Familiarity with both ASR and LLM frameworks (Hugging Face
Transformers, TensorRT-LLM, vLLM, etc.).
● Infrastructure
○ Experience with Docker, Kubernetes, and managing GPU-accelerated
workloads.
○ Deep knowledge of real-time inference systems (REST, gRPC, WebSockets,
streaming).
○ Cloud experience (AWS, GCP, Azure).
● Bonus
○ Experience with model optimization (quantization, distillation, TensorRT,
ONNX).
○ Exposure to MLOps tools for deployment and monitoring.
Job Types: Full-time, Permanent
Pay: From ₹50,000.00 per month
Work Location: In person