Role Overview
Support the deployment, scaling, optimization, and monitoring of AI/ML models in production environments. Work closely with data scientists and developers to ensure models run efficiently and reliably, with low inference latency.
Key Responsibilities
- Develop, maintain, and deploy ML/AI models in production environments.
- Build and serve model inference APIs using frameworks like FastAPI (see the sketch after this list).
- Optimize models for faster inference using techniques such as quantization and model compression.
- Package and containerize models using Docker and manage deployments with orchestration tools (e.g., Kubernetes).
- Set up CI/CD pipelines and automation workflows for model deployment.
- Monitor model performance, latency, and reliability in production.
- Troubleshoot and resolve deployment, infrastructure, or inference issues.
- Collaborate with ML Engineers, Data Scientists, and DevOps teams to streamline workflows.
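For illustration, a minimal sketch of the kind of inference API this role builds, assuming FastAPI with a scikit-learn-style model loaded via joblib; the model path, request schema, and endpoint name are hypothetical:

```python
# Minimal inference API sketch (hypothetical model path and schema).
import joblib
from fastapi import FastAPI
from pydantic import BaseModel


class PredictRequest(BaseModel):
    features: list[float]  # one flat feature vector per request


class PredictResponse(BaseModel):
    prediction: float


app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical artifact path


@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # scikit-learn-style predict over a batch of one
    y = model.predict([req.features])
    return PredictResponse(prediction=float(y[0]))
```

Served with, e.g., `uvicorn main:app --host 0.0.0.0 --port 8000`, which is also the natural entrypoint when the service is containerized with Docker.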
Skills & Requirements
- Proficiency in Python and familiarity with building REST APIs (e.g., FastAPI, Flask).
- Experience deploying ML models and serving them reliably.
- Understanding of model optimization techniques such as quantization for faster inference (see the sketch after this list).
- Knowledge of Docker and container orchestration (e.g., Kubernetes).
- Familiarity with CI/CD tools and automation workflows.
- Ability to monitor and troubleshoot production systems.
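As one illustration of the optimization point above, a minimal sketch of post-training dynamic quantization in PyTorch; the toy model here is hypothetical, and production models often call for static quantization with calibration data instead:

```python
# Dynamic quantization sketch on a toy model (hypothetical sizes).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()  # quantize in inference mode

# Convert Linear weights to int8; activations are quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([1, 10])
```

Dynamic quantization typically shrinks the model and speeds up CPU inference with little code change, which is why it is a common first optimization step.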
Nice to Have
- Experience with model versioning and ML lifecycle tools.
- Exposure to any cloud platform (AWS, GCP, Azure).
- Understanding of performance profiling and benchmarking tools (a minimal latency-timing sketch follows).
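For example, a bare-bones latency benchmark sketch using only the standard library timer, assuming a hypothetical PyTorch model; dedicated profilers (e.g., torch.profiler, py-spy) give much deeper breakdowns:

```python
# Rough inference-latency benchmark (hypothetical toy model).
import statistics
import time

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
x = torch.randn(1, 128)

with torch.no_grad():
    # Warm-up runs so one-time costs don't skew the timings.
    for _ in range(10):
        model(x)

    samples = []
    for _ in range(100):
        start = time.perf_counter()
        model(x)
        samples.append((time.perf_counter() - start) * 1000)  # ms

samples.sort()
p95 = samples[94]  # approximate 95th percentile of 100 samples
print(f"p50={statistics.median(samples):.3f} ms  p95={p95:.3f} ms")
```

Reporting percentiles rather than a single average matters in production, since tail latency is usually what breaks an SLA.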