Role Description
This is a remote role on a 3-month contract (extendable). You will design and operate production-grade ML systems that are reliable, scalable, and fully observable.
Own the transition of models from experimentation to high-throughput services.
Prevent model decay and reduce the data science team's operational load through automation and monitoring.
Qualifications
- 5+ years in Data Science/ML, including 3+ years in MLOps or Production Engineering
- Proven experience deploying and maintaining production ML services
- Strong Python and SQL expertise
- Experience working with cloud platforms (AWS, Azure, or GCP)
- Background integrating ML with business systems (ERP, CRM, Supply Chain)
- Experience collaborating with Data Engineering and DevOps teams
- Ability to build resilient systems in legacy or imperfect environments
Responsibilities
- Architect and deploy ML models across the full lifecycle
- Implement monitoring for data drift, model drift, and performance degradation
- Build automated retraining pipelines and rollback mechanisms
- Develop CI/CD workflows for model versioning and reproducibility
- Optimize inference performance and reduce latency & compute costs
- Collaborate with teams to operationalize models for business applications
Must Have
- Expert-level Python and API development (FastAPI/Flask)
- Containerization & orchestration experience (Docker, Kubernetes)
- Experience with ML pipelines & orchestration tools (Airflow, Prefect, Dagster)
- Hands-on model monitoring & observability implementation
- Experience deploying real-time or high-throughput inference systems
Nice to Have
- Experience with MLflow, DVC, Kubeflow, or Weights & Biases
- Knowledge of model optimization (quantization, pruning, caching)
- Experience with Prometheus, Grafana, or advanced monitoring tools
- Familiarity with Spark, Databricks, or Snowflake ecosystems
- Experience with SageMaker, Vertex AI, or Azure ML platforms
Job Type: Full-time
Work Location: Remote