Overview:
Join the Prodapt team in building the Model Automation Platform—a unified, cloud-native system that streamlines and automates the entire model development lifecycle. You will help design, develop, and optimize a scalable platform that enables rapid onboarding, training, evaluation, and governance of machine learning models across business units.
Responsibilities:
- Design, develop, and maintain automation pipelines for model training, evaluation, and deployment using Kubernetes and GCP.
- Build and enhance the unified web UI for managing model development, logging, evaluation, and dataset creation.
- Integrate CI/CD pipelines for automated model refresh, versioning, and lifecycle management.
- Develop and manage database-backed systems for storing model metadata, training datasets, and automation manifests.
- Implement and optimize pipeline/template systems for workflow automation and onboarding new models.
- Ensure robust separation and monitoring of research and production environments.
- Develop dashboards for monitoring model refreshes, training jobs, and automation statistics.
- Automate model logging, including source code, parameters, metrics, and datasets for governance and reproducibility.
- Collaborate with data scientists, ML engineers, and platform teams to deliver scalable, reliable solutions.
- Participate in code reviews, architecture discussions, and continuous improvement of the platform.
Requirements:
- Proficiency in Python and experience with ML/AI model development (TensorFlow, PyTorch, scikit-learn, MLflow).
- Experience with Kubernetes and Google Cloud Platform (GCP) for orchestration, compute, and storage.
- Hands-on experience with CI/CD tools and automation of ML workflows.
- Strong understanding of database systems for metadata and artifact management.
- Familiarity with web UI development and integration (React, Angular, or similar frameworks).
- Experience with containerization (Docker) and cloud-native deployment patterns.
- Knowledge of model governance, logging, and reproducibility best practices.
- Excellent troubleshooting, debugging, and communication skills.
- Experience with agentic frameworks (LangChain, CrewAI) and LLM integration.
- Familiarity with large-scale financial/ML platforms.
- Experience with pipeline/template management systems and workflow automation.
- Exposure to model security, vulnerability scanning, and compliance automation.
- Knowledge of monitoring, metrics, and dashboarding tools.