Overview:

Core Technical Skills
- Python (essential for ML frameworks and automation)
- Java/Scala (for data pipelines or Spark-based processing)
- Go or Node.js (for backend and API development)

Data Engineering & Processing
- SQL and NoSQL databases (e.g., PostgreSQL, MongoDB)
- Data pipelines with Apache Spark, Kafka, Airflow, Beam, or Flink
- ETL/ELT development and data lake management (e.g., BigQuery, Snowflake, Databricks)

APIs and Microservices
- REST/gRPC API design
- Containerization using Docker and orchestration via Kubernetes

AI/ML-Specific Skills

Model Lifecycle Management
- Model training, versioning, deployment, and rollback
- MLflow, Kubeflow, Vertex AI, or SageMaker

Frameworks and Libraries
- TensorFlow, PyTorch, Scikit-learn
- Hugging Face, LangChain (for LLM integration)

Data Science Collaboration
- Working with data scientists on model reproducibility and scalability
- Feature store design and implementation (e.g., Feast, Tecton)

AI Infrastructure
- GPU/TPU management and distributed training setups
- Optimization for inference latency and cost efficiency

CI/CD for ML
- Jenkins, GitHub Actions, GitLab CI, ArgoCD
- Automated testing and validation of ML pipelines

Monitoring and Observability
- Model drift and data drift detection
- Logging, tracing, and alerting (Prometheus, Grafana, ELK stack)

Cloud Platforms
- GCP (Vertex AI, BigQuery, Dataflow, GKE)
- AWS (SageMaker, EKS, Glue, Lambda)
- Azure (ML Studio, AKS, Data Factory)

Security and Compliance
- IAM, RBAC, network policies, data encryption
- Compliance with data privacy regulations (GDPR, HIPAA)
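To give a flavor of the monitoring work listed above, here is a minimal, purely illustrative sketch of a data-drift check. The function names (`drift_score`, `has_drifted`) and the mean-shift heuristic are stand-ins chosen for this sketch; production drift detection would typically use statistical tests such as Kolmogorov-Smirnov or PSI via a monitoring library.

```python
from statistics import mean, stdev

def drift_score(reference: list[float], live: list[float]) -> float:
    """Rough data-drift signal: how far the live feature mean has moved
    from the reference (training-time) mean, measured in units of the
    reference standard deviation. A stand-in for proper tests (KS, PSI).
    """
    return abs(mean(live) - mean(reference)) / stdev(reference)

def has_drifted(reference: list[float], live: list[float],
                threshold: float = 3.0) -> bool:
    """Flag drift when the live distribution has shifted by more than
    `threshold` reference standard deviations."""
    return drift_score(reference, live) > threshold
```

In a real pipeline a check like this would run per feature on a schedule, with alerts routed through the observability stack (e.g., Prometheus and Grafana) named above.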
Responsibilities:

Platform Development and Evangelism
- Build scalable, customer-facing AI platforms.
- Connect front-end, middleware, and backend components.
- Ensure platform scalability, reliability, and performance to meet business needs.

Machine Learning Pipeline Design
- Design ML pipelines for experiment management, model management, feature management, and model retraining.
- Implement A/B testing of models.
- Design APIs for model inference at scale.
- Proven expertise with MLflow, SageMaker, Vertex AI, and Azure AI.
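As a purely illustrative sketch of the A/B testing and inference-API responsibilities above: the snippet below routes scoring requests between two model variants using a deterministic hash-based split. The registry, variant names, and scoring callables are all hypothetical stand-ins; in practice the models would be loaded from a registry such as MLflow or SageMaker and served behind a REST/gRPC endpoint.

```python
import hashlib

# Hypothetical registry mapping variant names to scoring functions.
# Stand-in callables keep the sketch self-contained; real code would
# load registered model versions here.
MODELS = {
    "v1": lambda features: sum(features) * 0.5,
    "v2": lambda features: sum(features) * 0.6,
}

def assign_variant(user_id: str, treatment_share: float = 0.2) -> str:
    """Deterministically assign a user to a model variant.

    Hashing the user ID keeps the assignment stable across requests,
    which an A/B test needs so each user always sees the same model.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "v2" if bucket < treatment_share * 100 else "v1"

def predict(user_id: str, features: list[float]) -> dict:
    """Route a scoring request to the variant assigned to this user."""
    variant = assign_variant(user_id)
    return {"variant": variant, "score": MODELS[variant](features)}
```

The hash-based split is a common design choice because it needs no assignment storage: the same user ID always lands in the same bucket, and the treatment share can be raised gradually during rollout.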
Qualifications:

The core technical and AI/ML skills listed in the Overview above, including hands-on experience with the AWS stack (SageMaker, EKS, Glue, Lambda).
Essential skills:

The core technical and AI/ML skills listed in the Overview above.
Experience: