We are looking for an experienced MLOps Engineer (GCP) to design, operationalize, deploy, monitor, and scale production-grade AI/ML solutions on Google Cloud Platform (GCP). In this role, you will build reliable, secure, and automated end-to-end machine learning platforms and pipelines while enabling seamless collaboration between Data Scientists, AI Engineers, Platform Teams, and Operations teams.
You will play a key role in ensuring machine learning models are consistently trained, versioned, deployed, monitored, and governed across their lifecycle using GCP-native technologies, particularly Vertex AI.
Key Responsibilities
- Design and implement scalable end-to-end MLOps architectures using GCP-native services.
- Build standardized frameworks for model training, deployment, monitoring, retraining, and governance.
- Deploy and manage ML models on Vertex AI, using Endpoints for online inference and batch prediction jobs for batch inference.
- Implement model versioning, rollout/rollback strategies, and traffic splitting for production deployments.
- Build and automate CI/CD pipelines for ML workflows and model deployment.
- Develop automated ML pipelines using Vertex AI Pipelines and ensure reproducibility across environments (development, testing, and production).
- Integrate source control, testing frameworks, and artifact repositories into ML workflows.
- Monitor model performance, model drift, data quality, and system reliability.
- Implement observability, logging, alerting mechanisms, and service-level objectives (SLOs) for ML systems.
- Define retraining triggers and support incident analysis and remediation of production ML services.
- Ensure scalability, security, compliance, and alignment with enterprise cloud architecture standards.
- Collaborate closely with Data Scientists, AI Engineers, Data Engineers, Platform Teams, and business stakeholders.
Requirements
Experience
- 5+ years of experience in ML Engineering, DevOps, MLOps, or related engineering roles.
- 3+ years of recent hands-on experience with Google Cloud Platform (GCP) (mandatory).
- Strong production experience deploying and managing ML systems at scale.
Technical Skills
- Strong hands-on experience with Google Cloud Platform (GCP).
- Deep expertise with Vertex AI including Pipelines, Endpoints, Model Registry, and Monitoring.
- Strong understanding of CI/CD practices, infrastructure automation, and ML lifecycle management.
- Experience with Docker and containerization/orchestration concepts.
- Strong Python programming skills for ML workflows and automation.
- Experience with ML monitoring, observability, reliability, and scalability practices.
- Knowledge of model versioning, deployment automation, and production operations.
Education & Certifications
- Bachelor’s degree in Computer Science, Artificial Intelligence, Data Science, or a related field.
- GCP certifications such as Professional Cloud DevOps Engineer or equivalent are a strong plus.
Preferred Candidate Profile
- Strong problem-solving mindset with a focus on automation and reliability.
- Experience working in cross-functional AI/ML environments.
- Ability to work in production-grade cloud environments and drive operational excellence for ML systems.
- Strong communication and stakeholder collaboration skills.
- Fluent English; Arabic is a plus.