Technical Expertise and Required Qualifications
- 5–8 years of experience in ML engineering, DevOps, or data platform engineering, with at least 2 years in MLOps or model operations.
- Proficiency in Python, particularly for automation, data processing, and ML model development.
- Solid experience with SQL and distributed query engines (e.g., Trino, Spark SQL).
- Deep expertise in Docker, Kubernetes, and managed container orchestration services (e.g., OCI Container Engine, EKS, GKE).
- Working knowledge of open-source data lakehouse frameworks and data versioning tools (e.g., Delta Lake, Apache Iceberg, DVC).
- Familiarity with model deployment strategies, including batch, real-time, and edge deployments.
- Experience with CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins) and MLOps frameworks (Kubeflow, MLflow, Seldon Core).
- Competence in implementing monitoring and logging systems (e.g., Prometheus, ELK Stack, Datadog) for ML applications.
- Strong understanding of cloud platforms (OCI, AWS, GCP) and IaC tools (Terraform, CloudFormation).
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related technical discipline.
Preferred Qualifications
- Experience integrating AI workflows with Oracle Data Lakehouse, Databricks, or Snowflake.
- Hands-on experience with orchestration tools like Apache Airflow, Prefect, or Dagster.
- Exposure to real-time ML systems using Kafka or Oracle Stream Analytics.
- Understanding of vector databases (e.g., Oracle 23ai Vector Search).
- Knowledge of AI governance, including model explainability, auditability, and reproducibility frameworks.
Soft Skills
- Strong problem-solving skills and an automation-first mindset.
- Excellent cross-functional communication, especially when collaborating with data scientists, DevOps, and platform engineering teams.
- A collaborative and knowledge-sharing attitude, with good documentation habits.
- Passion for continuous learning, especially in AI/ML tooling, open-source platforms, and data engineering innovation.
Key Responsibilities
- Design, implement, and automate ML lifecycle workflows using tools like MLflow, Kubeflow, Airflow, and OCI Data Science Pipelines (see the sketch after this list).
- Build and maintain CI/CD pipelines for model training, validation, and deployment using GitHub Actions, Jenkins, or Argo Workflows.
- Collaborate with data engineers to deploy models within modern data lakehouse architectures (e.g., Apache Iceberg, Delta Lake, Apache Hudi).
- Integrate machine learning frameworks such as TensorFlow, PyTorch, and scikit-learn into distributed environments like Apache Spark, Ray, or Dask.
- Operationalize model tracking, versioning, and drift detection using DVC, model registries, and ML metadata stores.
- Manage infrastructure as code (IaC) using tools like Terraform, Helm, or Ansible to support dynamic GPU/CPU training clusters.
- Configure real-time and batch data ingestion and feature transformation pipelines using Kafka, GoldenGate, and OCI Streaming.
- Collaborate with DevOps and platform teams to implement robust monitoring, observability, and alerting with tools like Prometheus, Grafana, and the ELK Stack.
- Support AI governance by enabling model explainability, audit logging, and compliance mechanisms aligned with enterprise data and security policies.
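
By way of illustration, the following minimal sketch shows the kind of lifecycle automation the responsibilities above describe: training a model, logging parameters and metrics, and versioning the result in a model registry with MLflow. The tracking URI, experiment name, and registered model name are hypothetical placeholders, and any of the other tools listed (e.g., Kubeflow or OCI Data Science Pipelines) could fill the same role.

```python
# Minimal sketch of an experiment-tracking and model-registry workflow
# using MLflow's Python API. Server URL, experiment name, and model
# name below are hypothetical placeholders, not real endpoints.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # hypothetical tracking server
mlflow.set_experiment("churn-model")                    # hypothetical experiment

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"C": 0.5, "max_iter": 200}
    model = LogisticRegression(**params).fit(X_train, y_train)

    # Log hyperparameters and evaluation metrics so runs can be compared.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # Register the trained model so a downstream pipeline can promote a
    # specific version through validation and deployment stages.
    mlflow.sklearn.log_model(model, "model", registered_model_name="churn-classifier")
```

In a full setup, a CI/CD job (e.g., in GitHub Actions or Argo Workflows) would pick up the newly registered model version, run validation, and handle promotion to staging or production.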