Qureos


MLOps Engineer

Hyderabad, Pakistan

Internal Job Description
About the role
We're seeking an exceptional MLOps / DevOps Engineer to join Opella's AI-driven digital transformation journey. In this role, you'll architect, implement, and maintain scalable ML pipelines and infrastructure. You'll work at the intersection of data science and engineering, ensuring our AI solutions are robust, compliant, and deliver measurable business value.
Who you are
  • Mid-level MLOps/DevOps Professional: You have ~4-6 years of experience deploying and maintaining machine learning systems in production.
  • Collaborative Team Player: Proven track record of working in agile, cross-functional teams alongside data scientists, software engineers, and product managers.
  • Effective Communicator: You can translate complex technical concepts into clear terms for stakeholders and write thorough, accessible documentation.
  • Agile & Adaptable: Comfortable in a fast-paced environment, able to iterate quickly and embrace new technologies and best practices.
  • Problem Solver: You take ownership of challenges with a proactive, automation-first mindset, continually seeking to improve efficiency and reliability.
  • Quality-Focused: You understand the importance of compliance and precision in a regulated industry and strive for excellence and stability in every deployment.
Key responsibilities
  • Processing on AWS EMR: Provision, configure, and optimize AWS EMR clusters for large-scale data ingestion and distributed model training. Define EMR step workflows, tune Spark jobs for performance and cost-efficiency, and integrate EMR with S3, IAM roles, and AWS Glue for ETL orchestration.
  • ML Pipeline Development: Design, build, and maintain scalable pipelines for data ingestion, model training, and deployment to support our marketing AI initiatives (e.g. using Databricks for big data processing and MLflow for model tracking).
  • Model Deployment & Serving: Deploy machine learning models as robust, secure services – containerize models with Docker and serve them via FastAPI on AWS – ensuring low-latency predictions for marketing applications.
  • CI/CD Automation: Implement continuous integration and delivery (CI/CD) pipelines for ML projects. Automate testing, model validation, and deployment workflows using tools like GitHub Actions to accelerate delivery.
  • Model Lifecycle Management: Orchestrate the end-to-end ML lifecycle, including versioning, packaging, and registering models. Maintain a model repository/registry (MLflow or similar) for reproducibility and governance from experimentation through production.
  • Monitoring & Optimization: Monitor model performance, data drift, and system health in production. Set up alerts and dashboards (e.g. with CloudWatch or Prometheus) and proactively initiate model retraining or tuning to sustain accuracy and efficiency over time.
  • Infrastructure as Code: Leverage Infrastructure-as-Code (Terraform preferred) to provision and manage cloud resources on AWS. Ensure consistent, reproducible environments for development, testing, and production.
  • Collaboration & Support: Work closely with data scientists to understand model requirements and constraints, helping to refactor code or optimize algorithms for production use. Collaborate with DevOps and IT teams to integrate ML services into the broader tech ecosystem and adhere to security best practices.
  • Compliance & Documentation: Ensure all ML pipelines and services adhere to company policies and industry regulations (e.g. data privacy and security standards). Create and maintain clear documentation for workflows, validations, and operational procedures to support audits and knowledge transfer.
  • Performance Tuning: Optimize system performance and cost-efficiency. This may include tuning Spark jobs on Databricks, right-sizing AWS resources, improving API throughput, and implementing caching or other enhancements to meet SLAs.
  • Innovation: Stay up-to-date with the latest MLOps tools and industry best practices. Continuously evaluate and champion new ideas, frameworks, or processes that could enhance our ML platform’s reliability, scalability, and speed of delivery.
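As one concrete illustration of the monitoring responsibility above, a data-drift check can be as simple as computing a Population Stability Index (PSI) over a feature's baseline and production histograms. The sketch below is a minimal, stdlib-only version; the 0.1/0.25 thresholds are common rules of thumb rather than values specified by this role:

```python
import math
from collections import Counter

def population_stability_index(expected, actual, bins=10):
    """Estimate drift between a baseline sample and a production sample.

    Common rule-of-thumb reading: PSI < 0.1 means no significant drift,
    0.1-0.25 moderate drift, and > 0.25 major drift.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def bucket(x):
        # Clamp out-of-range production values into the edge buckets.
        return min(max(int((x - lo) / width), 0), bins - 1)

    exp_counts = Counter(bucket(x) for x in expected)
    act_counts = Counter(bucket(x) for x in actual)

    psi = 0.0
    for b in range(bins):
        # A small epsilon avoids log(0) for empty buckets.
        e = max(exp_counts[b] / len(expected), 1e-6)
        a = max(act_counts[b] / len(actual), 1e-6)
        psi += (a - e) * math.log(a / e)
    return psi
```

In production this kind of check would typically run on a schedule (e.g., an Airflow task) and feed an alert or a retraining trigger rather than be called ad hoc.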
Functional qualifications
  • Experience: 4+ years in MLOps/DevOps roles delivering production ML or data-driven applications.
  • Domain Expertise: Familiarity with regulated industries (e.g., pharmaceuticals), with a strong focus on documentation, data privacy (e.g., GDPR), and quality control.
  • Collaboration: Proven ability to work with data science, analytics, and software engineering teams to create scalable solutions.
  • Agile Mindset: Skilled in Agile/Scrum, adept at breaking down complex tasks and adapting quickly to changes.
  • Communication: Excellent written and verbal skills to clearly explain MLOps best practices to both technical and non-technical audiences.
  • Problem Solving: Proactive in troubleshooting issues across data, code, and infrastructure with thorough testing and monitoring.
  • Continuous Learning: Eager to adopt emerging technologies in cloud, ML, and DevOps.
Technical qualifications
  • AWS EMR: Hands-on experience provisioning, tuning, and managing EMR clusters for large-scale Spark workloads; integrating with S3, Glue, and IAM; monitoring via CloudWatch.
  • Programming: Proficiency in Python for building data pipelines, automation scripts, and backend services, with Visual Studio Code as the preferred IDE.
  • Cloud Infrastructure: Solid AWS skills with EC2, S3, Lambda, IAM, and ideally AWS SageMaker or similar; Azure experience is a plus.
  • ML Lifecycle: Skilled in using MLflow for experiment tracking, model versioning, and registry.
  • Workflow Automation & Scheduling: Experience using Airflow for orchestrating data pipelines and ML workflows.
  • API Development: Experience building RESTful APIs/microservices with FastAPI (or Flask).
  • Containers: Proficient with Docker; familiar with Kubernetes or AWS ECS/EKS for orchestration.
  • CI/CD & Version Control: Experience setting up CI/CD pipelines (e.g., GitHub Actions) and managing Git workflows.
  • Databases & SQL: Strong SQL skills with relational databases and Snowflake.
  • Monitoring & Logging: Experience with AWS CloudWatch, Prometheus, or the ELK stack for monitoring and logging.
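To make the model-registry expectation above concrete, the following stdlib-only sketch mimics the core semantics of an MLflow-style registry: monotonically increasing versions per model name, plus stage promotion where at most one version is in Production at a time. The class and the identifiers used in the example (`ctr-model`, `run-1`) are illustrative assumptions, not MLflow's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    version: int
    run_id: str          # link back to the training run (experiment tracking)
    stage: str = "None"  # MLflow-style stages: None -> Staging -> Production

@dataclass
class ModelRegistry:
    """Minimal in-memory sketch of model-registry semantics."""
    models: dict = field(default_factory=dict)

    def register(self, name, run_id):
        # Each registration of a name gets the next version number.
        versions = self.models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, run_id=run_id)
        versions.append(mv)
        return mv

    def promote(self, name, version, stage):
        # Promoting a new version to Production archives the old one.
        for mv in self.models[name]:
            if stage == "Production" and mv.stage == "Production":
                mv.stage = "Archived"
        target = self.models[name][version - 1]
        target.stage = stage
        return target

    def latest(self, name, stage="Production"):
        candidates = [mv for mv in self.models[name] if mv.stage == stage]
        return candidates[-1] if candidates else None
```

A serving layer would then resolve "the Production model" through `latest()` instead of hard-coding a version, which is what makes promotion and rollback safe operations.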

© 2025 Qureos. All rights reserved.