Qureos

Engineering Manager - Machine Learning

What you’ll do:

Job Responsibilities
The Manager will:
  • Lead and manage a team of Engineers to deploy and monitor machine learning models in production.
  • Work with data engineers to design data engineering pipelines and run robust ETL processes that ensure reliable, high-quality data for analytics and ML workloads.
  • Collaborate with cross-functional teams, including data science, engineering, and operations, to understand business requirements and translate them into scalable ML solutions.
  • Architect and implement end-to-end machine learning pipelines for model training, testing, deployment, and monitoring.
  • Establish best practices and standards for model versioning, deployment, and monitoring to ensure reliability, scalability, and performance.
  • Implement automated processes for model training, hyperparameter tuning, and model evaluation using tools such as Weights & Biases, MLflow, Kubeflow, or similar.
  • Design and implement infrastructure for scalable and efficient model serving and inference, leveraging technologies such as Kubernetes, Docker, and serverless computing.
  • Develop and maintain monitoring and alerting systems to detect model drift, performance degradation, and other issues in production.
  • Provide technical leadership and mentorship to team members, fostering their professional growth and development.
  • Stay current with emerging technologies and industry trends in machine learning engineering, and evaluate their potential impact on our processes and infrastructure.
  • Collaborate with product management to define requirements and priorities for machine learning model deployments and validation, ensuring alignment with business goals and objectives.
  • Implement monitoring and logging solutions to track model performance metrics, resource utilization, and system health, enabling proactive issue detection and resolution.
  • Lead efforts to optimize resource utilization and cost-effectiveness of machine learning infrastructure, including compute resources, storage, and data transfer.
  • Stay abreast of advancements in machine learning technologies, evaluating their applicability and potential impact on our AI Operations strategy and roadmap.
  • Foster a culture of innovation, collaboration, and continuous improvement within the AI Operations team, encouraging experimentation and learning from failures.

Qualifications:

  • B.Tech / M.Tech in Computer Science, Electronics, or related fields
  • 8+ years of experience

Skills:


  • Machine Learning, Software Development
  • Research and development, Technology strategy, Global Project Management, Team Management, Mentoring, Risk Management.
  • Desired Skills:
    • Master's or Bachelor's degree in Computer Science, Engineering, or a related field.
    • 8+ years of experience in software engineering, data engineering, or related roles, with at least 2 years in a managerial or leadership role.
    • Experience designing and maintaining scalable data engineering pipelines and running robust ETL processes that ensure reliable, high-quality data for analytics and ML workloads.
    • Previous experience in a leadership or management role, with a track record of successfully leading technical teams and delivering high-impact projects.
    • Experience with version control systems (e.g., Git) and collaboration tools (e.g., GitHub, GitLab) for managing code repositories and facilitating team collaboration.
    • Familiarity with infrastructure as code (IaC) tools such as Terraform or CloudFormation for provisioning and managing cloud resources.
    • Knowledge of software development methodologies (e.g., Agile, DevOps) and best practices for building scalable and reliable software systems.
    • Ability to effectively communicate technical concepts and solutions to non-technical stakeholders, including executives, product managers, and business users.
    • Strong proficiency in Python, Java, and related IDEs.
    • Awareness of machine learning concepts, algorithms, and frameworks (e.g., TensorFlow, PyTorch, scikit-learn).
    • Experience with cloud platforms and services (e.g., Azure, AWS, GCP) for building and deploying machine learning applications.
    • Proficiency in containerization technologies (e.g., Docker) and orchestration tools (e.g., Kubernetes).
    • Hands-on experience with MLOps tools and platforms such as Weights & Biases, MLflow, Kubeflow, TFX, or similar.
    • Experience with DevOps and DevSecOps tools and practices.
    • Strong problem-solving skills and ability to troubleshoot complex issues in production environments.
    • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.

© 2026 Qureos. All rights reserved.