We are seeking an experienced Senior MLOps Engineer to join our team and help operationalize and manage machine learning models. You will play a critical role in optimizing deployment processes, scaling infrastructure, and ensuring the reliability of machine learning solutions in production environments.
Responsibilities
- Develop and maintain automation tools for continuous integration and deployment (CI/CD) of machine learning models
- Ensure robust monitoring, logging, and alerting systems for models in production to quickly identify and address any issues
- Collaborate closely with data scientists and data engineers to enhance model architectures and performance
- Manage the infrastructure and resources required for deploying models in production, including servers, data storage, and computational resources
- Implement version control and change management processes for machine learning models
- Proactively identify, diagnose, and resolve performance bottlenecks and anomalies in deployed models
- Enforce data security best practices and ensure compliance with regulatory requirements
- Stay updated on the latest advancements in machine learning technologies, tools, and industry standards
Requirements
- A degree in Computer Science, Engineering, Statistics, or a related field
- 3+ years of experience with MLOps workflows and practices
- Experience with tools such as Docker, Kubernetes, Jenkins, or similar for managing containerized applications and automating workflows
- Proficiency in programming languages widely used in machine learning, such as Python and R
- Strong understanding of machine learning frameworks (e.g., TensorFlow, PyTorch) as well as experience in model management
- Experience with cloud platforms (AWS, Azure, Google Cloud) and understanding of scalable architectures
- Strong experience with monitoring tools and best practices, including setting up automated alerts, dashboards, and logging pipelines for production ML systems
- Familiarity with observability stacks like Prometheus, Grafana, or similar solutions
- Excellent problem-solving skills and ability to collaborate effectively within a multi-disciplinary team
- Strong communication skills, with the ability to articulate complex technical details to non-technical stakeholders
- English proficiency at B2 (Upper-Intermediate) level or higher for effective communication
We offer
- CONTINUOUS UPSKILLING, LEARNING & DEVELOPMENT
  - Diversity of tasks and projects
  - Assessment center for objective review of competency level
  - Personal development plan
  - Mentoring programs and leadership development
  - Certification and professional development support
  - Access to learning platforms including more than 2,500 internal courses and the LinkedIn Learning library with 20,000+ courses
  - English courses taught by certified teachers
- CORPORATE BENEFITS
  - Extra leave days
  - Referral bonuses
- COMPENSATION PACKAGE
  - Competitive compensation paid in USD
  - Regular salary and performance reviews
- MEDICAL & HEALTHCARE
  - Private health insurance
  - Well-being events
- WORKING ENVIRONMENT
  - Recreation areas and kitchens
  - Tea, coffee, and snacks
  - Well-being events
  - Sports equipment and game consoles
  - IT Equipment
  - Microsoft's Software Assurance Home Use Program (HUP)
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.