Role & Responsibilities:
- Build and maintain containerized applications using OpenShift, OpenShift AI, Kubernetes, and Helm charts.
- Integrate and optimize inference engines such as Triton and vLLM for scalable model serving.
- Lead model deployment, monitoring, and lifecycle management in production environments.
- Implement monitoring and alerting solutions using Grafana and Prometheus.
- Collaborate on GenAI and LLM projects, including Agentic AI initiatives.
- Automate CI/CD pipelines and infrastructure using Jenkins, Ansible, Groovy, and Terraform.
- Develop automation scripts and tools in Python.
- Architect, deploy, and manage AI/ML solutions on AWS Cloud; experience with Bedrock and SageMaker is a plus.
- Build and enhance the AI Platform (both on premises and in the public cloud).
- Make it scalable, high-performance, and resilient.
- Contribute to the future roadmap and key architecture decisions.
Requirements:
- Strong skills in OpenShift, OpenShift AI, Kubernetes, Helm, and container orchestration.
- Hands-on experience with inference engines (Triton, vLLM) and model serving.
- Proficiency in model deployment, monitoring, and MLOps best practices.
- Familiarity with Grafana, Prometheus, Jenkins, Ansible, Groovy, Terraform, and Python.
- Understanding of GenAI, LLMs, and Agentic AI concepts.
- Experience with AWS Cloud services; Bedrock and SageMaker knowledge is desirable.
- Excellent problem-solving and communication skills.