7 - 9 Years
5 Openings
Bangalore, Chennai, Hyderabad, Kochi, Noida, Pune, Trivandrum
Job Summary:
We are seeking an experienced Site Reliability Engineer (SRE) with advanced DevOps expertise to help build, scale, and maintain our infrastructure and services. You will play a critical role in ensuring high availability, performance, scalability, and security of our production systems, while enabling continuous deployment and rapid delivery of features to our customers.
Key Responsibilities:
- Design, build, and maintain reliable, scalable, and secure cloud-based infrastructure (AWS, Azure, or GCP).
- Develop and improve observability using monitoring, alerting, logging, and tracing tools (e.g., Prometheus, Grafana, ELK, Datadog).
- Automate repetitive tasks and infrastructure using Infrastructure-as-Code (Terraform, CloudFormation, Pulumi).
- Create and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD, etc.) to support fast and safe delivery.
- Lead incident response, root cause analysis, and postmortems to ensure high uptime and rapid recovery.
- Optimize system performance, reliability, and cost-effectiveness through proactive monitoring and tuning.
- Collaborate with software engineering teams to define SLAs/SLOs and improve service reliability.
- Implement and maintain security best practices across environments (e.g., secrets management, IAM, firewalls).
- Maintain disaster recovery plans, backups, and high-availability strategies.
Qualifications:
Required:
- 7+ years of experience as an SRE, DevOps Engineer, or similar role.
- Proficiency in scripting and automation (Bash, Python, Go, etc.).
- Strong experience with containerization and orchestration (Docker, Kubernetes, Helm).
- Solid understanding of Linux systems administration and networking fundamentals.
- Experience with cloud platforms (AWS, Azure, or GCP).
- Experience with IaC tools like Terraform or CloudFormation.
- Familiarity with GitOps and modern deployment practices.
- Hands-on experience with observability tools (e.g., Prometheus, Grafana, Datadog).
- Strong troubleshooting and incident response skills.
Preferred:
- Experience in a high-traffic, microservices-based architecture.
- Exposure to service meshes (Istio, Linkerd).
- Relevant certifications (e.g., AWS Certified DevOps Engineer, CKA).
- Experience with security automation and compliance (e.g., SOC 2, ISO 27001).
Soft Skills:
- Strong communication and collaboration abilities.
- Ability to thrive in a fast-paced, agile environment.
- Analytical mindset and proactive approach to problem-solving.
- A passion for automation, performance, and system design.
DevOps/SRE, Bash/Python/Go, Docker, Kubernetes, Helm, Cloud, Prometheus/Grafana
UST is a global digital transformation solutions provider. For more than 20 years, UST has worked side by side with the world’s best companies to make a real impact through transformation. Powered by technology, inspired by people and led by purpose, UST partners with its clients from design to operation. With deep domain expertise and a future-proof philosophy, UST embeds innovation and agility into its clients’ organizations. With over 30,000 employees in 30 countries, UST builds for boundless impact, touching billions of lives in the process.