Senior Site Reliability Engineer

End Date

Sunday 24 May 2026

We support flexible working.

Flexible Working Options

Hybrid Working

Job Description Summary

We are looking for an experienced Senior Site Reliability Engineer to join our Cloud Enabling team, playing a key role in strengthening the resiliency, availability, and security of large-scale platforms. You will support high‑throughput, Kubernetes‑based systems serving millions of customers, while driving improvements in cloud infrastructure, monitoring, and CI/CD practices. This role requires strong hands-on expertise in SRE, cloud-native architectures, and automation across hybrid and public cloud environments. You will act as a technical leader, defining SLAs/SLOs, improving incident management, and enabling operational excellence at scale. The position offers an opportunity to innovate using modern technologies, including AI-driven tooling, within a large and complex enterprise environment.

Job Description

Job Title: Senior Site Reliability Engineer

Location: Hyderabad

Position: Full time

Years of experience: 6 to 14

About this opportunity

We're seeking an experienced Site Reliability Engineer to join the Cloud Enabling team within the Personalised Experiences and Communication (PEC) Platform. This role is crucial to maturing our SRE capability and contributing to the resiliency, availability and security of our infrastructure and software. The ideal candidate will have a strong background in one or more of the following fields: SRE, software engineering, data engineering or AI/MLOps. In addition, the candidate will have experience supporting high-throughput applications at scale, and will have built and supported complex hybrid-cloud architectures. The candidate is also expected to have worked extensively with Kubernetes-based workloads, networking and monitoring/logging solutions. An engineering mindset and experience working with large, complex organisations are preferable.

What you’ll do:

Support systems that serve millions of customers and billions of requests monthly, ensuring their availability, scalability and resiliency
Act as a key technical individual contributor within PEC, liaising with SRE guilds to drive improvements to our cloud deployments, monitoring solutions and CI/CD pipelines, and to optimise cost
Drive innovation by exploring new technologies and methodologies to improve our SRE capabilities, including AI tooling and automation opportunities
Manage high-throughput systems in production, delivering customer value that extends beyond proofs of concept (POCs)
Apply hands-on technical expertise to implement SLAs/SLOs/SLIs for a range of software and data teams
Implement tooling that enables the business to triage incidents more efficiently, with more granular alerting, well-defined runbooks and auto-resolving mechanisms
Act as a subject matter expert in engineering conversations relating to site reliability engineering, fostering a culture of continuous learning and development within and across our lab

Why Lloyds Banking Group

We're on an exciting transformation journey, and there could not be a better time to join us. The investments we're making in our people, data and technology are leading to innovative projects, fresh possibilities and countless new ways for our people to work, learn and thrive.

What you’ll need

Proven hands-on experience in software development, testing, monitoring and operational stability at scale
Production experience with Kubernetes (k8s) and monitoring tools such as Datadog or Dynatrace
Proven experience with automation, CI/CD and associated best practices
Proven experience of running postmortems, defining SLAs/SLIs/SLOs and participating in support rotas
Coding/scripting experience (Python/Bash) gained in a commercial or industry setting
Knowledge of databases, streaming and batch operations, and API design
Proficiency with Kubernetes (ideally microservice architectures using the Istio service mesh)
Extensive experience of cloud-native solutions (ideally Google Cloud)
Good understanding of cloud storage, networking, and resource provisioning.
