Qureos

Incident Manager - AWS, Kubernetes, Splunk/Prometheus/Grafana

Hyderabad, India

    9–12 years
    3 Openings
    Bangalore, Chennai, Hyderabad, Kochi, Pune, Trivandrum


Role description

The Incident & Request Manager is a key leadership role responsible for overseeing incident response and request management across all non-production environments (Dev, QA, UAT, Performance). You will serve as the primary escalation point for project and product delivery teams, ensuring incidents are resolved quickly and requests are fulfilled efficiently. You will also lead continuous improvement efforts by embedding learnings from past incidents into future processes.

You will manage a team of Incident Analysts and Site Reliability Engineers (SREs), partner with DevOps teams to automate detection and response, and collaborate closely with Environment and Change Managers to proactively reduce the recurrence of issues.

Key Responsibilities

Incident Management

  • Own the entire incident lifecycle, from detection and triage to response, resolution, and closure.

  • Act as the primary escalation point for all non-production environment incidents.

  • Lead technical war rooms, coordinating with key stakeholders to resolve critical incidents.

  • Ensure timely escalation to other teams, including Environment, Change, DevOps, Infra, and Security.

  • Track and improve key incident metrics and SLAs, such as Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).

Request Management

  • Oversee the fulfillment of all requests from project and product delivery teams (e.g., access, entitlements, environment service requests).

  • Collaborate with Intake and DevOps teams to standardize and automate common request types.

  • Ensure all requests are logged, prioritized, and fulfilled within established SLAs.

  • Provide clear and transparent communication to stakeholders regarding request status.

Team Leadership

  • Manage and mentor a team of Incident Analysts and SREs.

  • Ensure 24/7 "follow-the-sun" coverage through effective onshore and offshore team management.

  • Foster a culture of blameless incident management, an automation-first mindset, and continuous learning.

Governance & Root Cause Analysis (RCA)

  • Ensure comprehensive RCA is documented for all incidents.

  • Track all corrective and preventive actions, integrating them into the Change and Environment management processes.

  • Provide leadership with regular trend reporting and insights on incident data.

SRE and DevOps Alignment

  • Work with SREs and DevOps teams to automate incident detection, rollback, and recovery processes.

  • Integrate observability tools like Splunk, Prometheus, and Grafana for proactive monitoring.

Stakeholder Communication

  • Provide timely updates during incidents and communicate any delays in request fulfillment.

  • Publish regular reports on incident trends, RCA outcomes, and SLA adherence.

  • Build and maintain trust with project and product delivery teams through transparent and effective communication.

Required Skills & Experience

  • 8–10 years of experience in Incident Management, Service Operations, or SRE leadership.

  • Proven experience managing Incident Analysts and SRE teams.

  • Strong knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana).

  • Deep understanding of ITIL Incident, Problem, and Request Management processes.

  • Excellent crisis management, communication, and stakeholder engagement skills.

Skills

Incident, AWS, Kubernetes, Splunk/Prometheus/Grafana


About UST

UST is a global digital transformation solutions provider. For more than 20 years, UST has worked side by side with the world’s best companies to make a real impact through transformation. Powered by technology, inspired by people and led by purpose, UST partners with their clients from design to operation. With deep domain expertise and a future-proof philosophy, UST embeds innovation and agility into their clients’ organizations. With over 30,000 employees in 30 countries, UST builds for boundless impact—touching billions of lives in the process.

© 2025 Qureos. All rights reserved.