Job Purpose
To ensure end-to-end governance of SLAs and Incident Management for business-critical applications, guaranteeing service availability, rapid incident resolution, and compliance with agreed service targets while minimizing business disruption and driving continuous service improvement.
This role acts as the central authority for SLA performance, major incident coordination, and vendor accountability across all application services.
Key Responsibilities
1. SLA Management & Governance
- Define, review, and enforce Service Level Agreements (SLAs), Operational Level Agreements (OLAs), and Key Performance Indicators (KPIs) aligned with business needs
- Monitor SLA performance and proactively address breaches and risks
- Lead periodic SLA reviews with business stakeholders and vendors
- Ensure service catalog and SLA definitions remain updated and relevant
- Drive service performance reporting (availability and uptime, response times, resolution times)
2. Incident, Major Incident, and Problem Management
- Own the end-to-end incident management lifecycle (logging, prioritization, escalation, resolution, closure)
- Lead Major Incident (P1/P2) bridge calls and war rooms, coordinating all resolver teams
- Ensure rapid service restoration through clear prioritization and direction
- Maintain communication governance, including the cadence of internal and business communications
- Conduct Post-Incident Reviews (PIR) and track corrective actions
- Own Problem Management and root cause analyses (RCAs) from initiation to closure
3. Vendor & Third-Party Management
- Govern vendor performance against contractual SLAs and support obligations
- Act as the single point of escalation for vendor-related incidents and SLA breaches
- Review vendor deliverables, incident reports, and root cause analysis (RCA)
- Ensure vendors comply with incident communication and escalation protocols
4. Service Performance & Reporting
- Produce executive-level dashboards for:
  - SLA compliance
  - Incident trends and volumes
  - MTTR / MTTA
  - Availability metrics
- Identify recurring issues and initiate problem management actions to reduce incidents
- Support data-driven decision-making for service improvement
5. Process Governance & Continuous Improvement
- Establish and maintain standard operating procedures (SOPs) for incident and SLA management
- Drive transition from reactive incident handling to proactive prevention
- Ensure compliance with ITSM processes (Incident, Problem, Change)
- Lead continuous improvement initiatives for support operations
6. Stakeholder Management
- Act as the primary interface between Business, IT, and Vendors
- Provide clear communication during major incidents and service outages
- Align service delivery with business priorities and critical services
All qualified applicants will receive consideration for employment without regard to age, religion, gender, nationality, or disability.