Position: Observability SRE – Monitoring Specialist
Location: Abu Dhabi (Onsite)
Language Requirement: English; conversational Emirati Arabic advantageous
Reporting to: Observability Lead
Role Overview
The Observability SRE Monitoring Specialist will play a pivotal role in a high-stakes, rapidly evolving environment, driving urgent enhancements in monitoring capabilities for a large government client in the region. This position demands exceptional technical proficiency, rigorous problem-solving skills, and resilience under significant pressure. The successful candidate will thrive in challenging situations, delivering operational excellence and maintaining superior visibility and reliability of critical services. This role is not for the faint-hearted; it is for someone prepared to operate effectively in a demanding and testing environment.
Key Responsibilities
Monitoring Best Practices:
- Swiftly establish, enforce, and continuously evolve monitoring standards and best practices across the client’s critical IT operations.
- Proactively optimize monitoring configurations to enhance system visibility and urgently reduce alert fatigue.
- Continuously adapt and update monitoring practices to meet rapidly changing business and operational requirements.
Technical Expertise:
- Rapidly deploy and expertly manage Dynatrace monitoring tools, including dashboards, synthetic monitoring, and advanced alerting mechanisms.
- Deploy and manage complementary technologies such as Prometheus, Grafana, and OpenTelemetry for detailed insights and rapid troubleshooting.
- Ensure seamless integration and operational alignment of all monitoring tools within the observability framework under tight deadlines.
Operational Excellence:
- Participate actively in high-pressure, 24x7 on-call rotations, addressing and resolving critical incidents with urgency.
- Collaborate closely with Incident and Event Management teams to expedite issue detection, resolution, and prevention.
- Lead swift and thorough root-cause analyses (RCAs), focusing on the rapid implementation of monitoring-related improvements and proactive solutions.
Cross-Team Collaboration and Education:
- Provide decisive technical leadership and guidance to L1/L2 teams, enhancing rapid response and proactive monitoring capabilities.
- Conduct timely, high-impact cross-functional training sessions to improve the team’s understanding of monitoring requirements and best practices.
- Educate and empower teams with proactive monitoring strategies to enhance operational effectiveness.
- Maintain clear, decisive, and timely communication with all stakeholders, ensuring proactive transparency and clarity in crisis situations.
Essential Qualifications and Experience
- 4–7 years of experience in high-pressure Monitoring, SRE, or DevOps roles.
- Advanced proficiency and hands-on expertise in Dynatrace monitoring solutions.
- Solid, demonstrable experience with Prometheus, Grafana, and OpenTelemetry.
- Proven track record of rapidly optimizing monitoring setups, effectively reducing noise, and significantly improving system visibility and reliability.
- Fluent English communication skills, adept at clearly conveying critical technical information to all stakeholders, particularly under pressure.
Preferred Qualifications and Certifications
- Dynatrace Associate or Professional Certification
- ITIL Foundation Certification
- Kubernetes Certification (CKAD or CKA)
Ideal Candidate Profile
- Technically adept professional with exceptional resilience and the ability to thrive in high-pressure and challenging operational environments.
- Proactive individual capable of decisively managing urgent priorities and rapidly changing demands.
- Strong communicator, able to swiftly translate complex monitoring strategies into actionable insights for both technical and non-technical audiences.
- Determined problem solver, adept at maintaining composure and delivering results in high-stakes scenarios.
Immediate Objectives (First 90 Days)
- Rapidly establish and reinforce monitoring best practices to significantly improve operational reliability and visibility.
- Swiftly address and reduce alert noise, ensuring that robust mechanisms for proactive issue detection and response are in place.
- Develop and implement comprehensive monitoring dashboards and reports to provide real-time operational transparency and decision-making support.
Job Type: Full-time
Application Question(s):
- How many total years of professional experience do you have in Monitoring / Observability / SRE / DevOps roles?
- How many years of hands-on production experience do you have using Dynatrace?
- Do you currently hold any of the following certifications?
  - Dynatrace Associate or Professional
  - ITIL Foundation
  - Kubernetes (CKA / CKAD)
- Are you able to work full-time onsite in Abu Dhabi and participate in 24x7 on-call rotations?
- Have you worked in a large enterprise, regulated, or government environment?