Job Overview:
This role is conducted within the vision, mission, and strategic plan of the Insurance Authority. Reporting to the Manager of Operations & Monitoring, this role will safeguard the Authority’s IT infrastructure by orchestrating round-the-clock monitoring, first-line incident containment, and data-driven performance oversight. The Senior Specialist will translate real-time telemetry into actionable insights, ensuring robust service continuity, regulatory compliance, and adherence to ISO 20000/27001 controls. By coordinating resolver teams, external vendors, and internal stakeholders, the position drives swift escalation, root-cause remediation, and transparent KPI reporting that informs divisional strategy and audit readiness. The position also champions continuous optimization, automates repetitive tasks, and mentors junior technicians to embed a culture of operational excellence and resilient service delivery standards.
Responsibilities and Tasks:
-
Continuously monitor infrastructure dashboards, event logs, and automated alerts to detect anomalies across compute, storage, network, database, and application layers. Execute scheduled health-checks and capacity threshold reviews, escalating deviations in line with Operations & Monitoring Standard Operating Procedures.
-
Maintain and fine-tune alert rules, polling intervals, and thresholds to maximize coverage while reducing false positives.
-
Coordinate first-line incident response: perform technical triage, open ITSM tickets, and escalate to L2/L3 resolver groups within agreed service levels.
-
Lead post-incident reviews by collecting evidence, contributing to root-cause analysis, and recommending permanent fixes for recurring issues.
-
Track open incidents and problems through closure, ensuring SLA adherence and providing timely status updates to the Manager of Operations & Monitoring.
-
Compile daily, weekly, and monthly operations dashboards, KPI scorecards, and trend analysis for divisional leadership and audit purposes.
-
Create and maintain runbooks, knowledge-base articles, and step-by-step procedures to support consistent execution and rapid knowledge transfer.
-
Identify recurring alerts, performance bottlenecks, or manual tasks and propose optimization or automation opportunities to enhance service resilience.
-
Support the deployment, configuration, and acceptance testing of new monitoring tools, agents, and scripts in accordance with change-management policy.
-
Ensure operational activities comply with internal ITSM governance, cybersecurity controls, and external standards such as ISO 20000 and ISO 27001.
-
Liaise with application owners, infrastructure vendors, and the Service Desk to coordinate maintenance windows, patching schedules, and change activities with minimal business disruption.
-
Coach and mentor junior technicians by sharing best practices, reviewing ticket quality, and fostering a culture of continual service improvement.
-
Perform other job duties as assigned.
Job requirements:
Educational Qualifications (required)
-
Bachelor’s degree in Computer Science, Software Engineering, Computer Engineering, Information Technology, Information Systems, Data Science, or a related field
-
Master’s degree preferred
Certifications (required)
-
Relevant professional certification is preferred
Experience
-
2+ years of position-relevant experience with a bachelor’s degree is required.
Language
(A1-A2: Basic, B1-B2: Intermediate, C1-C2: Fluent)
-
English (C1), Arabic (C2)
Competencies required:
Core competencies
-
Ethics & Integrity (Beginner)
-
Effective Communication (Beginner)
-
Collaboration & Horizontality (Beginner)
-
Personal Competence (Beginner)
-
Analysis and Problem Solving (Intermediate)
Technical competencies
-
Network Infrastructure Management (Beginner)
-
Cloud Platform Administration (Beginner)
-
Backup & Disaster Recovery (Beginner)
-
End User Device & Asset Management (Beginner)
-
Service Desk & Incident Management (Intermediate)