Qureos

Application Infrastructure Observability Engineer (BANKING Only)

Abu Dhabi, United Arab Emirates

Job Description: Application, Microservices, and Infrastructure Observability Engineer

Overall Objectives

  • Ensure comprehensive, end-to-end visibility into the health, performance, and reliability of applications, microservices, and infrastructure across on-premise and cloud environments.
  • Implement and manage modern observability tools to support real-time insights, distributed tracing, and predictive analytics for early issue detection and resolution.
  • Drive incident prevention, reduce Mean Time to Resolution (MTTR), and enhance system resilience through data-driven monitoring, automated alerts, and root cause analysis.
  • Collaborate with DevOps, Development, and Infrastructure teams to foster a performance-centric culture in high-transaction environments.
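
For readers unfamiliar with the MTTR metric named above, here is a minimal Python sketch of how it is commonly computed: the average of (resolved time minus opened time) over a set of incidents. The function name and the sample timestamps are illustrative assumptions, not data from this role or employer.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Average resolution duration over (opened, resolved) datetime pairs.

    Hypothetical helper for illustration only.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum the per-incident durations, starting from a zero timedelta.
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)

# Made-up sample incidents: one lasting 30 minutes, one lasting 90 minutes.
incidents = [
    (datetime(2025, 1, 1, 10, 0), datetime(2025, 1, 1, 10, 30)),
    (datetime(2025, 1, 2, 9, 0), datetime(2025, 1, 2, 10, 30)),
]
mttr = mean_time_to_resolution(incidents)
print(mttr)
```

Tracking this value per release or per service is one way to check whether monitoring changes are actually shortening resolution times.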

Role-Specific Responsibilities

  • Design, implement, and maintain observability solutions across applications, microservices, and infrastructure using tools such as Prometheus, Grafana, Dynatrace, and OpenTelemetry.
  • Leverage telemetry data (logs, metrics, traces) to identify and troubleshoot issues across compute, network, storage, and application layers.
  • Enable distributed tracing and service mapping to diagnose performance bottlenecks and inter-service dependencies in microservices architectures.
  • Support performance engineering by optimizing code-level performance, transaction processing, and infrastructure scalability during peak loads or major releases.
  • Define and implement automated remediation triggers and escalation paths to minimize manual intervention and improve incident response times.
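
As a rough illustration of the automated remediation triggers and escalation paths mentioned in the last responsibility, the following Python sketch shows one possible decision rule. The function name, the 5% error-rate threshold, and the limit of two remediation attempts are all assumptions made for this example; a real setup would take these from the team's alerting and incident management tooling.

```python
def decide_action(error_rate, failed_remediations, threshold=0.05, max_attempts=2):
    """Return "none", "auto_remediate", or "escalate" (hypothetical policy).

    error_rate: fraction of failing requests in the current window.
    failed_remediations: automated fixes already tried without success.
    """
    if error_rate <= threshold:
        # Service is within the assumed tolerance; nothing to do.
        return "none"
    if failed_remediations < max_attempts:
        # Try an automated fix (for example, a restart) before paging anyone.
        return "auto_remediate"
    # Automation has been exhausted; hand over to the on-call engineer.
    return "escalate"

print(decide_action(0.10, 0))
print(decide_action(0.10, 2))
```

Capping the number of automated attempts keeps a failing fix from looping indefinitely and ensures a person is paged once automation stops helping.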

General Functional Responsibilities

  • Ensure compliance with enterprise standards and regulatory frameworks (e.g., GDPR, PSD2) for monitoring and data collection.
  • Collaborate with infrastructure, application, and security teams to enhance data ingestion, correlation, and observability maturity (progressing from reactive to predictive monitoring).
  • Participate in post-incident reviews and performance retrospectives to identify trends, reduce MTTR, and improve overall reliability.
  • Provide out-of-hours support (L1/L2) for critical incidents as part of a rotating on-call schedule.

Required Skills & Qualifications

  • Strong expertise in observability platforms: Prometheus, Grafana, Dynatrace, OpenTelemetry, ELK/EFK Stack.
  • Proficiency in cloud platforms: AWS, Azure, or GCP, including cloud-native monitoring services.
  • Hands-on experience with Kubernetes, Docker, and containerized microservices environments.
  • Solid understanding of CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, Azure DevOps).
  • Strong knowledge of infrastructure monitoring (compute, storage, network) and application performance monitoring (APM).
  • Familiarity with scripting and automation: Python, Bash, PowerShell, or Go.
  • Experience with incident management tools (PagerDuty, Opsgenie, ServiceNow) and alerting frameworks.
  • Good understanding of ITIL processes, incident response, and root cause analysis.
  • Strong communication and collaboration skills to work effectively with cross-functional teams.
  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).

Key Tools & Technologies (Highlighted):

  • Prometheus | Grafana | Dynatrace | OpenTelemetry
  • AWS | Azure | GCP
  • Kubernetes | Docker
  • CI/CD (Jenkins, GitLab, GitHub Actions, Azure DevOps)
  • Scripting (Python, Bash, Go, PowerShell)
  • APM, Telemetry (Logs, Metrics, Traces), Distributed Tracing

Job Type: Contract
Contract length: 12 months

© 2025 Qureos. All rights reserved.