We are looking for a high-impact Observability Engineer with proven experience in fintech, banking, or other regulated environments to design and scale enterprise-grade observability systems. This role is critical to ensuring high availability, low latency, and full-stack visibility across mission-critical financial platforms, while supporting compliance, auditability, and incident response readiness.
Key Responsibilities:
- Own and evolve the end-to-end observability architecture across applications, infrastructure, and cloud environments
- Centralize metrics, logs, traces, and events with high reliability and scalability
- Design and enforce SLOs, SLIs, and error budgets for critical financial systems
- Build advanced real-time dashboards and business-aligned KPIs for engineering and leadership
- Develop intelligent alerting frameworks to minimize noise and enable faster incident resolution
- Ensure observability pipelines are resilient, scalable, and cost-optimized
- Collaborate with DevOps and engineering teams to implement instrumentation, distributed tracing, and logging standards
- Integrate observability systems with incident management, on-call, and escalation workflows
- Support compliance, audit, and forensic analysis through structured logging and traceability
- Drive root cause analysis (RCA) and continuous improvement of system reliability
- Automate monitoring, alerting, and data enrichment workflows
Requirements:
- 6 to 10 years of experience in Observability, SRE, or Monitoring Engineering roles
- Mandatory experience in fintech, banking, or highly regulated environments
- Strong hands-on expertise with:
  - Monitoring: Dynatrace, Prometheus, Grafana
  - Logging: Elastic Stack (ELK), Splunk, Fluent Bit, Logstash
  - Alerting & Correlation: Dynatrace, ELK, Splunk, Alertmanager
- Proficiency in PromQL, SPL, KQL for advanced log/metric analysis
- Experience developing high-performance, scalable dashboards in Grafana and Kibana, integrating application, infrastructure, and business KPIs for end-to-end observability
- Deep understanding of distributed systems observability and performance monitoring
- Experience with high-throughput, low-latency systems
- Experience with enterprise monitoring tools such as Riverbed and SolarWinds for network performance monitoring (NPM), application visibility, traffic analysis, and infrastructure health tracking across distributed systems
Core Expertise:
- Observability pillars: metrics, logs, traces, events
- Golden signals: latency, traffic, errors, saturation
- SLO/SLI-driven reliability engineering
- Alert design with high signal-to-noise ratio
- Telemetry standardization and instrumentation strategies
- Mapping technical metrics to financial/business KPIs
Preferred Qualifications and FinTech Alignment:
- Proven experience supporting audit, compliance, and regulatory requirements within fintech, banking, or other regulated environments
- Strong familiarity with industry frameworks such as:
  - PCI DSS
  - ISO 27001
  - SAMA / NCA
- Solid understanding of data sensitivity, traceability, and audit logging standards for financial systems
- Experience working on large-scale fintech or digital banking platforms
- Exposure to CI/CD-integrated observability and DevSecOps practices
- Proficiency in scripting and automation (Python, Bash)
- Hands-on experience with incident management and on-call frameworks (e.g., PagerDuty, Opsgenie)
What We're Looking For:
- A proactive engineer with a strong reliability and performance mindset
- Ability to translate observability data into actionable insights
- Experience working cross-functionally with SRE, DevOps, and product teams
- Ownership-driven individual focused on continuous improvement of monitoring systems