JOB_REQUIREMENTS
Hires in: Not specified
Employment Type: Not specified
Company Location: Not specified
Salary: Not specified
We are seeking an experienced Observability Engineer to design, build, and scale modern observability platforms across cloud-native and on-premises environments. This role focuses on custom data integration, time-series data at scale, and user-facing web experiences, enabling teams and stakeholders to gain clear, actionable insights into system health, performance, and reliability.
Key Responsibilities
Design, implement, and operate end-to-end observability frameworks encompassing metrics, logs, traces, and events.
Integrate observability solutions across cloud-native and on-premises infrastructures.
Build and maintain scalable data ingestion pipelines to collect telemetry from diverse sources via APIs and custom integrations.
Manage and optimize time-series data for high-volume, high-cardinality environments.
Develop custom dashboards, alerts, and reporting views that translate complex telemetry into actionable insights.
Implement dashboards-as-code and ensure consistency, versioning, and reusability.
Design and maintain user-friendly status pages and health dashboards to communicate real-time system health and incidents to internal and external stakeholders.
Enhance the overall web experience for observability consumers.
Leverage tools such as Prometheus, Grafana, and Dynatrace for monitoring, visualization, and deep-dive performance analysis.
Implement best practices for alerting, anomaly detection, and root cause analysis.
Partner closely with Development, Operations, and SRE teams to define meaningful SLIs, SLOs, and KPIs (latency, availability, error rates, saturation).
Promote observability best practices and standards across teams.
Use data-driven insights to improve system reliability, reduce Mean Time to Resolution (MTTR), and optimize infrastructure and application performance.
Required Skills & Qualifications
Strong hands-on experience with Prometheus, Grafana, and Dynatrace.
Deep understanding of time-series data and query languages such as PromQL.
Solid knowledge of API design, data modeling, and telemetry data pipelines.
Proficiency in scripting and backend development using Python, Go, or Java.
Proven experience designing and scaling production-grade observability stacks.
Hands-on experience with cloud platforms such as AWS and/or Azure.
Strong exposure to containerized environments, especially Kubernetes.
Experience with Infrastructure as Code and dashboards-as-code (e.g., Terraform).
Excellent analytical, problem-solving, and troubleshooting skills.
Strong communication and stakeholder management capabilities.
Ability to translate complex technical data into clear, actionable recommendations for both technical and non-technical audiences.
Dynatrace, Prometheus, AWS, Data Modelling
© 2026 Qureos. All rights reserved.