Observability Engineer - Dynatrace

7 - 9 Years

2 Openings

Hyderabad

Role description

We are seeking an experienced Observability Engineer to design, build, and scale modern observability platforms across cloud-native and on-premises environments. This role focuses on custom data integration, time-series data at scale, and user-facing web experiences, enabling teams and stakeholders to gain clear, actionable insights into system health, performance, and reliability.

Key Responsibilities

Observability Platform Design

Design, implement, and operate end-to-end observability frameworks encompassing metrics, logs, traces, and events.
Integrate observability solutions across cloud-native and on-premises infrastructures.

Data Integration & Ingestion

Build and maintain scalable data ingestion pipelines to collect telemetry from diverse sources via APIs and custom integrations.
Manage and optimize time-series data for high-volume, high-cardinality environments.

Custom Solutions & Visualization

Develop custom dashboards, alerts, and reporting views that translate complex telemetry into actionable insights.
Implement dashboards-as-code to keep dashboards consistent, versioned, and reusable.

Web Experience & Status Pages

Design and maintain user-friendly status pages and health dashboards to communicate real-time system health and incidents to internal and external stakeholders.
Enhance the overall web experience for observability consumers.

Tooling & Advanced Monitoring

Leverage tools such as Prometheus, Grafana, and Dynatrace for monitoring, visualization, and deep-dive performance analysis.
Implement best practices for alerting, anomaly detection, and root cause analysis.

Collaboration & Evangelism

Partner closely with Development, Operations, and SRE teams to define meaningful SLIs, SLOs, and KPIs (latency, availability, error rates, saturation).
Promote observability best practices and standards across teams.

Performance & Reliability Engineering

Use data-driven insights to improve system reliability, reduce Mean Time to Resolution (MTTR), and optimize infrastructure and application performance.

Required Skills & Qualifications

Technical Skills

Strong hands-on experience with Prometheus, Grafana, and Dynatrace.
Deep understanding of time-series data and query languages such as PromQL.
Solid knowledge of API design, data modeling, and telemetry data pipelines.
Proficiency in scripting and backend development using Python, Go, or Java.

Platform & Infrastructure Experience

Proven experience designing and scaling production-grade observability stacks.
Hands-on experience with cloud platforms such as AWS and/or Azure.
Strong exposure to containerized environments, especially Kubernetes.
Experience with Infrastructure as Code and dashboards-as-code (e.g., Terraform).

Soft Skills

Excellent analytical, problem-solving, and troubleshooting skills.
Strong communication and stakeholder management capabilities.
Ability to translate complex technical data into clear, actionable recommendations for both technical and non-technical audiences.

Skills

Dynatrace, Prometheus, AWS, Data Modelling

About UST

UST is a global digital transformation solutions provider. For more than 20 years, UST has worked side by side with the world’s best companies to make a real impact through transformation. Powered by technology, inspired by people and led by purpose, UST partners with their clients from design to operation. With deep domain expertise and a future-proof philosophy, UST embeds innovation and agility into their clients’ organizations. With over 30,000 employees in 30 countries, UST builds for boundless impact—touching billions of lives in the process.
