Qureos

Site Reliability Engineer - SRE

On-prem infrastructure management

Manage on-prem infrastructure, maintaining the uptime, reliability, and readiness of an on-prem engineering cloud spread across multiple data centers. Implement monitoring, alerting, and incident response procedures to ensure adherence to defined performance targets. Perform root cause analysis and post-mortems for incidents that breach defined thresholds.

Observability

Set up and manage monitoring and logging tools such as Prometheus, Grafana, or the ELK Stack to track system health and performance. Maintain KPI pipelines using Jenkins, Python, and ELK.

Improve monitoring systems by adding custom alerts based on business needs.

Tech stack

Bare-metal data center machine management tools such as IPMI, Redfish, and KVM.

Automation using Jenkins, Python, Go, and Bash.

Infrastructure tools such as Kubernetes, MySQL, Prometheus, Grafana, and ELK.

Familiarity with hardware such as GPUs and Tegra devices is a plus.

Job Types: Full-time, Contract

Pay: $150,000.00 - $160,000.00 per year

Work Location: In person
