Site Reliability Engineer

Location: Raleigh, NC, United States
Job Type: Contract (Onsite)
Hours: 40 hrs/week, Monday–Friday, 9:00 AM – 5:00 PM
Eligibility: Only U.S. Citizens and Green Card Holders (No H1B, OPT, CPT or other work visas)

About the Role:

We are seeking experienced Site Reliability Engineers (SREs) to ensure the reliability, scalability, and performance of critical enterprise platforms. This hands-on role requires expertise in cloud infrastructure, Linux/Windows systems, automation, and observability, and involves working closely with cross-functional teams to deliver highly available and resilient services.

Key Responsibilities:

Design, implement, and maintain reliable, scalable, and secure systems across cloud and on-prem environments.
Manage distributed systems running on Azure, Linux (RHEL7+), and Windows Server 2019+.
Build and enhance automation workflows using Python, Go, and Bash.
Develop Infrastructure-as-Code (IaC) solutions with Terraform, Ansible, or similar tools.
Define, monitor, and improve SLIs, SLOs, and SLAs to ensure consistent service quality.
Reduce operational toil through automation, tooling enhancements, and process improvements.
Integrate systems with observability platforms for proactive issue detection.
Troubleshoot complex incidents, lead incident response, and conduct post-mortem analyses.
Collaborate with software engineering, infrastructure, and business teams to optimize system reliability, performance, and maintainability.

Requirements:

Proven experience as a Site Reliability Engineer or in a similar role in software engineering, infrastructure, or operations.
Hands-on experience with cloud platforms (Azure) and enterprise OS (Linux RHEL7+, Windows Server 2019+).
Knowledge of networking and storage (NFS, SAN, NAS).
Familiarity with DNS and with authentication services (LDAP, Kerberos, Centrify).
Proficiency in scripting and automation with Python, Go, and Bash.
Practical experience with Terraform, Ansible, or other IaC tools.
Ability to design, monitor, and improve SLIs, SLOs, and SLAs.
Experience integrating with modern observability platforms.
Strong communication and collaboration skills with cross-functional teams.
Calm, structured, and solution-oriented during high-pressure incidents.
Proactive, ownership-driven mindset with a focus on continuous improvement.

Skills:

Site Reliability Engineering, Azure, Linux (RHEL7+), Windows Server 2019+, Networking, NFS, SAN, NAS, DNS, LDAP, Kerberos, Centrify, Python, Go, Bash, Terraform, Ansible, IaC, Observability, SLIs/SLOs/SLAs, Automation, Incident Response, Metrics-Driven Reliability, System Performance, Cross-Functional Collaboration, Operational Excellence

Job Type: Full-time

Pay: $55.00 per hour

Work Location: In person