Site Reliability Engineer
Location: Raleigh, NC, United States
Job Type: Contract (Onsite)
Hours: 40 hrs/week, Monday–Friday, 9:00 AM – 5:00 PM
Eligibility: U.S. citizens and green card holders only (no H-1B, OPT, CPT, or other work visas)
About the Role:
We are seeking experienced Site Reliability Engineers (SREs) to ensure the reliability, scalability, and performance of critical enterprise platforms. This hands-on role requires expertise in cloud infrastructure, Linux/Windows systems, automation, and observability, and involves working closely with cross-functional teams to deliver highly available and resilient services.
Key Responsibilities:
- Design, implement, and maintain reliable, scalable, and secure systems across cloud and on-prem environments.
- Manage distributed systems running on Azure, Linux (RHEL7+), and Windows Server 2019+.
- Build and enhance automation workflows using Python, Go, and Bash.
- Develop Infrastructure-as-Code (IaC) solutions with Terraform, Ansible, or similar tools.
- Define, monitor, and improve SLIs, SLOs, and SLAs to ensure consistent service quality.
- Reduce operational toil through automation, tooling enhancements, and process improvements.
- Integrate systems with observability platforms for proactive issue detection.
- Troubleshoot complex incidents, lead incident response, and conduct post-mortem analyses.
- Collaborate with software engineering, infrastructure, and business teams to optimize system reliability, performance, and maintainability.
Requirements:
- Proven experience as a Site Reliability Engineer or in a similar role in software engineering, infrastructure, or operations.
- Hands-on experience with cloud platforms (Azure) and enterprise OS (Linux RHEL7+, Windows Server 2019+).
- Knowledge of networking and storage (NFS, SAN, NAS).
- Familiarity with DNS and with LDAP, Kerberos, and Centrify authentication services.
- Proficiency in scripting and automation with Python, Go, and Bash.
- Practical experience with Terraform, Ansible, or other IaC tools.
- Ability to design, monitor, and improve SLIs, SLOs, and SLAs.
- Experience integrating with modern observability platforms.
- Strong communication and collaboration skills with cross-functional teams.
- Calm, structured, and solution-oriented approach during high-pressure incidents.
- Proactive, ownership-driven mindset with a focus on continuous improvement.
Skills:
Site Reliability Engineering, Azure, Linux (RHEL7+), Windows Server 2019+, Networking, NFS, SAN, NAS, DNS, LDAP, Kerberos, Centrify, Python, Go, Bash, Terraform, Ansible, IaC, Observability, SLIs/SLOs/SLAs, Automation, Incident Response, Metrics-Driven Reliability, System Performance, Cross-Functional Collaboration, Operational Excellence
Job Type: Full-time
Pay: $55.00 per hour
Work Location: In person