Qureos

Site Reliability Engineering Manager

Ensure Reliability of Systems that Move the Nation's Food Supply


Who We Are

US Cold owns and operates one of the most complex temperature-controlled logistics networks in North America. Every day, our systems coordinate the storage and movement of food at national scale across a network of state-of-the-art distribution centers, including multiple highly automated warehouse facilities.

We continue to advance our core warehouse and logistics platforms, with a current focus on modular, event-driven, API-first, cloud architectures. We are also enhancing reliability and accelerating engineering productivity by strengthening our SRE and AI practices. This is a large investment in innovation aimed at driving operational excellence at our facilities.

If you want to build durable systems that operate in the physical world at scale, this is that opportunity.


The Role

The SRE Manager will design and implement the company’s SRE framework from the ground up.

You will define what reliability means at US Cold.
You will establish SLIs and SLOs.
You will modernize monitoring and incident response.
You will build the playbook others will follow.

This is both a hands-on technical role and a practice-building leadership position.

You will report to the Director of IT Operations.


What You Will Own

  • Establish the company’s first SRE practice including principles, standards, tooling, and operational processes
  • Define SLIs, SLOs, and error budgets across SaaS, on-prem, and custom services
  • Build reliability dashboards and executive-level reporting
  • Implement and evolve observability across logs, metrics, and distributed tracing
  • Mature incident response, outage management, and post-incident review processes
  • Partner with engineering to design resilient systems and reduce operational toil
  • Strengthen CI/CD reliability using safe deploy strategies such as canary and blue/green patterns
  • Implement cost visibility and cloud governance in partnership with Finance
  • Build runbooks, playbooks, and operational standards
  • Establish on-call structures and escalation clarity
  • Assist in hiring, mentoring, and developing future SRE team members

This is foundational work. The systems and practices you design will shape how engineering operates for years.


Technical Environment

  • Azure cloud infrastructure
  • Infrastructure as Code using Bicep, Terraform, or ARM
  • GitHub Actions for CI/CD orchestration
  • Safe deployment patterns including gated releases, canary, and blue/green
  • Observability across logging, metrics, and distributed tracing
  • Python scripting for automation and reliability tooling
  • SaaS integrations, on-prem infrastructure, and custom-built services

What We’re Looking For

  • 5–7+ years in SRE, DevOps, Infrastructure, or Production Engineering
  • Hands-on ownership of production services
  • Proven experience implementing SLIs, SLOs, observability, and automation
  • Leadership in major incident response and post-incident reviews
  • Deep CI/CD expertise, particularly GitHub Actions
  • Strong Python scripting for automation and operational tooling
  • Practical knowledge of cloud cost optimization and FinOps principles
  • Ability to influence cross-functional teams

Education:
Bachelor’s degree in Computer Science, Engineering, or equivalent experience.


Why This Role Is Different

This is not an inherited SRE function.
There is no existing framework to simply maintain.

You will:

  • Define the reliability bar
  • Build the operating model
  • Influence architectural decisions
  • Establish executive-level visibility into system health
  • Create a culture where reliability is engineered, not reactive

This is an opportunity to build something durable inside a company modernizing its core technology platform.


Compensation & Structure

  • Salary Range: $160,000 - $190,000
  • Bonus Eligible
  • Full-time, exempt
  • Reports to: Director of IT Operations
  • Travel less than 10%
  • Location: Hybrid, Greater Philadelphia

Operational Context

This role is primarily technical and office-based, with occasional interaction in operational environments depending on system needs.

Benefits Include


If annual hours are attained, these benefits may apply: Medical, Dental, Vision, Prescription, Legal Insurance, Pet Discount, Critical Illness, Accident Insurance, Hospital Indemnity, Long Term Care + Permanent Life Insurance, Identity Theft Protection, Short Term Disability Insurance, Long Term Disability Insurance, Supplemental Disability Insurance, Basic Life Insurance, Accidental Death and Dismemberment Insurance, Supplemental Life Insurance, Supplemental Spouse Life Insurance, Child Life Insurance, Loan Solution, Health Flexible Spending Account, Dependent Flexible Spending Account, Telemedicine, Virtual Primary Care, Prescription Savings Plan, Prescription Specialty Copay Assistance Program, Weight Management Program, Chronic Condition Management, Care Navigator Program, 24/7 Nurse Line, Expert Medical Opinion, Precious Additions Maternity Program, Health Advocacy, Employee Assistance Program, Digital Cognitive Behavioral Therapy, Digital Physical Therapy, Behavioral and Mental Health Platforms, Auto and Home Discount Program, Secure Travel Protection, Discount Programs, 401(k) Plan, Education Assistance, Paid Time Off, Referral Program, and Commuter Benefit (NJ only).


Physical & Operational Context

May require physical effort associated with using a computer to access information, or occasional standing, walking, and lifting needed to carry out everyday activities. Effective communication, vision, and hearing are essential for safety and productivity. Operate scanners, tablets, radios, phones, computers, and other essential equipment as required. Additional work hours may be requested by management to help manage employee production, projects, and/or special events. Engage in frequent personal interaction and communication. Attend in-person meetings and/or training on a regular basis. Possess strong arithmetic and reading skills. Follow verbal instructions, written instructions, and company policies. Work independently and coordinate with others. Work in a fast-paced environment, managing stress and meeting productivity standards.

Additional Information

Job functions may vary based on the area of operation. This description outlines the most common tasks required for the job. Reasonable accommodation may be provided to enable individuals with disabilities to perform essential duties. This job description may not encompass all tasks necessary to complete the role. Collaborate across Software Engineering, Customer Integration Technology, Data Engineering, Infrastructure, and Security.

© 2026 Qureos. All rights reserved.