Job Description:
Centennial Technologies seeks a mid-level Site Reliability Engineer (SRE) to join our AWS DevOps Support team. The ideal candidate will help ensure 24×7 monitoring and production support of critical systems. This role requires hands-on technical expertise and the ability to collaborate with Cloud Architects and DevOps Engineers.
Key Responsibilities:
- Provide 24×7 monitoring and production support to ensure system availability.
- Meet defined SLAs and service levels in alignment with SRE best practices.
- Minimize manual remediation (“toil”) by developing and implementing automated remediation solutions.
- Coordinate with the appropriate teams, including application and cloud automation teams, during system overload events.
- Administer and configure Splunk.
- Support automated AWS environment provisioning and build infrastructure from the ground up using Terraform.
- Design and develop automation workflows; perform unit testing, conduct reviews, and assess the overall quality of delivered components.
- Develop automation tools, CI/CD pipelines, scripts, and self-service capabilities to support AWS platform operations.
- Perform application monitoring, roll out changes gradually, and build automation that improves reliability.
- Contribute to Business Continuity and Disaster Recovery (DR) efforts, particularly cloud-based business continuity.
- Assist in designing systems for Reliability, Availability, and Maintainability (RAM/ARM) through fault tolerance, redundancy, and distributed/parallel processing, targeting five nines (99.999%) availability.
- Perform Business Continuity, Continuity of Operations (COOP), DR, and Readiness planning, exercises, and testing.
- Perform switchover and failover using cold, warm, or hot start.
Key Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 6+ years of experience as an SRE, including at least 3 years in DevOps engineering.
- Experience with infrastructure automation tools (Terraform).
- Proficiency in Splunk administration and configuration.
- Strong knowledge of cloud-based Business Continuity, COOP, DR, and Readiness planning, exercises, and testing.
- Ability to work collaboratively and efficiently in a team.
- Exceptional problem-solving and troubleshooting skills.
- Excellent communication and documentation skills.
- Must have an exceptional work ethic and the ability to meet strict deadlines under pressure.
- Must be a U.S. Citizen with the ability to obtain Public Trust Clearance.
Work Conditions:
- Location: Hybrid, 4 days onsite in Tysons Corner, VA (DC Metro candidates in VA, DC, or MD preferred).
- Job Type: Full-time
About Centennial Technologies Inc.:
Centennial Technologies Inc. is committed to a healthy work-life balance and provides a collaborative and supportive professional environment. We offer flexible PTO, a casual work culture, and regular opportunities for career advancement and skills development.
Benefits include:
- Medical, Dental, and Vision Insurance
- Short-Term and Long-Term Disability
- Life Insurance
- 401(k) Retirement Plan
- Paid Time Off and Federal Holidays
Our Culture:
- Supportive work environment that promotes work-life balance
- Performance-based rewards and recognition
- Regular employee feedback and collaboration
- Paid training in emerging technologies and federal compliance
- Client-focused, employee-centered growth
Equal Opportunity Employer: Centennial Technologies Inc. is an equal opportunity employer and complies with all applicable federal, state, and local employment laws.