Qureos

Site Reliability Developer

Job Requirements

  • 5+ years of SRE/DevOps/automation experience in large-scale infrastructure and cloud services.
  • Experience with compute, database, network, and storage in a cloud infrastructure environment.
  • Good knowledge of Grafana, LumberJack, Shepherd, Bitbucket, code reviews, and scripting.
  • Ability to deploy, operate, and maintain large-scale cloud service products.
  • Familiarity with Docker containers, multi-tenant and virtualized infrastructure, and patching orchestration.
  • Experience operating CI/CD systems and Linux systems, and working with Terraform, Java, and Python.
  • Keen troubleshooting skills for improving performance, availability, reliability, and scalability.
  • Ability to improve our offerings through deep analysis and diagnosis, participation in on-call rotations, and issue resolution.
  • Aptitude for teamwork and a desire to learn and implement new cloud technologies as needed.

Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Take responsibility for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Hold authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate the technical characteristics of services and technology areas, and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, and performance attributes and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.

© 2025 Qureos. All rights reserved.