
Site Reliability Developer

Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission critical stack, with a focus on security, resiliency, scale, and performance. Authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate clear understanding of automation and orchestration principles. Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Utilize a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Professional curiosity and a desire to develop a deep understanding of services and technologies.

A BS or MS in Computer Science, or equivalent. Knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, and technology security and compliance. Experience running large scale customer facing web services. Understanding of load balancing technologies and experience with development in programming languages, databases and big data stores, and container technologies. Work involves defining and documenting technical architecture of complex and highly scalable products. A minimum of 5 years of experience running large scale customer facing web services.

Responsibilities

Oracle Cloud Infrastructure (OCI) delivers mission-critical applications for top tier enterprises around the world. Our cloud offers unmatched hyper-scale, multi-tenant services deployed in more than 30 regions worldwide. OCI is expanding its mission beyond the traditional boundaries of public cloud to include dedicated, hybrid, and multicloud deployments, edge computing, and more.

At the Multicloud Services organization, our mission is to support customer choice, transparency, and value when it comes to cloud infrastructure. We make it easy for our customers to maximize the value of their Oracle investment as well as other clouds or on-premises infrastructure, and to build highly distributed, scalable, and resilient Multicloud solutions to support their business.

We are looking for hands-on engineers with expertise and passion in solving difficult problems in all areas of cloud service software engineering: high scale distributed systems, virtualized infrastructure, identity, security, observability, and user experience.

We are growing fast, still at an early stage, and working on ambitious new initiatives. An engineer at any level can have significant technical and business impact here. You will be part of a team of smart, motivated, diverse people, and given the autonomy as well as support to do your best work. It is a dynamic and flexible workplace where you’ll belong and be encouraged.

Who are we looking for?

We are looking for a Site Reliability Engineer who will operate and help develop tools for the multi-cloud services. You should be comfortable defining how to use the latest technologies to identify and optimize operational efficiency. You will be responsible for the infrastructure and reliability of all multi-cloud and other network monitoring services. You should value simplicity and scale, work comfortably in a collaborative, agile environment, and be excited to learn.

A great SRE will make all the difference in delivering quality solutions to our customers. You will be the subject matter expert for VMware on OCI, able to resolve complex on-premises to OCI migrations and customer escalations, and you will serve as the Tier-2 point of contact for support.

Are you passionate about designing, developing, testing, and delivering infrastructure for cloud services? Do you thrive in a fast-paced environment, and want to be an integral part of a truly great team? If yes, come join us!

Qualifications:

3 to 5 years in customer-facing roles with a proven record of earning trust and collaborating effectively across multiple internal organizations, partners, and customers
Demonstrated experience in architecture, deployment, and management of VMware Solutions. Experience working directly in customer implementations is highly desirable.
Solution Architect level understanding of cloud infrastructure to enable customers to consume cloud-native services
A VMware certification such as VCP-DCV is a must
Hands-on working and troubleshooting experience with VMware technologies: ESX (hypervisor), vCenter, NSX (SDN), vSAN (storage), HCX
Strong understanding of IP networking, routing, and security across layers 1 through 7 (L1-L7).
Expertise in IaaS and cloud-native application architectures
Strong verbal and written communications skills.
Experience with automation and solving customer adoption barriers through languages such as Python, Go, or others.
Familiar with Software Development Life Cycle and API design.
Possess 3-5 years of IT development or implementation/consulting experience in the software or infrastructure industries and demonstrate an intermediate understanding of applications, server technology, networking, and security.
BS/MS degree required; Computer Science, Math or Engineering degree with technical background highly desired; Advanced Degree a plus.

Preferred Qualifications

Strong Technical background with an ability to troubleshoot issues impacting large-scale service architectures and application stacks.
Familiarity with large scale system monitoring and alerting frameworks
Experience developing repeatable processes and metrics that maximize uptime, reliability, and predictability
Knowledge of cloud computing & networking technologies including monitoring services
Experience with programming or scripting languages is required.
Experience with Jira, Confluence, BitBucket
Knowledge of Scrum & Agile Methodologies
Able to develop and maintain strong relationships with Oracle customers
Experience working with internal customers and translating requests into prioritized work or features
Experience migrating or transforming customer solutions to the cloud.
Familiarity with common enterprise solutions (e.g., Oracle, Microsoft, SAP, VMware, etc.)
Experience with other cloud platforms including AWS, Azure, or Google Cloud Platform is a plus.
