Principal Software Engineer - Network Reliability Engineering - AI

Santa Clara, United States

Oracle Cloud Infrastructure (OCI) provides mission-critical cloud services to enterprises worldwide. The Network Reliability Engineering (NRE) Automation, Reporting, and Tooling team builds solutions that boost the productivity and efficiency of the Global Network Operations Center (GNOC). Our tooling gives the GNOC and NRE teams observability, automation, and actionable insights at hyperscale.

As a Principal Software Engineer, you will design, build, and deliver scalable automation frameworks and advanced platforms leveraging AI to drive operational excellence across OCI’s global network. This includes building AI agents that accelerate issue resolution for the GNOC team, as well as developing robust tools that provide intelligent data insights, enable natural language search in systems like Jira and data lakes, reduce operational toil, and ultimately keep OCI’s network running smoothly and securely.

You are passionate about developing software that solves real-world operational challenges, thrive in a fast-paced team, and are comfortable working with complex distributed systems. You value simplicity, scalability, and collaboration.

Responsibilities:

  • Design, implement, test, and deploy large-scale automation, reporting, and productivity tools for OCI’s global network operations.
  • Lead the design and development of intelligent systems using Large Language Models (LLMs).
  • Develop AI agents that enable natural language querying of Jira data, providing context-aware answers to user questions about Jira issues, projects, and metrics.
  • Collaborate with GNOC and NRE engineers to gather requirements and deliver impactful solutions.
  • Build and maintain observability dashboards and data pipelines that drive decision-making and root cause analysis.
  • Develop auto-remediation, orchestration, and workflow automation services for operational tasks.
  • Ensure high availability, reliability, and performance of developed solutions in production environments.
  • Participate in code reviews, mentor peers, and help build a culture of engineering excellence.
  • Own and drive multiple technical projects and priorities in an agile, collaborative environment.

Required Qualifications:

  • 8 to 10 years of experience in software engineering, automation development, or similar roles.
  • Bachelor's degree in Computer Science, Engineering, or a related engineering field.
  • Strong coding skills in Java, Python, or a comparable programming language.
  • Experience developing context-aware, intelligent systems leveraging LLMs for real-world operational workflows.
  • Experience with distributed systems, microservices, and cloud-native technologies.
  • Hands-on expertise with Linux environments and scripting languages.
  • Proficiency with data modeling, data analysis, and reporting frameworks (e.g., SQL, Spark, Prometheus, Grafana).
  • Understanding of network operations or large-scale IT infrastructure.
  • Excellent problem-solving, organizational, and communication skills.

Preferred Qualifications:

  • Experience developing automation and orchestration tools for network or cloud operations.
  • Background in creating dashboards, alerts, and real-time reporting platforms.
  • Familiarity with workflow automation (e.g., Apache Airflow), CI/CD pipelines, or infrastructure as code.
  • Previous experience supporting or building tools for NOC, GNOC, or SRE teams.
  • Knowledge of cloud platforms, REST APIs, and service-oriented architecture.
  • Familiarity with agile methodologies and DevOps practices.
  • Experience with ticketing and version control systems (e.g., Jira, Git).
