Qureos

Senior Member of Technical Staff

Austin, United States

Here at OCI, we’re building the world’s largest AI clusters, and we’re the fastest at bringing them to market. The AI Infrastructure organization at OCI is leading this effort by creating a GPU-focused cloud with the latest hardware, providing the best performance, efficiency, reliability, and scalability. This is your chance to be part of the AI revolution by creating systems that allow customers to scale from tens to thousands of GPUs without compromising performance. You will have the opportunity to work with cutting-edge technologies and make a significant impact on our organization’s success.

We are seeking Software Engineers who can bring fresh ideas and embrace challenges to scale and optimize AI infrastructure components, such as the GPU control plane and GPU data plane, that provide computing resources to customer AI workloads. In this role, you will ensure top performance for AI workloads scheduled on our platform. You will design and develop solutions that enhance our AI infrastructure to deliver an exceptional customer experience and peak performance.


Responsibilities
  • Design and develop large-scale distributed software services and solutions to manage OCI’s AI infrastructure.
  • Write high-quality, maintainable code by leveraging design reviews, code reviews, unit tests, and integration tests.
  • Develop complete solutions by ensuring that services and components are well-defined, modularized, secure, reliable, diagnosable, actively monitored, compliant, and reusable.
  • Focus on customer needs through a data-driven approach.
  • Collaborate with other team members working on the same project to meet customer requirements.
  • Troubleshoot and optimize automation for reliability, performance, and availability.

Qualifications & Skills
  • BS (or equivalent experience) in Computer Science, Engineering, or related field.
  • 3 years of experience in software development with programming languages including, but not limited to, C, C++, C#, Java, Go, Rust.
  • 1 year of experience designing and developing distributed systems and services.
  • Strong problem-solving and troubleshooting skills, with the ability to analyze complex systems and identify areas for improvement.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.

Preferred Qualifications
  • Experience managing cloud infrastructure with hundreds of thousands of servers.
  • Experience with containerization technologies such as Docker and Kubernetes.