ML Performance Engineer
Location: Palo Alto (Onsite)
We’re hiring an ML Performance Engineer to join a Series B AI lab valued at $1.4B that is building best-in-class image models and products (not a wrapper). The team has developed in-house foundation models at scale across thousands of GPUs and is widely recognized as having one of the strongest image models in the world. This is a product-first AI lab building creative tools currently used by millions of users.
This role focuses on large-scale training and inference optimization for in-house models, working closely with world-class research and engineering teams to maximize efficiency across GPUs and other accelerators.
What You’ll Do:
As an ML Performance Engineer, you’ll help optimize training and inference for large-scale ML models.
- Work on core training and inference systems for large models
- Improve efficiency across GPUs and other accelerators
- Profile, debug, and optimize performance bottlenecks
- Collaborate with researchers to make new models production-ready
- Improve reliability and scalability of distributed workloads
- Contribute to internal tooling and infrastructure that supports model deployment
- Evaluate and adopt new hardware and system capabilities
Bonus Points:
- Experience working on ML infrastructure, distributed systems, or performance-critical software
- Exposure to low-level optimization, GPU programming, or numerical computing
- Background in graphics, vision, or generative AI applications
- Contributions to open-source projects
Requirements:
- Strong software engineering fundamentals and problem-solving skills
- Proficiency in Python and at least one systems language (e.g., C/C++)
- Experience with machine learning frameworks (PyTorch, JAX, TensorFlow, etc.)
- Understanding of performance, scalability, and distributed computing concepts
- Interest in making systems faster and more efficient
- Bachelor’s degree in CS, Engineering, or a related field (advanced degrees welcome)