ML Model Serving Engineer
Want to build the layer that actually makes AI usable in real time?
You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models running in production, not offline experiments.
The team is building real-time AI systems that need to respond instantly, reliably, and at scale. That means solving hard problems around batching, GPU efficiency, memory constraints, and system-level bottlenecks that most teams never fully crack.
You’ll sit at the core of the platform, working across model serving, infrastructure, and performance optimisation. A big part of the role is pushing current tooling beyond its limits: extending frameworks, profiling bottlenecks, and designing systems that hold up under real-world load.
This is not about training models. It’s about making them fast, efficient, and production-ready.
What you’ll work on:
What you’ll bring:
Exposure to CUDA, GPU profiling tools, or systems like Kubernetes and Ray is useful, but the key is knowing how to make models run efficiently at scale.
You’ll join a highly technical team with experience across major AI labs and big tech. The environment is pragmatic, focused on solving real performance problems rather than abstract research.
There’s real ownership here. You’ll help define how next-generation AI systems are served.
Package:
$220,000 – $320,000 base + equity
San Francisco, onsite 3 days per week
If you’re interested in working on the part of AI that actually determines whether it works in the real world, this is worth exploring.
All applicants will receive a response.
© 2026 Qureos. All rights reserved.