Qureos



ML Performance Engineer – Low Latency Inference (Trading Environment)

Location: Chicago / NYC

Experience: 3-12 years (exceptional early-career profiles considered)


If you’re working on ML today, here’s the real question:

Are you optimising models for leaderboard metrics…

Or for speed under pressure?


This role sits inside a trading business where inference latency is measured in nanoseconds. Models don’t just need to be accurate; they need to fire first.

Firms compete for a few nanoseconds of edge over the market, and that edge can translate into millions in profit.


What You’ll Do

  • Optimise GPU-based inference pipelines for real-time decision systems
  • Profile and eliminate bottlenecks across CPU, GPU, and network boundaries
  • Tune memory layout, batching, and concurrency to reduce end-to-end latency
  • Work directly with researchers and traders to productionise models
  • Own performance in live environments where speed directly impacts outcomes
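As a rough illustration of the measurement discipline this work starts from, here is a minimal Python sketch (all names hypothetical; `fake_model` stands in for a real inference call) that reports tail latency rather than a mean, since in trading the p99 matters more than the average:

```python
import time

def fake_model(x):
    # Stand-in for a real inference call; swap in your model here.
    return sum(v * v for v in x)

def measure_latency(fn, inp, warmup=100, iters=1000):
    """Return (p50, p99) latency of fn(inp) in microseconds."""
    for _ in range(warmup):          # warm caches before timing
        fn(inp)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter_ns()
        fn(inp)
        samples.append((time.perf_counter_ns() - t0) / 1e3)  # ns -> us
    samples.sort()
    p50 = samples[len(samples) // 2]
    p99 = samples[int(len(samples) * 0.99) - 1]
    return p50, p99

p50, p99 = measure_latency(fake_model, list(range(64)))
print(f"p50={p50:.1f}us  p99={p99:.1f}us")
```

Real systems would use hardware timestamps and per-stage breakdowns, but the principle is the same: warm up, sample many runs, and optimise the tail.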

This is not research-only ML.

It’s ML under strict performance constraints.


You’ll Fit If

  • Strong in Python and C++
  • Hands-on with CUDA / GPU optimisation
  • Comfortable profiling at kernel and system level
  • Care about determinism, throughput, and hardware-software interaction
  • Prefer performance problems over product roadmap debates
  • Bachelor's degree in a STEM field

Finance experience isn’t required.


What matters is whether shaving off microseconds sounds more interesting than shipping another feature at a big lab.


If it does, it's worth a conversation.

© 2026 Qureos. All rights reserved.