AI Research Scientist - MLLM Training

Are you ready to pioneer the future of AI? We’re seeking a Research Scientist with expertise in training Multimodal Large Language Models (MLLMs) to join an ambitious, cutting-edge team working at the intersection of AI, deep learning, and human expression. This role offers a unique opportunity to define the next wave of AI innovation by building the first real-time human foundation model that integrates text, speech, facial expressions, and body language. If you’re passionate about pushing research boundaries while delivering real-world applications, we’d love to hear from you.

About the Role

As a Research Scientist, you’ll drive the development of a unified model capable of understanding and generating lifelike responses in real time. This model will:

Understand fine-grained human signals — from a subtle change in tone to a quirked eyebrow — and infer meaning in context.
Generate lifelike, responsive avatars whose gestures, expressions, and tone evolve frame-by-frame to deliver genuine, social, and emotionally intelligent interactions.

This is an opportunity to tackle blank-page problems and shape foundational technology in an area where real-time, multimodal interaction is still an unsolved frontier.

This Role is Perfect for You If You...

Have a PhD (or equivalent experience) in training multimodal large language models, autoregressive architectures, or related fields, with a proven track record of publishing groundbreaking research.
Excel at deep learning and know how to run the entire ML pipeline — from data preparation and rapid prototyping to large-scale model training, benchmarking, and evaluation.
Thrive in ambiguity and enjoy charting your own path to solve complex, unstructured challenges.
Are passionate about bridging research breakthroughs with real-world applications.
Write clean, efficient code that scales and stands the test of time.
Collaborate effectively with a diverse team of brilliant minds from different domains.

Why Join Us?

Shape the future of AI: This is a rare opportunity to work on foundational technology that unites text, speech, and vision into a cohesive, real-time system — tackling problems others haven’t solved yet.
Be part of a world-class team: Work alongside PhDs from top institutions like MIT, UW, and Oxford, with decades of combined experience at leading companies like Apple and Meta.
Frontier research meets real-world impact: Our team has a proven track record of advancing AI avatars and audio-visual generation, publishing at top conferences, and shipping real-time ML products used by millions.
Collaborate in person: Join us at our Seattle HQ, where our team works together 5 days a week to drive innovation forward.

Key Details

Location: Seattle, Washington (In-person collaboration, 5 days/week)
Funding: $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels from the AI industry.

What We’re Building

We’re creating the first human foundation model that operates across text, speech, facial expression, and body language in real time. The goal? To make AI socially and emotionally intelligent, able to interpret subtle human signals and respond in a way that feels truly human.

The industry has made strides in areas like voice AI and avatar visuals, but existing solutions remain fragmented. This role offers the chance to be a trailblazer in a field where the boundaries are still being defined.

How to Apply

If this sounds like the challenge you’ve been looking for, let’s connect! Apply now to join a team that’s reshaping the future of AI, one lifelike interaction at a time.

Note: This posting is intentionally anonymized to respect client confidentiality. We look forward to hearing from innovative thinkers ready to make an impact.
