Qureos

AI Engineer III

Remote, United States

Mission

We’re reimagining instant shopping so that technology never stands in the way; instead, it accelerates you exponentially toward your goal by forming a deep connection with your needs and desires. In the Personal Superintelligence Lab, you will lead the design and deployment of agentic AI that reasons over rich, real‑world context and constraints, is grounded in up‑to‑the‑minute knowledge, and leverages our unparalleled delivery speed. Your work will push the state of the art in alignment, grounding, and multi‑agent orchestration while landing breakthroughs safely and at scale in production.

Scope and Impact

As an AI Engineer III, you will be the technical lead across context engineering, RLHF/RLVR, and low‑latency serving. You’ll define the architecture, standards, and evaluation strategy that connect research to real‑world lift. You’ll mentor colleagues, influence cross‑functional roadmaps, and ship systems that deliver measurable improvements to core customer and business outcomes, without disclosing competitive intelligence.

Areas of Leadership and Contribution

Advanced context and grounding research:
  • Set the strategy for context engineering to maximize precision/recall of key order metrics across sessions, households, locales, and time.
  • Architect multi‑modal context integration (temporal, spatial, behavioral) and real‑time grounding with dynamic constraint satisfaction.
  • Establish retrieval freshness, geo/time‑aware constraints, and memory policies; formalize context schemas and data contracts.
  • Champion declarative prompt/program compilation (e.g., DSPy) for systematic, testable LLM behavior.
  • Design multi‑agent orchestration patterns (e.g., graph‑based agents via LangChain/LangGraph, CrewAI, AutoGen, LlamaIndex) that yield robust emergent reasoning.

Alignment and learning systems:
  • Lead supervised reasoning-centered fine‑tuning with rigorous data curation, synthetic data generation, and QA; institute golden sets and rubric/pairwise evals.
  • Own the reasoning architecture and evaluation strategy—planning, tool selection, reflection, and uncertainty-aware decision-making—to deliver robust, low-latency, grounded outcomes at scale.
  • Drive parameter‑efficient adaptation strategies (LoRA/QLoRA and text-to-LoRA) with clear criteria for when to specialize vs. generalize.
  • Architect RLHF and RLVR pipelines; build preference data loops, scalable oversight, and guardrails.
  • Own policy optimization strategy: expert use of DPO/PPO/GRPO/GSPO and advancement beyond them (constrained optimization, regularized objectives, KL‑control) with formal safety considerations.
  • Ensure robust offline‑to‑online correlation via counterfactual estimators such as inverse propensity scoring (IPS) and doubly robust (DR), plus stress tests across traffic segments.

Safety, robustness, and privacy:
  • Establish interpretability, controllability, and alignment verification practices for agentic systems.
  • Develop safeguards against reward hacking and unsafe exploration; enforce distributional robustness and content policy compliance.
  • Advance privacy‑preserving methods (data minimization, federated/on‑device learning where appropriate) with privacy‑by‑design.

Systems, serving, and evaluation at scale:
  • Architect low‑latency, cost‑efficient inference (quantization, caching, batching, streaming) with resilient fallbacks and red‑teaming.
  • Build eval frameworks that tightly couple offline metrics with online performance and safety criteria; define promotion gates.
  • Use relevant APIs to perform high‑fidelity data augmentation that strengthens grounding, disambiguation, and availability‑aware suggestions.

Experimentation and cross‑functional impact:
  • Partner closely with Engineering and Data Science to design experiments, define success criteria, and iterate quickly from signal to lift.
  • Translate ambiguous product goals into crisp technical milestones; maintain clear documentation, incident response, and learning playbooks.
  • Mentor colleagues; raise the bar on design quality, reproducibility, and ethical rigor.

Requirements:

  • PhD in Computer Science, Machine Learning, or equivalent research experience with significant contributions to AI/ML literature.
  • 7+ years of building and shipping large‑scale ML systems with significant ownership; proven impact in production LLM or RL‑driven products.
  • Mastery of advanced fine-tuning techniques including LoRA/QLoRA, adapter methods, and parameter-efficient transfer learning.
  • Research experience with agentic AI frameworks, multi-agent systems, and declarative programming approaches (DSPy, LangChain ecosystem).
  • Strong systems engineering capabilities with PyTorch, distributed training, and cloud-native ML infrastructure.
  • Track record of publications in top-tier venues (NeurIPS, ICML, ICLR, AAAI) or equivalent industry impact.
  • Deep expertise in transformer architectures, SFT, and RLHF; hands‑on leadership with RLVR and verifiable reward design.
  • Mastery of policy optimization (DPO/PPO/GRPO/GSPO) and the ability to extend/regularize policies under safety, latency, and cost constraints.
  • Strong grounding in offline evaluation, counterfactual estimators, and safe online ramp strategies.
  • Systems fluency: PyTorch, distributed training, low‑latency serving, observability, and cloud‑native ML infra.
  • Demonstrated leadership across cross‑functional teams, with clear communication and mentoring track record.
  • Commitment to responsible AI: privacy, safety, and alignment principles embedded end‑to‑end.

Preferred Qualifications:

  • Research or applied work in multi‑agent systems, decision theory, or declarative programming (e.g., DSPy).
  • Experience with formal methods for safety, program synthesis, or automated reasoning.
  • Contributions to open‑source AI frameworks or foundational model development.
  • Experience with privacy‑enhancing technologies, federated/on‑device learning, or identity/memory architectures.

Tooling and Stack:

  • Fine‑tuning: Unsloth for rapid prototyping; TRL for RLHF/RLVR workflows and policy optimization.
  • Retrieval, evaluation, and orchestration: pragmatic use of graph‑based agent frameworks and vector retrieval systems as appropriate.

What We Offer:

  • A front‑row seat as we redefine instant shopping with personal superintelligence deployed at massive scale.
  • Deep collaboration with exceptional researchers and engineers; publication support where appropriate.
  • Access to world‑class compute, datasets, and experimentation infrastructure.
  • Competitive compensation with meaningful upside tied to breakthrough AI applications.

Compensation:

  • Gopuff pays employees based on market pricing and pay may vary depending on your location. The salary range below reflects what we’d reasonably expect to pay candidates. A candidate’s starting pay will be determined based on job-related skills, experience, qualifications, interview performance, and market conditions. These ranges may be modified in the future. Exceptions may be made for exceptional individuals. For additional information on this role’s compensation package, please reach out to the designated recruiter for this role.
  • This role is eligible for a discretionary annual cash bonus and participation in Gopuff’s equity incentive plan.
  • Base Salary Range: $215,000 - $275,000

Benefits Overview:

  • Medical/Dental/Vision Insurance
  • 401(k) Retirement Savings Plan
  • HSA or FSA eligibility
  • Long and Short-Term Disability Insurance
  • Mental Health Benefits
  • Fitness Reimbursement Program
  • 25% employee discount & FAM Membership
  • Flexible PTO
  • Group Life Insurance
  • EAP through AllOne Health (formerly Carebridge)

The only predictable thing about life is that it’s wildly unpredictable. That’s where we come in.

When life does what it does best, customers turn to Gopuff to deliver their everyday essentials and get them through their day and night, workday and weekend.

We’re assembling a team of thinkers, dreamers & risk-takers...the kind of people who know the value of peace of mind in an unpredictable world. (And people who love snacks.)

Like what you’re hearing? Welcome to Gopuff.

The Gopuff Fam is committed to an inclusive workplace where we do not discriminate on the basis of race, sex, gender, national origin, religion, sexual orientation, gender identity, marital or familial status, age, ancestry, disability, genetic information, or any other characteristic protected by applicable laws. We believe in diversity and encourage any qualified individual to apply. We are an equal employment opportunity employer.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

© 2025 Qureos. All rights reserved.