Qureos

AI Engineer I

About The Role

This is a junior-to-mid-level role for someone with 1–2 years of hands-on experience who is excited to build production-grade LLM features end-to-end. You will work closely with senior AI engineers and the Head of AI at AIO, owning real product capabilities such as smart assistants, content generation tools, and knowledge-retrieval systems. This is a full-stack AI engineering role: you will work on everything from prompt design and model fine-tuning to building FastAPI backends and deploying scalable inference services. Expect to ship production code in your first month.

What will be your responsibilities?

  • Fine-tune and align open-source LLMs (LLaMA-3, Mistral, Gemma, Phi-3, Qwen, etc.) using Hugging Face Transformers, PEFT (LoRA/QLoRA), TRL, and Unsloth/Axolotl on cloud or consumer GPUs.
  • Design, build, and maintain Retrieval-Augmented Generation (RAG) pipelines using LangChain/LangGraph or LlamaIndex + vector stores (Pinecone, Weaviate, Qdrant, Chroma).
  • Develop production-ready backends with FastAPI (REST + WebSockets), including asynchronous processing, background tasks (Celery/Redis), rate-limiting, and authentication.
  • Implement advanced prompting (chain-of-thought, tree-of-thought, ReAct), tool calling / function calling, and lightweight agent workflows.
  • Create robust data ingestion and preprocessing pipelines (cleaning, chunking, metadata enrichment, deduplication) for both training and retrieval.
  • Run standardized and custom evaluations using LM-Eval-Harness, DeepEval, or simple benchmarks; track accuracy, latency, token cost, and user feedback metrics.
  • Deploy and serve models with vLLM, Text Generation Inference (TGI), Ollama, or Hugging Face Inference Endpoints. Apply 4-bit / 8-bit quantization (GPTQ, AWQ, bitsandbytes) for cost and speed.
  • Build monitoring and observability stacks with LangSmith, Helicone, Phoenix, or OpenTelemetry; debug and resolve production issues (timeouts, hallucinations, drift).
  • Use Weights & Biases, MLflow, or Comet for experiment tracking; maintain clean Git workflows, PR reviews, and CI/CD via GitHub Actions.
  • Stay current with the latest open-source models and papers (arXiv, Hugging Face Daily Papers) and present one actionable insight per sprint.

What are we looking for, and what does it take to be the right fit for this role?

  • Bachelor's or Master's in Computer Science, AI, Data Science, or equivalent practical experience.
  • Strong Python proficiency and daily hands-on experience with the PyTorch and Hugging Face Transformers ecosystem.
  • Experience fine-tuning and aligning LLMs (LLaMA-3, Mistral, Gemma, Phi-3, Qwen, etc.) using Hugging Face Transformers and PEFT (LoRA/QLoRA).
  • Experience designing, building, and maintaining Retrieval-Augmented Generation (RAG) pipelines using LangChain/LangGraph and vector stores (Pinecone, Qdrant, Chroma).
  • Experience developing production-ready backends with FastAPI (REST).
  • Experience implementing advanced prompting (chain-of-thought, tree-of-thought, ReAct), tool calling/function calling, and lightweight agent workflows.
  • Experience running standardized and custom evaluations using LM-Eval-Harness, DeepEval, or simple benchmarks, and tracking accuracy, latency, token cost, and user feedback metrics.
  • Experience deploying and serving models with vLLM, Ollama, or Hugging Face Inference Endpoints, and applying 4-bit / 8-bit quantization (GPTQ, AWQ, bitsandbytes) for cost and speed.
  • Ability to read research papers and quickly turn ideas into working prototypes.
  • Excellent communication skills and a genuine desire to learn rapidly in a supportive, fast-moving team.

Nice-to-Haves

  • Experience with Weights & Biases, MLflow, or Comet for experiment tracking, and with maintaining clean Git workflows, PR reviews, and CI/CD via GitHub Actions.
  • Comfort with Git, Docker, the Linux command line, and basic shell scripting.

Why Join AIO?

Our mission is to revolutionize the US restaurant industry by providing a comprehensive and fully integrated solution that empowers restaurant owners to efficiently manage all aspects of their business. Our platform combines our patented AI technology with unparalleled customer support to help owners increase revenue, reduce costs, and improve their overall profit margins.

We believe that restaurants should be able to focus on delivering exceptional dining experiences to their customers, without the added stress of managing complex and disparate systems. That's why we offer an All-In-One super app platform for all their needs, from front-of-the-house operations like ordering, payment, marketing and rewards, to back-of-the-house management like inventory, staff, and financials.

We are laser-focused on becoming a significant player in the $55 billion restaurant tech SaaS market. You will be part of a world-class, up-and-coming, Silicon Valley-funded startup.

© 2025 Qureos. All rights reserved.