Education:
BS degree in Computer Science, Software Engineering, or Computer Engineering from an HEC-recognized university.
Experience:
3 to 4 years of hands-on experience building and shipping AI systems in production.
Hands-on experience training and fine-tuning LLMs, including LoRA-based fine-tuning.
Proven experience building RAG pipelines (chunking, embeddings, retrieval, reranking, grounding, and evaluation); see the illustrative retrieval sketch after this list.
Experience building agentic systems with LangChain and LangGraph; deploying models locally on GPUs with vLLM and/or Ollama; using Ray and Kubernetes for scalable serving and operations; and building real-time voice agents with LiveKit.
Experience with automated evaluation and benchmarking for LLM, RAG, and agent workflows.
Experience implementing guardrails and secure tool execution patterns.
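For illustration, a minimal sketch of the retrieval stage of such a RAG pipeline, assuming sentence-transformers and a brute-force cosine search; the model name, chunk sizes, and function names are assumptions, not a prescribed implementation:

```python
# Minimal RAG retrieval sketch: chunk documents, embed them, retrieve the
# top-k chunks for a query by cosine similarity. Model name and chunk
# parameters are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def build_index(docs: list[str]) -> tuple[list[str], np.ndarray]:
    chunks = [c for d in docs for c in chunk(d)]
    # normalize_embeddings=True makes a dot product equal cosine similarity
    vecs = model.encode(chunks, normalize_embeddings=True)
    return chunks, vecs

def retrieve(query: str, chunks: list[str], vecs: np.ndarray, k: int = 5) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(vecs @ q)[::-1][:k]  # highest-similarity chunks first
    return [chunks[i] for i in top]
```

Normalizing the embeddings lets a plain dot product serve as cosine similarity; a production pipeline would replace the NumPy scan with a vector store and add a reranking stage.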
Key Responsibilities:
Train and fine-tune LLMs for task performance using SFT, align them with methods such as RLHF, PPO, and GRPO, and validate with robust evaluation practices (a LoRA fine-tuning sketch follows this list).
Build production-grade RAG systems including ingestion, chunking, embedding, vector search, reranking, grounding, and evaluation to reduce hallucinations.
Develop agentic AI workflows using LangChain and LangGraph, including tool calling, multi-step planning, memory, guardrails, and observability (see the LangGraph sketch after this list).
Deploy and serve models locally on GPUs using vLLM and Ollama, optimizing throughput, latency, batching, and KV-cache behavior (see the vLLM sketch after this list).
Productionize distributed inference and services using Ray and Kubernetes (deployments, autoscaling, rolling updates, reliability).
Build end-to-end voice agents using LiveKit (streaming STT, LLM orchestration, TTS, turn-taking, and real-time session handling).
Collaborate with product and engineering teams to define requirements and success metrics, and deliver production-ready features with documentation and tests.
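As referenced in the fine-tuning responsibility above, a hedged sketch of LoRA-based SFT using Hugging Face transformers and peft; the base model (gpt2 as a small stand-in), target modules, and hyperparameters are illustrative assumptions:

```python
# Hedged LoRA SFT sketch with transformers + peft: wrap a causal LM in a
# low-rank adapter, run one training step, save only the adapter weights.
# gpt2 is a small stand-in; target modules and hyperparameters are assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA: train low-rank matrices on the attention projection, freeze the rest.
cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                 target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, cfg)
model.print_trainable_parameters()  # a small fraction of total parameters

opt = torch.optim.AdamW(model.parameters(), lr=2e-4)
batch = tok(["### Instruction: say hi\n### Response: hi"], return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # causal-LM loss on the text
out.loss.backward()
opt.step()
opt.zero_grad()
model.save_pretrained("lora-adapter")  # adapter weights only, not the base model
```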
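A minimal LangGraph workflow sketch, assuming a recent langgraph release that exposes StateGraph, START, and END; the state fields and node bodies are placeholders, not a prescribed design:

```python
# Minimal LangGraph sketch: a typed state flows through two nodes, each
# returning a partial state update. Node logic is a placeholder; real nodes
# would call an LLM, execute tools, and apply guardrails.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    plan: str
    answer: str

def make_plan(state: State) -> dict:
    # A real node would call an LLM to produce a multi-step plan.
    return {"plan": f"look up: {state['question']}"}

def answer(state: State) -> dict:
    # A real node would run tools / retrieval and synthesize a grounded answer.
    return {"answer": f"(answer based on) {state['plan']}"}

graph = StateGraph(State)
graph.add_node("make_plan", make_plan)
graph.add_node("answer", answer)
graph.add_edge(START, "make_plan")
graph.add_edge("make_plan", "answer")
graph.add_edge("answer", END)
app = graph.compile()

print(app.invoke({"question": "What is GRPO?"}))
```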
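And a sketch of local GPU inference with vLLM's offline API; the model name, memory fraction, and sampling parameters are assumptions. vLLM handles continuous batching and paged KV-cache allocation internally, which is what the throughput and latency tuning above refers to:

```python
# Sketch of local GPU inference with vLLM's offline API. The model name,
# memory fraction, and sampling parameters are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model
    gpu_memory_utilization=0.90,  # VRAM fraction shared by weights and KV cache
    max_model_len=8192,           # caps the per-sequence KV-cache footprint
)
params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarize retrieval-augmented generation."], params)
print(outputs[0].outputs[0].text)
```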
Knowledge/Skills/Abilities:
Strong Python skills with practical experience in PyTorch and modern LLM tooling.