As a System Engineer, you will deploy, optimize, and maintain our local AI systems, including large language models (LLMs), embedding models, rerankers, and retrieval pipelines. The role focuses on ensuring reliable local inference, policy-safe routing, and end-to-end retrieval-augmented generation (RAG) performance within a fully private environment.
Responsibilities:
* Deploy and configure local LLMs (Ollama/vLLM) for low‑latency chat and retrieval tasks.
* Integrate embedding models and rerankers (e.g., bge, jina, gte, or Hugging Face alternatives).
* Implement hybrid retrieval (BM25 + vector) pipelines with pgvector (an illustrative sketch of such a query follows this list).
* Own and maintain the policy engine controlling model routing and classification (local vs external).
* Conduct performance benchmarking and quantization tests for different model sizes.
* Tune model parameters for optimal inference on available GPUs.
* Collaborate with Backend engineers to wire AI inference APIs into FastAPI services.
* Develop scripts to monitor model uptime, latency, and retrieval quality.
* Maintain reproducibility: model versions, config hashes, and deterministic inference logs.
* Contribute to the Q‑CERT pipeline with model metadata and audit hashes.
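For context, here is a minimal sketch of the kind of hybrid (BM25-style + vector) query this role involves. The table name "documents", its columns, and the 0.4/0.6 score weighting are assumptions made for illustration, not specifications from this posting.

```python
# Illustrative only: hybrid lexical + vector retrieval against Postgres/pgvector.
# Assumes psycopg 3 and the pgvector extension; table/column names are hypothetical.
import psycopg

HYBRID_SQL = """
WITH lexical AS (
    SELECT id,
           ts_rank_cd(to_tsvector('english', content),
                      plainto_tsquery('english', %(q)s)) AS lex_score
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %(q)s)
    ORDER BY lex_score DESC
    LIMIT 50
),
semantic AS (
    SELECT id,
           1 - (embedding <=> %(vec)s::vector) AS vec_score  -- cosine similarity
    FROM documents
    ORDER BY embedding <=> %(vec)s::vector
    LIMIT 50
)
SELECT d.id,
       d.content,
       0.4 * COALESCE(l.lex_score, 0) + 0.6 * COALESCE(s.vec_score, 0) AS score
FROM documents d
LEFT JOIN lexical  l ON l.id = d.id
LEFT JOIN semantic s ON s.id = d.id
WHERE l.id IS NOT NULL OR s.id IS NOT NULL
ORDER BY score DESC
LIMIT %(k)s;
"""

def hybrid_search(conn: psycopg.Connection, query: str, query_vec: list[float], k: int = 10):
    """Return top-k rows ranked by a weighted blend of full-text and vector scores."""
    with conn.cursor() as cur:
        # pgvector accepts a bracketed string literal such as "[0.1, 0.2, ...]"
        cur.execute(HYBRID_SQL, {"q": query, "vec": str(query_vec), "k": k})
        return cur.fetchall()
```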
Required Skills:
* Python (LangChain or LlamaIndex).
* Hugging Face Transformers and embeddings.
* Familiarity with Ollama, vLLM, or text‑generation‑inference.
* Basic GPU management, CUDA, and quantization (GGUF, GPTQ, AWQ).
* Understanding of RAG systems and evaluation metrics.
* Linux environment management and containerized inference (Docker).
Preferred (Bonus):
* Experience with fine‑tuning or LoRA adapters.
* Familiarity with vector DBs (pgvector, FAISS).
* Exposure to model evaluation tools (RAGAS, DeepEval).
* Knowledge of policy enforcement or prompt‑guard frameworks.
Work Style:
* Works closely with the Backend/Infra Engineer on deployment and data pipelines.
* Weekly sync with Frontend team to validate outputs and UI integration.
* Expected to test and log all model benchmarks before production use.
* Operates in a secure internal environment; no cloud data leakage is permitted.
Notes:
Initial 3-month engagement with an option to extend based on model stability, performance gains, and adherence to privacy protocols.
Job Type: Full-time
Pay: ₹25,000.00 - ₹45,000.00 per month
Work Location: In person