Qureos

Job Requirements

Design, build, and operate a production server-side AI assistant that:
Answers questions grounded in user-scoped data (docs, tables), with citations where applicable
Performs actions by calling internal/external APIs securely on the user's behalf
Supports low-latency, real-time voice chat (streaming STT/TTS + incremental LLM responses)

Implement the tool/agent layer:
Structured tool calling (JSON Schema-based) to integrate business services
Model Context Protocol (MCP) servers/clients where appropriate for tool discovery and execution
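
For illustration only, structured tool calling of the kind described above might be sketched as follows. The tool name `get_order_status`, the `TOOLS` registry, and the `dispatch` helper are hypothetical and not part of this role's actual stack. The validator checks only a small subset of JSON Schema (required keys and primitive types); a production system would more likely use a full schema validator.

```python
import json

# Hypothetical business-service call; stands in for a real internal API.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

# Hypothetical tool registry: each tool pairs a handler with a JSON Schema
# describing the arguments the model is allowed to pass.
TOOLS = {
    "get_order_status": {
        "handler": get_order_status,
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }
}

# Mapping from JSON Schema primitive type names to Python types.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_args(schema: dict, args: dict) -> None:
    """Check required keys and primitive types (a small JSON Schema subset)."""
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        prop = schema["properties"].get(key)
        if prop is None:
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, JSON_TYPES[prop["type"]]):
            raise ValueError(f"argument {key} must be of type {prop['type']}")

def dispatch(tool_call_json: str) -> dict:
    """Parse a model-emitted tool call, validate it, then run the handler."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['name']}")
    validate_args(tool["parameters"], call["arguments"])
    return tool["handler"](**call["arguments"])

if __name__ == "__main__":
    print(dispatch('{"name": "get_order_status", "arguments": {"order_id": "A1"}}'))
```

Validating arguments before the handler runs keeps malformed or unexpected model output from reaching business services.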

Architect retrieval-augmented generation (RAG):
Ingestion for documents and tables: parsing, chunking, embeddings, metadata, and indexing
Hybrid retrieval (sparse+dense), query rewriting, and answer attribution
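
As one possible illustration of hybrid retrieval, the sketch below merges a sparse (keyword) ranking and a dense (embedding) ranking with reciprocal rank fusion. The posting does not name a fusion method, so RRF is an assumption chosen for simplicity. The document IDs are made up, and `k=60` is a commonly used default rather than a requirement.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

if __name__ == "__main__":
    # Hypothetical results from a keyword index and a vector index.
    sparse = ["doc_a", "doc_b", "doc_c"]
    dense = ["doc_b", "doc_d", "doc_a"]
    print(reciprocal_rank_fusion([sparse, dense]))
    # prints ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only ranks, it avoids having to calibrate sparse scores against dense similarity scores, which sit on different scales.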

Deliver performant, cost-efficient inference on open-source models:
Model selection/routing; context management; caching/batching; streaming token delivery
GPU utilization and serving via vLLM/TGI/llama.cpp/Ollama or similar

Build resilient APIs and real-time integrations:
WebSockets/WebRTC/gRPC for streaming voice; REST/GraphQL for control and orchestration

Productionize and operate on server/on-prem:
Containerize with Docker; automate CI/CD; implement logs/metrics/traces (OpenTelemetry)
Evals, A/B tests, safety/guardrails, and human-in-the-loop feedback