Qureos


AI Engineer


Government-backed Abu Dhabi organization focused on advanced technology R&D (est. 2020), defining strategy, funding, and policies across AI, robotics, and emerging technologies. Oversees the full innovation lifecycle - from research and programs to commercialization - through dedicated applied research, innovation, and venture entities.

The first production system is an AI-enabled operational platform that gives a senior leadership team a shared situational picture, an AI-classified signal feed, a daily AI-generated briefing, and an action accountability tracker. MVP target: operational within two weeks of team formation. The platform is also the technical foundation for all subsequent Data & AI systems across the organization.

Build, own, and continuously improve the AI capabilities in the Data & AI Office's (DAIO) production systems: real-time signal classification against a defined scenario framework, and daily AI-generated briefings. This is not a research role and not a fine-tuning role. It is applied AI engineering: structured prompts, observable outputs, deterministic fallbacks, and measurable quality. The AI capabilities must work reliably under production conditions, including API outages, malformed signal data, and edge-case classification scenarios. This role also designs the migration path from the initial LLM runtime to the sovereign model runtime in Phase 2.

WHAT THIS ROLE BUILDS & OWNS

  • AI Classification & Briefing Service — FastAPI wrapper around the LLM API with two versioned prompt templates

  • Signal classification prompt — structured prompt against a defined scenario taxonomy, returning JSON with scenario tag, confidence level, and rationale

  • Daily briefing generation prompt — structured 400–600 word output covering signal summary, scenario assessment, delta from prior day, and recommended decision agenda

  • Prompt versioning system — templates stored in configuration, editable by authorized users without code changes

  • Observability layer — every API call logged with input hash, model version, output, latency, and token count

  • Fallback logic — graceful degradation when the LLM API is unavailable: items stored as unclassified and surfaced for manual review

  • Classification quality evaluation framework — weekly precision measurement against human reviewer sample

  • Phase 2: sovereign model runtime migration plan — prompt adaptation, integration testing, performance benchmarking
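The classification contract and fallback behavior described above can be sketched as follows. This is a minimal illustration, not the platform's actual code: the field names, the `call_llm` hook, and the fallback wording are assumptions, and a production service would enforce the schema with Pydantic (as listed under the requirements) rather than a plain dataclass.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical output contract: scenario tag, confidence, rationale.
# A production service would use a Pydantic model; a dataclass keeps
# this sketch dependency-free.
@dataclass(frozen=True)
class ClassifiedSignal:
    scenario: str
    confidence: float  # expected in [0.0, 1.0]
    rationale: str

def _parse(reply: str) -> ClassifiedSignal:
    """Validate the LLM's JSON reply against the expected contract."""
    data = json.loads(reply)
    sig = ClassifiedSignal(
        scenario=str(data["scenario"]),
        confidence=float(data["confidence"]),
        rationale=str(data["rationale"]),
    )
    if not 0.0 <= sig.confidence <= 1.0:
        raise ValueError("confidence out of range")
    return sig

def classify_signal(raw_signal: str,
                    call_llm: Callable[[str], str]) -> ClassifiedSignal:
    """Classify one signal; degrade gracefully on any failure."""
    try:
        return _parse(call_llm(raw_signal))
    except Exception:
        # API outage, malformed JSON, missing field, or bad range:
        # store as unclassified and surface for manual review.
        return ClassifiedSignal("unclassified", 0.0,
                                "queued for manual review")
```

The key design point, per the fallback-logic bullet above, is that no failure mode raises to the caller: every signal either gets a validated classification or lands in the manual-review queue.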

KEY DECISIONS THIS ROLE OWNS

  • Prompt design for each capability — structure, temperature, output format, system vs. user message split

  • Confidence threshold definition — what triggers a low-confidence flag requiring human review

  • Context window management for briefing generation — what signal subset to include within the token budget

  • When to trigger prompt iteration vs. accept current classification quality

  • Which classification errors are acceptable vs. unacceptable given operational stakes

  • Sovereign model prompt adaptation scope for Phase 2 — what needs rewriting, what transfers
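The context-window decision above can be illustrated with a minimal greedy selection under a token budget. The ~4-characters-per-token estimate and the confidence-based priority order are sketch assumptions; the real service would use the model provider's tokenizer and whatever priority rules the scenario taxonomy dictates.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 chars/token for English prose).
    A production service would use the provider's tokenizer."""
    return max(1, len(text) // 4)

def select_signals(signals: list[dict], budget_tokens: int) -> list[dict]:
    """Greedily keep the highest-confidence signals that fit the budget.

    Each signal is assumed to be a dict with 'text' and 'confidence'
    keys (illustrative shape, not the platform's real schema).
    """
    chosen, used = [], 0
    for sig in sorted(signals, key=lambda s: s["confidence"], reverse=True):
        cost = estimate_tokens(sig["text"])
        if used + cost <= budget_tokens:
            chosen.append(sig)
            used += cost
    return chosen
```

Greedy-by-confidence is only one possible policy; recency or scenario coverage could equally drive the ordering, which is exactly the kind of trade-off this role would own.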

WHAT THIS ROLE DOES NOT DO

  • Build the backend API or ingestion pipeline — this role calls the API, it does not build it

  • Fine-tune or train models — this is prompt engineering and integration, not ML research

  • Define the operational scenario taxonomy — that is business domain knowledge owned by designated owners

  • Own the data schema for signals — that is the Head of Data Architecture

PROFILE OF THE IDEAL CANDIDATE

Has shipped an LLM-based feature that non-AI users depend on daily — and has been responsible when it breaks. Knows that the hardest part of applied AI is not the prompt — it is the fallback, the observability, and the human review loop. Can write a classification prompt in the morning, evaluate its precision against a ground-truth set in the afternoon, and ship an improved version the next day. Not attached to a particular model — the job is reliable output, not elegant architecture.

  • Anthropic Claude API — structured output prompting, JSON mode, system prompt design

  • Prompt engineering for classification tasks — zero-shot and few-shot with examples

  • Python — async API calls, error handling, retry logic with exponential backoff

  • LLM evaluation — precision/recall for classification, human-AI agreement measurement

  • Structured output design — JSON schema enforcement, output validation with Pydantic

  • Open-weight / sovereign model APIs (Falcon, Llama, or equivalent)

  • Token budgeting and context window management

  • Observability for AI systems — output quality monitoring, anomaly detection

  • FastAPI — building the AI service wrapper

  • Docker deployment of AI service components
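As one illustration of the async retry pattern listed above, a minimal exponential-backoff wrapper might look like this. The attempt count, base delay, and jitter range are arbitrary sketch values, and `coro_factory` is a hypothetical hook standing in for the actual LLM API call.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def call_with_retry(coro_factory: Callable[[], Awaitable[T]],
                          max_attempts: int = 4,
                          base_delay: float = 0.5) -> T:
    """Retry an async call with exponential backoff plus jitter.

    Delays grow as base_delay * 2**attempt (0.5s, 1s, 2s, ...) with up
    to 100 ms of random jitter; the final failure is re-raised so the
    caller's fallback logic can take over.
    """
    for attempt in range(max_attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt
                                + random.uniform(0, 0.1))
    raise RuntimeError("unreachable")  # loop always returns or raises
```

In the service described above, this wrapper would sit between the FastAPI endpoint and the LLM client, with the unclassified-item fallback catching whatever the final re-raise surfaces.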

Location:

Istanbul, Turkey

Seniority:

Senior

Technologies:

Python

Benefits:

  • Paid Vacation
  • Hybrid Work (home/office)
  • Sick Days
  • Sport/Insurance Compensation
  • Holidays Day Off
  • English Classes
  • Training Compensation
  • Transportation Compensation

© 2026 Qureos. All rights reserved.