Government-backed Abu Dhabi organization focused on advanced technology R&D (est. 2020), defining strategy, funding, and policies across AI, robotics, and emerging technologies. Oversees the full innovation lifecycle, from research and programs to commercialization, through dedicated applied research, innovation, and venture entities.
The first production system is an AI-enabled operational platform that gives a senior leadership team a shared situational picture, an AI-classified signal feed, a daily AI-generated briefing, and an action accountability tracker. MVP target: operational within two weeks of team formation. The platform is also the technical foundation for all subsequent Data & AI systems across the organization.
Build, own, and continuously improve the AI capabilities in the Data & AI Office's (DAIO) production systems: real-time signal classification against a defined scenario framework, and daily AI-generated briefings. This is not a research role and not a fine-tuning role. It is applied AI engineering — structured prompts, observable outputs, deterministic fallbacks, and measurable quality. The AI capabilities must work reliably under production conditions, including API outages, malformed signal data, and edge-case classification scenarios. This role also designs the migration path from the initial LLM runtime to the sovereign model runtime in Phase 2.
WHAT THIS ROLE BUILDS & OWNS
AI Classification & Briefing Service — FastAPI wrapper around the LLM API with two versioned prompt templates
Signal classification prompt — structured prompt against a defined scenario taxonomy, returning JSON with scenario tag, confidence level, and rationale
Daily briefing generation prompt — structured 400–600 word output covering signal summary, scenario assessment, delta from prior day, and recommended decision agenda
Prompt versioning system — templates stored in configuration, editable by authorized users without code changes
Observability layer — every API call logged with input hash, model version, output, latency, and token count
Fallback logic — graceful degradation when the LLM API is unavailable: items stored as unclassified and surfaced for manual review
Classification quality evaluation framework — weekly precision measurement against a human reviewer sample
Phase 2: sovereign model runtime migration plan — prompt adaptation, integration testing, performance benchmarking
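The structured-output and fallback requirements above can be sketched as follows. This is an illustrative outline, not the actual service: the scenario tags, field names, and fallback wording are hypothetical (the real taxonomy is owned by the business domain owners), and the validation is shown against a parsed JSON string rather than a live LLM call.

```python
from dataclasses import dataclass
import json

# Hypothetical scenario taxonomy — the real one is defined by the business owners.
SCENARIO_TAGS = {"supply_disruption", "regulatory_change", "security_incident"}

@dataclass
class Classification:
    scenario: str
    confidence: float
    rationale: str

# Graceful degradation: anything unparseable is stored as unclassified
# and surfaced for manual review, per the fallback-logic requirement.
FALLBACK = Classification("unclassified", 0.0, "invalid or missing LLM output; queued for manual review")

def parse_classification(raw: str) -> Classification:
    """Validate the model's JSON output; fall back to 'unclassified' on any defect."""
    try:
        data = json.loads(raw)
        scenario = data["scenario"]
        confidence = float(data["confidence"])
        if scenario not in SCENARIO_TAGS or not 0.0 <= confidence <= 1.0:
            return FALLBACK
        return Classification(scenario, confidence, str(data["rationale"]))
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return FALLBACK
```

In production this validation would more likely be a Pydantic model (as the skills list below suggests), but the principle is the same: the service never trusts raw model output, and every defect routes to the manual-review queue rather than raising.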
KEY DECISIONS THIS ROLE OWNS
Prompt design for each capability — structure, temperature, output format, system vs. user message split
Confidence threshold definition — what triggers a low-confidence flag requiring human review
Context window management for briefing generation — what signal subset to include within the token budget
When to trigger prompt iteration vs. accept current classification quality
Which classification errors are acceptable vs. unacceptable given operational stakes
Sovereign model prompt adaptation scope for Phase 2 — what needs rewriting, what transfers
WHAT THIS ROLE DOES NOT DO
Build the backend API or ingestion pipeline — this role calls the API, it does not build it
Fine-tune or train models — this is prompt engineering and integration, not ML research
Define the operational scenario taxonomy — that is business domain knowledge, owned by its designated owners
Own the data schema for signals — that is the Head of Data Architecture
PROFILE OF THE IDEAL CANDIDATE
Has shipped an LLM-based feature that non-AI users depend on daily — and has been responsible when it breaks. Knows that the hardest part of applied AI is not the prompt — it is the fallback, the observability, and the human review loop. Can write a classification prompt in the morning, evaluate its precision against a ground truth set in the afternoon, and ship an improved version the next day. Not attached to a particular model — the job is reliable output, not elegant architecture.
Anthropic Claude API — structured output prompting, JSON mode, system prompt design
Prompt engineering for classification tasks — zero-shot and few-shot with examples
Python — async API calls, error handling, retry logic with exponential backoff
LLM evaluation — precision/recall for classification, human-AI agreement measurement
Structured output design — JSON schema enforcement, output validation with Pydantic
Open-weight / sovereign model APIs (Falcon, Llama, or equivalent)
Token budgeting and context window management
Observability for AI systems — output quality monitoring, anomaly detection
FastAPI — building the AI service wrapper
Docker deployment of AI service components
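The retry-with-exponential-backoff item above can be sketched as a small generic helper. Function names, retry counts, and delay values are illustrative assumptions; an injectable `sleep` keeps the pattern testable without real waits:

```python
import random
import time

def call_with_backoff(fn, max_retries=4, base_delay=0.5, sleep=time.sleep):
    """Call a flaky API function, retrying failures with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the fallback layer
            # delay doubles each attempt (0.5s, 1s, 2s, ...) plus random jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

In the service described above, the final `raise` is where the unclassified-item fallback would take over, so a prolonged LLM API outage degrades to manual review rather than failing the pipeline.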