
Senior AI-Focused Software Engineer

Remote
Full-time (40 hours/week) with strong overlap during PK / Dubai working hours
Compensation: Competitive, typically in the range of 2200 – 3000 USD per month (flexible for exceptional candidates)

About the Role

We're a fast-growing company shifting heavily toward AI-driven automation across the business: customer support, marketing, product, and operations. This role is for an engineer who designs and builds real production AI systems that move work through our company faster, cheaper, and with fewer humans in the loop.

This is a hands-on engineering role spanning backend systems, LLM integrations, agentic workflows, retrieval pipelines, and the data plumbing that makes AI actually work in production. You'll work in a real-world environment where models hallucinate, APIs fail, prompts drift, costs spike, and engineers are expected to take ownership of the systems they ship.

This is not a prompt engineer role. This is not a research-only role. This is not "play with ChatGPT and report back" work. We move fast, ship often, debug real production issues, and expect engineers to own AI systems end to end. We use AI daily ourselves, but we care deeply about engineers who read, understand, and validate what their systems are doing, not those who treat LLMs as magic boxes.

What You'll Actually Do (Day to Day)

Design, build, and maintain AI-powered features and internal systems across one or more business areas (support automation, marketing workflows, internal research tools, ops automations, voice/email agents)
Build production integrations with LLMs across both hosted APIs (OpenAI, Anthropic, Gemini) and open-source models (Llama, Qwen, Mistral, DeepSeek, etc.) running on inference providers (Together, Groq, Replicate, Hugging Face, Fireworks) or self-hosted (vLLM, Ollama). Real systems with proper error handling, retries, timeouts, structured outputs, cost controls, and fallbacks
Pick the right model for the job. Frontier closed models when capability matters, smaller or open-source models when cost, latency, privacy, or customization matters. Fine-tune smaller models (LoRA / QLoRA) when prompting alone isn't enough and the use case is narrow and stable
Design and ship agentic workflows: multi-step LLM pipelines, tool-using agents, decision logic, task orchestration, and human-in-the-loop checkpoints
Build and maintain RAG systems end to end: ingestion, chunking, embedding generation, vector search, re-ranking, and retrieval quality evaluation
Work with vector databases (Pinecone, Qdrant, pgvector, Chroma, Weaviate, etc.) at a practical level
Build backend services and APIs that expose AI capabilities to internal tools, integrations, and external systems
Build automation pipelines that connect AI workflows to the rest of the stack (CRMs, support platforms, marketing tools, internal databases, webhooks)
Own reliability of AI systems in production: monitoring outputs, catching regressions, building eval harnesses, alerting, and debugging when behavior changes
Evaluate AI outputs systematically. Build the test sets, scoring rubrics, and feedback loops that tell you whether a system is actually working
Prepare and normalize real-world data for AI use: cleaning call transcripts, structuring support conversations, deduplicating documents, removing PII, extracting structured fields from messy inputs, and shaping data into RAG indexes, fine-tuning datasets, or evaluation sets. This is often the highest-leverage work in an AI project, and we treat it as core engineering, not preprocessing grunt work
Handle structured and unstructured data more broadly: parsing documents, transcripts, emails, scraped content, API responses, and turning messy inputs into useful structured outputs
Debug real production issues where AI behavior, data integrity, latency, or cost is impacted
Collaborate asynchronously with a remote engineering and operations team

Non-Negotiable Requirements

You must have hands-on experience with all of the following:

Strong backend engineering fundamentals. You can design, build, and ship a real backend service from scratch (Python or Node.js strongly preferred), including database design, API design, and proper error handling
Production experience with LLMs in real systems. You've shipped systems using hosted APIs (OpenAI, Anthropic, Gemini, etc.), open-source models via inference providers (Together, Groq, Replicate, Fireworks, etc.), or self-hosting (vLLM, Ollama, Hugging Face). Real workflows that real people or real customers depend on, not just demos or side projects
Real prompt design experience. Iterating on prompts under production conditions, structuring outputs (JSON, function calls, schemas), handling edge cases, and constraining model behavior. Not "I asked ChatGPT to write something."
API integration experience. REST, webhooks, and event-driven flows; comfort connecting multiple systems together
Practical RAG or retrieval experience. You've built or seriously contributed to a system that retrieves relevant context and feeds it to an LLM, and you understand why naive RAG often fails
Working knowledge of embeddings and vector search, conceptually and in production
Data handling and preparation skills. Parsing JSON, CSVs, documents, transcripts, API responses; cleaning, normalizing, deduplicating, and structuring messy real-world data before it reaches an AI system. You understand that most production AI failures trace back to data quality, not model quality
Debugging mindset. You don't accept "the model is just like that." You log, trace, isolate, and fix.
Ability to evaluate AI outputs. You know how to tell whether an AI feature is actually working or just looking like it is
4+ years of real-world software development experience at product companies or serious engineering teams. AI experience can be more recent, but the underlying engineering must be solid
Ability to design systems end to end, not just implement tasks, and take ownership of system reliability

What We're Looking For

Practical experience integrating AI into real business workflows in any domain (support, marketing, sales, ops, research, internal tools)
Comfort with the messiness of real AI systems: hallucinations, drift, cost spikes, rate limits, schema mismatches, intermittent failures
Pragmatic instinct for model selection. You know when a smaller, cheaper, or open-source model beats a frontier closed model for a given task, and vice versa
Hands-on experience with at least one open-source model in production (Llama, Qwen, Mistral, DeepSeek, Gemma, etc.), self-hosted or via inference providers
Fine-tuning experience (LoRA, QLoRA, or full fine-tuning of small models) for cases where prompting alone wasn't enough. Bonus if you can articulate when fine-tuning is worth the effort and when it isn't
Experience with at least one orchestration approach: n8n, Zapier, Make, LangChain, LlamaIndex, custom Python/Node orchestrators, or similar. We don't care which; we care that you've built real things
Hands-on experience with at least one vector database in production
Comfort working with both structured data (databases, APIs) and unstructured data (documents, transcripts, emails, scraped content)
Experience writing evals, test cases, or feedback loops for AI features, even informally
Voice agents, transcription, or multimodal experience is a plus but not required
Familiarity with cost optimization techniques for LLM workloads (caching, model routing, batching, smaller models for cheap steps) is a plus
Comfortable working independently in a fast-moving, remote environment with shifting priorities
A pragmatic attitude toward AI. Neither hype-driven nor dismissive

How We Work

Work is priority-driven and production-focused. Tasks may shift based on business needs, model behavior changes, or new automation opportunities. We value engineers who are comfortable adapting priorities while maintaining strong engineering standards and ownership of what they ship. We use AI tools daily to accelerate our own work, but every engineer is expected to fully understand and own the code and systems they ship.

Work Schedule & Availability

Full-time, fully remote role (40 hours/week)
Strong daily overlap required during PK / Dubai working hours
This is an urgent hire, and we're prioritizing candidates who can start soon

Work Location: Remote
