Remote
Full-time (40 hours/week) with strong overlap during PK / Dubai working hours
Compensation: Competitive, typically in the range of 2200 – 3000 USD per month (flexible for exceptional candidates)
About the Role
We're a fast-growing company shifting heavily toward AI-driven automation across the business: customer support, marketing, product, and operations. This role is for an engineer who designs and builds real production AI systems that move work through our company faster, cheaper, and with fewer humans in the loop.
This is a hands-on engineering role spanning backend systems, LLM integrations, agentic workflows, retrieval pipelines, and the data plumbing that makes AI actually work in production. You'll work in a real-world environment where models hallucinate, APIs fail, prompts drift, costs spike, and engineers are expected to take ownership of the systems they ship.
This is not a prompt engineer role. This is not a research-only role. This is not "play with ChatGPT and report back" work. We move fast, ship often, debug real production issues, and expect engineers to own AI systems end to end. We use AI daily ourselves, but we care deeply about engineers who read, understand, and validate what their systems are doing, not those who treat LLMs as magic boxes.
What You'll Actually Do (Day to Day)
- Design, build, and maintain AI-powered features and internal systems across one or more business areas (support automation, marketing workflows, internal research tools, ops automations, voice/email agents)
- Build production integrations with LLMs across both hosted APIs (OpenAI, Anthropic, Gemini) and open-source models (Llama, Qwen, Mistral, DeepSeek, etc.) running on inference providers (Together, Groq, Replicate, Hugging Face, Fireworks) or self-hosted (vLLM, Ollama). Real systems with proper error handling, retries, timeouts, structured outputs, cost controls, and fallbacks
- Pick the right model for the job. Frontier closed models when capability matters, smaller or open-source models when cost, latency, privacy, or customization matters. Fine-tune smaller models (LoRA / QLoRA) when prompting alone isn't enough and the use case is narrow and stable
- Design and ship agentic workflows: multi-step LLM pipelines, tool-using agents, decision logic, task orchestration, and human-in-the-loop checkpoints
- Build and maintain RAG systems end to end: ingestion, chunking, embedding generation, vector search, re-ranking, and retrieval quality evaluation
- Work with vector databases (Pinecone, Qdrant, pgvector, Chroma, Weaviate, etc.) at a practical level
- Build backend services and APIs that expose AI capabilities to internal tools, integrations, and external systems
- Build automation pipelines that connect AI workflows to the rest of the stack (CRMs, support platforms, marketing tools, internal databases, webhooks)
- Own reliability of AI systems in production: monitoring outputs, catching regressions, building eval harnesses, alerting, and debugging when behavior changes
- Evaluate AI outputs systematically. Build the test sets, scoring rubrics, and feedback loops that tell you whether a system is actually working
- Prepare and normalize real-world data for AI use: cleaning call transcripts, structuring support conversations, deduplicating documents, removing PII, extracting structured fields from messy inputs, and shaping data into RAG indexes, fine-tuning datasets, or evaluation sets. This is often the highest-leverage work in an AI project, and we treat it as core engineering, not preprocessing grunt work
- Handle structured and unstructured data more broadly: parsing documents, transcripts, emails, scraped content, API responses, and turning messy inputs into useful structured outputs
- Debug real production issues where AI behavior, data integrity, latency, or cost is impacted
- Collaborate asynchronously with a remote engineering and operations team
Non-Negotiable Requirements
You must have hands-on experience with all of the following:
- Strong backend engineering fundamentals. You can design, build, and ship a real backend service from scratch (Python or Node.js strongly preferred), including database design, API design, and proper error handling
- Production experience with LLMs in real systems. You've shipped systems using hosted APIs (OpenAI, Anthropic, Gemini, etc.), open-source models via inference providers (Together, Groq, Replicate, Fireworks, etc.), or self-hosted models (vLLM, Ollama, Hugging Face). Real workflows that real people or real customers depend on, not just demos or side projects
- Real prompt design experience. Iterating on prompts under production conditions, structuring outputs (JSON, function calls, schemas), handling edge cases, and constraining model behavior. Not "I asked ChatGPT to write something."
- API integration experience. REST, webhooks, and event-driven flows; comfort connecting multiple systems together
- Practical RAG or retrieval experience. You've built or seriously contributed to a system that retrieves relevant context and feeds it to an LLM, and you understand why naive RAG often fails
- Working knowledge of embeddings and vector search, conceptually and in production
- Data handling and preparation skills. Parsing JSON, CSVs, documents, transcripts, API responses; cleaning, normalizing, deduplicating, and structuring messy real-world data before it reaches an AI system. You understand that most production AI failures trace back to data quality, not model quality
- Debugging mindset. You don't accept "the model is just like that." You log, trace, isolate, and fix.
- Ability to evaluate AI outputs. You know how to tell whether an AI feature is actually working or only appears to be
- 4+ years of real-world software development experience at product companies or serious engineering teams. AI experience can be more recent, but the underlying engineering must be solid
- Ability to design systems end to end, not just implement tasks, and take ownership of system reliability
What We're Looking For
- Practical experience integrating AI into real business workflows in any domain (support, marketing, sales, ops, research, internal tools)
- Comfort with the messiness of real AI systems: hallucinations, drift, cost spikes, rate limits, schema mismatches, intermittent failures
- Pragmatic instinct for model selection. You know when a smaller, cheaper, or open-source model beats a frontier closed model for a given task, and vice versa
- Hands-on experience with at least one open-source model in production (Llama, Qwen, Mistral, DeepSeek, Gemma, etc.), self-hosted or via inference providers
- Fine-tuning experience (LoRA, QLoRA, or full fine-tuning of small models) for cases where prompting alone wasn't enough. Bonus if you can articulate when fine-tuning is worth the effort and when it isn't
- Experience with at least one orchestration approach: n8n, Zapier, Make, LangChain, LlamaIndex, custom Python/Node orchestrators, or similar. We don't care which; we care that you've built real things
- Hands-on experience with at least one vector database in production
- Comfort working with both structured data (databases, APIs) and unstructured data (documents, transcripts, emails, scraped content)
- Experience writing evals, test cases, or feedback loops for AI features, even informally
- Voice agents, transcription, or multimodal experience is a plus but not required
- Familiarity with cost optimization techniques for LLM workloads (caching, model routing, batching, smaller models for cheap steps) is a plus
- Comfort working independently in a fast-moving, remote environment with shifting priorities
- A pragmatic attitude toward AI. Neither hype-driven nor dismissive
How We Work
Work is priority-driven and production-focused. Tasks may shift based on business needs, model behavior changes, or new automation opportunities. We value engineers who are comfortable adapting priorities while maintaining strong engineering standards and ownership of what they ship. We use AI tools daily to accelerate our own work, but every engineer is expected to fully understand and own the code and systems they ship.
Work Schedule & Availability
- Full-time, fully remote role (40 hours/week)
- Strong daily overlap required during PK / Dubai working hours
- This is an urgent hire, and we're prioritizing candidates who can start soon
Work Location: Remote