Job Overview
We are seeking a Senior LLM Developer to architect and implement advanced AI solutions within our data ecosystem. This role is critical for bridging the gap between high-scale data engineering and generative AI application development. You will be responsible for building robust LLM pipelines and, specifically, for validating Snowflake–Databricks Iceberg V3 interoperability to ensure seamless data access across our lakehouse architecture.
Candidates must hold a valid U.S. visa (or equivalent status) permitting travel to our head office when needed.
The ideal candidate possesses deep expertise in Large Language Models (LLMs) and the underlying data infrastructure required to feed them at an enterprise scale.
Key Responsibilities
- Interoperability Validation: Design and execute rigorous testing frameworks for Snowflake and Databricks Iceberg V3 integration, ensuring metadata consistency and performance parity across platforms.
- LLM Orchestration: Build and maintain complex AI agents and workflows using frameworks like LangChain, LangGraph, or LlamaIndex.
- Data Pipeline Integration: Develop RAG (Retrieval-Augmented Generation) systems that leverage Snowflake and Databricks as primary knowledge bases.
- Evaluation & Benchmarking: Establish LLM evaluation metrics (LLM-as-a-judge) to monitor the accuracy, bias, and performance of deployed models.
- Optimization: Fine-tune models and optimize vector database indexing to reduce latency and compute costs in a production environment.
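To illustrate the evaluation responsibility above, here is a minimal sketch of an LLM-as-a-judge harness. The prompt template, score scale, and function names are illustrative assumptions, not a prescribed implementation; in practice the judge reply would come from a model API call, which is omitted here.

```python
import json
import re

# Illustrative grading template (assumption, not a required format).
JUDGE_PROMPT = """You are grading an answer for factual faithfulness to the context.
Context: {context}
Answer: {answer}
Reply with JSON: {{"score": <1-5>, "reason": "<short reason>"}}"""


def build_judge_prompt(context: str, answer: str) -> str:
    """Fill the grading template for one (context, answer) pair."""
    return JUDGE_PROMPT.format(context=context, answer=answer)


def parse_judge_reply(reply: str) -> dict:
    """Extract the JSON verdict from a (possibly chatty) judge reply."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        raise ValueError("no JSON verdict found in judge reply")
    verdict = json.loads(match.group(0))
    if not 1 <= verdict["score"] <= 5:
        raise ValueError("score out of range")
    return verdict


def hallucination_rate(verdicts: list[dict], threshold: int = 3) -> float:
    """Fraction of answers the judge scored below the faithfulness threshold."""
    flagged = sum(1 for v in verdicts if v["score"] < threshold)
    return flagged / len(verdicts)
```

Frameworks such as RAGAS or G-Eval package the same pattern, adding calibrated prompts and aggregate metrics.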
Technical Requirements
- Education: Master’s degree in Computer Science, Data Science, AI/ML, or a closely related quantitative field.
- LLM Expertise: 5+ years of experience in software engineering, with at least 2 years focused on LLM application development (OpenAI API, Anthropic, Llama 3, Hugging Face).
- Data Lakehouse Mastery: Proven experience with Apache Iceberg (V3) and the technical nuances of interoperability between Snowflake (Polaris Catalog) and Databricks (Unity Catalog).
- Backend Proficiency: Expert-level Python skills, specifically with asynchronous programming and API development (FastAPI/Flask).
- Vector Infrastructure: Extensive experience with vector databases such as Pinecone, Weaviate, or Milvus, including advanced chunking strategies.
- Cloud Architecture: Strong background in AWS or Azure, particularly with containerization (Docker, Kubernetes) and serverless functions.
- DevOps/MLOps: Experience setting up CI/CD pipelines for AI models and managing the lifecycle of data within a production environment.
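As a concrete example of the vector-infrastructure skills listed above, here is a minimal sliding-window chunking sketch — one common strategy for preparing documents for embedding. The function name and parameter defaults are illustrative assumptions, not part of any specific vector database's API.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks that overlap, so content spanning
    a chunk boundary still appears intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Advanced strategies build on this baseline by splitting on semantic boundaries (sentences, headings) rather than fixed character counts.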
Job Type: Contract
Pay: $50.00 - $80.00 per hour
Expected hours: 40 per week
Application Question(s):
- What automated frameworks or metrics (e.g., RAGAS, G-Eval) have you implemented to measure the reliability and "hallucination" rates of your LLM agents?
- For a high-scale production environment, how do you handle the trade-off between retrieval speed and context accuracy when managing billion-scale vector embeddings?
- Describe a specific challenge you faced when syncing or validating data between Snowflake and Databricks using the Apache Iceberg format. How did you ensure metadata consistency?
- Do you currently hold a valid U.S. visa or residency status that allows you to travel to the United States for business meetings?
Work Location: Remote