Position Overview
We are looking for an AI Data Engineer with 2+ years of hands-on experience in data engineering, vector databases, LLM/AI integrations, and Python-based automation. The ideal candidate should have strong knowledge of OpenAI, AWS Bedrock, embeddings, Flowise, prompt engineering, and building scalable RAG (Retrieval-Augmented Generation) pipelines.
Key Responsibilities
AI & LLM Integrations
- Integrate and optimize OpenAI GPT models and AWS Bedrock models (Claude, Titan, etc.) for production workflows.
- Build and maintain Retrieval-Augmented Generation (RAG) systems using vector search.
- Develop prompt engineering strategies, structured prompts, and dynamic contextual prompts.
- Implement LLM orchestration using Flowise, LangChain, or LlamaIndex.
Data Engineering & Pipelines
- Design, build, and maintain ETL/ELT pipelines for structured & unstructured data.
- Develop ingestion workflows for PDFs, docs, images, and text for LLM training and retrieval.
- Implement data cleaning, transformation, preprocessing, chunking, and embedding generation.
- Handle large-scale data pipelines that feed AI models and vector databases.
Vector Database Engineering
- Work with Pinecone, Qdrant, Milvus, Weaviate, and Chroma to store and retrieve embeddings.
- Optimize vector indexes, similarity search, metadata filtering, and document-versioning logic.
- Manage vector schema design and vector DB performance tuning.
Python Development & Automation
- Build Python-based microservices, APIs (FastAPI/Flask), and automation scripts.
- Create backend functions to handle AI requests, data ingestion, embeddings, and retrieval logic.
- Integrate with cloud storage, messaging queues, and external APIs.
Cloud & DevOps
- Deploy AI and data pipelines on AWS (Lambda, S3, DynamoDB, EC2, API Gateway).
- Manage secrets, IAM roles, scalability, and cloud resource optimization.
- Containerize workloads using Docker and work with CI/CD workflows (GitHub/GitLab).
Cross-functional Collaboration
- Work alongside AI engineers, backend teams, data scientists, and product managers.
- Document workflows, maintain internal knowledge bases, and support debugging across teams.
Required Skills & Qualifications
- Bachelor’s degree in Computer Science, Data Science, Engineering, or a related field.
- 2+ years of experience in data engineering, AI, or ML-focused development.
- Strong Python skills (FastAPI, Flask, Pandas, NumPy, asyncio).
- Experience with OpenAI GPT models, AWS Bedrock, embeddings, and tokenization.
- Strong understanding of data preprocessing for LLMs: chunking, cleaning, vectorization.
- Hands-on experience with vector databases: Pinecone, Qdrant, Milvus, Weaviate, and Chroma.
- Practical experience with Flowise, LangChain, or LlamaIndex.
- Knowledge of prompt engineering and optimizing LLM responses.
- Experience with SQL & NoSQL databases.
- Familiarity with API integrations, backend workflows, and cloud-based pipelines.
- Understanding of CI/CD workflows, version control (Git), and containerization (Docker).
Nice-to-Have
- Experience with MLOps tools and model monitoring.
- Exposure to model fine-tuning or supervised generation training.
- Familiarity with Airflow, Prefect, or cloud-native workflow orchestrators.
- Hands-on experience with parallel processing or distributed pipelines.
Soft Skills
- Strong analytical thinking and problem-solving capability.
- Clear communication and documentation.
- Ability to work in fast-paced, agile environments.
- Quick learner with deep curiosity about AI/ML technologies.
Job Types: Full-time, Permanent
Pay: ₹300,000.00 - ₹420,000.00 per year
Work Location: In person