Are you passionate about pushing the boundaries of technology in the Gen AI space? Rohirrim is seeking a Senior Data Engineer to mentor engineers, provide technical direction, and drive the development of cutting-edge applications. If you thrive in a fast-paced environment and enjoy leading by example while staying hands-on with coding, we want to hear from you!
At Rohirrim, we're at the forefront of innovation in the Gen AI space. Joining our team means being part of a dynamic environment where your leadership and expertise make a tangible impact on our products and team growth.
As a Senior Data Engineer at Rohirrim, you’ll design, build, and optimize the data pipelines and infrastructure that fuel our AI products. You’ll work closely with our AI/ML teams, product teams, customer success managers, and security/compliance partners to transform complex enterprise datasets into clean, reliable, structured foundations for Rohan deployments — especially in controlled, secure, or GovTech environments.
You’ll help us scale:
- ingestion pipelines
- vector stores
- embedding workflows
- metadata & document-processing frameworks
- Azure-native data services

…in a way that is fast, compliant, and deeply reliable.
- Blend software engineering, data engineering, and DevOps skills to build and maintain scalable data ingestion pipelines for structured/unstructured data (documents, PDFs, knowledge bases, enterprise systems, APIs, etc.).
- Develop and operate ETL/ELT workflows that ensure data integrity, security, and lineage.
- Implement and optimize vector database systems and embeddings pipelines supporting RAG and AI features.
- Collaborate with ML engineers to support model training, evaluation, and feature engineering pipelines.
- Architect and manage Azure-based data infrastructure (e.g., Azure Functions, Azure Storage, Azure SQL, Azure Kubernetes Service, Azure OpenAI integrations).
- Build internal tools for metadata extraction, OCR/document parsing, text normalization, and validation.
- Ensure pipelines meet compliance, auditability, and security requirements (SOC 2, FedRAMP, etc.).
- Support customer-specific data onboarding workflows for government and enterprise deployments.
- Monitor and improve pipeline performance, reliability, and scalability.
- 10+ years in Data Engineering, Software Engineering, or ML/Data Infrastructure roles.
- Strong experience with Python, SQL, and modern data engineering tools (Airflow, Dagster, dbt, Prefect, etc.).
- Experience building large-scale document extraction ETL pipelines (OCR, PDF parsing, metadata extraction, NLP preprocessing).
- Proficiency with Kubernetes, Docker, and containerized data pipelines deployed on Azure, AWS, and/or Google Cloud.
- Hands-on experience with relational databases (Postgres, SQL Server, MySQL) and non-relational systems such as Elasticsearch, Redis, and graph databases.
- Experience with document-heavy or text-heavy data processing (OCR, parsing, NLP preprocessing).
- Strong data quality, governance, lineage, and validation mindset.
- Excellent communicator who can align with ML, engineering, and product teams.
- Experience building or supporting GenAI / LLM / RAG pipelines.
- Experience with Azure OpenAI Service.
- Experience with MinIO.
- Background with knowledge graphs, semantic search, or indexing at scale.
- Familiarity with CI/CD pipelines in Azure DevOps, GitHub Actions, or similar.