Platform SRE and Reliability Engineer

DeepLight AI is a specialist AI and data consultancy with extensive experience implementing intelligent enterprise systems across multiple industries, with particular depth in financial services and banking. Our team combines deep expertise in data science, statistical modeling, AI/ML technologies, workflow automation, and systems integration with a practical understanding of complex business operations.

The Platform SRE and Reliability Engineer is responsible for ensuring the quality, resilience, and performance of the Bank’s next-generation AI and digital platforms. This role sits at the high-stakes intersection of Site Reliability Engineering (SRE) and AI Quality Assurance, designing automated frameworks to validate everything from Conversational AI agents and retrieval-augmented generation (RAG) pipelines to core banking microservices. By implementing robust continuous testing pipelines and reliability governance, you will ensure that the Bank’s AI-driven experiences remain secure, scalable, and consistently accurate under real-world conditions.

As the Platform SRE and Reliability Engineer, your responsibilities include:

Building reusable automation frameworks to test the accuracy, stability, latency, and safety of Conversational AI platforms (voice and chat) and LLM-based agents.
Validating multi-agent orchestration, human-in-the-loop escalation logic, and the integrity of RAG pipelines and vector search results.
Testing AI/ML platform components for scaling behavior, failover resilience, high availability, and disaster recovery.
Integrating automated test pipelines into CI/CD workflows for MLOps, focusing on drift detection, retraining validation, and model registry integrity.
Verifying AI/ML pipelines on Azure AI Foundry and AWS SageMaker, ensuring data integrity across storage services (Amazon S3, Azure Blob Storage) and serverless functions.
Conducting load testing for AI services and ensuring that engineering guardrails for fairness, explainability, and regulatory compliance are enforced.
Acting as a bridge between engineering and business, translating complex technical reliability requirements into actionable quality narratives.

As an AI consultancy, our greatest asset is the expertise of our people.

While technical mastery is the foundation of what we do, your ability to bridge the gap between complex data science and actionable business value is what will define your success with DeepLight.

We're looking for individuals who are not only world-class in their fields of specialism, but also compelling communicators and persuasive advocates for their own skills.

You will be the face of our firm, tasked with building trust, articulating the "why" behind your technical decisions, and effectively "selling" your vision to high-level stakeholders.

If you thrive on the challenge of presenting cutting-edge solutions as much as you do on building them, you will fit right in.

Requirements

To be successful in this role, we need you to have:

A Bachelor’s degree in Computer Science, AI, Software Engineering, or a related quantitative field. A Master’s degree in AI/ML is highly preferred.
5+ years in QA, Application Testing, or Reliability Engineering, ideally for a large-scale brand or digital-only bank.
Proven track record in deploying AI/ML QA solutions at an enterprise scale within the financial services sector.
Experience testing distributed architectures, microservices, and large-scale data platforms (Vector DBs, Data Lakes).
Expertise in Python-based automation frameworks and tools such as Selenium, Playwright, PyTest, JMeter, and Locust.
A deep understanding of LLM evaluation frameworks, prompt stability testing, and the validation of hallucination detection and mitigation controls.
Hands-on experience testing and validating services across both Azure and AWS cloud environments.
Strong SQL/NoSQL validation skills (Postgres, MongoDB) and experience testing REST, GraphQL, and FastAPI integrations.
Proficiency in testing within Docker and Kubernetes (EKS/AKS) environments.

It would be beneficial if you also had:

An ability to evaluate and adopt emerging QA tools for AI frameworks and platforms such as LangChain, CrewAI, and Amazon Bedrock.
An understanding of cutting-edge quality trends, including multimodal QA and RLHF (Reinforcement Learning from Human Feedback) output evaluation.
A proactive approach to identifying edge cases in AI agents that could impact banking compliance or customer experience.
A strong ability to coordinate with different functional teams to implement models and monitor outcomes.

Benefits

Benefits & Growth Opportunities:

Competitive salary.
Visa sponsorship for the successful individual.
Comprehensive health insurance for the successful individual.
Professional development and certification support.
Opportunity to work on cutting-edge AI projects.
Career advancement opportunities in a rapidly growing AI company.

This position offers a unique opportunity to shape the future of AI implementation while working with a talented team of professionals at the forefront of technological innovation. The successful candidate will play a crucial role in driving our company's success in delivering transformative AI solutions to our clients.

At DeepLight AI, we recognise that diversity drives innovation. We are committed to fostering an inclusive environment where individuals with different thinking styles can thrive and contribute their unique strengths to our specialised AI and data solutions.

Our goal is to ensure our application and interview process is accessible, predictable, and fair for all candidates.

If you require any specific adjustments to the application process, or any reasonable adjustments should you progress to the interview stage, please let us know. This information will be kept strictly confidential and will not affect hiring decisions.
