Senior Data Quality Analyst – Capgemini
About Capgemini
Capgemini is a global leader in consulting, digital transformation, technology, and engineering services. With a presence in over 50 countries and a strong heritage of innovation, Capgemini enables organizations to realize their business ambitions through an array of services from strategy to operations. Our collaborative approach and a people-centric work culture have made us a partner of choice for clients across industries.
Role Overview
We are seeking a highly skilled Senior Data Quality Analyst with a robust background in designing, implementing, and maintaining data quality frameworks leveraging Python or Collibra. The ideal candidate will be adept at ensuring data accuracy, consistency, completeness, and reliability across large-scale cloud-based platforms, especially within Azure Databricks environments. This role requires expertise in automated data quality assurance, a deep understanding of data governance, and hands-on experience integrating quality controls into modern data pipelines.
The Senior Data Quality Analyst will be embedded within an agile squad dedicated to a specific business mission while contributing to a broader program comprising 4 to 8 interconnected squads. Collaboration, technical leadership, and a continuous improvement mindset are essential as you work cross-functionally to elevate the organization’s data quality standards.
Key Responsibilities
1. Development & Integration
- Design, develop, and implement automated data quality checks using Python scripts and libraries or Collibra Data Quality components (an illustrative sketch follows this list).
- Integrate data quality validation logic within existing ETL/ELT pipelines operating on Azure Databricks, ensuring quality gates are consistently enforced across all data flows.
- Develop and maintain reusable Python modules that perform anomaly detection, schema validation, and rule-based data quality checks to enable rapid scaling of quality coverage.
- Collaborate with data engineering teams to embed continuous quality controls throughout the data ingestion, transformation, and consumption lifecycle.
- Support the deployment and management of Collibra-based data quality solutions to automate governance workflows and stewardship activities.
2. Data Quality Management
- Define, measure, and rigorously enforce data quality metrics, thresholds, and Service Level Agreements (SLAs) tailored to business-critical datasets.
- Utilize Collibra to manage and operationalize data governance workflows, maintain business glossaries, and delineate stewardship responsibilities.
- Monitor the integrity of data pipelines for completeness, accuracy, timeliness, and consistency across distributed and cloud-native environments.
- Conduct detailed root cause analyses for complex data quality issues, collaborating with engineers and domain experts to drive permanent remediation and prevention strategies.
- Implement and continuously refine monitoring frameworks, utilizing dashboards and alerting systems (built using Python and Collibra integrations) for real-time visibility into key data quality indicators (see the sketch after this list).
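As a minimal, hypothetical sketch of the metric-versus-SLA enforcement described above (the metric names and thresholds below are placeholders, not agreed SLAs), a plain-Python evaluator of this kind can feed dashboards or alerting channels:

```python
from dataclasses import dataclass


@dataclass
class MetricSLA:
    """A data quality metric and the minimum value agreed with the business."""
    name: str         # e.g. "completeness" or "timeliness" (placeholder names)
    threshold: float  # minimum acceptable ratio, e.g. 0.98 for 98%


def evaluate_slas(measured, slas):
    """Compare measured metric values against their SLAs and return one
    alert message per breach or missing measurement."""
    alerts = []
    for sla in slas:
        value = measured.get(sla.name)
        if value is None:
            alerts.append(f"{sla.name}: no measurement available")
        elif value < sla.threshold:
            alerts.append(
                f"{sla.name}: {value:.2%} is below the agreed {sla.threshold:.2%}"
            )
    return alerts


if __name__ == "__main__":
    # Hypothetical SLAs and measurements; real values would come from the
    # monitoring framework and the business-critical datasets it covers.
    slas = [MetricSLA("completeness", 0.98), MetricSLA("timeliness", 0.95)]
    measured = {"completeness": 0.991, "timeliness": 0.90}
    for alert in evaluate_slas(measured, slas):
        print("ALERT:", alert)  # in practice: push to a dashboard or alert channel
```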
3. Support & Operations
- Act as a Level 2/3 escalation point for data quality incidents, troubleshooting issues and coordinating with other agile squads and technical teams for rapid resolution.
- Work closely with product owners, business analysts, and key stakeholders to understand evolving data requirements and ensure quality expectations are aligned and met.
- Maintain and optimize operational dashboards for ongoing data quality monitoring, leveraging both Python-based and Collibra-integrated solutions.
- Participate actively in agile ceremonies, including sprint planning, daily standups, reviews, and retrospectives, contributing to squad goals and continuous delivery improvements.
4. Governance & Best Practices
- Establish, document, and evangelize data quality standards, validation frameworks, and best practices across squads and the broader data organization.
- Maintain comprehensive documentation on validation rules, automated test cases, and quality assurance procedures, ensuring transparency and repeatability.
- Mentor, coach, and upskill junior data engineers and analysts in data quality concepts, tools, and processes to foster a quality-first culture.
- Ensure strict compliance with data governance, privacy, and security policies by leveraging Collibra’s governance and stewardship frameworks.
- Continuously assess emerging technologies, tools, and methodologies for potential enhancement of the data quality ecosystem.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Management, Information Systems, or a closely related field.
- Several years of progressive experience in data quality engineering, data management, or related data roles within complex technology environments.
- Demonstrable expertise in Python, including the development of reusable data quality and validation libraries.
- Extensive hands-on experience with Azure Databricks, including cloud-native data processing, ETL/ELT orchestration, and distributed computing concepts.
- Proficiency with the Collibra Data Quality platform or equivalent data governance and stewardship tools.
- Strong track record of working in agile environments, participating in cross-functional teams, and adapting to rapidly evolving project requirements.
- Excellent analytical, problem-solving, and communication skills, with the ability to convey complex technical topics to both technical and non-technical audiences.
Preferred Certifications (One or More)
- Databricks Certified Data Engineer Associate or Professional
- Microsoft Certified: Azure Data Engineer Associate
- Python Institute certifications (PCAP, PCPP)
- Collibra Ranger or Collibra Data Quality Steward certifications
Key Skills & Competencies
- Deep understanding of data quality frameworks, methodologies, and industry best practices
- Hands-on experience building automated data quality tests using Python, PySpark, or similar open-source libraries
- Expertise in designing quality validation steps within ETL/ELT data pipelines for large volumes of structured and semi-structured data
- Familiarity with cloud data ecosystems, especially Azure and Databricks
- Proven ability to operationalize and scale data governance using Collibra or comparable tools
- Experience with dashboarding, data visualization, and monitoring tools for real-time data quality tracking
- Strong collaboration, leadership, and mentoring abilities within agile squads or matrix teams
- Knowledge of data privacy, security, and regulatory compliance requirements
- Ability to drive innovation and continuous improvement in data quality processes
What We Offer
- Opportunity to work on cutting-edge data platforms and technologies in a global, multicultural environment
- Collaborative and agile work culture with empowering career growth opportunities
- Competitive remuneration, benefits, and professional certification support
- Access to Capgemini’s global learning platforms, mentorship programs, and technology communities
- Exposure to high-impact projects with Fortune 500 clients