We are seeking an experienced engineer to design and implement a Quality Assurance (QA) framework for automated testing across our cloud-native, real-time, and batch data pipelines. The ideal candidate will have deep expertise in Python-based test frameworks, data validation at scale, and CI/CD integration, ensuring reliability, accuracy, and performance in our data platform.
Key Responsibilities
- Design and develop a scalable automated data quality and testing framework for real-time (Kafka/Flink/Spark Streaming) and batch (PySpark) data pipelines.
- Integrate automated data validation (schema checks, statistical profiling, anomaly detection) into data workflows using tools like Great Expectations, Deequ, or custom-built Python solutions.
- Build test harnesses and reusable libraries for unit, integration, and regression testing of ETL pipelines and APIs.
- Implement continuous testing pipelines integrated with CI/CD tools (Azure DevOps, GitHub Actions, or Jenkins).
- Collaborate with data engineers to embed data quality gates within orchestration tools (Azure Data Factory / Airflow).
- Define and monitor data quality KPIs, thresholds, and alerting using observability tools (Datadog, Prometheus, etc.).
- Develop mock data generation and simulation frameworks for performance and stress testing.
- Ensure comprehensive test coverage for schema evolution, transformation logic, and downstream data consumption APIs.
- Contribute to best practices and documentation around test-driven data engineering and quality-first development culture.
- Apply shift-left testing and test-driven data engineering practices.
- Integrate the QA framework with orchestration and monitoring pipelines.
- Develop synthetic data generators for edge cases and negative testing.
- Validate schema evolution and backward compatibility.
- Optimize testing for cost efficiency.
- Ensure the stability of every release.
Required Skills
- Strong knowledge of data warehousing, lakehouse architectures, and data lakes.
- 6+ years of overall experience in the data warehousing domain and 3+ years in ETL testing and data quality.
- Strong Python development skills, emphasizing modular, testable, and reusable code.
- Hands-on knowledge of test automation tools and libraries for data warehouse solutions.
- Expertise with data quality libraries such as Great Expectations, Deequ, Soda Core, or equivalent.
- Solid understanding of SQL and ability to validate transformations over large-scale datasets.
- Familiarity with Apache Spark (PySpark) and real-time data processing frameworks (Kafka, Flink, Spark Streaming).
- Experience with the Azure ecosystem (Data Factory, Databricks, Storage, Synapse).
- Experience with CI/CD pipelines and automated testing workflows.
- Exposure to monitoring and alerting for data quality metrics.
We have an amazing team of 700+ individuals working on highly innovative enterprise projects & products. Our customer base includes Fortune 100 retail and CPG companies, leading store chains, fast-growth fintech, and multiple Silicon Valley startups.
What makes Confiz stand out is our focus on processes and culture. Confiz is ISO 9001:2015 (QMS), ISO 27001:2022 (ISMS), ISO 20000-1:2018 (ITSM), ISO 14001:2015 (EMS), and ISO 45001:2018 (OHSMS) certified. We have a vibrant culture of learning through collaboration and making the workplace fun.
People who work with us work with cutting-edge technologies while contributing to the company's success as well as their own.
To know more about Confiz Limited, visit:
https://www.linkedin.com/company/confiz-pakistan/