We are seeking a Data Engineer with 3+ years of hands-on experience and a strong background in real-time and batch data processing, containerization, and cloud-based data orchestration. This role is ideal for someone passionate about building robust, scalable, and efficient data pipelines, and who thrives in agile, collaborative environments.
Responsibilities
- Design, build, and maintain real-time data pipelines using streaming frameworks such as Kafka, Apache Flink, and Spark Structured Streaming
- Develop batch processing workflows with Apache Spark (PySpark)
- Orchestrate and schedule data workflows using frameworks such as Apache Airflow and Azure Data Factory
- Containerize applications using Docker, manage deployments with Helm, and run them on Kubernetes
- Implement modern storage solutions using open formats such as Parquet, Delta Lake, and Apache Iceberg
- Build high-performance analytics engines using tools such as Trino or Presto
- Collaborate with DevOps to manage infrastructure with Terraform and integrate with CI/CD pipelines via Azure DevOps
- Ensure data quality and consistency using tools such as Great Expectations
- Write modular, well-tested, and maintainable Python and SQL code
- Develop an observability layer to monitor and optimize performance across data pipelines
- Participate in agile ceremonies and contribute to sprint planning and reviews
Required Skills
- Advanced Python programming with a strong focus on modular and testable code
- Strong knowledge of SQL and experience working with large-scale datasets
- Hands-on experience with at least one major cloud platform (Azure preferred)
- Solid experience with real-time data processing (Kafka, Flink, or Spark Streaming)
- Expertise in Apache Spark (PySpark) for batch processing
- Experience implementing lakehouse architectures and working with columnar storage (e.g., ClickHouse)
- Proficiency with Azure Data Factory or Apache Airflow for data orchestration
- Experience building APIs to expose large datasets
- Solid experience with Docker, Kubernetes, and Helm
- Familiarity with open data lake formats such as Parquet, Delta Lake, and Iceberg
- Basic experience with Terraform for infrastructure provisioning
- Practical experience with data quality frameworks (e.g., Great Expectations)
- Comfortable working in agile development teams
- Proven ability to debug and performance-tune streaming and batch data jobs
- Experience with AI-driven tools (e.g., text-to-SQL) is a plus
- Bachelor's degree in computer science or a related discipline
We have an amazing team of 700+ individuals working on highly innovative enterprise projects and products. Our customer base includes Fortune 100 retail and CPG companies, leading store chains, fast-growth fintechs, and multiple Silicon Valley startups.
What makes Confiz stand out is our focus on processes and culture. Confiz is ISO 9001:2015 (QMS), ISO 27001:2022 (ISMS), ISO 20000-1:2018 (ITSM), ISO 14001:2015 (EMS), and ISO 45001:2018 (OHSMS) certified. We have a vibrant culture of learning through collaboration and making the workplace fun.
People who work with us use cutting-edge technologies while contributing to the company's success as well as their own.
To know more about Confiz Limited, visit: https://www.linkedin.com/company/confiz-pakistan/