What’s important to us:
We are looking for a skilled Data Engineer with at least 4 years of professional experience building, automating, and optimizing data pipelines and cloud-based architectures. The ideal candidate will have hands-on experience with cloud data services (AWS, Azure, or GCP) and CI/CD pipelines for deploying scalable, reliable, and secure data solutions.
The candidate will collaborate with cross-functional teams including data analysts, data scientists, and software engineers to design and maintain robust data infrastructure that supports analytics, AI/ML workflows, and enterprise reporting systems.
Key Responsibilities:
- Design, build, and maintain end-to-end ETL/ELT pipelines using both on-premise and cloud-based technologies.
- Architect and operate data storage and streaming solutions leveraging cloud-based services on AWS, Azure, or GCP.
- Design and implement data ingestion and transformation workflows using Airflow, AWS Glue, or Azure Data Factory.
- Develop and optimize data pipelines using Python and PySpark for large-scale distributed data processing.
- Build data models (normalized, denormalized, and dimensional Star/Snowflake schemas) for analytics and warehousing solutions.
- Implement data quality, lineage, and governance using metadata management and monitoring tools.
- Collaborate with cross-functional teams to deliver clean, reliable, and timely data for analytics and machine learning use cases.
- Integrate CI/CD pipelines for data infrastructure deployment using GitHub Actions, Jenkins, or Azure DevOps.
- Automate infrastructure provisioning using Infrastructure as Code (IaC) tools such as Terraform or AWS CloudFormation.
- Monitor and optimize data processing performance for scalability, reliability, and cost-efficiency.
- Enforce data security policies and ensure compliance with standards such as GDPR and HIPAA.
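To make the day-to-day shape of these responsibilities concrete, here is a minimal extract-transform-load sketch in pure Python (standard library only; the feed, table, and column names are hypothetical, and in this role the equivalent logic would typically run under Airflow, AWS Glue, or Azure Data Factory against cloud storage rather than in-memory SQLite):

```python
import csv
import sqlite3
from io import StringIO

# Hypothetical raw feed; in practice this would land in S3 or Data Lake Gen2.
RAW_CSV = """order_id,amount,currency
1001,19.99,USD
1002,,USD
1003,5.50,EUR
"""

def extract(raw: str) -> list[dict]:
    """Read raw CSV rows into dicts."""
    return list(csv.DictReader(StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Drop rows with missing amounts and cast types (a basic data quality gate)."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # in a real pipeline, route to a dead-letter/error table instead
        clean.append((int(row["order_id"]), float(row["amount"]), row["currency"]))
    return clean

def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    """Idempotent load into a warehouse-style table; returns the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    loaded = load(transform(extract(RAW_CSV)), conn)
    print(f"loaded {loaded} rows")  # one row is rejected by the quality gate
```

The `INSERT OR REPLACE` keyed on `order_id` makes reruns idempotent, which is the same property the orchestration tools above are used to guarantee at scale.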
Must-Have Skills & Qualifications:
- Education: Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or a related field.
- Experience: Minimum 4 years of hands-on experience as a Data Engineer or in data-intensive environments.
- SQL Expertise: Advanced proficiency in SQL for complex queries, joins, window functions, and performance tuning.
- Analytical Databases: Experience working with Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse, and PostgreSQL.
- Query Optimization: Skilled in query optimization, indexing, and execution plan analysis for high-performance analytics workloads.
- Programming: Proficient in Python and PySpark for data manipulation, automation, and pipeline orchestration.
- Data Processing Frameworks: Strong understanding of Apache Spark (RDD, DataFrame, Spark SQL, optimization), Hive, Hadoop, and Flink for large-scale distributed data processing.
- ETL/ELT Frameworks: Hands-on experience designing and maintaining pipelines using Airflow, AWS Glue, or Azure Data Factory.
- Data Integration Patterns: Familiarity with incremental loading, Slowly Changing Dimensions (SCD), Change Data Capture (CDC), and error handling in data pipelines.
- Data Modeling: Expertise in data modeling, schema design, and building normalized, denormalized, and dimensional (Star/Snowflake) schemas.
- Data Architecture: Strong understanding of Data Warehouse, Data Lake, and Lakehouse architectures, including Delta Lake, ACID transactions, and partitioning strategies.
- Cloud Platforms: Practical experience with major cloud ecosystems:
  - AWS: S3, Glue, Redshift, Athena, Lambda, Step Functions, EMR
  - Azure: Data Factory, Data Lake Gen2, Synapse, Databricks
- Cloud Security: Experience managing IAM roles, access control, and encryption in cloud environments.
- Pipeline Optimization: Skilled in optimizing data pipelines for performance, scalability, and cost-efficiency.
- CI/CD and DevOps: Hands-on experience with CI/CD tools such as GitHub Actions, GitLab CI, or Azure DevOps.
- Version Control: Proficient with Git and familiar with agile development practices.
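One of the integration patterns named above, Slowly Changing Dimension Type 2, can be sketched in a few lines of pure Python (illustrative field names; in this role the same logic is usually expressed as a `MERGE` in Snowflake, Redshift, BigQuery, or Delta Lake):

```python
from dataclasses import dataclass, replace

@dataclass
class DimRow:
    """One version of a customer record in a Type 2 dimension."""
    customer_id: int
    city: str
    valid_from: str          # ISO dates kept as strings for brevity
    valid_to: str = ""       # empty string == current version

def scd2_apply(dim: list[DimRow], change: DimRow) -> list[DimRow]:
    """Close the current version if the tracked attribute changed, then append the new one."""
    out = []
    for row in dim:
        if row.customer_id == change.customer_id and not row.valid_to:
            if row.city == change.city:
                return dim  # no change: keep history as-is
            # expire the old version at the new row's effective date
            row = replace(row, valid_to=change.valid_from)
        out.append(row)
    out.append(change)
    return out

history = [DimRow(1, "Berlin", "2023-01-01")]
history = scd2_apply(history, DimRow(1, "Munich", "2024-06-01"))
# history now holds two versions: Berlin (expired) and Munich (current)
```

The point of the pattern is that no history is overwritten: each attribute change closes the prior version's validity window and appends a new current row, so facts can always join to the dimension as it looked on their transaction date.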
Good-to-Have Skills:
- Experience with containerization and orchestration.
- Exposure to data cataloging and governance tools.
- Experience with monitoring tools.
- Familiarity with data APIs and microservices architecture.
- Certification in cloud data engineering (e.g., AWS Certified Data Engineer, Azure Data Engineer Associate, or GCP Professional Data Engineer).
- Experience supporting machine learning and analytics pipelines.
Soft Skills:
- Strong analytical and problem-solving mindset.
- Excellent communication and documentation skills.
- Ability to work collaboratively in a cross-functional, fast-paced environment.
- Strong attention to detail with a focus on data accuracy and reliability.
- Eagerness to learn and adopt emerging data technologies.