Qureos


GCP Data Engineer

India

Overview:
Design and implement complex ETL/ELT pipelines using PySpark and Airflow for large-scale data processing on GCP.

Lead data migration initiatives, including automating the movement of Teradata tables to BigQuery, ensuring data accuracy and consistency.
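
As a rough illustration of the pipeline and migration work above, here is a minimal PySpark sketch that loads a Teradata extract staged in Cloud Storage into BigQuery. The bucket, project, and table names are placeholders, and it assumes a Dataproc cluster with the spark-bigquery connector installed; it is not part of the posting itself.

# Minimal sketch: load a Teradata export (staged as Parquet in GCS) into BigQuery.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("teradata_to_bigquery").getOrCreate()

# Hypothetical staging path exported from Teradata.
df = spark.read.parquet("gs://example-bucket/teradata_export/orders/")

# Write through the spark-bigquery connector; the temporary bucket is needed for indirect writes.
(df.write.format("bigquery")
   .option("table", "example-project.analytics.orders")
   .option("temporaryGcsBucket", "example-temp-bucket")
   .mode("overwrite")
   .save())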

Develop robust frameworks to streamline batch and streaming data ingestion workflows, leveraging Kafka, Dataflow, and NiFi.
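
To make the streaming piece concrete, below is a hedged Apache Beam sketch (runnable on Dataflow) that reads from Kafka and appends to BigQuery. Broker, topic, schema, and table names are hypothetical, and ReadFromKafka is Beam's cross-language Kafka transform, which requires a Java runtime at pipeline submission.

# Minimal sketch: Kafka -> BigQuery streaming ingestion with Apache Beam on Dataflow.
import json

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

# Pass --runner=DataflowRunner, --project, --region, etc. on the command line.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadKafka" >> ReadFromKafka(
           consumer_config={"bootstrap.servers": "broker:9092"},
           topics=["orders"])
     | "ParseJson" >> beam.Map(lambda kv: json.loads(kv[1].decode("utf-8")))
     | "WriteBigQuery" >> beam.io.WriteToBigQuery(
           "example-project:analytics.orders_stream",
           schema="order_id:STRING,amount:FLOAT,event_ts:TIMESTAMP",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))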

Collaborate with data scientists to build ML-ready data layers and support analytics solutions.

Conduct proofs of concept (POCs) and document performance benchmarks for data throughput and velocity, ensuring optimized data workflows.

Enhance CI/CD pipelines using Jenkins and GitLab for efficient deployment and monitoring of data solutions.

Collaborate in agile teams for product development and delivery.

Work independently to design data integrations and a data quality framework.

Requirements:
Strong proficiency in Python and SQL for data engineering tasks.

Strong understanding of and experience with distributed computing principles and frameworks such as Hadoop and Apache Spark.

Advanced experience with GCP services, including BigQuery, Dataflow, Cloud Composer (Airflow), and Dataproc.
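
As one purely illustrative example of working with these services, the sketch below submits a PySpark job to an existing Dataproc cluster using the google-cloud-dataproc client; cluster, bucket, and project names are placeholders.

# Minimal sketch: submit a PySpark job to a Dataproc cluster.
from google.cloud import dataproc_v1

region = "us-central1"
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"})

job = {
    "placement": {"cluster_name": "example-cluster"},
    "pyspark_job": {"main_python_file_uri": "gs://example-bucket/jobs/etl_job.py"},
}

# Submit and block until the job finishes (raises on failure).
operation = job_client.submit_job_as_operation(
    request={"project_id": "example-project", "region": region, "job": job})
response = operation.result()
print(f"Job {response.reference.job_id} finished with state {response.status.state.name}")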

Expertise in data modeling, ETL/ELT pipeline development, and workflow orchestration using Airflow DAGs.
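
For a sense of the orchestration style implied here, a small illustrative Airflow DAG follows, assuming Airflow 2.x with the Google provider installed (as on Cloud Composer); the DAG id, SQL, and table are made up for the example.

# Minimal sketch: a daily DAG that runs one BigQuery transformation step.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_orders_transform",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    transform_orders = BigQueryInsertJobOperator(
        task_id="transform_orders",
        configuration={
            "query": {
                "query": (
                    "SELECT order_id, SUM(amount) AS total_amount "
                    "FROM analytics.orders GROUP BY order_id"
                ),
                "useLegacySql": False,
            }
        },
    )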

Hands-on experience with data migration from legacy systems (Teradata, Hive) to cloud platforms (BigQuery).
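
A hedged sketch of the landing step in such a migration, loading Parquet extracts from Cloud Storage into BigQuery with the google-cloud-bigquery client, is shown below; the bucket and table identifiers are placeholders.

# Minimal sketch: load Parquet files staged in GCS into a BigQuery table.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://example-bucket/hive_export/customers/*.parquet",
    "example-project.analytics.customers",
    job_config=job_config,
)
load_job.result()  # wait for completion; raises if the load job failed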

Familiarity with streaming data ingestion tools like Kafka and NiFi.

Strong problem-solving skills and experience with performance optimization in large-scale data environments.

Proficiency in CI/CD tools (Jenkins, GitLab) and version control systems (Git).

GCP Professional Data Engineer certification.

