The Global Data Insight & Analytics organization is looking for a Consultant Data Engineer with deep expertise in Google Cloud Platform (GCP) and a proven track record of designing and implementing complex ETL processes to build robust, scalable, and efficient data pipelines. In this role, you will be part of a dynamic, cross-functional team, collaborating closely with other engineers, business partners, product managers, and designers through frequent, iterative releases. Your focus will be on shaping our data architecture, enabling advanced analytics, and supporting the integration of AI/ML and LLM capabilities.
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field.
- 10+ years of experience in data engineering, with a strong focus on building and managing large-scale data platforms.
- Expert-level proficiency with GCP data services (e.g., BigQuery, Cloud Storage, Dataflow, Pub/Sub, Cloud Functions).
- 7+ years of experience in designing, developing, and optimizing complex ETL processes.
- 3+ years of experience with BigQuery for data warehousing, modeling, and query optimization.
- Strong programming skills in Python, with significant experience in PySpark for data manipulation and processing on distributed platforms such as Dataproc.
- Solid understanding of data warehousing concepts, dimensional modeling, and SQL.
- Experience with version control systems (e.g., Git) and CI/CD practices.
- Experience working in a product-driven organization, contributing to data solutions that directly support product development and user needs.
Key Responsibilities:
- Design, develop, and maintain highly scalable and reliable ETL pipelines to ingest, transform, and load large datasets from various sources into our data ecosystem.
- Build and optimize data models in BigQuery for analytical and operational use cases, ensuring data quality, consistency, and accessibility.
- Leverage GCP services, including Dataproc, for distributed data processing using PySpark.
- Collaborate with data scientists, analysts, and product teams to understand data requirements and translate them into technical solutions.
- Implement data governance, security, and compliance best practices within the data platform.
- Monitor, troubleshoot, and optimize data pipeline performance and reliability.
- Mentor junior engineers and contribute to the continuous improvement of our data engineering practices and standards.
- Stay abreast of emerging technologies and industry trends, particularly in big data, cloud computing, AI/ML, and LLMs, and recommend their adoption where appropriate.