Skills:
Big Data, PySpark, Python, Hadoop/HDFS, Spark
Good to have:
GCP or any cloud platform
Roles/Responsibilities:
- Develops and maintains scalable data pipelines to support continuing increases in data volume and complexity.
- Collaborates with analytics and business teams to improve the data models that feed business intelligence tools, increasing data accessibility and fostering data-driven decision making across the organization.
- Implements processes and systems to monitor data quality, ensuring production data is always accurate and available to the key stakeholders and business processes that depend on it.
- Writes unit/integration tests, contributes to the engineering wiki, and documents work.
- Performs the data analysis required to troubleshoot and resolve data-related issues.
- Works closely with a team of frontend and backend engineers, product managers, and analysts.
- Defines company data assets (data models) and the Spark, Spark SQL, and Hive SQL jobs that populate them.
- Designs data integrations and the data quality framework.