The Role: As a Lead Data Engineer specializing in Databricks & Master Data Management, you will be a key player in designing, developing, and optimizing our next-generation data platform. You will lead a team of data engineers, providing technical guidance and mentorship and ensuring the delivery of scalable, high-performance data solutions.
Key Responsibilities:
Technical Leadership:
- Lead the design, development, and implementation of scalable and reliable data pipelines using Databricks, Spark, and other relevant technologies.
- Define and enforce data engineering best practices, coding standards, and architectural patterns.
- Provide technical guidance and mentorship to junior and mid-level data engineers.
- Conduct code reviews and ensure the quality, performance, and maintainability of data solutions.
Databricks Expertise:
- Architect and implement data solutions on the Databricks platform, including Databricks Lakehouse, Delta Lake, and Unity Catalog.
- Optimize Spark workloads for performance and cost efficiency on Databricks.
- Develop and manage Databricks notebooks, jobs, and workflows.
- Proficiently use Databricks features such as Delta Live Tables (DLT), Photon, and SQL Analytics (see the sketch after this list).
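To make the Databricks pipeline expectations concrete, here is a minimal Delta Live Tables sketch; the table names, columns, and storage path are illustrative assumptions, not part of the role description.

```python
import dlt
from pyspark.sql.functions import col

# `spark` is provided automatically inside a Delta Live Tables pipeline.

@dlt.table(comment="Raw customer records ingested with Auto Loader (illustrative source).")
def customers_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/raw/customers/")  # hypothetical landing path
    )

@dlt.table(comment="Cleansed customer records with a basic quality expectation.")
@dlt.expect_or_drop("valid_customer_id", "customer_id IS NOT NULL")
def customers_clean():
    return dlt.read_stream("customers_raw").select(
        col("customer_id"), col("email"), col("updated_at")
    )
```

Records failing the expectation are dropped and surfaced in the pipeline's event log, which is one way DLT supports the data quality work described in the next section.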
Master Data Management:
- Lead the technical design and implementation of the MDM solution (either using a dedicated tool or a custom MDM built on Databricks) for critical domains (e.g., Customer, Product, Vendor).
- Define and implement data quality rules, entity resolution, matching, and survivorship logic to create and maintain "Golden Records" (see the sketch after this list).
- Partner with Data Governance and Data Stewardship teams to define and enforce organizational policies, standards, and data definitions for master data assets.
- Ensure seamless and timely provisioning of high-quality master data from the MDM/Lakehouse platform to downstream consuming systems (ERP, CRM, BI).
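As an illustration of the matching and survivorship logic mentioned above, the following simplified PySpark sketch builds golden records from a hypothetical silver.customers table, using a deterministic match key and a most-recently-updated survivorship rule. The table and column names are assumptions for the example; production entity resolution would add fuzzy matching, stewardship review, and attribute-level survivorship.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # already available in Databricks notebooks

# Hypothetical cleansed customer source; column names are illustrative.
src = spark.table("silver.customers")

# Deterministic match key: normalized email. Real entity resolution would also
# apply fuzzy/probabilistic rules on name, address, phone, etc.
matched = src.withColumn("match_key", F.lower(F.trim(F.col("email"))))

# Survivorship: the most recently updated record per match key wins.
w = Window.partitionBy("match_key").orderBy(F.col("updated_at").desc())
golden = (
    matched.withColumn("rn", F.row_number().over(w))
    .filter("rn = 1")
    .drop("rn", "match_key")
)

# Publish the golden records for downstream consumers (ERP, CRM, BI).
golden.write.format("delta").mode("overwrite").saveAsTable("gold.customer_golden")
```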
Pipeline Development & Operations:
- Develop, test, and deploy robust ETL/ELT pipelines for data ingestion, transformation, and loading from various sources (e.g., relational databases, APIs, streaming data).
- Implement monitoring, alerting, and logging for data pipelines to ensure operational excellence (see the sketch after this list).
- Troubleshoot and resolve complex data-related issues.
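One lightweight way to approach pipeline monitoring on Delta tables is a freshness check against the table history; the sketch below assumes the illustrative gold.customer_golden table from the previous example, a 6-hour SLA, and a UTC session time zone (the Databricks default).

```python
from datetime import datetime, timedelta, timezone
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available in Databricks notebooks

SLA = timedelta(hours=6)  # illustrative freshness SLA

# Latest commit timestamp from the Delta transaction history.
last_write = (
    spark.sql("DESCRIBE HISTORY gold.customer_golden LIMIT 1")
    .select("timestamp")
    .first()["timestamp"]
)

# Compare as naive UTC; assumes the Spark session time zone is UTC.
now = datetime.now(timezone.utc).replace(tzinfo=None)
if now - last_write > SLA:
    # Raising marks the Databricks job run as failed, which can feed job-level alerts.
    raise RuntimeError(f"gold.customer_golden is stale; last write at {last_write}")
```

In practice a check like this would sit alongside dedicated tooling such as DLT event logs or Lakehouse Monitoring.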
Collaboration & Communication:
- Work closely with cross-functional teams, including product managers, data scientists, and software engineers.
- Communicate complex technical concepts clearly to both technical and non-technical stakeholders.
- Stay current with industry trends and emerging technologies in data engineering and Databricks.
Primary Skills:
- Extensive hands-on experience with the Databricks platform, including the Databricks Workspace, Spark on Databricks, Delta Lake, and Unity Catalog.
- Strong proficiency in optimizing Spark jobs and a solid understanding of Spark architecture.
- Experience with Databricks features such as Delta Live Tables (DLT), Photon, and Databricks SQL Analytics.
- Deep understanding of data warehousing concepts, dimensional modeling, and data lake architectures.
- Familiarity with data governance and cataloging tools (e.g., Purview, Profisee).