Job Description:
Responsibilities:
Develop & Optimize Data Pipelines:
- Build, test, and maintain ETL/ELT data pipelines using Azure Databricks & Apache Spark (PySpark); see the sketch below.
- Optimize the performance and cost-efficiency of Spark jobs.
- Ensure data quality through validation, monitoring, and alerting mechanisms.
- Understand cluster types, configuration, and the use cases for serverless compute.
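
As a rough illustration of the pipeline and data-quality work described above, the following PySpark sketch reads raw files, applies basic transformations, gates on a simple validation rule, and appends to a Delta table. The paths, table name, and 5% rejection threshold are illustrative assumptions, not part of the role.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Extract: raw JSON landed in the lake (hypothetical path)
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/orders/")

# Transform: normalize types and drop rows missing the business key
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("order_id").isNotNull())
)

# Validation gate: fail the run (so job alerting fires) if too many rows were rejected
raw_count, clean_count = raw.count(), orders.count()
if raw_count and (raw_count - clean_count) > 0.05 * raw_count:
    raise ValueError(f"Data quality check failed: {raw_count - clean_count} rows rejected")

# Load: append to a governed Delta table (hypothetical catalog.schema.table name)
orders.write.format("delta").mode("append").saveAsTable("main.sales.orders")
```
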
Implement Unity Catalog for Data Governance:
- Design and enforce access control policies using Unity Catalog; see the sketch below.
- Manage data lineage, auditing, and metadata governance.
- Enable secure data sharing across teams and external stakeholders.
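
A minimal sketch of the access-control side of this work: issuing Unity Catalog GRANT/REVOKE statements from a notebook. The catalog, schema, table, and group names are placeholders, not an actual governance design.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder three-level namespace (catalog.schema.table) and account groups
statements = [
    "CREATE SCHEMA IF NOT EXISTS main.sales",
    "GRANT USE SCHEMA ON SCHEMA main.sales TO `data-analysts`",
    "GRANT SELECT ON TABLE main.sales.orders TO `data-analysts`",
    "REVOKE SELECT ON TABLE main.sales.orders FROM `contractors`",
]
for stmt in statements:
    spark.sql(stmt)  # Unity Catalog records the grants and audits access centrally
```
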
Integrate with Cloud Data Platforms:
- Work with Azure Data Lake Storage, Azure Blob Storage, and Azure Event Hubs to integrate Databricks with cloud-based data lakes, data warehouses, and event streams.
- Implement Delta Lake for scalable, ACID-compliant storage; see the upsert sketch below.
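
As one example of ACID-compliant storage in the lake, this sketch performs a Delta Lake MERGE (upsert) between two ADLS-backed Delta tables. The storage account, container names, and join key are assumptions.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incremental changes staged in ADLS (hypothetical abfss:// paths throughout)
updates = spark.read.format("delta").load(
    "abfss://staging@examplelake.dfs.core.windows.net/customers_updates"
)
target = DeltaTable.forPath(
    spark, "abfss://curated@examplelake.dfs.core.windows.net/customers"
)

# ACID upsert: update matching customers, insert the rest in a single transaction
(
    target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```
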
Automate & Orchestrate Workflows:
- Develop CI/CD pipelines for data workflows using Azure Databricks Workflows or Azure Data Factory; see the sketch below.
- Monitor and troubleshoot failures in job execution and cluster performance.
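
One way a CI/CD step might trigger and monitor such a workflow, sketched with the databricks-sdk Python package; the job ID is a placeholder, and authentication is assumed to come from environment variables or a configuration profile.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import RunResultState

w = WorkspaceClient()  # reads workspace host and token from the environment

# Trigger an existing Databricks Workflows job (placeholder ID) and wait for it
run = w.jobs.run_now(job_id=123456789).result()

# Fail the pipeline step loudly if the run did not succeed
if run.state.result_state != RunResultState.SUCCESS:
    raise RuntimeError(f"Job run failed: {run.state.state_message}")
print("Job run succeeded")
```
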
Collaborate with Stakeholders:
- Work with data analysts, data scientists, and business teams to understand requirements.
- Translate business needs into scalable data engineering solutions.

API Expertise:
- Ability to pull data from a wide variety of APIs using different strategies and methods; see the sketch below.
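
A generic sketch of one such strategy: page-numbered REST extraction with automatic retries via the requests library. The endpoint, auth scheme, and paging parameters are assumptions; real sources differ (cursor tokens, OData, bulk export jobs, and so on).

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient failures (throttling, gateway errors) with exponential backoff
session = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))

def fetch_all(base_url: str, token: str) -> list[dict]:
    """Walk a hypothetical page-numbered endpoint until it returns an empty batch."""
    records, page = [], 1
    while True:
        resp = session.get(
            f"{base_url}/records",
            headers={"Authorization": f"Bearer {token}"},
            params={"page": page, "per_page": 100},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records
```
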
Required Skills & Experience:
- Azure Databricks & Apache Spark (PySpark) – Strong experience building distributed data pipelines.
- Python – Proficiency in writing optimized and maintainable Python code for data engineering.
- Unity Catalog – Hands-on experience implementing data governance, access controls, and lineage tracking.
- SQL – Strong knowledge of SQL for data transformations and optimizations.
- Delta Lake – Understanding of time travel, schema evolution, and performance tuning (see the sketch after this list).
- Workflow Orchestration – Experience with Azure Databricks Jobs or Azure Data Factory.
- CI/CD & Infrastructure as Code (IaC) – Familiarity with the Databricks CLI, Databricks Asset Bundles (DABs), and DevOps principles.
- Security & Compliance – Knowledge of IAM, role-based access control (RBAC), and encryption.
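
To make the Delta Lake expectations concrete, here is a small sketch of time travel and schema evolution; the table name and sample data are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Time travel: query earlier states of a Delta table (placeholder name)
v0 = spark.sql("SELECT * FROM main.sales.orders VERSION AS OF 0")
as_of = spark.sql("SELECT * FROM main.sales.orders TIMESTAMP AS OF '2024-01-01'")

# Schema evolution: append rows carrying a new column and let Delta add it
new_rows = spark.createDataFrame(
    [(1001, "web", "spring_promo")],
    "order_id INT, channel STRING, campaign STRING",  # campaign is the new column
)
(new_rows.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("main.sales.orders"))
```
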
Preferred Qualifications:
- Experience with MLflow for model tracking and deployment in Databricks.
- Familiarity with streaming technologies (Kafka, Delta Live Tables, Azure Event Hubs, Azure Event Grid).
- Hands-on experience with dbt (data build tool) for modular ETL development.
- Databricks or Azure certifications are a plus.
- Experience with Azure Databricks Lakehouse connectors for Salesforce and SQL Server.
- Experience with Azure Synapse Link for Dynamics / Dataverse.
- Familiarity with other data pipeline approaches, such as Azure Functions, Microsoft Fabric, and Azure Data Factory.
Soft Skills:
- Strong problem-solving and debugging skills.
- Ability to work independently and in teams.
- Excellent communication and documentation skills.