Job Summary:
We are seeking a highly skilled Data Engineer with expertise in leveraging Data Lake architecture and the Azure cloud platform to develop, deploy, and optimize data-driven solutions. You will play a pivotal role in transforming raw data into actionable insights, supporting strategic decision-making across the organization.
Key Responsibilities
- Develop and optimize scalable data pipelines using Python and PySpark
- Build and orchestrate data workflows with Azure Data Factory
- Design and implement solutions using Azure Databricks and Synapse Analytics
- Manage and maintain data storage solutions in Azure Data Lake Storage Gen2, leveraging cost-efficient architecture
- Implement and manage CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, or Jenkins, with supporting workflow automation in Azure Logic Apps
- Model and maintain Medallion Architecture across Bronze, Silver, and Gold layers based on business requirements
- Collaborate with data scientists, analysts, and business stakeholders to ensure reliable data availability
- Monitor and optimize performance and cost efficiency of data solutions across the ecosystem
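As a flavor of the transformation work these responsibilities involve, the following is a minimal sketch of a Bronze-to-Silver cleansing step. It uses plain Python rather than PySpark so it runs anywhere; the record fields and cleansing rules are purely illustrative, not a prescribed implementation:

```python
# Hypothetical raw ("Bronze") records as they might land from ingestion.
bronze = [
    {"order_id": "1001", "amount": "25.00", "country": "us"},
    {"order_id": "1001", "amount": "25.00", "country": "us"},  # duplicate
    {"order_id": "1002", "amount": None, "country": "DE"},     # missing amount
    {"order_id": "1003", "amount": "40.50", "country": "de"},
]

def to_silver(records):
    """Cleanse raw records: drop rows with missing amounts,
    cast amounts to float, normalize country codes,
    and deduplicate on order_id."""
    seen = set()
    silver = []
    for r in records:
        if r["amount"] is None:
            continue  # skip incomplete rows
        if r["order_id"] in seen:
            continue  # deduplicate on business key
        seen.add(r["order_id"])
        silver.append({
            "order_id": r["order_id"],
            "amount": float(r["amount"]),
            "country": r["country"].upper(),
        })
    return silver

print(to_silver(bronze))
```

In PySpark the same logic would typically be expressed with DataFrame operations such as `dropna`, `dropDuplicates`, and `withColumn`, scheduled and orchestrated via Azure Data Factory.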
Required Skills & Experience
- Strong proficiency in Python and PySpark for data manipulation and transformation
- Hands-on experience with Azure Data Factory, Azure Databricks, and Synapse Analytics
- Familiarity with Azure Logic Apps for workflow automation supporting CI/CD processes
- Knowledge of CI/CD tools such as:
  - Azure DevOps Pipelines
  - GitHub Actions
  - Jenkins
- Expertise in managing Azure Data Lake Storage Gen2 environments with emphasis on security and cost optimization
- Deep understanding of Medallion Architecture principles:
  - Bronze Layer: raw data ingestion
  - Silver Layer: cleansed, enriched, and business-ready data
  - Gold Layer: aggregated and analytics-ready data models
- Strong problem-solving and communication skills
- Experience with cloud cost optimization
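To make the Medallion layering above concrete, here is a small sketch of deriving a Gold-layer aggregate from Silver-layer records. Plain Python is used so the example is self-contained; in practice this would be a PySpark `groupBy`/`agg` over tables in the lakehouse, and all field names here are hypothetical:

```python
from collections import defaultdict

# Hypothetical Silver-layer records: cleansed, typed, business-ready.
silver = [
    {"order_id": "1001", "amount": 25.0, "country": "US"},
    {"order_id": "1003", "amount": 40.5, "country": "DE"},
    {"order_id": "1004", "amount": 9.5,  "country": "DE"},
]

def to_gold(records):
    """Aggregate Silver rows into an analytics-ready Gold model:
    total revenue and order count per country."""
    totals = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
    for r in records:
        agg = totals[r["country"]]
        agg["revenue"] += r["amount"]
        agg["orders"] += 1
    return dict(totals)

print(to_gold(silver))
```

The Gold layer deliberately serves a specific analytical question (here, revenue by country), while the Silver layer stays general-purpose so many such Gold models can be derived from it.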