The Senior/Principal Data Engineer will design, develop, and maintain scalable data platform and analytics solutions across data lakes and operational databases. The role requires hands-on expertise in Azure Databricks, Azure SQL, Python/PySpark, and notebooks, along with a strong understanding of data modeling, ETL/ELT best practices, and CI/CD automation in Azure DevOps. The ideal candidate has a proven track record of building robust, efficient, and secure data pipelines that enable analytics, reporting, and AI/ML solutions, preferably in the life sciences, clinical research, or healthcare domains.
Requirements
Data Architecture & Engineering
- Design and implement end-to-end data pipelines using Azure Databricks, Azure Data Factory, and ADLS Gen2 (a minimal pipeline sketch follows this list).
- Build scalable, performant data models for data lakes (Medallion architecture), data warehouses, and operational systems.
- Develop ELT/ETL frameworks for ingestion from APIs, relational sources, flat files, and third-party systems (e.g., Dynamics 365, Veeva, EDC).
- Optimize data transformations, partitioning, and Delta Lake performance for analytics workloads.
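To give candidates a concrete feel for this kind of work, here is a minimal sketch of a bronze-to-silver promotion in a Medallion-style lakehouse. The storage paths, table names, and columns (visit_id, visit_date) are hypothetical placeholders, not details of any actual platform.

```python
# Minimal sketch of a bronze -> silver promotion in a Medallion lakehouse.
# All paths, table names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Read raw ingested records from the bronze layer (Delta format).
bronze = spark.read.format("delta").load(
    "abfss://lake@account.dfs.core.windows.net/bronze/visits"
)

# Cleanse and standardize: drop duplicates, normalize types, add load metadata.
silver = (
    bronze.dropDuplicates(["visit_id"])
          .withColumn("visit_date", F.to_date("visit_date"))
          .withColumn("_loaded_at", F.current_timestamp())
          .filter(F.col("visit_id").isNotNull())
)

# Write to the silver layer, partitioned for downstream analytics scans.
(silver.write.format("delta")
       .mode("overwrite")
       .partitionBy("visit_date")
       .save("abfss://lake@account.dfs.core.windows.net/silver/visits"))
```

On Databricks, a write like this is typically followed by OPTIMIZE (optionally with ZORDER BY on frequently filtered columns) to keep file sizes scan-friendly.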
Data Integration & Automation
- Leverage Python and PySpark for data ingestion, cleansing, enrichment, and advanced transformations.
- Implement CI/CD pipelines for data workflows using Azure DevOps and Git, including automated testing, deployment, and monitoring.
- Develop and integrate RESTful APIs for cross-system data exchange and automation (see the ingestion sketch after this list).
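The sketch below illustrates a typical paginated REST ingestion into a lake landing zone. The endpoint, auth header, pagination scheme, and landing file are invented for illustration only.

```python
# Minimal sketch of paginated REST ingestion to a lake landing zone.
# The endpoint, auth header, pagination scheme, and landing path are
# hypothetical placeholders.
import json
import requests

BASE_URL = "https://api.example.com/v1/studies"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}    # supplied via a secret store in practice

def fetch_all(url: str) -> list[dict]:
    """Follow page links until the API stops returning a 'next' cursor."""
    records, next_url = [], url
    while next_url:
        resp = requests.get(next_url, headers=HEADERS, timeout=30)
        resp.raise_for_status()            # fail fast on HTTP errors
        payload = resp.json()
        records.extend(payload["items"])
        next_url = payload.get("next")     # None ends the loop
    return records

if __name__ == "__main__":
    rows = fetch_all(BASE_URL)
    # Land raw JSON for the bronze layer; ADF or Databricks picks it up from here.
    with open("studies_raw.json", "w") as f:
        json.dump(rows, f)
```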
Analytics Enablement
- Collaborate with the BI team to ensure clean, high-quality, and accessible data for the Power BI platform (a simple quality-gate sketch follows this list).
- Support semantic modeling, metric-layer design, and data governance best practices.
- Enable advanced analytics by provisioning data for ML/AI initiatives and predictive insights.
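As one example of analytics enablement, the sketch below gates publication of a curated table on a basic data-quality check. The table names, join key (patient_id), and 1% threshold are hypothetical assumptions.

```python
# Minimal sketch of a data-quality gate before publishing a gold table
# to the BI layer; paths, columns, and the threshold are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("gold_quality_gate").getOrCreate()
silver = spark.read.format("delta").load("/mnt/lake/silver/visits")  # hypothetical path

total = silver.count()
null_keys = silver.filter(F.col("patient_id").isNull()).count()

# Block publication if more than 1% of rows are missing the join key.
if total == 0 or null_keys / total > 0.01:
    raise ValueError(f"Quality gate failed: {null_keys}/{total} rows missing patient_id")

silver.write.format("delta").mode("overwrite").saveAsTable("gold.visits")
```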
Cross-Functional Collaboration
- Collaborate with product/system owners, analysts, and business stakeholders to translate analytical requirements into technical data solutions.
- Drive best practices in Agile development, version control, and DevOps workflows.
Education and Qualifications
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field (Master's preferred).
- Minimum of 5-8 years of relevant experience building and maintaining data solutions (data lakes, data warehouses, operational databases).
- Expert-level proficiency in Azure Databricks, PySpark, SQL, and Azure DevOps.
- Proven experience with Azure Data Factory, ADLS Gen2, and Azure SQL.
- Working knowledge of CI/CD automation, version control (Git), and infrastructure as code (ARM templates or Terraform).
- Experience with Power BI or a similar analytics platform (Tableau, Looker) is required; experience with Snowflake, Redshift, or Synapse Analytics is a plus.
- Strong analytical, debugging, and performance-tuning skills.
- Experience in the life sciences or healthcare industry is a strong plus.
Skills
- Core expertise: Databricks, PySpark, SQL, and Azure DevOps at an expert level.
- Data engineering: data modeling, Delta Lake optimization, ETL/ELT design, distributed processing.
- Integration & automation: Azure Data Factory, REST APIs, CI/CD pipelines, Git branching strategies.
- Analytics & BI: Power BI (or Tableau), semantic-layer design, DAX/SQL tuning.
- Cloud & DevOps: Azure ecosystem (ADF, ADLS, Azure SQL, Synapse), infrastructure as code.
- Data governance & quality: metadata management, data validation frameworks, logging and monitoring.
- Soft skills: clear communication, mentoring, Agile teamwork, analytical thinking, collaboration.