Senior data engineer to design and operate production ETL/ELT pipelines on enterprise cloud platforms (GCP and Azure) for healthcare data. The role owns pipelines end to end, from requirements through deployment and ongoing operations, and is expected to set technical direction when requirements are unclear rather than wait for it.
Core Responsibilities:
- Design, build, and operate scalable ETL/ELT pipelines across GCP, Azure, BigQuery, and SQL Server.
- Model data for analytical workloads: dimensional modeling, SCDs, and schema design.
- Orchestrate pipelines using Airflow, Cloud Composer, Azure Data Factory, or similar.
- Handle PHI in line with HIPAA requirements: secure movement, de-identification, access controls, and audit.
- Deploy via Git and CI/CD; monitor and maintain pipelines in production.
- Translate stakeholder needs into technical plans; communicate feasibility and tradeoffs early.
Technical Requirements:
- 7+ years in data engineering, with significant time at senior level delivering production systems.
- Strong SQL plus Java or Python for pipeline development.
- Hands-on across multiple cloud and data platforms (e.g., GCP, Azure, BigQuery, SQL Server).
- Deep experience designing and operating reliable, scalable ETL/ELT pipelines, with attention to performance.
- Hands-on with at least one orchestration tool (Airflow, Cloud Composer, ADF, Dagster, or Prefect).
- Strong data modeling skills: dimensional modeling, normalization, and slowly changing dimensions (SCDs).
- Working knowledge of HIPAA and PHI handling: secure movement, de-identification, access controls, and audit.
- Track record delivering on enterprise cloud platforms within standards and controls.
- Proficient with Git and CI/CD.
Other Requirements:
- End-to-end ownership: Drives delivery from definition through production with minimal direction; prioritizes operational delivery, not just prototypes.
- Business translation: Converts stakeholder needs into clear technical plans; communicates feasibility and value early.
- Proactive exploration: Evaluates new tools, runs lightweight proofs of concept, and shares findings.
- Industry awareness: Tracks modern data trends (semantic layers, LLM-assisted development, modern ELT) and brings relevant insights back.
- Continuous learning: Adapts quickly while maintaining consistent delivery.