Pyx Health is looking for a motivated and technically solid Data Engineer to join our growing Data & Analytics team. In this role, you will contribute to building and maintaining our data infrastructure on Azure, working alongside senior engineers to develop reliable data pipelines that support our healthcare solutions. You will work with Databricks, Airflow, Python, and SQL to implement and improve data workflows in a collaborative, high-growth environment. Only candidates residing in the USA may apply.
KEY RESPONSIBILITIES:
- Build and maintain data pipelines for ingesting, transforming, and cleaning healthcare data in the Azure cloud using Databricks, PySpark, and Delta Lake
- Implement pipeline logic from defined specifications, with guidance from senior engineers on architectural decisions
- Monitor pipelines for failures and performance issues, escalating complex problems appropriately
- Develop and maintain Airflow DAGs with appropriate error handling and retry logic
- Support deployment and configuration of pipelines via Astronomer on Azure using the Astro CLI
- Contribute to improving pipeline reliability and reducing manual intervention
- Implement data models for efficient storage and retrieval using Delta Lake, including merge/upsert patterns
- Apply Change Data Capture (CDC) patterns under the direction of senior team members
- Write and optimize T-SQL for SQL Server operations, stored procedures, and ETL support
- Work within the Azure ecosystem including ADLS Gen2, Key Vaults, Logic Apps, and Azure DevOps
- Follow established security and scalability standards when building data infrastructure
- Support infrastructure tasks with guidance from senior engineers on architecture
- Implement data quality checks and validation logic within pipelines
- Troubleshoot and resolve pipeline failures, documenting root causes and resolutions
- Contribute to monitoring and alerting improvements
- Work closely with data scientists, analysts, and business stakeholders to understand data needs
- Participate in code reviews, incorporating feedback to improve code quality
- Document pipelines, processes, and implementation decisions clearly and consistently
- Stay current with data engineering technologies, particularly in the healthcare space
QUALIFICATIONS:
- 2–4 years of experience as a Data Engineer or in a closely related data role
- Hands-on experience with Azure cloud services (ADLS, Databricks, or similar)
- Working knowledge of SQL Server, including T-SQL scripting and data modeling
- Ability to contribute to technical projects with moderate oversight
- Strong communication skills: comfortable asking questions and providing updates to cross-functional partners
- Familiarity with CI/CD concepts, Git, and version control workflows
- Solid problem-solving skills with a systematic approach to debugging
- Familiarity with healthcare data standards and regulations (HIPAA, HL7, etc.) is a plus
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent work experience)
MUST HAVES:
- Databricks SQL (Mid): Delta Lake fundamentals, basic merge/upsert patterns, familiarity with CDC concepts.
- Python (Mid): Pipeline logic, data transformation, SQL scripting; some PySpark/Spark DataFrame experience.
- Databricks Spark Notebooks (Mid): Notebook-based development, basic cluster usage, Delta table operations.
- Airflow Python Development (Mid): Ability to write and maintain DAGs; understanding of error handling and retry patterns.
- Airflow Astro Configuration (Foundational): Familiarity with Astronomer or willingness to learn; basic Astro CLI usage.
- Azure Ecosystem (Foundational): Working knowledge of ADLS Gen2, Key Vaults, and Azure DevOps.
- T-SQL (Foundational–Mid): Basic stored procedures, SQL Server querying, and data manipulation.
NICE TO HAVES:
- Azure Data Factory (Foundational): Basic familiarity with pipeline authoring and triggers.
- Git + Azure DevOps CI/CD (Foundational): Branching, pull requests, and basic deployment workflows.