Sr. Data Engineer
THE POSITION
You’ll be a key member of VaxCare’s Product Group, joining our Data Engineering team and reporting to our Data Engineering Lead. We are seeking a highly skilled and experienced Senior Data Engineer who will play a critical role in the design, development, and management of our data processing and analytics infrastructure. The ideal candidate has extensive hands-on experience with Spark and Databricks, as well as a strong background in data engineering principles and best practices.
RESPONSIBILITIES
- Design and implement Delta Lake-based data pipelines using Databricks Workflows, Delta Live Tables (DLT), and Unity Catalog for enterprise data governance
- Build ELT/ETL pipelines using the medallion architecture (bronze/silver/gold layers), supporting both batch and streaming workloads with Auto Loader and Structured Streaming (see the sketch after this list)
- Architect lakehouse solutions leveraging Delta Lake ACID transactions, Z-ordering, liquid clustering, and partitioning strategies
- Implement CI/CD pipelines for data workflows using Git integration and Databricks Asset Bundles
- Design data quality frameworks using Delta Live Tables expectations and custom PySpark validation, with automated alerting and SLA monitoring (also shown in the sketch after this list)
- Create materialized views and incremental refresh strategies for optimized query performance
- Collaborate with data scientists, ML engineers, and analysts to implement feature engineering pipelines and support MLOps workflows
- Mentor junior engineers, conduct code reviews, and lead technical design discussions
- Implement data observability and monitoring using Databricks SQL, Lakeview dashboards, and custom alerting frameworks
- Drive cost optimization initiatives leveraging the Photon engine, serverless compute, and FinOps best practices
- Troubleshoot and resolve complex issues related to distributed computing, data skew, and performance bottlenecks
- Create comprehensive technical documentation, including data contracts, runbooks, and data catalog metadata in Unity Catalog
- Champion DataOps best practices, including testing strategies, performance tuning, and data platform engineering principles
- Stay current with lakehouse architecture trends and emerging technologies to continuously improve our data infrastructure
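To make the first few responsibilities concrete, here is a minimal sketch of a Delta Live Tables pipeline that ingests files incrementally with Auto Loader (Structured Streaming under the hood) and enforces data quality with DLT expectations. The table names, landing path, and columns are hypothetical placeholders chosen for illustration, not part of our actual platform.

```python
# Minimal sketch only; table names, paths, and columns are hypothetical.
import dlt
from pyspark.sql import functions as F

# Note: `spark` is provided by the Databricks runtime inside a DLT pipeline.

@dlt.table(comment="Bronze: raw event files ingested incrementally")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")  # Auto Loader
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/pipelines/schemas/bronze_events")
        .load("/landing/events/")  # hypothetical landing path
    )

@dlt.table(comment="Silver: validated events with an audit column")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")   # drop bad rows
@dlt.expect("plausible_timestamp", "event_ts >= '2020-01-01'")  # warn only
def silver_events():
    return (
        dlt.read_stream("bronze_events")
        .withColumn("ingested_at", F.current_timestamp())
    )
```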
EXPERIENCE AND QUALIFICATIONS
Education:
- Bachelor's degree in Computer Science, Data Engineering, Engineering, or a related technical field, OR equivalent practical experience
- Master's degree or relevant industry certifications (Databricks Certified Data Engineer Professional, AWS/Azure data certifications) preferred
Experience:
- 7+ years of data engineering experience, including 3+ years of hands-on production experience building data pipelines on Databricks and Apache Spark
- Proven track record of designing and implementing lakehouse architectures at scale
Technical Skills:
Programming & Languages:
- Expert-level proficiency in Python (PySpark, pandas, NumPy) and SQL (complex queries, window functions, CTEs, query optimization)
- Experience with Spark SQL, Delta Lake SQL, and Databricks SQL
Apache Spark Expertise:
- Deep expertise in Apache Spark, including:
  - Performance optimization (partition tuning, broadcast joins, data skew handling, caching strategies)
  - Delta Lake features (ACID transactions, time travel, MERGE operations, CDC, liquid clustering); see the sketch after this list
  - Understanding of Spark internals (DAG execution, Catalyst optimizer, Tungsten execution engine)
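As an illustration of two of the skills listed above (not a prescribed implementation), the sketch below shows a broadcast join hint for a small dimension table and a Delta Lake MERGE upsert via the delta-spark Python API. All table and column names are invented for the example.

```python
# Illustrative only: broadcast a small dimension to avoid shuffling the large
# fact table, then upsert the result into a Delta target with MERGE.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

# Broadcast join: ship the small lookup table to every executor.
facts = spark.table("silver.events")
dims = spark.table("silver.clinics")
enriched = facts.join(F.broadcast(dims), "clinic_id", "left")

# MERGE upsert: apply the latest batch of changes to the target table.
target = DeltaTable.forName(spark, "gold.events")
(
    target.alias("t")
    .merge(enriched.alias("s"), "t.event_id = s.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```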
Databricks Platform:
- Production experience with Databricks, including:
  - Delta Live Tables (DLT) for declarative pipeline development
  - Unity Catalog for data governance, access control, and lineage tracking
  - Databricks Workflows and orchestration
  - Cluster optimization and cost management (spot instances, autoscaling, serverless compute)
  - Databricks Asset Bundles for CI/CD
  - Databricks SQL and Lakeview dashboards
Data Architecture & Modeling:
- Strong understanding of data modeling techniques:
  - Dimensional modeling (star schema, fact/dimension tables)
  - Medallion architecture (bronze/silver/gold layers)
  - Slowly Changing Dimension (SCD) implementations (see the sketch after this list)
- Expert-level SQL skills, including query optimization, execution plan analysis, and performance tuning for billion-row datasets
- Experience with modern lakehouse patterns and an understanding of lakehouse vs. traditional data warehouse trade-offs
- Familiarity with legacy systems (Oracle, SQL Server, DB2) and migration strategies to cloud platforms
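The sketch below, referenced in the SCD bullet above, outlines one common way to implement a Type 2 Slowly Changing Dimension on Delta Lake: expire the current row when a tracked attribute changes, then append the new version. Names are hypothetical, and, for brevity, the example assumes upstream has already filtered the incoming batch to genuinely changed rows.

```python
# Illustrative SCD Type 2 sketch (hypothetical names): expire the current
# dimension row on change, then append the new version.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

updates = spark.table("staging.clinic_updates")      # incoming changed rows
dim = DeltaTable.forName(spark, "gold.dim_clinic")   # SCD2 dimension table

# Step 1: close out current rows whose tracked attribute changed.
(
    dim.alias("d")
    .merge(updates.alias("u"),
           "d.clinic_id = u.clinic_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.address <> u.address",
        set={"is_current": "false", "end_date": "current_date()"},
    )
    .execute()
)

# Step 2: append the new current versions of the changed rows.
(
    updates
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .write.format("delta").mode("append").saveAsTable("gold.dim_clinic")
)
```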
DevOps & DataOps:
- Strong DevOps/DataOps experience:
  - Git workflows (branching strategies, pull requests, code reviews)
  - CI/CD pipelines for data workflows (GitHub Actions, Azure DevOps, Jenkins)
  - Testing strategies (unit tests, integration tests, data quality tests)
  - Monitoring and observability (logging, alerting, SLA tracking)
Leadership & Soft Skills:
- Proven ability to mentor junior engineers and conduct technical code reviews
- Experience leading technical design discussions
- Strong stakeholder management skills, with the ability to translate technical concepts for non-technical audiences
- Systematic approach to debugging complex distributed systems and troubleshooting performance
- Excellent problem-solving abilities, with a focus on pragmatic trade-offs between speed, cost, and quality
- Strong communication and collaboration skills in cross-functional team environments