About the Role
The Data Platform Engineer is responsible for architecting, deploying, and managing our enterprise Databricks platform on Azure — ensuring it is performant, secure, cost-efficient, and built to scale. This role sits at the core of our cloud data modernization initiative, working closely with data engineering, security, cloud infrastructure, and governance teams. The ideal candidate brings deep hands-on Databricks expertise, a strong infrastructure mindset, and the collaborative instincts to enable other engineering teams to build confidently on the platform.
Responsibilities
- Architect, deploy, and manage Databricks workspaces, clusters, cluster policies, and compute resources to ensure performance, cost efficiency, and security.
- Implement and maintain Delta Lake architecture, including optimized storage patterns, table design, and lifecycle management.
- Develop and enforce platform governance, including Unity Catalog configuration, access controls, secrets management, and compliance standards.
- Build automated provisioning workflows using Terraform, Bicep, or similar IaC tools to support repeatable, version-controlled infrastructure.
- Collaborate with data engineering teams to optimize PySpark, SQL, and ML workloads for performance and reliability.
- Integrate Databricks with enterprise systems such as Azure Data Lake Storage, Azure Data Factory, SQL Server, and identity providers.
- Monitor platform health, job performance, and cost usage; implement observability and alerting for proactive issue resolution.
- Support migration of legacy data pipelines and SQL Server workloads into Databricks-based architectures.
- Partner with security and cloud teams to ensure adherence to organizational standards and cloud best practices.
- Provide guidance, documentation, and enablement to engineering teams on Databricks features, patterns, and best practices.
Qualifications
- Strong hands-on experience with Databricks (clusters, jobs, repos, Delta Lake, Unity Catalog).
- Proficiency in PySpark, SQL, and distributed data processing concepts.
- Experience with Azure (or AWS/GCP equivalents) including storage, networking, identity, and compute services.
- Background in data engineering, ETL/ELT development, and pipeline optimization.
- Experience migrating data and workloads from Microsoft SQL Server or other relational systems.
- Familiarity with CI/CD workflows and Git-based development practices.
- Strong understanding of security, governance, and platform reliability principles.
Preferred Skills
- Experience with Microsoft Purview or similar data cataloging and lineage tools.
- Familiarity with MLflow, Feature Store, or machine learning lifecycle management within Databricks.
- Knowledge of Databricks Asset Bundles (DABs) or Databricks Terraform provider for advanced IaC workflows.
- Exposure to real-time or streaming data patterns using Structured Streaming or Apache Kafka.
- Experience with cost optimization frameworks for cloud data platforms, including spot instance strategies and autoscaling policies.
- Understanding of FinOps principles as applied to cloud data infrastructure.
- Background in real estate, mortgage, or proptech data environments.
This role is primarily remote; however, candidates are expected to relocate to Plano, TX, or Irvine, CA, in the future. Occasional travel to these locations may be required initially.