About The Job
- Build and maintain data pipelines using Python and Apache Spark.
- Orchestrate workflows using Airflow and Google Cloud Workflows.
- Develop, deploy, and manage containerized data services using Docker and Cloud Run.
- Design, optimize, and monitor datasets and queries in BigQuery.
- Ingest, transform, and integrate external data through REST APIs.
- Manage data lifecycle and storage using Google Cloud Storage (GCS).
- Implement data quality, monitoring, and observability best practices.
- Collaborate with cross-functional engineering, product, and data science teams.
Requirements
- 2–4+ years of experience as a Data Engineer or in a similar role.
- Strong proficiency in Python, SQL, and Spark/PySpark.
- Hands-on experience with Airflow and cloud-native orchestration (e.g., Cloud Workflows).
- Experience with Docker, containers, and deploying services on Cloud Run.
- Skilled with BigQuery, GCS, and general GCP data tooling.
- Experience working with REST APIs and building ingestion integrations.
- Solid understanding of data modeling, ETL/ELT pipelines, and distributed systems.
Bonus:
- Experience with healthcare data standards (FHIR, HL7) or regulated environments.
Benefits
- Competitive salary and performance-based incentives
- Flexible work arrangements
- Paid time off and public holidays
- Professional development opportunities (training, workshops, certifications)
- Supportive, collaborative, and mission-driven work culture