Qureos

Lead/Senior ML Data Engineer (Cloud-Native, Healthcare AI)

Role: Lead/Senior ML Data Engineer
Experience: 8+ years
Work Mode: Remote

We are seeking a highly autonomous and experienced Lead/Senior ML Data Engineer to build and own the data foundation for our AI analytics and Generative AI platforms. This specialized role is a hybrid of data engineering and ML data preparation: you will design, build, and optimize scalable data pipelines (ETL/ELT) that transform complex, messy clinical and healthcare data into high-quality, production-ready feature stores for Machine Learning and NLP models.

The successful candidate will own technical work streams end-to-end, ensuring data quality, governance, and low-latency delivery in a cloud-native environment.

Key Responsibilities & Focus Areas:

  • ML Data Pipeline Ownership (70-80% Focus): Design and implement high-performance, scalable ETL/ELT pipelines using PySpark and a Lakehouse architecture (such as Databricks) to ingest, clean, and transform large-scale healthcare datasets.
  • AI Data Preparation: Specialize in Feature Engineering and data preparation for complex ML workloads, including transforming unstructured clinical data (e.g., medical notes) for Generative AI and NLP model training.
  • Cloud Architecture & Orchestration: Deploy, manage, and optimize data workflows using Airflow in a production AWS environment.
  • Data Governance & Compliance: Build every pipeline with robust data masking, pseudonymization, and security controls to maintain continuous compliance with HIPAA and other relevant health data privacy regulations.
  • Technical Leadership: Translate ambiguous business problems into clear technical requirements, and act as a key contributor to the data architecture strategy for the core AI platform.

Non-Negotiable Requirements (The "Must-Haves"):

  • 5+ years of relevant, progressive experience as a Data Engineer, with a clear focus on ML/AI support.
  • Deep expertise in PySpark/Python for distributed data processing.
  • Proficiency with Lakehouse platforms (e.g., Databricks) in an AWS production environment.
  • Proven experience handling complex clinical/healthcare data (EHR, Claims), including unstructured text.
  • Hands-on experience with HIPAA/GDPR compliance in data pipeline design.

Job Type: Full-time

Pay: ₹2,000,000.00 - ₹3,000,000.00 per year

Work Location: Remote

© 2025 Qureos. All rights reserved.