Job Description
Cloud-Native Data Engineering on AWS
- Strong, hands-on expertise in AWS-native data services: S3, Glue (Schema Registry, Data Catalog), Step Functions, Lambda, Lake Formation, Athena, MSK/Kinesis, EMR (Spark), and SageMaker (including Feature Store).
- Comfort designing and optimizing pipelines for both batch (Step Functions) and streaming (Kinesis/MSK) ingestion.
Data Mesh & Distributed Architectures
- Deep understanding of data mesh principles, including domain-oriented ownership, treating data as a product, and federated governance models.
- Experience enabling self-service platforms, decentralized ingestion, and transformation workflows.
Data Contracts & Schema Management
- Advanced knowledge of schema enforcement, evolution, and validation (preferably AWS Glue Schema Registry, JSON/Avro).
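The contract-validation idea behind this requirement can be sketched in a few lines. This is an illustrative stand-in, not the Glue Schema Registry API: the `ORDER_SCHEMA` contract and its field names are invented for the example, and in production the check would sit behind the registry's compatibility rules (e.g. BACKWARD compatibility for Avro) rather than application code.

```python
# Hypothetical data contract: required fields and their expected types.
ORDER_SCHEMA = {
    "order_id": str,
    "amount": float,
    "currency": str,
}

def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of contract violations (empty list = valid record)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

good = {"order_id": "A-1", "amount": 9.99, "currency": "EUR"}
bad = {"order_id": "A-2", "amount": "9.99"}  # wrong type, missing currency

print(validate(good, ORDER_SCHEMA))  # []
print(validate(bad, ORDER_SCHEMA))
```

Rejecting (or quarantining) non-conforming records at the ingestion boundary is what keeps schema evolution safe for downstream consumers.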
Data Transformation & Modelling
- Proficiency with the modern ELT/ETL stack: Spark (EMR), dbt, AWS Glue, and Python (pandas).
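A minimal sketch of the transform step this bullet refers to, using plain dicts so it runs anywhere; in practice the same normalization would be expressed as a pandas pipeline, a Spark job, or a dbt model. The raw column names are made up for illustration.

```python
# Raw landed events (as extracted): untrimmed strings, amounts as text.
raw_events = [
    {"user": "  Alice ", "amount": "12.50", "ts": "2024-01-03"},
    {"user": "bob",      "amount": "7.00",  "ts": "2024-01-03"},
]

def transform(rows: list[dict]) -> list[dict]:
    """Normalize strings and cast types into the modelled shape."""
    return [
        {
            "user": row["user"].strip().lower(),
            "amount": float(row["amount"]),
            "event_date": row["ts"],
        }
        for row in rows
    ]

for row in transform(raw_events):
    print(row)
```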
AI/ML Data Enablement
- Designing and supporting vector stores (OpenSearch), feature stores (SageMaker Feature Store), and their integration with MLOps/data pipelines for AI/semantic search and RAG-style workloads.
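The scoring at the heart of vector/semantic search can be shown with a toy example: rank stored document vectors by cosine similarity to a query vector. Real systems (e.g. OpenSearch k-NN) index high-dimensional embeddings; the 3-dimensional vectors and document names below are placeholders.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for two documents and one query.
docs = {
    "invoice_guide": [0.9, 0.1, 0.0],
    "hiking_blog":   [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # the semantically closer document comes first
```

A RAG pipeline uses exactly this retrieval step to pick the passages fed into the model's context.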
Metadata, Catalog, and Lineage
- Familiarity with central cataloging, lineage solutions, and data discovery (Glue Data Catalog, Collibra, Atlan, Amundsen, etc.).
- Implementing end-to-end lineage, auditability, and governance processes.
Security, Compliance, and Data Governance
- Design and implementation of data security: row/column-level security (Lake Formation), KMS encryption, and role-based access using AuthN/AuthZ standards (JWT/OIDC), with GDPR/SOC 2/ISO 27001-aligned policies.
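Column-level security can be pictured as a per-role column allowlist. This is a deliberately simplified sketch: the role names and policy table are invented, and on AWS this enforcement would live in Lake Formation permissions (evaluated by Athena/Redshift/EMR at query time), not in application code.

```python
# Hypothetical per-role column allowlist.
COLUMN_POLICY = {
    "analyst": {"customer_id", "country"},
    "admin":   {"customer_id", "country", "email"},
}

def apply_column_security(row: dict, role: str) -> dict:
    """Mask every column the caller's role is not permitted to read."""
    allowed = COLUMN_POLICY.get(role, set())
    return {col: (val if col in allowed else "***MASKED***")
            for col, val in row.items()}

row = {"customer_id": 42, "country": "DE", "email": "a@example.com"}
print(apply_column_security(row, "analyst"))  # email is masked
print(apply_column_security(row, "admin"))    # full row
```

An unknown role falls through to an empty allowlist, so the default is deny; that fail-closed posture is the property GDPR/SOC 2-aligned access policies ask for.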
Orchestration & Observability
- Experience with pipeline orchestration (AWS Step Functions, Apache Airflow/MWAA) and monitoring (CloudWatch, X-Ray) in large-scale environments.
APIs & Integration
- API design for both batch and real-time data delivery (REST and GraphQL endpoints for AI/reporting/BI consumption).
Job Responsibilities
- Design, build, and maintain ETL/ELT pipelines to extract, transform, and load data from various sources into cloud-based data platforms.
- Develop and manage data architectures, data lakes, and data warehouses on AWS (e.g., S3, Redshift, Glue, Athena).
- Collaborate with data scientists, analysts, and business stakeholders to ensure data accessibility, quality, and security.
- Optimize the performance of large-scale data systems and implement monitoring, logging, and alerting for pipelines.
- Work with both structured and unstructured data, ensuring reliability and scalability.
- Implement data governance, security, and compliance standards.
- Continuously improve data workflows by leveraging automation, CI/CD, and Infrastructure-as-Code (IaC).