Role: Data Lakehouse Architect
Location: UAE (Dubai preferred) - On-site
Experience: 11-16 years
Notice period: Immediate to 15 days max.
Note: Candidates must be from the Retail or E-commerce domain only.
ROLE OVERVIEW
We are looking for a hands-on Lead Data Engineer with 11+ years of experience who has built and operated production-grade data lakehouse platforms.
This person will work directly under the Data Lakehouse Architect and own the end-to-end engineering execution across ingestion, transformation, orchestration, governance, and consumption layers. This is a technical delivery role requiring deep engineering skills and the ability to work across all layers of the lakehouse.
KEY RESPONSIBILITIES
- Build and operate end-to-end ingestion pipelines from 20+ heterogeneous source systems including Oracle Retail, WMS, TMS, Loyalty platforms, and third-party APIs into the Bronze layer
- Implement CDC pipelines for real-time and near-real-time data capture across relational and NoSQL sources
- Design and build Silver and Gold transformation layers including data cleansing, enrichment, SCD Type 1 and 2, and complex business rule application
- Develop and maintain orchestration workflows with automated retry, failure alerting, and SLA tracking
- Implement data quality checks, validation rules, and reprocessing/backfill capabilities
- Enforce security policies, PII masking, row-level and column-level access control as defined by the Architect
- Enable consumption layers for Tableau, direct SQL users, and downstream API integrations
- Support historical data migration from legacy cloud data warehouse into the lakehouse
- Maintain pipeline documentation, source-to-target mappings, and data dictionaries
- Build clean, well-structured data pipelines that support AI/ML feature engineering and model training workflows
REQUIRED SKILLS AND EXPERIENCE
- 11+ years in data engineering with hands-on experience across ingestion, transformation, orchestration, and governance layers in production lakehouse environments
- Proficient in Python and PySpark for pipeline development and custom connector building
- Hands-on experience with cloud data services (e.g., S3/Blob Storage, Glue, DMS, Kinesis/Event Hubs, EMR or equivalent)
- Strong CDC implementation experience using DMS, Debezium, or equivalent across Oracle, RDS, and NoSQL sources
- Experienced with Delta Lake/Apache Iceberg ACID transactions, merge/upsert operations, and partition management at scale
- Strong hands-on experience with dbt for SQL-based transformation layers
- Proficient with orchestration tools such as Apache Airflow (MWAA), Dagster, or Step Functions
- Experienced with data quality frameworks such as Great Expectations, Deequ, or dbt tests
- Hands-on with security implementation: IAM policies, PII masking, column-level and row-level access control
- Strong SQL skills and dimensional modeling (star schema, snowflake schema) for BI consumption layers
- Retail or e-commerce domain experience (Oracle Retail, Magento, Shopify, WMS, TMS) is a strong advantage
- Familiarity with AI/ML pipeline requirements including feature store design, data preparation for model training, and vector database integration
PREFERRED / NICE-TO-HAVE PLATFORMS
- AWS (Lake Formation, Glue, Redshift, EMR, SageMaker)
- Databricks
- Snowflake
- Microsoft Fabric
- Google BigQuery
- Dataiku
- Apache Iceberg / Hudi
EDUCATION
- Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field
- Cloud Data Analytics or Solutions Architect certification from a major cloud provider (AWS, Azure, or GCP) is a strong advantage
- Databricks Certified Data Engineer or Snowflake SnowPro certification is a plus
Interested candidates may share their resume at uma.jangra@glovetalent.com.