Job Description
Role Purpose:
The Data Modernization Lead is the most technically consequential hire in Bupa Arabia's Data Office. The role owns the end-to-end design, build, and operationalization of Bupa Arabia's cloud-native data platform, starting with Google Cloud Platform (GCP), trusted by Bupa teams serving over 12 million members.
GCP and BigQuery have already been selected as the cloud platform and data warehouse. This role builds the platform, not the strategy: writing code, setting engineering standards, enforcing data quality at every Medallion layer, and holding the system together as it scales. The Lead acts as the technical authority over any external implementation vendor, holding them accountable to SLA benchmarks and engineering quality. The role is responsible for delivering a Vertical Slice (Business-First Agile) implementation: first measurable business value within 90 days, full enterprise scale within 12–18 months, and a platform compliant with NDMO and PDPL from Day 1.
Key Accountabilities:
1- Build & Operate Real-Time Data Ingestion Pipelines:
- Design and operate GCP Datastream CDC pipelines from all Oracle sources.
- Build event-driven ingestion using Pub/Sub + Dataflow (a minimal pipeline sketch follows this list).
- Engineer schema evolution pipelines that automatically handle new columns, type changes, and table additions in source systems without failures or manual code changes.
- Enforce metadata capture: source system, timestamp, job ID, record count, schema version, and lineage marker logged on every ingestion event.
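The Python sketch below, using Apache Beam, illustrates the shape of the Pub/Sub + Dataflow ingestion path and the metadata stamping described above. It is a minimal sketch under assumed names: the subscription, project, table, and field names are illustrative placeholders, not Bupa Arabia's actual resources.

"""Minimal sketch of a streaming bronze-layer ingestion pipeline.
Resource names below are hypothetical placeholders."""
import json
from datetime import datetime, timezone

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def stamp_metadata(message: bytes, source_system: str, job_id: str) -> dict:
    """Attach the ingestion metadata the role is accountable for."""
    record = json.loads(message.decode("utf-8"))
    record["_source_system"] = source_system
    record["_ingested_at"] = datetime.now(timezone.utc).isoformat()
    record["_job_id"] = job_id
    record["_schema_version"] = record.get("schema_version", "unknown")
    return record


def run() -> None:
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/claims-events")
            | "StampMetadata" >> beam.Map(
                stamp_metadata, source_system="oracle_claims", job_id="bronze-claims-v1")
            | "WriteBronze" >> beam.io.WriteToBigQuery(
                "my-project:bronze.claims_events",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()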
2- Design & Deliver the Medallion Architecture (Bronze / Silver / Gold):
- Author and maintain all Silver and Gold layer dbt models in dbt Cloud, with Git version control, CI/CD deployment pipelines (GitHub Actions), and automated dbt test suites.
- Write dbt tests covering completeness, uniqueness, referential integrity, and custom business logic (an illustrative set of checks is sketched after this list).
- Design the analytical MDM layer: golden records for Customer and Member with de-duplication, survivorship rules, and multi-year history preservation.
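To illustrate the kind of checks the automated dbt test suite must enforce, here is a minimal Python sketch that runs equivalent uniqueness, completeness, and referential-integrity queries against BigQuery. In practice these would live as dbt generic tests; the dataset, table, and column names are hypothetical.

"""Illustrative data-quality checks (uniqueness, completeness, referential
integrity). Table and column names are hypothetical placeholders."""
from google.cloud import bigquery

CHECKS = {
    "member_id_unique":
        "SELECT COUNT(*) AS failures FROM ("
        "  SELECT member_id FROM silver.members GROUP BY member_id HAVING COUNT(*) > 1)",
    "claim_amount_not_null":
        "SELECT COUNT(*) AS failures FROM silver.claims WHERE claim_amount IS NULL",
    "claims_reference_valid_member":
        "SELECT COUNT(*) AS failures FROM silver.claims c "
        "LEFT JOIN silver.members m ON c.member_id = m.member_id "
        "WHERE m.member_id IS NULL",
}


def run_checks(client: bigquery.Client) -> dict:
    """Return failure counts per check; any non-zero count blocks the release."""
    results = {}
    for name, sql in CHECKS.items():
        row = next(iter(client.query(sql).result()))
        results[name] = row["failures"]
    return results


if __name__ == "__main__":
    failures = run_checks(bigquery.Client())
    assert all(count == 0 for count in failures.values()), failures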
3- Build & Govern the Looker BI Semantic Layer:
- Build and govern the BI layer and LookML semantic layer: 50+ Tier-1 KPIs with reusable, governed dimensions and measures.
- Enable self-service exploration: business users must be able to drill from an aggregate KPI to an individual claim or member record without writing SQL or requesting analyst support.
- Configure Looker role-based access controls aligned precisely to BigQuery permissions.
- Implement embedded analytics for internal portals and clinical dashboards via the Looker REST API (see the sketch after this list); maintain T-1 daily refresh and sub-15-minute micro-batch refresh for operational dashboards.
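A minimal sketch of serving a governed KPI query to an internal portal through the Looker SDK (REST API). The Look ID, project, and credential setup are assumptions for illustration only.

"""Sketch of pulling a governed Look's results for an embedded internal view.
Authentication is read from LOOKERSDK_* environment variables or looker.ini."""
import looker_sdk

# init40() targets Looker API 4.0.
sdk = looker_sdk.init40()


def fetch_kpi_payload(look_id: str) -> str:
    """Run a governed Look and return its rows as JSON for an embedded view.

    Because the query runs under Looker's role-based access controls, the
    caller only sees rows their BigQuery-aligned permissions allow.
    """
    return sdk.run_look(look_id=look_id, result_format="json")


if __name__ == "__main__":
    print(fetch_kpi_payload("42"))  # hypothetical Tier-1 KPI Look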
4- Enforce Data Governance, Quality & NDMO / PDPL Compliance:
- Configure and operate GCP Dataplex for automated data discovery, column-level PII / PHI / SPI classification, data lineage (Bronze → Silver → Gold → Looker), business glossary, and DQ monitoring across all Medallion layers.
- Enforce NDMO compliance: all data resident in GCP me-central2 (Dammam, KSA); data classification taxonomy applied and auditable; PDPL retention policies enforced at column level.
- Build and maintain automated source-to-target reconciliation: daily validation that Bronze, Silver, and Gold data reconcile to source, with zero tolerance on Tier-1 financial KPIs before any report is released (a reconciliation sketch follows this list).
- Define and enforce the Definition of Done for all data engineering deliverables: no dataset is complete until dbt tests pass, documentation is merged, lineage is captured, and DQ gates are green.
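A minimal sketch of the daily source-to-target reconciliation gate, comparing one Tier-1 financial measure across layers and failing hard on any drift. Table and column names are hypothetical placeholders.

"""Sketch of a zero-tolerance reconciliation gate across Medallion layers."""
from google.cloud import bigquery

LAYER_QUERIES = {
    "bronze": "SELECT SUM(claim_amount) AS total FROM bronze.claims_raw",
    "silver": "SELECT SUM(claim_amount) AS total FROM silver.claims",
    "gold":   "SELECT SUM(claim_amount) AS total FROM gold.fct_claims",
}


def reconcile(client: bigquery.Client) -> None:
    """Block report release if Silver or Gold drifts from the Bronze baseline."""
    totals = {
        layer: next(iter(client.query(sql).result()))["total"]
        for layer, sql in LAYER_QUERIES.items()
    }
    baseline = totals["bronze"]
    for layer, total in totals.items():
        if total != baseline:
            raise RuntimeError(f"Reconciliation failed at {layer}: {total} != {baseline}")
    print("Reconciliation passed:", totals)


if __name__ == "__main__":
    reconcile(bigquery.Client())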
5- Build AI / ML Data Pipeline & Operationalize Vertex AI Use Cases:
- Design and populate the Vertex AI Feature Store from Gold layer data, enabling Wave 1 AI use cases: FWA Service Overutilization, FWA Duplicated Claims, FWA Provider Collusion, Document OCR Extraction, and member churn propensity.
- Build Vertex AI Pipelines for automated model training, evaluation, promotion to the Model Registry, and deployment to production inference endpoints; no manual notebook-to-production process (a pipeline skeleton is sketched after this list).
- Enable BigQuery ML as a self-service modelling tool for actuarial and finance analysts requiring SQL-based predictive model development.
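The skeleton below sketches what a Vertex AI Pipeline replacing the manual notebook-to-production path could look like, using the Kubeflow Pipelines (kfp) SDK. The component bodies, evaluation gate, and project/region values are illustrative assumptions.

"""Skeleton of a train / evaluate / register pipeline with an evaluation gate."""
from kfp import dsl, compiler
from google.cloud import aiplatform


@dsl.component(base_image="python:3.11")
def train_and_evaluate() -> float:
    # Placeholder: pull features from the Feature Store / Gold layer,
    # train a candidate model, and return its evaluation metric.
    auc = 0.0  # replace with real training + evaluation
    return auc


@dsl.component(base_image="python:3.11")
def register_model(auc: float, auc_threshold: float) -> None:
    # Placeholder: upload the model to the Vertex AI Model Registry
    # only when the evaluation gate passes.
    if auc < auc_threshold:
        raise RuntimeError(f"Model rejected: AUC {auc} below {auc_threshold}")


@dsl.pipeline(name="churn-propensity-training")
def training_pipeline(auc_threshold: float = 0.8):
    evaluation = train_and_evaluate()
    register_model(auc=evaluation.output, auc_threshold=auc_threshold)


if __name__ == "__main__":
    compiler.Compiler().compile(training_pipeline, "training_pipeline.json")
    aiplatform.init(project="my-project", location="me-central2")  # hypothetical project
    aiplatform.PipelineJob(
        display_name="churn-propensity-training",
        template_path="training_pipeline.json",
    ).submit()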
6- Platform Engineering, Vendor Oversight & Internal Knowledge Transfer:
- Own the GCP landing zone: multi-environment architecture (Dev / UAT / Prod), GCP IAM with the principle of least privilege, Terraform IaC for all infrastructure, BYOK KMS encryption via Thales, and Cloud Composer orchestration (an orchestration sketch follows this list).
- Operate a co-delivery model: Bupa Arabia engineers embedded in vendor squads as co-developers; all code committed to Bupa-owned Git repositories from Day 1; the vendor retains no proprietary ownership of any deliverable.
- Drive embedded knowledge transfer: by programme close, a minimum of 3 Bupa Arabia data engineers must be independently capable of developing new dbt models, maintaining pipelines, and operating the platform without vendor dependency.
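As a sketch of the Cloud Composer orchestration layer, the Airflow DAG below strings a daily refresh together (dbt run, then the reconciliation gate). Operator choices, schedule, and commands are assumptions for illustration; real tasks would call the dbt Cloud API and the reconciliation job described above.

"""Sketch of a Cloud Composer (Airflow) DAG for the T-1 daily refresh."""
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-platform",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
    "sla": timedelta(hours=2),  # surfaced in Composer SLA monitoring
}

with DAG(
    dag_id="daily_medallion_refresh",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # T-1 daily refresh
    catchup=False,
    default_args=default_args,
) as dag:
    # Placeholder commands standing in for the real dbt Cloud and DQ calls.
    run_dbt = BashOperator(task_id="run_dbt_models",
                           bash_command="echo 'trigger dbt Cloud job'")
    reconcile = BashOperator(task_id="reconcile_layers",
                             bash_command="echo 'run reconciliation gate'")

    run_dbt >> reconcile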
Skills
- GCP - BigQuery (Medallion architecture, partitioning, clustering, cost optimisation, query plan tuning)
- GCP - Datastream (CDC from Oracle and SQL Server, initial backfill, schema drift handling)
- dbt Cloud (CI/CD deployment, custom generic tests, macros, Semantic Layer, documentation site)
- Looker / LookML (governed semantic layer, row-level security, RBAC, Looker API, PDTs)
- SQL - BigQuery dialect (window functions, complex analytics, execution plan optimisation)
- GCP - Pub/Sub + Dataflow (streaming ingestion, exactly-once semantics, dead-letter queues)
- GCP - Dataplex (auto-discovery, column-level classification, lineage, DQ policies)
- Vertex AI (Feature Store, Vertex AI Pipelines, Model Registry, batch and online inference)
- Terraform + GCP IAM + KMS / BYOK (IaC, security controls, least-privilege architecture)
- Cloud Composer / Apache Airflow (DAG design, SLA monitoring, backfill, GCP integration)
- Python (Cloud Functions, custom Composer operators, pipeline scripting)
- Git + CI/CD (GitHub Actions or equivalent, for both infrastructure and data model deployment)
- GCP Professional Data Engineer certification - required (or written commitment to achieve within 6 months of appointment)
- dbt Certified Developer - required (or written commitment to achieve within 6 months)
- Looker Certified Explorer or Developer - required (or written commitment to achieve within 6 months)
- Bachelor’s degree in Computer Science, Data Engineering, Software Engineering, Mathematics, Statistics, or equivalent quantitative discipline - required
- Experience in a regulated data environment (NDMO / PDPL / GDPR / HIPAA or equivalent) with demonstrable data classification, audit trail, and access control delivery - required
- Health insurance, financial services, or actuarial data domain experience - highly desirable
Education
Bachelor’s degree in Computer Science, Data Engineering, Software Engineering, or any related field.