Position Overview
We're seeking a self-sufficient Senior Data Engineer to build and scale the data infrastructure that supports our product, engineering, and analytics teams. You'll architect data pipelines, optimize our data platform, and ensure teams have reliable, high-quality data to drive business decisions.
This is a hands-on role for someone who can own the entire data engineering stack - from ingestion to transformation to orchestration. You'll work independently to solve complex data challenges and build scalable solutions.
Core Responsibilities
Data Pipeline Development & Optimization
Design, build, and maintain scalable data pipelines using Spark and Databricks
Develop ETL/ELT workflows to process large volumes of customer behavior data
Optimize Spark jobs for performance, cost efficiency, and reliability
Build real-time and batch data processing solutions
Implement data quality checks and monitoring throughout pipelines
Ensure data freshness and SLA compliance for analytics workloads
AWS Data Infrastructure
Architect and manage data infrastructure on AWS (S3, Glue, EMR, Redshift)
Design and implement data lake architecture with proper partitioning and optimization
Configure and optimize AWS Glue for ETL jobs and data cataloging
Migrate Glue jobs to Zero-ETL integrations where appropriate
Implement security best practices for data access and governance
Monitor and optimize cloud costs related to data infrastructure
Data Modeling & Architecture
Design and implement dimensional data models for analytics
Build star/snowflake schemas optimized for analytical queries
Create data marts for specific business domains (retention, campaigns, product)
Ensure data model scalability and maintainability
Document data lineage, dependencies, and business logic
Implement slowly changing dimensions and historical tracking
Orchestration & Automation
Build and maintain workflow orchestration using Airflow or similar tools
Implement scheduling, monitoring, and alerting for data pipelines
Create automated data quality validation frameworks
Design retry logic and error handling for production pipelines
Build CI/CD pipelines for data workflows
Automate infrastructure provisioning using Infrastructure as Code
Cross-Functional Collaboration
Partner with the Senior Data Analyst to understand analytics requirements
Work with the Growth Director and team to enable data-driven decision-making
Support the CRM Lead with data needs for campaign execution
Collaborate with Product and Engineering on event tracking and instrumentation
Document technical specifications and best practices for the team
Work closely with all squads and establish data contracts with engineers so that data lands in an optimal format.
Required Qualifications
Must-Have Technical Skills
Apache Spark: Expert-level proficiency in PySpark/Spark SQL for large-scale data processing - this is non-negotiable
Databricks: Strong hands-on experience building and optimizing pipelines on Databricks platform - this is non-negotiable
AWS: Deep knowledge of AWS data services (S3, Glue, EMR, Redshift, Athena) - this is non-negotiable
Data Modeling: Proven experience designing dimensional models and data warehouses - this is non-negotiable
Orchestration: Strong experience with workflow orchestration tools (Airflow, Prefect, or similar) - this is non-negotiable
SQL: Advanced SQL skills for complex queries and optimization
Python: Strong programming skills for data engineering tasks
Experience
6-10 years in data engineering with a focus on building scalable data platforms
Proven track record architecting and implementing data infrastructure from scratch
Experience processing large volumes of event data (billions of records)
Background in high-growth tech companies or consumer-facing products
Experience with mobile/web analytics data preferred
Technical Requirements
Expert in Apache Spark (PySpark and Spark SQL) with performance tuning experience
Deep hands-on experience with Databricks (clusters, jobs, notebooks, Delta Lake)
Strong AWS expertise: S3, Glue, EMR, Redshift, Athena, Lambda, CloudWatch
Proficiency with orchestration tools: Airflow, Prefect, Step Functions, or similar
Advanced data modeling skills: dimensional modeling, normalization, denormalization
Experience with data formats: Parquet, Avro, ORC, Delta Lake
Version control with Git and CI/CD practices
Infrastructure as Code: Terraform, CloudFormation, or similar
Understanding of data streaming technologies (Kafka, Kinesis) is a plus
Core Competencies
Self-sufficient: You figure things out independently without constant guidance
Problem solver: You diagnose and fix complex data pipeline issues autonomously
Performance-focused: You optimize for speed, cost, and reliability
Quality-driven: You build robust, maintainable, and well-documented solutions
Ownership mindset: You take end-to-end responsibility for your work
Collaborative: You work well with analysts and business stakeholders while operating independently
Nice-to-Have
Databricks certifications (Data Engineer Associate/Professional)
Experience with dbt for data transformation
Knowledge of customer data platforms (Segment, mParticle, Rudderstack)
Experience with event tracking platforms (Mixpanel, Amplitude)
Familiarity with machine learning infrastructure and MLOps
Experience in MENA region or emerging markets
Background in on-demand services, marketplaces, or subscription businesses
Knowledge of real-time streaming architectures
What We Offer
Competitive salary based on experience
Ownership of critical data infrastructure and architecture decisions
Work with modern data stack and cutting-edge AWS technologies
Direct impact on business decisions through data platform improvements
Comprehensive health benefits