Senior Big Data Engineer

Responsibilities:

Design, implement, and optimize data pipelines for batch and real-time data processing using Cloudera (Hadoop, Hive, Spark, Impala) and Informatica (PowerCenter, Cloud Data Integration).
Build extraction, transformation, and loading (ETL) workflows in Informatica PowerCenter for large-scale data integration from source systems (e.g., relational databases, flat files, APIs) into the Cloudera data lake or data warehouse environments.
Implement and optimize Spark jobs on Cloudera for distributed data processing.
Leverage Informatica for orchestrating ETL workflows, including data extraction, cleansing, transformation, and loading into data repositories (HDFS, Hive, SQL databases, etc.).
Optimize Informatica workflows to minimize runtime, ensure smooth data integration, and maintain high data quality.
Utilize Hadoop and Spark on Cloudera to process large datasets and implement data transformations using MapReduce, Spark SQL, and PySpark.
Leverage Impala for low-latency SQL queries on Hadoop, providing interactive, near-real-time access to processed data.
Implement partitioning and bucketing strategies in Hive, and row-key and region design in HBase, to improve query performance on large datasets.
Implement and enforce data quality rules within Informatica workflows, ensuring that all transformations meet the required standards for completeness, consistency, and accuracy.
Ensure compliance with data governance and security protocols (e.g., encryption, masking, access control) in accordance with industry best practices.
Automate ETL workflows using Informatica Server, integrating with Airflow, NiFi, or other workflow orchestration tools for scheduling and monitoring jobs.
Utilize Cloudera Navigator for monitoring and auditing data processes within the Hadoop ecosystem.
Perform regular tuning of the ETL pipelines, data flows, and SQL queries to ensure optimal performance.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or related field.
6+ years of experience in data engineering or a closely related role.
Proven experience with the Cloudera Distribution of Hadoop (CDH), including expertise in HDFS, Hive, Impala, Spark, and HBase.
Strong hands-on experience with Informatica PowerCenter (ETL), Enterprise Data Catalog (EDC), Informatica Data Quality (IDQ), B2B Data Exchange, and Axon Data Governance.
Deep understanding of ETL best practices, data pipelines, and distributed computing technologies such as Spark, MapReduce, PySpark, and Hadoop ecosystem components.
Advanced SQL skills for data manipulation, aggregation, optimization, and reporting across relational databases and Hadoop-based SQL engines (e.g., SQL Server, MySQL, PostgreSQL, Hive, Impala).
Hands-on experience with Python, in addition to SQL.
Strong background in data warehousing principles and data modeling, including dimensional modeling (star schema, snowflake schema) and OLAP/OLTP considerations.
