Qureos


Senior Big Data Engineer

Responsibilities:
  • Design, implement, and optimize data pipelines for batch and real-time data processing using Cloudera (Hadoop, Hive, Spark, Impala) and Informatica (PowerCenter, Cloud Data Integration).
  • Build data extraction, transformation, and loading (ETL) workflows using Informatica PowerCenter for large-scale data integration from source systems (e.g., relational databases, flat files, APIs) into Cloudera Data Lake or data warehouse environments.
  • Implement Spark jobs on Cloudera for distributed data processing and optimization of data workflows.
  • Leverage Informatica for orchestrating ETL workflows, including data extraction, cleansing, transformation, and loading into data repositories (HDFS, Hive, SQL databases, etc.).
  • Optimize Informatica workflows to minimize runtime, ensure smooth data integration, and maintain high data quality.
  • Utilize Hadoop and Spark on Cloudera to process large datasets and implement data transformations using MapReduce, Spark SQL, and PySpark.
  • Leverage Impala for low-latency SQL queries on Hadoop, ensuring real-time access to processed data.
  • Implement partitioning, bucketing, and indexing strategies in Hive and HBase to improve query performance on large datasets.
  • Implement and enforce data quality rules within Informatica workflows, ensuring that all transformations meet the required standards for completeness, consistency, and accuracy.
  • Ensure compliance with data governance and security protocols (e.g., encryption, masking, access control) in accordance with industry best practices.
  • Automate ETL workflows using Informatica Server, integrating with Airflow, NiFi, or other workflow orchestration tools for scheduling and monitoring jobs.
  • Utilize Cloudera Navigator for monitoring and auditing data processes within the Hadoop ecosystem.
  • Perform regular tuning of the ETL pipelines, data flows, and SQL queries to ensure optimal performance.

Qualifications:
  • Bachelor’s degree in Computer Science, Engineering, or related field.
  • 6+ years of experience in big data engineering or a closely related role.
  • Proven experience with the Cloudera Distribution of Hadoop (CDH), including expertise in HDFS, Hive, Impala, Spark, and HBase.
  • Strong hands-on experience with Informatica PowerCenter (ETL), EDC, IDQ, B2B, and Axon.
  • Deep understanding of ETL best practices, data pipelines, and distributed computing technologies such as Spark, MapReduce, PySpark, and Hadoop ecosystem components.
  • Advanced SQL skills for data manipulation, aggregation, optimization, and reporting across relational and non-relational data stores (e.g., SQL Server, MySQL, PostgreSQL, Hive, Impala).
  • Experience in Python and SQL.
  • Strong background in data warehousing principles and data modeling, including dimensional modeling (star schema, snowflake schema) and OLAP/OLTP considerations.
