Job Title: Big Data Engineer
Experience: 5–8 Years
Only candidates whose CVs show relevant skills will be contacted.
Role Purpose
The Big Data Engineer is responsible for designing, developing, and maintaining scalable data pipelines and distributed data processing systems. This role enables efficient data ingestion, transformation, and analytics across batch and real-time environments, supporting enterprise-wide data initiatives.
Key Responsibilities
- Design, develop, and maintain scalable batch and real-time data pipelines to support analytics and business intelligence use cases
- Build and manage distributed data processing solutions using Apache Hadoop and Apache Spark within Cloudera Data Platform (CDP)
- Develop and orchestrate ETL workflows using tools such as Apache NiFi
- Implement and manage real-time streaming pipelines using Apache Kafka (see the streaming sketch after this list)
- Work with distributed storage systems such as Hadoop Distributed File System (HDFS)
- Utilize query engines like Apache Hive and Apache Impala for data access and analytics
- Perform data ingestion, transformation, and integration from multiple structured and unstructured enterprise data sources
- Optimize data pipelines for performance, scalability, and reliability
- Monitor and troubleshoot data workflows, ensuring high availability and data integrity
- Collaborate closely with data architects, analysts, and business stakeholders to deliver data solutions aligned with business needs
- Ensure adherence to data governance, security, and data quality standards
- Document data processes, architectures, and workflows for operational efficiency
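For illustration only, below is a minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS ingestion pattern this role covers. The broker address, topic name, event schema, and HDFS paths are assumptions invented for the example; a Spark 3.x cluster with the spark-sql-kafka connector package is also assumed.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, TimestampType

# Illustrative only: broker address, topic name, schema, and paths are
# assumptions, not part of any specific environment. Requires the
# spark-sql-kafka connector package on the cluster.
spark = SparkSession.builder.appName("events-ingest").getOrCreate()

schema = (StructType()
          .add("event_id", StringType())
          .add("event_type", StringType())
          .add("event_ts", TimestampType()))

# Read a real-time stream from a Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Parse the JSON payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Land the stream on HDFS as Parquet, with checkpointing for recovery.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```

The checkpoint location is what lets the stream resume after failure without duplicating output files, which is the kind of reliability concern the monitoring bullet above refers to.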
Required Skills & Experience
- 5–8 years of experience in Big Data Engineering or related roles
- Strong hands-on experience with:
  - Apache Hadoop ecosystem
  - Apache Spark (PySpark/Scala preferred)
  - Apache Kafka (streaming)
  - Apache NiFi (data ingestion/ETL)
- Experience with Cloudera Data Platform (CDP) or similar big data platforms
- Proficiency in SQL and at least one programming language (Python, Scala, or Java)
- Solid understanding of distributed computing and parallel processing concepts
- Experience working with HDFS, Hive, and Impala
- Knowledge of data modeling, ETL design, and data warehousing concepts
- Familiarity with data governance, security, and compliance frameworks
- Strong problem-solving and performance tuning skills (see the batch sketch after this list)
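As a purely illustrative example of the Spark and performance-tuning skills listed above, the sketch below joins a large fact table to a small Hive dimension table with a broadcast hint (avoiding a shuffle of the large side) and writes a date-partitioned Hive table that Hive and Impala can prune. All table names, column names, and paths are invented for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Illustrative only: table names, column names, and paths are assumptions.
spark = (SparkSession.builder
         .appName("orders-batch")
         .enableHiveSupport()
         .getOrCreate())

orders = spark.read.parquet("hdfs:///data/orders")      # large fact table
customers = spark.table("warehouse.customers")           # small dimension

# Broadcast the small dimension so the large fact table is not shuffled.
enriched = orders.join(broadcast(customers), "customer_id")

# Partition by date so Hive/Impala queries can prune irrelevant partitions.
(enriched.write
 .mode("overwrite")
 .partitionBy("order_date")
 .saveAsTable("warehouse.orders_enriched"))
```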
Preferred Qualifications
- Experience with cloud platforms (AWS, Azure, or GCP) in big data environments
- Knowledge of containerization (Docker/Kubernetes) is a plus
- Exposure to CI/CD pipelines for data engineering workflows
- Understanding of real-time analytics and event-driven architectures (see the consumer sketch after this list)
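For the event-driven architecture point, a minimal consumer sketch using the kafka-python client; the client choice, topic name, broker address, and group id are all assumptions for illustration:

```python
import json
from kafka import KafkaConsumer  # kafka-python; client choice is an assumption

# Illustrative event-driven consumer: topic, broker, and group are made up.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="broker:9092",
    group_id="analytics-group",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # React to each event as it arrives, e.g. update a real-time metric.
    print(event.get("event_type"), event.get("event_id"))
```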
Key Competencies
- Analytical thinking and problem-solving
- Strong collaboration and communication skills
- Ability to work in fast-paced, data-driven environments
- Attention to detail and commitment to data quality
Education
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or a related field