Data Engineer (Spark | Hadoop | Apache Ozone)
Berkeley Heights, NJ (Onsite)
Responsibilities:
- Design and implement scalable distributed data processing solutions using Apache Spark and the Hadoop ecosystem.
- Build and maintain Spark applications for ETL, aggregation, and large-scale data transformation.
- Implement and manage enterprise data storage using Apache Ozone and HDFS.
- Develop batch and real-time ingestion pipelines using modern big data technologies.
- Optimize cluster performance, storage efficiency, and resource utilization.
- Ensure data quality, governance, security, and compliance across platforms.
- Troubleshoot performance issues across distributed environments.
- Collaborate with Data Scientists, Analysts, and Application teams to deliver reliable data solutions.
- Automate workflows and operational processes using scripting and orchestration tools.
Required Skills:
✔ Strong experience with Apache Spark (Core, SQL, Streaming).
✔ Hands-on expertise with Hadoop ecosystem (HDFS, YARN, MapReduce).
✔ Experience working with Apache Ozone object storage.
✔ Programming skills in Python, Scala, or Java.
✔ Experience building scalable ETL/Data Pipelines.
✔ Knowledge of distributed systems and cluster optimization.
✔ Strong Linux/Unix and shell scripting experience.
✔ Understanding of data security, governance, and compliance practices.
Preferred Skills:
- Hive, HBase, or Kafka experience.
- Cloud-based big data platforms (AWS, Azure, or GCP).
- Exposure to containerization (Docker, Kubernetes).
- CI/CD and automation for data engineering workflows.
Qualifications:
- Bachelor’s degree in Computer Science, Software Engineering, or a related field.
- Experience delivering enterprise data platform or product implementations preferred.
- Excellent communication and collaboration skills; strong problem-solving mindset and analytical thinking.