Data Engineer (Spark | Hadoop | Apache Ozone)
Berkeley Heights, NJ (Onsite)
Responsibilities:
- Design and implement scalable distributed data processing solutions using Apache Spark and the Hadoop ecosystem.
- Build and maintain Spark applications for ETL, aggregation, and large-scale data transformation.
- Implement and manage enterprise data storage using Apache Ozone and HDFS.
- Develop batch and real-time ingestion pipelines using modern big data technologies.
- Optimize cluster performance, storage efficiency, and resource utilization.
- Ensure data quality, governance, security, and compliance across platforms.
- Troubleshoot performance issues across distributed environments.
- Collaborate with Data Scientists, Analysts, and Application teams to deliver reliable data solutions.
- Automate workflows and operational processes using scripting and orchestration tools.
Required Skills:
✔ Strong experience with Apache Spark (Core, SQL, Streaming).
✔ Hands-on expertise with Hadoop ecosystem (HDFS, YARN, MapReduce).
✔ Experience working with Apache Ozone object storage.
✔ Programming skills in Python, Scala, or Java.
✔ Experience building scalable ETL/Data Pipelines.
✔ Knowledge of distributed systems and cluster optimization.
✔ Strong Linux/Unix and shell scripting experience.
✔ Understanding of data security, governance, and compliance practices.
Preferred Skills:
- Hive, HBase, or Kafka experience.
- Cloud-based big data platforms (AWS, Azure, or GCP).
- Exposure to containerization (Docker, Kubernetes).
- CI/CD and automation for data engineering workflows.
Qualifications:
- Bachelor’s degree in Computer Science, Software Engineering, or a related field.
- Experience delivering enterprise data platform or product implementations preferred.
- Excellent communication and collaboration skills; strong problem-solving mindset and analytical thinking.