We are looking for skilled Hadoop Ecosystem Support Engineers to ensure the stability, performance, and availability of big data platforms. The ideal candidate will have hands-on experience managing and troubleshooting the Hadoop ecosystem — including HDFS, Hive, Spark, YARN, and related components.
This role focuses on platform support, maintenance, and issue resolution.
Key Responsibilities
- Provide L3 production support for Hadoop ecosystem components (HDFS, Hive, Spark, YARN, Oozie, etc.).
- Monitor cluster health, performance, and resource utilization using tools such as Ambari, Cloudera Manager, or Grafana.
- Troubleshoot and resolve HDFS, Hive, and Spark job failures and performance issues.
- Perform root cause analysis (RCA) for recurring incidents and work with engineering teams to implement fixes.
- Manage user access, quotas, and security policies in Hadoop clusters.
- Conduct routine maintenance tasks such as service restarts, cluster upgrades, and patch management.
- Collaborate with data engineers and platform teams to ensure optimal cluster performance and reliability.
- Document support procedures, incident reports, and configuration changes.
Required Skills & Experience
- 3–8 years of experience supporting or administering Hadoop ecosystems in production.
- Strong hands-on knowledge of:
  - HDFS (file system management, data balancing, recovery)
  - Hive (query execution, metastore management, troubleshooting)
  - Spark (job monitoring, debugging, performance tuning)
  - YARN, Oozie, and ZooKeeper
- Experience with cluster management tools such as Ambari or Cloudera Manager.
- Proficiency in Linux/Unix system administration and shell scripting.
- Strong analytical and problem-solving skills with a focus on incident management and RCA.
- Familiarity with Kerberos, Ranger, or other security frameworks within Hadoop.
Nice to Have
- Exposure to cloud-based big data platforms (AWS EMR, Azure HDInsight, GCP Dataproc).
- Basic understanding of Python or Scala for log analysis and automation.
- Experience with Kafka, Airflow, or other data orchestration tools.
- Knowledge of ticketing systems (ServiceNow, JIRA) and ITIL processes.
Job Types: Full-time, Permanent
Work Location: In person