Overview:
We are looking for a Data Engineer with a strong background in PySpark/Big Data development and Hadoop, with a minimum of 5 years of relevant Big Data experience and 7+ years of total experience. Proficiency in Hadoop, Hive, Spark, Unix, Scala, SQL, and Python is required.
Roles and Responsibilities:
- Design, develop, and implement data pipelines and ETL processes to efficiently ingest, transform, and load large volumes of data.
- Collaborate with cross-functional teams to understand data requirements and devise scalable solutions for data storage, processing, and retrieval.
- Tune and optimize data processes to ensure strong performance, reliability, and data integrity.
- Utilize PySpark, Spark, and Hadoop to build robust data solutions.
- Keep abreast of the latest industry best practices and emerging technologies in data engineering.
- Troubleshoot and resolve issues related to data pipelines and processing.
- Participate actively in code reviews and offer constructive feedback to enhance code quality.
Qualifications:
- Strong experience with Hadoop and its ecosystem tools such as Spark, Kafka, Hive, and Sqoop
- Proficiency in SQL for data analysis, querying, and performance optimization
- Hands-on experience with Unix/Linux environments
- Programming experience in Scala and Python for data processing and pipeline development
- Experience working with large-scale datasets and distributed data processing frameworks
- Good understanding of ETL processes, data pipelines, and big data architecture