Experience: 2-10 Years
Qualification: BE / B.Tech
Location: Pune, Manila, Remote
Technical Know-How: PySpark, Python programming, ETL pipelines, Spark SQL, DataFrames, Azure, big data ecosystem
We are looking for skilled PySpark Developers to design and develop scalable data processing solutions. The role involves working with big data platforms, building ETL pipelines, and collaborating with cross-functional teams to ensure data availability, performance, and quality.
Key Responsibilities
- Design, develop, and optimize ETL/ELT pipelines using PySpark.
- Ingest, clean, transform, and process structured, semi-structured, and unstructured data.
- Work with large-scale datasets on distributed computing platforms (Hadoop, Spark, Databricks).
- Integrate data from multiple sources, including Delta Lake, data lakes, relational databases, and APIs.
- Ensure high performance, scalability, and reliability of data pipelines.
- Collaborate with data engineers, analysts, and data scientists to deliver business-ready datasets.
- Implement monitoring, logging, and error handling for data pipelines.
- Contribute to data quality frameworks and best practices.
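For context, the responsibilities above center on PySpark ETL development. The following is a minimal, illustrative sketch of that kind of pipeline, not a required deliverable; all paths, table names, and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read semi-structured source data (hypothetical JSON location).
orders = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: drop incomplete records, standardize types, derive a date column.
cleaned = (
    orders
    .dropna(subset=["order_id", "amount"])
    .withColumn("amount", F.col("amount").cast("double"))
    .withColumn("order_date", F.to_date("order_timestamp"))
)

# Aggregate into a business-ready dataset: daily revenue per customer.
daily_revenue = (
    cleaned.groupBy("customer_id", "order_date")
    .agg(F.sum("amount").alias("daily_revenue"))
)

# Load: write partitioned Parquet output (hypothetical destination).
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_revenue/"
)

spark.stop()
```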
Required Skills & Experience
- Bachelor’s degree in computer science, computer engineering, or a related field.
- Excellent communication skills.
- 2–10 years of experience in data engineering with strong hands-on expertise in PySpark.
- Proficiency in Python programming for data manipulation and pipeline development.
- Strong knowledge of Spark SQL, DataFrames, and RDDs.
- Experience with big data ecosystems such as Hadoop, Hive, or Databricks.
- Good understanding of SQL and relational databases (PostgreSQL, MySQL, Oracle).
- Familiarity with cloud platforms (AWS, Azure, GCP) and services (S3, ADLS, BigQuery).
- Experience with version control (Git) and CI/CD tools.
- Strong debugging, performance optimization, and troubleshooting skills.
- Experience with containerization (Docker, Kubernetes).