Must Have Technical/Functional Skills
We are looking for a skilled PySpark Data Engineer with strong hands-on experience in PySpark and Python to design, build, and optimize scalable data processing pipelines. The ideal candidate will have practical experience working with distributed data processing and a solid foundation in writing efficient, production-grade Python code.
Required Technical Skills
- Strong hands-on experience in PySpark (Spark SQL, DataFrame API)
- Advanced proficiency in Python (data processing, performance tuning, modular coding)
- Solid understanding of ETL design patterns and data pipeline architecture
- Good working knowledge of SQL for data transformation and analysis
- Experience with data processing in distributed environments
Preferred Skills (Good to Have)
- Experience with cloud platforms (AWS preferred – S3, Glue, EMR or equivalent services)
- Familiarity with workflow orchestration tools such as Airflow or similar schedulers
- Exposure to data warehousing concepts (e.g., Snowflake or similar platforms)
- Knowledge of code versioning (Git) and CI/CD practices
Experience
- 3–8 years of experience in Data Engineering / PySpark development
- Proven hands-on project experience with PySpark and Python
Roles & Responsibilities
- Design, develop, and maintain ETL/ELT pipelines using PySpark
- Write optimized and scalable PySpark transformations using DataFrames and Spark SQL
- Develop reusable and efficient Python-based data processing components
- Ensure data quality, integrity, and performance across pipelines
- Perform debugging, performance tuning, and optimization of PySpark jobs
- Collaborate with cross-functional teams (Data Analysts, Architects, DevOps)
- Contribute to CI/CD pipelines and deployment workflows for data applications
- Monitor and troubleshoot data workloads in production environments
Salary Range: $100,000 to $120,000 per year
Desired Candidate Profile
Qualifications: Bachelor of Computer Science