Job Title: PySpark Tech Lead
Location: Remote, with occasional visits to Mumbai
Employment Type: Full-Time
Experience Level: 8+ years
About the Role
We are seeking an experienced PySpark Tech Lead to design, develop, and optimize large-scale data processing solutions using Apache Spark and Python. The ideal candidate will lead a team of data engineers, drive best practices in big data development, and collaborate with cross-functional teams to build scalable, high-performance data pipelines for analytics and business insights.
Key Responsibilities
- Lead the design, architecture, and implementation of end-to-end data processing and ETL pipelines using PySpark.
- Work closely with data architects, data scientists, and business stakeholders to translate requirements into technical solutions.
- Optimize Spark jobs for performance, scalability, and cost efficiency in distributed environments.
- Define and enforce coding standards, version control, and deployment best practices across the team.
- Mentor and guide junior engineers, conduct code reviews, and foster a culture of technical excellence.
- Collaborate with DevOps teams to manage data infrastructure, including cluster configuration, monitoring, and troubleshooting.
- Drive the adoption of modern data engineering tools and frameworks to improve productivity and reliability.
- Ensure data quality, governance, and compliance in all developed solutions.
Required Skills & Qualifications
- 8+ years of professional experience in data engineering or big data development.
- 3+ years of hands-on experience with PySpark, including Spark SQL, DataFrames, and RDDs.
- Strong programming skills in Python, with experience in building modular and testable code.
- Deep understanding of distributed computing concepts and Spark internals (partitions, shuffling, caching, etc.).
- Experience with data ingestion and integration from multiple sources (RDBMS, APIs, Kafka, etc.).
- Strong proficiency with SQL and experience working on data warehouses or data lakes (e.g., Delta Lake, Hive, Snowflake).
- Experience deploying Spark workloads on cloud platforms such as AWS EMR, Azure Databricks, or GCP Dataproc.
- Solid understanding of CI/CD pipelines, Git, and containerization (Docker/Kubernetes).
- Excellent problem-solving, communication, and leadership skills.
Nice to Have
- Experience with Airflow, NiFi, or other orchestration tools.
- Knowledge of Spark development in Scala or Java.
- Familiarity with data streaming frameworks (Kafka Streams, Spark Streaming, Flink).
- Exposure to machine learning pipelines or feature engineering workflows.
- Understanding of data governance, metadata management, and data catalog tools.
Education
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
Why Join Us
- Opportunity to lead complex data initiatives and shape the organization’s data ecosystem.
- Collaborative environment that values innovation, learning, and technical excellence.
- Work with cutting-edge big data and cloud technologies in large-scale production environments.
Pay: ₹812,640.62 – ₹2,099,692.05 per year
Work Location: Remote, with occasional visits to Mumbai