Job Title: PySpark Developer
Location: Chennai, Hyderabad, Kolkata
Work Mode: Monday to Friday (5 days WFO)
Experience: 5+ Years in Backend Development
Notice Period: Immediate to 15 days
Must-Have Experience: Python, PySpark, Amazon Redshift, PostgreSQL
About the Role:
We are looking for an experienced PySpark Developer with strong data engineering capabilities to design, develop, and optimize scalable data pipelines for large-scale data processing. The ideal candidate must possess in-depth knowledge of PySpark, SQL, and cloud-based data ecosystems, along with strong problem-solving skills and the ability to work with cross-functional teams.
Roles & Responsibilities:
- Design and develop robust, scalable ETL/ELT pipelines using PySpark to process data from various sources such as databases, APIs, logs, and files.
- Transform raw data into analysis-ready datasets for data hubs and analytical data marts.
- Build reusable, parameterized Spark jobs for batch and micro-batch processing.
- Optimize PySpark job performance to handle large and complex datasets efficiently.
- Ensure data quality, consistency, and lineage, and maintain thorough documentation across all ingestion workflows.
- Collaborate with Data Architects, Data Modelers, and Data Scientists to implement ingestion logic aligned with business requirements.
- Work with AWS-based data platforms (S3, Glue, EMR, Redshift) for data movement and storage.
- Support version control, CI/CD processes, and infrastructure-as-code practices as required.
Must-Have Skills:
- 5+ years of data engineering experience, with a strong focus on PySpark/Spark.
- Proven experience building data pipelines and ingestion frameworks for relational, semi-structured (JSON, XML), and unstructured data (logs, PDFs).
- Strong knowledge of Python and related data processing libraries.
- Advanced SQL proficiency (Amazon Redshift, PostgreSQL or similar).
- Hands-on expertise with distributed computing frameworks such as Spark on EMR or Databricks.
- Familiarity with workflow orchestration tools like AWS Step Functions or similar.
- Good understanding of data lake and data warehouse architectures, including fundamental data modeling concepts.
Good-to-Have Skills:
- Experience with AWS data services: Glue, S3, Redshift, Lambda, CloudWatch.
- Exposure to Delta Lake or similar large-scale storage technologies.
- Experience with real-time streaming tools such as Spark Structured Streaming or Kafka.
- Understanding of data governance, lineage, and cataloging tools (AWS Glue Catalog, Apache Atlas).
- Knowledge of DevOps/CI-CD pipelines using Git and Jenkins.
Job Type: Full-time
Pay: ₹1,500,000.00 - ₹2,000,000.00 per year
Work Location: In person