Job Title: PySpark Developer
Location: Chennai, Hyderabad, Kolkata
Work Mode: Monday to Friday (5 days WFO)
Experience: 5+ Years in Backend Development
Notice Period: Immediate to 15 days
Must-Have Experience: Python, PySpark, Amazon Redshift, PostgreSQL
About the Role:
We are looking for an experienced PySpark Developer with strong data engineering capabilities to design, develop, and optimize scalable data pipelines for large-scale data processing. The ideal candidate must possess in-depth knowledge of PySpark, SQL, and cloud-based data ecosystems, along with strong problem-solving skills and the ability to work with cross-functional teams.
Roles & Responsibilities:
- Design and develop robust, scalable ETL/ELT pipelines using PySpark to process data from various sources such as databases, APIs, logs, and files.
- Transform raw data into analysis-ready datasets for data hubs and analytical data marts.
- Build reusable, parameterized Spark jobs for batch and micro-batch processing.
- Optimize PySpark job performance to handle large and complex datasets efficiently.
- Ensure data quality, consistency, and lineage, and maintain thorough documentation across all ingestion workflows.
- Collaborate with Data Architects, Data Modelers, and Data Scientists to implement ingestion logic aligned with business requirements.
- Work with AWS-based data platforms (S3, Glue, EMR, Redshift) for data movement and storage.
- Support version control, CI/CD processes, and infrastructure-as-code practices as required.
Must-Have Skills:
- 5+ years of data engineering experience, with a strong focus on PySpark/Spark.
- Proven experience building data pipelines and ingestion frameworks for relational, semi-structured (JSON, XML), and unstructured data (logs, PDFs).
- Strong knowledge of Python and related data processing libraries.
- Advanced SQL proficiency (Amazon Redshift, PostgreSQL or similar).
- Hands-on expertise with distributed computing frameworks such as Spark on EMR or Databricks.
- Familiarity with workflow orchestration tools like AWS Step Functions or similar.
- Good understanding of data lake and data warehouse architectures, including fundamental data modeling concepts.
Good-to-Have Skills:
- Experience with AWS data services: Glue, S3, Redshift, Lambda, CloudWatch.
- Exposure to Delta Lake or similar large-scale storage technologies.
- Experience with real-time streaming tools such as Spark Structured Streaming or Kafka.
- Understanding of data governance, lineage, and cataloging tools (AWS Glue Catalog, Apache Atlas).
- Knowledge of DevOps/CI-CD pipelines using Git and Jenkins.
Job Type: Full-time
Pay: ₹1,500,000.00 - ₹2,000,000.00 per year
Work Location: In person