We are seeking a highly skilled Senior PySpark Data Engineer to design, build, and optimize large-scale data pipelines and distributed systems. Beyond deep expertise in Apache Spark (PySpark) and automation, this role requires the ability to manage stakeholders, ensure timely delivery, and assess requirements. You will play a critical role in bridging business needs with technical execution, ensuring high-quality, scalable, and reliable data solutions.
KEY RESPONSIBILITIES
Technical Delivery
- Data Pipeline Development: Design, develop, and optimize scalable ETL/ELT pipelines using PySpark.
- System Integration & Automation: Build automated data ingestion and transformation frameworks integrating with APIs, databases, enterprise applications, and cloud platforms.
- Performance Optimization: Tune Spark jobs for efficiency, scalability, and cost-effectiveness.
- Data Quality & Governance: Implement validation, monitoring, and controls to ensure data reliability and compliance.
- DevOps & Orchestration: Deploy pipelines through CI/CD frameworks and manage workflows using Airflow, Prefect, or similar tools.
Stakeholder Management
- Requirements Assessment: Engage with business stakeholders, analysts, and application owners to gather and translate requirements into technical deliverables.
- Deliverables & Timelines: Define scope, create technical plans, and manage progress to ensure on-time delivery of data solutions.
- Communication: Provide regular updates on progress, risks, and dependencies to technical and non-technical stakeholders.
- Cross-Functional Collaboration: Work closely with data scientists, product managers, and business teams to align solutions with organizational goals.
- Risk & Issue Management: Identify delivery risks early and proactively recommend mitigation strategies.
- Mentorship & Leadership: Guide junior engineers and champion best practices for data engineering and delivery management.
Educational Qualification
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
- Experience: Minimum 7 years of experience
- Required skills: PySpark, Python, Airflow, SQL
- Domain: Data warehousing, Banking, Telecom
Job Type: Contractual / Temporary
Contract length: 12 months
Experience:
- PySpark: 4 years (Required)
- Data Engineer: 6 years (Required)
- GCP/Azure/AWS: 1 year (Preferred)
- Airflow: 1 year (Required)
- SQL: 1 year (Required)
Work Location: In person