Key Responsibilities
- Design, develop, and maintain scalable data pipelines using PySpark and Azure Data Factory (see the illustrative sketch after this list).
- Work closely with business stakeholders, analysts, and data scientists to understand data requirements and deliver reliable solutions.
- Optimize ETL workflows for performance, scalability, and reliability.
- Implement best practices for data ingestion, transformation, and integration across multiple sources.
- Ensure data quality, governance, and security across the data lifecycle.
- Troubleshoot and resolve issues related to data pipelines, storage, and performance.
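To give candidates a feel for the day-to-day work, here is a minimal sketch of the kind of PySpark batch transformation this role would build; in practice an Azure Data Factory pipeline would orchestrate a job like this. The storage paths, column names, and filter condition are hypothetical assumptions, not details from this posting.

```python
# Minimal PySpark batch ETL sketch (illustrative only).
# Paths, column names, and the filter condition are hypothetical assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_load").getOrCreate()

# Ingest: read raw CSV files from a landing zone (path is an assumption).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://landing@examplelake.dfs.core.windows.net/orders/")
)

# Transform: standardize types, drop bad records, derive a partition column.
cleaned = (
    raw.withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
       .filter(F.col("order_id").isNotNull())
       .withColumn("load_date", F.current_date())
)

# Load: write partitioned Parquet to a curated zone for downstream consumers.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("load_date")
    .parquet("abfss://curated@examplelake.dfs.core.windows.net/orders/")
)

spark.stop()
```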
Required Skills & Qualifications
- 5+ years of total experience, including 3+ years of relevant experience with PySpark, Azure Data Factory, and Python.
- Strong experience in building large-scale data pipelines and ETL workflows.
- Hands-on expertise in PySpark for data processing and transformation.
- Proficiency in Azure Data Factory (ADF) for orchestrating and automating workflows.
- Solid understanding of Python for scripting, data handling, and automation.
- Strong SQL skills and ability to work with relational and non-relational databases.
- Good knowledge of data warehousing concepts and performance optimization.
- Exposure to the Azure ecosystem (Data Lake, Databricks, Synapse Analytics, etc.) preferred.
- Excellent problem-solving, analytical, and communication skills.
Nice to Have (Optional)
- Experience with CI/CD pipelines for data solutions.
- Knowledge of data governance, security, and compliance frameworks.
- Familiarity with real-time data streaming technologies (Kafka, Event Hubs, etc.).
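For candidates curious about the streaming item above, the following is a minimal Spark Structured Streaming sketch that reads from a Kafka topic (Azure Event Hubs also exposes a Kafka-compatible endpoint). The broker address, topic name, message schema, and storage paths are hypothetical assumptions for illustration only.

```python
# Minimal Spark Structured Streaming sketch reading from Kafka (illustrative only).
# Broker, topic, schema, and checkpoint/output paths are hypothetical assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

# Expected JSON payload for each Kafka message (assumed for this sketch).
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

# Source: subscribe to a Kafka topic.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.com:9092")
    .option("subscribe", "orders")
    .load()
)

# Transform: parse the JSON value column into typed fields.
parsed = (
    events.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*")
)

# Sink: append Parquet files with a checkpoint for fault-tolerant output.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "abfss://curated@examplelake.dfs.core.windows.net/orders_stream/")
    .option("checkpointLocation", "abfss://curated@examplelake.dfs.core.windows.net/_checkpoints/orders_stream/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```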
Additional Details
- Cloud Preference: Azure only (AWS experience not required).
- Budget/CTC: 18 LPA.
- Contract/Full-Time: Full-Time.