We are looking for a highly skilled Senior Data Engineer with strong expertise in PySpark, Python, and Big Data technologies to design, build, and optimize scalable data platforms and pipelines. The ideal candidate will work closely with Data Scientists, Business Analysts, and Analytics Delivery teams to develop high-performance analytics solutions and support advanced data science initiatives.
Requirements
Key Responsibilities
- Collaborate with Analytics Delivery Leads and Lead Data Engineers to understand business requirements and deliver impactful data solutions.
- Work closely with Data Scientists and cross-functional teams to solve complex business problems through data engineering solutions.
- Manage data onboarding, data access, and stakeholder coordination for analytics initiatives.
- Design, develop, and maintain scalable, secure, and high-performance data pipelines.
- Acquire, ingest, process, and transform large-scale structured and unstructured datasets.
- Implement data engineering best practices for building reliable and production-ready data platforms.
- Perform data wrangling, cleansing, transformation, and feature engineering for machine learning and analytics use cases.
- Design and develop modular data pipelines to generate reusable features and modeling datasets.
- Build and optimize data architectures supporting advanced analytics and machine learning workloads.
- Contribute to data platform design by selecting appropriate technologies across Big Data, SQL, and NoSQL ecosystems.
- Ensure data quality, integrity, governance, security, and scalability across the data lifecycle.
- Participate in Agile squads and collaborate effectively with stakeholders across business and technology teams.
- Contribute to enterprise data architecture strategy and roadmap aligned with business objectives.
Required Technical Skills
Core Data Engineering
- Strong experience in Data Engineering and Big Data solutions.
- Expert-level proficiency in PySpark.
- Strong programming experience in Python.
- Experience building large-scale distributed data processing pipelines.
Data Processing & ETL
- Data Ingestion
- Data Transformation
- Data Wrangling
- Data Preparation
- Data Modeling
- Feature Engineering
- ETL/ELT Development
- Batch Processing
- Data Pipeline Development
Big Data Technologies
- Apache Spark / PySpark
- Distributed Data Processing Frameworks
- Big Data Ecosystems
Database Technologies
- Strong SQL expertise
- Experience with Relational Databases
- Experience with NoSQL Databases
- Data Warehousing Concepts
Cloud & Analytics Platforms (Preferred)
- Azure Data Platform
- AWS Data Services
- Google Cloud Data Services
- Databricks
- Hadoop Ecosystem
Machine Learning Data Support
- Feature Engineering for ML Models
- Data Preparation for Analytics and AI Use Cases
- Building Modeling and Feature Tables
- Supporting Data Science Workloads
Engineering Best Practices
- Software Engineering Principles
- Data Pipeline Optimization
- Data Quality Management
- Performance Tuning
- Scalability & Security
- Version Control (Git)
- CI/CD Concepts
- Documentation Standards
Required Soft Skills
- Strong stakeholder management and communication skills.
- Experience working with Business Analysts, Data Scientists, and Product Teams.
- Ability to translate business requirements into scalable data solutions.
- Strong analytical and problem-solving skills.
- Experience working in Agile/Scrum environments.
- Ability to work independently and within cross-functional teams.
Preferred Qualifications
- Experience working on enterprise-scale data platforms.
- Experience supporting machine learning and advanced analytics initiatives.
- Exposure to cloud-native data engineering solutions.
- Experience designing modern data architectures and data lake environments.