A Senior Data Engineer specializing in Python and SQL is responsible for designing, developing, and maintaining scalable data pipelines, ensuring the efficient flow of data across systems, and enabling data-driven decision-making within the organization.
Key Responsibilities
Data Pipeline Development
- Design and implement robust ETL/ELT pipelines for structured, semi-structured, and unstructured data.
- Automate data workflows using Python, Airflow, and SQL-based tools (a brief sketch follows this list).
- Optimize pipelines for performance, scalability, and reliability.
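As an illustration of this kind of work, below is a minimal sketch of a Python/Airflow DAG automating a small daily ETL flow. It assumes Apache Airflow 2.4+ and pandas with pyarrow installed; the DAG name, file paths, and column names are hypothetical placeholders, not a prescribed implementation.

```python
# Minimal sketch of an automated daily ETL workflow (hypothetical paths/columns).
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator


def extract() -> None:
    # Hypothetical CSV source; in practice this might be an API call or a database query.
    df = pd.read_csv("/data/raw/orders.csv")
    df.to_parquet("/data/staging/orders.parquet", index=False)


def transform() -> None:
    # Derive a new column and write the curated output.
    df = pd.read_parquet("/data/staging/orders.parquet")
    df["order_total"] = df["quantity"] * df["unit_price"]
    df.to_parquet("/data/curated/orders.parquet", index=False)


with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task
```

In a production pipeline the extract step would typically pull from a source system and the final step would load into a warehouse table rather than a local Parquet file.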
Data Modeling
- Create and maintain efficient database schemas and data models for analytical and operational systems.
- Implement dimensional modeling (star and snowflake schemas) for data warehouses.
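For illustration, a minimal star-schema sketch (one fact table, two dimensions) is shown below, built in SQLite purely so it runs self-contained; the table and column names are hypothetical.

```python
# Minimal star schema sketch: fact_sales references dim_customer and dim_date.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_name TEXT,
        region TEXT
    );

    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        full_date TEXT,
        year INTEGER,
        month INTEGER
    );

    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        date_key INTEGER REFERENCES dim_date (date_key),
        quantity INTEGER,
        amount REAL
    );
    """
)
conn.close()
```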
Data Integration
- Integrate data from multiple sources such as APIs, databases, and cloud platforms.
- Ensure seamless data flow across data lakes, warehouses, and reporting tools.
Performance Optimization
- Optimize SQL queries and database performance using indexing, partitioning, and query tuning techniques (see the sketch after this list).
- Monitor and improve pipeline performance, reducing latency and bottlenecks.
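The sketch below illustrates the indexing and query-tuning side of this work using SQLite's EXPLAIN QUERY PLAN, chosen only because it runs self-contained; the same idea applies to EXPLAIN/EXPLAIN ANALYZE in PostgreSQL or a cloud warehouse. Table and index names are hypothetical.

```python
# Minimal sketch: confirming that a query uses an index rather than a full scan.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, ts TEXT)")
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = ?", (42,)
).fetchall()
print(plan)  # should report a search using idx_events_user, not a full table scan
conn.close()
```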
Collaboration
- Work closely with data analysts, data scientists, and business teams to understand requirements and deliver solutions.
- Collaborate with DevOps teams for CI/CD pipeline integration.
Monitoring and Maintenance
- Implement logging, monitoring, and alerting for data pipelines and workflows (see the sketch after this list).
- Troubleshoot and resolve pipeline or data-related issues proactively.
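A minimal sketch of the logging side of this responsibility, using only Python's standard logging module; the pipeline step and its failure handling are illustrative placeholders.

```python
# Minimal sketch of structured logging around a pipeline step.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("orders_etl")


def load_step(row_count: int) -> None:
    logger.info("load started, expecting %d rows", row_count)
    try:
        # ... load logic would go here ...
        logger.info("load finished")
    except Exception:
        logger.exception("load failed")  # alerting could key off ERROR-level logs
        raise


load_step(1000)
```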
Data Governance and Security
- Ensure data quality, integrity, and security by implementing best practices.
- Work with teams to enforce policies like encryption, masking, and access control.
Leadership and Mentoring
- Mentor junior data engineers and lead technical discussions within the team.
- Drive architectural decisions and recommend best practices.
Skills and Expertise
Core Technical Skills
Programming
- Expert in Python: Data manipulation (Pandas, NumPy, and related libraries), API integration, and scripting (a brief sketch follows this list).
- Knowledge of frameworks like PySpark for distributed data processing.
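A brief sketch of the kind of everyday Pandas/NumPy data manipulation implied here, with hypothetical column names and values:

```python
# Minimal sketch: fill missing values and aggregate by a grouping column.
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"region": ["east", "west", "east"], "revenue": [120.0, 95.5, np.nan]}
)
df["revenue"] = df["revenue"].fillna(0.0)
summary = df.groupby("region", as_index=False)["revenue"].sum()
print(summary)
```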
SQL
- Advanced SQL skills for query writing, optimization, and database management.
- Experience with relational (PostgreSQL, MySQL, SQL Server) and columnar databases (Snowflake, Redshift, ClickHouse).
Data Engineering Tools
- ETL Tools: Apache Airflow, Talend.
- Big Data Tools: Hadoop, Spark (preferred but not mandatory).
- Cloud Platforms: AWS (Glue, Redshift, S3), Azure, GCP.
Data Storage
- Proficiency with data lakes (e.g., S3, Delta Lake) and data warehouses (e.g., Snowflake, Redshift, ClickHouse).
Data Formats
- Expertise in handling file formats like Parquet, Avro, ORC, JSON, and CSV.
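For example, converting between two of these formats with Pandas might look like the sketch below; it assumes the pyarrow package is installed and uses hypothetical file paths.

```python
# Minimal sketch: round-trip a dataset between CSV and Parquet.
import pandas as pd

df = pd.read_csv("input.csv")
df.to_parquet("output.parquet", index=False)

round_trip = pd.read_parquet("output.parquet")
print(round_trip.dtypes)
```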
Data Pipeline Orchestration
- Build and manage workflows using Airflow and Talend.
Additional Skills
- CI/CD: Knowledge of CI/CD pipelines (Jenkins) and version control (Git).
- Monitoring: Familiarity with logging and monitoring tools like Grafana, Prometheus, or CloudWatch.
- APIs: Experience with REST and GraphQL APIs for data integration.
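A minimal sketch of REST-based data ingestion with the requests library; the endpoint URL, token, and query parameters are hypothetical placeholders.

```python
# Minimal sketch: pull records from a REST endpoint for downstream ingestion.
import requests

response = requests.get(
    "https://api.example.com/v1/orders",
    headers={"Authorization": "Bearer <token>"},
    params={"updated_since": "2024-01-01"},
    timeout=30,
)
response.raise_for_status()
records = response.json()
print(len(records))
```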
Experience Requirements
- Mid-Level: 5–8 years of data engineering experience.
- Senior-Level: 8+ years with proven expertise in designing and scaling data pipelines.
Tools/Platforms Knowledge
- Programming: Python, SQL.
- Data Processing: Pandas, NumPy, PySpark.
- Databases: PostgreSQL, Snowflake, Redshift, ClickHouse.
- Workflow Orchestration: Apache Airflow, Talend.
- Monitoring: CloudWatch, Grafana, Prometheus.
- DevOps: Git, Jenkins, Docker (optional).
Job Type: Full-time
Experience:
- Data Engineer: 5 years (Required)
- Python: 3 years (Required)
Work Location: In person