Role Overview
We are seeking a Data Engineer to design, build, and optimize scalable data pipelines and distributed data systems. This role involves working with large datasets, real-time and batch processing systems, and production-grade data infrastructure.
You will be responsible for transforming raw data into reliable, structured, and high-quality datasets that power analytics, machine learning, and operational systems.
Key Responsibilities
- Design and implement scalable ETL/ELT pipelines
- Build reliable batch and real-time data processing workflows
- Develop data ingestion systems from multiple sources (APIs, streaming, files, databases)
- Ensure data quality, validation, and monitoring
- Optimize data storage, query performance, and cost efficiency
- Design data models and schemas for analytical and operational use cases
- Maintain data warehouse and/or data lake environments
- Collaborate with backend, ML, and analytics teams
Required Skills
- Strong proficiency in Python and/or Java
- Solid SQL skills and experience with relational databases
- Experience building production-grade data pipelines
- Understanding of distributed data processing concepts
- Experience with data warehousing and data modeling
- Familiarity with version control and CI/CD practices
- Strong debugging and performance optimization skills
Preferred Qualifications
- Experience with big data tools (e.g., Spark, Hadoop)
- Experience with streaming systems (e.g., Kafka, Kinesis)
- Experience with cloud platforms (AWS, GCP, or Azure)
- Experience with data lakes and modern data warehouse platforms
- Exposure to ML data pipelines or feature engineering workflows
- Knowledge of data governance and data quality frameworks