Must Have Technical/Functional Skills
- Minimum 7 years of overall experience in data engineering roles.
- Minimum 3 years of hands-on experience with Flink, Iceberg, and Starburst.
- Extensive hands-on experience (3+ years) with Apache Flink for real-time stream processing,
including Flink SQL and Flink DataStream API.
- Proven experience with Apache Iceberg, specifically in designing and managing data lakehouse
architectures, schema evolution, and performance optimization of Iceberg tables.
- Solid experience with Starburst Enterprise or open-source Trino for data virtualization, federated querying,
and connecting to diverse data sources.
- Proficiency in at least one programming language commonly used in data engineering (e.g., Python, Scala, Java).
- Strong experience with cloud platforms (AWS, Azure, or GCP) and their native data services
(e.g., S3, ADLS, EMR, BigQuery, Redshift), as well as third-party data platforms (e.g., Databricks, Snowflake).
- Deep understanding of distributed systems, data warehousing concepts, ETL/ELT methodologies,
and data modeling.
- Experience with containerization (Docker) and orchestration (Kubernetes).
- Familiarity with messaging systems like Apache Kafka.
- Excellent problem-solving, analytical, and communication skills, with the ability to articulate complex
technical concepts.
Roles & Responsibilities
- Real-time Data Pipeline Development: Design, develop, and maintain robust, high-throughput real-time data
streaming and processing pipelines using Apache Flink for complex event processing, stream analytics, and continuous transformations.
- Data Lakehouse Architecture: Implement and manage data lakehouse solutions leveraging Apache Iceberg for table
format management, ensuring ACID transactions, schema evolution, and efficient data versioning on large-scale data lakes.
- Data Virtualization & Federated Query: Utilize Starburst Enterprise (Trino) for data virtualization and federated
querying across diverse data sources, optimizing query performance and enabling unified data access for analytics and
reporting.
- ETL/ELT Development: Develop, optimize, and maintain traditional and modern ETL/ELT processes using various
tools and programming languages (e.g., Python, Scala, Java) to ingest, transform, and load data into analytical systems.
- Cloud Data Platform Integration: Integrate data solutions with Client's cloud infrastructure (e.g., AWS, Azure, GCP)
and leverage native cloud data services for storage (e.g., S3, ADLS), compute, and analytics.
- Performance Tuning & Optimization: Identify and resolve performance bottlenecks within data pipelines, Flink jobs,
Iceberg tables, and Starburst queries, ensuring optimal resource utilization and query response times.
- Data Quality & Governance: Implement data quality checks, monitoring, and alerting mechanisms within data pipelines.
Ensure adherence to data governance policies, metadata management, and data lineage standards.
- Architectural Input: Contribute to the architectural design and evolution of Client's data platform, providing expertise
on Flink, Iceberg, and Starburst capabilities and best practices.
- Collaboration & Mentorship: Work closely with data architects, data scientists, business analysts, and other
engineering teams. Mentor junior engineers and foster a culture of technical excellence and continuous learning.
- Operational Excellence: Establish monitoring, logging, and alerting for data pipelines and infrastructure. Participate
in on-call rotations as needed to ensure the reliability and availability of data systems.
- Documentation: Create and maintain comprehensive technical documentation for data pipelines, architecture,
and operational procedures.
Salary Range: $115,000 to $140,000 per year