Senior Data Engineer
Senior Data Engineer: strongest on Glue + Spark + Iceberg + streaming/batch + Redshift
14 Weeks
Note: Consultants should be able to travel within the US to the Greenville site once a month.
Must-have: Strong Glue, ETL, and Spark skills; experience with Iceberg; ability to work with both batch and real-time or streaming ingestion; familiarity with Kinesis and/or MSK; and understanding of Redshift and data warehouse patterns.
Nice-to-have: Working knowledge of DynamoDB for metadata-related use cases, broader awareness of AWS streaming options, and prototype-building capability to support architecture validation.
Position Overview
We are seeking a Senior Data Engineer to design and build the data pipelines, data products, and integration flows for the GE Vernova MIDA engagement. This role involves hands-on pipeline architecture, data quality validation, and building the foundational data products that enable the Industrial Data Mesh.
Key Responsibilities
Technical Leadership
- Design and architect batch and streaming data pipelines for industrial data
- Define data product schemas, contracts, and quality validation rules
- Implement data integration patterns (CDC, event-driven, pub/sub) across OT and IT systems
- Design schema evolution strategies using Avro, Parquet, and Apache Iceberg
- Optimize pipeline performance and cost efficiency
Customer Engagement
- Collaborate with GE Vernova data teams to understand existing data flows
- Support data domain workshops with technical pipeline feasibility input
- Present pipeline design recommendations to customer engineering teams
Solution Development
- Build production-ready data pipelines on AWS infrastructure
- Implement data quality validation and enrichment at ingestion
- Develop automated testing for data products
- Create infrastructure-as-code for pipeline deployment
Qualifications
Experience
- 5-7 years in data engineering or ETL/ELT development
- Experience with large-scale streaming and batch data processing
- Experience in manufacturing or industrial data environments preferred
Technical Skills (AWS services; experience with comparable competing platforms will be considered)
- AWS Glue (ETL, Spark, Iceberg)
- Amazon Kinesis Data Streams / Firehose
- Amazon MSK (Kafka)
- Amazon Athena, Amazon Redshift
- AWS Step Functions, AWS Lambda
- Amazon S3 (partitioning, lifecycle management)
- Amazon DynamoDB (metadata/state)
- AWS CDK / CloudFormation
- Programming: Python, PySpark, SQL
Soft Skills
- Strong problem-solving and analytical skills
- Ability to work collaboratively with architects and customer teams
- Experience in agile environments
AWS Certifications (Nice to have)
- AWS Certified Data Analytics – Specialty
- AWS Certified Solutions Architect – Associate
Pay: $65.00 - $70.00 per hour
Experience:
- Glue + Spark + Iceberg + streaming/batch + Redshift: 4 years (Required)
- Data engineering or ETL/ELT development: 8 years (Required)
License/Certification:
- AWS Certified Data Analytics – Specialty (Required)
- AWS Certified Solutions Architect – Associate (Required)
Work Location: Remote