AWS Glue: 3+ years of hands-on experience in AWS Glue ETL development
Python/PySpark: Strong programming skills in Python and PySpark for data transformation
AWS Services: Proficiency in S3, Redshift, Athena, Lambda, and EMR
Data Formats: Experience with Parquet, Avro, JSON, CSV, and ORC file formats
SQL: Advanced SQL skills for data querying and transformation
ETL Concepts: Deep understanding of ETL/ELT design patterns and best practices
Data Modeling: Knowledge of dimensional modeling and star/snowflake schemas
Version Control: Experience with Git/Bitbucket for code management
Preferred Skills:
Experience with the Vanguard platform or the financial services domain
Knowledge of AWS Step Functions for workflow orchestration
Familiarity with Apache Spark optimization techniques
Experience with data lake architectures and Delta Lake
AWS certifications (Solutions Architect, Data Analytics, Developer)
Experience with CI/CD pipelines for data workflows
Knowledge of data governance and compliance frameworks
Exposure to Terraform or CloudFormation for infrastructure as code