Mandatory Skills
- Hands-on experience with AWS services, especially AWS Glue, DynamoDB, S3, CloudWatch
- Strong SQL skills for data validation and troubleshooting (Aurora / Postgres / MySQL)
- Experience in monitoring batch or near-real-time data pipelines
- Ability to triage incidents, follow SOPs, and escalate appropriately
- Experience working in production support environments with SLAs
- Python: reading/debugging ETL scripts, making small fixes
- Clear written and verbal communication for incident updates and documentation
Primary Skills
- AWS Glue: job monitoring, reruns, backfills, basic troubleshooting
- CloudWatch Logs: log analysis and error identification
- Data validation & reconciliation between source and target systems
- Ticketing tools (Jira, ServiceNow, Freshservice, etc.)
- Understanding of source → ingestion → processing → analytics data flows
Secondary / Good-to-Have Skills
- Experience with API-based data ingestion (REST, JSON payloads)
- Exposure to e-commerce platforms (BigCommerce, Shopify, etc.)
- Familiarity with analytics/reporting use cases (marketing, performance dashboards)
- Experience with on-call or extended-hours support
- Basic understanding of data architecture concepts (batch vs streaming, ETL/ELT)
- Knowledge of Git / version control for code review and fixes
Key Responsibilities
- Monitor AWS data pipelines and respond to alerts and reported data issues
- Perform SOP-based triage, Glue job reruns, and backfills
- Validate data freshness and completeness using SQL and logs
- Investigate non-SOP issues and apply L2 fixes in Python/Glue/configuration
- Coordinate with stakeholders and provide timely incident updates
- Maintain and improve SOPs, runbooks, and known-issue documentation
- Flexibility to work in a 24x7 support environment (primarily during US business hours)