Data Engineer (AWS)
Job Summary
We are seeking a skilled Data Engineer with strong experience in AWS-based data platforms to design, build, and maintain scalable, reliable, and secure data pipelines and data infrastructure. The role focuses on enabling high-quality data ingestion, transformation, storage, and access to support analytics, AI/ML, and Generative AI workloads within a cloud-native ecosystem.
The ideal candidate has hands-on experience with AWS data services, distributed data processing, and modern data engineering best practices.
________________________________________
Key Responsibilities
- Design, build, and maintain scalable data pipelines for structured and unstructured data on AWS (a brief PySpark sketch follows this list).
- Develop batch and streaming data ingestion frameworks using AWS-native services.
- Implement data transformation, validation, and quality checks to ensure reliable downstream consumption.
- Design and optimize data storage solutions using S3, RDS, DynamoDB, Redshift, and OpenSearch.
- Enable data access patterns for analytics, AI/ML, and GenAI applications, e.g. curated datasets, query layers, and data APIs.
- Implement monitoring, logging, and alerting for data pipelines and data platforms.
- Optimize data processing for performance, scalability, and cost efficiency.
- Collaborate with ML engineers, data scientists, backend engineers, and architects to support end-to-end data workflows.
- Participate in code reviews, testing, and continuous improvement of data engineering processes.
- Ensure data security, governance, and compliance using AWS best practices.
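
To give a concrete flavor of the pipeline work above, here is a minimal, illustrative PySpark sketch. The bucket paths, column names, and app name are placeholders invented for this example, not part of any real system.

```python
# Minimal PySpark sketch: read raw events from S3, apply a basic
# quality check, and write partitioned Parquet for downstream use.
# All names (buckets, columns) are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-pipeline").getOrCreate()

raw = spark.read.json("s3://example-raw-bucket/events/")

# Simple validation: drop records missing required fields.
clean = raw.filter(F.col("event_id").isNotNull() & F.col("event_ts").isNotNull())

# Partition by event date so downstream scans stay cheap.
(clean
    .withColumn("event_date", F.to_date("event_ts"))
    .write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/events/"))
```

Partitioning output by date is a common way to keep downstream Athena or Redshift Spectrum scans inexpensive; it is shown here only as one plausible approach.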
________________________________________
Required Skills
Programming & Data Processing
- Strong programming experience in Python (preferred) and/or PySpark.
- Experience building data pipelines using Apache Spark, AWS Glue, or similar frameworks.
- Solid understanding of data modeling, schema design, and partitioning strategies (a schema sketch follows this list).
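
As one illustration of schema design in PySpark, a sketch of an explicit schema enforced at read time; all field names are invented for this example:

```python
# Illustrative explicit schema (field names are placeholders).
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

event_schema = StructType([
    StructField("event_id", StringType(), nullable=False),
    StructField("event_ts", TimestampType(), nullable=True),
    StructField("payload", StringType(), nullable=True),
])

# Enforcing the schema at read time surfaces drift early, e.g.:
# df = spark.read.schema(event_schema).json("s3://example-raw-bucket/events/")
```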
AWS Data & Cloud Services
- Hands-on experience with AWS services such as the following (a short boto3 sketch follows this list):
  - S3, Glue, Athena, Redshift
  - Lambda, EC2, EMR
  - DynamoDB, RDS
  - IAM, CloudWatch
- Experience designing data platforms within VPC-based and secure AWS environments.
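
Much of this work happens programmatically. A hedged boto3 sketch; the region, database, table, and bucket names are invented for illustration:

```python
# Start an Athena query against a Glue-catalogued table; results land in S3.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) FROM events GROUP BY event_date",
    QueryExecutionContext={"Database": "example_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Query started:", response["QueryExecutionId"])
```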
Databases & Storage
- Experience with relational databases (PostgreSQL, MySQL).
- Experience with NoSQL databases (DynamoDB); see the brief sketch after this list.
- Familiarity with data lakes and lakehouse architectures.
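
For flavor, a minimal boto3 DynamoDB sketch; the table and attribute names are placeholders, and the example assumes a table keyed on event_id already exists:

```python
# Write and read a single item (illustrative names only).
import boto3

table = boto3.resource("dynamodb").Table("example_events")
table.put_item(Item={"event_id": "e-123", "status": "processed"})
item = table.get_item(Key={"event_id": "e-123"}).get("Item")
print(item)
```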
Data Pipeline & Orchestration
- Experience with workflow orchestration tools such as Airflow, Step Functions, or managed schedulers (see the DAG sketch after this list).
- Understanding of batch and near-real-time data processing patterns.
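
As a sketch of the orchestration work, a minimal Airflow DAG; the DAG id, schedule, and task bodies are placeholders, and the schedule argument assumes Airflow 2.4+ (older versions use schedule_interval):

```python
# Daily batch job: extract, validate, and load in sequence.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull raw data, e.g. from an API or an S3 drop zone

def validate():
    ...  # run row-count / null checks before loading

def load():
    ...  # write curated output to the warehouse

with DAG(
    dag_id="daily_events",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_t = PythonOperator(task_id="extract", python_callable=extract)
    validate_t = PythonOperator(task_id="validate", python_callable=validate)
    load_t = PythonOperator(task_id="load", python_callable=load)
    extract_t >> validate_t >> load_t
```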
DevOps & Platform Practices
- Experience with Docker and CI/CD pipelines (an example pipeline test follows this list).
- Familiarity with Infrastructure as Code using Terraform or CloudFormation.
- Strong understanding of monitoring, logging, and alerting for data systems.
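
For illustration, the kind of lightweight test that might gate a CI/CD pipeline; dedupe_events is a hypothetical helper invented for this example:

```python
# A pure-Python transformation plus a pytest-style unit test.
def dedupe_events(events):
    """Keep the first occurrence of each event_id (illustrative helper)."""
    seen, out = set(), []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            out.append(event)
    return out

def test_dedupe_events():
    events = [{"event_id": 1}, {"event_id": 1}, {"event_id": 2}]
    assert dedupe_events(events) == [{"event_id": 1}, {"event_id": 2}]
```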
________________________________________
Preferred Skills
- Experience with streaming data and event-driven architectures (Kafka, Kinesis, SNS/SQS).
- Exposure to data quality frameworks and metadata management.
- Familiarity with AI/ML data preparation workflows and feature stores.
- Experience supporting GenAI/LLM workloads through data ingestion, embedding pipelines, or vector data stores (a toy example follows this list).
- Knowledge of security best practices, data encryption, and access control.
- Experience with observability tools such as Prometheus and Grafana.
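
As a toy illustration of the retrieval step behind many GenAI workloads: cosine similarity over embeddings is the operation vector stores optimize. The vectors below are made up; real embeddings come from an embedding model.

```python
# Rank documents by cosine similarity to a query vector (toy data).
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = {
    "doc_a": np.array([0.9, 0.1, 0.0]),
    "doc_b": np.array([0.1, 0.8, 0.3]),
}
query = np.array([0.8, 0.2, 0.1])

best = max(docs, key=lambda name: cosine_sim(query, docs[name]))
print("Closest document:", best)  # doc_a
```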
________________________________________
Nice to Have
- Exposure to OpenSearch or vector databases.
- Understanding of cost optimization strategies for large-scale data pipelines.
- Experience working in Agile development environments.