Job Description – AWS Platform Engineer
Key Responsibilities
System Ownership & Reliability
- Own and manage production and non-production financial systems.
- Ensure high availability, stability, and performance of all platforms.
- Troubleshoot complex issues across infrastructure, applications, middleware, and databases.
- Conduct detailed root cause analysis (RCA) and implement permanent fixes to prevent recurrence.
Operational Excellence
- Design, maintain, and execute disaster recovery (DR) strategies and periodic DR drills.
- Oversee OS patching, upgrades, and security hardening.
- Define and manage backup, restore, and retention strategies.
- Build monitoring, logging, and alerting frameworks for proactive issue detection.
SRE & Automation
- Apply SRE principles to enhance system resilience and scalability.
- Automate deployments, failover testing, scaling, and operational workflows.
- Build self-healing capabilities, runbooks, and automation tools.
- Develop automation using Python, Shell, Ansible, and Terraform.
Cloud & Hybrid Environments
- Operate systems across on-premises and AWS environments.
- Support cloud migration, modernization, patching, and automation initiatives.
- Work with AWS services including EC2, ECS/Fargate, RDS, S3, IAM, Lambda, VPC, and CloudWatch.
- Collaborate with infrastructure teams to optimize scalability, reliability, and cost.
CI/CD & Platform Engineering
- Build and maintain CI/CD pipelines using Terraform, Git, and IaC practices.
- Support platform operations for trading, risk, and capital market applications.
- Ensure reliable CI/CD delivery for both application and infrastructure components.
Required Skills & Experience
Technical Skills
- 6–10 years of experience in Platform Engineering, SRE, or DevOps roles.
- Strong hands-on experience with AWS, Linux/Windows, and hybrid cloud.
- Expertise in Terraform, IaC, Git, and CI/CD pipelines.
- Strong scripting and automation skills: Python, Shell, Ansible.
- Proficiency with monitoring and logging tools such as CloudWatch, Prometheus, Grafana, and ELK.
- Solid understanding of cloud networking, IAM, security, and VPC design.
Operational Capabilities
- Experience with patching, upgrades, DR, backup/recovery, and system hardening.
- Strong troubleshooting skills across apps, middleware, infra, and integrations.
- Familiarity with ITIL processes (incident, problem, change management).
Preferred
- Exposure to financial domain systems such as Front Arena, Calypso, or Murex.
- Background in automation-driven operations and performance tuning.
- Experience in capital markets or financial services environments.
Soft Skills
- High ownership and accountability.
- Calm and decisive in high-pressure situations.
- Strong communication and cross-team collaboration.
- Passion for automation, reliability, and continuous improvement.
Why Join Us?
- Opportunity to own platforms end-to-end and apply SRE principles to critical financial systems.
- Excellent compensation and benefits.
- Be part of a fast-growing FinTech startup environment.
Job Types: Full-time, Permanent
Benefits:
- Health insurance
- Provident Fund
Work Location: In person