Fynd is India’s largest omnichannel platform and a multi-platform tech company specializing in retail technology and products in AI, ML, big data, image editing, and the learning space. It provides a unified platform for businesses to seamlessly manage online and offline sales, store operations, inventory, and customer engagement. Serving over 2,300 brands, Fynd is at the forefront of retail technology, transforming customer experiences and business processes across various industries.
We're looking for an SDE 3 – Site Reliability Engineering (SRE) to join our Engineering Team. The Engineering Team forms the backbone of our core business. We build and operate critical systems that ensure the reliability, scalability, and performance of our services across the Fynd ecosystem. This includes infrastructure automation, monitoring, deployment pipelines, and building a culture of reliability throughout engineering. The SRE team works closely with product engineers, DevOps, and platform teams to keep our systems running smoothly and efficiently.
What will you do at Fynd?
- Lead, mentor, and grow a team of 2-5 Site Reliability Engineers.
- Define, implement, and advocate SRE best practices such as SLAs, SLOs, SLIs, error budgets, and chaos engineering.
- Build and maintain automated CI/CD pipelines and infrastructure using tools like Terraform, Jenkins, or GitHub Actions.
- Own the observability stack: monitoring, alerting, logging, and tracing across microservices and platforms.
- Improve the reliability and scalability of services by proactively identifying bottlenecks and automating manual ops tasks.
- Drive incident response practices, including on-call rotations, runbooks, and blameless postmortems.
- Ensure high availability and uptime across distributed systems hosted on AWS.
- Collaborate with cross-functional teams to ensure the architecture is cloud-native, secure, and fault-tolerant.
- Implement and optimize systems for cost-efficiency, auto-scaling, and performance.
- Contribute to open source or write technical blogs to share insights and practices with the broader tech community.
- This is a startup, so expect rapid changes and plenty of opportunities to take ownership and drive new initiatives.
Some Specific Requirements
- At least 3 years of experience leading SRE/DevOps/Infrastructure teams, with 5+ years overall in backend, systems, or infrastructure roles.
- Strong experience managing distributed systems and microservices at scale.
- Good understanding of Linux, networking, load balancing, and security concepts.
- Hands-on experience with AWS services like EC2, ELB, Auto Scaling, CloudFront, S3, and CloudWatch.
- Experience with container technologies and orchestration; Docker and Kubernetes are a must.
- Strong proficiency with Infrastructure-as-Code tools like Terraform, CloudFormation, or Pulumi.
- Familiarity with observability tools like Prometheus, Grafana, ELK, or Datadog.
- Programming/scripting skills in Python, Go, Bash, or similar for automation and tooling.
- Understanding of message queues and event-driven architectures using Kafka or RabbitMQ.
- Ability to manage incidents, write detailed postmortems, and improve reliability across teams and services.
- Comfortable working in a fast-paced environment with a strong culture of ownership and continuous improvement.
What do we offer?
Growth
Growth knows no bounds, as we foster an environment that encourages creativity, embraces challenges, and cultivates a culture of continuous expansion. We are looking at new product lines, international markets and brilliant people to grow even further. We teach, groom and nurture our people to become leaders. You get to grow with a company that is growing exponentially.
Flex University
We help you upskill by organising in-house courses on important subjects.
Learning Wallet: You can also take an external course to upskill and grow, and we reimburse the cost.
Culture
Community and team-building activities
We host weekly, quarterly, and annual events and parties.
Wellness
Mediclaim policy for you, your parents, spouse, and kids
Access to an experienced therapist to support mental health, productivity, and work-life balance
We work from the office 5 days a week to promote collaboration and teamwork. Join us to make an impact in an engaging, in-person environment!