Qureos

Site Reliability Engineer

JOB DESCRIPTION

Job Title: Site Reliability Engineer (SRE)

Job Location: Remote

Experience: 10+ years

Salary: ₹25-30 LPA

Notice Period: Immediate

Job Role:

We are seeking an experienced Site Reliability Engineer (SRE) to accelerate several high-impact platform initiatives:

Internal Developer Portal (IDP) & Self-Service Automation – deliver a Backstage-style portal that standardises “golden-path” templates and lets engineers provision infrastructure and pipelines without SRE intervention, reducing toil and boosting developer productivity.

Observability & Reliability Engineering – define Service-Level Indicators (SLIs) and Service-Level Objectives (SLOs), and maintain error-budget policies that drive release decisions and incident response.

Infrastructure & Account Hardening – implement defence-in-depth controls (least-privilege IAM, encryption-by-default, vulnerability remediation) across multi-account AWS estates.

Service Mesh Enablement – roll out and operate Istio, Linkerd or AWS App Mesh to secure and observe service-to-service traffic across Kubernetes workloads.

Cloud Policy Management (Policy-as-Code) – enforce governance, compliance and security baselines using Open Policy Agent (OPA) and related tooling integrated into CI/CD.

Infrastructure-as-Code with AWS CDK & EKS – design, build and operate Kubernetes clusters and surrounding services using AWS CDK constructs for repeatable, version-controlled deployments.

Database Operations – support production databases such as MongoDB.

Required Qualifications

10+ years in SRE, DevOps or Platform Engineering roles operating production AWS workloads.

Hands-on expertise with AWS EKS, Kubernetes networking, Helm, Karpenter/Cluster-Autoscaler.

Proven delivery of service-mesh solutions (Istio, Linkerd, AWS App Mesh) for security & traffic management.

Deep understanding of SLI/SLO/error-budget methodologies and related monitoring/alerting stacks.

Proficiency in AWS CDK (TypeScript/Python preferred) and CloudFormation; comfortable writing reusable constructs and pipelines.

Strong automation skills in at least one language (Go, Python or TypeScript) plus Bash.

Experience implementing policy-as-code with OPA/Rego or similar, and integrating it into CI/CD.

Solid grounding in cloud security best practices (IAM, KMS, VPC design, OS hardening).

Excellent communication skills; able to translate reliability data into business impact and guide incident/post-mortem discussions.

Nice-to-Have

Exposure to Backstage, Port, Cortex or another IDP platform.

Familiarity with KEDA, Kubernetes Horizontal/Vertical Pod Autoscalers, and advanced cost-optimisation on AWS (Spot, Savings Plans).

Experience with HashiCorp Vault, Consul or AWS Secrets Manager.

Experience with chaos- and resilience-engineering practices (Gremlin, LitmusChaos, AWS FIS).

Job Type: Full-time

Pay: ₹2,500,000.00 - ₹3,000,000.00 per year

Work Location: Remote

© 2025 Qureos. All rights reserved.