We are seeking a DevOps Engineer with a BS in Computer Science, Engineering, or equivalent practical experience, and at least 5 years of relevant experience in DevOps, SRE, Cloud Infrastructure, or Platform Engineering roles. This position requires strong hands-on expertise in Linux, multi-cloud environments, containerization, infrastructure automation, CI/CD, observability, and cloud security.
This role is focused on building secure, reliable, and scalable systems and platforms that help engineering teams ship faster with less operational overhead. You will work across cloud infrastructure, Kubernetes, developer tooling, and production operations, with a strong emphasis on automation, resilience, and modern DevSecOps practices.
Responsibilities
- Design, provision, and maintain cloud infrastructure using Infrastructure as Code, with Terraform, OpenTofu, or Terragrunt as the primary tooling; experience with CloudFormation is a plus.
- Build, operate, and optimize container platforms on Kubernetes and managed services such as EKS, AKS, and GKE.
- Implement modern deployment workflows using GitOps tools such as Argo CD, Flux, Helm, and Kustomize.
- Build and optimize secure CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins, or equivalent platforms.
- Improve delivery security using short-lived credentials and workload identity patterns such as OIDC, reducing reliance on long-lived secrets.
- Establish and maintain DevSecOps practices including secrets management, IaC scanning, container and dependency scanning, policy-as-code, and software supply chain controls such as SBOMs, artifact signing, and verification.
- Implement observability standards using OpenTelemetry, Prometheus, Grafana, Elastic/OpenSearch, Datadog, or similar tools for logs, metrics, traces, dashboards, and actionable alerting.
- Define and operate SLIs, SLOs, and reliability processes aligned with SRE principles.
- Design secure cloud networking across AWS, Azure, and/or GCP, including private networking, IAM, VPN/connectivity, firewalls, security groups, and service-to-service access controls.
- Support incident response, postmortems, disaster recovery readiness, and operational improvements based on production learnings.
- Partner with engineering teams to improve developer experience through reusable templates, self-service platform capabilities, golden paths, and automation.
- Drive infrastructure cost optimization and FinOps practices across cloud, Kubernetes, and managed services.
- Collaborate with engineering, security, and product teams to improve platform reliability, scalability, performance, and security posture.
Requirements
- BS in Computer Science, Engineering, or equivalent practical experience.
- 5+ years of relevant experience in DevOps, SRE, Cloud Infrastructure, or Platform Engineering roles.
- Strong Linux administration skills and solid scripting ability in Bash and/or Python.
- Proven hands-on expertise with Terraform in production environments.
- Strong experience with Docker, Kubernetes, and managed Kubernetes services.
- Experience working across at least 2 major cloud providers: AWS, Azure, and/or GCP.
- Strong understanding of CI/CD, GitOps workflows, and release automation.
- Hands-on experience with observability platforms and telemetry pipelines for metrics, logs, traces, dashboards, and alerting.
- Practical experience with DevSecOps controls including secrets management, vulnerability management, policy enforcement, and software supply chain security.
- Strong understanding of cloud networking, IAM, storage, compute, and security services in multi-cloud environments.
- Experience participating in incident response, root cause analysis, and reliability improvements.
- Strong problem-solving skills, analytical thinking, and the ability to communicate technical concepts clearly.
- Fluent English communication, both written and verbal.
- Ability to use AI tools such as Claude, ChatGPT, Copilot, or similar assistants to accelerate troubleshooting, automation, and delivery.
- Ability to thrive in collaborative, distributed team environments.
Nice to Have
- Cloud certifications in AWS, Azure, or GCP.
- Experience with Vault, External Secrets, SOPS, OPA/Gatekeeper, Kyverno, Cilium, Istio, Linkerd, or Crossplane.
- Experience with internal developer platforms or developer portals such as Backstage.
- Familiarity with compliance and governance requirements such as SOC 2, ISO 27001, HIPAA, or PCI DSS.
- Exposure to DORA metrics, platform engineering practices, and cost governance frameworks.