Qureos

Senior Engineer, Defect Management & DevOps

As a Senior Engineer (L3) specializing in Defect Management & DevOps, you will play a critical role in driving operational excellence, ensuring defect-free delivery pipelines, and strengthening reliability across cloud-native platforms. You will collaborate closely with engineering, QA, SRE, and product teams to manage end-to-end defect processes, streamline automation, and enhance service observability. The role demands deep analytical capability, strong DevOps experience, and the ability to influence cross-functional improvements through data-driven insights and advanced troubleshooting.

You will act as a subject matter expert (SME) in DevOps and GCP/AWS, overseeing end-to-end release processes, governance, and delivery pipelines. This role requires leadership, deep technical knowledge, and excellent communication skills.

Core Responsibilities

  • Serve as the Subject Matter Expert (SME) for cloud platforms, primarily AWS (GCP exposure is a plus), providing guidance on cloud best practices, architectural decisions, and solution design.
  • Support customers with core Managed Services technologies, including Cloud, Automation, Terraform, CI/CD, and containerization.
  • Design, implement, and optimize cloud-native and DevOps solutions aligned with customer and organizational objectives.
  • Lead technical discussions, demos, and customer engagements while effectively communicating complex technical concepts to both technical and non-technical stakeholders.
  • Assist with team-building activities such as interviewing, onboarding, and aligning technical resources.
  • Provide technical leadership, coaching, and mentorship to junior team members.
  • Maintain strong project and situational awareness to ensure deliverables meet timelines and organizational expectations.
  • Develop high-quality documentation including architectures, workflows, runbooks, and other written deliverables.
  • Act as a technical expert in internal knowledge-sharing initiatives and external client interactions.
  • Influence cloud governance, operational policies, best practices, and process improvements across teams and customer environments.
  • Ensure precision, accuracy, and strong attention to detail across all tasks and deliverables.

Defect Management Responsibilities

  • Act as the SME for Defect Management processes, governance, tooling, and reporting.
  • Own and manage the full defect lifecycle, including logging, triage, prioritization, RCA, corrective actions, and closure.
  • Partner with Development, QA, SRE, and Product teams to ensure timely resolution of high-impact issues.
  • Establish and maintain defect dashboards, KPIs, and trend analytics to drive quality and process improvements.
  • Develop standardized runbooks, escalation workflows, and operational procedures for defect handling.
  • Lead cross-team Root Cause Analysis (RCA) investigations and drive Corrective and Preventive Actions (CAPA) implementations.
  • Improve operational readiness through enhanced monitoring, alerting, and structured incident-to-defect workflows.
  • Provide guidance on CI/CD optimization, automation strategies, infrastructure stability, and reliability engineering.
  • Mentor junior engineers in DevOps principles, tooling, defect analysis techniques, and troubleshooting best practices.

Requirements

  • Defect Management Expertise:
      • Full ownership of the defect lifecycle, ensuring SLA adherence.
      • Deep understanding of SDLC, change management, and ITIL best practices.
      • Ability to analyze defect patterns, severity trends, root causes, and long-term systemic issues.
      • Ability to conduct structured RCA using 5 Whys, Fishbone diagrams, and Fault Tree Analysis.
      • Ability to define and enforce severity, categorization, and prioritization standards.
      • Ability to create dashboards and quality metrics that drive continuous improvement.
  • Tools & Skills:
      • Strong expertise in JIRA workflows, automation rules, dashboards, and reporting.
      • Ability to visualize defect trends and quality metrics effectively.
  • Observability, Monitoring & SIEM Tools:
      • Hands-on experience with Dynatrace, Datadog, Prometheus, Grafana, CloudWatch, or similar tooling.
      • Skill in APM analysis, log correlation, anomaly detection, service mapping, and performance troubleshooting.
      • Experience building and maintaining dashboards and alert frameworks.
      • Experience integrating monitoring insights with DevOps and operational workflows.
      • Exposure to SIEM event analysis for operational and security correlation.

Core DevOps Responsibilities

  • Build, enhance, and support CI/CD pipelines across multiple environments using AWS CodePipeline, CodeBuild, CodeDeploy, and Git-based workflows.
  • Collaborate on automation initiatives using Terraform, CloudFormation, AWS CDK, or equivalent IaC tools to standardize and streamline deployments.
  • Deploy and manage AWS cloud-native services including EKS, ECS, Lambda, API Gateway, S3, IAM, and supporting architectures.
  • Work with containers and orchestration platforms such as Kubernetes, EKS, ECS, and AKS (where required).
  • Implement deployment best practices such as blue/green, rolling updates, and automated rollback strategies to ensure safe, repeatable releases.
  • Troubleshoot complex deployment issues, environment drift, infrastructure failures, performance bottlenecks, and service-level degradations.
  • Implement and maintain observability using CloudWatch, Prometheus, Grafana, Datadog, Dynatrace, or equivalent monitoring stacks.
  • Ensure AWS workloads adhere to resiliency, compliance, security, and operational excellence guidelines.
  • Strong hands-on, production-grade DevOps experience in AWS (primary cloud).
  • Deep expertise in Kubernetes, containerized workloads, microservices, autoscaling, and cloud networking.
  • Advanced troubleshooting across AWS services, distributed systems, CI/CD pipelines, and API-driven workflows.
  • Knowledge of AWS cost optimization, tagging, FinOps alignment, and resource lifecycle governance.
  • Exposure to building or maintaining CI/CD pipelines within GCP ecosystems (Cloud Build, GKE, Artifact Registry, etc.).
  • Ability to work with GCP cloud-native services where required, ensuring consistency across hybrid/multi-cloud deployments.
  • Familiarity with GCP IAM, VPC architecture, and core compute/storage/networking components is a plus.

General Qualifications

  • Strong communication, leadership, and mentoring capabilities.
  • 6–10+ years of experience in DevOps, SRE, QA Engineering, or Cloud Operations.
  • Expert-level AWS knowledge (GCP exposure would be a plus).
  • Strong command of IaC tools such as Terraform, CloudFormation, CDK.
  • Experience with CI/CD systems: Jenkins, GitLab CI, AWS CodePipeline.
  • Proficiency with Docker, Kubernetes, and container orchestration.
  • Experience with monitoring technologies: Datadog, Grafana, Prometheus.
  • Experience with JIRA workflows and project tracking.
  • Ability to excel in dynamic, fast-paced environments.

Expectations

  • Demonstrate deep expertise across DevOps, cloud platforms, automation, and engineering practices.
  • Balance hands-on delivery with leadership responsibilities and strategic initiatives.
  • Continuously assess, refine, and enhance processes, documentation, and operational workflows.
  • Adapt effectively to evolving customer requirements, project priorities, and technology landscapes.
  • Engage confidently with senior stakeholders, providing clear communication and technical guidance.
  • Lead scoping, planning, and methodology definition for major technical initiatives and transformations.
  • Contribute to the development of new engineering standards, frameworks, and best practices across teams.
  • Take senior-level ownership of critical defects, escalations, and operational issues, driving them to resolution.
  • Influence and drive cross-team improvements in tooling, quality, automation, and operational efficiency.
  • Ensure prevention mechanisms, automation guardrails, and reliability practices are embedded early in delivery cycles.
  • Lead initiatives focused on defect prevention, observability enhancements, and overall DevOps maturity uplift.
  • Participate in on-call rotations and provide Tier-3 technical expertise for complex issues.
  • Continuously propose, design, and implement enhancements across tooling, automation, and operational frameworks.

© 2025 Qureos. All rights reserved.