C&DE-CMT-SRE Operations Engineer with Kubernetes, APIs, WAF, databases, API Proxy (Gloo, APIGEE), Kafka, and Cloud (AWS/Azure/Google Cloud Platform) - Onsite work - Dallas TX or Overland Park KS
Requisition Name : C&DE-CMT-SRE Operations Engineer
Start Date : 5/11/2026
Duration : 55 Weeks
Services Location : TX/Dallas
Max Rate : $57 per hour on W2 or $63 per hour on Corp-to-Corp (all inclusive)
Description Of Services : SRE Operations Engineer. The L1 SRE is the first line of defense in monitoring, triaging, and executing standardized operational tasks for all enterprise applications running on standard patterns and platforms such as Kubernetes, APIs, WAF, databases, API Proxy (Gloo, APIGEE), Kafka, and Cloud (AWS/Azure/Google Cloud Platform). They will follow runbooks, leverage automation, and escalate appropriately to minimize downtime.
Skills
Mandatory Skills (Must-Have):
- System & Infrastructure Monitoring. Expectation: Ability to use monitoring dashboards (e.g., Grafana, Datadog, Splunk, Argos, AIOps) to identify anomalies, follow alert workflows, and escalate when thresholds are breached.
- Runbook Execution. Expectation: Strictly follow documented steps to resolve standard incidents; escalate when the steps do not apply or fail.
- Incident Triage & Communication. Expectation: Perform first-line triage of alerts, gather logs/metrics, categorize severity, and notify stakeholders in clear, concise language.
- Kubernetes Operations Knowledge (cloud or on-prem). Expectation: Ability to check pod status, interpret logs, and verify service endpoints using kubectl and monitoring tools.
- Scripting (Python, Bash, PowerShell). Expectation: Able to read scripts and make small edits to them to automate repetitive checks.
- Networking & Security Awareness. Expectation: Understand basic network troubleshooting tools (ping, netstat, curl, traceroute) and recognize when an issue may be related to a firewall, WAF, or proxy.
- Documentation & Knowledge Capture. Expectation: Accurately record the steps taken during incidents and suggest runbook updates where gaps exist.
Preferred Skills (Nice-to-Have):
- Cloud Platform Familiarity (AWS, Azure, Google Cloud Platform). Expectation: Understand the basics of cloud services (VMs, load balancers, storage) and how to navigate a cloud console.
- Database Basics (SQL/NoSQL). Expectation: Run simple queries to validate database connectivity and health.
- Automation & Self-Service Mindset. Expectation: Identify repetitive manual steps and propose them as candidates for automation.
- Exposure to Incident Management Tools (xMatters, ServiceNow, Jira, etc.). Expectation: Comfortable working within ITSM/incident workflows.
- AI/Chatbot-Assisted Ops (emerging skill). Expectation: Use AI assistants to search runbooks or suggest remediation steps.
Qualifications:
- 2-5 years in an IT operations, NOC, or SRE/DevOps engineer role.
- Foundational (101-level) knowledge of Kubernetes, Linux, and networking.
- Understanding of cloud-ready applications.
- Understanding of observability tools (Prometheus, Grafana, ELK, Splunk, etc.).
- Strong troubleshooting mindset and ability to follow structured workflows, e.g., 5 Whys and Fishbone diagrams.
Deliverables :
- Monitor system health, alerts, dashboards, and logs across cloud and on-prem infrastructure.
- Isolate whether a functional issue lies with the application or the platform.
- Execute standardized runbooks for incident resolution, deployments, and routine tasks.
- Perform initial triage of incidents and escalate to L2/L2+ as needed to mitigate the issue and reach a bypass.
- Document new issues, gaps in runbooks, and automation opportunities.
- Provide excellent communication to stakeholders during incidents.
- Support onboarding of new applications into the operations framework.
For applications and inquiries, contact: hirings@openkyber.com