Kubernetes Security Engineer

L1 SRE Operations Engineer in Dallas, TX.

Start date is 5/11; the role will last for 1 year. This is a temp-to-perm role that can be based in Dallas, TX or Overland Park, KS.

Pay Rate: 52.00-58.00/hr

Skills (EXPERT/ADVANCED/NONE):
System & Infrastructure Monitoring
Runbook Execution
Incident Triage & Communication
Kubernetes (Cloud or on-prem) operations knowledge
Scripting (Python, Bash, PowerShell)
Networking & Security Awareness
Documentation & Knowledge Capture

Questions (must reply YES to ALL):
Do you have System & Infrastructure Monitoring experience?
Do you have Runbook Execution experience?

Description: SRE Operations Engineer

The L1 SRE is the first line of defense in monitoring, triaging, and executing standardized operational tasks for all enterprise applications running on standard patterns and platforms such as Kubernetes, APIs, WAF, databases, API Proxy (Gloo, APIGEE), Kafka, and Cloud (AWS/Azure/Google Cloud Platform). They will follow runbooks, leverage automation, and escalate appropriately to minimize downtime.

Skills

Mandatory Skills (Must-Have)

1. System & Infrastructure Monitoring
Expectation: Ability to use monitoring dashboards (e.g., Grafana, Datadog, Splunk, Argos, AIOps) to identify anomalies, follow alert workflows, and escalate when thresholds are breached.
Example: When a Kubernetes pod crash-loop is flagged in Prometheus, L1 should validate it against runbooks, check pod logs, and escalate if restart attempts fail.

2. Runbook Execution
Expectation: Strictly follow documented steps to resolve standard incidents; escalate when steps do not apply or fail.
Example: Use a provided runbook to restart a failed API proxy service; if the error persists beyond the documented steps, escalate to L2.

3. Incident Triage & Communication
Expectation: Perform first-line triage of alerts, gather logs/metrics, categorize severity, and notify stakeholders in clear, concise language.
Example: For a database connection timeout, collect error logs, verify service reachability, and provide a detailed incident note to L2 before escalation.

4. Kubernetes (Cloud or on-prem) operations knowledge
Expectation: Ability to check pod status, understand logs, and verify service endpoints using kubectl and monitoring tools.
Example: Run kubectl get pods -n to verify that deployments are healthy.

5. Scripting (Python, Bash, PowerShell)
Expectation: Able to read and make small edits to scripts to automate repetitive checks.
Example: Modify a Bash script to include an additional log path in a health check.

6. Networking & Security Awareness
Expectation: Understand basic network troubleshooting tools (ping, netstat, curl, traceroute) and know when issues may be related to a firewall, WAF, or proxy.
Example: For an unreachable service, confirm DNS resolution and connectivity before escalating to L2.

7. Documentation & Knowledge Capture
Expectation: Accurately record steps taken during incidents and suggest runbook updates where gaps exist.
Example: After handling an alert for disk usage, note missing cleanup steps in the runbook and flag it for update.

Preferred Skills (Nice-to-Have)

1. Cloud Platform Familiarity (AWS, Azure, Google Cloud Platform)
Expectation: Understand the basics of cloud services (VMs, load balancers, storage) and how to navigate a cloud console.
Example: Use the AWS Console to check EC2 instance health status when a service alert is triggered.

2. Database Basics (SQL/NoSQL)
Expectation: Run simple queries to validate DB connectivity and health.
Example: Execute SELECT 1; to verify a database is reachable.

3. Automation & Self-Service Mindset
Expectation: Identify repetitive manual steps and propose candidates for automation.
Example: Flag that manual log collection during outages could be replaced with a script.

4. Exposure to Incident Management Tools (xMatters, ServiceNow, Jira, etc.)
Expectation: Comfortable working within ITSM/incident workflows.
Example: Log incident details in ServiceNow with accurate categorization and timestamps.

5. AI/Chatbot-Assisted Ops (emerging skill)
Expectation: Use AI assistants to search runbooks or suggest remediation steps.
Example: Ask an AI ops assistant to summarize logs before escalation.

Qualifications

2-5 years in an IT operations, NOC, or SRE/DevOps engineer role.
Kubernetes 101, Linux 101, Networking 101
Understanding of cloud-ready applications
Understanding of observability tools (Prometheus, Grafana, ELK, Splunk, etc.)
Strong troubleshooting mindset and ability to follow structured workflows, e.g., 5 Whys and Fishbone
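
For candidates gauging the expected level, the pod crash-loop triage described under the mandatory skills could look roughly like the Python sketch below. It is an illustration only, not the employer's tooling: the function names, the namespace argument, and the sample pod data are hypothetical. It assumes the standard JSON shape returned by kubectl get pods -o json and flags containers waiting in CrashLoopBackOff so they can be checked against the runbook before escalation.

```python
import json
import subprocess


def find_crashlooping_pods(pod_list):
    """Return (pod, container, restart_count) for containers in CrashLoopBackOff.

    pod_list is the parsed JSON produced by `kubectl get pods -o json`.
    """
    flagged = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # A crash-looping container sits in the "waiting" state with this reason.
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                flagged.append((pod_name, cs["name"], cs.get("restartCount", 0)))
    return flagged


def get_pods(namespace):
    """Fetch the pod list for a namespace (needs kubectl and cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hypothetical sample data, trimmed to the fields the check reads.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "orders-api-7d9f"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "app",
                        "restartCount": 6,
                        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                    }
                ]
            },
        },
        {
            "metadata": {"name": "orders-db-0"},
            "status": {
                "containerStatuses": [
                    {"name": "db", "restartCount": 0, "state": {"running": {}}}
                ]
            },
        },
    ]
}

if __name__ == "__main__":
    for pod, container, restarts in find_crashlooping_pods(SAMPLE):
        print(f"{pod}/{container}: CrashLoopBackOff after {restarts} restarts")
```

Against a live cluster, the same check would be run as find_crashlooping_pods(get_pods(namespace)), with the namespace taken from the alert. Keeping the parsing separate from the kubectl call lets the check be exercised on saved output without cluster access.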

For applications and inquiries, contact: hirings@openkyber.com
