Founded in 1999 in the beautiful Smoky Mountains of East Tennessee, Cadre5 provides innovative technical solutions to our customers locally and nationally. Our Cadre5 Lab Partners division has partnered with the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL) to recruit a Kubernetes Platform Engineer for the American Science Cloud (AmSC) initiative.
AmSC is a first-of-its-kind, federally funded cloud infrastructure and API platform designed to accelerate AI model development, data sharing, and large-scale computational science across the U.S. Department of Energy (DOE).
Located in Oak Ridge, TN, near Knoxville, ORNL is a premier research institution that delivers the scientific discoveries and technical breakthroughs needed to advance energy, national security, and advanced computing while providing economic benefit to the nation. ORNL addresses national needs through impactful research and world-leading research centers.
**Please note: The first step in the interview process requires candidates to join a Microsoft Teams meeting with the video turned on.**
This is a full-time position that can be performed remotely. Occasional travel to the Oak Ridge facility may be required.
- Working with highly talented team members
- 3 weeks’ vacation
- Excellent medical insurance, including employer-paid benefits
Cluster Operations & Administration
- Manage the full lifecycle of Kubernetes clusters (on-premises K3s/RKE2, GKE, and EKS), including upgrades, security patching, scaling, and capacity planning
- Troubleshoot cluster-level issues including control plane problems, node failures, and resource constraints
- Implement and maintain cluster security hardening based on CIS benchmarks and organizational security policies
- Manage etcd cluster health, backup procedures, and disaster recovery capabilities
- Monitor cluster performance and optimize resource utilization across multi-tenant workloads
- Coordinate with the datacenter operations team on physical infrastructure changes and maintenance windows
Networking & Cilium CNI
- Implement, configure, and maintain Cilium CNI across on-premises and cloud Kubernetes environments
- Design and enforce network policies to achieve secure multi-tenant isolation
- Troubleshoot complex pod networking issues including DNS resolution, service discovery, and connectivity problems
- Configure and maintain BGP peering with physical network infrastructure for on-premises integration
- Work with the network engineering team on firewall rules, VLANs, IPv6 networking, and network architecture
Required Qualifications
- Typically requires a minimum of 8 years of related experience with a Bachelor’s degree, 6 years of related experience with a Master’s degree, or equivalent experience
- Demonstrated experience administering Kubernetes on on-premises infrastructure (K3s, RKE2, or similar bare-metal distributions)
- Experience with cloud-managed Kubernetes (GKE and/or EKS)
- Strong understanding of Linux networking fundamentals: iptables/nftables, routing tables, DNS, TCP/IP stack, network troubleshooting
- Experience with GitOps methodologies and tools such as ArgoCD or Flux
- Proficiency in scripting and automation with Bash, Python, and Go
- Production experience with Cilium CNI or an equivalent CNI
- Ability to work collaboratively in a team environment and communicate technical concepts clearly
- Understanding of Kubernetes security best practices including Pod Security Standards, RBAC, and secrets management
- Experience with Google Cloud Platform (GCP) and/or Amazon Web Services (AWS)
- Ability to obtain and maintain a Department of Energy (DOE) "Q" clearance (required); this clearance requires U.S. citizenship
Preferred Qualifications
- Go programming experience for operator maintenance and platform tooling development
- CKA (Certified Kubernetes Administrator) or CKS (Certified Kubernetes Security Specialist) certification
- Background in BGP routing and network engineering concepts
- IPv6 networking experience
- Infrastructure as Code experience with Terraform or Ansible
- Experience with internal developer platform (IDP) tools such as Backstage
- Experience with service mesh technologies (Istio, Linkerd)
- Strong understanding of code review practices and familiarity with GitHub and GitLab workflows
Benefits
Cadre5 offers excellent pay and benefits, including full medical, dental, and vision coverage, a 401(k) match, 15 days of PTO, and 10 holidays.
Cadre5 is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. Cadre5 is an E-Verify Employer.