Qureos

Site Reliability Engineer

Key Responsibilities:

Air-Gapped & Restricted Environment Operations:

  • Design, deploy, and operate platforms in air-gapped and network-isolated environments
  • Manage software lifecycle (images, patches, updates) in offline or semi-connected setups
  • Implement secure data ingestion and export workflows over data-diode (one-way) connectivity
  • Define and enforce operational procedures for restricted-access environments

Azure Platform Engineering:

  • Design and operate Azure-based infrastructure, including AKS and associated integrations
  • Support Azure Blob Storage for controlled data transfer and secure storage
  • Integrate Azure workloads with on-prem / isolated OpenShift clusters

Kubernetes & OpenShift:

  • Operate and maintain AKS and OpenShift clusters in connected and air-gapped modes
  • Deploy, manage, and troubleshoot containerized workloads, including system services
  • Implement cluster hardening, RBAC, network policies, and compliance controls

GPU & AI Workloads:

  • Deploy and operate GPU-enabled Kubernetes workloads for AI/ML use cases
  • Manage NVIDIA GPU drivers, operators, and scheduling within Kubernetes
  • Optimize performance and reliability of AI inference and training containers

Data, Secrets & Security:

  • Administer and support PostgreSQL (HA, backups, restore, performance tuning)
  • Implement secrets management, encryption, and key lifecycle management

Job Type: Contract
Contract length: 6 months
