Qureos

DevOps Engineer – AI Infrastructure

We are hiring an experienced DevOps Engineer with strong expertise in AI infrastructure, GPU environments, Kubernetes, and enterprise-grade platform engineering. This role is ideal for professionals who have deployed and managed production LLM environments within secure on-premises infrastructure.

Key Responsibilities

  • Lead deployment and management of Linux-based AI infrastructure environments.

  • Configure and maintain KVM virtualization platforms and GPU-enabled systems.

  • Deploy and manage containerized LLM serving environments in production.

  • Design scalable and secure Kubernetes-based infrastructure for AI workloads.

  • Implement CI/CD pipelines, automation frameworks, and infrastructure-as-code practices.

  • Apply security hardening, RBAC, encryption, secrets management, and audit-ready controls.

  • Monitor GPU utilization, infrastructure health, and system performance.

  • Support high availability, disaster recovery, backup, and failover strategies.

  • Troubleshoot infrastructure, GPU runtime, networking, and platform stability issues.

  • Prepare technical documentation, architecture diagrams, and operational runbooks.

Required Skills and Experience

  • 5+ years of experience in DevOps, Platform Engineering, or Infrastructure Engineering.

  • Strong Linux administration experience including networking, storage, security hardening, and performance tuning.

  • Hands-on experience working with NVIDIA GPU infrastructure such as H100, A100, H200, or equivalent.

  • Strong experience with CUDA drivers, GPU runtimes, GPU scheduling, and AI inference optimization.

  • Experience deploying production-grade LLM serving platforms using technologies like vLLM, TensorRT-LLM, or Triton Inference Server.

  • Strong hands-on experience with Docker, Kubernetes, and containerized AI workloads.

  • Experience with Infrastructure as Code and automation tools such as Ansible.

  • Strong understanding of DevSecOps practices, secure CI/CD pipelines, SAST integration, and secrets management.

  • Experience working with PostgreSQL, vector databases such as Qdrant, and observability tools.

  • Experience working in on-premises, air-gapped, or regulated enterprise environments.

Preferred

  • Experience designing scalable AI infrastructure environments.

  • Knowledge of HA/DR strategies and enterprise backup solutions.

  • Strong troubleshooting and documentation skills.

© 2026 Qureos. All rights reserved.