Qureos

Data Center Engineer

We are looking for an Infrastructure Architect (AI & Data Center) - Remote / Telecommute for our client in San Jose, CA.

Job Title: Infrastructure Architect (AI & Data Center) - Remote / Telecommute

Job Location: San Jose, CA

Job Type: Contract

Job Overview:

Pay Range: $71.16/hr - $74.90/hr

Requirements/Must Have:

  • Bachelor's degree in Information Technology, Business, or a related field.
  • 5+ years of experience in Data Center projects in an enterprise environment.
  • Knowledge of Cisco, Dell, HPE, Supermicro hardware.
  • Deep knowledge of Cisco HW, NVIDIA GPU architectures (H100, B200, RTX 6000 Pro) and high-speed interconnects (RoCE v2, InfiniBand).
  • Extensive knowledge and experience with Data Center infrastructure.
  • Proficiency with asset management and automation tools (Netbox, ServiceNow, Terraform, or OpenTofu).
  • Experience in Data Center lifecycle management, DC HW capacity planning, decommissioning, defragmentation, and building complex financial showback models for shared infrastructure.
  • Proven expertise in Kubernetes (NKP preferred) and NVIDIA AI Enterprise stacks (GPU Operator, DCGM, Triton, vLLM).

Responsibilities:

  • Lead the architectural design and refinement of the client GPU-as-a-Service (GPUaaS) platform, ensuring a seamless experience for internal R&D, QA, and Sales teams.
  • Provide technical leadership in key initiatives such as client Validated Designs (NVD) for the AI Factory, incorporating NVIDIA MGX/HGX architectures and high-density Cisco nodes (e.g., UCS 845A).
  • Architect the Management Cluster control plane (NKP, Prism Central, NuDeploy) to ensure it is decoupled from GPU compute nodes for maximum efficiency.
  • Implement policy-driven placement of workloads across on-prem and cloud-burst environments.
  • Design a solution for a centralized Data Center Asset Inventory system, ensuring real-time visibility into all hardware assets, including CPUs, GPUs, Virtual Machines, and networking.
  • Develop a comprehensive Hardware Lifecycle Management strategy, including procurement forecasting, 'rack and stack' operationalization, and decommissioning of legacy systems (G3/G4/G5).
  • Lead 'Tiger Team' initiatives to navigate supply chain constraints, ensuring critical release milestones are not delayed by hardware shortages.
  • Enforce strict Security Standards for Data Center HW Provisioning.
  • Implement network segmentation for all critical applications.
  • Ensure all infrastructure meets SOC 2 and ISO 27001 compliance objectives while maintaining low-latency performance.
  • Provide required architecture and designs during the project intake process. Review all demands and guide teams toward the right architecture before they become approved projects.
  • Partner with the security team to provide guidelines for upcoming projects.
  • Participate in and lead special projects as an architect.

Nice to Have:

  • Experience managing (as an architect) massive-scale data center environments (1,000+ nodes).
  • Knowledge of client Cloud Infrastructure (NCI), AHV, and Prism Central.
  • Strong background in MLOps and automated pipeline integration (Kubeflow/MLflow).

For applications and inquiries, contact: hirings@openkyber.com

© 2026 Qureos. All rights reserved.