Data Center Engineer

We are looking for an Infrastructure Architect (AI & Data Center) - Remote / Telecommute for our client in San Jose, CA.

Job Title: Infrastructure Architect (AI & Data Center) - Remote / Telecommute

Job Location: San Jose, CA

Job Type: Contract

Job Overview:

Pay Range: $71.16/hr - $74.90/hr

Requirement/Must Have:

Bachelor's degree in Information Technology, Business, or a related field.
5+ years of experience in Data Center projects in an enterprise environment.
Knowledge of Cisco, Dell, HPE, and Supermicro hardware.
Deep knowledge of Cisco HW, NVIDIA GPU architectures (H100, B200, RTX 6000 Pro) and high-speed interconnects (RoCE v2, InfiniBand).
Extensive knowledge and experience with Data Center infrastructure.
Proficiency with asset management and automation tools (Netbox, ServiceNow, Terraform, or OpenTofu).
Experience in Data Center lifecycle management, DC HW capacity planning, decommissioning, defragmentation, and building complex financial showback models for shared infrastructure.
Proven expertise in Kubernetes (NKP preferred) and NVIDIA AI Enterprise stacks (GPU Operator, DCGM, Triton, vLLM).

Responsibilities:

Lead the architectural design and refinement of the client's GPU-as-a-Service (GPUaaS) platform, ensuring a seamless experience for internal R&D, QA, and Sales teams.
Provide technical leadership in key initiatives such as client Validated Designs (NVD) for the AI Factory, incorporating NVIDIA MGX/HGX architectures and high-density Cisco nodes (e.g., UCS 845A).
Architect the Management Cluster control plane (NKP, Prism Central, NuDeploy) to ensure it is decoupled from GPU compute nodes for maximum efficiency.
Implement policy-driven placement of workloads across on-prem and cloud-burst environments.
Design a solution for a centralized Data Center Asset Inventory system, ensuring real-time visibility into all hardware assets, including CPUs, GPUs, Virtual Machines, and networking.
Develop a comprehensive Hardware Lifecycle Management strategy, including procurement forecasting, 'rack and stack' operationalization, and decommissioning of legacy systems (G3/G4/G5).
Lead 'Tiger Team' initiatives to navigate supply chain constraints, ensuring critical release milestones are not delayed by hardware shortages.
Enforce strict Security Standards for Data Center HW Provisioning.
Implement network segmentation for all critical applications.
Ensure all infrastructure meets SOC 2 and ISO 27001 compliance objectives while maintaining low-latency performance.
Provide the required architecture and designs during the project intake process. Review all demands and guide teams toward the right architecture before they become approved projects.
Partner with the security team and provide guidelines for upcoming projects.
Participate in and lead special projects as an architect.

Nice to Have:

Experience managing (as an architect) massive-scale data center environments (1,000+ nodes).
Knowledge of client Cloud Infrastructure (NCI), AHV, and Prism Central.
Strong background in MLOps and automated pipeline integration (Kubeflow/MLflow).

For applications and inquiries, contact: hirings@openkyber.com
