- Extensive knowledge of HPC technologies and workload schedulers such as Slurm and/or Altair PBS Pro.
- Proficient in HPC cluster management tools, including HPE Performance Cluster Manager (HPCM) and/or NVIDIA Base Command Manager.
- Good understanding of high-speed networking stacks (InfiniBand, Mellanox) and performance tuning of HPC components.
- Solid grasp of high-speed networking technologies, such as InfiniBand and Ethernet.
- Containerization & Orchestration
- Extensive hands-on experience with containerization technologies such as Docker, Podman, and Singularity.
- Proficiency with at least two container orchestration platforms: CNCF Kubernetes, Red Hat OpenShift, SUSE Rancher (RKE/K3s), Canonical Charmed Kubernetes.
- Strong understanding of GPU technologies, including the NVIDIA GPU Operator for Kubernetes-based environments and DCGM (Data Center GPU Manager) for GPU health and performance monitoring.
Desired Skills
Digital: Kubernetes | Artificial Intelligence
Desired Candidate Profile
Qualifications: Bachelor of Engineering