Integrant is looking for game changers to join our team as "Lead AI Platform Engineer".
The Lead AI Platform Engineer is responsible for bridging AI workloads with production-grade infrastructure, with a strong focus on the NVIDIA AI stack, enabling high-performance, scalable, and optimized AI systems.
This role focuses on model optimization, runtime efficiency, and GPU utilization, ensuring that AI workloads are production-ready, cost-efficient, and performant across enterprise environments.
Roles and Responsibilities
- Translate AI/ML workloads into optimized infrastructure and deployment strategies
- Optimize model performance across GPU environments (latency, throughput, memory utilization)
- Design and implement inference and training pipelines using NVIDIA stack tools (TensorRT, Triton, NIM)
- Convert and optimize models across frameworks (PyTorch → ONNX → TensorRT; see the export sketch after this list)
- Analyze and resolve performance bottlenecks using profiling tools (GPU, memory, network)
- Improve GPU utilization and scheduling efficiency across clusters
- Design scalable distributed training and inference architectures
- Work closely with customers to define AI infrastructure strategies and deployment models
- Support production deployments including monitoring, rollback, and performance validation
- Conduct applied research to improve model efficiency and infrastructure utilization
- Mentor team members on AI infrastructure, optimization, and GPU systems
- Track experiments with tools such as MLflow, W&B, or Neptune, logging parameters, metrics, and artifacts for comparison (see the tracking sketch after this list)
- Detect and diagnose post-deployment model degradation: concept drift, data pipeline changes, and traffic pattern shifts
- Apply root cause analysis (RCA) to ML systems: isolating variables and reproducing issues
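For illustration, here is a minimal sketch of the PyTorch → ONNX step in the model-conversion responsibility above. The toy model, tensor shapes, and file name are placeholders, not part of this role; the exported file would then typically be compiled into a TensorRT engine, for example with trtexec.

```python
import torch
import torch.nn as nn

# Toy model standing in for a real workload (placeholder).
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

dummy_input = torch.randn(1, 512)

# Export to ONNX with a dynamic batch axis, so a downstream
# TensorRT build can serve a range of batch sizes.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
    opset_version=17,
)

# Next step (outside Python), e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```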
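And a minimal sketch of the experiment-tracking responsibility above, using MLflow; the run name, parameters, metric values, and artifact path are placeholders.

```python
import mlflow

# Log one tuning run so configurations can be compared later.
with mlflow.start_run(run_name="trt-fp16-batch32"):  # placeholder name
    mlflow.log_param("precision", "fp16")
    mlflow.log_param("batch_size", 32)
    mlflow.log_metric("p99_latency_ms", 41.7)      # placeholder value
    mlflow.log_metric("throughput_qps", 1850.0)    # placeholder value
    mlflow.log_artifact("model.onnx")  # attach the exported model (placeholder path)
```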
Requirements
- 8+ years of experience in AI/ML systems, HPC, and AI infrastructure
- Strong proficiency in Python
- Strong experience with GPU-based AI workloads and performance optimization
- Deep understanding of model optimization techniques (quantization, pruning, batching; see the quantization sketch after this list)
- Hands-on experience with:
  - PyTorch
  - ONNX / ONNX Runtime
  - TensorRT / TensorRT-LLM
  - Triton Inference Server
- Knowledge of CUDA, cuDNN, and GPU architecture fundamentals
- Experience with distributed systems (multi-GPU / multi-node)
- Familiarity with:
  - NCCL communication (see the all-reduce sketch after this list)
  - NVLink / InfiniBand
  - Kubernetes or Slurm for orchestration
- Experience deploying AI models into production environments
- Ability to analyze system bottlenecks (compute, memory, network)
- Experience with profiling tools (Nsight, TensorRT profiler, etc.)
- Knowledge of cost optimization strategies for GPU workloads
Experiment tracking tools (MLflow, W&B, Neptune) log parameters, metrics, and artifacts for comparison
-
Find the Model degradation happens post-deployment: concept drift, data pipeline changes, traffic pattern shifts
-
Root cause analysis (RCA) applies to ML systems: isolating variables, reproducing issues
Nice to Have
- Experience with NVIDIA NIM and the NGC ecosystem
- Exposure to Megatron-LM, NeMo, or large-scale LLM training/inference
- Experience with LLM optimization techniques (KV cache, batching strategies)
- Familiarity with MLOps practices and CI/CD for AI systems
- Experience in customer-facing architecture or consulting roles
- Familiarity with hybrid cloud / on-prem HPC environments
Benefits
- Salary paid in USD
- Career advancement opportunities every six months
- Supportive and friendly work environment
- Premium medical insurance (employee + family)
- English language development courses
- Interest-free loans paid over 2.5 years
- Technical development courses
- Planned overtime program (POP)
- Employee referral program
- Premium location in Maadi
- Social insurance