Qureos

AI Inference Infrastructure Engineer

About Dizzaract

Dizzaract is a UAE-based game development studio founded in 2022 and headquartered at Yas Creative Hub, Abu Dhabi. We develop cutting-edge AI-powered games and systems, including FAR Labs, our innovation R&D laboratory; Farcana, our upcoming hero shooter; and GAMED, our AI gaming identity platform. Our research and development team has produced over 100 peer-reviewed papers and more than 20 patents in AI-driven gameplay, digital ownership, and competitive design. With a diverse team of more than 80 professionals from over 20 countries, we are committed to innovation, excellence, and building a culture that drives performance and results.

The Mission: We are building a highly optimized, decentralized AI inference network. To beat the latency and throughput of established centralized players, we cannot rely on off-the-shelf wrappers. You will be responsible for building the bare-metal, ultra-low-latency infrastructure that serves large language models and multimodal models at unprecedented scale.

What You Will Do:

  • Core Engine Development: Architect and write highly optimized, low-level code (primarily in Rust and C) to manage model loading, memory allocation, and request batching across a distributed fleet of GPUs/NPUs.
  • Hardware-Aware Optimization: Implement tensor mathematics optimizations and custom kernels (CUDA/Triton) to squeeze maximum FLOPS out of the hardware.
  • Zero-Intervention Deployments: Build rock-solid, fully packaged infrastructure pipelines. We operate with zero manual intervention: no ad-hoc scripts, no PowerShell band-aids. If a node fails, the network must heal autonomously.
  • Decentralized Orchestration: Design the peer-to-peer or decentralized routing logic that ensures high availability and optimal load balancing across geographically distributed nodes.
  • Advanced Inference Techniques: Implement and optimize techniques such as continuous batching, speculative decoding, and paged attention (as implemented in vLLM and TensorRT-LLM), customized for our specific network architecture.
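To illustrate the core idea behind continuous batching mentioned above: instead of waiting for an entire batch of requests to finish generating, the scheduler retires finished requests and admits queued ones between every decode step. The sketch below is a minimal, hypothetical model of that scheduling loop in Rust; the `Request` and `Scheduler` types are illustrative assumptions, and a production system (as in vLLM) would additionally manage KV-cache memory, token streams, and preemption.

```rust
use std::collections::VecDeque;

// Illustrative request: an id and how many tokens it still needs to generate.
#[derive(Debug)]
struct Request {
    id: u32,
    tokens_left: u32,
}

// Continuous-batching scheduler: finished requests leave the batch after each
// decode step, and queued requests are admitted immediately to refill it,
// rather than waiting for the whole batch to drain.
struct Scheduler {
    queue: VecDeque<Request>,
    batch: Vec<Request>,
    max_batch: usize,
}

impl Scheduler {
    fn new(max_batch: usize) -> Self {
        Self { queue: VecDeque::new(), batch: Vec::new(), max_batch }
    }

    fn submit(&mut self, req: Request) {
        self.queue.push_back(req);
    }

    // One decode step: generate one token per in-flight request, retire the
    // requests that finished, then top the batch back up from the queue.
    // Returns the ids of requests completed on this step.
    fn step(&mut self) -> Vec<u32> {
        let mut finished = Vec::new();
        for req in &mut self.batch {
            req.tokens_left -= 1;
        }
        self.batch.retain(|r| {
            if r.tokens_left == 0 {
                finished.push(r.id);
                false
            } else {
                true
            }
        });
        while self.batch.len() < self.max_batch {
            match self.queue.pop_front() {
                Some(r) => self.batch.push(r),
                None => break,
            }
        }
        finished
    }
}

fn main() {
    let mut s = Scheduler::new(2);
    s.submit(Request { id: 1, tokens_left: 1 });
    s.submit(Request { id: 2, tokens_left: 3 });
    s.submit(Request { id: 3, tokens_left: 2 });
    // First step only admits requests 1 and 2 (the batch started empty).
    assert!(s.step().is_empty());
    // Request 1 finishes after one token; request 3 is admitted on the same
    // step instead of waiting for request 2 to drain -- that slot reuse is
    // the throughput win of continuous batching.
    assert_eq!(s.step(), vec![1]);
}
```

The key design point is that admission happens inside `step`, so a short request never blocks the batch slot behind a long one.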

What We Are Looking For:

  • Deep expertise in systems programming (Rust, C) and a strong aversion to bloated, high-level abstractions where performance matters.
  • Proven experience with GPU programming (CUDA, ROCm) and low-level hardware architecture.
  • Strong understanding of deep learning architectures (Transformers, Mamba) and how tensor operations execute on silicon.
  • Experience building highly concurrent, distributed systems with sub-millisecond network latency requirements.

Work Location: In person
