We are seeking an experienced Infrastructure & AI Systems Manager to lead the design, deployment, and optimization of advanced AI-driven infrastructure solutions. The ideal candidate will have deep expertise in cloud architecture, containerization, and AI system integration, with a focus on building scalable, secure, and compliant platforms.
Responsibilities
- Architect Agentic AI Solutions: Design modular systems integrating LLMs to support multi-step reasoning, retrieval-augmented generation (RAG), and tool use.
- AI Model Integration: Evaluate, select, and integrate suitable models (e.g., GPT-5 or custom LLMs) into scalable environments.
- AI Agent Development: Build and maintain intelligent agents for research, summarization, and autonomous task execution.
- Infrastructure & Scalability: Manage infrastructure using Docker and Kubernetes to ensure high availability, scalability, and reliability.
- Responsible AI Practices: Implement AI governance, including guardrails, content filters, and compliance with data protection standards.
- Cloud & Security Management: Oversee cloud operations (Azure, AWS), ensuring cost efficiency, performance, and compliance.
- Collaboration & Stakeholder Management: Work closely with cross-functional teams and customers to translate business objectives into technical AI solutions.
- Continuous Optimization: Analyze model performance, refine prompts, and enhance system accuracy and efficiency.
- Data Engineering: Design and manage data pipelines and federated data marts; ensure seamless data movement and optimization across sources.
- Automation & DevOps: Champion infrastructure-as-code, CI/CD pipelines, and the automation of provisioning, deployment, and monitoring.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
- 8+ years of experience in infrastructure management, cloud engineering, or AI systems architecture.
- Proven experience with Docker, Kubernetes, and cloud platforms (AWS, Azure).
- Solid understanding of AI/ML deployment, RAG systems, and LLM operations.
- Familiarity with infrastructure security, identity management, and data protection compliance (UAE CB, SIA, SAMA).
- Strong hands-on experience with DevOps, CI/CD, and automation frameworks.
- Excellent problem-solving and communication skills with the ability to collaborate across teams and manage complex technical projects.