Job Title: Director - Infrastructure & Cloud Operations
About the Role
We’re seeking a highly skilled and visionary Director of Infrastructure & Cloud Operations to lead the strategy, design, and execution of our global Cloud and SaaS operations. You’ll own the availability, scalability, performance, and security of our platform, working closely with product, engineering, security, and customer success teams to drive operational excellence at scale. This is a senior leadership role requiring deep technical expertise, strong strategic thinking, and a proven track record of building and running world-class Cloud/SaaS environments.
Key Responsibilities
- Lead the strategy, design, and implementation of scalable, secure, and highly available Cloud/SaaS infrastructure.
- Manage and mentor a global team of Cloud and SRE engineers to deliver operational excellence.
- Oversee deployment, monitoring, incident response, and ongoing performance optimization.
- Own platform performance and availability reporting (SLAs, KPIs, SLOs).
- Define and implement best practices, standards, and governance for infrastructure and operations.
- Drive continuous improvement in reliability, cost efficiency, and security posture.
- Build and manage 24x7 NOC and on-call operations.
- Identify and mitigate risks, ensuring platform resilience and business continuity.
- Partner with stakeholders across the organization to communicate status, share insights, and support growth.
- Develop and manage infrastructure budgets and resource allocations.
Skills & Experience
- 5+ years in a leadership role at Director level or above, managing mission-critical SaaS/Cloud environments (AWS, Azure, etc.).
- 5+ years of hands-on experience in 24x7x365 Cloud Operations.
- Proven success designing and implementing highly available, scalable, and secure Cloud solutions.
- Expertise in Infrastructure as Code (IaC) tools: Terraform, Ansible, Chef.
- Strong knowledge of DevOps principles, CI/CD pipelines, containerization, and Kubernetes.
- Proficiency with monitoring, observability, and analytics tools (Prometheus, ELK, Dynatrace, New Relic, AppDynamics, Datadog, etc.).
- Experience managing global on-call and NOC operations (PagerDuty or similar).
- Strong understanding of cloud security principles and best practices.
- Exceptional leadership and people management skills, with a track record of building high-performing teams.
- Excellent communication and stakeholder management abilities.