Job Title: Operations Manager – Data Center
Location: Riyadh, Saudi Arabia
Company: Ezditek
Role Overview:
We are seeking an experienced, strategic Operations Manager to lead the 24/7 operation, maintenance, and optimization of hyper-scale data center facilities. The role manages teams of Site Operation Engineers and outsourced Managed Service Providers (MSPs), ensuring that both whitespace (tenant IT capacity) and grey space (facility infrastructure) environments run effectively, efficiently, and sustainably. This position plays a critical role in meeting Service Level Agreements (SLAs), managing budgets, implementing sustainability initiatives, ensuring health, safety, and environment (HSE) compliance, and maintaining industry-leading standards for continuous availability and operational excellence.
Key Responsibilities:
Leadership & Team Management
- Lead and supervise the day-to-day activities of the MSP and in-house Site Operation Engineers in white and grey space areas.
- Guide teams to enhance services and processes while maintaining compliance with customer SLAs.
- Develop and foster a culture of high performance, continuous improvement, and operational excellence.
Data Center Operations & Maintenance
- Ensure all mission-critical systems, including mechanical and electrical (M&E) power and cooling, security, network, plumbing, and fire suppression systems, operate optimally to deliver maximum design availability.
- Oversee and optimize mechanical, electrical, and plumbing (MEP) systems to maintain uptime in accordance with tenants' SLAs.
- Coordinate with the Network Operations Center (NOC) to respond promptly to alarms, alerts, and escalations, taking remedial action to minimize service disruption.
- Continuously review and improve the preventive maintenance program.
RunBook & Service Catalog Management
- Design, review, and implement top-tier data center operation services, procedures, and practices that adhere to the Uptime Institute Tier Certification of Operational Sustainability (TCOS) and industry best practices.
- Oversee the creation, review, and continuous improvement of the RunBook to ensure all operational procedures are current, accurate, and effectively executed.
- Maintain and regularly update the service catalog to provide tenants with the latest services aligned with evolving data center offerings.
- Monitor adherence to established RunBook procedures by both internal teams and the MSP to ensure consistency and compliance.
Tenant Service Delivery
- Coordinate with the sales team and work closely with tenants to understand their capacity requirements, ensuring adequate provisioning of whitespace infrastructure (power, cooling, racks, etc.).
- Coordinate the implementation of requested service catalog items, monitoring progress and performance against delivery timelines.
- Ensure high levels of customer satisfaction by continuously communicating with tenants regarding operational status, planned maintenance, and improvements.
Budget & Financial Management
- Manage site operation budgets to optimize costs without compromising safety, availability, or committed service levels.
- Collaborate with line managers, procurement, and finance teams to forecast operational expenses, negotiate service contracts, and manage vendor relationships.
- Review and approve purchase orders for equipment, and support vendor evaluations.
- Regularly report on budget performance, highlighting areas of cost savings, efficiency gains, and financial risks.
Performance Monitoring & Continuous Improvement
- Develop, track, and report Key Performance Indicators (KPIs) such as Power Usage Effectiveness (PUE), availability, Mean Time to Repair (MTTR), SLA compliance, and customer satisfaction scores.
- Conduct regular performance reviews with the MSP to ensure compliance with contractual obligations and promote continuous improvement initiatives.
- Adopt best practices and innovative strategies to minimize downtime, improve capacity utilization, and enhance overall operational efficiency.
- Develop, review, and implement policies, standard operating procedures (SOPs), emergency operating procedures (EOPs), and methods of procedure (MOPs), and initiate service improvement programs.
Risk Management, HSE & Security Compliance
- Collaborate with HSE and Security teams to embed HSE and Security requirements within O&M policies and procedures, ensuring a safe and secure environment.
- Identify and mitigate operational risks by maintaining strict adherence to compliance frameworks (ISO, SOC, etc.) and local regulations.
- Ensure strict adherence to safety procedures, laws, and regulations by reviewing and reinforcing safe work responsibilities.
- Stay updated on evolving health, safety, and environmental regulations, incorporating relevant updates into operational strategies.
Sustainability & Data Center Certifications
- Work closely with the Sustainability team to implement energy-efficient practices, reduce carbon footprint, and optimize resource utilization.
- Participate in activities related to acquiring and maintaining data center sustainability certifications, including witnessing assessments for the Tier Certification of Operational Sustainability (TCOS) issued by the Uptime Institute.
- Collaborate with construction contractors during design and build phases to review RunBook documents and ensure alignment with long-term operational and sustainability goals.
- Oversee data center-related certifications and ensure compliance for both internal and external customers.
Stakeholder & Vendor Management
- Maintain effective relationships with internal and external stakeholders including tenants, suppliers, contractors, and regulatory bodies.
- Oversee the Managed Service Provider's performance, ensuring contract compliance, timely issue resolution, and consistent service quality.
- Negotiate Service Level Agreements (SLAs) and manage performance expectations to ensure mutually beneficial partnerships.
Incident & Emergency Response
- Implement and manage effective incident response protocols and escalation procedures with the NOC and MSP.
- Coordinate incident investigations, produce post-incident reports, and drive root-cause analysis to prevent recurrence.
- Be available on call to handle critical issues, ensuring rapid resolution that minimizes downtime and safeguards tenant operations.
- Serve as the Tier 3 escalation point for onsite operational incidents, major issues, and change management activities.
- Oversee the incident, problem, and change management processes to reduce disruptions to customer service.
Reporting & Documentation
- Produce regular operational reports that highlight performance metrics, service availability, SLA compliance, and incident response effectiveness.
- Provide clear and concise reporting, periodic and ad hoc, to senior management, highlighting risks, opportunities, and progress against strategic objectives.
- Maintain accurate and comprehensive documentation for audit and compliance requirements, including RunBook, change management records, escalation logs, and executed service catalog items.
Qualifications:
- Bachelor's degree in Electrical or Mechanical Engineering, another relevant engineering field, Computer Science, or a related technical discipline.
- Fluency in English (written and spoken) is mandatory.
- Master's degree in Business Administration or a related field is a plus.
- 8+ years of experience in data center operations, with at least 3 years in a managerial or leadership capacity.
- Demonstrated expertise in mechanical, electrical, and plumbing (MEP) systems relevant to critical facility operations.
- Experience in managing large-scale data center facilities, budgets, and outsourced service providers.
- Strong understanding of hyper-scale data center design, operation, and sustainability best practices.
- Familiarity with industry standards and certifications such as Uptime Institute, ISO 27001, and other compliance frameworks.
- Proficiency in using DCIM (Data Center Infrastructure Management) tools and BMS (Building Management Systems).
- Solid knowledge of SLA management, vendor performance tracking, and ITIL processes.
Key Competencies:
- Proven ability to lead and develop diverse teams, fostering a culture of high performance and continuous improvement.
- Excellent communication and interpersonal skills, capable of effectively engaging with technical and non-technical stakeholders.
- Strong project management skills, with the ability to prioritize tasks and meet deadlines in a fast-paced environment.
- Adept at root-cause analysis for complex technical issues, with a track record of implementing successful mitigation strategies.
- Comfortable interpreting data center infrastructure metrics and making data-driven decisions.
- Keen attention to detail, ensuring accuracy and completeness in documentation and reporting.
- Results-driven mindset with a focus on customer satisfaction and operational excellence.
- High level of integrity, transparency, and accountability in all actions and decisions.
- Flexible and adaptable, willing to attend the site and handle calls outside duty hours in an emergency.