About the Role
SpacePointe is seeking a DevOps & Infrastructure Lead to operate the infrastructure our engineering teams depend on and to enable those teams, owning reliability, performance, cost efficiency, and security at the infrastructure level. You will manage our cloud environments, streamline deployments, enforce monitoring standards, and ensure that our systems are resilient and scalable across multiple markets.
This role is not about managing people through direct reports. It is about hands-on ownership of infrastructure, writing automation code, setting best practices, and ensuring our environments meet business-critical SLAs.
Responsibilities
Infrastructure & Cloud Operations
- Own and manage Azure infrastructure including virtual machines, networking, database servers, and API gateways.
- Define cloud architecture patterns for resilience, scalability, and cost optimization.
- Establish and enforce infrastructure-as-code (IaC) standards using Terraform, ARM templates, or similar.
CI/CD & Deployment Ownership
- Define, build, and maintain CI/CD pipelines supporting automated builds, deployments, rollbacks, and smoke tests.
- Continuously improve deployment velocity while maintaining rollback safety and stability.
- Integrate automated load/performance testing into release workflows.
Monitoring & Performance Management
- Deploy and manage centralized monitoring/logging solutions (Grafana, Prometheus, ELK stack).
- Track system latency, throughput, and infrastructure utilization; optimize database and caching layers.
- Partner with engineering to identify and resolve performance bottlenecks before production impact.
Incident Response & Security Support
- Provide 24/7 incident response support for infrastructure and system-level issues.
- Drive MTTR (Mean Time to Recovery) reduction through automation and playbooks.
- Support security remediation (network hardening, OS patching, access control enforcement).
- Validate security fixes and monitor them to confirm they remain effective over the long term.
Cost Efficiency & Governance
- Continuously analyze cloud spend to ensure adherence to budget.
- Proactively recommend optimizations to cut unnecessary infrastructure costs.
- Provide reporting on infrastructure utilization, spend, and performance.
Key Performance Indicators (KPIs)
- System Uptime (SLA % met) – 99.9% or better.
- Deployment Velocity – reduced lead time for changes, pipeline success rate above 95%.
- Performance Stability – % of releases passing automated load/performance benchmarks.
- MTTR – average time to recover from infrastructure and security incidents.
- Cost Efficiency – cloud spend within or below approved budget.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
- 6+ years in DevOps, Infrastructure Engineering, or Site Reliability Engineering (SRE).
- Deep experience with Azure cloud services (or AWS/GCP with willingness to adapt).
- Strong background in CI/CD tooling (Azure DevOps, Jenkins, GitHub Actions).
- Proven expertise in monitoring/logging platforms (Grafana, Prometheus, ELK, Datadog).
- Proficiency with infrastructure automation (Terraform, Ansible, ARM).
- Hands-on experience with relational and NoSQL databases and caching (Redis, Memcached).
- Strong knowledge of networking, load balancing, and security hardening.
- Excellent problem-solving skills and ability to handle high-pressure incidents.
Why Join SpacePointe?
- Work on mission-critical systems that power payments and commerce across 16+ countries.
- Be part of a hands-on engineering culture—your decisions directly shape reliability, performance, and cost efficiency.
- Collaborate with senior leaders and engineers across the globe in an agile, innovation-driven environment.
- Grow your expertise in multi-country, multi-vertical infrastructure at scale.
Job Types: Full-time, Permanent
Pay: E£35,000.00 - E£55,000.00 per month
Application Question(s):
- CI/CD & Deployment Automation
Describe a time you designed and managed a CI/CD pipeline for a complex system.
What tools did you use, and how did you ensure automated rollbacks and performance checks were integrated?
- Infrastructure Performance & Reliability
How have you optimized cloud infrastructure performance and cost in past roles (e.g., database tuning, caching strategies, scaling policies)?
Provide a specific example with measurable outcomes (e.g., latency reduced, cost savings achieved).
- Incident Response & Security Remediation
Walk us through how you handled a major infrastructure or security incident.
What steps did you take from detection through recovery, and how did you reduce MTTR (Mean Time to Recovery) in the process?