Cloud Storage Engineer (Ceph)

About the Team

At Trendyol Tech, our mission is to create a positive impact in our ecosystem by enabling commerce through technology.

We solve complex problems with data, creativity, and agility, always driven by real outcomes. With a culture built on learning, collaboration, and ownership, we grow together while building what’s next.

About the Role

This role focuses on building, operating, and continuously improving the core storage backbone of a large-scale private cloud. You will take technical leadership over Ceph, ensuring its performance, availability, and scalability as the platform grows. The position combines deep operations knowledge with modern automation and software engineering practices.

You will work on multi-site Ceph architectures, drive DR strategies, and contribute to object storage, block storage, and file system solutions consumed by OpenStack and Kubernetes environments. The role requires hands-on expertise across Ceph OSDs, MON/MGR services, RGW, CRUSH maps, placement groups, and performance tuning.

Beyond day-to-day operations, you will build tooling, improve monitoring, participate in incident response, lead capacity planning, and collaborate with other infrastructure and platform teams to align storage capabilities with broader cloud initiatives. This role is ideal for an engineer who wants deep technical ownership and the opportunity to shape the evolution of large-scale storage systems.

Responsibilities

- Operate, scale, and evolve large-scale Ceph clusters used as the core storage layer of a private cloud platform.
- Lead Ceph upgrades, expansions, and lifecycle operations across multi-region environments with minimal impact on production workloads.
- Design and manage multi-site and geo-replicated Ceph architectures to ensure high availability, durability, and disaster recovery readiness.
- Develop and maintain automation tooling using Ansible, Python, and Go to standardize cluster provisioning, configuration, and operational workflows.
- Implement efficient storage tiering strategies, including hot/cold layers, cache tiers, and erasure-coded pools based on performance and cost requirements.
- Troubleshoot complex distributed storage issues, perform deep root-cause analyses, and drive long-term reliability improvements.
- Build observability pipelines that integrate Ceph metrics into Prometheus/Grafana and ELK/OpenSearch, and create actionable, predictive alerting mechanisms.
- Collaborate closely with OpenStack, networking, compute, and platform engineering teams to ensure seamless integration between Ceph and dependent services.
- Operate and harden S3-compatible object storage services using RGW, including lifecycle management, S3 API compatibility, and integration with CDN or edge caching layers.
- Contribute to storage orchestration tools, Kubernetes operators, and internal CI systems for continuous validation of storage functionality and performance.

Expected Qualifications

- Strong ownership mentality and the ability to independently drive complex technical projects from design to production.
- Deep understanding of Linux internals, distributed systems, and large-scale storage operations.
- Structured and clear problem-solving skills, especially in high-pressure or incident scenarios.
- Proficiency in automation, reproducible operations, and writing clean, maintainable code.
- Strong communication skills, including the ability to write documentation and RFCs and to mentor engineers.
- Curiosity, adaptability, and willingness to explore new technologies such as NVMe/TCP, operators, and large-scale observability stacks.
- A pragmatic engineering mindset focused on reliability, simplicity, and measurable outcomes.
- Collaborative attitude and the ability to work closely with cross-functional teams in networking, compute, platform, and cloud infrastructure domains.

What We Offer

- Hybrid working model with flexibility: A schedule that helps you find the right balance between flexibility and team bonding, including work-from-abroad opportunities and a summer working model.
- Customisable FlexBenefits budget: Adjust your daily meal allowance, choose your health insurance package (and extend it to your spouse or children), and pick from additional benefits like fuel support or Trendyol shopping credits.
- Well-being support: Access to location-based in-house doctors, as well as psychologist and dietitian support, and HPV vaccination provision.
- Personalised training allowance and learning opportunities: Use your annual budget for any training or conference of your choice, explore our Learning Management System (LMS) anytime, and join in-person learning sessions offered throughout the year.
- Responsibility from day one: Take full ownership from the start in a culture where every voice is heard and valued.
- A diverse, international team: Collaborate with global peers across our offices in Berlin, Amsterdam, Dubai, and beyond, in a startup-spirited and collaborative environment.
- Opportunities to grow with the best: Tackle meaningful challenges, develop through hands-on experience, and grow with the support of expert guidance and global mentoring.
- Meaningful connections beyond tasks: Be part of team rituals, events, and social activities that help us stay connected and inspired.

Take the Next Step

If this role excites you, apply today. We look forward to taking the next step with you.

Want to get to know the team better first? Explore our Career Website, LinkedIn, or YouTube to learn more about #LifeatTrendyol and how we work.
