October 9, 2024

Resilience in the Cloud: Why it's just as important as flexibility and availability

Author: Martin Krueger

The cloud has long established itself as an indispensable technology, bringing with it buzzwords like "cloud-first," "cloud-native," and "cloud-only." However, one term is mentioned less frequently, even though it is just as critical: "cloud-resilient". What does this mean, and why should businesses pay more attention to it?

Cloud Advantages: Flexible, Available, Simple

The benefits of the cloud are clear and have convinced businesses of all sizes:

  • Flexibility: Cloud models can be tailored to individual needs while reducing the workload on IT administrators.
  • High Availability: Cloud services are accessible nearly all the time (except during planned maintenance windows).
  • Plug and Play: Thanks to standardized solutions, providers can provision resources quickly and easily. For users, this means minimal effort and maximum efficiency.

At Libelle, we also use cloud resources to meet short-term demands — there’s no avoiding the cloud.

But what about resilience?

In the cloud, responsibility for resource availability is transferred to the respective service provider. Service Level Agreements (SLAs) define the extent of this responsibility, often with negotiable terms. However, both planned and unplanned scenarios can affect availability:

  • Planned outages: Maintenance windows that are announced in advance.
  • Unplanned outages: Critical network issues in the provider’s data center or force majeure events.

Unplanned outages in particular can quickly turn into disasters. Think of an ERP system that is down for longer than expected: orders cannot be processed and deliveries cannot be received, which could threaten the very existence of some companies.

"Five Nines": The Gold Standard of Availability

A commonly cited benchmark for cloud availability is the "five nines," or 99.999 %. At that level, systems are down for only about 5 minutes per year. By comparison, an availability rate of "just" 99.5 % allows roughly 1.8 days of downtime per year, a significant difference that could severely disrupt business operations.
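To check these figures for any SLA value, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the article: the function name `annual_downtime_minutes` is hypothetical, and the calculation assumes a 365-day year with downtime measured against the full calendar year.

```python
# Minutes in a 365-day year (assumption: leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.5, 99.9, 99.99, 99.999):
        minutes = annual_downtime_minutes(pct)
        days = minutes / (24 * 60)
        print(f"{pct} %: {minutes:.1f} minutes per year ({days:.2f} days)")
```

Whether planned maintenance windows count toward the downtime budget depends on the individual SLA, so the result is best read as an upper bound on the outage time the provider has committed to.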

Risk Management and Alternative Strategies

How can businesses mitigate this risk? There are two main approaches:

  1. Risk Acceptance: Relying on the availability guarantees provided in the SLAs.
  2. Alternative Scenarios: Implementing redundant systems or outsourcing to third parties or private environments as a backup plan.

For business-critical systems that must be highly available, the latter is often the safer choice. Companies should identify which systems truly need high availability and which are less critical.
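A rough calculation shows why redundancy pays off. The sketch below estimates the combined availability of several replicas of the same system. It rests on a strong simplifying assumption that is not from the article: the replicas fail independently of each other, and failover is instantaneous. The function name `combined_availability` is hypothetical.

```python
def combined_availability(single_availability_percent: float, replicas: int) -> float:
    """Estimate availability (in percent) of `replicas` redundant systems.

    Assumption: failures are independent, so the service is down only
    when every replica is down at the same time.
    """
    if not 0.0 <= single_availability_percent <= 100.0:
        raise ValueError("single_availability_percent must be between 0 and 100")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    unavailability = 1.0 - single_availability_percent / 100.0
    return (1.0 - unavailability ** replicas) * 100.0


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} replica(s) at 99.5 % each: {combined_availability(99.5, n):.5f} %")
```

In practice, shared dependencies such as the same network, region, or provider mean that failures are rarely fully independent, so real-world gains tend to be smaller than this estimate suggests.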

Cloud Resilience Even for Less Critical Systems

Even for seemingly less critical systems, such as HR or CRM software that’s accessed via a browser, businesses should keep resilience in mind. Often, only a single cloud variant is offered, and deviations from the availability guarantee could disrupt processes — even if they are on the periphery of core business operations.

Dependency on Third-Party Providers Requires a Robust Resilience Strategy

A key challenge remains: End users do not control the availability of the actual resources. This creates a dependency on third parties.

For IT operations in a scalable environment, there are essentially no technical limits. When it comes to the resilience of that environment, however, what matters is awareness of the risk and the implementation of alternative scenarios.

Libelle’s Role in Cloud-Resilient Strategies

When building a cloud-resilient infrastructure, Libelle can be a reliable partner. To create redundancies, our customers replicate critical systems even in cloud environments, preventing outages. Even during maintenance windows, the systems remain available.

One major advantage of our on-prem solution: It works offline as well, providing security in hybrid or pure cloud scenarios — a sensible safeguard against unforeseen risks.


Learn more about Libelle BusinessShadow.

