Service Levels: A Better System Availability Service Level

As I've pointed out previously, one of the primary service levels in just about any cloud computing / SaaS agreement is system availability. There may be exceptions where other performance factors far outweigh the availability of the system to the customer, but it's rare that the availability service level is not the main yardstick for the vendor's performance. This makes perfect sense, since the customer's use of the system in a hosted environment is wholly dependent on the vendor keeping the system available and performing appropriately.

Is the System "Unavailable"?

The vendor's standard service level agreement most likely contains a default availability standard, with service level credits where the service level is not met over the period of measurement. For example, we might see a 99.9% System Availability Service Level measured 24 x 7 over each month. In this case, if the system is "unavailable" for more than about 43.8 minutes in an average-length month (43.2 minutes in a 30-day month), the corresponding service level credit will be triggered. There are a number of items to look at closely when considering the appropriateness of this service level, but for the purposes of this post I'll focus on the term "unavailable."
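The downtime allowance behind a percentage target is simple arithmetic, and it can help to see it worked out before negotiating the number. The following is a minimal sketch; the function name and the choice of period lengths are assumptions made for illustration, not terms from any particular agreement.

```python
def allowed_downtime_minutes(target_pct: float, days_in_period: float) -> float:
    """Minutes of downtime permitted in the period before the target is missed."""
    total_minutes = days_in_period * 24 * 60
    return total_minutes * (1 - target_pct / 100)


# Compare a 30-day month with an average-length month (365.25 / 12 days).
thirty_day = allowed_downtime_minutes(99.9, 30)
average_month = allowed_downtime_minutes(99.9, 365.25 / 12)
print(f"30-day month:  {thirty_day:.1f} minutes")
print(f"average month: {average_month:.1f} minutes")
```

The allowance shifts with the length of the measurement period, which is one reason the agreement should state that period precisely.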

With the vendor's starting language, "unavailable" is typically defined along the lines of "The Customer is unable to access the Services, excluding any downtime caused by an event outside of vendor's reasonable control." In some cases, the vendor may even require the customer to report a period of downtime before it will affect the service level. But this definition doesn't begin to capture the customer's needs for system availability.

A Better Definition

To begin, the concept of system unavailability cannot be dependent on the customer's reporting of the outage to the vendor. The vendor may try to persuade the customer that this makes sense because an outage that the customer is not aware of doesn't really affect the customer and therefore shouldn't trigger a potential credit. But one must remember that many systems may be used across company divisions by a large number of employees. An employee trying to complete the week's payroll entries in the accounting department in Poughkeepsie may experience an outage that doesn't allow her to complete her work yet may have no idea that the outage should be reported to the vendor or even where to begin to report it. Instead, the vendor is best suited to continuously monitor the system availability and report the metric to the customer.

Second, unavailability must be tied not only to the entire system but also to the significant operations of the system. For example, it's entirely possible that a system is up and running but the ability to transmit electronic financial transactions isn't working. If that function is key to the customer's use of the system, then its unavailability should trigger the downtime remedies just as if the entire system were down.
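To illustrate the idea of function-level availability, here is a rough sketch of how vendor-side monitoring results might be evaluated. Everything in it is hypothetical: the function names in KEY_FUNCTIONS, the ProbeResult record, and the assumption that the vendor runs one check per key function per minute. The point is only that a minute counts as "available" when every key function works, not merely when the system responds.

```python
from dataclasses import dataclass

# Hypothetical names for the operations the customer treats as key.
KEY_FUNCTIONS = ("transmit_payments", "record_payroll", "generate_reports")


@dataclass
class ProbeResult:
    minute: int               # minute index within the measurement period
    passed: dict[str, bool]   # key function name -> did its check succeed?


def is_available(probe: ProbeResult) -> bool:
    """Available only if every key function passed; a missing check counts as a failure."""
    return all(probe.passed.get(name, False) for name in KEY_FUNCTIONS)


def downtime_minutes(probes: list[ProbeResult]) -> int:
    """Count the probed minutes in which at least one key function was unavailable."""
    return sum(1 for probe in probes if not is_available(probe))
```

Under this sketch, a minute in which the system is reachable but payment transmission fails is counted as downtime, which is the outcome the customer is bargaining for.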

Finally, as I'll discuss in more detail in a future post, a force majeure event should generally not provide an exception to meeting the availability service level. Such an exception defeats the essential purpose of the system availability service level.

Sample Language

Here's an example of an availability service level that meets the customer's needs:

Vendor will provide at least 99.9% System Availability, measured on a per calendar-month basis. "System Availability" is defined as the ability of a Customer user to (a) access and transmit electronic payment information using the System, (b) record employee payroll information, and (c) generate ad hoc and month-end management reports. Loss of System Availability caused by (i) Customer's acts or omissions; (ii) Customer's Internet connectivity; or (iii) Vendor's regularly scheduled maintenance windows as defined below will be excluded from System Availability calculations.
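As a rough illustration of how a calculation under language like this might work, the sketch below computes monthly availability from a list of outages and applies the three exclusions. The Outage record, the cause labels, and the decision to treat excluded downtime as available time (rather than removing it from the measurement period) are all assumptions made for the example; the agreement itself should spell out the method.

```python
from dataclasses import dataclass

# Hypothetical cause labels matching exclusions (i) through (iii) in the sample language.
EXCLUDED_CAUSES = {"customer_act", "customer_internet", "scheduled_maintenance"}


@dataclass
class Outage:
    minutes: float
    cause: str  # any cause not listed above, including force majeure, counts as downtime


def system_availability(outages: list[Outage], period_minutes: float) -> float:
    """Availability percentage for the period, ignoring excluded outages."""
    counted = sum(o.minutes for o in outages if o.cause not in EXCLUDED_CAUSES)
    return 100 * (1 - counted / period_minutes)


def credit_triggered(outages: list[Outage], period_minutes: float,
                     target_pct: float = 99.9) -> bool:
    """True when measured availability falls below the target."""
    return system_availability(outages, period_minutes) < target_pct


# Example: a 30-day month with one vendor-caused outage and one maintenance window.
month_minutes = 30 * 24 * 60
outages = [Outage(30, "vendor_fault"), Outage(120, "scheduled_maintenance")]
print(f"{system_availability(outages, month_minutes):.3f}%")
print("credit triggered:", credit_triggered(outages, month_minutes))
```

Note that an outage labeled as force majeure is not in the excluded set here, consistent with the position above that such events should generally count against the service level.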