Service Level Agreements
The relationship between the cloud provider and the cloud consumer must be described with a
Service Level Agreement. Because cloud consumers trust cloud providers to deliver some of their
infrastructure services, it is vital to define those services, how they are delivered and how they
are used.
An SLA is the foundation of the consumer's trust in the provider. A well-written SLA codifies
the provider's reputation.
In addition to the prose that defines the relationship between the consumer and provider, an SLA
contains Service Level Objectives (SLOs) that define objectively measurable conditions for the
service. The consumer must weigh the terms of the SLA and its SLOs against the goals of their
business to select a cloud provider.
It is crucial that the consumer of cloud services fully understand all the terms of the provider's
SLA, and that the consumer consider the needs of their organization before signing any
agreement.
1. What is an SLA?
An SLA defines the interaction between a cloud service provider and a cloud service consumer.
An SLA contains several things:
A set of services the provider will deliver
A complete, specific definition of each service
The responsibilities of the provider and the consumer
A set of metrics to determine whether the provider is delivering the service as promised
An auditing mechanism to monitor the service
The remedies available to the consumer and provider if the terms of the SLA are not met
How the SLA will change over time
The marketplace features two types of SLAs: off-the-shelf agreements and agreements negotiated
between a provider and a consumer to meet that consumer's specific needs. It is unlikely that any
consumer with critical data and applications will be able to use the first type. Therefore, the
consumer's first step in approaching an SLA (and the cloud in general) is to determine how
critical their data and applications are.
Most public cloud services offer a non-negotiable SLA. With these providers, a consumer whose
requirements aren't met has two remedies:
1. Accept a credit towards next month's bill (after paying this month's bill in full), or
2. Stop using the service.
Clearly an SLA with these terms is unacceptable for any mission-critical applications or data. On
the other hand, an SLA with these terms will be far less expensive than a cloud service provided
under a negotiated SLA.
4.4 System Redundancy
Many cloud providers deliver their services via massively redundant systems. Those systems are
designed so that even if hard drives or network connections or servers fail, consumers will not
experience any outages. Consumers moving data and applications that must be constantly
available should consider the redundancy of their provider's systems.
4.5 Maintenance
Providers handle the maintenance of their infrastructure, freeing consumers from having to do that
themselves. However, consumers should understand how and when their providers will do
maintenance tasks. Will their services be unavailable during that time? Will their services be
available, but with much lower throughput? If there is a chance the maintenance will affect the
consumer's applications, will the consumer have a chance to test their applications against the
updated service? Note that maintenance can affect any type of cloud offering and that it applies to
hardware as well as software.
4.6 Location of Data
The physical location of many types of data is restricted. For example, many countries prohibit
storing personal information about their citizens on any machine outside their borders. If a cloud
provider cannot guarantee that a consumer's data will be stored in certain locations only, the
consumer cannot use that provider's services. If a cloud service provider promises to enforce data
location regulations, the consumer must be able to audit the provider to prove that regulations are
being followed.
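As a rough illustration of what such an audit could check, the sketch below compares where each data object is reported to be stored against the locations the consumer's policy allows. The region codes, the object names and the shape of the report are assumptions made for this example; they do not describe any particular provider's audit interface.

```python
# Hypothetical illustration: region codes and the report layout are assumptions.
ALLOWED_REGIONS = {"DE", "FR"}  # locations the consumer's policy permits


def location_violations(replica_locations, allowed_regions):
    """Return objects that have at least one replica outside the allowed regions.

    replica_locations maps an object name to the list of regions holding a copy.
    The result maps each offending object to the sorted disallowed regions.
    """
    violations = {}
    for object_id, regions in replica_locations.items():
        outside = sorted(set(regions) - set(allowed_regions))
        if outside:
            violations[object_id] = outside
    return violations


# Example audit report, as a provider might hypothetically expose it.
report = {
    "customer-db": ["DE", "FR"],
    "backups": ["DE", "US"],
}
print(location_violations(report, ALLOWED_REGIONS))
```

An empty result means every reported copy sits in a permitted location. The check is only as trustworthy as the report it reads, which is why the consumer's ability to audit the provider matters.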
4.7 Seizure of Data
There have been a few well-publicized instances of law enforcement officials seizing the assets
of a hosting company. Even if law enforcement targets the data and applications associated with
a particular consumer, the multi-tenant nature of cloud computing makes it likely that other
consumers will be affected. Although there are limits to what an SLA can cover, consumers
should consider the laws that apply to the provider. Consumers should also consider using a third
party to back up their data and applications.
5. SLA requirements
5.1 Security
Security as a general requirement is discussed in detail in Sections 6 and 7 of this paper. The
security-related aspects of an SLA should be written with the security controls and federation
patterns from Section 6 in mind. A cloud consumer must understand their security requirements
and what controls and federation patterns are necessary to meet those requirements. In turn, a
cloud provider must understand what they must deliver to the consumer to enable the appropriate
controls and federation patterns.
5.2 Data Encryption
If a consumer is storing vital data in the cloud, it is important that the data be encrypted while it
is in motion and while it is at rest. The details of the encryption algorithms and access control
policies should be specified in the SLA.
5.3 Privacy
Basic privacy concerns are addressed by requirements such as data encryption, retention and
deletion. In addition, an SLA should make it clear how the cloud provider isolates data and
applications in a multi-tenant environment.
5.4 Data Retention and Deletion
Many organizations have legal requirements that data must be kept for a certain period of time.
Some organizations also require that data be deleted after a certain period of time. Cloud
providers must be able to prove they are compliant with these policies.
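A retention and deletion policy of this kind can be expressed as two age limits: a minimum period during which data must be kept and a deadline by which it must be deleted. The sketch below is a minimal example under that assumption; the function name, the status labels and the day counts are hypothetical.

```python
from datetime import date


def retention_status(created, today, keep_days, delete_after_days):
    """Classify a record against a keep-at-least / delete-by policy.

    keep_days: minimum age in days before deletion is permitted.
    delete_after_days: age in days at which deletion becomes mandatory.
    """
    age = (today - created).days
    if age < keep_days:
        return "must_retain"
    if age >= delete_after_days:
        return "overdue_for_deletion"
    return "may_delete"


# Hypothetical policy: keep for at least one year, delete within two years.
print(retention_status(date(2020, 1, 1), date(2020, 6, 1), 365, 730))
```

A provider that can produce this classification for every stored record, along with evidence of the deletions actually performed, has a concrete way to demonstrate compliance.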
5.6 Regulatory Compliance
Many types of data and applications are subject to regulations. Some of those are laws (HIPAA
for medical records in the United States), while others are industry-specific (PCI DSS for
retailers who accept credit cards). If regulations must be enforced, the cloud provider must be
able to prove their compliance.
5.7 Transparency
Under the SLAs of some cloud providers, the consumer bears the burden of proving that the
provider failed to live up to the terms of the SLA. A provider's service might be down for hours,
but consumers who are unable to prove that downtime are not eligible for any sort of
compensation.
For critical data and applications, providers must be proactive in notifying consumers when the
terms of the SLA are breached. This includes infrastructure issues such as outages and
performance problems as well as security incidents.
5.8 Certification
There are many different certifications that apply to certain types of data and applications. For
example, a consumer might require that their cloud provider be ISO 27001 certified.
The provider would be responsible for proving their certification and keeping it up-to-date.
5.9 Terminology for key performance indicators
The term uptime can be defined in many ways. Often that definition is specific to a provider's
architecture. If a provider has data centers on six continents, does uptime refer to a particular
data center or any data center? If the only available data center is on another continent, that
uptime is unlikely to be acceptable. To make matters worse, other cloud providers will use
definitions specific to their architectures. This makes it difficult to compare cloud services.
A set of industry-defined terms for different key performance indicators would make it much
easier to compare SLAs in particular (and cloud services in general).
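To see how much the definition matters, the sketch below computes uptime for the same outage log in two ways: per data center, and for the service as a whole when it counts as up if any data center is up. The site names, the outage intervals and the measurement period are invented for illustration, and the code assumes at least one site and non-overlapping outages within a site.

```python
def uptime_per_site(outages, period_minutes):
    """Uptime fraction of each data center, considered on its own.

    outages maps a site name to a list of (start, end) minute offsets,
    with end exclusive. Intervals within one site must not overlap.
    """
    result = {}
    for site, intervals in outages.items():
        down = sum(end - start for start, end in intervals)
        result[site] = 1 - down / period_minutes
    return result


def uptime_any_site(outages, period_minutes):
    """Uptime fraction when the service is up if at least one site is up."""
    down_minutes = 0
    for minute in range(period_minutes):
        # The service is down in this minute only if every site is down.
        if all(
            any(start <= minute < end for start, end in intervals)
            for intervals in outages.values()
        ):
            down_minutes += 1
    return 1 - down_minutes / period_minutes


# Hypothetical outage log over a 1000-minute period.
PERIOD = 1000
outages = {"europe": [(0, 100)], "asia": [(50, 150)]}

print(uptime_per_site(outages, PERIOD))
print(uptime_any_site(outages, PERIOD))
```

Each site was down for 100 of the 1000 minutes, but both were down together for only 50 of them, so the "any data center" figure is higher than either per-site figure. A consumer who can only use one of the sites would experience the lower number.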
5.10 Monitoring
If a failure to meet the terms of an SLA has financial or legal consequences, the question of who
should monitor the performance of the provider (and whether the consumer meets its
responsibilities as well) becomes crucial. It is in the provider's interest to define uptime in the
broadest possible terms, while consumers could be tempted to blame the provider for any system
problems that occur. The best solution to this problem is a neutral third-party organization that
monitors the performance of the provider. This eliminates the conflicts of interest that might
occur if providers report outages at their sole discretion or if consumers are responsible for
proving that an outage occurred.
5.11 Auditability
Many consumer requirements include adherence to legal regulations or industry standards.
Because the consumer is liable for any breaches that occur, it is vital that the consumer be able to
audit the provider's systems and procedures. An SLA should make it clear how and when those
audits take place. Because audits are disruptive and expensive, the provider will most likely
place limits and charges on them.
5.12 Metrics
Monitoring and auditing require something tangible that can be monitored as it happens and
audited after the fact. The metrics of an SLA must be objectively and unambiguously defined.
Cloud consumers will have an endless variety of metrics depending on the nature of their
applications and data. Although listing all metrics is impossible, some of the most common
are:
Throughput – How quickly the service responds
Reliability – How often the service is available
Load balancing – When elasticity kicks in (new VMs are booted or terminated, for example)
Durability – How likely the data is to be lost
Elasticity – The ability of a given resource to grow on demand, with any limits (the maximum amount of storage or bandwidth, for example) clearly stated
Linearity – How a system performs as the load increases
Agility – How quickly the provider responds as the consumer's resource load scales up and down
Automation – What percentage of requests to the provider are handled without any human interaction
Customer service response times – How quickly the provider responds to a service request. This refers to the human interactions required when something goes wrong with the on-demand, self-service aspects of the cloud.
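One way to make metrics like these objective and unambiguous is to pair each one with a numeric threshold and a direction, so that a measurement either meets the objective or does not. The sketch below assumes that representation; the metric names, units and thresholds are hypothetical and are not drawn from any real SLA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    metric: str             # hypothetical metric name, including its unit
    threshold: float        # agreed limit for the metric
    higher_is_better: bool  # True for availability, False for response time

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.threshold
        return measured <= self.threshold


def breached(objectives, measurements):
    """Return the metrics that miss their objective.

    A metric with no measurement counts as breached, since it cannot be
    shown to have been met.
    """
    missed = []
    for objective in objectives:
        value = measurements.get(objective.metric)
        if value is None or not objective.is_met(value):
            missed.append(objective.metric)
    return missed


objectives = [
    Objective("availability_percent", 99.9, higher_is_better=True),
    Objective("response_time_ms", 200, higher_is_better=False),
    Objective("automation_percent", 95, higher_is_better=True),
]
measurements = {"availability_percent": 99.95, "response_time_ms": 250}

print(breached(objectives, measurements))
```

Treating a missing measurement as a breach is a design choice for this sketch. An actual SLA would need to state who supplies each measurement and what happens when none is available.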
5.13 Machine-Readable SLAs
A machine-readable language for SLAs would enable an automated cloud broker that could select
a cloud provider dynamically. One of the basic characteristics of cloud computing is on-demand
self-service; an automated cloud broker would extend this characteristic by selecting the cloud
provider on demand as well. The broker could select a cloud provider based on business criteria
defined by the consumer. For example, the consumer's policy might state that the broker should
use the cheapest possible provider for some tasks, but the most secure provider for others.
Although substantial marketplace demand for this requirement will take some time to develop,
any work on standardizing SLAs should be done with this in mind.
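The broker idea can be sketched in a few lines once SLA terms are available as data. The example below assumes a JSON representation with a price and an ordinal security rating per provider; that schema, the provider names and the two policy names are invented for illustration, since no standard machine-readable SLA format is assumed here.

```python
import json

# Hypothetical machine-readable SLA terms; the field names are assumptions.
SLA_DOCUMENTS = """
[
  {"provider": "provider-a", "price_per_hour": 0.10, "security_level": 2},
  {"provider": "provider-b", "price_per_hour": 0.25, "security_level": 5},
  {"provider": "provider-c", "price_per_hour": 0.15, "security_level": 3}
]
"""


def select_provider(offers, policy):
    """Pick a provider name according to the consumer's business policy."""
    if not offers:
        raise ValueError("no offers to choose from")
    if policy == "cheapest":
        return min(offers, key=lambda o: o["price_per_hour"])["provider"]
    if policy == "most_secure":
        return max(offers, key=lambda o: o["security_level"])["provider"]
    raise ValueError(f"unknown policy: {policy}")


offers = json.loads(SLA_DOCUMENTS)

# The consumer's policy maps each kind of task to a selection rule.
task_policy = {"batch-rendering": "cheapest", "payroll": "most_secure"}
for task, policy in task_policy.items():
    print(task, "->", select_provider(offers, policy))
```

A real broker would also need to verify that the selected provider's terms satisfy the consumer's other requirements, such as data location and compliance, before placing any workload.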
5.14 Human Interaction
Although on-demand self-service is one of the basic characteristics of cloud computing, the fact
remains that there will always be problems that can only be resolved with human interaction.
These situations should be rare, but many SLAs will include guarantees about the provider's
responsiveness to requests for support. Typical guarantees will cover how many requests the
consumer can make, how much they will cost and how soon the provider will respond.