infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service
(SaaS). This chapter describes several aspects of data security, including:
• Processing of data, including multitenancy
• Data lineage
• Data provenance
• Data remanence
The objective of this chapter is to help users evaluate their data security scenarios and make
informed judgments regarding risk for their organizations. As with other aspects of cloud
computing and security, not all of these data security facets are of equal importance in all
topologies (e.g., the use of a public cloud versus a private cloud, or non-sensitive data versus
sensitive data).
Aspects of Data Security
With regard to data-in-transit, the primary risk is in not using a vetted encryption algorithm.
Although this is obvious to information security professionals, it is not common for others to
understand this requirement when using a public cloud, regardless of whether it is IaaS, PaaS,
or SaaS. It is also important to ensure that a protocol provides confidentiality as well as integrity
(e.g., FTP over SSL [FTPS], Hypertext Transfer Protocol Secure [HTTPS], and Secure Copy
Program [SCP])—particularly if the protocol is used for transferring data across the Internet.
Merely encrypting data and using a non-secured protocol (e.g., “vanilla” or “straight” FTP or
HTTP) can provide confidentiality, but does not ensure the integrity of the data (e.g., with the
use of symmetric streaming ciphers).
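As a minimal sketch of the point about choosing a protocol that provides both confidentiality and integrity, the following Python fragment builds a TLS client configuration that leaves certificate and hostname verification enabled. The helper name `strict_tls_context` and the TLS 1.2 floor are assumptions made for illustration, not requirements stated in this chapter; only the standard library `ssl` module is used.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Return a client-side TLS context with verification left enabled."""
    # create_default_context() turns on certificate validation and
    # hostname checking for client connections.
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A context built this way can be passed to a client call such as `urllib.request.urlopen(url, context=ctx)`, so that the transfer fails if the server's certificate cannot be verified instead of silently proceeding.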
Although using encryption to protect data-at-rest might seem obvious, the reality is not that
simple. If you are using an IaaS cloud service (public or private) for simple storage (e.g.,
Amazon’s Simple Storage Service or S3), encrypting data-at-rest is possible—and is strongly
suggested. However, encrypting data-at-rest that a PaaS or SaaS cloud-based application is
using (e.g., Google Apps, Salesforce.com) as a compensating control is not always feasible.
Data-at-rest used by a cloud-based application is generally not encrypted, because encryption
would prevent indexing or searching of that data.
Data Security Mitigation
Prospective customers of cloud computing services who expect data security to serve as a
compensating control for possibly weakened infrastructure security will be disappointed.
Infrastructure security may be weakened because part of a customer’s infrastructure security
moves beyond its control, and a provider’s infrastructure security may (for many enterprises)
or may not (for small to medium-size businesses, or SMBs) be less robust than the customer
expects. Although data-in-transit can and should be encrypted, any use of that data in the
cloud, beyond simple storage, requires that it be decrypted. Therefore, data being processed
in the cloud will almost certainly be unencrypted. And if you are using a PaaS-based
application or SaaS, that unencrypted customer data will also almost certainly be hosted in a
multitenancy environment (in public clouds). Add to that exposure the difficulties in
determining the data’s lineage and, where necessary, its provenance, and even many providers’
failure to adequately address such a basic security concern as data remanence, and the risks
to customers’ data security are significantly increased.
So, what should you do to mitigate these risks to data security? The only viable option for
mitigation is to ensure that any sensitive or regulated data is not placed into a public cloud (or
that you encrypt data placed into the cloud for simple storage only). Given the economic
considerations of cloud computing today, as well as the present limits of cryptography, CSPs
are not offering robust enough controls around data security. It may be that those economics
change and that providers offer their current services, as well as a “regulatory cloud
environment” (i.e., an environment where customers are willing to pay more for enhanced
security controls to properly handle sensitive and regulated data). Currently, the only viable
option for mitigation is to ensure that any sensitive or regulated data is not put into a public
cloud.
Provider Data and Its Security
In addition to the security of your own data, you should also be concerned about what data
the provider collects and how the CSP protects that data. Specifically with regard to your
data, what metadata does the provider have about it, how is that metadata secured, and what
access do you, the customer, have to it? As your volume of data with a particular provider
increases, so does the value of that metadata.
Additionally, your provider collects and must protect a huge amount of security-related data.
For example, at the network level, your provider should be collecting, monitoring, and
protecting firewall, intrusion prevention system (IPS), security information and event
management (SIEM), and router flow data. At the host level, your provider should be collecting
system logfiles, and at the application level SaaS providers should be collecting application log
data, including authentication and authorization information.
What data your CSP collects and how it monitors and protects that data is important to the
provider for its own audit purposes (e.g., SAS 70, as discussed in Chapter 8). Additionally, this
information is important to both providers and customers in case it is needed for incident
response and any digital forensics required for incident analysis.
By data stored in the cloud (i.e., storage-as-a-service), we mean data held in IaaS storage,
not data associated with an application running in the cloud on PaaS or SaaS. The same three
information security concerns are associated with this data stored in the cloud (e.g., Amazon’s
S3) as with data stored elsewhere: confidentiality, integrity, and availability.
When it comes to the confidentiality of data stored in a public cloud, you have two potential
concerns. First, what access control exists to protect the data? Access control consists of both
authentication and authorization. As we will discuss further in Chapter 5, CSPs generally use
weak authentication mechanisms (e.g., username + password), and the authorization
(“access”) controls available to users tend to be quite coarse. For large organizations, this
coarse authorization presents significant security concerns in itself. Often, the only
authorization levels cloud vendors provide are administrator authorization (i.e., the owner of
the account itself) and user authorization (i.e., all other authorized users), with no levels
in between (e.g., business unit administrators, who are authorized to approve access for
their own business unit personnel).
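To make the missing middle tier concrete, here is a hypothetical sketch of a three-level authorization model in which a business unit administrator may approve access only for personnel in the same business unit. The role names, the `User` class, and the `can_approve_access` function are all invented for illustration and do not correspond to any provider's API.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ACCOUNT_ADMIN = "account_admin"  # owner of the account itself
    BU_ADMIN = "bu_admin"            # hypothetical intermediate level
    USER = "user"                    # all other authorized users


@dataclass(frozen=True)
class User:
    name: str
    role: Role
    business_unit: str


def can_approve_access(approver: User, target: User) -> bool:
    """Decide whether `approver` may approve access for `target`."""
    if approver.role is Role.ACCOUNT_ADMIN:
        # The account owner can approve access for anyone.
        return True
    if approver.role is Role.BU_ADMIN:
        # A business unit administrator is limited to their own unit.
        return approver.business_unit == target.business_unit
    # Ordinary users cannot approve access at all.
    return False
```

With only the two levels described above, the second branch has no equivalent, so every approval has to go through the account owner.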
In addition to the confidentiality of your data, you also need to worry about the integrity of
your data. Confidentiality does not imply integrity; data can be encrypted for confidentiality
purposes, and yet you might not have a way to verify the integrity of that data. Encryption
alone is sufficient for confidentiality, but integrity also requires the use of message
authentication codes (MACs). A straightforward way to use MACs on encrypted data is to
encrypt with a block symmetric algorithm (as opposed to a streaming symmetric algorithm) in
cipher block chaining (CBC) mode, and to compute the MAC over the result with a keyed one-way
hash function. This is not for the cryptographically uninitiated, and it is one reason why
effective key management is difficult.
At the very least, cloud customers should be asking providers about these matters. Not only is
this important for the integrity of a customer’s data, but it will also serve to provide insight on
how sophisticated a provider’s security program is—or is not. Remember, however, that not
all providers encrypt customer data, especially for PaaS and SaaS services.
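The integrity check described above can be sketched with Python's standard library. This is a minimal illustration, not a complete scheme: HMAC-SHA256 is one common MAC construction chosen here as an assumption, the function names `make_tag` and `verify_tag` are hypothetical, and the ciphertext is a random stand-in for data that has already been encrypted elsewhere.

```python
import hashlib
import hmac
import os


def make_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over already-encrypted bytes."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def verify_tag(mac_key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(mac_key, ciphertext), tag)


# Assumption: the MAC key is separate from the encryption key.
mac_key = os.urandom(32)
# Stand-in for ciphertext produced elsewhere (e.g., by a block cipher in CBC mode).
ciphertext = os.urandom(64)
tag = make_tag(mac_key, ciphertext)

# Flip a single bit to simulate tampering in storage or in transit.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]

print(verify_tag(mac_key, ciphertext, tag))
print(verify_tag(mac_key, tampered, tag))
```

The tag is verified before any decryption is attempted, so modified data is rejected without being processed. Note that the MAC key is one more secret to generate, store, and rotate, which is part of why key management becomes difficult.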
Assuming that your data has maintained its confidentiality and integrity, you must also
be concerned about the availability of your data. There are currently three major threats in
this regard—none of which are new to computing, but all of which take on increased
importance in cloud computing because of increased risk.