Dirty Disks Raise New Questions About Cloud Security
24 April 2012
During our research last year into Cloud node security, we identified a security vulnerability affecting some customers at Rackspace and at VPS.NET, two of the four providers we tested. Subsequent research found that VPS.NET's service is based on OnApp technology, which is used by over 250 other providers, some of whom may share the same vulnerability. While Rackspace knows of no instance of customer data being compromised through this vulnerability, it asked us to delay publication of our findings until Rackspace engineers could fully remediate the vulnerability and secure their customers. Rackspace recently completed those remediation efforts and worked with us to publish our full findings, in the hope that they will be helpful to other Cloud hosting providers and their customers.
The issue identified was in the area of separation of data. By launching several Cloud servers and conducting a detailed security assessment, we were able, on some of those servers, to gain access to other customers' data remnants. The issue was found to affect only Linux servers on what is now Rackspace's legacy platform. The data fragments included some personally identifiable information, such as parts of customer databases, and elements of system information such as Linux shadow files (which contain a system's password hashes).
This information would not be evident to the typical user of Cloud servers and would have to be actively sought. Rackspace has told us that it is not aware of any exploitation of the vulnerability. The remnant data was randomly distributed and would not allow a malicious user to target a specific customer. A malicious user who discovered the vulnerability could, however, exploit it to harvest whatever unencrypted data they came across: personal information, credit card details or credentials, for example. For that reason, we reported the vulnerability to both providers. Both said that they have since resolved the issue, and one, Rackspace, has provided details of its resolution, described later in this blog post.
The purpose of this blog post is to raise awareness so that other providers of Cloud Infrastructure services can ensure they do not expose their customers in the same manner and that this type of issue is not reintroduced in the future. It will also briefly describe how to test a virtual server for the issue to ensure that the provider you are choosing is not obviously vulnerable.
This blog was originally due to be released in conjunction with our talk at the CREST Conference (16th February 2012), but was delayed to allow Rackspace the time necessary to complete the implementation of its resolution. It should be noted that, in line with our ethical and confidentiality obligations, we limited our access to what was needed to determine that the data was available; we did not disclose, use, record, transmit or store any of the data we accessed.
While researching Cloud security for our whitepaper, we gained approval from the companies involved to perform limited testing on our virtual servers. One of the tests we performed was to determine the level of separation between virtual servers, through memory and disk analysis. The theory was that if virtual machines are not sufficiently isolated, or a mistake is made somewhere in the provisioning or de-provisioning process, then data might leak between servers; this would be evident in the resources visible to the virtual instance.
Memory analysis did not yield any results, and none of the testers expected disk analysis to do so either. On examining a newly provisioned disk on one of the providers, however, some interesting and unexpected content was discovered. There were references to an install of WordPress and a MySQL configuration, even though the virtual server had neither installed. Expecting it to be perhaps just a 'dirty' OS image, we created a second virtual server and tested it in the same way. Surprisingly, the data was completely different, in this case exposing fragments of a website's customer database and of Apache logs which identified the server the data was coming from. This confirmed the data was not from our provisioned server. From this it was reasonable to assume that we had identified a significant security issue, and the provider was contacted. Continuing the research, we also identified a second provider with what appeared to be a similar problem.
The issue itself was due to the way in which the providers provisioned new virtual servers, and how they allocated new storage space. When a client wants to create a new virtual server they can go to the provider’s web site, select the type of operating system they require as well as the amount of storage space, click a button and the machine is automatically generated.
On the backend, the service allocates sufficient disk space to contain the virtual image, then overwrites the start of the disk with a pre-configured OS image. This means that only the start of the disk contains initialised data; the rest of the disk is never explicitly written to during provisioning.
If this allocation were performed using the host operating system's file APIs, this would not normally be a problem: the OS would ensure any uninitialised data was automatically zeroed before being returned to a user application (or, in this case, the virtual machine). Clearly, in this case, these mechanisms were not being used.
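The zero-fill guarantee that file APIs provide can be demonstrated with a minimal sketch, using a sparse file as a stand-in for a freshly allocated virtual disk (the filename is illustrative):

```shell
# Sketch: the OS zero-fills uninitialised file regions on read.
# Create a 10 MB sparse file; no data blocks are actually written to disk.
dd if=/dev/zero of=sparse.img bs=1 count=0 seek=10M 2>/dev/null
# Reading the "hole" returns zeros, never stale disk contents.
nonzero=$(dd if=sparse.img bs=1M 2>/dev/null | tr -d '\0' | wc -c)
echo "non-zero bytes read: $nonzero"
rm -f sparse.img
```

A hypervisor reading the raw device directly bypasses exactly this safeguard, which is why stale sectors become visible to the guest.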
It is worth noting that the data was not live: it was not the result of disk contents being shared between running virtual servers. We understand the data remained because disk and swap space was not zeroed after use. Even so, the most recent data we identified was less than a week old, which is recent enough to compromise another server (assuming you could locate all the information needed to identify and access it); after all, how often do administrators change the root password of a server? The customers' personal data and credit card information would also likely still be valid.
VPS.NET - OnApp
VPS.NET, one of the Cloud providers we informed of the remnant-data vulnerability, later told us that it took 15 days to roll out a patch to fix the issue. Unfortunately, when we requested further information, we did not receive any from VPS.NET. We therefore investigated the technology they use and found that their service is based on OnApp, a complete off-the-shelf Cloud solution used by more than 250 providers across the globe. We contacted OnApp, and they provided us with information about how they approached resolving this issue for their providers.
OnApp told us that zeroing out the entire disk by default would be too costly from a hardware IO perspective; instead, an opt-in facility is provided for users to securely delete their virtual disks when they are de-provisioned, leaving thousands of virtual machines at potential risk. This solution is based on the assumption that the only activity on the physical disk is the allocation and de-allocation of the virtual disks themselves. If a virtual disk is moved, or any temporary files exist on the physical disk, there is still a risk of data leakage. Whether this solution is sufficient is unknown, as VPS.NET has provided no further information and has not sought further testing. If you are using an OnApp-based Cloud provider, make sure you click the secure wipe button when de-provisioning your virtual server (assuming the provider has exposed it).
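What a secure wipe achieves can be sketched against a small file-backed image standing in for a virtual disk (the filename and planted string are hypothetical; a real provider would target the block device or volume backing the VM):

```shell
# Create a 4 MB "virtual disk" and plant some tenant data on it.
dd if=/dev/zero of=vdisk.img bs=1M count=4 2>/dev/null
printf 'db_password=s3cret\n' | dd of=vdisk.img conv=notrunc 2>/dev/null
strings vdisk.img | grep -c 'db_password'            # remnant present
# Secure wipe: overwrite every block with zeros before de-provisioning
# (tools such as shred can add random-data passes on top of this).
dd if=/dev/zero of=vdisk.img bs=1M count=4 conv=notrunc 2>/dev/null
strings vdisk.img | grep -c 'db_password' || true    # remnant gone
rm -f vdisk.img
```

The cost OnApp cites is visible even at this scale: the wipe is a full rewrite of the disk, which is why providers are reluctant to do it unconditionally.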
Data exposed prior to this fix was not seen as a serious issue by OnApp because they had very few customers at the time.
Rackspace
Rackspace has worked closely with us in addressing the issue within its environment, inviting us to its headquarters in San Antonio, Texas and giving us access to its engineers, executives and processes. It has been open with us about the causes of the vulnerability and about how it approached the solution. Rackspace informed us that, upon de-provisioning of disks, it has long been zeroing the area of disk occupied by a virtual machine; however, this operation was not effective in all instances, and we were able to see some data being leaked. In response, Rackspace patched the related processes so that new machines could not be exposed to any remnant data. Rackspace told us that it considered it unacceptable to leave old remnant data on its disks, and employed customer migrations to eliminate any remaining data fragments. Context has performed testing on the remediated environment and did not find any evidence of data remnants. Details of the solutions implemented by Rackspace are included within this blog so that other Cloud providers can learn from their experience.
In Rackspace's case the issue stemmed from the specific way it used Xen Classic as a hypervisor, which was configured to read and write directly to the physical disk. Therefore when a new server was provisioned a section of the physical disk was allocated to the server as its virtual disk. This portion of the physical disk could then be read directly, by someone who knows how to read beyond the file system, allowing any residual data that might be present to be viewed.
The first question that might be asked about this issue is: why wasn't the slack space zeroed out when the disk was provisioned? The answer is that doing so would make provisioning a new server take too long and would have a detrimental effect on the performance of the hypervisor. This left Rackspace with the very difficult task of, firstly, ensuring that any data deleted from the physical disk is actually zeroed and, secondly, cleaning up all the existing virtual disks.
The first part of fixing the issue was to ensure that any data deletion is secure. This includes de-provisioning of disks, disk migrations, the backup process and so on: all of these use cases must be guaranteed not to leave data behind. Catching every possible edge case sounds like a virtually impossible task, but it is necessary to ensure the security of client data.
The second part is the reason it has taken since March of last year to fully correct the issue. While ensuring deleted data is always zeroed prevented new servers from seeing other users' data, all the existing servers potentially had fragments of other users' data already on their provisioned disks. Thousands of Rackspace's servers had to be either migrated to a new Xen Server platform which provides an abstraction layer (i.e. you cannot read from the disk until you have written to it) or have their disks cleaned up. Rackspace has informed us that, at the time of this blog post, this process has been completed. We can confirm, based on recent tests, that this security issue was not seen in either the current Rackspace Cloud Servers environment or its new Next Generation Cloud, based on OpenStack.
Which Technologies and Providers Might Have This Issue?
Any instance where multiple users share a physical disk, and have direct hardware-level access to it, is potentially at risk. This issue does not only affect the Cloud: shared hosting and other similar services are also exposed.
Any hypervisor that does not provide an abstraction layer between the virtual machine and the physical disk could be susceptible; however, as shown, there are various factors which can either mitigate or minimise the issue. As is often the case, adequate testing is required to determine whether a particular stack and configuration is vulnerable to this or any other security flaw.
Repeating the Results
Extracting the data from a virtual server is not difficult at all; this is perhaps the most surprising aspect of the issue. The simplest technique is to run the 'strings' utility as the root user against a raw disk partition, at a location where there should be no data. For example, where sda1 is the virtual disk and the skip value is the number of megabytes of currently used disk space:

root@server:~# dd if=/dev/sda1 bs=1M skip=5000 | strings
By reviewing the resulting strings, any data from other users can be seen, which indicates the disk is dirty. Another target is the swap partition, especially on a brand new installation. Unlike Windows, Linux dedicates a separate disk partition to swapping out virtual memory, so it is easy to access; on a new installation it is likely never to have been used, so any data contained within it is of interest. As the uninitialised data is likely to contain files from other servers, it is also possible to use forensic tools, such as Foremost or EnCase, to extract the data in a more complete form. When testing for this issue it is important to create a number of new machines with different configurations (such as memory and storage size) and test each one. During the original research we found that, even on providers which had the issue, it was sometimes possible to get a server with no uninitialised data on it.
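The technique can be rehearsed safely against a file-backed image rather than a live provider's disk. The sketch below plants a "remnant" in an otherwise blank image and recovers it with the same dd-and-strings pipeline (the filename, planted string and offsets are all illustrative):

```shell
# Create an 8 MB blank image standing in for a raw virtual disk.
dd if=/dev/zero of=disk.img bs=1M count=8 2>/dev/null
# Plant a fragment of "previous tenant" data 5 MB into the image.
printf 'mysql_password=hunter2\n' | dd of=disk.img bs=1M seek=5 conv=notrunc 2>/dev/null
# Skip past the space our own OS image would occupy and look for strings;
# this prints the planted fragment, just as on a dirty provider disk.
dd if=disk.img bs=1M skip=4 2>/dev/null | strings
rm -f disk.img
```

On a properly zeroed disk the final command produces no output at all, which is exactly what a customer should hope to see.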
I am a Cloud Customer, What Can I Do About This?
There is very little that can be done from a customer’s point of view for data which might have already been lost. Rackspace reports that it knows of no instance of customer data being viewed by an unauthorised party through this vulnerability. Rackspace also informed us that they have fully resolved the issue. We have retested the current and Next Generation solutions and didn’t find any issues. VPS.NET claims to have rolled out a patch within 15 days of our notification, but has provided no further details.
With all Cloud providers, Context recommends that users follow best practices for hosted services, for example using full-disk encryption for sensitive data. If given the option to securely wipe the disk when de-provisioning, it should always be taken. We recommend that concerned users of Cloud IaaS contact their provider to determine if their data has been at risk. The only realistic way to mitigate this issue, though, is to ensure the providers you entrust your data to have measures in place to manage security risks and to resolve new vulnerabilities as they arise.
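To illustrate why encryption at rest blunts this class of leak, here is a hedged, hypothetical sketch using openssl on a single file (a real deployment would more likely use full-disk encryption such as dm-crypt/LUKS; the passphrase and filenames are placeholders, and the -pbkdf2 option assumes a reasonably recent OpenSSL):

```shell
# Encrypt sensitive data before it is ever written to Cloud storage.
printf 'card=4111111111111111\n' > secret.txt
openssl enc -aes-256-cbc -pbkdf2 -salt -pass pass:example-passphrase \
    -in secret.txt -out secret.enc
rm secret.txt
# A remnant of secret.enc left on a dirty disk yields nothing readable:
grep -ac 'card=' secret.enc || true      # prints 0: ciphertext only
# The owner can still decrypt with the passphrase:
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:example-passphrase -in secret.enc
rm -f secret.enc
```

An attacker harvesting remnants would recover only ciphertext, so even a provider-side wiping failure does not expose the underlying data.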
In conclusion, as this is a single issue within the implementation of virtual server disk provisioning, it does not mean that the Cloud is broken. However, the simplicity of the issue does raise questions about the maturity of Cloud security testing, and it is a significant issue that Cloud providers and customers should be aware of. Also, while this blog post focuses on the Cloud, the issue could equally affect any multi-tenancy solution where different users share the same physical disk.
During the initial research we reviewed four providers, two of which were vulnerable to this issue. Subsequently, we have found that OnApp-based Cloud solutions appear to share the problem. We do not know how widespread the issue is amongst other providers; we hope this blog post will raise awareness of the problem, and of the questions it poses about the maturity of the security posture of Cloud IaaS providers.
Context would like to thank Rackspace and OnApp for providing details of their approach to this problem.
References:
- Assessing Cloud Node Security (Context whitepaper): http://www.contextis.com/research/white-papers/assessing-cloud-node-security
- VPS.NET: http://vps.net
- Rackspace: http://www.rackspace.com
- OnApp customers: http://onapp.com/customers
- Foremost: http://foremost.sourceforge.net
- EnCase (Guidance Software): http://www.guidancesoftware.com/forensic.htm