T 3.100 Improper use of snapshots of virtual IT systems

A snapshot freezes the state of a virtual machine at any point in time, regardless of whether the system is running when the snapshot is created. This makes it easy to return to the state of the virtual IT system preserved in the snapshot. The snapshot can also be transferred to another virtualisation server or be used as a data backup.

If the virtual machine continues to be operated after a snapshot has been created and the preserved state is later restored, all changes made to the guest system in the meantime are lost. If this is done without due care, data may be lost, which is usually undesirable for production systems. Changes to the operating system, services and applications of the virtual IT system are also reverted in this way. Insufficient file permissions, security gaps and vulnerabilities that had already been fixed, or even deleted user accounts, thus become active again.
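
The effect of restoring a preserved state can be illustrated with a minimal sketch. It is a toy model only: the class VirtualMachine, its methods and the sample accounts and patch names are hypothetical and do not correspond to any real hypervisor API.

```python
import copy


class VirtualMachine:
    """Toy model of a guest system (hypothetical, not a hypervisor API)."""

    def __init__(self):
        self.state = {
            "accounts": {"alice", "bob"},  # active user accounts
            "patches": set(),              # installed security patches
            "files": {},                   # file name -> content
        }
        self._snapshots = {}

    def create_snapshot(self, name):
        # Freeze an independent copy of the current state.
        self._snapshots[name] = copy.deepcopy(self.state)

    def revert_to_snapshot(self, name):
        # Replace the current state with the frozen copy.
        self.state = copy.deepcopy(self._snapshots[name])


def demo_revert():
    vm = VirtualMachine()
    vm.create_snapshot("before-maintenance")

    # Changes made while the machine continues to be operated.
    vm.state["accounts"].discard("bob")        # account deleted
    vm.state["patches"].add("fix-2")           # vulnerability patched
    vm.state["files"]["report.txt"] = "data"   # new data written

    vm.revert_to_snapshot("before-maintenance")
    return vm.state
```

After demo_revert(), the deleted account "bob" is active again, the patch "fix-2" is missing and the file "report.txt" no longer exists.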

Virtual servers with open files or database sessions may end up with inconsistent data. This is the case, for instance, when a client is writing information to the virtualised server while the snapshot is being created. The file content to be saved is then not completely included in the snapshot. If the frozen state of the virtual machine is later restored, it is highly probable that files will be defective or that the integrity of databases will be impaired.
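
The following sketch simulates a snapshot taken while a write is still in progress. It is a simplified illustration under stated assumptions: the "disk" is an in-memory byte buffer, and the function names write_with_snapshot and is_complete are hypothetical.

```python
import hashlib


def write_with_snapshot(payload, chunk_size, snapshot_after_chunks):
    """Write payload in chunks and freeze the disk after a given chunk."""
    disk = bytearray()
    snapshot = None
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)]
    for index, chunk in enumerate(chunks, start=1):
        disk.extend(chunk)
        if index == snapshot_after_chunks:
            # The snapshot captures only what has reached the disk so far.
            snapshot = bytes(disk)
    return bytes(disk), snapshot


def is_complete(data, expected_sha256):
    """Check the data against the checksum of the full payload."""
    return (data is not None
            and hashlib.sha256(data).hexdigest() == expected_sha256)
```

If the snapshot is taken after the third of ten chunks, the final disk content passes the checksum comparison, but the snapshot contains only the first three chunks and fails it. Whether a real snapshot shows this behaviour depends on whether the guest system is quiesced before the snapshot is created.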

Distributed systems such as database clusters or Active Directory domain controllers usually use a replication mechanism to keep their data synchronised. Significant problems can occur when such a system is reverted to a snapshot. In this case, inconsistencies may arise in the databases that cannot be resolved by the replication mechanism.
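
Why replication may be unable to repair such inconsistencies can be shown with a strongly simplified model. The sketch assumes that each replica numbers its own changes with a sequence counter and that partners only request changes above the highest number they have already seen. The class Replica and all names are hypothetical; this is not the actual algorithm of Active Directory or of any particular database cluster.

```python
import copy


class Replica:
    """Toy replica with a per-replica update sequence number."""

    def __init__(self, name):
        self.name = name
        self.seq = 0          # local update sequence number
        self.data = {}        # key -> (value, originating replica, seq)
        self.high_water = {}  # partner name -> highest seq already seen

    def write(self, key, value):
        self.seq += 1
        self.data[key] = (value, self.name, self.seq)

    def pull_from(self, partner):
        # Only fetch changes that originated at the partner and are
        # newer than anything received from it before.
        seen = self.high_water.get(partner.name, 0)
        for key, (value, origin, seq) in partner.data.items():
            if origin == partner.name and seq > seen:
                self.data[key] = (value, origin, seq)
        self.high_water[partner.name] = max(seen, partner.seq)


def demo_rollback():
    a, b = Replica("A"), Replica("B")
    a.write("x", 1)
    b.pull_from(a)
    snapshot = copy.deepcopy(a)   # snapshot of A at sequence number 1

    a.write("y", 2)               # A uses sequence number 2
    b.pull_from(a)                # B has now seen A up to 2

    a = copy.deepcopy(snapshot)   # A is reverted to the snapshot
    a.write("z", 3)               # A reuses sequence number 2
    b.pull_from(a)                # B ignores it as already seen
    a.pull_from(b)                # B has no changes of its own to offer
    return sorted(a.data), sorted(b.data)
```

In this model, demo_rollback() returns (['x', 'z'], ['x', 'y']): replica A has lost "y", replica B never receives "z", and further replication runs do not change this.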

If there is not enough disk space available for large snapshots or for several snapshots, disk space may run short, so that no further information can be saved.
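
A simple check before creating a snapshot might look like the following sketch. The reserve of 20 percent, the estimated snapshot size and the function names are assumptions for illustration only; shutil.disk_usage is part of the Python standard library.

```python
import shutil


def fits(free_bytes, total_bytes, estimated_snapshot_bytes,
         reserve_fraction=0.2):
    """Return True if the snapshot fits and the reserve stays free."""
    reserve = total_bytes * reserve_fraction
    return free_bytes - estimated_snapshot_bytes >= reserve


def has_room_for_snapshot(datastore_path, estimated_snapshot_bytes,
                          reserve_fraction=0.2):
    """Apply the check to the file system containing datastore_path."""
    usage = shutil.disk_usage(datastore_path)
    return fits(usage.free, usage.total,
                estimated_snapshot_bytes, reserve_fraction)
```

Since snapshots of a running system can continue to grow as the guest system changes, such a check can only be a rough guide at the time it is made.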

Example:

A large photo laboratory develops films for its customers. For this purpose, the customers send in their films in an envelope and provide their return addresses on this envelope. All envelopes are labelled with a unique number. In the laboratory, the films are additionally assigned an internal reference number. This internal reference number is used to make the films anonymous. For the automated shipping procedure, the reference number is stored together with the machine-readable envelope number in a database. When the photos have been developed, they are automatically assigned to the envelope based on the reference numbers. The envelope is then returned to the customer by mail.

The management of the photo laboratory has now decided to virtualise not only other IT systems, but also the database system that maps the reference numbers to the envelope numbers.

During ongoing production in the laboratory, the administrator responsible detects a problem on the virtual database server. In order to eliminate this problem quickly, he resets the server to a snapshot. He knows that the server functioned properly at the time the snapshot was created. However, the assignment of reference and envelope numbers is now no longer correct, since the table that assigns the reference numbers to the envelope numbers has also been reset to the state of the snapshot. The error goes unnoticed in the shipping department. As a consequence, several customers are sent the wrong films. A large number of films can no longer be assigned to their customers, which damages the reputation of the photo laboratory and results in significant losses in sales.