S 6.138 Drawing up a business continuity plan for virtualisation component failure
Initiation responsibility: Head of IT, IT Security Officer
Implementation responsibility: Administrator
The failure of a virtualisation server normally has wide-ranging consequences for the information system, because the failure affects not only the virtualisation component itself but also all virtualised IT systems operated on it.
Therefore, the failure of a virtualisation component must not be considered in isolation. When planning the virtualisation of IT systems in computer centres, it must be taken into account that the consolidation of hardware usage, which is the aim of virtualisation, also increases the extent of the damage caused by a failure: the stronger the consolidation, the greater the potential damage. Therefore, the protection requirements of all virtual IT systems taken together must be mapped to the protection requirements of the virtualisation components. In doing so, both the maximum principle and the accumulation principle must be taken into account.
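The maximum and accumulation principles can be illustrated with a minimal Python sketch. It assumes the three protection requirement categories normal, high and very high, and it replaces the case-by-case assessment that the accumulation principle actually calls for with a hypothetical fixed threshold (`ACCUMULATION_THRESHOLD`). The names `Protection` and `host_requirement` are illustrative only.

```python
from enum import IntEnum


class Protection(IntEnum):
    """Protection requirement categories, ordered so that max() works."""
    NORMAL = 1
    HIGH = 2
    VERY_HIGH = 3


# Hypothetical stand-in for the individual assessment of accumulation:
# above this number of consolidated systems, the requirement is raised by one level.
ACCUMULATION_THRESHOLD = 10


def host_requirement(vm_requirements: list[Protection],
                     threshold: int = ACCUMULATION_THRESHOLD) -> Protection:
    """Derive the protection requirement of a virtualisation server from
    the requirements of the virtual IT systems operated on it."""
    if not vm_requirements:
        return Protection.NORMAL
    # Maximum principle: the server inherits the highest requirement
    # of any virtual IT system it hosts.
    level = max(vm_requirements)
    # Accumulation principle: many systems on one server can add up to a
    # larger total damage than any single system would cause on its own.
    if len(vm_requirements) > threshold and level < Protection.VERY_HIGH:
        level = Protection(level + 1)
    return level


print(host_requirement([Protection.NORMAL, Protection.HIGH]).name)  # HIGH
print(host_requirement([Protection.NORMAL] * 25).name)              # HIGH (accumulation)
```

In practice, the decision whether accumulation raises the protection requirement must be justified and documented individually; the threshold merely stands in for that judgment.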
Moreover, it is usually not sufficient to consider only the failure of the virtualisation servers on which virtualised IT systems are operated. Additional IT systems required for operating the virtualisation servers must be included, since their failure may limit the availability of the virtualisation systems. Therefore, a procedure for the failure of the following systems, where applicable, must be defined:
- virtualisation servers
- administration servers (in particular, connection brokers)
- licensing servers
Depending on how the virtualisation systems have been integrated into the information system, additional systems such as directory services and name resolution services must be considered as well.
Since infrastructure services such as directory services or name resolution services may themselves run on virtual IT systems, the failure of one or more virtualisation components can lead to a considerably more complex situation. For example, restarting a largely virtualised computer centre requires detailed planning because of the service dependencies typical of such computer centres.
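The restart planning described above amounts to ordering services by their dependencies. The following Python sketch, which assumes a hypothetical dependency map for a small computer centre, uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) to derive a start sequence. A circular dependency, for example a virtualisation server that needs a virtualised directory service in order to start, makes a valid sequence impossible and is reported as a `CycleError`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each service lists the services that must
# already be running before it can be started.
DEPENDENCIES: dict[str, set[str]] = {
    "storage": set(),
    "virtualisation-server": {"storage"},
    "dns-vm": {"virtualisation-server"},
    "directory-vm": {"virtualisation-server", "dns-vm"},
    "management-server": {"directory-vm", "dns-vm"},
    "licensing-server": {"management-server"},
    "application-vm": {"directory-vm", "licensing-server"},
}


def restart_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a start sequence in which every service follows all of its
    prerequisites. Raises CycleError if the dependencies are circular."""
    return list(TopologicalSorter(deps).static_order())


print(restart_order(DEPENDENCIES))

# If the virtualisation server itself needs the virtualised directory
# service to start, the dependencies form a cycle and no restart sequence exists.
broken = {**DEPENDENCIES, "virtualisation-server": {"storage", "directory-vm"}}
try:
    restart_order(broken)
except CycleError as err:
    print("circular dependency:", err.args[1])
```

The order among services without mutual dependencies is not fixed by the sort, so the printed sequence is one valid option rather than the only one.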
Generally, the following aspects must be taken into consideration:
- The contingency planning for virtualisation systems must be integrated into the existing business continuity plan (see also module S 1.3 Business continuity management).
- A system failure of a virtualisation server may cause data loss in all virtual IT systems executed on it. Therefore, it must be checked for all virtual IT systems to what extent the existing data backup policies (see also module S 1.4 Data backup policy) need to be adapted to the virtualisation technology. In particular, it should be checked whether the features of the virtualisation technology (e.g. snapshots) can be used for data backup and which advantages and disadvantages this entails. Important images must be included in the data backup.
- If a virtualisation server fails, all virtual IT systems running on it will fail as well. The more systems are affected, the more likely it is that at least one of them suffers a serious loss of data. Therefore, contingency planning must allow for the possibility that a more comprehensive recovery is required.
- If several virtualisation servers are operated in a farm (virtual infrastructure), it must be ensured that the virtual IT systems are distributed sensibly across the servers. For example, two systems that can take over each other's tasks should not be operated on the same virtualisation server.
- It must be ensured that personnel trained in handling virtual infrastructures are available in cases of emergency.
- The system configuration of the virtualisation servers (see S 2.318 Secure installation of an IT system, S 2.315 Planning the use of servers, and S 4.237 Secure basic configuration of IT systems) must be available to the administrators at all times. It must be documented in such a way that, in an emergency, the virtualisation servers can also be recovered by personnel who are not familiar in detail with the previous configuration.
- A recovery plan must be drawn up that contains information on the controlled restart of the virtualisation servers and of the failed virtual IT systems.
- It must be ensured that the recovery of the virtualisation systems does not depend on a service that is provided in the computer centre exclusively by a virtual IT system.
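Whether redundant systems have been separated as required can be checked mechanically against an inventory. The sketch below assumes a hypothetical placement table (virtual IT system to virtualisation server) and a hypothetical list of system pairs that can take over each other's tasks; in a real farm, both would have to be taken from the management system.

```python
# Hypothetical placement: virtual IT system -> virtualisation server.
PLACEMENT: dict[str, str] = {
    "dc-1": "host-a",
    "dc-2": "host-a",
    "web-1": "host-a",
    "web-2": "host-b",
}

# Hypothetical pairs of systems that can take over each other's tasks.
REDUNDANT_PAIRS: list[tuple[str, str]] = [("dc-1", "dc-2"), ("web-1", "web-2")]


def colocated_pairs(placement: dict[str, str],
                    pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the redundant pairs whose two members run on the same
    virtualisation server, i.e. would fail together."""
    return [(a, b) for a, b in pairs if placement[a] == placement[b]]


print(colocated_pairs(PLACEMENT, REDUNDANT_PAIRS))  # [('dc-1', 'dc-2')]
```

Because virtual IT systems can be migrated between the servers of a farm, such a check only reflects the placement at a given time. Many virtualisation platforms offer their own placement (anti-affinity) rules for this purpose, which are preferable to an after-the-fact check.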
Various scenarios in which the virtualisation systems or parts of them have been compromised should be examined as part of contingency planning. For each scenario, it must be described precisely which reactions are required and which actions must be taken. The procedures should be practised regularly.
Timely contingency planning with specific instructions that can also be followed by personnel not familiar in detail with the administration of the system may mitigate the consequences of a damaging event. The corresponding emergency documents must be available to authorised persons. However, since these documents contain sensitive information, they must be stored securely.
Each of the following emergency situations should be examined:
Attack
If attacks on the virtualisation systems have been discovered, it must not be assumed that they were restricted to the virtualisation systems themselves. It must also be checked whether the virtual IT systems operated on them have been compromised. In doing so, it must be taken into account that malware (backdoors, Trojan horses) may have been installed not only on the virtualisation servers themselves but also on the virtual IT systems. Furthermore, undesired communication paths may have been opened via the network configuration of the virtualisation servers, and virtual IT systems may have been copied.
In order to remove such malware reliably, it is advisable to rebuild the virtualisation components completely. The existing data backups as well as the documentation of the system configuration and the installation instructions can be used for this purpose. If the virtualisation environment has a user administration for controlling administrative access, the user accounts, particularly those of the superusers, must be checked for correct group memberships. All passwords should be changed in order to reduce the chances of success of follow-up attacks.
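Checking administrative accounts for correct group memberships essentially means comparing the current state with the documented, approved state. The following minimal Python sketch assumes that both are available as hypothetical mappings from group name to member accounts; how they are exported depends on the virtualisation product, and the function name `audit_groups` is illustrative only.

```python
def audit_groups(actual: dict[str, set[str]],
                 approved: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """Compare actual group memberships with the approved ones and return,
    per deviating group, the unexpected and the missing members."""
    findings: dict[str, dict[str, set[str]]] = {}
    for group in sorted(approved.keys() | actual.keys()):
        have = actual.get(group, set())
        want = approved.get(group, set())
        if have != want:
            findings[group] = {
                "unexpected": have - want,  # members nobody authorised
                "missing": want - have,     # approved members that were removed
            }
    return findings


# Hypothetical approved state (from the documentation) and current state.
approved = {"hypervisor-admins": {"alice", "bob"}}
actual = {"hypervisor-admins": {"alice", "bob", "mallory"}}
print(audit_groups(actual, approved))
# {'hypervisor-admins': {'unexpected': {'mallory'}, 'missing': set()}}
```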
For the virtual IT systems that were operated on the compromised virtualisation servers, the safeguards described in their respective business continuity plans should be carried out.
Theft of (physical) virtualisation servers
When virtualisation servers have been stolen, the passwords of all accounts used for administering the virtualisation servers must be changed. It must be taken into account that virtual IT systems have also been stolen together with the virtualisation server, particularly if they were stored on its local hard disks. Even if this is not the case, it must be assumed that the thief has gained knowledge of large parts of the system configuration of the virtual IT systems and of the virtualisation infrastructure in the computer centre. Therefore, it must be examined to what extent improvements or changes to the virtualisation infrastructure can increase its resistance against future attacks. In case of doubt, the entire virtual infrastructure should be redesigned.
Theft of virtual IT systems
Normally, no physical access to the computer centre is required in order to steal a virtual IT system. An attacker may, for example, copy virtual IT systems using the functions of the virtualisation servers. All the attacker needs is network access to the storage resources on which the virtual IT systems are stored.
Preventive safeguards that make such attacks more difficult must be developed (see S 2.477 Planning a virtual infrastructure, S 4.349 Secure operation of virtual infrastructures). Furthermore, it must be checked to what extent such attacks can be detected.
In addition, the contingency planning for virtual IT systems should include regulations describing the procedure after such a theft.
Misconfigurations
Misconfigurations of virtualisation servers may have wide-ranging negative consequences for computer centre operations. Therefore, as part of contingency planning, the configuration of the virtualisation software must be checked regularly for errors. If misconfigurations are discovered, their extent must be assessed; in particular, it must be checked whether virtual IT systems are affected by them.
Depending on their severity, the changes required to correct configuration errors can be made immediately. However, it must be taken into account that such changes may adversely affect the virtual IT systems. It may therefore be necessary to shut down the virtual IT systems before changing the configuration of the virtualisation systems.
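A regular check for misconfigurations can be organised as a comparison of the current settings with a documented baseline. The following Python sketch assumes that the security-relevant settings of a virtualisation server can be exported as a flat mapping; the setting names and values are hypothetical and will differ from product to product.

```python
# Hypothetical baseline of security-relevant settings, as documented
# in the approved system configuration.
BASELINE: dict[str, object] = {
    "ssh_enabled": False,
    "ntp_servers": ("ntp1.example.org", "ntp2.example.org"),
    "management_vlan": 10,
}


def config_drift(baseline: dict[str, object],
                 current: dict[str, object]) -> dict[str, tuple[object, object]]:
    """Return every setting whose current value differs from the baseline,
    as {setting: (expected, actual)}. Settings missing on either side are
    reported with None, so undocumented additions are flagged as well."""
    return {
        key: (baseline.get(key), current.get(key))
        for key in sorted(baseline.keys() | current.keys())
        if baseline.get(key) != current.get(key)
    }


# Hypothetical current export: SSH was switched on and an undocumented
# setting appeared.
current = {**BASELINE, "ssh_enabled": True, "promiscuous_mode": True}
print(config_drift(BASELINE, current))
# {'promiscuous_mode': (None, True), 'ssh_enabled': (False, True)}
```

Each reported deviation then still has to be assessed as described above, including its possible effect on the virtual IT systems.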
Failures due to force majeure
Force majeure events such as earthquakes, flooding, fire, storm damage, and cable damage may impair the availability of the virtualisation servers. Adequate safeguards for increasing availability must be considered, e.g. redundant communication links for the IT systems.
Review questions:
- Have the effects of the consolidation associated with a virtual infrastructure on the availability requirements of the virtualisation servers been examined?
- Has an approach to be followed in the event of virtualisation component failure been defined?
- Have the business continuity plans been adapted to the virtual infrastructure?
- Have the data backup policies been adapted to the virtual infrastructure?
- Has it been ensured that corresponding documents and suitable personnel are available in an emergency?
- Have regulations been drawn up describing the procedure after virtual IT systems have been stolen?
- Are the virtualisation servers checked regularly for misconfigurations?
- Has it been examined whether safeguards to increase availability in cases of force majeure are necessary?