T 4.76 Failure of administration servers for virtualisation systems

Several virtualisation servers may be combined to form a virtual infrastructure. For this, the virtualisation servers are connected in such a way that each virtual IT system is always executed on the virtualisation server that can provide the best performance for it. If another virtualisation server is able to provide a running virtual IT system with more resources (dynamic assignment of resources, e.g. Citrix XenServer Workload Balancing or VMware Distributed Resource Scheduler), the IT system can even be moved to that server during operation (Live Migration).

Additionally, the availability of the virtual IT systems can be increased by high-availability mechanisms such as the automatic restart of failed virtual machines. For the majority of virtualisation products, these functions require a central administration server that coordinates the operation of the individual virtual machines and the virtualisation servers. Virtualisation products capable of using such a central administration server include Citrix XenServer, Microsoft Hyper-V, and VMware ESX, for example. The administration server (Citrix XenCenter, Microsoft System Center Virtual Machine Manager, SUN Management Center, or VMware vCenter) is normally also equipped with a monitoring component that can be used to monitor the function of the virtual IT systems and the virtualisation servers.

Since the administration server controls and administers all functions of a virtual infrastructure, its failure means that no configuration changes can be made to the virtual infrastructure. During this period, the administrators cannot react to problems such as resource bottlenecks or the failure of individual virtualisation servers, nor can they integrate a new virtualisation server into the infrastructure or create new virtual IT systems.

Functions such as Live Migration, and therefore the dynamic assignment of resources to individual guest systems, are no longer available either, since the instance coordinating such functions is no longer operational. As a consequence, the virtual infrastructure can no longer react automatically to resource bottlenecks, which has adverse effects on both the performance and the availability of individual virtual IT systems. This applies in particular if the resources of the virtualisation servers have been overbooked, i.e. if more resources have been assigned to the virtual IT systems than are physically available.
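The effect of overbooking can be illustrated with a small calculation. The following is only a sketch: the function name and all figures are hypothetical and are not taken from any virtualisation product. It compares the memory assigned to the guests of one virtualisation server with the memory physically installed; a ratio above 1 means that the server depends on not all guests demanding their full share at once, or on load being moved elsewhere.

```python
def overbooking_ratio(assigned_gb, physical_gb):
    """Return the ratio of memory promised to guests to memory physically present."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return sum(assigned_gb) / physical_gb


# Hypothetical host: 64 GB of RAM, four guests assigned 24 GB each.
ratio = overbooking_ratio([24, 24, 24, 24], 64)
print(f"overbooking ratio: {ratio:.2f}")  # prints "overbooking ratio: 1.50"
```

With a ratio of 1.5, the guests in this assumed scenario can request up to 96 GB from a server that has only 64 GB. As long as the administration server is operational, such a bottleneck can be relieved by migrating guests; if it has failed, the bottleneck persists.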

Additionally, the administration server is used to monitor the virtualisation servers and the virtual IT systems operated on them. If the administration server or its monitoring component provides incorrect data or no data at all, the administrators can no longer adequately monitor the functionality of the virtual infrastructure. There is therefore a risk that resource bottlenecks in the virtual infrastructure remain unnoticed and that the virtual infrastructure is expanded too late. The failure of individual virtual IT systems may also be noticed too late if the monitoring function of the virtual infrastructure has failed.

Furthermore, the failure of a virtualisation server may even remain unnoticed altogether: if the IT systems running on this server have been migrated to another virtualisation server, no services fail in the computer centre, and if an error in the administration and monitoring software prevents the failure from being indicated, nothing draws attention to it. The resulting loss of redundancy may massively reduce the overall availability of the virtual infrastructure.
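The loss of redundancy described above can be made concrete with a simplified model. The sketch below assumes that every virtualisation server has a single capacity figure and that the virtual IT systems can be distributed freely; the function name and the numbers are hypothetical. It counts how many servers may fail, largest first, before the remaining capacity no longer covers the total demand.

```python
def tolerable_host_failures(host_capacities, total_demand):
    """Count how many of the largest hosts can fail while the rest still cover demand."""
    remaining = sorted(host_capacities)  # ascending, so the largest host is last
    failures = 0
    # Keep removing the largest host while the others could still carry the load.
    while remaining and sum(remaining[:-1]) >= total_demand:
        remaining.pop()  # worst case: the largest remaining host fails
        failures += 1
    return failures


# Hypothetical cluster: three hosts with 64 GB each, guests needing 100 GB in total.
print(tolerable_host_failures([64, 64, 64], 100))  # prints 1
# The same cluster after one host has failed without anyone noticing.
print(tolerable_host_failures([64, 64], 100))      # prints 0
```

In this assumed scenario, the infrastructure still runs all services after the unnoticed failure, but it can no longer absorb a further one.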

Example: