S 2.314 Use of high-availability architectures for servers

Initiation responsibility: Head of IT, IT Security Officer

Implementation responsibility: IT Security Officer, Administrator

The availability of business processes, applications, and services often depends on the function of a central server. The more applications are run on a server, the more reliable this server must be. Normally, a server contains several potential sources of error (single points of failure), i.e. components whose failure may cause the overall system to fail: CPU, hard disks, power supplies, fans, backplane, etc. Restoring the overall system may take a considerable amount of time in this case. Along with the provision of spare parts, the following options may additionally be used in order to increase the availability:

  • cold standby
  • hot standby (manual switchover)
  • cluster (automatic switchover)

Every single one of these techniques offers a different level of availability and is normally related to different costs.
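The effect of single points of failure and of a redundant second system on availability can be illustrated with a short calculation. The following is a minimal sketch, not part of the safeguard itself: the component availabilities are hypothetical values, failures are assumed to be independent, and switchover is assumed to be ideal and instantaneous. Real cold or hot standby solutions have a switchover time, so the redundant figure is an upper bound.

```python
from math import prod


def series_availability(components):
    """Availability of a system that fails as soon as any one component fails."""
    return prod(components)


def redundant_availability(unit, copies=2):
    """Availability of identical units where one working unit is sufficient.

    Assumes independent failures and an ideal, instantaneous switchover.
    """
    return 1 - (1 - unit) ** copies


# Hypothetical per-component availabilities (illustrative values only).
server_components = {
    "cpu": 0.999,
    "hard_disk": 0.995,
    "power_supply": 0.998,
    "fan": 0.999,
}

# A single server is less available than its weakest component.
single_server = series_availability(server_components.values())

# A second, identical server raises the availability of the overall system.
redundant_pair = redundant_availability(single_server, copies=2)

print(f"single server:  {single_server:.5f}")
print(f"redundant pair: {redundant_pair:.7f}")
```

The series product shows why every additional single point of failure lowers the availability of the server, and the redundant case shows the best a second system can achieve before switchover time and shared dependencies are taken into account.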

Cold standby

For cold standby, a replacement system of identical construction is provided alongside the productive system, but it is not active. Should the primary system fail, the replacement system can be booted and integrated into the network manually.

Along with the provision of individual spare parts, this is the simplest redundancy solution entailing the corresponding advantages and disadvantages:

Advantages of a cold standby solution
  • Cold standby solutions do not increase the complexity for the overall system.
  • The costs incurred by a cold standby system only amount to the costs for the additional hardware and therefore are lowest amongst the presented options.
  • New installation of or changes to the system can be performed without any losses in availability. For this, productive operation is switched to the cold standby system during the changes.

Disadvantages of a cold standby solution
  • A secondary system must be provided in addition to the primary system.
  • The replacement system must constantly be provided with the latest configuration and patch status.
  • Since the replacement system requires manual activation, administrators must continuously monitor the system and intervene in an emergency.
  • If the application data is not stored to an external storage system so that access directly from the replacement system is possible, the data must be migrated to the cold standby system.

Table: Advantages and disadvantages of a cold standby solution

These solutions are well suited for servers containing applications where short and/or limited downtimes until administrator intervention are uncritical. Examples include:

Hot standby (manual switchover)

For hot standby, a replacement system is also provided, but it is kept in operation in parallel with the productive system. The functionality of the productive system is monitored and the replacement system is activated in the event of a failure. Switchover may be manual or automatic. Automatic switchover requires additional functionality in the overall system, e.g. automatic recognition of failures. This case is addressed in the section "Cluster" below.

In order to ensure that the downtimes are as short as possible, the condition of the replacement system must be checked continuously.
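Such a continuous check of the replacement system could look roughly as follows. This is a minimal sketch under stated assumptions: `SystemState`, `config_digest`, and `standby_problems` are hypothetical names introduced for illustration, and how the reachability, patch level, and configuration of a real server are collected is left open.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class SystemState:
    """Hypothetical snapshot of one server, collected by some external means."""
    reachable: bool
    patch_level: str
    config_text: str


def config_digest(state):
    """Return a SHA-256 digest of the configuration for cheap comparison."""
    return hashlib.sha256(state.config_text.encode("utf-8")).hexdigest()


def standby_problems(primary, standby):
    """List the reasons why the standby is not ready to take over.

    An empty list means that no problem was detected by these checks.
    """
    problems = []
    if not standby.reachable:
        problems.append("standby not reachable")
    if standby.patch_level != primary.patch_level:
        problems.append("patch level differs")
    if config_digest(standby) != config_digest(primary):
        problems.append("configuration differs")
    return problems


# Illustrative values only.
primary = SystemState(reachable=True, patch_level="2024-05", config_text="port=443\n")
standby = SystemState(reachable=True, patch_level="2024-04", config_text="port=443\n")

for problem in standby_problems(primary, standby):
    print("WARNING:", problem)
```

Run at regular intervals, a check of this kind reports a standby system that has fallen behind before a switchover is actually needed.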

Advantages of a hot standby solution
  • The downtimes are shorter when compared to a cold standby solution.
  • Just like for cold standby, this solution is also relatively cheap when compared to the higher quality availability solutions described in the following.
  • The replacement system is operating and may also be used for data replication.
  • New installation of or changes to the system can be performed without any loss of availability. For this, productive operation is switched to the hot standby system during the changes.

Disadvantages of a hot standby solution
  • Only half of the existing hardware is used at any time.
  • The replacement system must be kept up to date constantly.
  • If the hot standby system is activated manually, continuous monitoring by a person in charge of the system is required.

Table: Advantages and disadvantages of a hot standby solution.

Hot standby systems are suitable for applications for which short downtimes are uncritical. If the hot standby server is activated manually, the need for continuous system monitoring must be taken into consideration. Possible fields of application include, for example:

Cluster (automatic switchover)

A cluster consists of a group of two or more computers operated in parallel in order to increase the availability or the performance of an application or a service. In this, the application or service may be executed actively on one of the computers or distributed to several computers (performance enhancement).

Depending on the mode of operation, clusters are differentiated into load-balanced clusters and failover clusters.

Load-balanced clusters

For load-balanced clusters, instances of an application or of a service are distributed amongst the servers depending on the utilisation. If this is possible for an application or a service, it can be used not only to achieve load balancing and therefore performance enhancement, but also to reduce the problems occurring during failures.

One of the prerequisites for using load balancing is that the respective applications or services must not require write data access.

In this case, redundancy may be provided by installing systems with similar performances "next to each other" with the help of a load balancing process and by guaranteeing that the other servers will compensate the failure of one server.
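How the remaining servers compensate for the failure of one server can be sketched with a simple round-robin distribution. This is an illustrative sketch only: `RoundRobinBalancer` is a hypothetical class, round robin is just one possible distribution strategy (the text above speaks of distribution depending on utilisation), and failure detection itself is assumed to happen elsewhere.

```python
class RoundRobinBalancer:
    """Distribute requests in turn across servers, skipping failed ones."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_failed(self, server):
        """Take a server out of rotation, e.g. after a failed health check."""
        self._healthy.discard(server)

    def mark_recovered(self, server):
        """Put a known server back into rotation."""
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        """Return the next healthy server, or raise if none is left."""
        for _ in range(len(self._servers)):
            candidate = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy server available")


# Illustrative server names only.
balancer = RoundRobinBalancer(["web1", "web2", "web3"])
balancer.mark_failed("web2")
print([balancer.pick() for _ in range(4)])
```

After `mark_failed("web2")`, requests are only handed to the two remaining servers, which therefore carry the load of the failed one. The servers must be sized so that they can absorb this additional load.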

Advantages of a load-balanced cluster
  • Both the availability and the performance can be increased using load-balanced clusters.
  • All available resources are used permanently.
  • The solution is highly scalable.
  • The complexity of the overall system is lower when compared to a failover cluster.

Disadvantages of a load-balanced cluster
  • This cluster cannot be used for all kinds of applications. In particular, applications that do not use purely read accesses and that require all servers to access the same storage resources simultaneously are not suitable for load balancing.

Table: Advantages and disadvantages of a load-balanced cluster

If, along with the availability, performance is important and if the application allows for distributed use, a load-balanced cluster is the ideal solution. This may be the case for the following, for example:

  • web servers, frontend applications with exclusive read accesses (e.g. web server farms)

Failover clusters

In this document, the term failover cluster refers to a cluster where active operation of the application or service is taken over automatically by another part of the cluster in the event of one of the cluster systems failing. The term failover refers to the automatic takeover of services by a functionally equivalent component when a system component fails. For the failover function, a dedicated heartbeat connection is usually used to ensure communication between the cluster servers. Along with the connection to the client network, the cluster servers must also have a dedicated connection to the administration network in order to provide for direct access in the event of an emergency.

Automatic failover assumes that all software and hardware components are monitored appropriately. Therefore, it is important to ensure that the failover mechanism is not based on any incorrect assumptions.
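The heartbeat-based detection of a failure and the subsequent takeover can be sketched as follows. This is a simplified illustration, not the mechanism of any particular cluster product: `HeartbeatMonitor` and `choose_active` are hypothetical names, and timestamps are passed in explicitly so that the logic stays deterministic. Note the assumption built into the sketch: a missing heartbeat is treated as a node failure, although in practice it may equally mean that only the heartbeat connection has failed. This is exactly the kind of assumption that must be verified.

```python
class HeartbeatMonitor:
    """Track the time at which each cluster node last sent a heartbeat."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}

    def record_heartbeat(self, node, now):
        """Store the timestamp (in seconds) of the latest heartbeat of a node."""
        self._last_seen[node] = now

    def failed_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout.

        Assumption: a missing heartbeat means the node itself has failed.
        """
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


def choose_active(current_active, nodes, failed):
    """Keep the active node if it is alive, else fail over to the next one."""
    if current_active not in failed:
        return current_active
    for node in nodes:
        if node not in failed:
            return node
    return None  # no node left that could take over


# Illustrative node names and timestamps only.
nodes = ["node1", "node2"]
monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record_heartbeat("node1", now=0.0)
monitor.record_heartbeat("node2", now=0.0)
monitor.record_heartbeat("node2", now=5.0)  # node1 has gone silent

failed = monitor.failed_nodes(now=6.0)
print("failed:", failed)
print("active:", choose_active("node1", nodes, failed))
```

A production cluster additionally has to prevent two nodes from becoming active at the same time when only the heartbeat connection is lost, which this sketch does not address.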

The following items must be taken into consideration when using a failover cluster:

Advantages of a failover cluster
  • The availability may be increased significantly by automatic takeover.
  • No manual interventions are required.

Disadvantages of a failover cluster
  • This solution is highly complex.
  • Failover clusters are difficult to scale.
  • The resources are always only partially utilised.
  • Additional hardware and software incur high costs.

Table: Advantages and disadvantages of a failover cluster

As the comparison of the advantages and disadvantages shows, using a failover cluster only makes sense if one or several applications are characterised by very high availability requirements. In addition to the high expenditure, the personnel responsible must have very good knowledge of the operating systems and applications used and of the failover functionality. Furthermore, using failover solutions for servers only makes sense if all dependencies, such as the network connection or the availability of the client, are also designed with the corresponding redundancies.

Areas where failover clusters are typically used in the event of high availability requirements include, for example:

If business processes, applications, or services are characterised by high availability requirements, it must be considered how these requirements may be met. The persons responsible for IT and the security management team should draw up a concept and select appropriate architectures for the corresponding servers.

Review questions: