S 6.43 Use of redundant Windows servers

Initiation responsibility: IT Security Officer, Head of IT

Implementation responsibility: Administrator

Depending on the availability requirements of the data and applications, a level of redundancy must be provided that prevents a total loss of data at an acceptable cost in time and effort. Depending on these requirements, parallel copies of part or all of the database can be stored on several different disk drives, so that if one disk drive fails, the data stored on it is not lost and users can continue to work with the data without having to wait for it to be restored from a data backup.

The systems can be designed, in line with the defined availability requirements, so that when a server fails, its tasks are taken over by one or more other servers. In this case it must be ensured that the distributed data remains consistent, and this consistency must also be guaranteed when individual devices fail. The various redundancy concepts differ considerably in how well they meet this requirement:

Direct physical redundancy can be achieved with RAID disk systems (RAID: Redundant Array of Inexpensive Disks). Note that this method places strict limits on the distance between the individual disks in a RAID system. As a result, a fire or similar damage can destroy all parallel copies of the database at once. RAID systems are therefore not a replacement for data backups.

Windows 2000 clusters make it possible to distribute parallel copies of the data between different disks under the control of different computers. The use of high-performance clusters with up to four servers reduces the number of server systems required, which in turn reduces the time and effort required for administration and therefore improves security.

The replication of individual directories also makes it possible to distribute the data widely, but in this case there are no synchronisation mechanisms that keep parallel copies of files currently being edited consistent. A failure of the primary disk drive will therefore always result in the loss of some data, to a greater or lesser extent. Use of the replicator service available in Windows 2000 should therefore be restricted to cases in which the corresponding data is only changed by one person, and its use must never be considered a replacement for performing regular data backups.
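
The consistency requirement for parallel copies can be illustrated with a small check that compares a primary directory against a replica. The following is only a minimal sketch in Python. The function names and the report wording are assumptions made for illustration and are not part of Windows 2000 or of its replicator service.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_inconsistencies(primary: Path, replica: Path) -> dict[str, str]:
    """Report files that are missing, extra, or different on the replica.

    Hypothetical helper: an empty result means both copies were
    identical at the time of the check.
    """
    a, b = snapshot(primary), snapshot(replica)
    report = {}
    for name in sorted(a.keys() | b.keys()):
        if name not in b:
            report[name] = "missing on replica"
        elif name not in a:
            report[name] = "only on replica"
        elif a[name] != b[name]:
            report[name] = "content differs"
    return report
```

A check of this kind only detects inconsistencies after the fact. It does not provide the synchronisation that directory replication lacks, and files being edited during the check may be reported as different.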

The servers themselves must also be designed redundantly to guard against server failure. There are several ways to do this, and a suitable method should be selected from the available alternatives based on the maximum acceptable downtime:

If downtimes in the range of up to half an hour are acceptable, then a separate computer should be provided that takes over the tasks of a server when one fails. To gain access to the data on the failed server, it will be necessary to move its disk drives to the backup computer.

If downtimes of up to a maximum of several minutes are acceptable, then a cluster system in which all computers have access to all disks should be used. The system should be configured so that when a server fails, the system automatically switches to a backup computer located in the system.

If downtimes of a few seconds at most can be tolerated, then it is necessary to use a fully redundant fail-safe system with multiple CPUs operating in parallel. In this case, the users will not notice the failure of one CPU or of a main memory module. This solution therefore offers the highest possible level of reliability, but it is also significantly more complex and more expensive than the other two solutions, which means it should only be used when the availability requirements are extremely high. Windows 2000 cannot meet such high requirements at the present time, which means special systems that run on other operating systems are required in this case.
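
The three alternatives above can be summarised as a simple selection rule. The sketch below is illustrative only. The text gives "half an hour" as a figure, but the cut-offs of 5 minutes for "several minutes" and 10 seconds for "a few seconds" are assumptions, as are the function name and the option labels.

```python
from datetime import timedelta
from typing import Optional

# Assumed numeric readings of the wording in this safeguard.
FEW_SECONDS = timedelta(seconds=10)      # assumption for "a few seconds"
SEVERAL_MINUTES = timedelta(minutes=5)   # assumption for "several minutes"
HALF_AN_HOUR = timedelta(minutes=30)     # stated in the text


def select_server_redundancy(max_downtime: timedelta) -> Optional[str]:
    """Return the redundancy option matching the maximum acceptable downtime.

    Returns None for downtimes above half an hour, because the text
    does not name an option for that range.
    """
    if max_downtime <= FEW_SECONDS:
        return "fully redundant fail-safe system (not Windows 2000)"
    if max_downtime <= SEVERAL_MINUTES:
        return "cluster with automatic failover and shared disks"
    if max_downtime <= HALF_AN_HOUR:
        return "separate standby computer, disks moved manually"
    return None


if __name__ == "__main__":
    for limit in (timedelta(seconds=5), timedelta(minutes=3),
                  timedelta(minutes=20), timedelta(hours=4)):
        print(limit, "->", select_server_redundancy(limit))
```

The rule always returns the least complex option that still meets the downtime limit, which reflects the recommendation to use the fail-safe system only when the requirements are extremely high.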

In all cases, a thorough analysis is needed to determine the specific availability requirements. During the detailed planning of the system and the network architecture, a suitable combination of redundant computers and/or disk drives must then be selected to meet these requirements.
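
When availability requirements are expressed as a percentage, it can help to convert them into a permitted downtime before comparing them with the downtime ranges above. The following is a minimal sketch of this arithmetic. The function name and the idea of expressing the requirement as a percentage over a reference period are assumptions and are not prescribed by this safeguard.

```python
from datetime import timedelta


def downtime_budget(availability_percent: float, period: timedelta) -> timedelta:
    """Return the total downtime permitted within the reference period.

    permitted downtime = (100 - availability) / 100 * period
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


if __name__ == "__main__":
    year = timedelta(days=365)
    # 99.9 % over one year allows roughly 8.76 hours of downtime in total.
    print(downtime_budget(99.9, year))
```

Note that the result is a total over the whole period. A single outage may use up all of it, so the maximum acceptable downtime per incident still has to be defined separately.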

Review questions:

Are redundancies created to prevent total loss of data with an acceptable amount of time and effort?
Where data is distributed, is its consistency ensured?
Is consistency also ensured if individual devices fail?
Are the availability requirements of the data coordinated with the organisation's security policies?
Is there a detailed list of the system and network architecture to show existing redundancies?