S 6.76 Creation of a contingency plan for failure of a Windows network

Initiation responsibility: Head of IT, IT Security Officer

Implementation responsibility: Administrator, Head of IT

The failure of one or more Windows systems providing certain services can have a serious effect on the IT environment since users will not be able to access the functionalities provided by the Windows system or systems. The safeguards to implement in order to avoid an emergency situation, minimise the effects of a failure, and ensure a successful recovery must be defined. The documentation and instructions needed when a (server) failure occurs may contain information requiring protection. For this reason, they must be stored securely to prevent the information from being misused. Information requiring protection can include the following, for example:

At the same time, it must be ensured organisationally that this information is available to the people who are responsible for recovery when a failure occurs. The contingency plan for Windows systems must be integrated into the contingency concept (see module S 1.3 Business continuity management) and must be compatible with S 6.96 Contingency planning for a server.

Contingency planning should be integrated into the system planning phase, since certain availability specifications (for example, those that make redundant systems necessary) must already be taken into account at that stage (see S 6.1 Development of a survey of availability requirements). The contingency plan should state clearly defined criteria specifying which Windows systems it applies to.

Data backups

The contingency plan for Windows systems must point out that the safeguards in module S 1.4 Data backup policy must be implemented in order to handle and overcome an emergency. For server systems, it must be ensured that S 6.99 Regular backup of important system components for Windows Server is implemented.

The documentation of the data backup is particularly important for the contingency plan. This documentation should be checked regularly when performing maintenance work to ensure it is up to date. In particular, the documentation must contain information on the scope of the data backup, when the last data backup was successfully created, and which software and hardware were used for the data backup.

The data backup method selected, and the hardware and software used to implement it, must make it possible to meet the recovery requirements within the required recovery time.
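Whether a backup method can meet the required recovery time can be estimated roughly from the data volume and the restore throughput. The following is a minimal sketch in Python, not a prescribed procedure: the function names, the example figures, and the fixed overhead for preparation work (such as installing the operating system before the restore) are assumptions made for illustration. Only a real restore test shows the actual recovery time.

```python
def estimated_restore_hours(data_gb, throughput_mb_per_s, overhead_hours=0.0):
    """Rough restore time: pure transfer time plus a fixed overhead.

    Assumption: 1 GB = 1024 MB and the throughput is sustained for the
    whole restore, which real hardware rarely achieves.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    transfer_seconds = data_gb * 1024 / throughput_mb_per_s
    return transfer_seconds / 3600 + overhead_hours


def meets_recovery_time(data_gb, throughput_mb_per_s, required_hours,
                        overhead_hours=0.0):
    """True if the estimated restore fits into the required recovery time."""
    estimate = estimated_restore_hours(data_gb, throughput_mb_per_s,
                                       overhead_hours)
    return estimate <= required_hours


# Hypothetical figures: 500 GB of data, 50 MB/s restore throughput,
# one hour of preparation, four hours of tolerable downtime.
print(meets_recovery_time(500, 50, required_hours=4.0, overhead_hours=1.0))
```

If the estimate already exceeds the required recovery time, a different backup method or a failover system is likely needed.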

Technical documentation

Appropriate technical documentation of the system must be available in case of a failure. The technical documentation should cover the following items at a minimum:

In general, all documentation available should be taken into account in the contingency planning phase. If necessary, incomplete documentation must be completed to ensure that important functions are not forgotten in an emergency. The documentation is to be updated when performing maintenance work and after making changes to the hardware, software, or the system configuration.

Updating the technical documentation, and therefore the contingency plan as well, is part of change management. It should therefore be easy to determine from the documentation who made which changes and who updated the documentation. All documentation necessary for the contingency plan must be available and legible.

Failover operation

If only short downtimes can be tolerated, a failover system must be provided. The capacity plan for the overall system should be designed so that when an individual system fails, other systems already in operation can take over most of the roles and functions of the failed system. In this case, the information in S 4.276 Planning the use of Windows Server or S 4.420 Secure use of the Windows Action Center under Windows 7 must be taken into account.
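The capacity requirement described above can be checked with simple arithmetic: for each system, the spare capacity of the remaining systems must cover the load of the failed one. The following Python sketch illustrates this under stated assumptions: capacity and load are expressed in one abstract unit, the system names are hypothetical, and the `required_fraction` parameter (an assumption) models that only "most" of the functions need to be taken over.

```python
def survives_single_failure(systems, required_fraction=1.0):
    """Check for every system whether the others could absorb its load.

    systems: dict mapping a system name to a (capacity, load) tuple,
             both in the same abstract unit.
    Returns a dict mapping each name to True if the spare capacity of
    the remaining systems covers required_fraction of the failed load.
    """
    result = {}
    for failed, (_, failed_load) in systems.items():
        spare = sum(capacity - load
                    for name, (capacity, load) in systems.items()
                    if name != failed)
        result[failed] = spare >= failed_load * required_fraction
    return result


# Hypothetical environment with three servers.
servers = {
    "srv-a": (100, 60),
    "srv-b": (100, 50),
    "srv-c": (100, 70),
}
print(survives_single_failure(servers))
```

A result of False for any system indicates that its failure cannot be absorbed by the systems already in operation, so additional capacity or backup equipment should be planned.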

For Windows servers, you should consider purchasing backup equipment capable of operating a Windows server and some applications in case several servers fail. To minimise switchover times, these devices should have the corresponding software already installed and should also be started up and maintained regularly. When using Windows Vista, Windows 7, and Windows Server 2008, this also applies to devices with a KMS (Key Management Service) or a MAK Proxy (Multiple Activation Key Proxy), if these forms of activation for volume licences are used.

Developing failover scenarios can require considerable time and expense. Failover scenarios should therefore be taken into account as early as the server usage planning phase. Specific instructions on what action to take to initiate failover operation must be available.

Restart plan

Depending on the server role and the IT environment, there may be certain requirements a Windows system needs to meet in order to successfully restart after a failure. The restart times of the connected IT environment (e.g. of routers, other servers, site connectors, etc.) must be taken into account in addition to the restart time of the server itself. A restart plan becomes more complex as the information system gets larger, and a custom restart plan must be created based on the domain structure and server roles implemented. A member server should only be restarted after at least one domain controller with a global catalogue, one certificate server for retrieving certificate revocation lists (if any are present), and all infrastructure servers have been started.
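A restart sequence of this kind is a dependency ordering and can be derived mechanically once the dependencies are documented. The following Python sketch uses the standard library module graphlib (Python 3.9 or later). The system names and the dependency map are hypothetical examples, not part of this safeguard; a real plan must be based on the actual domain structure and server roles.

```python
from graphlib import TopologicalSorter


def restart_order(depends_on):
    """Return a start sequence in which every system comes after the
    systems it depends on.

    depends_on: dict mapping a system to the set of systems that must
                be running before it is started.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical dependency map (assumption for illustration only).
depends_on = {
    "member-server": {"dc-global-catalogue", "certificate-server",
                      "dns-server", "dhcp-server"},
    "certificate-server": {"dc-global-catalogue"},
    "dc-global-catalogue": {"router"},
    "dns-server": {"router"},
    "dhcp-server": {"router"},
}

for step, system in enumerate(restart_order(depends_on), start=1):
    print(step, system)
```

In this example the router is started first and the member server last. The order of systems that do not depend on each other is not significant.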

Testing the contingency plan

The contingency plan should be tested regularly (e.g. every three months) in a test environment in the framework of the maintenance plan, but also occasionally in the production environment (in which case special care must be taken, of course). The more frequently you expect to make changes to the configuration, the more often you should perform tests to ensure the contingency plan is up to date. The test results must be documented and trigger changes to the existing contingency plan if such changes are necessary. In all cases, the recovery scenarios must be tested and the results of the tests documented (see S 6.41 Training data reconstruction).
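Whether a test of the contingency plan is due can be derived from the date of the last documented test. The following Python sketch assumes an interval of 90 days as an approximation of "every three months"; the function names and the interval are illustrative assumptions and should be adapted to the maintenance plan in use.

```python
from datetime import date, timedelta


def next_test_due(last_test, interval_days=90):
    """Date by which the next contingency test should take place."""
    return last_test + timedelta(days=interval_days)


def is_overdue(last_test, today, interval_days=90):
    """True if the due date has already passed."""
    return today > next_test_due(last_test, interval_days)


# Hypothetical example: last documented test on 15 January 2024.
last = date(2024, 1, 15)
print(next_test_due(last))
print(is_overdue(last, today=date(2024, 5, 1)))
```

If the configuration changes frequently, a shorter interval can be passed in accordingly.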

Restoration

The requirements that must be fulfilled to perform a recovery by installing a new system must be specified in the contingency plan. The preparation concept or existing installation concepts must be taken into account (for a Windows Server 2003 system, see S 4.281 Secure installation and preparation of Windows Server 2003). When using Windows Vista / 7 / Server 2008, this also applies to devices with a KMS (Key Management Service) or a MAK Proxy (Multiple Activation Key Proxy), if these forms of activation are used for volume licences (for this, see S 4.336 Activation of Windows systems from a volume licence contract in Vista or Server 2008 and higher versions and S 4.343 Reactivation of Windows systems from a volume licence contract in Vista or Server 2008 and higher versions).

Critical items for a new installation include, among others, the drivers needed for the hardware used. For certain RAID controllers, drivers must be installed during the installation of the operating system. As a rule, the manufacturer provides the drivers on data media supplied with the hardware or on the manufacturer's web site. A copy of the driver currently in use must be available on a data medium.

The original software package with the original data media (including the product key and licence information) must be available for recovery. If a volume licence program is not used, then it must be pointed out that it may be necessary to activate a Windows system and that activation may fail if multiple activations were performed over the Internet using the same product key but with different hard disks. As a result, it may become necessary to contact Microsoft directly by telephone. Microsoft must be informed of the system failure in this case.

When volume licences are used, Windows Vista and Windows 7 clients must also be reactivated after recovery.

When using Windows Vista and Windows 7, it must be ensured that there are always enough licences available. When performing a new installation and automatically activating the system using a MAK Proxy or KMS, the Windows Vista and Windows 7 clients will first request licences. A proper licence management system must ensure that the number of licences needed is available for activation. Further information on activation can be found in S 4.336 Activation of Windows systems from a volume licence contract in Vista or Server 2008 and higher versions and S 4.343 Reactivation of Windows systems from a volume licence contract in Vista or Server 2008 and higher versions.
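The licence check described above amounts to comparing the number of licences available with the number of systems to be activated after a recovery. The following Python sketch only illustrates this comparison; the product names and figures are hypothetical, and the sketch does not query a KMS host or MAK Proxy. The actual figures must come from the licence management system.

```python
def licence_shortfall(available, needed):
    """Return the products for which too few licences are available.

    available: dict mapping a product name to the licences on hand.
    needed:    dict mapping a product name to the systems to activate.
    Returns a dict mapping each affected product to the number of
    missing licences; an empty dict means enough licences exist.
    """
    shortfall = {}
    for product, count in needed.items():
        missing = count - available.get(product, 0)
        if missing > 0:
            shortfall[product] = missing
    return shortfall


# Hypothetical figures for a recovery scenario.
available = {"Windows 7": 40, "Windows Vista": 10}
needed = {"Windows 7": 45, "Windows Vista": 8}
print(licence_shortfall(available, needed))
```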

The recovery keys of the Windows systems (Vista or Server 2008 and higher) protected by BitLocker Drive Encryption must be available promptly when needed. It must be impossible to acquire the keys without authorisation during transmission.

The existing authentication resources and recovery keys needed to use BitLocker Drive Encryption become invalid when a new installation is performed. It must be ensured that the newly generated authentication resources and recovery keys can only be accessed by authorised personnel (see S 4.337 Use of BitLocker drive encryption).

Creating replicas of important information and files on several servers enables you to use these replicas when an individual server fails. This makes it possible to offer users a copy of the missing data quickly. In the contingency planning phase, it should be examined whether replication is necessary, and if so, which data needs to be replicated. Windows Server offers the File Replication Service (FRS) for this purpose, which can also be used in connection with the Distributed File System (DFS). In Windows versions released before Windows Server 2003 R2, these services are only suited for use in a contingency concept with certain restrictions, and their use usually involves higher expenses for testing and maintenance.

The contingency plan must differentiate between certain roles assumed by Windows systems, for example between DNS servers and certificate servers, in order to be able to guarantee complete recovery. This includes backing up role-specific system components (e.g. the databases of the DNS service or of the certification authority) as well as comprehensive documentation of the settings used in connection with the corresponding roles.

Review questions: