S 6.94 Contingency planning for security gateways

Initiation responsibility: IT Security Officer, Head of IT

Implementation responsibility: Administrator

Troubleshooting with security gateways

Security gateways play a central role in maintaining the availability of an organisation's network connection. Faults in or failures of the security gateway or of its individual components (ranging from sporadic malfunctions to outright equipment failures and the resulting network outages) can have direct and serious consequences if adequate emergency provisions are not in place.

To be able to react quickly and effectively in such situations, diagnostics and troubleshooting must be planned and prepared in advance. Reaction instructions should be created for typical failure scenarios and for failures that have already occurred in the organisation. Cookbook-like documentation of all the necessary steps is particularly helpful in situations where a fast response is called for. This includes not only diagnosis and error handling but also the administrative actions that are necessary in normal operation. Often these will be contained in the documentation provided by the manufacturer. However, for daily operations it is sensible to create an overall set of documents in the form of an operating manual.

A suitable logging function running during operations is also a prerequisite for the success of the diagnostic procedures (see also S 4.47 Logging of security gateway activities). In addition, suitable tools should be used for error handling.

The procedure for error handling can be divided into the areas of administration, performance measurement, and diagnostics. The aspects to be taken into account in each of these three areas are explained below. For routers acting as packet filters that are part of a security gateway, see safeguard S 6.92 Contingency planning for routers and switches.

Administration

All the necessary commands and steps involved in the administration and configuration should be documented in an operating manual for the individual components of the security gateway. For reasons of clarity, it is recommended to do this separately for every component and to also create a single overview document.

The following aspects must be taken into account:

Performance

The following aspects should be taken into account when reporting the performance:

Diagnostics

For diagnostic purposes, all the necessary commands and the outputs to be expected for viewing the operational status of all the components of the security gateway and their configuration should be documented. The following information, among others, is relevant when diagnosing errors:

Additional safeguards are described in S 2.215 Error handling.
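
The idea of documenting "the outputs to be expected" alongside each diagnostic command can be sketched in code. The following is only a minimal illustration, assuming that the operational status of a component can be approximated by whether a TCP port is reachable. The component names, addresses, and ports are hypothetical placeholders (the addresses come from the documentation range 192.0.2.0/24), and a real operating manual would cover far more than port reachability.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One documented diagnostic step with its expected result."""
    component: str     # hypothetical component name from the operating manual
    host: str          # hypothetical management address
    port: int
    expect_open: bool  # documented expected state of the port


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def run_checks(checks, probe=tcp_port_open):
    """Return (check, observed) pairs that deviate from the documented state."""
    deviations = []
    for check in checks:
        observed = probe(check.host, check.port)
        if observed != check.expect_open:
            deviations.append((check, observed))
    return deviations


if __name__ == "__main__":
    # Hypothetical expected state, as it might be recorded in an operating manual.
    documented = [
        Check("packet-filter-admin", "192.0.2.1", 22, True),
        Check("application-proxy", "192.0.2.2", 8080, True),
        Check("packet-filter-telnet", "192.0.2.1", 23, False),
    ]
    for check, observed in run_checks(documented):
        state = "open" if observed else "closed"
        print(f"DEVIATION {check.component}: port {check.port} is {state}")
```

The probe function is passed in as a parameter so that the comparison logic can be exercised without touching the network, for example when the procedure itself is being tested.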

Contingency planning to increase availability

Planning the procedure to follow when malfunctions occur can minimise the restoration time and, in some circumstances, may be the only way to reach a solution at all. The planning must be coordinated with the overall malfunction and contingency planning and should be based on the general business continuity planning concept (see module S 1.3 Business continuity management). The general specifications for contingency documents for the entire IT operation are formulated there. Ideally, they specify uniform and binding requirements as well as the layout, contents, and form of the documents.

The following questions are relevant to contingency planning:

When planning for limited operation, keep in mind that limited operation of the security gateway must not leave the organisation's own network inadequately protected. In case of doubt, it is better for the service to remain unavailable for longer than to risk further problems caused by "limited security".

Care must be taken when drawing up the procedure descriptions necessary for contingency planning, and the procedures must be tested regularly. In some cases, different procedures must be considered for different types of devices and operating systems.

In the case of central components, such as the packet filters of a security gateway situated between the organisation's own network and the internet, failure of a single component can cause the entire internet connection to fail. Probably the most important measure for increasing availability is keeping a reserve of replacement parts or replacement devices to minimise downtime in the event of a hardware defect. As an alternative or in addition, service contracts can be signed with the manufacturer that ensure availability through guaranteed reaction times or even guaranteed repair times. Such contracts can reduce the cost of stocking spares or achieve an even higher level of hardware availability. The supply of software updates can also be regulated within the framework of such a contract.
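
The effect of shorter repair times on availability can be estimated roughly. The sketch below uses the common steady-state approximation availability = MTBF / (MTBF + MTTR), which comes from general reliability engineering and is not part of this safeguard. All figures (mean time between failures, repair times with and without a local spare) are hypothetical and serve only to show how such a comparison might be made.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and mean time to repair."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours_per_year(mtbf_hours, mttr_hours):
    """Expected unavailable hours per year under the same approximation."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical scenarios: one failure per year on average (MTBF = 8760 h).
    scenarios = {
        "no spare, replacement ordered on failure": 48.0,      # assumed MTTR in hours
        "spare device in stock or guaranteed repair time": 4.0,  # assumed MTTR in hours
    }
    for label, mttr in scenarios.items():
        a = availability(HOURS_PER_YEAR, mttr)
        d = expected_downtime_hours_per_year(HOURS_PER_YEAR, mttr)
        print(f"{label}: availability {a:.4%}, about {d:.1f} h downtime per year")
```

An estimate of this kind can help weigh the cost of stocking spares against the cost of a service contract, but it does not replace the regular testing of the contingency procedures.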

Review questions: