S 6.94 Contingency planning for security gateways

Initiation responsibility: IT Security Officer, Head of IT

Implementation responsibility: Administrator

Troubleshooting with security gateways

Security gateways play a central role in maintaining the availability of an organisation's network connection. Faults in or failures of the security gateway or of individual components (ranging from sporadic malfunctions to outright equipment failures and the network outages they cause) can have direct and serious consequences if there are no adequate provisions for emergencies.

To be able to react quickly and effectively in such situations, diagnostics and troubleshooting must be planned and prepared in advance. Reaction instructions should be created for typical failure scenarios and for failures that have already occurred in the organisation. Cookbook-like documentation of all the necessary steps is particularly helpful in situations where a fast response is called for. This includes not only diagnosis and error handling but also the administrative actions that are necessary in normal operation. Often these will be contained in the documentation provided by the manufacturer. However, for daily operations it is sensible to create an overall set of documents in the form of an operating manual.

A suitable logging function running during operations is also a prerequisite for the success of the diagnostic procedures (see also S 4.47 Logging of security gateway activities). In addition, suitable tools should be used for error handling.

The procedure for error handling can be divided into the areas of administration, performance measurement, and diagnostics. The aspects to be taken into account in each of these three areas are explained below. For routers acting as a packet filter which are part of a security gateway, see safeguard S 6.92 Contingency planning for routers and switches.

Administration

All the necessary commands and steps involved in the administration and configuration should be documented in an operating manual for the individual components of the security gateway. For reasons of clarity, it is recommended to do this separately for every component and to also create a single overview document.

The following aspects must be taken into account:

configuration of the operating system, in particular the configuration of the network interfaces
updating of the operating system
configuration of the "function components" (packet filters, security proxies, virus scanners etc.), in particular
- important commands for starting and terminating services
- storage location and format of configuration files or configuration databases; if necessary, use of the relevant configuration tools
- in the case of security proxies (for example, HTTP proxy, e-mail gateway), the location (partition/file system) of the data directories
logging
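
Whether the operating manual covers the aspects listed above for every component can be checked mechanically. The following is a minimal sketch in Python, assuming a hypothetical structure in which each component's manual entry records the aspects it documents; the aspect keys and component names are illustrative and are not prescribed by this safeguard.

```python
# Hypothetical aspect keys derived from the list above; names are illustrative.
REQUIRED_ASPECTS = (
    "os_configuration",
    "os_update",
    "service_start_stop",
    "config_file_locations",
    "logging",
)

def missing_aspects(manual):
    """Return, per component, the required aspects not yet documented.

    `manual` maps a component name to the set of aspects its manual covers.
    Components with complete documentation are omitted from the result.
    """
    gaps = {}
    for component, documented in manual.items():
        missing = [a for a in REQUIRED_ASPECTS if a not in documented]
        if missing:
            gaps[component] = missing
    return gaps

# Example with assumed component names.
manual = {
    "packet_filter": {"os_configuration", "os_update", "service_start_stop",
                      "config_file_locations", "logging"},
    "http_proxy": {"os_configuration", "service_start_stop"},
}
print(missing_aspects(manual))
```

A check of this kind only confirms that an entry exists for each aspect; whether the documented commands are correct still has to be verified by the administrator.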

Performance

The following aspects should be taken into account when reporting the performance:

inbound and outbound traffic via the packet filters and for each of the protocols for which a security proxy is used
statistical information about the protocols used
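
As an illustration of the kind of report meant here, the following Python sketch sums inbound and outbound traffic per protocol. The record layout (protocol, direction, byte count) and the sample values are assumptions; in practice the data would come from the logs of the packet filters and security proxies in whatever format those components provide.

```python
from collections import defaultdict

def traffic_by_protocol(records):
    """Sum inbound and outbound bytes per protocol.

    Each record is a tuple (protocol, direction, byte_count), where
    direction is "in" or "out". This record layout is an assumption.
    """
    totals = defaultdict(lambda: {"in": 0, "out": 0})
    for protocol, direction, byte_count in records:
        if direction not in ("in", "out"):
            raise ValueError(f"unknown direction: {direction!r}")
        totals[protocol][direction] += byte_count
    return dict(totals)

# Hypothetical sample records.
records = [
    ("http", "in", 1200),
    ("http", "out", 300),
    ("smtp", "in", 500),
    ("http", "in", 800),
]
report = traffic_by_protocol(records)
for protocol in sorted(report):
    print(protocol, report[protocol])
```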

Diagnostics

For diagnostic purposes, all the necessary commands and the outputs to be expected for viewing the operational status of all the components of the security gateway and their configuration should be documented. The following information, among others, is relevant when diagnosing errors:

overview of the overall configuration
status and configuration of the network interfaces and the other connections
status of the available network services
processes
users logged in
logging (use of the log levels, interpretation of the log information)

Additional safeguards are described in S 2.215 Error handling.
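
Documenting commands together with their expected outputs makes it possible to compare the observed state against the documented one. The following Python sketch shows one possible way to do this, assuming that the output of each documented command has already been captured as text. The check names, patterns, and sample outputs are hypothetical and would have to be replaced by those recorded in the organisation's own operating manual.

```python
import re

# Each entry pairs a documented check with a pattern the output should match.
# The check names and patterns are hypothetical.
EXPECTED = {
    "interface_status": r"\bstate UP\b",
    "proxy_service": r"\bactive \(running\)",
}

def deviations(observed):
    """Return the names of checks whose output does not match the documented pattern.

    `observed` maps a check name to the captured command output (a string).
    A check with no captured output counts as a deviation.
    """
    failed = []
    for name, pattern in EXPECTED.items():
        output = observed.get(name, "")
        if not re.search(pattern, output):
            failed.append(name)
    return failed

# Illustrative captured outputs; the interface check deviates from the expectation.
observed = {
    "interface_status": "2: eth0: <BROADCAST,MULTICAST> mtu 1500 state DOWN",
    "proxy_service": "Active: active (running) since Mon",
}
print(deviations(observed))
```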

Contingency planning to increase availability

Planning the procedure to follow when malfunctions occur can minimise the restoration time and, in some circumstances, may be the only way to make a solution possible at all. The planning must be coordinated with the overall malfunction and contingency planning and should be based on the general business continuity planning concept (see module B 1.3 Business continuity management). The general specifications for contingency documents for the entire IT operation are formulated there. Ideally, they specify uniform and binding requirements as well as the layout, contents, and form of the documents.

The following questions are relevant to contingency planning:

What are the monitoring requirements?
- Compilation of information that is analysed on an ongoing basis by the parties responsible for operation of the network components (see also section entitled "Logging")
- How can the early detection of errors be guaranteed? Are there any tools that enable alarms to be sent automatically?
What are possible reasons for malfunctions?
- attacks
- hardware defects
- inadequate dimensioning (failure when the load increases)
What safeguards can be taken?
- develop alternative configurations and "fallback strategies" for specific failure or attack scenarios (for example, changes of routing, alternative packet filtering rules)
- standby equipment: implementation of failover solutions that make it possible to switch over to an alternative unit during live operation
- maintenance agreements
- staff training
What Service Level Agreements (SLAs) are there or should be made?
- hardware suppliers (for example, on-site replacement with response time guarantee for certain components, especially in the case of appliances)
- internal service level requirements
How will diagnosis be performed?
- status queries
- display of configuration
- logging
What correction procedures must be performed?
- procedures in the event of failure of the complete system (restoration of operating system and configuration)
- procedures in the event of failure of sub-components (e.g. memory, hard disks, network cards)
Who must be informed in the event of damage?
- server and application administration
- hardware supplier/contact person for maintenance agreement

What documents must be available if damage occurs?
- configuration
- packet filter rules, configuration for security proxies
- passwords
The documentation must also be available in a form other than electronic form; instructions in particular should be available at least on paper. If necessary, configuration files can also be stored separately on CD-ROMs or other data media.
How is a restart performed?
- dependencies on other areas of the IT network
- reinstallation of operating system and configuration
- playback of a backed up configuration
- scope for limited operation

When planning for limited operation, keep in mind that limited operation of the security gateway must not result in inadequate protection of the organisation's own network. In case of doubt, it is better for the service to remain unavailable for longer than to run the risk of further problems being caused by "limited security".
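
One way to check a fallback configuration against this requirement is to verify that it permits nothing that the normal configuration does not. The following Python sketch assumes a deliberately simplified model: a default-deny packet filter whose allow rules are represented as (protocol, port) pairs and compared by exact match. Real rule sets also depend on rule order, addresses, and interfaces, so this is an illustration of the idea rather than a complete check. All rule values are hypothetical.

```python
def extra_permissions(normal_rules, fallback_rules):
    """Return the allow rules in the fallback set that the normal set lacks.

    Both arguments are collections of (protocol, port) pairs under an
    assumed default-deny policy. An empty result means the fallback set
    is no more permissive than the normal set in this simplified model.
    """
    return set(fallback_rules) - set(normal_rules)

# Hypothetical rule sets.
normal = {("tcp", 25), ("tcp", 80), ("tcp", 443)}
fallback_ok = {("tcp", 25), ("tcp", 443)}    # reduced service, no new access
fallback_bad = {("tcp", 443), ("tcp", 23)}   # opens a port not allowed normally

print(extra_permissions(normal, fallback_ok))
print(extra_permissions(normal, fallback_bad))
```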

Care must be taken when drawing up the procedure descriptions necessary for contingency planning, and the procedures must be tested regularly. In some cases, different procedures must be considered for different types of devices and operating systems.

In the case of central components, such as the packet filters of a security gateway situated between the organisation's own network and the internet, failure of a single component can cause the entire internet connection to fail. Probably the most important measure for increasing availability is to keep a reserve of replacement parts or replacement devices so that downtime after a hardware defect is minimised. As an alternative or in addition, service contracts can be concluded with the manufacturer that ensure availability through guaranteed reaction times or even guaranteed repair times. Such contracts can reduce the cost of stocking spare parts or achieve an even higher level of hardware availability. The supply of software updates can also be regulated within the framework of such a contract.
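
The effect of repair time on availability can be estimated with the standard steady-state relation: availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR the mean time to repair. The following Python sketch applies this relation to compare two provisioning options. The MTBF and MTTR figures are purely hypothetical and are not taken from this safeguard; real values would have to come from the manufacturer or from the organisation's own records.

```python
HOURS_PER_YEAR = 8760

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)

def expected_downtime_hours_per_year(mtbf_hours, mttr_hours):
    """Expected unavailable hours per year for the given MTBF and MTTR."""
    return (1 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR

# Hypothetical figures: same device, different repair times.
MTBF = 20000
options = (
    ("spare device on site", 2),
    ("replacement ordered after failure", 72),
)
for label, mttr in options:
    print(label, round(expected_downtime_hours_per_year(MTBF, mttr), 2))
```

A comparison of this kind can support the decision between stocking replacement devices and concluding a service contract with guaranteed repair times.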

Review questions:

Have reaction instructions been created for typical failure scenarios and for failures that have already occurred?
Are performance measurements carried out for the security gateway components at regular intervals?
Are all the necessary commands and steps involved in the administration and configuration documented in an operating manual for the individual components of the security gateway?
Are all the necessary commands and the outputs to be expected for viewing the operational status of all the components of the security gateway and their configuration documented?
Is the security gateway contingency planning coordinated with the overall malfunction and contingency planning of the organisation?
Are the reaction instructions for security gateway contingency planning also available in printed form?
Does the planning ensure that limited operation of the security gateway does not impair the security of the network of the organisation?
Are emergency drills for security gateway contingency planning purposes carried out?