S 6.133 Recovering the operating environment after security incidents

Initiation responsibility: Head of IT, IT Security Officer

Implementation responsibility: IT Security Officer, Head of IT, Administrator

In order to eliminate security gaps, the affected IT systems must be taken off the network, and all files that could provide information on the type and cause of the problem must be backed up. This includes, in particular, all relevant log files. Since all affected IT systems should generally be considered insecure or compromised, the operating system and all applications on each of these systems need to be examined for changes. In addition to the programs, configuration files and user files must also be examined for evidence of manipulation. Checksums should be used for this purpose whenever possible. However, this assumes that checksums of the "secure" state were generated in advance and saved to write-protected data media (see also S 4.93 Regular integrity checking).
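
The checksum comparison described above can be sketched as follows. This is a minimal illustration, not a prescribed tool: it assumes that the baseline is available as a mapping of relative file paths to SHA-256 digests that was created in the "secure" state and read from write-protected media. The function names `sha256_of` and `compare_with_baseline` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_with_baseline(root: Path, baseline: dict[str, str]) -> dict[str, list[str]]:
    """Compare all files below root with a trusted baseline.

    baseline maps relative POSIX-style paths to the digests recorded
    in the "secure" state (assumed format, see the text above).
    """
    current = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    return {
        # Files whose content differs from the recorded state
        "modified": sorted(k for k in baseline if k in current and current[k] != baseline[k]),
        # Files that were recorded but no longer exist
        "missing": sorted(k for k in baseline if k not in current),
        # Files that were not recorded, e.g. programs installed by an attacker
        "added": sorted(k for k in current if k not in baseline),
    }
```

Such a comparison should be run from a trusted environment (for example, after booting from clean media), since tools on a compromised system may themselves have been manipulated.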

In order to ensure that all manipulations made by an attacker (such as the installation of a Trojan horse) have really been eliminated, the original files should be restored from write-protected data media. When restoring the files, it must be ensured that all security-related configurations and patches are also restored. If files are restored from data backups, it must be ensured that these backups were not affected by the security incident, for example that they were not infected by the computer virus that caused the incident. Examining the data backups can also help to determine when the attack started or when the first computer was infected by the virus.
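
As an illustration of how backups can help to date an attack, the following sketch searches a series of backups for known indicators. It assumes that a manifest (relative path to file digest) exists for each backup date and that the digests of the manipulated or infected files are already known from the analysis. All names are hypothetical.

```python
from datetime import date
from typing import Optional


def earliest_affected_backup(
    backups: dict[date, dict[str, str]], bad_digests: set[str]
) -> Optional[date]:
    """Return the date of the oldest backup containing a known bad digest."""
    for day in sorted(backups):
        if bad_digests & set(backups[day].values()):
            return day
    return None


def latest_unaffected_backup(
    backups: dict[date, dict[str, str]], bad_digests: set[str]
) -> Optional[date]:
    """Return the newest backup taken before the first affected one."""
    first_bad = earliest_affected_backup(backups, bad_digests)
    candidates = [d for d in sorted(backups) if first_bad is None or d < first_bad]
    return candidates[-1] if candidates else None
```

A backup without known indicators is only a candidate for restoration. It is not proven to be clean and still needs to be checked, since the attacker may have made changes that are not covered by the known digests.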

Before restarting operations after an attack, all passwords on the affected IT systems should be changed. This also applies to IT systems that were not directly affected by the manipulations but for which the attacker may have been able to obtain user and/or password information.

After an IT system has been recovered, it should be checked to ensure that all functionality has been fully restored. Users with specific knowledge of the applications and databases used on the system could be consulted for this purpose.

The organisation should assume that the attacker will start another attack once the systems have been restored to a "secure" state. For this reason, the IT systems, and especially the network gateways, should be monitored using suitable monitoring tools. In addition to an extensive analysis of the log files, intrusion detection and intrusion response systems can also be used for this purpose (see also S 5.71 Intrusion detection and intrusion response systems).
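
A very simple form of the log analysis mentioned above is counting failed login attempts per source after recovery. The sketch below assumes that the log entries have already been parsed into records. The `LoginEvent` structure and the threshold value are hypothetical and would have to be adapted to the actual log format and environment.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LoginEvent(NamedTuple):
    source: str    # hypothetical field: address the attempt came from
    success: bool  # whether the authentication succeeded


def suspicious_sources(events: Iterable[LoginEvent], threshold: int = 5) -> list[str]:
    """Return all sources with at least `threshold` failed login attempts."""
    failures = Counter(e.source for e in events if not e.success)
    return sorted(src for src, count in failures.items() if count >= threshold)
```

Such a check does not replace an intrusion detection system. It only shows the kind of pattern that monitoring after an incident should look for.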

When a security incident occurs, the corresponding solution should be implemented, if necessary, by the system administrator responsible for the IT system, the team of experts for handling security incidents, the Computer Emergency Response Team (CERT), the manufacturer of the IT system, or a security expert.

In this phase, emphasis should be placed on documenting the safeguards initiated (workarounds, final solution, who knows how to implement the safeguard) and updating the knowledge database (problem and solution database) accordingly (see also S 6.134 Documentation of security incidents).

If a change request is required to implement the solution, this request should be approved by change management. In such cases, the security incident should remain marked as "open" until the change has been completed successfully. In most cases, critical security incidents will trigger special change management scenarios (emergency changes) intended to allow the solution to be implemented immediately.
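
The rule that an incident stays "open" until the change has been completed can be expressed as a small status check. This is only a sketch of the rule. The class names, states and fields are hypothetical and do not correspond to any particular ticket or change management system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ChangeState(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Incident:
    title: str
    change: Optional[ChangeState] = None  # None: no change request is needed
    solution_implemented: bool = False

    def status(self) -> str:
        """An incident with a change request closes only when the change is completed."""
        if self.change is not None:
            return "closed" if self.change is ChangeState.COMPLETED else "open"
        return "closed" if self.solution_implemented else "open"
```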

Since external service providers may be involved, especially when resolving security incidents, it must be specified which information on the security incident will be made accessible to whom.

Review questions:

When eliminating security gaps, are the affected IT systems taken off the network, and are all files backed up that could indicate the type and cause of the problem that occurred?
Are the operating system and applications checked for changes on all IT systems affected?
When recovering a secure operating environment, are users called in to help conduct functional tests on the applications?
After recovery, are the IT systems and network gateways specifically monitored to detect any repeat attacks?