S 1.71 Function tests of the technical infrastructure

Initiation responsibility: Building Services Manager, IT Security Officer

Implementation responsibility: Building Services

Realistic function tests of the technical infrastructure are performed only very rarely. As a result, the proper function of the emergency power supply (EPS), for example, or the correct interaction of the air conditioning system and the fire alarm system, is not tested often enough. In many cases, extensive time and effort is put into protecting against failures, yet the safeguards themselves are not tested for fear that the tests could cause damage. The tests performed instead are not suitable for thoroughly testing the entire reaction chain that a real event would trigger. As a general rule, though, it is better to conduct the tests and handle any consequential damage under test conditions (and to learn from the experience) than to suffer such damage unexpectedly during live operations, when emergency measures have to be taken.

Inspections of the technical infrastructure are generally restricted to examining each technical component (such as the power supply) separately. At most, the interfaces to functionally related equipment are also examined, but a comprehensive examination of the entire function chain is almost never performed. A typical function chain is the series of reactions to an event, such as "the power fails, and the EPS starts up automatically". This function chain cannot be tested adequately by starting the EPS manually and then shutting off the power, because this does not show how the chain will react when the primary power supply fails suddenly.

In general, classic inspections alone cannot provide assurance that complex reaction chains will work properly. Even when every single unit is inspected and maintained optimally, the overall system repeatedly fails to operate as planned when a failure actually occurs.

Example: In one specific case, the EPS and the power failure detector circuit (including the signal it sends) were inspected and found to be working properly. During an actual power failure, the power failure detector responded correctly and sent a signal to the EPS. For some unknown reason, though, the EPS did not interpret the signal correctly and therefore did not start up, even though the signal sent to the EPS during the inspection had conformed to the manufacturer's specifications and the EPS had responded correctly in that test.
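The difference between component inspections and a test of the whole function chain can be illustrated with a small simulation. The following Python sketch is a toy model only. The actual cause of the failure in the case above is unknown, so the fault modelled here (a start signal that is weaker when the mains is really off than when a failure is merely simulated) is purely hypothetical, as are all names, voltages and the threshold.

```python
from dataclasses import dataclass

# Hypothetical minimum signal level the EPS accepts as a start command.
SPEC_MIN_VOLTS = 20.0


@dataclass
class Site:
    """Toy model of a site: mains state and EPS state."""
    mains_on: bool = True
    eps_running: bool = False


def detector_signal(site: Site, simulate_failure: bool = False) -> float:
    """Return the start-signal level in volts (0.0 means no signal)."""
    if site.mains_on and not simulate_failure:
        return 0.0
    # Hypothetical hidden coupling: the signal is weaker when the
    # mains is really off than when a failure is only simulated.
    return 24.0 if site.mains_on else 18.0


def eps_receive(site: Site, volts: float) -> None:
    """The EPS starts only if the signal level meets the threshold."""
    if volts >= SPEC_MIN_VOLTS:
        site.eps_running = True


def component_test_detector() -> bool:
    """Inspection: trigger the detector with the mains still on."""
    site = Site()
    return detector_signal(site, simulate_failure=True) >= SPEC_MIN_VOLTS


def component_test_eps() -> bool:
    """Inspection: feed the EPS a signal that conforms to the spec."""
    site = Site()
    eps_receive(site, 24.0)
    return site.eps_running


def real_world_test() -> bool:
    """Function-chain test: actually cut the primary power supply."""
    site = Site()
    site.mains_on = False
    eps_receive(site, detector_signal(site))
    return site.eps_running


results = {
    "detector inspection": component_test_detector(),
    "EPS inspection": component_test_eps(),
    "real-world test": real_world_test(),
}
for name, passed in results.items():
    print(f"{name}: {'passed' if passed else 'FAILED'}")
```

In this model both inspections pass, while the real-world test fails, because the fault only appears when the components interact under real failure conditions.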

It is therefore essential to perform a fully realistic function test (real-world test) on reaction chains. This means that the equipment to be tested must be confronted, as a whole system, with exactly the problem it was designed to respond to. Since the purpose of a real-world test is to find errors that can only be detected in the overall system, it must be assumed that precisely these errors will occur and that the response will not be carried out as planned.

For this reason, real-world tests should not be conducted during peak operating times, and precautions must be taken to ensure the consequences of possible errors can be controlled. The latter should be part of the contingency plan in any case.

Real-world tests of the technical infrastructure of a computer centre should be performed every one to two years, as well as after systems have been modified or extensively repaired.

Review questions: