S 6.35 Stipulating data backup procedures
Initiation responsibility: IT Security Officer
Implementation responsibility: IT Security Officer, Specialists Responsible
The procedure used to perform data backups is affected by the influencing factors mentioned in S 6.34 Determining the factors influencing data backup. A data backup procedure must be specified for each IT system and for each file type. If necessary, a differentiation must be made for individual IT applications of the IT system if these require different data backup strategies, which is often the case on mainframe computers.
The following aspects relating to data backups must be taken into account when specifying a data backup procedure:
- type of data backup,
- frequency and time of data backup,
- number of generations,
- approach and storage medium,
- person in charge of data backup,
- storage location,
- requirements for the data backup archive,
- transport terms, and
- storage conditions.
The following table shows the dependencies between the data backup aspects and the influencing factors. Each dependency is then explained individually:
Influencing factor | Type of data backup | Frequency and time of the data backup | Number of generations | Procedure and storage medium | Person in charge of data backup | Storage location | Requirements for the data backup archive | Transport terms | Storage conditions |
---|---|---|---|---|---|---|---|---|---|
Availability requirements | X | (X) | X | X | X | X | X | X |  |
Time and expense required for restoration without a data backup |  | (X) | X |  |  |  |  |  |  |
Data volume | X |  | X | X |  | X | X | X |  |
Change volume | X | X | X | X |  |  |  |  |  |
Modification times of the data | (X) | X |  |  |  |  |  | (X) |  |
Deadlines |  |  |  | X |  |  | X |  | X |
Confidentiality requirements of the data |  |  |  | (X) | X | X | X | X |  |
Integrity requirements of the data |  |  | (X) | (X) | X | X | X | X |  |
Knowledge of the IT users | X |  |  | X | X |  |  |  |  |
X means a direct influence, (X) means an indirect influence
Table: Data backup
Explanations:
Type of data backup
The following lists the various types of data backups:
- Full data backup: during a full data backup, all files to be backed up are copied to an additional data medium at a certain time, regardless of whether or not they have changed since the last data backup. For this reason, a full data backup requires a large amount of storage space. The advantage of a full data backup is that the data can be restored quickly and easily, since the files to be restored only need to be extracted from the last full data backup. If full data backups are performed only rarely and a file has changed extensively since the last one, re-entering these changes can be extremely time-consuming and complex.
- Incremental data backup: in contrast to a full data backup, an incremental data backup stores only those files that have changed since the last data backup (full or incremental). This saves storage space and reduces the time required to back up the data. However, restoration usually takes longer, since the files have to be extracted from several backups created at different times. An incremental data backup is always based on a full data backup: full data backups are created periodically, and one or more incremental data backups are created between two full data backups. To restore the data, the last full data backup is used as the basis, and the files that have changed since then are obtained from the incremental backups.
- Differential data backup: a differential data backup stores only those files that have changed since the last full data backup. It requires more storage space than an incremental backup, but files can be restored more easily and quickly. To restore the data, only the last full data backup and the most recent differential backup are needed, whereas with incremental backups it may be necessary to read in several backups one after another.
- Note: In many cases, data mirroring is also referred to as a data backup method. In data mirroring, the data is stored redundantly and simultaneously on different data media. Since the failure of one of these data media can be bridged immediately, data mirroring increases availability. However, it is not a substitute for data backup, since it does not provide any protection against threats such as theft, fire, or the accidental deletion of data: a deletion is applied to all mirrored media at the same time.
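The difference between the three types lies in which files a backup run copies and which backups a restoration requires. The following Python sketch is a simplified model, not a real backup tool. Each file is represented only by its name and the day of its last modification, and the names `Backup`, `select_files` and `backups_needed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    day: int
    kind: str           # "full", "incremental" or "differential"
    files: frozenset    # names of the files copied in this run


def select_files(kind, modified, last_full_day, last_backup_day):
    """Files copied by a backup run of the given kind.

    `modified` maps each file name to the day it was last modified.
    """
    if kind == "full":
        return frozenset(modified)  # all files, changed or not
    # Incremental: changes since the last backup of any kind.
    # Differential: changes since the last full backup.
    since = last_backup_day if kind == "incremental" else last_full_day
    return frozenset(name for name, day in modified.items() if day > since)


def backups_needed(history):
    """Backups that must be read to restore the latest state.

    Assumes `history` is in chronological order, contains at least one
    full backup and uses a single strategy after that full backup.
    """
    last_full = max(i for i, b in enumerate(history) if b.kind == "full")
    later = history[last_full + 1:]
    if later and later[-1].kind == "differential":
        return [history[last_full], later[-1]]  # full + latest differential
    return [history[last_full]] + later          # full + every incremental


# Example: full backup on day 0, then changes on days 1 and 2.
modified = {"a.txt": 0, "b.txt": 0, "c.txt": 0}
full = Backup(0, "full", select_files("full", modified, -1, -1))
modified["a.txt"] = 1
inc1 = Backup(1, "incremental", select_files("incremental", modified, 0, 0))
modified["b.txt"] = 2
inc2 = Backup(2, "incremental", select_files("incremental", modified, 0, 1))
diff2 = Backup(2, "differential", select_files("differential", modified, 0, 1))
```

With incremental backups, the restoration needs the full backup and every incremental run since then (`[full, inc1, inc2]`). With differential backups, it needs only the full backup and the latest differential run (`[full, diff2]`), although `diff2` holds more files than `inc2`.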
A special form of the data backup strategies described above is the image backup. In an image backup, the physical sectors of the hard disk are backed up instead of the individual files stored on it.
An image backup is a type of full data backup that allows very fast restoration on hard disks of the same type.
Hierarchical Storage Management (HSM) is another form of data backup. Its primary goal is to use expensive storage equipment as economically as possible. Depending on how frequently the files are accessed, they are kept on fast online storage (hard disks) or on nearline storage (automatic media changers), or they are migrated to or archived on offline storage (magnetic tape). These HSM systems generally also offer automatic data backup routines that combine incremental and full data backups.
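The core placement decision of an HSM system can be sketched as a mapping from access frequency to storage tier. In this Python sketch, the thresholds (accesses in the last 30 days) and the names `Tier`, `assign_tier` and `plan_migration` are hypothetical. Real HSM products use their own, configurable migration policies.

```python
from enum import Enum


class Tier(Enum):
    ONLINE = "online storage (hard disks)"
    NEARLINE = "nearline storage (automatic media changer)"
    OFFLINE = "offline storage (magnetic tape)"


# Hypothetical policy thresholds: number of accesses in the last 30 days.
ONLINE_MIN_ACCESSES = 10
NEARLINE_MIN_ACCESSES = 1


def assign_tier(accesses_last_30_days: int) -> Tier:
    """Keep frequently used files on fast, expensive storage; move the rest down."""
    if accesses_last_30_days >= ONLINE_MIN_ACCESSES:
        return Tier.ONLINE
    if accesses_last_30_days >= NEARLINE_MIN_ACCESSES:
        return Tier.NEARLINE
    return Tier.OFFLINE


def plan_migration(access_counts: dict) -> dict:
    """Map each file name to the tier it should reside on."""
    return {name: assign_tier(count) for name, count in access_counts.items()}


plan = plan_migration({"orders.db": 250, "report.pdf": 2, "archive.tar": 0})
for name, tier in plan.items():
    print(f"{name}: {tier.value}")
```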
RAID systems (Redundant Array of Inexpensive Disks) store data redundantly. The RAID concept describes how several hard disks are combined and operated under the control of a disk array controller. Different RAID levels are available; RAID level 1 denotes data mirroring.
RAID systems are not a substitute for data backup! They do not help in case of theft or fire, so the data stored on RAID systems must also be backed up to other media, and these data media must be stored in a different fire section.
To select a data backup strategy that meets the requirements and is also economical, the following influencing factors must be taken into account:
Availability requirements:
If the availability requirements are very high, consideration should be given to data mirroring; if the availability requirements are high, full data backups should be preferred over incremental data backups.
Data volume and change volume:
If the change volume is approximately equal to the data volume (as is the case with a database, for example), incremental data backups save hardly any storage space, and full data backups should be considered. However, if the change volume is significantly smaller than the data volume, incremental data backups save a considerable amount of storage space and therefore considerable cost.
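The effect of the change volume can be estimated with simple arithmetic. This Python sketch compares daily full backups with a weekly cycle of one full backup and six daily incremental backups. It assumes that each incremental run copies exactly the data changed that day. The function names and the figures are illustrative.

```python
def storage_daily_full(data_volume_gb, days=7):
    """Storage used in one week when a full backup is made every day."""
    return data_volume_gb * days


def storage_weekly_full_daily_incremental(data_volume_gb, daily_change_gb, incrementals=6):
    """Storage used by one full backup plus daily incremental backups.

    Simplification: each incremental run copies exactly the data changed
    that day, which can never exceed the total data volume.
    """
    return data_volume_gb + incrementals * min(daily_change_gb, data_volume_gb)


# Small change volume: incremental backups save most of the storage space.
print(storage_daily_full(500), storage_weekly_full_daily_incremental(500, 10))
# Change volume close to the data volume (e.g. a database): no saving.
print(storage_daily_full(500), storage_weekly_full_daily_incremental(500, 500))
```

With 500 GB of data and 10 GB of daily changes, the weekly cycle needs 560 GB instead of 3,500 GB. If the daily change volume approaches the full 500 GB, both approaches need the same storage space.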
Modification times of the data:
The times at which the data is modified may have a slight influence on the data backup strategy. If there are certain times when the entire data stock of an application must be backed up (for example for weekly, monthly, or annual balance sheets), only a full data backup is suitable at these times.
Knowledge of the IT users:
Implementing data mirroring requires the system administrators to have the corresponding knowledge, but does not require any knowledge on the part of the IT users.
A full data backup can be performed by an IT user with little knowledge of the system. In contrast, an incremental data backup requires more knowledge of the system and more experience in handling data backups.
Data backup frequency and times
If data is lost (due to a head crash on a hard disk, for example), all changes made to the data since the last data backup have to be made again in order to restore the data. In general, the shorter the interval between data backups, the less time is needed to restore the data and re-enter the changes. In addition to periodic data backups (daily, weekly, on working days, etc.), event-dependent data backups may also be necessary (for example after x transactions, after executing a certain program, or after changing the system).
The following influencing factors should be taken into account when selecting the frequency and times of the data backups:
Availability requirements, time and expense for restoration without a data backup and change volume:
The interval between data backups must be selected such that the time required to restore the data and to re-enter the changes made since the last data backup (the change volume for which no backup is available) is less than the maximum tolerable period of disruption. The longer the interval, the larger the change volume that may have to be re-entered.
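Under the simplifying assumption that re-entry effort grows linearly with the time since the last backup, this condition reads t_restore + r · Δt ≤ MTPD, where MTPD stands for the maximum tolerable period of disruption. Solving it gives an upper bound for the interval Δt. In the Python sketch below, the function and parameter names are hypothetical, and the worst case is assumed: the data is lost just before the next backup would have run.

```python
def max_backup_interval_days(mtpd_hours, restore_hours, reentry_hours_per_day):
    """Largest backup interval (in days) that still meets the MTPD.

    Worst case assumed: the data is lost just before the next backup, so
    the changes of one complete interval have to be re-entered.
    `reentry_hours_per_day` is the effort to re-enter one day's changes.
    """
    if restore_hours >= mtpd_hours:
        raise ValueError("restoring the last backup alone exceeds the MTPD")
    return (mtpd_hours - restore_hours) / reentry_hours_per_day


# MTPD of 24 h, 4 h to restore the backup, 5 h to re-enter one day of changes:
print(max_backup_interval_days(24, 4, 5))  # → 4.0
```

In this example, backups would have to be made at least every four days. A shorter interval leaves a safety margin for restorations that take longer than planned.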
Modification times of the data:
If there are times when large amounts of data are changed (for example when running a wage payment program or migrating to a new version of the software) or times when all data needs to be available at the same time, it is recommended to perform a full data backup immediately afterwards. For this reason, the times of event-dependent data backups need to be specified in addition to the times of the periodic backups.
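A backup control program that combines periodic and event-dependent triggers could decide along the following lines. In this Python sketch, the event names, the transaction threshold and the function `backup_due` are hypothetical examples of the triggers named in the text. They are not taken from any particular backup product.

```python
from datetime import datetime, timedelta

# Hypothetical events after which a full backup is required immediately.
EVENT_TRIGGERS = {"payroll_run", "software_migration", "system_change"}


def backup_due(now, last_backup, period, transactions_since_backup,
               max_transactions, events_since_backup):
    """Return why a backup is due ("event", "transactions", "periodic") or None."""
    if EVENT_TRIGGERS & set(events_since_backup):
        return "event"          # e.g. right after a wage payment run
    if transactions_since_backup >= max_transactions:
        return "transactions"   # event-dependent: after x transactions
    if now - last_backup >= period:
        return "periodic"       # regular schedule, e.g. daily
    return None


last = datetime(2024, 3, 1, 22, 0)
print(backup_due(last + timedelta(hours=2), last, timedelta(days=1),
                 40, 1000, ["payroll_run"]))  # → event
```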
Number of generations
On the one hand, data backups are repeated at short intervals in order to have a copy of the data that is as up to date as possible. On the other hand, the data backup must guarantee that the backed-up data is stored for as long as possible. If a full data backup is considered to be a generation, the number of generations to be stored and the interval between the generations must be specified. These requirements are explained in the following examples:
- If a file is deleted intentionally or unintentionally, this file will not be available in later data backups. If it turns out that the deleted file is still needed, a data backup created before the time of deletion must be obtained in order to restore the file. If no such generation is available any more, the file must be re-created.
- If a file loses its integrity (due to a technical defect, an accidental change to the file, or a computer virus, for example), it is likely that this will not be noticed immediately and will only be noticed later. In order to be able to restore the integrity of the file, it is necessary to access a generation that was created before the loss of integrity.
- It is impossible to completely rule out that a given data backup was created incorrectly or incompletely. In this case, a backup from another generation can often be used instead.
In order to benefit from these advantages of the generation principle, one basic condition must be met: the interval between the generations must not be less than a certain minimum period. Example: if an automatic data backup procedure repeatedly aborts the data backup run, each new attempt overwrites another generation, so that all generations could be overwritten in quick succession. This can be prevented by checking the age of a generation and overwriting it only when it is older than a certain minimum age.
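The minimum-age check described in the example can be sketched as follows. The Python sketch assumes that generations are overwritten in rotation, oldest first, and that each generation is represented only by its creation time. The function name `generation_to_overwrite` and the parameter values are hypothetical.

```python
from datetime import datetime, timedelta


def generation_to_overwrite(generations, now, min_age):
    """Index of the generation that may be overwritten, or None.

    `generations` holds the creation time of each generation (one per data
    medium). Only the oldest generation is a candidate, and only once it has
    reached `min_age`. Repeatedly aborted and restarted runs therefore cannot
    overwrite all generations in quick succession.
    """
    oldest = min(range(len(generations)), key=lambda i: generations[i])
    if now - generations[oldest] >= min_age:
        return oldest
    return None


now = datetime(2024, 3, 10, 23, 0)
generations = [now - timedelta(days=d) for d in (1, 2, 3, 4, 5)]
idx = generation_to_overwrite(generations, now, timedelta(days=5))
if idx is not None:
    generations[idx] = now          # backup run writes to the oldest medium
# A second (restarted) run the same night finds no generation old enough:
print(generation_to_overwrite(generations, now, timedelta(days=5)))  # → None
```

If no generation may be overwritten, a real procedure would presumably alert the person in charge of data backup rather than silently skip the run.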
A generation principle is characterised by two variables: the minimum age of the oldest generation and the number of available generations. The following applies:
- The higher the minimum age of the oldest generation, the higher the probability that there is still an old version of a file which has lost its integrity (this includes files that have been deleted but that are needed again after deletion).
- The higher the number of available generations (for a given minimum age of the oldest generation), the more up to date the requested older version will be, because the generations are spaced more closely.
However, the number of generations is directly related to the costs of the data backup: every generation should be stored on a separate data medium, so a correspondingly large number of data media must be provided. For economic reasons, the number of generations must therefore be limited to a reasonable level.
The following have an influence on the parameters selected when applying the generation principle:
Availability and integrity requirements of the data:
The higher the availability requirements or integrity requirements of the data, the more generations must be available to minimise the restoration time in case of a loss of integrity.
If the loss of a file or an integrity violation may only be noticed at a very late point, additional quarterly or annual data backups are recommended.
Time and expense for reconstruction without a data backup:
If the data is extensive but can be reconstructed without a data backup, this could be considered as an additional "pseudo-generation".
Data volume:
The higher the data volume, the higher the cost of a single generation due to the storage space required. High data volumes can therefore limit the number of generations for economic reasons.
Change volume:
The higher the change volume, the shorter the interval between the generations should be in order to have the most recent version of the affected file available and to keep the time required to re-enter all changes to a minimum.
Procedure and storage medium
After specifying the type of data backup, the backup frequency, and the generation principle, it is necessary to select the procedure, including appropriate and economically viable data media. The following illustrates examples of several commonly used data backup procedures:
Example 1: Manual decentralised data backup on the PC
On non-networked PCs, the data backup is usually performed manually by the IT users in the form of a full backup of the application data. CDs or DVDs are used as storage media.
Example 2: Manual centralised data backup in the Unix system
On a Unix system with connected terminals or PCs with terminal emulation, a central data backup procedure is the most appropriate due to the centralised database. This is often performed manually by the Unix administrator as a combination of weekly full data backups and daily incremental data backups using streamer tapes.
Example 3: Manual centralised data backup in the local network
In a local network with connected PCs, data backups are often performed in such a way that the PC users store the application data to be backed up on a central server in the network. The network administrator then backs up the data of this server centrally by making a weekly full data backup and daily incremental data backups.
Example 4: Automatic centralised data backup in the field of mainframe computers
As in Example 2, centralised data backups for mainframe computers are performed as a combination of weekly full data backups and daily incremental data backups. In many cases, the backups are initiated automatically using a tool (HSM). Additional event-based full backups are commonly performed for the data of individual IT applications.
Example 5: Automatic centralised data backup in distributed systems
Another alternative consists of a combination of examples 3 and 4. The local data of the distributed systems is transferred to a central mainframe computer and/or a central server where the data backup is performed as a combination of full data backups and incremental data backups.
Example 6: Fully automatic centralised data backup of data stored locally in distributed systems
In contrast to the previous example, the data in this case is transferred automatically from the local systems to the central system. Tools are now available that allow a central data backup server to access the locally stored data. The data can therefore be backed up centrally and transparently for the local users.
To minimise the data volume stored on the storage medium, additional data compression algorithms can be used. In some cases, this reduces the data volume by up to 80%. If compression is used for data backups, the selected parameters and algorithms must be documented, and this documentation must be available when the data is restored (decompressed).
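One way to meet the documentation requirement is to store the compression parameters in a small manifest kept together with the backup. This Python sketch uses the standard `zlib` module purely as an example algorithm. The JSON manifest format and the function names are hypothetical conventions, not a standard.

```python
import json
import zlib


def compress_backup(data: bytes, level: int = 9):
    """Compress backup data and return (payload, manifest as JSON text)."""
    payload = zlib.compress(data, level)
    manifest = {
        "algorithm": "zlib",              # documented for the later restoration
        "level": level,
        "zlib_version": zlib.ZLIB_VERSION,
        "original_size": len(data),
    }
    return payload, json.dumps(manifest)


def restore_backup(payload: bytes, manifest_json: str) -> bytes:
    """Decompress using the parameters recorded at backup time."""
    manifest = json.loads(manifest_json)
    if manifest["algorithm"] != "zlib":
        raise ValueError(f"unsupported algorithm: {manifest['algorithm']}")
    data = zlib.decompress(payload)
    if len(data) != manifest["original_size"]:
        raise ValueError("restored size does not match the documented size")
    return data


data = b"ledger entry\n" * 1000
payload, manifest = compress_backup(data)
print(len(data), len(payload), manifest)
```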
There are two parameters that need to be specified for the approach: the degree of automation and the centralisation (storage location).
There are two degrees of automation, manual and automatic:
- Manual data backup means that the data backup is triggered manually. The advantage may be that the person triggering the backup can choose the time of the data backup individually according to the workflow. The disadvantage is that the effectiveness and quality of the data backup depend on the discipline and motivation of the person triggering it. Absence due to illness or other reasons can lead to missed data backups.
- Automatic data backups are triggered under program control at certain times. The advantage is that the discipline and reliability of the person executing the backup are irrelevant as long as the backup schedule is complete and up to date. The disadvantages may be that the backup control programs generate costs and that the backup schedule must be updated whenever the system changes; otherwise, important changes might not be backed up immediately.
Regarding the storage location, data backups performed centrally and data backups performed locally must be differentiated:
- In central data backups, the data is backed up to the central IT system by a single person. The advantage of this procedure is that only one employee requires intensive training and the users of the IT systems are relieved of this work. Another advantage is that more economical storage media can be used because the data is more centralised. The disadvantage is that confidential data might be read by unauthorised persons during transmission.
- Local data backups are performed by the IT users themselves without having to transfer the data to a central IT system. The advantage is that the IT users retain control over the data and the backup data media, which is especially important when the data is confidential. The disadvantage is that the consistency of the data backup depends on the reliability of the IT users and that decentralised solutions are more time-consuming for the IT users.
Once it has been decided whether to back up the data manually or automatically and centrally or locally, it is necessary to select suitable data media for the data backups. The following parameters can be considered for this purpose:
- Data medium request time: the time required to prepare for the restoration of the data is determined by the time needed to identify the necessary data backup media and to make them available in the system. Tape cartridges in a robotic system may be made available for restoration within minutes, while archived tapes may first need to be transported and retrieved from the archive.
- Access time, transfer rate: the time required to create and restore the backup depends on the average access time of the data medium and on the data transfer rate. Hard disks permit access to certain files in milliseconds, while a magnetic tape first needs to be spooled to the corresponding position. When selecting the data medium, it must be ensured that the transmission channels are not overloaded when high transfer rates are used.
- Practicability/memory capacity: the more complicated the data backup, the greater the threat that the data will be backed up incorrectly or that the persons responsible do not back up the data at all. Data media with too little memory capacity prevent effective data backups, since the constant switching of media takes time and is prone to error.
- Costs: the costs of the data backups, i.e. the costs of purchasing read/write devices and data media, and the computing and working time required to perform them must be appropriate in relation to the purpose of the backups. The service life and the reliability of the data media must also be taken into account in this calculation.
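The first two parameters can be combined into a rough estimate of the restoration time. This Python sketch compares two media. The request times, access times and transfer rates are illustrative values, not measurements, and the class `Medium` and the function `restore_hours` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Medium:
    name: str
    request_time_h: float      # identify the media and make them available
    access_time_h: float       # e.g. spooling a tape to the right position
    transfer_rate_mb_s: float  # sustained read rate in MB/s


def restore_hours(medium: Medium, volume_gb: float) -> float:
    """Request time + access time + time to read `volume_gb` (1 GB = 1000 MB)."""
    transfer_h = volume_gb * 1000 / medium.transfer_rate_mb_s / 3600
    return medium.request_time_h + medium.access_time_h + transfer_h


robot = Medium("cartridge in robotic library", 0.1, 0.02, 180)
archive = Medium("tape from external archive", 24, 0.02, 180)
for m in (robot, archive):
    print(f"{m.name}: {restore_hours(m, 648):.2f} h")
```

With identical transfer rates, the request time dominates: reading 648 GB takes one hour in both cases, but the archived tape first has to be fetched.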
The recurring data backup costs must never exceed the sum of the cost of restoring the data without a data backup and the cost of the consequential damage. The following influencing factors must be taken into account in this case:
Availability requirements:
The higher the availability requirements, the faster the data backup media must be accessible and the faster the required data must be restorable from them.
For availability reasons, it must be ensured that the storage media can still be used for restoration even when a media reader fails. The compatibility and function of a replacement media reader must be guaranteed.
Data and change volumes:
As the data volume increases, more economical tape storage media such as magnetic tapes or data cartridges are generally used.
Deadlines:
If deletion periods must be adhered to (e.g. for personal data), the selected storage medium must allow the data to be deleted. The use of storage media that can only be erased at great time and expense (e.g. WORM) should be avoided in this case.
Confidentiality and integrity requirements of the data:
If the confidentiality or integrity requirements of the data to be backed up are high, these protection requirements also apply to the data media used for the data backup. If encrypting the data backup is impossible, selecting data media that can be stored in data backup safes or normal safes due to their compact design and portability should be considered.
Knowledge of the IT users:
The knowledge and data processing skills of the IT users are the deciding factor in determining whether the IT users themselves should perform the data backups manually, whether other trained personnel should perform the data backups locally, or whether an automated data backup procedure is more practical.
Persons in charge of data backup
When deciding who is responsible for performing the data backup, there are basically three groups of people to consider: the IT users themselves (common for decentralised and non-networked IT systems), the system administrator, or an administrator with special training in data backup procedures. If the data backup is not performed by the user, the persons in charge of the data backup must be sworn to secrecy regarding the contents of the data, and encryption should also be considered.
Furthermore, the people responsible for deciding when it is necessary to restore data must be appointed. It is also necessary to clarify who is authorised to access the data backup media, especially when they are located in data backup archives. It must be ensured that only authorised persons have access to the archives. Finally, it is necessary to define who is authorised to perform a complete restoration of all data and who is allowed to restore individual files.
When specifying these responsibilities, it is especially important to consider the data confidentiality and integrity requirements and the trustworthiness of the employees responsible. It must be ensured that each person in charge is available, and that a substitute for each person has been appointed and trained.
The following influencing factor must be taken into account:
Knowledge of the IT users:
The knowledge and data processing skills of each IT user determine whether this IT user should be allowed to perform the data backup under his or her own responsibility. If an IT user does not have adequate knowledge, this responsibility must be transferred to the system administrator or a specially trained person.
Storage location
Generally, the data backup media and original data media should be stored in different fire sections. If data backup media are stored in another building or off the organisation's property, there is a lower probability that the data backups will be affected in case of a disaster. However, the farther the data media are stored from the IT peripherals necessary for restoration (e.g. tape drives), the longer the transport routes and transport times, and the longer the overall restoration time. For this reason, the following influencing factors must be taken into account:
Availability requirements:
The higher the availability requirements, the faster the data media of the data backup need to be available. If the data media are stored externally for security reasons and the availability requirements are very high, consideration should be given to keeping extra copies of the data backups in reserve in the immediate vicinity of IT systems.
Confidentiality and integrity requirements of the data:
The higher these requirements, the more important it is to prevent the possibility of manipulation of the data media. The access control mechanisms required for this purpose can generally only be implemented via corresponding infrastructural and organisational safeguards (see module S 2.5 Data media archive).
Data volume:
As the data volume increases, the security of the storage location becomes more important.
Requirements for the data backup archive
Due to the high concentration of data on data backup media, these media have confidentiality and integrity requirements at least as high as those of the data backed up. When the data media are stored in a central data backup archive, correspondingly effective IT security safeguards such as access controls must therefore be implemented.
In addition, organisational and personnel safeguards (data media control) must be implemented to ensure that the required data media can be accessed quickly and directly. Safeguard S 2.3 Data media control and module S 2.5 Data media archive must be considered in this context.
The following influencing factors must be taken into account:
Availability requirements:
The higher the availability requirements, the faster the access to required data media must be. If manual inventories do not fulfil the availability requirements, automated access methods (e.g. a robotic cartridge archive system) may be used.
Data volume:
In the end, the data volume determines the total number of data media to be stored. If large data volumes need to be archived, the storage capacity of the data media archive must be planned accordingly.
Deadlines:
If deletion deadlines need to be adhered to, the organisation of the data backup archive must be adapted accordingly, and it may also be necessary to have the required erasing equipment available. At the prescribed deletion times, deletion must be initiated, performed, and documented in the data backup archive. If deletion is not technically possible, organisational safeguards must ensure that the data to be deleted cannot be reused.
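To initiate deletions at the prescribed times, the archive's media inventory can be queried for media whose deletion deadline has been reached, and each deletion can be documented. In this Python sketch, the record fields, the log format and the function names are hypothetical. The physical erasure itself is outside the sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class MediaRecord:
    label: str
    delete_on: Optional[date]  # None: no deletion deadline applies
    deleted: bool = False


def media_due_for_deletion(inventory: List[MediaRecord], today: date) -> List[MediaRecord]:
    """Media whose deletion deadline has been reached and that are not yet deleted."""
    return [m for m in inventory
            if m.delete_on is not None and not m.deleted and m.delete_on <= today]


def record_deletion(medium: MediaRecord, today: date, log: List[str]) -> None:
    """Document a deletion in the archive log.

    Only the bookkeeping is modelled; the physical erasure (or the
    organisational safeguard if erasure is impossible) happens elsewhere.
    """
    medium.deleted = True
    log.append(f"{today.isoformat()} deleted {medium.label} "
               f"(deadline {medium.delete_on.isoformat()})")


inventory = [MediaRecord("T001", date(2024, 1, 31)),
             MediaRecord("T002", date(2024, 6, 30)),
             MediaRecord("T003", None)]
log: List[str] = []
for medium in media_due_for_deletion(inventory, date(2024, 3, 1)):
    record_deletion(medium, date(2024, 3, 1), log)
print(log)
```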
Confidentiality and integrity requirements of the data:
The higher these requirements, the more important it is to prevent the possibility of manipulation of the data media. The access control mechanisms required for this purpose can generally only be implemented by corresponding infrastructural and organisational safeguards (see module S 2.5 Data media archive).
Transport terms
Data is transported when a data backup is performed: the data is either transmitted over a network or a cable, or the data media are physically transported to the data media archive. The following must be taken into account:
Availability requirements:
The higher the availability requirements, the faster the data needs to be available for restoration. This must be taken into account when selecting the data transmission medium and when selecting the data media transport route.
Data volume:
If data is transmitted over a network for the purpose of restoring the data, the data volume must be considered when selecting the transmission capacity of the network. It must be guaranteed that the data volumes can be transmitted in the required time (availability requirement).
Modification times of the data:
If data backups are performed over a network (especially at specific times), capacity bottlenecks may arise, depending on the amount of data to be transmitted. For this reason, it must be ensured that the network has sufficient data transmission capacity at the time of the data backup.
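For both restoration and backup runs over a network, a rough estimate shows whether the link can carry the data volume within the required time. This Python sketch assumes decimal units and a hypothetical utilisation factor for protocol overhead and competing traffic. The function names are illustrative.

```python
def transfer_hours(volume_gb, link_mbit_s, utilisation=0.7):
    """Hours needed to transmit `volume_gb` over a link of `link_mbit_s`.

    `utilisation` is the share of the nominal bandwidth assumed to be
    available for the backup (a hypothetical planning value that accounts
    for protocol overhead and other traffic). Decimal units: 1 GB = 8e9 bit.
    """
    bits = volume_gb * 8e9
    usable_bit_s = link_mbit_s * 1e6 * utilisation
    return bits / usable_bit_s / 3600


def link_sufficient(volume_gb, link_mbit_s, required_hours, utilisation=0.7):
    """True if the transfer fits into the required time window."""
    return transfer_hours(volume_gb, link_mbit_s, utilisation) <= required_hours


# 450 GB over a 1 Gbit/s link of which half is usable takes 2 hours,
# so it does not fit into a 1.5-hour backup window.
print(transfer_hours(450, 1000, 0.5), link_sufficient(450, 1000, 1.5, 0.5))
```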
Confidentiality and integrity requirements of the data:
The higher these requirements, the more important it is to prevent the data from being read, copied without authorisation, or manipulated during transport. For data transmissions, encryption or cryptographic protection against manipulation must be considered. When data media are transported physically, secure containers and routes must be used, and the benefits of encryption should be weighed against the additional time and expense it requires.
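As one example of cryptographic protection against manipulation, the sender can attach an HMAC-SHA-256 tag to the backup and the receiver can verify it, using Python's standard `hmac` and `hashlib` modules. This sketch does not encrypt the data. Confidentiality would require an additional cipher, which the Python standard library does not provide. Key management is assumed to be handled elsewhere, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def protect(payload: bytes, key: bytes) -> bytes:
    """Return the HMAC-SHA-256 tag to be transmitted along with the backup."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(payload: bytes, tag: bytes, key: bytes) -> bool:
    """Check on receipt that the backup was not modified in transit."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison


key = secrets.token_bytes(32)   # in practice: a managed, pre-shared key
backup = b"...backup data..."
tag = protect(backup, key)
print(verify(backup, tag, key), verify(backup + b"x", tag, key))
```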
Storage conditions
Within the framework of the data backup policy, it should also be examined if there are retention periods or deletion periods for certain data.
Deadlines:
If there are retention periods for certain data, these periods can be complied with by archiving a data backup generation. If the retention periods are long, it must also be ensured that the necessary reading devices are available and that magnetic data media may need to be refreshed (resaving the data stored magnetically) under some circumstances, because the media may become demagnetised, and therefore lose data, over time.
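For long retention periods, the dates on which magnetic media must be refreshed can be planned in advance. In this Python sketch, the refresh interval is a hypothetical planning value; the appropriate interval depends on the medium and the manufacturer's specifications. The function names are illustrative.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later (29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def refresh_dates(written_on: date, retention_years: int, refresh_every_years: int):
    """Dates within the retention period on which the data must be rewritten."""
    end = add_years(written_on, retention_years)
    dates = []
    step = refresh_every_years
    while True:
        d = add_years(written_on, step)
        if d >= end:
            break
        dates.append(d)
        step += refresh_every_years
    return dates


# Ten-year retention, refresh every three years:
print(refresh_dates(date(2024, 1, 15), 10, 3))
```

Each refresh date is also a natural point at which to check that a suitable reading device is still available.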
If deletion deadlines must be adhered to, the organisational procedure for deletions must be specified, and any deletion equipment required must also be available. The deletion must be initiated and performed at the prescribed deletion times.
Review questions:
- Was a data backup procedure specified for every IT system and for every type of data?
- Were the type, the frequency, and the times of data backups defined?
- Were the responsibilities for the data backups defined?
- Were the transport terms and storage conditions for the data backups clarified?