S 2.244 Determination of the technical influencing factors for electronic archiving
Initiation responsibility: IT Security Officer
Implementation responsibility: IT Security Officer, Archive Administrator
Before a decision can be made as to which procedures and products are to be used for electronic archiving, a number of technical influencing factors must be determined. For this, the owners of the data to be archived should be consulted, for example the persons in charge of the individual IT systems and/or IT applications and the system administrators. The results must be documented comprehensibly in the archiving concept (see safeguard S 2.243 Development of an archiving concept). Amongst other things, the technical influencing factors decisive for electronic archiving include:
- the data volume to be expected,
- the file formats of the documents to be archived,
- the change volume and version control,
- the retention period of the documents,
- the number and type of accesses,
- the existing IT application environment, as well as
- standards to be observed.
The aforementioned influencing factors are explained in more detail below:
Data volume to be expected
The size of the files to be archived and the data volume to be expected in the future are essential criteria for the selection of electronic archive systems. Typically, these values can only be estimated.
The file size of documents strongly depends on the selected file format and the extent of rendition (see below).
File formats of the documents to be stored
Depending on the selected archive system, any file format in use can in principle be stored in the system, e.g. the formats commonly used in office environments (DOC, PDF, RTF, ASCII, ZIP, etc.) or image and audio files (JPG, GIF, WAV, MPEG, etc.). However, file formats offering long-term stability regarding the syntax or semantics of the data (such as SGML, XML, or HTML) and image formats capable of representing an exact image of the original paper document (e.g. TIFF) are of particular importance in the field of archiving. The individual data formats are described in more detail in safeguard S 4.170 Selection of suitable data formats for the archival storage of documents.
Several file formats have become established for electronic archiving, each suited to different future uses of the data. However, the future use often cannot or should not be defined in advance, so the best data format for that use cannot be predicted. Competing requirements for the file format, arising from the different purposes, often apply equally at the time the data is stored. Therefore, archiving documents in several file formats simultaneously has proved advantageous, especially for long-term archiving. For this, the documents must be converted beforehand. This procedure is referred to as rendition. The approach used during rendition must be documented exactly, and information about the original format must also be archived.
The rendition of documents and the subsequent storage in several file formats directly affect the storage capacity required for archiving.
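The following is a minimal sketch, not a prescribed implementation, of how the renditions of one document and the information about its original format could be recorded together. All names (Rendition, ArchivedDocument, the converter label) are hypothetical and chosen for illustration only. The sketch assumes that every stored copy, including a copy in the original format if one is kept, is listed as a rendition.

```python
from dataclasses import dataclass, field


@dataclass
class Rendition:
    """One stored copy of a document in a specific file format."""
    file_format: str   # e.g. "TIFF" or "XML"
    size_bytes: int
    converter: str     # hypothetical label documenting how the copy was produced


@dataclass
class ArchivedDocument:
    """Metadata record for a document archived in several file formats."""
    document_id: str
    original_format: str  # information about the original format is kept
    renditions: list[Rendition] = field(default_factory=list)

    def stored_formats(self) -> list[str]:
        # Formats in which the document is actually held in the archive.
        return [r.file_format for r in self.renditions]

    def total_size_bytes(self) -> int:
        # Every additional rendition adds to the required storage capacity.
        return sum(r.size_bytes for r in self.renditions)


doc = ArchivedDocument(document_id="doc-0001", original_format="DOC")
doc.renditions.append(Rendition("TIFF", 400_000, "scan-to-tiff (assumed)"))
doc.renditions.append(Rendition("XML", 20_000, "doc-to-xml (assumed)"))
print(doc.original_format, doc.stored_formats(), doc.total_size_bytes())
```

Keeping the original format and the conversion step in the same record as the renditions is one way to satisfy the documentation requirement. The sizes used here are invented example values.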
Change volume and number of versions
When archiving documents, it must be considered which changes will be made to the documents over the course of time, how often this must be expected, and what the corresponding approach should be. If archived documents are to be changed, there are the following options:
- The original document is replaced by the changed version.
- The new version of the document is archived in addition to the original version (version control), with only a maximum number of versions of the same document remaining archived (number of versions).
Version control of the documents may be required based on organisation-internal or legal requirements. Refer to safeguards S 2.245 Determination of the legal influencing factors for electronic archiving and S 2.246 Determination of the organisational influencing factors for electronic archiving in particular.
Version control may be forced by the selection of the storage media (e.g. WORM - Write Once Read Many).
If version control is performed for documents, this must be taken into consideration when calculating the required storage capacity of the archive system.
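The two options for changed documents can be illustrated with a short sketch. It assumes a hypothetical helper add_version and models only the logical set of retained versions. How a given archive system or storage medium actually handles superseded versions is not covered. A limit of 1 corresponds to replacing the original, a larger limit to version control with a maximum number of versions.

```python
def add_version(versions: list[str], new_version: str, max_versions: int) -> list[str]:
    """Return the retained versions (oldest first) after archiving new_version.

    max_versions is the maximum number of versions of the same document
    that remain archived. It is a hypothetical policy parameter.
    """
    if max_versions < 1:
        raise ValueError("max_versions must be at least 1")
    updated = versions + [new_version]
    # Keep only the most recent max_versions entries.
    return updated[-max_versions:]


# Version control with at most three retained versions.
print(add_version(["v1", "v2", "v3"], "v4", max_versions=3))

# Replacement of the original by the changed version.
print(add_version(["v1"], "v2", max_versions=1))
```

The chosen limit feeds directly into the capacity calculation, since each retained version occupies storage.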
Retention period of the documents
In order to calculate the required storage capacity of the archive system, an estimate of the retention period of the archived documents is indispensable. Legal or organisation-internal specifications define minimum, and sometimes also maximum, retention periods that must be observed.
However, the retention period influences not only the storage capacity of the archive system, but also the selection of the storage medium and its disposal upon expiration of the retention period.
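As a rough illustration of how data volume, rendition, version control, and retention period combine, the following sketch multiplies the factors into a single capacity figure. It rests on simplifying assumptions that do not come from this safeguard: a constant number of new documents per year, the same average size for every rendition and version, and no compression or index overhead. The function name and parameters are hypothetical.

```python
def estimate_capacity_bytes(
    docs_per_year: int,
    avg_doc_bytes: int,
    renditions_per_doc: int,
    versions_per_doc: int,
    retention_years: int,
) -> int:
    """Rough upper estimate of the storage needed once the archive is 'full'.

    Assumes steady intake: after retention_years, documents expire at the
    same rate as new ones arrive, so the volume stops growing.
    """
    # Storage consumed by one document including all renditions and versions.
    per_document = avg_doc_bytes * renditions_per_doc * versions_per_doc
    return docs_per_year * per_document * retention_years


capacity = estimate_capacity_bytes(
    docs_per_year=100_000,
    avg_doc_bytes=200_000,
    renditions_per_doc=2,
    versions_per_doc=3,
    retention_years=10,
)
print(f"{capacity / 1e12:.1f} TB")
```

With these invented example values the estimate is 1.2 TB. In practice the input figures would come from the consultation with the data owners and would be documented in the archiving concept.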
Number and type of accesses
The number and type of accesses to the archive system affect the configuration of the archive server and the selection of the storage components.
Therefore, the following influencing factors must be determined:
- How many times will the archive system be accessed within a given period?
- What is the ratio of write accesses to read accesses?
- What are the required response times?
- Is the archive system accessed directly by user and/or client systems or by a higher-level document management system?
- Does the archive system have to differentiate between accesses of different users or is this performed by higher-level components?
- Does the archive system have to manage several separate archives (multi-client capability)?
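The answers to the questions above could be captured in a small structured record so that they can be documented and compared across candidate systems. The sketch below is one possible layout with hypothetical field names. The derived figures (accesses per hour, share of write accesses) are simple averages and say nothing about peak load.

```python
from dataclasses import dataclass


@dataclass
class AccessProfile:
    """Documented access characteristics of a planned archive system."""
    period_hours: float               # length of the observed or assumed period
    read_accesses: int
    write_accesses: int
    required_response_seconds: float
    direct_client_access: bool        # False: access via a higher-level system
    per_user_access_control: bool     # False: handled by higher-level components
    multi_client: bool                # several separate archives to be managed

    def accesses_per_hour(self) -> float:
        # Average over the period, not a peak value.
        return (self.read_accesses + self.write_accesses) / self.period_hours

    def write_share(self) -> float:
        # Fraction of all accesses that are write accesses.
        total = self.read_accesses + self.write_accesses
        return self.write_accesses / total if total else 0.0


profile = AccessProfile(
    period_hours=20,
    read_accesses=9_000,
    write_accesses=1_000,
    required_response_seconds=2.0,
    direct_client_access=False,
    per_user_access_control=False,
    multi_client=True,
)
print(profile.accesses_per_hour(), profile.write_share())
```

All numbers in the example are invented. Real values would have to be determined together with the persons in charge of the IT systems and applications.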
IT application environment
Archive systems are typically embedded in more complex IT landscapes. This results in technical requirements, e.g. regarding
- the network connection,
- the available network protocols (the definition of which must be known, for instance, if the communication connection is routed via firewalls),
- the compatibility with other programs or IT systems,
- the integration into system management environments both for administrating and for monitoring the archive system,
- the administration and usage interfaces, as well as
- the response times of the archive system.
Standards to be observed
The standards applicable to the field of archiving focus on the following areas:
- file formats and compression procedures,
- storage media and their recording procedures, as well as
- document management software.
Because interfaces are disclosed as part of standardisation, system manufacturers are able to ensure the compatibility of system components, interfaces, and data formats. Taking standards into consideration when selecting archive systems therefore supports long-term planning and investment security. The safeguards recommended within this module refer to the standards currently applicable.
For the user, compliance with standards reduces the dependency on individual manufacturers, system suppliers, and service providers. Given the long periods over which archive systems are typically used, this is particularly important, since the development of product lines cannot be predicted. For example, if a manufacturer of proprietary storage components becomes insolvent, the archive system may no longer be expandable as before by purchasing new storage media and components. In government agencies and companies with high archiving requirements, this will typically trigger the start of the migration phase. When standardised components are used, however, another supplier can simply be selected for the affected sub-components.
With regard to standards, however, it must be observed that their relevance also diminishes over time due to new technological developments and that they are then replaced by new standards, if required. Successive versions of a standard may occasionally differ fundamentally in content while outwardly differing only by the version number. Furthermore, standardisation committees compete with one another and manufacturers naturally seek economic influence in the market, which results in competing standards.
In principle, archiving is also possible without taking standards into consideration, using proprietary file and storage formats, provided the manufacturer ensures sufficient maintenance and system support during the archiving period and adapts the interfaces to changing requirements. Nevertheless, for the reasons mentioned above, it is recommended to be guided by applicable standards for file formats and interfaces when planning archive systems.
Future migration should already be taken into consideration when planning an archive system, since the technology or the requirements may change during the long-term storage of data. Therefore, particular diligence should be applied when planning and selecting interfaces, file formats, and the index database, and all decisions should be documented comprehensibly.
Review questions:
- Have the technical influencing factors been determined and documented before making the decision in favour of an archiving system?
- Have the size of the data to be archived and the data volume to be expected in the future been estimated?
- Has it been determined which changes will occur to archived documents and how frequently?
- Have the number and type of accesses to the archive system within a given period been determined and have the required response times been defined?
- Is it determined whether the archive system needs to be able to manage separate archives and to differentiate between users?
- Is the application environment of the archive system determined?
- Use of proprietary file and storage formats: Are maintenance, system support, and adaptation guaranteed on the basis of contracts concluded with the manufacturers?