S 2.127 Inference prevention

Initiation responsibility: Head of IT, IT Security Officer

Implementation responsibility: Administrator

In order to protect personal data and other confidential data in a database system, each user must, as a matter of principle, be granted access only to the data he/she needs to perform his/her tasks. All other information stored in the database should be hidden from the user.

For this purpose, it must be possible to define access authorisations for tables and even for individual fields of tables. This can be accomplished using views and grants (see S 2.129 Controlling access to database information), which allow a user to see and process only the data intended for him/her. If the user then issues database queries that require access to any other information, the DBMS rejects these queries.
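The view half of this approach can be sketched with Python's built-in `sqlite3` module. The table `employees`, its columns and the view `employee_directory` are hypothetical examples. SQLite has no `GRANT` statement, so the sketch only shows how a view narrows the visible fields; in a DBMS that supports grants, the directory user would additionally be granted `SELECT` on the view and no privileges on the base table.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical personnel table and a restricted view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employees (
            id           INTEGER PRIMARY KEY,
            name         TEXT NOT NULL,
            department   TEXT NOT NULL,
            salary       INTEGER NOT NULL,  -- confidential field
            health_notes TEXT               -- confidential field
        );
        INSERT INTO employees VALUES
            (1, 'Alice', 'R&D',   62000, 'n/a'),
            (2, 'Bob',   'Sales', 48000, 'n/a'),
            (3, 'Carol', 'R&D',   71000, 'n/a');

        -- The view exposes only the fields a directory user needs.
        -- In a DBMS with GRANT support, that user would receive SELECT
        -- on this view only and no privileges on "employees" (assumption:
        -- SQLite itself cannot enforce this).
        CREATE VIEW employee_directory AS
            SELECT id, name, department FROM employees;
        """
    )
    return conn


def visible_columns(conn: sqlite3.Connection, relation: str) -> list[str]:
    """Return the column names a query on `relation` exposes.

    `relation` is interpolated into the SQL text, so it must be a trusted identifier.
    """
    cur = conn.execute(f"SELECT * FROM {relation} LIMIT 0")
    return [column[0] for column in cur.description]


if __name__ == "__main__":
    conn = build_demo_db()
    print(visible_columns(conn, "employee_directory"))
    print(conn.execute(
        "SELECT name, department FROM employee_directory "
        "WHERE department = 'R&D' ORDER BY id"
    ).fetchall())
```

Queries against `employee_directory` can never return `salary` or `health_notes`, because those columns are not part of the view definition.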

On the other hand, different protection requirements apply to statistical databases, which contain data on groups of people, the general population or similar. In a statistical database, individual personal data records are subject to the data protection laws, whereas statistical information is accessible to all users.

In this case, knowledge of the data of a group must not be usable to gain knowledge of the data of an individual member of that group. Furthermore, it must not be possible to break the anonymity of this data with suitably formulated database queries that exploit knowledge of the information stored in the database and/or of the structures in which the data is stored (for example, when the result of a database query contains only one record). This problem is referred to as the inference problem, and protection against such techniques is referred to as inference prevention.
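The single-record case can be illustrated with a small, hypothetical survey table, again using Python's `sqlite3` module; the table, columns and values are invented for illustration. The table stores no names, yet an attacker who knows from outside sources that the mayor of Northville took part can read the mayor's income directly from an "average".

```python
import sqlite3


def build_survey() -> sqlite3.Connection:
    """Create an in-memory, pseudonymised survey table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE survey (
            person_id  INTEGER PRIMARY KEY,  -- pseudonymous key, no names stored
            town       TEXT NOT NULL,
            occupation TEXT NOT NULL,
            income     INTEGER NOT NULL
        );
        INSERT INTO survey (town, occupation, income) VALUES
            ('Northville', 'teacher', 41000),
            ('Northville', 'teacher', 43000),
            ('Northville', 'mayor',   95000),
            ('Southport',  'teacher', 40000),
            ('Southport',  'nurse',   39000);
        """
    )
    return conn


def average_income(conn: sqlite3.Connection, town: str, occupation: str) -> tuple:
    """A seemingly harmless statistic: (number of matching records, average income)."""
    return conn.execute(
        "SELECT COUNT(*), AVG(income) FROM survey WHERE town = ? AND occupation = ?",
        (town, occupation),
    ).fetchone()


if __name__ == "__main__":
    conn = build_survey()
    print(average_income(conn, "Northville", "teacher"))
    print(average_income(conn, "Northville", "mayor"))
```

`average_income(conn, "Northville", "mayor")` returns `(1, 95000.0)`: the statistic covers exactly one record, so the "average" is the individual's income.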

Even if the data in a statistical database has been anonymised, inference techniques may still make it possible to link a person to certain records. Rejecting certain queries (e.g. queries whose result contains only one or a few tuples) is generally not adequate protection on its own, since the DBMS's response to a rejected query may itself convey information.
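The following sketch, in plain Python with invented data, shows why rejecting small result sets is not sufficient on its own. `restricted_sum` refuses any query whose result set contains fewer than a hypothetical threshold of `MIN_QUERY_SET` records. The direct question about the mayor is rejected, and the refusal itself tells the attacker that fewer than two matching records exist. Two permitted queries, however, yield the mayor's income by subtraction.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Record:
    town: str
    occupation: str
    income: int


# Hypothetical, pseudonymised data set.
RECORDS = [
    Record("Northville", "teacher", 41000),
    Record("Northville", "teacher", 43000),
    Record("Northville", "mayor", 95000),
    Record("Southport", "teacher", 40000),
    Record("Southport", "nurse", 39000),
]

MIN_QUERY_SET = 2  # hypothetical minimum number of records per released statistic


class QueryRejected(Exception):
    """Raised when a query set is too small. The refusal itself leaks that fewer
    than MIN_QUERY_SET records match the query."""


def restricted_sum(predicate: Callable[[Record], bool]) -> int:
    """Return SUM(income) over matching records, refusing small query sets."""
    matches = [r.income for r in RECORDS if predicate(r)]
    if len(matches) < MIN_QUERY_SET:
        raise QueryRejected("query set too small")
    return sum(matches)


def difference_attack() -> int:
    """Recover the mayor's income from two queries that both pass the size check."""
    whole_town = restricted_sum(lambda r: r.town == "Northville")  # 3 records
    town_without_mayor = restricted_sum(
        lambda r: r.town == "Northville" and r.occupation != "mayor"
    )  # 2 records
    return whole_town - town_without_mayor


if __name__ == "__main__":
    try:
        restricted_sum(lambda r: r.occupation == "mayor")
    except QueryRejected as exc:
        print("direct query:", exc)
    print("difference attack:", difference_attack())
```

Such difference queries are known in the literature as trackers.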

The anonymity of the data can also be lost by combining a variety of different statistics. In such an indirect attack, the attacker deduces the personal data of a single individual from a series of statistics. One safeguard against this is to prohibit the release of so-called sensitive statistics, which is referred to as "inference prevention through suppression". Another possibility is to distort such statistics through controlled rounding (a given statistic must always be rounded in the same manner) or to compute the statistics only over a statistically relevant subset of the data, with the requirement that the same subset is always used for a given query. This approach is referred to as "inference prevention through generalisation".
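Both families of safeguards can be sketched in plain Python. All thresholds, the rounding base, the subset fraction and the data are hypothetical. The sketch uses deterministic rounding to a fixed base as a simplified stand-in for controlled rounding (the statistical disclosure control technique of that name works on whole tables and keeps rounded cells consistent with rounded totals); it does satisfy the requirement that a given statistic is always rounded the same way. The per-query subset is chosen by hashing a query key together with each record ID, so repeating a query always selects the same subset.

```python
import hashlib
from typing import Callable, NamedTuple, Optional


class Record(NamedTuple):
    record_id: int
    town: str
    occupation: str
    income: int


# Hypothetical, pseudonymised data set.
RECORDS = [
    Record(1, "Northville", "teacher", 41000),
    Record(2, "Northville", "teacher", 43000),
    Record(3, "Northville", "mayor", 95000),
    Record(4, "Southport", "teacher", 40000),
    Record(5, "Southport", "nurse", 39000),
    Record(6, "Southport", "nurse", 42500),
]

MIN_CELL_SIZE = 3      # hypothetical: statistics on fewer records count as sensitive
ROUNDING_BASE = 1000   # hypothetical rounding base
SUBSET_FRACTION = 0.8  # hypothetical share of records used for each query

Predicate = Callable[[Record], bool]


def suppressed_sum(predicate: Predicate) -> Optional[int]:
    """Suppression: do not release a sum that rests on fewer than MIN_CELL_SIZE records."""
    values = [r.income for r in RECORDS if predicate(r)]
    if len(values) < MIN_CELL_SIZE:
        return None
    return sum(values)


def rounded(value: int, base: int = ROUNDING_BASE) -> int:
    """Round a non-negative value to the nearest multiple of `base` (halves round up).
    Being deterministic, the same statistic is always rounded the same way."""
    return (value + base // 2) // base * base


def in_subset(query_key: str, record: Record) -> bool:
    """Reproducibly decide whether `record` belongs to the subset used for `query_key`."""
    digest = hashlib.sha256(f"{query_key}:{record.record_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < SUBSET_FRACTION


def generalised_sum(query_key: str, predicate: Predicate) -> int:
    """Generalisation: sum over a query-specific, reproducible subset, then round."""
    total = sum(r.income for r in RECORDS if predicate(r) and in_subset(query_key, r))
    return rounded(total)


if __name__ == "__main__":
    print(suppressed_sum(lambda r: r.occupation == "mayor"))
    print(suppressed_sum(lambda r: r.town == "Northville"))
    print(generalised_sum("southport-income", lambda r: r.town == "Southport"))
```

Keying the subset on a plain query string is a simplification: semantically equivalent queries written differently would receive different subsets, so a real implementation would need to normalise queries before deriving the key.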

If further confidentiality requirements apply to the data, the data must be encrypted (see S 4.72 Database encryption).

Review questions:

Have the confidentiality and data protection requirements regarding the data in the database system been defined and documented?
Have the users' access authorisations been restricted (e.g. using views and grants) so that each user can access only the data he/she requires to perform his/her tasks?
Are inference prevention techniques used for statistical databases?