S 2.134 Guidelines for database queries

Initiation responsibility: IT Security Officer, Head of IT

Implementation responsibility: Application Developer

The relational database language SQL (Structured Query Language) is a standardised international language for relational database systems (DBS) that is now widely used and is implemented in most database management systems (DBMS). The commands of the language are specified in regularly revised standards (ANSI SQL-92, ANSI SQL-99, or ANSI SQL-2003). SQL can be used to modify data (UPDATE, INSERT, DELETE), manipulate database objects (CREATE, ALTER, DROP) and request information (SELECT).

Guidelines for the efficient, maintainable, and understandable programming of database queries must be created and implemented when actually programming. The following basic principles should be described in this guideline:

Queries to the database should not be executed directly on tables, but should be executed through views and procedures instead. On the one hand, this makes it easier to guarantee that the data is protected (see S 2.129 Controlling access to database information). On the other hand, it is also possible to ensure that the users are provided with the necessary information in a corresponding format and in the amounts required. In addition, these views and procedures can be stored in a separate database, in which case the users and applications are only granted access to this remote database. The data in the tables in this case can only be accessed by a special group of users (administrators, etc.) or by using the procedures and views stored in the remote database.
SQL queries should be formulated precisely and explicitly based on the database model. When specifying SQL queries, all fields to be queried should be specified explicitly, and the use of the "*" operator should be avoided. This ensures that the data will be provided in the expected order and that only the data actually needed will be selected.

Example:

A database model contains a table containing the "Article number", "Article designation", "Intended use", and "Net price" fields. While programming an extension to the application, an additional field named "Order number" is inserted after the "Intended use" field. However, for storage optimisation reasons, the DBMS does not actually insert the new field where intended, but at the second position after the "Article number" field. Since the data is queried using a SELECT * statement, the database returns the information in a different order than the order expected by the application. This leads to problems in the application, and the cause of the problems is difficult to detect at first.
In database queries with conditions (WHERE clause), the order in which the selection conditions are specified has a significant effect on the execution speed of the statement. The WHERE clause should be formulated so that the condition specified first is the one that will select the smallest possible set of results in the shortest time. The statement should also access indexed fields first, and then the non-indexed fields. Note that in general, searches for numerical data are faster than searches for text data. The same applies to database queries formulated to span more than one table (referred to as joins).

Many DBMSs already optimise database queries automatically. In many cases, several or even numerous optimisation strategies are offered for selection. They are then selected using various parameters.

Some DBMSs offer the ability to analyse the processing of database queries (for example, using EXPLAIN in Oracle or using SETOEP in Ingres). Furthermore, it is also possible using HINTS in the database query to define its processing explicitly, in which case the optimiser is disabled in principle. However, this option should be used with great care.
The optimisers supported by the DBMS as well as their advantages and disadvantages are normally documented in the manuals provided with the DBMS. The approval of the administrator should be obtained before using alternative optimisers in a DBMS.
In the case of joins, it should also be ensured that the fields are allocated uniquely to the tables.
Example:

Figure: Field allocations in a join

The field "ID" is available in both tables, and for this reason, the corresponding table names must be specified explicitly in the database query. Otherwise, it is not possible to ensure the uniqueness of the selection, and the database query will be rejected with the issue of a corresponding error message.

All other fields can be allocated uniquely to the relevant tables in this case. SQL does not require an explicit specification of the related table name for each field. Nevertheless, the individual fields should be allocated to the tables, as shown above in the "Price" and "Designation" fields of table TabB. The addition of a "Designation" field to TabA would not lead to any problems in the example given above. However, this would not be the case if the SQL statement did not contain an explicit allocation of the fields to the tables. It would be impossible to determine if the "Designation" field in TabA or TabB should be selected since both tables would have a field with this name after changing TabA. The SQL statement would then be aborted with a corresponding error message.

All database transactions should be confirmed explicitly using a COMMIT statement. If the DBMS provides an option for automatic COMMIT, then this option should not be enabled since its use can lead to unwanted inconsistencies in the database under certain circumstances.

Example:
Several individual modifications belong together logically, but each modification statement is automatically confirmed with COMMIT after the statement has completed execution. If the transaction is now interrupted in an uncontrolled manner, thus leading to a rollback, the operations already completed are confirmed and remain in the database, while the remainder remains unexecuted.
To prevent locking conflicts or even deadlocks, a locking strategy should be defined for each application-specific database (for example using hierarchical locks or explicitly locking all tables at the beginning of a transaction).
Application developers should check the error status after every SQL statement so that the application can react as quickly as possible to any errors that occur.
Authorisation to execute system-specific commands that can be used, for example, to disable the logging function or change the locking procedure should not be granted to users and should be limited to administrators only.
During the development of an application, all database accesses should be collected in one module or a certain part of the program code because otherwise the entire program code of the application system will need to be examined to check if the basic principles stated above are being followed. This also makes it easier to maintain and update the application system, for example after changing the data model.

Review questions:

Are there guidelines for the programming of database queries?
Are there guidelines specifying that queries to the database have to be executed through views and procedures instead of directly on tables?
If views and procedures are stored in a separate database, do the guidelines for database queries specify that the data in the tables can only be accessed by a limited group of users?
Do the guidelines for database queries include the requirement that SQL queries must be formulated precisely and explicitly based on the database model avoiding the use of the "*" operator?
Do the guidelines for database queries contain requirements for appropriate optimisation of database queries, if their execution speed is relevant?
Do the guidelines for database queries always request explicit allocation of fields to tables in the case of joins in a database query?
Do the guidelines for database queries include the requirement that all database transactions must be confirmed explicitly using a COMMIT statement?
Do the guidelines for database queries specify a locking strategy to prevent locking conflicts?
Is there a guideline that specifies that authorisations to execute system-specific commands should be limited to administrators only?