S 5.133 Selection of a signalling protocol for VoIP

Initiation responsibility: Head of IT, IT Security Officer

Implementation responsibility: Head of IT, Administrator

When using VoIP, the control information and the actual voice data are generally transmitted separately from each other using different transmission protocols. Control information, for example the "busy" state, is transmitted using signalling protocols such as H.323 or SIP (Session Initiation Protocol). To transmit the voice data, though, a media transport protocol, generally RTP (Real-time Transport Protocol), is used. Only very few protocols, for example IAX (InterAsterisk eXchange), do not separate the control information from the media information.
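The separation described above means that a media packet carries no call control information at all, only a small header followed by the encoded audio. As a minimal sketch, the following Python code decodes the 12-byte fixed RTP header, assuming the field layout published in RFC 3550. The function name parse_rtp_header and the sample packet are hypothetical and serve only as an illustration.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (layout assumed from RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronisation source identifier.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical sample packet: RTP version 2, payload type 0,
# followed by 160 bytes of dummy audio payload.
sample = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + b"\x00" * 160
header = parse_rtp_header(sample)
```

Nothing in the decoded header indicates states such as "busy" or "ringing"; this information is exchanged exclusively via the signalling protocol.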

There are a variety of signalling protocols. Since these protocols are not compatible with each other, the selection of the protocol for the VoIP network plays an important role. VoIP components that do not support a common protocol cannot communicate with each other without a gateway. Using a gateway to translate the messages of one protocol into those of another protocol is very time-consuming and complex. For this reason, it should be ensured that only one signalling protocol is used wherever possible.

The selection of the VoIP components to be used has a strong influence on the selection of the signalling protocol, since many VoIP components only support a certain signalling protocol. The differences between the protocols play only a minor role in terms of security. The choice of the signalling protocol should be documented.
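Whether a single signalling protocol can be used depends on what the planned components support. The following Python sketch shows one possible way to check this from a component inventory. The inventory contents, component names, and function names are hypothetical examples and not part of this safeguard.

```python
def common_signalling_protocols(inventory: dict[str, set[str]]) -> set[str]:
    """Return the signalling protocols supported by every component."""
    supported = list(inventory.values())
    if not supported:
        return set()
    return set.intersection(*supported)


def components_lacking(inventory: dict[str, set[str]], protocol: str) -> list[str]:
    """Return the components that do not support the selected protocol."""
    return sorted(name for name, protos in inventory.items() if protocol not in protos)


# Hypothetical inventory: component name -> supported signalling protocols.
inventory = {
    "desk-phone-a": {"SIP", "H.323"},
    "softphone-b": {"SIP"},
    "pbx-c": {"SIP", "H.323", "IAX"},
}

candidates = common_signalling_protocols(inventory)
gaps_for_h323 = components_lacking(inventory, "H.323")
```

In this example only SIP is supported by all components. If H.323 were selected, softphone-b would have to be replaced or connected via a gateway. The result of such a check can be filed with the documentation of the protocol selection.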

In the following, the commonly used signalling protocols H.323 and SIP are examined. In addition to these protocols, the types of VoIP components needed (at a minimum) to make a telephone call are presented.

H.323

The group of protocols based on H.323 describes the transmission of real-time information (video, audio, and data) in packet-based transport networks. H.323 was originally developed as an implementation of the ISDN D-channel protocol Q.931 using an IP-based network. The H.225.0, H.245, H.450, and H.235 protocols are defined in this protocol group. H.323 describes the framework for the signalling protocols, H.225.0 the actual signalling, H.245 the control of the transmission of the voice information, and H.450 the actual telephony functionality. The optional support of H.235 offers protection for the integrity and confidentiality of the signalling. More detailed information can be obtained from the International Telecommunication Union (ITU), which is the body that specified these protocols. Audio and video data are transmitted via UDP; fax data via UDP or TCP. Before this real-time data is transmitted, logical RTP and RTCP channels are established between the end points (terminals).

The following components may be involved in H.323 communication:

Terminals represent the end points of the users for H.323 communication. These terminal devices are generally equipped with a speaker and a microphone and provide the users with the option of establishing a connection to other communication partners. A direct connection between two terminal devices is only possible when the IP address of the other terminal device is known.
Gatekeepers are used for the purpose of administration. Since a direct connection can only be established between two terminals when both IP addresses are known, a gatekeeper acts as a central control component in H.323 networks.
The Multipoint Control Unit (MCU) permits conference calls, i.e. calls involving more than two users. All media streams from the subscribers flow together in the optional MCU.
Gateways implement the transition to other networks and convert the user data and signalling information. Gateways act as an intermediary between IP networks and circuit-switched telephone networks, for example.

The biggest disadvantage of H.323 is the complexity of the protocol. The large number of different sub-protocols makes H.323 difficult to understand. This complexity makes troubleshooting more difficult and may lead to additional costs. Another disadvantage is that many manufacturers give priority in new products to SIP, which is described in the following.

Session Initiation Protocol (SIP)

SIP is a text-based client-server session signalling protocol from the IETF (Internet Engineering Task Force). It is described in RFC 3261 and is used to control the opening and closing of the connections used by multimedia services. Additional functionality such as video conferencing, instant messaging, distributed computer games, and other applications requires an extension of the SIP specification. These extensions can be found in separate RFCs. The stream of multimedia information, for example the voice information of a telephone call, is transmitted using RTP. In practical applications, the signalling is often protected using SSL, TLS (Transport Layer Security), or IPSec.
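Because SIP is text-based, a request can be read and analysed with simple string operations. The following Python sketch splits a SIP request into request line, header fields, and body. The sample INVITE message, its addresses, and the function name parse_sip_request are hypothetical. The parser is deliberately simplified: it keeps only the last value of repeated header fields and does not handle folded header lines.

```python
def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into request line, header fields, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Simplification: a repeated header field overwrites the earlier one.
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "request_uri": request_uri,
        "version": version,
        "headers": headers,
        "body": body,
    }


# Hypothetical INVITE request with example addresses.
raw_invite = "\r\n".join([
    "INVITE sip:bob@example.org SIP/2.0",
    "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "To: <sip:bob@example.org>",
    "From: <sip:alice@example.com>;tag=1928301774",
    "Call-ID: a84b4c76e66710@client.example.com",
    "CSeq: 314159 INVITE",
    "Content-Length: 0",
    "",
    "",
])

request = parse_sip_request(raw_invite)
```

The plain text format simplifies troubleshooting, but it also means that unprotected signalling can be read by anyone with access to the transmission path.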

The addressing scheme of SIP is very similar to the email addressing scheme (sip:username@provider-name.org). Localisation is performed using DNS (Domain Name System). SIP supports point-to-point and point-to-multipoint IP connections. Due to the simple, plain text design of the SIP packets and the low level of complexity, SIP is being used more and more often.
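The similarity to email addresses can be illustrated with a short Python sketch that splits a SIP address into its components. The function name parse_sip_uri is hypothetical, and the parser is a simplification: it discards URI parameters and headers and does not handle passwords in the user part or IPv6 host literals.

```python
def parse_sip_uri(uri: str) -> dict:
    """Split a SIP address into scheme, user, host, and optional port."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP address: {uri!r}")
    # Simplification: drop URI parameters (;...) and headers (?...).
    rest = rest.split(";", 1)[0].split("?", 1)[0]
    user, _, hostport = rest.rpartition("@")
    host, _, port = hostport.partition(":")
    return {
        "scheme": scheme.lower(),
        "user": user or None,
        "host": host,
        "port": int(port) if port else None,
    }


plain = parse_sip_uri("sip:username@provider-name.org")
secured = parse_sip_uri("sips:alice@example.com:5061;transport=tcp")
```

The host part obtained in this way is the name that is subsequently resolved using DNS.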

The following VoIP components may be involved when communicating using SIP:

The terminal devices (telephone, softphone, or gateway) are referred to as user agents (UA). A user agent can assume the role of a client and/or a server. The initiator of a call operates as a user agent client (UAC), and the user called as a user agent server (UAS). A SIP terminal system always contains both functions.
The location server supplies the IP address of the desired communication partner when a corresponding query is issued. The communication partner can be identified by his/her user name.
A registrar permits the users to log in and register. To do this, the terminal device logs in to the registrar using an identifier (user name, password) and its SIP address. The registrar informs the location server of the IP address at which the terminal device is publicly accessible. Once registered, the terminal device can be located.
A SIP proxy assumes the role of an intermediary that processes or forwards the signalling information. A user agent sends a query to the SIP proxy. The SIP proxy then interprets the query and, after processing it accordingly, forwards it towards the user agent called. If necessary, the messages are changed by the SIP proxy.
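The interaction between registrar and location server described above can be sketched in Python as follows. This is a simplified in-memory model and not an implementation of SIP: the class names, method names, credentials, and addresses are hypothetical, and the direct password comparison only stands in for the authentication procedure of a real registrar.

```python
import hmac
from typing import Optional


class LocationServer:
    """Maps SIP addresses to the contact address currently registered."""

    def __init__(self) -> None:
        self._bindings: dict[str, str] = {}

    def store(self, sip_address: str, contact: str) -> None:
        self._bindings[sip_address] = contact

    def lookup(self, sip_address: str) -> Optional[str]:
        return self._bindings.get(sip_address)


class Registrar:
    """Accepts registrations and passes the contact to the location server."""

    def __init__(self, location: LocationServer, credentials: dict[str, str]) -> None:
        self._location = location
        self._credentials = credentials

    def register(self, sip_address: str, password: str, contact: str) -> bool:
        expected = self._credentials.get(sip_address)
        # Assumption: plain comparison as a placeholder for real authentication.
        if expected is None or not hmac.compare_digest(
            expected.encode(), password.encode()
        ):
            return False
        self._location.store(sip_address, contact)
        return True


location = LocationServer()
registrar = Registrar(location, {"sip:alice@example.com": "secret"})

accepted = registrar.register("sip:alice@example.com", "secret", "192.0.2.10:5060")
rejected = registrar.register("sip:alice@example.com", "wrong", "198.51.100.7:5060")
contact = location.lookup("sip:alice@example.com")
```

A SIP proxy would issue such a lookup to find out where a query for a given SIP address needs to be forwarded. A terminal device that has not registered successfully cannot be located.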

Although SIP has been standardised, the standard is often interpreted differently by the various manufacturers of VoIP components. This lack of interoperability means that not all VoIP functions in VoIP networks containing components from different manufacturers will be available in their entirety. The functions most commonly affected are the authentication function between the systems, the encryption function, and the functions providing value-added services. When purchasing VoIP components, the interoperability of the new components with existing components should be examined for this reason.

When using SIP in firewall or NAT environments, there are other special aspects that must be taken into account. It is difficult for terminal devices located in NAT environments to communicate with VoIP systems outside of the NAT environment, because the signalling messages themselves contain IP addresses and port numbers that are not adjusted by simple address translation. Additional information on this subject can be found in safeguard S 5.137 Use of NAT for VoIP.

Review questions:

Is only one signalling protocol used and is its choice documented?
Is it ensured that the VoIP components support the selected signalling protocol?