S 5.133 Selection of a signalling protocol for VoIP

Initiation responsibility: Head of IT, IT Security Officer

Implementation responsibility: Head of IT, Administrator

When using VoIP, the control information and the actual voice data are generally transmitted separately from each other using different transmission protocols. Control information, for example the "busy" state, is transmitted using signalling protocols such as H.323 or SIP (Session Initiation Protocol). The voice data, in contrast, is transmitted using a media transport protocol, generally RTP (Real-time Transport Protocol). Only very few protocols, for example IAX (Inter-Asterisk eXchange), do not separate the control information from the media information.
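To illustrate that the media stream is a separate, binary protocol and not part of the signalling, the following minimal sketch decodes the 12-byte fixed header of an RTP packet. The field layout follows RFC 3550; the names RtpHeader and parse_rtp_header and the sample packet are assumptions made for this illustration only, and header extensions and CSRC lists are ignored.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Selected fields of the RTP fixed header (hypothetical helper type)."""
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header; extensions and CSRCs are ignored."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronisation source identifier.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,            # top two bits of the first byte
        payload_type=second & 0x7F,    # lower seven bits of the second byte
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )


# Made-up sample packet: version 2, payload type 0, no payload attached.
sample_packet = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD)
header = parse_rtp_header(sample_packet)
print(header)
```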

There are a variety of signalling protocols. Since these protocols are not compatible with each other, the selection of the protocol for the VoIP network plays an important role. VoIP components that do not support a common protocol cannot communicate with each other without a gateway. A gateway that translates the messages of one protocol into those of another is complex and time-consuming to set up and operate. For this reason, it must be ensured that, wherever possible, only one signalling protocol is used.
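Whether a planned set of components can do without a gateway can be checked with a simple inventory comparison. The following is a minimal sketch under stated assumptions: the component names and their protocol lists are invented, and the function common_signalling_protocols is a hypothetical helper, not part of any product.

```python
from typing import Dict, Set


def common_signalling_protocols(components: Dict[str, Set[str]]) -> Set[str]:
    """Return the signalling protocols supported by every listed component."""
    protocol_sets = [set(protocols) for protocols in components.values()]
    if not protocol_sets:
        return set()
    return set.intersection(*protocol_sets)


# Hypothetical inventory of VoIP components and the protocols they support.
inventory = {
    "softphone-a": {"SIP"},
    "desk-phone-b": {"SIP", "H.323"},
    "voip-server": {"SIP", "H.323", "IAX"},
}

common = common_signalling_protocols(inventory)
if common:
    print("Common signalling protocol(s):", sorted(common))
else:
    print("No common signalling protocol: a gateway would be required")
```

An empty result indicates that at least two components share no signalling protocol and could only communicate through a gateway.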

The selection of the VoIP components to be used has a strong influence on the selection of the signalling protocol, since many VoIP components only support a certain signalling protocol. The differences between the protocols play only a minor role in terms of security. The choice of the signalling protocol should be documented.

In the following, the commonly used signalling protocols H.323 and SIP are examined. In addition to these protocols, the types of VoIP components needed (at a minimum) to make a telephone call are presented.

H.323

The group of protocols based on H.323 describes the transmission of real-time information (video, audio, and data) in packet-based transport networks. H.323 was originally developed as an implementation of the ISDN D-channel protocol Q.931 using an IP-based network. The H.225.0, H.245, H.450, and H.235 protocols are defined in this protocol group. H.323 describes the framework for the signalling protocols, H.225.0 the actual signalling, H.245 the control of the transmission of the voice information, and H.450 the actual telephony functionality. The optional support of H.235 offers protection for the integrity and confidentiality of the signalling. More detailed information can be obtained from the International Telecommunication Union (ITU), which is the body that specified these protocols. Audio and video data are transmitted via UDP; fax data via UDP or TCP. Before this real-time data is transmitted, logical RTP and RTCP channels are established between the end points (terminals).

The following components may be involved in H.323 communication:

The biggest disadvantage of H.323 is the complexity of the protocol. The large number of different protocols involved makes H.323 difficult to get an overview of. This complexity makes troubleshooting more difficult and may lead to additional costs. Another disadvantage is that many manufacturers give priority in new products to SIP, which is described in the following.

Session Initiation Protocol (SIP)

SIP is a text-based client-server session signalling protocol from the IETF (Internet Engineering Task Force). It is used to control the opening and closing of the connections used by multimedia services and is described in RFC 3261. Additional functionality such as video conferencing, instant messaging, distributed computer games, and other applications requires an extension of the SIP specification. These extensions can be found in separate RFCs. The stream of multimedia information, for example the voice information of a telephone call, is transmitted using RTP. In practical applications, the signalling is often protected using SSL, TLS (Transport Layer Security), or IPsec.
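Because SIP is text-based, a request can be read and split into its parts with ordinary string handling. The following minimal sketch assumes a made-up INVITE request (all addresses, tags, and identifiers are invented) and a hypothetical helper parse_sip_request. It deliberately ignores folded header lines, repeated headers, and compact header names, so it is not a complete RFC 3261 parser.

```python
from typing import Dict, Tuple

# Made-up INVITE request without a message body; every value is illustrative.
SAMPLE_INVITE = (
    "INVITE sip:bob@provider-name.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.org;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: <sip:bob@provider-name.org>\r\n"
    "From: <sip:alice@example.org>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@client.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(message: str) -> Tuple[str, str, str, Dict[str, str]]:
    """Split a SIP request into method, request URI, version, and headers."""
    # The header section ends at the first empty line.
    head, _, _body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: method, request URI, and protocol version.
    method, request_uri, version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, version, headers


method, request_uri, version, headers = parse_sip_request(SAMPLE_INVITE)
print(method, request_uri, version)
print(headers["cseq"])
```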

The addressing scheme of SIP is very similar to the email addressing scheme (sip:username@provider-name.org). The system responsible for an address is located using DNS (Domain Name System). SIP supports point-to-point and point-to-multipoint IP connections. Due to the simple, plain-text design of the SIP packets and the low level of complexity, SIP is being used more and more often.
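The address format and the DNS-based lookup can be sketched as follows. The example splits a SIP address into user and domain and derives the DNS SRV record names that a client would typically query for that domain (the naming convention of RFC 3263). No DNS query is actually sent. The names SipAddress, parse_sip_address, and srv_query_names are hypothetical, and passwords, ports, URI parameters, and IPv6 literals are not handled.

```python
import re
from typing import List, NamedTuple


class SipAddress(NamedTuple):
    """Scheme, user part, and domain of a SIP address (hypothetical type)."""
    scheme: str
    user: str
    host: str


# Simplified pattern: scheme, user part, and host only.
_SIP_URI = re.compile(r"^(sips?):([^@:;?\s]+)@([^:;?\s]+)", re.IGNORECASE)


def parse_sip_address(uri: str) -> SipAddress:
    """Split an address such as sip:username@provider-name.org."""
    match = _SIP_URI.match(uri)
    if match is None:
        raise ValueError(f"not a SIP address in the simplified form: {uri!r}")
    scheme, user, host = match.groups()
    return SipAddress(scheme.lower(), user, host.lower())


def srv_query_names(address: SipAddress) -> List[str]:
    """Return the SRV record names to look up for the address's domain."""
    if address.scheme == "sips":
        return [f"_sips._tcp.{address.host}"]
    return [f"_sip._udp.{address.host}", f"_sip._tcp.{address.host}"]


address = parse_sip_address("sip:username@provider-name.org")
print(address)
print(srv_query_names(address))
```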

The following VoIP components may be involved when communicating using SIP:

Although SIP has been standardised, the standard is often interpreted differently by the various manufacturers of VoIP components. This lack of interoperability means that not all VoIP functions in VoIP networks containing components from different manufacturers will be available in their entirety. The functions most commonly affected are the authentication function between the systems, the encryption function, and the functions providing value-added services. When purchasing VoIP components, the interoperability of the new components with existing components should be examined for this reason.

When using SIP in firewall or NAT environments, there are other special aspects that must be taken into account. It is difficult for terminal devices located in NAT environments to communicate with VoIP systems outside of the NAT environment, because the addresses and ports exchanged inside the signalling messages are not translated along with the IP packet headers. Additional information on this subject can be found in safeguard S 5.137 Use of NAT for VoIP.
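One common form of the NAT problem can be made visible with a small check. It is an assumption of this sketch, not a statement of the safeguard, that the media address is announced in the connection line ("c=") of an SDP session description carried in the SIP message body. The sample description and the helper private_connection_addresses are invented for illustration.

```python
import ipaddress
from typing import List

# Made-up SDP body as a terminal behind NAT might send it.
SAMPLE_SDP = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.168.1.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.168.1.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)


def private_connection_addresses(sdp: str) -> List[str]:
    """Return private addresses announced in the 'c=' lines of an SDP body."""
    found: List[str] = []
    for line in sdp.splitlines():
        if not line.startswith("c="):
            continue
        fields = line[2:].split()
        if len(fields) != 3 or fields[0] != "IN":
            continue
        # Drop an optional "/ttl" or "/count" suffix used with multicast.
        candidate = fields[2].split("/")[0]
        try:
            if ipaddress.ip_address(candidate).is_private:
                found.append(candidate)
        except ValueError:
            # The connection line held a host name instead of an IP address.
            continue
    return found


unreachable = private_connection_addresses(SAMPLE_SDP)
print(unreachable)
```

An address reported here is not reachable from outside the NAT environment, so the remote party would send its voice data to a destination that cannot be routed.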

Review questions: