Chapter 4

Session Control on the Internet

Many think that the most important component of the signaling plane is the protocol that performs session control. The protocol chosen to perform this task in the IMS is the Session Initiation Protocol (SIP) (defined in RFC 3261 [286]). SIP was originally developed within the SIP working group in the IETF. Even though SIP was initially designed to invite users to existing multimedia conferences, today it is mainly used to create, modify, and terminate multimedia sessions. In addition, there are SIP extensions to deliver instant messages and to handle subscriptions to events. We will first look at the core protocol (used to manage multimedia sessions), and then we will deal with the most important extensions.

4.1 SIP Functionality

Protocols developed by the IETF have a well-defined scope. The functionality to be provided by a particular protocol is carefully defined before any working group starts working on it. In our case the main goal of SIP is to deliver a session description to a user at their current location. Once the user has been located and the initial session description delivered, SIP can deliver new session descriptions to modify the characteristics of the ongoing session and can terminate the session whenever the user wants.

4.1.1 Session Descriptions and SDP

A session description is, as its name indicates, a description of the session to be established. It contains enough information for the remote user to join the session. In multimedia sessions over the Internet this information includes the IP address and port number where the media needs to be sent, and the codecs used to encode the voice and the images of the participants.

Session descriptions are created using standard formats. The most common format for describing multimedia sessions is the Session Description Protocol (SDP), defined in RFC 2327 [160].
Note that although the “P” in SDP stands for “Protocol”, SDP is simply a textual format for describing multimedia sessions.

Figure 4.1 shows an example of an SDP session description that Alice sent to Bob. It contains, among other things, the subject of the conversation (swimming techniques), Alice’s IP address (192.0.0.1), the port number where Alice wants to receive audio (20000), the port number where Alice wants to receive video (20002), and the audio and video codecs that Alice supports (0 corresponds to the audio codec G.711 µ-law and 31 corresponds to the video codec H.261).

    v=0
    o=Alice 2790844676 2867892807 IN IP4 192.0.0.1
    s=Let’s talk about swimming techniques
    c=IN IP4 192.0.0.1
    t=0 0
    m=audio 20000 RTP/AVP 0
    a=sendrecv
    m=video 20002 RTP/AVP 31
    a=sendrecv

Figure 4.1: Example of an SDP session description

The 3G IP Multimedia Subsystem (IMS): Merging the Internet and the Cellular Worlds, Third Edition. Gonzalo Camarillo and Miguel A. García-Martín. © 2008 John Wiley & Sons, Ltd. ISBN: 978-0-470-51662-1

As we can see in Figure 4.1, an SDP description consists of two parts: session-level information and media-level information. The session-level information applies to the whole session and comes before the m= lines. In our example, the first five lines correspond to session-level information. They provide version and user identifiers (v= and o= lines), the subject of the session (s= line), Alice’s IP address (c= line), and the time of the session (t= line). Note that this session is supposed to take place at the moment when this session description is received. That is why the t= line is t=0 0.

The media-level information is specific to each media stream and consists of an m= line and a number of optional a= lines that provide further information about the media stream. Our example has two media streams and, thus, has two m= lines.
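The split between session-level and media-level lines can be illustrated with a few lines of Python. This is a minimal sketch rather than a complete SDP parser: the function name parse_sdp and the dictionary keys are assumptions made for illustration, and no validation of the description is attempted.

```python
def parse_sdp(text):
    """Split an SDP description into session-level and media-level parts."""
    session = []   # (type, value) pairs that appear before the first m= line
    media = []     # one dictionary per m= line
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        kind, _, value = line.partition("=")   # every line is type=value
        if kind == "m":
            name, port, proto, *formats = value.split()
            media.append({"media": name, "port": int(port),
                          "proto": proto, "formats": formats, "lines": []})
        elif media:
            # Anything after an m= line belongs to that media stream.
            media[-1]["lines"].append((kind, value))
        else:
            session.append((kind, value))
    return session, media


# The session description of Figure 4.1.
OFFER = """\
v=0
o=Alice 2790844676 2867892807 IN IP4 192.0.0.1
s=Let's talk about swimming techniques
c=IN IP4 192.0.0.1
t=0 0
m=audio 20000 RTP/AVP 0
a=sendrecv
m=video 20002 RTP/AVP 31
a=sendrecv
"""

session, media = parse_sdp(OFFER)
for stream in media:
    print(stream["media"], stream["port"], stream["formats"])
```

Applied to Figure 4.1, the sketch yields five session-level lines and two media streams, each carrying its own a= line.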
The a= lines indicate that the streams are bidirectional (i.e., users send and receive media).

As Figure 4.1 illustrates, every SDP line has the format type=value, where type is always one character long. Table 4.1 shows all the types defined by SDP.

Table 4.1: SDP types

    Type  Meaning
    v     Protocol version
    o     Owner of the session and session identifier
    s     Name of the session
    i     Information about the session
    u     URL containing a description of the session
    e     Email address to obtain information about the session
    p     Phone number to obtain information about the session
    c     Connection information
    b     Bandwidth information
    z     Time zone adjustments
    k     Encryption key
    a     Attribute lines
    t     Time when the session is active
    r     Times when the session will be repeated
    m     Media line
    i     Information about the media line

Even though SDP is the most common format for describing multimedia sessions, SIP does not depend on it. SIP is independent of the session-description format. That is, SIP can deliver a description of a session written in SDP or in any other format. For example, after the video conversation above about swimming techniques, Alice feels like inviting Bob to a real training session this evening in the swimming pool next to her place. She uses a session description format for swimming sessions to create a session description and uses SIP to send it to Bob. Alice’s session description looks something like the one in Figure 4.2.

    Subject: Swimming Training Session
    Time: Today from 20:00 to 21:00
    Place: Lane number 4 of the swimming-pool near my place

Figure 4.2: Example of a session description without SDP being used

This example is intended to stress that SIP is completely independent of the format of the objects it transports. Those objects may be session descriptions written in different formats or any other piece of information. We will see in subsequent sections that SIP is also used to deliver instant messages, which of course are written using a different format from SDP and from our description format for swimming sessions.

4.1.2 The Offer/Answer Model

In the SDP example in Figure 4.1, Alice sent a session description to Bob that contained Alice’s transport addresses (IP address plus port numbers). Obviously, this is not enough to establish a session between them. Alice needs to know Bob’s transport addresses as well. SIP provides a two-way session-description exchange called the offer/answer model (which is described in RFC 3264 [283]). One of the users (the offerer) generates a session description (the offer) and sends it to the remote user (the answerer), who then generates a new session description (the answer) and sends it to the offerer. RFC 3264 [283] provides the rules for offer and answer generation.

After the offer/answer exchange, both users have a common view of the session to be established. They know, at least, the formats they can use (i.e., formats that the remote end understands) and the transport addresses for the session. The offer/answer exchange can also provide extra information, such as cryptographic keys to encrypt traffic.

Figure 4.3 shows the answer that Bob sent to Alice after having received Alice’s offer in Figure 4.1. Bob’s IP address is 192.0.0.2, the port number where Bob will receive audio is 30000, the port number where Bob will receive video is 30002, and, fortunately, Bob supports the same audio and video codecs as Alice (G.711 µ-law and H.261). After this offer/answer exchange, all they have left to do is to have a nice video conversation.

    v=0
    o=Bob 234562566 236376607 IN IP4 192.0.0.2
    s=Let’s talk about swimming techniques
    c=IN IP4 192.0.0.2
    t=0 0
    m=audio 30000 RTP/AVP 0
    a=sendrecv
    m=video 30002 RTP/AVP 31
    a=sendrecv

Figure 4.3: Bob’s SDP session description

4.1.3 SIP and SIPS URIs

SIP identifies users using SIP URIs, which are similar to email addresses; they consist of a username and a domain name. In addition, SIP URIs can contain a number of parameters (e.g., transport), which are separated by semicolons. The following are examples of SIP URIs:

    sip:Alice.Smith@domain.com
    sip:Bob.Brown@example.com
    sip:carol@ws1234.domain2.com;transport=tcp

In addition, users can be identified using SIPS URIs. Entities contacting a SIPS URI use TLS (Transport Layer Security, see Section 11.3) to secure their messages. The following are examples of SIPS URIs:

    sips:Alice.Smith@domain.com
    sips:Bob.Brown@example.com

4.1.4 User Location

We said earlier that the main purpose of SIP is to deliver a session description to a user at their current location, and we have already seen what a session description looks like. Now let us look at how SIP tracks the location of a given user.

SIP provides personal mobility. That is, users can be reached using the same identifier no matter where they are. For example, Alice can be reached at

    sip:Alice.Smith@domain.com

regardless of her current location. This is her public URI, also known as her AoR (Address of Record). However, when Alice is logged in at work her SIP URI is

    sip:asmith@ws1234.company.com

and when she is working at her computer at the university her SIP URI is

    sip:alice@pc12.university.edu

Therefore, we need a way to map Alice’s public URI sip:Alice.Smith@domain.com to her current URI (at work or at the university) at any given moment.

To do this, SIP introduces a network element called the registrar of a particular domain. A registrar handles requests addressed to its domain. Thus, SIP requests sent to sip:Alice.Smith@domain.com will be handled by the SIP registrar at domain.com.
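The mapping from a public URI to the user's current URIs can be pictured as a small lookup table. The following Python sketch is only an illustration: the class name Registrar and its methods register and lookup are hypothetical, the domain check is a plain string comparison, and details that a real registrar handles (such as expiry of registrations and authentication) are ignored.

```python
class Registrar:
    """Toy in-memory binding table for one domain (hypothetical helper)."""

    def __init__(self, domain):
        self.domain = domain
        self.bindings = {}   # address of record -> list of current URIs

    def _check_domain(self, aor):
        # A registrar only handles addresses that belong to its own domain.
        if aor.rpartition("@")[2] != self.domain:
            raise ValueError(f"{aor} is not in domain {self.domain}")

    def register(self, aor, contact):
        """Remember that the user behind `aor` is reachable at `contact`."""
        self._check_domain(aor)
        contacts = self.bindings.setdefault(aor, [])
        if contact not in contacts:
            contacts.append(contact)

    def lookup(self, aor):
        """Return the URIs currently bound to `aor` (possibly none)."""
        self._check_domain(aor)
        return list(self.bindings.get(aor, []))


registrar = Registrar("domain.com")
registrar.register("sip:Alice.Smith@domain.com", "sip:asmith@ws1234.company.com")
registrar.register("sip:Alice.Smith@domain.com", "sip:alice@pc12.university.edu")
print(registrar.lookup("sip:Alice.Smith@domain.com"))
```

Keeping a list of URIs per address of record, rather than a single value, reflects the fact that a user may be reachable at more than one location at the same time.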
Every time Alice logs into a new location, she registers her new location with the registrar at domain.com, as shown in Figure 4.4. This way the registrar at domain.com can always forward incoming requests to Alice wherever she is.

Figure 4.4: Alice registers her location with the domain.com registrar

On reception of the registration, the registrar at domain.com can store the mapping between Alice’s public URI and her current location in two ways: it can use a local database or it can upload this mapping to a location server. If the registrar uses a location server, it will need to consult it when it receives a request for Alice. Note that the interface between the registrar and the location server is not based on SIP, but on other protocols.

4.2 SIP Entities

Besides the registrars, which were introduced in the previous section, SIP defines user agents, proxy servers, and redirect servers.

UAs (user agents) are SIP endpoints that are usually handled by a user. However, user agents can also establish sessions automatically with no user intervention (e.g., a SIP voicemail). Sessions are typically established between user agents.

User agents come in many flavors. Some are software running on a computer; others, like the commercial SIP phones shown in Figure 4.5, look like desktop phones; and others still are embedded in mobile devices like laptops, PDAs, or mobile phones. Some of them are not even used for telephony and do not have speakers or microphones.

Figure 4.5: Three examples of commercial SIP phones

Proxy servers, typically referred to as proxies, are SIP routers. A proxy receives a SIP message from a user agent or from another proxy and routes it toward its destination. Routing the request involves relaying the message to the destination user agent or to another proxy in the path.
It is important to understand fully how SIP routing works, because it is one of the key components of the protocol. A given user can be available at several user agents at the same time. For instance, Alice can be reachable on her computer at the university

    sip:alice@pc12.university.edu

and on her PDA with a wireless connection

    sip:alice@pda.com

She has registered both locations with the registrar at domain.com. If the registrar receives a SIP message addressed to Alice’s public URI

    sip:Alice.Smith@domain.com

it has to decide whether to route it to Alice’s computer or to Alice’s PDA. In this case, Alice has programmed the registrar to route SIP messages to her computer between 8:00 and 13:00 and to her PDA from 13:00 to 14:00. The registrar simply checks the current time and routes the SIP message accordingly.

Being able to route SIP messages on the basis of any criteria is a very powerful tool for building services that are specially tailored to the needs of each user. Users typically choose to route SIP messages based on the sender, the time of the day, whether the subject is business-related or personal, the type of session (e.g., route video calls to the computer with the big screen), and so on; these criteria can be combined freely.

In the previous example we saw that the registrar routed the SIP message to Alice’s user agent. Yet the entities handling routing of messages are called proxies. Proxies and registrars are only logical roles. In our example, the same physical box acted as a registrar when Alice registered her current location and as a proxy when it was routing SIP messages toward Alice’s user agent. This configuration is shown in Figure 4.6.

Figure 4.6: Proxy co-located with the registrar of the domain

A different configuration could consist of using a separate physical box for each role, as shown in Figure 4.7. Here, the proxy needs to access the information about Alice’s location that the registrar got in the first place.
This is resolved by adding a location server. The registrar uploads Alice’s location to the location server, and the proxy consults the location server in order to route incoming messages.

Figure 4.7: Proxy and registrar kept separate

4.2.1 Forking Proxies

In the previous examples the proxy chose a single user agent as the destination of the SIP message. However, sometimes it is useful to receive calls on several user agents at the same time. For instance, in a house with a single line, all the telephones ring at once, giving us the chance to pick up the call in the kitchen or in the living room. SIP proxy servers that route messages to more than one destination are called forking proxies, as shown in Figure 4.8.

Figure 4.8: Forking proxy operation

A forking proxy can route messages in parallel or in sequence. An example of parallel forking is the simultaneous ringing of all the telephones in a house. Sequential forking consists of the proxy trying the different locations one after the other. A proxy can, for example, let a user agent ring for a certain period of time and, if the user does not pick up, try a new user agent.

4.2.2 Redirect Servers

Redirect servers are also used to route SIP messages, but they do not relay the message to its destination as proxies do. Redirect servers instruct the entity that sent the message (a user agent or a proxy) to try a new location instead. Figure 4.9 shows how redirect servers work. A user agent sends a SIP message to

    sip:Alice.Smith@domain.com

and the redirect server tells it to try the alternative address

    sip:alice@pda.com

Figure 4.9: Redirect server operation

4.3 Message Format

SIP is based on HTTP [144] and so it is a textual request-response protocol. Clients send requests, and servers answer with responses. A SIP transaction consists of a request from a client, zero or more provisional responses, and a final response from a server.
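The shape of a transaction can be expressed as a small check on the sequence of status codes that a client receives. The sketch below is written in Python under stated assumptions: it relies on the status code ranges of Table 4.2 (100 to 199 for provisional responses, 200 to 699 for final ones), the function names are invented for illustration, and it deliberately models only the simple case of a single final response.

```python
def is_provisional(code):
    """Provisional responses have status codes from 100 to 199."""
    return 100 <= code <= 199


def is_final(code):
    """Final responses have status codes from 200 to 699."""
    return 200 <= code <= 699


def is_complete_transaction(status_codes):
    """True if the responses are zero or more provisional ones
    followed by exactly one final response."""
    if not status_codes:
        return False          # no response yet, so the transaction is still open
    *earlier, last = status_codes
    return is_final(last) and all(is_provisional(c) for c in earlier)


print(is_complete_transaction([100, 180, 200]))
print(is_complete_transaction([100, 180]))
```

The first call describes a request answered by two provisional responses and a final one; the second describes a transaction that is still waiting for its final response.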
We will introduce the format of SIP requests and responses before explaining, in Section 4.8, the types of transactions that SIP defines.

Figure 4.10 shows the format of SIP messages. They start with the start line, which is called the request line in requests and the status line in responses. The start line is followed by a number of header fields that follow the format name:value and an empty line that separates the header fields from the optional message body.

    Start line
    A number of header fields
    Empty line
    Optional message body

Figure 4.10: SIP message format

4.4 The Start Line in SIP Responses: the Status Line

As we said earlier, the start line of a response is referred to as the status line. The status line contains the protocol version (SIP/2.0) and the status of the transaction, which is given in numerical (status code) and human-readable (reason phrase) formats. The following is an example of a status line:

    SIP/2.0 180 Ringing

The protocol version is always set to SIP/2.0 (a history of previous versions of the protocol is given in SIP Demystified [97]). We will see in Section 4.11 how SIP is extended without it being necessary to increase its protocol version. The status code 180 indicates that the remote user is being alerted. Ringing is the reason phrase and it is intended to be read by a human (e.g., displayed to the user). Since it is intended for human consumption, the reason phrase can be written in any language.

Responses are classified by their status codes, which are integers that range from 100 to 699. Table 4.2 shows how status codes are classified according to their values.
Table 4.2: Status code ranges

    Status code range  Meaning
    100–199            Provisional (also called informational)
    200–299            Success
    300–399            Redirection
    400–499            Client error
    500–599            Server error
    600–699            Global failure

Apart from the start line (status line in responses and request line in requests), the format of requests and responses is identical, as shown in Figure 4.10. So, let us now tackle the format of the request line and then the format of the rest of the message.

4.5 The Start Line in SIP Requests: the Request Line

The start line in requests is referred to as the request line. It consists of a method name, the Request-URI, and the protocol version SIP/2.0. The method name indicates the purpose of the request and the Request-URI contains the destination of the request. Below is an example of a request line:

    INVITE sip:Alice.Smith@domain.com SIP/2.0

The method name in this example is INVITE. It indicates that the purpose of this request is to invite a user to a session. The Request-URI shows that this request is intended for Alice. Table 4.3 shows the methods that are currently defined in SIP and their meaning.

Figure 4.11 shows a SIP transaction. The user agent client (UAC) sends a BYE request, and the user agent server (UAS) sends back a 200 (OK) response. Note that, usually, SIP message flows only show the method name of the request and the status code and the reason phrase of the response. These pieces of information are usually enough for any message flow to be understood.

Before explaining the types of SIP transactions and how to use them, we will study the formats of SIP header fields and bodies. After that, we will provide the readers with some message flows that will help them to understand how to perform useful tasks, such as establishing a session using SIP.

4.6 Header Fields

Right after the start line, SIP messages (both requests and responses) contain a set of header fields (see Figure 4.10).
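The layout of Figure 4.10 and the two kinds of start line can be sketched in Python as follows. This is a simplified illustration and not a conforming SIP parser: the function names are invented here, lines are assumed to end with CRLF, and details such as header fields folded over several lines are ignored.

```python
STATUS_CLASSES = {
    1: "Provisional", 2: "Success", 3: "Redirection",
    4: "Client error", 5: "Server error", 6: "Global failure",
}


def parse_message(text):
    """Split a message into start line, header fields, and body."""
    head, _, body = text.partition("\r\n\r\n")   # the empty line ends the headers
    start_line, *header_lines = head.split("\r\n")
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")     # split at the first colon only
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def parse_start_line(start_line):
    """Tell a status line from a request line and extract its fields."""
    first, rest = start_line.split(" ", 1)
    if first == "SIP/2.0":                       # responses start with the version
        code, _, reason = rest.partition(" ")
        return {"kind": "response", "status_code": int(code), "reason": reason}
    uri, version = rest.rsplit(" ", 1)
    return {"kind": "request", "method": first,
            "request_uri": uri, "version": version}


def classify(status_code):
    """Map a status code to its class in Table 4.2."""
    return STATUS_CLASSES[status_code // 100]


RESPONSE = ("SIP/2.0 180 Ringing\r\n"
            "To: Alice Smith <sip:Alice.Smith@domain.com>;tag=1234\r\n"
            "Content-Length: 0\r\n"
            "\r\n")

start_line, headers, body = parse_message(RESPONSE)
status = parse_start_line(start_line)
request = parse_start_line("INVITE sip:Alice.Smith@domain.com SIP/2.0")
print(status["status_code"], classify(status["status_code"]))
print(request["method"], request["request_uri"])
```

Splitting each header field at the first colon only matters here, because header field values such as SIP URIs contain colons themselves.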
There are mandatory header fields that appear in every message and optional header fields that only appear when needed. A header field consists of the header field’s name, a colon, and the header field’s value, as shown in the example below:

    To: Alice Smith <sip:Alice.Smith@domain.com>;tag=1234

[...]

... the body is a session description, the Content-Type indicates that the session description uses the SDP format, and the Content-Length contains the length of the body in bytes. Figure 4.12 shows an example of a multipart body encoded using MIME. The first body part is an SDP session description and the second body part consists of the text “This is the second body part”. Note that the Content-Type for the ...

... description formats may define their own extension for preconditions in the future. When a user agent receives an offer with preconditions, it does not alert the user until those preconditions are met. The preconditions are encoded, as mentioned earlier, in the SDP body. There are two types of precondition: access preconditions and end-to-end preconditions. Figure 4.34 shows an SDP with access preconditions. The ...

... Figure 4.34: Access preconditions

Figure 4.35 shows an SDP with end-to-end preconditions. The user agent that generated this session description is requesting optional end-to-end (e2e) QoS in both directions (sendrecv).

    m=audio 20000 RTP/AVP 0
    a=curr:qos e2e none
    a=des:qos optional e2e sendrecv

Figure 4.35: End-to-end preconditions

When mandatory preconditions appear in a session ...

...

    Content-Type: multipart/mixed; boundary="0806040504000805090"
    Content-Length: 384

    --0806040504000805090
    Content-Type: application/sdp
    Content-Disposition: session

    v=0
    o=Alice 2790844676 2867892807 IN IP4 192.0.0.1
    s=Let’s talk about swimming techniques
    c=IN IP4 192.0.0.1
    t=0 0
    m=audio 20000 RTP/AVP 0
    a=sendrecv
    m=video 20002 RTP/AVP 31
    a=sendrecv

    --0806040504000805090
    Content-Type: ...
application-layer acknowledgement message for them in core SIP. Therefore, there is an extension (defined in RFC 3262 [284]) whose option tag is 100rel that creates such a message: a PRACK request. Figure 4.33 shows how this works.

Figure 4.33: Reliable provisional responses and PRACK

The INVITE request in Figure 4.33 contains the 100rel option tag, which requests the ...

... header field of an error response. Such an error response terminates the establishment of the dialog. Figure 4.28 shows a successful extension negotiation between Bob and Alice. They end up using the extensions whose option tags are foo1, foo2, and foo4.

Figure 4.28: Extension negotiation in SIP

4.11.1 New Methods

In addition to option tags, SIP can be extended ...

... Upon reception of the 100 Trying response, the user agent client knows that the next hop (i.e., the proxy) has received the request. Figure 4.32 shows the previous hop-by-hop message followed by an end-to-end message. In the end-to-end message the user agent server, upon reception of the ACK request, knows that the remote end (i.e., the user agent client, as opposed to the proxy) has received the response ...

... (Figure 4.22), which is relayed by the proxy to Bob (Figure 4.23).

    REGISTER sip:domain.com SIP/2.0
    Via: SIP/2.0/UDP 192.0.0.1:5060;branch=z9hG4bKna43f
    Max-Forwards: 70
    To:
    From: ;tag=453448
    Call-ID: 843528637684230998sdasdsfgt
    CSeq: 1 REGISTER
    Contact:
    Expires: 7200
    Content-Length: 0

Figure 4.17: ...
the session, the caller prefers not to establish the session at all. The extension that allows user agents to express preconditions is defined in RFC 3312 [103]. This extension, whose option tag is precondition, is a mixture of a SIP extension and an SDP extension; it defines a SIP option tag and new SDP attributes. Therefore, this extension can only be used with sessions described using SDP. Other session ...

... “end-to-end” here refers to the fact that a reliable transmission for the message is provided end-to-end (by the user agents) and not by the proxy servers in the path. Still, all proxy servers in the path handle the message, as shown in Figure 4.32.

Figure 4.32: End-to-end transmission in SIP

Coming back to the provisional responses (other than 100 Trying), there ...

... example, the header fields below describe the SDP session description of Figure 4.1:

    Content-Disposition: session
    Content-Type: application/sdp
    Content-Length: 193

The Content-Disposition indicates ...