8 Chapter 2 www.newnespress.com

2.1.1 The People and Their Devices: Phones

Phones come in a number of shapes and sizes. Some have the latest features for consumers, such as music playing, video recording and camera functions, global positioning, and touch screens with tactile feedback. Some are designed for enterprise users, and have large screens, with strong email access and integration, spreadsheet and document editing capabilities, and large storage for use as a computer away from the laptop. Others are simple and rugged, meant for use in physically demanding environments where the phones need to withstand a beating. Some have nearly no buttons at all, and are even designed for nearly hands-free operation. All in all, each phone may seem wildly different from the next.

But, underneath, they are made of the same stuff: a microphone to pick up voice; a speaker to play it back; maybe another speaker for speakerphone operation or to play ringtones, so as not to deafen the user who happens to have accidentally hung up on a conversation and is being called back; some way to dial; some way to see or hear who is calling; a battery for mobility; and one or more radios to connect back to the network.

Within those components, the description becomes even more common. There is a digital sampler and a codec engine, to convert voice into digital data and back. There is a CPU somewhere, orchestrating everything, along with memory and some nonvolatile storage. The radios have antennas, all folded neatly into the small device.

Voice operates the same on every one of these devices, and users will become just as irritated by poor audio quality as they will be pleased by good quality. The best voice mobility network is the one that users forget is even there, let alone anything unique or special. It gets out of the way, so to speak, and lets the voice mobility user do her work.
2.1.2 The Separate Channels: Signaling and Bearer

In the analog telephone days, there was only one line per extension. This analog line had to do everything. It carried the voice, but it also had to ring the phone, send busy or dial tones, and handle the beeps corresponding to each button being pressed. With digital phone calling, the signaling and bearer channels are separated. All the beeping, humming, and chirping that is meant to tell the caller what is going on with the call is removed from the audio stream and sent separately in the signaling channel. The bearer channel holds the human voice, and nothing else.

The advantage of this separation for IP-based voice mobility networks is that it allows the call setup part of the network to operate differently from the voice encoding and decoding part. IP PBXs may be configured to never carry a single voice packet, because their job is simply to figure out how to route calls, much like how Internet DNS servers are so critical in figuring out what “www.google.com” refers to without carrying a byte of Google’s web traffic. Media gateways can be created that specialize in conversion of media formats, and then only need to implement the basic signaling protocols, and do not need to be concerned with advanced PBX features.

2.1.3 Dialing Plans and Digits: The Difference Between Five- and Ten-Digit Dialing

For all the advancements in the digital age, with email accounts, instant-messaging handles, avatars, and what not, phones still work with the concept of dialing a series of numbers. But not all numbers are created alike. In the telephone network, someone needed to determine what all of the digits mean. This meaning is known as the dialing plan. Think of the dialing plan as a series of simple rules that tell the phone system when you are done dialing and where the numbers are to go.
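One way to picture a dialing plan as "a series of simple rules" is as a small table of digit patterns, each mapped to a destination. The sketch below is only an illustration under stated assumptions: the rule table, the route names, and the function classify are hypothetical, and the patterns loosely follow the North American public-network conventions that this section goes on to describe. A real PBX would also use timers and far richer rules to decide when dialing is finished.

```python
import re

# Hypothetical rule table: (digit pattern, route name). The first rule that
# matches the complete dialed string wins. These patterns are a simplified
# illustration, not a full numbering plan.
DIAL_PLAN = [
    (re.compile(r"011\d+"), "international"),     # 011 + country code + number
    (re.compile(r"1\d{10}"), "long_distance"),    # 1 + area code + seven digits
    (re.compile(r"[2-9]11"), "service"),          # three-digit codes such as 411
    (re.compile(r"[2-9]\d{6}"), "local"),         # seven-digit local number
]


def classify(digits: str):
    """Return the route for a completely dialed string, or None if no rule matches."""
    for pattern, route in DIAL_PLAN:
        if pattern.fullmatch(digits):
            return route
    return None


if __name__ == "__main__":
    for dialed in ["5551234", "14085551234", "411", "55512"]:
        print(dialed, "->", classify(dialed))
```

A string such as "55512" matches no rule, which corresponds to the phone system still waiting for more digits.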
In the United States, the dialing plan for our public telephone lines specifies that every phone number is seven digits long. Type an extra digit, and the phone ignores it. However, some calls are not in the same area code. This area code concept is a part of the dialing plan. To get to other blocks of phone numbers, outside of the block of numbers that you can dial the most conveniently, you need to dial a “1”, followed by the area code, followed by the seven-digit number. Other calls require even more digits. An international call requires dialing “011” before the country code, and then whatever digits are necessary to place a call in that country. And the first “0” must be followed by the first “1” quickly enough to prevent the phone from thinking you are done, and connecting you to the operator. Finally, some calls, like “411”, require only three digits.

In the office, things can be a bit more complicated. Many people may have four-digit extensions. Only those four need to be dialed. Some companies may use longer extensions, however, with access codes in front of them. Finally, to dial out to the public network, you may need to dial a “9”. But not just any “9” will do! The “9” must be followed by a pause, to let the system present a new, outside-world dial tone, where the rest of the digits can be placed.

The dialing plan defines all of this behavior. Every PBX system provides an incredible amount of depth in how these dialing plans can be created, including whether some of the digits are just part of the extension number and others are meant to shift the call over to another PBX somewhere else, which then figures out the meaning of the remaining digits (as the “9” did to dial outside; even the “1” for long distance does the same thing).

A lot is mentioned about having four- or five-digit dialing within voice mobility networks.
There is an added convenience, and it is true that users of a PBX may not remember the outside number corresponding to an extension, especially if the rest of the number is different for different extensions. (Picture a system in which the 6xxx extensions are reached from outside the office by dialing 487-6xxx, but the 7xxx extensions are reached by dialing 935-7xxx.)

2.1.4 Why PBXs: PBX Features

PBXs serve as a lot more than just the anchor or administrative server of the phone network. They also provide a long list of features that people have come to expect from enterprise phone lines, features that they probably do not have at home, even with today’s rich cellphone feature sets. PBX vendors compete with each other by making the feature set as useful and fancy as possible.

There are a number of important PBX features. Some are listed here:

• Dial-by-name directory: A computer voice system that allows callers to find out an extension and dial it by interactively pressing a few buttons, usually the first portion of the name. This directory is driven by the autoattendant feature.

• Autoattendant: The automated telephone operator, represented by a series of recorded prompts. Autoattendants allow users to access and even manage their account on the PBX simply by calling in. Autoattendants are also the anchor for the interactive voice response systems that outside callers might encounter when calling a call center line whose PBX is advanced enough to guide callers through the menu of options.

• Call forwarding: The user can set the line up to forward to another extension, or an outside line, rather than ring the phone. This is useful for when the user is out of the office. Call forwarding is also done automatically when the user does not answer the phone after a certain number of rings.

• Find-me/Follow-me/Hunting: These three names for broadly the same feature allow the user to have a number of different alternative phone numbers.
When the user does not answer his or her primary line after a certain number of rings, the system hunts down the list, forwarding the call to the next number until it gives up.

• Simultaneous ring: Instead of hunting through a series of numbers, the PBX can call out to each of them at once. The first one to answer gets the incoming call. This is useful when the user has a desk phone and a mobile phone, or multiple other phones, and might be at any of them.

• Call transferring: Allows the user to send the answered phone call to another phone.

• Call park: Allows the call that is already in place to be placed on hold and transferred to another extension, where the user can remove the call from hold. Unlike call transferring, which would ring the other phone and force the user to run to it before voicemail picks up, call parking allows the user to take more time.

• Call pickup: Allows a user to answer another user’s phone when it is ringing by entering their extension number. It can also be used in the same sense as simultaneous ringing can, in that an incoming call to a department might ring multiple extensions, and the first to pick it up wins.

• Do-not-disturb: Rejects the call before it rings the phone, usually sending it to voicemail or back to whoever transferred the call. Similarly, a user can often use this feature manually on an incoming call by pressing a button on the phone to terminate the incoming call and bounce it back.

• Voicemail: Answers the phone and records a message.

• Hold music: PBXs provide a series of options and selections for the caller to be subjected to while on hold. For some unknown reason, even advanced PBXs often play a short, few-second-long segment of supposedly relaxing music in an endless loop. Administrators can, however, often replace the hold music with a prerecorded selection.
This is most useful for queuing of calls in call centers, where the hold music might be interspersed with the autoattendant informing the caller of the expected wait time.

• Time-based policies: PBXs can change their configuration based on the time of day, routing calls to the autoattendant instead of the corporate operator, for example, after hours.

• Conference calling: PBXs can join together a limited number of lines for ad hoc conferences, such as three-way calling, for which multiple parties need to be on at once.

As you can see, PBXs are designed to have a broad series of functions. Thankfully, PBX features are generally independent of voice mobility networking, in the sense that every PBX has a good number of features, and these features will generally work on IP PBXs, no matter what IP-based protocol the user is using. On the other hand, fixed-mobile convergence (FMC) solutions and PBXs do interact, and we will discuss that later, in Chapter 7.

2.2 Signaling Protocols in Detail

Signaling protocols, and the architectures on which they run, are responsible for carrying out the process of setting up a phone call. Their systems determine the network location of the other party being called, determine whether that party can be reached or is out of the network, and help establish the flow of voice traffic.

The concepts are roughly the same across signaling protocols. There is the notion of a registrar, where the phone number is registered for an extension and is mapped to the current network address or location of the handset. This allows for the system to maintain a list of active or available phones. There is a gateway, which is responsible for bridging the signaling and possibly the bearer protocols for a call between different formats and networks. Within every system, there needs to be a way to discover the location of users and their availability.
Furthermore, there must be methods to initiate a phone call, to indicate that the other side is ringing, and, once the call is connected, to manage the session, allowing parts of it to be renegotiated (such as a change in the bearer channel), additional callers to be brought into the call, and other services to be invoked.

We’ll go through many of the major protocols, but will spend more time on the first, the Session Initiation Protocol (SIP), as that is the common protocol on packet-based legs of voice mobility networks.

2.2.1 The Session Initiation Protocol (SIP)

The Session Initiation Protocol is, by far, the most common implementation for Wi-Fi-based phones. SIP runs over the User Datagram Protocol (UDP), and so can be run over any IP-based network. Transmission Control Protocol (TCP) is also an option, though it is not commonly used for plain SIP, given the shortness of a standard SIP message. SIP was created by the Internet Engineering Task Force (IETF), the group that standardizes basic protocols such as TCP, Hypertext Transfer Protocol (HTTP), and Transport Layer Security (TLS), among many others. The definition for SIP is in IETF RFC 3261.

SIP is loosely based on the concepts of another popular Internet protocol, HTTP, used by web browsers and servers to access web pages. This means that SIP is constructed around a request and response model, where one side sends a request for an action for a particular resource, and the other side replies with a response, complete with response code. Every SIP device has the ability to operate as a requester and as a responder, depending on which device is initiating the specific request/response exchange. Furthermore, every SIP message is in text, and so is theoretically human readable. (When you see some of the text that is used in a SIP message, you may beg to differ!)
The goal of SIP is to provide a simpler method, compared to the prior H.323 protocol (in Section 2.2.2) and others, for performing the basic tasks of call signaling. The introduction of SIP opened up the development of softphones, or applications that run on computers and devices not originally designed for telephone usage, to interact with calling services and act like real phones. Even Microsoft Messenger got into the act and used SIP for instant messaging and chat. The most interesting part of SIP is that a world of open-source or low-cost applications came into the industry, spurred on by its simpler, easier-to-use interface, free from significant intellectual property encumbrances. Now, major digital PBX vendors have gotten into the act, offering SIP services on their systems, eager to allow nontraditional devices onto networks created by their equipment.

2.2.1.1 SIP Architecture

The SIP name for a handset is a user agent. A user agent is an endpoint in the SIP communication, applying to both handsets and servers, and is capable of dialing out or receiving phone calls. User agents have IP addresses, and also have users. The users are identified by a SIP Uniform Resource Identifier (URI). These look like web URLs, and are based on the same concept, but apply to the domain of telephone calls, rather than web servers. A URI for a typical caller might look like the following:

    sip:5300@corp.com

This looks like an email address, but it is preceded by the “sip:” marker (in the same way as web pages are preceded by the “http:” marker). The 5300 marks the phone number, and the @ sign and everything following it represents some notion of the system that the phone number lives on. More often than not, users can ignore the @ sign and the remainder of the string, and concentrate on the phone number before it, just as email users can on a corporate email network.
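The structure just described (a “sip:” marker, a user part, an @ sign, and a system name) can be picked apart with a few lines of code. The following is a minimal sketch, not a complete parser: the names SipUri and parse_sip_uri are made up for this illustration, and it assumes only the simple user@host form shown above, ignoring the ports, parameters, and other forms that full SIP URIs allow.

```python
from typing import NamedTuple


class SipUri(NamedTuple):
    """The two parts of a simple SIP URI of the form sip:user@host."""
    user: str   # the phone number (or name) before the @ sign
    host: str   # the system the number lives on, after the @ sign


def parse_sip_uri(uri: str) -> SipUri:
    """Split a simple 'sip:user@host' URI; raise ValueError on anything else."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "sip":
        raise ValueError(f"not a sip: URI: {uri!r}")
    user, at, host = rest.partition("@")
    if not at or not user or not host:
        raise ValueError(f"expected user@host after 'sip:': {uri!r}")
    return SipUri(user=user, host=host)


if __name__ == "__main__":
    parsed = parse_sip_uri("sip:5300@corp.com")
    print(parsed.user, parsed.host)
```

Keeping the user and host as separate fields mirrors the way users themselves tend to treat the URI: the part before the @ sign is the number to dial, and the rest can usually be ignored.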
The fact that the URI looks a lot like an email address lets you know that SIP can also use text “phone numbers,” such as

    sip:bob@corp.com

which requires the phone user to be able to type in letters rather than numbers, but performs the same way.

SIP phones register their presence with a SIP registrar. Registrars perform one of the major functions of the PBX, which is keeping track of phones, the users, their capabilities, and locations. The registrar is how one phone knows that another phone exists. When a phone is first turned on, or it changes IP addresses or networks, it registers itself with the registrar. Before it does so, phone calls placed to that number will be rejected, or possibly sent to voicemail. After registration, however, a phone call to the number will be sent to the registered phone.

This raises the question of whom a phone sends requests for phone calls to. The registrar needs to get its registrations out into the network, so that a placed phone call can find its way to the right party. This is where the second concept, the SIP proxy, comes in. The SIP proxy’s job is to take requests for phone calls, look up the location of the called party in some database (the one created by the registrar would be ideal, but not required), and forward the call signals appropriately. In this sense, the SIP proxy is the switch, or PBX, for the signaling protocol. Registrars, in fact, are generally integrated into the SIP proxy, making for one device that performs the functions expected of a PBX, including endpoint or extension management, permissions-checking, logging, and so forth.

The SIP proxy is called a proxy, however, because it does not exist transparently in the process. Rather, its job is to act as a server for the calling party and a client for the called party, responding to the caller’s requests by creating nearly identical ones of its own and sending them to the called party. This looks a lot like a web proxy, which is intentional.
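The division of labor between registrar and proxy can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular PBX is built: the class names Registrar and Proxy, their methods, and the returned strings are all hypothetical, and the "database" is just an in-memory dictionary from extension to current address.

```python
class Registrar:
    """Toy registrar: remembers the current address of each registered extension."""

    def __init__(self):
        self._locations = {}  # extension -> current IP address (hypothetical store)

    def register(self, extension, address):
        # A later registration replaces the earlier one, as when a phone
        # changes IP addresses or networks.
        self._locations[extension] = address

    def lookup(self, extension):
        return self._locations.get(extension)


class Proxy:
    """Toy proxy: decides where call signaling for an extension should go."""

    def __init__(self, registrar):
        self.registrar = registrar

    def route(self, called_extension):
        address = self.registrar.lookup(called_extension)
        if address is None:
            # Unregistered numbers are rejected, or possibly sent to voicemail.
            return "reject-or-voicemail"
        return f"forward to {address}"


if __name__ == "__main__":
    registrar = Registrar()
    proxy = Proxy(registrar)
    print(proxy.route("7010"))             # phone has not registered yet
    registrar.register("7010", "192.168.0.10")
    print(proxy.route("7010"))             # now reachable at its address
```

Sharing one lookup table between the two roles reflects the observation that registrars are generally integrated into the proxy.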
We will get to the mechanics of SIP signaling shortly.

SIP does not get involved with the actual carrying of voice. In fact, it is not voice-specific, and works just as well for video calls. We will look at how the SIP signaling protocol specifies the different bearer protocol used for the voice (or video) call. One other thing that SIP was not designed to do is phone conference management. SIP is fundamentally call-based, and so is great for phones setting up a call into a conference server. However, the conference server is expected to have some other intelligence built on top that lets it tie the calls together into a conference and knows which users manage the conference and which do not.

Figure 2.2 shows the architecture diagram for SIP, mapped to the standard PBX model.

[Figure 2.2: SIP Architecture. Diagram labels: Proxy, Registrar, Phone, Media Gateway, Public Switched Telephony Network (PSTN), Telephone Lines, User Agent, Phone Extensions, Dial Plan. Flows: SIP over UDP Registration Messages; SIP over UDP Call Setup Messages; SDP-specified RTP Bearer Traffic.]

SIP is based on the concept of a caller inviting the other party to join the call. Once the invitation goes out to the proxy, which knows where the other party is located, the endpoints and the proxy exchange messages until the call is established. Each invitation, and its successful response, both carry information that is used by other, non-SIP parts of the phone, to establish the bearer channels of the call.

Invites are not just for new calls. A phone is allowed to send a new invite to a party while it is in the middle of a call to that device. This would be done when the caller wants to renegotiate the bearer channel, or perhaps to tear it down, such as when a call is placed on a silent (no music) hold.

SIP is heavily oriented toward the notion of the proxy.
The proxy, being the switch or PBX, can take care of complex routing decisions that phones should not be bothered with. One wrinkle to this is what two phones do once they find out about each other’s addresses. Some SIP proxies will allow the contact information (which IP address an extension is currently at) to pass through the call, from one party to the other. This allows the two endpoints to take over after the call is set up, and exchange messages exclusively with each other. In this description, however, we will focus on proxies that intentionally hide the addresses of one side from the other. Doing so ensures that the PBX is always a party to every call, making network design simpler and enabling the PBX to support a larger number of features than if the clients communicated peer-to-peer.

Media gateways appear, in SIP, just as ordinary endpoints. The difference lies in how the registrar and proxies treat them. The proxy will know to forward all phone numbers in the dialing plan that must go to the next network (such as outside calls) to the media gateway, as if the gateway had registered for that number. Incoming calls from the other network operate in the same way as outgoing calls do from a phone: the call is routed to the proxy. In this way, the same protocol can work for bundles of lines or general routes as easily as it can for simple devices.

SIP includes provisions to allow for user authentication, and for encryption of parts of the packets.

2.2.1.2 SIP Registration

As mentioned before, the SIP registrar knows about the existence of a phone by the process of registration. When the phone is turned on, or when it changes its network address, or when its old registration has expired and it needs to refresh it, the phone sends a SIP request to the registrar. This means that the SIP phone must know which IP address the registrar is at, as that registrar becomes the constant point of contact for the network.
Because registration is so important, we will use the SIP registration process as our way of understanding the format of SIP messages. For the examples in the section on SIP, we will use the following:

    SIP Registrar and Proxy: Name: corp.com. Address: 10.0.0.10
    Phone 1: Number 7010. Address: 192.168.0.10
    Phone 2: Number 7020. Address: 192.168.0.20

Let’s look at our first SIP message, then. SIP is sent in UDP packets to port 5060, and so the contents in Table 2.1 show the payload of the UDP packet, sent from Phone 1’s IP address at 192.168.0.10, port 5060, to the registrar’s IP address at 10.0.0.10, port 5060.

Table 2.1: SIP REGISTER request

    REGISTER sip:corp.com SIP/2.0
    Via: SIP/2.0/UDP 192.168.0.10:5060;branch=z9hG4bK1072017640
    From: "7010" <sip:7010@corp.com>;tag=915317945
    To: "7010" <sip:7010@corp.com>
    Call-ID: 1422523958@192.168.0.10
    CSeq: 1 REGISTER
    Contact: <sip:7010@192.168.0.10>;expires=3600
    Max-Forwards: 70
    Content-Length: 0

This is all text, with newlines given by a carriage return and linefeed, just as with HTTP. It is structured the same way, as well. The first line begins with the action, in this case, to “REGISTER.” The URI for the registration is that of the registrar, which is “sip:corp.com”. Finally, the version is “SIP/2.0”, meaning, understandably, SIP 2.0. This message is a request to register with the registrar.

The rest of the lines are presented as HTTP-style headers. That is, there is the text string naming the header, followed by a colon and then the header’s value. The first header is the Via header, identifying the most recent sender of this message. Remember that all messages could potentially be proxied in the protocol, and the Via header allows the receiver to understand why the IP sender of the message is involved in sending it, especially if the From line doesn’t match. In this case, the Via header just specifies the phone that sent the REGISTER message, as no one proxied it. The line can be broken down as follows.
“SIP/2.0/UDP” just repeats that the phone sends UDP. “192.168.0.10:5060” is the IP address and UDP port of the phone. With this information, the recipient (the registrar) knows that the response has to go to 192.168.0.10:5060 using SIP 2.0 on UDP. The registrar has to use this, and not the IP and UDP sender (which is identical, of course), as this allows messages to be routed in stranger ways. Think of the Via as a “Reply-to” header from email. The last piece, the “branch” part, specifies a unique identifier for this request/response transaction. (The semicolon sets the branch and other pieces that might follow aside from what came before it, and the equal sign sets the value of the branch, until the end of the line or another semicolon.) Because UDP has no real concept of a connection, this branch parameter is used to establish that concept.

The next line is the From line, which specifies the identity of the user agent making the transaction. This line looks like a From email header, for good reason. The quoted “7010” is the user-displayable phone number. Just as with email addresses, in which the account name may be “bob@corp.com” but the person’s name would be “Bob Baker,” a user might have a different name that the callers see than that of the SIP account he uses. The “<sip:7010@corp.com>” is the URI for the account, set aside in angle brackets. Finally, the “tag” serves the purpose of identifying the overall call sequence for this series of requests. Whereas the branch strictly identifies the request/response pair, the tag identifies the entire sequence of requests and responses that make up one action between callers.

The To line is similar to the From line. Here, there is no tag yet, because the “called party” (because this is a REGISTER, that party is just the registrar, and there is no real call from a user’s point of view) is required to pick its own tag.
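The parameter syntax described above, in which a semicolon sets each parameter aside and an equal sign gives its value, is simple enough to sketch in code. The helper below, split_params, is a hypothetical name for this illustration, and it is deliberately naive: it assumes no semicolons appear inside the angle-bracketed URI or inside quoted text, which a full SIP parser would have to handle.

```python
def split_params(header_value):
    """Split 'main;name=value;...' into the main part and a dict of parameters."""
    main, *parts = header_value.split(";")
    params = {}
    for part in parts:
        name, _, value = part.strip().partition("=")
        if name:
            params[name] = value
    return main.strip(), params


# The Via value from the REGISTER request discussed in this section.
via_main, via_params = split_params(
    "SIP/2.0/UDP 192.168.0.10:5060;branch=z9hG4bK1072017640"
)
protocol, sent_by = via_main.split()   # "SIP/2.0/UDP" and "192.168.0.10:5060"

# The From value: display name, URI in angle brackets, then the tag parameter.
from_main, from_params = split_params('"7010" <sip:7010@corp.com>;tag=915317945')

if __name__ == "__main__":
    print(protocol, sent_by, via_params["branch"])
    print(from_main, from_params["tag"])
```

Applied to the To line of the same request, the parameter dictionary would come back empty, since the registrar has not yet picked its tag.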
The Call-ID is unique for the particular call from that caller, and is given in the similar email-address format, with the IP address of the caller defining the part after the @ sign.

The CSeq field defines where we are in the back-and-forth of the particular action. The value of “1 REGISTER” tells us that this is message one of the handshake, and this is a REGISTER message. This is useful for human debugging of call problems, as it tells you where you are in the process, even if the earlier parts of the process are missed.

All of this previous stuff is just mechanics. The important part of the REGISTER message comes now. The Contact field tells the registrar that this is a registration for “<sip:7010@192.168.0.10>”, meaning that the phone number is at 192.168.0.10, and goes by the name “7010”. It is actually possible for one user agent to have multiple phone numbers, and this registration is for the one and only one phone number here. The “expires” parameter states that the registration expires 3600 seconds, or one hour, from now.

The Max-Forwards header just states that any intervening proxy can proxy this message, for a total of 70 times, after which the message is dropped. This protects the network from times when a proxy might be misconfigured to forward a message back along the path from where it came.

The Content-Length states that there is no SIP message body. Message bodies are used in INVITEs, which we will see later.

Now that the registrar has received the request, it will send a response. The response lets the client know whether the registration went well or had an error. Table 2.2 has the response. The first line has the response. “SIP/2.0” is the version, but more importantly, “200 OK” means that the response was a success. Registrars can fail with different codes.
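To tie the header walkthrough together, the sketch below assembles a REGISTER request of the same shape as the one in Table 2.1 and reads the first line of a response. It is an illustration only: the function names build_register and parse_status_line and their parameters are hypothetical, the identifiers passed in are simply the example values from this section, and the blank line that ends the header block follows the HTTP-style framing that SIP shares.

```python
def build_register(user, domain, phone_ip, branch, tag, call_id,
                   cseq=1, expires=3600, port=5060):
    """Return the text of a REGISTER request, with CRLF line endings."""
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {phone_ip}:{port};branch={branch}",
        f'From: "{user}" <sip:{user}@{domain}>;tag={tag}',
        f'To: "{user}" <sip:{user}@{domain}>',          # no tag yet on the To line
        f"Call-ID: {call_id}@{phone_ip}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <sip:{user}@{phone_ip}>;expires={expires}",
        "Max-Forwards: 70",
        "Content-Length: 0",                             # no message body
    ]
    # Each line ends in CRLF; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line):
    """Split a response's first line, e.g. 'SIP/2.0 200 OK', into its parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    message = build_register("7010", "corp.com", "192.168.0.10",
                             branch="z9hG4bK1072017640", tag="915317945",
                             call_id="1422523958")
    print(message)
    print(parse_status_line("SIP/2.0 200 OK"))
```

The resulting string would be the payload of the UDP packet sent to the registrar on port 5060; sending it is left out here to keep the sketch self-contained.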