340 Chapter 8 www.newnespress.com

parties, although everyone must support 32 and 64 is recommended. The idea of the window is that the receiver keeps the highest sequence number that it has seen from a packet that has been successfully authenticated. (Forgeries may try to push the window around, and so must be ignored when setting the window.) Any packet received with a sequence number older than the highest one received, minus the window size, is dropped right away. That leaves the packets in the middle of the window. For those packets, a list of sequence numbers already seen is kept. If a packet with the same sequence number comes in twice, the second one is dropped. Otherwise, the packet is allowed in and its sequence number recorded.

IPsec is flexible enough to allow for a number of different encryption and authentication protocols to be negotiated. Common encryption protocols are 3DES-CBC and AES-CBC. A common authentication protocol is HMAC-SHA1. Recall that an HMAC is a special type of signature that requires a secret key, shared by the sender and the receiver, to validate. When a message is received, the key and the packet data together produce the signature, which is then compared to the one on the packet. If they match, the sender has the right key. So the possession of the key by the sender is proof of the authenticity of the packet.

8.3.1.1 IPsec Key Negotiation

Because IPsec is only a transport, there must be a protocol to set up the tunnels. The simplest protocol allowed is to use none, and IPsec connections are allowed to be set up on both sides manually. However, it is usually far simpler for management of the connections to use some sort of user authentication and negotiation protocol. The Internet Security Association and Key Management Protocol (ISAKMP) is used between devices to negotiate the type of IPsec connection and to establish the security association.
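Before going further into ISAKMP, the receive-side replay window described before Section 8.3.1.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular IPsec stack does it: the names ReplayWindow and accept are hypothetical, a set stands in for the bitmap a real implementation would more likely use, and the exact trailing edge of the window is an assumption.

```python
class ReplayWindow:
    """Illustrative receiver-side replay check (hypothetical names)."""

    def __init__(self, size=64):
        self.size = size      # window size; the text recommends 64
        self.highest = 0      # highest authenticated sequence number seen
        self.seen = set()     # sequence numbers already accepted in the window

    def accept(self, seq, authenticated):
        # Forgeries must never move the window, so reject them first.
        if not authenticated:
            return False
        # Older than the highest number minus the window size: drop.
        # (Treating the boundary value itself as too old is an assumption.)
        if seq <= self.highest - self.size:
            return False
        # Same sequence number seen twice inside the window: drop the repeat.
        if seq in self.seen:
            return False
        # Otherwise let the packet in and record its sequence number.
        self.seen.add(seq)
        if seq > self.highest:
            self.highest = seq
            # Forget numbers that have slid out of the back of the window.
            floor = self.highest - self.size
            self.seen = {s for s in self.seen if s > floor}
        return True
```

With a window of 4, for instance, a receiver that has accepted sequence number 10 would still take a late packet numbered 7 once, but would drop a packet numbered 6 or a second copy of 7. The discussion of ISAKMP continues below.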
The two endpoints decide on the type of tunnel, the type of encryption or authentication algorithm to use, and other parameters using this protocol. ISAKMP is defined in RFC 2408 and uses UDP port 500 for its communication. Related to ISAKMP is the key exchange protocol itself, the Internet Key Exchange protocol (IKE). IKE takes care of the key exchange portion of the setup, and thus piggybacks with ISAKMP as a part of the setup. ISAKMP has message exchanges similar to those of the other security negotiation protocols, including certificate requests and responses, nonce exchanges, and capabilities exchange. ISAKMP is a complex protocol; it would not be useful to go into the same level of detail for ISAKMP here as was done for TLS. However, let us take a look at the basic exchange.

The first phase for ISAKMP is for one endpoint, usually a VPN client, to reach out to the authenticating server. The first message sent contains a significant amount of information. When using Aggressive mode, which is common because it reduces the number of messages that have to be exchanged, the first message contains nearly everything needed in one shot to set up the connection. The first major piece of information in this message is the proposal list for the algorithms to use for authentication or encryption, for the ISAKMP/IKE exchange itself. This list is an ordered set of the combinations of encryption and payload-authentication methods that the client wishes to request, as well as the authentication methods and the expected key lifetimes. The next important set of information kicks off the key exchange. This key exchange starts off the key negotiation, using Diffie-Hellman keys to create a session key. A nonce is included, followed by the identification of the endpoint, and a number of options. The next phase, in aggressive mode, comes back from the server.
This selects the IPsec encryption and authentication that will be used, and how the user is to authenticate. Following this is a nonce, then the server’s identity. Options conclude the packet. At this point, the ISAKMP/IKE session can be encrypted. The third phase involves the two endpoints establishing the IPsec security association proper. The two endpoints select what IPsec authentication and encryption mechanism they will use. After this has completed, the information is pushed down to set up the IPsec connections themselves.

User (not packet) authentication can occur in one of a couple of ways. Each side can have a preshared key, which is then used in the validation of the ISAKMP session. Or, each side can use certificates. The certificate-based scheme is undoubtedly more secure, but is harder to manage. This is precisely the same tradeoff that is experienced on link protection, such as using 802.1X versus a preshared key for Wi-Fi.

8.3.2 Application-Specific Encryption: SIPS and SRTP

The difficulty of using end-to-end encryption is that all of the endpoints must support it, which may not be the case. Instead, protocol-based security can be used. For SIP, one option is to use SIP over TCP, protected by a TLS session. This is identical to the approach HTTP uses for protection, by using TCP and requiring a TLS negotiation first. The advantage to doing this is simplicity, as TLS is a well-understood technology, and vendors do not have a difficult time implementing it. Furthermore, using TLS allows the voice mobility administrator to enable the built-in SIP authentication system, based on WWW digests, without fear of eavesdropping. Using SIP authentication greatly decreases the complexity of an authentication-based network, because SIP clients are far more likely to support it out of the box. The major issue with SIPS, or SIP with TLS, is that the processing requirements on the PBXs go up significantly, which may affect the scale at which the PBX can operate.
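To see why digest authentication benefits from the TLS wrapper, it helps to look at how a digest response is computed. The sketch below follows the basic HTTP digest calculation (the form without the optional quality-of-protection field) that SIP borrows; the function names, the sample field values, and the guess_password helper are hypothetical and exist only for illustration. The password itself never crosses the wire, but every other input does, so an eavesdropper on an unprotected session can test candidate passwords offline.

```python
import hashlib


def md5_hex(text):
    # Digest authentication traditionally uses MD5 over colon-joined fields.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    # Basic digest form, without the optional qop/cnonce fields.
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def guess_password(captured, candidates):
    # What an eavesdropper could do with fields captured from cleartext SIP:
    # recompute the response for each candidate password and compare.
    for candidate in candidates:
        attempt = digest_response(
            captured["username"], captured["realm"], candidate,
            captured["method"], captured["uri"], captured["nonce"],
        )
        if attempt == captured["response"]:
            return candidate
    return None
```

Running the exchange inside TLS hides the nonce and the response from anyone on the path, which is what removes this offline guessing opportunity.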
Protecting SIP does nothing to protect the payload. To protect the bearer channel, SRTP is an option. SRTP uses AES (and only AES) to encrypt each RTP packet. The AES encryption is used in a stream setting, by running in counter mode, which ensures that AES can be restarted if intervening packets are lost. SRTP is effective, in that it protects the packets from eavesdropping and modification.

8.3.3 Consequences of End-to-End Security

There is a major consequence of using end-to-end security. Devices that may not have been built for fast cryptographic operation, such as PBXs and media gateways, will be forced to use computationally expensive protocols on the real-time voice path to ensure privacy. The protocols themselves are not ubiquitously supported. For example, IPsec is common for VPNs, and can be used from router to router or even laptop to laptop, but phones are unlikely to have a VPN client at all, let alone one that is fast enough to be appropriate for real-time voice traffic. PBXs are less likely to support IPsec. Using application-specific security makes more sense in this case, but even then, the protocols are relatively new, and are not commonly deployed. For this reason, it is usually far easier to dispense with end-to-end security, and instead to focus on protecting the mobile, exposed portion of the network.

8.4 Protecting the Pipe

The pipe, in voice mobility networks, can be a number of things. When the mobility network is heavily wireline, the problem becomes authenticating over Ethernet. (Encryption for wireline networks is considered less necessary.) When voice mobility uses Wi-Fi, the problem transforms into finding the right WPA2 settings for both authentication and encryption. When traffic is coming in from the outside world, using fixed-mobile convergence solutions or remote access clients, the pipe that needs protecting crosses the Internet.
The advantage of protecting the pipe, and not the entire path, is that the part of the path most vulnerable can be addressed using specific, dedicated security infrastructure, whereas the less vulnerable parts can be placed in physically or logically secure networks. This allows “legacy” voice mobility equipment to have a high chance of operating with strong security. For wireline networks, especially dedicated wireline voice networks that have exposed jacks, one of the major concerns is that someone might plug into the voice network rather than the data network, either by mistake or to cause mischief. To preserve the sanctity of the voice wireline network at the edge, one solution available is to use 802.1X on the wireline ports. 802.1X works on wireline in almost the same way as it does for Wi-Fi. (See Chapter 5 for details.) The major difference is that the end of the 802.1X EAP exchange does not lead to a continuation into any sort of key exchange or encrypted session. Rather, the edge switch, acting as authenticator, unlocks the port for use for more than just authentication. The issue with using 802.1X authentication for wireline networks is that the desktop phones may not support it. In that case, a practical, though not terribly secure, alternative is to implement MAC address filtering on the switch. This can be done per port, or better, switch-wide. The goal is to only let phones onto the network, and ensure that any traffic that ends up on the network that comes from an accidentally connected device is dropped before it starts consuming resources. One of the biggest concerns on that front is that a client may come in and exhaust the phones’ DHCP address space. This can happen when the accidentally connected laptop is looking for an IP address in a specific range, and gets a completely different one from the DHCP server.
If the client ends up rejecting that address for being in, say, a private address range that it has been configured not to use, there is a chance that the device will try again. If this happens enough times, the DHCP server will lose all of its addresses, and any phones that get plugged in or introduced to the network wirelessly will not be able to gain basic connectivity. Wi-Fi security is a must. Chapter 5 went into significant detail on how preshared keys work, compared to usernames or certificates. The advantage of using Wi-Fi’s own security, rather than an end-to-end piece, is that the phone is likely to have a high-performance security function built into the Wi-Fi chip, just for WPA2. This is because Wi-Fi certification requires that every device support WPA2, and every Wi-Fi chipset manufacturer embeds just that process into the chips that they make. Phone manufacturers need only turn on those features; there is no heavy lifting that needs to be done. Compare this to SRTP, for example, which requires that the voice coder engine, which is usually an optimized engine for producing real-time payloads, must also either know how to encrypt the traffic by itself or must pass it along to a slower software process to encrypt. This can cause significant battery drain on the phone, if such a configuration is even supported. FMC solutions beg the use of a remote security product. Again, the physical and resource capabilities of the phone come into play here. Some phones do, in fact, have VPN clients, which can be used for access into the enterprise. These VPNs terminate long before enterprise server infrastructure is reached. Running voice protocols, using FMC soft clients, over the VPN can make sense, although the common mode of operation is for voice to remain on the mobile operator’s network and for data to go through the VPN. One interesting twist, however, is that voice devices that have VPN capabilities must have the VPN logged into when the user is on the road. 
They can sometimes be configured to log in by themselves, but more often than not, they must be enabled manually. This is especially important for converged, dual-mode phones that also operate within the enterprise. When in the enterprise, associated to the corporate Wi-Fi network, there is no need or benefit for enabling the VPN link. In fact, the VPN server may not even be accessible from within the network, as its major interface is meant to point outside, to the Internet. Because the VPN should be on for some uses of the Wi-Fi network and not others, it can stand in the way of convincing users to access the enterprise network on the road, reducing the productivity gains for which the FMC solution was considered in the first place.

One option that many Wi-Fi infrastructure vendors offer, which takes advantage of the concept of protecting the pipe and not the end-to-end application, is for remote users to be offered remote access points. These access points are similar to the campus access point, and yet are designed for operation when on the road. The remote access point is essentially a VPN client and a normal access point combined. The access point’s VPN client tunnels through the Internet to the corporate network, where it terminates at the wireless controller. Once connected to the controller, the access point pulls the same enterprise configuration down as it would if it were in the office, and provides it to the remote user. This way, the remote user can use the same cellphone as on campus, with the same WPA2 security policies, without having to be bothered with the VPN. This does provide a measure of privacy from the phone to the physically protected office.

8.5 Physically Securing the Handset

The handset itself is still a weak link. Handsets are designed to be portable, so in that sense, they are also designed to be transported away from their rightful owners.
Many converged handsets are quite impressive in their capabilities, and so they make for an interesting target for thieves. Furthermore, busy, high-productivity enterprise users are unlikely to set up a complicated, strong phone lock password. It is simple to imagine a hospital on-call attending physician not wanting to be bothered with typing an eight-character, letter, number, and punctuation-mark password on a phone that itself has no more than 16 buttons. The problem becomes, then, how to prevent phones from being stolen in the first place, and how to take care of the problem once they have left the building.

8.5.1 Preventing Theft

There are many practical considerations for preventing the theft of a voice mobility handset. Unfortunately, limiting the network access is not one of them. Take Wi-Fi-only handsets as an example. It is pretty clear to the voice mobility administrators that a Wi-Fi-only handset will not be of much use at a person’s home, or a hotspot, or even another office. Most phones provide tight administrative privilege requirements to change the Wi-Fi network and SSID that the phone uses. Even if someone can accomplish the feat of penetrating those restrictions, the phone will work only with the PBX it was configured for, unless someone changes it. Such handsets do not perform well over the Internet, and are not going to be useful as a personal wireless voice phone for someone’s house. However, the devices do look like cellphones, and the people who might take one such phone are not likely to know or care about the difference until they have the phone securely in their possession.

Some common-sense approaches can work. Sticking labels onto phones that state just the fact that they will not work outside can have a slight deterring effect, much the same as restaurant pagers’ warnings have on their patrons. A more workable approach is to use telephones that do not look like telephones.
In the limited environments where voice mobility can be conducted without outdoor support, such as warehouses or hospitals, specialized devices like two-way communicator badges or ruggedized handsets can discourage some amount of theft. For environments where devices have specialized chargers, a daily checkout policy and a central charging station can at least keep track of the phones that do exist and strongly discourage users from taking devices with them to places where they might get stolen. A good example of this is with nursing: the nurses’ station makes an acceptable location to place the chargers. But, ultimately, if someone wants to take the phone, they can.

A better way to protect against theft is to detect theft. Theft detection can be performed somewhat readily on converged, dual-mode phones or Wi-Fi-only phones using location tracking. Location tracking is a feature of Wi-Fi networks or of overlay systems that use Wi-Fi for monitoring, where the rough positions of each device are recorded within the system. Location tracking systems can be built with automated policies, such as email alerts that are sent when devices enter or exit certain positions. There are a number of networks that use the location tracking system to monitor the exits to the buildings, to send alerts if a device passes through there when it should not. This can send an email out to the person who owns the phone, informing them of its activity, and gently reminding them that the phone is not to leave the building. Such a system may seem to do nothing against people who intend to steal the device and already have it within their possession, as removing the battery or powering down the device will disable the location tracking. Another option is for the system to then send a message if the phone is taken off the air. In many of these environments, phones are generally kept on 24 hours a day, and so it is possible to come up with the right set of rules to make a deterrent useful.
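As a rough illustration of what such a set of rules might look like, the sketch below combines the two policies just described: an alert when a device is seen near an exit, and an alert when a device that should always be on has gone silent. Everything here is hypothetical, including the Sighting record, the zone names, and the 15-minute silence threshold; a real location tracking system would supply its own event format and policy language.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    device: str   # handset identifier
    zone: str     # rough position reported by the tracking system
    time: float   # seconds since an arbitrary starting point


EXIT_ZONES = {"lobby-exit", "loading-dock"}   # hypothetical zone names
OFF_AIR_LIMIT = 15 * 60                       # assumed silence threshold, in seconds


def evaluate(sightings, now):
    """Return (device, reason) alerts for exit sightings and silent devices."""
    alerts = []
    last_seen = {}
    for s in sorted(sightings, key=lambda s: s.time):
        # Rule 1: the device was seen where it should not be.
        if s.zone in EXIT_ZONES:
            alerts.append((s.device, f"seen at exit zone {s.zone}"))
        last_seen[s.device] = s.time
    for device, t in last_seen.items():
        # Rule 2: a phone that is normally on all day has dropped off the air.
        if now - t > OFF_AIR_LIMIT:
            alerts.append((device, "off the air"))
    return alerts
```

Each alert could then be turned into the kind of email reminder described above. The second rule is what covers the case of a handset that is powered down before it reaches the door.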
Unfortunately, there is no foolproof way to prevent theft of voice mobility devices. Making sure that the devices do not look like high-end cellphones, unless the users need those high-end features, and a stronger educational campaign about locking phones and keeping track of them, are likely to be the most effective.

The second half of the discussion is what to do when the device is stolen. This question is either simpler or less simple than it looks, depending on whether the device has a cellular radio. If the device does have a cellular radio, then the options are wider. Cellular phones, even dual-mode phones, connect to the mobile network. Once connected, a phone that was reported stolen can be disabled remotely by the mobile operator or administrator, and even potentially tracked, if the phone is being used in the commission of a serious crime. On the other hand, if the phone does not have a cellular radio, then there is a good chance that the thief will not be able to use the phone for much of anything, so it may already be in a state that is close enough to disabled for the comfort of the administrator. This is a reminder for voice mobility administrators to explore the encryption options for smartphones that may be used with the enterprise network. With phones set up for encryption, the information locked up in a stolen phone may be lost, but that is better than the information being exposed.

8.6 Physically Protecting the Network

Physically protecting a voice mobility network is a very strong way to prevent unauthorized access. People may be tempted to leave voice mobility networks more exposed than their data counterparts, because the only traffic flowing across them is real-time call data, and not, say, highly sensitive corporate strategies. Nonetheless, it is important that voice mobility networks be given the necessary physical security to prevent problems with direct intrusions.
The wireline portion of the voice mobility network should be treated with the same level of concern as the service network for data usage would. Most IT organizations are good at ensuring that email servers are locked up, if not for security, then at least to prevent accidental disruption. The same goes for IP PBXs and gateways. However, there has been a historically different way of thinking about voice networks compared to data. Data networks invariably terminate in a switch that is placed in a locked switching closet. Voice lines, however, used to terminate in a series of punch blocks, which had been placed in locations convenient for the wire-pullers. If a network is subsequently upgraded to voice over IP, those same locations may be used for placing the Ethernet switches that concentrate the desk ports and send them to the voice network, towards the PBX. Clearly, those areas should be kept locked with the same scrutiny. Another area of concern is with the accidental confusion of voice and data network services. Voice networks should always be kept physically separate and distinctly marked from data networks. Before wireline voice over IP, the phone port was never electrically like a data port. A user making an error in reconnecting the devices on his desk would have found that the phone and computer would both not work if the machines were plugged in the wrong way. But, unfortunately, with voice-over-IP services, it is possible for the user to get it wrong and still have the appearance that everything is working correctly—that is, at least, until he tries to place a call. This is a different aspect of physically protecting the network. Previously, we saw an example of how a misconnected device can exhaust network resources, such as with accidental DHCP exhaustion. Physically protecting the port of the network would also solve that problem. 
The simplest way to do that is not necessarily to secure the jack but to place a wall plate over it that makes it difficult to unplug the phone. This serves as a potential deterrent to accidental swapping.

On the wireless side, however, real physical security is a necessity. Again focusing on Wi-Fi deployments, the fact that the signals do pass through walls and outside the building requires that the network be well planned for security. Depending on the nature of the environment, even strictly followed WPA2-Enterprise security with certificate exchanges using TLS can reveal information about the caller that should not be exposed. Given that a reasonable number of voice mobility devices use preshared keys, and not WPA2-Enterprise, a physical security approach can help provide an additional layer of protection. Furthermore, there are a few environments where it is important that the very fact that a user places a phone call should be hidden. Voice traffic over Wi-Fi is designed to be distinctive, with a regular pattern of fixed-rate, fixed-length frames coming on high-priority services tipping any observer off that a call has been initiated. Here, the concern is the exposure to the outside areas of the building of the in-building voice mobility network.

Physical layer Wi-Fi firewalling solutions have recently been introduced that use RF activity to mask the presence of the network and its traffic to different physical regions. By deploying these physical layer blocking systems on the outside walls of the building, the systems can provide a curtain that separates the inside, where the network is accessible, from the outside, where the network is not even recordable. This is an inherently different solution from attempting to use specialized antennas and beam technologies to concentrate the Wi-Fi network towards the inside of the building. Doing the latter does not prevent an attacker from recording the voice traffic.
It requires only that the attacker get a slightly bigger antenna, with a dB gain improvement for every dB of isolation that the network installer was able to provide. Using RF firewalling instead blocks the signal from being intelligible past the curtain of coverage, preventing eavesdroppers from having useful access to the leaked signals from within the building. Be aware that some products may be labeled “RF firewalls” even though they simply use the location of the device to influence the firewall policies of the network. These are not true RF firewalls, since they do not provide any security at the RF layer itself, and thus are completely vulnerable to the passive leakage attacks mentioned here.

CHAPTER 9
The Future: Video Mobility and Beyond

9.0 Introduction

This entire book has so far been looking at the present day, with voice mobility taking center stage. But mobile devices have begun the dramatic transition from voice-only phones to multipurpose, “converged” systems that fit in the pocket but perform the work that a laptop computer would have just a couple of years ago. The main difference is that the type of applications that these devices run has expanded. From a quality-of-service perspective, voice is no longer alone as the main application. Video is here, in the form of webcasts, corporate events, and videoconferences. In this chapter, we will look at video mobility, building upon the concepts already covered. Then we will look towards the future, and try to see where mobility may be going in the enterprise.

9.1 Packetized Video

Video is an interesting thing. Besides containing voice as a proper subset, video is also responsible for carrying quite a bit more information. Whereas voice recording and playback technology was perfected in miniature form decades ago, video is always a work in progress, needing bigger and bigger screens with higher resolutions and more sharpness.
In some senses, it’s hard to imagine how video and mobility go hand in hand, or simply in your hand, as video requires watching on a device that is not constantly shaking and jiggling, and requires nearly constant attention, whereas voice can be used on the run. However, video has the ability to connect with the user in a way voice never can. If a picture is worth a thousand words, a moving picture should be worth at least a thousand times more again. More practically speaking, videoconferencing and webcasting have become increasingly attractive, as travel budgets constrain companies from hosting large fly-in gatherings, and video technology has improved in concert. In the end, voice and data have already been able to make the transition from wires, but as video becomes more prevalent as a part of networking in general, the video must naturally follow the user, and thus become wireless as well.

©2010 Elsevier Inc. All rights reserved. doi:10.1016/B978-1-85617-508-1.00001-3.

Let us look into some of the fundamental differences between voice and video. The first difference is the most obvious: video requires significantly more throughput than voice. Video has to carry the moving picture along with the same voice stream as before, and so the overall content must be quite a bit larger. In voice mobility, the throughput of voice is usually not the constraint except in very large voice aggregation centers. Instead, voice makes its presence known by its increase in the number of packets over the network. Video, on the other hand, is bandwidth constrained from the outset. The second difference is that video requires synchronizing multiple media streams. This impact is felt especially when loss rates rise or bandwidth constraints are hit, and some part of the video must be sacrificed.
In the ideal case, the software on the endpoints keeps everything in synchronization, but this can often slip when poor network quality or lack of capacity begins to challenge the ability of the video client to find its way without all of the appropriate information. The third difference has more to do with how video is used today. For the most part, video is one-way. The user watches a video that is being streamed. Because of this, video can build up reasonable latency, and so video is not as latency-constrained as voice is. This may change, as mobile devices are given more sophisticated cameras and videoconferencing on a mobile device becomes possible. But, as of today, video is still fundamentally a broadcast mechanism. And for this reason, video also differs from voice at a fundamental network level, because it can be effectively multicast over the network. The final difference is that video can be significantly more sensitive to loss than voice. Without the two-way conversation, lost information may not be covered up by asking the other side to repeat, and the user may end up with a poorer opinion on the quality of the network.

9.1.1 Video Encoding Concepts

As mentioned earlier, video has many more dimensions (quite literally) of information to it than voice does. Video is nothing more than a series of still pictures, whereas voice is a series of nothing more than still point-in-time readings of sound pressure: just one small number at a time. Where voice has just this one sample at any given time, video has the entire picture at that time. This picture, itself, is made of two dimensions of pixels, or small areas that possess the same color. Let’s dig into this a bit more deeply. A picture, or still image, has hundreds of thousands of pixels, or picture elements, arranged in a rectangular grid. Each pixel has some fixed dimension, often measured in millimeters, with the most important aspect being the ratio of the width to the height of the pixel.
The entire screen is made of hundreds or thousands of pixels in each of the horizontal and vertical directions, resulting in images with a given resolution. Common resolutions are the small 640×480, meaning 640 pixels wide by 480 pixels high; 1280×720, used in 720p high-definition; and 1920×1080, used in 1080p and 1080i high-definition video. Resolution can also be a function of the actual number of