240 Chapter 6 www.newnespress.com

the root cause is for being out of range of the access point. Again, the thresholds required are not typically visible or exposed to the user or administrator.

Voice clients tend to be more proactive in the process of scanning. The two methods just described are for when the client has strong evidence that it is departing the range of the access point. However, because the scanning process itself can take as long as it does, clients may choose to initiate the scan before the client has disconnected. (This may sound like the beginnings of a make-before-break handoff scheme, but read on to Section 6.2.3, where we see that such a scheme does not, in fact, happen.) Clients may choose to start scanning proactively when the signal strength from the access point begins to dip below a predetermined threshold (the signal strength itself is usually measured directly from the beacons). Or, they may take into account increasing, but not yet disruptive, losses for data. Or, they may take into account observed information about channel conditions, such as an increasing noise floor or a higher density of competing clients, to trigger the scan. In any event, the client is attempting to make some sort of preprogrammed cost/benefit tradeoff. This tradeoff is often related to the problems of handoff, as mentioned shortly.

Scanning may also happen in the background, for no reason at all. This is less common in voice clients, where the desire to preserve battery life acts as a deterrent, but nevertheless is employed from time to time. The main reason to do this sort of background scanning is to ensure that the client’s scanning table does not become too stale, or to serve as a failsafe in case the triggered scanning behavior does not go off as expected. One of the chief problems with determining when to scan is that the client has no way of knowing whether it is moving or how fast it may be moving.
A phone held in the hands of a forklift driver can rapidly go from having been standing still for many minutes to racing by at 15 miles per hour in a warehouse. This sort of scanning, not being triggered, is the least likely to lead to a change in access point selection, but may still serve its appropriate place in a network.

For data clients, as a comparison, this form of background scanning, triggered for no reason, is often driven by the operating system. Windows-based systems often scan, for example, every 65 seconds, just to ensure that the operating system has a good sense of the networks that are available, in case the user should want to hop from one network to another. This sort of scanning causes a noticeable hit in performance for a short period of time on a periodic basis.

6.2.2.4 The Decision

Whether the client has an updated scanning table, or whether it has been triggered to scan because of a disconnection or performance-limiting event, the decision to leave the access point and connect to a new one is entirely the client’s. This decision is driven by the same factors that trigger the scan in the first place. But it may also happen for other reasons, often as a direct result of the updates made to the scanning table by a scan that might not have been triggered for the purposes of selecting a new access point. For voice clients, load can be an issue, and a scan result that shows a significantly lighter load on a different channel or access point can trigger a decision to hand off. So can varying signal strengths, even when the connection quality is more than adequate.

It bears repeating ad nauseam that clients determine which access point they wish to connect to entirely on their own, for reasons neither specified by the standard nor available to the user of the device.
The only influence the network has is to invoke harsh load balancing techniques (Section 6.1.2) to simply deprive the client of an otherwise legitimate choice in the name of better balance, or to use channel layered techniques to create a virtualized access point (Section 5.2.4.7), where the client is not aware of any transitions.

Why do vendors insist on leveraging proprietary methods to determine when to scan, how to rank access points, and when to make the final transition? This is surely a perplexing question, because this very fact of hidden client control is what leads to one of the greater complexities for voice mobility. The best way to answer this is to look at it from the point of view of a vendor. Client manufacturers, especially those for voice mobility devices and phones, stake their reputation and brand value on being known for the quality of the voice calls made from their device. The behavior of a voice call over Wi-Fi, when the handset is not in motion, is already well defined, with WMM completely specifying how the voice traffic can gain priority, SIP and other signaling protocols establishing how the call is made and ended, and RTP describing how the voice traffic is encoded into UDP. Therefore, for handset hardware and software vendors to be able to differentiate themselves from the manufacturers of other devices in their market, they have an incentive to produce value by creating proprietary methods for improving handoff. Even with standards (such as 802.11k, Section 6.2.6) that we will see help with the information exchange, there is pressure on each phone manufacturer to focus on creating unique methods for trying to get the handoff decision-making process “just right,” and to hold the parameters that go into those methods close to the vest.

Unfortunately, this greatly complicates the job of anyone who must manage a voice mobility network.
Generally, the documentation with any voice mobility device will be noticeably vague as to the procedures and controls that the administrator or user may be able to employ to influence the handoff behavior of the client. However, some vendors do recognize that a “one-size-fits-all” approach is not likely to work in all cases, and therefore offer some general-purpose settings for handoff behavior. Going by terms similar to “roaming aggressiveness” or “handoff aggressiveness,” these settings are often scaled from low to high or in stages and change the behavior of the client in ways that are not made public, but are intended to allow the administrator to favor, in the low case, having the client avoid roaming unless necessary, and in the high case, having the client hand off whenever it may perceive any benefit from doing so.

Because a handoff is a juggling act, and because even one network will vary immensely from point to point within its environment, it is impossible for the client to strike an optimum balance between roaming aggressiveness levels that will work within a large-scale voice mobility deployment. There are mitigation strategies to help determine settings that may provide better results than others, and we will explore those in Section 6.4. To understand the process better, you first need to understand where things can go wrong.

In some situations, clients choose to initiate the handoff process too late. This is referred to as the sticky client problem (see Figure 6.6). There are two fundamental origins of this sticky client problem.

The first is when the client is not able to adequately judge when voice quality is suffering. This is a problem more common to multi-purpose devices, such as smartphones or laptops, where voice quality on the Wi-Fi link is not the primary design concern. This can be especially true on devices that are running proprietary voice client extensions, as can be expected with enterprise FMC offerings.
In these cases, the lack of complete integration of the voice over Wi-Fi application and the underlying Wi-Fi handoff decision engine can cause the voice quality to suffer. Imagine a smartphone with a Wi-Fi engine that is primarily designed for data uses, running a proprietary voice over IP application. To the user, the device appears to be a cohesive, well-functioning whole. Voice over Wi-Fi applications provide dialing keypads and address book functions very similar to those which the phone provides natively for cellular dialing. Furthermore, these applications will attempt to use the same microphone and speaker that are used by the cellular phone application, thus allowing the Wi-Fi calling experience to be as similar as possible to the cellular one. However, because the Wi-Fi engine probably does not take any input from the voice application, and is thus unaware that a true voice application is running (even though voice packets are being sent and received), there is no way for the Wi-Fi engine to become aware of changing or suffering voice quality.

Ideally, a phone would take voice quality as one of the major inputs in the ranking of the entries in the scanning table, as well as using them as triggers to cause scanning to occur. Voice-aware client scanning (the process of scanning between the voice packets mentioned before) will also not be possible, because the phone is unable to know when the next packet will come in, and thus may easily miss returning to the channel in time. As a consequence of all of this, the phone’s handoff process is entirely dictated by the data-driven behavior of the Wi-Fi engine. Voice quality can begin to suffer rapidly when a channel becomes oversubscribed, or when the phone moves far enough out of range of an access point that multiple retransmissions of the voice packets are required to complete communication.
Data-oriented Wi-Fi engines are less likely to choose lower transmit rates to avoid retransmission, which reduces latency, and are more likely to choose higher transmit rates, which increases throughput. Thus, voice quality may suffer, and the phone will not commence scanning. This may easily happen even when the loss rates and access point signal strength are more than adequate for data traffic.

Figure 6.6: The Origins of Sticky Client Behavior (each panel plots signal quality against time)
a) The irregular cell boundary, produced by normal RF effects and shown by the lighter shading, signifying less coverage, can cause a phone to experience abruptly varying voice quality.
b) Stepping behind a filing cabinet can cause sudden shadowing effects, where the signal strength is suddenly diminished.
c) The caller’s head can block 5 dB of signal. Combined with the effects of the asymmetry of the internal antennas in the phone, the phone can lose up to 10 dB of signal just by the caller turning her head.
d) Running from a standing start can dramatically change the dynamics of the apparent RF environment.

The second origin of the sticky client problem can easily plague dedicated voice devices with well-integrated technologies that ensure that voice quality is measured. The reason has to do with the relative ease with which a client can underestimate the variability of its radio environment. What happens is that clients simply cannot tell when they are in an area, or being used in a manner, where the environment will rapidly change. Rapid changes can come from the physical environment as well as the way the phone is being used.

One example of changes to the physical environment is that caused by irregularly shaped cell boundaries and the direct influence of the phone and the user shaping the boundary for that cell.
Small motions in the handset can cause large changes in the link quality, simply because the phone may move across an irregular part of the cell. Because phones are not aware of the RF environment (they have no information about coverage areas, as would be available in a detailed site survey), they cannot plan for areas of variability. In this way, the phone will not see the changes coming, and will be suddenly caught having to react to quickly reestablish a solid connection, by kicking off the scanning process. If the phone’s user is lucky, and the phone happened to have performed a random background scan recently, this might be a shorter process. But it is far more likely not to be the case, and the very act of trying to reestablish service will further disrupt the call whose quality is now suddenly shaky, possibly leading to the network dropping the call or either caller giving up, ending the call, and trying again later.

An obvious example of this effect is when someone steps into an elevator or stairwell. But those may be thought of as areas of poor coverage planning, perhaps. Yet it is still rather easy to imagine a caller stepping behind a metal bookcase or cabinet that just happens to be in the line of sight between the access point and the phone. There may be plenty of coverage, as measured by signal strength for the call, but the sudden drop in signal strength may still cause poor wireless behavior, such as slow-to-change data rates and rapidly increasing retransmissions, and a handoff may nonetheless be in order.

Far more common is the attenuation, or loss of signal strength and quality, caused simply by the caller’s head. The head, being full of water, does cause quality to suffer. A quick turn of the head can change a connection that was quite fine for voice into one where voice quality is greatly challenged. A human head provides about 5 dB of attenuation. Add to this the issue that phones do not have perfect radiation patterns.
A phone is not a large device, and radio purity is sacrificed for both the size and the aesthetics of the phone. In terms of antenna coverage of the phone itself, the antennas are usually tiny, highly folded bits of metal that are wedged somewhere inside the phone. These are not high-performance antennas, by design. Moreover, they suffer from significant asymmetry: they just do not work as well in every direction. Some of the antennas that are used in mobile phones can vary their antenna gain by 5 dB, depending on the angle. Together, the effects of the asymmetric antenna, the caller’s head, and even the caller’s hand can cause signal loss that goes over 10 dB, which can easily take off many yards of range. These effects are likely to be transitory, of course, but action is still often required to prevent the caller from getting a sense that the network or the phone is flaky.

Another example of changes to the physical environment comes from the fact that the client does not and cannot know how fast it is moving. Past performance is no indication of future behavior, and even the most adaptive and intelligent phone can have its algorithms lulled into a false sense that the phone is not experiencing dramatic, nontransitory changes. In this case, a sudden movement that results in a permanent exiting of the coverage area of the access point can leave the phone unprepared, in terms of scanning, and force it to engage in the same sudden and disruptive repair processes as before. Hospitals and warehouses are two obvious types of deployments in which voice mobility devices can remain stationary for long periods of time and then be subjected to sudden and extreme movements, such as a forklift speeding down the aisle or a nurse running down a hall.

In either case, sticky clients are not helped by the fact that phones tend to transmit at lower power levels than the access points serving them.
Because of this, the phone may have the false sense that it is in range, getting sufficient signal strength to adequately support the downstream part of the voice call. However, the upstream part—the return link—may not arrive with a high-enough fidelity at the access point. This can result in what is known as one-way audio, in which the phone user can hear the other party but cannot be heard in return. For handoffs, this makes stickiness worse, as the phone sees a strong access point but is weak in return. The opposite problem is equally vexing: in some situations, clients choose to initiate the handoff process too early and often. This is referred to as the frisky client problem. This problem comes about more often because the phone designers may have been aware of the sticky client problem, and decided to increase the aggressiveness of the phone’s roaming behavior or the sensitivity of the phone to audio or radio variations. Frisky client behavior results in a higher than necessary number of handoffs. But more than that, frisky behavior can result in phones handing off to access points that are less capable of serving the phone and providing high-quality audio. This results when the phone correctly detects the variation—perhaps an increase in the noise floor on the channel or encountering higher densities of devices using the network—but incorrectly decides to act on it. The phone may choose to trigger an aggressive scan, thinking that the call quality is going to suffer or will shortly. If the caller is lucky, this aggressive scan will cause the phone quality to suffer for a shorter amount of time, while the scan progresses, but the phone may choose to remain on the same access point. But more likely, phones are tuned to make the transition, cutting bait on the original access point. This can be a very poor decision for one very simple reason. 
The phone was reasonably likely to have been associated to a close access point, with higher signal strength. Any change of access point can result in the phone being associated with a far more distant access point. This is especially true with microcell deployments and not generally so on layered deployments. (See Figure 6.2 in Section 6.1.2.)

The consequences of the phone transitioning to a more distant access point are significant. The lower signal strength of the more distant access point increases the chance for RF interference, by reducing the link budget and SNR, as well as increasing the chance that the phone will go out of range of the newer, more distant access point more quickly than if it had stayed put. Furthermore, the data rate that the client and access point can use will be lower, which causes the voice packets to take more time on the air and causes more interference with the clients in the cell the distant access point is generating. Finally, because the client is farther away from the access point, its perceived RF environment is going to differ more from that of the access point, and from those of the other clients on that access point, than if the phone were closer to the access point. The access point’s reporting of information using features such as 802.11k, its own load-balancing and decision-making properties, and the headroom that the access point is reserving for voice will all be incorrect for a distant client. Even worse, dynamic microcell architectures may be forced to increase the power level of the distant access point to cover the frisky client, increasing intercellular overlap and causing co-channel interference and 802.11 noise to rise.

In short, the network and RF variability that leads to poor local audio performance with sticky clients can lead to bad decisions by frisky clients.
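The sticky and frisky extremes can be illustrated with a deliberately simple model. The sketch below is not any vendor's algorithm (those are proprietary, as noted earlier); it assumes a hypothetical rule in which the phone hands off whenever a neighboring access point is stronger than the current one by at least `margin_db`. The function name, the margin values, and the signal strength trace are all invented for illustration.

```python
# Toy model only: real phones use undisclosed, vendor-specific handoff logic.
def count_handoffs(trace, start_ap, margin_db):
    """Return (handoff_count, final_ap) for a list of {ap_name: rssi_dbm} samples."""
    current = start_ap
    handoffs = 0
    for sample in trace:
        # Strongest access point other than the one the phone is on.
        best = max((ap for ap in sample if ap != current), key=lambda ap: sample[ap])
        # Hypothetical rule: hand off only if the neighbor wins by the margin.
        if sample[best] - sample[current] >= margin_db:
            current = best
            handoffs += 1
    return handoffs, current


# Invented trace: access point A fluctuates by a few dB, then drops sharply.
TRACE = [
    {"A": -60, "B": -63},
    {"A": -65, "B": -62},
    {"A": -58, "B": -64},
    {"A": -66, "B": -63},
    {"A": -61, "B": -62},
    {"A": -80, "B": -62},
]

for label, margin in [("frisky", 2), ("moderate", 10), ("sticky", 25)]:
    print(label, count_handoffs(TRACE, "A", margin))
# frisky (3, 'B')
# moderate (1, 'B')
# sticky (0, 'A')
```

With the small margin the phone chases every minor fluctuation and hands off three times; with the very large margin it never leaves access point A, even at -80 dBm. A single fixed margin is only a stand-in for the "roaming aggressiveness" settings mentioned earlier, whose actual meaning is not made public.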
The variations of the environment for a frisky client can lead those overly sensitive clients to make changes needlessly and to the detriment of the caller, as mentioned earlier. But overly aggressive frisky clients can make poor handoff decisions even when the existing connection has not changed appreciably. The same RF variations can cause neighboring access points’ signals to arrive more strongly at the phone, making them increasingly tempting. Or access points that were ruled out earlier can become more tempting based on load variations. This is more true in denser deployments, and can result in negative effects caused by herd mentality.

With herd mentality, the behavior of other phones affects the behavior of the phone in question. Unlike with real animal herds, it is highly unlikely that a frisky client will make a decision on the basis of directly observing the decisions of other phones and copying them. Rather, the phone is likely to make simultaneous, and sometimes identical, decisions, based on the indirect effects the other clients’ decisions have on the channel and on the access point’s reporting. Let’s look at some examples.

One source of herd behavior comes from the load reporting that access points perform. As will be mentioned in Section 6.2.6, access points are allowed to report the load of the network. They may report the capacity available, determined by the load balancing and admission control operations (Section 6.1.1). They may also measure the average access delay, that is, how long it takes for their packets to get access to the air. The longer the delay, the busier the channel. They also can report the raw number of clients supported. These numbers are based not on anything intrinsic to the access point, but on the numbers and behavior of the clients within the cell. More intelligent voice clients will often base their roaming decisions on these numbers.
Frisky clients will be especially sensitive to them. When the network is in steady state, and very little is changing, a client can quite effectively use this information to make a good decision. However, the herd mentality problem comes into play when the dynamism of the network increases, and these numbers start varying. Picture two access points within range of the client. Figure 6.7 illustrates this. There is some distribution of clients associated to each access point. Assuming that nothing changes, this situation is likely to be stable. However, let’s now imagine that a small handful of the clients move from one access point to another. A good reason for this could be that the holders of these phones have left a meeting, and have moved closer to the new access point. Immediately, the clients who are associated to the old access point may notice a difference. The load reporting information from the access point will announce the arrival of these new clients. Even if there is enough capacity on the new access point, and the admission controller allowed the clients in, phones’ having to compete for resources is not necessarily an ideal situation. On the other hand, the old access point suddenly has room to spare. This seems like an ideal situation, and the frisky clients may determine that they want to pack up and move for greener pastures. Of course, there is no way to know exactly how many will make this decision. The more aggressive ones will flood over to the new access point, tipping the balance back. If only a few move, then there is not likely to be an enduring issue. However, because many voice mobility deployments for the enterprise use the same devices with the same configurations, a large number of devices may make the transition. The grass is not always greener, and instead, the devices may start interchanging, producing a significant amount of handoff activity that eats up resources that are more needed for the sending of data. 
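The interchange described above can be reproduced with a toy model. The sketch below assumes two access points and a hypothetical rule: whenever the reported client counts differ by more than `threshold`, a number of clients equal to `reaction` times the gap move at once from the busier access point to the quieter one. None of these names or numbers comes from 802.11; they are assumptions chosen to show how simultaneous, identical decisions overshoot.

```python
# Toy model only: the function, parameters, and client counts are hypothetical.
def simulate(load_a, load_b, rounds, reaction, threshold=2):
    """Return the (load_a, load_b) history over a number of decision rounds."""
    history = [(load_a, load_b)]
    for _ in range(rounds):
        gap = load_a - load_b
        if abs(gap) > threshold:
            # Every client sees the same load report and reacts in the same round.
            movers = min(int(abs(gap) * reaction), max(load_a, load_b))
            if gap > 0:
                load_a, load_b = load_a - movers, load_b + movers
            else:
                load_a, load_b = load_a + movers, load_b - movers
        history.append((load_a, load_b))
    return history


print(simulate(14, 6, 4, reaction=1.0))
# [(14, 6), (6, 14), (14, 6), (6, 14), (14, 6)]
print(simulate(14, 6, 4, reaction=0.5))
# [(14, 6), (10, 10), (10, 10), (10, 10), (10, 10)]
```

When as many clients move as the size of the gap, the imbalance simply flips sides every round and never settles. When only half as many react, the loads equalize in one round. Real deployments are messier, but the model shows why identical devices with identical configurations are prone to interchanging.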
These sorts of positive feedback loops cause ping-ponging between the access points and are quite easy to encounter whenever feedback is offered without dampening and without memory. However, the design of 802.11 is such that this feedback is provided in precisely this manner. Moreover, even if none of the feedback is offered, clients can and do still use their own ability to measure over-the-air load to make these decisions.

Overall, you will notice that sticky and frisky clients are just opposite sides of the same coin. If they’re too passive, the call may suffer as detrimental effects that demand action go unnoticed or ignored. If they’re too sensitive or too aggressive, the call quality of both the current call and of those around the phone may suffer as too much changing occurs, causing cascading handoffs or increased network waste. There are strong parallels here between the handoff behavior of clients and the stability problems that can occur in dynamic routing protocols, or in any reactive, feedback-based system. Too much dampening, and needed changes don’t occur. Too little, and network flapping and oscillations can set in.

Figure 6.7: The Origins of Frisky Client Behavior
a) Wireless variations, such as irregular cell boundaries, can cause a frisky client to be stolen away, to the possible detriment of the call quality. In this case, instead of the irregular coverage causing the connection to the connected access point to look bad, it causes neighboring access points to look good, especially if the more distant access point is lightly loaded.
b) The wireless environmental variations that, when strong, can lead to sudden onsets of poor performance for sticky clients can, when minor, fool frisky clients into making needless transitions.
c) Overly aggressive frisky clients can become trapped when they have multiple choices that are all very similar, especially when the choices are not good, such as when caught in the weak spot between three access points.
d) Herd mentality behavior can be triggered by fluctuations in the steady-state distribution of clients, especially when density is higher. For example, the sudden motion of a few clients can trigger a cascading reaction, one that might not settle down.

Overall, the issue is that every client thinks for itself. Each client’s handoff behavior is a fundamentally local process. Only what the client can see, whether it is observed directly or indirectly with the aid of the access point, determines the client’s behavior. On the other hand, there is a stable, global optimum. This global optimum would take into account all of the effects: transient, permanent, important, irrelevant, direct, and indirect. The network has a better chance to have this view than the client, but Wi-Fi is as it is, and the client owns the decision.

Because of the vast complexity of interactions available (not only the first-order factors based on which access point a client should hand off to from its own observations, but the increasing-order factors based on client-to-client interactions and indirect observations), the production of workable, never mind optimal, handoff schemes that apply to dense deployments is an area of active research and development for client manufacturers. If you have a voice mobility deployment in which the above-mentioned handoff issues are prevalent or are a concern, you are encouraged to ask your phone vendor specifically as to the degree and meaning of the handoff controls they do offer, and to press them for recommended settings for your environment.

6.2.3 The Wi-Fi Break-Before-Make Handoff

Basic Wi-Fi handoffs are always either break-before-make or just-in-time.
In other words, there is no ability for a wireless phone to decide on a handoff and establish a relationship with a new access point without disconnecting from the previous one. The rules of 802.11 are rather simple here: no client is allowed to associate to two access points at the same time (that is, to send an Association message to one while maintaining data connectivity to another). The reason for this is to remove any ambiguity as to which access point should forward wireline traffic destined to the client; otherwise, both access points would have the requirement of receiving the client’s traffic, and therefore would not work in a switched wireline environment. However, almost all of the important protocols for Wi-Fi happen only after a data connection has been established. This prevents clients from gaining much of a head start on establishing a connection when the old one is at risk.

Let’s look at the contents of the Wi-Fi handoff protocol itself step by step. It will be helpful to consult Section 5.2.3.3 for further information.

1. Once a client has decided to hand off, it need not break the connection to the original access point, but it must not use it any longer.
2. The client has the option of sending a Disassociation message to the old access point, a good practice that lets the old access point free up network resources.