Doctoral dissertation: Swarm intelligence methods for mobile ad hoc networks


Structure

  • 1.1 Mobile ad hoc networks: problems and applications
  • 1.2 Swarm intelligence mechanisms
  • 1.3 Motivation and problem statement
  • 1.4 Relevance of observed data and repeatability
  • 1.5 Organization of dissertation and research contributions
  • 2.1 Unicast routing for hybrid ad hoc networks
  • 2.2 TCP for MANET
  • 2.3 Gossip and multicast packet delivery improvement for MANET
  • 2.4 P2P/content distribution methods for MANET
  • 3.1 ANSI unicast routing protocol
    • 3.1.1 Protocol overview
    • 3.1.2 Protocol model
    • 3.1.3 Protocol description
  • 3.2 Performance evaluation
    • 3.2.1 Simulation and network model
    • 3.2.2 Simulation results
    • 3.2.3 Discussion
  • 4.1 TCP and the routing protocol
    • 4.1.1 How are MANET routing protocols affected under TCP loads?
    • 4.1.2 Some metrics for studying TCP loads over MANET routing
  • 4.2 Studying TCP for some routing protocols
    • 4.2.1 Network and protocol models
    • 4.2.2 Simulation results
    • 4.2.3 Discussion
  • 5.1 PIDIS: A packet delivery improvement service
    • 5.1.1 Overview of ODMRP
    • 5.1.2 Overview of PIDIS
    • 5.1.3 Local data structures
    • 5.1.4 Selecting a gossip next-hop (λ)
    • 5.1.5 Packet caching
    • 5.1.6 Maintaining the gossip table
    • 5.1.7 Adaptive mechanisms in PIDIS
    • 5.1.8 Justification for the reinforcement models
  • 5.2 Performance evaluation
    • 5.2.1 Implementing Anonymous Gossip
    • 5.2.2 Network and protocol characteristics
    • 5.2.3 Experiments and performance metrics
    • 5.2.4 Simulation results
    • 5.2.5 Discussion
  • 6.1 Overview of the BitTorrent P2P system
  • 6.2 BTI: A straightforward implementation of BitTorrent over MANET
  • 6.3 BTM: A cross-layer decentralized BitTorrent for MANET
    • 6.3.1 BTM protocol operations
  • 6.4 Performance evaluation
    • 6.4.1 Evaluating BTI and BTM
    • 6.4.2 Network and protocol characteristics
    • 6.4.3 Experiments and performance metrics
    • 6.4.4 Simulation results
    • 6.4.5 Discussion
  • 7.1 Summary of contributions
  • 7.2 Future work
    • 7.2.1 BTM and mobility
    • 7.2.2 BTM for mesh networks: an adaptive tracker mechanism
    • 7.2.3 A BitTorrent solution to reliable server pooling
  • 1.2 A mesh network. Node 11 communicates with node 1 via mesh router 2 and mesh router 1
  • 1.3 A vehicular ad hoc network. Vehicle 2 communicates with vehicle 1 via vehicle 3 and with a gateway via vehicles 3 and 7
  • 1.4 The organization of the dissertation showing the publications for each topic
  • 3.1 Local reinforcement in ANSI
  • 3.2 The propagation of the forward reactive ant (shown in solid arrows) and the return of the backward reactive ant (dashed arrows)
  • 3.4 Two routing choices for S → D. One route is via next hop node 1 and ...
  • 3.5 Hybrid network topologies used for Experiments 1, 2 and 3
  • 3.6 Experiment 1: Performance studies of ANSI vs. AODV in a hybrid ...
  • 3.7 Experiment 2: Performance studies of ANSI vs. AODV in a hybrid ...
  • 3.8 Experiment 3: Performance studies of ANSI vs. AODV in a (larger) ...
  • 3.9 Experiment 4: Performance studies of ANSI vs. AODV in a pure ...
  • 3.10 Experiment 5: Performance studies of ANSI vs. AODV in a pure ...
  • 4.1 AODV performance supporting TCP loads and UDP loads
  • 4.2 ANSI vs. AODV: congestion window growth for one TCP sender for the ...
  • 4.3 Experiment 1: Performance of ANSI and AODV with increasing APDU ...
  • 4.6 Experiment 4: Connect times for ANSI and AODV under increasing CBR/UDP background traffic
  • 5.1 The path of the GREQ from member node 1 is shown in solid arrows and ...
  • 5.2 Experiment 1: Effect of increasing mobility
  • 5.3 Experiment 2: Effect of increasing the number of sources sending ...
  • 5.4 Experiment 3: Effect of two sources communicating with one group with ...
  • 6.1 Cross layer architecture of BTM
  • 6.2 Experiment 1: BTI vs. BTM: Performance with increasing node mobility
  • 6.3 Experiment 2: BTI vs. BTM: Performance with increasing piece size
  • 6.4 Experiment 3: BTI vs. BTM. The graph shows the simulation time taken ...
  • 6.5 Experiment 4: BTM: Performance of TFT vs. No Incentives
  • 6.6 Experiment 5: Performance of BTM as speed increases
  • 6.7 Experiment 6: Performance of BTM with increasing swarm size
  • 7.1 The BTSerPool protocol in action at C

Content

Our unicast routing protocol, ANSI, is a hybrid protocol which works over networks with both pure MANET nodes and infrastructure nodes (hybrid networks). PIDIS, our multicast packet delivery improvement service, maintains information about which next hops recover lost packets rather than which group members do, making packet recovery less sensitive to topological fluctuation.

Mobile ad hoc networks: problems and applications

Mobile ad hoc networks (MANET) are a type of multihop wireless network in which nodes autonomously establish connectivity to other nodes in the network, without the assistance of infrastructure nodes. These networks establish and manage connectivity under several constraining issues, such as mobility (which breaks the network topology), low energy reserves, and several wireless communication issues such as hidden/exposed terminals, fading, propagation loss and extreme contention.

Figure 1.1: A mobile ad hoc network. Node 3 communicates with node 5 via nodes 4 and 6.

A substantial portion of past research effort in MANET has concentrated on applications of unbounded multihop communication such as battlefield communications, e.g., the US Army's Future Combat Systems (FCS) [1]. We refer to such networks as "pure" MANET in this dissertation (see Figure 1.1). In recent years, a number of interesting commercial applications have spawned from the domain of pure MANET research. Among these, hybrid/mesh networks (see Figure 1.2) and vehicular/highway networks (see Figure 1.3) have emerged with exciting commercial promise. In both these approaches, an otherwise resource-constrained "limited hop" MANET is complemented by infrastructure nodes which assist in the dissemination and generation of pertinent content. The problem in these scenarios is both to make use of infrastructure-based services and resources as and when available, and to thrive under pure MANET service when such infrastructure is not present. These criteria present a new design paradigm to MANET researchers.

Figure 1.2: A mesh network. Node 11 communicates with node 1 via mesh router 2 and mesh router 1.

This dissertation is mindful of the new challenges facing the MANET research community owing to these new paradigms in multihop wireless communication, and addresses issues that are pertinent both to pure, unbounded-hop wireless communication and to bounded-hop, hybrid ad hoc networks with infrastructure nodes.

Figure 1.3: A vehicular ad hoc network. Vehicle 2 communicates with vehicle 1 via vehicle 3 and with a gateway via vehicles 3 and 7.

Swarm intelligence mechanisms

The work in this dissertation uses the mechanisms of Swarm Intelligence (SI) to address some of the common problems in MANET. Swarm intelligence and a number of models inspired by the SI metaphor are described in [10]. SI refers to complex behaviors that emerge from simple individual behaviors and interactions, and is commonly observed among social insects such as ants and honeybees. Although each individual (for instance, an ant) has little intelligence and simply follows basic rules using local information obtained from the environment, globally optimized behavior¹ emerges when these individuals work collectively as a group. In this dissertation, we are interested in the self-organizing abilities of SI models, which incorporate the following components:

• positive/negative feedback, which searches for good solutions and stabilizes the results,

• amplification of fluctuations (a mechanism for detecting and adapting to changes in the system), which discovers new solutions and adapts to a changing environment, and

• multiple interactions, which allow distributed entities to coordinate and self-organize.

¹ For example, in the food foraging model, ants find the shortest path from their nest to a food source.

Together, these components comprise an adaptive, distributed control mechanism that effectively manages topological information and learns about the network topology in a more efficient manner than other approaches.
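As a concrete illustration of how these three components interact, the following toy Python sketch (ours, not taken from the dissertation) lets simple agents repeatedly choose between two paths of assumed lengths: pheromone deposits provide positive feedback, evaporation provides negative feedback, and the probabilistic choice supplies the fluctuations that allow a better path to be discovered and amplified.

```python
import random

# Toy illustration (not the dissertation's model): two candidate paths to a
# food source. Shorter paths accumulate pheromone faster (positive feedback),
# evaporation forgets stale choices (negative feedback), and the random choice
# lets occasional "fluctuations" explore the alternative path.
PATH_LENGTHS = {"short": 2.0, "long": 5.0}   # hypothetical path costs
pheromone = {p: 1.0 for p in PATH_LENGTHS}   # start unbiased
RHO = 0.1                                    # evaporation rate (assumed)

def choose_path():
    """Pick a path with probability proportional to its pheromone level."""
    total = sum(pheromone.values())
    r = random.uniform(0.0, total)
    for path, tau in pheromone.items():
        r -= tau
        if r <= 0.0:
            return path
    return path  # fallback for floating-point edge cases

for step in range(200):
    path = choose_path()
    # positive feedback: the deposit is larger for shorter (better) paths
    pheromone[path] += 1.0 / PATH_LENGTHS[path]
    # negative feedback: every trail evaporates a little at each step
    for p in pheromone:
        pheromone[p] *= (1.0 - RHO)

print(pheromone)  # the "short" trail ends up with far more pheromone
```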

Motivation and problem statement

Broadly, we address a number of issues across different layers of the protocol stack for MANET in this dissertation. We address the problems of unicast routing in hybrid ad hoc networks and the benefits of TCP-based evaluation for studying MANET unicast routing protocols; the problem of multicast packet recovery in MANET; and lastly, P2P solutions for MANET. We outline the motivation and problem statement for each of these problems below:

1. Unicast routing: Our work in unicast routing is motivated by the fact that a number of MANET routing protocols have been proposed, e.g., AODV [53] and DSR [38], but they make inefficient use of topological information collected during the route discovery process, and discard some topological information too quickly.

Our work investigates whether (a) collecting more information during the routing process and using this information to perform routing decisions, and (b) managing the collected information efficiently using SI mechanisms, will improve the performance of unicast routing protocols in MANET.

2. Using TCP as an evaluation tool: Our work on unicast routing outlined above also motivates the need for studying routing protocols under TCP traffic. This motivation is supported by three factors: (a) networks perform drastically differently under TCP traffic as compared to UDP traffic, (b) the fairness imposed by TCP is a legitimate property which any routing protocol should strive for, and (c) TCP is "here to stay," given legacy issues and the current shift in the MANET research community towards commercial efforts and the merging of the Internet and wireless networks.

Our work investigates whether applying TCP traffic during the evaluation of MANET routing protocols will yield new insights into the performance of MANET unicast routing protocols.

3. Packet recovery for multicast protocols: Our work on packet recovery for multicast protocols is motivated by the need for a new approach to designing a more general-purpose multicast packet recovery service.

Our work investigates whether a gossip-based, protocol-independent packet recovery service, which does not need group membership information and can be used without the help of unicast routing primitives, can be more effective than a comparable approach in the face of topological fluctuations and heavy traffic in a MANET.

4. P2P for MANET: Our work adapting BitTorrent for MANET was motivated by the need for an efficient, mobility-resistant P2P mechanism for MANET which functions under typical MANET issues such as partitions.

Our work investigates whether a BitTorrent-based P2P model for MANET is able to resolve the issues due to node mobility, failure and network partitions by using a decentralized model with cross-layer techniques and replicated object models for resource redundancy; and studies the tradeoff between node-wise performance and network-wide performance in the BitTorrent-based P2P model when considering incentive mechanisms for peers.

Relevance of observed data and repeatability

Consistent with calls for more repeatable and statistically valid inference procedures outlined in [44], the dissertation has endeavored to provide as much information as possible about the simulation studies and protocol details. This information, we expect, will allow for the repeatability of the experiments. Wherever appropriate, the dissertation has shown confidence intervals of the observed data and used statistically sound inference techniques, as indicated in [37].

In addition to the above, the dissertation has measured as many metrics as necessary to provide a fuller understanding of the performance of the discussed approaches. For example, all our experiments show what costs our approaches incur by taking a different approach. These observations make for a more genuine understanding of the measured metrics and reiterate the general design principle that there is "no free lunch" when it comes to solutions in MANET: improved performance in a MANET environment is possible, but it comes at the cost of increased MAC layer resource consumption.

We note that the simulation experiments detailed in this dissertation do not represent real-world scenarios. Topologies for real-world scenarios begin with a realistic terrain model, which is hard to obtain. The terrain model in turn dictates the parameters for propagation characteristics and mobility characteristics (for example, nodes are constrained to move along roads in an urban terrain). These issues are beyond the scope of this dissertation.

Figure 1.4: The organization of the dissertation showing the publications for each topic.

Organization of dissertation and research contributions

A schematic showing the organization of this dissertation is shown in Figure 1.4. The references cited for each chapter show the author's publications on each topic. We start this dissertation by introducing the background and work related to this dissertation in Chapter 2.

In Chapter 3, we describe ANSI, our unicast routing approach, and show how ANSI is able to perform better than a competing protocol, AODV [53], in a variety of network, environment and traffic scenarios. We show how the SI mechanisms in ANSI help in making better routing decisions with fewer route errors. However, we also see that ANSI's better performance metrics come at the cost of increased MAC layer resource consumption. The work presented in this chapter is published in [54, 56].

In Chapter 4, we continue our work examining ANSI to show that using TCP as an evaluation tool for designing MANET routing protocols can help us gain new insights in the design process. The work presented in this chapter is published in [57].

In Chapter 5, we discuss our approach to multicast packet recovery, PIDIS, and show how our gossip-based approach recovers lost packets without the use of membership information. PIDIS is persistent with packet recovery, while striking a balance between the variability of packet delivery and the aggressiveness of packet recovery. PIDIS does the above by maintaining information about which next hops recover lost packets rather than which group members help recover lost packets, thus making the process less sensitive to topological fluctuation. Using simulation studies, we show that ODMRP+PIDIS performs better than a competing protocol by taking a different approach to gossip. However, we also see that ODMRP+PIDIS is not able to perform well under conditions of low mobility. The work presented in this chapter is published in [63, 64].

In Chapter 6, we discuss BitTorrent for MANET (BTM), our approach to P2P over MANET. BTM adapts the BitTorrent framework and introduces decentralized behavior, redundant services, and cross-layer interactions with ANSI at the routing layer. Our choice of ANSI at the routing layer is motivated by our results explained in Chapters 3 and 4. Simulation studies show that BTM performs better in the MANET context as compared to a straightforward implementation of BitTorrent in MANET. We also illuminate the problem of enforcing node-wise incentive mechanisms in the BTM context, and show that the balance between maximizing network performance and node-wise performance is a fundamental issue in MANET. Our simulation studies also show that BTM is able to perform better than other P2P protocols that use only a single connection to download the entire file. We attribute these favorable observations of BTM to its decentralized, resource-redundant characteristics which, when complemented by ANSI, provide an appropriate medium for content distribution in MANET. However, we note that the better performance characteristics of BTM come at the cost of higher MAC resource consumption, owing to both higher routing activity and higher TCP activity. Part of the work presented in this chapter is published in [55].

Finally, we conclude this dissertation in Chapter 7 by summarizing our contributions and describing our future research efforts.

Chapter 2 BACKGROUND AND RELATED WORK

In this chapter, we set the context for our work and discuss relevant past research effort in MANET. In Section 2.1, we set the background for our unicast routing work, ANSI, and discuss the various approaches to unicast routing in MANET, including other swarm intelligence-based protocols. In Section 2.2, we identify the general trends in TCP-related research in MANET and follow that discussion with an outline of the various issues that transport layer researchers for MANET address. In Section 2.3, we discuss several approaches to the problem of reliable multicast in MANET, including a number of gossip-based approaches. Lastly, in Section 2.4, we discuss the relevant past research for our work on P2P for MANET.

Unicast routing for hybrid ad hoc networks

The problem of hybrid ad hoc networking shares a number of problems with typical MANET problems, and typical routing solutions for hybrid networks start with a MANET routing solution and then apply optimizations for specific scenarios. A number of ad hoc routing protocols have been proposed, for example, [5, 31, 38, 52, 53, 59], of which some, like AODV [53], work on hybrid ad hoc networks. In proactive protocols such as OLSR [5], nodes in the network maintain routing information to all other nodes in the network by periodically exchanging routing information. Nodes using reactive protocols, such as AODV [53] and DSR [38], delay route acquisition until a demand for a route is made. Hybrid protocols, like ZRP [31] and SHARP [59], use a combination of both proactive and reactive mechanisms to gather routes to the destinations in a network—nodes using ZRP, for example, proactively collect routes in their zones, and other routes are collected reactively. In SHARP, on the other hand, the level of proactive and reactive activity is chosen autonomously by the nodes in the network, and proactive activity is only seen around favorite destination nodes. In most traditional reactive protocols, like AODV and DSR, only when a route breaks irreparably do the protocol mechanisms repair the damage. In reality, route deterioration in mobile networks is most often not sudden but gradual.¹ So the routing protocol should continuously maintain information about the nodes in the local area to perform effectively and handle link breakages.

In [8], Baras et al. describe a swarm intelligence-based reactive ad hoc routing protocol called PERA. PERA uses broadcast forward ants as exploratory agents, sent out on demand to find new routes to destinations. Each ant holds a list of the nodes that were visited while exploring the network, and since these ants are broadcast at each node, a forward ant can result in several backward ants—ants sent by destination nodes in response to forward ants. This mechanism uncovers several routes for each forward ant sent, and at each node these multiple routes to the destinations are maintained as probability values.

As with AntNet [12], the routing table $R_i$ at node i is a probability matrix with an entry $P_{ijd}$ giving the probability that a data packet at i's FIFO queue will take next hop j to be routed to d. Positive reinforcement is managed in PERA using forward/backward ants, and negative reinforcement is implicit—no explicit aging of the pheromone trails is done. After a route has been established, PERA regularly uses forward ants to find newer routes to destinations. This property is wasteful, because forward ants consume a lot of network resources and should not be sent when not necessary.

¹ Some routes, such as routes to neighbors, break suddenly when the neighbors go out of range. We comment on the general case.

In [11], Camara et al. outline a source routing scheme in which the network relies on location information and support from fixed infrastructure. Owing to the source routing approach, the algorithm relies heavily on a source↔destination route being available at the time of message creation. New nodes in the network start by using their neighbors' routing tables. These routing tables, generated using shortest path algorithms, may however contain information which is outdated. Ants are unicast from a source to a specific destination; for example, the destination node may be the node with the oldest information in the routing table. This mechanism is used to make sure that the routing information at the source is updated and recent. Thereby, ants are used in [11] with the semantics of routing information updates, as in classical distance vector protocols such as DSDV or DBF—ants are not used as feedback agents to reinforce routes positively (when a route is still good), negatively (when a route is no longer good) or to explore new routes randomly. Ants in this approach are unicast in a specified direction, not allowing for amplification of fluctuations, and depend on known metrics such as the timestamp of a route in the routing table.

The approach used in [33] by Heissenbüttel et al. also relies on location information, and is a purely proactive routing approach based on dividing the network into logical zones and assigning logical routers to each. Ants—forward ants and backward ants—are used by each logical router in this approach to periodically check whether the logical links connecting the logical router to a randomly chosen destination are functional, and to report on the current state of the network surrounding the logical router. Positive and negative reinforcement are achieved by means of multiple interactions, pheromone additions (by forward and backward ants) and pheromone aging. Random amplification of a new good route in the face of topological fluctuations is possible through random dissemination of ants to destinations.

In [30], Günes et al. outline ARA, a multipath, purely reactive scheme. ARA uses forward ants and backward ants to create fresh routes from a node to a destination. When routes to a destination D are not known at S, a forward ant is broadcast, taking care to avoid loops and duplicate ants. When a forward ant is received at an intermediate node X via node Y, the ant reinforces the link XY in X to route to all the nodes covered so far by the forward ant. When a forward ant is received at D, a backward ant is created which backtracks the path of the corresponding forward ant. At each node the backward ant is received, the link via which the backward ant is received is reinforced. In ARA, data packets perform the necessary (positive) reinforcement required to maintain routes. When a path is not taken, the path subsequently evaporates (negative reinforcement) and cannot be taken by subsequent data packets. Under the described scheme, amplification of topological and network fluctuations is not possible except under extreme conditions when routes break often.

In [20, 23], Di Caro, Ducatelle, and Gambardella describe AntHocNet, a hybrid, stochastic approach to the routing problem in MANET. AntHocNet is a congestion-aware protocol which only finds routes on demand, but once a route is established, the route is proactively maintained. This approach, argued by the authors to be more ant-like [20] than other competing ant-based protocols, will fail to reduce overheads in high traffic/mobility scenarios, owing to the increase in the rate at which proactive ants are potentially unicast as mobility increases. We expect this behavior because in high mobility/traffic scenarios, routes get invalidated often and proactive activity has to increase appropriately to keep a valid view of the network for routing, thus increasing the load placed on the network. Indeed, we agree with the comment that the authors of AntHocNet make regarding repeated path sampling, and ANSI manages to steer clear of repeated path sampling by carefully choosing when to engage in route discovery activity.

In [72], Wedde et al. present a new routing algorithm for energy-efficient routing in mobile ad hoc networks. They show that BeeAdHoc, a reactive source-routing protocol inspired by the foraging principles of honey bees, is able to achieve better energy consumption characteristics than DSR, AODV and DSDV without compromising on traditional performance metrics such as packet delivery and throughput.

TCP for MANET

TCP performance over wireless and mobile ad hoc networks has been studied extensively in the past, for example, [24, 74, 75]. Some of the past transport layer research for MANET has shown that TCP is a bad protocol to use over MANET, for example, [26, 28, 34], and some has focused on improving TCP for MANET, for example, [15, 19, 71].

What past research shows is that TCP cannot be used for practical purposes over existing MANET as-is. This is because the fundamental operations of TCP (such as congestion/flow control, fairness and reliability improvement mechanisms) were designed for wired networks, rather than MANET. For instance, packet drops result only from congestion in the wired Internet, whereas both congestion and link failures (due to mobility and interference) result in packet drops in MANET. Most of the past research evaluating routing protocols for MANET, consequently, has concentrated on using CBR traffic over UDP for evaluation, for example, [42, 45, 53].

In [50], the authors summarize the various issues of using TCP over MANET. Most of these issues are related to routing protocol design and should be addressed at the routing layer if the MAC layer is unable to address them. The following list, adapted from [50], outlines the issues pertinent to routing protocols running under TCP:

1. The MANET environment is characterized by frequent path breaks.

2. The MANET environment is characterized by frequently changing path lengths of routes, which affect the throughput capacity of the MANET [24].

3. Some or all links may suffer from asymmetric link behavior in a MANET.

4. Some routing protocols for MANET set up only a uni-directional path during the route discovery phase.

5. In some cases, multipath routing can be used for alleviating the effects due to hotspots in a MANET environment, but this has been shown to be not always effective [74].

6. Some MANET protocols can handle network partitioning and re-emerging, for example, [65], but these protocols may potentially need unbounded buffer spaces to function effectively.

All of the above issues directly hamper the path setup and maintenance aspects of a MANET, thus affecting TCP performance.

Gossip and multicast packet delivery improvement for MANET

Although several multicast routing protocols have been proposed for mobile ad hoc networks, for example, [45, 47, 49, 61, 73], improving packet delivery reliability has been a challenge, and has been addressed by several research approaches. There are two categories of protocols: (1) reliability improvement services, like AG [13] and our approach, PIDIS, that run atop unreliable protocols, and (2) reliable protocols, like RDG [48], RALM [69], ReACT [58] and the reliability extension to ODMRP, RODMRP [68]. Unlike the reliability extension to ODMRP, RDG is an example of a reliable multicast protocol implemented over a unicast protocol (DSR [38]). Both RDG and the reliable extension to ODMRP, however, concentrate on being reliable without using any repair services like AG or PIDIS. Also, RALM uses an ACK/NACK scheme, and RODMRP uses adaptive flooding to achieve its goals. It is easy to see that both RALM and RODMRP are forced to strike a balance between scalability and reliability.

Gossip-based techniques are useful in situations where information must be disseminated without accruing the excessive overheads of flooding [32]. In gossip, each node forwards a packet with a certain probability p. This probability can be chosen appropriately to control the amount of flooding in the network. Of the above, AG and RDG are two gossip-based approaches to the problem of reliable multicast. Anonymous Gossip (AG) [13] provides a reliability improvement service that runs atop unreliable multicast protocols, and Route-Driven Gossip (RDG) [48] is a reliable multicast protocol. In contrast to RODMRP and RALM, which suffer from the tradeoff between reliability and scalability, gossip-based approaches exploit the non-deterministic nature of mobile ad hoc networks to provide probabilistic reliability in a scalable manner [48].
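The gossip primitive described above reduces to a few lines of logic. The sketch below is a minimal illustration under assumed names (GossipNode, on_receive, a forwarding probability of 0.7); it is not the actual AG, RDG, or PIDIS implementation.

```python
import random

class GossipNode:
    """Minimal sketch of probabilistic gossip forwarding (illustrative only;
    not the actual AG/RDG/PIDIS data structures)."""

    def __init__(self, node_id, forward_probability=0.7):
        self.node_id = node_id
        self.p = forward_probability   # the gossip probability p from the text
        self.seen = set()              # packet IDs already handled at this node

    def on_receive(self, packet_id, neighbors):
        """Return the list of neighbors the packet is rebroadcast to."""
        if packet_id in self.seen:     # duplicate: never re-forward
            return []
        self.seen.add(packet_id)
        if random.random() < self.p:   # forward with probability p ...
            return list(neighbors)
        return []                      # ... otherwise stay silent

# Usage: a node decides whether to relay packet "pkt-42" to its neighbors.
node = GossipNode("n7")
print(node.on_receive("pkt-42", ["n3", "n9", "n12"]))
```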

ReACT is a transport layer multicast protocol, working with a multicast and a unicast protocol, and has end-to-end purviews (the approach detailed in [58] uses ODMRP and AODV). ReACT, using cross-layer mechanisms, performs "typical" transport layer actions such as congestion control and error recovery. These actions are outside the scope of PIDIS, which is designed purely as a packet-recovery service and kicks in only if packets are lost at a receiver. In addition, PIDIS has (a) no control over the source (because PIDIS does not have an end-to-end responsibility) to prevent congestion and perform flow control (by decreasing the flow rate), and (b) no control over congestion in the network. Furthermore, PIDIS is not concerned with interaction across layers, and interacts only with a multicast protocol (i.e., PIDIS is strictly a routing layer interaction).

We feel there are two approaches to reliability: (a) designing a packet delivery improvement service, such as PIDIS and AG, and (b) designing cross-layer and transport layer multicast protocols with flow control, congestion control and error recovery, such as ReACT. These are two different approaches to the problem of reliability in multicast routing protocols in MANET.

P2P/content distribution methods for MANET

There exists a synergistic relationship between P2P problems and MANET problems: both are based on providing communication in decentralized and self-organizing communities [35]. Furthermore, future Internet architectures are envisioned to co-exist with MANET. Thus, P2P problems for MANET are an important research area.

Recently, a number of P2P file-sharing systems have been developed for the wired Internet, for example, Gnutella [41], Napster [3], Kazaa [2], and BitTorrent [16]. Unfortunately, these protocols are not directly applicable to mobile ad hoc networks owing to the extreme conditions MANET operate under. Wired Internet peers can expect stable routes, high bandwidth capacity, and high uptime of other peers in the network, but these are not tangible assumptions in the case of MANET. Nodes in a MANET frequently experience link breakages, changes in topology and network partitions, and nodes can be removed from the network owing to other issues such as energy depletion, bombing, etc. These conditions make designing P2P systems for MANET a challenging task.

In [29], Gerla et al. discuss the issues in designing P2P systems for MANET, with specific emphasis on the applicability of wired Internet P2P models in the MANET context.

In [35], the authors discuss improvements and optimizations to routing protocols (specifically DSR [38]) to improve the performance of P2P applications. Note that our approach is concerned with developing a cross-layer approach, with an emphasis on application layer design, to improve the overall performance of a fundamentally different P2P system.

In [21], Ding et al. evaluate five routing approaches to P2P file-sharing in MANET, and conclude that cross-layer protocols perform better than simply providing an overlay-construction protocol on MANET. This work is similar to the work in [17], where the authors identify that cross-layer designs are better suited for P2P systems in MANET. Some P2P systems proposed for ad hoc networks were originally developed for wired networks but optimized for mobile ad hoc networks, for example, X-GNU (Cross-layer Gnutella) [18], while other P2P systems were proposed exclusively for mobile ad hoc networks, for example, [40, 62]. In [18], Conti et al. show how a straightforward implementation of Gnutella will not work for P2P in MANET, and propose X-GNU, an improved version of Gnutella that uses a cross-layer design. In X-GNU, the routing protocol, OLSR [5], provides lookup services which proactively supply peer identification information to Gnutella working at the application layer.

In [62], Schollmeier et al. discuss a cross-layer P2P system in which they use an enhanced version of DSR [38] (EDSR) to provide network layer functions to an application layer P2P protocol, the Mobile Peer-to-Peer Protocol (MPP). By using such a framework, the authors expect to reduce the latencies associated with data/service lookup at the application layer. These ideas were also described in [17], where the authors discuss how a cross-layer protocol stack simplifies the application layer tasks of a P2P system in MANET.

In [40], Klemm et al. discuss ORION, a routing-layer-independent application layer P2P protocol. ORION provides its functions as a self-contained P2P system which requires no interaction with a routing protocol. Even though ORION claims not to use any network layer protocol, its operations mimic the functions of network layer protocols for querying and service discovery. Thus, the advantages of ORION lie in a network where no routing functions are provided.

Some research has been done in the past supporting the case for directory-based systems for service discovery in MANET, for example, [14, 43]. We expect these approaches to work well for P2P systems which construct peer overlays proactively, but not for P2P systems which construct peer overlays on demand. Besides, our approach is based on placing the responsibility of peer-list collection on the client, and not on a centralized (or semi-centralized) entity.

In general, we agree with Conti et al. in [17, 18] and Ding et al. in [21] that cross-layer designs, incorporating the routing layer for service/peer lookup and discovery, are the preferred approach to P2P system design in MANET. These designs allow P2P systems working at the application layer to seamlessly adapt to the changing topology and network conditions in a MANET. However, all the above discussed P2P systems differ from BitTorrent in the following ways: (a) in BitTorrent, the peer overlay is constructed on demand and maintained only per client (that is, overlays for other clients are not directly accessible); (b) in BitTorrent, the peer overlay is maintained only for the duration of the client download; and (c) in BitTorrent, the peer overlay changes during the download of the file at the client and depends on what pieces have already been downloaded at the client.

In [6], the authors describe AdTorrent, which utilizes the swarming nature of BitTorrent and applies it to scalable content "pushing." AdTorrent does not concern itself with content delivery of large files, which is where the issues of TCP streams and peer degree play large roles. Instead, AdTorrent is concerned with the problem of digital billboards and uses a push model for dissemination of data, rather than the client-requested, pull model that is the P2P paradigm.

Several papers have researched the case for proportional replication using replicas of objects in P2P. For example, [70] studies proportional replication by ignoring service discovery costs altogether and studying the performance improvement due to replication during the file downloading process. BTM, being such a model, is expected to reap the benefits of proportional replication of objects.

There has been some research activity discussing the issue of incentive enforcement for P2P systems in MANET, for example, [36, 66]. In [66], Srinivasan et al. detail a game-theoretic model proving that a Generous Tit-for-Tat (GTFT) incentive mechanism for wireless ad hoc networks will result in a Nash equilibrium that converges to the rational Pareto optimal operating point. The approach detailed in the paper discusses a method where each node, being rational, tries to optimize the amount of resources it expends. The node performs this optimization by increasing its Normalized Acceptance Ratio (NAR), the ratio of the number of successful relay requests generated by the node to the number of relay requests made by the node. We note that the GTFT model is applicable in a context where a node's behavior will not affect its neighbors, such as the wired Internet, where one node's activity cannot affect another node, but not in the wireless ad hoc context, where increasing cooperation with a remote node can adversely affect the performance of the network along the path of communication.

In [7], Andrade et al. point out that the BitTorrent framework is already resistant to free-riding, and results in increased cooperative behavior as compared to other P2P protocols like Gnutella. Regardless of the above, Huang et al. in [36] point out that incentive mechanisms may not be needed at all in MANET, and certainly not in initial deployments.

Several papers have discussed the efficiency of the BitTorrent incentive system, for example, [9, 16, 39, 46]. The paper describing the BitTorrent protocol, [16], argues that the mechanism can reduce the problem of free-riding in online communities. Some of the above-mentioned papers, for example, [39], argue that the BitTorrent incentive mechanisms are flawed and need to be changed, while a number of them, for example, [9, 46], argue that the system is inherently resilient to free-riding.

In [25], the authors discuss the problem of designing Reliable Server Pools (rSerPool) for battlefield ad hoc networks, as per FCS specifications, and outline the key architectural requirements of designing rSerPool for MANET as follows: (a) fast switchover, (b) dynamic (re)configuration, (c) survivability, (d) application transparency, and (e) control-signal efficiency.

Chapter 3 ANSI: A HYBRID UNICAST ROUTING PROTOCOL FOR HYBRID AD HOC NETWORKS

In this chapter, we present a hybrid unicast routing suite (with both proactive and reactive components) for hybrid ad hoc networks which uses the mechanisms of swarm intelligence (SI) to select good routes to destinations. The mechanisms of SI—positive/negative feedback, amplification of fluctuations and multiple interactions—allow a node to change routing information quickly and efficiently to adjust to an ever-changing local topology and to route deterioration, thus causing fewer link breakages.

ANSI unicast routing protocol

Protocol overview

ANSI is a hybrid routing protocol for hybrid ad hoc networks comprising both proactive and reactive routing components. Pure MANET (mobile) nodes in ANSI use only reactive routing and choose routes deterministically, while nodes belonging to more capable, infrastructured (immobile) networks use a combination of both proactive and reactive routing and perform stochastic routing² when multiple paths are available.

The ANSI routing process is outlined below:

1. When a route to a destination D is required but not known at a node S, S broadcasts a forward reactive ant to discover a route to D.

2. When D receives a forward reactive ant from S, D source-routes a backward reactive ant to the source S. The backward reactive ant updates the routing tables of all the nodes in the path from S to D, allowing for data transfer from S to D.

3. When a route fails at an intermediate node X, X first checks if there are other routes which can be used to route the packet³ to D. If not, then ANSI buffers the packets which could not be routed and initiates a route discovery to find D by using a forward reactive ant to perform local route repair. Additionally, X sends a route error message back to the source node S.

4. Nodes belonging to more capable, infrastructured networks maintain routes to their connected components proactively, by periodic route updates using proactive ants. These nodes also use stochastic routing when multiple paths are available. In addition, each node in the infrastructure collects information about which mobile nodes are connected to which infrastructure node.

5. When a route to D is known at a MANET node S, S deterministically chooses the best next hop to reach D. If S is part of a highly capable infrastructure, then S may choose to perform stochastic routing to the destination D, depending on the availability of multipath routes.

¹ The computational equivalent of the chemical deposited on the forest floor by ants.

² In stochastic routing, the routing protocol at a node i chooses one next hop for a destination from a set of k possible next hops for the destination. That is, if $J = \{j_1, j_2, \ldots, j_k\}$ are the possible next hops to a destination at i, then one of these next hops, say $j_m$, is chosen with probability $p(j_m)$, $m \in [1, k]$, $\sum_{m=1}^{k} p(j_m) = 1$, by the routing protocol at node i.
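Read as control flow, the five steps above amount to the following sketch. Every name here (the node dictionary, forward, dead_links, and so on) is an illustrative placeholder of ours rather than ANSI's actual implementation, and step 2 (the backward reactive ant generated at the destination) is omitted since it runs at D rather than at the forwarding node.

```python
import random

def forward(node, dest, payload):
    """Mirror of ANSI routing steps 1, 3, 4 and 5 listed above (illustrative only)."""
    routes = node["routes"].get(dest, {})          # {next_hop: goodness a_ijd}
    if not routes:
        # Step 1: no route known -- buffer the data and broadcast a forward reactive ant.
        node["buffer"].append((dest, payload))
        node["pending_discovery"].add(dest)
        return ("buffered, discovering", None)
    if node["infrastructure"]:
        # Step 4: infrastructure nodes may route stochastically over the known next hops.
        hops, weights = zip(*routes.items())
        next_hop = random.choices(hops, weights=weights, k=1)[0]
    else:
        # Step 5: pure MANET nodes deterministically pick the single best next hop.
        next_hop = max(routes, key=routes.get)
    if next_hop in node["dead_links"]:
        # Step 3: the chosen route failed -- buffer, repair locally, report a route error.
        node["buffer"].append((dest, payload))
        node["pending_discovery"].add(dest)
        return ("route error, repairing", None)
    return ("sent", next_hop)

# Example node state (purely illustrative values):
node = {"routes": {"D": {"j1": 0.6, "j2": 0.4}}, "buffer": [],
        "pending_discovery": set(), "infrastructure": True, "dead_links": set()}
print(forward(node, "D", b"data"))
```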

We claim ANSI will perform better than typical MANET protocols because of the working of the SI mechanisms at each node, which maintain routing information and local information more effectively than traditional MANET routing protocols. In addition, the congestion awareness of ANSI also helps in controlling the extent of congestion in high traffic scenarios. Lastly, in hybrid networks, ANSI is able to leverage the power of nodes belonging to more capable networks to assist in the routing activities of the network. In the following sections, we explain the details of the above actions, and show how the SI mechanisms work at each node in maintaining routing information.

³ The word "packet" is used in this chapter to denote the network layer PDU (NPDU).

Protocol model

Data structure (1) below represents the structure of all ant packets, and data structures (2) and (3) below are maintained at each node, and are updated every time an ant arrives at the node.

(1) Ant structure: The following information is carried by an ant π:

(a) The ID of the ant, which is the (node ID, sequence number) pair,

(b) The number of nodes, m, which π visits, including the node π originated from,

(c) The nodes visited stack (adapted from [12]), $S_\pi$, which contains information about the nodes $V = \{v_1, v_2, \ldots, v_m\}$ that can be reached by backtracking the ant π's movement (using the nodes visited stack).

(2) Ant decision table at node i, $A_i$ (adapted from [22]): An ant decision table is a data structure that stores pheromone trail information for routing from node i to a destination d via k possible next hop nodes $J = \{j_1, j_2, \ldots, j_k\}$. In the ANSI network, the link ij between two nodes i and j is assumed to be bidirectional. Routing tables are computed from ant decision tables. Each ant decision table entry, $A_{ijd}$, for node i maintains a row for the destination–next hop pair (d, j) along with the $\tau_{ijd}(t)$, $\eta_{ijd}$, $\psi_{ijd}$, and $a_{ijd}$ values described below:

(a) $\tau_{ijd}(t)$ is the pheromone trail concentration left on a link ij used as a first hop to destination d at the current time t due to all the ants that have traversed the trail, taking into consideration pheromone evaporation (see Equation 3.6). τ is thus a weighted measure of how many times the link ij was traversed by packets intended for d, and is thereby a measure of the goodness of the trail ijd.⁴

(b) $\eta_{ijd}$ is the heuristic value of going from j to i. In our mapping, η is a measure of the distance to the destination, $dist_{ijd}$, going from i to d when using next hop j. We set $\eta_{ijd} = 1 + \frac{1}{dist_{ijd}}$.

(c) $\psi_{ijd} \in [0,1]$ is the value of the congestion status at node j. If $\psi_{ijd} = 1$, node j is considered not congested, and if $\psi_{ijd} = 0$, node j is considered congested. The value of ψ at a node j is measured as the ratio of the available space in the IP queue at j to the number of packets already in the IP queue at j.

(d) We see that the goodness of a next hop j is directly proportional to $\tau_{ijd}(t)$, inversely proportional to $dist_{ijd}$, and directly proportional to $\psi_{ijd}$. Thus, we write:

$$a_{ijd} = (c_\tau \cdot \tau_{ijd}(t)^\alpha) \cdot (c_\eta \cdot \eta_{ijd}^\beta) \cdot (c_\psi \cdot \psi_{ijd}^\gamma) \qquad (3.1)$$

where $c_\tau > 0$, $c_\eta > 0$, and $c_\psi > 0$ are arbitrary constants, and α, β, γ are integers such that $\alpha, \beta, \gamma > 0$.

For our use, we need to normalize the above value of $a_{ijd}$ so that we may gauge the relative effectiveness of each next hop. We normalize $a_{ijd}$ such that $a_{ijd} \in [0,1]$:

$$a_{ijd} = \frac{(c_\tau \cdot \tau_{ijd}(t)^\alpha) \cdot (c_\eta \cdot \eta_{ijd}^\beta) \cdot (c_\psi \cdot \psi_{ijd}^\gamma)}{\sum_{l \in J} (c_\tau \cdot \tau_{ild}(t)^\alpha) \cdot (c_\eta \cdot \eta_{ild}^\beta) \cdot (c_\psi \cdot \psi_{ild}^\gamma)} \qquad (3.2)$$

where J is the set of next hops at i to destination d. We then set $c_\tau = c_\eta = c_\psi = 1$, and arrive at:

$$a_{ijd}(t) = \frac{[\tau_{ijd}(t)]^\alpha \, [\eta_{ijd}]^\beta \, [\psi_{ijd}]^\gamma}{\sum_{l \in J} [\tau_{ild}(t)]^\alpha \, [\eta_{ild}]^\beta \, [\psi_{ild}]^\gamma} \qquad (3.3)$$

where α, β and γ are chosen appropriately (see Section 3.2). The above formula was adapted from the Ant Colony Optimization framework outlined in [22]. The intuition behind this equation is that we want to use the metrics of hop distance and path goodness, and to allow some flexibility as to how much we rely on either metric by varying the α, β, and γ values (a short computational sketch of these quantities follows at the end of this section).

⁴ A trail is made of several links and denotes the full or partial path to the destination.

As soon as an ant π is received at a node i via neighbor node j, i has the information about j's congestion status from $S_\pi$. The pheromone $\tau^\pi_{ijv}$ deposited by an ant π and the heuristic $\eta^\pi_{ijv}$ to a destination v, for the ant π traversing from node j to node i via nodes $v \in V$, are given by the following equations:

$$\tau^\pi_{ijv} = \frac{1}{p_j - p_i} \qquad (v, i, j \in V) \qquad (3.4)$$

and

$$\eta^\pi_{ijv} = 1 + \frac{1}{depth(v)} \qquad (v, i, j \in V) \qquad (3.5)$$

where $p_i$ and $p_j$ are the pheromone amounts of ant π at nodes i and j, respectively, and $V = \{v_1, v_2, \ldots, v_m\}$ denotes the set of m nodes visited by π. The value depth(v) is the depth of the node v in π's nodes visited stack.

All τ values in $A_i$ are evaporated according to Equation 3.7 each time another ant, π′, visits node i. Let us say π′ traverses the same link ij at time (t + Δ) as traversed by π at time t. π′ then positively reinforces the trail ijv, ∀v ∈ V in $S_{\pi'}$. All other trails iJ′x (where J′ is the set of all possible next hops from i except j, and x is any node not visited by π′) in the ant decision table $A_i$ are not positively reinforced, and in the event no ant traverses any of the other trails iJ′x, those trails eventually become invalidated (negatively reinforced) owing to pheromone evaporation. The new $\tau_{ijv}$ at time (t + Δ) is calculated as follows:

$$\tau_{ijv}(t + \Delta) = evaporate(\tau_{ijv}(t), \Delta) + \tau^{\pi'}_{ijv} \qquad (3.6)$$

where $\tau^{\pi'}_{ijv}$ is the pheromone deposited on the trail by π′ over ij (see Equation 3.4). The function $evaporate(\tau_{ijv}(t), \Delta)$ returns the pheromone amount left on trail ijv for destination v (after evaporation) due to the ants which traversed ij before π′. The pheromone evaporation model used to calculate how much of the earlier pheromone trail, $\tau_{ijv}(t)$, is left behind at (t + Δ) when π′ traverses the trail ijv is as follows:

$$evaporate(\tau_{ijv}(t), \Delta) = \frac{\tau_{ijv}(t)}{\cdots} \qquad (3.7)$$

After all the τ values in the ant decision table are evaporated (including the $\tau_{iJ'V'}$ values on the trails iJ′V′ that were negatively reinforced, i.e., for which no ants that traveled $V' = \{v'_1, v'_2, \ldots, v'_{m'}\}$ were received) and recalculated, the $a_{ijv}$ values for all entries of V in $S_{\pi'}$ are recomputed, and the new best next hops to destinations V are computed again. This computation is followed by an update of the routing table at node i. Negative reinforcement of routes also happens when a route is explicitly invalidated by a route error message.

(3) Routing Table: The routing table at node i contains an entry for each destination d reachable from node i along with the best next hop, $J_i^d$, to d. The best next hop, $J_i^d$, to a destination d is the next hop with the largest $a_{ijd}$ value in $A_i$. The value of $J_i^d$ is thereby updated every time an ant visits node i. The routing table also contains the distance of d from i in hops, and this information is used to set the number of hops for route discovery when the routing table entry to d in i becomes defunct.

In the case of nodes which are part of a highly capable infrastructure, routing is stochastic, and the next hop is chosen directly from the ant decision table probabilistically. Specifically, a next hop j at node i for destination d is chosen with a probability of $a_{ijd}$.

The process of broadcasting ants during reactive/proactive route discovery, recovery and maintenance finds new routes to nodes and updates the information in the ant decision table accordingly. Because of the nature of broadcast in the wireless medium, the routes found as a result of forward reactive ant activity reflect the current status of the network, and accordingly amplify the current fluctuations in the topology. Another mechanism amplifies fluctuations in the local area: when a node receives a unicast packet, the node notes the neighbor node ID and reinforces the path to the source of the packet via that neighbor. In addition, when a data packet is sent along a next hop, the node reinforces that next hop as a valid next hop to the destination. This mechanism also amplifies local fluctuations of network and topological characteristics, and guarantees that the nodes in the ANSI network use up-to-date network and topological information.
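The quantities defined in this section can be combined as in the sketch below. The function and variable names are ours; the exponents and the 15-second constant follow Section 3.2.1; and the evaporate function uses simple exponential decay as a stand-in assumption, since Equation 3.7's exact decay law is not reproduced in the text above.

```python
import math
import random

ALPHA, BETA, GAMMA = 2, 2, 2   # exponents of Equation 3.3; Section 3.2.1 uses 2, 2, 2
C_EVAP = 15.0                  # evaporation constant c (seconds), per Section 3.2.1

def evaporate(tau, dt, c=C_EVAP):
    # Stand-in for Equation 3.7: exponential decay with time constant c (assumption).
    return tau * math.exp(-dt / c)

def reinforce(tau_old, dt, deposit):
    # Equation 3.6: evaporate what was on the trail, then add the new ant's deposit.
    return evaporate(tau_old, dt) + deposit

def goodness(entries):
    """Normalized a_ijd values (Equation 3.3) for one destination d at node i.
    `entries` maps next hop j -> (tau_ijd, eta_ijd, psi_ijd)."""
    raw = {j: (tau ** ALPHA) * (eta ** BETA) * (psi ** GAMMA)
           for j, (tau, eta, psi) in entries.items()}
    total = sum(raw.values()) or 1.0               # avoid division by zero
    return {j: value / total for j, value in raw.items()}

def next_hop(entries, stochastic=False):
    """Deterministic choice (pure MANET nodes) or stochastic choice (infrastructure)."""
    a = goodness(entries)
    if stochastic:
        hops, probs = zip(*a.items())
        return random.choices(hops, weights=probs, k=1)[0]
    return max(a, key=a.get)

# Example: two candidate next hops to a destination d at node i.
# eta = 1 + 1/dist (closer is better); psi in [0, 1] (1 means not congested).
table = {
    "j1": (reinforce(2.0, dt=3.0, deposit=1.0), 1 + 1 / 2, 0.9),  # fresh trail, 2 hops, lightly loaded
    "j2": (0.5, 1 + 1 / 4, 0.3),                                  # weak trail, 4 hops, congested
}
print(goodness(table), next_hop(table), next_hop(table, stochastic=True))
```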

Protocol description

A trail ijd to destination d, $\tau_{ijd}$, is positively reinforced by ANSI (a) when a new route to a destination d is found (via ant activity) at i via next hop (neighbor node) j, and (b) when i uses an already known next hop node j again to route a packet to d. A trail ijd is negatively reinforced (a) when the trail ijd to destination d is subjected to evaporation (as per Equation 3.7), and (b) when the next hop node j to d is no longer available (owing to MAC layer errors, route errors, or congestion at j). In the following sections, we describe the various reinforcement mechanisms at work in ANSI.

Local route management—reinforcement by data packets and the use of neighbor discovery HELLO messages.

Local route management is made possible by reinforcement due to both movement of data packets and an explicit neighbor discovery mechanism. These two concepts are illustrated in Figure 3.1.

When a data packet arrives at a node i via a neighbor node p and is sent to the destination along next hop j, both the trail to the previous hop, ip, and the trail to the next hop, ij, are reinforced by the SI mechanisms at i.

In addition, nodes running ANSI periodically broadcast a HELLO message. This message can contain a variety of information about the node sending the message, such as its congestion status. In ANSI, HELLO messages are used to perform local route management by positively reinforcing previously known neighbors and new neighbors. The advantage of this mechanism can be explained as follows: if a direct route to a destination d is known at i via this process, then a previously known indirect route to d is less favored than the direct route by the reinforcement mechanisms in ANSI. Note that HELLO messages are sent via all available interfaces to facilitate neighbor discovery over all possible paths.

Figure 3.1: Local reinforcement in ANSI. (a) Reinforcement by data packets: node i, upon receiving a data packet from S via node j, reinforces the path to node j via j and the path to the source S via j. (b) Reinforcement in neighbor discovery mechanisms: upon receiving a HELLO beacon from j, all nodes i reinforce trails via j.

Non-local route management and explicit positive reinforcement

Reactive route discovery is performed by forward reactive ants, $\pi_f$, and backward reactive ants, $\pi_b$. Reactive route discovery can be used both at the source of a data packet and at an intermediate node looking for an alternate route to the destination, in the event that previously known routes to the destination have proved ineffective. A route request is sent by deploying a forward reactive ant, $\pi_f$, and the route reply is sent using a backward reactive ant, $\pi_b$. Even though multiple routes can be gathered by a source sending forward reactive ants (by allowing the destination to send backward reactive ants in response to all copies of the forward reactive ants received), we allow the destination to send a backward reactive ant only for the first forward reactive ant received. Backward reactive ant generation is controlled in this way because we found that in a high traffic/mobility scenario, in which a MANET node has many routes to the destination, packet delivery from source to destination can suffer, because using several routes will spread the traffic over more nodes and increase the contention in the network. In this case, using one route deterministically, while keeping tabs on the congestion status of neighboring nodes (which is what ANSI does), appears to be a better approach.⁵

⁵ Stochastic approaches to routing in pure MANET networks are effective when the mobility and traffic in the network are low.

Regardless, multiple routes are collected owing to the interaction of the ant information from the nodes in the network and HELLO beacons, and these are used as and when older routes become defunct. Note that even though information about multiple routes is collected via other mechanisms, ANSI uses a deterministic choice of next hops at pure MANET nodes (highly capable nodes collect multiple routes and use stochastic routing, as we will see later). We use deterministic routing in pure MANET because we found that stochastic approaches at MANET nodes using ANSI are not suited to high data delivery in high traffic scenarios.

In Figure 3.2, consider a node S which needs to route data packets to D, but does not have a route to D. Node S buffers the data destined for D and broadcasts (over all interfaces) a forward reactive ant, $\pi^f_{SD}$ (with a nodes visited stack $S_f$), intended to discover a route to D. Because there is a good chance that D has moved, ANSI sets the number of hops, $\phi_f$, for the forward reactive ant (sent from S) to be a few hops larger than the last known distance of S from D, which can be obtained from the routing table at S. If S receives data intended for D after $\pi^f_{SD}$ has been broadcast, S buffers the data. When D receives $\pi^f_{SD}$, D copies the nodes visited stack, $S_f$, into a new backward reactive ant, $\pi^b_{DS}$, and kills $\pi^f_{SD}$. D then sends $\pi^b_{DS}$ to S. $\pi^b_{DS}$ is not broadcast, but simply backtracks to the source S by using the nodes visited stack $S_b$ in $\pi^b_{DS}$. The ant $\pi^b_{DS}$, when visiting a node X along the path to S, positively reinforces the route to all nodes $v \in S_b$ upstream from X to D, and adds an entry in $A_X$ to D via the next hop immediately upstream (in the path from S to D). An intermediate node thereby knows what next hop to use to route to D.

In this way, backward reactive ants perform explicit positive reinforcement of routes to destination D. When S receives $\pi^b_{DS}$ from D, S sends the buffered packets intended for D over the newly discovered route and flushes S's buffer. Note that multiple paths may be readily collected (for example, by sending another backward reactive ant for the ant proceeding to D via nodes 2 and 4), but this multipath backward ant generation is not done for pure MANET networks.

In the event that $\pi^b_{DS}$ is not received at S within a timeout period, the value of $\phi_f$ is increased by 2 more hops and the search for the route resumes. The process of route discovery is continued again if a route is not found after the second try; ANSI retries twice for a route to a destination.
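The retry behavior just described can be summarized as follows. broadcast_ant and wait_for_backward_ant are hypothetical callbacks standing in for the real ant machinery, the timeout value is an assumption (the text does not give one), and the initial hop limits follow the parameterization reported in Section 3.2.1.

```python
def discover_route(broadcast_ant, wait_for_backward_ant,
                   last_known_hops=None, timeout_s=2.0, retries=2):
    """Sketch of ANSI's reactive route discovery with retries (names are ours)."""
    # Initial hop limit phi_f: one beyond the last known distance, else a default of 5
    # (per Section 3.2.1); the final attempt uses a larger limit still.
    hop_limit = (last_known_hops + 1) if last_known_hops is not None else 5
    for attempt in range(1 + retries):              # one try plus two retries
        broadcast_ant(hop_limit)                    # send the forward reactive ant
        if wait_for_backward_ant(timeout_s):        # backward reactive ant came back
            return True
        hop_limit += 2                              # widen the search and try again
    return False                                    # destination not found

# Usage with dummy callbacks (discovery always fails here, exercising both retries):
found = discover_route(lambda hops: print("forward ant, hop limit", hops),
                       lambda timeout: False, last_known_hops=3)
print(found)
```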

To control the amount of MAC layer usage at a node X, a scheduled HELLO message is broadcast at X only if the last broadcast forward reactive ant was sent before the last HELLO message.

Route errors, and negative reinforcement

Figure 3.2: The propagation of the forward reactive ant (shown in solid arrows) and the return of the backward reactive ant (dashed arrows). The rebroadcast from node 2, when received at node 1, is killed immediately to prevent route loops. At each node X the forward reactive ant enters, the ant reinforces the path from X to all the other nodes in the nodes visited stack. Thus, the forward reactive ant from S, when received at node 4, reinforces the link 4-2 to both node 2 and node S. On the return path, all nodes in the path of the backward reactive ant reinforce the trails to all the nodes in the path leading from the node upstream all the way to the destination. Thus, when the backward reactive ant is received at S via path 1-3-D, S will reinforce link S-1 for destinations 1, 3, D.

Route errors occur at a node X when X is unable to provide a route for the destination D, owing to the non-availability of a routing table entry at X or of the next hop suggested by the routing table entry at X. When a route error occurs at a node X in a network running ANSI, X first buffers the packet which X needs to forward and then sends a forward reactive ant to find the destination D. If X happens to be an intermediate node, then in addition to sending a forward reactive ant, X also sends a route error back to the source S of the packet. The packets buffered at X are relayed across the network after a backward reactive ant from D reaches X.

In addition, when a route error is received at an intermediate node between X and S, the node explicitly invalidates the routing table entries to D. The packets received at X before the route error is received at S are X's responsibility (to forward), but the packets generated after the time when the route error is received at S from X are S's responsibility: S generates a forward reactive ant to find the route to D.

Proactive routing within highly capable sections of the network

As mentioned in Section 3.1.1, nodes belonging to non-mobile, highly capable infrastructure, such as cellular networks, engage in proactive routing as well as reactive routing because these nodes are not concerned about topological fluctuations. These nodes also maintain a list of mobile nodes which are accessible from each other, thus assisting the reactive routing process within the mobile nodes as and when possible. Nodes in non-mobile, highly capable infrastructure send proactive ants periodically to all the other highly capable nodes they are connected to. Proactive ants are not returned like forward reactive ants; they reinforce the route to the proactive ant sender along the path the proactive ant takes. Proactive ants, apart from carrying a nodes-visited stack for gathering information about the nodes that were visited, are fixed in hop length and also carry a data structure indicating the mobile nodes which are accessible from the proactive ant sender. These nodes engage in proactive route collecting activity using all their interfaces, and so are able to combine routes found via different interfaces effectively during the routing process.

Performance evaluation

Simulation and network model

As mentioned in Section 3.1.3, the current implementation of ANSI used α = β = γ = 2.

In both AODV and ANSI, the reactive route recovery is retried twice, and for ANSI, the last try uses φ_f = 15. These parameters were chosen specifically for the terrain used. For the first two tries in ANSI, φ_f is determined according to the information available about the unknown destination: if the destination had a valid entry in the routing table earlier, φ_f is set to one more than the earlier number of hops to the destination; otherwise, the initial value is set at φ_f = 5. The evaporation constant, c, used in Equation 3.7 is 15s. The value of c was chosen to reflect the issues due to topological change resulting from the mobility in the network. An entry for next hop j in the decision table at node i is considered stale at i if the decision table entry has not been reinforced in 15 seconds, in which time the nodes i and j could have moved roughly one transmission range apart (15s × 20m/s = 300m). In hybrid networks, the nodes which are part of the high-speed wired infrastructure (see Figure 3.5) used a proactive route update interval of 10s. The proactive update interval was chosen to reflect the changes in the topology of the MANET region each infrastructure node serves. ANSI operations at a node sent neighbor discovery HELLO messages once a second, if no ant was broadcast within the last one second from the node, to optimize the use of MAC layer resources.
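As a small illustration of the staleness rule tied to the evaporation constant, the sketch below checks whether a decision-table entry should be treated as stale; the timestamp field name is a hypothetical choice for illustration, not taken from the ANSI implementation.

    # An entry not reinforced within the evaporation constant c is stale, since two
    # nodes moving at up to 20 m/s could have drifted roughly one transmission range
    # (c * v_max = 15 s * 20 m/s = 300 m) apart in that time.

    EVAPORATION_CONSTANT_S = 15.0   # c, in seconds

    def is_stale(last_reinforced_s, now_s, c=EVAPORATION_CONSTANT_S):
        return (now_s - last_reinforced_s) > c

    print(is_stale(last_reinforced_s=100.0, now_s=116.0))  # True: 16 s > 15 s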

For AODV, neighbor discovery mechanisms using HELLO beacons were enabled, and the beacon timer was 1s. We enabled local repair for AODV. The maximum number of RREQ retries for AODV was set at 2. These parameters were chosen from the default parameter settings for AODV detailed in [53]. For both protocols, nodes use a 100MB buffer for protocol operations. This buffer is used for storing packets for which routes have not yet been found and is maintained on a first come, first served basis.

We performed five experiments, in which we studied the performance of ANSI and AODV with increasing traffic and an increasing number of nodes under both UDP and TCP flows in both hybrid and pure MANET settings. In Experiments 1, 2, and 3, we study ANSI and AODV in a mesh networking context, for both UDP and TCP loads; in Experiment 4, we study ANSI and AODV in an FCS context and reliable communication using TCP flows; and in Experiment 5, we study the scalability of ANSI and AODV in an FCS context with UDP flows.

In all these experiments, the source and the destination are chosen randomly and are pairwise distinct for each trial. Both ANSI and AODV experiments used the same set of pseudorandom number seeds to generate the network and application characteristics, thus ensuring that ANSI and AODV operate under exactly the same characteristics.

In Experiments 1 and 2, we studied the performance of ANSI vs. AODV in a hybrid network, for both UDP (Experiment 1) and TCP (Experiment 2) flows. In these experiments, the non-mobile nodes are connected to each other over a 100Mbps Ethernet link. Figure 3.5(a) shows the simulation topology. The size of the entire terrain is 2000m×2000m. Inside this terrain, there are four MANET “regions,” each of which contains 20 MANET nodes inside a terrain of size 500m×500m, and is “serviced” by one highly capable, immobile node (nodes 81–84) located in the center of the mobile region. These highly capable nodes, located in the center of each of the regions, have both an Ethernet interface and an 802.11 interface, and are connected to each other by another highly capable node, node 85, which has 4 Ethernet interfaces. Note that MANET nodes within a region are not able to communicate with MANET nodes of other regions directly (the closest they can get is around 353m, which is beyond the transmission range of the MANET nodes).

Figure 3.5: Hybrid network topologies used for Experiments 1, 2 and 3. (a) The hybrid network topology for Experiments 1 and 2. (b) The hybrid network topology for Experiment 3.

Four traffic streams are chosen for each region, with one traffic stream headed towards each of the four regions (thus, one stream will be an “internal” stream). There are thus, altogether, 16 traffic streams in this experiment. For Experiment 1, each of these streams sends 512-byte CBR/UDP APDUs at a periodic rate of 1–20 APDUs/s 6. For Experiment 2, each of the 16 data source/receiver pairs use FTP running over TCP to send and receive, respectively, one file whose size varies from 200,000 bytes to 2,000,000 bytes, in steps of 200,000 bytes.

6 When x APDUs/s are sent periodically, each APDU is sent from the application layer to the transport layer with an inter-APDU spacing of 1/x seconds. For example, for 4 APDUs/s being sent starting at t = 40s for a duration of 2 seconds, the 8 APDUs are sent at the following times: t1 = 40.00, t2 = 40.25, t3 = 40.50, t4 = 40.75, t5 = 41.00, t6 = 41.25, t7 = 41.50, t8 = 41.75, where ti is expressed in seconds and refers to when the i'th APDU is sent from the application layer to the transport layer.
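The pacing rule in footnote 6 can be reproduced with a short schedule computation; the helper below is purely illustrative and not part of the simulator.

    # Periodic APDU schedule: at x APDUs/s, the i'th APDU leaves the application
    # layer at start + i/x seconds.

    def apdu_send_times(rate_per_s, start_s, duration_s):
        count = int(rate_per_s * duration_s)
        return [start_s + i / rate_per_s for i in range(count)]

    print(apdu_send_times(rate_per_s=4, start_s=40.0, duration_s=2.0))
    # [40.0, 40.25, 40.5, ..., 41.75] -- the 8 send times listed in footnote 6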

In Experiment 3, we studied the performance of ANSI and AODV in a larger hybrid network consisting of 360 pure MANET nodes spread over 9 MANET regions uniformly located in a 5000m×5000m terrain. Each of the MANET regions of size 1000m×1000m is serviced by one highly capable, immobile infrastructure node (nodes 361–369) located at the center of each MANET region, with infrastructure node 36x servicing MANET region x. The highly capable nodes are all connected via 100Mbps Ethernet links. The topology of our experiment is shown in Figure 3.5(b). Each highly capable node has both Ethernet interfaces and an IEEE 802.11 interface. Six CBR/UDP streams are randomly generated, with the following profile of source-destination pairs: (a) region 1 to region 4, (b) region 1 to region 7, (c) region 8 to region 5, (d) region 8 to region 2, (e) region 3 to region 6, and (f) region 3 to region 9. The data sources generated APDUs at a uniform rate of 2 to 20 APDUs per second in steps of 2 APDUs per second. The size of the APDUs sent was 512 bytes.

In Experiment 4, we studied the effect of increasing TCP traffic in a pure MANET network. In this experiment, 50 nodes were placed uniformly in a network of size 1100m×1100m. This terrain has a node density 7 of 8.15, which, according to [60], is sparse for a network with mobile nodes. The experiment simulates 25 data source/receiver pairs using FTP over TCP to send and receive, respectively, one file whose size varies from 20,000 bytes to 200,000 bytes in steps of 20,000 bytes.

7 Node density is defined as the number of nodes in an area covered by the transmission range of a node.

In Experiment 5, we studied the performance of ANSI and AODV under CBR/UDP loads in a pure MANET environment with an increasing number of nodes. The number of nodes was varied from 50 to 250 and exactly half of the nodes were data sources. The terrain size was such that the node density was constant at 8.15 (for example, for 50 nodes, the terrain size was 1100m×1100m). The CBR/UDP data sources generated one 64-byte APDU a second to be sent to the data sink.
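The constant-density scaling used in Experiment 5 can be sketched as follows, assuming a square terrain anchored to the 50-node, 1100m×1100m reference case; the function name and the printed values are illustrative.

    import math

    # Scale a square terrain so node density stays constant as the node count grows,
    # anchored to the 50-node, 1100 m x 1100 m reference case.

    REF_NODES, REF_SIDE_M = 50, 1100.0

    def terrain_side(num_nodes):
        # nodes per unit area is constant, so area grows linearly with node count
        return REF_SIDE_M * math.sqrt(num_nodes / REF_NODES)

    for n in (50, 100, 150, 200, 250):
        print(n, round(terrain_side(n)))   # e.g. 250 nodes -> ~2460 m per side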

In all the experiments, the MANET nodes were uniformly distributed initially in the terrain and the mobile nodes moved as per the random waypoint model with a minimum speed of 0.001m/s, a maximum speed of 20m/s, and a pause time of 10s. In hybrid networks (Experiments 1, 2, and 3), the mobile nodes were restricted to move only within their region (bounded by a 500m×500m terrain for Experiments 1 and 2 and a 1000m×1000m terrain for Experiment 3). The MANET nodes in the experiments used one IEEE 802.11 interface with omnidirectional antennas and a transmission range of 250m at the physical layer, and IEEE 802.11 DCF at the MAC layer. The link bandwidth for the mobile nodes using 802.11 was 2Mbps. In addition to using IEEE 802.11, the non-mobile nodes also used Ethernet with a capacity of 100Mbps. The simulations used a two-ray pathloss model and no propagation fading model was assumed.

All experiments using CBR/UDP loads (Experiments 1, 3, and 5) were run for a simulated time of 5 minutes; all sources started sending APDUs at exactly 40s into the simulation and ended data generation at exactly 260s. We chose to send data after 40s to allow for the mobility in the network to randomize. All experiments using FTP/TCP loads (Experiments 2 and 4) started sending one file using FTP at exactly 40s into the simulation and were run for 25 minutes of simulation time, to allow for TCP to complete the data transmission or to break the connection. TCP-Reno was used as the TCP variant in our experiments, and used an MSS of 512 bytes, maximum send/receive buffers of 16384 bytes each, and delayed ACKs.

We study the following performance characteristics for CBR over UDP flows (a computation sketch for both metrics is given after the list):

• (End-to-end metric 1) APDU delivery ratio: the ratio of CBR APDUs delivered at the CBR layer of the data receivers to the number of CBR APDUs sent by the CBR layer of the data senders.

• (End-to-end metric 2) End-to-end delay: the average latency experienced by an APDU from the time the APDU is generated by CBR at the data sender to the time it takes for CBR at the data receiver to receive this APDU.
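A minimal sketch of how these two metrics can be computed from per-APDU send and receive timestamps is given below; the dictionary layout keyed by APDU id is an assumption for illustration, not the simulator's bookkeeping.

    # APDU delivery ratio and average end-to-end delay from per-APDU timestamps.

    def apdu_delivery_ratio(sent_times, recv_times):
        """Fraction of sent APDUs that arrived at the receivers' CBR layer."""
        return len(recv_times) / len(sent_times) if sent_times else 0.0

    def avg_end_to_end_delay(sent_times, recv_times):
        """Mean latency of the APDUs that were actually delivered (keyed by APDU id)."""
        delays = [recv_times[i] - sent_times[i] for i in recv_times]
        return sum(delays) / len(delays) if delays else float('nan')

    sent = {1: 40.00, 2: 40.25, 3: 40.50, 4: 40.75}    # APDU id -> send time (s)
    recv = {1: 40.12, 2: 40.41, 4: 41.02}              # APDU 3 was lost
    print(apdu_delivery_ratio(sent, recv))              # 0.75
    print(round(avg_end_to_end_delay(sent, recv), 3))   # ~0.183 s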

Simulation results

Experiment 1: Hybrid network—effect of increasing the UDP packet rate

Figure 3.6 shows the results for the performance of ANSI vs. AODV over a hybrid network using UDP flows. We see that ANSI consistently outperforms AODV in terms of APDU delivery (see Figure 3.6(a)), delay (see Figure 3.6(b)), jitter (see Figure 3.6(c)) and the number of RERRs initiated (see Figure 3.6(d)). ANSI and AODV send a comparable 8 number of MAC unicasts, as we see in Figure 3.6(e). In Figure 3.6(f), we see that ANSI sends fewer MAC broadcasts when the APDU rate is low to moderate, but as the APDU rate increases, ANSI sends more MAC broadcasts.

8 Two protocols A and B are “comparable” when the confidence intervals of their average values overlap [37]. In the same vein, protocol A is “better” than protocol B when the average value of A is better and the confidence intervals of A and B do not overlap.
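The comparison rule in footnote 8 can be sketched as follows, assuming a normal approximation for the 95% confidence interval over per-trial averages; the sample values are illustrative, not results from the experiments.

    import statistics as st

    # Two protocols are "comparable" when their 95% confidence intervals overlap;
    # otherwise the one with the better mean is "better".

    def ci95(samples):
        mean = st.mean(samples)
        half = 1.96 * st.stdev(samples) / (len(samples) ** 0.5)
        return mean - half, mean + half

    def compare(samples_a, samples_b, higher_is_better=True):
        (lo_a, hi_a), (lo_b, hi_b) = ci95(samples_a), ci95(samples_b)
        if hi_a >= lo_b and hi_b >= lo_a:          # intervals overlap
            return "comparable"
        a_wins = st.mean(samples_a) > st.mean(samples_b)
        return "A better" if a_wins == higher_is_better else "B better"

    ansi = [0.91, 0.93, 0.92, 0.94, 0.90]          # e.g. per-trial APDU delivery ratios
    aodv = [0.82, 0.84, 0.81, 0.85, 0.83]
    print(compare(ansi, aodv))                      # "A better"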

The reason why ANSI performs better than AODV—delivering more APDUs with better metrics such as delay, jitter and number of route errors—is that ANSI manages the local network information better than AODV does, and performs congestion-aware routing. ANSI reacts to congestion by repairing routes that are heading towards congestion and preventing route errors due to congestion. For the same reason, ANSI shows fewer route errors as compared to AODV (see Figure 3.6(d)). Owing to the above reasons, the ANSI network has fewer route request operations. When routes do break in ANSI, they are managed by the protocol mechanisms locally rather than by network-wide flooding, in turn resulting in lower congestion at the nodes. Thus, even though ANSI shows more MAC broadcasts at higher APDU rates (see Figure 3.6(f)), ANSI still shows delay and jitter lower than that for AODV. The higher number of broadcasts when the APDU rate increases is because of the congestion-aware properties of ANSI, which allow ANSI to drop badly congested routes and look for new ones. This property, while delivering APDUs more quickly and smoothly, makes ANSI incur more route discovery overhead, which is what we see in terms of larger MAC broadcast overheads in Figure 3.6(f). The new, congestion-free (or low congestion) routes are then used to deliver more APDUs in ANSI. Note that AODV does not show an appreciable increase in the number of MAC broadcasts as the APDU rate increases because AODV does not perform congestion-aware routing, but owing to this property, the performance of AODV degrades. The fact that the number of ANSI's MAC unicasts, shown in Figure 3.6(e), is comparable to that of AODV (in the context of better performance metrics), along with its fewer route errors, is an indication that ANSI is engaged in providing/finding better routes as compared to AODV.

The reason why ANSI has lower end-to-end delay as compared to AODV (see Figure 3.6(b)) is that ANSI performs automatic load balancing in the network by always choosing the paths which are least congested. This load balancing feature allows ANSI to have a consistent and smooth variation in the end-to-end delay, which increases very little as the APDU rate increases, because the majority of the traffic is borne by the immobile mesh. On the other hand, AODV continues to take congested paths until the nodes along these paths start dropping packets. This property of AODV is why we see that the end-to-end delay as APDU rate increases is inconsistent for AODV. The reason why ANSI shows a high variation in end-to-end delay (and thereby a higher average for the same metric) at 1 APDU/s is that at low APDU rates, the ANSI network is unable to update the congestion status at the intermediate nodes frequently, resulting in congestion along some paths.

The reason why jitter decreases with increasing APDU rate in both ANSI and AODV (see Figure 3.6(c)) is as follows. Jitter is a measure of the variation of interarrival times at the destination. Thus, if the end-to-end delay measured at the destination varies little, then jitter is bound to be low. ANSI, being congestion-aware, chooses congestion-free routes and delivers APDUs at the destination with little variation in end-to-end delay. AODV, because it is not congestion-aware, delivers APDUs along congested routes, which results in higher end-to-end delays, because a node running AODV does not react to congestion until a congested node along the path is no longer able to receive or transmit APDUs. Thus, for a single stream of UDP traffic from one source to one destination in AODV, the destination first experiences low variation in end-to-end delay, but thereafter, the path becomes more congested and the variation in end-to-end delay progressively increases until the path breaks. AODV then engages in route discovery and finds a congestion-free path, and once again the measurement of end-to-end delay at the destination shows low variation until the new path becomes congested again. Statistically, the value of jitter depends on the percentage of the APDUs that are delivered at the destination with low variation in end-to-end delay, and so if a higher percentage of APDUs are delivered with a higher variation, the jitter is bound to be larger.

As the APDU rate increases, for both AODV and ANSI, the mean time before links break owing to node mobility is still the same. However, because of the use of highly capable nodes (which are within 2 hops (353m) of any MANET node), the percentage of APDUs delivered with lower variation in end-to-end delay increases (in comparison to the number of APDUs delivered with higher variation in end-to-end delay) for both AODV and ANSI, thus bringing the overall variation down. This is why we see a decrease in jitter as the APDU rate increases for both ANSI and AODV.
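As a concrete illustration of the jitter measure used in this discussion (the variation of interarrival times at the destination), the sketch below uses the standard deviation of interarrival gaps; the exact estimator used by the simulator is not specified in the text, so this is one plausible choice, and the arrival times shown are invented.

    import statistics as st

    # Jitter as the variation of packet interarrival times at the destination.

    def jitter(arrival_times):
        arrivals = sorted(arrival_times)
        gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
        return st.pstdev(gaps) if len(gaps) > 1 else 0.0

    smooth  = [40.0, 40.5, 41.0, 41.5, 42.0]          # steady delivery -> low jitter
    erratic = [40.0, 40.2, 41.4, 41.5, 42.9]          # congested path -> high jitter
    print(jitter(smooth), round(jitter(erratic), 3))  # 0.0 and ~0.58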

Experiment 2: Hybrid network—effect of increasing the size of the file sent using FTP over TCP

Figure 3.7 shows the results for the performance of ANSI and AODV over a hybrid network under TCP flows. We see that the ANSI and AODV performance are comparable in terms of the number of files fully downloaded (see Figure 3.7(a)), session duration (see Figure 3.7(b)), and the number of MAC layer unicasts sent (see Figure 3.7(e)), but the ANSI network shows more duplicate acknowledgments than the AODV network (see Figure 3.7(c)). In addition, ANSI sends fewer MAC layer broadcasts (see Figure 3.7(f)) and fewer route errors (see Figure 3.7(d)) as compared to AODV. Both the ANSI and the AODV networks succeed in downloading most of the files fully (a total of 16 file downloads are simulated in this experiment).

The above results can be explained as follows. In a hybrid network, ANSI is able to take advantage of the presence of infrastructure in the following manner. Each infrastructure node maintains a list of MANET nodes which are serviced by the infrastructure node. These lists are propagated inside the highly capable node mesh. When a MANET node s from a MANET region requests a route to a MANET node d in another MANET region, the highly capable node h servicing the MANET region containing node s already has a route to node d, and prevents the propagation of the forward reactive ant from s by sending a backward reactive ant from node h. This feature explains what we see in Figure 3.7(f). For the ANSI network, the fact that the forward reactive ants from the nodes in the MANET regions are stopped at the highly capable nodes results in fewer MAC layer broadcasts (a measure of how many forward reactive ant broadcasts were sent in the network), but the same is not the case for AODV. For both AODV and ANSI in this scenario, the rate of route discovery activity as file size increases is not governed by mobility, because in either case, the MANET node is able to reach the infrastructure node in at most 2 hops (the maximum distance from any node in the MANET region to the infrastructure node that services the MANET region is 353m in this topology). For the ANSI network, however, increasing congestion in the routes will result in increased ant activity to discover fresh paths towards the destination. This increase in ant activity results in an increase in the number of broadcasts in the network as file size increases. For the AODV network, however, given the fact that AODV does not react to congestion until the route is broken, the rate of increase in the number of MAC layer broadcasts is negligible. Note that in this experiment, the majority of the traffic is borne by the highly capable node mesh, resulting in fewer route breakages in the MANET regions.
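The infrastructure shortcut described above can be sketched as follows; the data structures, node names, and function names are illustrative assumptions, not the ANSI code.

    # A highly capable node keeps (and exchanges over the wired mesh) a membership
    # list of the MANET nodes each infrastructure node services, and answers a
    # forward reactive ant on the destination's behalf instead of letting the
    # broadcast propagate any further.

    serviced_by = {                       # learned via proactive ants over the mesh
        'h81': {'m1', 'm2', 'm3'},
        'h82': {'m21', 'm22'},
    }

    def handle_forward_ant_at_infra(h, dest, visited_stack):
        for infra, members in serviced_by.items():
            if dest in members:
                # h already knows how to reach `infra` over the wired mesh, so it
                # returns a backward reactive ant and suppresses the broadcast here.
                return {'type': 'backward_ant',
                        'reverse_path': list(reversed(visited_stack)),
                        'via_infrastructure': (h, infra)}
        return {'type': 'forward_ant', 'rebroadcast': True}

    print(handle_forward_ant_at_infra('h81', 'm22', ['m3', 'h81']))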

The number of duplicate acknowledgments sent in the network is shown in Figure 3.7(c). The number of duplicate acknowledgments is an indication of either congestion in the network or the fact that TCP segments are being received out of order. Our observations can be explained as follows. For ANSI, the number of duplicate acknowledgments is greater than for AODV because of the congestion owing to proactive ant activity at the highly capable node's wireless interface, which interferes with all MANET nodes within a transmission radius from the center of the MANET region, where the highly capable node is located. This proactive ant activity at the highly capable node interferes with both data flow from the data senders and ACK flow to the data senders in the MANET regions, and also with data flow to the data receivers and ACK flow from the data receivers in the MANET regions. ANSI's reaction to congestion in the MANET region of the network (by sending forward reactive ants) also results in route fluctuation, which in turn causes out-of-order segment delivery at the TCP layer at the data receiver.

In Figure 3.7(d), we see that the number of route errors generated by ANSI is lower than the number of route errors generated by AODV, for the same reasons as in Experiment 1.

Experiment 3: Large hybrid network—effect of increasing APDU rate for an application using UDP

Figure 3.8 shows the results for the performance of ANSI and AODV in a larger hybrid network. The results are similar to the results of Experiment 1, shown in Figure 3.6, though the differences between ANSI and AODV performance are more pronounced in this experiment. We see that ANSI's APDU delivery metrics (see Figure 3.8(a)) are on average around 10% higher; ANSI also delivers these APDUs in roughly 1/3 as much time as AODV (see Figure 3.8(b)), with lower jitter (see Figure 3.8(c)) and fewer route errors (see Figure 3.8(d)). As with Experiment 1, we also see, in Figures 3.8(e) and 3.8(f), that these performance improvements come in ANSI at the cost of higher MAC layer resource consumption owing to ANSI's congestion-aware routing.

ANSI's performance in comparison to AODV is better in Experiment 3 than in Experiment 1 because in Experiment 3, ANSI is able to take advantage of proactive routing/stochastic routing in the highly capable nodes, which results in automatic load balancing for ANSI.

The reason why APDU delivery decreases more drastically (as APDU rate increases) for both ANSI and AODV in this experiment as compared to Experiment 1 (see Figure 3.8(a)) is the effect of a larger mobile area. In Experiment 1, the highly capable node in each mobile region is in the worst case 353m (250m × √2) away from a mobile node, which is 2 hops, but in Experiment 3, the highly capable node is in the worst case 707m away, which is 3 hops. Thus, the effects of traffic under high mobility scenarios weigh in more in Experiment 3 than in Experiment 1.

Finally, we note that the graphs for end-to-end delay, Figure 3.8(b), and jitter, Figure 3.8(c), are similar to the corresponding graphs for Experiment 1 for the same reasons.

In addition to the benefits of ANSI discussed in Experiment 1, the end-to-end delay and jitter characteristics are also favorably influenced by ANSI’s congestion-aware routing and stochastic routing, which result in automatic load balancing.

Experiment 4: Pure MANET—effect of increasing the size of the file sent using FTP over TCP

Figure 3.9 shows the results of the performance of ANSI and AODV as FTP over TCP traffic increases in the network. Both ANSI and AODV show comparable numbers of files fully downloaded (see Figure 3.9(a)), with ANSI showing either lower or comparable session duration as compared to AODV (see Figure 3.9(b)). We also see, as with Experiment 2, in Figure 3.9(c), a larger number of duplicate acknowledgments for ANSI, and a lower number of route errors for ANSI (see Figure 3.9(d)). However, we see that ANSI sends more MAC unicasts and broadcasts as compared to AODV (see Figures 3.9(e) and 3.9(f), respectively).

Discussion

In these five experiments, ANSI is able to perform better than or comparably to AODV owing to a combination of better route management and congestion-aware characteristics. In addition, in hybrid ad hoc networks, ANSI is able to harness the power of proactive/stochastic routing in immobile, highly capable nodes connected to each other over wired links. An insight we gain in these experiments is with regard to congestion-aware routing. We see that there are benefits to doing congestion-aware routing, but it comes at a cost of MAC layer resources. ANSI, being congestion-aware, results in fewer route errors, but uses a larger amount of MAC resources in general. Congestion-aware routing protocols will have to invalidate routes more often than their counterparts which are not congestion-aware. This property presents a trade-off between performance and finding congestion-free paths. When the traffic load increases, which is when performing congestion-aware routing is most useful, the above-mentioned balance is delicate. On the one hand, performing congestion-aware routing adapts the network to congestion, but congestion-aware routing also decreases the resources available to send data. ANSI is able to handle the tradeoff between performing congestion-aware routing and incurring excessive overheads, and thus has significant advantages over AODV in low traffic networks as well as in high traffic scenarios. However, we also see (in Experiment 3) that these positive aspects of performing congestion-aware routing are useful for UDP-based traffic, where out-of-order packet delivery is not an issue, but under TCP traffic, the congestion-awareness property can result in route fluctuation, causing increased load on the TCP layer at the data sender (by increasing the number of duplicate acknowledgments).

Another insight is regarding the use of TCP to study MANET routing protocols. Typically, the use of TCP over MANET is a divisive idea, with a lot of research leaning towards the position that TCP is a bad protocol to use over MANET [26]. Regardless of the stand taken by MANET transport layer researchers, from our results we see that studying TCP loads over routing protocols can lead to a better understanding of the behavior of the routing protocols. We investigate the merits of using TCP to evaluate unicast routing protocols for MANET in the next chapter.

Figure 3.6: Experiment 1: Performance studies of ANSI vs. AODV in a hybrid network with UDP flows.

Figure 3.7: Experiment 2: Performance studies of ANSI vs. AODV in a hybrid network with TCP flows. (a) Number of files fully downloaded. (c) Total number of duplicate ACKs.

Figure 3.8: Experiment 3: Performance studies of ANSI vs. AODV in a (larger) hybrid network with UDP flows.

Figure 3.9: Experiment 4: Performance studies of ANSI vs. AODV in a pure MANET with TCP flows. (a) Number of files fully downloaded. (c) Total number of duplicate ACKs.

Figure 3.10: Experiment 5: Performance studies of ANSI vs. AODV in a pure MANET network with UDP flows and a varying number of nodes.

Chapter 4 USING TCP TO EVALUATE ROUTING PROTOCOLS

In this chapter, we distinguish the practical use of TCP for networking operations from the use of TCP for evaluation. Specifically, we are concerned with whether or not using TCP as an evaluation tool for MANET routing protocols will yield new insights into protocol design, even though we readily agree with past research that using TCP as-is (that is, as designed for wired networks) for practical use over MANET is not a good approach. We are motivated by this idea because even though TCP does not function well over MANET, legacy issues dictate that TCP is “here to stay,” and the problem of TCP traffic over wireless networks is especially pertinent, given the merging of wireless networks and the Internet. In other words, while we agree that TCP is not practical for use in MANET, the reality is that some applications are going to use TCP, so it is important to get the best performance when TCP is used. In addition, the fundamental expectations TCP has of the network, such as congestion feedback mechanisms supported by a bi-directional path, fairness of bandwidth distribution in the presence of multiple flows, etc., are “universal” metrics which MANET researchers should strive for as well. Thus, by investigating the performance of MANET routing protocols under TCP traffic, we gain a different, “per-flow” understanding of routing protocols. Such an understanding of the routing protocols is not possible when we apply CBR/UDP loads for evaluation.

We are also motivated by the fact that the routing layer is the lowest protocol layer where a notion of destination exists (as compared to a notion of next hop). Thus, the lowest protocol layer to have explicit end-to-end purview (TCP) will be affected by the routing protocol. Consequently, deficiencies pertaining to the end-to-end behavior of the routing protocol can only be pointed out by using TCP flows for evaluation, and we should try to improve the routing protocol's behavior such that the routing protocol affects TCP performance favorably.

Lastly, we are motivated by the fact that the performance of a network running a UDP-based application varies fundamentally from the performance of a network running a TCP-based application for the same settings. Consider the following scenario. A robotic sensing node/sink pair is used for remote monitoring/survivor location in a rescue mission. In such missions, a robotic node is required to sample and collect data periodically about the robot's sensing area and send the collected data over to the sink, one APDU at a time 1. Note that we are not concerned with a file download application, but instead concerned with sending as many samples (APDUs), generated at the robot node, over to the sink, while pressuring the network minimally.

In our study in the above example scenario, two networks running AODV [53] in identical environments 2 are tested; these two networks differ only in the application used. For this scenario, we study the performance characteristics when sending data from 25 different robot nodes to 25 different sink nodes (data sources/sinks are pairwise distinct) using a UDP-based application and a TCP-based application.

1 When x APDUs/s are sent periodically, each APDU is sent from the application layer to the transport layer with an inter-APDU spacing of 1/x seconds. For example, for 4 APDUs/s being sent starting at t = 40s for a duration of 2 seconds, the 8 APDUs are sent at the following times: t1 = 40.00, t2 = 40.25, t3 = 40.50, t4 = 40.75, t5 = 41.00, t6 = 41.25, t7 = 41.50, t8 = 41.75, where ti is expressed in seconds and refers to when the i'th APDU is sent from the application layer to the transport layer.

2 The network is a 50-node network in a 1100m × 1100m terrain using 802.11b at the physical layer (with a 250m transmission range) at 2Mbps and 802.11 DCF at the MAC layer. The mobility model is Random Waypoint with a maximum speed of 20m/s, a minimum speed of 0.001m/s, and a pause time of 10s. Our AODV model used default parameters, processed HELLO messages (sent every 1s), performed local repair, and retried route requests twice.

Figure 4.1: AODV performance supporting TCP loads and UDP loads. (a) Number of APDUs received at the application data receiver. (b) APDUs sent by data sender. (c) 802.11DCF, unicasts sent (MAC-PDUs). (d) 802.11DCF, broadcasts sent (MAC-PDUs).

In either of the above described networks using UDP-based and TCP-based applications, the application layer periodically generates x APDUs/s of size 64 bytes intended for one data sink for a duration of 220s, starting at 40s into the simulation (we start the flows at 40s to ensure that the mobility in the network has randomized). The UDP-based application was CBR, and the TCP-based application was Super Application, a generic application layer protocol which simulates CBR-like traffic generation but uses TCP. The experiments were performed for a simulated time of 5 minutes.

The performance results of this study in our example scenario are shown in Figures 4.1(a)–4.1(d). For this scenario, we see how the network using the UDP-based application consumes more MAC resources (shown as the number of MAC layer unicasts and broadcasts, Figures 4.1(c) and 4.1(d), in terms of the number of MAC-PDUs) for the smaller number of APDUs received at the sink's application layer (see Figure 4.1(a)). The UDP-based application also ends up receiving an increasingly lower percentage of the APDUs sent as the application layer APDU rate increases, despite the increase in the number of APDUs sent by the senders (see Figure 4.1(b)). The network using the UDP-based application sends more APDUs as the APDU rate increases because UDP-based applications do not perform flow/congestion control. For the same reasons, UDP-based applications also choke the network, making inefficient use of MAC resources, judging by the amount of MAC resources consumed per APDU delivered to the application layer. Thus, as the APDU sending rate increases, UDP's use of network resources comes with heavy penalties. The network using the TCP-based application, on the other hand, places loads into the network carefully—by gauging the amount of data actually received at the receivers, the network using the TCP-based application regulates the rate at which APDUs are pumped into the network and buffers the rest. Owing to this property, the network using the TCP-based application controls congestion better and, by the same token, makes more efficient use of MAC resources. Note that if the TCP-based data senders have enough energy for transmission and the network were operating for longer, TCP would eventually complete sending all the APDUs to the sink, provided the TCP connection between the sender and receiver does not break. In Figure 4.1(a), the reason why the network using the TCP-based application delivers fewer APDUs than generated (220×25×x) is that the network is increasingly congested as the APDU generating/sending rate increases, and TCP, by performing flow/congestion control and send/receive buffer blocking, sends fewer APDUs into the network, as we can see in Figure 4.1(b).

From the above study, we see that (a) as far as the communication between the robot node and the sink is concerned, using an application over TCP will serve the robot node/sink pair's function better, and (b) the UDP approach to evaluating routing protocols will force designers to assume the “worst case” during routing protocol design. These observations motivate us to study this scenario further, and they form the basis of our simulation experiments, described in Section 4.2 of this chapter.

The rest of the chapter is organized as follows. In the next section, Section 4.1, we discuss how the MANET environment affects TCP and how TCP affects the routing protocol in a MANET, followed by the metrics we can study when using TCP. In Section 4.2, we discuss our experimental results and the insights obtained when using TCP over MANET routing protocols.

TCP and the routing protocol

How are MANET routing protocols affected under TCP loads?

During the connection establishment process, TCP expects a bi-directional path to be set up and maintained (by the routing protocol) for the duration of the connection. This bi-directional path allows for the smooth flow of data/ACKs and the maintenance of end-state information at the end points of the connection. Routing protocols that establish connections only in one direction thus incur additional overheads under TCP loads and increase the setup time for the TCP connection. Reactive routing protocols should set up the path in both directions (source↔sink). With proactive protocols, this path setup problem is not an issue, but both classes of protocols should maintain the path and react fast enough so that TCP, operating above the routing layer, does not lose data (APDUs)/ACKs due to frequent congestion (caused by hotspots) or frequent path breakages common in a MANET environment. If the routing protocol does not react fast enough, TCP performance suffers.

For example, if a link breaks in the path between source←sink during a connection, ACKs will be lost. TCP at the data source responds by reducing the congestion window (because TCP at the data source perceives congestion in the network), so fewer APDUs are sent into the network. Of course, if the routing protocol responds quicker than TCP in recognizing congestion or link breakage, then TCP is oblivious to the problem, because the routing protocol has repaired the path. Otherwise, TCP's response of congestion window reduction will reduce the throughput of the connection. Thus, a routing protocol not reacting quickly enough to topology change and congestion exposes TCP to the adverse effects of link breakage and congestion in the network.

Recall our robotic sensing/rescue mission scenario discussed in the introduction. In this network, consider a robot data sender, S, generating application data intended for a sink (receiver), R, at the rate of x APDUs/second using a TCP-based application. At S, TCP buffers these application layer APDUs (sent by S's application layer) in the TCP connection's send buffer. TCP has to first set up the connection between S ↔ R before the transmission of application layer data can begin. A TCP connection is set up between S and R as soon as a bi-directional route is computed (by the routing protocol at S) between S and R, and the three-way handshake has completed. At the time when the application layer requests a transmission at S, if the routing layer at S does not have a bi-directional route between S and R, the routing protocol at S has to compute the bi-directional route before a connection can be set up between S and R. As soon as a connection is set up using TCP between S and R, TCP at S is able to send data to R.

We explain the problems in a MANET running TCP-based applications using the following scenario. In the above setup, consider an application at S that sends APDUs to R at a uniform rate of x APDUs/second starting at t = t0 seconds and ending at t = te seconds. Consider that at time t = t1 ∈ (t0, te) seconds, the TCP connection at S has not received an ACK from R within the RTO. TCP mechanisms at S respond by retransmitting the unacknowledged segment and resetting the RTO to a larger value, say, r1 seconds. If the segment is unacknowledged again, then the TCP mechanisms will further increase the RTO to r2 = 2×r1 seconds. If the segment is not acknowledged again (within t1 + r1 + r2 seconds), then the retransmission timer is doubled again, until an upper bound of 64 seconds. After the final retransmission timeout period expires with the unacknowledged segment still unacknowledged, the TCP connection is dropped. However, RTO values can change depending on how long the transmitted segments take to be acknowledged [51] (because the time taken for acknowledgment affects the RTT calculation, which in turn affects the RTO calculation). In the case of MANET, these values can vary drastically across the network. In the case where the RTO is a large value, say, 32 seconds, the TCP connection takes a shorter time to be dropped.
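The RTO doubling behavior described above can be sketched as follows; the retry limit is an illustrative assumption (TCP implementations bound the number of retransmissions), while the 64-second cap comes from the text.

    # The RTO doubles after each unacknowledged retransmission, capped at 64 seconds.

    RTO_CAP_S = 64.0

    def rto_schedule(initial_rto_s, max_retries=6):
        """Return the successive RTO values used before the connection is dropped."""
        rtos, rto = [], initial_rto_s
        for _ in range(max_retries):
            rtos.append(rto)
            rto = min(2 * rto, RTO_CAP_S)
        return rtos

    print(rto_schedule(1.0))    # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
    print(rto_schedule(32.0))   # [32.0, 64.0, 64.0, ...]: the cap is hit after one doubling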

In addition to the above, two issues play a key role in determining the performance of TCP-based applications in MANET. First, for an already established connection, the congestion in the network increases as the value of x increases (up to a certain point, depending on the congestion in the network). This increase in congestion makes connection establishment increasingly difficult for other robot data source/sink pairs, because the wireless medium is a shared medium in which increasing traffic along one wireless hop involves not only the nodes directly engaged in the hop-by-hop transaction, but (a) all nodes within a one-hop radius of the sender, and (b) all nodes within a one-hop radius of the receiver. Apart from the above, (a) the interference range may extend to nodes one hop away from the neighbors of both the sender and receiver, and (b) the neighbors of the sender and receiver are unavailable for transmission with other nodes owing to the fact that they are exposed terminals.

Second, the mobility of the nodes makes connection management difficult, and may result in incorrect RTO settings. This problem arises due to the mobility profile of the nodes in the MANET, which may result in temporary network partitions, during which time the receiver is not able to acknowledge TCP segments, even though the receiver might have received the TCP segment. In this case, the data sender assumes that the network is congested, even though the network may not be, and increases the sender's TCP RTO value and reduces the sender's TCP congestion window, reducing the throughput of the connection.

Some metrics for studying TCP loads over MANET routing

Using the insight gained from the issues discussed above, we try to answer the following questions:

Q.1: What metrics of a network using a TCP-based application do we study to understand what aspect(s) of the routing protocol?

Q.2: Why can we not study these metrics of the routing protocol by using UDP loads?

Q.3: Why are those metrics important?

We address Q.1 on a case-by-case basis. For Q.3, we note that the expectations which TCP has of a network are “universal” metrics which routing protocols for MANET should also strive to provide for the MANET transport layer. Thus, if TCP unearths some problems in the routing protocol, these are problems which the routing protocol designer should try to address for effective routing. For Q.2, we note that by using TCP loads, we are able to study “per-flow,” end-to-end metrics in the context of reliable and fair data delivery, which cannot be studied under CBR/UDP loads.

We now discuss some metrics for studying TCP loads over routing protocols for MANET.

Fairness index of end-to-end metrics

Under TCP flows, measuring end-to-end metrics such as the number of APDUs received at the sinks, the number of APDUs sent at the senders, and the end-to-end throughput, and calculating the fairness index of these metrics, helps us understand how TCP reacts to the network conditions. This study in turn helps us understand how the routing protocol has assisted TCP operations.

The Jain fairness index [37] of a sample X = {x_1, x_2, ..., x_n}, denoted F_X, is defined as follows:

F_X = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \cdot \sum_{i=1}^{n} x_i^2}

such that F_X ∈ (0, 1]. An F_X close to 1 indicates that the measured metric has high fairness.
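A direct transcription of this formula is shown below; the same helper can be applied to per-sink APDU counts, per-sink throughput, or (as discussed later for CTC) per-sender connect times. The sample values are illustrative.

    # Jain fairness index of a sample of per-flow measurements.

    def jain_fairness(xs):
        xs = list(xs)
        denom = len(xs) * sum(x * x for x in xs)
        return (sum(xs) ** 2) / denom if denom else 0.0

    print(jain_fairness([100, 100, 100, 100]))       # 1.0   -> perfectly fair
    print(round(jain_fairness([400, 5, 3, 2]), 3))   # 0.263 -> one flow dominates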

Some end-to-end metrics that can be measured for fairness are:

1. The number of APDUs received at the application layer data sinks, measured across all application layer data sinks. The fairness index of this metric refers to how the number of APDUs received is distributed across sinks. A large variation in the APDUs-received metric across receivers will result in low fairness.

2. The throughput at the data sinks, measured across all application layer data sinks. The fairness index of throughput at the data sinks refers to how the variation of throughput across data receivers is distributed. If receivers across the network experience wide variations in throughput, the measured fairness index is small.

The fairness index of the observed metrics across the data sinks/sources (the ends of the TCP connection) should be as high as possible. High variation in the number of received APDUs, throughput, etc. results in low fairness indices for these quantities and suggests that the TCP flows have not “fairly” divided the bandwidth among themselves, thus indicating the failure of the routing protocol to do load balancing. Information such as IP queue length may be used by the routing protocol to efficiently divide the flows in the network and prevent the creation of hotspots.

In addition, when measuring the above metrics over several trials of an experiment, longer confidence intervals indicate that the protocol shows wide variation in the fairness index from one scenario to another.

Connect time is the duration from the time when the application layer at the data sender requests TCP for a connection to a data receiver to the time when the connection between the data sender and the data receiver is established. TCP connect times have been used before to evaluate the performance of routing protocols under (modified) TCP in a MANET, for example, in [24]. The authors in [24] employ a fixed RTO (retransmission timeout) scheme to help the MANET environment distinguish between node unreachability (due to temporary partitions) and congestion at the next hop. In using unmodified TCP as a tool for evaluating routing protocols for MANET, low connect times indicate a routing protocol's resilience to both congestion and node unreachability, under which conditions TCP does not spend too much time in connection establishment.

The Jain fairness index formula can also be used to compute the dispersion of the connect times at data senders. We call this metric Connect Time Closeness (CTC). In the ideal case, the CTC is as close to 1 as possible; that is, the connect times are all equal. A CTC of 0 is measured when only one connection is made while the others are not. That is, the connect time for one flow is finite, but the connect time for the other flows is infinite.

As before, a smaller length of the confidence intervals of the connect time and CTC metrics indicates how consistent the protocols are over several trials of the same experiment.

Measuring the rate and extent of the congestion window growth and studying the congestion window versus simulation time graph of the connection allows us to accurately study the congestion along the connection for the duration of the connection.

Measuring congestion windows for routing protocols has been done previously in the literature, for example, in [24]. TCP starts a session using slow start (exponential snd_cwnd 3 growth until snd_cwnd > ssthresh), but TCP reacts to (perceived) segment loss by switching to congestion avoidance (by manipulating ssthresh, resulting in linear snd_cwnd growth). In Figure 4.2, we see how both ANSI (see Chapter 3), a congestion-aware, hybrid, ant-based protocol, and AODV [53] have switched to congestion avoidance almost immediately after the start of the TCP session (40 seconds).

In Figure 4.2, we see that ANSI is able to steadily grow its snd_cwnd to the maximum (16384 bytes), while AODV suffers from repeated losses (as seen from the number of timeouts, which is more than 10). Note that ANSI does not suffer even one timeout during the entire simulation (the TCP connection closes at 260 seconds). ANSI is able to maintain a non-congested route from the data source to the data sink consistently and efficiently, owing to its congestion-aware behavior, thus steering IP away from using next hops and paths which are congested. AODV, however, reacts too slowly to congestion and link breakage, thus exposing TCP to the issues due to link breakage and congestion.

3 TCP controls the rate and extent of its congestion window growth by manipulating the send congestion window, snd_cwnd, and the slow start threshold, ssthresh.

Figure 4.2: ANSI vs. AODV: congestion window growth for one TCP sender for the outgoing (headed towards the data sink) stream in a hybrid network containing both fixed infrastructure (with multiple interfaces, 802.11b and Ethernet) and MANET nodes.

If the congestion window growth under a routing protocol shows a predominantly slow-starting TCP, then the network using the routing protocol is not congested. On the other hand, congestion avoidance behavior can also indicate low congestion levels if the congestion window grows to the maximum. Frequent timeouts (resetting the congestion window to 1 MSS) indicate severe congestion and link breakage in the network, and point to the lack of resilience of the routing protocol.
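The snd_cwnd regimes discussed in this subsection (slow start, congestion avoidance, and timeout resets) can be sketched as follows; the event sequence and the 8·MSS initial ssthresh are illustrative assumptions, not traces from the Figure 4.2 simulation.

    # snd_cwnd growth: exponential in slow start (cwnd < ssthresh), linear in
    # congestion avoidance, and a reset to 1 MSS on a retransmission timeout.

    MSS = 512
    MAX_WND = 16384

    def evolve_cwnd(events, ssthresh=8 * MSS):
        cwnd, trace = MSS, []
        for ev in events:                        # one event per round trip
            if ev == 'timeout':
                ssthresh = max(cwnd // 2, 2 * MSS)
                cwnd = MSS                       # severe congestion: back to 1 MSS
            elif cwnd < ssthresh:
                cwnd = min(cwnd * 2, MAX_WND)    # slow start
            else:
                cwnd = min(cwnd + MSS, MAX_WND)  # congestion avoidance
            trace.append(cwnd)
        return trace

    print(evolve_cwnd(['ack'] * 6))
    # [1024, 2048, 4096, 4608, 5120, 5632]: exponential, then linear past ssthresh
    print(evolve_cwnd(['ack', 'ack', 'timeout', 'ack']))
    # [1024, 2048, 512, 1024]: the timeout resets cwnd to 1 MSS (512 bytes)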

Studying TCP for some routing protocols

Network and protocol models

The ANSI and AODV protocol models and the physical and MAC layer characteristics are identical to those used in Section 3.2.

We performed four experiments. In all experiments, we used QualNet's Super Application, which is a generic application used to generate a constant rate of TCP-based application layer traffic between a source and a sink (several APDUs/s). The APDUs are all 64 bytes in size. In Experiments 1, 2 and 3 we used 25 concurrent TCP streams under Super Application in QualNet. In Experiment 4, we studied the effect of existing background traffic, by using CBR/UDP-based flows, on the connect times for TCP-based application sessions. We used TCP-NewReno with maximum send/receive buffers of 16384 bytes.

In all the experiments, 50 nodes with one IEEE 802.11b interface were used.

In Experiment 1, 50 MANET nodes are uniformly placed in a square terrain of 1100m×1100m. The random waypoint minimum speed was set at 0.001m/s and the maximum speed was set at 20m/s, with a pause time of 10s. A non-zero value for the minimum speed was chosen to alleviate the problems due to speed decay in the random waypoint model. The experiment was simulated for a total time of 5 minutes. In this experiment, we varied the APDU rate of the application from 1 APDU/s to 20 APDUs/s in steps of 1 APDU/s.

In Experiment 2, 50 MANET nodes are uniformly placed in a square terrain of 1100m×1100m. The random waypoint speed was varied from no mobility (static) to 20 m/s in steps of 4 m/s. To alleviate the problems due to speed decay and to understand the effects due to mobility, we set both the minimum and maximum speed in the mobility model to be the same. The pause time was set at 10s. The experiment was simulated for a total time of 5 minutes, and the applications generated APDUs to be sent to the data receiver at the rate of 10 APDUs/s.

In Experiment 3, 50 nodes are uniformly placed in a terrain of varying size. We varied the size of the network from a square network to an increasingly rectangular one (aspect ratio of height to width). The connect time for a later connection y (y > x) is greater than the connect time for connection x, because connection y is established in the context of a larger number of TCP flows: connections 1, ..., (y−1) have already started sending data. Ideally, the connect times for all TCP connections should be as close to each other as possible. We measure this closeness of connect times using the CTC metric.

For Experiment 4, we measure the connect times under constant background UDP traffic, but chronologically order the TCP connections (and the durations of data flow using TCP) in such a way that one TCP connection does not interfere with another, so CTC is not an appropriate metric in Experiment 4.

We performed 20 trials of all our experiments (to account for stochastic aberrations). We report our findings below with graphs showing the average and 95% confidence intervals of all the observed metrics.

Simulation results

Figures 4.3(a) – 4.3(d) show the results of Experiment 1; Figures 4.4(a) – 4.4(d) show the results of Experiment 2; and Figures 4.5(a) – 4.5(d) show the results of Experiment 3.

We explain our observations below.

APDUs received and fairness of APDUs received

From the figures for Experiments 1, 2 and 3, we see that, in general, ANSI is able to deliver more APDUs to the receiver in the presence of multiple TCP streams (see Figures 4.3(a), 4.4(a), and 4.5(a)). In addition, ANSI has a higher or comparable fairness index of the number of APDUs received across the data sinks as compared to AODV (see Figures 4.3(b), 4.4(b), and 4.5(b)).

As the APDU sending rate increases, the congestion in the network increases, thus both protocols show a decreasing percentage of the APDUs being delivered (see Figure 4.3(a)).

As the possibility of MAC layer collisions 4 increases, despite the routing protocol's best efforts, it becomes increasingly difficult to distribute network resources evenly amongst flows.

4 MAC layer collisions occur at a node x due to the physical layer interference of signals transmitted simultaneously in the neighborhood of node x.

Figure 4.3: Experiment 1: Performance of ANSI and AODV with increasing APDU sending rate. (a) Number of APDUs received at the application receiver. (b) Fairness index of the number of APDUs received at the application receiver. (c) Connect time at TCP senders.

This reason is why the fairness index decreases as the APDU sending rate increases (see Figure 4.3(b)).

In Figure 4.4(a), when the speed increases, we see that the number of APDUs received decreases slightly for ANSI and increases slightly for AODV. The increasing rate of link breakage (with increasing speed) has a different effect on each protocol. For ANSI, this increase in link breakage results in more aggressive congestion-aware routing, thus increasing hop length but decreasing APDU delivery. For AODV, this increase in link breakage translates to more aggressive route discovery activity, improving the chances of APDU delivery. ANSI shows an improvement in fairness as speed increases (see Figure 4.4(b)) owing to more aggressive congestion-aware routing (which helps with load balancing), and AODV shows an improvement in fairness because of more aggressive route discovery activity.

Figure 4.5 shows the performance of ANSI and AODV when the network terrain changes from highly rectangular to a square terrain. In Figure 4.5(a), we see that as the network becomes squarer, both ANSI and AODV are able to deliver more APDUs owing to the shortening of the number of hops to the data sink. This path shortening also makes more nodes available between a source and sink to perform routing activities. For the same reason, as the network becomes squarer, we see an improvement in the fairness index (see Figure 4.5(b)).

Connect times and CTC of connect times

For experiments 1 and 2, ANSI and AODV are comparable in terms of connect times (Figures 4.3(c) and 4.4(c)) and closeness of connect times (Figures 4.3(d) and 4.4(d)). For Experiment 3, ANSI shows higher connection times (see Figure 4.5(c)).

For both ANSI and AODV, connect times increase as the APDU sending rate increases (see Figure 4.3(c)). The above observations can be explained as follows. Congestion increases in the network when the APDU sending rate increases. As speed increases, both ANSI and AODV react to route breakages by increasing the amount of route discovery activity. This increase in route discovery activity increases the congestion in the network, resulting in larger IP queue sizes at the nodes in the network, which in turn increases the connect times for both ANSI and AODV. This congestion is also the reason why both ANSI and AODV show a slight decrease in the CTC of connect time as the APDU rate increases (see Figure 4.3(d)). Note that both ANSI and AODV show a comparable and low CTC (around 0.2). The results for the CTC also indicate that in both the ANSI and AODV networks, APDU rate affects the connect time dispersion minimally.

Figure 4.4: Experiment 2: Performance of ANSI and AODV with increasing speed. (a) Number of APDUs received at the application receiver. (b) Fairness index of the number of APDUs received at the application receiver. (c) Connect time at TCP senders.

When the speed increases, connect times reflect how aggressively either protocol performs route discovery activity to repair the routes broken due to topological fluctuation. For ANSI and AODV, as the network becomes more mobile compared to zero mobility, route discovery activity increases and the traffic in the network is more “spread out,” resulting in a drop in connect times (see Figure 4.4(c)). But thereafter, increasing mobility places more pressure on the MAC layer resources owing to the increase in route discovery activity, resulting in longer connect times for both ANSI and AODV. Regardless of this fact, in Figure 4.4(d), as with Experiment 1, we see that both ANSI and AODV show comparable and low CTC (around 0.2) as speed increases, indicating that speed affects the connect time dispersion minimally.

(a) Number of APDUs received at the application receiver.

FI (APDUs rcd by L5) ANSI

(b) Fairness index of the number of APDUs re- ceived at the application receiver.

(c) Connect time at TCP senders.

Figure 4.5: Experiment 3: Performance of ANSI and AODV with increasing squareness of terrain. comparable and low CTC (around 0.2) as speed increases, indicating that speed affects the connect time dispersion minimally.

Figure 4.5 shows the performance of ANSI and AODV when the network terrain changes from highly rectangular to square. When the network is very rectangular, both the ANSI network and the AODV network show large connect times, owing to the long path lengths (see Figure 4.5(c)). However, rectangular networks show a higher fairness in connect times compared to squarer networks (see Figure 4.5(d)), because in a squarer network, paths tend to cross each other more than in a rectangular network, leading to increased contention between the TCP flows. In a rectangular network, the chance that one TCP flow does not affect any other TCP flow is larger.

Figure 4.6 shows the results of Experiment 4, where we study the change in connect time as the number of background CBR/UDP flows in the network changes. As far as average connect times go, ANSI and AODV are comparable, as we see in Figure 4.6(a). As the number of CBR-over-UDP background flows increases from zero, the connect times improve initially because the CBR activity helps the TCP sessions owing to pre-computed routes at some of the data sender/receiver pairs, thus reducing the average connect time. However, as the traffic in the network increases, so does the congestion, and this increases the connect time, owing to increasing average IP queue lengths in a network with higher traffic. Note that all the average connect times are below 40 s, and the TCP sessions do not send much data (1 byte of data), and thereby likely close the connection before the next TCP connection is attempted.

In Figure 4.6(b), we study where the quartiles for the connect times measured in this experiment lie. Note that we measured a total of 1000 connect times (20 trials × 50 measurements/trial). The graph in Figure 4.6(b) shows the median connect time, with errorbars spanning the lower quartile (Q1) and the upper quartile (Q3). We see that the interquartile range (IQR) for both ANSI and AODV lies below the average in all cases, indicating that at least 75% of the measured connect times for both ANSI and AODV are below the average connect time. Further, we also see that even though the average connect times for ANSI and AODV are comparable, the IQR plot in Figure 4.6(b) shows that at least 75% of AODV's connect times lie below ANSI's IQR, indicating better connect times for AODV in the general case. These observations can be explained as follows. For ANSI, the congestion-aware property, which results in better performance for UDP, also increases MAC layer resource consumption (see Section 3.2). This increase in resource consumption increases the average IP queue length in ANSI, but this increase is "spread" over the network. This property of ANSI improves the APDU-received characteristics, as we saw in Experiments 1, 2 and 3, but the congestion-awareness of ANSI also increases the average path length of the TCP control segments (SYN, SYN/ACK and ACK), resulting in increased connect times.
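The quartile statistics plotted in Figure 4.6(b) can be reproduced from the pooled connect-time samples along the lines of the sketch below; the helper name is illustrative, and the quartile convention (linear interpolation, Python's default "exclusive" method) is an assumption since the dissertation does not state it.

import statistics

def connect_time_summary(samples):
    # `samples` is the pooled list of measured connect times
    # (20 trials x 50 measurements/trial = 1000 values per data point).
    q1, median, q3 = statistics.quantiles(samples, n=4)  # Q1, median, Q3
    return {"Q1": q1, "median": median, "Q3": q3, "mean": statistics.fmean(samples)}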

Figure 4.6: Experiment 4: Connect times for ANSI and AODV under an increasing number of background UDP flows. (b) Interquartile range for connect times. The plotted point is the median of the observed values, and the errorbars indicate the lower quartile and the upper quartile.

Discussion

Our experiments have unearthed some insights into the working of two routing protocols for MANET, ANSI and AODV. We note that these insights cannot be obtained using applications that use UDP at the transport layer. Even though TCP is widely agreed to perform poorly in multihop wireless networks, the merging of the wireless and Internet domains dictates that some applications are going to use TCP at the transport layer, making our results applicable to current research.

More APDUs are received at the Super Application data sink in ANSI as compared to AODV in our experiment scenarios. We see that the congestion awareness of ANSI does result in better or comparable fairness of the number-of-APDUs-received metric in an ANSI network in our experiment scenarios. The fact that ANSI is better than or comparable to AODV with respect to the fairness of the APDU-received metric indicates that the ANSI network is able to divide the network resources more evenly between flows.

Recall that the congestion-awareness property of ANSI improves the performance metrics of ANSI in the UDP-based experiments we discussed in Chapter 3. In the same chapter, however, we saw that congestion-awareness does not provide any benefits to ANSI when running TCP-based file downloads, and results in TCP running over ANSI being pressured more. However, in this chapter, we see that congestion-awareness does help ANSI in terms of improving the number of APDUs delivered and the fairness of APDU delivery, in the context of the Super Application sending x APDUs/s.

The above-described discrepancy can be understood as follows. In Chapter 3, the experiments using FTP over TCP load the network differently, sending 1 APDU (the size of which is the file size), as compared to our experiments in this chapter, where the Super Application running over TCP sends x APDUs/s. TCP running under the Super Application thereby loads the network periodically, while TCP running under FTP loads the network in one spurt, trying to send as many TPDUs as possible (subject to TCP's congestion control and flow control mechanisms) until the entire file is sent. For the experiments in this chapter, thereby, each node is pressured only when there is data to be sent (every 1/x seconds). In addition, for Experiments 2 and 4 in Chapter 3, we expect TCP to send 1 MSS worth of data (until the last TPDU is to be sent) several times a second, but in this chapter, several APDUs from the application layer can be grouped into one TCP segment of 1 MSS size, because the APDUs are small (64 bytes).

Though we see that ANSI is able to perform better, and with comparable or higher fairness than AODV, with respect to APDUs received owing to congestion-awareness, we see that both protocols are not up to the mark when it comes to connect time metrics, a measure of how responsive the protocols are. Both protocols show wide fluctuations between receivers in the network, as seen by the larger averages as compared to the values measured at at least 75% of the data senders (see Figure 4.6(b)), given the low and comparable CTC values and the wide and comparable confidence intervals for the corresponding values for both protocols. That is, most receivers show low connect times, while others show significantly larger connect times. Additionally, we see that while ANSI's congestion-aware property is useful in sending CBR/UDP APDUs quicker, as we saw in Section 3.2, ANSI affects TCP-based applications differently, and results in a TCP-based application over ANSI showing larger connect times.

Lastly, when the network is very rectangular, the CTC of connect time is higher than the fairness of APDU delivery (though the connect times themselves are very high), and the condition reverses with increasing squareness of the network. Further, we see that even though ANSI's congestion-awareness helps ANSI's cause, the property also increases the average path length of the NPDUs in the network, thereby increasing connect times. This condition is difficult for protocol designers to work around.

PIDIS: A PACKET DELIVERY IMPROVEMENT SERVICE FOR MANET

In this chapter, we address the problem of improving the multicast packet delivery ratio in mobile ad hoc networks via a protocol-independent packet delivery improvement service that can be incorporated into any ad hoc multicast routing protocol. The service, Protocol-Independent packet Delivery Improvement Service (PIDIS), uses the mechanisms of swarm intelligence to decide from where to recover unreceived packets.¹ Notice that PIDIS itself is not a reliable multicast protocol, but a service which improves the multicast packet delivery of an ad hoc multicast protocol that incorporates PIDIS.

The SI mechanisms in PIDIS comprise an adaptive search mechanism that enables PIDIS to quickly converge to good candidate routes (leading to other group members) through which unreceived multicast packets can be recovered with the greatest probability, while discovering alternate routes for packet recovery to adapt to changing packet delivery patterns and network topology. This technique differs from the AG approach [13], which uses information about group members, rather than routes, that have been effective in recovering unreceived packets.

1 Since we deal with network layer problems exclusively in this chapter, we use the word "packet" to denote a network layer PDU (NPDU) which contains one application layer PDU (APDU). When using CBR/UDP loads, if the CBR APDU size is less than the MTU of the interface, we can assume that each APDU generates one NPDU.

PIDIS is a gossip-based² service that is adaptive to network usage and may gossip several times for unreceived packets. PIDIS continuously gauges the network conditions to control the extent of gossiping and the number of gossips sent for an unreceived packet. PIDIS does not depend on membership views, either partial or total. Also, PIDIS is concerned with learning which next hop nodes give better packet recovery ratios when gossiped with, rather than learning which member nodes (when gossiped with) help recover the most packets (such as the use of the member cache in AG). Thus, in PIDIS, the extent and number of gossip packets are restricted by choosing from a focused set of next hop nodes as gossip partners. Using the SI mechanisms in PIDIS, gossip messages are treated as ants, and valuable information collected during the gossip request phase is processed when the gossip returns as a gossip reply. The effectiveness of PIDIS, as shown by the simulation results, is attributable to the efficient learning capability of swarm intelligence.

The rest of this chapter is organized as follows. In Section 5.1, we give a brief overview of PIDIS and a detailed description of PIDIS as implemented over ODMRP. In Section 5.2, we present a detailed simulation study of PIDIS.

PIDIS: A packet delivery improvement service

Overview of ODMRP

On-Demand Multicast Routing Protocol (ODMRP) [45] is an on-demand, mesh-based multicast protocol that attempts to establish a forwarding group only when a source of the group has multicast data to send. A (group) member node is any node which is an intended recipient of the multicast data sent from source s to group g. A forwarding group node for an s/g pair in ODMRP is any node in the network which has set the FG FLAG (see Algorithm 3) for the s/g pair. A mesh node in ODMRP is any node that is either a source s of the multicast data, a member of the multicast group g, or a forwarding node for the s/g pair.

In ODMRP, the nodes in the forwarding group form a mesh that connects the group members with the sources. The ODMRP algorithm at the multicast source is shown in Algorithm 1 and shows how a source s initiates the creation of the multicast mesh over the geographical area surrounding the source and member nodes by sending a JOIN QUERY message. The ODMRP operations at source s create a multicast mesh in which the data packets from the source are flooded. Because the number of nodes in the mesh is expected to be smaller than the number of nodes in the network, the extent of flooding in the network is controlled, while the data redundancy benefits of flooding are retained. To adapt to mobility, the ODMRP operations at source s periodically send JOIN QUERY messages, and new nodes are included in the mesh as old mesh nodes move away from the geographical region surrounding the source and member nodes.

The ODMRP algorithm at group members is shown in Algorithm 2. It shows how the multicast members act as the "end-points" of the JOIN QUERY messages sent by the multicast data source, s.

Algorithm 2 ODMRP operations at any group member m
if (JOIN QUERY received from i at m) then
    construct JOIN REPLY setting upstream node ID = i;
    broadcast JOIN REPLY;
else if (data packet received) AND (FG FLAG set for source/group pair) then
    rebroadcast data packet;
end if

The ODMRP algorithm at all other nodes (nodes which are neither sources nor group members) in the network is shown in Algorithm 3. Consider one such node, node i. Node i engages in a two-step process to ascertain whether node i will be part of the forwarding group. First, every node in the network forwards the JOIN QUERY, but only nodes that are "good" candidates for forwarding group nodes forward the JOIN REPLY and subsequent data packets. Note that a group member may also be a forwarding group member if the member happens to be "inside" the multicast mesh, which happens when the member node forwards the JOIN REPLY.

Algorithm 3 ODMRP operations at any other node i except the group members
if (JOIN QUERY received from node j) then
    store the route from j to i in the routing table at i;
    rebroadcast JOIN QUERY;
else if (JOIN REPLY received from node j) then
    k = upstream node ID in JOIN REPLY;
    if (k == i) then
        set FG FLAG for source/group pair;
        rebroadcast JOIN REPLY with upstream node ID i;
    end if
else if (data packet received) AND (FG FLAG for source/group pair is set) then
    rebroadcast data packet;
end if

Overview of PIDIS

PIDIS is a persistent packet recovery service: packet recovery attempts in PIDIS may be made more than once, and the number of times a recovery attempt may be made is bounded and adaptive.

An implementation of PIDIS over a multicast routing protocol works as follows.

1. An (unreliable) multicast routing protocol, χ, delivers packets to a member node i, and

2. the PIDIS service "kicks in" at member node i to fetch the packets which χ has not been able to deliver to node i.

PIDIS, thus, provides services to the multicast protocol directly, i.e., at the network layer.

We describe an implementation of PIDIS over ODMRP, termed ODMRP+PIDIS. However, PIDIS can be implemented over any other multicast routing protocol with minimal changes from the implementation over ODMRP. We note that while PIDIS is a service, ODMRP+PIDIS is a protocol, in which the PIDIS service is implemented as a service to ODMRP.

ODMRP+PIDIS works as follows. When packets belonging to a source/group pair, s/g, are not received at a member node i,

1. Node i transmits a gossip request packet (GREQ) to recover the unreceived packets. The GREQ is transmitted to a chosen one-hop destination, the gossip next-hop λ, given by Algorithm 4 (Section 5.1.4).

2. At any intermediate node X which receives the GREQ:

(a) The ID of node X is recorded in the GREQ.

(b) If node X is not an ODMRP mesh node, node X discards the received GREQ.

(c) If node X is an ODMRP forwarding group node, but not a member or source node, node X forwards the GREQ to a newly chosen λ as per Algorithm 4.

(d) If node X is a member or source node, then node X checks if it has the unreceived packets. If node X does not have any of the unreceived packets, node X forwards the GREQ to a newly chosen node λ as per Algorithm 4.

(e) If node X is a member or source node, and if node X has any of the unreceived packets which were reported unreceived by node i, then node X recovers the packets from X's cache (see Section 5.1.3) and prepares a gossip reply packet (GREP) for each packet found. Each GREP backtracks the path of the GREQ to node i. The GREQ is then discarded at node X.

3. At each node y the GREP visits on the route back to node i, the hop previous to node y, node z, is remembered (at node y) as a useful hop to gossip with when there are unreceived packets from the source/group pair s/g. This process maintains the Gossip Table (described in detail in Section 5.1.3), a data structure used to maintain information about which next hop nodes were useful in fetching GREPs.

4. If GREP packets intended for the source/group pair s/g are not received at node i after a timeout period, PIDIS GREQ TIMEOUT, node i may initiate another GREQ for the unreceived packets, depending on a value, c_s^g, which is the number of times node i will gossip for unreceived packets from s/g. The value of c_s^g is chosen according to the methods described in Section 5.1.7 (a minimal sketch of this retry loop follows the list).
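To make step 4 concrete, the following Python sketch shows one way the member-side retry logic could be structured. The names (send_greq, missing_packets, process_incoming_greps) and the timeout value are illustrative assumptions, not identifiers from the dissertation; only the bounded, timeout-driven retry behaviour is taken from the text.

import time

PIDIS_GREQ_TIMEOUT = 2.0   # seconds; illustrative value only

def recover_missing(member, source, group, c_sg):
    # Gossip up to c_sg times for the packets of s/g still missing at `member`.
    # c_sg is the adaptive per-(source, group) retry bound of Section 5.1.7.
    for attempt in range(c_sg):
        missing = member.missing_packets(source, group)
        if not missing:
            return True                              # everything recovered
        member.send_greq(source, group, missing)     # next hop chosen per Algorithm 4
        deadline = time.time() + PIDIS_GREQ_TIMEOUT
        while time.time() < deadline and member.missing_packets(source, group):
            member.process_incoming_greps()          # fills the delivery buffer
    return not member.missing_packets(source, group)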

Figure 5.1 illustrates the gossip process in PIDIS. The figure shows ODMRP forwarding group nodes with letter IDs, the source node S, and member nodes with number IDs; the non-participating nodes (nodes not in the ODMRP mesh) are hollow. Member node 1 has not received some packets, and hence a GREQ is sent from member node 1 to gossip for the unreceived packets. This GREQ (eventually) reaches node 5 via several hops in the mesh. Node 5 is a member node and has cached some of the packets which member node 1 has not received. Member node 5 responds to the GREQ by sending a GREP back to member node 1, which backtracks the path taken by the GREQ from node 1.

Figure 5.1: The path of the GREQ from member node 1 is shown in solid arrows and the path of the GREP from member node 5 is shown in dashed arrows.

In PIDIS, the mechanisms of swarm intelligence are exploited as follows. GREQs and GREPs work as ants – packets traversing the network, collecting information about the nodes they visit, which search for and reinforce the (good) route(s), leading to other group members, from which unreceived packets can be recovered. Information about the nodes a gossip traverses is recovered from a GREP, which backtracks the path of the GREQ. Since GREPs are only sent in response to received GREQs, the overhead of PIDIS due to ant activity is controlled.

When no information about choosing next hops to gossip with is available, only the neighboring mesh node information is used – the gossip next-hop node, λ, is picked randomly from the neighboring nodes. On the other hand, if information from previous gossip replies is available in the Gossip Table, the choice of λ is made intelligently, by choosing λ from the Gossip Table. By choosing λ from the Gossip Table, there is a good chance that the gossip request sent to λ results in a gossip reply, thus improving the efficiency of gossiping.

In addition to being able to choose a next hop intelligently, PIDIS eventually converges to the best (next hop) node to gossip with, in terms of the number of GREPs fetched. Furthermore, owing to the amplification-of-fluctuations mechanism of SI, PIDIS adapts well to mobility by reacting to topology changes locally and exploiting other (or better) nodes to gossip with. Lastly, if several choices are available in the Gossip Table, the choice of gossip next-hop is made probabilistically to better distribute (load-balance) the recovery efforts.

Despite the above claims about PIDIS, we will see in our simulation results that PIDIS does not perform well in networks with limited or no mobility. In addition, the gossiping nature of PIDIS interferes with the data broadcasts of ODMRP, thus increasing the variability of the number of packets received at the members. We will discuss these issues in the context of our simulation experiments in Section 5.2.

Local data structures

Gossip table at node i (G_i)

A gossip table, containing the information collected from ant activity, is maintained at each node i which is either a member node or a forwarding node. The format and usage of the information in the gossip table is modeled after the ant decision table and algorithms described in [10]. At node i, G_i maintains information about GREP receipts. A node j unicasting a GREP to member node i will result in an entry for node j in G_i. The information contained in G_i is used to choose a next hop when a GREQ is to be sent out in search of unreceived packets. G_i stores the possible next hops for each multicast source/group pair, s/g: for each next hop j, G_i stores the pheromone level τ_ijsg³ and a heuristic η_ijsg⁴, along with a probability value ρ_ijsg calculated from τ_ijsg and η_ijsg.

The value of ρ_ijsg is also a measure of the goodness of a particular next hop for gossiping. Intuitively, a next hop which has a larger value of τ has "higher goodness" (i.e., is better) compared to another next hop with the same η, and likewise, a next hop that has a higher η for the same τ has "higher goodness." The value of ρ for each of the next hops is then a measure of the composite goodness of the next hop, taking both the pheromone level and the heuristic into account.

In the presence of multiple possible next hops for gossip, the value of ρ_ijsg is the probability of node i choosing node j as a next hop to gossip with when there are unreceived packets corresponding to the s/g pair at node i. Note that at the time of node i sending the GREQ to node j, node j must be a member of the forwarding group; otherwise a GREQ sent from node i to node j is discarded and not propagated at node j. The quantity ρ_ijsg is computed as follows:

\rho_{ijsg} = \frac{\tau_{ijsg}^2 \times \eta_{ijsg}^2}{\sum_{j \in J} \tau_{ijsg}^2 \times \eta_{ijsg}^2}    (5.1)

where J = {j_1, j_2, ..., j_m} is the set of m next hops available to contact for unreceived packets for an s/g pair at node i.
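As an illustration of Equation 5.1, the sketch below computes the ρ values for the next-hop entries of one s/g pair and then draws a gossip next-hop according to those probabilities. The GossipEntry structure is a hypothetical stand-in for one row of G_i, not code from the dissertation.

import random
from dataclasses import dataclass

@dataclass
class GossipEntry:
    next_hop: int
    tau: float   # pheromone level for this next hop (Eqs. 5.2/5.3)
    eta: float   # heuristic, 1/D where D is the hop distance to the GREP sender (Eq. 5.4)

def rho_values(entries):
    # Composite goodness of each next hop for an s/g pair (Equation 5.1).
    weights = [(e.tau ** 2) * (e.eta ** 2) for e in entries]
    total = sum(weights)
    return [w / total for w in weights]

def choose_next_hop(entries):
    # Probabilistically pick the gossip next-hop lambda according to rho.
    return random.choices([e.next_hop for e in entries], weights=rho_values(entries))[0]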

The value of τ_ijsg is a measure of how many GREPs (corresponding to an s/g pair) have reached node i via next hop node j, and the value of η_ijsg is a measure of the closeness of node i to the gossip reply sender.

3 The pheromone level of a link ij is proportional to the number of times ants travel the link, and hence is one measure of the goodness of the path for recovering unreceived packets.

4 Higher values of η have "higher goodness".

The pheromone level τ in G_i evaporates at a predictable rate. As with our approach to unicast routing (see Equation 3.7 in Chapter 3), a half-life model is used for pheromone evaporation. If the value of ρ_ijsg falls below a threshold, node j's entry is purged from G_i. The evaporation half-time should be chosen carefully; otherwise, the likelihood of choosing a gossip next-hop λ which does not belong to the current mesh for the multicast group increases.

Suppose a pheromone trail was first laid between nodes i and j for s/g at time t_1, due to the first gossip reply seen traversing the link ij. At time t_2, another gossip reply for the same s/g traverses the link ij. Then, G_i is modified as follows:

1. The old pheromone concentration is evaporated to reflect the current concentration due to the old trail:

\tau_{ijsg} = \frac{\tau_{ijsg}}{2^{(t_2 - t_1)/t_{0.5}}}    (5.2)

where t_{0.5} is the half-life for evaporation.

2. Then the pheromone deposit and the heuristic due to the new gossip reply are computed as follows:

\tau_{ijsg} = 1 + \tau_{ijsg}    (5.3)

and

\eta_{ijsg} = \frac{1}{D}    (5.4)

where D is the distance of node i from the gossip sender in number of hops. That is, each new GREP traversing link ij strengthens the pheromone trail ij by one unit for gossiping for packets corresponding to s/g.

Thereafter, the ρ values are recalculated at node i according to Equation 5.1.
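Putting Equations 5.2–5.4 together, the update performed when a GREP arrives over link ij can be sketched as follows. The entry object reuses the hypothetical GossipEntry structure from the earlier sketch, extended with a last_update timestamp field; that bookkeeping is an assumption about how t_1 and t_2 would be tracked.

import time

T_HALF = 12.0   # evaporation half-life in seconds (the value used in Section 5.2)

def on_grep_received(entry, hop_distance, now=None):
    # Update one gossip-table row when a GREP for its s/g pair arrives via its next hop.
    # Implements Eq. 5.2 (evaporation), Eq. 5.3 (unit pheromone deposit) and
    # Eq. 5.4 (heuristic = 1/D); the rho values are then recomputed with Eq. 5.1.
    now = time.time() if now is None else now
    elapsed = now - entry.last_update                   # (t2 - t1)
    entry.tau = entry.tau / (2 ** (elapsed / T_HALF))   # Eq. 5.2
    entry.tau = 1.0 + entry.tau                         # Eq. 5.3
    entry.eta = 1.0 / hop_distance                      # Eq. 5.4
    entry.last_update = now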

Neighbor table at node i (N_i)

A neighbor table, N_i, is maintained at each member/forwarding node i. N_i maintains the list of neighboring mesh nodes (which may be member nodes, sources or forwarding group nodes) of node i for a particular multicast group. The neighbor table is used to select next hops when G_i cannot be used for selecting next hops. To reflect the status of the current mesh, N_i is kept up-to-date by collecting information from the join replies of the ODMRP protocol. Thus, this information is maintained without any extra protocol overhead.

We note here that when implementing PIDIS over other types of multicast protocols, the notion of a neighboring mesh node is modified to refer to the neighboring node in the multicast structure. For example, in a multicast protocol which uses a multicast tree, the neighbor node is an adjacent node in the multicast tree.

A gossip request message with sequence number k initiated from member node i, GREQ_i^k, contains the following information: (a) the gossip sender ID, i, (b) the sequence number of the gossip request, k, (c) the multicast source/group information, s/g, (d) the expected sequence number, e, (e) the number of packets, n, from s, unreceived since packet number e at i, and (f) the nodes-visited stack, S_i^k, where S_i^k is the record of the nodes which the gossip request has visited.

At a node x receiving GREQ_i^k, if node x contains some or all of the packets requested in GREQ_i^k, one gossip reply is potentially created for each unreceived packet specified in GREQ_i^k, subject to the availability of the unreceived packets at node x. Hence, potentially, n separate gossip replies are generated for a gossip request that reports n unreceived packets at node i. Because each gossip reply, GREP_x^q (q is the sequence number of the GREP sent), is source-routed, the GREP_x^q must contain the nodes-visited stack from the corresponding GREQ_i^k. Hence, GREP_x^q contains: (a) the gossip reply sender ID, x, (b) the nodes-visited stack S_i^k from GREQ_i^k, (c) a packet with sequence number l ∈ [e, e+n) which was unreceived at node i, and (d) the s/g pair for the packet.

Only members are concerned about unreceived packets, and hence only members generate GREQs. Also, because only members/sources cache packets, only member/source nodes can send GREPs. As mentioned earlier, both gossip requests and replies are treated as ants, and the information contained in GREP_x^q is used to update the gossip table at all intermediate nodes.
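For concreteness, the two message formats described above can be written down as simple structures. The field names below are illustrative; only the field lists themselves are taken from the text.

from dataclasses import dataclass, field
from typing import List

@dataclass
class GREQ:
    # Gossip request GREQ_i^k sent by member node i.
    sender_id: int              # (a) gossip sender ID, i
    seq_no: int                 # (b) sequence number of the gossip request, k
    source: int                 # (c) multicast source s ...
    group: int                  #     ... and group g
    expected_seq: int           # (d) expected packet sequence number, e
    num_missing: int            # (e) number of packets unreceived since e, n
    nodes_visited: List[int] = field(default_factory=list)  # (f) nodes-visited stack S_i^k

@dataclass
class GREP:
    # Gossip reply GREP_x^q sent by a member/source node x holding a missing packet.
    sender_id: int              # (a) gossip reply sender ID, x
    nodes_visited: List[int]    # (b) stack copied from the GREQ, used to source-route back to i
    packet_seq: int             # (c) sequence number l of the recovered packet, l in [e, e+n)
    source: int                 # (d) the s/g pair the packet belongs to
    group: int
    payload: bytes = b""        # the recovered data packet itself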

Selecting a gossip next-hop (λ)

A GREQ with sequence number k from node i, GREQ_i^k, is broadcast, unicast, or not sent at all depending upon two parameters: (a) the broadcast probability, P_bi, and (b) the neighborhood information. A gossip request is broadcast with a probability of P_bi. The value of P_bi has to be controlled so that the network is not overwhelmed by broadcast packets. In the event that GREQ_i^k is not to be broadcast, the gossip table G_i or the neighbor table N_i is used for picking a next hop for unicasting the gossip request. At all times during the transmission of the gossip, to prevent cycling of the gossip, care is taken not to gossip with a node which has already been visited, by comparing the node IDs in the nodes-visited stack against a chosen next hop node. The process of next hop selection in PIDIS is shown in Algorithm 4.

Packet caching

To enable the process of retrieving packets, each member/source node x in the PIDIS scheme stores a finite number |C_x| of the most recently received/sent data packets at that member/source node, respectively. When a gossip request GREQ_i^k is received at a member/source node x, node x checks its cache C_x for the packets unreceived at node i. If a packet unreceived at node i is in C_x, node x retrieves the packet and sends a gossip reply message containing that packet to node i.

Algorithm 4 Next hop selection for PIDIS
Require: λ, gossip next-hop for group G and source S
x ⇐ rand(); {Generate a random number}
if x ≤ P_bi then
    broadcast GREQ;
else if (|G_isg| > 0) & (a new node not already gossiped with exists) then
    λ ⇐ chooseFromGossipTable(); {A node is probabilistically chosen, depending on ρ value}
else if (|N_isg| > 0) & (a new node not already gossiped with exists) then
    λ ⇐ randomlyChooseFromNeighborNodes();
else
    λ ⇐ 0; {Don't send GREQ}
end if
return λ;

Maintaining the gossip table

To illustrate how SI and the gossip table help in gossip next-hop selection, consider the example in Figure 5.1. The GREP (sent from node 5) is treated as an ant collecting information about the regions of the network the GREP traverses. At each intermediate node x which the GREP traverses, the information in the GREP is used to update the gossip table G_x of node x. When the GREP reaches node 1, the GREP is used to update the gossip table G_1 at node 1. The information in the gossip table G_x is used to choose gossip next-hops for future gossip requests from node x.

Suppose that no other GREP has reached member node 1 yet. The receipt of the new GREP, which has traveled a total of 4 hops including nodes 1 and 5, results in the following entry in G_1:

Group  Source  Next hop  τ     η    ρ
g      s       f         1.00  1/4  1.00

In this way, the positive feedback mechanism of SI in PIDIS positively reinforces next hop f as an effective gossip next-hop. Positive reinforcement of next hops can also occur at forwarding nodes and at other members when a gossip reply is forwarded at these nodes toward the intended recipient of the unreceived packet.

In addition to finding an effective gossip next-hop, a gossip is broadcast instead of being unicast with a probability of P_bi. This amplification-of-fluctuations mechanism of SI allows PIDIS to discover alternate or better routes. When a gossip reply is received in this case, another entry is made into the gossip table.

For instance, in the example above, let us say a GREQ was broadcast, and a corresponding GREP traversing 3 hops was received via node b approximately t = 5 seconds after the first GREP was received. Then, all the pheromone trails are first updated using Equation 5.2 (assume t_{0.5} = 10 seconds):

τ_1fsg = τ_1fsg / 2^{t/t_{0.5}} = 1 / 2^{5/10} = 0.70    (only one entry exists in G_1)

Then an entry for the GREP receipt via node b is made, using Equations 5.3 and 5.4, into the gossip table G_1 at node 1, and the ρ values are re-calculated using Equation 5.1:

Group  Source  Next hop  τ     η    ρ
g      s       f         0.70  1/4  0.22
g      s       b         1.00  1/3  0.78

Thereafter, when a GREQ is to be made for packets from source s, λ = b is chosen with probability ρ_1bsg = 0.78, and λ = f is chosen with probability ρ_1fsg = 0.22.

Evaporating a pheromone trail, the negative reinforcement mechanism of SI, carries the semantics of reducing the importance of information that is old.

Note that further receipts of GREPs at node 1 via node f update the entry corresponding to node f in G_1 according to Equations 5.2, 5.3, and 5.4. So, if another GREP were received at node 1 from member node 3 via node f (two hops) 5 seconds later, then after the receipt of this GREP, G_1 would be:

Group  Source  Next hop  τ     η    ρ
g      s       f         1.49  1/2  0.91
g      s       b         0.70  1/3  0.09

where τ_1fsg was calculated as follows:

τ_1fsg = 1 (due to the new GREP) + 0.70 / 2^{5/10} (remaining pheromone due to all old gossip replies) = 1.49
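The numbers in this running example can be checked with a few lines of code; the sketch below simply replays Equations 5.1–5.4 for the three GREP receipts described above (with t_{0.5} = 10 s, as assumed in the example).

T_HALF = 10.0   # half-life assumed in this example

def evaporate(tau, elapsed):
    return tau / (2 ** (elapsed / T_HALF))      # Eq. 5.2

def rho(table):
    weights = {hop: (tau ** 2) * (eta ** 2) for hop, (tau, eta) in table.items()}
    total = sum(weights.values())
    return {hop: w / total for hop, w in weights.items()}   # Eq. 5.1

table = {"f": (1.0, 1/4)}                       # first GREP via f, 4 hops
# Second GREP, via b over 3 hops, 5 s later:
table = {hop: (evaporate(tau, 5.0), eta) for hop, (tau, eta) in table.items()}
table["b"] = (1.0, 1/3)                         # Eqs. 5.3 and 5.4 for the new entry
print(rho(table))                               # ~{'f': 0.22, 'b': 0.78}
# Third GREP, via f over 2 hops, another 5 s later:
table = {hop: (evaporate(tau, 5.0), eta) for hop, (tau, eta) in table.items()}
tau_f, _ = table["f"]
table["f"] = (1.0 + tau_f, 1/2)                 # deposit on f, heuristic now 1/2
print(table["f"][0])                            # 1.5 (1.49 in the text, due to intermediate rounding)
print(rho(table))                               # ~{'f': 0.91, 'b': 0.09}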

Furthermore, in PIDIS, the multiple-interactions mechanism of SI allows nodes to collaboratively interact with one another via gossip messages (GREQ and GREP, which are used as ants) to discover routes. By combining the effects of positive/negative reinforcement, amplification of fluctuations, and multiple interactions, we claim that the mechanisms of swarm intelligence allow PIDIS to adapt to mobility and at the same time discover information about the best possible routes from which to recover unreceived packets. In this way, the information maintained in the gossip table improves the choice of the gossip next-hop, λ, used for gossiping.

Adaptive mechanisms in PIDIS

To improve the packet delivery at a multicast member node m_1, it seems intuitive to increase the number of times member node m_1 gossips for an unreceived packet, i.e., to increase the number of times a GREQ is sent for a particular chunk of unreceived packets. In this case, after a timeout PIDIS GREQ TIMEOUT, if all the GREPs are not received at node m_1, another GREQ is initiated for the unreceived packets. The number of times a GREQ is initiated at node m_1 for packets intended for an s/g pair, c_s^g, 1 ≤ c_s^g ≤ l, is adaptive, where l is the maximum number of times PIDIS can gossip for a particular chunk of unreceived packets from s/g at a node. At m_1, packets intended for each s/g pair at m_1 are recorded, and unreceived packets at m_1 corresponding to each s/g initiate GREQs. This gossip action initiated at m_1 has the effect of improving the packet delivery at m_1 for packets intended for s/g, but also has the effect of increasing the MAC layer demands at m_1. Increasing MAC demands at m_1 have the detrimental effect of decreasing the amount of resources available to deliver packets to other members, and also deter packet delivery to m_1. Hence, the number of times a gossip is initiated must be controlled.

To achieve a balance in the overall network resources used for gossiping for unreceived packets, members in a multicast group using PIDIS broadcast their packet delivery measurements, maintained per s/g pair (this information is maintained as a set of tuples <s/g, pd_s^g> for each s/g pair at the member), to other nodes in the network via a limited-hop-count broadcast. Any other member can use these measurements to decide whether or not a particular c_s^g value used for gossiping is appropriate. For instance, consider a member m_1 which receives information from another member m_2 about m_2's packet delivery measurements for all s/g pairs to which m_2 belongs. If the packet delivery at m_2 for s/g is lower than the packet delivery at m_1 for s/g, then m_1 reduces its c_s^g value to make more resources available to m_2 for gossiping and improving packet delivery. This process also helps reduce the packet delivery variation index, a measure of the variation in the number of packets received across members.
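A minimal sketch of this adaptation rule follows; the function name, the unit decrement step and the clamping are assumptions made for illustration, since the text only states that c_s^g is reduced when a worse-off member is observed and that 1 ≤ c_s^g ≤ l.

L_MAX = 6    # maximum number of gossip attempts per chunk (value used in Section 5.2)
L_MIN = 1    # at least one GREQ per chunk of lost packets

def adapt_gossip_count(c_sg, own_pd, reported_pd):
    # Reduce this member's retry bound c_s^g when a worse-off member is seen.
    # own_pd is this member's packet delivery ratio for the s/g pair;
    # reported_pd is the measurement broadcast by another member for the same pair.
    if reported_pd < own_pd:
        c_sg -= 1            # yield gossip resources to the worse-off member
    return max(L_MIN, min(L_MAX, c_sg))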

Justification for the reinforcement models

In this section, we justify our use of the reinforcement equations in Equations 5.1–5.4 in the design of PIDIS.

We note that the goodness of a next hop for gossip, ρ_ijsg, depends both on (a) the absolute value of the pheromone trail, τ_ijsg, and (b) the distance to the GREP sender, 1/η_ijsg. A higher value of τ_ijsg means that the next hop was recently useful in fetching an unreceived packet, and a higher value of η_ijsg means that an unreceived packet can be recovered in fewer hops if next hop node j is chosen to gossip with. Also, if the path via one gossip next hop, j′, is not as good (measured as the absolute value of ρ_ijsg) as the path via another next hop node, j, then the path via j should be chosen more frequently. Because GREQs should be sent more often via paths that have better ρ values, we choose to square the terms in our equation; if we chose a larger exponent for the terms in Equation 5.1, then PIDIS would increasingly favor better paths (i.e., paths with higher ρ values) over paths with lower ρ values. We choose the exponent of 2 to balance the GREQ forwarding in favor of the better next hops, while still load balancing the GREQ propagation by occasionally choosing paths which are not as good.

Likewise, for Equation 5.2, we keep the following ideas in mind: (a) a previously found good next hop node to gossip with decreases in effectiveness as time progresses, and so the pheromone trail (measured as the absolute value of τ_ijsg) for a next hop should progressively decrease in goodness over time, and (b) a newer next hop found for gossiping for the same s/g pair is better (i.e., has "higher goodness") than an older next hop, provided both the older and the newer next hop share the same heuristic (η). In addition, in the context of a mobile ad hoc network, we have to choose our negative reinforcement model such that the pheromone trail concentration neither (a) becomes insignificant in value too soon, as the resulting invalidation of the pheromone trail will trigger wastage of network resources (assuming no other next hop is available) by randomly choosing next hops, nor (b) becomes insignificant too late, by which time many next hops would have been used as though they were still effective. Intuitively, an exponential model fits the MANET scenario, and the value of the base of the exponential model was chosen after observing the results from preliminary experiments.

For Equation 5.3, we note that the pheromone update model was chosen to reinforce the goodness of a newly found next hop. In case a trail for that next hop already exists, the equation simply reiterates the goodness of the route.

The value of the heuristic in Equation 5.4 is chosen to reflect that longer paths to recovery are less favored. Thus, if GREPs are sent via two paths, one with η_ijsg = 1/D and another with η_ij′sg = 1/D′, where D < D′, then the effect of this heuristic (Equation 5.4) is to choose next hop j over next hop j′ for recovering unreceived packets for the s/g pair.

We end our description of PIDIS with a comparison of AG, RDG (see Section 2.3) and PIDIS, shown in Table 5.1.

Performance evaluation

Implementing Anonymous Gossip

The essence of AG is to allow a member m, which has lost packets, to recover the packets from another group member m′, whose identity is not known to m. The authors call the protocol "anonymous" for this reason. However, a few optimizations are used when AG is implemented over MAODV, namely localizing the gossip and caching information about which members fetch gossip replies. These optimizations for MAODV+AG are tied together using a probabilistic model with several parameters. These optimizations could not be implemented in our implementation of AG owing to the lack of adequate information in the AG paper regarding the appropriate parameters for the probabilistic model (for example, the value of p_anon, which is the probability of gossiping anonymously). Our implementation of AG was thus a "bare bones" version of AG which captured the essence of the anonymity of AG.

In addition to the above, we had to adapt the AG protocol, which was described in the AG paper for an implementation over MAODV (a tree-based multicast protocol running over AODV), for use over ODMRP, which is a mesh-based multicast protocol, without the use of any unicast protocol. For transporting gossip packets in our implementation of AG, we used the nodes-visited stack used in PIDIS (see Section 5.1.3) for AG as well. The AG mechanisms handling a GREQ avoided loops by making sure that only nodes which are not already recorded in the nodes-visited stack are sent the GREQ.

Network and protocol characteristics

The mobility model was Random Waypoint with a minimum speed of 0.001 m/s and a pause time of 100 s. These values were chosen to avoid the speed decay problem in the random waypoint model. The MAC layer used 802.11 DCF and the physical layer used omnidirectional antennas with a transmission range of 251 m using 802.11b. The propagation path loss model was two-ray and no propagation fading model was used; these settings pertain to a uniform, desert-like terrain with no vegetation or buildings and even ground. The terrain size was 1000 m × 1000 m.

We modeled a lossy network: in our simulation model, the protocols dropped packets in a rectangular region defined by the cartesian coordinates (50, 50)–(250, 250) with a probability of 0.3. Our network contained 100 nodes. Both the sources and the group members were chosen randomly. The same pseudorandom number seed was used for all three protocols to ensure the same traffic, network and mobility characteristics across the trials of the experiments. The performance of the protocols ODMRP, ODMRP+AG and ODMRP+PIDIS was recorded and studied.

In addition to the parameters described above, several other parameters were protocol specific. For the ODMRP simulations, we used the parameters specified in [45]. For the AG simulations, the protocol specification in [13] was used as a guideline. Several optimizations which were used in [13] could not be used in our AG implementation owing to the lack of specifications and/or the lack of a unicast framework in ODMRP. For AG, we used a lost-buffer size of 200 at each member node, as specified in [13]. For both AG and PIDIS, a packet cache, C, of size 100 (|C| = 100) was used. This size of the packet cache corresponds to 1/2 MB of cache, which is a reasonable number for mobile devices. GREQs in AG, sent once per second, carry the 10 most recently lost messages (a list of the 200 most recently lost messages is stored in the lost buffer at each node in AG). The size of the data packet was 512 bytes. Gossip replies were source-routed (using a nodes-visited stack) in both ODMRP+AG and ODMRP+PIDIS. While the GREQ used a fixed-size nodes-visited stack of 10 for both AG and PIDIS, AG carried other information in the GREQ, such as the last 10 sequence numbers lost at the receiver. Thus, the size of the GREQ for AG was 108 bytes and the size of the GREQ for PIDIS was 76 bytes, including the fixed nodes-visited stack of size 10 for both AG and PIDIS. The size of the GREP for both AG and PIDIS was 76 bytes plus the size of the data packet recovered.

For PIDIS, the probability of broadcast, P_bi, was 0.001. In addition, the evaporation half-time, t_{0.5}, was 12 seconds. The evaporation half-time was carefully chosen to reflect the fact that a gossip next-hop chosen from the gossip table will most likely be a member of the mesh for the group in consideration. The pheromone threshold was set at 0.0625; a next hop j whose τ value fell below this threshold was removed from G_i. This value of the pheromone threshold corresponds to a next hop for gossiping becoming stale in 48 seconds (4 half-life intervals). Both PIDIS and AG used a fixed nodes-visited array of size 10 for keeping track of the nodes visited by a gossip request. This number was appropriate given the terrain size, node distribution, transmission range and the number of nodes in each experiment. In PIDIS, the packet delivery data for each s/g pair at the members are broadcast (with a TTL of 5 hops) every 20 s. These values are appropriate for the terrain/traffic model. Also, the maximum l value was 6 and the minimum was 1. That is, there was at least one GREQ per lost packet.

Experiments and performance metrics

We performed three experiments. In Experiment 1, one group consisting of 10 group members was sent packets at 10 packets/s from three sources. In Experiment 2, one group consisting of 30 members was sent 10 packets/s from 1 through 5 sources. In Experiment 3, 2 sources sent 10 packets/s to one group consisting of 10 through 50 members (in steps of 10 members).

All experiments were run for 10 minutes of simulated time, and MCBR (multicast CBR) over UDP was used to generate the application layer PDUs. In MCBR, the multicast source generates APDUs intended for a multicast group.

We studied the following performance metrics and overheads from the simulations:

• Metric 1: The packet delivery ratio: the ratio of the total number of application layer PDUs (APDUs) received at the receivers to the total number of APDUs which are expected to be received at the receivers. Note that duplicate APDUs are not counted (duplicates are received at the members owing to the flooding mesh structure of ODMRP).

• Metric 2: The variation index (VI) of packet delivery: the coefficient of variation among the packet deliveries at different members,

VI = \frac{\mathrm{std}(PD_{m_1}, PD_{m_2}, \ldots, PD_{m_l})}{\mathrm{mean}(PD_{m_1}, PD_{m_2}, \ldots, PD_{m_l})}

where PD_{m_k} is the packet delivery ratio at member m_k, and mean() and std() are the average and the standard deviation of the values, respectively. (A small sketch computing these metrics follows this list.)

• Metric 3: The end-to-end delay: the average latency, in seconds, from the time an APDU is generated at the data sender's application layer to the time the APDU is received at the data receiver's application layer.

• Overhead 1: The routing layer overheads, expressed as the total number of GREQs sent by the members in ODMRP+AG and ODMRP+PIDIS.

• Overhead 2: The MAC layer overheads, represented by the total number of MAC layer unicasts sent in the entire network, and

• Overhead 3: The MAC layer overheads, represented by the total number of MAC layer broadcasts sent in the entire network.
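As referenced under Metric 2, the first two metrics can be computed from per-member APDU counts as in the sketch below; the trace processing around it is omitted, and the per-member delivery ratios are assumed to have already been extracted from the simulation logs.

import statistics

def packet_delivery_ratio(received, expected):
    # Metric 1: unique APDUs received / APDUs expected, for one member.
    return received / expected if expected else 0.0

def variation_index(per_member_pdr):
    # Metric 2: coefficient of variation of the per-member delivery ratios.
    mean = statistics.fmean(per_member_pdr)
    return statistics.stdev(per_member_pdr) / mean if mean else 0.0

# Example: delivery ratios measured at the members of one group.
print(variation_index([0.82, 0.79, 0.85, 0.62, 0.77]))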

Our aim was to show that PIDIS achieves better performance metrics (packet delivery and end-to-end delay), and achieves these goals with lower overheads than a comparable protocol, AG.

Before we describe and discuss our results, we note that while AG performs periodic lost-packet recovery, PIDIS reacts to packet loss at a multicast receiver and is thus expected to send more GREQs in search of lost packets. In fact, we will see that, as per our simulation results, PIDIS at times sends nearly 8 times the number of GREQs that AG sends. This fact should favor AG, but we will see how AG's randomness in choosing gossip next hops actually decreases AG's performance by increasing AG's MAC layer resource consumption, despite AG's lower routing overheads (measured as the number of GREQs sent out at the members). We note here that for AG implemented over MAODV, the chance of a gossip request randomly sent out to a neighboring multicast tree node ending at a member or source is high (because the tree consists of only forwarding nodes and member nodes), but in ODMRP, the chances of a gossip request sent to a randomly chosen mesh node reaching a member or source are not as high as in AG implemented over MAODV. Indeed, it appears from our simulations that the chances are substantially lower.

Simulation results

Figures 5.2(a)–5.4(f) show our results. For all graphs, the errorbars indicate 95% confidence intervals of the recorded values over 50 trials.

In Experiment 1, we analyzed the effect of mobility on ODMRP, ODMRP+AG and ODMRP+PIDIS. We see that ODMRP+PIDIS shows significant improvement in terms of packet delivery and end-to-end delay, though in the static case, the performance benefits of ODMRP+PIDIS are not as clear as in the mobile case. In terms of the variation index, ODMRP performs well, with ODMRP+AG and ODMRP+PIDIS showing a greater variation index. In terms of the number of gossip requests, ODMRP+PIDIS consistently sends more gossip requests than ODMRP+AG.

In Experiments 2 and 3, as in Experiment 1, we see that using PIDIS over ODMRP improves the metrics significantly in terms of packet delivery and end-to-end delay. The variation index for ODMRP+PIDIS is larger than that of both ODMRP+AG and ODMRP, with ODMRP showing the lowest variation index, but the variation indices for ODMRP+AG and ODMRP+PIDIS are comparable. From the figures, we also see that a larger number of gossip requests are initiated for ODMRP+PIDIS than for ODMRP+AG.

In all three experiments, we see that the number of MAC layer unicasts sent by ODMRP+AG is significantly larger than for ODMRP+PIDIS and ODMRP. In terms of MAC broadcasts, ODMRP+PIDIS shows more (or comparable) traffic compared to ODMRP+AG, and ODMRP shows the most broadcast traffic. Note that in ODMRP, the only MAC unicast packets are due to acknowledgment packets for JOIN REPLYs, and the protocol functions are carried mostly by MAC broadcast packets.

The above observations, made in all three experiments, are explained as follows. There are two components to packet recovery using gossip: (a) interfere with the host protocol minimally, and (b) gossip efficiently by choosing to gossip only with nodes along paths that will yield GREPs. Both of the above components go hand in hand when gossiping for lost packets. Our results can be reasoned about according to one or both of these interacting properties for the protocols in question.

Our arguments in the following paragraphs stem from the above insights. In particular, we argue that:

1. PIDIS interferes minimally with ODMRP activity owing to carefully chosen gossip paths, and

2. AG interferes heavily with ODMRP activity owing to randomly chosen gossip paths.

We will first discuss the overheads from the results, which sets the context for the discussion of the observed metrics (packet delivery, end-to-end delay and variation index) in our experiments.

The higher number of GREQs sent in ODMRP+PIDIS is due to two reasons: (a) PIDIS reacts to packet loss at a member, so every time a packet is lost, a GREQ is generated, and (b) GREQs may potentially be generated l times (the adaptive nature of gossiping in PIDIS controls this number). Owing to the above reasons, a large number of GREQs are generated in ODMRP+PIDIS, which are broadcast with a probability of 0.001.

As before, note that the routing overhead for AG is lower than for PIDIS, but the protocol mechanisms in networks using AG do not guide the gossips correctly, leading to a large number of MAC layer unicasts, as we will soon see.

The only packets broadcast in ODMRP+AG are due to ODMRP packets, but in ODMRP+PIDIS, broadcasts are due to (a) ODMRP packets, (b) GREQs which are broadcast, and (c) the s/g packet delivery measurements at each member, which are broadcast from each member every 20 seconds. MAC layer broadcasts due to gossip activity have the detrimental effect of increasing collisions in the network and reducing the effectiveness of the mesh-wide flooding due to ODMRP activity. We observe this phenomenon in Figures 5.2(f), 5.3(f) and 5.4(f). Both ODMRP+AG and ODMRP+PIDIS show decreases in the number of broadcasts as compared to ODMRP. In addition, ODMRP+AG shows the greatest reduction in the number of broadcasts. We observe this reduction because the gossip-induced unicasts interfere with the ODMRP activity, as we will see shortly. At any rate, the reduction in the number of broadcasts for both ODMRP+AG and ODMRP+PIDIS is due to the MAC layer broadcasts of ODMRP colliding with the gossip activity of ODMRP+AG and ODMRP+PIDIS. Thus, gossip activity, though intended to reduce packet loss, must compensate both for the loss caused by the gossiping itself and for the originally lost packets, for PIDIS to be an effective lost-packet recovery protocol.

MAC layer unicast packets have the detrimental effects of (a) increasing the average IP queue size at each node, and (b) increasing the propagation delay at each node. An increasing traffic rate leads to increasing packet drops at both the IP layer (owing to buffer overflows) and the MAC layer (due to reaching the maximum retransmission limit). In addition, the packets that do get delivered have had to wait in the IP queue for long durations. We see from Figures 5.2(e), 5.3(e) and 5.4(e) that ODMRP+AG has significantly more MAC layer unicasts than ODMRP+PIDIS. Thus, we expect ODMRP+AG, owing to unguided gossips, to have larger average IP queue sizes and a larger number of packet drops, due to both reaching retransmission limits at the MAC layer and IP buffer overflows.

We would also like to comment on the number of MAC layer unicast bytes consumed in AG and PIDIS. Despite the difference in the sizes of the GREQ in AG and PIDIS, and even though we compare only the number of unicasts sent (which is a measure of the gossip activity) and not the number of MAC layer bytes sent for all gossip activity, our comparison is legitimate owing to the fact that the GREQ in PIDIS is smaller than the GREQ in AG. In our results, we have shown that the number of unicasts in PIDIS is lower than in AG; thus, the number of MAC layer bytes consumed in ODMRP+AG is larger than the number of MAC layer bytes consumed in ODMRP+PIDIS by a factor of

\frac{108}{76} \times \frac{\text{number of MAC layer unicasts in ODMRP+AG}}{\text{number of MAC layer unicasts in ODMRP+PIDIS}},

which is clearly greater than 1. Given the fact that ODMRP+PIDIS sends fewer MAC layer unicast packets in all our experiments, we can reasonably conclude that the MAC layer utilization in ODMRP+PIDIS is more efficient.

In a nutshell, the better packet delivery and delay characteristics of ODMRP+PIDIS are due to the fact that the GREQs are better guided in ODMRP+PIDIS than in ODMRP+AG. Owing to this feature, the GREQs in PIDIS take shorter trips to nodes in the network that fetch GREPs, and travel along paths that are more likely to yield GREPs, rather than traversing the network randomly, which is what GREQs in AG do. A side effect of the better guided gossips in PIDIS is that gossip activity interferes minimally with the ODMRP flooding activity. This feature of ODMRP+PIDIS is evident from Figures 5.2(f), 5.3(f), and 5.4(f). The beneficial effect of carefully guiding the GREQs in ODMRP+PIDIS is significant enough to guarantee that even successive retries for fetching the lost packets do not increase the MAC layer unicasts to such an extent that the network bogs down due to gossip activity in ODMRP+PIDIS. We saw earlier how the MAC layer unicasts in ODMRP+AG affect ODMRP+AG's packet delivery.

The reason why PIDIS does not perform as expected in the static case is a lean mesh. In the static case, ODMRP activity does not create as many new forwarding nodes as in the mobile case, because the relative positions of the nodes do not change. Owing to this, there are fewer mesh nodes in the static case, resulting in a few gossip next hops being chosen over and over again from the gossip table. This property leads to hotspots in the network, owing to frequent gossip activity along those paths, thus reducing the performance of ODMRP+PIDIS in the static case.

When the mobility increases, the mesh becomes progressively denser, as more and more (new) nodes are added to the mesh. Note that some mesh nodes also drop out of the forwarding group if they do not receive JOIN REPLY messages for a specified period. If more mesh nodes are added to the forwarding group than drop out of it, as is the case when the nodes are mobile, the mesh progressively increases in size as the mobility increases. With a larger mesh, more effective next hops are found by the gossip activity and are used, thus improving packet delivery for ODMRP+PIDIS when mobility increases.

Discussion

For lost-packet recovery in ODMRP, we see that PIDIS (a) is able to recover significantly more packets than AG, (b) does so using better-guided gossips, gossiping over fewer hops than AG, (c) uses fewer MAC layer unicast resources than AG, and lastly, (d) does the above with a variation index comparable to AG, but larger than that of ODMRP. PIDIS is thus able to significantly improve the metrics for ODMRP, with a variation index close to that of AG. These improved metrics for ODMRP+PIDIS are owing to the adaptive persistence model of PIDIS.

However, despite the above positive observations in the comparison study with ODMRP and ODMRP+AG, we see that ODMRP+PIDIS suffers from a number of drawbacks. When the network is static, owing to a lean mesh, ODMRP+PIDIS is not able to find new gossip next hops, and chooses the same gossip next hops frequently, resulting in hot spots at those next hops. This choice interferes with both ODMRP activity and gossip activity along those next hops. In a dense network with heavy ODMRP activity, frequently choosing the same next hops can result in severe drops in performance. In addition, the variation index of packet delivery will also increase, owing to the interference with the ODMRP data flooding. Also, as the number of sources and the number of group members increase in the network, we see that ODMRP+PIDIS, despite performing better than the base ODMRP protocol and ODMRP+AG, is not able to recover most of the unreceived packets, despite its persistent gossiping model for unreceived packets.

Figure 5.2: Experiment 1: Effect of increasing mobility.

Figure 5.3: Experiment 2: Effect of increasing the number of sources sending packets to a single group.

Figure 5.4: Experiment 3: Effect of two sources communicating with one group with an increasing number of members.

Chapter 6 BTM: A CROSS-LAYER DECENTRALIZED BITTORRENT FOR MANET

Among the characteristics that govern the issues of P2P networks for MANET, the most significant is node mobility. Because peers/nodes in a MANET are mobile, P2P designs for MANET encounter a host of issues which are usually not present in wired networks. For instance, P2P systems that proactively construct overlays will incur the cost of maintaining the overlay to adjust to the current topology of the MANET. The maintenance of the overlay is not a straightforward problem, owing to the fact that longer routes are more difficult to maintain in a MANET, and overlays that span nodes over wider distances are prone to suffer drastic reductions in performance. Furthermore, hop-by-hop distances constantly change in length in a MANET, so the overlay structure must change constantly. When the mobility increases, a P2P protocol design for MANET has to delicately handle the trade-off between the network resources used for overlay management and the actual data transfer.

In addition to the above, owing to peers' mobility profiles, a peer in a MANET may be temporarily cut off from other nodes in the network, resulting in temporary network partitions of uncertain duration. Any P2P system designed for MANET should work around such problems.

Of all of the proposed P2P systems for the wired Internet discussed in Section 2.4, BitTorrent has certain distinctive features which are advantageous in the context of MANET. Most importantly, the overlay construction in BitTorrent is on-demand: the peer overlay is not constructed until a client enters the P2P system, at which time a tracker node (whose ID is known) is contacted by the client for a list of peers which are currently downloading the file or hold a full copy of the file (and are willing to share with other peers). Thus, peer discovery essentially involves a constant overhead in BitTorrent, and does not depend on how far from the client the peers are located. Moreover, the peer overlay changes during the duration of the file download, because clients are in contact with the tracker node through the duration of the download (and request a peer list periodically). The fact that the overlay changes during the client download directly affects the file download at a BitTorrent client because the BitTorrent client simultaneously downloads pieces of the file from all members of the overlay.¹ In contrast, clients in Gnutella and Napster download the file from one peer over an overlay that may change, but the flux in the overlay has no effect on the file download at a client.

In addition, BitTorrent is concerned with amortizing the download cost of files across the network. That is, in BitTorrent, the cost of downloading a particular file at a client is not fully borne by one peer, but is instead borne by multiple peers in the system. Because the file is downloaded piece-by-piece, and because these pieces are of a pre-determined order, size and number, a BitTorrent client can resume a download easily after long periods of disconnection from the network. One advantage of this mechanism is that a BitTorrent client can start serving the file it is downloading to other peers as soon as it downloads the first piece of the file. All of the above properties allow BitTorrent to circumvent a number of issues which typical P2P system designs for MANET face, thus making BitTorrent an attractive P2P system to adapt for MANET.

¹ Kazaa also allows parallel download by using the byte-range header of HTTP. The Kazaa approach differs from the approach used by BitTorrent because the piece size and order are pre-determined in BitTorrent. However, these issues are beyond the scope of this work.

Despite the above advantages, BitTorrent is not immediately usable in the MANET context owing to some disadvantages. The BitTorrent model is inherently centralized with respect to the tracker node operations: the presence of a single tracker node, which provides each client with the list of peers currently downloading a file, makes peer overlay construction simple, but also raises the issue of a single point of failure at the tracker. Likewise, when all the nodes with a full copy of a file (seed nodes) leave the network, BitTorrent clients are no longer able to download the full file if the undownloaded pieces are not available at other clients in the P2P system. Lastly, network partitions make accessing the tracker/seed nodes difficult, and can potentially stall the client download process indefinitely.

We propose to address these issues using the techniques of data replication and resource redundancy, and we make every client responsible for acquiring a list of peers by using querying primitives, thus decentralizing the on-demand overlay construction. We provide these services using a cross-layer design in which the routing protocol provides querying and resource/peer discovery (lookup) services to the application layer. In addition, the routing layer helps in localizing the P2P transactions to k-hop neighborhoods in the network, so that the expenses of a long, unbounded-hopcount data transfer do not accrue.

The remainder of the chapter is as follows. In Section 6.1, we provide an overview of the BitTorrent P2P system, followed by a description of a straightforward implementation of the wired Internet BitTorrent (BTI) over MANET in Section 6.2. In Section 6.3, we describe a design of BitTorrent for MANET (BTM) using cross-layer techniques to address the issues BTI suffers in MANET. In Section 6.4 we compare the results of the two approaches.

Overview of the BitTorrent P2P system

The BitTorrent protocol specification and protocol mechanisms are detailed in [4, 16]. The process of file download at a client C using BitTorrent comprises the following steps:

1. Key (torrent/metainfo) information collection and service query: A client C wishing to download file π procures the torrent file, τ_π, and contacts the tracker, Γ_π, for a list of peers which may be interested in letting C download pieces of π.

In the BitTorrent model, a file is split into a number of pieces (APDUs) of pre-determined size. The torrent, τ_π, contains information about the piece size and the number of pieces, along with data-integrity information, such as checksums for the file π, and is created by a node called the initial seed, σ_0^π (a node which has a full copy of π initially). The torrent is then disseminated in the network.

2. Service query response: Γ_π responds to C by providing a list of peers, P = {p_1, p_2, ..., p_n}, which are either seeds or peers for file π (peer nodes are in the process of downloading π).

3. C opens connections with some/all p_i ∈ P and starts downloading pieces of π, and becomes a part of the swarm of peers currently downloading π. This swarm is referred to as the “π-swarm.”

4. During the file download process, the nodes in the swarm start the piece exchange by exchanging bitfields. A bitfield for π at a particular node contains one bit per piece, and denotes whether or not a particular piece is possessed by the node. Thus, C’s bitfield has a length, in bits, equal to the number of pieces of π specified in τ_π. Nodes in the swarm that receive C’s bitfield know which pieces C is looking for and which pieces C already has. Initially, C does not have any of the pieces of π, so none of the bits in C’s bitfield for π will be set (a minimal sketch of such a bitfield follows this list).

5. After the full file is downloaded at C, C may decide to seed the file. In case C decides to seed the file, C sends a message to Γ_π informing the tracker that C will be able to provide seed services for the π-swarm in the network. Any other client, D, wishing to download π must open a connection with C and include C in D’s peer overlay to take advantage of C’s availability (C is not automatically included in D’s overlay).
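The bitfield exchange in step 4 can be made concrete with a minimal sketch. The following Python fragment is illustrative only (the class and method names are ours, not part of the BitTorrent specification or this dissertation); it shows a one-bit-per-piece bitfield that a client could maintain, send after the handshake, and use to determine which pieces to request.

# Minimal sketch of a one-bit-per-piece bitfield; names are illustrative.
class Bitfield:
    def __init__(self, num_pieces):
        self.num_pieces = num_pieces
        self.bits = bytearray((num_pieces + 7) // 8)   # all zero: no pieces held yet

    def set_piece(self, index):
        self.bits[index // 8] |= 1 << (7 - index % 8)

    def has_piece(self, index):
        return bool(self.bits[index // 8] & (1 << (7 - index % 8)))

    def missing_pieces(self):
        return [i for i in range(self.num_pieces) if not self.has_piece(i)]

# A peer receiving this bitfield can offer exactly the pieces listed by missing_pieces().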

Typically, BitTorrent systems in the Internet are used for files which are very large, on the order of tens of megabytes (10^6 bytes). The performance of the BitTorrent system depends on how many seeds are in the system. If no seed is available in the swarm and not all the pieces are available in the swarm, or if the seed cannot be contacted, it is easy to see that the entire file cannot be downloaded.

BTI: A straightforward implementation of BitTorrent over MANET

Some assumptions have to be made to implement BitTorrent over MANET. We enumerate them as follows:

A.1: In the Internet, torrents are obtained via e-mail or by searching for them on the World Wide Web, on websites such as http://www.legaltorrents.com. In MANET, likewise, when a torrent is required, a search is launched for the torrent.

In our implementation, both trackers and seeds are able to reply to a torrent seeker with the torrent for the file.

A.2: The BitTorrent specification leaves it to the protocol implementors to implement suitable incentive mechanisms. We do not implement an incentive mechanism in our implementation.

A.3: In our implementation, clients which have completed the download of π will decide to seed the file with probability ρ_s^π for a duration t_s^π.

A.4: BitTorrent peers send clients the pieces not yet received at the client in random order, not in any specific sequence.

Procuring torrent files in BTI

As mentioned in A.1, BTI searches for the torrent before the file download starts at client C. In BTI, the torrent search is implemented at the routing layer. The routing protocol at C broadcasts a torrent request message, TREQ, looking for a torrent for a particular file π (identified by the file ID, fileid(π)). The TREQ for π from C is propagated like a route request message, and uses an expanding ring search similar to the route discovery messages used in common routing protocols such as AODV [53]. When a TREQ reaches a node which has a copy of the torrent (a tracker node or a seed node), a torrent reply message, TREP, which contains the torrent file for π, τ_π, is generated and unicast to C. Our implementation of BTI uses ANSI (see Chapter 3), a congestion-aware, reactive routing protocol, for disseminating TREQs and collecting TREPs. We chose ANSI because it has been shown to perform better than, or comparably to, AODV (see Chapters 3 and 4) in a wide variety of scenarios under both UDP and TCP flows.
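The expanding ring search for the TREQ can be sketched as follows. This Python fragment is a sketch under assumptions: broadcast_treq and wait_for_trep are hypothetical helpers standing in for the routing-layer operations, and the TTL/timeout values mirror those reported for BTI/BTM later in this chapter (initial TTL of 2, 5 s timeout, TTL incremented by 1, search abandoned at 10 hops).

# Sketch of the TREQ expanding ring search; helper functions are assumed.
INITIAL_TTL = 2      # first TREQ is limited to 2 hops
MAX_TTL = 10         # the search expires beyond 10 hops
TREP_TIMEOUT = 5.0   # seconds to wait for a TREP before widening the ring

def find_torrent(file_id, broadcast_treq, wait_for_trep):
    ttl = INITIAL_TTL
    while ttl <= MAX_TTL:
        broadcast_treq(file_id, ttl)                  # routing layer floods the TREQ up to ttl hops
        trep = wait_for_trep(file_id, TREP_TIMEOUT)   # a TREP carries tau_pi if a tracker/seed was reached
        if trep is not None:
            return trep
        ttl += 1                                      # widen the ring and retry
    return None                                       # torrent not found within MAX_TTL hops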

Acquiring peer lists from the tracker

As per the BitTorrent specification, once a client C receives τ_π, the client C is ready to contact the tracker, Γ_π, for a list of peers for π. In our implementation, the tracker sends a list of all the peers, P = {p_1, p_2, ..., p_n}, which are currently either seeds for π or are currently downloading π (all nodes in the π-swarm). C then contacts all p_i ∈ P and starts downloading π from all p_i ∈ P. During the file download at C, C sends periodic requests for peer lists for π to Γ_π so that C may open more connections to download π, further amortizing the cost of the download of π at C.

File download process at client C

The client, C, then opens TCP connections to all p_i ∈ P for reliable transfer of pieces of π. As soon as the connection is opened between C and p_i, C sends a HANDSHAKE message containing τ_π to p_i, indicating the desire to download pieces of π from p_i. The HANDSHAKE message is followed by a BITFIELD message from C, indicating the pieces which C already has. The server (peer) p_i uses the information in the bitfield message sent from C to send pieces to C. The peer p_i (server) and client C then exchange randomly chosen undownloaded pieces with each other.

BitTorrent peers manage their download rates by periodically choking and unchoking the many TCP connections over which a file is downloaded. When a TCP connection from a peer p to peer C is choked, p has temporarily stopped sending data to C. Likewise, C can choke/unchoke C’s data flow to p. Every peer p unchokes its connection with another peer C after a pre-determined choking period (CHOKING INTERVAL) and tests whether p should send pieces to the previously choked peer C. This decision is based on whether or not C has been a “well behaved” leecher (that is, C has been uploading as much as C can while downloading from p). If C has been a good leecher, then p will send C more pieces. Otherwise, p chokes its connection to C again, and C will wait to receive pieces from p until p engages in optimistic unchoking. That is, p uploads pieces to C in the hope that C will reciprocate. This optimistic unchoking is done with a probability of leniency, P_l. In effect, the basic BitTorrent incentive model rewards good peer behavior and punishes bad peer behavior, but periodically engages in a “pardon” mechanism. Regardless of the number of pieces a peer p sends to C, p sends the pieces all in one spurt to the transport layer (TCP), keeping the unchoking period short.

We note that choking can either be implemented per connection, in which case each connection has a separate choking timer, or implemented per node, in which case each node unchokes and determines the pieces to be sent to all of its remote peers before the node unchokes again. Our implementation of BTI implements both choking mechanisms.

In our implementation of BTI, the server sends 4 pieces every 4 seconds (the server CHOKING INTERVAL is 4 sec), and the client sends 4 pieces every 8 seconds (the client CHOKING INTERVAL is 8 sec). At C, as soon as the first piece is received from p_i, C is able to serve the piece to other clients in the swarm which are interested in the piece.
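A per-connection choking timer of the kind described above can be sketched as follows. This Python fragment is illustrative only: select_pieces_for and send_pieces are assumed helpers, and the 4-piece-per-interval values mirror the BTI settings just described.

# Sketch of a per-connection choking timer; helper functions are assumed.
import threading

PIECES_PER_UNCHOKE = 4
SERVER_CHOKING_INTERVAL = 4.0   # seconds (server side in BTI)
CLIENT_CHOKING_INTERVAL = 8.0   # seconds (client side in BTI)

def run_connection(peer, choking_interval, select_pieces_for, send_pieces):
    def unchoke():
        pieces = select_pieces_for(peer, PIECES_PER_UNCHOKE)  # pieces the remote peer still needs
        if pieces:
            send_pieces(peer, pieces)   # sent in one spurt to TCP, then the connection is choked again
        threading.Timer(choking_interval, unchoke).start()    # re-arm this connection's choking timer
    threading.Timer(choking_interval, unchoke).start()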

Seeding strategy for BTI clients

The effective functioning of BitTorrent depends on how “gracious” clients in the swarm are in becoming seeds and helping the file downloads at other downloading clients. To this end, we implemented a seeding strategy wherein clients which have completed a download will seed the file with a probability ρ_s^π for a time t_s^π.

Upon completion of the download of file π at C, if C decides to be a seed for π, then C sends a SEED message to Γ_π indicating that C will be a seed for π. Thereafter, when another client C′ contacts Γ_π for a peer list, Γ_π will send a peer list to C′ indicating that C is available as a seed. When the seeding duration, t_s^π, has elapsed at C, C sends an UNSEED message to Γ_π, preventing the tracker from sending C’s identity to other clients wishing to download π.
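The seeding decision can be sketched as follows. The Python fragment below is a sketch under assumptions: rho_seed and t_seed stand in for the per-file parameters ρ_s^π and t_s^π, and send_seed/send_unseed are hypothetical messaging helpers.

# Sketch of the probabilistic seeding strategy; helpers and names are assumed.
import random
import threading

def maybe_seed(tracker, rho_seed, t_seed, send_seed, send_unseed):
    if random.random() < rho_seed:      # become a seed with probability rho_s^pi
        send_seed(tracker)              # SEED: tell the tracker we now serve the full file
        # after t_s^pi seconds, withdraw from the tracker's peer list
        threading.Timer(t_seed, send_unseed, args=(tracker,)).start()
        return True
    return False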

BTM: A cross-layer decentralized BitTorrent for MANET

BTM protocol operations

The BTM algorithm comprises BTM operations at the application layer and the ANSI/XL-BTM operations at the routing layer. As mentioned before, BTM is a cross-layer P2P system which takes advantage of the routing layer activities in the MANET to collect the key/peer information needed for the smooth functioning of BTM at the application layer. In our implementation of BTM, we use ANSI to manage key/peer information at the BTM cross-layer interface (XL-BTM). The XL-BTM layer is responsible for collecting the subscribe events from the BTM layer and working with ANSI to get the information requested by the BTM layer. When the information related to the subscription is collected by ANSI, ANSI sends a message to the BTM application layer indicating that the corresponding information has been received. Additionally, the torrent and peer caches are updated.

At the application layer of a client C, BTM is responsible for contacting peers for file π and managing the connections for downloading π. As in BTI, as soon as C downloads one piece of π, C is able to start serving the pieces to other nodes. If peers are not available to be contacted, then the BTM layer instructs ANSI to launch either TREQs or PREQs, depending on whether the client is looking for keys (τ_π) or peers, respectively. Note that without τ_π, C is not able to look for peers for π, so if τ_π is not available at C, C must first launch a search for τ_π and look for peers for π once a TREP containing τ_π is received.
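The bootstrap decision described above can be sketched as follows. This Python fragment is illustrative only: the helper names (launch_treq, launch_preq, connect_to_peers) and the cache structures are assumptions, not the dissertation's implementation.

# Sketch of the BTM client bootstrap decision: torrent first, then peers, then connect.
def start_download(file_id, torrent_cache, peer_cache, ansi, connect_to_peers):
    if file_id not in torrent_cache:
        ansi.launch_treq(file_id)              # tau_pi must be obtained before anything else
    elif not peer_cache.get(file_id):
        ansi.launch_preq(file_id)              # torrent known, but no peers cached yet
    else:
        connect_to_peers(peer_cache[file_id])  # peers known: open TCP connections, exchange bitfields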

The application layer is also responsible for triggering application-layer subscribe messages to ensure that a peer list is requested periodically. Note that in BTI, the application layer requests a peer list periodically, directly from the tracker.

ANSI layer and XL-BTM operations

At the ANSI layer, the XL-BTM events trigger ANSI operations which provide the BTM application with keys and peers as and when required. The basic XL-BTM and ANSI activities in BTM are described as follows:

1. Torrent Propagation: During bootstrapping, the XL-BTM interface at the initial seed for π, σ_0^π, broadcasts the torrent for π (τ_π) using ANSI. This broadcast, when received at the neighbors of σ_0^π, is re-broadcast with a probability p_τ using gossip [32] (see Section 2.3 and the sketch after this list). When these gossip messages are received at the nodes, the torrent is cached in the torrent cache, ready for use by the node when the BTM layer starts a client download. Proxy seeds (see Section 6.3.1) are created in this step.

2. TREQ Propagation: At the start of the file download at C, if C does not have the torrent stored in C’s torrent cache, the BTM layer instructs the XL-BTM interface to look for the torrent using a TREQ. The TREQ, as in BTI, is propagated with a limited TTL, like a route discovery message in common reactive protocols such as AODV [53], and uses an expanding ring search if the search fails initially. Any node i which receives the TREQ checks node i’s torrent cache, and if the torrent is present, sends a TREP back to the client.

3. PREQ Propagation: At the start of the file download at client C, if C does not have any peers cached in C’s peer cache for π (but has τ_π), then C instructs ANSI to search for peers using a PREQ. As with the TREQ, the PREQ is propagated like a route discovery message and uses an expanding ring search to try larger hop counts when the initial search for peers fails. If a node receiving a PREQ is not a peer for the file, the node is able to copy τ_π (stored in the PREQ) before re-broadcasting the PREQ. If the node receiving the PREQ is a peer for π, then the node unicasts a PREP back to C. We can see that several PREPs may be received at the PREQ sender from several different peers.

As with BTI, BTM is designed to get a fresh list of peers periodically, so, in BTM, PREQs are periodically sent out, and the peer information gathered is stored in the peer cache. In addition, at the time when a PREQ is expected to be sent out, if a peer found in the peer cache has not yet been connected to, the client connects with that peer instead, thus saving the valuable network resources expended in PREQ/PREP propagation.

4. HELLO Propagation: When a node becomes a peer for a particular file, the node sends out this information in the HELLO messages which are sent out periodically in ANSI. Neighbor nodes receiving the HELLO update their peer cache with this information.
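The probabilistic torrent gossip of step 1 can be sketched as follows. This Python fragment is a sketch under assumptions: cache_torrent, rebroadcast, satisfies_proxy_hash and become_proxy_seed are hypothetical helpers, and p_τ = 0.6 is the value used in the experiments reported later in this chapter.

# Sketch of handling a torrent gossip received at a node; helper functions are assumed.
import random

P_TAU = 0.6   # gossip (re-broadcast) probability p_tau

def on_torrent_gossip(torrent, cache_torrent, rebroadcast, satisfies_proxy_hash, become_proxy_seed):
    cache_torrent(torrent)              # torrent cached for later use by the BTM layer
    if satisfies_proxy_hash(torrent):   # H(tau_pi) decides which nodes become proxy seeds
        become_proxy_seed(torrent)      # a proxy seed opens connections to the initial seed
    if random.random() < P_TAU:         # gossip: re-broadcast with probability p_tau
        rebroadcast(torrent)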

As mentioned earlier, the fact that the BTI system starts with only one seed makes BTI vulnerable to a host of issues in MANET. For example, a seed σ may be physically located in another partition, be bombed, or lose all of σ’s energy reserves, making the download of a full file difficult in the BTI swarm. We address these issues in BTM by providing redundant resources.

In BTM, during the torrent broadcast phase at bootstrapping, nodes which satisfy a hash function, H(τ_π), become proxy seeds, σ_i^π, i ∈ [1, n].² These proxy seeds immediately open connections to the initial seed, σ_0^π (whose ID is available in τ_π), download the file π, and become seeds for the file π. This process assures that a full copy of π is available in several sections of the network, and makes the BTM system more resilient to a single point of failure of the initial seed. In addition, even during the download of the file at the proxy seeds, the fact that more pieces are available in the network improves the resilience of the system. The connections made by these proxy seeds are managed by BTM processes similar to other peer overlays during a “regular” file download.

² There are n proxy seeds in the network for a file π. The quantity n depends on the hash function.

We investigate both a BTM framework which does not impose incentives and a Tit-For-Tat (TFT) incentive mechanism for BTM. In the No Incentive scheme, a BTM peer p uploads a constant number of pieces to a remote peer c (when p unchokes) without regard to how many pieces c has uploaded to p.

In the TFT scheme, a peer p uploads to the remote peer c (when p unchokes) as many pieces as p received from c in the last choking interval. To prevent starvation in circumstances where c has no pieces to send to p, two optimizations are added: (1) if p is a seed, then p will always send 4 pieces per choking interval to c, and (2) if p is not a seed and has not received any pieces from c, then, with a probability P_l (the probability of leniency), p will send c 4 pieces to check if an optimistic unchoking is possible with c. This optimistic unchoking is done so that c is not snubbed by p. The algorithm for the TFT scheme is shown in Algorithm 5.

Algorithm 5 The Tit-For-Tat (TFT) algorithm for BTM

Require: The incentive to use at unchoking event t for remote peer c, ψ_pc^t
x ⇐ rand();                                      {Generate a random number}
ψ ⇐ uniquePiecesRcd(c, CHOKING INTERVAL);        {returns the number of pieces from c in CHOKING INTERVAL sec.}
if (isASeed(p)) then
    ψ ⇐ 4;                                       {Seeds use a constant incentive}
else if ((ψ == 0) && (x ≤ P_l)) then
    ψ ⇐ 4;                                       {Optimistic unchoking, prevent prolonged snubbing}
end if
ψ_pc^t ⇐ ψ × congestionStatusOnNextHop(c);       {returns (0, 1] for congestion status on next hop to c}
return ψ_pc^t;

The rest of the BTM functions pertaining to the actual file download are similar to the BTI functions, and use the message types detailed in [4]. As soon as the BTM client connects to peers (using TCP) and exchanges BITFIELD messages, the peers start sending randomly chosen, undownloaded pieces of the file to each other. BITFIELD messages are thereafter exchanged periodically, every BITFIELD INTERVAL, depending on client/peer requirements. When no incentives are imposed, a peer p sends its remote peer c 4 pieces every time p unchokes, subject to piece availability. The maximum number of pieces a peer p will send client c in the TFT scheme is fixed at 20. For both the No Incentive and TFT implementations, the CHOKING INTERVAL is 10 s. As with BTI, regardless of the number of pieces a peer p sends to c, peer p sends the pieces all in one spurt to the transport layer (TCP), keeping the unchoking period short.

Performance evaluation

Evaluating BTI and BTM

Regardless of the motivation and design of P2P protocols for MANET, for example, X-GNU [18], ORION [40], and MPP [62], these protocols concentrate on providing two fundamental functions for P2P systems over MANET: service discovery and (proactive) overlay construction for the P2P system. These protocols follow this design methodology because, once a peer is discovered in the above systems, the client downloads the full file from that peer. However, in BTI and BTM, the overlay is constructed only on-demand and is in flux even during the time when pieces are being downloaded at the client. That is, the overlay construction does not stop when the BitTorrent client finds one peer, but is constantly evolving as per the needs of the client.

In both BTI and BTM, a client C downloading file π requests peer lists periodically (from the tracker in BTI, and using PREQs in BTM) and connects to fresh peers during the file download process, thus changing the peer overlay structure over time. In the MANET context, the effect of the changing peer overlay on the piece download at the client is greater, given node mobility and other issues such as partitions.

Thus, a legitimate evaluation of BTI/BTM should evaluate the operations of BTI/BTM throughout the duration of the download, and evaluate how well the download itself proceeds, rather than just concentrate on the peer/key lookup and discovery functions which other P2P system designs usually use to evaluate their approaches.

Another comment about the evaluation of BTM and BTI concerns the effect of the frequent link breakage and partitions characteristic of the MANET environment. Recall that a client C using BTI should first recover the torrent file, τ_π, for a file π before connecting with the tracker. For BTM, torrents for π are gossiped in the network by the initial seed, σ_0^π, and so a client C using BTM may have received τ_π already. In any case, in both BTI and BTM, if the client C does not have the torrent, the client C should procure τ_π by propagating TREQs (and receiving a TREP) using the network layer operations before C can start initiating the file download. After τ_π is procured at C, in BTI, C connects to the tracker, Γ_π, to procure a peer list for file π, so that C can connect to the nodes in the peer list to start downloading the file. For BTM, once τ_π is procured at C (either via torrent gossips or via TREQ propagation at the network layer), C will look for peers by propagating PREQs (if BTM does not have any peers for π in the peer cache). Only after a PREP is received at C in BTM can the client C connect to the peers which responded via a PREP and start the file download.

The MANET environment, characterized by frequent link breakage and partitions, can delay any of the above-described processes for both BTI and BTM indefinitely. Depending on the topology, congestion and mobility characteristics, the file download can stall indefinitely because (a) the torrent was not procured, (b) the connection to the tracker was not possible, or communication degrades after the connection has been established with the tracker (for BTI), (c) the peers cannot be found (for BTM), or, lastly, (d) the connection was made to the peers, but the connection quality suffers owing to congestion or topological fluctuation. Even though BTM’s decentralized model, in the context of redundant resources, will help BTM perform better as compared to BTI under the above conditions, these conditions make the interaction between the peers in a swarm difficult to predict in both the BTI and the BTM models. We list some of these problems as future work (see Section 7.2.1).

We keep these issues in mind when we discuss the performance results of BTI and BTM.

Network and protocol characteristics

Simulation models of BTI and BTM were developed in QualNet (version 3.9). Our experiments were studied under an FCS context. All our experiments were performed with 50 nodes spread over a terrain of 2000m × 500m. At the physical layer, nodes used a 250m transmission range with no fading, a two-ray pathloss model and 2 Mbps capacity. At the MAC layer, nodes used 802.11 DCF. For mobility, we used the random waypoint mobility model with a pause time of 0 s. For mobile nodes, we chose the same minimum and maximum waypoint speeds to avoid issues due to the speed decay problem.

At the network layer, nodes used ANSI to perform routing functions and provide cross-layer functions for both BTI and BTM. We modified ANSI to handle torrent gossips (for BTM), TREQ/TREP propagation (for BTI/BTM) and PREQ/PREP propagation (for BTM). In addition, the HELLO beacon mechanism in ANSI was changed to advertise a node’s peer status when running under BTM. The HELLO interval in ANSI is 1 s, and the rest of the parameters for ANSI are as specified in Chapter 3.

Both BTI and BTM used an initial TTL of 2 hops for the TREQ. After a 5 s timeout, if a TREP is not received, the TREQ is sent again with a TTL increased by 1. TREQ searches expire after the TTL reaches 10 hops. These parameters are specific to the topology used. In BTI, the clients request a fresh peer list from the trackers every 3-5 minutes. This parameter reflects the current choice for such requests in the implementations of BitTorrent in the Internet. For BTM, PREQ propagation is similar to TREQ propagation, but fresh PREQs are sent every 30 s in BTM to account for the rapidly changing topology. Our implementation of BTM used p_τ = 0.6, chosen in such a way as to control the extent of flooding of the torrent gossips, but also to allow for the spread of the torrent gossip in the network we use. The hash function used for BTM to create proxy seeds, H(τ_π), is defined as follows: if j ∈ {x | (x = σ_0^π ± 10i, i ∈ [1, N/10]) ∧ (x ∈ [1, N])}, then j is a proxy seed, where j is the node ID of the node receiving the torrent gossip, σ_0^π is the node ID of the initial seed for file π, and N is the number of nodes in the network.
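The proxy-seed hash function just defined can be sketched as follows. The Python fragment below is illustrative only (variable names are ours); it follows the definition above, under the assumption that node IDs run from 1 to N.

# Sketch of H(tau_pi): node j becomes a proxy seed if j = sigma_0 +/- 10i
# for some i in [1, N/10] and j is a valid node ID in [1, N].
def is_proxy_seed(j, initial_seed_id, num_nodes):
    for i in range(1, num_nodes // 10 + 1):
        for candidate in (initial_seed_id + 10 * i, initial_seed_id - 10 * i):
            if candidate == j and 1 <= candidate <= num_nodes:
                return True
    return False

# Example: with N = 50 and initial seed 25, nodes 5, 15, 35 and 45 become proxy seeds,
# which matches the 4 proxy seeds per file mentioned for the 50-node network.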

TCP-Reno, used for the TCP flows in our experiments, used an MSS of 512 bytes, a maximum send/receive buffer of 16384 bytes each, and delayed ACKs.

Experiments and performance metrics

Experiments 1, 2 and 3 were performed for comparing BTI vs. BTM, Experiment 4 was performed for comparing BTM (No Incentive) and BTM (TFT), and Experiments 5 and 6 were performed for studying how long BTM takes to download a unique piece and how many bytes of overhead are expended per unique piece as speed increases, as well as the performance of BTM with increasing swarm size. In all the experiments, the clients started downloading the files at a randomly chosen time t ∈ (s, e/20), where s is the start time of the simulation (0 s) and e is the end time of the simulation.

In Experiments 1 and 2, the simulation time was 60 minutes and three clients downloaded two different files (files 1 & 2) each. The files contain 2000 pieces each. In Experiment 1, we varied the node mobility from static (zero node mobility) to 20 m/s in steps of 5 m/s, and the pieces were 500 bytes each. In Experiment 2, we varied the piece size of the files from 100 bytes to 1000 bytes in steps of 100 bytes, and the random waypoint speed was 20 m/s. We chose a large range for the piece sizes because we wanted to show the variation in the performance of the BitTorrent framework in the MANET environment as the piece size changes from below the MTU size of the wireless interface (802.11b) to values above the MTU. These experiments send the BITFIELD message once after connecting with a remote peer, do not impose incentive mechanisms on the peers for BTM, and use a per-connection choking mechanism for both BTI and BTM.

We performed 20 trials for all data points in Experiments 1 and 2, and present the following metrics with 95% confidence intervals:

1. Average goodput: measured as the average, over the clients, of the number of unique piece bytes received at a client divided by the effective duration, defined as the time between the start of the client session and the receipt of the last unique piece for the same client (written compactly after this list).

2. Average number of unique pieces received: measured as the average of the number of unique pieces received at the clients.

3. ANSI MAC-PDU bytes sent in the network: measured as the total number of bytes sent at the MAC layer for ANSI operations in the network, both unicasts and broadcasts. This includes the MAC-PDUs sent for both ANSI control NPDUs and the ANSI NPDUs sent for BTI/BTM activity.

4. Average peer degree: measured as the average of the number of peers in the client overlays.
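For reference, the goodput metric can be written compactly as follows; this LaTeX rendering and its symbol names are ours, not the dissertation's.

\[
\text{Average goodput} \;=\; \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} \frac{B_c}{t_c^{\mathrm{last}} - t_c^{\mathrm{start}}}
\]

where \mathcal{C} is the set of clients, B_c is the number of unique piece bytes received at client c, t_c^start is the start of c's session, and t_c^last is the time the last unique piece is received at c.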

The goodput and the average number of pieces received measure how many pieces are received and how quickly, the ANSI MAC-PDU metric measures the MAC overheads due to all network activity, and the peer degree metric helps explain the performance characteristics.

In Experiment 3, we studied the time taken by one client to download a fixed percentage of the file in BTI and BTM. The simulation time was 120 minutes, and the node speed was 20 m/s. In this experiment, one client downloads one file of size 160000 bytes, split into 800 pieces, each of size 200 bytes. We measured the time taken to download 10, 20, ..., 100 percent of the file in the BTI and BTM networks in 100 trials and report our results.

In Experiment 4, we studied the performance characteristics of BTM when no incentive mechanisms are imposed vs. BTM when the TFT incentive mechanism is imposed. The experiment scenario is similar to Experiment 1. In addition to some of the metrics described in the previous and next sections, in this experiment we also measure the Normalized Acceptance Ratio (NAR) for the clients and the energy consumed per piece downloaded at the clients. Here, the energy consumed by a node is the total energy expended for communications; that is, the energy consumed for processing and computation is not considered for this metric. NAR, defined in [66], is the ratio of the number of requests initiated by a peer c to the number of requests forwarded by the peer c (as a relay). We used the per-node choking mechanism in BTM for this experiment, and the BITFIELD messages are exchanged once when the client establishes a connection with a remote peer. We performed 20 trials for this experiment and reported our observations with 95% confidence intervals.

In Experiment 5, we studied the performance of BTM as speed increases, studying how long it took to download one unique piece and how many bytes of TCP overhead are expended for downloading one unique piece. Node speed was varied from 0 (static network) to 20 m/s. In this experiment, three clients download two different files each (identified by file IDs 1 and 2). These files are 1 MB each, and each file was split into 2000 pieces of 500 bytes each. We performed 18 trials to account for stochastic aberrations and reported our results with a 95% confidence interval.

In Experiment 6, we study the scalability and performance of BTM when the number of peers in the swarm increases. We performed this experiment to motivate the BTM model for the rSerPool problem (see Chapter 7). In this experiment, mobility is fixed at 20 m/s (for both maximum and minimum speeds), and clients download 1 file of 2000 pieces (500 bytes per piece). The number of clients (the swarm size) is varied from 1 to 7.

We performed 14 trials to account for stochastic aberrations and reported our results with 95% confidence intervals.

Both Experiments 5 and 6 used a per-node choking mechanism for BTM and used a BITFIELD INTERVAL of 1 minute.

For experiments 5 and 6, we evaluated the following metrics:

1. Total number of unique pieces downloaded: the total number of unique pieces received at all clients in the network.

2. Total duration of download: the total amount of time taken by all the clients in the network to download the unique pieces from the peer(s).

3. Time taken per piece download: the time taken, on average, to download one unique piece, computed as (total duration of download)/(total number of unique pieces downloaded).

4. Average MAC layer overheads due to TCP activity per piece downloaded: measured as the average MAC layer overhead bytes due to TCP activity per unique piece downloaded.

The total number of unique pieces and the total duration of download metrics measure how many pieces are received and how quickly (shown using the time taken per piece downloaded), and the MAC layer overheads measure the overheads due to TCP activity in the network per piece downloaded. That is, the overheads measured in these experiments are the average MAC overheads, without the overheads due to ANSI control messages, per piece downloaded.

Simulation results

Experiment 1: BTI vs. BTM: Node mobility

Figure 6.2 shows the results from Experiment 1. We see that, for the scenarios studied, BTM outperforms BTI in terms of goodput, number of unique pieces received, and peer degree.

In Figure 6.2(a), BTM shows better goodput as compared to BTI because, overall, TCP connections in BTM are shorter (owing to the controlled PREQ propagation) and the peer degree in BTM is larger. For both BTI and BTM, a slight increase in mobility, from 0 m/s to 5 m/s, improves the network performance by relieving hotspots in the network. In a static network, the same paths tend to be taken frequently and result in congestion, resulting in more drops of TCP segments, but in a slightly mobile network, new paths are found every once in a while, improving network performance and dropping fewer TCP segments. This relationship between the increase in mobility and the discovery of new paths explains the slight surge in the goodput metrics for both BTI and BTM. In addition, BTI also experiences the advantages due to a larger peer degree (see Figure 6.2(d)). Higher mobility decreases BTI goodput drastically because TCP packet drops increase owing to longer TCP recovery times over the longer connections maintained in the context of node mobility, but in BTM, the shorter path lengths along with a higher peer degree manage to keep goodput high as mobility increases. Thus, we see that BTM goodput decreases only slightly as speed increases over 5 m/s.

We explain the results seen in Figure 6.2(b) as follows. In general, we see that BTM is able to deliver 3-6 times more unique pieces than BTI for three reasons: (a) TCP performance in BTM is expected to be better, given that the average path lengths to peers are most likely shorter in BTM, (b) there are more peers in the BTM network, and (c) the peer degree for BTM is larger. There is a slight surge in the number of unique pieces received at both BTI and BTM as speed increases from 0 m/s to 5 m/s for the same reasons that improve goodput over this range. In general, increased mobility improves the spatial disjointness of the TCP paths, decreasing the chances of contention among the TCP streams. As speed increases beyond 5 m/s, however, BTI performance degrades and shows instability because the TCP connections in BTI are longer and are more difficult to maintain as speed increases. For BTM, on the other hand, a large peer degree, coupled with shorter TCP connections, helps in delivering more pieces in a more streamlined fashion than in BTI. Therefore, as speed increases, we see only a slight drop in unique pieces received for BTM.

Figure 6.2: Experiment 1: BTI vs. BTM: Performance with increasing node mobility. (a) Average goodput for pieces received at clients. (b) Average number of unique pieces received at clients. (c) Total MAC layer overhead bytes in the network due to ANSI activity. (d) Average number of connections at clients.

We see, in Figure 6.2(d), that the average peer degree for BTI is more or less constant at (roughly) 3. This number is the [number of other peers in the swarm (2 per file)] + [the number of initial seeds (1 per file)], and is the maximum peer degree for this scenario in BTI. This is because, in BTI, the clients contact the tracker for a full peer list, and in this case, the clients eventually end up connecting to each other and to the initial seed. For BTM, we see that the clients are able to connect not only to other peers (2 per file) and the initial seed (1 per file), but also to the proxy seeds (4 per file, given a 50-node network). Thus, the peer degree is higher in BTM (a maximum of 7 for this scenario). The reason the peer degree is low in a static network for BTM is that the limited PREQ propagation prohibits BTM from finding other peers in the network which are located more than 2 hops away from the clients. As speed increases, the chances of finding new, unconnected peers improve in BTM, thus increasing the average peer degree for BTM.

As far as ANSI overhead bytes are concerned, in Figure 6.2(c), we see that ANSI expends more overheads in BTM than in BTI. This observation is explained by the decentralized model of BTM, which makes both torrent and peer discovery expensive processes in BTM. In BTM, the ANSI overheads reflect torrent gossips, TREQ/TREP propagation, and PREQ/PREP propagation apart from the regular ANSI activities (used for route discovery etc.), but in BTI, the ANSI overheads reflect only TREQ/TREP propagation and regular ANSI activities. Both BTI and BTM show an increase in ANSI overheads as speed increases, in response to the higher rate of link breakage.

The reason for the spurt in goodput (see Figure 6.2(a)) and in the average number of pieces (see Figure 6.2(b)) for BTI is as follows. As speed increases beyond 15 m/s, the increased mobility results in encountering more nodes per unit time, improving the chances of connecting to a new peer to which a connection was not earlier possible (note that BTI clients know about all the clients in the network, owing to the fact that BTI clients receive a full peer list from the tracker), resulting in a higher peer degree. This higher peer degree in turn helps in improving the goodput and the average number of unique pieces received for BTI.

Experiment 2: BTI vs. BTM: Piece size

Figure 6.3 shows the results of Experiment 2. As with Experiment 1, we see that BTM outperforms BTI in all the metrics: goodput, number of unique pieces received, and peer degree.

Figure 6.3: Experiment 2: BTI vs. BTM: Performance with increasing piece size. (a) Average goodput for pieces received at clients. (b) Average number of unique pieces received at clients. (c) Total MAC layer overhead bytes in the network due to ANSI activity. (d) Average number of connections at clients.

In Figure 6.3(a), both BTI and BTM show an increase in goodput as piece size increases because more bytes are received per choking interval as piece size increases. BTM shows better goodput metrics as compared to BTI because BTM uses more TCP connections over shorter distances than those used by BTI, which results in more pieces being delivered more quickly for BTM (see Figure 6.3(b)). On the one hand, goodput for both BTI and BTM increases as piece size increases, but the bigger piece sizes also stress the network and increase the traffic in the network. As piece size increases, the chances of a piece being fragmented at the IP layer into smaller fragments to be sent over the network increase, placing more demands on the MAC layer in terms of both channel acquisition and MAC-PDUs transmitted to the next hop. As a result, more TCP segments are received out of order at the client, pressuring TCP to retransmit these segments, increasing in turn the number of retransmissions attempted by the MAC layer (for these experiments, we observe an increase in both the number of retransmissions and the number of fast retransmissions as piece size increases). These TCP/MAC retransmissions increase the congestion in the medium and reduce the total number of unique pieces received at the clients. This relationship between MAC retransmissions and piece size is why the number of unique pieces received decreases as piece size increases for both BTI and BTM. However, the effect of increasing MAC retransmissions is milder for BTI, given the lower load in the network for BTI.

As far as ANSI overheads are concerned (see Figure 6.3(c)), BTI shows lower overheads as compared to BTM for the reasons explained in the results for Experiment 1. ANSI overheads do not seem to change as piece size increases for the following reasons. During the short unchoking period at peer p, when peer p sends pieces to other peers, the TCP segments are sent in quick succession, making route breakage a non-issue during these (short) periods. That is, at the start of the unchoking period, if a peer p does not have a route to a remote client c, then p requests a route to c using ANSI, and the newly computed route is used for the duration of the (short) unchoking interval. Note that a maximum of 4 × 1000 bytes is sent per unchoking interval per TCP connection at a peer, which is below the send buffer size (16384 bytes), and these bytes get a full CHOKING INTERVAL to be sent to the remote peer.

In Figure 6.3(d), we see that the average peer degree in BTM does not change with piece size because the mobility profile (which controls, in part, the number of peers contacted) does not change as piece size increases. Recall that in BTI, the peer degree is not expected to change, given that BTI clients contact the tracker for a full peer list. As with Experiment 1, BTI shows a peer degree of (roughly) 3 for the same reasons. BTM consistently shows a peer degree of (roughly) 6 because the high mobility allows PREQs to find more peers in this scenario.

Figure 6.4: Experiment 3: BTI vs. BTM. The graph shows the simulation time taken for downloading x% of the file for all the trials. On the right of the plots, the number of trials downloading x% of the file in BTI and BTM are shown as BTI%/BTM%.

Experiment 3: BTI vs. BTM: How long do BTI and BTM take to download what percentage of the file?

Figure 6.4 shows how long each of the trials for BTI and BTM took to download x% of the file, and lists the percentage of trials that downloaded x% of the file for both BTI and BTM to the right of the plots. Note that 100 trials of the experiment were performed.

We see that in 88% of the cases, BTM has downloaded at least 10% of the file, and a major portion of these cases lead to a full file download. That is, 88 of the 100 trials download at least 10% of the file in BTM, and 69 cases download the full file. For BTI, however, only 41 of the 100 trials download at least 10% of the file, and none of the trials download the full file. In our simulations, we also observed that for both BTI and BTM, if, in a trial, the client did not download 10% of the file, the client did not start the download; that is, the client did not download even one piece. We explain these results as follows.

For BTM, the increased peer degrees for the clients and the availability of pieces amongst several proxy seeds in the network, and not just at the initial seed, improve the chances of downloading pieces. In addition, the improved peer degree also improves the average path length in BTM, improving the chances that if one TCP connection breaks in BTM, the other connections are still available for piece download. Note that the maximum peer degree for BTM is a total of [1 initial seed] + [4 proxy seeds] = 5, but the maximum peer degree for BTI is 1 (connected to the initial seed only). For BTM, even if one TCP connection breaks, there are other TCP connections to download pieces from, but for BTI, with one TCP connection used to download the entire file, BTI is not able to download the entire file owing to issues due to mobility, which affects the TCP connection adversely. Indeed, we see that there is a good probability, in our experiments, that if the file download starts in BTM, it will likely lead to a complete download of the file. This probability for BTM in our experiment is 69/88 = 0.785, but for BTI, the same probability is zero.

The reason why both BTI and BTM are sometimes not able to start the download (11 out of 100 times for BTM and 59 out of 100 times for BTI) is explained as follows. For BTI clients, mobility decreases the chances of finding the torrent files quickly using TREQs.

In some cases, even if the torrent is found, the mobility profile makes even the connection with the tracker difficult, and even if the tracker was contacted, the peer lists may not be downloaded. Further, if the client and peer are located many hops from each other in BTI, the bitfield exchange and the file download cannot proceed if the TCP connection between the client and the peer is not possible. For BTM, the same issues hold, but to a lesser degree, owing to the following reasons: (a) torrents are gossiped, so BTM clients may already have the torrent, (b) BTM does not need to connect with a tracker, and (c) the peer degree is larger for BTM, because peers are found using localized transmissions of the PREQ.

BTM is not able to download 100% of the file in 19 of the 88 cases which start the download because, even if the download has started, some pieces are very hard to procure if the client is not connected to the initial seed or the connection to the initial seed was dropped. Note that, despite the presence of proxy seeds, this difficulty arises owing to the fact that the proxy seeds themselves may not have the full file when the client needs the unreceived pieces. In fact, we see that 83 of the 88 cases which started the download receive at least 90% of the pieces, but the rare pieces (the last 10% of the file) are difficult to download in some of these cases.

Discussion

We see that BTM outperforms BTI with respect to goodput and the number of unique pieces received at the clients in a variety of scenarios. We also see that BTM’s performance with respect to goodput and number of pieces received is more consistent, and shows less variation across trials than the BTI performance, judging from the size of the confidence intervals of the average values for the data points in the graphs for the two protocols. In addition, BTM is able to better amortize the expenses of a client download by using higher peer degrees. These better performance metrics for BTM are attributed to (a) BTM’s decentralized, cross-layer model for acquiring key/peer information, which is a better fit for MANET than the BTI model, and (b) the presence of redundant resources (proxy seeds) in BTM. However, we note that these better performance characteristics come at the cost of higher MAC resource consumption for BTM.

Our experiments with incentive enforcement in BTM show that BTM (No Incentive) exhibits better network-wide performance, while BTM (TFT) maximizes node-wise performance, improving the node lifetime by decreasing the energy consumed to download pieces. In general, assuming that peers will be rational implicitly assumes the selfishness of the peers, which runs contrary to the “spirit” of MANET. Thus, striking a balance between node-wise performance and network-wide performance is a hard problem in a MANET.

Our results also show that redundant resources and distributed operations in the context of simultaneous file download (from multiple peers) can improve the performance of a P2P system and make the P2P system more resilient to typical MANET issues such as node mobility.

Chapter 7 CONCLUSIONS AND FUTURE WORK

Summary of contributions

The research presented in this dissertation shows that SI mechanisms, when used to collect and maintain topological information in a MANET, can result in better performance than traditional methods. We see these mechanisms helping the cause of upper layer protocols in Chapters 4 and 6. But, as we have seen in Chapters 3 and 5, these better performance characteristics come at the cost of network resources. We note, however, that a case can be made by arguing that the MAC layer resources expended in the SI-based system are used towards the “more useful” purposes of delivering more packets (in the case of ANSI) and recovering more packets (in the case of PIDIS).

In Chapter 3, we show that ANSI sometimes outperforms AODV for UDP traffic in both hybrid and pure MANET networks with respect to packet delivery, number of packets delivered, end-to-end delay, and jitter. We see that the congestion-aware property of ANSI helps CBR/UDP applications, because out-of-order delivery is not an issue with UDP loads. In addition, the variance of the observed values (as measured by the width of the confidence intervals) is most often lower in ANSI, indicating a more stable performance.

We also see that implementing congestion-awareness at the routing layer comes at a cost.

In the case of the ANSI network, congestion awareness sometimes translates to higher route discovery activity because the routing layer invalidates congested routes. ANSI’s congestion-awareness helps ANSI by delivering more APDUs in the case of CBR/UDP loads, but in the case of TCP, congestion-awareness sometimes increases out-of-order deliveries, pressuring TCP more than AODV does. Also, congestion-awareness raises scalability issues when the traffic increases. In general, we note that it is difficult to design routing protocols which are scalable under extreme traffic conditions, but incorporating congestion awareness complicates the problem by incurring overheads in an already bogged-down network. The trade-off between the amount of overhead expended in finding congestion-free paths and the amount of resources remaining for actual data delivery is delicate at high traffic loads and is worth studying. However, we see that ANSI is able to achieve this balance and sometimes perform better than AODV in higher traffic load conditions, even though these gains come at the cost of network resources. We believe that the basic ANSI architecture provides for implementing a more general-purpose, self-organizing routing protocol incorporating autonomously adaptive characteristics that enable ANSI to behave well under a wider variety of network and traffic conditions.

In Chapter 4, we continue our study of ANSI, and show that using TCP to evaluate unicast routing protocols for MANET has the benefit of illustrating the routing process in an end-to-end, rather than hop-to-hop, context, thus yielding new insights into the working of routing protocols for MANET. In contrast, using UDP would not have yielded this “per-flow” understanding of how evenly the flows divide the resources. Thus, we see that even though TCP is not a useful protocol as-is for practical use over a MANET, when used as an evaluation tool, TCP gives us a new way to design and understand routing protocols for MANET.

Our studies show that ANSI, by performing congestion-aware routing, can improve some performance metrics, but it remains to be seen how we can incorporate our insights into routing protocol design for improving MANET routing protocol metrics across a wide variety of scenarios. Studying the effect of TCP over the other lower layers in the MANET will also provide new insights regarding the design of these protocols. In this spirit, we call on the MANET research community to rethink the use of TCP for evaluating their protocols for MANET, and to use TCP-based evaluation to ensure a “TCP-friendly” stack.

In Chapter 5, we described and evaluated PIDIS, an adaptive, persistent packet recovery mechanism which uses swarm intelligence to gossip for lost packets effectively. PIDIS adapts to network conditions and adjusts the number of times lost message recovery attempts are made. Our approach exploits the positive and negative feedback mechanisms of swarm intelligence to quickly search for good candidate routes from which lost packets could be recovered. PIDIS also utilizes the amplification-of-fluctuation mechanism of swarm intelligence to discover alternate, and possibly better, routes to adapt to changing packet delivery patterns and network topology. ODMRP+PIDIS is shown to have better performance characteristics in terms of packet delivery, end-to-end delay, and overheads as compared to ODMRP+AG in the simulated scenarios. This behavior is owing to the fact that ODMRP+PIDIS is able to guide the gossip process more effectively, thus controlling the number of messages traversing the network.

In Chapter 6, we show that BTM is more resistant to node mobility and network partitions, and performs better and more consistently than BTI with respect to goodput and the number of unique pieces received, and does so with the help of a higher peer degree per client. Our results also show that BTM is faster and downloads more pieces at the client than BTI. We also see that as the number of clients in the network increases, the performance of BTM improves, making BTM inherently scalable.

Our experiments with BTM incentive enforcement illuminate the problem of peer incentive enforcement in a MANET context in which the nodes are expected to cooperate. Specifically, we see that there is a hard tradeoff between maximizing network performance and maximizing node-wise performance.

Despite the advantages of BTM over BTI, we see that the redundancy provided by BTM and its replicated object model (using proxy seeds) increases the overheads associated with file download, though these costs are amortized in the network. That is, the performance improvement of BTM comes at the cost of increased network resource consumption. The higher network resource consumption is due to the protocol operations that make BTM decentralized and more resilient to node failure: the removal of the tracker node and the creation of proxy seeds.

Future work

BTM and mobility

In Chapter 6, we noted that mobility can affect the BTI and BTM models adversely. In particular, we noted that even though the BTM model has a number of features which help BTM cope better with issues such as node mobility and partitions, the BTM model suffers from bad performance under some criteria influenced by the effects of mobility on TCP.

Some of the performance issues in BTM, such as (a) when the torrent file cannot be downloaded at a client C, and (b) when the torrent file is downloaded but the peers cannot be found, need to be studied to optimize the performance of BTM.

In addition to the above problems, some issues pertaining to TCP performance under mobility are discussed as follows and need to be studied further to optimize the performance of BTM for MANET. In general, a study to understand how often a TCP connection breaks in a BTM environment is useful, because knowing when a TCP connection is no longer active will help BTM clients recognize broken connections (before TCP does). This information can help the clients conserve valuable network resources and help the clients concentrate their efforts towards finding more peers. In addition to the above, a client c which recognizes a broken TCP connection with a peer p can reconnect with peer p at a future time, if p is in the neighborhood of c, thus improving the rate and efficiency of piece download.

In BTM, TCP is loaded differently as compared to typical client-server P2P models, where one file is downloaded in an FTP-like fashion. In BTM, the duration between unchoking events, and the fact that a remote peer p may not send a client c pieces for a long time (when p is snubbing c), can make TCP assume that the connection may have timed out. Under these conditions, if the TCP keep-alive timer probes are lost, the connection between p and c can be dropped. These issues are more pronounced in a MANET environment, owing to issues of mobility and fluctuating topology. These issues need to be studied further.

BTM for mesh networks: an adaptive tracker mechanism

In the wired BitTorrent model, the tracker node, whose only function is to track the nodes in the swarm, is used to procure peer lists with minimal overheads. As we discussed earlier, this model is not suitable for MANET. However, the presence of trackers improves the performance of the network, judging from the far lower overheads expended by BTI, as seen in Figures 6.2(c) and 6.3(c). Therefore, in a network where tracker operations can be carried out without fear of node bombardment or failure, the presence of a tracker node can significantly improve performance by delivering peer lists to the clients with little overhead.

Our BTM model, being completely decentralized, is ideal for “pure” MANET. However, in the context of hybrid/mesh networks, a suitable approach to the P2P problem would be to partially decentralize tracker operations. That is, clients which are able to contact tracker services should be able to avail of less-expensive peer-list procurement from the tracker, but the protocol should function in a purely decentralized fashion when no tracker node is present in the vicinity of the client, or when the tracker node is not in the same partition as the client. In the presence of infrastructure nodes, replicated tracker services can be provided at the infrastructure nodes.

In this model, a client that requires a peer list will try to contact a tracker, but if the connection is not successful, the client will send a PREQ as per the BTM protocol operations detailed in Section 6.3.
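The decision logic at such a client is small; a rough sketch, under the same Python-style assumptions as before, is given below. The tracker wire format, the send_preq hook and the info_hash attribute are invented for illustration and do not correspond to any existing tracker protocol or BTM message format.

    import socket

    PREQ_TTL = 3   # illustrative limited TTL for the fallback PREQ broadcast

    def get_peer_list(torrent, tracker_addr, send_preq, timeout=5.0):
        """Try the mesh tracker first; fall back to BTM's PREQ flood.

        tracker_addr is the (host, port) of a tracker reachable over the
        infrastructure; send_preq is a hypothetical hook into BTM's PREQ
        broadcast that returns whatever PREP replies arrive."""
        try:
            with socket.create_connection(tracker_addr, timeout=timeout) as sock:
                sock.sendall(b"GET_PEERS " + torrent.info_hash)
                reply = sock.recv(4096)
                return parse_peer_list(reply)      # cheap path: the tracker supplied the list
        except OSError:
            # No tracker in the vicinity, or not in this partition: behave as pure BTM.
            return send_preq(torrent.info_hash, ttl=PREQ_TTL)

    def parse_peer_list(raw):
        """Assumed wire format: whitespace-separated host:port entries."""
        return [tuple(entry.rsplit(":", 1)) for entry in raw.decode().split()]

An adaptive version, as proposed below, would additionally weigh how stale the tracker's view is likely to be before taking the cheap path.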

However, the above-described process has some drawbacks. First, the tracker for file π, Γ_π, is not able to judge how far away a peer p_i ∈ P is, where P is the peer list sent (from Γ_π) to a client C. This lack of information can lead to a severe drop in download rate for the client C if the peers in P are not geographically close to C. Secondly, assuming that the peers downloading π are able to send periodic updates of their location to Γ_π, in a highly mobile scenario the benefits of using a tracker are far outweighed by the overheads of the periodic location updates to the tracker; it is easy to see that when the network is highly mobile, the location information sent by the peers is only valid if the updates are sent more often.
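As a rough illustration of this second drawback (the relation below is an assumption for discussion, not a result from our experiments): if a peer moves at speed v and its reported position is only useful while the peer stays within a tolerance d of it, the tracker needs an update from that peer roughly every T ≈ d/v seconds, so a swarm of n mobile peers generates on the order of n·v/d location updates per second. The update overhead thus grows linearly with node speed while the value of each individual update shrinks, which is why a location-aware tracker is attractive mainly in slowly changing or infrastructure-backed portions of the network.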

Our work proposes to understand the above tradeoff by studying an adaptive model in which the client C is able to use a mix of services from both the tracker, Γ_π, and PREQ activity.

Lastly, in the presence of replicated tracker services (over infrastructure nodes), the coherency/consistency of the services provided over the infrastructure nodes is a major concern. This problem arises because the information which the tracker maintains is in flux, owing to the movement of nodes and the arrival and departure of peers in the π-swarm. Any approach to providing a flexible, replicated tracking service should address this issue.

A BitTorrent solution to reliable server pooling

The vision of this endeavor is to investigate cross-layer designs of peer-to-peer (P2P) models and techniques to facilitate reliable and secure content distribution/sharing. Our motivation comes from observing the inherently robust and distributed nature of P2P systems. The research issues to be addressed in this endeavor may include mobile applications (such as serverless email and instant messaging), system services and system support (such as distributed DNS), decentralized lookup services for MANET, overlay construction techniques for MANET, etc. We propose to investigate research problems emerging in decentralized resource sharing in MANET, and to design a P2P-based secure solution to the problem of Reliable Server Pooling (rSerPool). Our research goals are to design commercial-grade, secure P2P models and techniques to facilitate seamless content access, distribution and sharing.

The problem of rSerPool, described in [25], is to increase system availability by allowing a pool of redundant information sources to be viewed as a single transport endpoint. To this end, existing rSerPool solutions make a suitable server available at all times, given the QoS requirements. This switchover mechanism in rSerPool essentially chooses one server from a pool of servers, giving existing rSerPool solutions (including those that employ clustering or dominating-set techniques) an essentially client-server flavor.

P2P models for the Internet are very popular and in general may be used to facilitate critical applications. In particular, the BitTorrent P2P model is already extremely popular in the wired Internet and has even spawned related technologies like Pando.1 We propose to design rSerPool by applying a BitTorrent-based P2P model, rather than a client-server model. Specifically, we propose BTSerPool, which builds on our existing research efforts of adapting BTM (see Chapter 6) and an ad hoc threshold signature security scheme [27] to the problem of rSerPool.

1 Pando (online at http://www.pando.com) is a BitTorrent-based e-mail system for transferring large files (of the order of 100s of MBs) using an e-mail format.

In the following section, we describe BTSerPool and discuss why we expect that BTSerPool will show competitive performance characteristics when applied in the context of MANET, given the requirements of (a) fast switchover, (b) dynamic (re)configuration, (c) survivability, (d) application transparency, and (e) control-signal efficiency, as stated in [25].

BTSerPool is an application-layer protocol that builds on BTM to provide a P2P solution to the problem of rSerPool in MANET. BTSerPool assumes that the proxy seeds are already created as per BTM operations when the network is initiated. This assumption is legitimate, given that the rSerPool problem assumes the presence of multiple redundant servers. We can also assume that the torrent files are created and disseminated in the network (during network initiation) by one seed for each file and contain both the list of the proxy seeds and the hash function for creating proxy seeds (by BTM operations).
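One way to picture the extended torrent file that these assumptions imply is sketched below; the field names are hypothetical, since we only specify what the torrent must carry (the proxy-seed list and the hash function used to place proxy seeds), not its exact encoding.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class BTSerPoolTorrent:
        """Illustrative layout of the extended torrent file assumed by BTSerPool."""
        info_hash: bytes                                        # identifies the file, as in standard BitTorrent
        piece_length: int                                       # standard BitTorrent piece size
        piece_hashes: List[bytes]                               # per-piece integrity hashes
        initial_seed: str                                       # sigma_0^pi, the node that created the file
        proxy_seeds: List[str] = field(default_factory=list)    # sigma_1^pi .. sigma_n^pi, created by BTM
        proxy_hash_fn: str = "sha1"                             # names the hash function used to place proxy seeds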

Given the above, any client C which needs to use the rSerPool for a file π procures a copy of the torrent, τ_π, for the file (note that C may already have τ_π from the initial torrent dissemination; in this case, C bypasses this step) and proceeds to download π from the redundant server pool, given by Σ = {σ_0^π, σ_1^π, ..., σ_n^π}, where σ_0^π is the initial seed for π and σ_i^π, i ∈ [1, n], are the proxy seeds. Note that the basic BTM torrent file has to be modified to carry information about all the proxy seed nodes (and the hash function for creating them) in the network. In case C does not have τ_π, C launches a torrent-request message, TREQ, which is propagated as per BTM operations at C. Once the torrent-reply message TREP reaches C, C has all the information to start downloading π from Σ. BTM operations at C will find new peers in the network (by launching PREQs) as the piece-wise download of π proceeds at C, further amortizing the network expenses of the file download at C.
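The client-side procedure just described can be summarized by the following sketch. The handle btm and its methods (treq, preq, connect, request_pieces, file_complete, assemble) stand in for the BTM primitives named in the text and are assumptions for illustration only.

    def btserpool_download(file_id, torrent_cache, btm):
        """Sketch of the BTSerPool download at a client C for file pi (= file_id)."""
        # 1. Procure the torrent: reuse the copy from the initial dissemination if
        #    present, otherwise flood a TREQ and wait for the TREP.
        torrent = torrent_cache.get(file_id) or btm.treq(file_id)

        # 2. Connect to the whole redundant pool Sigma: the initial seed plus the
        #    proxy seeds carried in the torrent.
        pool = [torrent.initial_seed] + list(torrent.proxy_seeds)
        peers = {node: btm.connect(node) for node in pool}

        # 3. Download pieces in random order, concurrently from every connected
        #    peer, and keep widening the overlay with PREQs while incomplete.
        while not btm.file_complete(file_id):
            btm.request_pieces(file_id, peers)            # concurrent piece requests
            for new_peer in btm.preq(file_id, ttl=2):     # finds nearby clients such as D
                peers.setdefault(new_peer, btm.connect(new_peer))
            peers = {p: c for p, c in peers.items() if c.alive()}   # drop broken links
        return btm.assemble(file_id)

The same loop also drops connections that appear broken, in the spirit of the application-level liveness monitoring discussed earlier.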

We illustrate the actions of BTSerPool at a client C downloading a file π in a MANET by using Figure 7.1 as an example.

Service push: As mentioned above, file π is available at each node in the server pool Σ = {σ_0^π, σ_1^π, ..., σ_n^π}, where σ_0^π is the “creator” of the file (redundant availability). At network initiation, t = t_0, one server in the pool, say σ_0^π, disseminates τ_π in the network (service push) so that all other nodes receive the service advertisement.

Reactive overlay construction and maintenance: At time t = t_1, the client starts the file download at C by connecting to all members of the server pool Σ, and starts downloading pieces of π in random order and simultaneously from each of the members of the server pool. At time t = t_2, the protocol operations at C launch a PREQ for new peers because the file download at C is not yet complete. At time t = t_3, the PREQ results in a PREP from another client D which is also downloading π. C connects to this new client and starts downloading pieces from D as well. At this time, C is not only connected to all members of the server pool, Σ, but also to other clients (including D) which help C in downloading π. This state of being connected to multiple peers at the same time not only alleviates the network expenses at the server pool, it also makes the BTSerPool model resilient to network-wide topological fluctuations, as we will see. At time t = t_4, C loses its connection with σ_1^π, which has either been bombed or has moved away beyond a 2-hop radius from C. At t = t_5, we see that in the new configuration of the overlay at C, C is simultaneously downloading pieces from multiple peers. As we can see, the absence of σ_1^π (in C's overlay) does not affect the download of π at C.

Figure 7.1: The BTSerPool protocol in action at C. (a) One node, σ_0 ∈ Σ, broadcasts the torrent for π. (b) At t = t_1, C connects to all the servers, σ_i ∈ Σ, in the BTSerPool. (c) At t = t_2, the file download at C is incomplete, so C launches a PREQ with limited TTL to find new peers. (d) At t = t_3, C receives a PREP from another client, D, which is also downloading π, and connects to D. (e) At t = t_4, C's connection to σ_1^π is disrupted. (f) At t = t_5, the new current state of the client overlay at C for downloading π.

In the above framework, we see that BTSerPool operations at C will force the use of other peers (which are not seeds, but other clients) in the network, found using PREQ broadcasts made during BTSerPool operations (at C). As more peers are added to the network, the system's resilience increases owing to the availability of more peers holding pieces of the file. This availability of peers makes BTSerPool inherently scalable.

BTSerPool is expected to recover quickly from service interruptions because the basic BTSerPool operations at C will force connections to new peers whenever the download at C is not yet complete. In addition, the nature of the PREQ propagation allows new peers to be found as close to C as possible. Intuitively, this mechanism allows for better throughput metrics at C, given that TCP connections perform better over shorter path lengths. The PREQ propagation method also allows clients to connect to more capable servers as and when they are found. The above-described properties make our model inherently resistant to server failures. Given these benefits, the download at the BTSerPool client C is affected minimally by service interruptions, either because C is most likely connected to other peers, or because C is expected to find other peers owing to BTSerPool protocol operations.

As mentioned above, in BTSerPool, C downloads pieces of π simultaneously from multiple peers (seeds and other clients) in the network. The use of concurrent TCP sessions makes the file download at C inherently survivable. Also, for the reasons described in Chapter 6, BTSerPool is resistant to network partitions and performs well under node mobility. A good choice of hash function for creating proxy seeds will ensure a uniform distribution of the seeds in the network. In addition, if more peers are present in the network, the network becomes more stable, because there are multiple sources for piece download at C. Because C is expected to be connected to a large number of peers at all times, the issue of application-layer transparency to broken connections is not relevant in BTSerPool.

In addition to the above, we see that the BTSerPool solution will have the following features:

1. The BTSerPool architecture is purely infrastructureless, given that the nodes do not use any centralized lookup feature.

2. Because the information about the seeds and proxy seeds (using a hash function) is encoded in the torrent file disseminated in the network at initiation (or file creation), no name-lookup service is necessary.

3. Expensive backbone formation and maintenance protocols are not necessary, given that the BTSerPool model is completely distributed.

4. Given the above, service registration is not applicable to our model, because the clients use a pull model to discover new services in the network (by using PREQ messages).

5. Because BTSerPool places low loads on many nodes, we expect BTSerPool to be more applicable to resource-constrained networks and nodes.

6. Lastly, the BTSerPool model is inherently resistant to Byzantine adversarial attacks on at most t server nodes (in a network that contains at least 3t server nodes),2 given that any client simultaneously downloads the file from multiple servers.

Despite the advantages of the BTSerPool architecture, BTSerPool needs to be extensively evaluated for security concerns and under conditions of seed failures. In addition, several issues pertinent to the security of the nodes and links in the rSerPool problem will be addressed in the design of BTSerPool, which are outlined as follows.
