8 Routing Control in Packet Switched Networks using Soft Computing Techniques Brian Carse, Terence C. Fogarty and Alistair Munro 8.1 Introduction This chapter describes the joint application of two soft computing methods – evolutionary algorithms and fuzzy reasoning – to the problem of adaptive distributed routing control in packet-switched communication networks. In this problem, a collection of geographically distributed routing nodes are required to adaptively route data packets so as to minimise mean network packet delay. Nodes reach routing decisions locally using state measurements which are delayed and necessarily only available at discrete sampling intervals. Interactions between different nodes' routing decisions are strong and highly non-linear. Extant routing methods in packet-switched networks (Khanna and Zinky, 1989) mostly employ, in one form or another, some direct form of least-cost or ‘shortest-path’ algorithm (Dijkstra, 1959) operating at each routing node. Such methods pay little attention to the dynamic interactions of routing decisions made at different nodes and do not directly address the temporal effects of delayed measurements and persistence of the effects of routing choices. This contribution proposes a very different approach to the routing problem. The routing policy of routing nodes is determined by a fuzzy rule base, which takes as inputs various network state measurements and provides route selection probabilities as outputs. Telecommunications Optimization: Heuristic and Adaptive Techniques, edited by D.W. Corne, M.J. Oates and G.D. Smith © 2000 John Wiley & Sons, Ltd. Telecommunications Optimization: Heuristic and Adaptive Techniques. Edited by David W. Corne, Martin J. Oates, George D. Smith Copyright © 2000 John Wiley & Sons Ltd ISBNs: 0-471-98855-3 (Hardback); 0-470-84163X (Electronic) Telecommunications Optimization: Heuristic and Adaptive Techniques138 A Genetic Algorithm (GA) is used to optimise these fuzzy rule bases. At each generation of the GA, identical copies of each candidate fuzzy rule base are deployed to routing nodes. Fitness evaluation for each individual fuzzy rule base is based on the network-wide performance of the distributed assembly of routing controllers. The layout of the chapter is as follows. The following section (section 8.2) concentrates on some preliminaries and offers a background on the adaptive distributed routing problem, and on fuzzy logic and fuzzy control. Section 8.2 also gives a brief overview of some other approaches which use the genetic algorithm to optimise fuzzy rule bases. Section 8.3 describes two versions of a fuzzy classifier system (a non-temporal version and a temporal version) used as the basis for experiments; this section then provides experimental results in applying the fuzzy classifier systems to routing control. Performance of the evolved fuzzy routing controllers is compared to that of routing using other methods. Finally, section 8.4 concludes and offers suggestions for further work. 8.2 Preliminaries 8.2.1 Adaptive Distributed Routing in Packet Switched Networks Communication networks employ two major methods of switching: circuit-switching and packet-switching (refer to Schwartz (1987), Stallings (1994) and Tanenbaum (1996) for descriptions of these switching methods). In the former, a dedicated amount of network bandwidth is allocated to a source-destination pair during a circuit set-up phase and end-to- end delays are usually small and fixed. These characteristics have lead to the widespread adoption of circuit-switching for telephony and real-time video. However, circuit-switching has the drawback of making inefficient use of network resources when information sources generate ‘bursty’ or sporadic traffic. Packet-switching attempts to overcome this problem by employing a distributed form of statistical or dynamic multiplexing. Each network user offers packets to the network and these packets are routed through the network by Packet- Switching Exchanges (PSEs) on a store-and-forward basis. Link bandwidth is no longer pre- allocated at connection set-up time, but instead each PSE maintains a queue of packets to be delivered over a particular outgoing link. Two main ways of implementing packet-switched networks have emerged: virtual-circuit and datagram. In virtual-circuit packet-switched networks, a connection set-up phase establishes a fixed path through the network between a source-destination pair (although it does not allocate network bandwidth). For the duration of a connection, all packets follow the same path through the network. In datagram networks, no connection set-up phase is involved and subsequent packets between a source- destination pair may take different routes through the network. While packet-switching makes better use of network resources for bursty traffic sources, end-to-end delays are variable and depend on the level of traffic offered to the network. This characteristic has meant that such networks have, until recently, been ruled out for conveyance of real-time information sources such as telephony and real-time video. The currently developing Broadband Integrated Services Digital Network (B-ISDN) is intended to convey telephony, video and computer–computer information over the same network. It is almost certain that B-ISDN networks will employ Asynchronous Transfer Mode (ATM), implying that B- ISDN will be a packet-based network (using fixed size packets called ‘cells’). Routing Control in Packet Switched Networks using Soft Computing Techniques 139 Routing policies in computer networks may be static, dynamic or adaptive; and centralised or distributed. Static routing uses fixed routing policies which do not change with time or network conditions. Dynamic routing alters routing policies in time (e.g. according to the time of day). Adaptive routing allows routing decisions to take into account the changing nature of network traffic distributions. With centralised routing, a single centralised node (Routing Control Centre) gathers network status information relating to topology and traffic distribution, calculates routing policies for individual nodes based on this information, and then informs nodes in the network of these policies. Distributed routing, however, has network nodes (PSEs) reaching their own routing decisions based upon the information available to them. Adaptive distributed routing has the advantage in that calculation of routes is spread over many nodes; there is no convergence of routing information to and from an individual routing control centre (causing congestion on links in its vicinity); and routing decisions can be made to adapt to changes in the network status. Virtually all packet-switched networks base their routing decisions using some form of least-cost criterion. This criterion may be, for example, to minimise the number of hops, or to minimise packet delay. Two elegant algorithms in widespread use in both centralised and distributed form, are those of Dijkstra (1959) and Ford and Fulkerson (1962), both of which translate to shortest-path routing algorithms in the communication network context. We now briefly discuss the development of the USA ARPANET packet-switched network, since the problems encountered, and solutions to these problems, exemplify the difficulties of adaptive, distributed routing techniques. The ARPANET is also significant since it formed the initial basis from which the current world wide ‘Internet’ has evolved. The original ARPANET routing mechanism used an adaptive, distributed approach using estimated delay as the performance criterion and a version of the Ford and Fulkerson algorithm (sometimes called the Bellman-Ford algorithm). Each switching node exchanged current estimates of minimum delays to each destination, with its neighbours every 128 ms. Once this information was received, a node calculated the likely least-delay next node for each destination and used these for routing. This original approach suffered from many problems, in particular, the distributed perception of the shortest route could change while a packet was en route, causing looping of packets. The second generation ARPANET routing mechanism (McQuillan et al., 1980), also adaptive and distributed, measured delays to each neighbour directly by time-stamping packets. Every 10 seconds, the measured link delays were averaged and then flooded (i.e. transmitted to every other node) through the network. Each node was then in possession of a (time-delayed) map of delays in the complete network. Nodes re-computed routing tables using Dijkstra's shortest-path algorithm. This strategy was initially found to be more responsive and stable than the old one. However, as network load grew, new problems arose, and instabilities in routing decisions were observed, whereby routes currently measured as heavily used were simultaneously avoided by all nodes and routes measured as lightly used were simultaneously selected, thus causing unwanted oscillations in routing decisions and inefficient network usage. One conclusion reached from these observations was that every node was attempting to obtain the best route for all destinations and that these efforts conflicted. As a result, the ARPANET routing method was further changed and in a later form (Khanna and Zinky, 1989) measures were introduced to damp oscillations through the use of digital filtering to smooth estimates of link utilisation, and the linearisation of projected delay as a function of link utilisation. Telecommunications Optimization: Heuristic and Adaptive Techniques140 8.2.2 Fuzzy Logic and Fuzzy Control Fuzzy logic is based on the concept of fuzzy sets (Zadeh, 1965). A fuzzy set is a generalisation of a classical set in the sense that set membership is extended from the discrete set {0,1} to the closed real interval [0,1]. A fuzzy set A of some universe of discourse X can be defined in terms of a membership function A µ mapping the universe of discourse to the real interval [0,1]: ]1,0[: →X A µ Fuzzy set theory redefines classical set operations. For example, the most common forms of fuzzy union, intersection and complement are )(1)( ))(),(min()( ))(),(max()( xx xxx xxx A A BABA BABA µµ µµµ µµµ −= = = ∩ ∪ Given a universe of discourse X, it is possible to identify fuzzy sets with linguistic variables. For example if X is identified with linear displacement, then fuzzy sets over X might be {Negative-Large, Negative-Small, Zero, Positive-Small, Positive-Large}, or {NL, NS, Z, PS, PL}. Given universes of discourse for a set of system input and output variables, rules can be written in terms of fuzzy sets. For example, a rule for linear position control might be: if (x is NL) and (v is Z) then (f is PL) where the rule inputs are position (x), velocity (v) and the rule output is force (f). Once a fuzzy rule base and associated fuzzy set membership functions have been defined, the mapping of actual (crisp) input values to output values is achieved by fuzzification, fuzzy inference and defuzzification. One widely used fuzzification and inference method is the max-min or Mamdani method (Mamdani, 1976). Fuzzification evaluates every crisp input parameter with respect to the fuzzy sets in rule antecedents. For the above example rule, these are evaluated and combined by: ))(),(min( vxs ZNL µµ = The output of the fuzzy rule is the fuzzy set defined by the function: )),(min()( sff PLR µµ = This method produces a fuzzy set for each rule R. Aggregation of these resulting fuzzy sets using fuzzy union (max) produces a single fuzzy set ( 21 RR µ µ ∪ ). A single crisp output is obtained by applying a defuzzification operator to this aggregate fuzzy set. Fuzzy logic has been applied in control systems for a wide variety of applications; an excellent overview, which includes a historical perspective together with many references to Routing Control in Packet Switched Networks using Soft Computing Techniques 141 work on fuzzy control, may be found in (Kruse et al., 1996). The main choices which need to be made by a fuzzy controller designer include (Lee, 1990): 1. Fuzzification and defuzzification strategies and operators. 2. Knowledge Base • universe of discourse • fuzzy partitions of input and output spaces • fuzzy set membership functions 3. Rule Base • choice of input and output variables • source and derivation of fuzzy control rules • types of fuzzy control rules 4. Decision Making Logic • definition of fuzzy implication • interpretation of connectives and and or • definition of compositional operator and inference mechanism. Commonly, fuzzification/defuzzification methods, implication and inference methods are fixed at the outset of the design and the main design element then is ascertaining a suitable ‘knowledge base’ (fuzzy sets) and rule base. A number of different methods are available for devising appropriate fuzzy sets and rules. These include: 1. Extracting a human expert's experience and knowledge (if available) through knowledge elicitation techniques. 2. Observing and modelling a human operator's actions (possibly using automatic supervised and/or unsupervised learning operating on the observed data sets). 3. Understanding the physics of the process to be controlled and creating a model of the process from which fuzzy sets and rules for the controller may be designed. 4. Automatic generation of fuzzy sets and rules employing a directed search strategy in combination with some form of performance measurement. Which approach to employ clearly depends upon whether a human expert exists and how easy it is to model the process to be controlled. Most real-world fuzzy controllers have been derived using one or more of the first three methods. However, in cases where no human expert knowledge nor input/output data sets are available, and additionally it is not possible to derive an accurate model of the process, these methods cannot be used and it becomes necessary to employ some sort of exploration strategy together with performance measurement to learn fuzzy sets and rules. One possible method of doing this is to employ reinforcement learning, which uses an adaptive critic for evaluating controller performance (e.g. Sutton's Adaptive Heuristic Critic (Sutton, 1984) and Watkins' Q-learning (Watkins, 1989)). Recently, researchers have investigated the use of evolutionary algorithms such as the genetic algorithm as a basis of learning fuzzy sets and rules. The next section summarises this work. Telecommunications Optimization: Heuristic and Adaptive Techniques142 8.2.3 Previous Work using the GA to Optimise Fuzzy Rule Bases Optimisation of fuzzy rule based systems using a GA is clearly a constrained optimisation problem; the main source of constraint being linguistic interpretability (the genotype must represent a valid rule base and fuzzy set membership functions should make sense). In an attempt to summarise previously published research in this area, this work is next categorised in decreasing order of constraints imposed on the (genetic) learning system. The categories used are: (1) learning fuzzy set membership functions only; (2) learning fuzzy rules only; (3) learning both fuzzy rules and fuzzy set membership functions in stages; and (4) learning both fuzzy rules and fuzzy set membership functions simultaneously. For a more detailed account of the brief summary given here, please see Carse et al. (1996). Learning Fuzzy Membership Functions with Fixed Fuzzy Rules Karr (1991) applied GAs to fuzzy controller design by adaptation of membership functions of a fixed rule set. This work demonstrated the success of the approach in generating both non-adaptive and adaptive fuzzy controllers for the cart-pole problem. Learning Fuzzy Rules with Fixed Fuzzy Membership Functions Thrift (1991) described the design of a fuzzy controller for centring a cart on a one- dimensional track by evolving fuzzy relations using fixed membership functions. Thrift’s system was able to evolve a fuzzy control strategy which compares well with the optimal ‘bang-bang’ control rule. Pham and Karaboga (1991) described a system which learns fuzzy rules and output membership functions simultaneously using fixed input membership functions. Optimisation of the controller was carried out in two stages. In the first stage, different populations of controllers were independently evolved (using different initial random seeds) to produce ‘preliminary’ designs. The second stage combined the best individuals from the first stage into a single population to which the GA is applied to evolve a ‘detailed’ design. Learning Fuzzy Rules and Membership Functions in Stages Kinzel et al. (1994) described an evolutionary approach to designing fuzzy controllers, and applied the technique to the cart-pole problem. They argued that learning fuzzy rules and membership functions simultaneously is difficult due to complex interactions and propose an alternative three stage task solving process. An initial rule base and membership functions were selected heuristically, rather than randomly and the initial population was seeded with mutations of this initial rule base. The GA was then applied to rules (keeping membership functions fixed). The final stage was to apply the GA for fine tuning of membership functions within good evolved rule bases. Learning Fuzzy Rules and Membership Functions Simultaneously Lee and Takagi (1993) employed the genetic algorithm to optimise simultaneously a variable size fuzzy rule base and fuzzy set membership functions of a Takagi–Sugeno controller. The system was applied with success to the cart-pole problem. Cooper and Vidal (1994) used a variable length genome to represent a fuzzy rule- et and accompanying membership functions. They argued that domain-based representations which imply complete coverage of the input space cannot be expected to scale well to high-dimensional problems, and that using variable size rule sets in conjunction with rule creation and Routing Control in Packet Switched Networks using Soft Computing Techniques 143 deletion operators allows the GA to evolve rule sets which do not include superfluous rules. Liska and Melsheimer (1994) used a GA for simultaneously discovering fuzzy rules and membership functions, with a final stage of fine-tuning membership functions using conjugate gradient descent. They applied the system to learning a dynamic model of plant using known input-output data. After the GA approached convergence, conjugate gradient descent was employed to further improve good solutions by fine-tuning membership function parameters. Linkens and Nyongesa (1995) described off-line evolutionary learning of fuzzy rules and associated membership functions for a multivariable fuzzy controller for medical anaesthesia. 8.3 Evolving Fuzzy Routing Controllers with the GA 8.3.1 Fuzzy Classifier System Details The fuzzy classifier system employed here is version of P-FCS1 (a Pittburgh Fuzzy Classifier System #1), described and evaluated in Carse et al. (1996). P-FCS1 is a synthesis of the classifier system (Holland, 1976; Booker et al., 1989) and fuzzy sets (Zadeh, 1965) which employs the genetic algorithm in the ‘Pittsburgh’-style (Smith, 1980) in which individuals in the population operated on by the genetic algorithm are complete rule sets. P- FCS1 employs variable length rule sets, uses a real-numbered representation of fuzzy membership function centres and widths, and applies modified recombination operators which are particularly suited to fuzzy as opposed to discrete classifier systems. In PFCS-1, each rule k R for an n-input, m-output system, is represented as: ),( );,(),( );,(: 1111 wmkcmkkwkcwnkcnkkwkcK yyyyxxxx ⇒R a similar representation to that used in Parodi and Bonelli (1993). The bracketed terms represent the centres and widths of fuzzy set membership functions over the range of input and output variables. The genome representing a complete rule set is a variable length concatenated string of such fuzzy rules. The two-point version of the crossover operator used in P-FCS1 involves the generation of two crosspoints i C 1 and i C 2 as follows: n ciiii ciiii RCC RC /1 212 11 )()min(max )()min(maxmin ⋅−+= ⋅−+= where c R 1 and c R 2 are selected randomly in the range [0,1] with uniform probability density. The range [min i , max i ] is the universe of discourse of the ith input variable. After crossover, Child 1 contains rules from Parent 1 such that ))minmax(())()((, 221 iiicikicikicik CxCxCxi <−+∨<∧>∀ together with rules from Parent 2 which do not satisfy this condition. Child 2 contains the remaining rules from both rule sets. This crossover operator is designed to enhance the Telecommunications Optimization: Heuristic and Adaptive Techniques144 probability that good fuzzy ‘building blocks’ (i.e. high-fitness assemblies of fuzzy rules with overlapping input membership functions) survive and proliferate in future generations. Mutation in P-FCS1 applies real number ‘creep’ to fuzzy set membership function centres and widths. In addition, a cover operator is employed to create a new classifier if inputs are encountered which no existing rules match. In the first set of experiments described below, P-FCS1 is applied to routing control in a simple simulated three node network. In the second set of experiments, an extended version of P-FCS1 called FCDACS (a Fuzzy Clocked Delayed Action Classifier System) is applied to routing control in a more complex network. In FCDACS, the fuzzy classifier syntax is ),();,( );,(),( );,(: 1111 wkckwmkcmkkwkcwnkcnkkwkcK ttyyyyxxxx ⇒R where ),( wkck tt is a tag which encodes the centre and width of a time membership function which is used to modulate the contribution of an activated classifier over time. This allows the evolution of temporal fuzzy classifiers with linguistic interpretations such as: IF (Route1 has lower delay than Route2) THEN increase proportion of traffic over Route1 OVER the medium future A full description of P-FCS1 and FCDACS can be found in Carse (1997). 8.3.2 Routing Control in a Small Scale Network using P-FCS1 In these experiments an assembly of fuzzy controllers are required to perform adaptive, distributed routing control in a simulated fully-connected 3-node datagram packet switched network, as illustrated in Figure 8.1). All links in the network are bidirectional full duplex. Packets requiring transmission over a particular link are queued using a first-come first- served discipline. Packets arrive from outside at network source node i ∈ {A,B,C}, to be delivered to destination node j ∈ {A,B,C}, j ≠ i, at an average rate of ij λ packets/second. A fuzzy controller at each node decides whether to route each packet directly to its destination or via an intermediate node. Controller decisions are based on packet delay measurements over different paths. The goal is to minimise average global packet delay (i.e. the mean delay between packet arrival at the source node and packet delivery to the destination node for all packets which arrive during the period of simulation, irrespective of source and destination). The learning system is therefore required to determine a routing policy, copies of which are deployed at each switching node and operate in parallel, which minimises global packet delay. Each routing controller is implemented as a variable size fuzzy classifier system with four inputs and two outputs. At each node the controller inputs are: DelayLeftDirect: The measured packet delay from the source node for packets destined for the node to the left of the source node and which are routed directly. DelayLeftIndirect: The measured packet delay from the source node for packets destined for the node to the left of the source node and which are routed indirectly. Routing Control in Packet Switched Networks using Soft Computing Techniques 145 DelayRightDirect: The measured packet delay from the source node for packets destined for the node to the right of the source node and which are routed directly. DelayRightIndirect: The measured packet delay from the source node for packets destined for the node to the right of the source node and which are routed indirectly. Figure 8.1 Three-node packet switched network used in simulation. Routing nodes A, B and C are connected by bidirectional communication links. Packet delays are measured at the destination node (each packet is time-stamped on arrival in the system) and averaged over the last N Measure packets for each route taken for each source node. In the simulation, we assume this information is transmitted without delay to source nodes once the averages have been taken, and transmission of control information does not consume network bandwidth. In a real network such information would be sent as control packets which would incur a finite delay and utilise network bandwidth (unless a separate signalling network is in place). N Measure is a parameter varied in the experiments described later, and determines the granularity of measurements. Also, in a real network, a trade-off would have to be made in choosing the value of N Measure . If too small a value is chosen, the network becomes swamped with control packets which compete with user data packets for use of the shared bandwidth. If too large a value is chosen, measurements become out of date and meaningless. At each node, the controller outputs are: PLeftDirect: The probability that a packet arriving at the source node which is destined for the node to the left of the source node is routed directly. PRightDirect: The probability that a packet arriving at the source node which is destined for the node to the right of the source node is routed directly. By dynamically adjusting local PLeftDirect and PRightDirect control outputs based on network delay measurements, the distributed assembly of controllers should attempt to spread the network load to minimise global mean packet delay in response to changing traffic conditions in the network. A B C user communication lin k routing node Telecommunications Optimization: Heuristic and Adaptive Techniques146 Each network simulation is run for a simulation time of 500 seconds. The data rates of all network links are set to 10,000 bits per second. Mean packet arrival rates used in the simulation, and their variation in time, are given by: 375250 cpackets/se 15 and ,375 ,2500 cpackets/se 3 250125 cpackets/se 15 and ,250 ,1250 cpackets/se 3 5000 c,packets/se 3 ,,, <<><<= <<><<= <<= ttt ttt t CB AB CABCBAAC λ λ λλλλ These patterns were chosen to test the dynamic capabilities of the routing controller in moving from relatively light network load, when direct routing is optimal, to heavy load when controllers must balance the offered load between direct and indirect network paths. In the simulation, packets arriving at an intermediate node are always forwarded to their destination to avoid a ‘ping-pong’ effect. The evaluation function for each rule set returns the inverse of the mean measured packet delay for all packets delivered during the simulation. Experiments were done with both deterministic and probabilistic (Poisson) packet arrival processes with packet sizes exponentially distributed with mean 1000 bits. To evaluate the controllers evolved by the fuzzy classifier system, we compared their performance with a shortest-path routing algorithm, which routes all packets along the route whose measured delay is least between a particular source/destination pair. A range of measurement intervals, N Measure from two packets to 100 packets were used. Ten independent runs of P-FCS1 were conducted with different initial random seeds. In addition, different initial random seeds were also used for each of the network simulations used in evaluating a particular individual. The latter introduces noise in the evaluation function and we were interested in whether the system could learn in the face of this potential difficulty. Each of the 10 evolved fuzzy controllers using P-FCS1 were evaluated in 20 subsequent simulations and the results are presented in Table 8.1 where they are compared with the shortest path routing algorithm. This table shows mean packet delay over the complete simulation interval. The results shown in Table 8.1 indicate that, when the measurement interval is small, the shortest-path algorithm outperforms the learned fuzzy controllers, although not by that large a margin. As the measurement interval increases, the learned fuzzy controllers begin to outperform the shortest path algorithm significantly. As mentioned earlier, an important characteristic of a routing algorithm is that routing control information should not consume excessive network bandwidth. A value of N Measure greater than (at least) 20 is realistic for a real network, and the results using a GA-derived fuzzy controller appear better than the simple shortest-path algorithm in this region of rate of feedback. Inspection of the measured delays on direct and indirect paths demonstrated that the shortest-path controller shows pronounced instabilities as N Measure is increased, while the evolved fuzzy controllers appear to shift traffic in a much more stable manner. 8.3.3 Routing Control in a More Complex Network using FCDACS Although experiments with the simple 3-node network, described in the previous section, offer some enlightenment regarding the behaviour of learned fuzzy controllers, we need to scale up to more complex networks. To further test the viability of the approach of [...]... for operation in the larger scale network This hybrid scheme scales well to very large networks 148 Telecommunications Optimization: Heuristic and Adaptive Techniques The approach we propose is to employ evolutionary fuzzy control in conjunction with the shortest-path heuristic However, rather than selecting a single shortest path for packet transmissions (with its attendant stability problems), two... offered to the network during the simulation interval were observed These results are shown in Table 8.5, for a value of T equal to 1.0 second for various traffic loads 150 Telecommunications Optimization: Heuristic and Adaptive Techniques Table 8.4 Mean packet delay(s) for λOffd = 5.0 versus measurement update interval (s) Τ Routing Method Static Shortest Hop Adaptive Shortest Path Fuzzy Bifurcated 0.1 . outputs. Telecommunications Optimization: Heuristic and Adaptive Techniques, edited by D.W. Corne, M.J. Oates and G.D. Smith © 2000 John Wiley & Sons, Ltd. Telecommunications Optimization: Heuristic and Adaptive. 1 6 17 3 7 Telecommunications Optimization: Heuristic and Adaptive Techniques148 The approach we propose is to employ evolutionary fuzzy control in conjunction with the shortest-path heuristic. However, rather. linearisation of projected delay as a function of link utilisation. Telecommunications Optimization: Heuristic and Adaptive Techniques140 8.2.2 Fuzzy Logic and Fuzzy Control Fuzzy logic is based on