150 MODELLING UNCERTAINTY

and we know a priori the probabilities for the hypothesis P(H), the evidence P(E), and the evidence assuming the hypothesis is true P(E|H). Bayes' theorem now gives us the probability of the hypothesis based on the evidence:

$$P(H|E) = \frac{P(H \cap E)}{P(E)}, \tag{7.1}$$

which we can rewrite as

$$P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}. \tag{7.2}$$

More generally, if we have a set of n hypotheses $\{H_0, H_1, \ldots, H_{n-1}\}$, Bayes' theorem can be restated as

$$P(H_i|E) = \frac{P(E|H_i) \cdot P(H_i)}{\sum_{j=0}^{n-1} \left(P(E|H_j) \cdot P(H_j)\right)}, \tag{7.3}$$

provided that the whole event space equals $\bigcup_{i=0}^{n-1} H_i$, $H_i \cap H_j = \emptyset$ when $i \neq j$, and $P(E) > 0$.

Bayes' theorem has assumptions that restrict its usability: First, all the statistical data regarding the evidence with the various hypotheses is assumed to be known. Because Bayesian reasoning requires complete and up-to-date probabilities, we have to adjust them whenever we find a new connection between a hypothesis and a piece of evidence. Second, the terms $P(E|H_i)$ must be independent of one another (i.e. the hypotheses are alternative explanations for the evidence). Both of these assumptions can be quite problematic to establish in the real world.

Let us take a simple (but instructive) example of Bayes' theorem. Suppose there is a 10% probability that an alpha-tested computer game has a bug in it. From past experience, we have observed that the likelihood of detecting a bug when there is an actual bug in the program is 90%. The likelihood of detecting a bug when it is not present (e.g. it is caused by the test arrangement) is 10%. Now, the components are as follows:
• H – there is a bug in the code;
• E – a bug is detected in the test;
• E|H – a bug is detected in the test given that there is a bug in the code;
• H|E – there is a bug in the code given that a bug is detected in the test.
The known probabilities are as follows:
P(H) = 0.10
P(E|H) = 0.90
P(E|¬H) = 0.10.
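As a rough sketch, the n-hypothesis form of Bayes' theorem in Equation (7.3) can be written as a small Python function. The name `posterior` and its list-based interface are assumptions made for illustration only; the bug example is fed in by treating H and ¬H as the two hypotheses that partition the event space.

```python
def posterior(priors, likelihoods):
    """Return P(H_i | E) for every hypothesis, following Equation (7.3).

    priors[i] is P(H_i) and likelihoods[i] is P(E | H_i). The hypotheses
    are assumed to be mutually exclusive and to cover the event space.
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to one")
    # Law of total probability: P(E) = sum_j P(E | H_j) * P(H_j).
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    if evidence <= 0.0:
        raise ValueError("P(E) must be positive")
    return [l * p / evidence for l, p in zip(likelihoods, priors)]


# Bug example: index 0 is H ("bug in the code"), index 1 is not-H.
bug_posterior = posterior(priors=[0.10, 0.90], likelihoods=[0.90, 0.10])
```

Here `bug_posterior[0]` corresponds to P(H|E) and `bug_posterior[1]` to P(¬H|E); because the hypotheses form a partition, the returned values sum to one.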
By using the law of total probability, we can calculate for partitions H and ¬H

$$P(E) = P(E|H) \cdot P(H) + P(E|\neg H) \cdot P(\neg H) = 0.90 \cdot 0.10 + 0.10 \cdot 0.90 = 0.18.$$

To get the probability of detecting an actual bug in the code, we apply Equation (7.2) and get P(H|E) = (0.90 · 0.10)/0.18 = 0.5. To conclude, even if we can detect the actual bugs 90% of the time, a detected bug has a fifty-fifty chance of not being in the actual code – which is not a reassuring result for a programmer.

7.1.2 Bayesian networks

A Bayesian network tries to solve the independence problem by modelling the knowledge modularly. Generally, propositions can affect each other in two alternative ways: (i) observing a cause changes the probabilities of its effects, or (ii) observing an effect changes the probabilities of its causes. The idea of a Bayesian network is to make a clear distinction between these two cases by describing the cause-and-effect relationships with a directed acyclic graph. The vertices represent propositions or variables. The edges represent the dependencies as probabilities, and the probability of a vertex is affected by the probabilities of its successors and predecessors.

Let us take an example in which a guard is observing the surroundings. If he hears a noise, its cause is either a sentry making the rounds or an intruder, who is likely to avoid the time when the sentry is doing the rounds. The situation can be modelled as the graph illustrated in Figure 7.1. If we know the probabilities for the dependencies between the vertices, we assign them to the edges or list them as in Table 7.1.

We still need a mechanism to compute the propagation between the vertices. Suppose the guard hears a noise; what does it tell about the probability of the intruder? The propagation methods are based on the idea that the vertices have local effects. Instead of trying to manage the complete graph, we can reduce the problem by focusing on one sub-graph at a time; for details, see Pearl (1986).
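For a network this small, the guard's question can also be answered by brute-force enumeration of the joint distribution, which the following sketch illustrates. It is not the local propagation method of Pearl (1986). The graph structure (Round influences Sentry and Intruder, which both influence Noise) is an assumption read off from the conditioning in Table 7.1, and all names in the code are illustrative.

```python
from itertools import product

# Conditional probabilities taken from Table 7.1.
P_ROUND = 0.3
P_SENTRY = {True: 1.0, False: 0.0}      # P(Sentry | Round)
P_INTRUDER = {True: 0.1, False: 0.9}    # P(Intruder | Round)
P_NOISE = {                             # P(Noise | Sentry, Intruder)
    (True, True): 0.95,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.1,
}


def bernoulli(p_true, value):
    """Probability that a binary variable takes the given value."""
    return p_true if value else 1.0 - p_true


def joint(rnd, sentry, intruder, noise):
    """Joint probability as the product of the local tables along the graph."""
    return (bernoulli(P_ROUND, rnd)
            * bernoulli(P_SENTRY[rnd], sentry)
            * bernoulli(P_INTRUDER[rnd], intruder)
            * bernoulli(P_NOISE[(sentry, intruder)], noise))


def intruder_given_noise():
    """P(Intruder | Noise), summing the joint over the unobserved variables."""
    with_intruder = sum(joint(r, s, True, True)
                        for r, s in product([True, False], repeat=2))
    any_cause = sum(joint(r, s, i, True)
                    for r, s, i in product([True, False], repeat=3))
    return with_intruder / any_cause
```

Enumeration touches every combination of the variables, so its cost grows exponentially with the size of the network; that is the motivation for working on one sub-graph at a time.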
Still, the problems of Bayesian reasoning – establishing the probabilities and updating them – remain, and Bayesian networks are usually too static for practical use.

[Figure 7.1: graph with the vertices Round, Sentry, Intruder and Noise.]
Figure 7.1 A Bayesian network as a directed acyclic graph.

Table 7.1 Probabilities for a Bayesian network.

H | E                           P(H|E)
Noise | Sentry ∧ Intruder       0.95
Noise | Sentry ∧ ¬Intruder      0.9
Noise | ¬Sentry ∧ Intruder      0.8
Noise | ¬Sentry ∧ ¬Intruder     0.1
Sentry | Round                  1.0
Sentry | ¬Round                 0.0
Intruder | Round                0.1
Intruder | ¬Round               0.9
Round                           0.3

7.1.3 Dempster–Shafer theory

To address the problems of Bayesian reasoning, Dempster–Shafer theory (Shafer 1990) allows beliefs about propositions to be represented as intervals [belief, plausibility] ⊆ [0, 1]. Belief (Bel) gives the amount of belief that directly supports the proposition. Plausibility (Pl), which is defined as Pl(A) = 1 − Bel(¬A), describes how much the belief supporting the contradicting proposition ¬A reduces the possibility of proposition A (i.e. Bel(A) ≤ Pl(A)). In particular, if Bel(¬A) = 1 (i.e. the contradicting proposition is certain), then Pl(A) = 0 (i.e. A is not plausible) and the only possible belief value is Bel(A) = 0 (i.e. A is not believable).

The belief–plausibility interval indicates how much information we have about the propositions (see Figure 7.2). For example, suppose that the proposition 'there is an intruder' has a belief of 0.3 and a plausibility of 0.8. This means that we have evidence supporting that the proposition is true with probability 0.3. The evidence contrary to the hypothesis (i.e. 'there is no intruder') has probability 0.2, which means that the hypothesis is possible up to the probability 0.8, since the remaining probability mass of 0.5 is essentially 'indeterminate'.
Additional evidence can reduce the interval – increase the belief or decrease the plausibility – unlike in the Bayesian approach, where the probabilities of the hypotheses are assigned beforehand. For instance, in the beginning when we have no information about hypothesis A, we let Bel(A) = 0 and Pl(A) = 1. Now, any evidence that supports A increases Bel(A) and any evidence supporting the contradicting hypothesis decreases Pl(A).

[Figure 7.2: scale from 0 to 1 with the points Bel(A) and Pl(A) marked; labels: Belief, Uncertainty, Non-belief, Plausibility, Doubt.]
Figure 7.2 Belief and plausibility.

Let us take an example and see how we use the belief function with a set of alternative hypotheses. Suppose that we have four hypotheses, 'weather', 'animal', 'trap' and 'enemy', which form the set $\Theta = \{W, A, T, E\}$. Now, our task is to assign a belief value for each element of $\Theta$. The evidence can affect one or more of the hypotheses. For example, evidence 'noise' supports hypotheses W, A, and E.

Whereas Bayesian reasoning requires that we assign a conditional probability for each combination of propositions, Dempster–Shafer theory operates with sets of hypotheses. A mass function (or basic probability assignment) $m(H)$, which is defined for all $H \in \wp(\Theta) \setminus \emptyset$, indicates the current belief in the set H of hypotheses. Although the number of subsets is exponential and the sum of their probabilities should be one, most of the subsets will not be handled and their probability is zero.

Let us continue with our example: In the beginning we have no information at all, and we let $m(\Theta) = 1$ and all the other subsets have the value zero. In other words, all hypotheses are plausible and we have no evidence supporting any of them. Next, we observe a noise and know this evidence points to the subset {W, A, E} (i.e. we believe that the noise is caused by the weather, an animal, or an enemy) with the probability 0.6. The corresponding mass function $m_n$ is

$m_n(\{W, A, E\}) = 0.6$, $m_n(\Theta) = 0.4$.
Note that the 'excess' probability of 0.4 is not assigned to the complement of the subset but to the set of all hypotheses $\Theta$.

We can now define belief for a set X of hypotheses with respect to $m(\cdot)$ as

$$\mathrm{Bel}(X) = \sum_{Y \subseteq X} m(Y) \tag{7.4}$$

and its plausibility as

$$\mathrm{Pl}(X) = \sum_{Y \cap X \neq \emptyset} m(Y). \tag{7.5}$$

To combine beliefs, we can use Dempster's rule: Let $m_1$ and $m_2$ be the mass functions and X and Y be the subsets of $\Theta$ for which $m_1$ and $m_2$ have non-zero values. The combined mass function $m_3$ is

$$m_3(Z) = \frac{\sum_{X \cap Y = Z} m_1(X) \cdot m_2(Y)}{1 - \sum_{X \cap Y = \emptyset} m_1(X) \cdot m_2(Y)}. \tag{7.6}$$

An implementation for this is given in Algorithm 7.1. Dempster's rule can be used in both chaining (e.g. A → B and B → C) and conjoining (e.g. A → C, B → C) multiple propositions.

Reverting to our example, evidence 'footprints' (supporting the hypotheses 'animal', 'trap' and 'enemy') has the mass function $m_f$, which is defined as

$m_f(\{A, T, E\}) = 0.8$, $m_f(\Theta) = 0.2$.

Algorithm 7.1 Combining two mass functions.

Combined-Mass-Function(m1, m2)
  in: mapping m1 : ℘(Θ) \ ∅ → [0, 1] (the domain elements with a non-zero range value are denoted by ℳ1 ⊆ ℘(Θ) \ ∅); mapping m2 is defined similarly to m1
  out: combined mapping m3
  constant: set of hypotheses Θ
  for all M ∈ (℘(Θ) \ {∅, Θ}) do
    m3(M) ← 0
  end for
  m3(Θ) ← 1
  ℳ3 ← {Θ}
  e ← 0
  for all M1 ∈ ℳ1 do                ▷ For pairs of members between ℳ1 and ℳ2.
    for all M2 ∈ ℳ2 do
      M3 ← M1 ∩ M2
      p ← m1(M1) · m2(M2)
      m3(Θ) ← m3(Θ) − p
      if M3 = ∅ then                ▷ Excess for imaginary m3(∅).
        e ← e + p
      else                          ▷ M3 contributes to ℳ3.
        m3(M3) ← m3(M3) + p
        if M3 ∉ ℳ3 then
          ℳ3 ← ℳ3 ∪ {M3}
        end if
      end if
    end for
  end for
  if 0 < e < 1 then                 ▷ Normalization.
    for all M ∈ ℳ3 do
      m3(M) ← m3(M)/(1 − e)
    end for
  end if
  return m3

Assuming that the intersections X ∩ Y are non-empty, we get the combination $m_{nf}$ for the two pieces of evidence directly from the numerator of Equation (7.6):

$m_{nf}(\{A, E\}) = 0.48$, $m_{nf}(\{A, T, E\}) = 0.32$,
$m_{nf}(\{W, A, E\}) = 0.12$, $m_{nf}(\Theta) = 0.08$.

It is possible that we get the same intersection set Z more than once, but in that case we just add the masses together.

The situation gets a bit more complicated if the intersection of subsets is empty. The numerator in Equation (7.6) ensures that the sum of the different probabilities, including the mass falling on the empty set, is one (provided that this holds also for $m_1$ and $m_2$). If some intersections are empty, the amount given to the empty sets must be distributed to all non-empty sets, which is handled by the denominator of Equation (7.6).

Let us add $m_c$ to the mass functions, which describes the evidence 'candy wrapper':

$m_c(\{E\}) = 0.6$, $m_c(\{T\}) = 0.3$, $m_c(\Theta) = 0.1$.

By combining functions $m_{nf}$ and $m_c$, we get the following result from the numerator:

$m_{nfc}(\{E\}) = 0.6$, $m_{nfc}(\{T\}) = 0.12$,
$m_{nfc}(\{A, E\}) = 0.048$, $m_{nfc}(\{A, T, E\}) = 0.032$,
$m_{nfc}(\{W, A, E\}) = 0.012$, $m_{nfc}(\Theta) = 0.008$,
$m_{nfc}(\emptyset) = 0.18$.

The denominator is $1 - m_{nfc}(\emptyset) = 0.82$, and we use it to scale to get $m_{nfc}$ (rounded to two decimals):

$m_{nfc}(\{E\}) = 0.73$, $m_{nfc}(\{T\}) = 0.15$,
$m_{nfc}(\{A, E\}) = 0.06$, $m_{nfc}(\{A, T, E\}) = 0.04$,
$m_{nfc}(\{W, A, E\}) = 0.01$, $m_{nfc}(\Theta) = 0.01$.

From this it follows that if we have the evidence 'noise', 'footprints' and 'candy wrapper', Equation (7.4) gives the belief in the hypothesis 'enemy' Bel(E) = 0.73, and Equation (7.5) gives its plausibility Pl(E) = 0.85. In comparison, the combined hypothesis 'trap or enemy' has belief Bel({T, E}) = 0.88 and plausibility Pl({T, E}) = 1, which means that a human threat is a more likely explanation for the evidence than a natural phenomenon.
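The combination in Equation (7.6) and the measures in Equations (7.4) and (7.5) can be sketched compactly in Python. This is not a line-by-line transcription of Algorithm 7.1: as an assumption made for brevity, a mass function is stored as a dictionary from `frozenset` to mass, and subsets that are absent are taken to have mass zero. The function and variable names are illustrative.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule, Equation (7.6)."""
    combined = {}
    conflict = 0.0  # total mass that falls on empty intersections
    for (x, px), (y, py) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            # The same intersection may occur several times; add the masses.
            combined[z] = combined.get(z, 0.0) + px * py
        else:
            conflict += px * py
    if conflict >= 1.0:
        raise ValueError("the two mass functions are in total conflict")
    # Redistribute the conflicting mass over the non-empty sets.
    return {z: p / (1.0 - conflict) for z, p in combined.items()}


def belief(m, x):
    """Equation (7.4): total mass of the subsets of x."""
    return sum(p for y, p in m.items() if y <= x)


def plausibility(m, x):
    """Equation (7.5): total mass of the sets that intersect x."""
    return sum(p for y, p in m.items() if y & x)


THETA = frozenset("WATE")  # weather, animal, trap, enemy
m_noise = {frozenset("WAE"): 0.6, THETA: 0.4}
m_footprints = {frozenset("ATE"): 0.8, THETA: 0.2}
m_candy = {frozenset("E"): 0.6, frozenset("T"): 0.3, THETA: 0.1}

m_nfc = combine(combine(m_noise, m_footprints), m_candy)
```

With these definitions, `belief(m_nfc, frozenset("E"))` and `plausibility(m_nfc, frozenset("E"))` correspond to Bel(E) and Pl(E) of the running example. Storing only the non-zero masses sidesteps the exponential number of subsets of the hypothesis set.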
7.2 Fuzzy Sets

Fuzzy sets acknowledge uncertainty by allowing elements to have a partial membership in a set. In contrast to classical sets with Boolean memberships, fuzzy sets admit that some information is better than no information. Although multi-valued logic was already developed in the 1920s by J. Łukasiewicz, the term 'fuzziness' was coined forty years later. In a seminal paper, Zadeh (1965) applied Łukasiewicz's multi-valued logic to sets: Instead of belonging or not belonging to a set, in a fuzzy set an element belongs to a set to a certain degree.

One should always bear in mind that fuzzy sets depend on the context: There can be no universal agreement on a membership function, for example, on the adjective 'small' (cars, humans, nebulae), and, subjectively speaking, a small car can be something completely different for a basketball player than for a racehorse jockey.

Furthermore, fuzziness is not a solution method in itself, but we can use it in modelling to cope with uncertainty. For example, we can describe the objective function using an aggregation of fuzzy sets (see Figure 7.3). In effect, fuzziness allows us to do more fine-grained evaluations.

[Figure 7.3: plot with the axis labels Response and Action.]
Figure 7.3 Uncertain or complex dependencies can be modelled with fuzzy sets that cover the solution space.

7.2.1 Membership function

In classical (or 'crisp') set theory, the elements of set S are defined using a two-valued characteristic function

$$\chi_S(x) = \begin{cases} 1 \iff x \in S \\ 0 \iff x \notin S \end{cases}$$

In other words, all the elements x in the universe U either belong to S or not (and there is nothing in between). Fuzzy set theory extends the characteristic function by allowing an element to have a degree with which it belongs to a set. This degree is called a membership in a set, and a fuzzy set is a class in which every element has a membership value.

Theorem 7.2.1 Let U be a set (universe) and $\mathbf{L}$ be a lattice, $\mathbf{L} = \langle L, \vee, \wedge, \mathbf{1}, \mathbf{0} \rangle$.
A fuzzy set A in the universe U is defined by a membership function $\mu_A$

$$\mu_A : U \to L. \tag{7.7}$$

Each element $x \in U$ has an associated membership function value $\mu_A(x) \in L$, which is the membership value of the element x. If $\mu_A(x) = \mathbf{0}$, x does not belong to the set A. If $\mu_A(x) = \mathbf{1}$, x belongs to the set A. Otherwise (i.e. if $\mu_A(x) \neq \mathbf{0}, \mathbf{1}$) x belongs partly to the set A.

This general definition of a fuzzy set is usually used in a limited form, where we let the lattice be $L = [0, 1] \subset \mathbb{R}$ with the least element $\mathbf{0} = 0$ and the greatest element $\mathbf{1} = 1$. In other words, the membership function is defined on a real number range [0, 1], and the fuzzy set A in universe U is defined by the membership function $\mu_A : U \to [0, 1]$, which assigns for each element $x \in U$ a membership value $\mu_A(x)$ in the fuzzy set A. Another way to interpret the membership value is to think of it as the truth value of the statement 'x is an element of set A'. For example, Figure 7.4 illustrates different fuzzy sets for a continuous U. Here, the universe is the distance d in metres, and the sets describe the accuracy of different weapons with respect to the distance to the target.

[Figure 7.4: plot of the membership value μ(d), from 0 to 1, against the distance d, from 0 to 90, with the curves Sword, Spear and Bow.]
Figure 7.4 Membership functions $\mu_{\mathrm{sword}}$, $\mu_{\mathrm{spear}}$ and $\mu_{\mathrm{bow}}$ for the attribute 'accuracy' of weapons with respect to the distance (in metres) to the target.

When defining fuzzy sets, we inevitably face the question of how one should assign the membership functions. Suggested methods include the following:
• Real-world data: Sometimes we can apply physical measurements, and we can assign the membership function values to correspond to the real-world data. Also, if we have statistical data on the modelled attribute, it can be used to define the membership functions.
• Subjective evaluation: Because fuzzy sets often model human cognitive knowledge, the definition of a membership function can be guided by human experts.
They can draw or select, among pre-defined membership functions, the one corresponding to their knowledge. Even questionnaires or psychological tests can be used when defining more complex functions.
• Adaptation: The membership functions can be dynamic and evolve over time using the feedback from the input data. This kind of hybrid system can use, for example, neural networks or genetic algorithms for adaptation as the nature of the modelled attribute becomes clear.

The beauty (and agony) of fuzzy sets is that there is an infinite number of possible different membership functions for the same attribute. Although by tweaking the membership function we can get a more accurate response, in practice even simple functions work surprisingly well as long as the general trend of the function reflects the modelled information. For example, if we are modelling the attribute 'young', it is sufficient that the membership value decreases as the age increases.

7.2.2 Fuzzy operations

The logical fuzzy operations ∨ (i.e. disjunction) and ∧ (i.e. conjunction) are often defined using $\max\{\mu_A(\cdot), \mu_B(\cdot)\}$ and $\min\{\mu_A(\cdot), \mu_B(\cdot)\}$, although they can be defined in various alternative ways using t-norms and t-conorms (Yager and Filev 1994). Also, negation can be defined in many ways, but the usual choice is $1 - \mu_A(\cdot)$. All classical set operations have fuzzy counterparts.

[Figure 7.5: six panels, each showing the membership value μ (from 0 to 1) for Swordsman, Spearman and Archer; panel titles: mobile, strong, mobile OR strong, mobile AND strong, expensive, NOT expensive AND strong.]
Figure 7.5 Fuzzy operations for different attributes. (a) The membership function for mobility. (b) The membership function for strength. (c) The membership function for the union of mobility and strength.
(d) The membership function for the intersection of mobility and strength. (e) The membership function for expensiveness. (f) The membership function for the intersection of the complement of expensiveness and strength.

Theorem 7.2.2 Let A, B, and C be fuzzy sets in the universe U. Further, assume that all operations have the value range [0, 1]. We can now define for each element $x \in U$

Union: $$C = A \cup B \iff \mu_C(x) = \max\{\mu_A(x), \mu_B(x)\}, \tag{7.8}$$
Intersection: $$C = A \cap B \iff \mu_C(x) = \min\{\mu_A(x), \mu_B(x)\}, \tag{7.9}$$
Complement: $$C = \bar{A} \iff \mu_C(x) = 1 - \mu_A(x). \tag{7.10}$$

Figure 7.5 illustrates the use of fuzzy set operations for a discrete U. The universe consists of three elements – swordsman, spearman, and archer – and they have three attributes – mobility, strength, and expensiveness. The union of mobility and strength describes the set of mobile or strong soldiers, whereas the intersection describes the set of mobile and strong soldiers. The intersection of the complement of expensiveness and strength gives the set of inexpensive and strong soldiers.

7.3 Fuzzy Constraint Satisfaction Problem

Fuzzy optimization originates from ideas proposed by Bellman and Zadeh (1970), who introduced the concepts of fuzzy constraints, fuzzy objective, and fuzzy decision. Fuzzy decision-making, in general, concerns deciding future actions on the basis of vague or uncertain knowledge (Fullér and Carlsson 1996; Herrera and Verdegay 1997). The problem in making decisions under uncertainty is that the bulk of the information we have about the possible outcomes, the value of new information, and the dynamically changing conditions is typically vague, ambiguous, or otherwise unclear. In this section, we focus on multiple criteria decision-making, which refers to making decisions in the presence of multiple and possibly conflicting criteria.
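To tie the operations of Equations (7.8)–(7.10) to a decision with several criteria, the following sketch applies them to the three soldier types. The membership values are hypothetical, because the values plotted in Figure 7.5 are not reproduced here, and the function names are illustrative. Picking the element with the highest membership in the combined set is only one simple way to turn the result into a decision.

```python
def fuzzy_union(a, b):
    """Equation (7.8): element-wise maximum of the membership values."""
    return {x: max(a[x], b[x]) for x in a}


def fuzzy_intersection(a, b):
    """Equation (7.9): element-wise minimum of the membership values."""
    return {x: min(a[x], b[x]) for x in a}


def fuzzy_complement(a):
    """Equation (7.10): one minus the membership value."""
    return {x: 1.0 - a[x] for x in a}


# Hypothetical membership values over the discrete universe of soldiers.
mobile = {"swordsman": 0.5, "spearman": 0.25, "archer": 0.75}
strong = {"swordsman": 0.75, "spearman": 0.5, "archer": 0.25}
expensive = {"swordsman": 0.75, "spearman": 0.25, "archer": 0.5}

mobile_or_strong = fuzzy_union(mobile, strong)
mobile_and_strong = fuzzy_intersection(mobile, strong)
cheap_and_strong = fuzzy_intersection(fuzzy_complement(expensive), strong)

# One possible decision rule: the soldier that best satisfies both criteria.
best = max(cheap_and_strong, key=cheap_and_strong.get)
```

Because the minimum is used for the intersection, a soldier scores well in `cheap_and_strong` only if it satisfies both criteria to a reasonable degree; a single poorly satisfied criterion dominates the result.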
In a constraint satisfaction problem (CSP), one must find states or objects in a system that satisfy a number of constraints or criteria. A CSP consists of
• a set of n variables X,
• a domain $D_i$ (i.e. a finite set of possible values) for each variable $x_i$ in X, and
• a set of constraints restricting the feasibility of the tuples $(x_0, x_1, \ldots, x_{n-1}) \in D_0 \times \cdots \times D_{n-1}$.
A solution is an assignment of a value in $D_i$ to each variable $x_i$ such that every constraint is satisfied. Because a CSP lacks an objective function, it is not an optimization problem.

As an example of a CSP, Figure 7.6 illustrates a monkey puzzle problem (Harel 1987, pp. 153–155). The 3 · 4 = 12 tile positions identify the variables, the tiles define the domain set, and the requirement that all the monkey halves must match defines (3 − 1) · 4 + 3 · (4 − 1) = 17 constraints.

Unfortunately, the modelled problems are not always as discrete and easy to form. Fuzzy sets have also been proposed for extending CSPs so that partial satisfaction of the constraints is possible. The constraints can be more or less relaxable or subject to preferences. These flexible constraints are either soft constraints, which express preferences among solutions, or prioritized constraints, which can be violated if they conflict with constraints with a higher priority (Dubois et al. 1996).

[...]

... Replace the evidence 'candy wrapper' with this new evidence and determine a new combined mass function $m_{nfe}$. What are the belief and plausibility of the hypotheses 'enemy' and 'animal'? What are the beliefs and plausibilities if we observe all four evidences 'noise', 'footprints', 'candy wrapper' and 'eaten leaves'?

7-9 Figure 7.4 gives fuzzy sets for the accuracy of weapons and Figure 7.5 gives the attributes...
a high-bandwidth network that has a low latency and vice versa. For interactive real-time systems such as computer games, the rule of thumb is that latency between 0.1 and 1.0 s is acceptable. For instance, the Distributed Interactive Simulation (DIS) standard used in military simulations specifies that the network latency should be less than 100 ms (Neyland 1997). Latency affects the user's performance...

discern three communication layers: (i) The physical platform induces resource limitations (e.g. bandwidth and latency) that reflect the underlying infrastructure (e.g. cabling and hardware). (ii) The logical platform builds upon the physical platform and provides architectures for communication, data, and control (e.g. mutually exclusive data locks and communication rerouting mechanisms). (iii) The networked...

(see Figure 7.7): Players are moving inside a closed two-dimensional play field. Each player

[Figure 7.7: play field with the labels p1, p2, p3, Pond, w and h.]
Figure 7.7 The set-up of Dog Eat Dog for three players. Player p1 has the enemy p2 and the prey p3, player p2 has the enemy p3 and the prey p1, and player p3 has the enemy p1 and the prey p2. The dashed circles represent the limit of the players' visual range and the dotted...

this case, the nodes include more computation to reduce the bandwidth and latency requirements. In reality, an architecture cannot achieve both high

COMMUNICATION LAYERS 177

[Figure 8.4: three panels (a)–(c) showing the nodes p0, p1 and p2 and the data items d0–d7.]
Figure 8.4 Data and control architectures: (a) In centralized data architecture,...
Algorithms and Networking for Computer Games. Jouni Smed and Harri Hakonen. 2006 John Wiley & Sons, Ltd

[Figure 8.1: diagram with the axes Transportation (Local, Remote) and Artificiality (Physical, Synthetic); regions: Physical reality, Telepresence, Augmented reality, Virtual reality.]
Figure 8.1 Classification of shared-space technologies according to transportation and artificiality.

To clarify conceptually how networked games work,...

Add this information to the Bayesian network and recalculate the values of Table 7.1.

7-5 Explain (intuitively) how the terms 'plausibility' and 'doubt' presented in Figure 7.2 relate to one another.

7-6 Model the situation of Exercise 7-4 using Dempster–Shafer theory.

7-7 Why is the empty set excluded from the mass function?

7-8 Let us add to the example given in page 153 a new evidence 'eaten leaves' with...

8.1 Physical Platform

Networking is subject to resource limitations (e.g. physical, technical, and computational), which set boundaries for what is possible. Once we have established a network of connections between a set of nodes (i.e. the computers in the network), we need a technique for transmitting the data from one node to another. The content and delivery of information are expressed...

for the puzzle (and vice versa)?

7-14 Is it possible to formulate the monkey puzzle problems of Exercises 7-12 and 7-13 as FCSPs?

[Figure 7.11: tiles with letters at their corners.]
Figure 7.11 Monkey puzzle variant in which the tiles can be adjacent only when their corner letters match.

7-15 Let us denote the quarter monkeys of Exercise 7-13 with numbers...
football in a computer game). Apart from physical reality, where interaction is immediate, other shared-space technologies require a distributed system – namely, computers and networks – so that the participants can interact with each other. Networked computer games mainly belong to the virtual reality category, although location-based games, which use wireless networking and mobile platforms, have more...

7-9 Figure 7.4 gives fuzzy sets for the accuracy of weapons and Figure 7.5 gives the attributes of infantry. Given that we know the distance to the enemy and...