Collective Classification in 2D and 3D Range Data

The class separability of a feature f is measured by its Fisher discriminant ratio (FDR):

C(f) = \mathrm{FDR}_f = \sum_{i}^{K} \sum_{j \neq i}^{K} \frac{(z_i - z_j)^2}{V_i + V_j},    (6)

where z_i and V_i denote the mean and variance of the feature within class w_i, and K is the number of classes. Additionally, the cross-correlation coefficient between any two features f and g, given T training examples, is defined as:

U_{fg} = \frac{\sum_{t=1}^{T} x_{tf} x_{tg}}{\sqrt{\sum_{t=1}^{T} x_{tf}^2 \sum_{t=1}^{T} x_{tg}^2}},    (7)

where x_{tf} denotes the value of feature f in training example t. Finally, the selection of the best L features involves the following steps:

• Select the first feature f_1 as f_1 = \arg\max_f C(f).
• Select the second feature f_2 as
  f_2 = \arg\max_{f \neq f_1} \{ D_1 C(f) - D_2 |U_{f_1 f}| \},
  where D_1 and D_2 are weighting factors.
• Select f_l, l = 3, ..., L, such that
  f_l = \arg\max_{f \neq f_r} \{ D_1 C(f) - \frac{D_2}{l-1} \sum_{r=1}^{l-1} |U_{f_r f}| \},  r = 1, 2, ..., l-1.

6 Experiments

The approach described above has been implemented and tested on several 2D maps and 3D scenes. The goal of the experiments is to show the effectiveness of the iAMN on different kinds of indoor range data.

6.1 Classification of places in 2D maps

This experiment was carried out using the occupancy grid map of building 79 at the University of Freiburg. For efficiency reasons we used a grid resolution of 20 cm, which leads to a graph of 8088 nodes. The map was divided into two parts, the left one used for learning and the right one used for classification (Figure 1). For each cell we calculated 203 geometrical features. This number was reduced to 30 by applying the feature selection of Section 5. The right image of Figure 1 shows the resulting classification, with a success rate of 97.6%.

Fig. 1. The left image depicts the training map of building 79 at the University of Freiburg. The right image shows the resulting classified map using an iAMN with 30 selected features.

6.2 Classification of objects in 3D scenes

In this experiment we classify 3D scans of objects that appear in a laboratory of building 79 at the University of Freiburg. The laboratory contains tables, chairs, monitors and ventilators. For each object class, an iAMN is trained with 3D range scans each containing just one object of this class (apart from tables, which may have screens standing on top of them). Figure 2 shows three example training objects. A complete laboratory in building 79 was later scanned with a 3D laser. In this 3D scene all the objects appear together, and the scene is used as a test set. The resulting classification is shown in Figure 3. In this experiment 76.0% of the 3D points were classified correctly.

6.3 Comparison with previous approaches

In this section we compare our results with those obtained using other approaches for place and object classification. First, we compare the classification of the 2D map with a classifier based on AdaBoost, as shown by Martinez Mozos et al. (2005). In this case we obtained a classification rate of 92.1%, in contrast to the 97.6% obtained using iAMNs. We believe the reason for this improvement is the neighboring relation between classes, which is ignored by the AdaBoost approach. In a second experiment, we compare the resulting classification of the 3D scene with the one obtained using AMN and NN. As can be seen in Table 1, iAMNs perform better than the other approaches. A posterior statistical analysis using Student's t-test indicates that the improvement is significant at the 0.05 level.
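As an aside, the feature selection of Section 5 used throughout these experiments is easy to put into code. The following sketch is our own illustration in Python/NumPy, not the authors' implementation; the function names and the feature-matrix layout are assumptions. It computes C(f) from Eq. (6), |U_fg| from Eq. (7), and then performs the greedy selection:

    import numpy as np

    def fdr_score(X, y):
        # Fisher discriminant ratio C(f) of Eq. (6) for every feature.
        # X: T x F matrix of training examples, y: class label per row.
        classes = np.unique(y)
        score = np.zeros(X.shape[1])
        for i, wi in enumerate(classes):
            for wj in classes[i + 1:]:
                zi, vi = X[y == wi].mean(0), X[y == wi].var(0)
                zj, vj = X[y == wj].mean(0), X[y == wj].var(0)
                score += (zi - zj) ** 2 / (vi + vj)  # one term per class pair
        return score

    def cross_corr(X):
        # |U_fg| of Eq. (7) for all feature pairs, as an F x F matrix.
        norm = np.sqrt((X ** 2).sum(0))
        return np.abs((X.T @ X) / np.outer(norm, norm))

    def select_features(X, y, L, d1=1.0, d2=1.0):
        # Greedy selection: maximize the FDR of a feature while
        # penalizing its mean correlation with the features chosen so far.
        C, U = fdr_score(X, y), cross_corr(X)
        selected = [int(np.argmax(C))]        # f_1 = argmax_f C(f)
        while len(selected) < L:
            penalty = U[selected].mean(0)     # (1/(l-1)) sum_r |U_{f_r f}|
            objective = d1 * C - d2 * penalty
            objective[selected] = -np.inf     # never pick a feature twice
            selected.append(int(np.argmax(objective)))
        return selected

For l = 2 the mean reduces to |U_{f_1 f}|, so the second selection step is covered by the same loop; the weighting factors D_1 and D_2 trade class separability against redundancy among the selected features.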
We additionally performed experiments in which we used the 3D scans of isolated objects for both training and testing. The results are shown in Table 1 and confirm that iAMNs outperform the other methods.

Fig. 2. 3D scans of isolated objects used for training: a ventilator, a chair and a table with a monitor on top.

Fig. 3. Classification of a complete 3D range scan obtained in a laboratory at the University of Freiburg.

Table 1. Classification results in 3D data

    Data set           NN     AMN    iAMN
    Complete scene     63%    62%    76%
    Isolated objects   81%    72%    89%

7 Conclusions

In this paper we propose a semantic classification algorithm that combines associative Markov networks with an instance-based approach based on nearest neighbors. Furthermore, we show how this method can be used to classify points described by features extracted from 2D and 3D laser scans. Additionally, we present an approach to reduce the number of features needed to represent each data point while maintaining their class discriminatory information. Experiments carried out on 2D and 3D maps demonstrate the effectiveness of our approach for the semantic classification of places and objects in indoor environments.

8 Acknowledgment

This work has been supported by the EU under the project CoSy with number FP6-004250-IP and under the project BACS with number FP6-IST-027140.

References

ALTHAUS, P. and CHRISTENSEN, H.I. (2003): Behaviour Coordination in Structured Environments. Advanced Robotics, 17(7), 657–674.
ANGUELOV, D., TASKAR, B., CHATALBASHEV, V., KOLLER, D., GUPTA, D., HEITZ, G. and NG, A. (2005): Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data. IEEE Computer Vision and Pattern Recognition.
BOYKOV, Y. and HUTTENLOCHER, D.P. (1999): A New Bayesian Approach to Object Recognition. IEEE Computer Vision and Pattern Recognition.
FRIEDMAN, S., PASULA, S. and FOX, D. (2007): Voronoi Random Fields: Extracting the Topological Structure of Indoor Environments via Place Labeling. International Joint Conference on Artificial Intelligence.
HUBER, D., KAPURIA, A., DONAMUKKALA, R.R. and HEBERT, M. (2004): Parts-Based 3D Object Classification. IEEE Computer Vision and Pattern Recognition.
JOHNSON, A. (1997): Spin-Images: A Representation for 3-D Surface Matching. PhD thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.
KOENIG, S. and SIMMONS, R. (1998): Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models. In: Kortenkamp, D., Bonasso, R. and Murphy, R. (Eds.): Artificial Intelligence Based Mobile Robotics: Case Studies of Successful Robot Systems. MIT Press, 91–122.
MARTINEZ MOZOS, O., STACHNISS, C. and BURGARD, W. (2005): Supervised Learning of Places from Range Data Using AdaBoost. IEEE International Conference on Robotics & Automation.
MORAVEC, H.P. (1988): Sensor Fusion in Certainty Grids for Mobile Robots. AI Magazine, 61–74.
OSADA, R., FUNKHOUSER, T., CHAZELLE, B. and DOBKIN, D. (2001): Matching 3D Models with Shape Distributions. Shape Modeling International, 154–166.
TASKAR, B., CHATALBASHEV, V. and KOLLER, D. (2004): Learning Associative Markov Networks. International Conference on Machine Learning.
THEODORIDIS, S. and KOUTROUMBAS, K. (2006): Pattern Recognition. 3rd Edition. Academic Press.
TRIEBEL, R., SCHMIDT, R., MARTINEZ MOZOS, O. and BURGARD, W. (2007): Instance-based AMN Classification for Improved Object Recognition in 2D and 3D Laser Range Data. International Joint Conference on Artificial Intelligence.
FSMTree: An Efficient Algorithm for Mining Frequent Temporal Patterns

Steffen Kempe (1), Jochen Hipp (1) and Rudolf Kruse (2)

(1) DaimlerChrysler AG, Group Research, 89081 Ulm, Germany
    {Steffen.Kempe, Jochen.Hipp}@daimlerchrysler.com
(2) Dept. of Knowledge Processing and Language Engineering, University of Magdeburg, 39106 Magdeburg, Germany
    Kruse@iws.cs.uni-magdeburg.de

Abstract. Research in the field of knowledge discovery from temporal data has recently focused on a new type of data: interval sequences. In contrast to event sequences, interval sequences contain labeled events with a temporal extension. Mining frequent temporal patterns from interval sequences has proved to be a valuable tool for generating knowledge in the automotive business. In this paper we propose a new algorithm for mining frequent temporal patterns from interval sequences: FSMTree. FSMTree uses a prefix tree data structure to efficiently organize all finite state machines and therefore dramatically reduces execution times. We demonstrate the algorithm's performance on field data from the automotive business.

1 Introduction

Mining sequences from temporal data is a well-known data mining task which has gained much attention in the past (e.g. Agrawal and Srikant (1995), Mannila et al. (1997), or Pei et al. (2001)). In all these approaches the temporal data is considered to consist of events, each having a label and a timestamp. In the following, however, we focus on temporal data where an event has a temporal extension. These temporally extended events are called temporal intervals. Each temporal interval can be described by a triplet (b, e, l), where b and e denote the beginning and the end of the interval and l its label.

At DaimlerChrysler we are interested in mining interval sequences in order to further extend the knowledge about our products. In our domain, one interval sequence may describe the history of one vehicle. The configuration of a vehicle, e.g. whether it is an estate car or a limousine, can be described by temporal intervals: the build date is the beginning, and the current day is the end, of such a temporal interval. Other temporal intervals may describe stopovers in a garage or the installation of additional equipment. Hence, mining these interval sequences might help us in tasks like quality monitoring or improving customer satisfaction.

2 Foundations and related work

As mentioned above, we represent a temporal interval as a triplet (b, e, l).

Definition 1. (Temporal Interval) Given a set of labels L, we say the triplet (b, e, l) ∈ R × R × L is a temporal interval, if b ≤ e. The set of all temporal intervals over L is denoted by I.

Definition 2. (Interval Sequence) Given a sequence of temporal intervals, we say (b_1, e_1, l_1), (b_2, e_2, l_2), ..., (b_n, e_n, l_n) ∈ I is an interval sequence if

∀ (b_i, e_i, l_i), (b_j, e_j, l_j) ∈ I, i ≠ j :  b_i ≤ b_j ∧ e_i ≥ b_j ⇒ l_i ≠ l_j    (1)

∀ (b_i, e_i, l_i), (b_j, e_j, l_j) ∈ I, i < j :  (b_i < b_j) ∨ (b_i = b_j ∧ e_i < e_j) ∨ (b_i = b_j ∧ e_i = e_j ∧ l_i < l_j)    (2)

hold. A given set of interval sequences is denoted by S.

Equation (1) above is referred to as the maximality assumption (Höppner (2002)).
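As a minimal illustration of Definition 2, the following Python sketch (the tuple representation and function name are our own, not from the paper) checks both conditions for a list of (b, e, l) triplets:

    def is_interval_sequence(seq):
        # seq: list of (b, e, l) triplets, e.g. [(1, 4, 'A'), (3, 7, 'B')].
        for i, (bi, ei, li) in enumerate(seq):
            if bi > ei:                          # Definition 1: b <= e
                return False
            for j, (bj, ej, lj) in enumerate(seq):
                # Eq. (1), maximality: two distinct intervals that share
                # a point in time must not carry the same label.
                if i != j and bi <= bj and ei >= bj and li == lj:
                    return False
            # Eq. (2): strict lexicographic order by (b, e, l).
            if i + 1 < len(seq) and (bi, ei, li) >= seq[i + 1]:
                return False
        return True

Two overlapping or meeting intervals with the same label would violate Eq. (1): by maximality they would already have been merged into a single temporal interval.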
The maximality assumption guarantees that each temporal interval A is maximal, in the sense that there is no other temporal interval in the sequence sharing a point in time with A and carrying the same label. Equation (2) requires that an interval sequence be ordered by the beginnings (primary), the ends (secondary) and the labels (tertiary, lexicographically) of its temporal intervals.

Without temporal extension there are only two possible relations between events: one event is before the other (or after, as the inverse relation), or they coincide. Due to the temporal extension of temporal intervals, the possible relations between two intervals become more complex: there are 7 possible relations (or 13 if one includes inverse relations). These interval relations have been described in Allen (1983) and are depicted in Figure 1. Each relation of Figure 1 is a temporal pattern of its own, consisting of two temporal intervals. Patterns with more than two temporal intervals are a straightforward extension: one just needs to know which interval relation holds between each pair of labels. Using the set I of Allen's interval relations, a temporal pattern is defined by:

Definition 3. (Temporal Pattern) A pair P = (s, R), where s : {1, ..., n} → L and R ∈ I^{n×n}, n ∈ N, is called a "temporal pattern of size n" or "n-pattern".

Fig. 1. Allen's interval relations.

Fig. 2. a) Example of an interval sequence: (1, 4, A), (3, 7, B), (7, 10, A). b) Example of a temporal pattern (e stands for equals, o for overlaps, b for before, m for meets, io for is-overlapped-by, etc.).

Figure 2.a shows an example of an interval sequence; the corresponding temporal pattern is given in Figure 2.b. Note that a temporal pattern is not necessarily valid, in the sense that it must be possible to construct an interval sequence for which the pattern holds true. On the other hand, if a temporal pattern holds true for an interval sequence, we consider this sequence an instance of the pattern.

Definition 4. (Instance) An interval sequence S = (b_i, e_i, l_i)_{1≤i≤n} conforms to an n-pattern P = (s, R) if ∀ i, j : s(i) = l_i ∧ s(j) = l_j ∧ R[i, j] = ir([b_i, e_i], [b_j, e_j]), with function ir returning the relation between two given intervals. We say that the interval sequence S is an instance of the temporal pattern P, and that an interval sequence S' contains an instance of P if S ⊆ S', i.e. S is a subsequence of S'.

Obviously a temporal pattern can only be valid if its labels have the same order as their corresponding temporal intervals have in an instance of the pattern. Next, we define the support of a temporal pattern.

Definition 5. (Minimal Occurrence) For a given interval sequence S, a time interval (time window) [b, e] is called a minimal occurrence of the k-pattern P (k ≥ 2) if (1.) the time interval [b, e] of S contains an instance of P, and (2.) there is no proper subinterval [b', e'] of [b, e] which also contains an instance of P. For a given interval sequence S, a time interval [b, e] is called a minimal occurrence of the 1-pattern P if (1.) the temporal interval (b, e, l) is contained in S, and (2.) l is the label in P.

Definition 6. (Support) The support of a temporal pattern P for a given set of interval sequences S is given by the number of minimal occurrences of P in S:

Sup_S(P) = |{ [b, e] : [b, e] is a minimal occurrence of P in S ∧ S ∈ S }|.
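Definition 4 relies on the function ir returning the Allen relation between two intervals. Since Definition 2 orders a sequence so that the earlier interval never begins after the later one, only 7 of the 13 relations can occur from the earlier to the later interval of a pair. A small Python sketch (using plain strings as relation names, our own convention):

    def ir(iv1, iv2):
        # Allen relation from [b1, e1] to [b2, e2], assuming the
        # sequence order of Definition 2, i.e. (b1, e1) <= (b2, e2).
        (b1, e1), (b2, e2) = iv1, iv2
        if b1 == b2:
            return 'equals' if e1 == e2 else 'starts'
        if e1 < b2:
            return 'before'
        if e1 == b2:
            return 'meets'
        if e1 < e2:
            return 'overlaps'
        return 'is-finished-by' if e1 == e2 else 'contains'

For the sequence of Figure 2.a, ir((1, 4), (3, 7)) yields 'overlaps', ir((3, 7), (7, 10)) yields 'meets', and ir((1, 4), (7, 10)) yields 'before', matching the pattern of Figure 2.b.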
As an illustration, consider the pattern A before A in the example of Figure 2.a. The time window [1, 11] is not a minimal occurrence, as the pattern is also visible e.g. in its subwindow [2, 9]. The time window [5, 8] is not a minimal occurrence either, as it does not contain an instance of the pattern. The only minimal occurrence is [4, 7], since the end of the first A and the beginning of the second A are just inside this time window.

The mining task is to find all temporal patterns in a set of interval sequences which satisfy a given minimum support threshold. Note that this task is closely related to frequent itemset mining, e.g. Agrawal et al. (1993).

Previous investigations on discovering frequent patterns from sequences of temporal intervals include the work of Höppner (2002), Kam and Fu (2000), Papapetrou et al. (2005), and Winarko and Roddick (2005). These approaches can be divided into two groups; the main difference between them is the definition of support. Höppner defines the temporal support of a pattern, which can be interpreted as the probability of seeing an instance of the pattern within a time window that is randomly placed on the interval sequence. All other approaches count the number of instances of each pattern: the pattern counter is incremented once for each sequence that contains the pattern, and if an interval sequence contains multiple instances of a pattern, these additional instances do not further increment the counter.

For our application neither support definition turned out to be satisfying. Höppner's temporal support is hard to interpret in our domain, as it is generally not related to the number of instances of the pattern in the data. Neglecting multiple instances of a pattern within one interval sequence is likewise inapplicable when mining the repair history of vehicles. Therefore we extended the approach of minimal occurrences of Mannila et al. (1997) to the demands of temporal intervals. In contrast to previous approaches, our support definition allows us (1.) to count the number of pattern instances, (2.) to handle multiple instances of a pattern within one interval sequence, and (3.) to apply time constraints to a pattern instance.

3 Algorithms FSMSet and FSMTree

In Kempe and Hipp (2006) we presented FSMSet, an algorithm to find all frequent temporal patterns within a set of interval sequences S. The main idea is to generate all frequent temporal patterns by applying the Apriori scheme of candidate generation and support evaluation. FSMSet therefore alternates two steps, the generation of candidate sets and the support evaluation of these candidates, until no more candidates are generated. The Apriori scheme starts with the frequent 1-patterns and then successively derives all k-candidates from the set of all frequent (k−1)-patterns. In this paper we focus on the support evaluation of the candidate patterns, as it is the most time-consuming part of the algorithm.

FSMSet uses finite state machines which successively take the temporal intervals of an interval sequence as input in order to find all instances of a candidate pattern. It is straightforward to derive a finite state machine from a temporal pattern: for each label in the temporal pattern a state is generated. The finite state machine starts in an initial state, and the next state is reached if we input a temporal interval that carries the same label as the first label of the temporal pattern.
From now on, a next state can only be reached if the presented temporal interval carries the same label as the state and its interval relation to all previously accepted temporal intervals is the same as specified in the temporal pattern. If the finite state machine reaches its last state, it reaches its final accepting state; consequently, the temporal intervals that have been accepted by the state machine form an instance of the temporal pattern. The minimal time window in which this pattern instance is visible can be derived from the temporal intervals that have been accepted by the state machine. We know that this time window contains an instance, but we do not know whether it is a minimal occurrence. Therefore FSMSet applies a two-step approach: first it finds all instances of a pattern using state machines, then it prunes all time windows which are not minimal occurrences.

To find all instances of a pattern in an interval sequence, FSMSet maintains a set of finite state machines. At first the set only contains the state machine derived from the candidate pattern. Subsequently, each temporal interval from the interval sequence is shown to every state machine in the set. If a state machine can accept the temporal interval, a copy of the state machine is added to the set; the temporal interval is shown to only one of these two state machines. Hence there will always be a copy of the initial state machine in the set trying to find a new instance of the pattern. In this way FSMSet can also handle situations in which a single state machine does not suffice. Consider the pattern A meets B and the interval sequence (1, 2, A), (3, 4, A), (4, 5, B). Without look-ahead, a single finite state machine would accept the first temporal interval (1, 2, A). This state machine is stuck: it cannot reach its final state, because there is no temporal interval which is-met-by (1, 2, A). Hence the pattern instance (3, 4, A), (4, 5, B) could not be found by a single state machine. Here this is not a problem, because there is a copy of the first state machine which will find the pattern instance.
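The search just described can be condensed into a short sketch (ours, building on the ir function above; a pattern is represented by its label tuple and a mapping rel[(i, j)] giving the required relation between its i-th and j-th intervals, i < j, and a state machine simply by the tuple of intervals it has accepted so far):

    def can_accept(labels, rel, accepted, iv):
        # A machine that has accepted k intervals accepts a new temporal
        # interval iff the label matches the k-th pattern label and the
        # interval relation to every previously accepted interval is as
        # specified in the pattern.
        k = len(accepted)
        return (k < len(labels) and iv[2] == labels[k] and
                all(ir(prev[:2], iv[:2]) == rel[(i, k)]
                    for i, prev in enumerate(accepted)))

    def find_instances(seq, labels, rel):
        # FSMSet-style search: on acceptance an advanced copy is added
        # to the set, while the original machine stays unchanged and
        # keeps looking for further instances.
        machines, instances = [()], []
        for iv in seq:
            # iterate over a snapshot so that machines added for iv
            # are not offered iv a second time
            for acc in list(machines):
                if can_accept(labels, rel, acc, iv):
                    acc2 = acc + (iv,)
                    if len(acc2) == len(labels):   # final state reached
                        instances.append(acc2)
                    else:
                        machines.append(acc2)
        return instances

    # The "A meets B" example from above:
    seq = [(1, 2, 'A'), (3, 4, 'A'), (4, 5, 'B')]
    print(find_instances(seq, ('A', 'B'), {(0, 1): 'meets'}))
    # -> [((3, 4, 'A'), (4, 5, 'B'))]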
Figure 3 and Table 1 give an example of FSMSet's support evaluation. There are four candidate patterns (Figures 3.a – 3.d) whose support has to be evaluated on the interval sequence given in Figure 3.e. At first, a state machine is derived for each candidate pattern; the first column of Table 1 corresponds to this initialization (state machines S_a() – S_d()). Afterwards each temporal interval of the sequence is used as input for the state machines. The first temporal interval has label A and can only be accepted by the state machines S_a() and S_b(); thus the new state machines S_a(1) and S_b(1) are added. The numbers in brackets refer to the temporal intervals of the interval sequence that have been accepted by the state machine. The second temporal interval again carries the label A and can only be accepted by S_a() and S_b(). The third temporal interval has label B and can be accepted by S_c() and S_d(). It also stands in the relation after to the first A and in the relation is-overlapped-by to the second A; hence the state machines S_a(1) and S_b(2) can also accept this interval. Table 1 shows all new state machines for each temporal interval of the interval sequence. In this example FSMSet needs 19 state machines to find all three instances of the candidate patterns.

Fig. 3. a) – d) four candidate patterns of size 3, e) an interval sequence.

Table 1. Set of state machines of FSMSet for the example of Figure 3. The first column shows the initialization; each following column shows the new state machines added after processing the corresponding temporal interval.

    init    1        2        3         4          5           6
    S_a()   S_a(1)   S_a(2)   S_c(3)    S_c(3,4)   S_a(5)      S_a(1,3,6)
    S_b()   S_b(1)   S_b(2)   S_d(3)    S_d(3,4)   S_b(5)      S_b(2,3,6)
    S_c()                     S_a(1,3)             S_c(3,4,5)
    S_d()                     S_b(2,3)

A closer examination of the state machines in Table 1 reveals that many of them show a similar behavior. E.g. both state machines S_c and S_d accept exactly the same temporal intervals until the fourth iteration of FSMSet; only the fifth temporal interval cannot be accepted by S_d. The reason is that both state machines share the common subpattern B overlaps C as their first part (i.e. a common prefix pattern). Only after this prefix pattern has been processed can their behavior differ. Thus we can reduce the algorithmic costs of FSMSet by combining all state machines that share a common prefix.

Combining all state machines of Figure 3 in a single data structure leads to the prefix tree in Figure 4. Each path of the tree is a state machine, but different state machines can now share states if their candidate patterns share a common pattern prefix. Using this new data structure we derive a new algorithm for the support evaluation of candidate patterns: FSMTree. Instead of maintaining a list of state machines, FSMTree maintains a list of nodes from the prefix tree. In the first step the list only contains the root node of the tree. Afterwards all temporal intervals of the interval sequence are processed one after the other. Each time a node of the set can accept the current temporal interval, its corresponding child node is added to the set. Table 2 shows the new nodes that are added in each step if we apply the prefix tree of Figure 4 to the example of Figure 3. Obviously the algorithmic overhead is reduced significantly: instead of 19 state machines, FSMTree only needs 11 nodes to find all pattern instances.

Fig. 4. FSMTree: prefix tree of state machines based on the candidates of Figure 3.
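A sketch of the corresponding prefix-tree evaluation (again our own illustration, reusing ir and the pattern representation from above; every edge of the tree carries a label and the relations required to all ancestor intervals, so candidate patterns with a common prefix share nodes):

    class Node:
        def __init__(self):
            self.children = {}   # (label, relations to ancestors) -> Node
            self.complete = []   # candidate patterns ending at this node

    def build_tree(candidates):
        # Merge the state machines of all candidate patterns
        # (labels, rel) into a single prefix tree.
        root = Node()
        for labels, rel in candidates:
            node = root
            for k, lab in enumerate(labels):
                key = (lab, tuple(rel[(i, k)] for i in range(k)))
                node = node.children.setdefault(key, Node())
            node.complete.append((labels, rel))
        return root

    def find_instances_tree(seq, root):
        # Like FSMSet, but the active set holds prefix-tree nodes
        # instead of one machine per candidate pattern.
        active, instances = [(root, ())], []
        for iv in seq:
            for node, acc in list(active):
                for (lab, rels), child in node.children.items():
                    if iv[2] == lab and all(ir(p[:2], iv[:2]) == r
                                            for p, r in zip(acc, rels)):
                        active.append((child, acc + (iv,)))
                        for pat in child.complete:
                            instances.append((pat, acc + (iv,)))
        return instances

Applied to the four candidates of Figure 3, one traversal advances all machines at once, and nodes shared by several candidate patterns are expanded only once; this is where the reduction from 19 state machines to 11 nodes comes from.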
References

HÖPPNER, F. and KLAWONN, F. (2002): Finding Informative Rules in Interval Sequences. Intelligent Data Analysis, 6(3), 237–255.
KAM, P.-S. and FU, A.W.-C. (2000): Discovering Temporal Patterns for Interval-Based Events. In: Data Warehousing and Knowledge Discovery, 2nd Int. Conf., DaWaK 2000. Springer, 317–326.
KEMPE, S. and HIPP, J. (2006): Mining Sequences of Temporal Intervals. In: 10th Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases.
PEI, J., HAN, J., MORTAZAVI-ASL, B., PINTO, H., CHEN, Q., DAYAL, U. and HSU, M. (2001): PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth. In: Proc. of the 17th Int. Conf. on Data Engineering (ICDE '01), 215–224.
WINARKO, E. and RODDICK, J.F. (2005): Discovering Richer Temporal Association Rules from Interval-Based Data. In: Data Warehousing and Knowledge Discovery, 7th Int. Conf., DaWaK 2005. Springer, Berlin-Heidelberg, 315–325.