Research on matching method for case retrieval process in CBR based on FCM

Available online at www.sciencedirect.com (ScienceDirect)
Procedia Engineering 174 (2017) 267-274
13th Global Congress on Manufacturing and Management, GCMM 2016

Zhao Yamin*, Zhang Mengmeng, Guo Xiaomin, Zhou Zhiwei, Zhang Jianhua
Management Engineering Department, Zhengzhou University, Zhengzhou, China

Abstract

In the era of the knowledge economy, how to mine and use knowledge effectively is a growing concern for enterprises. The CBR system, which comes from the field of artificial intelligence, is a self-learning system for managing tacit knowledge (cases). Case retrieval is its core step, and the quality of the retrieval method directly affects the efficiency of case retrieval and the accuracy of case matching. Therefore, this paper proposes a new case matching process: when the case database is small, it searches with the case similarity algorithm; when the case database is large, it searches with the FCM secondary retrieval model. An example illustrates the speed and efficiency of FCM when matching against a large-scale case database.

© 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the organizing committee of the 13th Global Congress on Manufacturing and Management.

Keywords: CBR; case matching; similarity algorithm; FCM

1. Introduction

The focus of knowledge management is the management of tacit knowledge. Externalizing tacit knowledge is the key for an organization to create and use new knowledge effectively, and cases are an important way of making tacit knowledge explicit. Case-based reasoning (CBR) systems are widely used in all kinds of knowledge-based systems because of their
advantages such as easy externalization of tacit knowledge, a high knowledge utilization rate and good self-learning ability. In general, the CBR process includes case representation and organization, case retrieval, case adaptation and revision, and case learning and management. In the CBR process, case retrieval is the core activity: its results directly affect the efficiency and accuracy of the entire CBR system. The quality of the retrieved cases depends on the accuracy of the case similarity calculation. A good retrieval algorithm retrieves quickly and returns cases that are as similar as possible and as few as possible. On this basis, the focus of this study is how to apply FCM effectively to CBR so as to solve the case matching problem in the case retrieval process.

* Corresponding author. Tel.: 15690875485. E-mail address: 1217200588@qq.com
1877-7058. doi:10.1016/j.proeng.2017.01.134

2. Literature review

Scholars have done a great deal of research on case retrieval in CBR and have obtained many valuable theories and case retrieval methods, which fall into four groups.

First, attribute reduction is applied to the cases, thereby increasing the speed of retrieval. Zhu Haodong and Zhong Yong (2010) [1] proposed combining attribute reduction with a class correlation algorithm before retrieving cases. The algorithm not only simplifies the retrieval process but also improves retrieval accuracy. However, it needs to discretize continuous attributes, resulting in information distortion.

Second, the representation structure of the case is optimized, thereby
reducing the difficulty of retrieval. Li Linlin, Sun Jiyin et al. (2007) [2] proposed a case retrieval algorithm based on decision tree knowledge representation, which transforms expert experience into a tree structure and then searches it. This method makes retrieval easy, but whenever the case database changes the tree must be re-created and stored again, at considerable cost. Zhang Jianhua et al. (2012) [3] analyzed and designed an object-oriented knowledge structure for case knowledge, designed on this basis a case knowledge representation subsystem model for KM, and explained the working mechanism of the model, giving full support for case knowledge representation.

Third, the similarity algorithm is improved to achieve precise case retrieval. Wang Hao et al. (2012) [4] proposed an improved nearest-neighbor method to calculate the similarity of cases, considering the local similarity of attribute values and the local similarity of attribute weights, but the algorithm is not applicable to all attribute values. On this basis, Zhang Jianhua (2014) [5] divided attribute values into types, described the similarity calculation for each situation, and verified it with an example. That work also informs the similarity calculation in this paper.

Finally, the cases are first classified by an algorithm, thereby improving the speed of retrieval. Xiao Feng and Li Daxin (2002) [6] proposed a neural-network-based method for multimedia database content retrieval that adaptively classifies cases and then identifies and matches them. This method can significantly improve retrieval speed, but retrieval takes longer when cases have many attributes.

Although the existing literature improves on the common case matching algorithms, cost and time still grow as the number of cases or case attributes increases. It is then necessary to classify the cases in the case base with a clustering algorithm before case matching. In the real world, people like to sort things into
different categories of objects; for example, organisms are divided into kingdom, phylum, class, order, family, genus and species. This division process is a process of clustering. Researchers have put forward many clustering methods [7-11], which are summarized in Table 1.

Table 1. Advantages and disadvantages of clustering methods
Name | Advantages | Disadvantages
Partitioning | Effective and simple | The number of clusters is difficult to determine, and the global optimum is difficult to obtain
Hierarchical | Effective, simple and easy to understand | Irreversible
Density-based approach | The resulting clusters are of arbitrary shape | The threshold value is not easy to determine
Grid-based approach | High speed | The underlying granularity is not easy to grasp
Model-based approach | Simple, easy to operate | As the types of case attribute increase, the model needs to be continually transformed

Comparing the advantages and disadvantages of each clustering method shows that only the shortcomings of the partitioning method have little influence on the whole case retrieval process. The K-means partitioning method is very effective, but its objective function is "rigid": an object either belongs to a certain cluster completely or does not belong to it at all. In view of this problem, this paper turns to fuzzy C-means (FCM) clustering.

3. Case matching process

3.1 Fuzzy C-Means Clustering Algorithm

Let the set of objects to be classified be $X = \{x_1, x_2, \dots, x_n\}$, where each object has $m$ characteristic indicators $a_1, a_2, \dots, a_m$. In a fuzzy classification, $X$ is divided into $c$ classes, and each object belongs to each class with a certain degree of membership. Such a classification of $X$ corresponds to a fuzzy matrix $U = (\mu_{ij})_{c \times n}$ with $1 \le c \le n$ and

$$\mu_{ij} \in [0,1]; \qquad \sum_{i=1}^{c} \mu_{ij} = 1; \qquad \sum_{j=1}^{n} \mu_{ij} > 0 .$$

To achieve the classification of the objects, according to
the characteristic indicators of the $n$ objects, the best fuzzy classification matrix $U$ must be found under certain conditions. Let the matrix of the $c$ cluster center vectors be

$$V = (V_1, V_2, \dots, V_c)^T, \qquad V_i = \{v_{i1}, v_{i2}, \dots, v_{im}\}, \quad i = 1, 2, \dots, c .$$

The objective function is

$$J(U, V) = \sum_{j=1}^{n} \sum_{i=1}^{c} (\mu_{ij})^{q} \, \lVert X_j - V_i \rVert^{2} .$$

The clustering criterion is to find the fuzzy classification matrix $U$ and the cluster centers $V$ that minimize the objective function. Here $q$ is a fixed value, generally $q = 2$, and $\lVert X_j - V_i \rVert$ is the Euclidean distance between the object $X_j$ and the cluster center vector $V_i$ of class $i$:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} \omega_k (a_{jk} - v_{ik})^{2}} ,$$

where $a_{jk}$ is the value of the feature index $a_k$ for the object $X_j$ and $\omega_k$ is the weight of $a_k$. An approximate solution of the objective function is usually obtained iteratively. In the steps below, $A_j$ denotes the feature vector of object $X_j$.

Step 1: Select the number of categories $c$, take an initial fuzzy classification matrix $U^{(0)}$, and iterate for $l = 0, 1, 2, \dots$

Step 2: For $U^{(l)}$, calculate the cluster centers $V^{(l)} = (V_1^{(l)}, V_2^{(l)}, \dots, V_c^{(l)})^T$:

$$V_i^{(l)} = \frac{\sum_{j=1}^{n} (\mu_{ij}^{(l)})^{q} A_j}{\sum_{j=1}^{n} (\mu_{ij}^{(l)})^{q}} .$$

Step 3: Update the fuzzy classification matrix $U^{(l)}$: for all $i, j$ with $A_j \ne V_i^{(l)}$,

$$\mu_{ij}^{(l+1)} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert A_j - V_i^{(l)} \rVert}{\lVert A_j - V_k^{(l)} \rVert} \right)^{\frac{2}{q-1}} \right]^{-1} .$$

Step 4: Compare $U^{(l)}$ and $U^{(l+1)}$. If $\max \{ |\mu_{ij}^{(l+1)} - \mu_{ij}^{(l)}| \} \le \varepsilon$ for a given $\varepsilon > 0$, then $U^{(l+1)}$ and $V^{(l)}$ are the required result and the iteration stops; otherwise, let $l = l + 1$ and return to Step 2.

Step 5: Terminate the algorithm and output the result.

3.2 Similarity calculation

The similarity between two cases is generally built from local similarity and global similarity. Local similarity is the similarity of the values of a single attribute between the cases; global (overall) similarity is the similarity of the cases over all attributes taken together. There are three types of attributes: numerical, unordered and ordered.

(1) Numerical attributes. Attribute
values are represented by continuous values, such as age and weight.

(2) Unordered attributes. The attribute values are of character, boolean or simple enumeration type, such as name, marital status (true or false) or political affiliation (masses, Communist Youth League members, Communist Party members, other parties). There is no order or numerical relationship between these attribute values.

(3) Ordered attributes. Unlike unordered attributes, the values of an ordered attribute show a hierarchical order relation. The similarity between attribute values can be measured, but with a certain ambiguity. For example, the values of an employee performance evaluation attribute, "excellent, good, medium, fair, poor", are hierarchically related: the value "good" is closer to "excellent" than "medium" is.

On the basis of the literature, this paper uses the following definitions of similarity.

(1) For a numerical attribute $a_k$, $k = 1, 2, \dots, m$, first find the range $[a_{k\min}, a_{k\max}]$ of the attribute. The similarity of case $x_i$ and case $x_j$ on attribute $a_k$ is

$$Sim_{a_k}(x_i, x_j) = 1 - \frac{|a_{ik} - a_{jk}|}{a_{k\max} - a_{k\min}} \qquad (3.21)$$

(2) For an unordered attribute $a_k$, $k = 1, 2, \dots, m$, the similarity of case $x_i$ and case $x_j$ on attribute $a_k$ is

$$Sim_{a_k}(x_i, x_j) = \begin{cases} 1, & a_{ik} = a_{jk} \\ 0, & a_{ik} \ne a_{jk} \end{cases} \qquad (3.22)$$

(3) For an ordered attribute $a_k$, $k = 1, 2, \dots, m$, the similarity of case $x_i$ and case $x_j$ on attribute $a_k$ is

$$Sim_{a_k}(x_i, x_j) = 1 - \frac{|\mathrm{Ord}(a_{ik}) - \mathrm{Ord}(a_{jk})|}{\mathrm{Card}(a_k)} \qquad (3.23)$$

where $\mathrm{Ord}(a_{ik})$ and $\mathrm{Ord}(a_{jk})$ denote the ordinal numbers of the attribute values $a_{ik}$
and $a_{jk}$ in the value set, and $\mathrm{Card}(a_k)$ is the cardinality of the attribute. For example, the values "excellent, good, medium, fair, poor" of the employee performance evaluation attribute are assigned the ordinals 4, 3, 2, 1, 0, and the similarity of two cases on this attribute is computed from these ordinals, e.g. $Sim_{a_k}(x_i, x_j) = 0.5$.

(4) The overall similarity is

$$Sim(x_i, x_j) = \sum_{k=1}^{m} \omega_k \, Sim_{a_k}(x_i, x_j) \qquad (3.24)$$

where $\omega_k$ denotes the weight of the case attribute $a_k$; that is, the overall similarity is the weighted sum of the local similarities.

3.3 Case Matching Calculation Process Design

Based on the above analysis, the case matching calculation in CBR case retrieval is designed as follows:

(1) When a target case is input, determine its case class from its general features and search within the matching case class;

(2) Determine whether the number of cases in the case class exceeds the threshold set by the system. If not, traverse the case class and calculate the similarity between every source case and the target case using formulas (3.21), (3.22), (3.23) and (3.24);

(3) As the case base grows, when the number of cases in the case class exceeds the threshold set by the system, the fuzzy C-means clustering method is applied first, so that the cases most similar to the target case can be found as the output in the following two steps;

(4) Calculate the distance from the target case to the centroid of each subclass and find the subclass whose centroid is nearest to the target case;

(5) Traverse all the source cases in that subclass, calculate the similarity of each with the target case, and output the source case with the highest similarity to the target case.

This case matching process is shown in Figure 1.

Figure 1. Flow chart of the case matching calculation
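The similarity definitions of Section 3.2 and steps (2) to (5) of the matching process can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes that (3.21) and (3.23) take the form "1 minus a normalized absolute difference", and the function names, the `labels`/`centroids` arguments and the toy data are all hypothetical. The FCM clustering itself is assumed to have been computed beforehand.

```python
def sim_numeric(a, b, lo, hi):
    """Assumed form of (3.21): 1 - |a - b| / (hi - lo)."""
    return 1.0 - abs(a - b) / (hi - lo)


def sim_unordered(a, b):
    """Formula (3.22): 1 for equal values, 0 otherwise."""
    return 1.0 if a == b else 0.0


def sim_ordered(ord_a, ord_b, card):
    """Assumed form of (3.23): 1 - |Ord(a) - Ord(b)| / Card."""
    return 1.0 - abs(ord_a - ord_b) / card


def overall_similarity(local_sims, weights):
    """Formula (3.24): weighted sum of the local similarities."""
    return sum(w * s for w, s in zip(weights, local_sims))


def euclidean(p, q):
    """Plain Euclidean distance between two attribute vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def match_case(target, cases, sim, threshold, centroids=None, labels=None):
    """Return the index of the source case most similar to `target`.

    Small case class: traverse every case (step 2).
    Large case class: restrict the traversal to the subclass whose
    centroid is nearest to the target (steps 3 to 5).
    """
    candidates = list(range(len(cases)))
    if len(cases) > threshold and centroids is not None and labels is not None:
        # Step (4): nearest subclass centroid.
        nearest = min(range(len(centroids)),
                      key=lambda c: euclidean(target, centroids[c]))
        # Step (5): keep only the cases assigned to that subclass.
        candidates = [i for i in candidates if labels[i] == nearest]
    return max(candidates, key=lambda i: sim(target, cases[i]))


# Toy data (hypothetical): two normalized numerical attributes, equal weights.
cases = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = [0, 0, 1, 1]                    # subclass of each case
centroids = [(0.15, 0.1), (0.85, 0.85)]  # subclass centroids
target = (0.82, 0.8)


def toy_sim(x, y):
    local = [sim_numeric(a, b, 0.0, 1.0) for a, b in zip(x, y)]
    return overall_similarity(local, [0.5, 0.5])


best_small = match_case(target, cases, toy_sim, threshold=10)
best_large = match_case(target, cases, toy_sim, threshold=3,
                        centroids=centroids, labels=labels)
print(best_small, best_large)
```

On this toy data both strategies select the case at index 2; the clustered path simply evaluates two similarities instead of four, which is where the saving comes from when a case class is large.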
4. Case study

The following is an example of the case matching algorithm based on FCM clustering. Consider a case library with the set of objects $X = \{x_1, x_2, \dots, x_{11}\}$. Each object $x_i$ ($i = 1, 2, \dots, n$) has four attributes $a_1, a_2, a_3, a_4$. The attributes $a_1$ and $a_4$ are continuous (numerical) attributes. $a_2$ is an unordered attribute with two character values, which are converted to the numerical values 0 and 1 for the matching calculation. $a_3$ is an ordered attribute with five values, which are converted to the numerical values 1, 2, 3, 4, 5 for the matching calculation. $x_0$ is the target case, as shown in Table 2.

First, the attribute values are normalized with the formula

$$a'_{ij} = \frac{a_{ij} - a_{j\min}}{a_{j\max} - a_{j\min}} \qquad (4.11)$$

where $a_{ij}$ is the value of case $x_i$ on attribute $a_j$, $a'_{ij}$ is the normalized value of $a_{ij}$, and $a_{j\max}$ and $a_{j\min}$ are the maximum and minimum values of attribute $a_j$. These can be calculated from the data of the case base or assigned directly by experts based on experience; in this paper they are calculated from the case database. The normalized data are shown in Table 3.

Table 2. Object set and target case in the case library
Cases: x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x0
a1: 11 20 18 26 33 40 21 24 18 25 31 19
a2: 0 0 1 1 0 0 1 1 0 0
a3: 3 3 4 4
a4: 102 121 133 119 122 108 110 124 105 111 121 117

Table 3. Normalized data
Cases: x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x0
a1: 0 0.310 0.241 0.517 0.759 1 0.345 0.448 0.241 0.483 0.69 0.276
a2: 0 0 1 1 0 0 1 1 0 0
a3: 0.5 0.75 0.5 0.5 0.25 0.75 0.75 0.25 0.75
a4: 0 0.613 1 0.548 0.645 0.194 0.258 0.71 0.097 0.29 0.613 0.484

The number of clusters is 3, the fuzzy weight is 2, and the maximum number of iterations is 100. To simplify the calculation, the weights of the four attributes are all set to 0.25, and FCM clustering is carried out in MATLAB 7.1. The membership matrix is:
$$U = \begin{bmatrix}
0.568 & 0.049 & 0.049 & 0.794 & 0.791 & 0.761 & 0.036 & 0.013 & 0.730 & 0.739 & 0.074 \\
0.153 & 0.472 & 0.852 & 0.106 & 0.096 & 0.097 & 0.068 & 0.937 & 0.122 & 0.087 & 0.194 \\
0.279 & 0.479 & 0.099 & 0.100 & 0.113 & 0.142 & 0.896 & 0.05 & 0.148 & 0.174 & 0.733
\end{bmatrix}$$

(rows: clusters 1 to 3; columns: $x_1$ to $x_{11}$).

From the membership matrix, the three clusters are as follows: the first cluster is $\{x_1, x_4, x_5, x_6, x_9, x_{10}\}$, the second cluster is $\{x_3, x_8\}$, and the third cluster is $\{x_2, x_7, x_{11}\}$. The coordinates of the three cluster centers are:

$p_1 = (0.5476, 0.9964, 0.456, 0.3323)$
$p_2 = (0.3571, 0.0389, 0.7889, 0.7852)$
$p_3 = (0.4425, 0.0989, 0.2857, 0.4081)$

The Euclidean distances from the target case $x_0 = (0.276, 0, 0.75, 0.484)$ to the three cluster centroids are $d_1 = 1.0844$, $d_2 = 0.3167$ and $d_3 = 0.5088$. According to these distances, the case closest to the target case should be retrieved from the second cluster. The overall similarities between the target case and the source cases $x_3$ and $x_8$ are $Sim(x_0, x_3) = 0.5634$ and $Sim(x_0, x_8) = 0.8736$, so the case most similar to the target case is $x_8$. When a case class contains a large number of cases, the fuzzy C-means clustering algorithm can quickly narrow the retrieval range and greatly improve the efficiency of case retrieval.

5. Conclusion

This paper studies in detail the case matching part of the CBR case retrieval process in knowledge management. The solution proposed for the problems encountered in this step is summarized as follows. In the case matching calculation, the actual size of the case database is taken into account, and different matching strategies are adopted for case databases of different sizes. When the case database is small, the simpler similarity calculation is used. When the library is large, so that even a single case class contains a large number of cases, a secondary retrieval strategy based on fuzzy C-means clustering is proposed. We hope that the results of this paper will help
enterprises to improve the efficiency of case retrieval.

Acknowledgements

This work is supported by the Excellent Young Teacher Development Fund Project of Zhengzhou University (Foundation No. 2015SKYQ15).

References

[1] Zhu Haodong, Zhong Yong. Feature Selection Method Based on Category Correlation and Cross Entropy [J]. Journal of Zhengzhou University (Natural Science Edition), 2010, 42 (2): 61-65.
[2] Li Linlin, Sun Jiyin, Wan Lei. Research on Decision Tree Knowledge Representation of Multi-fault Source Search Algorithm [J]. Command Control & Simulation, 2007, 29 (3): 97-99.
[3] Zhang Jianhua, Guo Zengmao, Liu Xiao. Study on Case Knowledge Representation Mechanism in Knowledge Management [J]. Journal of Information, 2012, 06: 112-115.
[4] Wang Hao, Gao Jinji, Jiang Zhinong, et al. Research on fault diagnosis system of rotating machinery based on case-based reasoning [J]. Science Technology and Engineering, 2012, 12 (29): 7585-7591.
[5] Zhang Jianhua. Similar structure model of knowledge management self-learning case parallel structure [J]. Journal of Information, 2014, 10: 196-200 + 207.
[6] Xiao Feng, Li Daxin. A Kind of Multimedia Database Content Retrieval Based on Neural Network [J]. Journal of Zhengzhou University (Natural Science Edition), 2002, 34 (2): 76-79.
[7] Dong Xianyuan, Fang Shou-en, Wang Junhua. Traffic accident case retrieval optimization method [J]. Journal of Tongji University: Natural Science Edition, 2012, 40 (5): 707-710.
[8] Chen Ling, Chen Zhonghua, Zeng Huiyan. Research on Case Retrieval in RCM Analysis System Based on Case-based Reasoning [J]. Computer Engineering and Design, 2012, 33 (2): 581-585.
[9] Chen Qian, Xiang Yang, Guo Xin, et al. K-means clustering algorithm based on rough set in case retrieval [J]. Computer Science, 2010, 37 (012): 161-164.
[10] Quellec G, Lamard M, Cazuguel G, et al. Case retrieval in medical databases by fusing heterogeneous information [J]. Medical Imaging, IEEE
Transactions on, 2011, 30 (1): 108-118.
[11] Kang Y B, Krishnaswamy S, Zaslavsky A. A case retrieval approach using similarity and association knowledge [M] // On the Move to Meaningful Internet Systems: OTM 2011. Springer Berlin Heidelberg, 2011: 218-235.
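As a supplementary illustration, the FCM iteration of Section 3.1 (Steps 1 to 5) and the nearest-centroid step of the case study can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the MATLAB routine used in the paper: the function names, the random initialization, the default tolerance and the small two-dimensional data set are hypothetical, and the distance is unweighted. Only the cluster centers $p_1, p_2, p_3$ and the target case $x_0$ are taken from the case study.

```python
import random


def dist(p, q):
    """Unweighted Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def fcm(points, c, q=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy C-means following Steps 1-5; returns (U, V).

    U[i][j] is the membership of point j in cluster i, V[i] is centre i.
    """
    rng = random.Random(seed)
    n, m = len(points), len(points[0])
    # Step 1: random initial membership matrix whose columns sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= s
    V = []
    for _ in range(max_iter):
        # Step 2: cluster centres as membership-weighted means.
        V = []
        for i in range(c):
            w = [U[i][j] ** q for j in range(n)]
            total = sum(w)
            V.append([sum(w[j] * points[j][k] for j in range(n)) / total
                      for k in range(m)])
        # Step 3: update memberships from the distances to the centres.
        new_U = [[0.0] * n for _ in range(c)]
        for j in range(n):
            d = [dist(points[j], V[i]) for i in range(c)]
            if min(d) == 0.0:
                # A point lying exactly on a centre belongs fully to it.
                new_U[d.index(0.0)][j] = 1.0
            else:
                for i in range(c):
                    new_U[i][j] = 1.0 / sum(
                        (d[i] / d[k]) ** (2.0 / (q - 1.0)) for k in range(c))
        # Step 4: stop when the largest membership change is at most eps.
        change = max(abs(new_U[i][j] - U[i][j])
                     for i in range(c) for j in range(n))
        U = new_U
        if change <= eps:
            break
    # Step 5: output the result.
    return U, V


# Hypothetical toy data: two well-separated groups of three points.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
U, V = fcm(toy, c=2)
toy_labels = [max(range(2), key=lambda i: U[i][j]) for j in range(len(toy))]

# Nearest-centroid step with the centres and target case of the case study.
centres = [(0.5476, 0.9964, 0.456, 0.3323),
           (0.3571, 0.0389, 0.7889, 0.7852),
           (0.4425, 0.0989, 0.2857, 0.4081)]
x0 = (0.276, 0.0, 0.75, 0.484)
distances = [dist(x0, p) for p in centres]
nearest = distances.index(min(distances))  # 0-based index of the cluster
print([round(d, 4) for d in distances], nearest)
```

With the centres reported in the case study, the computed distances agree with $d_1$, $d_2$ and $d_3$ to within rounding, and the second cluster (index 1) is the nearest, which suggests that the reported distances are unweighted Euclidean distances.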
