
Linguistic Vector Similarity Measures and Applications to Linguistic Information Classification

Phong Pham Hong,1,* Son Le Hoang2
1 Faculty of Information Technology, National University of Civil Engineering, Hanoi, Vietnam
2 VNU University of Science, Vietnam National University, Hanoi, Vietnam
* Author to whom all correspondence should be addressed; e-mail: phphong84@yahoo.com

In this paper, we generalize the similarity degree for linguistic labels to the so-called linguistic similarity measure. The linguistic vector, which can be used to represent objects whose attributes are given in terms of linguistic labels, is defined. Some mathematical properties are stated and proved. The linguistic vector similarity measure is developed and applied to linguistic information classification. Experimental results on real data confirm the effectiveness of the proposed method. © 2016 Wiley Periodicals, Inc.

International Journal of Intelligent Systems, Vol. 00, 1–15 (2016). DOI 10.1002/int.21830

1. INTRODUCTION

1.1. Linguistic Labels and Linguistic Aggregation Operators

In many situations, information is at best represented by linguistic labels.1 The transformation of these labels to numbers is, in many cases, costly or even impossible.2

Example 1. We consider the Car Evaluation Database,3 where each item has seven attributes, each of which takes linguistic values ranging over a corresponding label set (Table I).

Table I. Car evaluation database.

Attribute | Linguistic values (label set)
Buying price | Low, medium, high, very high
Price of the maintenance | Low, medium, high, very high
Number of doors | Two, three, four, five or more
Capacity in terms of persons to carry | Two, four, more
The size of luggage boot | Small, medium, big
Estimated safety of the car | Low, medium, high
Car acceptability | Unacceptable (70.023%), Acceptable (22.222%), Good (3.993%), Very good (3.762%)

The label set S is mathematically described in several ways, as follows (g is an even positive integer):

- Finite and totally ordered discrete set:4
  $S = \{s_0, s_1, \ldots, s_g\}$;  (1)

- Subscript-symmetric linguistic evaluation scale:5
  $S = \{\, s_\alpha \mid \alpha = -\tfrac{g}{2}, \ldots, -1, 0, 1, \ldots, \tfrac{g}{2} \,\}$;  (2)

- Multiplicative linguistic evaluation scale:6
  $S = \{\, s_\alpha \mid \alpha = \tfrac{1}{g/2+1}, \ldots, \tfrac{1}{2}, 1, 2, \ldots, \tfrac{g}{2}+1 \,\}$.  (3)

Linguistic labels can be handled using linguistic aggregation operators, which are classified into four groups:7,8 the linguistic computational model based on membership functions, the linguistic computational model based on type-2 fuzzy sets, the linguistic symbolic computational models based on ordinal scales, and the linguistic symbolic computational models based on 2-tuple representation.

Xu9 introduced a computational model that improves the accuracy of linguistic aggregation by extending the subscript-symmetric linguistic evaluation scale (Equation 2) to the continuous scale $\bar{S} = \{s_\alpha \mid \alpha \in [-t, t]\}$, where $t$ ($t > g$) is a sufficiently large positive integer. If $s_\alpha \in S$, then $s_\alpha$ is called an original linguistic label; otherwise, it is an extended (or virtual) linguistic label.

Example 2.10 The subscript-symmetric linguistic evaluation scale (of original linguistic labels) $S = \{s_{-3}, s_{-2}, s_{-1}, s_0, s_1, s_2, s_3\}$ is extended to the continuous scale $\bar{S} = \{s_\alpha \mid \alpha \in [-3, 3]\}$. The label $s_{-0.3} \in \bar{S}$, for example, is a virtual linguistic label.
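To make the scale and virtual-label notions concrete, here is a minimal Python sketch (not taken from the paper; the class name and the integer test used to distinguish original from virtual labels are illustrative assumptions):

```python
# A minimal sketch of the subscript-symmetric scale S = {s_{-g/2}, ..., s_{g/2}}
# and its continuous extension S-bar = {s_alpha | alpha in [-t, t]}.

class LinguisticLabel:
    """Label s_alpha; 'original' if alpha is an integer subscript of S, else 'virtual'."""
    def __init__(self, alpha, t):
        if abs(alpha) > t:
            raise ValueError("subscript outside the extended scale [-t, t]")
        self.alpha = alpha
        self.t = t

    @property
    def is_virtual(self):
        # s_{-0.3} in Example 2 is virtual; s_{-1} is original
        return not float(self.alpha).is_integer()

    def __repr__(self):
        kind = "virtual" if self.is_virtual else "original"
        return f"s_{self.alpha} ({kind})"

# Example 2: S = {s_-3, ..., s_3}, extended to subscripts in [-3, 3]
print(LinguisticLabel(-1, 3))    # s_-1 (original)
print(LinguisticLabel(-0.3, 3))  # s_-0.3 (virtual)
```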
The linguistic weighted averaging (LWA) operator, which aggregates weighted continuous labels, was developed:11

$\mathrm{LWA}: \bar{S}^n \to \bar{S}, \qquad \mathrm{LWA}(s_{\alpha_1}, \ldots, s_{\alpha_n}) = s_{\bar{\alpha}},$  (4)

where $\bar{\alpha} = \sum_{j=1}^{n} w_j \alpha_j$ and $w = (w_1, \ldots, w_n)$ is the weight vector, with $w_j \ge 0$ for all $j = 1, \ldots, n$ and $\sum_{j=1}^{n} w_j = 1$.

1.2. Deviation Measures and Similarity Degrees for Linguistic Labels

Xu12 developed the deviation measure and the similarity degree between two linguistic labels $s_\alpha$ and $s_\beta$ in the subscript-symmetric linguistic evaluation scale S given in Equation 2:

- the deviation degree: $d(s_\alpha, s_\beta) = |\alpha - \beta|$; and
- the similarity degree: $\rho(s_\alpha, s_\beta) = \dfrac{g - d(s_\alpha, s_\beta)}{g}$.

1.3. The Contributions and Organization of the Paper

This paper contains two main contributions: (1) some new concepts are defined: linguistic similarity measures, linguistic vectors, and linguistic vector similarity measures; (2) these concepts are used to construct an algorithm to classify linguistic information (the linguistic classification algorithm, LCA).

The rest of the paper is organized as follows:

- In Section 2, we define the linguistic similarity measure and introduce some instances of this concept;
- In the next section, the linguistic vector is defined; then, linguistic vector similarity measures are developed as combinations of linguistic similarity measures;
- Section 4 is devoted to the proposed linguistic classification and modified linguistic classification algorithms (LCA and MLCA); the algorithms are applied to a real data set, and some statistical criteria are assessed;
- The last section concludes the paper.

2. LINGUISTIC SIMILARITY MEASURES

Consider a linguistic scale $S = \{s_0, s_1, \ldots, s_g\}$ such that:4 $s_i \ge s_j$ iff $i \ge j$, for all $s_i, s_j \in S$.

DEFINITION 1 (Linguistic similarity measure). Let $\mathrm{sim}: S \times S \to \mathbb{R}$, where $\mathbb{R}$ is the set of all real numbers. sim is called a linguistic similarity measure if it satisfies the following conditions:

(A1) $\mathrm{sim}(s_i, s_j) = \mathrm{sim}(s_j, s_i)$, for all $s_i, s_j \in S$;
(A2) $0 \le \mathrm{sim}(s_i, s_j) \le 1$, for all $s_i, s_j \in S$;
(A3) $\mathrm{sim}(s_i, s_j) = 1 \Leftrightarrow s_i = s_j$, for all $s_i, s_j \in S$;
(A4) if $s_i \le s_j \le s_k$, then $\mathrm{sim}(s_i, s_j) \ge \mathrm{sim}(s_i, s_k)$ and $\mathrm{sim}(s_j, s_k) \ge \mathrm{sim}(s_i, s_k)$, for all $s_i, s_j, s_k \in S$.

DEFINITION 2. For all $s_i, s_j \in S$, we define:

(1) $\mathrm{sim}_1(s_i, s_j) = \dfrac{g - |i - j|}{g}$;
(2) $\mathrm{sim}_2(s_i, s_j) = \dfrac{\min\{i, j\}}{\max\{i, j\}}$;
(3) $\mathrm{sim}_3(s_i, s_j) = 1 - \dfrac{1 - \exp\left(-\frac{|i - j|}{g}\right)}{1 - \exp(-1)}$; and
(4) $\mathrm{sim}_4(s_i, s_j) = 1 - \dfrac{1 - \exp\left(-\frac{|\sqrt{i} - \sqrt{j}|}{\sqrt{g}}\right)}{1 - \exp(-1)}$.

Notice that, to avoid the denominator being zero, we set $\frac{0}{0} = 1$ in the definition of $\mathrm{sim}_2$.
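The four measures of Definition 2 are straightforward to compute from the label subscripts. Below is a small Python sketch, written on the assumption that labels are passed as their subscripts i, j in the scale $S = \{s_0, \ldots, s_g\}$; the asserts at the end are only a numerical spot-check of (A2)-(A4) on one scale, not a proof:

```python
import math

def sim1(i, j, g):
    # Xu's similarity degree: (g - |i - j|) / g
    return (g - abs(i - j)) / g

def sim2(i, j, g):
    # min/max of the subscripts, with the convention 0/0 = 1
    if max(i, j) == 0:
        return 1.0
    return min(i, j) / max(i, j)

def sim3(i, j, g):
    return 1 - (1 - math.exp(-abs(i - j) / g)) / (1 - math.exp(-1))

def sim4(i, j, g):
    return 1 - (1 - math.exp(-abs(math.sqrt(i) - math.sqrt(j)) / math.sqrt(g))) / (1 - math.exp(-1))

# Quick check of (A2)-(A4) on S = {s_0, ..., s_6}, i.e. g = 6
g = 6
for sim in (sim1, sim2, sim3, sim4):
    assert abs(sim(4, 4, g) - 1) < 1e-12   # (A3): identical labels give similarity 1
    assert sim(1, 3, g) >= sim(1, 5, g)    # (A4) with i <= j <= k
    assert 0 <= sim(0, 6, g) <= 1          # (A2)
print(sim1(1, 3, g), sim2(1, 3, g), sim3(1, 3, g), sim4(1, 3, g))
```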
THEOREM 1. $\mathrm{sim}_1$, $\mathrm{sim}_2$, $\mathrm{sim}_3$, $\mathrm{sim}_4$ are linguistic similarity measures.

Proof. It is easily seen that $\mathrm{sim}_1$, $\mathrm{sim}_2$, $\mathrm{sim}_3$, $\mathrm{sim}_4$ satisfy condition (A1).

(1) Consider $\mathrm{sim}_1$.
(A2) Since $0 \le |i - j| \le g$, we have $0 \le g - |i - j| \le g$, hence $0 \le \frac{g - |i-j|}{g} \le 1$, i.e., $0 \le \mathrm{sim}_1(s_i, s_j) \le 1$.
(A3) $\mathrm{sim}_1(s_i, s_j) = 1 \Leftrightarrow |i - j| = 0 \Leftrightarrow i = j \Leftrightarrow s_i = s_j$.
(A4) If $s_i \le s_j \le s_k$, i.e., $i \le j \le k$, then $|i - j| = j - i \le k - i = |i - k|$, so $g - |i - j| \ge g - |i - k|$ and $\frac{g - |i-j|}{g} \ge \frac{g - |i-k|}{g}$. We thus obtain $\mathrm{sim}_1(s_i, s_j) \ge \mathrm{sim}_1(s_i, s_k)$, and similarly $\mathrm{sim}_1(s_j, s_k) \ge \mathrm{sim}_1(s_i, s_k)$.

(2) Consider $\mathrm{sim}_2$.
(A2) $0 \le \min\{i, j\} \le \max\{i, j\}$, so $0 \le \frac{\min\{i,j\}}{\max\{i,j\}} \le 1$, i.e., $0 \le \mathrm{sim}_2(s_i, s_j) \le 1$.
(A3) $\mathrm{sim}_2(s_i, s_j) = 1 \Leftrightarrow \min\{i, j\} = \max\{i, j\} \Leftrightarrow i = j \Leftrightarrow s_i = s_j$.
(A4) If $s_i \le s_j \le s_k$, i.e., $i \le j \le k$, then $\frac{\min\{i,j\}}{\max\{i,j\}} = \frac{i}{j} \ge \frac{i}{k} = \frac{\min\{i,k\}}{\max\{i,k\}}$. This shows that $\mathrm{sim}_2(s_i, s_j) \ge \mathrm{sim}_2(s_i, s_k)$; similarly, $\mathrm{sim}_2(s_j, s_k) \ge \mathrm{sim}_2(s_i, s_k)$.

(3) Consider $\mathrm{sim}_3$.
(A2) $0 \le |i - j| \le g$ implies $0 \le \frac{|i-j|}{g} \le 1$, so $-1 \le -\frac{|i-j|}{g} \le 0$ and $\exp(-1) \le \exp\left(-\frac{|i-j|}{g}\right) \le 1$. Hence $0 \le 1 - \exp\left(-\frac{|i-j|}{g}\right) \le 1 - \exp(-1)$, so $0 \le \frac{1 - \exp(-|i-j|/g)}{1 - \exp(-1)} \le 1$ and therefore $0 \le \mathrm{sim}_3(s_i, s_j) \le 1$.
(A3) $\mathrm{sim}_3(s_i, s_j) = 1 \Leftrightarrow \frac{1 - \exp(-|i-j|/g)}{1 - \exp(-1)} = 0 \Leftrightarrow \exp\left(-\frac{|i-j|}{g}\right) = 1 \Leftrightarrow |i - j| = 0 \Leftrightarrow i = j \Leftrightarrow s_i = s_j$.
(A4) If $s_i \le s_j \le s_k$, i.e., $i \le j \le k$, then $\frac{|i-j|}{g} = \frac{j-i}{g} \le \frac{k-i}{g} = \frac{|i-k|}{g}$, so $\exp\left(-\frac{|i-j|}{g}\right) \ge \exp\left(-\frac{|i-k|}{g}\right)$ and $1 - \exp\left(-\frac{|i-j|}{g}\right) \le 1 - \exp\left(-\frac{|i-k|}{g}\right)$. Dividing by $1 - \exp(-1)$ and subtracting from 1 gives $\mathrm{sim}_3(s_i, s_j) \ge \mathrm{sim}_3(s_i, s_k)$; in exactly the same way, $\mathrm{sim}_3(s_j, s_k) \ge \mathrm{sim}_3(s_i, s_k)$.

(4) We finally consider $\mathrm{sim}_4$.
(A2) $0 \le |\sqrt{i} - \sqrt{j}| \le \sqrt{g}$ implies $0 \le \frac{|\sqrt{i} - \sqrt{j}|}{\sqrt{g}} \le 1$; arguing exactly as for $\mathrm{sim}_3$, we obtain $0 \le \mathrm{sim}_4(s_i, s_j) \le 1$.
(A3) $\mathrm{sim}_4(s_i, s_j) = 1 \Leftrightarrow \exp\left(-\frac{|\sqrt{i} - \sqrt{j}|}{\sqrt{g}}\right) = 1 \Leftrightarrow |\sqrt{i} - \sqrt{j}| = 0 \Leftrightarrow i = j \Leftrightarrow s_i = s_j$.
(A4) If $s_i \le s_j \le s_k$, i.e., $i \le j \le k$, then $\frac{|\sqrt{i} - \sqrt{j}|}{\sqrt{g}} = \frac{\sqrt{j} - \sqrt{i}}{\sqrt{g}} \le \frac{\sqrt{k} - \sqrt{i}}{\sqrt{g}} = \frac{|\sqrt{i} - \sqrt{k}|}{\sqrt{g}}$; as in the case of $\mathrm{sim}_3$, this proves $\mathrm{sim}_4(s_i, s_j) \ge \mathrm{sim}_4(s_i, s_k)$ and, in exactly the same way, $\mathrm{sim}_4(s_j, s_k) \ge \mathrm{sim}_4(s_i, s_k)$.

3. LINGUISTIC VECTOR SIMILARITY MEASURES

DEFINITION 3 (Linguistic vector). We define a linguistic vector of length $n$ as $V = (v_1, \ldots, v_n)$, where $v_t$, which represents the linguistic value of the $t$-th attribute, is a linguistic label in the $t$-th linguistic scale $S^t = \{s_0^{(t)}, \ldots, s_{g_t}^{(t)}\}$ ($t = 1, \ldots, n$). From now on, $\mathbf{S}$ denotes the set of all $n$-length linguistic vectors.

DEFINITION 4 (Linguistic vector similarity measure). Let $\mathrm{SIM}: \mathbf{S} \times \mathbf{S} \to \mathbb{R}$. SIM is called a linguistic vector similarity measure if it satisfies the following conditions:

(B1) $\mathrm{SIM}(U, V) = \mathrm{SIM}(V, U)$, for all $U, V \in \mathbf{S}$;
(B2) $0 \le \mathrm{SIM}(U, V) \le 1$, for all $U, V \in \mathbf{S}$;
(B3) $\mathrm{SIM}(U, V) = 1 \Leftrightarrow U = V$, for all $U, V \in \mathbf{S}$;
(B4) if $U \le V \le T$, then $\mathrm{SIM}(U, V) \ge \mathrm{SIM}(U, T)$ and $\mathrm{SIM}(V, T) \ge \mathrm{SIM}(U, T)$, for all $U, V, T \in \mathbf{S}$ (for $U = (u_1, \ldots, u_n)$ and $V = (v_1, \ldots, v_n)$, $U \le V$ means $u_t \le v_t$ for all $t = 1, \ldots, n$).

DEFINITION 5. Let $U = (u_1, \ldots, u_n), V = (v_1, \ldots, v_n) \in \mathbf{S}$, let sim be a linguistic similarity measure, and let $w = (w_1, \ldots, w_n)$ be a weighting vector satisfying $w_t \ge 0$ for all $t = 1, \ldots, n$ and $\sum_{t=1}^{n} w_t = 1$. We define:

(1) the quadric linguistic similarity measure between U and V:
$\mathrm{SIM}_Q(U, V) = \left( \sum_{t=1}^{n} w_t \, (\mathrm{sim}(u_t, v_t))^2 \right)^{1/2};$

(2) the arithmetic linguistic similarity measure between U and V:
$\mathrm{SIM}_A(U, V) = \sum_{t=1}^{n} w_t \, \mathrm{sim}(u_t, v_t);$

(3) the geometric linguistic similarity measure between U and V:
$\mathrm{SIM}_G(U, V) = \prod_{t=1}^{n} (\mathrm{sim}(u_t, v_t))^{w_t};$

(4) the harmonic linguistic similarity measure between U and V:
$\mathrm{SIM}_H(U, V) = \left( \sum_{t=1}^{n} \frac{w_t}{\mathrm{sim}(u_t, v_t)} \right)^{-1}.$

To avoid violating mathematical rules, set $0^0 = 1$ in the definition of $\mathrm{SIM}_G$ and $\frac{w_t}{0} = +\infty$ in the definition of $\mathrm{SIM}_H$.
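Assuming the componentwise similarities $\mathrm{sim}(u_t, v_t)$ have already been computed, the four vector measures of Definition 5 can be sketched in Python as follows; the treatment of a zero componentwise similarity in SIM_H follows the convention assumed above (this copy of the paper is garbled at that point, so the convention is an assumption):

```python
import math

def sim_q(simvals, w):
    # quadric measure: sqrt of the weighted sum of squared similarities
    return math.sqrt(sum(wt * s ** 2 for wt, s in zip(w, simvals)))

def sim_a(simvals, w):
    # arithmetic measure: weighted average of similarities
    return sum(wt * s for wt, s in zip(w, simvals))

def sim_g(simvals, w):
    # geometric measure; Python already evaluates 0 ** 0 as 1
    return math.prod(s ** wt for wt, s in zip(w, simvals))

def sim_h(simvals, w):
    # harmonic measure; assumed convention w_t / 0 = +inf, so the result is 0
    total = 0.0
    for wt, s in zip(w, simvals):
        if s == 0.0:
            if wt > 0:
                return 0.0
            continue
        total += wt / s
    return 1.0 / total

# componentwise similarities sim(u_t, v_t) of two linguistic vectors, with weights w
simvals = [0.9, 0.5, 0.7]
w = [0.2, 0.5, 0.3]
print(sim_q(simvals, w), sim_a(simvals, w), sim_g(simvals, w), sim_h(simvals, w))
```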
THEOREM 2. For all $U, V \in \mathbf{S}$, $\mathrm{SIM}_Q(U, V) \ge \mathrm{SIM}_A(U, V) \ge \mathrm{SIM}_G(U, V) \ge \mathrm{SIM}_H(U, V)$.

Proof.

- Using the Cauchy–Schwarz inequality, $\left(\sum_{t=1}^{n} x_t^2\right)\left(\sum_{t=1}^{n} y_t^2\right) \ge \left(\sum_{t=1}^{n} x_t y_t\right)^2$ for all $(x_1, \ldots, x_n), (y_1, \ldots, y_n) \in \mathbb{R}^n$, and noting that $\sum_{t=1}^{n} w_t = 1$, we have
$\sum_{t=1}^{n} w_t (\mathrm{sim}(u_t, v_t))^2 = \left(\sum_{t=1}^{n} w_t\right)\left(\sum_{t=1}^{n} w_t (\mathrm{sim}(u_t, v_t))^2\right) \ge \left(\sum_{t=1}^{n} \sqrt{w_t}\,\sqrt{w_t}\,\mathrm{sim}(u_t, v_t)\right)^2 = \left(\sum_{t=1}^{n} w_t\,\mathrm{sim}(u_t, v_t)\right)^2,$
which means $(\mathrm{SIM}_Q(U, V))^2 \ge (\mathrm{SIM}_A(U, V))^2$, or $\mathrm{SIM}_Q(U, V) \ge \mathrm{SIM}_A(U, V)$.

- Using the inequality of weighted arithmetic and geometric means (weighted AM–GM inequality), $\left(\sum_{t=1}^{n} w_t x_t\right)/w \ge \left(\prod_{t=1}^{n} x_t^{w_t}\right)^{1/w}$ for all $x_t \ge 0$, $w_t \ge 0$ ($t = 1, \ldots, n$), $w = \sum_{t=1}^{n} w_t > 0$, we have
$\sum_{t=1}^{n} w_t\,\mathrm{sim}(u_t, v_t) = \left(\sum_{t=1}^{n} w_t\,\mathrm{sim}(u_t, v_t)\right) \Big/ \sum_{t=1}^{n} w_t \ge \left(\prod_{t=1}^{n} (\mathrm{sim}(u_t, v_t))^{w_t}\right)^{1/\sum_{t=1}^{n} w_t} = \prod_{t=1}^{n} (\mathrm{sim}(u_t, v_t))^{w_t}.$
So $\mathrm{SIM}_A(U, V) \ge \mathrm{SIM}_G(U, V)$.

- Using the weighted AM–GM inequality again,
$\sum_{t=1}^{n} \frac{w_t}{\mathrm{sim}(u_t, v_t)} = \left(\sum_{t=1}^{n} w_t (\mathrm{sim}(u_t, v_t))^{-1}\right) \Big/ \sum_{t=1}^{n} w_t \ge \prod_{t=1}^{n} \left((\mathrm{sim}(u_t, v_t))^{-1}\right)^{w_t} = \left(\prod_{t=1}^{n} (\mathrm{sim}(u_t, v_t))^{w_t}\right)^{-1}.$
This proves $(\mathrm{SIM}_H(U, V))^{-1} \ge (\mathrm{SIM}_G(U, V))^{-1}$, or $\mathrm{SIM}_G(U, V) \ge \mathrm{SIM}_H(U, V)$.

THEOREM 3. If $w_t > 0$ for all $t = 1, \ldots, n$, then $\mathrm{SIM}_Q$, $\mathrm{SIM}_A$, $\mathrm{SIM}_G$, $\mathrm{SIM}_H$ are linguistic vector similarity measures.

Proof. Obviously, $\mathrm{SIM}_Q$, $\mathrm{SIM}_A$, $\mathrm{SIM}_G$, $\mathrm{SIM}_H$ satisfy (B1).

(B2) For all $U, V \in \mathbf{S}$, by Theorem 2 it is sufficient to prove that $\mathrm{SIM}_Q(U, V) \le 1$. By the fact that $\mathrm{sim}(u_t, v_t) \le 1$ for all $t = 1, \ldots, n$, we obtain
$\mathrm{SIM}_Q(U, V) = \left(\sum_{t=1}^{n} w_t (\mathrm{sim}(u_t, v_t))^2\right)^{1/2} \le \left(\sum_{t=1}^{n} w_t\right)^{1/2} = 1.$

(B3) For all $U, V \in \mathbf{S}$, by Theorem 2, if $\mathrm{SIM}_H(U, V)$ equals 1, then the three values $\mathrm{SIM}_Q(U, V)$, $\mathrm{SIM}_A(U, V)$, $\mathrm{SIM}_G(U, V)$ also equal 1. So it is sufficient to prove (B3) for $\mathrm{SIM}_H$. Since $\mathrm{sim}(u_t, v_t) \le 1$ and $\sum_{t=1}^{n} w_t = 1$, we have
$\mathrm{SIM}_H(U, V) = 1 \Leftrightarrow \frac{w_t}{\mathrm{sim}(u_t, v_t)} = w_t, \ \forall t = 1, \ldots, n \Leftrightarrow \mathrm{sim}(u_t, v_t) = 1, \ \forall t = 1, \ldots, n \Leftrightarrow u_t = v_t, \ \forall t = 1, \ldots, n \Leftrightarrow U = V.$

(B4) We now examine the last condition. By $U = (u_1, \ldots, u_n) \le V = (v_1, \ldots, v_n) \le T = (\tau_1, \ldots, \tau_n)$, we have $u_t \le v_t \le \tau_t$ for all $t = 1, \ldots, n$. This implies
$\mathrm{sim}(u_t, v_t) \ge \mathrm{sim}(u_t, \tau_t), \quad \forall t = 1, \ldots, n.$  (5)
Using (5), we show that $\mathrm{SIM}_Q(U, V) \ge \mathrm{SIM}_Q(U, T)$, $\mathrm{SIM}_A(U, V) \ge \mathrm{SIM}_A(U, T)$, $\mathrm{SIM}_G(U, V) \ge \mathrm{SIM}_G(U, T)$, and $\mathrm{SIM}_H(U, V) \ge \mathrm{SIM}_H(U, T)$ (the remaining inequalities are proved in the same way). We have:

- $w_t (\mathrm{sim}(u_t, v_t))^2 \ge w_t (\mathrm{sim}(u_t, \tau_t))^2$ for all $t$, so $\left(\sum_{t=1}^{n} w_t (\mathrm{sim}(u_t, v_t))^2\right)^{1/2} \ge \left(\sum_{t=1}^{n} w_t (\mathrm{sim}(u_t, \tau_t))^2\right)^{1/2}$, i.e., $\mathrm{SIM}_Q(U, V) \ge \mathrm{SIM}_Q(U, T)$;
- $w_t\,\mathrm{sim}(u_t, v_t) \ge w_t\,\mathrm{sim}(u_t, \tau_t)$ for all $t$, so $\mathrm{SIM}_A(U, V) \ge \mathrm{SIM}_A(U, T)$;
- $(\mathrm{sim}(u_t, v_t))^{w_t} \ge (\mathrm{sim}(u_t, \tau_t))^{w_t}$ for all $t$, so $\mathrm{SIM}_G(U, V) \ge \mathrm{SIM}_G(U, T)$;
- $\frac{w_t}{\mathrm{sim}(u_t, v_t)} \le \frac{w_t}{\mathrm{sim}(u_t, \tau_t)}$ for all $t$, so $\sum_{t=1}^{n} \frac{w_t}{\mathrm{sim}(u_t, v_t)} \le \sum_{t=1}^{n} \frac{w_t}{\mathrm{sim}(u_t, \tau_t)}$ and hence $\mathrm{SIM}_H(U, V) \ge \mathrm{SIM}_H(U, T)$.
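Theorem 2's chain of inequalities can be spot-checked numerically. The sketch below draws random componentwise similarities and normalized weights and verifies SIM_Q ≥ SIM_A ≥ SIM_G ≥ SIM_H within floating-point tolerance; the expressions simply repeat the formulas of Definition 5 and are not taken from the paper:

```python
import math
import random

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 6)
    simvals = [random.uniform(0.01, 1.0) for _ in range(n)]   # componentwise sim(u_t, v_t)
    raw = [random.random() for _ in range(n)]
    w = [r / sum(raw) for r in raw]                            # weights summing to 1
    q = math.sqrt(sum(wt * s ** 2 for wt, s in zip(w, simvals)))
    a = sum(wt * s for wt, s in zip(w, simvals))
    g_ = math.prod(s ** wt for wt, s in zip(w, simvals))
    h = 1.0 / sum(wt / s for wt, s in zip(w, simvals))
    assert q >= a - 1e-12 and a >= g_ - 1e-12 and g_ >= h - 1e-12
print("SIM_Q >= SIM_A >= SIM_G >= SIM_H held on all random samples")
```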
4. AN APPLICATION TO LINGUISTIC INFORMATION CLASSIFICATION

In this section, we use the linguistic similarity measure and the linguistic vector similarity measure to classify items whose attributes are given as linguistic labels.

4.1. The LCA Algorithm

Let D be a data set whose items are described by $(n + 1)$ attributes. Each attribute $A_t$ takes values in the corresponding linguistic scale $S^t = \{s_0^{(t)}, \ldots, s_{g_t}^{(t)}\}$, $t = 1, \ldots, (n + 1)$. D is classified on the $(n + 1)$-th attribute, $A_{n+1}$.

Linguistic Classification Algorithm (LCA). Let $D_1 \subset D$ be the training set, with cardinality $n_1$. Consider an item $K \in D \setminus D_1$ whose associated attribute vector $V = (v_1, \ldots, v_n)$ contains the values of the first $n$ attributes, where $v_t$ is the linguistic value of the $t$-th attribute, for all $t = 1, \ldots, n$.

(1) For each $I_j \in D_1$, let $V_j$ denote its attribute vector containing the values of the first $n$ attributes of $I_j$ ($j = 1, \ldots, n_1$). The similarity between the items $K$ and $I_j$ is determined as the similarity between the two linguistic vectors $V$ and $V_j$:
$\mathrm{SIM}(K, I_j) = \mathrm{SIM}(V, V_j).$

(2) Aggregate the values of the $(n + 1)$-th attribute of all items $I_j$ ($j = 1, \ldots, n_1$) using the LWA operator (Equation 4):
$u = s_{\bar{\alpha}} = \mathrm{LWA}(u_1, \ldots, u_{n_1}),$
where $u_j$ is the value of the $(n + 1)$-th attribute of the item $I_j$ ($j = 1, \ldots, n_1$), and the weighting vector is $w = (w_1, \ldots, w_{n_1})$ with $w_j = \dfrac{\mathrm{SIM}(K, I_j)}{\sum_{j=1}^{n_1} \mathrm{SIM}(K, I_j)}$ ($j = 1, \ldots, n_1$).

(3) Evaluate the value of the $(n + 1)$-th attribute of the item $K$: the predicted value is $u^* = s_l^{(n+1)}$, where $l = \mathrm{round}(\bar{\alpha})$.

Remark 1. Step 2 of algorithm LCA can be refined using two substeps, as follows; the modification is termed the modified linguistic classification algorithm (MLCA). A code sketch of both variants is given after this remark.

(2a) Specify the $N$ nearest neighbors of $K$ ($1 \le N \le n_1$), that is, choose the items $I_j^* \in D_1$ ($j = 1, \ldots, N$) that are most similar to $K$ according to the similarities computed in the previous step. $N$ is an adjustable integer determined by experiments.

(2b) Aggregate the values of the $(n + 1)$-th attribute of the items $I_j^*$ ($j = 1, \ldots, N$) using the LWA operator (Equation 4):
$u = s_{\bar{\alpha}} = \mathrm{LWA}(u_1^*, \ldots, u_N^*),$
where $u_j^*$ is the value of the $(n + 1)$-th attribute of the item $I_j^*$ ($j = 1, \ldots, N$), and the weighting vector is $w = (w_1, \ldots, w_N)$ with $w_j = \dfrac{\mathrm{SIM}(K, I_j^*)}{\sum_{j=1}^{N} \mathrm{SIM}(K, I_j^*)}$ ($j = 1, \ldots, N$).
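The following is a compact Python sketch of steps 1–3 of LCA together with the MLCA refinement (2a)–(2b). It assumes items are encoded by the subscripts of their linguistic labels and uses the arithmetic measure SIM_A for step 1; the function and variable names are illustrative, not from the paper:

```python
import math

def lca_predict(train, k_vector, g_list, sim, weights, n_neighbors=None):
    """Predict the subscript of the (n+1)-th attribute of item K.

    train       : list of (attribute_subscripts, class_subscript) pairs (the set D1)
    k_vector    : subscripts of the first n attributes of K
    g_list      : g_t for each of the n attribute scales
    sim         : a linguistic similarity measure sim(i, j, g), e.g. sim3
    weights     : attribute weights (w_1, ..., w_n) summing to 1
    n_neighbors : None for LCA (use all of D1), or N for the MLCA substep (2a)
    """
    # step 1: SIM(K, I_j) computed as the arithmetic vector measure SIM_A
    sims = []
    for attrs, cls in train:
        s = sum(wt * sim(i, j, g)
                for wt, i, j, g in zip(weights, k_vector, attrs, g_list))
        sims.append((s, cls))

    # substep (2a) of MLCA: keep only the N most similar training items
    if n_neighbors is not None:
        sims = sorted(sims, key=lambda p: p[0], reverse=True)[:n_neighbors]

    # step 2 / (2b): LWA aggregation with weights SIM(K, I_j) / sum of similarities
    total = sum(s for s, _ in sims)
    alpha_bar = sum(s / total * cls for s, cls in sims)

    # step 3: round the aggregated subscript to obtain an original label
    return round(alpha_bar)

# tiny illustrative call with made-up subscripts (two attributes, g = 4 and 3)
sim3 = lambda i, j, g: 1 - (1 - math.exp(-abs(i - j) / g)) / (1 - math.exp(-1))
train = [([0, 2], 0), ([3, 1], 1), ([1, 2], 0)]
print(lca_predict(train, [1, 2], [4, 3], sim3, [0.5, 0.5]))                  # LCA
print(lca_predict(train, [1, 2], [4, 3], sim3, [0.5, 0.5], n_neighbors=2))   # MLCA, N = 2
```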
4.2. An Illustrative Example

In this example, the Mushroom Database13 is used. Each item has seven attributes, with the corresponding linguistic scales listed in Table II.

Table II. Descriptions of the Mushroom Database.

Attribute | Linguistic values (label set)
Cap Color | s_0^(1) = green, s_1^(1) = purple, s_2^(1) = cinnamon, s_3^(1) = white, s_4^(1) = gray, s_5^(1) = brown, s_6^(1) = red, s_7^(1) = pink, s_8^(1) = yellow, s_9^(1) = buff
Odor | s_0^(2) = almond, s_1^(2) = anise, s_2^(2) = none, s_3^(2) = creosote, s_4^(2) = fishy, s_5^(2) = foul, s_6^(2) = musty, s_7^(2) = pungent, s_8^(2) = spicy
Stalk Surface Below Ring | s_0^(3) = fibrous, s_1^(3) = scaly, s_2^(3) = smooth, s_3^(3) = silky
Stalk Color Above Ring | s_0^(4) = gray, s_1^(4) = orange, s_2^(4) = red, s_3^(4) = white, s_4^(4) = pink, s_5^(4) = brown, s_6^(4) = cinnamon, s_7^(4) = buff, s_8^(4) = yellow
Spore Print Color | s_0^(5) = buff, s_1^(5) = orange, s_2^(5) = purple, s_3^(5) = yellow, s_4^(5) = brown, s_5^(5) = black, s_6^(5) = white, s_7^(5) = chocolate, s_8^(5) = green
Habitat | s_0^(6) = waste, s_1^(6) = meadows, s_2^(6) = grasses, s_3^(6) = woods, s_4^(6) = leaves, s_5^(6) = urban, s_6^(6) = paths
Edible OR Poisonous | s_0^(7) = edible, s_1^(7) = poisonous

In Table III, $u_1^*$ and $u_2^*$ are the values to be determined.

Table III. Example.

Item | Cap Color | Odor | Stalk Surface Below Ring | Stalk Color Above Ring | Spore Print Color | Habitat | Edible OR Poisonous
I1 | yellow | almond | smooth | white | black | grasses | u1 = edible
I2 | brown | none | smooth | gray | black | woods | u2 = edible
I3 | red | none | smooth | gray | black | woods | u3 = edible
I4 | white | anise | smooth | white | brown | grasses | u4 = edible
I5 | red | foul | smooth | pink | white | leaves | u5 = poisonous
I6 | red | none | smooth | gray | brown | woods | u6 = edible
I7 | gray | foul | silky | buff | chocolate | paths | u7 = poisonous
I8 | red | none | smooth | pink | brown | woods | u8 = edible
I9 | gray | none | fibrous | white | brown | grasses | u9 = edible
I10 | gray | none | smooth | gray | black | woods | u10 = edible
K1 | brown | spicy | silky | pink | white | leaves | u1*
K2 | red | none | smooth | white | white | waste | u2*

Taking sim = sim3 and SIM = SIM_A, we have:

SIM(K1, I1) = 0.5471505; SIM(K1, I2) = 0.6109411; SIM(K1, I3) = 0.5832141; SIM(K1, I4) = 0.5549266; SIM(K1, I5) = 0.8150827; SIM(K1, I6) = 0.5558733; SIM(K1, I7) = 0.6480977; SIM(K1, I8) = 0.6596165; SIM(K1, I9) = 0.5024457; SIM(K1, I10) = 0.5832141.

Let LWA be the LWA operator with the corresponding weight vector $w = (w_1, \ldots, w_{10})$, $w_j = \mathrm{SIM}(K_1, I_j) \big/ \sum_{j=1}^{10} \mathrm{SIM}(K_1, I_j)$. Then $\mathrm{LWA}(u_1, \ldots, u_{10}) = s_{0.2414265}^{(7)}$. Moreover, $\mathrm{round}(0.2414265) = 0$, so $u_1^* = s_0^{(7)}$ = edible. Similarly, $u_2^* = s_0^{(7)}$ = edible.

4.3. Experimental Results

4.3.1. Experimental Results of LCA

Table IV (resp. Table V) shows the experimental results of LCA with sim = sim1 (resp. sim = sim3) and SIM = SIM_A. The data set consists of 1000 elements randomly selected from the Mushroom Database.13 The k-fold cross-validation method, with k from 2 to 10, is used to observe the changes of the mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE); a short code sketch of these error measures is given after Table VII. In Tables IV and V, the better values are marked in bold in the original article.

Table IV. Experimental results of LCA (sim = sim1, SIM = SIM_A).

Fold | ME | MAE | RMSE
2 | -0.09573793 | 0.10978855 | 0.3145499
3 | -0.09348888 | 0.09859072 | 0.3038174
4 | -0.09176206 | 0.09514453 | 0.3006494
5 | -0.09397043 | 0.09645504 | 0.3028269
6 | -0.09247883 | 0.09413080 | 0.2992854
7 | -0.09213617 | 0.09333952 | 0.2971464
8 | -0.09226791 | 0.09333360 | 0.2976559
9 | -0.09314711 | 0.09389634 | 0.2989093
10 | -0.09187319 | 0.09302364 | 0.2963529

Table V. Experimental results of LCA (sim = sim3, SIM = SIM_A).

Fold | ME | MAE | RMSE
2 | -0.06496693 | 0.06691246 | 0.2586038
3 | -0.07633910 | 0.07837162 | 0.2741455
4 | -0.07713243 | 0.09585902 | 0.2885403
5 | -0.07733896 | 0.07929018 | 0.2621525
6 | -0.07928631 | 0.07928631 | 0.2655189
7 | -0.07187470 | 0.07390105 | 0.2688542
8 | -0.07829437 | 0.07829437 | 0.2710520
9 | -0.07189530 | 0.07416287 | 0.2687646
10 | -0.08156118 | 0.08400020 | 0.2840986

Remark 2. The experimental results show that the use of sim3 gives better results than sim1 (Xu's similarity degree). Hence, depending on the problem, it is necessary to choose an appropriate linguistic similarity measure instead of Xu's similarity degree.

4.3.2. Experimental Results of MLCA

This part shows the experimental results of MLCA (Tables VI and VII) under conditions similar to those of LCA; here, N is set to 3. We see that, with a small change, MLCA gives much better results than LCA. In addition, the use of sim3 is still better than sim1.

Table VI. Experimental results of MLCA (sim = sim1, SIM = SIM_A, N = 3).

Fold | ME | MAE | RMSE
2 | 0.0005125113 | 0.012852980 | 0.11298937
3 | -0.0010548523 | 0.010748772 | 0.09885251
4 | 0.0009212003 | 0.012819658 | 0.10847649
5 | -0.0007983656 | 0.010913523 | 0.10293204
6 | 0.0009861933 | 0.008959768 | 0.06623328
7 | 0.0042299136 | 0.011984390 | 0.10438013
8 | 0.0027593889 | 0.011284324 | 0.09629206
9 | 0.0007608279 | 0.009070657 | 0.07599228
10 | -0.0012153702 | 0.006691561 | 0.05713187

Table VII. Experimental results of MLCA (sim = sim3, SIM = SIM_A, N = 3).

Fold | ME | MAE | RMSE
2 | 0.008731433 | 0.012986752 | 0.11395531
3 | 0.003459078 | 0.007802183 | 0.08445474
4 | 0.002996333 | 0.007012397 | 0.08140801
5 | 0.004065396 | 0.007967835 | 0.08821215
6 | 0.002942613 | 0.006730492 | 0.05620564
7 | 0.007035607 | 0.007035607 | 0.05337777
8 | 0.005033495 | 0.005033495 | 0.04951553
9 | 0.003227953 | 0.003227953 | 0.02624510
10 | 0.002939000 | 0.002939000 | 0.02967192
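The paper does not spell out how ME, MAE, and RMSE are computed; a natural reading, assumed here, is that they are the standard error measures applied to the subscripts of the predicted and actual labels of the (n+1)-th attribute:

```python
import math

def error_metrics(actual, predicted):
    """ME, MAE and RMSE on label subscripts (standard definitions; an assumption,
    since the paper does not state the formulas explicitly)."""
    diffs = [p - a for a, p in zip(actual, predicted)]
    me = sum(diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return me, mae, rmse

# e.g. actual vs. predicted subscripts of the 'Edible OR Poisonous' attribute
print(error_metrics([0, 1, 0, 0, 1], [0, 1, 1, 0, 0]))
```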
5. CONCLUSIONS

In this paper, we proposed the notions of the linguistic similarity measure, the linguistic vector, and the linguistic vector similarity measure. These measures were applied to the linguistic classification problem through the LCA and MLCA algorithms. However, the new concepts have been defined for a specific linguistic scale only. Additionally, the performance of the proposed methods was not compared with that of other methods. These issues will be studied in the future.

References

1. Zadeh LA, Kacprzyk J. Computing with words in information/intelligent systems—Part 1: Foundations; Part 2: Applications. Heidelberg, Germany: Physica-Verlag; 1999.
2. Chen SJ, Hwang CL. Fuzzy multiple attribute decision making: Methods and applications. Berlin: Springer-Verlag; 1992.
3. Marko Bohanec's Car Evaluation Data; 1997. Available at http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data (access date: 06/21/2016).
4. Yager RR. A new methodology for ordinal multiobjective decisions based on fuzzy sets. Decis Sci 1981;12:589–600.
5. Dai YQ, Xu ZS, Da QL. New evaluation scale of linguistic information and its application. Chin J Manag Sci 2008;16(2):145–349.
6. Xu ZS. EOWA and EOWG operators for aggregating linguistic labels based on linguistic preference relations. Int J Uncertainty Fuzziness Knowl-Based Syst 2004;12:791–810.
7. Herrera F, Alonso S, Chiclana F, Herrera-Viedma E. Computing with words in decision making: foundations, trends and prospects. Fuzzy Optim Decis Making 2009;8(4):337–364.
8. Martinez L, Ruan D, Herrera F. Computing with words in decision support systems: an overview on models and applications. Int J Comput Intell Syst 2010;3(4):382–395.
9. Xu ZS. Group decision making based on multiple types of linguistic preference relations. Inform Sci 2008;178:452–467.
10. Xu ZS. A method based on linguistic aggregation operators for group decision making with linguistic preference relations. Inform Sci 2004;166:19–30.
11. Xu ZS. On generalized induced linguistic aggregation operators. Int J Gen Syst 2006;35:17–28.
12. Xu ZS. Deviation measures of linguistic preference relations in group decision making. Omega 2005;33:249–254.
13. The Audubon Society Field Guide to North American Mushrooms; 1981. Available at http://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data (access date: 06/21/2016).
