Generalized picture distance measure and applications to picture fuzzy clustering

Le Hoang Son
VNU University of Science, Vietnam National University, 334 Nguyen Trai, Thanh Xuan, Hanoi, Viet Nam
E-mail addresses: sonlh@vnu.edu.vn, chinhson2002@gmail.com
Applied Soft Computing (2016). © 2016 Elsevier B.V. All rights reserved.
Article history: Received 14 January 2016; received in revised form May 2016; accepted May 2016.

Abstract: The picture fuzzy set (PFS), which is a generalization of the traditional fuzzy set and the intuitionistic fuzzy set, shows great promise of adapting better to many practical problems in pattern recognition, artificial life, robotics, expert and knowledge-based systems than existing types of fuzzy sets. An emerging research trend in PFS is the development of clustering algorithms that can exploit and investigate hidden knowledge from a mass of datasets. The distance measure is one of the most important tools in clustering, as it determines the degree of relationship between two objects. In this paper, we propose a generalized picture distance measure and integrate it into a novel hierarchical picture fuzzy clustering method called Hierarchical Picture Clustering (HPC). Experimental results show that the clustering quality of the proposed algorithm is better than those of the relevant algorithms.

Keywords: Clustering quality; Hierarchical fuzzy clustering; Intuitionistic fuzzy sets; Picture distance measure; Picture fuzzy sets

1. Introduction

Since the fuzzy set (FS) [49] was first introduced by Zadeh in 1965, many extensions of FS have been proposed in the literature, such as the type-2 fuzzy set (T2FS) [18], rough set (RS) [24], soft set, rough soft set and fuzzy soft set [15], intuitionistic fuzzy set (IFS) [3], intuitionistic fuzzy rough set (IFRS) [51], soft rough fuzzy set and soft fuzzy
rough set [19], interval-valued intuitionistic fuzzy set (IVIFS) [38] and hesitant fuzzy set (HFS) [32]. The aim of those extensions is to overcome the limitations of FS regarding the degree of fuzziness, the uncertainty of membership degrees, and the existence of neutrality. Recently, a new generalized fuzzy set called the picture fuzzy set (PFS) was proposed by Cuong and Kreinovich in Ref. [6]. The word "picture" in PFS refers to generality, as this set is a direct extension of FS and IFS. In other words, PFS integrates information about neutrality and negativity into its definition, so that when the value(s) of one (or both) of those degrees is (are) equal to zero, it reduces to IFS (FS). Compared with IFS, PFS divides the hesitancy degree into two parts, i.e., the refusal degree and the neutral degree (see Definition 1 and Examples 1 and 2 for details). This set shows great promise of better adaptation to many practical problems in pattern recognition, artificial life, robotics, expert and knowledge-based systems than some existing types of fuzzy sets.

Definition 1. A picture fuzzy set (PFS) [6] in a non-empty set $X$ is

$$A = \{\langle x, \mu_A(x), \eta_A(x), \nu_A(x)\rangle \mid x \in X\},$$

where $\mu_A(x)$ is the positive degree of each element $x \in X$, $\eta_A(x)$ is the neutral degree and $\nu_A(x)$ is the negative degree, satisfying the constraints

$$\mu_A(x), \eta_A(x), \nu_A(x) \in [0, 1], \quad \forall x \in X,$$
$$0 \le \mu_A(x) + \eta_A(x) + \nu_A(x) \le 1, \quad \forall x \in X.$$

The refusal degree of an element is calculated as $\xi_A(x) = 1 - (\mu_A(x) + \eta_A(x) + \nu_A(x))$, $\forall x \in X$. In the case $\eta_A(x) = 0$, the PFS reduces to an IFS, and when both $\eta_A(x) = \nu_A(x) = 0$, the PFS reduces to an FS. Some properties of PFS operations, the convex combination of PFSs, etc., accompanied with proofs, can be found in Ref. [6].

Example 1. In a democratic election station, the council issues 500 voting papers for a candidate. The voting results are divided into four groups accompanied with the
number of papers, namely "vote for" (300), "abstain" (64), "vote against" (115) and "refusal of voting" (21). Group "abstain" means that the voting paper is a white paper rejecting both "agree" and "disagree" for the candidate but still takes the vote. Group "refusal of voting" covers either invalid voting papers or bypassing the vote. This example happened in reality, and IFS could not handle it since the neutral membership (group "abstain") does not exist in IFS.

Example 2. Personnel selection is a very important activity in the human resource management of an organization. The process of selection follows a methodology to collect information about an individual in order to determine if that individual should be employed. The selection results could be classified into four classes: true positive, true negative, false negative and false positive, which are somehow equivalent to the positive, neutral, negative and refusal degrees of PFS. Each candidate is ranked according to these classes by his ability and suitability for the job, and the final decision is made based on the results of the classes. For example, if two candidates are ranked A = (50%, 20%, 20%, 10%) and B = (40%, 10%, 30%, 20%), the final decision can be made through the union operator and the maximum of the positive degree in PFS, which returns the value of 50% (A is selected).

An emerging trend in PFS and other advanced fuzzy sets is the development of soft computing methods, especially clustering algorithms on these sets, which could produce better
quality of results than those on FS. For instance, clustering algorithms on interval T2FS focusing on the uncertainty associated with the fuzzifier were investigated in Refs. [14,52]. Regarding IFS, Pelekis et al. [23] proposed a clustering approach utilizing a similarity metric defined over IFS. Xu and Wu [45] developed the IFCM algorithm to classify IFSs and interval-valued IFSs. Son et al. [26] proposed an intuitionistic fuzzy clustering algorithm for geo-demographic analysis. Xu and his group developed a number of intuitionistic fuzzy clustering methods in various contexts [36,37,39,42]. Fuzzy clustering algorithms on other sets, namely HFS and PFS, were presented in Refs. [4,27]. It is clear from the literature that the distance measure is the most important factor for an efficient clustering algorithm. The most widely used distance measures for two FSs A and B on $X = \{X_1, \ldots, X_N\}$ are the Hamming, Euclidean and Hausdorff metrics [6]. Because of the FS's drawbacks, distance measures on other sets, mostly IFS, have been proposed. Atanassov [3], Chen [5], Dengfeng and Chuntian [7], Grzegorzewski [10], Hatzimichailidis et al. [11], Hung and Yang [12,13], Li et al. [16], Liang and Shi [17], Mitchell [21], Papakostas et al. [22], Szmidt and Kacprzyk [28–30], Wang and Xin [35], Xu and Chen [41], Xu and Xia [46], Yang and Chiclana [47] and Xu [44] presented distance measures on IFS, namely the (normalized) intuitionistic Hamming and Euclidean distances and the (normalized) Hausdorff intuitionistic Hamming and Euclidean distances. A basic distance measure on PFS has been given by Cuong and Kreinovich [6] as follows:

$$d_P(A,B) = \left(\frac{1}{N}\sum_{i=1}^{N}\left(|\mu_A(x_i)-\mu_B(x_i)|^p + |\eta_A(x_i)-\eta_B(x_i)|^p + |\nu_A(x_i)-\nu_B(x_i)|^p\right)\right)^{1/p}.$$

We recognize that $d_P(A,B)$ is a generalization of its counterparts on IFS and FS when $\eta_A(x)=0$ and when both $\eta_A(x)=\nu_A(x)=0$, respectively.
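To make the basic measure concrete, the following Python sketch evaluates $d_P(A,B)$ on two small PFSs represented as lists of (positive, neutral, negative) triples. The triple layout and the $1/N$ normalization are read from the formula above, so treat this as an illustrative sketch rather than the paper's reference code; the sample values are arbitrary.

```python
def d_p(A, B, p=2):
    """Basic picture distance of Cuong and Kreinovich, as read from the
    formula above: a normalized Minkowski distance of order p over the
    positive, neutral and negative degrees of two PFSs."""
    N = len(A)
    total = sum(
        abs(ma - mb) ** p + abs(ea - eb) ** p + abs(na - nb) ** p
        for (ma, ea, na), (mb, eb, nb) in zip(A, B)
    )
    return (total / N) ** (1.0 / p)

# Two toy picture fuzzy sets over a two-element universe (arbitrary values)
A = [(0.3, 0.1, 0.4), (0.6, 0.2, 0.1)]
B = [(0.4, 0.1, 0.2), (0.6, 0.0, 0.1)]
print(d_p(A, B, p=1))  # Hamming-style variant
print(d_p(A, B, p=2))  # Euclidean-style variant
```

With p = 1 and p = 2 the measure specializes to the Hamming- and Euclidean-style picture distances discussed in the text.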
As explained above, the integration of the neutral degree $\eta_A(x)$ would measure information about objects more accurately and increase the quality and accuracy of the achieved results. Moreover, motivated by previous research on IFS that tended to combine basic distance measures into a complex one to improve generality and accuracy, in this paper we propose a novel generalized picture distance measure and use it in a new clustering method on PFS called Hierarchical Picture Clustering (HPC). The reason for designing a new measure can be illustrated by an example as follows. Consider that we would like to measure the truth-value of the proposition G = "through a point exterior to a line one can draw only one parallel to the given line". The proposition is incomplete, since it does not specify the type of geometrical space it belongs to. In a Euclidean geometric space the proposition G is true; in a Riemannian geometric space the proposition G is false (since there is no parallel passing through an exterior point to a given line); in a geometric space covering the PFS set (constructed from mixed spaces, for example from a part of a Euclidean subspace together with another part of a Riemannian space) the proposition G is indeterminate (true and false at the same time) [48]. It is obvious that objects, notions, ideas, etc. can be better measured in PFS than in other types of fuzzy sets.

The main differences between the proposed distance measure and $d_P(A,B)$, as well as those on IFS such as in Xu [44], are highlighted as follows. Firstly, as shown above, $d_P(A,B)$ is a natural expansion of the well-known Minkowski distance of order $p \ge 1$ between two points under fuzzy
environments. When p = 1 or p = 2, we have the Manhattan and Euclidean distances, respectively. In the limiting case of p reaching infinity, we obtain the Chebyshev distance. The Minkowski distance has the best performance for numerical data but works ineffectively with asymmetric binary variables, non-metric vector objects, etc. [20]. For example, the similarity between two vectors can be denoted as a cosine measure, which is further used to define a distance [48]. For asymmetric binary variables, the contingency table, which reflects the matching states between two objects, is used to compute the distance [25]. It is often the case that a non-linear function is adopted as the distance metric for processing non-spherical data [9]. One of the most common ways to create such a function is to combine basic distance measures into a complex one so that the deficiencies of the standalone metrics are settled. This intuition leads to the debut of the proposed measure, which may enhance performance and accuracy of results. Secondly, the proposed measure is a combination of the Hamming, Euclidean and Hausdorff distances. It is different from $d_P(A,B)$, which in essence is the normalized form of the well-known Minkowski distance of order $p \ge 1$. In the next section, we will explain why the hybridization should be made and emphasize the advantages and disadvantages of using the proposed measure. It is noted, however, that the proposed distance measure is a generalized version of $d_P(A,B)$. Thirdly, the proposed distance measure is different from those on IFS, such as in Xu [44], in many aspects. Let us take some examples. In Ref. [44], Xu generalized the intuitionistic Hamming and Euclidean distances of Szmidt and Kacprzyk [28] as below:

$$d(A,B) = \left(\frac{1}{2N}\sum_{i=1}^{N}\left(|\mu_A(x_i)-\mu_B(x_i)|^{\alpha} + |\nu_A(x_i)-\nu_B(x_i)|^{\alpha} + |\pi_A(x_i)-\pi_B(x_i)|^{\alpha}\right)\right)^{1/\alpha}.$$

He then defined several similarity measures from the above distance function, for instance:

$$s_1(A,B) = 1 - \left(\frac{1}{2N}\sum_{i=1}^{N}\left(|\mu_A(x_i)-\mu_B(x_i)|^{\alpha} + |\nu_A(x_i)-\nu_B(x_i)|^{\alpha} + |\pi_A(x_i)-\pi_B(x_i)|^{\alpha}\right)\right)^{1/\alpha},$$
and

$$s_2(A,B) = 1 - \left(\frac{\sum_{i=1}^{N}\left(|\mu_A(x_i)-\mu_B(x_i)|^{\alpha} + |\nu_A(x_i)-\nu_B(x_i)|^{\alpha} + |\pi_A(x_i)-\pi_B(x_i)|^{\alpha}\right)}{\sum_{i=1}^{N}\left(|\mu_A(x_i)+\mu_B(x_i)|^{\alpha} + |\nu_A(x_i)+\nu_B(x_i)|^{\alpha} + |\pi_A(x_i)+\pi_B(x_i)|^{\alpha}\right)}\right)^{1/\alpha}.$$

Even though $d(A,B)$ is quite similar to $d_P(A,B)$, we recognize that $d(A,B)$ is designed on the basis of IFS, which means $\mu_A(x) + \nu_A(x) + \pi_A(x) = 1$, while $d_P(A,B)$ is a distance on PFS satisfying $0 \le \mu_A(x) + \eta_A(x) + \nu_A(x) \le 1$. Indeed, it is not intuitive and logical to take the difference between $\pi_A(x)$ and $\pi_B(x)$, since these values can be calculated from the other degrees. In other words, although $d(A,B)$ is expressed as a function of three components, it turns out that $d(A,B)$ depends on only two variables. This is different from $d_P(A,B)$, which is measured by three separate degrees. Thus, we realize that $d(A,B)$ is different from $d_P(A,B)$ and certainly much different from the proposed (hybrid) measure. Again, in Ref. [39] Xu et al. proposed two intuitionistic fuzzy similarity measures for spectral clustering based on the minimum operator between the membership and non-membership degrees of IFS. Those similarity measures are defined based on the standard intuitionistic Hamming, Euclidean and Hausdorff distances. An overview of the distance and similarity measures of IFS given by Xu and Chen [41] affirmed that most of the relevant works on IFS pay much attention to similarity degrees based on three basic
distance functions, namely the intuitionistic Hamming, Euclidean and Hausdorff distances. This analysis clearly points out the difference and novelty of the proposed distance measure relative to those on IFS.

Once the generalized picture distance measure is defined, we apply it to a new clustering method called Hierarchical Picture Clustering (HPC). It uses a simpler strategy and is easier to implement than the intuitionistic fuzzy clustering methods [36–39,42]. For instance, Xu et al. [42] proposed intuitionistic clustering using association coefficients of IFS to construct an association matrix, which is then transformed into an equivalent association matrix. Based on the λ-cutting matrix of the equivalent association matrix, clusters of IFSs are then determined. Xu et al. [39] defined two intuitionistic fuzzy similarity measures for constructing an intuitionistic fuzzy similarity matrix used by a spectral algorithm to cluster intuitionistic fuzzy data; the un-normalized graph Laplacian and its eigenvectors were used to cluster the samples in spectral clustering. Wang et al. [36] presented a netting method for clustering analysis of IFSs via the intuitionistic fuzzy similarity matrix. Wang et al. [37] proposed the intuitionistic fuzzy square product, which is transformed into the intuitionistic fuzzy similarity matrix for direct intuitionistic fuzzy clustering based on a confidence level. Those algorithms are mostly complex and time-consuming, since they first construct the intuitionistic fuzzy similarity matrix and then either use an exhaustive iterative strategy to get the equivalent association matrix [42] or a complex calculation through the graph Laplacian [39], the netting method [36], etc. Meanwhile, HPC relies solely on the generalized picture distance measure and a hierarchical clustering scheme for the classification of PFSs. It is indeed recognized that HPC has the advantages of simple processing and intuitive manners. But more than that, HPC provides a way to deal with PFS data which were not
investigated by the existing intuitionistic fuzzy clustering algorithms. As mentioned earlier, there are many events and phenomena that are represented by the PFS set. When facing such data, clustering algorithms on IFS work ineffectively, since they do not take into account the refusal/neutral information. Combining the refusal and neutral degrees in IFS would lose information. Let us take an example: a PFS A = {(x, 0.3, 0, 0.1); (y, 0.4, 0.1, 0.1)} and an IFS B = {(x, 0.3, 0.1); (y, 0.4, 0.1)}. It is obvious that IFS regards the neutral values of x and y as 0.6 and 0.5, respectively. Yet, in fact, the most dominant part of the neutral values in IFS is the refusal degree. The observation in A reveals that the "real" neutral and refusal degrees of x are 0 and 0.6, while those of y are 0.1 and 0.4, respectively. Thus, it is misleading to use clustering algorithms on IFS for dealing with PFS data. In short, we clearly recognize the role and advantages of HPC in comparison with the relevant clustering algorithms on IFS. We do not compare the clustering qualities of those algorithms, since they are designed on different base sets. However, we would like to emphasize the simplicity and the first debut of a clustering algorithm on PFS, which is the main contribution of this paper.

Fig. 1. Illustration of the geometrical representation of the fuzzy triangle inequality.

The rest of the paper is organized as follows. Section 2 presents the generalized picture distance measure and the HPC algorithm. Section 3 validates the proposed algorithm by experiments. Section 4 draws the conclusions and delineates future research directions.

2. The proposed methodology

In this section, we first introduce the definition of the generalized picture distance measure and then present the novel hierarchical picture fuzzy clustering method (HPC).

Definition 2. A function $d(A,B)$ with $A, B \in \mathrm{PFS}(X)$ is called a picture distance measure if it satisfies:

1. $0 \le d(A,B) \le 1$;
2. $d(A,B) = 0 \Leftrightarrow A = B$;
3. $d(A,B) = d(B,A)$;
4. $\mu_{AB} \times d(A,B) + \mu_{AC} \times d(A,C) \ge \mu_{BC} \times d(B,C)$, $\forall A, B, C \in \mathrm{PFS}(X)$,

where the symbol "×" is the arithmetical product and $\mu_{AB}$, $\mu_{BC}$ and $\mu_{AC}$ are composition coefficients of $A, B, C \in \mathrm{PFS}(X)$. As an example, the following min-max composition formulae are used to calculate the triple $(\mu_{AB}, \mu_{AC}, \mu_{BC})$ from the membership functions of $A, B, C \in \mathrm{PFS}(X)$:

$$\mu_{AB} = \min_i \max\{\mu_A(x_i), \mu_B(x_i)\},$$
$$\mu_{BC} = \min_i \max\{\mu_B(x_i), \mu_C(x_i)\},$$
$$\mu_{AC} = \min_i \max\{\mu_A(x_i), \mu_C(x_i)\}.$$

The aim of those formulae is to specify fuzzy coefficients $\mu_{AB}, \mu_{AC}, \mu_{BC} \in [0,1]$ for the fuzzy triangle inequality in the 4th property of this definition. Besides the min-max, some typical compositions such as max-prod, the Lukasiewicz t-norm, etc. can be used accordingly. A geometrical representation of the 4th property is given in Fig. 1. It is clear from the figure that a new fuzzy representation of AB is A'B', which is bounded in a fuzzy domain called Area 1 satisfying $d(A'B') = \mu_{AB} \times d(A,B)$. Then, there exist fuzzy representations of AC and BC, namely A'C' and B'C', that belong to the equivalent fuzzy domains Area 2 and Area 3, respectively, so that A'B'C' forms a triangle. It is pointed out that if there exists a triple $(\mu_{AB}, \mu_{AC}, \mu_{BC})$ such that the 4th property holds, then $d(A,B)$ is a picture distance measure. This implies that the distance measure is constructed on a fuzzy space. In the equivalent articles on fuzzy sets and topology, Zadeh and coworkers [5,15,18,19,24,32,38,49,51] suggested that the triangle inequality for a metric should be fuzzified by membership degrees so that the conditions and properties of fuzzy topology hold.
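As an illustration of how the coefficient triple might be computed, the sketch below implements one plausible reading of the min-max composition, namely taking the minimum over elements of the pointwise maximum of the positive degrees. Both the aggregation order and the restriction to positive degrees are assumptions on my part, since the text leaves the composition operator open (max-prod, Lukasiewicz, etc. are equally admissible); the sample sets are arbitrary.

```python
def minmax_coeff(A, B):
    """One plausible min-max composition coefficient for the softened
    triangle inequality: min over elements of the pointwise max of the
    positive degrees (an assumed reading of the composition)."""
    return min(max(a[0], b[0]) for a, b in zip(A, B))

# Arbitrary sample sets of (positive, neutral, negative) triples
A = [(0.3, 0.1, 0.4), (0.6, 0.2, 0.1)]
B = [(0.4, 0.1, 0.2), (0.6, 0.0, 0.1)]
C = [(0.1, 0.2, 0.4), (0.3, 0.3, 0.3)]

mu_ab, mu_ac, mu_bc = minmax_coeff(A, B), minmax_coeff(A, C), minmax_coeff(B, C)
print(mu_ab, mu_ac, mu_bc)  # coefficients in [0, 1] scaling each side of the inequality
```

The 4th property then asks for a triple such as this one under which mu_ab*d(A,B) + mu_ac*d(A,C) >= mu_bc*d(B,C) holds; the paper's theorem establishes the existence of such a triple for the proposed measure.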
This can be regarded as a soft version of the metric definition in a hard space.

Definition 3. The function below is a generalized picture distance measure between $A, B \in \mathrm{PFS}(X)$:

$$d_G(A,B) = \frac{\left(\sum_{i=1}^{N}\left(\mu_i^p + \eta_i^p + \nu_i^p\right)\right)^{1/p} + \left(\max_i\left\{\mu_i^p, \eta_i^p, \nu_i^p\right\}\right)^{1/p}}{\left(\sum_{i=1}^{N}\left(\mu_i^p + \eta_i^p + \nu_i^p\right)\right)^{1/p} + \left(\max_i\left\{\mu_i^p, \eta_i^p, \nu_i^p\right\}\right)^{1/p} + \max_i\left\{\Phi_i^A, \Phi_i^B\right\} + \left(\sum_{i=1}^{N}\left|\Phi_i^A - \Phi_i^B\right|^p\right)^{1/p} + 1},$$

where

$\mu_i = |\mu_A(x_i) - \mu_B(x_i)|$, $(i = 1, \ldots, N)$,
$\eta_i = |\eta_A(x_i) - \eta_B(x_i)|$, $(i = 1, \ldots, N)$,
$\nu_i = |\nu_A(x_i) - \nu_B(x_i)|$, $(i = 1, \ldots, N)$,
$\Phi_i^A = |\mu_A(x_i) + \eta_A(x_i) + \nu_A(x_i)|$, $(i = 1, \ldots, N)$,
$\Phi_i^B = |\mu_B(x_i) + \eta_B(x_i) + \nu_B(x_i)|$, $(i = 1, \ldots, N)$.

Remarks.

1) $d_G(A,B)$ is a hybrid measure of the well-known Hamming, Euclidean and Hausdorff distances. Specifically, when p = 1, we have a hybrid of the Hausdorff and Hamming measures; when p = 2, a hybrid of the Hausdorff and Euclidean distances is recognized.

2) $d_G(A,B)$ is not a trivial hybridization of the existing measures, in the sense that it does not merely mix those measures together without taking care of their meaning and contexts. In fact, $d_G(A,B)$ has been designed on the basis of the picture fuzzy set represented in the form of the membership values $\mu_i$, $\eta_i$, $\nu_i$, $\Phi_i^A$ and $\Phi_i^B$. It is regarded as a generalization of $d_P(A,B)$, the basic picture distance measure of Cuong and Kreinovich [6], obtained by integrating other measures such as the Hamming, Euclidean and Hausdorff distances.

3) The reasons for the hybridization in $d_G(A,B)$ can be explained as follows. Note that the basic picture distance measure of Cuong and Kreinovich relies on the Hamming (p = 1) and Euclidean (p = 2) distances, which were shown to have limitations in dealing with non-spherical datasets [5,7,8,12,13,28–30]. Since they assume that sample points are distributed around the sample mean in a spherical manner, the probability of a test point belonging to the set depends not only on the distance from the sample mean but also on the direction, so as to avoid non-spherical distributions [35,41,47]. Meanwhile, the Hausdorff metric measures how far two subsets of a metric space are from each other; it turns the set of non-empty compact subsets of a metric space into a metric space in its own right. Thus, the Hausdorff distance has the advantage of being sensitive to position [40,41,45]. Another important advantage of the Hausdorff distance is the possibility of separately using dissimilarity measures between one object and a part of another [46]. Therefore, combining the Hausdorff distance with the Hamming and Euclidean measures in a generalized picture distance measure, as in $d_G(A,B)$, achieves the advantages of each measure and increases the performance.

4) $d_G(A,B)$ is applicable to a large class of problems. As demonstrated in Definition 3, $d_G(A,B)$ is computed through the degrees of PFS ($\mu_i$, $\eta_i$, $\nu_i$, $\Phi_i^A$ and $\Phi_i^B$), which are appropriate for PFS data. Nonetheless, other types of crisp data, e.g., numerical and categorical data and images, can also be used within this measure with the support of a fuzzification process. For instance, an image is first extracted into feature records, which are then fuzzified by the Gaussian membership function to make fuzzy data. Note that each degree in PFS would have a different membership function, so that we obtain values of all degrees for each record. The next process is then done with the achieved PFS data as in Definition 3. Other types of data can be handled analogously. This remark shows the generality of the proposed measure.

Theorem 1. $d_G(A,B)$ is a picture distance measure.

Proof. From Definition 2, it is obvious that the generalized picture distance satisfies the first three conditions. For the last condition, regarding the triangle inequality in PFS, we have to prove the existence of a triple $(\mu_{AB}, \mu_{AC}, \mu_{BC})$ such that the condition holds. Since working
on PFS, whose data elements have associated membership values, it is clear that each distance measure between two sets in PFS should be accompanied by a composition function of those membership values. As such, the last condition is often named the (picture) fuzzy triangle inequality or the soft triangle inequality. In the extent of this proof, we will show that there exist discrete values for the triple $(\mu_{AB}, \mu_{AC}, \mu_{BC})$. Consider p = 1. For $A, B \in \mathrm{PFS}(X)$, let us denote:

$$AB_1 = \sum_{i=1}^{N}\left(\mu_i + \eta_i + \nu_i\right), \qquad AB_2 = \max_i\left\{\mu_i, \eta_i, \nu_i\right\},$$
$$AB_3 = \max_i\left\{\Phi_i^A, \Phi_i^B\right\}, \qquad AB_4 = \sum_{i=1}^{N}\left|\Phi_i^A - \Phi_i^B\right|.$$

The following inequality needs to be proved:

$$\frac{AB_1 + AB_2}{AB_1 + AB_2 + AB_3 + AB_4 + 1} + \frac{AC_1 + AC_2}{AC_1 + AC_2 + AC_3 + AC_4 + 1} \ge \frac{BC_1 + BC_2}{BC_1 + BC_2 + BC_3 + BC_4 + 1}.$$

The facts below come from the definition of PFS:

$$|\mu_A(x_i) - \mu_B(x_i)| + |\mu_A(x_i) - \mu_C(x_i)| \ge |\mu_B(x_i) - \mu_C(x_i)|,$$
$$|\eta_A(x_i) - \eta_B(x_i)| + |\eta_A(x_i) - \eta_C(x_i)| \ge |\eta_B(x_i) - \eta_C(x_i)|,$$
$$|\nu_A(x_i) - \nu_B(x_i)| + |\nu_A(x_i) - \nu_C(x_i)| \ge |\nu_B(x_i) - \nu_C(x_i)|.$$

It follows that

$$AB_1 + AC_1 \ge BC_1.$$

For $AB_2 + AC_2 \ge BC_2$, assume that

$$\max\{|\mu_B(x) - \mu_C(x)|, |\eta_B(x) - \eta_C(x)|, |\nu_B(x) - \nu_C(x)|\} = |\mu_B(x) - \mu_C(x)|.$$

Then,

$$|\mu_B(x) - \mu_C(x)| \le |\mu_A(x) - \mu_B(x)| + |\mu_A(x) - \mu_C(x)| \le \max\{|\mu_A(x) - \mu_B(x)|, |\eta_A(x) - \eta_B(x)|, |\nu_A(x) - \nu_B(x)|\} + \max\{|\mu_A(x) - \mu_C(x)|, |\eta_A(x) - \eta_C(x)|, |\nu_A(x) - \nu_C(x)|\},$$

so that $AB_2 + AC_2 \ge BC_2$. Consequently,

$$\frac{BC_1 + BC_2}{BC_1 + BC_2 + BC_3 + BC_4 + 1} \le \frac{AB_1 + AB_2}{AB_1 + AB_2 + AC_1 + AC_2 + BC_3 + BC_4 + 1} + \frac{AC_1 + AC_2}{AB_1 + AB_2 + AC_1 + AC_2 + BC_3 + BC_4 + 1}. \quad (*)$$

If one of the facts below happens,

$$\max_i\{\Phi_i^A, \Phi_i^B, \Phi_i^C\} = \Phi_j^B \quad \text{or} \quad \max_i\{\Phi_i^A, \Phi_i^B, \Phi_i^C\} = \Phi_j^C,$$

then $BC_3 \ge AB_3$. Again, if

$$\max_i\{\Phi_i^A, \Phi_i^B, \Phi_i^C\} = \Phi_j^A,$$

then

$$\max_i\{\Phi_i^A, \Phi_i^B\} - \max_i\{\Phi_i^B, \Phi_i^C\} \le \max_i\{\Phi_i^A - \Phi_i^B,\ \Phi_i^B - \Phi_i^C\} \le \max_i\{(\Phi_i^A - \Phi_i^B) + (\Phi_i^B - \Phi_i^C)\} = \max_i\{\Phi_i^A - \Phi_i^C\} \le 3AC_2,$$

i.e., $AB_3 - BC_3 \le 3AC_2$. Analogously, we achieve $BC_4 \ge AB_4$, or $3AC_2 + BC_4 \ge AB_4$. Thus,

$$3(AB_1 + AB_2 + AC_1 + AC_2 + BC_3 + BC_4 + 1) \ge AB_1 + AB_2 + AB_3 + AB_4 + 1, \quad (**)$$
$$3(AB_1 + AB_2 + AC_1 + AC_2 + BC_3 + BC_4 + 1) \ge AC_1 + AC_2 + AC_3 + AC_4 + 1. \quad (***)$$

Combining (*), (**) and (***), the inequality is proven. Thus, $d_G(A,B)$ is a picture distance measure. ∎

Definition 4. The average picture set of $A_i \in \mathrm{PFS}(X)$ $(i = 1, \ldots, N)$ is denoted as $\mathrm{AVG}(A_i)$:

$$\mathrm{AVG}(A_i) = \left\{\left\langle x,\ \frac{1}{N}\sum_{i=1}^{N}\mu_i(x),\ \frac{1}{N}\sum_{i=1}^{N}\eta_i(x),\ \frac{1}{N}\sum_{i=1}^{N}\nu_i(x)\right\rangle \,\middle|\, x \in X\right\},$$

where $\mu_i(x)$, $\eta_i(x)$, $\nu_i(x)$ are the positive, neutral and negative membership degrees of $A_i$, respectively.

Definition 5. The Picture Distance Matrix (PDM) of $A_i \in \mathrm{PFS}(X)$ $(i = 1, \ldots, N)$ is a similarity matrix of size $N \times N$ where each element is computed by Definition 3.

The HPC Algorithm:
Step 1: Given a collection of $A_i \in \mathrm{PFS}(X)$ $(i = 1, \ldots, N)$, consider each $A_i$ as a unique cluster.
Step 2: Calculate the Picture Distance Matrix (PDM).
Step 3: Merge two consecutive PFS sets based on the PDM and calculate the new centers by Definition 4. Notice that only two clusters are joined in each stage.
Step 4: Repeat Steps 2 and 3 with $A_i$ being replaced by the new centers until the desired number of clusters is achieved.

Table 1. The Guangzhou car dataset.

       Fuel (G1)      Aerod (G2)       Price (G3)     Comfort (G4)   Design (G5)    Safety (G6)
Car1   (0.3,0.1,0.4)  (0.6,0.05,0.05)  (0.4,0.1,0.2)  (0.8,0,0.1)    (0.1,0.2,0.4)  (0.5,0.3,0.1)
Car2   (0.6,0.2,0.1)  (0.5,0.1,0.1)    (0.6,0,0.1)    (0.1,0.1,0)    (0.3,0.3,0.3)  (0.4,0.1,0.2)
Car3   (0.4,0.3,0.2)  (0.1,0.5,0.3)    (0.1,0.2,0.4)  (0.2,0.1,0.5)  (0.3,0.4,0.2)  (0.6,0.3,0.1)
Car4   (0.2,0.2,0.2)  (0.4,0.05,0.05)  (0.9,0,0)      (0.8,0.1,0)    (0.2,0.2,0.3)  (0.7,0.03,0.07)
Car5   (0,0.4,0.1)    (0.4,0.1,0.5)    (0.1,0.6,0.2)  (0.1,0.1,0)    (0.6,0.1,0.1)  (0.2,0.1,0.2)

3. Evaluation

In this section, we aim to
validate whether the new metric can accurately measure data elements of the picture fuzzy set, $A_i \in \mathrm{PFS}(X)$. Even though there exist many extensions of the classical Fuzzy C-Means (FCM) in the literature that use the Euclidean, Hamming or Mahalanobis distances for obtaining clusters of spherical or elliptic geometrical form, they were not designed to work on the PFS set, which contains the information of the positive, negative and neutral degrees as in Definition 1. Therefore, in order to classify PFS elements $A_i \in \mathrm{PFS}(X)$, we should use the basic and the generalized picture distance measures, $d_P(A,B)$ and $d_G(A,B)$, in a hierarchical clustering algorithm like HPC. This section compares those measures in terms of the performance of the clustering algorithms. To this end, we have implemented the HPC algorithm with $d_G(A,B)$, in addition to a variant of HPC using $d_P(A,B)$ called CK. The Intuitionistic Hierarchical Clustering (IHC) algorithm [43] has also been implemented to evaluate the clustering quality of HPC and CK.

The experimental data consist of four datasets. The first one, Guangzhou car [50], described in Table 1, is a small dataset consisting of five new cars in the Guangzhou market evaluated by six criteria: Fuel (G1), Aerod (G2), Price (G3), Comfort (G4), Design (G5) and Safety (G6). The data of each car for a given criterion consist of three components representing the positive, the neutral and the negative degrees. The sum of the neutral and the negative degrees in Table 1 is the non-membership value in Ref. [50]. The second one, Building materials [40], shown in Table 2, is another small dataset comprising five building materials, namely Sealant, Floor varnish, Wall paint, Carpet and Chloride flooring, characterized by eight attributes. The sum of the neutral and the negative degrees in Table 2 is the non-membership value in Ref. [40]. The aim of this dataset is to validate the algorithms on a dataset having a larger number of attributes than the Guangzhou car dataset. The third one, Heart Disease [34], a part of which is
expressed in Table 3, is a real large dataset from UCI Machine Learning Repository consists of 270 patients acquiring heart disease categorized by attributes such as Age (3#Age), Blood pressure (mm Hg/patient, 10# Trestbps) and heart rate (#32 Thalach) The positive, the neutral and the negative degrees are fuzzified from crisp data using Gaussian, triangular and trapezoid Please cite this article in press as: L.H Son, Generalized picture distance measure and applications to picture fuzzy clustering, Appl Soft Comput J (2016), http://dx.doi.org/10.1016/j.asoc.2016.05.009 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 G Model ARTICLE IN PRESS ASOC 3589 1–12 L.H Son / Applied Soft Computing xxx (2016) xxx–xxx Table The building materials dataset Sealant Floor varnish Wall paint Carpet Chloride flooring X1 X2 X3 X4 X5 X6 X7 X8 (0.9, 0, 0) (0.5, 0.2, 0.2) (0.45, 0.1, 0.25) (1, 0, 0) (0.9, 0, 0) (0.1, 0.3, 0.5) (0.6, 0.1, 0.05) (0.6, 0.1, 0.2) (1, 0, 0) (0.9, 0.1, 0) (0.5, 0.1, 0.2) (1, 0, 0) (0.9, 0, 0) (0.85, 0.05, 0.05) (0.8, 0, 0.1) (0.2, 0, 0) (0.15, 0.3, 0.35) (0.1, 0.5, 0.3) (0.75, 0.15, 0) (0.7, 0.1, 0.1) (0.4, 0.15, 0.2) (0, 0.3, 0.5) (0.2, 0.3, 0.4) (0.2, 0.2, 0.6) (0.5, 0.05, 0.1) (0.1, 0.4, 0.5) (0.7, 0.05, 0.1) (0.6, 0.1, 0.1) (0.15, 0.25, 0.6) (0.3, 0.35, 0.3) (0.3, 0.3, 0.2) (0.5, 0.1, 0.2) (0.15, 0.4, 0.4) (0.1, 0.3, 0.4) (0.15, 0.25, 0.5) (0.5, 0.1, 0) (0.65, 0.1, 0.1) (0.2, 0.05, 0.6) (0.3, 0.3, 0.4) (0.4, 0.2, 0.1) Table Q8 A part of the heart disease dataset 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 Patient Age Trestbps Thalach X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X270 (0,0.292,0.438) (0,0.417,0.583) (0,0.833,0.167) (0,0.542,0.458) (0,0.125,0.188) (0,0.5,0.5) (0.011,0.875,0.114) (0,0.75,0.25) (0,0.708,0.292) (0,0.583,0.417) (0,0.75,0.25) (1,0,0) 
The aim of this dataset is to validate the algorithms on a dataset having a larger number of objects than the Guangzhou car dataset.

Lastly, Forest Cover Type [33] is a large dataset extracted from the UCI Machine Learning Repository. It includes 1000 instances in 10 dimensions showing the actual forest cover type for a given observation (30 × 30 m cell), determined from US Forest Service (USFS) Region Resource Information System (RIS) data. The positive, the neutral and the negative degrees are fuzzified from the crisp data using Gaussian, triangular and trapezoidal membership functions. The aim of this dataset is to validate the algorithms on a large dataset in which both the number of objects and the number of attributes are greater than those of the Guangzhou car dataset.

In order to evaluate the clustering qualities of the algorithms, we use NMI (Normalized Mutual Information), F-Measure and Purity. These evaluation indices are the-larger-the-better:

\[
\mathrm{NMI} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{k} n_{ij}\,\log\frac{n\,n_{ij}}{n_i\,n_j}}
{\sqrt{\left(\sum_{i=1}^{r} n_i \log\frac{n_i}{n}\right)\left(\sum_{j=1}^{k} n_j \log\frac{n_j}{n}\right)}},
\]
\[
\mathrm{Precision}_i = \frac{\max_{j=1,\dots,k} n_{ij}}{n_i}, \quad (i = 1,\dots,r),
\]
\[
\mathrm{Recall}_i = \frac{\max_{j=1,\dots,k} n_{ij}}{n_{j^*}}, \quad (i = 1,\dots,r), \qquad j^* = \arg\max_{j=1,\dots,k} n_{ij}, \quad (j^* \in [1,k]),
\]
\[
F_i = \frac{2 \times \mathrm{Precision}_i \times \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}, \qquad
\text{F-Measure} = \frac{1}{r}\sum_{i=1}^{r} F_i, \qquad
\mathrm{Purity} = \frac{1}{n}\sum_{i=1}^{r}\max_{j=1,\dots,k} n_{ij},
\]

where

• T = {T1, …, Tk} and C = {C1, …, Cr} are the k correct and r predicted clusters, respectively;
• n is the total number of data points;
• n_ij = |Ci ∩ Tj| is the number of data points shared by Ci and Tj (i = 1, …, r; j = 1, …, k);
• n_i = Σ_{j=1..k} n_ij is the number of data points of Ci (i = 1, …, r);
• n_j = Σ_{i=1..r} n_ij is the number of data points of Tj (j = 1, …, k).
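The three indices above can be computed directly from the contingency matrix n_ij; a minimal sketch (the helper name `evaluate` is ours):

```python
import math

def evaluate(n_ij):
    """Compute (NMI, F-Measure, Purity) from a contingency matrix where
    n_ij[i][j] = number of points shared by predicted cluster C_i and
    true class T_j (r predicted clusters, k true classes)."""
    r, k = len(n_ij), len(n_ij[0])
    n_i = [sum(row) for row in n_ij]                                 # |C_i|
    n_j = [sum(n_ij[i][j] for i in range(r)) for j in range(k)]      # |T_j|
    n = sum(n_i)

    # NMI: mutual information normalised by the geometric mean of entropies
    mi = sum(n_ij[i][j] * math.log(n * n_ij[i][j] / (n_i[i] * n_j[j]))
             for i in range(r) for j in range(k) if n_ij[i][j] > 0)
    h_c = sum(ni * math.log(ni / n) for ni in n_i if ni > 0)
    h_t = sum(nj * math.log(nj / n) for nj in n_j if nj > 0)
    nmi = mi / math.sqrt(h_c * h_t)

    # Per-cluster precision/recall against the best-matching true class
    f_sum = 0.0
    for i in range(r):
        j_star = max(range(k), key=lambda j: n_ij[i][j])
        m = n_ij[i][j_star]
        prec, rec = m / n_i[i], m / n_j[j_star]
        f_sum += 2 * prec * rec / (prec + rec)
    f_measure = f_sum / r

    purity = sum(max(row) for row in n_ij) / n
    return nmi, f_measure, purity
```

For a clustering that exactly reproduces the true classes, all three indices equal 1, which is why the tables below report 1 wherever HPC recovers the reference partition.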
Firstly, we illustrate the activities of the HPC algorithm in classifying the Guangzhou car dataset of Table 1. In the first phase, each car in the dataset is a unique cluster:

{Car1}, {Car2}, {Car3}, {Car4}, {Car5}.

Table 4. Classification results by phases.

  ID       Fuel (G1)  Aerod (G2)  Price (G3)  Comfort (G4)  Design (G5)  Safety (G6)
  Phase 1
  Car1     0.267      0.233       0.233       0.3           0.233        0.3
  Car2     0.3        0.233       0.233       0.067         0.3          0.233
  Car3     0.3        0.3         0.233       0.267         0.3          0.333
  Car4     0.2        0.167       0.3         0.3           0.233        0.267
  Car5     0.167      0.333       0.3         0.067         0.267        0.167
  Phase 2
  C14      0.233      0.2         0.267       0.3           0.233        0.283
  C23      0.3        0.267       0.233       0.167         0.3          0.283
  C5       0.167      0.333       0.3         0.067         0.267        0.167
  Phase 3
  C1423    0.267      0.233       0.25        0.233         0.267        0.283
  C5       0.167      0.333       0.3         0.067         0.267        0.167

The picture distance matrix (PDM) of Phase 1, with rows and columns ordered Car1, …, Car5, is calculated as follows:

  PDM_1 = ( 0       0.2908  0.3275  0.2560  0.3406 )
          ( 0.2908  0       0.3036  0.2912  0.3286 )
          ( 0.3275  0.3036  0       0.3581  0.3062 )
          ( 0.2560  0.2912  0.3581  0       0.3767 )
          ( 0.3406  0.3286  0.3062  0.3767  0      )

Because d(Car1, Car4) = 0.2560 is the minimal value among all distances, Car1 and Car4 are grouped into a cluster. Removing all distance values related to Car1 and Car4, we recognize that d(Car2, Car3) = 0.3036 is the minimal value among the remaining ones. Thus, Car2 and Car3 are merged into another cluster. The results of the second phase are:

{Car1, Car4}, {Car2, Car3}, {Car5}.

Using Definition 4, the centers of {Car1, Car4} and {Car2, Car3} are:

  ID   Fuel (G1)          Aerod (G2)         Price (G3)         Comfort (G4)       Design (G5)        Safety (G6)
  C14  (0.25, 0.15, 0.3)  (0.5, 0.05, 0.05)  (0.65, 0.05, 0.1)  (0.8, 0.05, 0.05)  (0.15, 0.2, 0.35)  (0.6, 0.165, 0.085)
  C23  (0.5, 0.25, 0.15)  (0.3, 0.3, 0.2)    (0.35, 0.1, 0.25)  (0.15, 0.1, 0.25)  (0.3, 0.35, 0.25)  (0.5, 0.2, 0.15)
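The phase logic above (pick the smallest remaining PDM entries, merge the corresponding clusters, then recompute centers) can be sketched as follows. The componentwise-mean center reproduces the C1423 values reported in this walkthrough, but the function names and the greedy pairing rule are our illustrative reading of the example, not the paper's exact pseudocode.

```python
def merge_phase(pdm, labels):
    """One phase of hierarchy building: greedily pair disjoint clusters
    by the smallest PDM entry, as in the Guangzhou car walkthrough."""
    n = len(labels)
    candidates = sorted((pdm[i][j], i, j)
                        for i in range(n) for j in range(i + 1, n))
    used, merges = set(), []
    for d, i, j in candidates:
        if i not in used and j not in used:
            used.update((i, j))
            merges.append((labels[i], labels[j], d))
    return merges

def cluster_center(members):
    """Center of a merged cluster as the componentwise mean of the
    (positive, neutral, negative) degrees over its members, per attribute."""
    m = len(members)
    return [tuple(sum(x[a][c] for x in members) / m for c in range(3))
            for a in range(len(members[0]))]
```

Applied to PDM_1 above, `merge_phase` selects (Car1, Car4) at distance 0.2560 and then (Car2, Car3) at 0.3036, matching the merges described in the text.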
Next, we calculate the PDM of Phase 2, with rows and columns ordered C14, C23, C5:

  PDM_2 = ( 0       0.2887  0.3485 )
          ( 0.2887  0       0.2958 )
          ( 0.3485  0.2958  0      )

Since d(C14, C23) = d({Car1, Car4}, {Car2, Car3}) = 0.2887 is the smallest value among all distances in PDM_2, we combine those clusters. The results of the third phase are:

{Car1, Car4, Car2, Car3}, {Car5}.

The center of the cluster {Car1, Car4, Car2, Car3} is:

  ID     Fuel (G1)            Aerod (G2)           Price (G3)           Comfort (G4)          Design (G5)          Safety (G6)
  C1423  (0.375, 0.2, 0.225)  (0.4, 0.175, 0.125)  (0.5, 0.075, 0.175)  (0.475, 0.075, 0.15)  (0.225, 0.275, 0.3)  (0.55, 0.1825, 0.1175)

The PDM of Phase 3 is PDM_3 = (0.3110), i.e., d(C1423, C5) = 0.3110. Lastly, in the fourth phase, all cars are grouped into a unique cluster. A hierarchical tree for the classification of the Guangzhou car dataset using the HPC algorithm is shown in Fig. 2. If we compute the average values of the positive, the neutral and the negative memberships of all cars and group them by phases, we get the results in Table 4.

Table 5. The results after using PCA.

  ID     X       Y
  Phase 1
  Car1   0.115   −0.004
  Car4   −0.115  −0.072
  Car2   0.08    −0.084
  Car3   0.106   0.098
  Car5   −0.186  0.061
  Phase 2
  C14    0.137   −0.055
  C23    0.027   0.087
  C5     −0.165  −0.032
  Phase 3
  C1423  0.126   0
  C5     −0.126  0

Table 6. The clustering results of algorithms on the Guangzhou car dataset.

  Phase 2:
    IHC: {Car1, Car4}, {Car2, Car3}, {Car5}
    CK:  {Car1, Car4}, {Car2, Car5}, {Car3}
    HPC: {Car1, Car4}, {Car2, Car3}, {Car5}
  Phase 3:
    IHC: {Car1, Car4, Car2, Car3}, {Car5}
    CK:  {Car1, Car4, Car2, Car5}, {Car3}
    HPC: {Car1, Car4, Car2, Car3}, {Car5}

Table 7. The NMI values of algorithms on the Guangzhou car dataset.

  Phase  CK     HPC
  2      0.737  1
  3      0.101  1

Table 8. The F-Measure values of algorithms on the Guangzhou car dataset.

  Phase  CK     HPC
  2      0.833  1
  3      0.875  1

Table 9. The Purity values of algorithms on the Guangzhou car dataset.

  Phase  CK   HPC
  2      0.8  1
  3      0.8  1

In order to visualize the clustering results, we use Principal Component Analysis (PCA), which is a
well-known method in statistics, to reduce the dimensions of the data in Table 4; the results are given in Table 5. The 2D distributions of data points and centers of all phases are also depicted in Figs. 3-6.

Secondly, we compare the clustering qualities of HPC and CK through the evaluation indices on the experimental datasets. The results on the Guangzhou car dataset are shown in Tables 6-9. The results show that the clustering quality of HPC is better than that of CK. Moreover, as illustrated in Figs. 2 and 7 and Table 6, we clearly recognize that the hierarchical tree of IHC is identical to that of HPC. This means that using the generalized picture distance measure in clustering algorithms results in better quality than using the basic picture distance.

Analogously, we made the comparison on the other datasets and obtained the results in Tables 10-19. The values in these tables affirm the efficiency of HPC even in cases where the number of attributes or the number of objects is higher than that of the Guangzhou car dataset. This clearly shows that using the generalized picture distance measure results in accurate calculations of similarity between objects. In Figs. 8 and 9, we illustrate the hierarchical tree of HPC for the building materials dataset and a clustering tool called HPCS, a kind of knowledge-based system to assist clustering on PFS datasets, respectively.

Fig. 2. The hierarchical tree of HPC for Guangzhou car.
Fig. 3. The distributions of data and centers in Phase 1.
Fig. 4. The distributions of data and centers in Phase 2.
Fig. 5. The distributions of data and centers in Phase 3.
Fig. 6. The distributions of data and centers in Phase 4.
Fig. 7. The hierarchical tree of CK for Guangzhou car.

Table 10. The clustering results of algorithms on the building materials dataset.

  Phase 2:
    IHC: {Floor varnish, Wall paint}, {Carpet, Chloride flooring}, {Sealant}
    CK:  {Floor varnish, Wall paint}, {Carpet, Chloride flooring}, {Sealant}
    HPC: {Floor varnish, Wall paint}, {Carpet, Chloride flooring}, {Sealant}
  Phase 3:
    IHC: {Floor varnish, Wall paint}, {Carpet, Chloride flooring, Sealant}
    CK:  {Floor varnish, Wall paint, Carpet, Chloride flooring}, {Sealant}
    HPC: {Floor varnish, Wall paint}, {Carpet, Chloride flooring, Sealant}

Fig. 8. The hierarchical tree of HPC for building materials.

Table 11. The NMI values of algorithms on the building materials dataset.

  Phase  CK     HPC
  2      1      1
  3      0.204  1

Table 12. The F-Measure values of algorithms on the building materials dataset.

  Phase  CK     HPC
  2      1      1
  3      0.952  1

Table 13. The Purity values of algorithms on the building materials dataset.

  Phase  CK   HPC
  2      1    1
  3      0.8  1

Table 14. The NMI values of algorithms on the heart disease dataset.

  Phase  CK     HPC
  2      0.88   0.881
  3      0.74   0.744
  4      0.609  0.601
  5      0.43   0.465
  6      0.353  0.353
  7      0.265  0.296
  8      0.164  0.138
  9      0.027  0.024

Table 15. The F-Measure values of algorithms on the heart disease dataset.

  Phase  CK     HPC
  2      0.574  0.578
  3      0.408  0.423
  4      0.397  0.374
  5      0.327  0.364
  6      0.397  0.395
  7      0.452  0.471
  8      0.573  0.417
  9      0.785  0.609
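The dimensionality reduction used for Table 5 and Figs. 3-6 can be sketched with a generic SVD-based PCA; the paper applies standard PCA but does not give an implementation, so the helper `pca_2d` below is our illustrative version.

```python
import numpy as np

def pca_2d(rows):
    """Project rows (e.g. the per-phase averages of Table 4) onto their
    first two principal components for 2D visualization."""
    X = np.asarray(rows, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre each attribute
    # SVD of the centred data yields the principal directions in Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                          # n x 2 coordinates
```

Because the data are centred before projection, each phase's 2D coordinates sum to zero, which is consistent with the per-phase values reported in Table 5.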
Fig. 9. Visualization of hierarchical trees in HPCS.

Table 16. The Purity values of algorithms on the heart disease dataset.

  Phase  CK     HPC
  2      0.574  0.578
  3      0.415  0.422
  4      0.396  0.381
  5      0.337  0.385
  6      0.396  0.433
  7      0.459  0.504
  8      0.6    0.622
  9      0.941  0.941

Table 17. The NMI values of algorithms on the forest cover type dataset.

  Phase  CK     HPC
  2      0.909  0.908
  3      0.813  0.81
  4      0.708  0.701
  5      0.595  0.575
  6      0.48   0.438
  7      0.36   0.339
  8      0.219  0.197
  9      0.156  0.119
  10     0.047  0.123

Table 18. The F-Measure values of algorithms on the forest cover type dataset.

  Phase  CK     HPC
  2      0.593  0.588
  3      0.467  0.457
  4      0.39   0.378
  5      0.348  0.319
  6      0.328  0.271
  7      0.342  0.342
  8      0.376  0.319
  9      0.49   0.406
  10     0.626  0.704

Table 19. The Purity values of algorithms on the forest cover type dataset.

  Phase  CK     HPC
  2      0.593  0.588
  3      0.469  0.457
  4      0.388  0.38
  5      0.342  0.312
  6      0.325  0.28
  7      0.356  0.347
  8      0.374  0.312
  9      0.493  0.407
  10     0.626  0.704

Conclusions

In this paper, we proposed a generalized picture distance measure and a novel hierarchical picture fuzzy clustering method, namely Hierarchical Picture Clustering (HPC), for the classification of picture fuzzy data. HPC was experimentally validated on various PFS datasets, including small and real large ones. The findings can be summarized as follows: (i) using the generalized picture distance measure obtains better clustering quality than using other picture measures; (ii) HPC is able to adapt to various types of datasets and produces more accurate results than CK; (iii) the hierarchical trees affirm the stability of the proposed algorithm and its insensitivity to outliers; (iv) the clustering quality of HPC remains stable even when the number of attributes or objects in the dataset increases. Lastly, a clustering and visualization system, namely HPCS, was designed for the classification of picture fuzzy set datasets.

These results enrich the knowledge of deploying soft computing methods on advanced fuzzy sets such as picture fuzzy sets. As
motivated by the preliminary picture fuzzy clustering algorithm in Ref. [27], the contributions of this research continue to enrich the collection of clustering algorithms on picture fuzzy sets. The practical implications of the contributed work are easy to capture, since clustering algorithms are widely applied to various practical socio-economic and pattern recognition problems.

However, in order to advance the proposed measure toward larger domains of application, more general expressions which, when parameterized appropriately, reduce to known expressions should be investigated, in the sense that meaningful generalizations (with meaningful parameters) can help in defining applications. Torra and Narukawa [31] gave us a hint by introducing the CMI operator, which generalizes the Choquet integral and the Mahalanobis distance and can be used in problems where variables are not independent. In a real application, not all attributes/variables/components that describe an object/record have the same importance; therefore, it is natural to include weights. Moreover, the problem of distance learning, that is, having a parametric distance or a set of possible distances among which to choose, and developing machine learning algorithms to find appropriate parameters and/or select the appropriate distance, should be investigated. For example, Abril et al. [1,2] proposed a parameterized symmetric bilinear-form aggregation operator and a supervised learning method to find values of the aggregation parameters that maximize the number of re-identifications (correct links). It was shown that the aggregation operator is
able to outperform, or at least achieve results similar to, the other parameterized operators. Last but not least, an advanced measure such as a picture association measure can be extended from the generalized picture distance measure and further applied to construct classification/forecast algorithms.

Appendix A

Source codes and experimental datasets of this research can be retrieved at this link: https://sourceforge.net/projects/hpcs/.

References

[1] D. Abril, G. Navarro-Arribas, V. Torra, Improving record linkage with supervised learning for disclosure risk assessment, Inf. Fusion 13 (4) (2012) 274-284.
[2] D. Abril, V. Torra, G. Navarro-Arribas, Supervised learning using a symmetric bilinear form for record linkage, Inf. Fusion 26 (2015) 144-153.
[3] K.T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets Syst. 20 (1986) 87-96.
[4] N. Chen, Z. Xu, M. Xia, Correlation coefficients of hesitant fuzzy sets and their applications to clustering analysis, Appl. Math. Model. 37 (4) (2013) 2197-2211.
[5] T.Y. Chen, A note on distances between intuitionistic fuzzy sets and/or interval-valued fuzzy sets based on the Hausdorff metric, Fuzzy Sets Syst. 158 (22) (2007) 2523-2525.
[6] B.C. Cuong, V. Kreinovich, Picture fuzzy sets, J. Comput. Sci. Cybern. 30 (4) (2014) 409-416.
[7] L. Dengfeng, C. Chuntian, New similarity measures of intuitionistic fuzzy sets and application to pattern recognitions, Pattern Recognit. Lett. 23 (1) (2002) 221-225.
[8] P. Diamond, P. Kloeden, Metric Spaces of Fuzzy Sets: Theory and Applications, World Scientific Publishing, Singapore, 1994.
[9] P.J. Groenen, K. Jajuga, Fuzzy clustering with squared Minkowski distances, Fuzzy Sets Syst. 120 (2) (2001) 227-237.
[10] P. Grzegorzewski, Distances between intuitionistic fuzzy sets and/or interval-valued fuzzy sets based on the Hausdorff metric, Fuzzy Sets Syst. 148 (2) (2004) 319-328.
[11] A.G. Hatzimichailidis, G.A. Papakostas, V.G. Kaburlasos, A novel distance measure of intuitionistic fuzzy sets and its application to pattern recognition applications, Int. J. Intell. Syst. 27 (4) (2012) 396-409.
[12] W.L. Hung, M.S. Yang, Similarity measures of intuitionistic fuzzy sets based on Hausdorff distance, Pattern Recognit. Lett. 25 (14) (2004) 1603-1611.
[13] W.L. Hung, M.S. Yang, Similarity measures of intuitionistic fuzzy sets based on Lp metric, Int. J. Approx. Reason. 46 (1) (2007) 120-136.
[14] C. Hwang, F.C.H. Rhee, Uncertain fuzzy clustering: interval type-2 fuzzy approach to c-means, IEEE Trans. Fuzzy Syst. 15 (1) (2007) 107-120.
[15] M. Irfan Ali, A note on soft sets, rough soft sets and fuzzy soft sets, Appl. Soft Comput. 11 (4) (2011) 3329-3332.
[16] Y. Li, D.L. Olson, Z. Qin, Similarity measures between intuitionistic fuzzy (vague) sets: a comparative analysis, Pattern Recognit. Lett. 28 (2) (2007) 278-285.
[17] Z. Liang, P. Shi, Similarity measures on intuitionistic fuzzy sets, Pattern Recognit. Lett. 24 (15) (2003) 2687-2693.
[18] J.M. Mendel, R.B. John, Type-2 fuzzy sets made simple, IEEE Trans. Fuzzy Syst. 10 (2) (2002) 117-127.
[19] D. Meng, X. Zhang, K. Qin, Soft rough fuzzy sets and soft fuzzy rough sets, Comput. Math. Appl. 62 (12) (2011) 4635-4645.
[20] J.M. Merigó, M. Casanovas, A new Minkowski distance based on induced aggregation operators, Int. J. Comput. Intell. Syst. (2) (2011) 123-133.
[21] H.B. Mitchell, On the Dengfeng-Chuntian similarity measure and its application to pattern recognition, Pattern Recognit. Lett. 24 (16) (2003) 3101-3104.
[22] G.A. Papakostas, A.G. Hatzimichailidis, V.G. Kaburlasos, Distance and similarity measures between intuitionistic fuzzy sets: a comparative analysis from a pattern recognition point of view, Pattern Recognit. Lett. 34 (14) (2013) 1609-1622.
[23] N. Pelekis, D.K. Iakovidis, E.E. Kotsifakos, I. Kopanakis, Fuzzy clustering of intuitionistic fuzzy data, Int. J. Bus. Intell. Data Min. (1) (2008) 45-65.
[24] L. Polkowski, Rough Sets: Mathematical Foundations, Springer, US, 2013.
[25] P. Sarnacchiaro, L. D'Ambra, I. Camminatiello, Measures of association and visualization of log odds ratio structure for a two way contingency table, Aust. N. Z. J. Stat. 57 (3) (2015) 363-376.
[26] L.H. Son, B.C. Cuong, P.L. Lanzi, N.T. Thong, A novel intuitionistic fuzzy clustering method for geo-demographic analysis, Expert Syst. Appl. 39 (10) (2012) 9848-9859.
[27] L.H. Son, DPFCM: a novel distributed picture fuzzy clustering method on picture fuzzy sets, Expert Syst. Appl. 42 (1) (2015) 51-66.
[28] E. Szmidt, J. Kacprzyk, Distances between intuitionistic fuzzy sets, Fuzzy Sets Syst. 114 (3) (2000) 505-518.
[29] E. Szmidt, J. Kacprzyk, Distances Between Intuitionistic Fuzzy Sets and Their Applications in Reasoning, Springer, Berlin, Heidelberg, 2005.
[30] E. Szmidt, J. Kacprzyk, Distances between intuitionistic fuzzy sets: straightforward approaches may not work, in: Proceedings of the 3rd International IEEE Conference on Intelligent Systems, 2006, pp. 716-721.
[31] V. Torra, Y. Narukawa, On a comparison between Mahalanobis distance and Choquet integral: the Choquet-Mahalanobis operator, Inf. Sci. 190 (2012) 56-63.
[32] V. Torra, Hesitant fuzzy sets, Int. J. Intell. Syst. 25 (6) (2010) 529-539.
[33] UCI Machine Learning Repository, Covertype Data Set, 2015, available at: https://archive.ics.uci.edu/ml/datasets/Covertype.
[34] UCI Machine Learning Repository, Heart Disease, 2013, available at: https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
[35] W. Wang, X. Xin, Distance measure between intuitionistic fuzzy sets, Pattern Recognit. Lett. 26 (13) (2005) 2063-2069.
[36] Z. Wang, Z. Xu, S. Liu, J. Tang, A netting clustering analysis method under intuitionistic fuzzy environment, Appl. Soft Comput. 11 (8) (2011) 5558-5564.
[37] Z. Wang, Z. Xu, S. Liu, Z. Yao, Direct clustering analysis based on intuitionistic fuzzy implication, Appl. Soft Comput. 23 (2014) 1-8.
[38] C.P. Wei, P. Wang, Y.Z. Zhang, Entropy, similarity measure of interval-valued intuitionistic fuzzy sets and their applications, Inf. Sci. 181 (19) (2011) 4273-4286.
[39] D. Xu, Z. Xu, S. Liu, H. Zhao, A spectral clustering algorithm based on intuitionistic fuzzy information, Knowl. Based Syst. 53 (2013) 20-26.
[40] Z.S. Xu, Intuitionistic fuzzy hierarchical clustering algorithms, J. Syst. Eng. Electron. 20 (2009) 90-97.
[41] Z.S. Xu, J. Chen, An overview of distance and similarity measures of intuitionistic fuzzy sets, Int. J. Uncertain. Fuzziness Knowl. Based Syst. 16 (2008) 529-555.
[42] Z. Xu, J. Chen, J. Wu, Clustering algorithm for intuitionistic fuzzy sets, Inf. Sci. 178 (19) (2008) 3775-3790.
[43] Z. Xu, Intuitionistic Fuzzy Aggregation and Clustering, Springer, US, 2012.
[44] Z. Xu, Some similarity measures of intuitionistic fuzzy sets and their applications to multiple attribute decision making, Fuzzy Optim. Decis. Mak. (2) (2007) 109-121.
[45] Z. Xu, J. Wu, Intuitionistic fuzzy C-means clustering algorithms, J. Syst. Eng. Electron. 21 (4) (2010) 580-590.
[46] Z. Xu, M. Xia, Distance and similarity measures for hesitant fuzzy sets, Inf. Sci. 181 (11) (2011) 2128-2138.
[47] Y. Yang, F. Chiclana, Consistency of 2D and 3D distances of intuitionistic fuzzy sets, Expert Syst. Appl. 39 (10) (2012) 8665-8670.
[48] J. Ye, Improved cosine similarity measures of simplified neutrosophic sets for medical diagnoses, Artif. Intell. Med. 63 (3) (2015) 171-179.
[49] L.A. Zadeh, Fuzzy sets, Inf. Control 8 (1965) 338-353.
[50] H.M. Zhang, Z.S. Xu, Q. Chen, On clustering approach to intuitionistic fuzzy sets, Control Decis. 22 (2007) 882-888.
[51] X. Zhang, B. Zhou, P. Li, A general frame for intuitionistic fuzzy rough sets, Inf. Sci. 216 (2012) 34-49.
[52] G. Zheng, J. Xiao, J. Wang, Z. Wei, A similarity measure between general type-2 fuzzy sets and its application in clustering, in: Proceedings of the 2010 8th World Congress on Intelligent Control and Automation (WCICA), 2010, pp. 6383-6638.