Journal homepage: www.elsevier.com/locate/asoc
Generalized picture distance measure and applications to picture
fuzzy clustering
Abstract
Picture fuzzy set (PFS), which is a generalization of the traditional fuzzy set and the intuitionistic fuzzy set, shows great promise for better adaptation to many practical problems in pattern recognition, artificial life, robotics, expert and knowledge-based systems than existing types of fuzzy sets. An emerging research trend in PFS is the development of clustering algorithms which can exploit and investigate hidden knowledge from a mass of datasets. The distance measure is one of the most important tools in clustering, since it determines the degree of relationship between two objects. In this paper, we propose a generalized picture distance measure and integrate it into a novel hierarchical picture fuzzy clustering method called Hierarchical Picture Clustering (HPC). Experimental results show that the clustering quality of the proposed algorithm is better than that of the relevant ones.
© 2016 Elsevier B.V. All rights reserved.
1 Introduction
Since the fuzzy set (FS) [49] was first introduced by Zadeh in 1965, many extensions of FS have been proposed in the literature, such as the type-2 fuzzy set (T2FS) [18], rough set (RS) [24], soft set, rough soft set and fuzzy soft set [15], intuitionistic fuzzy set (IFS) [3], intuitionistic fuzzy rough set (IFRS) [51], soft rough fuzzy set and soft fuzzy rough set [19], interval-valued intuitionistic fuzzy set (IVIFS) [38] and hesitant fuzzy set (HFS) [32]. The aim of those extensions is to overcome the limitations of FS regarding the degree of fuzziness, the uncertainty of membership degrees, and the existence of neutrality. Recently, a new generalized fuzzy set called picture fuzzy set (PFS) has been proposed by Cuong and Kreinovich in Ref. [6]. The word "picture" in PFS refers to generality, as this set is the direct extension of FS and IFS. In other words, PFS integrates information of neutral and negative into its definition so that when the value(s) of one (both) of those degrees is (are) equal to zero, it returns to the IFS (FS) set. Compared with IFS, PFS divides the hesitancy degree into two parts, i.e., the refusal degree and the neutral degree (see Definition 1 and Examples 1 and 2 for details). This set shows great promise for better adaptation to many practical problems in pattern recognition, artificial life, robotics, expert and knowledge-based systems than some existing types of fuzzy sets.
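Definition 1. A picture fuzzy set (PFS) [6] in a non-empty set $X$ is
\[ A = \left\{ \langle x, \mu_A(x), \eta_A(x), \gamma_A(x) \rangle \mid x \in X \right\}, \]
where $\mu_A(x)$ is the positive degree of each element $x \in X$, $\eta_A(x)$ is the neutral degree and $\gamma_A(x)$ is the negative degree, satisfying the constraints
\[ \mu_A(x), \eta_A(x), \gamma_A(x) \in [0,1], \quad \forall x \in X, \]
\[ 0 \le \mu_A(x) + \eta_A(x) + \gamma_A(x) \le 1, \quad \forall x \in X. \]
The refusal degree of an element is calculated as $\xi_A(x) = 1 - (\mu_A(x) + \eta_A(x) + \gamma_A(x))$, $\forall x \in X$. In the case $\xi_A(x) = 0$, PFS returns to the IFS set, and when both $\eta_A(x) = \xi_A(x) = 0$, PFS returns to the FS set. Some properties of PFS operations, the convex combination of PFS, etc., accompanied by proofs, can be found in Ref. [6].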
Example 1. In a democratic election station, the council issues 500 voting papers for a candidate. The voting results are divided into four groups accompanied by the number of papers, namely "vote for" (300), "abstain" (64), "vote against" (115) and "refusal of voting" (21). Group "abstain" means that the voting paper is a white paper rejecting both "agree" and "disagree" for the candidate but still takes the vote. Group "refusal of voting" consists of either invalid voting papers or bypassed votes. This example happened in reality, and IFS could not handle it since the neutral membership (group "abstain") does not exist in IFS.
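The election in Example 1 can be turned into the four degrees of Definition 1 with a few lines of arithmetic. The sketch below is only illustrative: dividing each count by the total number of papers is an assumption made here (the text gives the counts, not the degrees), and the function name `picture_degrees` is hypothetical.

```python
from fractions import Fraction


def picture_degrees(votes_for, abstain, votes_against, refusal):
    """Map vote counts to (positive, neutral, negative, refusal) degrees.

    Assumption: each degree is the share of the corresponding group
    among all issued voting papers.
    """
    total = votes_for + abstain + votes_against + refusal
    mu = Fraction(votes_for, total)         # positive degree
    eta = Fraction(abstain, total)          # neutral degree
    gamma = Fraction(votes_against, total)  # negative degree
    xi = 1 - (mu + eta + gamma)             # refusal degree, as in Definition 1
    return mu, eta, gamma, xi


mu, eta, gamma, xi = picture_degrees(300, 64, 115, 21)
print([float(v) for v in (mu, eta, gamma, xi)])
```

Exact fractions are used so that the refusal degree obtained from the formula of Definition 1 can be compared with the share of "refusal of voting" papers without rounding noise.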
Example 2. Personnel selection is a very important activity in the human resource management of an organization. The process of selection follows a methodology to collect information about an individual in order to determine if that individual should be employed. The selection results could be classified into 4 classes: true positive, true negative, false negative, and false positive, which are somehow equivalent to the positive, neutral, negative and refusal degrees of PFS. Each candidate is ranked according to the 4 classes by his ability and suitability for the job, and the final decision is made based on the results of the classes. For example, if two candidates are ranked A-(50%, 20%, 20%, 10%) and B-(40%, 10%, 30%, 20%), the final decision can be made through the union operator and the maximum of the positive degree in PFS, which returns the value of 50% (A is selected).
http://dx.doi.org/10.1016/j.asoc.2016.05.009
An emerging trend in PFS and other advanced fuzzy sets is the development of soft computing methods, especially clustering algorithms on these sets, which could produce better quality of results than those on FS. For instance, clustering algorithms on interval T2FS focusing on the uncertainty associated with the fuzzifier were investigated in Refs. [14,52]. Regarding the IFS set, Pelekis et al. [23] proposed a clustering approach utilizing a similarity metric defined over IFS. Xu and Wu [45] developed the IFCM algorithm to classify IFS and interval-valued IFS. Son et al. [26] proposed an intuitionistic fuzzy clustering algorithm for geo-demographic analysis. Xu and his group developed a number of intuitionistic fuzzy clustering methods in various contexts [36,37,39,42]. Fuzzy clustering algorithms on other sets, namely HFS and PFS, were found in Refs. [4,27]. It is clear from the literature that the distance measure is the most important factor for an efficient clustering algorithm.
The most widely used distance measures for two FSs $A$ and $B$ on $X = \{x_1, \ldots, x_N\}$ are the Hamming, Euclidean and Hausdorff metrics [6]. Because of the FS's drawbacks, distance measures on other sets, mostly IFS, have been proposed. Atanassov [3], Chen [5], Dengfeng and Chuntian [7], Grzegorzewski [10], Hatzimichailidis et al. [11], Hung and Yang [12,13], Li et al. [16], Liang and Shi [17], Mitchell [21], Papakostas et al. [22], Szmidt and Kacprzyk [28–30], Wang and Xin [35], Xu and Chen [41], Xu and Xia [46], Yang and Chiclana [47] and Xu [44] presented some distance measures in IFS, namely the (normalized) intuitionistic Hamming and Euclidean distances, and the (normalized) Hausdorff intuitionistic Hamming and Euclidean distances. A basic distance measure on PFS has been given by Cuong and Kreinovich [6] as follows:
\[ d_P(A,B) = \left( \frac{1}{N} \sum_{i=1}^{N} \left( |\mu_A(x_i) - \mu_B(x_i)|^p + |\eta_A(x_i) - \eta_B(x_i)|^p + |\gamma_A(x_i) - \gamma_B(x_i)|^p \right) \right)^{1/p}. \]
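As a small illustration of the basic picture distance, the sketch below implements a normalized Minkowski distance of order p over the three degrees (positive, neutral, negative), which is how the text characterizes $d_P(A,B)$. The representation of a PFS as a list of `(mu, eta, gamma)` tuples and the name `basic_picture_distance` are assumptions made for this example.

```python
def basic_picture_distance(A, B, p=1):
    """Normalized Minkowski-type distance over the three picture degrees.

    A and B are lists of (mu, eta, gamma) tuples over the same universe
    x_1..x_N (an assumed representation, not taken from the paper).
    """
    if len(A) != len(B) or not A:
        raise ValueError("A and B must be non-empty and of equal length")
    total = sum(abs(a[k] - b[k]) ** p for a, b in zip(A, B) for k in range(3))
    return (total / len(A)) ** (1 / p)


A = [(0.3, 0.0, 0.1), (0.4, 0.1, 0.1)]
B = [(0.5, 0.1, 0.2), (0.4, 0.1, 0.1)]
print(basic_picture_distance(A, B, p=1))  # Hamming-like case
print(basic_picture_distance(A, B, p=2))  # Euclidean-like case
```

With p = 1 the function behaves like a normalized Hamming distance and with p = 2 like a normalized Euclidean distance, which mirrors the two special cases discussed in the text.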
We recognize that $d_P(A,B)$ is a generalization of those in IFS and FS when $\eta_A(x) = 0$ and both $\eta_A(x) = \gamma_A(x) = 0$, respectively. As explained above, the integration of the neutral degree $\eta_A(x)$ would measure information of objects more accurately and increase the quality and accuracy of the achieved results. Moreover, motivated by previous research on IFS that tended to combine some basic distance measures into a complex one to improve generality and accuracy, in this paper we propose a novel generalized picture distance measure and use it in a new clustering method on PFS called Hierarchical Picture Clustering (HPC). The reason for designing a new measure can be illustrated by an example as follows. Consider that we would like to measure the truth-value of the proposition G = "through a point exterior to a line one can draw only one parallel to the given line". The proposition is incomplete, since it does not specify the type of geometrical space it belongs to. In a Euclidean geometric space the proposition G is true; in a Riemannian geometric space the proposition G is false (since there is no parallel passing through an exterior point to a given line); in a geometric space covering the PFS set (constructed from mixed spaces, for example from a part of a Euclidean subspace together with another part of a Riemannian space) the proposition G is indeterminate (true and false at the same time) [48]. This suggests that objects, notions, ideas, etc. can be better measured in PFS than in other types of fuzzy sets.
The main differences between the proposed distance measure and $d_P(A,B)$, as well as those on IFS such as in Xu [44], are highlighted as follows.
Firstly, as shown above, $d_P(A,B)$ is a natural expansion of the well-known Minkowski distance of order $p \ge 1$ between two points under fuzzy environments. When $p = 1$ or $p = 2$, we have the Manhattan and Euclidean distances, respectively. In the limiting case of $p$ reaching infinity, we obtain the Chebyshev distance. The Minkowski distance has the best performance for numerical data but works ineffectively with asymmetric binary variables, non-metric vector objects, etc. [20]. For example, the similarity between two vectors can be denoted as a cosine measure, which is further used to define a distance [48]. For asymmetric binary variables, the contingency table, which reflects the matching states between two objects, is used to compute the distance [25]. Very often a non-linear function is adopted as the distance metric for processing non-spherical data [9]. One of the most common ways to create such a function is to combine the basic distance measures into a complex one so that the deficiencies of the standalone metrics are settled. This intuition leads to the proposed measure, which may enhance the performance and accuracy of results.
Secondly, the proposed measure is a combination of the Hamming, Euclidean and Hausdorff distances. It is different from $d_P(A,B)$, which in essence is the normalized form of the well-known Minkowski distance of order $p \ge 1$. In the next section, we will explain why the hybridization should be made and emphasize the advantages and disadvantages of using the proposed measure. However, it is noted that the proposed distance measure is a generalized version of $d_P(A,B)$.
Thirdly, the proposed distance measure is different from those on IFS, such as in Xu [44], in many aspects. Let us take some examples. In Ref. [44], Xu generalized the intuitionistic Hamming and Euclidean distances of Szmidt and Kacprzyk [28] as below:
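\[ d(A,B) = \left( \frac{1}{2N} \sum_{i=1}^{N} \left( |\mu_A(x_i) - \mu_B(x_i)|^{\alpha} + |\gamma_A(x_i) - \gamma_B(x_i)|^{\alpha} + |\pi_A(x_i) - \pi_B(x_i)|^{\alpha} \right) \right)^{1/\alpha}. \]
He then defined several similarity measures from the above distance function, for instance:
\[ s_1(A,B) = 1 - \left( \frac{1}{2N} \sum_{i=1}^{N} \left( |\mu_A(x_i) - \mu_B(x_i)|^{\alpha} + |\gamma_A(x_i) - \gamma_B(x_i)|^{\alpha} + |\pi_A(x_i) - \pi_B(x_i)|^{\alpha} \right) \right)^{1/\alpha}, \]
\[ s_2(A,B) = 1 - \left( \frac{\sum_{i=1}^{N} \left( |\mu_A(x_i) - \mu_B(x_i)|^{\alpha} + |\gamma_A(x_i) - \gamma_B(x_i)|^{\alpha} + |\pi_A(x_i) - \pi_B(x_i)|^{\alpha} \right)}{\sum_{i=1}^{N} \left( |\mu_A(x_i) + \mu_B(x_i)|^{\alpha} + |\gamma_A(x_i) + \gamma_B(x_i)|^{\alpha} + |\pi_A(x_i) + \pi_B(x_i)|^{\alpha} \right)} \right)^{1/\alpha}. \]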
Even though $d(A,B)$ is quite similar to $d_P(A,B)$, we recognize that $d(A,B)$ is designed on the basis of IFS, which means $\mu_A(x) + \gamma_A(x) + \pi_A(x) = 1$ with $\pi_A(x)$ being the hesitancy degree, while $d_P(A,B)$ is the distance on PFS satisfying $0 \le \mu_A(x) + \eta_A(x) + \gamma_A(x) \le 1$. Indeed, it is not intuitive and logical to take the difference between $\pi_A(x)$ and $\pi_B(x)$, since these values can be calculated through the other degrees. In other words, although $d(A,B)$ is expressed as a function of three components, it turns out that $d(A,B)$ depends on two variables. This is different from $d_P(A,B)$, which is measured by three separate degrees. Thus, $d(A,B)$ is different from $d_P(A,B)$ and certainly much different from the proposed (hybrid) measure. Again, in Ref. [39] Xu et al. proposed two intuitionistic fuzzy similarity measures for spectral clustering based on the minimum operator between the membership and non-membership degrees of IFS. Those similarity measures are defined based on the standard intuitionistic Hamming, Euclidean and Hausdorff distances. An overview of distance and similarity measures of IFS given by Xu and Chen [41] affirmed that most of the relevant works in IFS pay much attention to the similarity degrees based on three basic distance functions, namely the intuitionistic Hamming, Euclidean and Hausdorff distances. This analysis points out the difference between the proposed distance measure and those on IFS, and hence its novelty.
Having defined the generalized picture distance measure, we apply it to a new clustering method called Hierarchical Picture Clustering (HPC). It uses a simpler strategy and is easier to implement than the intuitionistic fuzzy clustering methods [36–38,39,42]. For instance, Xu et al. [42] proposed intuitionistic clustering using association coefficients of IFS to construct an association matrix, which is then transformed into an equivalent association matrix. Based on the λ-cutting matrix of the equivalent association matrix, clusters of IFSs are then determined. Xu et al. [39] defined two intuitionistic fuzzy similarity measures for constructing an intuitionistic fuzzy similarity measure matrix used by a spectral algorithm to cluster intuitionistic fuzzy data. The un-normalized graph Laplacian and eigenvectors were used to cluster the samples in spectral clustering. Wang et al. [36] presented a netting method to make clustering analysis of IFSs via the intuitionistic fuzzy similarity matrix. Wang et al. [37] proposed the intuitionistic fuzzy square product, which is transformed to the intuitionistic fuzzy similarity matrix for direct intuitionistic fuzzy clustering based on a confidence level. Those algorithms are mostly complex and time-consuming since they first construct the intuitionistic fuzzy similarity matrix and then use either an exhaustive iterative strategy to get the equivalent association matrix [42] or a complex calculation through the graph Laplacian [39], the netting method [36], etc. Meanwhile, HPC relies solely on the generalized picture distance measure and a hierarchical clustering scheme for the classification of PFSs. HPC therefore has the advantages of simple processing and an intuitive procedure. But more than that, HPC provides a way to deal with PFS data, which were not investigated by the existing intuitionistic fuzzy clustering algorithms. As mentioned earlier, there are many events and phenomena that are represented by the PFS set. When facing those data, clustering algorithms on IFS work ineffectively since they do not take into account the refusal/neutral information. Combining the refusal and neutral degrees in IFS would lose information; for example, take a PFS A = {(x, 0.3, 0, 0.1); (y, 0.4, 0.1, 0.1)} and an IFS B = {(x, 0.3, 0.1); (y, 0.4, 0.1)}. IFS regards the neutral values of x and y as being 0.6 and 0.5, respectively. Yet, in fact the most dominant part in the neutral values of IFS is the refusal degree. The observation in A reveals that the "real" neutral and refusal degrees of x are 0 and 0.6, while those of y are 0.1 and 0.4, respectively. Thus, it is misleading to use clustering algorithms on IFS for dealing with PFS data. In short, we clearly recognize the role and advantages of HPC in comparison with the relevant clustering algorithms on IFS. We do not compare the clustering qualities of those algorithms since they are designed on different base sets. However, we would like to emphasize the simplicity and the first appearance of a clustering algorithm on PFS, which is the main contribution of this paper.
The rest of the paper is organized as follows. Section 2 presents the generalized picture distance measure and the HPC algorithm. Section 3 validates the proposed algorithm by experiments. Section 4 draws the conclusions and delineates future research directions.
2 The proposed methodology
In this section, we first introduce the definition of the generalized picture distance measure and then present a novel hierarchical picture fuzzy clustering (HPC).
Definition 2. A function $d(A,B)$ with $A, B \in PFS(X)$ is called a picture distance measure if it satisfies:
\[ 0 \le d(A,B) \le 1, \]
\[ d(A,B) = 0 \Leftrightarrow A = B, \]
\[ d(A,B) = d(B,A), \]
\[ \alpha_{AB} \times d(A,B) + \alpha_{AC} \times d(A,C) \ge \alpha_{BC} \times d(B,C), \quad \forall A, B, C \in PFS(X), \]
where the symbol "×" is the arithmetical product, and $\alpha_{AB}$, $\alpha_{BC}$ and $\alpha_{AC}$ are composition operations of $A, B, C \in PFS(X)$. As an example, the following min-max composition formulae are used to calculate the triple $(\alpha_{AB}, \alpha_{AC}, \alpha_{BC})$ from the membership functions of $A, B, C \in PFS(X)$:
\[ \alpha_{AB} = \min_i \left\{ \max \left\{ \mu_A(x_i), \mu_B(x_i) \right\} \right\}, \]
\[ \alpha_{BC} = \min_i \left\{ \max \left\{ \mu_B(x_i), \mu_C(x_i) \right\} \right\}, \]
\[ \alpha_{AC} = \min_i \left\{ \max \left\{ \mu_A(x_i), \mu_C(x_i) \right\} \right\}. \]
The aim of those formulae is to specify fuzzy coefficients $\alpha_{AB}, \alpha_{AC}, \alpha_{BC} \in [0,1]$ for the fuzzy triangular inequality as in the 4th property of this definition. Besides the min-max, some typical compositions such as max-prod, the Lukasiewicz t-norm, etc. can be used accordingly. A geometrical representation of the 4th property is given in Fig. 1. It is clear from the figure that a new fuzzy representation of AB is A'B', which is bounded in a fuzzy domain called Area 1 satisfying $d(A'B') = \alpha_{AB} \times d(A,B)$. Then, there exist fuzzy representations of AC and BC, namely A'C' and B'C', that belong to equivalent fuzzy domains, Area 2 and Area 3, respectively. Thus, if there exists a triple $(\alpha_{AB}, \alpha_{AC}, \alpha_{BC})$ such that the 4th property holds, then $d(A,B)$ is a picture distance measure. This implies that the distance measure is constructed on a fuzzy space. In the equivalent articles on fuzzy sets and topology, Zadeh and coworkers [5,15,18,19,24,32,38,49,51] suggested that the triangular inequality for a metric should be fuzzified by membership degrees so that the conditions and properties of fuzzy topology hold. This can be regarded as a soft version of the metric definition in a hard space.
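Definition 3. The function below is a generalized picture distance measure between $A, B \in PFS(X)$:
\[
d_G(A,B) = \frac{\left( \frac{1}{N} \sum_{i=1}^{N} \left( \Delta\mu_i^p + \Delta\eta_i^p + \Delta\gamma_i^p \right) \right)^{1/p} + \left( \max_i \left\{ \Delta\mu_i^p, \Delta\eta_i^p, \Delta\gamma_i^p \right\} \right)^{1/p}}{\left( \frac{1}{N} \sum_{i=1}^{N} \left( \Delta\mu_i^p + \Delta\eta_i^p + \Delta\gamma_i^p \right) \right)^{1/p} + \left( \max_i \left\{ \Delta\mu_i^p, \Delta\eta_i^p, \Delta\gamma_i^p \right\} \right)^{1/p} + \max_i \left\{ \Phi_i^A, \Phi_i^B \right\} + \left( \frac{1}{N} \sum_{i=1}^{N} \left| \Phi_i^A - \Phi_i^B \right|^p \right)^{1/p} + 1},
\]
where
\[ \Delta\mu_i = |\mu_A(x_i) - \mu_B(x_i)|, \quad (i = 1, \ldots, N), \]
\[ \Delta\eta_i = |\eta_A(x_i) - \eta_B(x_i)|, \quad (i = 1, \ldots, N), \]
\[ \Delta\gamma_i = |\gamma_A(x_i) - \gamma_B(x_i)|, \quad (i = 1, \ldots, N), \]
\[ \Phi_i^A = |\mu_A(x_i) + \eta_A(x_i) + \gamma_A(x_i)|, \quad (i = 1, \ldots, N), \]
\[ \Phi_i^B = |\mu_B(x_i) + \eta_B(x_i) + \gamma_B(x_i)|, \quad (i = 1, \ldots, N). \]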
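The generalized measure of Definition 3 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a PFS is represented as a list of `(mu, eta, gamma)` tuples, the function name `generalized_picture_distance` is hypothetical, and the formula is read as a ratio whose numerator adds a Minkowski-type term and a maximum term, and whose denominator adds to these the largest degree sum, the Minkowski-type difference of the degree sums, and 1.

```python
def generalized_picture_distance(A, B, p=1):
    """Sketch of d_G(A, B) for PFSs given as lists of (mu, eta, gamma)."""
    if len(A) != len(B) or not A:
        raise ValueError("A and B must be non-empty and of equal length")
    n = len(A)
    # |mu_A - mu_B|^p, |eta_A - eta_B|^p, |gamma_A - gamma_B|^p for every x_i
    deltas = [abs(a[k] - b[k]) ** p for a, b in zip(A, B) for k in range(3)]
    phi_a = [abs(sum(a)) for a in A]  # degree sums of A
    phi_b = [abs(sum(b)) for b in B]  # degree sums of B
    minkowski = (sum(deltas) / n) ** (1 / p)
    largest = max(deltas) ** (1 / p)
    phi_max = max(phi_a + phi_b)
    phi_gap = (sum(abs(x - y) ** p for x, y in zip(phi_a, phi_b)) / n) ** (1 / p)
    numerator = minkowski + largest
    return numerator / (numerator + phi_max + phi_gap + 1)


A = [(0.3, 0.0, 0.1), (0.4, 0.1, 0.1)]
B = [(0.5, 0.1, 0.2), (0.4, 0.1, 0.1)]
print(generalized_picture_distance(A, B, p=1))
```

Because the denominator always exceeds the numerator by at least 1, the value stays in [0, 1), and it is 0 exactly when all three degrees coincide for every element, which matches the first two properties of Definition 2.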
Remarks.
1) $d_G(A,B)$ is a hybrid measure of the well-known Hamming, Euclidean and Hausdorff distances. Specifically, when $p = 1$, we have the hybrid between the Hausdorff and Hamming measures. When $p = 2$, a hybrid of the Hausdorff and Euclidean distances is obtained.
2) $d_G(A,B)$ is not a trivial hybridization of the existing measures, in the sense that it does not physically mix those measures together without taking care of their meaning and contexts. In fact, $d_G(A,B)$ has been designed on the basis of the picture fuzzy set represented in the form of the membership values $\Delta\mu_i$, $\Delta\eta_i$, $\Delta\gamma_i$, $\Phi_i^A$ and $\Phi_i^B$. It is regarded as a generalization of $d_P(A,B)$, which is the basic picture distance measure of Cuong and Kreinovich [6], obtained by integrating other measures such as the Hamming, Euclidean and Hausdorff distances.
3) The reasons for the hybridization in $d_G(A,B)$ can be explained as follows. Note that the basic picture distance measure of Cuong and Kreinovich relies on the Hamming ($p = 1$) and Euclidean ($p = 2$) distances, which were shown to have limitations in dealing with non-spherical datasets [5,7,8,12,13,28–30]. They assume that sample points are distributed around the sample mean in a spherical manner; for non-spherical distributions, however, the probability of a test point belonging to the set depends not only on the distance from the sample mean but also on the direction [35,41,47]. Meanwhile, the Hausdorff metric measures how far two subsets of a metric space are from each other. It turns the set of non-empty compact subsets of a metric space into a metric space in its own right. Thus, the Hausdorff distance has the advantage of being sensitive to position [40,41,45]. Another important advantage of the Hausdorff distance is the possibility of separately using dissimilarity measures between one object and a part of another [46]. Therefore, combining the Hausdorff distance with the Hamming and Euclidean measures in a generalized picture distance measure as in $d_G(A,B)$ would achieve the advantages of each measure as well as increase the performance.
4) $d_G(A,B)$ is applicable to a large class of problems. As demonstrated in Definition 3, $d_G(A,B)$ is computed through the degrees of PFS ($\Delta\mu_i$, $\Delta\eta_i$, $\Delta\gamma_i$, $\Phi_i^A$ and $\Phi_i^B$), which are appropriate for PFS data. Nonetheless, other types of crisp data, e.g., numerical data, categorical data and images, can also be used with this measure with the support of a fuzzification process. For instance, an image is first extracted into feature records, which are then fuzzified by the Gaussian membership function to make fuzzy data. Note that each degree in PFS would have a different membership function, so that we obtain values of the degrees for each record. The next process is then done with the obtained PFS data as in Definition 3. Other types of data can be handled analogously. This remark shows the generality of the proposed measure.
Theorem 1. $d_G(A,B)$ is a picture distance measure.
Proof. From Definition 2, it is obvious that the generalized picture distance satisfies the first three conditions. For the last condition regarding the triangular inequality in PFS, we have to prove the existence of a triple $(\alpha_{AB}, \alpha_{AC}, \alpha_{BC})$ so that the condition holds. Since we work on PFS, whose data elements have associated membership values, each distance measure between two sets in PFS should be accompanied by a composition function of those membership values. As such, the last condition is often named the (picture) fuzzy triangular inequality or soft triangular inequality. In this proof, we will show that there exist discrete values for the triple $(\alpha_{AB}, \alpha_{AC}, \alpha_{BC})$. Consider $p = 1$. For $A, B \in PFS(X)$, let us denote:
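\[ AB_1 = \sum_{i=1}^{N} \left( \Delta\mu_i + \Delta\eta_i + \Delta\gamma_i \right), \]
\[ AB_2 = \max_i \left\{ \Delta\mu_i, \Delta\eta_i, \Delta\gamma_i \right\}, \]
\[ AB_3 = \max_i \left\{ \Phi_i^A, \Phi_i^B \right\}, \]
\[ AB_4 = \sum_{i=1}^{N} \left| \Phi_i^A - \Phi_i^B \right|. \]
The following inequality needs to be proven:
\[ \frac{AB_1 + AB_2}{AB_1 + AB_2 + AB_3 + AB_4 + 1} + \frac{AC_1 + AC_2}{AC_1 + AC_2 + AC_3 + AC_4 + 1} \ge \frac{BC_1 + BC_2}{3 \left( BC_1 + BC_2 + BC_3 + BC_4 + 1 \right)}. \]
The facts below come from the definition of PFS:
\[ |\mu_A(x_i) - \mu_B(x_i)| + |\mu_A(x_i) - \mu_C(x_i)| \ge |\mu_B(x_i) - \mu_C(x_i)|, \]
\[ |\eta_A(x_i) - \eta_B(x_i)| + |\eta_A(x_i) - \eta_C(x_i)| \ge |\eta_B(x_i) - \eta_C(x_i)|, \]
\[ |\gamma_A(x_i) - \gamma_B(x_i)| + |\gamma_A(x_i) - \gamma_C(x_i)| \ge |\gamma_B(x_i) - \gamma_C(x_i)|. \]
It follows that
\[ AB_1 + AC_1 \ge BC_1, \]
\[ AB_2 + AC_2 \ge BC_2. \]
For the second inequality, assume:
\[ \max \left\{ |\mu_B(x) - \mu_C(x)|, |\eta_B(x) - \eta_C(x)|, |\gamma_B(x) - \gamma_C(x)| \right\} = |\mu_B(x) - \mu_C(x)|. \]
Then,
\[ |\mu_B(x) - \mu_C(x)| \le |\mu_A(x) - \mu_B(x)| + |\mu_A(x) - \mu_C(x)| \]
\[ \le \max \left\{ |\mu_A(x) - \mu_B(x)|, |\eta_A(x) - \eta_B(x)|, |\gamma_A(x) - \gamma_B(x)| \right\} + \max \left\{ |\mu_A(x) - \mu_C(x)|, |\eta_A(x) - \eta_C(x)|, |\gamma_A(x) - \gamma_C(x)| \right\}. \]
Consequently,
\[ \frac{BC_1 + BC_2}{BC_1 + BC_2 + BC_3 + BC_4 + 1} \le \frac{AB_1 + AB_2 + AC_1 + AC_2}{AB_1 + AB_2 + AC_1 + AC_2 + BC_3 + BC_4 + 1}. \quad (*) \]
If one of the facts below happens,
\[ \max_i \left\{ \Phi_i^A, \Phi_i^B, \Phi_i^C \right\} = \Phi_j^B, \]
\[ \max_i \left\{ \Phi_i^A, \Phi_i^B, \Phi_i^C \right\} = \Phi_j^C, \]
then $BC_3 \ge AB_3$. Again, if $\max_i \left\{ \Phi_i^A, \Phi_i^B, \Phi_i^C \right\} = \Phi_j^A$, then
\[ \max_i \left\{ \Phi_i^A, \Phi_i^B \right\} - \max_i \left\{ \Phi_i^B, \Phi_i^C \right\} \le \max_i \left\{ \Phi_i^A - \Phi_i^B, \Phi_i^B - \Phi_i^C \right\} \le \max_i \left\{ \Phi_i^A - \Phi_i^B + \Phi_i^B - \Phi_i^C \right\} = \max_i \left\{ \Phi_i^A - \Phi_i^C \right\} \le 3 AC_2, \]
that is,
\[ AB_3 - BC_3 \le 3 AC_2. \]
Analogously, we achieve
\[ BC_4 \ge AB_4, \]
or
\[ 3 AC_2 + BC_4 \ge AB_4. \]
Combining (*) with the bounds on $AB_3$ and $AB_4$ above, the inequality is proven. Thus, $d_G(A,B)$ is a picture distance measure.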
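Definition 4. The average picture set of $A_i \in PFS(X)$ (index $i = 1, \ldots, N$) is denoted as $AVG(A_i)$,
\[ AVG(A_i) = \left\{ \left\langle x, \frac{1}{N} \sum_{i=1}^{N} \mu_i(x), \frac{1}{N} \sum_{i=1}^{N} \eta_i(x), \frac{1}{N} \sum_{i=1}^{N} \gamma_i(x) \right\rangle \mid x \in X \right\}, \]
where $\mu_i(x)$, $\eta_i(x)$, $\gamma_i(x)$ are the positive, neutral and negative membership degrees of $A_i$, respectively.
Definition 5. The Picture Distance Matrix (PDM) of $A_i \in PFS(X)$ (index $i = 1, \ldots, N$) is a similarity matrix of size $N \times N$ where each element is computed by Definition 3.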
The HPC Algorithm:
Step 1: Given a collection of $A_i \in PFS(X)$ (index $i = 1, \ldots, N$), consider each $A_i$ as a unique cluster.
Step 2: Calculate the Picture Distance Matrix (PDM).
Step 3: Merge the two nearest PFS sets (those with the smallest distance in the PDM) and calculate the new centers by Definition 4. Notice that only two clusters are joined in each stage.
Step 4: Repeat Step 2 with $A_i$ being replaced with the new centers until the desired number of clusters is achieved.
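The four steps above can be sketched as a short agglomerative loop. The code below is a hedged illustration rather than the authors' implementation: the names `picture_distance`, `average_set` and `hpc` are hypothetical, a PFS is assumed to be a list of `(mu, eta, gamma)` tuples, the distance follows the reading of Definition 3 used in this text, exactly one pair is merged per iteration, and each new center is taken as the average (Definition 4) of all original sets in the merged cluster.

```python
def picture_distance(A, B, p=1):
    """Generalized picture distance (one reading of Definition 3)."""
    n = len(A)
    deltas = [abs(a[k] - b[k]) ** p for a, b in zip(A, B) for k in range(3)]
    phi_a = [abs(sum(a)) for a in A]
    phi_b = [abs(sum(b)) for b in B]
    top = (sum(deltas) / n) ** (1 / p) + max(deltas) ** (1 / p)
    gap = (sum(abs(x - y) ** p for x, y in zip(phi_a, phi_b)) / n) ** (1 / p)
    return top / (top + max(phi_a + phi_b) + gap + 1)


def average_set(sets):
    """Average picture set (Definition 4) of equally sized PFSs."""
    n = len(sets)
    return [tuple(sum(s[i][k] for s in sets) / n for k in range(3))
            for i in range(len(sets[0]))]


def hpc(data, n_clusters, p=1):
    """Merge the closest pair of centers until n_clusters remain."""
    if not 1 <= n_clusters <= len(data):
        raise ValueError("n_clusters must be between 1 and len(data)")
    clusters = [[i] for i in range(len(data))]  # Step 1: singletons
    centers = [list(a) for a in data]
    while len(clusters) > n_clusters:
        # Step 2: scan the PDM of the current centers for the minimum
        _, i, j = min((picture_distance(centers[i], centers[j], p), i, j)
                      for i in range(len(centers))
                      for j in range(i + 1, len(centers)))
        merged = clusters[i] + clusters[j]      # Step 3: join two clusters
        keep = [k for k in range(len(clusters)) if k not in (i, j)]
        clusters = [clusters[k] for k in keep] + [merged]
        centers = [centers[k] for k in keep] + \
                  [average_set([data[m] for m in merged])]
    return clusters                             # Step 4: loop ends here


data = [[(0.5, 0.1, 0.1)], [(0.5, 0.1, 0.2)], [(0.1, 0.1, 0.7)]]
print(hpc(data, n_clusters=2))
```

In this toy input the first two sets differ only slightly in the negative degree, so they are merged first and the third set remains a cluster of its own.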
3 Evaluation
In this section, we aim to validate whether the new metric can accurately measure data elements of the picture fuzzy set $A_i \in PFS(X)$. Even though there exist many extensions of the classical Fuzzy C-Means (FCM) in the literature that use the Euclidean, Hamming or Mahalanobis distances for obtaining clusters of spherical or elliptic geometrical form, they were not designed to work on the PFS set, which contains the information of positive, negative and neutral as in Definition 1. Therefore, in order to classify PFS elements $A_i \in PFS(X)$, we should use the basic and generalized picture distance measures, $d_P(A,B)$ and $d_G(A,B)$, in a hierarchical clustering algorithm like HPC. This section compares those measures in terms of the performance of clustering algorithms. Therefore, we have implemented the HPC algorithm with $d_G(A,B)$ in addition to a variant of HPC using $d_P(A,B)$ called CK. The Intuitionistic Hierarchical Clustering (IHC) algorithm [43] has been implemented to evaluate the clustering quality of HPC and CK.
The experimental data consist of 4 datasets. The first one, Guangzhou car [50], described in Table 1, is a small dataset consisting of 5 new cars in the Guangzhou market evaluated by 6 criteria: Fuel (G1), Aerod (G2), Price (G3), Comfort (G4), Design (G5) and Safety (G6). The data of each car for a given criterion consist of three components representing the positive, the neutral and the negative degrees. The sum of the neutral and the negative degrees in Table 1 is the non-membership value in Ref. [50]. The second one, Building materials [40], shown in Table 2, is another small dataset that has 5 building materials, namely Sealant, Floor varnish, Wall paint, Carpet and Chloride flooring, characterized by 8 attributes. The sum of the neutral and the negative degrees in Table 2 is the non-membership value in Ref. [40]. The aim of this dataset is to validate the algorithms on a dataset having a larger number of attributes than that of the Guangzhou car dataset. The third one, Heart Disease [34], part of which is shown in Table 3, is a real large dataset from the UCI Machine Learning Repository consisting of 270 patients with heart disease, characterized by 3 attributes: Age (3#Age), Blood pressure (mmHg/patient, 10#Trestbps) and heart rate (#32 Thalach). The positive, the neutral and the negative degrees are fuzzified from crisp data using Gaussian, triangular and trapezoid membership functions. The aim of this dataset is to validate the algorithms on a dataset having a larger number of objects than that of the Guangzhou car dataset. Lastly, Forest CoverType [33] is a large dataset extracted from the UCI Machine Learning Repository that includes 1000 instances in 10 dimensions showing the actual forest cover type for a given observation (30 × 30 m cell) determined from the US Forest Service (USFS) Region 2 Resource Information System (RIS). The positive, the neutral and the negative degrees are fuzzified from crisp data using Gaussian, triangular and trapezoid membership functions. The aim of this dataset is to validate the algorithms on a large dataset having both the number of objects and the number of attributes greater than those of the Guangzhou car dataset.
In order to evaluate the clustering qualities of the algorithms, we use NMI (Normalized Mutual Information), F-Measure and Purity. For all of these evaluation indices, the larger the value, the better.
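\[ NMI = \frac{\sum_{j=1}^{k} \sum_{i=1}^{r} n_{ij} \log \frac{n \times n_{ij}}{n_i \times n_j}}{\sqrt{\left( \sum_{i=1}^{r} n_i \log \frac{n_i}{n} \right) \left( \sum_{j=1}^{k} n_j \log \frac{n_j}{n} \right)}}, \]
\[ Precision_i = \frac{1}{n_i} \max_{j=1}^{k} \left\{ n_{ij} \right\}, \quad (i = 1, \ldots, r), \]
\[ Recall_i = \frac{1}{n_{j^*}} \max_{j=1}^{k} \left\{ n_{ij} \right\}, \quad (i = 1, \ldots, r), \]
\[ j^* = \arg\max_{j=1}^{k} \left\{ n_{ij} \right\}, \quad (j^* \in [1, k]), \]
\[ F_i = \frac{2 \times Precision_i \times Recall_i}{Precision_i + Recall_i}, \]
\[ \text{F-Measure} = \frac{1}{r} \sum_{i=1}^{r} F_i, \]
\[ Purity = \frac{1}{n} \sum_{i=1}^{r} \max_{j=1}^{k} \left\{ n_{ij} \right\}, \]
where
• $T = \{T_1, \ldots, T_k\}$ and $C = \{C_1, \ldots, C_r\}$ are the $k$ correct and $r$ predicted clusters, respectively.
• $n$ is the total number of data points.
• $n_{ij} = |C_i \cap T_j|$: the common number of data points between $C_i$ and $T_j$ ($i = 1, \ldots, r$; $j = 1, \ldots, k$).
• $n_i = \sum_{j=1}^{k} n_{ij}$: the number of data points of $C_i$ ($i = 1, \ldots, r$).
• $n_j = \sum_{i=1}^{r} n_{ij}$: the number of data points of $T_j$ ($j = 1, \ldots, k$).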
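The three indices can be computed directly from the contingency counts $n_{ij}$. The sketch below is an illustration under assumptions: the function name `clustering_scores` is hypothetical, the NMI denominator is taken as the square root of the product of the two entropy-like sums (a common normalization, assumed here), zero counts are skipped in the logarithms, and every predicted cluster is assumed to be non-empty.

```python
import math


def clustering_scores(table):
    """Return (NMI, F-Measure, Purity) from table[i][j] = |C_i ∩ T_j|."""
    r, k = len(table), len(table[0])
    n_i = [sum(row) for row in table]                              # sizes of C_i
    n_j = [sum(table[i][j] for i in range(r)) for j in range(k)]   # sizes of T_j
    n = sum(n_i)

    purity = sum(max(row) for row in table) / n

    f_values = []
    for i, row in enumerate(table):
        best = max(row)
        j_star = row.index(best)          # correct cluster matched to C_i
        precision = best / n_i[i]
        recall = best / n_j[j_star]
        f_values.append(2 * precision * recall / (precision + recall))
    f_measure = sum(f_values) / r

    mutual = sum(table[i][j] * math.log(n * table[i][j] / (n_i[i] * n_j[j]))
                 for i in range(r) for j in range(k) if table[i][j] > 0)
    h_c = sum(x * math.log(x / n) for x in n_i if x > 0)
    h_t = sum(x * math.log(x / n) for x in n_j if x > 0)
    nmi = mutual / math.sqrt(h_c * h_t) if h_c * h_t > 0 else 0.0
    return nmi, f_measure, purity


print(clustering_scores([[2, 0], [0, 3]]))  # predicted clusters match the correct ones
print(clustering_scores([[1, 1], [1, 1]]))  # predicted clusters carry no information
```

A perfect agreement between predicted and correct clusters gives the best value for all three indices, while a completely uninformative partition drives NMI to zero, which is consistent with "the larger the better".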
Firstly, we illustrate the activities of the HPC algorithm to classify the Guangzhou car dataset in Table 1. In the first phase, each car in the dataset is a unique cluster:
{Car1}, {Car2}, {Car3}, {Car4}, {Car5}.
The PDM of Phase 1 is:
\[ PDM_1 = \begin{pmatrix}
0 & 0.2908 & 0.3275 & 0.2560 & 0.3406 \\
0.2908 & 0 & 0.3036 & 0.2912 & 0.3286 \\
0.3275 & 0.3036 & 0 & 0.3581 & 0.3062 \\
0.2560 & 0.2912 & 0.3581 & 0 & 0.3767 \\
0.3406 & 0.3286 & 0.3062 & 0.3767 & 0
\end{pmatrix}. \]
Because $d(Car1, Car4) = 0.2560$ is the minimal value among all distances, Car1 and Car4 are grouped into a cluster. Removing all distance values related to Car1 and Car4, we recognize that $d(Car2, Car3) = 0.3036$ is the minimal value among the remaining ones. Thus, Car2 and Car3 are merged into another cluster. The results of the second phase are:
{Car1, Car4}, {Car2, Car3}, {Car5}.
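The pairing performed in this phase can be reproduced from the PDM values alone. The sketch below encodes one reading of the worked example (an assumption, since the algorithm steps only state that two clusters are joined at a time): repeatedly take the smallest remaining distance, pair those two items, and drop every distance that involves them. The function name `pair_up` is hypothetical, and cars are indexed from 0, so index 0 is Car1.

```python
PDM1 = [
    [0, 0.2908, 0.3275, 0.2560, 0.3406],
    [0.2908, 0, 0.3036, 0.2912, 0.3286],
    [0.3275, 0.3036, 0, 0.3581, 0.3062],
    [0.2560, 0.2912, 0.3581, 0, 0.3767],
    [0.3406, 0.3286, 0.3062, 0.3767, 0],
]


def pair_up(pdm):
    """Greedily pair items by smallest distance; return (pairs, leftovers)."""
    free = set(range(len(pdm)))
    pairs = []
    while len(free) >= 2:
        # Smallest distance among items that are not yet paired
        d, i, j = min((pdm[i][j], i, j) for i in free for j in free if i < j)
        pairs.append((i, j, d))
        free -= {i, j}
    return pairs, sorted(free)


pairs, leftovers = pair_up(PDM1)
print(pairs, leftovers)
```

Run on the Phase 1 matrix, the first pair is (Car1, Car4) and the second is (Car2, Car3), with Car5 left as a singleton, in line with the results of the second phase stated above.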
Using Definition 4, the centers of {Car1, Car4} and {Car2, Car3} are:

Criterion | {Car1, Car4} | {Car2, Car3}
Fuel | (…, 0.3) | (…, 0.25, 0.15)
Aerod | (0.5, 0.05, 0.05) | (0.3, 0.3, 0.2)
Price | (0.65, 0.05, 0.1) | (0.35, 0.1, 0.25)
Comfort | (0.8, 0.05, 0.05) | (0.15, 0.1, 0.25)
Design | (0.15, 0.2, 0.35) | (0.3, 0.35, 0.25)
Safety | (0.6, 0.165, 0.085) | (0.5, 0.2, 0.15)
Next, we calculate the PDM of Phase 2:
\[ PDM_2 = \begin{pmatrix}
0 & 0.2887 & 0.3485 \\
0.2887 & 0 & 0.2958 \\
0.3485 & 0.2958 & 0
\end{pmatrix}. \]
Since $d(\{Car1, Car4\}, \{Car2, Car3\}) = 0.2887$ is the smallest value among all distances in $PDM_2$, we combine those cars into a cluster. The results of the third phase are:
{Car1, Car4, Car2, Car3}, {Car5}.
The center of cluster {Car1, Car4, Car2, Car3} is:

Criterion | {Car1, Car4, Car2, Car3}
Fuel | (…, 0.2, 0.225)
Aerod | (0.4, 0.175, 0.125)
Price | (0.5, 0.075, 0.175)
Comfort | (0.475, 0.075, 0.15)
Design | (0.225, 0.275, 0.3)
Safety | (0.55, 0.1825, 0.1175)
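The Phase 3 center can be checked against the two Phase 2 centers with a short calculation. This sketch makes two assumptions that should be kept in mind: the first listed center is attributed to {Car1, Car4} and the second to {Car2, Car3} (the average does not depend on this order), and averaging the two centers equals the Definition 4 average over all four cars only because both clusters contain two cars. The Fuel criterion is left out because its values are incomplete in the text; the name `average_centers` is hypothetical.

```python
center_14 = {
    "Aerod": (0.5, 0.05, 0.05),
    "Price": (0.65, 0.05, 0.1),
    "Comfort": (0.8, 0.05, 0.05),
    "Design": (0.15, 0.2, 0.35),
    "Safety": (0.6, 0.165, 0.085),
}
center_23 = {
    "Aerod": (0.3, 0.3, 0.2),
    "Price": (0.35, 0.1, 0.25),
    "Comfort": (0.15, 0.1, 0.25),
    "Design": (0.3, 0.35, 0.25),
    "Safety": (0.5, 0.2, 0.15),
}


def average_centers(c1, c2):
    """Component-wise mean of two centers with the same criteria."""
    return {name: tuple((a + b) / 2 for a, b in zip(c1[name], c2[name]))
            for name in c1}


merged = average_centers(center_14, center_23)
for name, degrees in merged.items():
    print(name, tuple(round(v, 4) for v in degrees))
```

The rounded output can be compared row by row with the center listed above for the merged cluster.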
The PDM of Phase 3 is:
\[ PDM_3 = 0.3110. \]
Lastly, in the fourth phase, all cars are grouped into a unique cluster. A hierarchical tree for the classification of the Guangzhou car dataset using the HPC algorithm is shown in Fig. 2. If we compute the average values of the positive, the neutral and the negative memberships of all cars and group them by phases, then we get the results in Table 4.
In order to visualize the clustering results, we use Principal Component Analysis (PCA), which is a well-known method in statistics, to reduce the dimensions of the data in Table 4 and get the results in Table 5. 2D distributions of data points and centers of all phases are also depicted in Figs. 3–6.
Secondly, we compare the clustering qualities of HPC and CK through evaluation indices on the experimental datasets. The results on the Guangzhou car dataset are shown in Tables 6–9.
The results show that the clustering quality of HPC is better than that of CK. Moreover, as illustrated in Figs. 2 and 7 and Table 6, the hierarchical tree of IHC is identical to that of HPC. This means that using the generalized picture distance measure in clustering algorithms results in better quality than using the basic picture distance.
Analogously, we made the comparison on the other datasets and achieved the results in Tables 10–19. The values in these tables affirm the efficiency of HPC even in cases where the number of attributes or the number of objects is higher than that of Guangzhou car. This shows that using the generalized picture distance measure would result in accurate calculations of similarity between objects.
In Figs. 8 and 9, we illustrate the hierarchical tree of HPC for building materials and a clustering tool called HPCS, a kind of knowledge-based system to assist clustering on PFS datasets, respectively.
Fig. 2. The hierarchical tree of HPC for Guangzhou car.
Fig. 5. The distributions of data and centers in Phase 3.