DSpace at VNU: Generalized picture distance measure and applications to picture fuzzy clustering

DSpace at VNU: Generalized picture distance measure and applications to picture fuzzy clustering tài liệu, giáo án, bài...

Trang 1

jo u r n al hom e p a g e :w w w e l s e v i e r c o m / l o c a t e / a s o c

Generalized picture distance measure and applications to picture

fuzzy clustering

Q2

Q3

a r t i c l e i n f o

Keywords:

a b s t r a c t

Picturefuzzyset(PFS),whichisageneralizationoftraditionalfuzzysetandintuitionisticfuzzyset, showsgreatpromisesofbetteradaptationtomanypracticalproblemsinpatternrecognition,artiﬁcial life,robotic,expertandknowledge-basedsystemsthanexistingtypesoffuzzysets.Anemergingresearch trendinPFSisdevelopmentofclusteringalgorithmswhichcanexploitandinvestigatehiddenknowledge fromamassofdatasets.Distancemeasureisoneofthemostimportanttoolsinclusteringthatdetermine thedegreeofrelationshipbetweentwoobjects.Inthispaper,weproposeageneralizedpicturedistance measureandintegrateittoanovelhierarchicalpicturefuzzyclusteringmethodcalledHierarchicalPicture Clustering(HPC).Experimentalresultsshowthattheclusteringqualityoftheproposedalgorithmisbetter thanthoseoftherelevantones

1 Introduction

Sincefuzzyset(FS)[49]wasﬁrstlyintroducedbyZadehin1965,

manyextensionsofFShavebeenproposedintheliteraturesuchas

thetype-2fuzzyset(T2FS)[18],roughset(RS)[24],softset,rough

softsetandfuzzysoftset[15],intuitionisticfuzzyset(IFS)[3],

intu-itionisticfuzzyroughset(IFRS)[51],softroughfuzzyset&softfuzzy

roughset[19],interval-valuedintuitionisticfuzzyset(IVIFS)[38]

andhesitantfuzzyset(HFS)[32].Theaimofthoseextensionsis

toovercomethelimitationsofFSregardingthedegreeof

fuzzi-ness,theuncertainty ofmembershipdegrees,and theexistence

ofneutrality.Recently,anewgeneralizedfuzzysetcalledpicture

fuzzyset(PFS)hasbeenproposedbyCuongandKreinovichinRef

[6].Theword“picture”inPFSreferstogeneralityasthissetisthe

directextensionofFSandIFS.Intheotherwords,PFSintegrates

informationofneutralandnegativeintoitsdeﬁnitionsothatwhen

thevalue(s)ofone(both)ofthosedegreesis(are)equaltozero,it

returnstoIFS(FS)set.ComparingwithIFS,PFSdividesthehesitancy

degreeintotwoparts,i.e.,refusaldegreeandneutraldegree(see

Deﬁnition1andExamples1and2fordetails).Thissetshowsgreat

promisesofbetteradaptationtomanypracticalproblemsin

pat-ternrecognition,artiﬁciallife,robotic,expertandknowledge-based

systemsthansomeexistingtypesoffuzzysets

Deﬁnition 1. Apicturefuzzyset(PFS)[6]inanon-emptysetXis,

Q4

A=

x,A(x) ,A(x) ,A(x)|x∈X

, whereA(x) isthepositivedegreeofeachelementx∈X,A(x) is theneutraldegreeandA(x) isthenegativedegreesatisfyingthe constraints,

A(x) ,A(x) ,A(x) ∈ [0,1] ,∀x ∈X,

0≤A(x)+A(x)+A(x)≤1,∀x ∈X

TherefusaldegreeofanelementiscalculatedasA(x)=1− (A(x)+A(x)+A(x)),∀x∈X.InthecaseA(x)=0PFSreturns

totheIFSset,andwhenbothA(x)=A(x)=0,PFSreturnstothe

FSset.SomepropertiesofPFSoperations,theconvexcombination

ofPFS,etc.accompaniedwithproofscanbereferencedinRef.[6]

Example 1. Inademocraticelectionstation,thecouncilissues

500votingpapersforacandidate.Thevotingresultsaredivided intofourgroupsaccompaniedwiththenumberofpapersnamely

“votefor”(300),“abstain”(64),“voteagainst”(115)and“refusalof voting”(21).Group“abstain”meansthatthevotingpaperisawhite paperrejectingboth“agree”and“disagree”forthecandidatebut stilltakesthevote.Group“refusalofvoting”iseitherinvalidvoting papersorbypassingthevote.Thisexamplewashappenedinreality andIFScouldnothandleitsincetheneutralmembership(group

“abstain”)doesnotexist

Example 2. Personnelselectionis averyimportantactivityin thehumanresourcemanagementofanorganization.Theprocess

ofselection followsa methodologytocollectinformation about http://dx.doi.org/10.1016/j.asoc.2016.05.009

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45 46

48

49 50 51 52 53 54

55 56 57 58 59 60 61 62 63 64 65 66 67

Trang 2

anindividualinordertodetermineifthat individualshouldbe

employed.Theselectionresultscouldbeclassiﬁedinto4classes:

truepositive,truenegative,falsenegative,andfalsepositivewhich

are somehow equivalent to the positive, neutral, negative and

refusaldegrees ofPFS.Eachcandidate isranked accordingto4

classesbyhisabilityandsuitabilityforthejob,andtheﬁnaldecision

ismadebasedonresultsoftheclasses.Forexample,iftwo

candi-datesarerankedA-(50%,20%,20%,10%)andB-(40%,10%,30%,20%),

theﬁnaldecisioncanbemadethroughtheunionoperatorand

max-imumofthepositivedegreeinPFSwhichreturnsthevalueof50%

(Aisselected)

AnemergingtrendinPFSandotheradvancedfuzzysetsisthe

developmentofsoftcomputingmethodsespeciallyclustering

algo-rithmsonthesesets,whichcouldproducebetterqualityofresults

thanthat onFS For instance,clustering algorithmson interval

T2FSfocusingonuncertaintyassociatedwiththefuzziﬁerwere

investigatedinRefs.[14,52].RegardingtheIFSset,Pelekisetal

[23]proposedaclusteringapproachutilizingasimilarity-metric

Q5

deﬁnedoverIFS.XuandWu[45]developedtheIFCMalgorithm

toclassifyIFSandinterval-valuedIFS.Sonetal.[26]proposedan

intuitionisticfuzzyclusteringalgorithmforgeo-demographic

anal-ysis.Xuandhisgroupdevelopedanumberofintuitionisticfuzzy

clusteringmethodsinvariouscontexts[36,37,39,42].Fuzzy

clus-teringalgorithmsonothersetsnamelyHFSandPFSwerefound

inRefs.[4,27].Itisclearfromtheliteraturethatdistancemeasure

isthemostimportantfactorforanefﬁcientclusteringalgorithm

ThemostwidelyuseddistancemeasuresfortwoFSsAandBon

X=

X1, ,XN

istheHamming,EuclideanandHausdorffmetrics [6].BecauseoftheFS’sdrawbacks,distancemeasuresonothersets

mostlyIFShavebeenproposed.Atanassov[3],Chen[5],Dengfeng

andChuntian[7],Grzegorzewski[10],Hatzimichailidisetal.[11],

HungandYang[12,13],Lietal.[16],LiangandShi[17],Mitchell

[21],Papakostasetal.[22],SzmidtandKacprzyk[28–30],Wangand

Xin[35],XuandChen[41],XuandXia[46],YangandChiclana[47]

andXu[44]presentedsomedistancemeasuresinIFSnamelythe

(normalized)intuitionisticHammingandEuclideandistances,and

the(normalized)HausdorffintuitionisticHammingandEuclidean

distances.AbasicdistancemeasureonPFShasbeengivenbyCuong

andKreinovich[6]asfollows

=

1

N

i=1

.

WerecognizethatdP(A,B) isageneralizationofthoseinIFS

Q6

andFSwhenA(x)=0andbothA(x)=A(x)=0,respectively

Asexplainedabove,theintegrationofneutraldegreeA(x) would

measureinformationofobjectsmoreaccuratelyandincrease

qual-ityandaccuracyofachievedresults.Yetagain,tohelpimproving

theperformanceasmotivatedbythepreviousresearchesonIFS

thattendedtocombinesomebasicdistancemeasuresintoa

com-plexoneto improvethegenerality and accuracy,in this paper

weproposeanovelgeneralizedpicturedistancemeasureanduse

itinanewclusteringmethodonPFScalledHierarchicalPicture

Clustering(HPC).Thereasonfordesigninganewmeasurecanbe

illustratedbyanexampleasfollows.Considerthatwewouldlike

tomeasurethetruth-valueofthepropositionG=“throughapoint

exteriortoalineonecandrawonlyoneparalleltothegivenline”

Thepropositionisincomplete,sinceitdoesnotspecifythetypeof

geometricalspaceitbelongsto.InanEuclideangeometricspace

theproposition Gistrue;in aRiemanniangeometricspacethe

propositionGisfalse(sincethereisnoparallelpassingthrough

anexteriorpointtoagivenline);inageometricspacecovering

thePFSset(constructedfrommixedspaces,forexamplefroma

partofEuclideansubspacetogetherwithanotherpartof Riemann-ianspace)thepropositionGisindeterminate(trueandfalseinthe sametime)[48].Itisobviousthatobjects,notions,ideas,etc.can

bebettermeasuredinPFSthaninothertypesoffuzzysets

Themaindifferencesoftheproposeddistancemeasurewith

dP(A,B) and thoseonIFSsuchasinXu[44] arehighlightedas follows

Firstly,asbeingshownabove,dP(A,B) isanaturalexpansionof thewell-knownMinkowskidistanceoforderp≥1betweentwo pointsunderfuzzyenvironments.Whenp=1orp=2,wehavethe Manhattanand Euclideandistances,respectively.Inthelimiting caseofpreachinginfinity,weobtaintheChebyshevdistance.The Minkowskidistancehasthebestperformancefornumericaldata but works ineffectivelywithasymmetric binaryvariables, non-metricvectorobjects,etc.[20].Forexample,thesimilaritybetween twovectorscanbedenotedasacosinemeasurewhichisfurther usedtodefinea distance[48].Forasymmetricbinaryvariables, thecontingencytable,whichreflectsthematchingstatesbetween twoobjects,isusedtocomputethedistancebetweenasymmetric binaryvariables[25].Itisveryoftenthatanon-linearfunctionis adoptedasthedistancemetricforprocessingnon-sphericaldata [9].Oneofthemostcommonwaystocreatesuchthefunctionis combiningthebasicdistancemeasuresintoacomplexonesothat thedeficienciesofthestandalonemetricsaresettled.This intu-itionleadstodebutoftheproposedmeasurewhichmayenhance performanceandaccuracyofresults

Secondly,theproposedmeasureisacombinationofthe Ham-ming,EuclideanandHausdorffdistances.ItisdifferenttodP(A,B) whichinessenceisthenormalizedformofwell-knownMinkowski distanceoforderp≥1.Inthenextsection,wewillexplainwhy thehybridizationshouldbemadeandemphasizeonthe advan-tagesanddisadvantagesofusingtheproposedmeasure.However,

itisnotedthattheproposeddistancemeasureisageneralization versionofdP(A,B)

Thirdly,theproposeddistancemeasureisdifferenttothoseon IFSsuchasinXu[44]inmanyaspects.Letustakesomeexamples.In Ref.[44],XugeneralizedtheintuitionisticHammingandEuclidean distancesofSzmidtandKacprzyk[28]asbelow

=

1 2N

N

i=1

.

Hethendeﬁnedseveralsimilaritymeasuresfromtheabove dis-tancefunction,forinstance:

1 2N

N

i=1

,

⎛

⎜

⎝

N

i=1

N

i=1

⎞

⎟

⎠ 1/˛

Even though d (A,B) is quite similar to dP(A,B), we recog-nize that d (A,B) is designed on thebasis of IFS which means

A(x)+A(x)+A(x)=1whiledP(A,B) isthedistanceonPFS sat-isfying0≤A(x)+A(x)+A(x)≤1.Indeed,itisnotintuitiveand

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170

171

172 173 174 175 176

177

178 179 180

181

182 183 184 185 186

Trang 3

logicalwhentakingthedifferencebetweenA(x) andB(x) since

thesevaluescanbecalculatedthroughotherdegrees.Intheother

word,althoughd(A,B)isexpressedasafunctionofthree

compo-nents,itturnsoutthatd(A,B)isdependentontwovariables.Thisis

differenttodP(A,B) whichismeasuredbythreeseparatedegrees

Thus,werealizethatd(A,B)isdifferenttodP(A,B) andcertainly

muchdifferenttotheproposed(hybrid)measure.Again,inRef

[39]Xuetal.proposedtwointuitionisticfuzzysimilaritymeasures

forspectralclusteringbasedontheminimumoperatorbetween

themembershipandnon-membershipdegreesofIFS.Those

sim-ilaritymeasuresaredeﬁnedbasedonthestandard intuitionistic

Hamming,EuclideanandHausdorffdistances.Anoverviewof

dis-tanceandsimilaritymeasuresofIFSgiven byXuandChen[41]

afﬁrmedthatmostoftherelevantworksinIFSpaymuchattention

tothesimilaritydegreesbasedonthreebasicdistancefunctions

namelytheintuitionisticHamming,EuclideanandHausdorff.The

analysisclearlypointoutthedifferenceandnoveltyoftheproposed

distancemeasurewiththoseonIFS

Oncedeﬁning the generalized picture distancemeasure, we

applyit toa newclusteringmethod calledHierarchicalPicture

Clustering(HPC).Itusesasimplerstrategyandeasierfor

imple-mentationthantheintuitionisticfuzzyclustering[36–38,39,42]

For instance, Xu et al [42] proposed intuitionistic clustering

using associationcoefﬁcients of IFS toconstruct anassociation

matrix,whichisthentransformedintoanequivalentassociation

matrix.Basedonthe-cuttingmatrixoftheequivalentassociation

matrix,clustersofIFSsarethendetermined.Xuetal.[39]deﬁned

two intuitionisticfuzzysimilarity measuresfor constructingan

intuitionisticfuzzysimilaritymeasurematrixusedbyaspectral

algorithmtoclusterintuitionisticfuzzydata.Theun-normalized

graphLaplacianandeigenvectorswereoptedtoclusterthesamples

inspectralclustering.Wangetal.[36]presentedanettingmethod

tomakeclusteringanalysisofIFSsviatheintuitionisticfuzzy

sim-ilaritymatrix.Wangetal.[37]proposedtheintuitionisticfuzzy

squareproductwhichistransformedtotheintuitionisticfuzzy

sim-ilaritymatrixfordirectintuitionisticfuzzyclusteringbasedona

conﬁdencelevel.Thosealgorithmsaremostlycomplexand

time-consumingsincetheyﬁrstlyconstructedtheintuitionisticfuzzy

similaritymatrixandtheneitherusedanexhaustediterative

strat-egyto gettheequivalentassociation matrix [42]or a complex

calculationthroughgraphLaplacian[39],nettingmethod[36],etc

Meanwhile,HPCreliessolelyonthegeneralizedpicturedistance

measureandhierarchicalclusteringschemefortheclassiﬁcation

ofPFSs Itis indeedrecognizedthat HPChastheadvantages of

simpleprocessingandintuitivemanners.Butmorethanthat,HPC

providesthewaytodealwithPFSdatawhichwerenot

investi-gatedbytheexistingintuitionisticfuzzyclusteringalgorithms.As

mentionedearly,therearemanyeventsandphenomenathatare

representedbythePFSset.Whenfacingwiththosedata,

cluster-ingalgorithmsonIFSworkineffectivelysincetheydonottakeinto

accounttherefusal/neutralinformation.Combiningtherefusaland

neutraldegreesinIFSwouldmakelostinformation;letussayfor

example:aPFS-A={(x,0.3,0,0.1);(y,0.4,0.1,0.1)}andaIFS-B={(x,

0.3,0.1);(y,0.4,0.1)}.ItisobviousthatIFSregardsneutralvalues

ofxandybeing0.6and0.5,respectively.Yet,infactthemost

dom-inantpartintheneutralvaluesofIFSistherefusaldegree.The

observationinArevealsthatthe“real”neutralandrefusaldegrees

ofxare0and0.6whilethoseofyare0.1and0.4,respectively

Thus,it ismisleadingifweuseclusteringalgorithmsonIFSfor

dealingwithPFSdata.Inshort,weclearlyrecognizetheroleand

advantagesofHPCincomparisonwiththerelevantclustering

algo-rithmsonIFS.Wedonotmentionthe(comparisonof)clustering

qualitiesofthosealgorithmssincetheyaredesignedondifferent

basesets.However,wewouldliketoemphasizeonthesimplicity

andﬁrstdebutofaclusteringalgorithmonPFSwhichisthemain

contributioninthispaper

Therestofthepaperisorganizedasfollows.Section2presents thegeneralizedpicturedistancemeasureandtheHPCalgorithm Section3validatestheproposedalgorithmbyexperiments Sec-tion4drawstheconclusionsand delineatesthefuture research directions

2 The proposed methodology

Inthissection,weﬁrstlyintroduce thedeﬁnitionof general-izedpicturedistancemeasureandthenpresentanovelhierarchical picturefuzzyclustering(HPC)

Deﬁnition 2. Afunctiond (A,B) withA,B ∈PFS(X)iscalled pic-turedistancemeasureifitsatisﬁes:

0≤d (A,B)≤1,

d (A,B)=0⇔A=B,

d (A,B)=d (B,A) ,

AB×d (A,B)+AC×d (A,C)≥BC×d (B,C)∀A,B,C ∈PFS(X) wherethesymbol“×”isthearithmeticalproduct.AB,BCandAC arecompositionoperationsofA,B,C ∈PFS(X).Asanexample,the followingmin-maxcompositionformulaeareusedtocalculatethe triple (AB,AC,BC) fromthemembershipfunctionsofA,B,C ∈ PFS(X)

AB=min

i

max

A(xi) ,B(xi)

BC=min

i

max

B(xi) ,C(xi)

Ac=min

i

max

A(xi) ,c(xi)

The aim of those formulae is to specify fuzzy coefﬁcients

AB,AC,BC ∈ [0,1] forthefuzzytriangularinequalityasinthe 4thpropertyofthisdeﬁnition.Besidesthemin-max,some typi-calcompositionssuchasmax-prod,Lukasiewiczt-norm,etc.can

beusedaccordingly.Ageometricalrepresentationofthe4th prop-ertyisgiveninFig.1.Itisclearfromtheﬁgurethatanewfuzzy representationofABisA’B’whichisboundedinafuzzydomain calledArea1satisfyingd(A’B’)=AB×d (A,B).Then,there exists fuzzyrepresentationsofACandBCnamelyA’C’andB’C’thatbelong

toequivalentfuzzydomains—Area2andArea3respectivelysothat

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253 254 255 256 257

258

259 260 261 262 263

264

265

266

267

268 269 270 271

273

274

275

276 277 278 279 280 281 282 283 284 285

Trang 4

(AB,AC,BC) sothatthe4thpropertyholdsthend (A,B) isa

pic-turedistancemeasure.Thisimpliesthatthedistancemeasureis

constructedonafuzzyspace.Intheequivalentarticlesonfuzzysets

and topology, Zadeh and coworkers [5,15,18,19,24,32,38,49,51]

suggestedthat the triangularinequality for a metric shouldbe

fuzziﬁedbymembershipdegreessothatconditionsand

proper-tiesoffuzzytopologyhold.Thiscanberegardedassoftversionof

themetricdeﬁnitioninahardspace

Deﬁnition 3. Thefunctionbelowisageneralizedpicturedistance

measurebetweenA,B∈PFS(X)

dG(A,B)=

1 N N

i =1

pi+ p

i + p i

pi,pi,ip 1/p

1 N N

i =1

pi+ p

i + p i

pi,pi,ip 1/p

+

max

i

˚A

i,˚B i

+1 N N

i =1

|˚A

i −˚B

i|p

1/p

+1

where

i=|A(xi)−B(xi)|,(i=1, ,N)

˚A

i =|A(xi)+A(xi)+A(xi)|,(i=1, ,N)

˚B

i =|B(xi)+B(xi)+B(xi)|, (i=1, ,N)

Remarks.

1)dG(A,B) is a hybrid measure of the well-known Hamming,

EuclideanandHausdorffdistances.Speciﬁcally,whenp=1,we

have thehybridbetweenHausdorffand Hammingmeasures

Whenp=2,a hybridofHausdorffandEuclideandistances is

recognized

2)dG(A,B) isnotatrivialhybridizationofsuchtheexisting

meas-uresinthesensethatitdoesnotphysicallymixthosemeasures

togetherwithouttakingcareoftheirmeaningandcontexts.In

fact,dG(A,B) hasbeendesignedonthebasisofthepicturefuzzy

setrepresentedintheformofmembershipvaluesi,i,i,

˚A

i and˚B

i.ItisregardedasageneralizationofdP(A,B),whichis

abasicpicturedistancemeasureofCuongandKreinovich[6],by

employingtheintegrationofothermeasuressuchasHamming,

EuclideanandHausdorffdistances

3)ThereasonsforthehybridizationindG(A,B) canbeexplainedas

follows.NotethatthebasicpicturedistancemeasureofCuong

and Kreinovich relies ontheHamming(p=1)and Euclidean

(p=2)distanceswhichwereshowntohavelimitationsin

deal-ingwithnon-sphericaldatasets[5,7,8,12,13,28–30].Sincethey

assumethatsamplepointsaredistributedaroundsamplemean

inasphericalmanner,theprobabilityofatestpointbelongingto

thesetdependsnotonlyonthedistancefromthesamplemean

butalsoonthedirectionsoastoavoidnon-spherical

distribu-tions[35,41,47].Meanwhile,Hausdorffmetricmeasureshowfar

twosubsetsofametricspacearefromeachother.Itturnstheset

ofnon-emptycompactsubsetsofametricspaceintoametric

spaceinitsownright.Thus,Hausdorffdistancehasthe

advan-tageofbeingsensitivetoposition[40,41,45].Anotherimportant

advantageofHausdorffdistanceisthepossibilityofusing

sep-aratelydissimilaritymeasuresbetweenoneobjectandapart

ofanother[46].Therefore,combiningHausdorffdistancewith

HammingandEuclideanmeasuresinageneralizedpicture

dis-tancemeasureasindG(A,B) wouldachievetheadvantagesof

eachmeasureaswellasincreasetheperformance

4)dG(A,B) is applicableto a large class of problems Asbeing

demonstratedinDeﬁnition3,dG(A,B) iscomputedthroughthe

degreesofPFS(i,i,i,˚A

iand˚B

i)whichareappropriate

forPFSdata.Nonetheless,othertypesofcrispdata,e.g., numer-ical,categoricaldataandimagescanalsobeusedwithinthis measurewiththesupportofafuzziﬁcationprocess.Forinstance,

animageisfirstlyextractedintofeaturerecordswhicharethen fuzzifiedbytheGaussianmembershipfunctiontomakefuzzy data.NotethateachdegreeinPFSwouldhavedifferent mem-bershipfunctionssothatwewillobtainvaluesofdegreesfor eachrecord.Thenextprocessisthendonewithintheachieved PFSdataasinDefinition3.Othertypesofdatacanbehandled analogously.Thisremarkshowsthegeneralityoftheproposed measure

Theorem 1. dG(A,B) isapicturedistancemeasure

Proof. FromDeﬁnition2,itisobviousthatthegeneralized pic-turedistancessatisfythreeﬁrstconditions.Forthelastcondition regardingtriangularinequalityinPFS,wehavetoprovethe exist-enceofatriple (AB,AC,BC) sothattheconditionholds.Since workingonPFSwhose dataelementshaveassociated member-shipvalues,it isclearthateachdistancemeasurebetweentwo setsinPFSshouldbeaccompaniedwithacompositionfunction

ofthosemembershipvalues.Assuch,thelastconditionisoften namedasthe(picture)fuzzytriangularinequalityorsoft triangu-larinequality.Intheextentofthisproof,wewillshowthatthere existsdiscretevaluesforthetriple (AB,AC,BC).Considerp=1 ForA,B∈PFS(X),letusdenote:

AB1=

N

i =1

i+i+i

AB2=max

i,i,i

,

AB3=max

i

˚A

i,˚B i

,

AB4=

N

i=1

|˚A

i −˚Bi|

Thefollowinginequalityisneededtoprove:

AB1+AB2

AB1+AB2+AB3+AB4+1+ AC1+AC2

AC1+AC2+AC3+AC4+1

≥ BC1+BC2

3 (BC1+BC2+BC3+BC4+1). ThefactsbelowcomefromthedeﬁnitionofPFS

|A(xi)−B(xi)|+|A(xi)−C(xi)|≥|B(xi)−C(xi)|,

Itfollowsthat,

AB1+AC1≥BC1,

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342 343 344 345 346 347 348 349 350 351 352

353 354 355 356 357 358 359 360 361 362 363 364 365

366

367

368

369

370

371

372 373

375

376

377 378

379

Trang 5

Table 1

Q7

AB2+AC2≥BC2

Assume:

max{|B(x)−C(x)|,|B(x)−C(x)|,|B(x)−C(x)|}

=|B(x)−C(x)|

Then,

≤max{|A(x)−B(x)|,|A(x)−B(x)|,|A(x)−B(x)|}+

max{|A(x)−C(x)|,|A(x)−C(x)|,|A(x)−C(x)|}

BC1+BC2

BC1+BC2+BC3+BC4+1

AB1+AB2+AC1+AC2+BC3+BC4+1

AB1+AB2+AC1+AC2+BC3+BC4+1(∗)

Ifoneofthefactsbelowhappen,

max

i

˚Ai,˚Bi,˚Ci

=˚Bj,

max

i

˚Ai,˚Bi,˚Ci

=˚Cj, ThenBC3≥AB3

Againifmax

i

˚A

i,˚B

i,˚C i

=˚A

j,

i

˚A

i,˚B i

−max

i

˚B

i,˚C i

≤ max

i

˚A

i −˚B

i,˚B

i −˚C i

≤max

i

˚A

i −˚B

i +˚B

i −˚C i

= max

i

˚A

i −˚C

i

≤3AC2,

AB3−BC3≤3AC2

Analogously,weachieve

BC4≥AB4,

Or3AC2+BC4≥AB4

Thus,

Combine(*,**,***),theinequalityisproven.Thus,dG(A,B) isa

picturedistancemeasure

Deﬁnition 4. The average picture set of Ai ∈PFS (X) (index

i=1, ,N)isdenotedasAVG (Ai),

AVG (Ai)=

x,N1

N

i=1

i(x) ,1 N

N

i=1

i(x) ,1 N

N

i=1

i(x)|x∈X

,

where i(x), i(x), i(x) are thepositive, neutral and negative membershipdegreesofAi,respectively

Deﬁnition 5. PictureDistanceMatrix(PDM)ofAi∈PFS (X) (index

i=1, ,N)isasimilaritymatrixsizedN×Nwhereeachelementis computedbyDeﬁnition3

TheHPCAlgorithm:

Step1:GivenacollectionofAi∈PFS (X) (indexi=1, ,N) Con-sidereachAiisauniquecluster

Step2:CalculatePictureDistanceMatrix(PDM)

Step3:MergetwoconsecutivePFSsetsbasedonPDMand cal-culatenewcentersbyDeﬁnition4.Noticethatonlytwoclusters arejointedineachstage

Step4:RepeatStep2withAibeingreplacedwiththenew cen-tersuntilthedesirablenumberofclustersisachieved

3 Evaluation

In this section, we aimtovalidate whetherthenew metric canaccurately measure dataelementsof thepicturefuzzyset:

Ai∈PFS (X).Eventhoughthereexistmanyextensionsofthe classi-calFuzzyC-Means(FCM)intheliteraturethatusedtheEuclideanor HammingorMahalanobisdistancesforobtainingclustersof spher-icalorellipticgeometricalform,theywerenotdesignedtoworkin thePFSsetwhichcontainstheinformation ofpositive,negative andneutralasinDeﬁnition1.Therefore,inordertoclassifyPFS elements:Ai ∈PFS (X),weshouldusethebasicandgeneralized picture distancemeasures-dP(A,B) and dG(A,B) ina hierarchi-cal clusteringalgorithm like HPCrespectively Thissection will compare thosemeasures interms of performance ofclustering algorithms.Therefore,wehaveimplementedtheHPCalgorithm withdG(A,B) inadditiontoavariantofHPCusingdP(A,B) called

CK.TheIntuitionisticHierarchicalClustering(IHC)algorithm[43] hasbeenimplementedtoevaluateclusteringqualityofHPCand CK

The experimental data consists of 4 datasets The ﬁrst one, Guangzhoucar[50]describedinTable1,isasmalldatasetconsists

of5newcarsintheGuangzhoumarketevaluatedby6criteria:Fuel (G1),Aerod(G2),Price(G3),Comfort(G4),Design(G5)andSafety (G6).Dataofeachcarforagivencriterionconsistofthree com-ponentsrepresentingforthepositive,theneutralandthenegative degrees.SumoftheneutralandthenegativedegreesinTable1

isthenon-membershipvalueinRef.[50].Thesecondone, Build-ingmaterials[40]showninTable2,isanothersmalldatasethas5 buildingmaterialsnamelySealant,Floorvarnish,Wallpaint, Car-petandChlorideﬂooringcharacterizedby8attributes.Sumofthe neutralandthenegativedegreesinTable2isthenon-membership valueinRef.[40].Theaimofthisdatasetistovalidatethe algo-rithmsonadatasethavinglargernumberofattributesthanthat

of theGuangzhoucar dataset.Thethirdone,Heart Disease[34] whoseapartisexpressedinTable3,isareallargedatasetfrom UCIMachineLearningRepositoryconsistsof270patients acquir-ingheartdiseasecategorizedby3attributessuchasAge(3#Age), Bloodpressure(mmHg/patient,10#Trestbps)andheartrate(#32 Thalach).Thepositive,theneutraland thenegativedegreesare fuzziﬁedfromcrispdatausingGaussian,triangularandtrapezoid

380

381

382

383

384

386

387

388

389

390

391

392

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413 414

415 416 417

418 419 420 421 422 423 424 425 426

427

428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465

Trang 6

Table 2

Table 3

Q8

membershipfunctions.Theaimofthisdatasetistovalidatethe

algorithmsonadatasethavinglargernumberofobjectsthanthat

oftheGuangzhoucardataset.Lastly,ForestCoverType[33]isalarge

datasetextractedfromUCIMachineLearningRepositoryincludes

1000instancesin10dimensionsshowingtheactualforestcover

typeforagivenobservation(30×30mcell)determinedfromUS

ForestService(USFS)Region2ResourceInformationSystem(RIS)

Thepositive,theneutralandthenegativedegreesarefuzziﬁedfrom

crispdatausingGaussian,triangularandtrapezoidmembership

functions.Theaimofthisdatasetistovalidatethealgorithmsona

largedatasethavingboththenumberofobjectsandthenumberof

attributesgreaterthanthoseoftheGuangzhoucardataset

Inordertoevaluateclusteringqualitiesofthealgorithms,we

useNMI(NormalizedMutualInformation),F-MeasureandPurity

Theseevaluationindicesarethe-larger-the-better

NMI=

k

j =1

r

i =1

nijlogn×nij

ni×n j

r

i =1

nilogni

n

⎛

⎝ k

j =1

njlognj

n

⎞

⎠ ,

Precisioni= 1

ni

k

max

j =1

nij ,(i=1, ,r),

Recalli= 1

nj∗

k

max

j =1

nij ,(i=1, ,r),

j∗=argmaxk

j =1

nij ,(j∗∈ [1,k] ),

Fi=2×Precisioni×Recalli Precisioni+Recalli ,

F-Measure=1

r

i =1

Fi,

Purity=1n

r

i=1

k

max

j=1

nij ,

where

• T=

T1, ,Tk

andC=

C1, ,Cr

arekcorrectandrpredicted clusters,respectively

• nistotalnumberofdatapoints

• nij=|Ci∩Tj|:commonnumberofdatapointsbetweenCiandTj (i=1, ,r;j=1, ,k)

• ni=

k

j=1

nij:numberofdatapointsofCi(i=1, ,r)

• nj=

r

i=1

nij:numberofdatapointsofTj(j=1, ,k)

Firstly,weillustratetheactivitiesoftheHPCalgorithmtoclassify theGuangzhoucardatasetinTable1.Intheﬁrstphase,eachcarin thedatasetisauniquecluster

Car1 , Car2 , Car3 , Car4 , Car5

Table 4

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489 490 491 492 493

494

495

496 497 498

499

Trang 7

PDM1=

⎛

⎜

⎝

0 0.2908 0.3275 0.2560 0.3406

0.2908 0 0.3036 0.2912 0.3286

0.3275 0.3036 0 0.3581 0.3062

0.2560 0.2912 0.3581 0 0.3767

0.3406 0.3286 0.3062 0.3767 0

⎞

⎟

⎠.

Becaused (Car1,Car4)=0.2560 is theminimalvalue among

alldistances,Car1 andCar4are groupedintoacluster

Remov-ingalldistancevaluesrelatedtoCar1andCar4,werecognizethat

d (Car2,Car3)=0.3036istheminimalvalueamongall.Thus,Car2

andCar3aremergedintoanothercluster.Resultsofthesecond

phaseare:

Car1,Car4

, Car2,Car3

, Car5

Using Deﬁnition 4, the centers of

Car1,Car4

and

Car2,Car3

are:

Fuel

Aerod

Price

Comfort

Design

Safety

0.3)

(0.5, 0.05, 0.05)

(0.65, 0.05, 0.1)

(0.8,0.05, 0.05)

(0.15,0.2, 0.35)

(0.6,0.165, 0.085)

0.25,

0.15)

(0.3,0.3, 0.2)

(0.35,0.1, 0.25)

(0.15,0.1, 0.25)

(0.3,0.35, 0.25)

(0.5,0.2, 0.15)

Next,wecalculatethePDMofPhase2

PDM2=

⎛

⎝00.28870.28870 0.34850.2958

0.3485 0.2958 0

⎞

⎠

Sinced

Car1,Car4

, Car2,Car3

=0.2887isthesmallest valueamongalldistancesinPDM2,wecombinethosecarsintoa

cluster.Resultsofthethirdphaseare:

Car1,Car4,Car2,Car3

, Car5 Thecenterofcluster

Car1,Car4,Car2,Car3

is:

Fuel

Aerod

Price

Comfort

Design

Safety

0.2,

0.225)

(0.4, 0.175, 0.125)

(0.5, 0075, 0.175)

(0.475, 0.075, 0.15)

(0.225, 0.275, 0.3)

(0.55, 0.1825, 0.1175)

ThePDMofPhase3is:

PDM3=0.3110

Lastly,inthefourthphase,allcarsaregroupedintoaunique

cluster.AhierarchicaltreefortheclassiﬁcationofGuangzhoucar

datasetusingHPCalgorithmisshowninFig.2.Ifwecomputethe

averagevaluesofthepositive,theneutralandthenegative

mem-bershipsofallcarsandgroupbyphasesthenwegettheresultsin

Table4

Inordertovisualizetheclusteringresults,weusePrincipal

Com-ponentAnalysis(PCA),whichisawell-knownmethodinstatistics,

toreducedimensionsofdatainTable4andgettheresultsinTable5

2Ddistributionsofdatapointsandcentersofallphasesarealso

depictedinFigs.3–6

Secondly,wecomparetheclusteringqualitiesofHPCand CK

through evaluation indices on the experimental datasets The

resultsontheGuangzhoucardatasetareshowninTables6–9

Table 5

Table 6

Table 7

Table 8

Table 9

TheresultshaveshownthatclusteringqualityofHPCisbetter thanthatofCK.Moreover,asillustratedinFigs.2and7andTable6,

weclearlyrecognizethatthehierarchicaltreeofIHCisidenticalto HPC.Thismeansthatusingthegeneralizedpicturedistance mea-sureinclusteringalgorithmsresultsinbetterqualitythanusingthe basicpicturedistance

Analogously, wemadethecomparisononotherdatasetsand achievedtheresultsinTables10–19.Thevaluesin thesetables afﬁrmtheefﬁciencyofHPCeveninthecasesthatthenumberof attributesorthenumberofobjectsishigherthanthatofGuangzhou car.Thisclearlyshowsthefactthatusingthegeneralizedpicture distancemeasurewouldresultinaccuratecalculationsofsimilarity betweenobjects

In Figs 8 and 9, we illustrate the hierarchical tree of HPC for buildingmaterialsand aclusteringtool calledHPCS—akind

ofknowledge-basedsystemstoassistclusteringonPFSdatasets, respectively

500

501

502

503

504

505

506

508

509

510

511

513

514

515

517

518

519

521

522

523

524

525

526

527

528

529

530

531

532

533

534

535

536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552

Trang 8

Fig 2. The hierarchical tree of HPC for Guangzhou car.

Trang 9

Fig 5. The distributions of data and centers in Phase 3.

Trang 10

Table 10

Carpet,

Sealant

Table 11

Table 12

Table 13

Table 14

Table 15

Định dạng
Số trang	12
Dung lượng	1,56 MB