Expert Systems with Applications 39 (2012) 9848–9859
Journal homepage: www.elsevier.com/locate/eswa

A novel intuitionistic fuzzy clustering method for geo-demographic analysis

Le Hoang Son a,*, Bui Cong Cuong b, Pier Luca Lanzi c, Nguyen Tho Thong a

a Hanoi University of Science, Vietnam National University, 334 Nguyen Trai, Thanh Xuan, Ha Noi, Viet Nam
b Institute of Mathematics, Vietnamese Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Ha Noi, Viet Nam
c Department of Electronics and Information, Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133, Italy

Keywords: Geo-demographic analysis; Geographic information systems; Intuitionistic fuzzy sets; Policies making; Possibilistic fuzzy C-means

Abstract

Geo-Demographic Analysis (GDA) is an important tool to explore the underlying rules that regulate our world, and therefore it has been widely applied to the development of effective socio-economic policies through the analysis of data generated from Geographic Information Systems (GIS). In GDA applications, clustering plays a major role; however, the current state-of-the-art algorithm, namely the Fuzzy Geographically Weighted Clustering (FGWC), has demonstrated several limitations both in terms of speed and in terms of the quality of the achieved results. Accordingly, in this paper, we propose a novel clustering algorithm for GDA applications, based on recent results regarding intuitionistic fuzzy sets and the possibilistic fuzzy C-means, that aims at overcoming some of the limitations of the existing methods.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Recently, there has been great interest in Geo-Demographic Analysis (GDA) and its application to many branches of society; for instance, GDA is usually applied to study the geographical distribution of specific spending groups so as to predict future spending trends. Accordingly,
GDA plays an essential role in planning distribution, managing products, services, capital, etc. GDA combines Geographical Information Systems (GIS) and data mining algorithms. It is usually defined as "the analysis of spatially referenced geo-demographic and lifestyle data" (Sleight, 1993) and is widely used in the public and private sectors for the planning and provision of products and services. There are two main underlying assumptions in GDA (Palmer, 2008): firstly, it assumes that two people who live in the same area are more likely to have similar characteristics than two people selected at random; secondly, it assumes that two areas can be characterized in terms of their population, using demographics and other measures. Based on these two principles, clustering is applied to group geo-demographic data into meaningful clusters that capture existing regularities (relevant geo-demographic profiles), thus making the data more manageable for the target analysis (Mason & Jacobson, 2007).

* Corresponding author. Tel.: +84 904171284; fax: +84 0438623938. E-mail address: sonlh@vnu.edu.vn (L.H. Son).
0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2012.02.167

Notwithstanding the several clustering methods currently available, in this specific area the existing state of the art, namely the Fuzzy Geographically Weighted Clustering (FGWC) (Mason & Jacobson, 2007), is limited both in terms of speed and in terms of achieved clustering quality (Son, Lanzi, Cuong, & Hung, 2011). Accordingly, in (Son et al., 2011) we presented an approach to improve the speed of FGWC and provided experimental results showing that our approach could perform better than FGWC. In this paper, we extend our previous work (Son et al., 2011) and present a novel clustering algorithm for GDA, called Intuitionistic Possibilistic Fuzzy Geographically Weighted Clustering (IPFGWC), that tries to cope with some of the limitations of the FGWC
algorithm by integrating elements of Intuitionistic Fuzzy Sets (IFS) (Atanassov, 1986), Possibilistic Fuzzy C-Means (PFCM) (Pal, Pal, Keller, & Bezdek, 2005) and Generalized IFS (GIFS) (Liu, 2010).

The rest of the paper is structured as follows. Section 2 introduces some related works for the GDA problem. Section 3 describes our novel algorithm, while Section 4 validates the proposed approach through a set of experiments involving real-world data. Finally, Section 5 draws the conclusions and delineates future research directions.

2. Related works

In this section, we provide a brief overview of the works in GDA that are most relevant to this work.

Walford (2011) described a method using Principal Component Analysis (PCA) to study the spatial distribution of the 1991 census data scores. The distribution of these scores helps to determine whether the underlying demographic and socio-economic phenomena are geographically concentrated or dispersed.

Loureiro et al. (2006) applied self-organizing maps (SOM) (Kohonen, 2001) to GDA. Self-organizing maps are unsupervised neural networks that have been used as a visualization tool for high-dimensional data. When a SOM is trained with a given dataset, its units tend to spread themselves in the input space according to the density of the data patterns. Based on the variations in edge length along a path between two units on the SOM, the authors presented a new way of calculating the fuzzy memberships of a fuzzy clustering method. The results of the proposed method are adequate for identifying the main clusters. In general, this method gives better results than other ones due to the use of SOM in the initialization phase. However, it requires a lot of memory to store all neurons and weights. Furthermore, the training phase is quite slow.

Clustering algorithms have been widely used in GDA (Tryon & Bailey, 1970). Day, Pearce, and Dorling (2008) used
agglomerative hierarchical clustering with the Euclidean distance measure as a means to classify 190 countries into groups that are homogeneous in terms of mortality rates. The aim of this study was to identify clusters of nations grouped by health outcomes in order to provide sensible groupings for international comparisons. Twelve clusters of countries were identified, with the average life expectancy of each one ranging from 81.5 years (cluster 1) to 37.7 years (cluster 12). However, the interpretation of the hierarchy is complex and often confusing. Besides, the deterministic nature of the technique prevents re-evaluation after points are grouped into a node. Hence, k-means clustering is often preferred to this algorithm.

Lee et al. (1999) applied k-means clustering to classify the data of the population and housing census in Korea into 26 clusters with 3752 basic administrative units, separately for 1990 and 1995. Then, by the hierarchical method, they made six "Grand Clusters" from the profiles of these 26 clusters. The result of this research is the distribution of the six "Grand Clusters" during the period 1990–1995.

Another example of using k-means for GDA is from Gibbs et al. (2010). In that work, the authors divided all the primary schools in London in 2007 into groups or clusters that have similar ethnic and socio-economic characteristics. This process helps parents find schools for their children easily by marking the suitable variables on the map, e.g. "at least 80% pupils are White British" and "Eligible for Free School Meals".

A recent result of Petersen et al. (2011) has again confirmed the use of k-means clustering for this problem. The authors presented the London Output Area Classification (LOAC) which, in essence, is a geo-demographic system. The core of this system is k-means clustering, which serves for mapping health care needs and other health indicators that are useful for targeting neighborhoods in public health campaigns. LOAC was also compared
to six other geo-demographic systems from both governmental and commercial sources.

Basically, the k-means algorithm can be used for GDA with acceptable results. However, there are some limitations of this algorithm that we should consider carefully. Firstly, it is very sensitive to outliers. Thus, in some works such as (Lee et al., 1999), the authors had to try k-means clustering manually for a rather large number of clusters and then identify some outliers, which takes a lot of time. Secondly, the algorithm starts from random initial positions. Sometimes these lead to bad cases, and the resulting output is not optimal.

In fact, assigning a geographical area to a single group is not always the best choice. This assumption leads to the issue of ecological fallacy. An ecological fallacy is a logical fallacy in the interpretation of statistical data in an ecological study, whereby inferences about the nature of specific individuals are based solely upon aggregate statistics collected for the group to which those individuals belong. In other words, statistics that accurately describe group characteristics do not necessarily apply to individuals within that group. For example, if a group of people is measured to have a lower average IQ than the general population, it is an error to assume that all members of that group have a lower IQ than the general population.

Accordingly, the fuzzy clustering method is often preferred. Fuzzy clustering assigns a membership value for each area instead of assigning a geographical area to a single group, thus helping to overcome the issue of ecological fallacy. The fuzzy clustering algorithm typically used in GDA is the fuzzy C-means clustering algorithm of Bezdek, known as FCM (Bezdek & Ehrlich, 1984; Kannan, Ramathilagam, & Chung, 2012; Küçükdeniz, Baray, Esnaf, & Ecerkale, 2012). However, FCM does not account for the geographical factor in its design. For example, it assumes that residential area type 27 is the
same and behaves in the same way wherever it happens to be located. But what happens if type 27 areas respond differently depending on their map locations?

To overcome this shortcoming, Feng and Flowerdew (1998) proposed an extension to the fuzzy clustering technique, which provides for the ex post facto adjustment of the cluster membership values based on "Neighbourhood Effects" (NE). The neighbourhood effects incorporate geography into the model. The neighbourhood effects formula adjusts the cluster membership as shown in the following equation,

$$ u'_k = \alpha \times u_k + \beta \times \frac{1}{A} \sum_{j=1}^{c} w_{kj} \times u_j, \tag{1} $$

where $u'_k$ is the new cluster membership of area k and $u_k$ is the old cluster membership of this area. The two parameters $\alpha$ and $\beta$ are scaling variables which affect the proportion of the original membership and satisfy the condition

$$ \alpha + \beta = 1. \tag{2} $$

A is a factor to scale the "sum" term to the range [0, 1]. The weighted membership is calculated as follows,

$$ w_{kj} = \frac{p_{kj}^{b}}{d_{kj}^{a}}, \tag{3} $$

where $p_{kj}$ is the length of the common boundary between k and j; $d_{kj}$ is the distance between k and j; a and b are user-defined parameters.

However, Feng and Flowerdew's neighbourhood effects have some limitations: firstly, they ignore the effects of areas that have no common boundaries; secondly, they exclude the effects of population, a key geo-demographic consideration. To overcome these limitations, Mason and Jacobson (2007) introduced a modified cluster membership adjustment which incorporates a spatial interaction effect model. The new membership function is inspired by the principles of geographical spatial interaction (Birkin & Clarke, 1991) and incorporates a basic spatial interaction model into the weighting of the memberships as well as the typical neighbourhood effects of fuzzy clustering algorithms. This makes cluster centers "geographically aware". In (Mason & Jacobson, 2007), the authors introduced the Fuzzy Geographically Weighted Clustering (FGWC) to calculate the influence of one area upon another one as
the product of the populations of the areas. A distance decay effect is implemented in the denominator. This effect is implemented through the weighting factor described in the equation below,

$$ w_{kj} = \frac{(pop_k \times pop_j)^{b}}{d_{kj}^{a}}, \tag{4} $$

where $pop_k$ and $pop_j$ are the populations of areas k and j, respectively; $d_{kj}$ is the distance between k and j; a and b are user-defined parameters, while A is a factor to scale the "sum" term, and it is calculated across all clusters, ensuring that the sum of the memberships of a given area over all clusters is equal to one.

FGWC is considered the state of the art for GDA problems. However, its speed and the quality of clustering achieved can be improved (Son et al., 2011). In particular, while in (Son et al., 2011) we showed an approach to significantly speed up FGWC, in this paper we instead focus on the quality of the clusters produced. Based on the results of Intuitionistic Fuzzy Sets (IFS) (Atanassov, 1986), Possibilistic Fuzzy C-Means (PFCM) (Pal et al., 2005) and Generalized IFS (GIFS) (Liu, 2010), we propose an integrated approach to produce clusters of higher quality. The approach has been validated empirically.

3. The proposed algorithm

3.1. Intuitionistic fuzzy sets

Fuzzy sets were introduced by Zadeh (1965), and since then this concept has been applied to various algebraic structures. Basically, a fuzzy set is defined as follows.

Definition 1. A Fuzzy Set (FS) $\mu$ (Zadeh, 1965) in a non-empty set X is a function

$$ \mu : X \to [0, 1], \quad x \mapsto \mu(x), \tag{5} $$

where $\mu(x)$ is the membership degree of each element $x \in X$. A fuzzy set can alternately be defined as

$$ A = \{ \langle x, \mu(x) \rangle \mid x \in X \}. \tag{6} $$

As practical problems have developed, this notion has proved too restrictive. Evidence for this can be drawn from the electoral problem (Atanassov, 2003). Suppose that we have a list of all countries (n) with elective governments. Normally, if we assume p is the number of countries whose electors vote for the corresponding government, then the number of countries whose electors do not vote for it is n − p. However, it is possible that some votes are given to parties or persons outside the government. This situation cannot be modelled by traditional fuzzy sets.

Motivated by the concept of "intuitionism" of L. Brouwer, who invited mathematicians to abandon Aristotle's law of the excluded middle, Atanassov (1986) presented the idea of intuitionistic fuzzy sets. That is to say, if we have a proposition A, we can state that A is true, that A is false, or that we do not know whether A is true or false. This idea is expressed by the definition below.

Definition 2. An Intuitionistic Fuzzy Set (IFS) (Atanassov, 1986) in a non-empty set X is

$$ \tilde{A} = \{ \langle x, \mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x) \rangle \mid x \in X \}, \tag{7} $$

where $\mu_{\tilde{A}}(x)$ is the membership degree of each element $x \in X$ and $\gamma_{\tilde{A}}(x)$ is the non-membership degree,

$$ \mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x) \in [0, 1], \quad \forall x \in X, \tag{8} $$

$$ \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \le 1, \quad \forall x \in X. \tag{9} $$

The intuitionistic fuzzy index of an element, showing its non-determinacy, is denoted as

$$ \pi_{\tilde{A}}(x) = 1 - \left( \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \right). \tag{10} $$

When $\pi_{\tilde{A}}(x) = 0$, the IFS becomes the FS of Zadeh.

Since this notion was introduced, several applications of IFS have been presented. For instance, Sen and Saha (1986) defined the concept of a Γ-semigroup and established a relation between regular Γ-semigroups and Γ-groups. Kim and Jun (2001) investigated some properties of intuitionistic fuzzy sets in a semigroup. Uckun, Ozturk, and Jun (2007) introduced the notion of an intuitionistic fuzzy Γ-ideal of a Γ-semigroup and investigated some properties connected with intuitionistic fuzzy Γ-ideals in a Γ-semigroup.

Similarly, Gau and Buehrer (1993) introduced Vague Sets. The authors noted that the drawback of using a single membership value in fuzzy set theory is that the evidence for $x \in X$ and the evidence against $x \in X$ are in fact mixed together. Therefore, instead of the point-based membership of FS, interval-based membership is used (Lu & Ng, 2005).

Definition 3. A Vague Set (VS) (Gau & Buehrer, 1993) in a universe of discourse X is

$$ V = \sum_{i=1}^{n} \left[ \alpha_V(x_i), 1 - \beta_V(x_i) \right] / x_i, \quad x_i \in X, \tag{11} $$

$$ 0 \le \alpha_V(x_i) \le 1 - \beta_V(x_i) \le 1, \quad 1 \le i \le n, \tag{12} $$

where $\alpha_V(x_i)$ is a lower bound on the grade of membership of $x_i$ derived from the evidence for $x_i$, and $\beta_V(x_i)$ is a lower bound on the grade of membership of the negation of $x_i$ derived from the evidence against $x_i$. The grade of membership of $x_i$ is bounded to a subinterval $[\alpha_V(x_i), 1 - \beta_V(x_i)] \subset [0, 1]$. When $\alpha_V(x_i) = 1 - \beta_V(x_i)$, the VS becomes the FS of Zadeh. Moreover, VS has been shown to be equivalent to IFS with the intuitionistic fuzzy index $1 - \alpha_V(x_i) - \beta_V(x_i)$.

However, sometimes we do not know the exact membership degree and non-membership degree of an element, but only their value ranges. To cope with this issue, Atanassov and Gargov (1989) extended IFS and introduced the concept of the Interval-Valued Intuitionistic Fuzzy Set (IVIFS).

Definition 4. An Interval-Valued Intuitionistic Fuzzy Set (IVIFS) (Atanassov & Gargov, 1989) in a non-empty set X is

$$ \tilde{A}_{IV} = \{ \langle x, \mu_{\tilde{A}_{IV}}(x), \gamma_{\tilde{A}_{IV}}(x) \rangle \mid x \in X \}, \tag{13} $$

where

$$ \mu_{\tilde{A}_{IV}}(x) = \left[ \mu^{L}_{\tilde{A}_{IV}}(x), \mu^{U}_{\tilde{A}_{IV}}(x) \right] \subset [0, 1], \tag{14} $$

$$ \mu^{L}_{\tilde{A}_{IV}}(x) = \inf \mu_{\tilde{A}_{IV}}(x), \quad \mu^{U}_{\tilde{A}_{IV}}(x) = \sup \mu_{\tilde{A}_{IV}}(x), \quad \forall x \in X, \tag{15} $$

$$ \gamma_{\tilde{A}_{IV}}(x) = \left[ \gamma^{L}_{\tilde{A}_{IV}}(x), \gamma^{U}_{\tilde{A}_{IV}}(x) \right] \subset [0, 1], \tag{16} $$

$$ \gamma^{L}_{\tilde{A}_{IV}}(x) = \inf \gamma_{\tilde{A}_{IV}}(x), \quad \gamma^{U}_{\tilde{A}_{IV}}(x) = \sup \gamma_{\tilde{A}_{IV}}(x), \quad \forall x \in X. \tag{17} $$

A constraint is set up for (13),

$$ \mu_{\tilde{A}_{IV}}(x) + \gamma_{\tilde{A}_{IV}}(x) \le 1. \tag{18} $$

The intuitionistic fuzzy index is

$$ \pi_{\tilde{A}_{IV}}(x) = \left[ \pi^{L}_{\tilde{A}_{IV}}(x), \pi^{U}_{\tilde{A}_{IV}}(x) \right], \quad \forall x \in X, \tag{19} $$

$$ \pi^{L}_{\tilde{A}_{IV}}(x) = 1 - \mu^{U}_{\tilde{A}_{IV}}(x) - \gamma^{U}_{\tilde{A}_{IV}}(x), \quad \pi^{U}_{\tilde{A}_{IV}}(x) = 1 - \mu^{L}_{\tilde{A}_{IV}}(x) - \gamma^{L}_{\tilde{A}_{IV}}(x). \tag{20} $$

Note that when $\mu_{\tilde{A}_{IV}}(x) = \mu^{L}_{\tilde{A}_{IV}}(x) = \mu^{U}_{\tilde{A}_{IV}}(x)$ and $\gamma_{\tilde{A}_{IV}}(x) = \gamma^{L}_{\tilde{A}_{IV}}(x) = \gamma^{U}_{\tilde{A}_{IV}}(x)$, the IVIFS reduces to IFS.

In 2002, Mondal and Samanta (2002) presented a generalized version of IFS in which condition (9) is extended by a fuzzy operator.

Definition 5. A Mondal and Samanta's Generalized Intuitionistic Fuzzy Set (MS-GIFS) (2002) in a non-empty set X is

$$ \tilde{A}_{MS} = \{ \langle x, \mu_{\tilde{A}_{MS}}(x), \gamma_{\tilde{A}_{MS}}(x) \rangle \mid x \in X \}. \tag{21} $$

The meanings of the variables are similar to those in Def. 2. Nevertheless, condition (9) is changed slightly, as shown below,

$$ \mu_{\tilde{A}}(x) \wedge \gamma_{\tilde{A}}(x) \le 0.5, \quad \forall x \in X. \tag{22} $$

Recently, Liu (2010) extended the Mondal and Samanta model by attaching an extensional index, the so-called L-value. This index was incorporated into condition (22) above.

Definition 6. A Liu's Generalized Intuitionistic Fuzzy Set (L-GIFS) (Liu, 2010) in a non-empty set X is

$$ \tilde{A}_{L} = \{ \langle x, \mu_{\tilde{A}_{L}}(x), \gamma_{\tilde{A}_{L}}(x) \rangle \mid x \in X \}. \tag{23} $$

Condition (22) is replaced by

$$ \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \le 1 + L, \quad \forall x \in X. \tag{24} $$

When L = 0, Liu's fuzzy sets reduce to IFS. Liu has also proved that for any universe of discourse X,

$$ FS(X) \subset IFS(X) \subset MS\text{-}GIFS(X) \subset L\text{-}GIFS(X), \tag{25} $$

$$ L\text{-}GIFS(X) \not\subset MS\text{-}GIFS(X) \not\subset IFS(X) \not\subset FS(X). \tag{26} $$

Notice that the L-value used in (25) and (26) is one. Besides, from Def. 3, we also confirm that

$$ IFS(X) = VS(X). \tag{27} $$

Some algebraic operations, fuzzy relations and fuzzy topology for these sets have been discussed intensively in the corresponding literature.

As we have mentioned in the previous section, fuzzy clustering is the method of choice to tackle GDA problems. However, the usual Fuzzy C-Means (FCM) and its variant FGWC rely on the basic principles of the fuzzy sets of Zadeh (1965). This kind of fuzzy set has been shown to have some limitations for current practical problems. To fulfil the objective of this paper, we need to develop a clustering algorithm on an extension of the fuzzy sets of Zadeh. Thus, in the next parts, we pay particular attention to the Intuitionistic Fuzzy Set (IFS) and its latest version
L-GIFS. As such, the proposed clustering algorithm will first be constructed in IFS and then extended to L-GIFS.

Example. Using a Matlab simulation, we illustrate these fuzzy sets in Figs. 1–4.

Fig. The IFS sets.

3.2. Status quo of clustering algorithms for IFS

Clustering algorithms for IFS sets have been studied intensively in recent times. By providing a useful way to describe fuzzy sets, these algorithms often enhance the understanding of the internal structure of data.

Pelekis, Iakovidis, Kotsifakos, and Kopanakis (2008) argued that the distances determining the membership of a feature vector to a cluster are also subject to uncertainty. Current fuzzy clustering approaches do not utilize any information about uncertainty at the constitutional feature level. The most typical one, the FCM algorithm, tries to partition the dataset by just looking at the feature vectors, and as such it ignores the fact that these vectors may be accompanied by qualitative information which may be given per feature. For example, following the idea of intuitionistic fuzzy set theory, a data point $X_k$ is not just a p-dimensional vector $(X_{k1}, X_{k2}, \ldots, X_{kp})$ of quantitative information, but instead a p-dimensional vector of triplets $[(X_{k1}, \mu_{k1}, \gamma_{k1}), (X_{k2}, \mu_{k2}, \gamma_{k2}), \ldots, (X_{kp}, \mu_{kp}, \gamma_{kp})]$, where for each $X_{ki}$ there exists qualitative information which is provided via the intuitionistic membership $\mu_{ki}$ and non-membership $\gamma_{ki}$ of the current data point to the feature i. In this situation, the traditional distance function of the FCM algorithm is not suitable because it operates only on the feature vectors and not on the qualitative information which may be given per feature. For this reason, Pelekis et al. (2008) presented a modified version of the FCM algorithm for IFS sets with a new intuitionistic fuzzy set distance metric,

$$ D(A, B) = \frac{d(\mu_A, \mu_B) + d(\gamma_A, \gamma_B)}{2}, \tag{28} $$

where

$$ d(A', B') = \begin{cases} \dfrac{\sum_{i=1}^{N} \min\left(A'(x_i), B'(x_i)\right)}{\sum_{i=1}^{N} \max\left(A'(x_i), B'(x_i)\right)}, & A' \cup B' \neq \emptyset, \\ 1, & A' \cup B' = \emptyset. \end{cases} \tag{29} $$

Zhang, Xu, and Chen (2007) defined the
concept of intuitionistic fuzzy similarity matrix (IFSM), and constructed an intuitionistic fuzzy equivalence matrix (IFEM). Then they transformed the IFSM into the IFEM and clustered the dataset based on the λ-cutting matrix of the interval-valued matrix. However, the whole process is performed on the IVIFS sets (Def. 4) that are converted from the original IFS sets. It therefore requires a lot of computation and loses too much information in the process of calculating the intuitionistic fuzzy similarity degree (IFSD).

To overcome this limitation, Xu, Chen, and Wu (2008) proposed an algorithm that uses the derived association coefficients (AC) of IFS to construct an association matrix (AM), and utilizes the transitivity principle of the equivalent matrix (Zhang, 1983) to transform it into an equivalent association matrix (EAM). Then, the EAM is used to construct the λ-cutting matrix. Based on this matrix, classification is performed.

$$ C(A, B) = \frac{\sum_{i=1}^{N} w_i \left[ \mu_A(x_i) \mu_B(x_i) + \gamma_A(x_i) \gamma_B(x_i) + \pi_A(x_i) \pi_B(x_i) \right]}{\max \left\{ \sum_{i=1}^{N} w_i \left[ \mu_A^2(x_i) + \gamma_A^2(x_i) + \pi_A^2(x_i) \right], \sum_{i=1}^{N} w_i \left[ \mu_B^2(x_i) + \gamma_B^2(x_i) + \pi_B^2(x_i) \right] \right\}}. \tag{30} $$

The formula for the AC in Eq. (30) is somewhat similar to the similarity degree of (Pelekis et al., 2008). In 2009, Xu (2009) also introduced an intuitionistic fuzzy hierarchical algorithm for clustering IFS, which is based on the traditional hierarchical clustering procedure and the intuitionistic fuzzy aggregation operator.

However, there are some limitations in this series of works by Xu et al. Firstly, the number of clusters is not set up beforehand. If we wish to partition the dataset into a pre-defined number of clusters, these algorithms cannot do so. Secondly, we cannot determine a suitable value of the parameter λ, and the number of clusters depends on the specific value of this parameter. Thirdly, these algorithms cannot provide information about the membership degrees of the objects to each cluster.

To overcome these obstacles, Xu and Wu (2010) developed another intuitionistic fuzzy C-means algorithm to cluster IFS, which is based on the well-known fuzzy C-means clustering method and the basic distance measures between IFS (Szmidt & Kacprzyk, 2000; Xu, 2007),

$$ d_1(A, B) = \sqrt{ \frac{1}{2} \sum_{i=1}^{N} w_i \left[ \left( \mu_A(x_i) - \mu_B(x_i) \right)^2 + \left( \gamma_A(x_i) - \gamma_B(x_i) \right)^2 + \left( \pi_A(x_i) - \pi_B(x_i) \right)^2 \right] }. \tag{31} $$

The two versions of the modified FCM algorithm for IFS in (Pelekis et al., 2008; Xu & Wu, 2010) differ not only in the distance metric but also in the way the clusters' centers are defined. In (Pelekis et al., 2008), the membership (non-membership) degree of a center is calculated as the average of those of its data points, specified through the membership matrix. In (Xu & Wu, 2010), it is calculated through a combination of membership values and data points, thus giving a better representation of the clusters' centers. In general, Xu and Wu's algorithm performs more stably and effectively than any other clustering method on IFS sets.

Fig. The FS sets.
Fig. The MS-GIFS sets.
Fig. The L-GIFS sets.

3.3. A new fuzzy clustering method for intuitionistic fuzzy sets

The previous section raises a natural question: "Is the Xu and Wu method (2010) sufficient for the GDA problem?" To a certain extent, it is acceptable. However, due to the limitations below, we cannot rely on this method entirely and have to consider a new one.

The Xu and Wu method uses FCM as a skeleton to deploy the main algorithm. As pointed out by Pal et al. (2005), FCM is very
sensitive to outliers. Consider the case of two clusters. If an outlier is equidistant from the two prototypes, then its membership in each cluster will be the same regardless of the absolute value of the distance of this point from the two centers as well as from the other points in the data. In such cases, the membership values should differ according to the distance criterion.

The membership values in both methods (Pelekis et al., 2008; Xu & Wu, 2010) are still crisp values. In other words, they belong to the traditional fuzzy set (FS) of Zadeh. In some specific contexts, we should extend this set to an intuitionistic one.

Most geo-demographic data treat their features equally. In other words, the membership values of an element to its features are often one. We should therefore reduce the pattern and center sets to FS sets instead of IFS ones.

From those observations, we should investigate a new fuzzy clustering algorithm for IFS. Pal et al. (2005) presented an effective fuzzy clustering method called Possibilistic Fuzzy C-Means (PFCM). PFCM was shown to solve the noise sensitivity defect of FCM, overcome the coincident clusters problem of Possibilistic C-Means (PCM) and eliminate the row sum constraints of Fuzzy Possibilistic C-Means (FPCM). Therefore, it is suitable for the proposed algorithm. Now, we state the objective function of the considered problem (Intuitionistic Possibilistic Fuzzy C-Means, IPFCM),

$$ J_{m,\eta,\tau}(U, T, H, V; X) = \sum_{k=1}^{N} \sum_{j=1}^{C} \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right) \| X_k - V_j \|^2 + \sum_{j=1}^{C} \gamma_j \sum_{k=1}^{N} (1 - t_{kj})^{\eta} \to \min. \tag{32} $$

In (32), we have to partition the pattern set X into C clusters. Each pattern has a membership value U, a hesitation level H and a typicality value T. The constraints for this problem are

$$ u_{kj} + h_{kj} \le 1, \tag{33} $$

$$ \sum_{j=1}^{C} u_{kj} = 1, \quad \sum_{j=1}^{C} h_{kj} = 1, \quad k = 1, \ldots, N, \tag{34} $$

$$ m, \eta, \tau > 1; \quad a_i > 0, \; i = 1, \ldots, 3; \quad \gamma_j, \; j = 1, \ldots, C \text{ are constants}, \tag{35} $$

$$ u_{kj}, t_{kj}, h_{kj} \in [0, 1]. $$

Some special cases can be seen as follows,

$$ \begin{aligned} & a_3 = 0: && \text{IPFCM} \to \text{PFCM}, \\ & a_1 = 0 \wedge a_3 = 0: && \text{IPFCM} \to \text{PCM}, \\ & a_3 = 0 \wedge a_2 = 0 \wedge \gamma_j = 0, \; \forall j = 1, \ldots, C: && \text{IPFCM} \to \text{FCM}, \\ & a_3 = 0 \wedge a_1 = a_2 = 1 \wedge \gamma_j = 0, \; \forall j = 1, \ldots, C: && \text{IPFCM} \to \text{FPCM}. \end{aligned} \tag{36} $$

Theorem 1. The optimal solutions of the system (32)–(35) are

$$ u_{kj} = \frac{1}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{m-1}}}, \quad k = 1, \ldots, N, \; j = 1, \ldots, C, \tag{37} $$

$$ h_{kj} = \frac{1}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{\tau-1}}}, \quad k = 1, \ldots, N, \; j = 1, \ldots, C, \tag{38} $$

$$ t_{kj} = \frac{1}{1 + \left( \frac{a_2 \| X_k - V_j \|^2}{\gamma_j} \right)^{\frac{1}{\eta-1}}}, \quad k = 1, \ldots, N, \; j = 1, \ldots, C, \tag{39} $$

$$ V_j = \frac{\sum_{k=1}^{N} \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right) X_k}{\sum_{k=1}^{N} \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right)}, \quad j = 1, \ldots, C. \tag{40} $$

Proof. Fix T, H, V. For the kth column $U_k$ of U, we get the reduced problem

$$ J_{m,\eta,\tau}(U_k) = \sum_{j=1}^{C} \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right) \| X_k - V_j \|^2 \to \min. \tag{41} $$

Using a Lagrange multiplier for (41), we obtain

$$ L(U_k, \lambda_k) = \sum_{j=1}^{C} \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right) \| X_k - V_j \|^2 - \lambda_k \left( \sum_{j=1}^{C} u_{kj} - 1 \right), \tag{42} $$

$$ \frac{\partial L(U_k, \lambda_k)}{\partial u_{kj}} = a_1 \times m \times u_{kj}^{m-1} \| X_k - V_j \|^2 - \lambda_k, \tag{43} $$

$$ \frac{\partial L(U_k, \lambda_k)}{\partial u_{kj}} = 0 \iff u_{kj} = \left( \frac{\lambda_k}{a_1 \times m \times \| X_k - V_j \|^2} \right)^{\frac{1}{m-1}}, \quad j = 1, \ldots, C, \; k = 1, \ldots, N. \tag{44} $$

Since $\sum_{j=1}^{C} u_{kj} = 1$, we easily have

$$ \sum_{j=1}^{C} \left( \frac{\lambda_k}{a_1 \times m \times \| X_k - V_j \|^2} \right)^{\frac{1}{m-1}} = 1, \tag{45} $$

$$ \iff \lambda_k = a_1 \times m \times \left( \sum_{j=1}^{C} \| X_k - V_j \|^{-\frac{2}{m-1}} \right)^{-(m-1)}. \tag{46} $$

From (44) and (46), we get

$$ u_{kj} = \frac{1}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{m-1}}}, \quad k = 1, \ldots, N, \; j = 1, \ldots, C. \tag{47} $$

Similarly, fixing U, T, V for the kth column $H_k$ of H and using a Lagrange multiplier, we obtain the solution

$$ h_{kj} = \frac{1}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{\tau-1}}}, \quad k = 1, \ldots, N, \; j = 1, \ldots, C. \tag{48} $$

Fixing U, H, V, for the typicality value $t_{kj}$ we obtain the reduced problem

$$ J^{kj}_{m,\eta,\tau}(t_{kj}) = \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right) \| X_k - V_j \|^2 + \gamma_j (1 - t_{kj})^{\eta} \to \min. \tag{49} $$

Taking the derivative of (49), the solution is found as

$$ \frac{\partial J^{kj}_{m,\eta,\tau}(t_{kj})}{\partial t_{kj}} = a_2 \times \eta \times t_{kj}^{\eta-1} \| X_k - V_j \|^2 - \gamma_j \times \eta \times (1 - t_{kj})^{\eta-1}, \tag{50} $$

$$ \frac{\partial J^{kj}_{m,\eta,\tau}(t_{kj})}{\partial t_{kj}} = 0 \iff a_2 \times \eta \times t_{kj}^{\eta-1} \| X_k - V_j \|^2 = \gamma_j \times \eta \times (1 - t_{kj})^{\eta-1}, \tag{51} $$

$$ \iff \left( \frac{a_2 \| X_k - V_j \|^2}{\gamma_j} \right)^{\frac{1}{\eta-1}} = \frac{1}{t_{kj}} - 1, \tag{52} $$

$$ \iff 1 + \left( \frac{a_2 \| X_k - V_j \|^2}{\gamma_j} \right)^{\frac{1}{\eta-1}} = \frac{1}{t_{kj}}, \tag{53} $$

$$ \iff t_{kj} = \frac{1}{1 + \left( \frac{a_2 \| X_k - V_j \|^2}{\gamma_j} \right)^{\frac{1}{\eta-1}}}, \quad k = 1, \ldots, N, \; j = 1, \ldots, C. \tag{54} $$

Fixing U, H, T, we calculate the gradient of $J_{m,\eta,\tau}(V)$ with respect to each $V_j$,

$$ J_{m,\eta,\tau}(V) = \sum_{k=1}^{N} \sum_{j=1}^{C} \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right) \| X_k - V_j \|^2 \to \min, \tag{55} $$

$$ \frac{\partial J_{m,\eta,\tau}(V)}{\partial V_j} = \sum_{k=1}^{N} \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right) (-2 X_k + 2 V_j), \tag{56} $$

$$ \frac{\partial J_{m,\eta,\tau}}{\partial V_j} = 0 \iff V_j = \frac{\sum_{k=1}^{N} \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right) X_k}{\sum_{k=1}^{N} \left( a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau} \right)}, \quad j = 1, \ldots, C. \tag{57} $$

□

Similar to (Pal et al., 2005), in this paper we investigate some interesting properties of the solutions above.

Property 1. Since the exponential factor belongs to the interval [0, 1], we get

$$ \lim_{m \to 1^{+}} \{ u_{kj} \} = \lim_{m \to 1^{+}} \frac{1}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{m-1}}} = 1, \tag{58} $$

$$ \lim_{m \to 1^{-}} \{ u_{kj} \} = 0. \tag{59} $$

Property 2. The series expansion of $u_{kj}$ at $m = \infty$ is given by (60) [the expansion (60) is not recoverable from the extracted text], where

$$ a = \sum_{i=1}^{C} \frac{\| X_k - V_j \|}{\| X_k - V_i \|}. \tag{61} $$

Thus,

$$ \lim_{m \to \infty} \{ u_{kj} \} = 1. \tag{62} $$

Obviously, as the parameter m gets larger, the value of $u_{kj}$ tends to converge to one.

Property 3. Similarly, some limits of the hesitation level are

$$ \lim_{\tau \to 1^{+}} \{ h_{kj} \} = 1, \tag{63} $$

$$ \lim_{\tau \to 1^{-}} \{ h_{kj} \} = 0. \tag{64} $$

Property 4. Limits of the typicality value are

$$ \lim_{\eta \to 1^{+}} \{ t_{kj} \} = \lim_{\eta \to 1^{+}} \frac{1}{1 + \left( \frac{a_2 \| X_k - V_j \|^2}{\gamma_j} \right)^{\frac{1}{\eta-1}}} = \begin{cases} 1 & \text{if } \| X_k - V_j \|^2 < \gamma_j / a_2, \\ 1/2 & \text{if } \| X_k - V_j \|^2 = \gamma_j / a_2, \\ 0 & \text{otherwise}, \end{cases} \tag{65} $$

$$ \lim_{\eta \to 1^{-}} \{ t_{kj} \} = \begin{cases} 0 & \text{if } \| X_k - V_j \|^2 < \gamma_j / a_2, \\ 1/2 & \text{if } \| X_k - V_j \|^2 = \gamma_j / a_2, \\ 1 & \text{otherwise}. \end{cases} \tag{66} $$

Property 5. The limit of the sequence $\{ V_j \}$ is

$$ \lim_{(m,\eta,\tau) \to (\infty,\infty,\infty)} \{ V_j \} = \lim_{(m,\eta,\tau) \to (\infty,\infty,\infty)} \sum_{k=1}^{N} \frac{X_k}{1 + \sum_{l=1, l \neq k}^{N} \frac{a_1 u_{lj}^{m} + a_2 t_{lj}^{\eta} + a_3 h_{lj}^{\tau}}{a_1 u_{kj}^{m} + a_2 t_{kj}^{\eta} + a_3 h_{kj}^{\tau}}} = \frac{\sum_{k=1}^{N} X_k}{1 + (N - 1)} = \frac{\sum_{k=1}^{N} X_k}{N}, \quad l \in [1, N]. \tag{67} $$

When the parameters are quite large, all the clusters' centers tend to move to the central point of the dataset.

Property 6. For $a_2 = \gamma_j = 1$, $j = 1, \ldots, C$, the double integral $\iint t_{kj} \, dk \, dj$ takes the closed forms (68) and (69) for two values of $\eta$ [the two values of $\eta$ and the closed forms (68) and (69) are not recoverable from the extracted text]. From this property, we see that there is a gap of $(N - 2)(C - 2)$ between the areas of the typicality values in these two cases.

Property 7.

$$ \lim_{\substack{m \to 1^{+} \\ \tau \to 1^{+}}} \frac{u_{kj}}{h_{kj}} = 1, \tag{70} $$

$$ \lim_{\substack{m \to 1^{+} \\ \tau \to 1^{-}}} \frac{u_{kj}}{h_{kj}} = \lim_{\substack{m \to 1^{+} \\ \tau \to 1^{-}}} \left( \sum_{i=1}^{C} \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2(m - \tau)}{(m-1)(\tau-1)}} = \tilde{\infty}, \tag{71} $$

where $\tilde{\infty}$ is complex infinity, and

$$ \lim_{\substack{m \to \infty \\ \tau \to \infty}} \frac{u_{kj}}{h_{kj}} = \lim_{\substack{m \to \infty \\ \tau \to \infty}} \left\{ \frac{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{\tau-1}}}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{m-1}}} \right\} = 1. \tag{72} $$

From these formulas, we may see that $u_{kj}$ and $h_{kj}$ have similar roles.

Property 8. If $a_2 = 0$ then $t_{kj} = 1$, $\forall k = 1, \ldots, N$, $j = 1, \ldots, C$.

This property shows that the typicality values are independent of the distances between $X_k$ and $V_j$ when the parameter $a_2 = 0$. In this case, the objective function approximates that of the FCM algorithm supplemented by the hesitation level.

Theorem 2. The relation between the two parameters m and $\tau$ that assures constraint (33) is the IPFCM condition below,

$$ 2(m + \tau) > 3 + m\tau. \tag{73} $$

Proof. From constraint (33), we get

$$ \frac{1}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{m-1}}} + \frac{1}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{\tau-1}}} \le 1. \tag{74} $$

From the Cauchy inequality, we obtain

$$ \frac{1}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{m-1}}} + \frac{1}{\sum_{i=1}^{C} \left( \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{2}{\tau-1}}} \ge \frac{2}{\left( \sum_{i=1}^{C} \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{m + \tau - 2}{(m-1)(\tau-1)}}}, \tag{75} $$

$$ \Rightarrow \frac{m + \tau - 2}{(m - 1)(\tau - 1)} \ge \log_A 2, \tag{76} $$

where

$$ A = \sum_{i=1}^{C} \frac{\| X_k - V_j \|}{\| X_k - V_i \|}. \tag{77} $$

Since A < 2,

$$ \Rightarrow \frac{m + \tau - 2}{(m - 1)(\tau - 1)} \ge \log_A 2 > 1, \tag{78} $$

$$ \Rightarrow 2(m + \tau) > 3 + m\tau. \tag{79} $$

□

Condition (73) allows us to choose correct values of m and $\tau$ for the IPFCM algorithm. In the proof above, the strict upper bound for A is one. However, due to the results below,

$$ \lim_{A \to 1^{-}} \log_A 2 = -\infty, \tag{80} $$

$$ \lim_{A \to 1^{+}} \log_A 2 = +\infty, \tag{81} $$

we should choose a loose upper bound for this parameter. In this case, we set it to two.

3.4. Fuzzy clustering method for Liu's generalized intuitionistic fuzzy sets

In this section, we extend IPFCM to the L-GIFS sets. Notice that the only difference between the L-GIFS and IFS sets is the replacement of constraint (9) in IFS by constraint (24) in L-GIFS with the support of an extensional index. In other words, constraint (24) is a generalization of (9) because when this index is set to zero, L-GIFS reduces to IFS. Therefore, the IPFCM algorithm itself remains unchanged. However, to adapt to constraint (24), we now state another condition on the parameters.

Theorem 3. The relation between the two parameters m and $\tau$ that assures constraint (24) in the IPFCM algorithm for the L-GIFS sets is

$$ \frac{m + \tau - 2}{(m - 1)(\tau - 1)} \ge \log_2 \frac{2}{1 + L}. \tag{82} $$

Proof. Similar to the above proof, and following the Cauchy inequality, we obtain the inequality below,

$$ \frac{2}{\left( \sum_{i=1}^{C} \frac{\| X_k - V_j \|}{\| X_k - V_i \|} \right)^{\frac{m + \tau - 2}{(m-1)(\tau-1)}}} \le 1 + L. \tag{83} $$

Denoting A as in (77), we get the following,

$$ \frac{m + \tau - 2}{(m - 1)(\tau - 1)} \ge \log_A \frac{2}{1 + L} \ge \log_2 \frac{2}{1 + L}. \tag{84} $$

In particular, when L = 1,

$$ m + \tau \ge 2. \tag{85} $$
Because

$$2 - s \ge \frac{3 - 2s}{2 - s}, \qquad (86)$$

the values of the parameters m and s admitted by (85) are stricter than those admitted by (73). We can therefore use these constraints to find suitable values of these parameters for the IPFCM algorithm. □

3.5 The intuitionistic possibilistic fuzzy geographically weighted clustering algorithm

In the previous parts of this section, we presented the fuzzy clustering algorithm for IFS and L-GIFS sets (IPFCM). We now present the main algorithm of this paper for the GDA problem, named Intuitionistic Possibilistic Fuzzy Geographically Weighted Clustering (IPFGWC). It is an extension of IPFCM to the considered problem, so we have the relationship

$$\mathrm{IPFCM} \subset \mathrm{IPFGWC}. \qquad (87)$$

The algorithm is stated below.

Step 1: Set the number of patterns N, the number of clusters C, the threshold ε > 0 and the other parameters m, η, s > 1; a_i > 0 (i = 1, ..., 3); γ_j (j = 1, ..., C), satisfying the IPFCM condition for the L-GIFS sets (82), together with the geographic parameters a, b, α, β.
Step 2: Initialize the centers of clusters V_j (j = 1, ..., C) at t = 0.
Step 3: Use (37)–(39) to calculate the membership values, the hesitation levels and the typicality values, respectively. The distance used in these formulas is the squared Euclidean distance.
Step 4: Perform the geographic modifications through Eqs. (1), (2) and (4). Notice that the population of an area is decided through the current membership values.
Step 5: Use (40) to calculate the centers of clusters at t + 1.
Step 6: If the error ||V(t+1) − V(t)|| ≤ ε, then stop the algorithm. Otherwise, assign V(t) = V(t+1) and return to Step 3.

The outputs of this algorithm are the cluster centers, the final membership values and the hesitation levels.

4. Results and discussions

4.1 Experiment environment

In this part, we describe the experimental tools, the experimental datasets and the cluster validity measurement.

Experimental tools: We have implemented the proposed algorithm (IPFGWC) in addition to the above three algorithms: FCM
(Bezdek & Ehrlich, 1984), FGWC (Mason & Jacobson, 2007) and CFGWC (Son et al., 2011), in the C programming language and executed them on a PC with an Intel(R) Core(TM)2 Duo CPU T6400 @ 2.00 GHz (2 CPUs), 2048 MB RAM and the Windows Professional 32-bit operating system. The parameters of these algorithms are set as follows.

The proposed algorithm: threshold ε = 10^{-3}; parameters m = 3, s = 2, η = 2, a_1 = a_2 = a_3 = 1, γ_j = [illegible] (j = 1, ..., C); geographic parameters a = b = 1, α = 0.7, β = 0.3.

FCM, FGWC, CFGWC: these algorithms use the same values of the geographic parameters, m and the threshold as the proposed algorithm.

Experimental datasets: We use a real dataset of socio-economic demographic variables from the United Nations Organization, UNO (UNSD Statistical Databases, 2011). These data have been collected from national statistical authorities since 1948 through a set of questionnaires dispatched annually by the United Nations Statistics Division to over 230 national statistical offices (Fig. 5). They contain statistics on population size and composition, births, deaths, marriage and divorce on an annual basis, economic activity, educational attainment, household characteristics, housing, ethnicity and language, etc.

Cluster validity measurement: We use the validity function of fuzzy clustering for spatial data, namely IFV (Chunchun, Lingkui, & Wenzhong, 2008). This index was shown to be robust and stable when clustering spatial data. It is defined as

$$\mathrm{IFV} = \frac{1}{C}\sum_{j=1}^{C}\left\{\frac{1}{N}\sum_{k=1}^{N}u_{kj}^{2}\left[\log_2 C - \frac{1}{N}\sum_{k=1}^{N}\log_2 u_{kj}\right]^{2}\right\}\times\frac{SD_{\max}}{\sigma_D}. \qquad (88)$$

The maximal distance between centers is

$$SD_{\max} = \max_{k \ne j}\|V_k - V_j\|^{2}. \qquad (89)$$

The even deviation between each object and the cluster center is

$$\sigma_D = \frac{1}{C}\sum_{j=1}^{C}\left(\frac{1}{N}\sum_{k=1}^{N}\|X_k - V_j\|^{2}\right). \qquad (90)$$

When IFV →
max, the clustering is said to be the most optimal for the dataset.

Fig. 5. Populations of 233 countries in 2001.

4.2 IFV comparison

Firstly, we calculate the values of the IFV index for different values of m and C. The results are shown in Table 1. From these results, we can see that the IFV values of the proposed algorithm (IPFGWC) are larger than those of the CFGWC, FGWC and FCM algorithms. In descending order of IFV, the algorithms rank as IPFGWC, CFGWC, FGWC and FCM, so IPFGWC obtains the best clustering quality among them. Moreover, this test reconfirms the results in the literature (Son et al., 2011), where the authors compared the three algorithms FCM, FGWC and CFGWC. FCM has the smallest IFV values because the algorithm itself lacks geographic effects. By overcoming this deficiency, FGWC obtains higher IFV values than FCM does. By adding a context term, CFGWC in turn outperforms FGWC and, of course, FCM. Finally, the IFV values of IPFGWC are larger than those of CFGWC (Figs. 6 and 7). Thus, IPFGWC is considered the most effective of these algorithms in terms of clustering quality.

When m = 1.5, the optimal numbers of clusters of the IPFGWC, CFGWC, FGWC and FCM algorithms are 3, 7, 4 and 6, respectively. In the case of m = 2.5, these numbers are 3, 2, 4 and 7 (the column maxima in Table 1). Generally, the number of clusters produced by IPFGWC is the smallest among all results. Indeed, the IPFGWC method generates results that are closer to the optimal ones than those of the other methods. For IPFGWC, the average IFV value over the numbers of clusters C is 2452.1 when m = 1.5 and 2391.8 when m = 2.5. Thus, another conclusion can be drawn from this test: when the parameter m increases, the IFV value of IPFGWC tends to decrease.

Secondly, we measure the values of the IFV index for different values of α and C. The
results in Table 2 clearly show that the IFV value of IPFGWC is still the largest of all. Furthermore, IPFGWC's result is much better than CFGWC's and FGWC's: on average, the IFV value of IPFGWC is 1.5 times larger than CFGWC's and 37.1 times larger than FGWC's. In particular, when C = 2, these numbers are 33 and 1079, respectively. The addition of the typicality values and the hesitation level to the objective function has clearly improved the IFV values of the IPFGWC algorithm.

When α = 0.4 (Fig. 8), the optimal numbers of clusters of the three algorithms are 3, 7 and 5, respectively (the column maxima in Table 2). For α = 0.6 (Fig. 9) and α = 0.8 (Fig. 10), they are 3, 2, 6 and 3, 2, 2, respectively. The results of IPFGWC seem to be stable across different values of α. On the contrary, the results of CFGWC and FGWC tend to become small when α increases. In the case of α < 0.5, the number of clusters of the IPFGWC algorithm is said to be the most optimal result among all. Similar to the previous test, when the parameter α increases, the IFV value of IPFGWC tends to decrease.

The final conclusion of this part is: the clustering quality of the IPFGWC algorithm outperforms that of CFGWC, FGWC and FCM.

Table 1. IFV values by m and C

     m = 1.5                              m = 2.5
C    IPFGWC  CFGWC   FGWC    FCM         IPFGWC  CFGWC   FGWC    FCM
2    22      21.15   21.08   0.07        2555    2411    20.51   0.034
3    6369    218     217     0.02        6220    203.77  203.22  0.005
4    828     771     707     0.005       1097    766     681     9.73
5    1741    1177    2.83    0.001       1751    603     454     0.0005
6    1674    974     18.74   1.277       1469    807     19.9    0.0003
7    2546    1977    13.37   1.275       2369    977     5.67    17.04
8    1555    1463    5.31    0.0001      1445    1165    23.3    8.1

Fig. 6. IFV values when m = 1.5.

Fig. 7. IFV values when m = 2.5.

Fig. 8. IFV values when α = 0.4.

4.3 The running time comparison

In this part, we measure the speed of these algorithms for different values of C (Table 3) and α (Table 4). The running time of the IPFGWC method is longer than that of the other methods. Table 3 shows that the running time of IPFGWC is
4.33 times larger than that of CFGWC on average. The corresponding numbers for FGWC and FCM are 4.05 and 3.12, respectively. Because of the modifications in the objective function, IPFGWC has to perform extra calculations, which cause the longer running time shown in this table. Table 3 also reconfirms the experimental results in the literature (Son et al., 2011), where the descending order of running time is FCM, FGWC and CFGWC.

Another finding from Table 3 is the average increment of running time per additional cluster. When a cluster is added, the running time of IPFGWC increases by 0.132 s. The corresponding increments for CFGWC, FGWC and FCM are 0.03, 0.032 and 0.043 s, respectively. This finding helps us to estimate the running time of an algorithm when more clusters are required.

In Table 4, the running time of IPFGWC is 3.42 and 3.06 times larger than those of CFGWC and FGWC, respectively. These numbers are smaller than the ones in Table 3. Indeed, the number of clusters C contributes more to the increase of the running times of these algorithms than the parameter α does. Similar to Table 3, the average increments of IPFGWC, CFGWC and FGWC per α = 0.1 are 0.22, 0.06 and 0.08, respectively. These increments are double the ones in Table 3. Thus, an increase of the parameter α often leads to a higher average increment of these algorithms than an increase of the number of clusters C does. Fig. 11 shows the numbers of iteration steps of the four algorithms, corresponding to the results in Table 3.

The final conclusion of this part is: the running time of the IPFGWC algorithm is longer than those of CFGWC, FGWC and FCM due to extra calculations. However, it is not too slow and is acceptable.

Table 2. IFV values by α and C

     α = 0.4                    α = 0.6                    α = 0.8
C    IPFGWC  CFGWC  FGWC       IPFGWC  CFGWC  FGWC       IPFGWC  CFGWC  FGWC
2    2553    20     0.04       2537    2353   192        2523    2286   184
3    6205    202    15         6113    195    0.005      6043    169    0.003
4    1096    691    82         1119    644    157        1136    738    162
5    1687    1045   105.4      1704    1407   105.1      1670    903    96
6    1454    823    0.67       1414    744    244        1380    673    0.669
7    2355    1856   27         2279    1640   26.3       2211    2142   26.5

Fig. 9. IFV values when α = 0.6.

Fig. 10. IFV values when α = 0.8.

Table 3. The running times of four algorithms by C (sec)

C    IPFGWC  CFGWC  FGWC   FCM
2    0.113   0.03   0.03   0.05
3    0.186   0.06   0.07   0.14
4    0.435   0.16   0.16   0.18
5    0.752   0.19   0.19   0.19
6    0.894   0.20   0.26   0.32
7    1.123   0.26   0.27   0.38
8    1.918   0.24   0.25   0.31

Table 4. The running times of three algorithms by α (sec)

α     IPFGWC  CFGWC  FGWC
0.2   0.072   0.02   0.03
0.3   0.079   0.02   0.03
0.4   0.094   0.03   0.03
0.5   0.096   0.03   0.03
0.6   0.119   0.03   0.03
0.7   0.124   0.03   0.03
0.8   0.140   0.07   0.07

Fig. 11. The number of iteration steps.

4.4 The changes of objective function of IPFGWC

Finally, we investigate how the value of the objective function of the proposed method changes with its parameters (Table 5).

Table 5. The changes of objective function

a1   a2   a3   m    η    s    a     b     α     J
1    1    1    3    2    2    1     1     0.3   16031870.3
1    1    1    3    2    2    1     1     0.6   14563852.3
1    1    1    6    2    2    1     1     0.3   11181884.5
1    1    1    3    4    2    1     1     0.3   16031865.4
1    1    1    3    2    4    1     1     0.3   17637890.2
2    1    1    3    2    2    1     1     0.3   25904147.4
1    2    1    3    2    2    1     1     0.3   16031870.4
1    1    2    3    2    2    1     1     0.3   22162283.3
2    2    2    3    2    2    1     1     0.3   32063312.8
1    1    1    6    4    4    1     1     0.3   12836435.7
1    1    1    3    2    2    1.5   2.5   0.3   14175296.1
2    2    2    6    4    4    1.5   2.5   0.8   19513136.3

When we increase the value of the parameter α, the value of the objective function decreases by 9.15%. The same happens when we increase the parameter η (1%), the triple (m, η, s) (19.9%) or the pair (a, b) (11.6%). The largest decrease, 30.2%, is recorded when we increase the fuzziness m. The other cases in this test increase the value of the objective function. For example, when we double s, the value of the objective function increases by 10.01%. Similarly, increasing a1, a2, a3 and all parameters increases the value of J by 61.5%, 1%, 38.2% and 21.7%, respectively. When we increase the triple (a1, a2, a3), the value of J is doubled.

The final conclusion of this part is: this test helps us to control and predict the value of the objective function through its parameters. Besides, the fuzziness m deserves the most attention when the goal is to reduce the value of J significantly.

5. Conclusions

In this paper, we focused on Geo-Demographic Analysis (GDA), a research area at the intersection between GIS and Data Mining that has been widely applied in real-world applications. We discussed the state of the art of GDA and highlighted some of the limitations of the best existing method, the Fuzzy Geographically Weighted Clustering (FGWC). We proposed the Intuitionistic Possibilistic Fuzzy Geographically Weighted Clustering (IPFGWC), which tries to cope with the limitations of FGWC, and validated it through extensive experimentation using the real-world datasets of UNO. Our results show that IPFGWC can outperform some of the best-known approaches. Future research directions include the use of context information within IPFGWC and the development of a better way to model the geographic effects.

Acknowledgement

The authors are greatly indebted to the Editor-in-Chief, Prof. J. Liebowitz, and to the anonymous reviewers for their comments and their invaluable suggestions that improved the quality and clarity of the paper. The authors also wish to thank the people who supported us during this work: Prof. Pham Ky Anh of VNU, Prof. Nguyen Dinh Hoa of VNU, Dr. Roberto Colonello, and the research group at the Centre for High Performance Computing of VNU. This work is sponsored by a research grant of Vietnam National University, Hanoi for promoting Science and Technology (QGTD.11.01).

References

Atanassov, K. (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems, 20, 87–96.
Atanassov, Krassimir T. (2003). Intuitionistic fuzzy sets: Past, present and future. In Proceedings of the 3rd conference of the European society for fuzzy logic and technology, Germany (pp. 12–19).
Atanassov, K., & Gargov, G. (1989). Interval-valued intuitionistic
fuzzy sets. Fuzzy Sets and Systems, 31(3), 343–349.
Bezdek, J. C., Ehrlich, R., et al. (1984). FCM: The fuzzy C-means clustering algorithm. Computers & Geosciences, 10, 191–203.
Birkin, M., & Clarke, G. P. (1991). Spatial interaction in geography. Geography Review, 4(5), 16–24.
Chunchun, H. U., Lingkui, MENG., & Wenzhong, SHI. (2008). Fuzzy clustering validity for spatial data. Geo-spatial Information Science, 11(3), 191–196.
Day, P., Pearce, J., & Dorling, D. (2008). Twelve worlds: A geo-demographic comparison of global inequalities in mortality. Journal of Epidemiology and Community Health, 62, 1002–1010.
Feng, Z., & Flowerdew, R. (1998). Fuzzy geodemographics: A contribution from fuzzy clustering methods. London: Taylor & Francis.
Gau, W. L., & Buehrer, D. J. (1993). Vague sets. IEEE Transactions on Systems, Man, and Cybernetics, 23, 610–614.
Gibbs, Anne, Stillwell, John, & See, Linda (2010). A geodemographic classification of primary schools in London.
Kannan, S. R., Ramathilagam, S., & Chung, P. C. (2012). Effective fuzzy c-means clustering algorithms for data clustering problems. Expert Systems with Applications, 39(7), 6292–6300.
Kim, K. H., & Jun, Y. B. (2001). Intuitionistic fuzzy interior ideals of semigroups. International Journal of Mathematics and Mathematical Sciences, 27(5), 261–267.
Kohonen, T. (2001). Self-organizing maps (3rd ed.). Berlin-Heidelberg: Springer.
Küçükdeniz, T., Baray, A., Ecerkale, K., & Esnaf, S. (2012). Integrated use of fuzzy c-means and convex programming for capacitated multi-facility location problem. Expert Systems with Applications, 39(4), 4306–4314.
Lee, Jae Chang, Jhun, Myoungshic, & Jin, Seohoon (1999). Geo-demographic analysis for marketing applications: Megatrending lifestyles in Korea. In Proceeding of bulletin of the international statistical institute, Finland (pp. 1–4).
Liu, Hsiang-Chuan (2010). Liu's generalized intuitionistic fuzzy sets. Journal of Educational Measurement and Statistics, 18(1), 69–81.
Loureiro, Miguel, Bação, Fernando, & Lobo, Victor (2006). Fuzzy classification of geodemographic data using self-organizing maps. In Proceeding of 4th international conference of GIScience 2006, Münster, Germany (pp. 123–127).
Lu, An, & Ng, Wilfred (2005). Vague sets or intuitionistic fuzzy sets for handling vague data: Which one is better? In Proceedings of the 24th international conference on conceptual modelling (ER 2005), Klagenfurt, Austria (pp. 401–416).
Mason, G. A., & Jacobson, R. D. (2007). Fuzzy geographically weighted clustering. In Proceedings of the 9th international conference on geocomputation, Maynooth, Eire, Ireland (electronic proceedings on CD-ROM).
Mondal, Tapas Kumar, & Samanta, S. K. (2002). Generalized intuitionistic fuzzy sets. Journal of Fuzzy Mathematics, 10, 839–861.
Palmer, Claire (2008). Geodemographic analysis.
Pal, Nikhil R., Pal, Kuhu, Keller, James M., & Bezdek, James C. (2005). A possibilistic fuzzy C-means clustering algorithm. IEEE Transactions on Fuzzy Systems, 13(4), 517–530.
Pelekis, Nikos, Iakovidis, Dimitris K., Kotsifakos, Evangelos E., & Kopanakis, Ioannis (2008). Fuzzy clustering of intuitionistic fuzzy data. International Journal of Business Intelligence and Data Mining, 3(1), 45–65.
Petersen, Jakob, Gibin, Maurizio, Longley, Paul, Mateos, Pablo, Atkinson, Philip, & Ashby, David (2011). Geodemographics as a tool for targeting neighbourhoods in public health campaigns. Journal of Geographical Systems, 13, 173–192.
Sen, M. K., & Saha, N. K. (1986). On Γ-semigroup I. Bulletin of the Calcutta Mathematical Society, 78(3), 180–186.
Sleight, P. (1993). Targeting customers: How to use geodemographics and lifestyle data in your business. Henley-on-Thames: NTC Publication.
Son, L. H., Lanzi, P. L., Cuong, B. C., & Hung, H. A. (2011). Data mining in GIS: A novel context-based fuzzy geographically weighted clustering algorithm. In Proceedings of the 2011 3rd IEEE international conference on machine learning and computing (ICMLC 2011), Singapore (pp. 508–511).
Szmidt, E., & Kacprzyk, J. (2000). Distances between intuitionistic fuzzy sets. Fuzzy Sets and Systems, 114(3), 505–518.
Tryon, R. C., & Bailey, D. E. (1970). Cluster analysis. New York: McGraw-Hill.
Uckun, Mustafa, Ozturk, Mehmet Ali, & Jun, Young Bae (2007). Intuitionistic fuzzy sets in gamma-semigroups. Bulletin of the Korean Mathematical Society, 44(2), 359–367.
UNSD Statistical Databases (2011). Demographic yearbook.
Walford, Nigel (2011). An introduction to geodemographic classification (census learning).
Xu, Z. S. (2007). Some similarity measures of intuitionistic fuzzy sets and their applications to multiple attribute decision making. Fuzzy Optimization and Decision Making, 6(2), 109–121.
Xu, Z. S. (2009). Intuitionistic fuzzy hierarchical clustering algorithms. Journal of Systems Engineering and Electronics, 20(1), 90–97.
Xu, Z. S., Chen, J., & Wu, J. J. (2008). Clustering algorithm for intuitionistic fuzzy sets. Information Sciences, 178(18), 3775–3790.
Xu, Zeshui, & Wu, Junjie (2010). Intuitionistic fuzzy C-means clustering algorithms. Journal of Systems Engineering and Electronics, 21(4), 580–590.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Zhang, P. Z. (1983). Fuzzy sets theory and applications. Shanghai: Shanghai Scientific and Technical Publisher.
Zhang, H. M., Xu, Z. S., & Chen, Q. (2007). On clustering approach to intuitionistic fuzzy sets. Control and Decision, 22(7), 882–888 (in Chinese).
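
As a companion to the IPFCM conditions on the parameters m and s, the following is a minimal sketch of how a candidate pair could be checked before running the algorithm. It assumes that condition (73) reads 2(m + s) > 3 + ms and that condition (82) reads (m + s − 2)/((m − 1)(s − 1)) ≥ log2(2/(1 + L)), which is how these formulas are read here from the damaged source text. The function names and the parameter name `L` for the extensional index are hypothetical and do not come from the paper.

```python
import math


def satisfies_ipfcm_condition(m, s):
    """Check condition (73), read here as 2(m + s) > 3 + m*s (assumption)."""
    return 2 * (m + s) > 3 + m * s


def satisfies_lgifs_condition(m, s, L):
    """Check condition (82), read here as
    (m + s - 2) / ((m - 1)(s - 1)) >= log2(2 / (1 + L))  (assumption).

    L is the extensional index of the L-GIFS sets; L = 0 gives back the IFS case.
    """
    if m <= 1 or s <= 1:
        raise ValueError("m and s must both be greater than 1")
    if L < 0:
        raise ValueError("L is assumed to be non-negative in this sketch")
    left = (m + s - 2) / ((m - 1) * (s - 1))
    right = math.log2(2 / (1 + L))
    return left >= right


# The experimental setting of Section 4.1 uses m = 3 and s = 2.
print(satisfies_ipfcm_condition(3, 2))
print(satisfies_lgifs_condition(3, 2, 0))
```

Under this reading, setting L = 0 in the second check gives the non-strict form of the first one, which agrees with the remark that L-GIFS returns to IFS when the index is zero.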
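
The IFV validity index of Section 4.1 (Eqs. (88)–(90)) can be illustrated with a short sketch. It is not the authors' C implementation: the function name `ifv_index`, the list-of-tuples data layout and the small clamp `eps` (added only to avoid taking the logarithm of a zero membership) are assumptions made for this example.

```python
import math


def ifv_index(points, centers, u, eps=1e-12):
    """IFV index following Eqs. (88)-(90).

    points  : list of N feature vectors X_k
    centers : list of C cluster centers V_j (C >= 2)
    u       : N x C membership matrix, u[k][j] in [0, 1]
    eps     : hypothetical lower clamp on memberships, not part of the paper
    """
    n, c = len(points), len(centers)

    def sq_dist(p, q):
        # Squared Euclidean distance between two vectors.
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Eq. (89): largest squared distance between two distinct centers.
    sd_max = max(sq_dist(centers[k], centers[j])
                 for k in range(c) for j in range(c) if k != j)

    # Eq. (90): mean squared distance between objects and centers.
    sigma_d = sum(sum(sq_dist(x, v) for x in points) / n for v in centers) / c

    # Eq. (88): per-cluster fuzziness term, averaged over the clusters.
    total = 0.0
    for j in range(c):
        mean_u2 = sum(u[k][j] ** 2 for k in range(n)) / n
        mean_log = sum(math.log2(max(u[k][j], eps)) for k in range(n)) / n
        total += mean_u2 * (math.log2(c) - mean_log) ** 2
    return (total / c) * sd_max / sigma_d


# Toy data (made up for illustration): two well-separated groups of two points.
pts = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
ctr = [(0.0, 0.5), (4.0, 0.5)]
mem = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
print(ifv_index(pts, ctr, mem))
```

Because the index multiplies the fuzziness term by the ratio SD_max / σ_D, rescaling all coordinates by a common factor leaves the value unchanged, so only the relative separation of the centers matters.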