Expert Systems with Applications 42 (2015) 3682–3701

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

HIFCF: An effective hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis

Nguyen Tho Thong, Le Hoang Son ⇑

VNU University of Science, Vietnam National University, Hanoi, Viet Nam

Article info

Article history: Available online 31 December 2014

Keywords: Fuzzy sets; Hybrid Intuitionistic Fuzzy Collaborative Filtering; Intuitionistic fuzzy recommender systems; Medical diagnosis; Picture fuzzy clustering

Abstract

The health care support system is a special type of recommender system that plays an important role in medical sciences nowadays. This kind of system often provides a medical diagnosis function that uses the historic clinical symptoms of patients to give a list of possible diseases accompanied by their membership values. The most likely disease in that list is then determined by the clinicians' experience, expressed through a specific defuzzification method. An important issue in the health care support system is increasing the accuracy of the medical diagnosis function, which involves the cooperation of fuzzy systems and recommender systems: uncertain behaviors of symptoms and the clinicians' experience are represented by fuzzy memberships, whilst the determination of the possible diseases is conducted by the prediction capability of recommender systems. Intuitionistic fuzzy recommender systems (IFRS) are such a combination, which results in better prediction accuracy than the relevant methods constructed on either traditional fuzzy sets or recommender systems only. Based upon the observation that the calculation of similarity in IFRS could be enhanced by integrating the information of the possibility of patients belonging to clusters specified by a fuzzy clustering method, in this paper we
propose a novel hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, called HIFCF (Hybrid Intuitionistic Fuzzy Collaborative Filtering). Experimental results reveal that HIFCF obtains better accuracy than IFCF and than the standalone methods of intuitionistic fuzzy sets, such as De, Biswas & Roy, Szmidt & Kacprzyk, and Samuel & Balamurugan, and of recommender systems, e.g. Davis et al. and Hassan & Syed. The new method contributes not only to the theoretical aspects of recommender systems but also to their application in health care support systems.

© 2014 Elsevier Ltd. All rights reserved.

⇑ Corresponding author at: 334 Nguyen Trai, Thanh Xuan, Hanoi, Viet Nam. Tel.: +84 904 171 284. E-mail addresses: nguyenthothongtt89@gmail.com (N.T. Thong), sonlh@vnu.edu.vn, chinhson2002@gmail.com (L.H. Son).
http://dx.doi.org/10.1016/j.eswa.2014.12.042
0957-4174/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, the health care support system, or clinical decision support system, has emerged as an important tool in medical sciences to assist clinicians in decision making, especially in medical diagnosis, which specifies the diseases that could be found from a list of measured symptoms of a patient as well as the most likely disease among them. Physicians, nurses and other healthcare professionals use the health care support system to prepare a diagnosis and to review the diagnosis as a means of improving the final result. According to Basu, Fevrier-Thomas, and Sartipi (2011), Foster, McGregor, and El-Masri (2005), and Kyriacou, Pattichis, and Pattichis (2009), the health care support system can be defined as computer applications that support and assist clinicians in improved decision-making by providing evidence-based knowledge with respect to patient data. This type of computer-based system consists of three components: a language system, a knowledge system and a problem processing system. It is able
to handle complex problems, applying domain-specific expertise to assess the consequences of executing its recommendations. There are two main types of health care support system (Rouse, 2014). The first one uses a knowledge base, applies rules to patient data using an inference engine and displays the results to the end user. Systems without a knowledge base, on the other hand, rely on machine learning to analyze clinical data (Fig. 1). Machine learning methods examine patients' medical history in conjunction with relevant clinical research, and are able to predict potential events ranging from drug interactions to disease symptoms. In the medical diagnosis process, characteristics of an individual patient are matched to a computerized clinical knowledge base, and patient-specific assessments and recommendations are then presented to the clinician or the patient for a decision (Rajalakshmi, Mohan, & Babu, 2011).

An important issue in the health care support system is increasing the accuracy of the medical diagnosis. Previous research concentrated on improving the machine learning methods/knowledge systems that appear in the medical diagnosis process in Fig. 1. A brief summary is as follows:

- A hybrid evolutionary algorithm between genetic programming and genetic algorithms (Tan, Yu, Heng, & Lee, 2003).
- Genetic algorithm (Anbarasi, Anupriya, & Iyengar, 2010).
- The combination of type-2 fuzzy logic with a genetic algorithm (Hosseini, Ellis, Mazinani, & Dehmeshki, 2011).
- An evolutionary artificial neural network approach based on the Pareto differential evolution algorithm augmented with local search (Abbass, 2002).
- Neuro-fuzzy inference system, CANFIS (Parthiban & Subramanian, 2008).
- Complex modular neural network (Kala, Janghel, Tiwari, & Shukla, 2011).
- Bayesian networks (Gevaert, De Smet, Timmerman, Moreau, & De Moor, 2006; Roberts, Kahn, & Haddawy, 1995).
- Hierarchical Association Rule Model, HARM (McCormick, Rudin, & Madigan, 2011).
- C4.5 Rule-PANE, which combines an artificial neural network ensemble with rule induction (Zhou & Jiang, 2003).
- Support vector machines (Kampouraki, Vassis, Belsis, & Skourlas, 2013).

However, these methods often fail to achieve high prediction accuracy on real medical diagnosis datasets. This is because the relations between the patients and the symptoms, and between the symptoms and the diseases (Fig. 1), are often vague, imprecise and uncertain. For instance, doctors could be faced with patients who are likely to have personal problems and/or mental disorders, so that crucial signs and symptoms are missing, incomplete or vague even though the patients' medical histories and physical examinations are available for the diagnosis. Even if the information about patients is clearly provided, giving an accurate evaluation of the given symptoms/diseases is another challenge that requires well-trained, highly experienced physicians. This evidence raises the need to use fuzzy sets or their extensions to model and assist the techniques that improve the accuracy of diagnosis. The definition of a fuzzy set is stated below.

[Fig. 1. The medical diagnosis process of the health care support system. Blocks: Clinical Data (Patients-Symptoms); Knowledge System; Machine Learning; Consequent Rules; Results (Patients-Diseases).]

Definition 1. A Fuzzy Set (FS) (Zadeh, 1965) in a non-empty set $X$ is a function

\[ \mu : X \to [0, 1], \quad x \mapsto \mu(x), \tag{1} \]

where $\mu(x)$ is the membership degree of each element $x \in X$. A fuzzy set can alternately be defined as

\[ A = \{ \langle x, \mu(x) \rangle \mid x \in X \}. \tag{2} \]

An extension of FS that is widely applied to the medical prognosis problem is the Intuitionistic Fuzzy Set (IFS), which is defined as follows.

Definition 2. An Intuitionistic Fuzzy Set (IFS) (Atanassov, 1986) in a non-empty set $X$ is

\[ \tilde{A} = \{ \langle x, \mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x) \rangle \mid x \in X \}, \tag{3} \]

where $\mu_{\tilde{A}}(x)$ and $\gamma_{\tilde{A}}(x)$ are the membership and non-membership degrees of each element $x \in X$, respectively, satisfying

\[ \mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x) \in [0, 1], \quad \forall x \in X, \tag{4} \]

\[ \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \le 1, \quad \forall x \in X. \tag{5} \]

The intuitionistic fuzzy index of an element, showing the non-determinacy, is denoted as

\[ \pi_{\tilde{A}}(x) = 1 - \left( \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \right), \quad \forall x \in X. \tag{6} \]

When $\pi_{\tilde{A}}(x) = 0$ for all $x \in X$, the IFS reduces to the fuzzy set of Zadeh.

Various studies utilizing FS and IFS for the medical diagnosis process can be found in the literature. De, Biswas, and Roy (2001) extended Sanchez's approach with the notion of intuitionistic fuzzy set theory for medical diagnosis. The information of symptoms and patients, and of symptoms and diseases, is fuzzified by intuitionistic fuzzy memberships, and the possibilities of acquired diseases are calculated based on those membership values and intuitionistic fuzzy relations. Szmidt and Kacprzyk (2001, 2003, 2004) used the concept of intuitionistic fuzzy sets to express new aspects of imperfect information between the sets of symptoms and diagnoses, and defined a new similarity measure between intuitionistic fuzzy sets for applications in medical diagnostic reasoning. Khatibi and Montazer (2009) employed five similarity measures of fuzzy sets and intuitionistic fuzzy sets to handle uncertainty in medical pattern recognition. The experimental results showed that both fuzzy sets and intuitionistic fuzzy sets have powerful capabilities to cope with the uncertainty in medical pattern recognition problems, but intuitionistic fuzzy sets, especially the measure of Hausdorff and
Mitchell, yield a better detection rate as a result of more accurate modeling, which incurs more computational cost. Own (2009) studied the switching relation between type-2 fuzzy sets and intuitionistic fuzzy sets to deal with vagueness and insufficient information. Moein, Monadjemi, and Moallem (2009) offered a hybrid fuzzy-neural automatic system for medical diagnosis that does not need to determine the best membership function for each fuzzy datum. Neog and Sut (2011) introduced a matrix representation of fuzzy soft sets and extended Sanchez's approach for medical diagnosis using the notion of the fuzzy soft complement. Xiao et al. (2012) proposed the concept of D–S generalized fuzzy soft sets by combining the Dempster–Shafer theory of evidence and generalized fuzzy soft sets. A new method of evaluation based on D–S generalized fuzzy soft sets was presented and applied to medical diagnosis. Agarwal, Hanmandlu, and Biswas (2011) introduced a generalized intuitionistic fuzzy soft set and a new scoring function to compare two intuitionistic fuzzy numbers for multi-criteria medical diagnosis. Meenakshi and Kaliraja (2011) presented a method that extends Sanchez's approach for medical diagnosis through the arithmetic mean of an interval-valued fuzzy matrix, which is a simpler technique than that of using intuitionistic fuzzy sets. Ahn, Han, Oh, and Lee (2011) developed an interview chart with interval fuzzy degrees based on the relation between symptoms and diseases (three types of headache), and utilized the interval-valued intuitionistic fuzzy weighted arithmetic average operator to aggregate fuzzy information from the symptoms. A measure based on the distance between interval-valued intuitionistic fuzzy sets for medical diagnosis was also presented. Samuel and Balamurugan (2012) proposed a new technique named intuitionistic fuzzy max–min composition to study Sanchez's
approach for medical diagnosis. Shinoj and John (2012) introduced a new concept, intuitionistic fuzzy multisets, which combine intuitionistic fuzzy sets and the fuzzy multisets of Yager. Intuitionistic fuzzy multisets are characterized by the count membership and the count non-membership functions, and when the sum of these functions is equal to one, intuitionistic fuzzy multisets reduce to intuitionistic fuzzy sets. Intuitionistic fuzzy multisets are used to model the symptoms at various timestamps. Other recent works can be found in Ahn (2014), Bora, Bora, Neog, and Sut (2014), Bourgani, Stylios, Manis, and Georgopoulos (2014), Das and Kar (2014), Muthuvijayalakshmi, Kumar, and Venkatesan (2014), Nguyen, Khosravi, Creighton, and Nahavandi (2014), Sanz, Galar, Jurio, Brugos, Pagola, et al. (2014), Shanmugasundaram and Seshaiah (2014), and Sharaf-El-Deen, Moawad, and Khalifa (2014).

The limitations of the relevant research utilizing FS and IFS for the medical diagnosis process are as follows. Firstly, these works calculate the relation between the patients and the diseases solely from the relations between the patients and the symptoms and between the symptoms and the diseases. In practical cases where either of these two relations is missing, those works cannot be applied. This happens in reality because clinicians sometimes cannot accurately express the membership and non-membership degrees of symptoms to diseases or vice versa. Secondly, the information of previous diagnoses of patients cannot be utilized. That is to say, even when a patient already has some records in the patients-diseases database, the next records of this patient are calculated solely on the basis of the relations between the patients and the symptoms and between the symptoms and the diseases. Historic diagnoses of patients are not taken into account, so the accuracy of diagnosis may suffer as a result. Thirdly, the
determination of the most likely disease depends on the defuzzification method. For instance, De et al. (2001) used a hybrid function of membership and non-membership values for the defuzzification, Samuel and Balamurugan (2012) relied on the reduction matrix from W PD, and Szmidt and Kacprzyk (2001, 2003, 2004), Khatibi and Montazer (2009) and Shinoj and John (2012) employed distance functions. A determination that does not depend on the defuzzification method should be investigated to obtain stable performance of the algorithm.

For these reasons, a combination of fuzzy sets and a machine learning method is a good choice to eliminate the disadvantages of the relevant works using FS and IFS. Recommender Systems, RS (Ricci, Rokach, & Shapira, 2011), are one such machine learning method. They give users predicted "ratings" or "preferences" for items, thus helping them to choose the appropriate item among numerous possibilities. This kind of expert system is now widely used in application fields such as personalized systems for books, documents, images, movies, music, shopping and TV programs. Recommender Systems have also been applied to medical diagnosis. Davis, Chawla, Blumm, Christakis, and Barabási (2008) proposed CARE, a Collaborative Assessment and Recommendation Engine, which relies only on a patient's medical history in order to predict future disease risks, and combines collaborative filtering methods with clustering to predict each patient's greatest disease risks based on their own medical history and that of similar patients. An iterative version of CARE, called ICARE, that incorporates ensemble concepts for improved performance was also introduced. These systems required no specialized information and provided predictions for medical conditions of all kinds in a single run. Hassan and Syed (2010) employed a collaborative filtering framework expressed
in Eq. (7) that assessed patient risk both by matching new cases to historical records and by matching patient demographics to adverse outcomes, so that it could achieve a higher predictive accuracy for both sudden cardiac death and recurrent myocardial infarction than popular classification approaches such as logistic regression and support vector machines:

\[ R(a, i^{*}) = \bar{r}_{a} + \frac{\sum_{b \in U \setminus \{a\}} SIM(a, b) \times \left( r_{b, i^{*}} - \bar{r}_{b} \right)}{\sum_{b \in U \setminus \{a\}} \left| SIM(a, b) \right|}, \tag{7} \]

where $a, b$ are patients and $i^{*}$ is the considered disease. The similarity between two patients, $SIM(a, b)$, is calculated by the Pearson coefficient from the demographic information of the patients. $R(a, i^{*})$ and $r_{b, i^{*}}$ are the possibilities of acquiring disease $i^{*}$ of patients $a$ and $b$, respectively. $\bar{r}_{a}$ and $\bar{r}_{b}$ are the average possibilities of acquiring all diseases of patients $a$ and $b$, respectively. More works on the applications of RS to medical diagnosis can be found in Duan, Street, and Xu (2011), Meisamshabanpoor and Mahdavi (2012), West and Marion (2014), and our previous works in Cuong, Son, and Chau (2010), Son, Cuong, Lanzi, and Thong (2012), Son, Lanzi, Cuong, and Hung (2012), Son, Cuong, and Long (2013), Son, Linh, and Long (2014), Thong and Son (2014), Son (2014a, 2014b, 2014c, 2015) and Son and Thong (2015).

The standalone RS methods, such as the works of Davis et al. (2008), Hassan and Syed (2010), Duan et al. (2011), Meisamshabanpoor and Mahdavi (2012) and West and Marion (2014), are effective only with crisp datasets, not fuzzy ones. Moreover, they work only if the historic diagnoses of patients are provided for the prediction, and their diagnostic accuracy depends on the defuzzification method. Therefore, a cooperation of fuzzy systems and recommender systems is regarded as an effective strategy to exclude the drawbacks of both the standalone RS methods and the studies using FS and IFS only, in the sense that uncertain behaviors of symptoms and the clinicians' experience are represented by fuzzy memberships whilst the
determination of the possible diseases is conducted by the prediction capability of recommender systems. Intuitionistic fuzzy recommender systems, IFRS (Son & Thong, 2015), are such a combination, which results in better prediction accuracy than the relevant standalone methods constructed on either traditional fuzzy sets or recommender systems only. That work was the first effort to initiate fuzzy-based recommender systems for the health care support system. It proposed new definitions of single-criterion IFRS (SC-IFRS) and multi-criteria IFRS (MC-IFRS), which extend the definition of RS by taking into account a feature of a user and a characteristic of an item expressed by intuitionistic linguistic labels. Next, new definitions of the intuitionistic fuzzy matrix (IFM), which is a representation of SC-IFRS and MC-IFRS in matrix format, and of the intuitionistic fuzzy composition matrix (IFCM) of two IFMs with the intersection/union operation were presented and used to design new similarity degrees of IFMs, such as the intuitionistic fuzzy similarity matrix (IFSM) and the intuitionistic fuzzy similarity degree (IFSD). From these similarity functions, a novel method called Intuitionistic Fuzzy Collaborative Filtering (IFCF) was presented for the medical diagnosis problem. IFCF has been validated on benchmark medical diagnosis datasets from the UCI Machine Learning Repository in terms of the accuracy of diagnosis, and showed better performance than the standalone methods of FS and RS.

The motivation and contributions of this paper are as follows. IFCF used IFSD to calculate the similarity between two patients. This measure is a generalization of the hard user-based, item-based and rating-based similarity degrees in RS (Ricci et al., 2011). Nonetheless, IFSD could be enhanced by integrating the information of the
possibility of patients belonging to clusters specified by a fuzzy clustering method. That is to say, if we know which group the new patient belongs to, then the similarities of this patient with the others in the group should be given a high influence in the calculation of IFSD. Therefore, in this paper we propose a novel hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, called Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF). HIFCF makes use of a recent picture fuzzy clustering method, namely the Distributed Picture Fuzzy Clustering Method, DPFCM (Son, 2015), to classify the patients into groups according to the relation information of the patients. Then, the possibility of a patient belonging to a certain cluster is used to calculate the similarity degrees between users. They are supplemented into IFSD to give the final similarity between patients. The new hybrid algorithm HIFCF will be validated experimentally on a benchmark UCI Machine Learning Repository dataset and compared with the relevant methods in terms of accuracy.

The rest of the paper is organized as follows. The next section presents the new algorithm HIFCF. It is followed by a section that validates the proposed model by experiments, and by a final section that gives the conclusions and future work of the paper.

2. The proposed method

In this section, we firstly recall some principal terms and algorithms of the intuitionistic fuzzy recommender system, IFRS (Son & Thong, 2015), especially the Intuitionistic Fuzzy Collaborative Filtering (IFCF) algorithm, in Section 2.1. Secondly, we recall one of the best recently published picture fuzzy clustering methods, namely the Distributed Picture Fuzzy Clustering Method, DPFCM (Son, 2015), used to classify the patients into groups according to their relation information, in Section 2.2. Thirdly, the main contribution of the paper, a novel hybrid model between DPFCM and IFRS for medical diagnosis called Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF), is presented in Section 2.3. Lastly, some theoretical analyses of the new algorithm are made in Section 2.4.

2.1. Intuitionistic fuzzy recommender system

Firstly, the definition of medical diagnosis in the light of intuitionistic fuzzy sets is described as follows.

Definition 3 (Medical diagnosis (Son & Thong, 2015)). Given three lists $P = \{P_1, \ldots, P_n\}$, $S = \{S_1, \ldots, S_m\}$ and $D = \{D_1, \ldots, D_k\}$, where $P$ is a list of patients, $S$ a list of symptoms and $D$ a list of diseases. The three values $n, m, k \in N^{+}$ are the numbers of patients, symptoms and diseases, respectively. The relation between the patients and the symptoms is characterized by the set $R_{PS} = \{R_{PS}(P_i, S_j) \mid \forall i = 1, \ldots, n; \forall j = 1, \ldots, m\}$, where $R_{PS}(P_i, S_j)$ shows the level to which patient $P_i$ acquires symptom $S_j$ and is represented by either a numeric value or an (intuitionistic) fuzzy value depending on the domain of the problem. Analogously, the relation between the symptoms and the diseases is expressed as $R_{SD} = \{R_{SD}(S_i, D_j) \mid \forall i = 1, \ldots, m; \forall j = 1, \ldots, k\}$, where $R_{SD}(S_i, D_j)$ reflects the possibility that symptom $S_i$ would lead to disease $D_j$. The medical diagnosis problem aims to determine the relation between the patients and the diseases described by the set $R_{PD} = \{R_{PD}(P_i, D_j) \mid \forall i = 1, \ldots, n; \forall j = 1, \ldots, k\}$, where $R_{PD}(P_i, D_j)$ is a binary value showing whether patient $P_i$ acquires disease $D_j$ or not. The medical diagnosis problem can be shortly represented by the implication $\{R_{PS}, R_{SD}\} \to R_{PD}$.

Definition 4 (Single-criterion intuitionistic fuzzy recommender systems, SC-IFRS (Son & Thong, 2015)). The utility function $R$ is a mapping specified on $(X, Y)$ as follows:

\[ R : X \times Y \to D, \]

\[ \left\langle (\mu_{1X}(x), \gamma_{1X}(x)), (\mu_{2X}(x), \gamma_{2X}(x)), \ldots, (\mu_{sX}(x), \gamma_{sX}(x)) \right\rangle \times \left\langle (\mu_{1Y}(y), \gamma_{1Y}(y)), (\mu_{2Y}(y), \gamma_{2Y}(y)), \ldots, (\mu_{sY}(y), \gamma_{sY}(y)) \right\rangle \to \left\langle (\mu_{1D}(D), \gamma_{1D}(D)), (\mu_{2D}(D), \gamma_{2D}(D)), \ldots, (\mu_{sD}(D), \gamma_{sD}(D)) \right\rangle, \tag{8} \]

where $\mu_{iX}(x) \in [0, 1]$ (resp. $\gamma_{iX}(x) \in [0, 1]$), $\forall i \in \{1, \ldots, s\}$, is the membership (resp. non-membership) value of the patient to the $i$th linguistic label of feature $X$; $\mu_{jY}(y) \in [0, 1]$ (resp. $\gamma_{jY}(y) \in [0, 1]$), $\forall j \in \{1, \ldots, s\}$, is the membership (resp. non-membership) value of the symptom to the $j$th linguistic label of characteristic $Y$. Finally, $\mu_{lD}(D) \in [0, 1]$ (resp. $\gamma_{lD}(D) \in [0, 1]$), $\forall l \in \{1, \ldots, s\}$, is the membership (resp. non-membership) value of disease $D$ to the $l$th linguistic label. SC-IFRS provides two basic functions:

(a) Prediction: determine the values of $(\mu_{lD}(D), \gamma_{lD}(D))$, $\forall l \in \{1, \ldots, s\}$;
(b) Recommendation: choose $i^{*} \in [1, s]$ satisfying $i^{*} = \arg\max_{i = 1, \ldots, s} \{ \mu_{iD}(D) + \mu_{iD}(D) (1 - \mu_{iD}(D) - \gamma_{iD}(D)) \}$.

Definition 5 (Multi-criteria intuitionistic fuzzy recommender systems, MC-IFRS (Son & Thong, 2015)). The utility function $R$ is a mapping specified on $(X, Y)$ as below:

\[ R : X \times Y \to D_1 \times \cdots \times D_k, \]

\[ \left\langle (\mu_{1X}(x), \gamma_{1X}(x)), \ldots, (\mu_{sX}(x), \gamma_{sX}(x)) \right\rangle \times \left\langle (\mu_{1Y}(y), \gamma_{1Y}(y)), \ldots, (\mu_{sY}(y), \gamma_{sY}(y)) \right\rangle \to \left\langle (\mu_{1D}(D_1), \gamma_{1D}(D_1)), \ldots, (\mu_{sD}(D_1), \gamma_{sD}(D_1)) \right\rangle \times \cdots \times \left\langle (\mu_{1D}(D_k), \gamma_{1D}(D_k)), \ldots, (\mu_{sD}(D_k), \gamma_{sD}(D_k)) \right\rangle. \tag{9} \]

MC-IFRS is the system that provides the two basic functions below:

(a) Prediction: determine the values of $(\mu_{lD}(D_i), \gamma_{lD}(D_i))$, $\forall l \in \{1, \ldots, s\}$, $\forall i \in \{1, \ldots, k\}$;
(b) Recommendation: choose $i^{*} \in [1, s]$ satisfying $i^{*} = \arg\max_{i = 1, \ldots, s} \left\{ \sum_{j=1}^{k} w_j \left( \mu_{iD}(D_j) + \mu_{iD}(D_j) (1 - \mu_{iD}(D_j) - \gamma_{iD}(D_j)) \right) \right\}$, where $w_j \in [0, 1]$ is the weight of $D_j$ satisfying the constraint $\sum_{j=1}^{k} w_j = 1$.

A representation of MC-IFRS in the matrix format is demonstrated as follows.

Definition 6 (Son & Thong, 2015). An intuitionistic fuzzy matrix (IFM) $Z$ in MC-IFRS is defined as

\[ Z = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1s} \\ b_{21} & b_{22} & \cdots & b_{2s} \\ c_{31} & c_{32} & \cdots & c_{3s} \\ c_{41} & c_{42} & \cdots & c_{4s} \\ \vdots & \vdots & & \vdots \\ c_{t1} & c_{t2} & \cdots & c_{ts} \end{pmatrix}. \tag{10} \]

In Eq. (10), $t = k + 2$, where $k \in N^{+}$ is the number of diseases in the definition of MC-IFRS. The value $s \in N^{+}$ is the number of intuitionistic linguistic labels. $a_{1i}$, $b_{2i}$, $c_{hi}$, $\forall h \in \{3, \ldots, t\}$, $\forall i \in \{1, \ldots, s\}$, are the intuitionistic fuzzy values (IFV) consisting of the membership and non-membership values as in the definition of IFS. $a_{1i} = (\mu_{iX}(x), \gamma_{iX}(x))$, $\forall i \in \{1, \ldots, s\}$, represents the IFV of the patient to the $i$th linguistic label of feature $X$. $b_{2i} = (\mu_{iY}(y), \gamma_{iY}(y))$, $\forall i \in \{1, \ldots, s\}$, stands for the IFV of the symptom to the $i$th linguistic label of characteristic $Y$. $c_{hi} = (\mu_{iD}(D_{h-2}), \gamma_{iD}(D_{h-2}))$, $\forall i \in \{1, \ldots, s\}$, $\forall h \in \{3, \ldots, t\}$, is the IFV of the disease to the $i$th linguistic label. Each row of Eq. (10) from the third one to the last is related to a given disease.

Definition 7 (Son & Thong, 2015). Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity matrix (IFSM) between $Z_1$ and $Z_2$ is defined as follows:

\[ \tilde{S} = \begin{pmatrix} \tilde{S}_{11} & \tilde{S}_{12} & \cdots & \tilde{S}_{1s} \\ \tilde{S}_{21} & \tilde{S}_{22} & \cdots & \tilde{S}_{2s} \\ \tilde{S}_{31} & \tilde{S}_{32} & \cdots & \tilde{S}_{3s} \\ \tilde{S}_{41} & \tilde{S}_{42} & \cdots & \tilde{S}_{4s} \\ \vdots & \vdots & & \vdots \\ \tilde{S}_{t1} & \tilde{S}_{t2} & \cdots & \tilde{S}_{ts} \end{pmatrix}, \tag{11} \]

where

\[ \tilde{S}_{1i} = 1 - \frac{1 - \exp\left( -\frac{1}{2} \left( \left| \sqrt{\mu^{(1)}_{iX}(x)} - \sqrt{\mu^{(2)}_{iX}(x)} \right| + \left| \sqrt{\gamma^{(1)}_{iX}(x)} - \sqrt{\gamma^{(2)}_{iX}(x)} \right| \right) \right)}{1 - \exp(-1)}, \quad \forall i \in \{1, \ldots, s\}, \tag{12} \]

\[ \tilde{S}_{2i} = 1 - \frac{1 - \exp\left( -\frac{1}{2} \left( \left| \sqrt{\mu^{(1)}_{iY}(y)} - \sqrt{\mu^{(2)}_{iY}(y)} \right| + \left| \sqrt{\gamma^{(1)}_{iY}(y)} - \sqrt{\gamma^{(2)}_{iY}(y)} \right| \right) \right)}{1 - \exp(-1)}, \quad \forall i \in \{1, \ldots, s\}, \tag{13} \]

\[ \tilde{S}_{hi} = 1 - \frac{1 - \exp\left( -\frac{1}{2} \left( \left| \sqrt{\mu^{(1)}_{iD}(D_{h-2})} - \sqrt{\mu^{(2)}_{iD}(D_{h-2})} \right| + \left| \sqrt{\gamma^{(1)}_{iD}(D_{h-2})} - \sqrt{\gamma^{(2)}_{iD}(D_{h-2})} \right| \right) \right)}{1 - \exp(-1)}, \quad \forall i \in \{1, \ldots, s\}, \forall h \in \{3, \ldots, t\}. \tag{14} \]

Definition 8 (Son & Thong, 2015). Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity degree (IFSD) between $Z_1$ and $Z_2$ is

\[ SIM(Z_1, Z_2) = \alpha \sum_{i=1}^{s} w_{1i} \tilde{S}_{1i} + \beta \sum_{i=1}^{s} w_{2i} \tilde{S}_{2i} + \chi \sum_{h=3}^{t} \sum_{i=1}^{s} w_{hi} \tilde{S}_{hi}, \tag{15} \]

where $\tilde{S}$ is the IFSM between $Z_1$ and $Z_2$, and $W = (w_{ij})$ ($\forall i \in \{1, \ldots, t\}$, $\forall j \in \{1, \ldots, s\}$) is the weight matrix of the IFSM between $Z_1$ and $Z_2$ satisfying

\[ \sum_{i=1}^{s} w_{1i} = 1; \quad \sum_{i=1}^{s} w_{2i} = 1; \quad \sum_{i=1}^{s} w_{hi} = 1, \quad \forall h \in \{3, \ldots, t\}, \tag{16} \]

\[ \alpha + \beta + \chi = 1. \tag{17} \]

Definition 9 (Son & Thong, 2015). The formulas to predict the values of the linguistic labels of patient $P_u$ ($\forall u \in \{1, \ldots, n\}$) to symptom $S_j$ ($\forall j \in \{1, \ldots, m\}$) according to diseases $(D_1, D_2, \ldots, D_k)$ in MC-IFRS are:

\[ \mu^{P_u}_{iD}(D_j) = \frac{\sum_{v=1}^{n} SIM(P_u, P_v) \times \mu^{P_v}_{iD}(D_j)}{\sum_{v=1}^{n} SIM(P_u, P_v)}, \quad \forall i \in \{1, \ldots, s\}, \forall j \in \{1, \ldots, k\}, \forall u \in \{1, \ldots, n\}, \tag{18} \]

\[ \gamma^{P_u}_{iD}(D_j) = \frac{\sum_{v=1}^{n} SIM(P_u, P_v) \times \gamma^{P_v}_{iD}(D_j)}{\sum_{v=1}^{n} SIM(P_u, P_v)}, \quad \forall i \in \{1, \ldots, s\}, \forall j \in \{1, \ldots, k\}, \forall u \in \{1, \ldots, n\}. \tag{19} \]

Example. We illustrate the steps of IFCF by an example from Son and Thong (2015). Assume that the system has four patients, P = {Ram, Mari, Sugu, Somu}, five symptoms, S = {Temperature, Headache, Stomach-pain, Cough, Chest-pain}, and five diseases, D = {Viral-Fever, Malaria, Typhoid, Stomach, Heart}. The relation between the patients and the symptoms is given in the first table below. The training dataset is given in the second table, where the ⁄ values are to be predicted. Motivated by Definition 4, we extract the SC-IFRS dataset in the third table from the first two.

Table. The relation between the patients and the symptoms
P      Temperature   Headache     Stomach_pain   Cough        Chest_pain
Ram    (0.8, 0.1)    (0.6, 0.1)   (0.2, 0.8)     (0.6, 0.1)   (0.1, 0.6)
Mari   (0, 0.8)      (0.4, 0.4)   (0.6, 0.1)     (0.1, 0.7)   (0.1, 0.8)
Sugu   (0.8, 0.1)    (0.8, 0.1)   (0, 0.6)       (0.2, 0.7)   (0, 0.5)
Somu   (0.6, 0.1)    (0.5, 0.4)   (0.3, 0.4)     (0.7, 0.2)   (0.3, 0.4)

Table. The training dataset with ⁄ being the values to be predicted
P      Viral_Fever   Malaria      Typhoid      Stomach      Chest
Ram    (0.4, 0.1)    (0.7, 0.1)   (0.6, 0.1)   (0.2, 0.4)   (0.2, 0.6)
Mari   (0.3, 0.5)    (0.2, 0.6)   (0.4, 0.4)   (0.6, 0.1)   (0.1, 0.7)
Sugu   ⁄             ⁄            ⁄            ⁄            ⁄
Somu   ⁄             ⁄            ⁄            ⁄            ⁄

Table. The extracted SC-IFRS dataset with ⁄ being the values to be predicted
Ram
  S: ⟨Temperature (0.8, 0.1), Headache (0.6, 0.1), Stomach pain (0.2, 0.8), Cough (0.6, 0.1), Chest pain (0.1, 0.6)⟩
  D: ⟨Viral fever (0.4, 0.1), Malaria (0.7, 0.1), Typhoid (0.6, 0.1), Stomach problem (0.2, 0.4), Chest problem (0.2, 0.6)⟩
Mari
  S: ⟨Temperature (0.0, 0.8), Headache (0.4, 0.4), Stomach pain (0.6, 0.1), Cough (0.1, 0.7), Chest pain (0.1, 0.8)⟩
  D: ⟨Viral fever (0.3, 0.5), Malaria (0.2, 0.6), Typhoid (0.4, 0.4), Stomach problem (0.6, 0.1), Chest problem (0.1, 0.7)⟩
Sugu
  S: ⟨Temperature (0.8, 0.1), Headache (0.8, 0.1), Stomach pain (0.0, 0.6), Cough (0.2, 0.7), Chest pain (0.0, 0.5)⟩
  D: ⁄
Somu
  S: ⟨Temperature (0.6, 0.1), Headache (0.5, 0.4), Stomach pain (0.3, 0.4), Cough (0.7, 0.2), Chest pain (0.3, 0.4)⟩
  D: ⁄

Table. The recommended diseases
P      Viral_Fever   Malaria   Typhoid   Stomach   Chest
Sugu   0.5537        0.6552    0.4032    0.504     0.122
Somu   0.5358        0.6552    0.4068    0.4446    0.122

From the definition of IFSD in Eq. (15) with $\alpha = 0$, $\beta = \chi = 1/2$ and $w_{1i} = w_{2i} = w_{3i} = 0.2$,
the IFSD values between Sugu (Somu) and Ram and Mari are as follows:

\[ IFSD(\mathrm{Sugu}, \mathrm{Ram}) = 0.87, \tag{20} \]

\[ IFSD(\mathrm{Sugu}, \mathrm{Mari}) = 0.57, \tag{21} \]

\[ IFSD(\mathrm{Somu}, \mathrm{Ram}) = 0.83, \tag{22} \]

\[ IFSD(\mathrm{Somu}, \mathrm{Mari}) = 0.58. \tag{23} \]

Next, use the prediction formulas in Eqs. (18) and (19) to calculate the predictive IFM results of Sugu and Somu:

\[ \mathrm{Disease}(\mathrm{Sugu}) = \left\langle \text{Viral fever}\,(0.49, 0.38),\ \text{Malaria}\,(0.52, 0.22),\ \text{Typhoid}\,(0.36, 0.52),\ \text{Stomach problem}\,(0.40, 0.34),\ \text{Chest problem}\,(0.10, 0.68) \right\rangle, \tag{24} \]

\[ \mathrm{Disease}(\mathrm{Somu}) = \left\langle \text{Viral fever}\,(0.47, 0.39),\ \text{Malaria}\,(0.52, 0.22),\ \text{Typhoid}\,(0.36, 0.51),\ \text{Stomach problem}\,(0.39, 0.47),\ \text{Chest problem}\,(0.10, 0.68) \right\rangle. \tag{25} \]

Based on the recommendation function and Eqs. (24) and (25), we recommend the disease that each of those patients is most likely to suffer from, as listed in the table of recommended diseases. From this table, we conclude that Sugu and Somu both suffer from Malaria.

2.2. Distributed Picture Fuzzy Clustering Method

Son (2015) proposed a novel Distributed Picture Fuzzy Clustering Method on picture fuzzy sets, called DPFCM.

Table. The pseudo-code of DPFCM

Distributed Picture Fuzzy Clustering Method (DPFCM)
I:   - Data $X$ with $N$ elements in $r$ dimensions
     - Number of clusters: $C$
     - Number of peers: $P + 1$
     - Fuzzifier $m$
     - Threshold $\varepsilon > 0$
     - Parameters: $\gamma, \alpha_1, \alpha_2, \alpha$, maxIter
O:   $\{V_{ljh} \mid l = 1, \ldots, P;\ j = 1, \ldots, C;\ h = 1, \ldots, r\}$; $\{(u_{lkj}, \eta_{lkj}, \xi_{lkj}) \mid l = 1, \ldots, P;\ k = 1, \ldots, Y_l;\ j = 1, \ldots, C\}$; $\{w_{ljh} \mid l = 1, \ldots, P;\ j = 1, \ldots, C;\ h = 1, \ldots, r\}$
DPFCM:
1S:  Initialization:
     - Initialize the number of iterations $t$
     - Set $\Delta_{lijh}(t) = \theta_{lijh}(t) = 0$ ($\forall i \ne l$; $i, l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$)
     - Randomize $\{(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t)) \mid l = 1, \ldots, P;\ k = 1, \ldots, Y_l;\ j = 1, \ldots, C\}$ satisfying (31)
     - Set $w_{ljh}(t) = 1/r$ ($l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$)
2S:  Calculate cluster centers $V_{ljh}(t)$ ($l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$) from $(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t))$, $w_{ljh}(t)$ and $\theta_{lijh}(t)$ by (39)
3S:  Calculate attribute-weights $w_{ljh}(t+1)$ ($l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$) from $(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t))$, $V_{ljh}(t)$ and $\Delta_{lijh}(t)$ by (41)
4S:  Send $\{\Delta_{lijh}(t), \theta_{lijh}(t), V_{ljh}(t), w_{ljh}(t+1) \mid i, l = 1, \ldots, P;\ i \ne l;\ k = 1, \ldots, Y_l;\ j = 1, \ldots, C\}$ to the Master
5M:  Calculate $\{\Delta_{lijh}(t+1), \theta_{lijh}(t+1) \mid i, l = 1, \ldots, P;\ i \ne l;\ k = 1, \ldots, Y_l;\ j = 1, \ldots, C\}$ by (38) and (40) and send them to the Slave peers
6S:  Calculate cluster centers $V_{ljh}(t+1)$ ($l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$) from $(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $\theta_{lijh}(t+1)$ by (39)
7S:  Calculate positive degrees $\{u_{lkj}(t+1) \mid l = 1, \ldots, P;\ k = 1, \ldots, Y_l;\ j = 1, \ldots, C\}$ from $(\eta_{lkj}(t), \xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (37)
8S:  Compute neutral degrees $\{\eta_{lkj}(t+1) \mid l = 1, \ldots, P;\ k = 1, \ldots, Y_l;\ j = 1, \ldots, C\}$ from $(u_{lkj}(t+1), \xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (42)
9S:  Calculate refusal degrees $\{\xi_{lkj}(t+1) \mid l = 1, \ldots, P;\ k = 1, \ldots, Y_l;\ j = 1, \ldots, C\}$ from $(u_{lkj}(t+1), \eta_{lkj}(t+1))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (43)
10S: If $\max_l \{\max\{\|u_{lkj}(t+1) - u_{lkj}(t)\|, \|\eta_{lkj}(t+1) - \eta_{lkj}(t)\|, \|\xi_{lkj}(t+1) - \xi_{lkj}(t)\|\}\} < \varepsilon$ or $t >$ maxIter, then stop the algorithm. Otherwise set $t = t + 1$ and return to Step 3S
S: Operations in Slave peers. M: Operations in the Master peer.

[Fig. The working flow of the hybrid model – HIFCF]
[Fig. MAE values of algorithms by 2-fold cross validation]
[Fig. MAE values of algorithms by 3-fold cross validation]
[Fig. MAE values of algorithms by 4-fold cross validation]
[Fig. MAE values of algorithms by 5-fold cross validation]

Firstly, we give the definition of picture fuzzy sets.

Definition 10. A Picture Fuzzy Set (PFS) (Cuong & Kreinovich, 2013) in a non-empty set $X$ is

\[ \dot{A} = \{ \langle x, \mu_{\dot{A}}(x), \eta_{\dot{A}}(x), \gamma_{\dot{A}}(x) \rangle \mid x \in X \}, \tag{26} \]

where $\mu_{\dot{A}}(x)$ is the positive degree of each element $x \in X$, $\eta_{\dot{A}}(x)$ is the neutral degree and $\gamma_{\dot{A}}(x)$ is the negative degree, satisfying the constraints

\[ \mu_{\dot{A}}(x), \eta_{\dot{A}}(x), \gamma_{\dot{A}}(x) \in [0, 1], \quad \forall x \in X, \tag{27} \]

\[ \mu_{\dot{A}}(x) + \eta_{\dot{A}}(x) + \gamma_{\dot{A}}(x) \le 1, \quad \forall x \in X. \tag{28} \]

The refusal degree of an element is calculated as $\xi_{\dot{A}}(x) = 1 - (\mu_{\dot{A}}(x) + \eta_{\dot{A}}(x) + \gamma_{\dot{A}}(x))$, $\forall x \in X$. When $\xi_{\dot{A}}(x) = 0$, PFS returns to intuitionistic fuzzy sets (IFS) (Atanassov, 1986), and when both $\eta_{\dot{A}}(x) = \xi_{\dot{A}}(x) = 0$, PFS returns to
fuzzy sets (FS) (Zadeh, 1965).

In DPFCM, the communication model is the facilitator or Master–Slave model with one Master peer and P Slave peers, and each Slave peer is allowed to communicate with the Master only. Each Slave peer holds a subset of the original dataset X, which consists of N data points in r dimensions. We call the subsets $Y_j$ ($j = \overline{1,P}$), with $\bigcup_{j=1}^{P} Y_j = X$ and $\sum_{j=1}^{P} |Y_j| = N$. The number of dimensions in a subset is exactly the same as that in the original dataset. The clustering problem is to divide the dataset X into C groups satisfying the objective function below:

\[
J = \sum_{l=1}^{P}\sum_{k=1}^{Y_l}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m}\sum_{h=1}^{r} w_{ljh}\|X_{lkh} - V_{ljh}\|^{2} + \gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r} w_{ljh}\log w_{ljh} \to \min, \tag{29}
\]

where $u_{lkj}$, $\eta_{lkj}$ and $\xi_{lkj}$ are the positive, the neutral and the refusal degrees of the kth data point to the jth cluster in the lth Slave peer. This reflects clustering in the PFS sense of Definition 10. $w_{ljh}$ is the attribute-weight of the hth attribute to the jth cluster in the lth Slave peer. $V_{ljh}$ is the center of the jth cluster in the lth Slave peer according to the hth attribute. $X_{lkh}$ is the kth data point of the lth Slave peer according to the hth attribute. m and $\gamma$ are the fuzzifier and a positive scalar, respectively. The constraints for (29) are shown below:

\[ u_{lkj}, \eta_{lkj}, \xi_{lkj} \in [0, 1], \tag{30} \]
\[ u_{lkj} + \eta_{lkj} + \xi_{lkj} \le 1, \tag{31} \]
\[ \sum_{j=1}^{C}\frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}} = 1, \tag{32} \]
\[ \sum_{j=1}^{C}\left(\eta_{lkj} + \frac{\xi_{lkj}}{C}\right) = 1, \tag{33} \]
\[ \sum_{h=1}^{r} w_{ljh} = 1, \tag{34} \]
\[ V_{ljh} = V_{ijh} \quad (\forall i \neq l;\ i, l = \overline{1,P}), \tag{35} \]
\[ w_{ljh} = w_{ijh} \quad (\forall i \neq l;\ i, l = \overline{1,P}). \tag{36} \]

The clustering model in Eqs. (29)–(36) relies on the principles of the PFS and the facilitator model. By using the Lagrangian method and the Picard iteration, the optimal solutions of this model are obtained as in Eqs. (37)–(43):

\[
u_{lkj} = \frac{1 - \eta_{lkj} - \xi_{lkj}}{\sum_{i=1}^{C}\left(\frac{\sum_{h=1}^{r} w_{ljh}\|X_{lkh} - V_{ljh}\|^{2}}{\sum_{h=1}^{r} w_{lih}\|X_{lkh} - V_{lih}\|^{2}}\right)^{\frac{1}{m-1}}}, \quad (\forall l = \overline{1,P};\ k = \overline{1,Y_l};\ j = \overline{1,C}), \tag{37}
\]

\[
\theta_{lijh} = \theta_{lijh} + \alpha_1 (V_{ljh} - V_{ijh}), \quad (\forall i \neq l;\ i, l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}), \tag{38}
\]

\[
V_{ljh} = \frac{\sum_{k=1}^{Y_l}\left(\frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m} w_{ljh} X_{lkh} - \sum_{i=1, i \neq l}^{P}\theta_{lijh}}{\sum_{k=1}^{Y_l}\left(\frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m} w_{ljh}}, \quad (\forall l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}), \tag{39}
\]

\[
\Delta_{lijh} = \Delta_{lijh} + \alpha_2 (w_{ljh} - w_{ijh}), \quad (\forall i \neq l;\ i, l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}), \tag{40}
\]

\[
w_{ljh} = \frac{\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{Y_l}\left(\frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m}\|X_{lkh} - V_{ljh}\|^{2} + \gamma + \sum_{i=1, i \neq l}^{P}\Delta_{lijh}\right)\right)}{\sum_{h'=1}^{r}\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{Y_l}\left(\frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m}\|X_{lkh'} - V_{ljh'}\|^{2} + \gamma + \sum_{i=1, i \neq l}^{P}\Delta_{lijh'}\right)\right)}, \quad (\forall l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}), \tag{41}
\]

\[
\eta_{lkj} = 1 - \xi_{lkj} + \frac{\frac{C-1}{C}\left(\sum_{i=1}^{C}\xi_{lki} - C\right)}{\sum_{i=1}^{C}\left(\frac{u_{lki}^{m}\sum_{h=1}^{r} w_{lih}\|X_{lkh} - V_{lih}\|^{2}}{u_{lkj}^{m}\sum_{h=1}^{r} w_{ljh}\|X_{lkh} - V_{ljh}\|^{2}}\right)^{\frac{1}{m+1}}}, \quad (\forall l = \overline{1,P};\ k = \overline{1,Y_l};\ j = \overline{1,C}), \tag{42}
\]

\[
\xi_{lkj} = 1 - (u_{lkj} + \eta_{lkj}) - \left(1 - (u_{lkj} + \eta_{lkj})^{\alpha}\right)^{1/\alpha}, \quad (\forall l = \overline{1,P};\ k = \overline{1,Y_l};\ j = \overline{1,C}). \tag{43}
\]

Details of the DPFCM algorithm (Son, 2015) are given in the pseudo-code table of DPFCM. DPFCM has been shown to improve on the clustering quality of the relevant algorithms in experiments on benchmark datasets from the UCI Machine Learning Repository (Son, 2015).

Fig. 7. MAE values of algorithms by 6-fold cross validation.
Fig. 8. MAE values of algorithms by 7-fold cross validation.
Fig. 9. MAE values of algorithms by 8-fold cross validation.
Fig. 10. MAE values of algorithms by 9-fold cross validation.
Fig. 11. MAE values of algorithms by 10-fold cross validation.
Fig. 12. The computational time of algorithms by various folds.

2.3 Hybrid Intuitionistic Fuzzy Collaborative Filtering

Consider the IFSD function in Eq. (15). We recall that IFSD could be enhanced by the
integration with the information of the possibility of patients belonging to clusters specified by a fuzzy clustering method. That is to say, if we know which group a new patient belongs to, then the similarities of this patient with the others in that group should be given a high influence in the calculation of IFSD. Thus, in the new algorithm, Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF), DPFCM expressed in Table is used to classify the patients into $N_C$ groups according to the relations information of patients. The relations are determined by the relationship between patients and symptoms as in Table. They are defuzzified into crisp values before the clustering algorithm is applied. For crisp datasets such as HEART (University of California, 2007), which is used later in the experiments, the relations are specified by all attributes of the dataset except the class attribute. Notice that in this algorithm, the parameters are set up for medical diagnosis as follows: $P = 0$; $\gamma = \alpha_1 = \alpha_2 = 1$; $\alpha = 0.5$; maxIter = 1000; $m = 2$; $\varepsilon = 0.001$. Then, the possibility of a patient belonging to a certain cluster, expressed in Eqs. (44) and (45), is used to calculate the similarity degrees between users:

\[
PRO(j, k) = \frac{CS(j, k)}{\operatorname{Max}\, CS(i, k)}, \tag{44}
\]

\[
CS(j, k) = 1 - \frac{\sum_{i} X_j^{i} V_k^{i}}{\|X_j\|\,\|V_k\|}, \tag{45}
\]

where $PRO(j, k)$ is the possibility of patient j belonging to cluster k, and $CS(j, k)$ is the counter-similarity between patient j and cluster k calculated in Eq. (45), with $X_j$ and $V_k$ being patient j and the center of cluster k, respectively. Based upon the $\{PRO(j, k)\}$ information, we calculate the similarity degrees between users in Eq. (46):

\[
SIM_{group}(a, b) = \frac{1}{N_C}\sum_{i=1}^{N_C}\left(PRO(a, i) - \overline{PRO(a)}\right)\left(PRO(b, i) - \overline{PRO(b)}\right), \tag{46}
\]

where $\overline{PRO(a)}$ is the mean value of $\{PRO(a, k)\}$, $\forall k = \overline{1,N_C}$. Let us denote the IFSD degree in Eq. (15) as $SIM_{history}(a, b)$. The final similarity degree is calculated in Eq. (47):

\[
SIM(a, b) = SIM_{history}(a, b) \times (1 - \lambda) + SIM_{group}(a, b) \times \lambda, \tag{47}
\]

where $\lambda \in [0, 1]$ is an adjustable coefficient. The similarity degrees between users derived from the picture fuzzy clustering are thus supplemented into IFSD to give the final similarity between patients, which is intended to give higher accuracy of prediction with the integrated approach than with other existing ones. The overall process is illustrated in Fig.

Fig. 13. MAE values of algorithms with the cardinalities of testing being 10.
Fig. 14. MAE values of algorithms with the cardinalities of testing being 20.

2.4 Theoretical analyses of the HIFCF algorithm

We recognize the following advantages of HIFCF.

(a) HIFCF gives better accuracy of prediction than other relevant methods. It supplements the similarity degree derived from the picture fuzzy clustering method, so that patients analogous to the considered one are better reflected.
(b) HIFCF does not take much computational time in comparison with IFCF and other existing methods. In fact, the clustering algorithm is executed only once to determine the groups of users, so that HIFCF is only a little slower than IFCF and other relevant clustering algorithms.
(c) Easy parameter control through the adjustable coefficient $\lambda \in [0, 1]$. That is to say, we can reduce the impact of a specific similarity degree through the use of this coefficient.
(d) Easy to implement and inherit from the existing methods. HIFCF is inherited mostly from IFCF but with a different similarity degree. The steps of the algorithm are clear and simple, so that they can be implemented quickly.
(e) It fosters more advanced works on designing general similarity degrees for IFRS to achieve high accuracy of prediction, as in this research.

Fig. 15. MAE values of algorithms with the cardinalities of testing being 30.
Fig. 16. MAE values of algorithms with the cardinalities of testing being 40.

3. Evaluation

3.1 Experimental design

In this part, we describe the experimental environment.

Experimental tools: We have implemented the proposed hybrid algorithm HIFCF, in addition to IFCF (Son & Thong, 2015), the typical standalone methods of IFS such as De et al. (2001), Szmidt and Kacprzyk (2004) and Samuel and Balamurugan (2012), and RS methods such as Davis et al. (2008) and Hassan and Syed (2010), in the PHP programming language and executed them on a PC with an Intel(R) Core(TM) 2 Duo CPU T6400 @ 2.00 GHz GB RAM. The results are taken as the average value of 50 runs.

Evaluation indices: Mean Absolute Error (MAE) and the computational time.

Datasets: The benchmark medical diagnosis dataset HEART from the UCI Machine Learning Repository (University of California, 2007), consisting of 270 patients characterized by 13 attributes.

Cross validation: The cross-validation method for the experiments is k-fold validation with k from 2 to 10. Besides testing with k-fold validation, random experiments with the cardinalities of the testing set ranging from 10 to 100 random elements are also performed. In order to validate the results against accurate classes, the intuitionistic defuzzification method (Albeanu & Popentiu-Vladicescu, 2010) is used for the experimental algorithms.

Parameter setting: $N_C = 2$; $\lambda = 0.4$; the weights in IFSD are set up as in the Example.

Objective:
- To evaluate HIFCF in comparison with the relevant algorithms in terms of accuracy through the evaluation indices.
- To evaluate HIFCF by various parameters.

Fig. 17. MAE values of algorithms with the cardinalities of testing being 50.
Fig. 18. MAE values of algorithms with the cardinalities of testing being 60.

3.2 Assessment

In this section, we compare HIFCF with other algorithms in terms of accuracy and computational time for various numbers of folds determined by the cross-validation method. The results are illustrated in Figs. 3–12. We recognize that the MAE values of HIFCF are
better than those of the other algorithms for every number of folds. The average MAE value of HIFCF over the folds is 0.395, whilst those of IFCF, DAVIS, HASSAN, DE, SAMUEL and SZMIDT are 0.491, 0.495, 0.487, 0.481, 0.519 and 0.656, respectively. HIFCF takes little computational time to process, i.e. 2.68 s on average. This is longer than most of the other algorithms, whose times in the order listed above are 2.49, 2.76, 1.18, 0.10, 0.08 and 0.09 s, respectively; only DAVIS (2.76 s) is slower. In what follows, we continue to validate these algorithms by random experiments. The results in Figs. 13–23 reaffirm the findings above, with the average MAE value of HIFCF being smaller than those of the other algorithms. The MAE values of the algorithms, with HIFCF first and the others in the order above, are 0.404, 0.492, 0.495, 0.488, 0.486, 0.514 and 0.486, respectively. The average times of the algorithms are 2.63, 1.26, 1.84, 0.26, 0.06, 0.06 and 0.06 s, respectively.

Fig. 19. MAE values of algorithms with the cardinalities of testing being 70.
Fig. 20. MAE values of algorithms with the cardinalities of testing being 80.

3.3 Validation of HIFCF by parameters

In this section, we report further experiments on HIFCF with various numbers of clusters $N_C$ and various coefficients $\lambda$. The results are shown in Figs. 24–27. From the achieved results, the recommendation is to choose a small number of clusters (see Figs. 24 and 25) to get the best MAE value and a reasonable computational time. Analogously, the coefficient $\lambda$ should be smaller than 0.5, ideally in the ranges [0.1, 0.2] and [0.4, 0.5], as expressed in Figs. 26 and 27.

Fig. 21. MAE values of algorithms with the cardinalities of testing being 90.
Fig. 22. MAE values of algorithms with the cardinalities of testing being 100.

4. Conclusions

In this paper, we concentrated on improving the accuracy of prediction in medical diagnosis of the health care support system.
Medical diagnosis is regarded as the determination of the diseases that could be found from a list of measured symptoms of a patient, as well as of the most likely disease among them. Since the relations between the patients and the symptoms, and between the symptoms and the diseases, are often vague, imprecise and uncertain, most machine learning methods fail to achieve high accuracy of prediction with real medical diagnosis datasets. For these reasons, a combination of fuzzy sets and a machine learning method is a good choice to overcome these disadvantages and those of the relevant works using standalone fuzzy sets or recommender systems. Intuitionistic fuzzy recommender systems (IFRS) are such a combination, which results in better accuracy of prediction than the relevant standalone methods constructed on either fuzzy sets or recommender systems only. IFRS use IFSD to calculate the similarity between two patients. This measure could be enhanced by integrating the information of the possibility of patients belonging to clusters specified by a fuzzy clustering method. Therefore, our contribution in this paper is a novel hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, called Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF), which makes use of a recent picture fuzzy clustering method, DPFCM, to classify the patients into groups according to the relations information of patients. Then, the possibility of a patient belonging to a certain cluster is used to calculate the similarity degrees between users. They are supplemented into IFSD to give the final similarity between patients.

Fig. 23. The computational time of algorithms by various cardinalities of testing.
Fig. 24. MAE values of HIFCF by various numbers of clusters.

This is the theoretical contribution of the paper in Expert and Intelligent Systems compared
to those in related ones. In the experiments, we have validated the new hybrid algorithm HIFCF on the benchmark medical diagnosis dataset HEART from the UCI Machine Learning Repository, consisting of 270 patients characterized by 13 attributes, under different cross-validation methods and parameter settings. The new algorithm was compared with relevant works such as the intuitionistic fuzzy recommender systems, the standalone methods of intuitionistic fuzzy sets such as De, Biswas & Roy, Szmidt & Kacprzyk and Samuel & Balamurugan, and recommender systems, e.g. Davis et al. and Hassan & Syed. The findings from the research are: (i) HIFCF is better than the other relevant methods in terms of accuracy, with the average mean absolute error being 0.4; (ii) HIFCF is stable across various cross-validation methods and parameters; (iii) suitable parameters of HIFCF are a small number of clusters and a coefficient smaller than 0.5; (iv) the computational time of HIFCF is larger than those of most other algorithms but is acceptable.

Fig. 25. The computational time of HIFCF by various numbers of clusters.
Fig. 26. MAE values of HIFCF by various values of lambda.

The practical implications of the proposed research work can be interpreted as follows. Firstly, this paper presented a method of extending the similarity degree IFSD used in intuitionistic fuzzy recommender systems with additional information about clusters. This could guide further research involving this kind of extension with other additional information or different types of similarity functions. Secondly, enhancing the accuracy of medical diagnosis by the new hybrid method between picture fuzzy clustering and intuitionistic fuzzy recommender systems, as in this research work, supports the development of the health care support system. Thirdly, the theoretical contribution of this paper could expand a minor
research direction about picture fuzzy clustering and intuitionistic fuzzy recommender systems that are application-oriented.

One of the limitations of this research is the time complexity of the HIFCF algorithm. Even though the execution time of HIFCF is approximately 2.68 s for the given dataset, in the context of a practical health care support system using large and multi-dimensional datasets it should be accelerated. Thus, one direction of further work is to investigate the parallel processing of the hybrid model. Another research limitation is the capability of the hybrid algorithm when dealing with a new patient. That is to say, if a new patient enters the system, then the clustering algorithm must be re-run on the whole dataset. This could take a lot of time and may not be acceptable, especially in the context above. Developing a new semi-supervised hybrid model that takes this situation into account is our second direction of further work. Some remaining research directions of this article could be: (i) building fuzzy rules derived from both picture fuzzy clustering and intuitionistic fuzzy recommender systems to make the prediction of items; (ii) proposing a new algorithm to deal with imbalanced and missing-value medical diagnosis datasets; (iii) applying the hybrid algorithm to other group decision making problems.

Fig. 27. The computational time of HIFCF by various values of lambda.

Acknowledgments

The authors are greatly indebted to the editor-in-chief, Prof. B. Lin, and the anonymous reviewers for their comments and valuable suggestions that improved the quality and clarity of the paper. This work is sponsored by the NAFOSTED under contract No. 102.05-2014.01.

References

Abbass, H. A. (2002). An evolutionary artificial neural networks approach for breast cancer diagnosis. Artificial Intelligence in Medicine, 25(3), 265–281.
Agarwal, M., Hanmandlu, M., & Biswas, K. K. (2011). Generalized intuitionistic fuzzy soft set and its application in practical medical diagnosis problem. In Proceedings of 2011 IEEE international conference on fuzzy systems (pp. 2972–2978).
Ahn, J. Y. (2014). A comparison of distance measures for medical diagnosis. ICIC Express Letters Part B, Applications: An International Journal of Research and Surveys, 5(3), 871.
Ahn, J. Y., Han, K. S., Oh, S. Y., & Lee, C. D. (2011). An application of interval-valued intuitionistic fuzzy sets for medical diagnosis of headache. International Journal of Innovative Computing, Information and Control, 7(5), 2755–2762.
Albeanu, G., & Popentiu-Vladicescu, F. L. (2010). Intuitionistic fuzzy methods in software reliability modelling. Journal of Sustainable Energy, 1(1), 30–34.
Anbarasi, M., Anupriya, E., & Iyengar, N. C. S. N. (2010). Enhanced prediction of heart disease with feature subset selection using genetic algorithm. International Journal of Engineering Science and Technology, 2(10), 5370–5376.
Atanassov, K. T. (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems, 20(1), 87–96.
Basu, R., Fevrier-Thomas, U., & Sartipi, K. (2011). Incorporating hybrid CDSS in primary care practice management. McMaster eBusiness Research Centre.
Bora, M., Bora, B., Neog, T. J., & Sut, D. K. (2014). Intuitionistic fuzzy soft matrix theory and its application in medical diagnosis. Annals of Fuzzy Mathematics and Informatics, 7(1), 143–153.
Bourgani, E., Stylios, C. D., Manis, G., & Georgopoulos, V. C. (2014). Time dependent fuzzy cognitive maps for medical diagnosis. In Artificial intelligence: Methods and applications (pp. 544–554). Springer International Publishing.
Cuong, B. C., & Kreinovich, V. (2013). Picture fuzzy sets – a new concept for computational intelligence problems. In Proceedings of 2013 third world congress on information and communication technologies (pp. 1–6).
Cuong, B. C., Son, L. H., & Chau, H. T. M. (2010). Some context fuzzy clustering methods for classification problems. In Proceedings of the 2010 ACM symposium on information and communication technology (pp. 34–40).
Das, S., & Kar, S. (2014). Group decision making in medical system: An intuitionistic fuzzy soft set approach. Applied Soft Computing, 24, 196–211.
Davis, D. A., Chawla, N. V., Blumm, N., Christakis, N., & Barabási, A. L. (2008). Predicting individual disease risk based on medical history. In Proceedings of the 17th ACM conference on information and knowledge management (pp. 769–778).
De, S. K., Biswas, R., & Roy, A. R. (2001). An application of intuitionistic fuzzy sets in medical diagnosis. Fuzzy Sets and Systems, 117(2), 209–213.
Duan, L., Street, W. N., & Xu, E. (2011). Healthcare information systems: Data mining methods in the creation of a clinical recommender system. Enterprise Information Systems, 5(2), 169–181.
Foster, D., McGregor, C., & El-Masri, S. (2005). A survey of agent-based intelligent decision support systems to support clinical management and research. In Proceedings of the second international workshop on multi-agent systems for medicine, computational biology, and bioinformatics (pp. 16–34).
Gevaert, O., De Smet, F., Timmerman, D., Moreau, Y., & De Moor, B. (2006). Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics, 22(14), e184–e190.
Hassan, S., & Syed, Z. (2010). From netflix to heart attacks: Collaborative filtering in medical datasets. In Proceedings of the first ACM international health informatics symposium (pp. 128–134).
Hosseini, R., Ellis, T., Mazinani, M., & Dehmeshki, J. (2011). A genetic fuzzy approach for rule extraction for rule-based classification with application to medical diagnosis. In Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD) (pp. 05–09).
Kala, R., Janghel, R. R., Tiwari, R., & Shukla, A. (2011). Diagnosis of breast cancer by modular evolutionary neural networks. International Journal of Biomedical Engineering and Technology, 7(2), 194–211.
Kampouraki, A., Vassis, D., Belsis, P., & Skourlas, C. (2013). E-doctor: A web based support vector machine for automatic medical diagnosis. Procedia-Social and Behavioral Sciences, 73, 467–474.
Khatibi, V., & Montazer, G. A. (2009). Intuitionistic fuzzy set vs fuzzy set application in medical pattern recognition. Artificial Intelligence in Medicine, 47(1), 43–52.
Kyriacou, E. C., Pattichis, C. S., & Pattichis, M. S. (2009). An overview of recent health care support systems for eEmergency and mHealth applications. In Proceedings of the IEEE annual international conference of the engineering in medicine and biology society 2009 (pp. 1246–1249).
McCormick, T. H., Rudin, C., & Madigan, D. B. (2011). A hierarchical model for association rule mining of sequential events: An approach to automated medical symptom prediction. Annals of Applied Statistics, 1–19.
Meenakshi, A. R., & Kaliraja, M. (2011). An application of interval valued fuzzy matrices in medical diagnosis. International Journal of Mathematical Analysis, 5(36), 1791–1802.
Meisamshabanpoor & Mahdavi, M. (2012). Implementation of a recommender system on medical recognition and treatment. International Journal of e-Education, e-Business, e-Management and e-Learning, 2(4), 315–318.
Moein, S., Monadjemi, S. A., & Moallem, P. (2009). A novel fuzzy-neural based medical diagnosis system. International Journal of Biological and Medical Sciences, 4(3), 146–150.
Muthuvijayalakshmi, M., Kumar, E., & Venkatesan, P. (2014). TB disease diagnosis using fuzzy max-min composition technique. Fuzzy Systems, 6(1).
Neog, T. J., & Sut, D. K. (2011). An application of fuzzy soft sets in medical diagnosis using fuzzy soft complement. International Journal of Computer Applications, 33, 30–33.
Nguyen, T., Khosravi, A., Creighton, D., & Nahavandi, S. (2014). Medical diagnosis by fuzzy standard additive model with wavelets. In Proceedings of the 2014 IEEE international conference on fuzzy systems (pp. 1937–1944).
Own, C. M. (2009). Switching between type-2 fuzzy sets and intuitionistic fuzzy sets: An application in medical diagnosis. Applied Intelligence, 31(3), 283–291.
Parthiban, L., & Subramanian, R. (2008). Intelligent heart disease prediction system using CANFIS and genetic algorithm. International Journal of Biological, Biomedical and Medical Sciences, 3(3).
Rajalakshmi, K., Mohan, S. C., & Babu, S. D. (2011). Decision support system in healthcare industry. International Journal of Computer Applications, 26(9), 42–44.
Ricci, F., Rokach, L., & Shapira, B. (2011). Introduction to recommender systems handbook. US: Springer (pp. 1–35).
Roberts, L. M., Kahn, E., & Haddawy, P. (1995). Development of a Bayesian network for diagnosis of breast cancer. In Proceedings of the IJCAI-95 workshop on building probabilistic networks, Montréal, Québec, Canada.
Rouse, M. (2014). Clinical decision support system (CDSS). Available at:
Samuel, A. E., & Balamurugan, M. (2012). Fuzzy max–min composition technique in medical diagnosis. Applied Mathematical Sciences, 6(35), 1741–1746.
Sanz, J. A., Galar, M., Jurio, A., Brugos, A., Pagola, M., & Bustince, H. (2014). Medical diagnosis of cardiovascular diseases using an interval-valued fuzzy rule-based classification system. Applied Soft Computing, 20, 103–111.
Shanmugasundaram, P., & Seshaiah, C. V. (2014). An application of intuitionistic fuzzy technique in medical diagnosis. Australian Journal of Basic & Applied Sciences, 8(9).
Sharaf-El-Deen, D. A., Moawad, I. F., & Khalifa, M. E. (2014). A new hybrid case-based reasoning approach for medical diagnosis systems. Journal of Medical Systems, 38(2), 1–11.
Shinoj, T. K., & John, S. J. (2012). Intuitionistic fuzzy multi sets and its application in medical diagnosis. World Academy of Science, Engineering and Technology, 6, 1418–1421.
Son, L. H. (2014a). Enhancing clustering quality of geo-demographic analysis using context fuzzy clustering type-2 and particle swarm optimization. Applied Soft Computing, 22, 566–584.
Son, L. H. (2014b). HU-FCF: A hybrid user-based fuzzy collaborative filtering method in recommender systems. Expert Systems with Applications, 41(15), 6861–6870.
Son, L. H. (2014c). Optimizing municipal solid waste collection using chaotic particle swarm optimization in GIS based environments: A case study at Danang City, Vietnam. Expert Systems with Applications, 41(18), 8062–8074.
Son, L. H. (2015). DPFCM: A novel distributed picture fuzzy clustering method on picture fuzzy sets. Expert Systems with Applications, 42(1), 51–66.
Son, L. H., Cuong, B. C., Lanzi, P. L., & Thong, N. T. (2012). A novel intuitionistic fuzzy clustering method for geo-demographic analysis. Expert Systems with Applications, 39(10), 9848–9859.
Son, L. H., Cuong, B. C., & Long, H. V. (2013). Spatial interaction–modification model and applications to geo-demographic analysis. Knowledge-Based Systems, 49, 152–170.
Son, L. H., Lanzi, P. L., Cuong, B. C., & Hung, H. A. (2012). Data mining in GIS: A novel context-based fuzzy geographically weighted clustering algorithm. International Journal of Machine Learning and Computing, 2(3), 235–238.
Son, L. H., Linh, N. D., & Long, H. V. (2014). A lossless DEM compression for fast retrieval method using fuzzy clustering and MANFIS neural network. Engineering Applications of Artificial Intelligence, 29, 33–42.
Son, L. H., & Thong, N. T. (2015). Intuitionistic fuzzy recommender systems: An effective tool for medical diagnosis. Knowledge-Based Systems, 74, 133–150.
Szmidt, E., & Kacprzyk, J. (2001). Intuitionistic fuzzy sets in some medical applications. In Computational intelligence. Theory and applications (pp. 148–151). Berlin, Heidelberg: Springer.
Szmidt, E., & Kacprzyk, J. (2003). An intuitionistic fuzzy set based approach to intelligent data analysis: An application to medical diagnosis. In Recent advances in intelligent paradigms and applications (pp. 57–70). Physica-Verlag HD.
Szmidt, E., & Kacprzyk, J. (2004). A similarity measure for intuitionistic fuzzy sets and its application in supporting medical diagnostic reasoning. In Artificial intelligence and soft computing – ICAISC 2004 (pp. 388–393). Berlin, Heidelberg: Springer.
Tan, K. C., Yu, Q., Heng, C. M., & Lee, T. H. (2003). Evolutionary computing for knowledge discovery in medical diagnosis. Artificial Intelligence in Medicine, 27(2), 129–154.
Thong, P. H., & Son, L. H. (2014). A new approach to multi-variables fuzzy forecasting using picture fuzzy clustering and picture fuzzy rules interpolation method. In Proceeding of sixth international conference on knowledge and systems engineering (pp. 679–690).
University of California (2007). UCI Repository of Machine Learning Databases. Available at:
West, T. A., & Marion, D. W. (2014). Current recommendations for the diagnosis and treatment of concussion in sport: A comparison of three new guidelines. Journal of Neurotrauma, 31(2), 159–168.
Xiao, Z., Yang, X., Niu, Q., Dong, Y., Gong, K., Xia, S., et al. (2012). A new evaluation method based on D–S generalized fuzzy soft sets and its application in medical diagnosis problem. Applied Mathematical Modelling, 36(10), 4592–4604.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Zhou, Z. H., & Jiang, Y. (2003). Medical diagnosis with C4.5 rule preceded by artificial neural network ensemble. IEEE Transactions on Information Technology in Biomedicine, 7(1), 37–42.
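As a supplementary illustration of Section 2.3, the similarity computation of Eqs. (44)–(47) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' PHP implementation: the function names are hypothetical, the maximum in Eq. (44) is taken over all patients for a fixed cluster, the group similarity in Eq. (46) is assumed to be normalised by $1/N_C$, the guard against a zero maximum is our own addition, and $SIM_{history}$ (the IFSD of Eq. (15)) is passed in as a precomputed number.

```python
import math


def counter_similarity(x, v):
    """Eq. (45): one minus the cosine similarity of patient x and center v."""
    dot = sum(a * b for a, b in zip(x, v))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def possibilities(patients, centers):
    """Eq. (44): PRO[j][k] = CS(j, k) / max_i CS(i, k).

    Assumption: the maximum runs over all patients i for a fixed cluster k.
    The zero-maximum guard is not in the paper; it only avoids dividing by 0.
    """
    cs = [[counter_similarity(x, v) for v in centers] for x in patients]
    n_clusters = len(centers)
    col_max = [max(row[k] for row in cs) for k in range(n_clusters)]
    return [
        [row[k] / col_max[k] if col_max[k] > 0 else 0.0 for k in range(n_clusters)]
        for row in cs
    ]


def sim_group(pro_a, pro_b):
    """Eq. (46): mean product of the centered possibilities of two patients.

    Assumption: the sum is divided by the number of clusters N_C.
    """
    n_c = len(pro_a)
    mean_a = sum(pro_a) / n_c
    mean_b = sum(pro_b) / n_c
    return sum((pa - mean_a) * (pb - mean_b) for pa, pb in zip(pro_a, pro_b)) / n_c


def final_similarity(sim_history, sim_grp, lam=0.4):
    """Eq. (47): blend of the IFSD (history) and the group similarity."""
    return sim_history * (1.0 - lam) + sim_grp * lam


if __name__ == "__main__":
    # Toy crisp relation vectors and cluster centers (illustrative values only).
    patients = [[0.8, 0.6, 0.2], [0.7, 0.5, 0.3], [0.1, 0.2, 0.9]]
    centers = [[0.75, 0.55, 0.25], [0.1, 0.2, 0.9]]
    pro = possibilities(patients, centers)
    grp = sim_group(pro[0], pro[1])
    # 0.87 is IFSD(Sugu, Ram) from Eq. (20), used here only as a sample value.
    print(final_similarity(0.87, grp, lam=0.4))
```

With $\lambda = 0.4$, as in the parameter setting of Section 3.1, the history-based IFSD keeps 60% of the weight, so the clustering information adjusts the similarity taken from IFCF rather than replacing it.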