DSpace at VNU: Intuitionistic fuzzy recommender systems: An effective tool for medical diagnosis

Knowledge-Based Systems 74 (2015) 133–150
Journal homepage: www.elsevier.com/locate/knosys

Intuitionistic fuzzy recommender systems: An effective tool for medical diagnosis

Le Hoang Son ⇑, Nguyen Tho Thong
VNU University of Science, Vietnam National University, Viet Nam

Article info
Article history: Received 12 May 2014; Received in revised form October 2014; Accepted 10 November 2014; Available online 20 November 2014
Keywords: Accuracy; Fuzzy sets; Intuitionistic fuzzy collaborative filtering; Intuitionistic fuzzy recommender systems; Medical diagnosis

Abstract
Medical diagnosis is one of the important processes in clinical medicine: it determines acquired diseases from given symptoms. Enhancing the accuracy of diagnosis is a central focus of research involving computerized techniques such as intuitionistic fuzzy sets (IFS) and recommender systems (RS). Because medical data are often imprecise, incomplete and vague, the standalone IFS and RS methods may not improve the accuracy of diagnosis. In this paper we therefore integrate IFS and RS and present a novel intuitionistic fuzzy recommender system (IFRS) that includes: (i) new definitions of single-criterion and multi-criteria IFRS; (ii) new definitions of the intuitionistic fuzzy matrix (IFM) and the intuitionistic fuzzy composition matrix (IFCM); (iii) the intuitionistic fuzzy similarity matrix (IFSM), the intuitionistic fuzzy similarity degree (IFSD) and the formulas to predict values on the basis of IFSD; (iv) a novel intuitionistic fuzzy collaborative filtering method, called IFCF, to predict the possible diseases. Experimental results reveal that IFCF obtains better accuracy than the standalone IFS methods, such as those of De et al., Szmidt and Kacprzyk, and Samuel and Balamurugan, and the standalone RS methods, e.g. those of Davis et al. and Hassan and Syed. ©
2014 Elsevier B.V. All rights reserved.

⇑ Corresponding author at: 334 Nguyen Trai, Thanh Xuan, Hanoi, Viet Nam. Tel.: +84 904171284; fax: +84 0438623938. E-mail addresses: sonlh@vnu.edu.vn, chinhson2002@gmail.com (L.H. Son).
http://dx.doi.org/10.1016/j.knosys.2014.11.012
0950-7051/© 2014 Elsevier B.V.

1 Introduction

In this section, we formulate the medical diagnosis problem and give some illustrative examples in Section 1.1. Section 1.2 describes the relevant works using intuitionistic fuzzy sets for the medical diagnosis problem. Section 1.3 summarizes the limitations of those works, and based on these facts the motivation and ideas of the proposed approach are highlighted in Section 1.4. Section 1.5 presents our contributions in detail, and their novelty and significance are discussed in Section 1.6. Lastly, Section 1.7 describes the organization of the paper.

1.1 The medical diagnosis problem

Medical diagnosis is one of the most important and necessary processes in clinical medicine: it determines the acquired diseases of patients from given symptoms. According to Kononenko [20], diagnosis commonly relates to the probability or risk of an individual developing a particular state of health over a specific time, based on his or her clinical and non-clinical profile. It is useful for minimizing the risk of associated health complications such as osteoporosis, small bowel cancer and increased risk of other autoimmune diseases. Mathematically, it is defined as follows.

Definition 1 (Medical diagnosis). Given three lists $P = \{P_1, \ldots, P_n\}$, $S = \{S_1, \ldots, S_m\}$ and $D = \{D_1, \ldots, D_k\}$, where P is a list of patients, S a list of symptoms and D a list of diseases. The three values $n, m, k \in \mathbb{N}^+$ are the numbers of patients, symptoms and diseases, respectively. The relation between the patients and the symptoms is characterized by the set

$$R_{PS} = \{ R_{PS}(P_i, S_j) \mid \forall i = 1, \ldots, n;\ \forall j = 1, \ldots, m \},$$

where $R_{PS}(P_i, S_j)$ shows the level that
patient $P_i$ acquires symptom $S_j$, and is represented by either a numeric value or an (intuitionistic) fuzzy value depending on the domain of the problem. Analogously, the relation between the symptoms and the diseases is expressed as

$$R_{SD} = \{ R_{SD}(S_i, D_j) \mid \forall i = 1, \ldots, m;\ \forall j = 1, \ldots, k \},$$

where $R_{SD}(S_i, D_j)$ reflects the possibility that symptom $S_i$ would lead to disease $D_j$. The medical diagnosis problem aims to determine the relation between the patients and the diseases described by the set

$$R_{PD} = \{ R_{PD}(P_i, D_j) \mid \forall i = 1, \ldots, n;\ \forall j = 1, \ldots, k \},$$

where $R_{PD}(P_i, D_j)$ is either 0 or 1, showing whether patient $P_i$ acquires disease $D_j$ or not. The medical diagnosis problem can be briefly represented by the implication $\{R_{PS}, R_{SD}\} \to R_{PD}$.

Example 1. Consider the dataset in [31] with four patients P = {Ram, Mari, Sugu, Somu}, five symptoms S = {Temperature, Headache, Stomach-pain, Cough, Chest-pain} and five diseases D = {Viral-Fever, Malaria, Typhoid, Stomach, Heart}. The relations between the patients and the symptoms, and between the symptoms and the diseases, are given in Tables 1 and 2, respectively. The relation between the patients and the diseases determined by the medical diagnosis is given in Table 3. Since the domain of the problem is the intuitionistic fuzzy values, this relation is also expressed in this form. The most acquiring disease of each patient is given in Table 4, which is converted from Table 3 by a trivial defuzzification method that takes the disease with the maximal membership degree.

Medical diagnosis is considered an efficient support tool for clinicians to make the right therapeutic decisions, especially in cases where medicine extends its predictive capacities using genetic data [5]. As observed in Table 3, medical diagnosis could assist clinicians in enumerating the possible diseases of patients together with certain membership values. Thus, it is convenient for clinicians, who are
experts in this field, to quickly diagnose and prescribe proper medication. This clearly shows the importance of medical diagnosis in medical science today.

Table 1. The relation between the patients and the symptoms, R_PS.

P      Temperature   Headache     Stomach_pain   Cough        Chest_pain
Ram    (0.8, 0.1)    (0.6, 0.1)   (0.2, 0.8)     (0.6, 0.1)   (0.1, 0.6)
Mari   (0, 0.8)      (0.4, 0.4)   (0.6, 0.1)     (0.1, 0.7)   (0.1, 0.8)
Sugu   (0.8, 0.1)    (0.8, 0.1)   (0, 0.6)       (0.2, 0.7)   (0, 0.5)
Somu   (0.6, 0.1)    (0.5, 0.4)   (0.3, 0.4)     (0.7, 0.2)   (0.3, 0.4)

Table 2. The relation between the symptoms and the diseases, R_SD.

S              Viral_fever   Malaria      Typhoid      Stomach      Heart
Temperature    (0.4, 0)      (0.7, 0)     (0.3, 0.3)   (0.1, 0.7)   (0.1, 0.8)
Headache       (0.3, 0.5)    (0.2, 0.6)   (0.6, 0.1)   (0.2, 0.4)   (0, 0.8)
Stomach_pain   (0.1, 0.7)    (0, 0.9)     (0.2, 0.7)   (0.8, 0)     (0.2, 0.8)
Cough          (0.4, 0.3)    (0.7, 0)     (0.2, 0.6)   (0.2, 0.7)   (0.2, 0.8)
Chest_pain     (0.1, 0.7)    (0.1, 0.8)   (0.1, 0.9)   (0.2, 0.7)   (0.8, 0.1)

Table 3. The relation between the patients and the diseases, R_PD, expressed by intuitionistic fuzzy values.

P      Viral_fever   Malaria      Typhoid      Stomach      Heart
Ram    (0.4, 0.1)    (0.7, 0.1)   (0.6, 0.1)   (0.2, 0.4)   (0.2, 0.6)
Mari   (0.3, 0.5)    (0.2, 0.6)   (0.4, 0.4)   (0.6, 0.1)   (0.1, 0.7)
Sugu   (0.4, 0.1)    (0.7, 0.1)   (0.6, 0.1)   (0.2, 0.4)   (0.2, 0.5)
Somu   (0.4, 0.1)    (0.7, 0.1)   (0.5, 0.3)   (0.3, 0.4)   (0.3, 0.4)

Table 4. The most acquiring diseases of patients.

P      Viral_fever   Malaria   Typhoid   Stomach   Heart
Ram    0             1         0         0         0
Mari   0             0         0         1         0
Sugu   0             1         0         0         0
Somu   0             1         0         0         0

1.2 The previous works

Computerized techniques for medical diagnosis that aim to enhance the accuracy of diagnosis, such as fuzzy sets, genetic algorithms, neural networks, statistical tools and recommender systems, have been introduced widely [20]. Nonetheless, an important issue in medical diagnosis is that the relations between the patients and the symptoms ($R_{PS}$) and between the symptoms and the diseases ($R_{SD}$) are often vague, imprecise and uncertain. For instance, doctors could be faced with patients who are likely to have personal problems and/or mental disorders, so that crucial signs and symptoms are missing, incomplete or vague even though the patients' medical histories and physical examinations are available for the diagnosis. Even if the information of patients is clearly provided, giving an accurate evaluation of the given symptoms and diseases is another challenge that requires well-trained, highly experienced physicians. These observations raise the need to use fuzzy sets or their extensions to model and assist the techniques that improve the accuracy of diagnosis. The definition of a fuzzy set is stated below.

Definition 2. A Fuzzy Set (FS) [49] in a non-empty set X is a function

$$\mu : X \to [0, 1], \quad x \mapsto \mu(x), \tag{1}$$

where $\mu(x)$ is the membership degree of each element $x \in X$. A fuzzy set can alternatively be defined as

$$A = \{ \langle x, \mu(x) \rangle \mid x \in X \}. \tag{2}$$

An extension of FS that is widely applied to the medical prognosis problem is the Intuitionistic Fuzzy Set (IFS), which is defined as follows.

Definition 3. An Intuitionistic Fuzzy Set (IFS) [4] in a non-empty set X is

$$\tilde{A} = \{ \langle x, \mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x) \rangle \mid x \in X \}, \tag{3}$$

where $\mu_{\tilde{A}}(x)$ and $\gamma_{\tilde{A}}(x)$ are the membership and non-membership degrees of each element $x \in X$, respectively, satisfying

$$\mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x) \in [0, 1], \quad \forall x \in X, \tag{4}$$

$$0 \le \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \le 1, \quad \forall x \in X. \tag{5}$$

The intuitionistic fuzzy index of an element, showing the non-determinacy, is denoted as

$$\pi_{\tilde{A}}(x) = 1 - \left( \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \right), \quad \forall x \in X. \tag{6}$$

When $\pi_{\tilde{A}}(x) = 0$ for all $x \in X$, the IFS reduces to the FS of Zadeh.

Some extensions of fuzzy sets are not appropriate for modeling uncertainty in medical diagnosis, such as the rough set [28], rough soft sets [11,12,16], intuitionistic fuzzy rough sets [50], and soft rough fuzzy sets and soft fuzzy rough sets [23]. The limitations of these sets, as pointed out by Yao [48], Rodriguez et al. [30], Jafarian and Rezvani [17] and many other authors, lie in their intrinsic nature and in how they are organized and operated: (i) the positive and the boundary rules are considered in rough sets and their variants, so that in cases of many concepts, the
negative rules would be redundant; (ii) the modeling of linguistic information is limited due to the elicitation of single and simple terms that should encompass and express the information provided by the experts regarding a linguistic variable; (iii) if exact membership degrees cannot be determined due to insufficient information, then it is impossible to consider the uncertainty on the membership function. Thus, these types of fuzzy sets could not be used for the application of medical diagnosis.

The first approach to the medical diagnosis problem was drawn from Sanchez's notion of medical knowledge [32]. Since then, several improvements of Sanchez's approach in association with IFS and other advanced fuzzy sets have been introduced. De et al. [9] fuzzified the relations between the patients and the symptoms and between the symptoms and the diseases by intuitionistic fuzzy memberships, and derived the relation between the patients and the diseases by means of intuitionistic fuzzy relations. The algorithm contains the following steps.

1. Calculate the relation between the patients and the diseases by intuitionistic fuzzy relations, with the membership and non-membership functions expressed in Eqs. (7) and (8), respectively:

$$\mu_{PD}(P_i, D_j) = \max_{l = 1, \ldots, m} \left\{ \min \{ \mu_{PS}(P_i, S_l), \mu_{SD}(S_l, D_j) \} \right\}, \quad \forall i \in \{1, \ldots, n\},\ \forall j \in \{1, \ldots, k\}, \tag{7}$$

$$\gamma_{PD}(P_i, D_j) = \min_{l = 1, \ldots, m} \left\{ \max \{ \gamma_{PS}(P_i, S_l), \gamma_{SD}(S_l, D_j) \} \right\}, \quad \forall i \in \{1, \ldots, n\},\ \forall j \in \{1, \ldots, k\}. \tag{8}$$

2. Perform the defuzzification through $S_{PD}$:

$$S_{PD} = \mu_{PD} - \gamma_{PD} \, \pi_{PD}. \tag{9}$$

3. Determine the most acquiring diseases of patients based on the maximal $S_{PD}$ and minimal $\pi_{PD}$.

Example 2. Consider the dataset in Example 1. The relation between the patients and the diseases calculated by Eqs. (7) and (8) is given in Table 5. The $S_{PD}$ matrix is given in Table 6. Based upon this table, Ram, Sugu and Somu suffer from Malaria and Mari acquires Stomach the most.

Table 5. The relation between the patients and the diseases, R_PD, in the method of De et al. [9], expressed by intuitionistic fuzzy values.

P      Viral_fever   Malaria      Typhoid      Stomach      Heart
Ram    (0.4, 0.1)    (0.7, 0.1)   (0.6, 0.1)   (0.2, 0.4)   (0.2, 0.6)
Mari   (0.3, 0.5)    (0.2, 0.6)   (0.4, 0.4)   (0.6, 0.1)   (0.2, 0.5)
Sugu   (0.4, 0.1)    (0.7, 0.1)   (0.6, 0.1)   (0.2, 0.4)   (0.2, 0.5)
Somu   (0.4, 0.1)    (0.7, 0.1)   (0.5, 0.3)   (0.3, 0.4)   (0.3, 0.4)

Table 6. The S_PD matrix, where values in square brackets indicate the most possible disease.

P      Viral_fever   Malaria   Typhoid   Stomach   Heart
Ram    0.35          [0.68]    0.57      0.04      0.08
Mari   0.2           0.08      0.32      [0.57]    0.05
Sugu   0.35          [0.68]    0.57      0.04      0.05
Somu   0.35          [0.68]    0.44      0.18      0.18

Samuel and Balamurugan [31] improved the method of De et al. [9] with a new technique named intuitionistic fuzzy max–min composition. This method is analogous to that of De et al. [9] except that Steps 2 and 3 are replaced by:

2. Compute $W_{PD} = (\mu_{PD}, 1 - \gamma_{PD})$.
3. For each $P_i$, find $\max_j \{ \min( \mu_{PD}(P_i, D_j), 1 - \gamma_{PD}(P_i, D_j) ) \}$ and conclude the most acquiring diseases.

Example 3. Consider again the dataset in Example 1. The $W_{PD}$ matrix is shown in Table 7. The reduction of $W_{PD}$ is presented in Table 8. From this table, Ram, Sugu and Somu suffer from Malaria and Mari acquires Stomach the most.

Table 7. The W_PD matrix.

P      Viral_fever   Malaria      Typhoid      Stomach      Heart
Ram    (0.4, 0.9)    (0.7, 0.9)   (0.6, 0.9)   (0.2, 0.6)   (0.2, 0.4)
Mari   (0.3, 0.5)    (0.2, 0.4)   (0.4, 0.6)   (0.6, 0.9)   (0.2, 0.5)
Sugu   (0.4, 0.9)    (0.7, 0.9)   (0.6, 0.9)   (0.2, 0.6)   (0.2, 0.5)
Somu   (0.4, 0.9)    (0.7, 0.9)   (0.5, 0.7)   (0.3, 0.6)   (0.3, 0.6)

Table 8. The reduction matrix, where values in square brackets indicate the most possible disease.

P      Viral_fever   Malaria   Typhoid   Stomach   Heart
Ram    0.4           [0.7]     0.6       0.2       0.2
Mari   0.3           0.2       0.4       [0.6]     0.2
Sugu   0.4           [0.7]     0.6       0.2       0.2
Somu   0.4           [0.7]     0.5       0.3       0.3

Another approach to medical diagnosis uses distance functions to calculate the relation between the patients and the diseases from the relations between the patients and the symptoms and between the symptoms and the diseases, as described in [42–44,19,33]. The general steps of these algorithms are:

1. Use the Hamming or Euclidean function to calculate the relation between the patients and the diseases as in Eqs. (10) and (11), respectively:

$$R_{PD}(P_i, D_j) = \frac{1}{2m} \sum_{l=1}^{m} \left( |\mu_{PS}(P_i, S_l) - \mu_{SD}(S_l, D_j)| + |\gamma_{PS}(P_i, S_l) - \gamma_{SD}(S_l, D_j)| + |\pi_{PS}(P_i, S_l) - \pi_{SD}(S_l, D_j)| \right), \tag{10}$$

$$R_{PD}(P_i, D_j) = \left( \frac{1}{2m} \sum_{l=1}^{m} \left( (\mu_{PS}(P_i, S_l) - \mu_{SD}(S_l, D_j))^2 + (\gamma_{PS}(P_i, S_l) - \gamma_{SD}(S_l, D_j))^2 + (\pi_{PS}(P_i, S_l) - \pi_{SD}(S_l, D_j))^2 \right) \right)^{1/2}. \tag{11}$$

2. Conclude the possible diseases of patients based on the minimal distance criterion.

Example 4. Applying this method to the dataset in Example 1, we obtain the relations between the patients and the diseases by the Hamming function (Table 9) or the Euclidean function (Table 10). The most acquiring diseases of patients are shown in square brackets.

Table 9. The relation between the patients and the diseases by the Hamming function, where values in square brackets indicate the most possible disease.

P      Viral_fever   Malaria   Typhoid   Stomach   Heart
Ram    0.28          [0.24]    0.28      0.54      0.56
Mari   0.40          0.50      0.31      [0.14]    0.42
Sugu   0.38          0.44      [0.32]    0.50      0.55
Somu   [0.28]        0.30      0.38      0.44      0.54

Table 10. The relation between the patients and the diseases by the Euclidean function, where values in square brackets indicate the most possible disease.

P      Viral_fever   Malaria   Typhoid   Stomach   Heart
Ram    0.29          [0.25]    0.32      0.53      0.58
Mari   0.43          0.56      0.33      [0.14]    0.46
Sugu   0.36          0.41      [0.32]    0.52      0.57
Somu   [0.25]        0.29      0.35      0.43      0.50

Besides these approaches, some authors have extended them to special cases, e.g. multi-criteria medical diagnosis and the modeling of multiple time intervals for the relation between the patients and the symptoms. This requires other advanced fuzzy sets such as the type-2 fuzzy sets [26], the interval-valued intuitionistic fuzzy sets [2], the fuzzy soft set [25,47] and the intuitionistic fuzzy soft set [1,21]. The combination of these fuzzy sets with machine learning methods to handle the special cases, such as the fuzzy-neural automatic system [27,24] and the type-2 fuzzy genetic algorithm [45,14], was also investigated.

1.3 The limitations of the previous works

Considering the relevant works that use IFS, we recognize that, among the advanced fuzzy sets, IFS is the one mainly used for medical diagnosis applications. Nonetheless, these works have the following disadvantages.

(a) The previous works calculate the relation between the patients and the diseases ($R_{PD}$) solely from those between the patients and the symptoms ($R_{PS}$) and between the symptoms and the diseases ($R_{SD}$). In practical cases where either of these two relations is missing, those methods cannot be applied. This happens in reality, since clinicians sometimes cannot accurately express the membership and non-membership degrees of symptoms to diseases or vice versa;
(b) The information of previous diagnoses of patients cannot be utilized. That is to say, a patient may already have some records in the patients-diseases database ($R_{PD}$). Nevertheless, the next records of this patient are calculated solely on the basis of $R_{PS}$ and $R_{SD}$. Historic diagnoses of patients are not taken into account, so the accuracy of diagnosis may not be high as a result;
(c) The determination of the most acquiring disease depends on the defuzzification method. For instance, De et al. [9] used $S_{PD}$ for the defuzzification, Samuel and Balamurugan [31] relied on the reduction matrix from $W_{PD}$, and Szmidt and Kacprzyk [42–44], Khatibi and Montazer [19] and Shinoj and John [33] employed the distance functions. Determination that is independent of the defuzzification method should be investigated for the stable performance of the algorithm;
(d) Mathematical properties of operations such as the fuzzy implication in De et al. [9] and Samuel and Balamurugan [31], and the distance function in Szmidt and Kacprzyk [42–44], Khatibi and Montazer [19] and Shinoj and John [33], were not discussed in the corresponding articles. Readers could not know the theoretical bases of these operations or why they were selected for the medical diagnosis problem.

1.4 The motivation and ideas

Given the disadvantages of the previous works, our idea in this article is to use a hybrid method between Recommender Systems (RS) and IFS to handle them. RS, which are a subclass of decision support systems, can give users information about the predictive "rating" or "preference" that they would give to an item, thus helping them to choose the appropriate item among numerous possibilities. This kind of expert system is now popular in numerous application fields such as personalized systems for books, documents, images, movies, music, shopping and TV programs. The mathematical definition of RS is stated below.

Definition 4 (Recommender Systems, RS [29]). Suppose U is the set of all users and I is the set of items in the system. The utility function R is a mapping specified on $U_1 \subset U$ and $I_1 \subset I$ as follows:

$$R : U_1 \times I_1 \to P, \quad (u_1, i_1) \mapsto R(u_1, i_1), \tag{12}$$

where $R(u_1, i_1)$ is a non-negative integer or a real number within a certain range, and P is the set of available ratings in the system. Thus, RS is a system that provides the two basic functions below.

(a) Prediction: determine $R(u^*, i^*)$ for any $(u^*, i^*) \in (U, I) \setminus (U_1, I_1)$;
(b) Recommendation: choose $i^* \in I$ satisfying $i^* = \arg\max_{i \in I} R(u, i)$ for all $u \in U$.

RS has been applied to the medical diagnosis problem. Davis et al. [8] proposed CARE, a Collaborative Assessment and Recommendation Engine, which relies only on a patient's medical history to predict future disease risks, and combines collaborative filtering methods with clustering to predict each patient's greatest disease risks based on their own medical history and that of similar patients. An iterative version of CARE, called ICARE, which incorporates ensemble concepts for improved performance, was also introduced. These systems required no specialized information and provided predictions for medical conditions of all kinds in a
single run. Hassan and Syed [13] employed a collaborative filtering framework that assessed patient risk both by matching new cases to historical records and by matching patient demographics to adverse outcomes, so that it could achieve a higher predictive accuracy for both sudden cardiac death and recurrent myocardial infarction than popular classification approaches such as logistic regression and support vector machines. More works on the applications of RS can be found in Duan et al. [10], Meisamshabanpoor and Mahdavi [22] and our previous works in [7,34–41].

Example 5. Consider the training dataset in Table 11. Using a simple encoding method that multiplies the membership degree by 10 and adds the non-membership degree to it, we obtain the crisp training dataset in Table 12. The method of Hassan and Syed [13] employed collaborative filtering, including the traditional Pearson coefficient to calculate the similarity between users and the k-nearest neighbor approximation function to predict the blank values in Table 12. The results are shown in Table 13. Taking the maximal value for each patient in Table 13, we can conclude that Ram, Sugu and Somu suffer from Malaria and Mari acquires Stomach.

Table 11. The training dataset, with * being the values to be predicted.

P      Viral_fever   Malaria      Typhoid      Stomach      Heart
Ram    (0.4, 0.1)    (0.7, 0.1)   (0.6, 0.1)   (0.2, 0.4)   (0.2, 0.6)
Mari   (0.3, 0.5)    (0.2, 0.6)   (0.4, 0.4)   (0.6, 0.1)   (0.1, 0.7)
Sugu   (0.4, 0.1)    (0.7, 0.1)   *            *            *
Somu   *             *            (0.5, 0.3)   (0.3, 0.4)   *

Table 12. The crisp training dataset, with * being the values to be predicted.

P      Viral_fever   Malaria   Typhoid   Stomach   Heart
Ram    4.1           7.1       6.1       2.4       2.6
Mari   3.5           2.6       4.4       6.1       1.7
Sugu   4.1           7.1       *         *         *
Somu   *             *         5.3       3.4       *

Analogously, Table 14 shows the results of the method of Davis et al. [8], where Ram suffers from Malaria, Mari acquires Stomach, and Sugu and Somu have Typhoid. From Example 5, we clearly
recognize the following facts:

(a) RS could be applied to medical diagnosis. Yet in cases where the relations are expressed by fuzzy memberships as in Table 11, the accuracy of diagnosis in RS depends on the encoding method. In other words, RS is effective with a crisp dataset such as Table 12 but not with a fuzzy one, e.g. Table 11;
(b) The problem of the previous works, namely the dependence of the determination of the most acquiring disease on the defuzzification method (e.g. the maximal function in Example 5), still exists;
(c) RS works only if the training dataset is provided. That is to say, we must have the historic diagnoses of patients for the prediction.

Table 13. The full dataset derived by the method of Hassan and Syed [13], where values in square brackets indicate the most possible disease.

P      Viral_fever   Malaria   Typhoid   Stomach   Heart
Ram    4.1           [7.1]     6.1       2.4       2.6
Mari   3.5           2.6       4.4       [6.1]     1.7
Sugu   4.1           [7.1]     4.8       1.9       3.2
Somu   5.9           [9.8]     5.3       3.4       5.5

Table 14. The full dataset derived by the method of Davis et al. [8], where values in square brackets indicate the most possible disease.

P      Viral_fever   Malaria   Typhoid   Stomach   Heart
Ram    4.1           [7.1]     6.1       2.4       2.6
Mari   3.5           2.6       4.4       [6.1]     1.7
Sugu   4.1           7.1       [7.3]     5.2       2.1
Somu   2.6           4.7       [5.3]     3.4       0.2

From Sections 1.3 and 1.4 and the illustrative examples, we recognize that the IFS and RS approaches have their own advantages and disadvantages. Thus, a combination of these approaches that keeps the advantages and eliminates the disadvantages could handle the mentioned issues. Scanning the literature, we find that some hybrid methods were also designed for the medical diagnosis problem, to name but a few: Davis et al. [8] combined collaborative filtering methods with clustering; Kala et al. [18] integrated genetic algorithms with a modular neural network; Hosseini et al. [14] joined type-2 fuzzy logic with a genetic algorithm. This evidence shows that combining groups of methods, such as RS and IFS, is a current trend in medical diagnosis.

1.5 The contributions of this work

Based upon these observations, our contribution in this paper is a novel intuitionistic fuzzy recommender system (IFRS) for medical diagnosis consisting of the following components:

(a) New definitions of single-criterion IFRS (SC-IFRS) and multi-criteria IFRS (MC-IFRS) that extend the definition of RS (Definition 4), taking into account a feature of a user and a characteristic of an item expressed by intuitionistic linguistic labels (see Section 2.1). These definitions are the basis for the similarity degrees used for the prediction of $R_{PD}(P_i, D_j)$ (Definition 1);
(b) New definitions of the intuitionistic fuzzy matrix (IFM), which is a representation of SC-IFRS and MC-IFRS in matrix format, and the intuitionistic fuzzy composition matrix (IFCM) of two IFMs with the intersection/union operation. Some theorems and properties of IFM and IFCM are presented (see Section 2.2);
(c) Some new similarity degrees of IFMs, such as the intuitionistic fuzzy similarity matrix (IFSM) and the intuitionistic fuzzy similarity degree (IFSD). The formulas to predict $R_{PD}(P_i, D_j)$ on the basis of IFSD, accompanied by a theorem, are proposed (see Section 2.3);
(d) From the predicting formulas, a novel intuitionistic fuzzy collaborative filtering method, called IFCF, is presented for the medical diagnosis problem (see Section 2.4);
(e) The IFCF method is validated against the standalone IFS methods, such as De et al. [9], Szmidt and Kacprzyk [44] and Samuel and Balamurugan [31], and the standalone RS methods, e.g. Davis et al. [8] and Hassan and Syed [13], by both a numerical illustration on the dataset in Example 1 and experiments on benchmark medical diagnosis datasets from the UCI Machine Learning Repository in terms of the accuracy of diagnosis (see Section 3).

1.6 The novelty and significance of the proposed work

According to the contributions in Section 1.5 and the limitations of IFS and RS in Sections 1.3 and 1.4, respectively, the
novelty and significance of the proposed work are as follows.

(a) The proposed work differs from the previous ones, especially the standalone IFS and RS methods. Specifically, it employs the ideas of both IFS and RS in the definitions of SC-IFRS and MC-IFRS, which are the basis for developing some new terms and similarity degrees for the IFCF algorithm. Furthermore, as observed in the examples through Example 3, the relation between patients and diseases in the standalone IFS methods is determined by operations such as the fuzzy implication in De et al. [9] and Samuel and Balamurugan [31], and the distance function in Szmidt and Kacprzyk [42–44]. In the proposed work, this is done through the intuitionistic fuzzy similarity degree (IFSD) in Section 2.3, which is developed based on SC-IFRS and MC-IFRS. Compared with the standalone RS methods such as Davis et al. [8] and Hassan and Syed [13], the similarity degree IFSD in the proposed work is constructed in the light of IFS, not by the Pearson coefficient from hard values such as those in Table 12. Additionally, the formulas to predict $R_{PD}(P_i, D_j)$ are built on the membership and non-membership functions rather than on those hard values. These points demonstrate the novelty of the proposed work;
(b) The proposed hybrid method could handle the issues of the standalone IFS and RS methods. For instance, the limitations of IFS relating to the missing relations and the historic diagnoses of patients stated in Section 1.3(a) and (b), and the limitations of RS relating to the crisp and training datasets stated in Section 1.4(a) and (c), are solved by the integration of IFS and RS. The lack of mathematical properties of operations in Section 1.3(d) is resolved by a number of theorems and properties in Section 2. Lastly, when predicting $R_{PD}(P_i, D_j)$, users could find a suitable defuzzification method
for the determination of the most acquiring disease;
(c) The proposed work is significant in terms of both theory and practice. On the theoretical side, it motivates research on advanced algorithms of IFS and RS, especially hybrid methods between them, to enhance the accuracy of the algorithm. As detailed in Section 1.5, the proposed method is constructed on a well-defined mathematical foundation, which received little attention in the previous works. This supports the further development of other advanced methods of both IFS and RS on such a mathematical foundation. On the practical side, the proposed work contributes to the medical diagnosis problem, and some extensions and variants of this method could be quickly deployed for other socio-economic problems. This affirms the significance of the proposed work.

1.7 The organization of the paper

The rest of the paper is organized as follows. Section 2 presents the main contribution, including the IFRS and its elements stated in Section 1.5. Section 3 validates the proposed approach through a set of experiments involving benchmark medical diagnosis data. Section 4 draws the conclusions and delineates the future research directions.

2 Intuitionistic fuzzy recommender systems

In this section, we present the new definitions of single-criterion IFRS (SC-IFRS) and multi-criteria IFRS (MC-IFRS) in Section 2.1; the new definitions of the intuitionistic fuzzy matrix (IFM) and the intuitionistic fuzzy composition matrix (IFCM) of two IFMs with the intersection/union operation in Section 2.2; the intuitionistic fuzzy similarity matrix (IFSM), the intuitionistic fuzzy similarity degree (IFSD) and the formulas to predict $R_{PD}(P_i, D_j)$ on the basis of IFSD in Section 2.3; and a novel intuitionistic fuzzy collaborative filtering method, called IFCF, in Section 2.4.

2.1 The single-criterion and multi-criteria intuitionistic fuzzy recommender systems

Recall that P, S and D from Definition 1 are the sets of patients, symptoms and diseases, respectively. Each patient $P_i$ ($\forall i \in \{1, \ldots, n\}$) (resp. symptom $S_j$, $\forall j \in \{1, \ldots, m\}$) is assumed to have some features (resp. characteristics). For simplicity, we consider an RS including one feature of the patient and one characteristic of the symptom, denoted as X and Y, respectively. X and Y both consist of s intuitionistic linguistic labels. Analogously, disease $D_i$ ($\forall i \in \{1, \ldots, k\}$) also contains s intuitionistic linguistic labels. A new definition of RS, in the light of the definitions of medical diagnosis expressed by IFS and of the traditional RS in Definitions 1 and 4, respectively, is stated as follows.

Definition 5 (Single-criterion Intuitionistic Fuzzy Recommender Systems, SC-IFRS). The utility function R is a mapping specified on (X, Y) as follows:

$$R : X \times Y \to D,$$
$$\langle (\mu_{1X}(x), \gamma_{1X}(x)), (\mu_{2X}(x), \gamma_{2X}(x)), \ldots, (\mu_{sX}(x), \gamma_{sX}(x)) \rangle \times \langle (\mu_{1Y}(y), \gamma_{1Y}(y)), (\mu_{2Y}(y), \gamma_{2Y}(y)), \ldots, (\mu_{sY}(y), \gamma_{sY}(y)) \rangle \mapsto \langle (\mu_{1D}(D), \gamma_{1D}(D)), (\mu_{2D}(D), \gamma_{2D}(D)), \ldots, (\mu_{sD}(D), \gamma_{sD}(D)) \rangle, \tag{13}$$

where $\mu_{iX}(x) \in [0, 1]$ (resp. $\gamma_{iX}(x) \in [0, 1]$), $\forall i \in \{1, \ldots, s\}$, is the membership (resp. non-membership) value of the patient to the i-th linguistic label of feature X. $\mu_{jY}(y) \in [0, 1]$ (resp. $\gamma_{jY}(y) \in [0, 1]$), $\forall j \in \{1, \ldots, s\}$, is the membership (resp. non-membership) value of the symptom to the j-th linguistic label of characteristic Y. Finally, $\mu_{lD}(D) \in [0, 1]$ (resp. $\gamma_{lD}(D) \in [0, 1]$), $\forall l \in \{1, \ldots, s\}$, is the membership (resp. non-membership) value of disease D to the l-th linguistic label. SC-IFRS provides two basic functions:

(a) Prediction: determine the values of $(\mu_{lD}(D), \gamma_{lD}(D))$, $\forall l \in \{1, \ldots, s\}$;
(b) Recommendation: choose $i^* \in [1, s]$ satisfying $i^* = \arg\max_{i = 1, \ldots, s} \{ \mu_{iD}(D) + \mu_{iD}(D) (1 - \mu_{iD}(D) - \gamma_{iD}(D)) \}$.

Remark 1.
(a) From Definition 5 and Eq. (13), the medical diagnosis is represented by the implication {Patient, Symptom} → Disease, which is identical to that of Definition 1. Thus, SC-IFRS in Definition 5 is another representation and an extension of medical diagnosis in Definition 1, inspired by the ideas of RS in Definition 4;
(b) SC-IFRS in Definition 5 could be regarded as an extension of the traditional RS in Definition 4: in cases where

$$\exists i : \mu_{iX}(x) = 1 \wedge \gamma_{iX}(x) = 0; \quad \forall j \ne i : \mu_{jX}(x) = 0 \wedge \gamma_{jX}(x) = 1,$$
$$\exists i : \mu_{iY}(y) = 1 \wedge \gamma_{iY}(y) = 0; \quad \forall j \ne i : \mu_{jY}(y) = 0 \wedge \gamma_{jY}(y) = 1,$$
$$\exists i : \mu_{iD}(D) = 1 \wedge \gamma_{iD}(D) = 0; \quad \forall j \ne i : \mu_{jD}(D) = 0 \wedge \gamma_{jD}(D) = 1,$$

the mapping in (13) could be rewritten as

$$R : P \times S \to D, \quad ((p, X), (s, Y)) \mapsto R_{PD}. \tag{14}$$

Now we extend SC-IFRS to handle the case of multiple diseases $D = \{D_1, \ldots, D_k\}$.

Definition 6 (Multi-criteria Intuitionistic Fuzzy Recommender Systems, MC-IFRS). The utility function R is a mapping specified on (X, Y) as follows:

$$R : X \times Y \to D_1 \times \cdots \times D_k,$$
$$\langle (\mu_{1X}(x), \gamma_{1X}(x)), (\mu_{2X}(x), \gamma_{2X}(x)), \ldots, (\mu_{sX}(x), \gamma_{sX}(x)) \rangle \times \langle (\mu_{1Y}(y), \gamma_{1Y}(y)), (\mu_{2Y}(y), \gamma_{2Y}(y)), \ldots, (\mu_{sY}(y), \gamma_{sY}(y)) \rangle \mapsto \langle (\mu_{1D}(D_1), \gamma_{1D}(D_1)), (\mu_{2D}(D_1), \gamma_{2D}(D_1)), \ldots, (\mu_{sD}(D_1), \gamma_{sD}(D_1)) \rangle \times \cdots \times \langle (\mu_{1D}(D_k), \gamma_{1D}(D_k)), (\mu_{2D}(D_k), \gamma_{2D}(D_k)), \ldots, (\mu_{sD}(D_k), \gamma_{sD}(D_k)) \rangle. \tag{15}$$

MC-IFRS is a system that provides the two basic functions below.

(a) Prediction: determine the values of $(\mu_{lD}(D_i), \gamma_{lD}(D_i))$, $\forall l \in \{1, \ldots, s\}$, $\forall i \in \{1, \ldots, k\}$;
(b) Recommendation: choose $i^* \in [1, s]$ satisfying $i^* = \arg\max_{i = 1, \ldots, s} \left\{ \sum_{j=1}^{k} w_j \left( \mu_{iD}(D_j) + \mu_{iD}(D_j) (1 - \mu_{iD}(D_j) - \gamma_{iD}(D_j)) \right) \right\}$, where $w_j \in [0, 1]$ is the weight of $D_j$ satisfying the constraint $\sum_{j=1}^{k} w_j = 1$.

Example 6. In a medical diagnosis system, there are patients whose feature X is "Age", consisting of the linguistic labels {low, medium, high} (s = 3). The symptom's characteristic Y is "Temperature", including the linguistic labels {cold, medium, hot}. The diseases $(D_1, D_2)$ are {"Flu", "Headache"}, and both of them contain the linguistic labels
{Level 1, Level 2, Level 3}. We would like to verify which ages of users and types of temperature are likely to cause the diseases of flu and headache. In this case we have an MC-IFRS. By using the trapezoidal intuitionistic fuzzy number (TIFN) [3], characterized by $(a_1, a_2, a_3, a_4; a_1', a_4')$ with $a_1' \le a_1 \le a_2 \le a_3 \le a_4 \le a_4'$, the membership (non-membership) functions of patients to the i-th linguistic label of feature X are:

\[ \mu_{\mathrm{low}}(x) = \begin{cases} 1, & x \le 20, \\ (35 - x)/15, & 20 < x \le 35, \\ 0, & x > 35, \end{cases} \tag{16} \]
\[ \nu_{\mathrm{low}}(x) = \begin{cases} 0, & x \le 20, \\ (x - 20)/15, & 20 < x \le 35, \\ 1, & x > 35, \end{cases} \tag{17} \]
\[ \mu_{\mathrm{medium}}(x) = \begin{cases} 0, & x \le 20 \text{ or } x > 60, \\ (x - 20)/15, & 20 < x \le 35, \\ 1, & 35 < x \le 45, \\ (60 - x)/15, & 45 < x \le 60, \end{cases} \tag{18} \]
\[ \nu_{\mathrm{medium}}(x) = \begin{cases} 1, & x \le 20 \text{ or } x > 60, \\ (35 - x)/15, & 20 < x \le 35, \\ 0, & 35 < x \le 45, \\ (x - 45)/15, & 45 < x \le 60, \end{cases} \tag{19} \]
\[ \mu_{\mathrm{high}}(x) = \begin{cases} 0, & x \le 45, \\ (x - 45)/15, & 45 < x \le 60, \\ 1, & x > 60, \end{cases} \tag{20} \]
\[ \nu_{\mathrm{high}}(x) = \begin{cases} 1, & x \le 45, \\ (60 - x)/15, & 45 < x \le 60, \\ 0, & x > 60. \end{cases} \tag{21} \]

Based on Eqs. (16)–(21), we calculate the information of patients as follows:

\[ \mathrm{Al}\,(25): \langle \mathrm{high}(0, 1), \mathrm{medium}(0.33, 0.67), \mathrm{low}(0.67, 0.33) \rangle, \tag{22} \]
\[ \mathrm{Bob}\,(40): \langle \mathrm{high}(0, 1), \mathrm{medium}(1, 0), \mathrm{low}(0, 1) \rangle, \tag{23} \]
\[ \mathrm{Joe}\,(45): \langle \mathrm{high}(0, 1), \mathrm{medium}(1, 0), \mathrm{low}(0, 1) \rangle, \tag{24} \]
\[ \mathrm{Ted}\,(50): \langle \mathrm{high}(0.33, 0.67), \mathrm{medium}(0.67, 0.33), \mathrm{low}(0, 1) \rangle. \tag{25} \]

Similarly, the membership (non-membership) functions of the symptom to the j-th linguistic label of characteristic Y are:

\[ \mu_{\mathrm{cold}}(x) = \begin{cases} 1, & x \le 5, \\ (20 - x)/15, & 5 < x \le 20, \\ 0, & x > 20, \end{cases} \tag{26} \]
\[ \nu_{\mathrm{cold}}(x) = \begin{cases} 0, & x \le 5, \\ (x - 5)/15, & 5 < x \le 20, \\ 1, & x > 20, \end{cases} \tag{27} \]
\[ \mu_{\mathrm{medium}}(x) = \begin{cases} 0, & x \le 5 \text{ or } x > 40, \\ (x - 5)/15, & 5 < x \le 20, \\ 1, & 20 < x \le 35, \\ (40 - x)/5, & 35 < x \le 40, \end{cases} \tag{28} \]
\[ \nu_{\mathrm{medium}}(x) = \begin{cases} 1, & x \le 5 \text{ or } x > 40, \\ (20 - x)/15, & 5 < x \le 20, \\ 0, & 20 < x \le 35, \\ (x - 35)/5, & 35 < x \le 40, \end{cases} \tag{29} \]
\[ \mu_{\mathrm{hot}}(x) = \begin{cases} 0, & x \le 35, \\ (x - 35)/5, & 35 < x \le 40, \\ 1, & x > 40, \end{cases} \tag{30} \]
\[ \nu_{\mathrm{hot}}(x) = \begin{cases} 1, & x \le 35, \\ (40 - x)/5, & 35 < x \le 40, \\ 0, & x > 40. \end{cases} \tag{31} \]

The information of the symptoms is shown as follows:

\[ (4\,^{\circ}\mathrm{C}): \langle \mathrm{cold}(1, 0), \mathrm{medium}(0, 1), \mathrm{hot}(0, 1) \rangle, \tag{32} \]
\[ (16\,^{\circ}\mathrm{C}): \langle \mathrm{cold}(0.267, 0.733), \mathrm{medium}(0.733, 0.267), \mathrm{hot}(0, 1) \rangle, \tag{33} \]
\[ (39\,^{\circ}\mathrm{C}): \langle \mathrm{cold}(0, 1), \mathrm{medium}(0.2, 0.8), \mathrm{hot}(0.8, 0.2) \rangle, \tag{34} \]
\[ (25\,^{\circ}\mathrm{C}): \langle \mathrm{cold}(0, 1), \mathrm{medium}(1, 0), \mathrm{hot}(0, 1) \rangle. \tag{35} \]

From Eqs. (22)–(25) and (32)–(35) we have an MC-IFRS described by Table 15. In Table 15, the cells marked ∗ are the intuitionistic fuzzy values $(\mu_{lD}(D_i), \gamma_{lD}(D_i))$, $\forall l \in \{1, 2, 3\}$, $\forall i \in \{1, 2\}$, that need to be predicted.

Table 15. An MC-IFRS for medical diagnosis, with ∗ being the values to be predicted.

| Age | Temperature | Flu | Headache |
|---|---|---|---|
| Al (25): ⟨high(0, 1), medium(0.33, 0.67), low(0.67, 0.33)⟩ | 4 °C: ⟨cold(1, 0), medium(0, 1), hot(0, 1)⟩ | Level 1 (0.8, 0.1); Level 2 (0.2, 0.6); Level 3 (0.1, 0.9) | Level 1 (0.1, 0.8); Level 2 (0.6, 0.35); Level 3 (0.3, 0.55) |
| Al (25): ⟨high(0, 1), medium(0.33, 0.67), low(0.67, 0.33)⟩ | 16 °C: ⟨cold(0.267, 0.733), medium(0.733, 0.267), hot(0, 1)⟩ | Level 1 (0.4, 0.5); Level 2 (0.6, 0.2); Level 3 (0.1, 0.9) | Level 1 (0.0, 0.9); Level 2 (0.2, 0.75); Level 3 (0.7, 0.2) |
| Bob (40): ⟨high(0, 1), medium(1, 0), low(0, 1)⟩ | 39 °C: ⟨cold(0, 1), medium(0.2, 0.8), hot(0.8, 0.2)⟩ | Level 1 (0.8, 0.2); Level 2 (0.1, 0.8); Level 3 (0.0, 0.95) | Level 1 (0.8, 0.1); Level 2 (0.1, 0.9); Level 3 (0.0, 0.9) |
| Joe (45): ⟨high(0, 1), medium(1, 0), low(0, 1)⟩ | 16 °C: ⟨cold(0.267, 0.733), medium(0.733, 0.267), hot(0, 1)⟩ | Level 1 (0.0, 1.0); Level 2 (0.2, 0.7); Level 3 (1.0, 0.0) | Level 1 (0.0, 0.9); Level 2 (0.7, 0.3); Level 3 (0.1, 0.85) |
| Ted (50): ⟨high(0.33, 0.67), medium(0.67, 0.33), low(0, 1)⟩ | 25 °C: ⟨cold(0, 1), medium(1, 0), hot(0, 1)⟩ | Level 1 (∗, ∗); Level 2 (∗, ∗); Level 3 (∗, ∗) | Level 1 (∗, ∗); Level 2 (∗, ∗); Level 3 (∗, ∗) |

2.2. Intuitionistic fuzzy matrix and intuitionistic fuzzy composition matrix

Definition 7. An intuitionistic fuzzy matrix (IFM) Z in MC-IFRS is defined as

\[ Z = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1s} \\ b_{21} & b_{22} & \cdots & b_{2s} \\ c_{31} & c_{32} & \cdots & c_{3s} \\ c_{41} & c_{42} & \cdots & c_{4s} \\ \vdots & \vdots & \ddots & \vdots \\ c_{t1} & c_{t2} & \cdots & c_{ts} \end{pmatrix}. \tag{36} \]

In Eq. (36), $t = k + 2$, where $k \in \mathbb{N}^+$ is the number of diseases in Definition 6. The value $s \in \mathbb{N}^+$ is the number of intuitionistic linguistic labels. $a_{1i}$, $b_{2i}$ and $c_{hi}$, $\forall h \in \{3, \dots, t\}$, $\forall i \in \{1, \dots, s\}$, are intuitionistic fuzzy values (IFV) consisting of the membership and non-membership values as in Definition 6. $a_{1i} = (\mu_{iX}(x), \gamma_{iX}(x))$, $\forall i \in \{1, \dots, s\}$, is the IFV of the patient to the i-th linguistic label of feature X. $b_{2i} = (\mu_{iY}(y), \gamma_{iY}(y))$, $\forall i \in \{1, \dots, s\}$, is the IFV of the symptom to the i-th linguistic label of characteristic Y. $c_{hi} = (\mu_{iD}(D_{h-2}), \gamma_{iD}(D_{h-2}))$, $\forall i \in \{1, \dots, s\}$, $\forall h \in \{3, \dots, t\}$, is the IFV of the disease to the i-th linguistic label. Each row from the third one to the last in Eq. (36) is related to a given disease.

Example. The first row of Table 15, describing the information of user Al (Age: 25) at the temperature 4 °C, can be expressed by the IFM

\[ Z = \begin{pmatrix} (0.0, 1.0) & (0.33, 0.67) & (0.67, 0.33) \\ (1.0, 0.0) & (0.0, 1.0) & (0.0, 1.0) \\ (0.8, 0.1) & (0.2, 0.6) & (0.1, 0.9) \\ (0.1, 0.8) & (0.6, 0.35) & (0.3, 0.55) \end{pmatrix}. \tag{37} \]

Definition 8. Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS, with entries $a^{(q)}_{1i}$, $b^{(q)}_{2i}$ and $c^{(q)}_{hi}$ ($q = 1, 2$) arranged as in Eq. (36). The intuitionistic fuzzy composition matrix (IFCM) of $Z_1$ and $Z_2$ with the intersection operation is

\[ Z_1 \circ Z_2 = \begin{pmatrix} a^{(12)}_{11} & a^{(12)}_{12} & \cdots & a^{(12)}_{1s} \\ b^{(12)}_{21} & b^{(12)}_{22} & \cdots & b^{(12)}_{2s} \\ c^{(12)}_{31} & c^{(12)}_{32} & \cdots & c^{(12)}_{3s} \\ c^{(12)}_{41} & c^{(12)}_{42} & \cdots & c^{(12)}_{4s} \\ \vdots & \vdots & \ddots & \vdots \\ c^{(12)}_{t1} & c^{(12)}_{t2} & \cdots & c^{(12)}_{ts} \end{pmatrix}, \tag{38} \]

where

\[ a^{(12)}_{1i} = a^{(1)}_{1i} \wedge a^{(2)}_{1i} = \Big( \min\{\mu^{(1)}_{iX}(x), \mu^{(2)}_{iX}(x)\}, \max\{\gamma^{(1)}_{iX}(x), \gamma^{(2)}_{iX}(x)\} \Big), \quad \forall i \in \{1, \dots, s\}, \tag{39} \]
\[ b^{(12)}_{2i} = b^{(1)}_{2i} \wedge b^{(2)}_{2i} = \Big( \min\{\mu^{(1)}_{iY}(y), \mu^{(2)}_{iY}(y)\}, \max\{\gamma^{(1)}_{iY}(y), \gamma^{(2)}_{iY}(y)\} \Big), \quad \forall i \in \{1, \dots, s\}, \tag{40} \]
\[ c^{(12)}_{hi} = c^{(1)}_{hi} \wedge c^{(2)}_{hi} = \Big( \min\{\mu^{(1)}_{iD}(D_{h-2}), \mu^{(2)}_{iD}(D_{h-2})\}, \max\{\gamma^{(1)}_{iD}(D_{h-2}), \gamma^{(2)}_{iD}(D_{h-2})\} \Big), \quad \forall i \in \{1, \dots, s\}, \ \forall h \in \{3, \dots, t\}. \tag{41} \]

Example. Given the IFMs below,

\[ Z_1 = \begin{pmatrix} (0.0, 1.0) & (0.33, 0.67) & (0.67, 0.33) \\ (1.0, 0.0) & (0.0, 1.0) & (0.0, 1.0) \\ (0.8, 0.1) & (0.2, 0.6) & (0.1, 0.9) \\ (0.1, 0.8) & (0.6, 0.35) & (0.3, 0.55) \end{pmatrix}, \tag{42} \]
\[ Z_2 = \begin{pmatrix} (0.0, 1.0) & (1.0, 0.0) & (0.0, 1.0) \\ (0.267, 0.733) & (0.733, 0.267) & (0.0, 1.0) \\ (0.0, 1.0) & (0.2, 0.7) & (1.0, 0.0) \\ (0.0, 0.9) & (0.7, 0.3) & (0.1, 0.85) \end{pmatrix}, \tag{43} \]

the IFCM of $Z_1$ and $Z_2$ with the intersection operation is

\[ Z = Z_1 \circ Z_2 = \begin{pmatrix} (0.0, 1.0) & (0.33, 0.67) & (0.0, 1.0) \\ (0.267, 0.733) & (0.0, 1.0) & (0.0, 1.0) \\ (0.0, 1.0) & (0.2, 0.7) & (0.1, 0.9) \\ (0.0, 0.9) & (0.6, 0.35) & (0.1, 0.85) \end{pmatrix}. \tag{44} \]

Definition 9. Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy composition matrix (IFCM) of $Z_1$ and $Z_2$ with the union operation is defined as

\[ Z_1 \circ Z_2 = \begin{pmatrix} a^{(12)}_{11} & a^{(12)}_{12} & \cdots & a^{(12)}_{1s} \\ b^{(12)}_{21} & b^{(12)}_{22} & \cdots & b^{(12)}_{2s} \\ c^{(12)}_{31} & c^{(12)}_{32} & \cdots & c^{(12)}_{3s} \\ c^{(12)}_{41} & c^{(12)}_{42} & \cdots & c^{(12)}_{4s} \\ \vdots & \vdots & \ddots & \vdots \\ c^{(12)}_{t1} & c^{(12)}_{t2} & \cdots & c^{(12)}_{ts} \end{pmatrix}, \tag{45} \]

where

\[ a^{(12)}_{1i} = a^{(1)}_{1i} \vee a^{(2)}_{1i} = \Big( \max\{\mu^{(1)}_{iX}(x), \mu^{(2)}_{iX}(x)\}, \min\{\gamma^{(1)}_{iX}(x), \gamma^{(2)}_{iX}(x)\} \Big), \quad \forall i \in \{1, \dots, s\}, \tag{46} \]
\[ b^{(12)}_{2i} = b^{(1)}_{2i} \vee b^{(2)}_{2i} = \Big( \max\{\mu^{(1)}_{iY}(y), \mu^{(2)}_{iY}(y)\}, \min\{\gamma^{(1)}_{iY}(y), \gamma^{(2)}_{iY}(y)\} \Big), \quad \forall i \in \{1, \dots, s\}, \tag{47} \]
\[ c^{(12)}_{hi} = c^{(1)}_{hi} \vee c^{(2)}_{hi} = \Big( \max\{\mu^{(1)}_{iD}(D_{h-2}), \mu^{(2)}_{iD}(D_{h-2})\}, \min\{\gamma^{(1)}_{iD}(D_{h-2}), \gamma^{(2)}_{iD}(D_{h-2})\} \Big), \quad \forall i \in \{1, \dots, s\}, \ \forall h \in \{3, \dots, t\}. \tag{48} \]

Example. Given the IFMs $Z_1$ and $Z_2$ of the previous example, the IFCM of $Z_1$ and $Z_2$ with the union operation is

\[ Z = Z_1 \circ Z_2 = \begin{pmatrix} (0.0, 1.0) & (1.0, 0.0) & (0.67, 0.33) \\ (1.0, 0.0) & (0.733, 0.267) & (0.0, 1.0) \\ (0.8, 0.1) & (0.2, 0.6) & (1.0, 0.0) \\ (0.1, 0.8) & (0.7, 0.3) & (0.3, 0.55) \end{pmatrix}. \tag{49} \]

Theorem. The IFCM of $Z_1$ and $Z_2$ with the intersection (union) operation is an IFM.

Proof. We prove the theorem for the intersection operation only; the case of the union operation is proven analogously. From Definition 8, we know that

\[ a^{(12)}_{1i} = a^{(1)}_{1i} \wedge a^{(2)}_{1i} = \Big( \min\{\mu^{(1)}_{iX}(x), \mu^{(2)}_{iX}(x)\}, \max\{\gamma^{(1)}_{iX}(x), \gamma^{(2)}_{iX}(x)\} \Big), \quad \forall i \in \{1, \dots, s\}, \tag{50} \]
\[ b^{(12)}_{2i} = b^{(1)}_{2i} \wedge b^{(2)}_{2i} = \Big( \min\{\mu^{(1)}_{iY}(y), \mu^{(2)}_{iY}(y)\}, \max\{\gamma^{(1)}_{iY}(y), \gamma^{(2)}_{iY}(y)\} \Big), \quad \forall i \in \{1, \dots, s\}, \tag{51} \]
\[ c^{(12)}_{hi} = c^{(1)}_{hi} \wedge c^{(2)}_{hi} = \Big( \min\{\mu^{(1)}_{iD}(D_{h-2}), \mu^{(2)}_{iD}(D_{h-2})\}, \max\{\gamma^{(1)}_{iD}(D_{h-2}), \gamma^{(2)}_{iD}(D_{h-2})\} \Big), \quad \forall i \in \{1, \dots, s\}, \ \forall h \in \{3, \dots, t\}. \tag{52} \]

Since $a^{(1)}_{1i}$ and $a^{(2)}_{1i}$ are two IFVs, their combination $a^{(12)}_{1i}$ is also an IFV: if the maximum of the non-membership values is attained by $\gamma^{(q)}_{iX}(x)$, then $\min\{\mu^{(1)}_{iX}(x), \mu^{(2)}_{iX}(x)\} + \gamma^{(q)}_{iX}(x) \le \mu^{(q)}_{iX}(x) + \gamma^{(q)}_{iX}(x) \le 1$. Similar conclusions are found for $b^{(12)}_{2i}$ and $c^{(12)}_{hi}$. Thus, the IFCM of $Z_1$ and $Z_2$ with the intersection operation is an IFM. The proof is complete. □

Property. Given IFMs $Z_1$, $Z_2$ and $Z_3$, the following properties hold:

(a) $Z_1 \circ Z_2 = Z_2 \circ Z_1$,
(b) $(Z_1 \circ Z_2) \circ Z_3 = Z_1 \circ (Z_2 \circ Z_3)$.

Proof. (a) Suppose that the IFCM of $Z_1$ and $Z_2$ is equipped with the intersection operation. Since $a^{(1)}_{1i}$ and $a^{(2)}_{1i}$ are two IFVs, we obtain

\[ a^{(12)}_{1i} = a^{(1)}_{1i} \wedge a^{(2)}_{1i} = a^{(2)}_{1i} \wedge a^{(1)}_{1i} = a^{(21)}_{1i}, \tag{53} \]
\[ b^{(12)}_{2i} = b^{(1)}_{2i} \wedge b^{(2)}_{2i} = b^{(2)}_{2i} \wedge b^{(1)}_{2i} = b^{(21)}_{2i}, \tag{54} \]
\[ c^{(12)}_{hi} = c^{(1)}_{hi} \wedge c^{(2)}_{hi} = c^{(2)}_{hi} \wedge c^{(1)}_{hi} = c^{(21)}_{hi}. \tag{55} \]

It follows that

\[ Z_1 \circ Z_2 = \begin{pmatrix} a^{(12)}_{11} & \cdots & a^{(12)}_{1s} \\ b^{(12)}_{21} & \cdots & b^{(12)}_{2s} \\ c^{(12)}_{31} & \cdots & c^{(12)}_{3s} \\ \vdots & \ddots & \vdots \\ c^{(12)}_{t1} & \cdots & c^{(12)}_{ts} \end{pmatrix} = \begin{pmatrix} a^{(21)}_{11} & \cdots & a^{(21)}_{1s} \\ b^{(21)}_{21} & \cdots & b^{(21)}_{2s} \\ c^{(21)}_{31} & \cdots & c^{(21)}_{3s} \\ \vdots & \ddots & \vdots \\ c^{(21)}_{t1} & \cdots & c^{(21)}_{ts} \end{pmatrix} = Z_2 \circ Z_1. \tag{56} \]

The proof is analogous for the IFCM of $Z_1$ and $Z_2$ equipped with the union operation.

(b) Suppose that the IFCM of $Z_1$ and $Z_2$ is equipped with the intersection operation. We have

\[ (Z_1 \circ Z_2) \circ Z_3 = \begin{pmatrix} a^{(123)}_{11} & \cdots & a^{(123)}_{1s} \\ b^{(123)}_{21} & \cdots & b^{(123)}_{2s} \\ c^{(123)}_{31} & \cdots & c^{(123)}_{3s} \\ \vdots & \ddots & \vdots \\ c^{(123)}_{t1} & \cdots & c^{(123)}_{ts} \end{pmatrix}, \tag{57} \]

where

\[ a^{(123)}_{1i} = \big( a^{(1)}_{1i} \wedge a^{(2)}_{1i} \big) \wedge a^{(3)}_{1i}, \quad \forall i \in \{1, \dots, s\}, \tag{58} \]
\[ b^{(123)}_{2i} = \big( b^{(1)}_{2i} \wedge b^{(2)}_{2i} \big) \wedge b^{(3)}_{2i}, \quad \forall i \in \{1, \dots, s\}, \tag{59} \]
\[ c^{(123)}_{hi} = \big( c^{(1)}_{hi} \wedge c^{(2)}_{hi} \big) \wedge c^{(3)}_{hi}, \quad \forall i \in \{1, \dots, s\}, \ \forall h \in \{3, \dots, t\}. \tag{60} \]

Because

\[ \big( a^{(1)}_{1i} \wedge a^{(2)}_{1i} \big) \wedge a^{(3)}_{1i} = a^{(1)}_{1i} \wedge \big( a^{(2)}_{1i} \wedge a^{(3)}_{1i} \big), \tag{61} \]
\[ \big( b^{(1)}_{2i} \wedge b^{(2)}_{2i} \big) \wedge b^{(3)}_{2i} = b^{(1)}_{2i} \wedge \big( b^{(2)}_{2i} \wedge b^{(3)}_{2i} \big), \tag{62} \]
\[ \big( c^{(1)}_{hi} \wedge c^{(2)}_{hi} \big) \wedge c^{(3)}_{hi} = c^{(1)}_{hi} \wedge \big( c^{(2)}_{hi} \wedge c^{(3)}_{hi} \big), \tag{63} \]

it follows that

\[ (Z_1 \circ Z_2) \circ Z_3 = Z_1 \circ (Z_2 \circ Z_3). \tag{64} \]

The proof is analogous for the IFCM equipped with the union operation. □

2.3. The intuitionistic fuzzy similarity matrix and intuitionistic fuzzy similarity degree

Motivated by the ideas of Hung and Yang [15], we present the definition of the intuitionistic fuzzy similarity matrix as follows.

Definition 10. Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity matrix (IFSM) between $Z_1$ and $Z_2$ is defined as

\[ \tilde{S} = \begin{pmatrix} \tilde{S}_{11} & \tilde{S}_{12} & \cdots & \tilde{S}_{1s} \\ \tilde{S}_{21} & \tilde{S}_{22} & \cdots & \tilde{S}_{2s} \\ \tilde{S}_{31} & \tilde{S}_{32} & \cdots & \tilde{S}_{3s} \\ \tilde{S}_{41} & \tilde{S}_{42} & \cdots & \tilde{S}_{4s} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{S}_{t1} & \tilde{S}_{t2} & \cdots & \tilde{S}_{ts} \end{pmatrix}, \tag{65} \]

where

\[ \tilde{S}_{1i} = 1 - \frac{1 - \exp\left( -\frac{1}{2} \left( \left| \sqrt{\mu^{(1)}_{iX}(x)} - \sqrt{\mu^{(2)}_{iX}(x)} \right| + \left| \sqrt{\gamma^{(1)}_{iX}(x)} - \sqrt{\gamma^{(2)}_{iX}(x)} \right| \right) \right)}{1 - \exp(-1)}, \quad \forall i \in \{1, \dots, s\}, \tag{66} \]
\[ \tilde{S}_{2i} = 1 - \frac{1 - \exp\left( -\frac{1}{2} \left( \left| \sqrt{\mu^{(1)}_{iY}(y)} - \sqrt{\mu^{(2)}_{iY}(y)} \right| + \left| \sqrt{\gamma^{(1)}_{iY}(y)} - \sqrt{\gamma^{(2)}_{iY}(y)} \right| \right) \right)}{1 - \exp(-1)}, \quad \forall i \in \{1, \dots, s\}, \tag{67} \]
\[ \tilde{S}_{hi} = 1 - \frac{1 - \exp\left( -\frac{1}{2} \left( \left| \sqrt{\mu^{(1)}_{iD}(D_{h-2})} - \sqrt{\mu^{(2)}_{iD}(D_{h-2})} \right| + \left| \sqrt{\gamma^{(1)}_{iD}(D_{h-2})} - \sqrt{\gamma^{(2)}_{iD}(D_{h-2})} \right| \right) \right)}{1 - \exp(-1)}, \quad \forall i \in \{1, \dots, s\}, \ \forall h \in \{3, \dots, t\}. \tag{68} \]

Definition 11. Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity degree (IFSD) between $Z_1$ and $Z_2$ is

\[ \mathrm{SIM}(Z_1, Z_2) = \alpha \sum_{i=1}^{s} w_{1i} \tilde{S}_{1i} + \beta \sum_{i=1}^{s} w_{2i} \tilde{S}_{2i} + \chi \sum_{h=3}^{t} \sum_{i=1}^{s} w_{hi} \tilde{S}_{hi}, \tag{69} \]

where $\tilde{S}$ is the IFSM between $Z_1$ and $Z_2$, and $W = (w_{ij})$ ($\forall i \in \{1, \dots, t\}$, $\forall j \in \{1, \dots, s\}$) is the weight matrix of the IFSM between $Z_1$ and $Z_2$ satisfying

\[ \sum_{i=1}^{s} w_{1i} = 1; \quad \sum_{i=1}^{s} w_{2i} = 1; \quad \sum_{i=1}^{s} w_{hi} = 1, \ \forall h \in \{3, \dots, t\}, \tag{70} \]
\[ \alpha + \beta + \chi = 1. \tag{71} \]

Remark. The formula of IFSD in Eq. (69) can be recognized as a generalization of the hard user-based, item-based and rating-based similarity degrees in recommender systems [29], obtained when $\beta = \chi = 0$, $\alpha = \chi = 0$ and $\alpha = \beta = 0$, respectively.

Definition 12. The formulas to predict the values of the linguistic labels of patient $P_u$ ($\forall u \in \{1, \dots, n\}$) to symptom $S_j$ ($\forall j \in \{1, \dots, m\}$) according to the diseases $(D_1, D_2, \dots, D_k)$ in MC-IFRS are:

\[ \mu^{P_u}_{iD}(D_j) = \frac{\sum_{v=1}^{n} \mathrm{SIM}(P_u, P_v) \times \mu^{P_v}_{iD}(D_j)}{\sum_{v=1}^{n} \mathrm{SIM}(P_u, P_v)}, \quad \forall i \in \{1, \dots, s\}, \ \forall j \in \{1, \dots, k\}, \ \forall u \in \{1, \dots, n\}, \tag{72} \]
\[ \gamma^{P_u}_{iD}(D_j) = \frac{\sum_{v=1}^{n} \mathrm{SIM}(P_u, P_v) \times \gamma^{P_v}_{iD}(D_j)}{\sum_{v=1}^{n} \mathrm{SIM}(P_u, P_v)}, \quad \forall i \in \{1, \dots, s\}, \ \forall j \in \{1, \dots, k\}, \ \forall u \in \{1, \dots, n\}. \tag{73} \]

Theorem. The predictive results in Definition 12 are IFVs.

Proof. We have the following fact:

\[ \mu^{P_u}_{iD}(D_j) + \gamma^{P_u}_{iD}(D_j) = \frac{\sum_{v=1}^{n} \mathrm{SIM}(P_u, P_v) \times \big( \mu^{P_v}_{iD}(D_j) + \gamma^{P_v}_{iD}(D_j) \big)}{\sum_{v=1}^{n} \mathrm{SIM}(P_u, P_v)}, \tag{74} \]
\[ 0 \le \mu^{P_v}_{iD}(D_j) + \gamma^{P_v}_{iD}(D_j) \le 1, \tag{75} \]
\[ 0 \le \mathrm{SIM}(P_u, P_v) \le 1. \tag{76} \]

It is obvious that

\[ \mu^{P_u}_{iD}(D_j) + \gamma^{P_u}_{iD}(D_j) \ge 0. \tag{77} \]

Since

\[ \mathrm{SIM}(P_u, P_1) \times \big( \mu^{P_1}_{iD}(D_j) + \gamma^{P_1}_{iD}(D_j) \big) \le \mathrm{SIM}(P_u, P_1), \tag{78} \]
\[ \mathrm{SIM}(P_u, P_2) \times \big( \mu^{P_2}_{iD}(D_j) + \gamma^{P_2}_{iD}(D_j) \big) \le \mathrm{SIM}(P_u, P_2), \tag{79} \]
\[ \vdots \]
\[ \mathrm{SIM}(P_u, P_n) \times \big( \mu^{P_n}_{iD}(D_j) + \gamma^{P_n}_{iD}(D_j) \big) \le \mathrm{SIM}(P_u, P_n), \tag{80} \]

it follows that

\[ \mu^{P_u}_{iD}(D_j) + \gamma^{P_u}_{iD}(D_j) \le \frac{\sum_{v=1}^{n} \mathrm{SIM}(P_u, P_v)}{\sum_{v=1}^{n} \mathrm{SIM}(P_u, P_v)} = 1. \tag{81} \]

The proof is complete. □

2.4. The intuitionistic fuzzy collaborative filtering method

The IFCF method is presented in Fig. 1.

Fig. 1. The IFCF algorithm.

3. Evaluation

In this section, we describe the experimental environment in Section 3.1. The database for experiments is given in Section 3.2.

Table 16. The descriptions of
experimental datasets.

| Dataset | No. elements | No. attributes | No. classes | Elements in each class |
|---|---|---|---|---|
| HEART | 270 | 13 | 2 | (150, 120) |
| RHC | 5735 | | 2 | (3804, 1931) |

Section 3.3 illustrates the activities of IFCF on the intuitionistic medical diagnosis dataset in [31]. Lastly, Section 3.4 presents the experimental results on the benchmark medical diagnosis datasets, namely HEART and RHC.

3.1. Experimental design

In this part, we describe the experimental environment as follows.

- Experimental tools: We have implemented the proposed hybrid algorithm IFCF, in addition to the typical standalone methods of IFS such as De et al. [9], Szmidt and Kacprzyk [44], Samuel and Balamurugan [31] and of RS such as Davis et al. [8] and Hassan and Syed [13], in the PHP programming language and executed them on a PC with an Intel(R) Core(TM) Duo CPU T6400 @ 2.00 GHz, GB RAM. The results are taken as the average value of 50 runs.
- Evaluation indices: Mean Absolute Error (MAE) and the computational time.
- Objective: to illustrate the activities of IFCF on an illustrative dataset; to evaluate IFCF in comparison with the relevant algorithms in terms of accuracy through the evaluation indices.

3.2. Database

In the evaluation, we use three kinds of datasets for experiments:

- The intuitionistic medical diagnosis dataset in [31], which was used in the examples of Section 1 to illustrate the activities of the relevant algorithms;
- The benchmark medical diagnosis dataset HEART from the UCI Machine Learning Repository [46] (Table 16 and Fig. 2);
- A large benchmark medical diagnosis dataset, RHC (Right Heart Catheterization), including 5735 critically ill adult patients receiving care in an ICU [6] (Table 16 and Fig. 3).

The cross-validation method for the experiments is k-fold validation with k from 2 to 10. The aim of using various folds is to observe the changes of MAE and computational time of the algorithms, which helps us to better analyze the experimental results. Besides testing with the k-fold validation, the random experiments with the
cardinalities of the testing sets ranging from 10 to 100 random elements are also performed. In order to validate the results with accurate classes, the intuitionistic defuzzification method [3] is used for the experimental algorithms.

Fig. 2. The 2D distribution of HEART.

3.3. An illustration of IFCF

In this section, we illustrate the activities of IFCF on the intuitionistic medical diagnosis dataset in [31], described in Tables 1–3. Similar to Example 5, the training dataset is shown in Table 17, where the ∗ values need to be predicted. From the patient-symptom data and Table 17 we have extracted the SC-IFRS dataset in Table 18, similar to that in Table 15, where the ∗ values need to be predicted.

Table 17. The training dataset for IFCF, with ∗ being the values to be predicted.

| P | Viral_fever | Malaria | Typhoid | Stomach | Chest |
|---|---|---|---|---|---|
| Ram | (0.4, 0.1) | (0.7, 0.1) | (0.6, 0.1) | (0.2, 0.4) | (0.2, 0.6) |
| Mari | (0.3, 0.5) | (0.2, 0.6) | (0.4, 0.4) | (0.6, 0.1) | (0.1, 0.7) |
| Sugu | ∗ | ∗ | ∗ | ∗ | ∗ |
| Somu | ∗ | ∗ | ∗ | ∗ | ∗ |

Table 18. The extracted SC-IFRS dataset, with ∗ being the values to be predicted.

| P | S | D |
|---|---|---|
| Ram | ⟨Temperature(0.8, 0.1), Headache(0.6, 0.1), Stomach pain(0.2, 0.8), Cough(0.6, 0.1), Chest pain(0.1, 0.6)⟩ | ⟨Viral fever(0.4, 0.1), Malaria(0.7, 0.1), Typhoid(0.6, 0.1), Stomach problem(0.2, 0.4), Chest problem(0.2, 0.6)⟩ |
| Mari | ⟨Temperature(0.0, 0.8), Headache(0.4, 0.4), Stomach pain(0.6, 0.1), Cough(0.1, 0.7), Chest pain(0.1, 0.8)⟩ | ⟨Viral fever(0.3, 0.5), Malaria(0.2, 0.6), Typhoid(0.4, 0.4), Stomach problem(0.6, 0.1), Chest problem(0.1, 0.7)⟩ |
| Sugu | ⟨Temperature(0.8, 0.1), Headache(0.8, 0.1), Stomach pain(0.0, 0.6), Cough(0.2, 0.7), Chest pain(0.0, 0.5)⟩ | ∗ |
| Somu | ⟨Temperature(0.6, 0.1), Headache(0.5, 0.4), Stomach pain(0.3, 0.4), Cough(0.7, 0.2), Chest pain(0.3, 0.4)⟩ | ∗ |

From Table 18, we see that this is a semi-SC-IFRS in which the hard patients are used instead of their features. Thus, the parameters of the IFCF algorithm are automatically set to $\alpha = 0$, $\beta = \chi = 1/2$ and $w_{1i} = w_{2i} = w_{3i} = 0.2$. From Definition 11, we calculate the IFSD between Sugu (Somu) and Ram and Mari as follows:

\[ \mathrm{IFSD}(\mathrm{Sugu}, \mathrm{Ram}) = 0.87, \tag{82} \]
\[ \mathrm{IFSD}(\mathrm{Sugu}, \mathrm{Mari}) = 0.57, \tag{83} \]
\[ \mathrm{IFSD}(\mathrm{Somu}, \mathrm{Ram}) = 0.83, \tag{84} \]
\[ \mathrm{IFSD}(\mathrm{Somu}, \mathrm{Mari}) = 0.58. \tag{85} \]

From Eqs. (82)–(85), we use Definition 12 to calculate the predictive results of Sugu and Somu:

\[ \mathrm{Disease}(\mathrm{Sugu}) = \langle \text{Viral fever}(0.49, 0.38), \text{Malaria}(0.52, 0.22), \text{Typhoid}(0.36, 0.52), \text{Stomach problem}(0.40, 0.34), \text{Chest problem}(0.10, 0.68) \rangle, \tag{86} \]
\[ \mathrm{Disease}(\mathrm{Somu}) = \langle \text{Viral fever}(0.47, 0.39), \text{Malaria}(0.52, 0.22), \text{Typhoid}(0.36, 0.51), \text{Stomach problem}(0.39, 0.47), \text{Chest problem}(0.10, 0.68) \rangle. \tag{87} \]

Based on the recommendation function of Definition 5 and Eqs. (86) and (87), we recommend the disease from which each patient suffers the most, as in Table 19. From Table 19, we conclude that Sugu and Somu both suffer from Malaria.

Table 19. The recommended diseases; the largest value in each row (Malaria) marks the most suffered disease.

| P | Viral_fever | Malaria | Typhoid | Stomach | Chest |
|---|---|---|---|---|---|
| Sugu | 0.5537 | 0.6552 | 0.4032 | 0.504 | 0.122 |
| Somu | 0.5358 | 0.6552 | 0.4068 | 0.4446 | 0.122 |

Remark. (a) The result of IFCF expressed in Table 19 is identical to those of De et al. [9] in Example 2, and of Samuel and Balamurugan [31] and Hassan and Syed [13] in their respective examples. Besides these methods, if we run other relevant algorithms such as Own [26] and Shinoj and John [33], the same results are obtained. This supports the correctness of the proposed method; (b) IFCF is capable of performing the prediction and recommendation on more types of datasets, including hard and (intuitionistic) fuzzy data, not only the semi-SC-IFRS dataset used in this section. This affirms the flexibility and usefulness of IFCF in comparison with other relevant methods; (c) The limitation of the defuzzification method stated in Sections 1.3(c) and 1.4(b) for the relevant works such as De et al. [9], Szmidt and Kacprzyk [44], Samuel and Balamurugan [31], Davis et al. [8] and Hassan and Syed [13] is handled by the recommendation
function in SC-IFRS (Definition 5) and MC-IFRS (Definition 6); (d) IFCF uses both the patient-symptom and the patient-disease relations for the prediction, rather than only the patient-disease relations as in the RS-based methods of Davis et al. [8] and Hassan and Syed [13]. This enhances the accuracy of diagnosis since more information is taken into account by the algorithm.

Fig. 3. The 2D distribution of RHC.

Table 20. The results of random experiments on the HEART dataset.

MAE:

| Data sets | IFCF | DAVIS | HASSAN | DE | SAMUEL | SZMIDT |
|---|---|---|---|---|---|---|
| 10 | 0.49641 | 0.500286 | 0.494036 | 0.489147 | 0.508045 | 0.491955 |
| 20 | 0.492051 | 0.495541 | 0.488739 | 0.496361 | 0.504834 | 0.495166 |
| 30 | 0.490383 | 0.492238 | 0.486775 | 0.493604 | 0.509998 | 0.490002 |
| 40 | 0.488195 | 0.49185 | 0.483692 | 0.47396 | 0.526556 | 0.473444 |
| 50 | 0.492451 | 0.494166 | 0.487049 | 0.490438 | 0.510221 | 0.489779 |
| 60 | 0.490892 | 0.494094 | 0.486138 | 0.476327 | 0.523458 | 0.476542 |
| 70 | 0.491429 | 0.494952 | 0.487727 | 0.480793 | 0.519985 | 0.480015 |
| 80 | 0.493946 | 0.497251 | 0.490348 | 0.491512 | 0.507438 | 0.492562 |
| 90 | 0.494265 | 0.49741 | 0.490197 | 0.484721 | 0.517029 | 0.482971 |
| 100 | 0.493219 | 0.495757 | 0.488414 | 0.483999 | 0.516403 | 0.483597 |

Computational time (s):

| Data sets | IFCF | DAVIS | HASSAN | DE | SAMUEL | SZMIDT |
|---|---|---|---|---|---|---|
| 10 | 0.327558 | 0.646215 | 0.089682 | 0.011868 | 0.009564 | 0.014055 |
| 20 | 0.796293 | 0.703126 | 0.152622 | 0.028907 | 0.0347 | 0.023194 |
| 30 | 1.166132 | 0.739453 | 0.211139 | 0.032683 | 0.036562 | 0.029526 |
| 40 | 1.221915 | 1.192436 | 0.255561 | 0.043212 | 0.037371 | 0.046678 |
| 50 | 1.232003 | 1.234968 | 0.290003 | 0.043985 | 0.043722 | 0.053152 |
| 60 | 1.260366 | 2.084581 | 0.304465 | 0.048735 | 0.052548 | 0.058788 |
| 70 | 1.278752 | 2.360242 | 0.306179 | 0.058691 | 0.053144 | 0.072443 |
| 80 | 1.469952 | 3.014339 | 0.323535 | 0.078036 | 0.060445 | 0.072655 |
| 90 | 1.613406 | 3.081387 | 0.324501 | 0.081523 | 0.11539 | 0.082507 |
| 100 | 2.210751 | 3.305678 | 0.328335 | 0.127595 | 0.11964 | 0.102687 |

Table 21. The results of k-fold cross validation on the HEART dataset.

MAE:

| Fold | IFCF | DAVIS | HASSAN | DE | SAMUEL | SZMIDT |
|---|---|---|---|---|---|---|
| 2 | 0.495733 | 0.498977 | 0.491243 | 0.479124 | 0.521357 | 2.060489 |
| 3 | 0.494624 | 0.497455 | 0.489978 | 0.474681 | 0.525323 | 0.474677 |
| 4 | 0.490572 | 0.49394 | 0.486429 | 0.491674 | 0.510025 | 0.489975 |
| 5 | 0.488722 | 0.492826 | 0.485395 | 0.474125 | 0.525475 | 0.474525 |
| 6 | 0.489378 | 0.493186 | 0.484268 | 0.474017 | 0.526214 | 0.474768 |
| 7 | 0.492201 | 0.496148 | 0.488301 | 0.468833 | 0.527847 | 0.472153 |
| 8 | 0.492132 | 0.496581 | 0.488893 | 0.492033 | 0.509502 | 0.490498 |
| 9 | 0.488487 | 0.492575 | 0.485452 | 0.486527 | 0.515524 | 0.484476 |
| 10 | 0.487778 | 0.491602 | 0.484024 | 0.490158 | 0.514568 | 0.485432 |

Computational time (s):

| Fold | IFCF | DAVIS | HASSAN | DE | SAMUEL | SZMIDT |
|---|---|---|---|---|---|---|
| 2 | 3.225795 | 4.059911 | 1.572734 | 0.213536 | 0.21604 | 0.207523 |
| 3 | 3.158217 | 4.039289 | 1.431444 | 0.191975 | 0.103076 | 0.144031 |
| 4 | 2.821404 | 3.383637 | 1.335837 | 0.09894 | 0.100901 | 0.128371 |
| 5 | 2.70524 | 3.208319 | 1.137519 | 0.092019 | 0.083503 | 0.086687 |
| 6 | 2.310253 | 2.952161 | 1.108358 | 0.088419 | 0.04564 | 0.075412 |
| 7 | 2.166859 | 2.342186 | 1.101027 | 0.076773 | 0.043674 | 0.07241 |
| 8 | 2.088267 | 2.186523 | 1.097188 | 0.057825 | 0.034734 | 0.06922 |
| 9 | 2.060489 | 2.150605 | 0.956852 | 0.047176 | 0.029806 | 0.0414 |
| 10 | 1.889909 | 0.492826 | 0.863702 | 0.045803 | 0.019865 | 0.037433 |

3.4. Assessment

In this section, we perform the experiments on the benchmark medical diagnosis datasets HEART and RHC. The experimental results are given in Tables 20–23. The MAE results of the random experiments (resp. k-fold cross validation) on the HEART dataset are illustrated in Fig. 4 (resp. Fig. 5). Analogously, the MAE results of the random experiments (resp. k-fold cross validation) on the RHC dataset are illustrated in Fig. 6 (resp. Fig. 7). The experimental results are discussed in the following remark.

Remark. (a) Tables 20 and 21 reveal that the MAE values of IFCF are close to those of the other algorithms; on this dataset they are better only than those of Davis et al. [8] and Samuel and Balamurugan [31]. Specifically, the average MAE values of IFCF, Davis et al. [8], Hassan and Syed [13], De et al. [9], Samuel and Balamurugan [31] and Szmidt and Kacprzyk [44] in Table 20 are 0.4923241, 0.4953545, 0.4883115, 0.4860862, 0.5143967 and 0.4856033, respectively. These numbers in Table 21 are 0.491069667, 0.49481, 0.487109222, 0.481241333,
0.519537222 and 0.656332556, respectively. Figs. 4 and 5 illustrate this fact. This shows that the proposed algorithm IFCF is not really effective when the dataset contains hard values, is small, and is concentrated in a narrow range, as is the case for the HEART dataset; (b) Tables 22 and 23 show the MAE results for a large dataset such as RHC. The average MAE values of IFCF, Davis et al. [8], Hassan and Syed [13], De et al. [9], Samuel and Balamurugan [31] and Szmidt and Kacprzyk [44] in Table 22 are 0.43716495, 0.4374099, 0.4504119, 0.47942055, 0.52670285 and 0.47329715, respectively. These numbers in Table 23 are 0.437191444, 0.437363444, 0.454461333, 0.479995444, 0.526289556 and 0.473710444, respectively. In this case, the MAE value of IFCF is the smallest among all algorithms, which is also visible in the comparisons in Figs. 6 and 7. This shows that IFCF is effective for large medical diagnosis datasets having wide ranges of values; (c) The computational time is a drawback of the IFCF algorithm. For small datasets such as HEART, the computational time of IFCF ranges from 1.2 to 2.5 s on average and is not much larger than that of the other algorithms, which take approximately 0.3 to 1.8 s. Yet for large datasets such as RHC, the difference becomes larger and obvious. Thus, a better trade-off between the computational time and the accuracy of diagnosis deserves attention in order to ensure the performance of the algorithm.

Table 22. The results of random experiments on the RHC dataset.

MAE:

| Data sets | IFCF | DAVIS | HASSAN | DE | SAMUEL | SZMIDT |
|---|---|---|---|---|---|---|
| 50 | 0.442175 | 0.44247 | 0.441502 | 0.479901 | 0.52626 | 0.47374 |
| 100 | 0.439844 | 0.440324 | 0.451435 | 0.482438 | 0.524069 | 0.475931 |
| 150 | 0.438635 | 0.438546 | 0.450023 | 0.481689 | 0.524843 | 0.475157 |
| 200 | 0.438282 | 0.438322 | 0.450814 | 0.476943 | 0.529039 | 0.470961 |
| 250 | 0.436268 | 0.436218 | 0.444011 | 0.474844 | 0.531077 | 0.468923 |
| 300 | 0.436115 | 0.436708 | 0.438171 | 0.480948 | 0.525275 | 0.474725 |
| 350 | 0.437641 | 0.438068 | 0.453067 | 0.480032 | 0.526245 | 0.473755 |
| 400 | 0.436992 | 0.436864 | 0.449501 | 0.47773 | 0.528056 | 0.471944 |
| 450 | 0.440996 | 0.441618 | 0.450962 | 0.481943 | 0.523331 | 0.476669 |
| 500 | 0.435963 | 0.436172 | 0.441205 | 0.475933 | 0.529638 | 0.470362 |
| 550 | 0.436537 | 0.436788 | 0.453276 | 0.480885 | 0.525715 | 0.474285 |
| 600 | 0.435991 | 0.436306 | 0.458081 | 0.478958 | 0.526851 | 0.473149 |
| 650 | 0.435262 | 0.435451 | 0.438105 | 0.478893 | 0.527255 | 0.472745 |
| 700 | 0.438705 | 0.439116 | 0.45581 | 0.481182 | 0.524615 | 0.475385 |
| 750 | 0.437769 | 0.437992 | 0.463094 | 0.481186 | 0.524855 | 0.475145 |
| 800 | 0.435194 | 0.435462 | 0.444957 | 0.476458 | 0.529806 | 0.470194 |
| 850 | 0.434399 | 0.434582 | 0.451129 | 0.479923 | 0.526656 | 0.473344 |
| 900 | 0.435067 | 0.435319 | 0.463573 | 0.476773 | 0.529274 | 0.470726 |
| 950 | 0.436633 | 0.436888 | 0.448817 | 0.483203 | 0.523412 | 0.476588 |
| 1000 | 0.434831 | 0.434984 | 0.460705 | 0.478549 | 0.527785 | 0.472215 |

Computational time (s):

| Data sets | IFCF | DAVIS | HASSAN | DE | SAMUEL | SZMIDT |
|---|---|---|---|---|---|---|
| 50 | 40.06698 | 33.57283 | 12.06656 | 0.010758 | 0.010931 | 0.014407 |
| 100 | 77.14874 | 65.9519 | 30.21522 | 0.015019 | 0.019594 | 0.028265 |
| 150 | 113.0401 | 87.0333 | 51.16982 | 0.021002 | 0.028967 | 0.041422 |
| 200 | 147.2283 | 113.8669 | 67.81981 | 0.038148 | 0.038453 | 0.055456 |
| 250 | 180.2224 | 135.3158 | 83.36726 | 0.049271 | 0.047737 | 0.068599 |
| 300 | 211.1297 | 152.9428 | 96.91099 | 0.056363 | 0.056865 | 0.081633 |
| 350 | 242.7504 | 168.4303 | 113.4414 | 0.061749 | 0.066663 | 0.095873 |
| 400 | 272.7885 | 189.6871 | 129.7520 | 0.067974 | 0.076921 | 0.113722 |
| 450 | 319.4657 | 219.4764 | 142.0267 | 0.081742 | 0.087006 | 0.123928 |
| 500 | 325.5207 | 249.4249 | 156.7367 | 0.096007 | 0.094882 | 0.135023 |
| 550 | 344.3712 | 262.0359 | 169.7921 | 0.108004 | 0.112752 | 0.154736 |
| 600 | 365.4371 | 275.2695 | 180.1441 | 0.110438 | 0.134178 | 0.162411 |
| 650 | 385.1481 | 291.7030 | 191.0308 | 0.128739 | 0.143444 | 0.188254 |
| 700 | 404.1568 | 315.0818 | 202.0023 | 0.138693 | 0.151820 | 0.210567 |
| 750 | 421.5027 | 333.7332 | 204.5337 | 0.153086 | 0.166440 | 0.214335 |
| 800 | 436.5416 | 351.0495 | 213.1942 | 0.163775 | 0.170800 | 0.242015 |
| 850 | 453.7668 | 385.3729 | 217.8114 | 0.169128 | 0.187068 | 0.252885 |
| 900 | 465.9425 | 402.2348 | 226.5664 | 0.185551 | 0.199256 | 0.268898 |
| 950 | 478.8142 | 421.2111 | 262.3495 | 0.205338 | 0.208938 | 0.273174 |
| 1000 | 597.5560 | 442.4772 | 287.3276 | 0.216944 | 0.237930 | 0.306299 |

Table 23. The results of k-fold cross validation on the RHC dataset.

MAE:

| Fold | IFCF | DAVIS | HASSAN | DE | SAMUEL | SZMIDT |
|---|---|---|---|---|---|---|
| 2 | 0.440949 | 0.441186 | 0.469381 | 0.481136 | 0.524992 | 0.475008 |
| 3 | 0.437283 | 0.437374 | 0.463187 | 0.480916 | 0.525445 | 0.474555 |
| 4 | 0.434009 | 0.434141 | 0.458663 | 0.477842 | 0.528936 | 0.471064 |
| 5 | 0.436070 | 0.436107 | 0.464382 | 0.479325 | 0.527072 | 0.472928 |
| 6 | 0.435874 | 0.436164 | 0.436214 | 0.482992 | 0.523708 | 0.476292 |
| 7 | 0.437098 | 0.437535 | 0.461758 | 0.479249 | 0.526433 | 0.473567 |
| 8 | 0.434386 | 0.434495 | 0.44394 | 0.481571 | 0.525246 | 0.474754 |
| 9 | 0.437215 | 0.437222 | 0.435230 | 0.479689 | 0.52695 | 0.47305 |
| 10 | 0.441839 | 0.442047 | 0.457397 | 0.477239 | 0.527824 | 0.472176 |

Computational time (s):

| Fold | IFCF | DAVIS | HASSAN | DE | SAMUEL | SZMIDT |
|---|---|---|---|---|---|---|
| 2 | 736.6445 | 477.9642 | 262.6263 | 0.791994 | 0.942148 | 1.651110 |
| 3 | 701.3804 | 476.6182 | 262.9038 | 0.503385 | 0.712606 | 1.358187 |
| 4 | 692.8024 | 450.9788 | 251.8981 | 0.366384 | 0.617685 | 1.215837 |
| 5 | 629.4906 | 424.9808 | 238.3611 | 0.291197 | 0.615317 | 0.993458 |
| 6 | 566.3853 | 396.0038 | 226.2907 | 0.229366 | 0.494319 | 0.780865 |
| 7 | 508.6334 | 370.6705 | 215.6599 | 0.113842 | 0.346833 | 0.356283 |
| 8 | 456.6042 | 349.1144 | 206.5204 | 0.051412 | 0.287202 | 0.321834 |
| 9 | 436.1555 | 329.7963 | 198.37997 | 0.037128 | 0.277723 | 0.288569 |
| 10 | 409.1319 | 319.6725 | 194.55609 | 0.013360 | 0.250653 | 0.275802 |

Fig. 4. The MAE results of random experiments on the HEART dataset.

Fig. 5. The MAE results of k-fold cross validation on the HEART dataset.

Fig. 6. The MAE results of random experiments on the RHC dataset.

Fig. 7. The MAE results of k-fold cross validation on the RHC dataset.

4. Conclusions

In this paper, we concentrated on the problem of enhancing the accuracy of medical diagnosis and presented a novel intuitionistic fuzzy recommender system (IFRS) consisting of the following components: (i) the new definitions of single-criterion IFRS (SC-IFRS) and multi-criteria IFRS (MC-IFRS) that extend the definition of traditional recommender
systems (RS) by taking into account a feature of a user and a characteristic of an item expressed by intuitionistic linguistic labels; (ii) the new definitions of the intuitionistic fuzzy matrix (IFM), which is a representation of SC-IFRS and MC-IFRS in matrix format, and of the intuitionistic fuzzy composition matrix (IFCM) of two IFMs with the intersection/union operation; (iii) new similarity degrees of IFMs, namely the intuitionistic fuzzy similarity matrix (IFSM) and the intuitionistic fuzzy similarity degree (IFSD), and the formulas to predict diseases on the basis of IFSD, accompanied by a theorem; (iv) a novel intuitionistic fuzzy collaborative filtering method, called IFCF, which relies on the predicting formulas. Some theorems and properties of the proposed components were also investigated. The proposed IFCF algorithm was used mainly for the medical diagnosis problem. Numerical examples have been given throughout the paper to illustrate the problem and the activities of the algorithms. In the Evaluation section, a numerical example on a small intuitionistic fuzzy medical diagnosis dataset was presented. The next experiments were conducted on both small and large real hard benchmark medical diagnosis data from the UCI Machine Learning Repository. The findings from the experiments are summarized as follows: (i) the proposed IFCF algorithm is capable of performing the prediction and recommendation on more types of datasets, including hard and (intuitionistic) fuzzy data, than other algorithms such as the standalone methods of intuitionistic fuzzy sets (IFS), e.g. De et al. [9], Szmidt and Kacprzyk [44], Samuel and Balamurugan [31], and of RS, e.g. Davis et al. [8] and Hassan and Syed [13]; (ii) IFCF has better prediction accuracy than those algorithms for (intuitionistic) fuzzy data and for large medical diagnosis datasets having wide ranges of values; (iii) IFCF could handle the limitations of those works as
pointed out in Section 1.6(b) of this article; (iv) lastly, a better trade-off between the computational time and the accuracy of diagnosis in IFCF deserves attention in order to ensure the performance of the algorithm. These findings clearly show the effectiveness of the proposed algorithm.

As mentioned in Section 1.6, which stresses the importance and significance of the proposed work, further work is necessary to advance the development of the algorithm in this field. Firstly, a variation of the IFCF algorithm that tackles the deficiency in processing small real datasets should be studied. Secondly, a trade-off between the computational time and the accuracy of diagnosis in IFCF should be examined. Thirdly, a hybrid algorithm between IFCF and a fuzzy clustering method to enhance the accuracy could be considered. Fourthly, theoretical analyses of the IFRS, especially the formulation of IFCM with other operations such as t-norm and t-conorm, should be examined. Lastly, applications of IFRS to other problems, e.g. time series forecasting and nowcasting, could be performed. These future works will enrich the knowledge of deploying advanced fuzzy recommender systems for practical problems.

Acknowledgement

The authors are greatly indebted to the editors-in-chief, Prof. H. Fujita and Prof. J. Lu, and to the anonymous reviewers for their comments and their valuable suggestions that improved the quality and clarity of the paper. This work is sponsored by NAFOSTED under Contract No. 102.05-2014.01.

References

[1] M. Agarwal, M. Hanmandlu, K.K. Biswas, Generalized intuitionistic fuzzy soft set and its application in practical medical diagnosis problem, in: Proceeding of IEEE International Conference on Fuzzy Systems (FUZZ 2011), 2011, pp. 2972–2978.
[2] J.Y. Ahn, K.S. Han, S.Y. Oh, C.D. Lee, An application of interval-valued intuitionistic fuzzy sets for medical diagnosis of headache, Int. J. Innovative Comput. Inform. Control (5) (2011) 2755–2762.
[3] G. Albeanu, F.L. Popentiu-Vladicescu, Intuitionistic fuzzy methods in software reliability modelling, J. Sustainable Energy (1) (2010) 30–34.
[4] K.T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets Syst. 20 (1) (1986) 87–96.
[5] G. Bernegger, M. Musalek, C. Rehmann-Sutter, An alternative view on the task of prognosis, Critical Rev. Oncol./Hematol. 84 (2012) S17–S24.
[6] A.F. Connors et al., The effectiveness of right heart catheterization in the initial care of critically ill patients, JAMA 276 (11) (1996) 889–897.
[7] B.C. Cuong, L.H. Son, H.T.M. Chau, Some context fuzzy clustering methods for classification problems, in: Proceedings of the 2010 ACM Symposium on Information and Communication Technology, 2010, pp. 34–40.
[8] D.A. Davis, N.V. Chawla, N. Blumm, N. Christakis, A.L. Barabási, Predicting individual disease risk based on medical history, in: Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008, pp. 769–778.
[9] S.K. De, R. Biswas, A.R. Roy, An application of intuitionistic fuzzy sets in medical diagnosis, Fuzzy Sets Syst. 117 (2) (2001) 209–213.
[10] L. Duan, W.N. Street, E. Xu, Healthcare information systems: data mining methods in the creation of a clinical recommender system, Enterprise Inform. Syst. (2) (2011) 169–181.
[11] F. Feng, C. Li, B. Davvaz, M.I. Ali, Soft sets combined with fuzzy sets and rough sets: a tentative approach, Soft Comput. 14 (9) (2010) 899–911.
[12] F. Feng, X. Liu, V. Leoreanu-Fotea, Y.B. Jun, Soft sets and soft rough sets, Inform. Sci. 181 (6) (2011) 1125–1137.
[13] S. Hassan, Z. Syed, From netflix to heart attacks: collaborative filtering in medical datasets, in: Proceedings of the 1st ACM International Health Informatics Symposium, 2010, pp. 128–134.
[14] R. Hosseini, T. Ellis, M. Mazinani, J. Dehmeshki, A genetic fuzzy approach for rule extraction for rule-based classification with application to medical diagnosis, in: Proceeding of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2011, pp. 5–9.
[15] W.L. Hung, M.S. Yang, On similarity measures between intuitionistic fuzzy sets, Int. J. Intell. Syst. 23 (3) (2008) 364–383.
[16] M. Irfan Ali, A note on soft sets, rough soft sets and fuzzy soft sets, Appl. Soft Comput. 11 (4) (2011) 3329–3332.
[17] E. Jafarian, M.A. Rezvani, A valuation-based method for ranking the intuitionistic fuzzy numbers, J. Intell. Fuzzy Syst. 24 (1) (2013) 133–144.
[18] R. Kala, R.R. Janghel, R. Tiwari, A. Shukla, Diagnosis of breast cancer by modular evolutionary neural networks, Int. J. Biomed. Eng. Technol. (2) (2011) 194–211.
[19] V. Khatibi, G.A. Montazer, Intuitionistic fuzzy set vs fuzzy set application in medical pattern recognition, Artif. Intell. Med. 47 (1) (2009) 43–52.
[20] I. Kononenko, Machine learning for medical diagnosis: history, state of the art and perspective, Artif. Intell. Med. 23 (1) (2001) 89–109.
[21] A.R. Meenakshi, M. Kaliraja, An application of interval valued fuzzy matrices in medical diagnosis, Int. J. Math. Anal. (36) (2011) 1791–1802.
[22] Meisamshabanpoor, M. Mahdavi, Implementation of a recommender system on medical recognition and treatment, Int. J. e-Education, e-Business, e-Management e-Learning (4) (2012) 315–318.
[23] D. Meng, X. Zhang, K. Qin, Soft rough fuzzy sets and soft fuzzy rough sets, Comput. Math. Appl. 62 (12) (2011) 4635–4645.
[24] S. Moein, S.A. Monadjemi, P. Moallem, A novel fuzzy-neural based medical diagnosis system, Int. J. Biol. Med. Sci. (3) (2009) 146–150.
[25] T.J. Neog, D.K. Sut, An application of fuzzy soft sets in medical diagnosis using fuzzy soft complement, Int. J. Comput. Appl. 33 (2011) 30–33.
[26] C.M. Own, Switching between type-2 fuzzy sets and intuitionistic fuzzy sets: an application in medical diagnosis, Appl. Intell. 31 (3) (2009) 283–291.
[27] L. Parthiban, R. Subramanian, Intelligent heart disease prediction system using CANFIS and genetic algorithm, Int. J. Biol. Biomed. Med. Sci. (3) (2008) 157–160.
[28] Z. Pawlak, Rough sets and intelligent data analysis, Inform. Sci. 147 (1) (2002) 1–12.
[29] F. Ricci, L. Rokach, B. Shapira, in: Introduction to Recommender Systems Handbook, Springer, US, 2011, pp. 1–35.
[30] R.M. Rodriguez, L. Martinez, F. Herrera, Hesitant fuzzy linguistic term sets for decision making, IEEE Trans. Fuzzy Syst. 20 (1) (2012) 109–119.
[31] A.E. Samuel, M. Balamurugan, Fuzzy max–min composition technique in medical diagnosis, Appl. Math. Sci. (35) (2012) 1741–1746.
[32] E. Sanchez, Resolution of composition fuzzy relation equations, Inform. Control 30 (1976) 38–48.
[33] T.K. Shinoj, S.J. John, Intuitionistic Fuzzy Multi sets and its Application in Medical Diagnosis, World Acad. Sci. Eng. Technol. (2012) 1418–1421.
[34] L.H. Son, Enhancing clustering quality of geo-demographic analysis using context fuzzy clustering type-2 and particle swarm optimization, Appl. Soft Comput. 22 (2014) 566–584.
[35] L.H. Son, HU-FCF: a hybrid user-based fuzzy collaborative filtering method in recommender systems, Expert Syst. Appl. 41 (15) (2014) 6861–6870.
[36] L.H. Son, Optimizing municipal solid waste collection using chaotic particle swarm optimization in GIS based environments: a case study at Danang City, Vietnam, Expert Syst. Appl. 41 (18) (2014) 8062–8074.
[37] L.H. Son, DPFCM: a novel distributed picture fuzzy clustering method on picture fuzzy sets, Expert Syst. Appl. 42 (1) (2015) 51–66.
[38] L.H. Son, B.C. Cuong, P.L. Lanzi, N.T. Thong, A novel intuitionistic fuzzy clustering method for geo-demographic analysis, Expert Syst. Appl. 39 (10) (2012) 9848–9859.
[39] L.H. Son, B.C. Cuong, H.V. Long, Spatial interaction–modification model and applications to geo-demographic analysis, Knowl.-Based Syst. 49 (2013) 152–170.
[40] L.H. Son, P.L. Lanzi, B.C. Cuong, H.A. Hung, Data mining in GIS: a novel context-based fuzzy geographically weighted clustering algorithm, Int. J. Machine Learning Comput. (3) (2012) 235–238.
[41] L.H. Son, N.D. Linh, H.V. Long, A lossless DEM compression for fast retrieval method using fuzzy clustering and MANFIS neural network, Eng. Appl. Artif. Intell. 29 (2014) 33–42.
[42] E. Szmidt, J. Kacprzyk, Intuitionistic fuzzy sets in some medical applications, in: Proceedings of Computational Intelligence: Theory and Applications, 2001, pp. 148–151.
[43] E. Szmidt, J. Kacprzyk, An intuitionistic fuzzy set based approach to intelligent data analysis: an application to medical diagnosis, in: Proceedings of Recent Advances in Intelligent Paradigms and Applications, 2003, pp. 57–70.
[44] E. Szmidt, J. Kacprzyk, A similarity measure for intuitionistic fuzzy sets and its application in supporting medical diagnostic reasoning, in: Proceedings of Artificial Intelligence and Soft Computing (ICAISC 2004), 2004, pp. 388–393.
[45] K.C. Tan, Q. Yu, C.M. Heng, T.H. Lee, Evolutionary computing for knowledge discovery in medical diagnosis, Artif. Intell. Med. 27 (2) (2003) 129–154.
[46] University of California, UCI Repository of Machine Learning Databases, 2007.
[47] Z. Xiao, X. Yang, Q. Niu, Y. Dong, K. Gong, S. Xia, Y. Pang, A new evaluation method based on D–S generalized fuzzy soft sets and its application in medical diagnosis problem, Appl. Math. Modell. 36 (10) (2012) 4592–4604.
[48] Y. Yao, Three-way decisions with probabilistic rough sets, Inform. Sci. 180 (3) (2010) 341–353.
[49] L.A. Zadeh, Fuzzy sets, Inform. Control (1965) 338–353.
[50] X. Zhang, B. Zhou, P. Li, A general frame for intuitionistic fuzzy rough sets, Inform. Sci. 216 (2012) 34–49.
