Journal of Intelligent & Fuzzy Systems 31 (2016) 1597–1608
DOI: 10.3233/JIFS-151654
IOS Press

On the performance evaluation of intuitionistic vector similarity measures for medical diagnosis (1)

Le Hoang Son (a,∗) and Pham Hong Phong (b)
(a) VNU University of Science, Vietnam National University, Hanoi, Vietnam
(b) National University of Civil Engineering, Hanoi, Vietnam

(1) The authors are greatly indebted to the editor-in-chief, Prof. Reza Langari, and the anonymous reviewers for their comments and suggestions, which improved the quality and clarity of the paper. This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2014.01.
(∗) Corresponding author: Le Hoang Son, VNU University of Science, Vietnam National University, Hanoi, Vietnam. Tel.: +84 904 171 284; E-mails: sonlh@vnu.edu.vn, chinhson2002@gmail.com.
1064-1246/16/$35.00 © 2016 – IOS Press and the authors. All rights reserved.

Abstract. The intuitionistic fuzzy recommender system (IFRS), which has recently been presented based on the theories of intuitionistic fuzzy sets and recommender systems, is an efficient tool for medical diagnosis. IFRS uses the intuitionistic fuzzy similarity degree (IFSD), regarded as the generalization of the hard user-based, item-based and rating-based similarity degrees in recommender systems, to calculate the analogousness between patients in the system. In this paper, we first extend IFRS by using a new term, the intuitionistic fuzzy vector (IFV), instead of the existing intuitionistic fuzzy matrix (IFM) in IFRS. Then, the intuitionistic value similarity measure (IvSM) and the intuitionistic vector similarity measure (IVSM) are defined on the basis of the intuitionistic fuzzy vector. Some mathematical properties of these new terms are examined, and several IVSM functions are proposed. The performances of these IVSM functions for medical diagnosis are experimentally validated and compared with the existing similarity degrees of IFRS. The paper concludes with recommendations on the most efficient IVSM function(s) that should be used for medical diagnosis.

Keywords: Intuitionistic fuzzy recommender systems, intuitionistic fuzzy vector, intuitionistic vector similarity measure, medical diagnosis, performance evaluation

1 Introduction

Medical diagnosis is an important and necessary process for providing appropriate medical assessments for patients in health care support systems. It involves the determination of the possible relations between patients and diseases from those between patients and symptoms. The answer of a medical diagnosis for a given disease is often yes/no, which eventually leads to the identification of the most likely disease and the appropriate treatments. Medical diagnosis must be accurate, and improving its accuracy as far as possible has attracted great interest from researchers.

Recent advances in health care support systems have focused attention on enhancing the accuracy of medical diagnosis both in theory and in practice [2]. One effort in this direction has presented an efficient tool, namely the Intuitionistic Fuzzy Recommender System (IFRS), which was designed based on the theories of intuitionistic fuzzy sets and recommender systems [9]. In IFRS, the intuitionistic fuzzy similarity degree (IFSD) is utilized as a generalization of the hard user-based, item-based and rating-based similarity degrees to calculate the analogousness between patients. A hybrid similarity degree between IFSD and the degree produced by a picture fuzzy clustering method has been proposed to enhance the accuracy of prediction. These related studies mostly focused on improving the similarity degree of IFRS to ensure the high accuracy of the system.

In this paper we propose the intuitionistic fuzzy vector (IFV) instead of the existing intuitionistic fuzzy matrix (IFM) in IFRS. Then, a generalization of
the existing multi-criteria IFRS, the so-called Modified multi-criteria IFRS (MMC-IFRS), which takes the IFV into account, is presented. Two new measures, namely the intuitionistic value similarity measure (IvSM) and the intuitionistic vector similarity measure (IVSM), are defined. Some mathematical properties of these new terms are examined, and several IVSM functions are proposed. The performances of these IVSM functions for medical diagnosis are experimentally validated and compared with the existing similarity degrees of IFRS.

The rest of the paper is organized as follows. Section 2 recalls some previous works. Section 3 presents the new contributions of this paper. Section 4 validates the proposed model by experiments. Section 5 gives the conclusions and future works of the paper.

2 Preliminaries

2.1 Related works

Assume that $P$, $S$ and $D$ are the sets of patients, symptoms and diseases, respectively. Each patient $P_i$ ($i = 1, \dots, n$) (resp. symptom $S_j$, $j = 1, \dots, m$) is assumed to have some features (resp. characteristics). For simplicity, we consider a recommender system including a feature of the patient and a characteristic of the symptom, denoted by $X$ and $Y$, respectively. $X$ and $Y$ both consist of $s$ intuitionistic linguistic labels. Analogously, each disease $D_k$ ($k = 1, \dots, p$) also contains $s$ intuitionistic linguistic labels. The definition of Multi-criteria Intuitionistic Fuzzy Recommender Systems (MC-IFRS) was given as follows.

Definition 1 [9] (Multi-criteria Intuitionistic Fuzzy Recommender Systems, MC-IFRS). The utility function $R$ is a mapping specified on $(X, Y)$ as in Equation (1):

$$R : X \times Y \to D_1 \times \cdots \times D_p,$$

$$\begin{pmatrix} (\mu_{1X}(x), \gamma_{1X}(x)) \\ \vdots \\ (\mu_{sX}(x), \gamma_{sX}(x)) \end{pmatrix} \times \begin{pmatrix} (\mu_{1Y}(y), \gamma_{1Y}(y)) \\ \vdots \\ (\mu_{sY}(y), \gamma_{sY}(y)) \end{pmatrix} \to \prod_{k=1}^{p} \begin{pmatrix} (\mu_{1D}(D_k), \gamma_{1D}(D_k)) \\ \vdots \\ (\mu_{sD}(D_k), \gamma_{sD}(D_k)) \end{pmatrix}. \tag{1}$$

In Equation (1), $(\mu_{iX}(x), \gamma_{iX}(x))$ is an intuitionistic fuzzy value (IFv) of the patient to the $i$-th linguistic label of feature $X$, $(\mu_{iY}(y), \gamma_{iY}(y))$ represents the IFv of the symptom to the $i$-th linguistic label of characteristic $Y$,
and $(\mu_{iD}(D_k), \gamma_{iD}(D_k))$ stands for the IFv of the disease $D_k$ to the $i$-th linguistic label ($i = 1, \dots, s$; $k = 1, \dots, p$). MC-IFRS is the system that provides the two basic functions below.

a) Prediction: determine the values of $(\mu_{iD}(D_k), \gamma_{iD}(D_k))$, $i = 1, \dots, s$, $k = 1, \dots, p$.
b) Recommendation: choose $i^* \in \{1, \dots, s\}$ which maximizes the expression

$$\sum_{k=1}^{p} w_k \left( \mu_{iD}(D_k) + \mu_{iD}(D_k)\, \pi_{iD}(D_k) \right),$$

where $\pi_{iD}(D_k) = 1 - \mu_{iD}(D_k) - \gamma_{iD}(D_k)$ and $w_k \in [0, 1]$ is the weight of $D_k$ satisfying the constraint $\sum_{k=1}^{p} w_k = 1$.

MC-IFRS can be compressed into a matrix form as in Definition 2.

Definition 2 [9]. An intuitionistic fuzzy matrix (IFM) $Z$ in MC-IFRS is defined as

$$Z = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1s} \\ b_{21} & b_{22} & \cdots & b_{2s} \\ c_{31} & c_{32} & \cdots & c_{3s} \\ \vdots & \vdots & \ddots & \vdots \\ c_{t1} & c_{t2} & \cdots & c_{ts} \end{pmatrix}. \tag{2}$$

In Equation (2), $t = p + 2$, where $p \in \mathbb{N}^*$ is the number of diseases in Definition 1. The value $s \in \mathbb{N}^*$ is the number of intuitionistic linguistic labels. $a_{1i}$, $b_{2i}$, $c_{hi}$ ($h = 3, \dots, t$; $i = 1, \dots, s$) are the IFvs consisting of the membership and non-membership values as in Definition 1: $a_{1i} = (\mu_{iX}(x), \gamma_{iX}(x))$, $b_{2i} = (\mu_{iY}(y), \gamma_{iY}(y))$ and $c_{hi} = (\mu_{iD}(D_{h-2}), \gamma_{iD}(D_{h-2}))$, $i = 1, \dots, s$, $h = 3, \dots, t$. Each row from the third one to the last in Equation (2) is related to a given disease.

Based on the IFM, the intuitionistic fuzzy similarity matrix (IFSM) and the intuitionistic fuzzy similarity degree (IFSD) were defined as in Definitions 3 and 4.

Definition 3 [9]. Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity matrix (IFSM) between $Z_1$ and $Z_2$ is defined as follows:

$$\tilde{S} = \begin{pmatrix} \tilde{S}_{11} & \tilde{S}_{12} & \cdots & \tilde{S}_{1s} \\ \tilde{S}_{21} & \tilde{S}_{22} & \cdots & \tilde{S}_{2s} \\ \tilde{S}_{31} & \tilde{S}_{32} & \cdots & \tilde{S}_{3s} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{S}_{t1} & \tilde{S}_{t2} & \cdots & \tilde{S}_{ts} \end{pmatrix},$$

where $\tilde{S}_{1i} = \mathrm{sim}\left(a_{1i}^{(1)}, a_{1i}^{(2)}\right)$, $\tilde{S}_{2i} = \mathrm{sim}\left(b_{2i}^{(1)}, b_{2i}^{(2)}\right)$ and $\tilde{S}_{hi} = \mathrm{sim}\left(c_{hi}^{(1)}, c_{hi}^{(2)}\right)$, $i = 1, \dots, s$, $h = 3, \dots, t$. Here $\mathrm{sim}$ is a measure specifying the similarity between two intuitionistic values $u = (\mu_u, \gamma_u)$ and $v = (\mu_v, \gamma_v)$:

$$\mathrm{sim}(u, v) = 1 - \frac{1 - \exp\left(-\frac{1}{2}\left(\left|\sqrt{\mu_u} - \sqrt{\mu_v}\right| + \left|\sqrt{\gamma_u} - \sqrt{\gamma_v}\right|\right)\right)}{1 - \exp(-1)}. \tag{3}$$

Definition 4 [9]. Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity degree (IFSD) between $Z_1$ and $Z_2$ is

$$\mathrm{SIM}(Z_1, Z_2) = \alpha \sum_{i=1}^{s} w_{1i} \tilde{S}_{1i} + \beta \sum_{i=1}^{s} w_{2i} \tilde{S}_{2i} + \chi \sum_{h=3}^{t} \sum_{i=1}^{s} w_{hi} \tilde{S}_{hi}, \tag{4}$$

where $\tilde{S}$ is the IFSM between $Z_1$ and $Z_2$, and $W = (w_{ji})$ ($j = 1, \dots, t$; $i = 1, \dots, s$) is the weight matrix of the IFSM between $Z_1$ and $Z_2$ satisfying

$$\sum_{i=1}^{s} w_{ji} = 1, \quad j = 1, \dots, t, \qquad \alpha + \beta + \chi = 1.$$

IFSD is used to calculate the analogousness between patients in the system, and to predict the possible diseases for a patient. Clearly, the better the IFSD is, the higher the accuracy that the health care support system may achieve. Thus, in another recent paper [11], the authors defined a hybrid similarity degree between IFSD and the degree produced by a picture fuzzy clustering method [8] to enhance the accuracy of prediction, as in Definition 5.

Definition 5 [11]. Let us denote the IFSD in Equation (4) as $\mathrm{SIM}_{history}(a, b)$. The hybrid similarity degree is then calculated as

$$\mathrm{SIM}(a, b) = (1 - \lambda)\, \mathrm{SIM}_{history}(a, b) + \lambda\, \mathrm{SIM}_{group}(a, b),$$

where $\lambda \in [0, 1]$ is an adjustable coefficient, and $\mathrm{SIM}_{group}(a, b)$ is the similarity degree from the picture fuzzy clustering [8] as in the equations below:

$$\mathrm{SIM}_{group}(a, b) = \frac{1}{NC} \sum_{i=1}^{NC} \left( P(a, i) - \overline{P}(a) \right) \left( P(b, i) - \overline{P}(b) \right),$$

$$P(j, k) = 1 - \frac{CS(j, k)}{\max_i \{ CS(i, k) \}},$$

$$CS(j, k) = 1 - \frac{\sum_i X_{ji} V_{ki}}{\| X_j \| \, \| V_k \|},$$

where $P(j, k)$ is the possibility of patient $j$ belonging to cluster $k$, and $CS(j, k)$ is the counter similarity between patient $j$ and cluster $k$, with $X_j$ and $V_k$ being patient $j$ and the center of cluster $k$, respectively. $\overline{P}(a)$ is the mean value of $P(a, k)$, $k = 1, \dots, NC$. $NC$ is the number of groups used in the picture fuzzy clustering method DPFCM [8].

2.2 Some remarks

The methods recalled in sub-section 2.1 achieved better accuracies than the relevant ones, such as the standalone algorithms of intuitionistic fuzzy sets, namely [4, 7, 10], and recommender
systems, e.g. [3, 5]. These related studies mostly focused on improving the similarity degree of IFRS to ensure the high accuracy of the system. Note that the most important assumption in IFRS is that the numbers of intuitionistic linguistic labels in the features of the patients, in the characteristics of the symptoms and in the diseases are the same and denoted by $s$ (see the lines before Definition 1). In practical applications, this assumption may not hold, which makes it difficult to apply the IFSD in Definition 4 and the hybrid similarity degree in Definition 5. This motivates us to extend MC-IFRS in Definition 1 and the corresponding similarity degrees to the new context.

Thus, in this paper we first extend MC-IFRS by using a new term, the intuitionistic fuzzy vector (IFV), instead of the existing intuitionistic fuzzy matrix (IFM) in IFRS. Then, the intuitionistic value similarity measure (IvSM) and the intuitionistic vector similarity measure (IVSM) are defined on the basis of the IFV. Some mathematical properties of these new terms are examined, and several IVSM functions are proposed. The performances of these IVSM functions for medical diagnosis are experimentally validated and compared with the existing similarity degrees of IFRS. The paper concludes with recommendations on the most efficient IVSM function(s) that should be used for medical diagnosis. Hence, the contributions of this paper are relevant not only to the theoretical aspects of recommender systems but also to their application in health care support systems.

3 The proposed method

In this section, we first propose a new MC-IFRS, the so-called Modified MC-IFRS (MMC-IFRS), to handle the problem of different numbers of intuitionistic linguistic labels in the features of patients, the characteristics of symptoms and the diseases, in sub-section 3.1. An illustrated example of
MMC-IFRS and the conversion of MMC-IFRS to the intuitionistic fuzzy vector (IFV) are also given therein. Secondly, we define the intuitionistic value similarity measure (IvSM) and the intuitionistic vector similarity measure (IVSM), accompanied by some mathematical properties, in sub-section 3.2. Several IVSM functions for the validation in the experiments are also proposed in this sub-section.

3.1 Modified multi-criteria intuitionistic fuzzy recommender system

Recall that $P$, $S$ and $D$ are the sets of patients, symptoms and diseases having the cardinalities $n$, $m$ and $p$, respectively. Each patient $P_i$ ($i = 1, \dots, n$) is assumed to have $N$ features $X_1, \dots, X_N$. Each feature $X_e$ consists of $r_e$ linguistic labels ($e = 1, \dots, N$). Each symptom $S_j$ ($j = 1, \dots, m$) is assumed to have $M$ characteristics $Y_1, \dots, Y_M$. Each characteristic $Y_f$ consists of $s_f$ linguistic labels ($f = 1, \dots, M$). Each disease $D_g$ contains $t_g$ intuitionistic linguistic labels ($g = 1, \dots, p$).

Definition 6 (Modified Multi-criteria Intuitionistic Fuzzy Recommender Systems, MMC-IFRS). The utility function $R$ is a mapping:

$$R : \left( \prod_{e=1}^{N} X_e \right) \times \left( \prod_{f=1}^{M} Y_f \right) \to \prod_{g=1}^{p} D_g,$$

$$\prod_{e=1}^{N} \begin{pmatrix} (\mu_{1X_e}(x_e), \gamma_{1X_e}(x_e)) \\ \vdots \\ (\mu_{r_e X_e}(x_e), \gamma_{r_e X_e}(x_e)) \end{pmatrix} \times \prod_{f=1}^{M} \begin{pmatrix} (\mu_{1Y_f}(y_f), \gamma_{1Y_f}(y_f)) \\ \vdots \\ (\mu_{s_f Y_f}(y_f), \gamma_{s_f Y_f}(y_f)) \end{pmatrix} \to \prod_{g=1}^{p} \begin{pmatrix} (\mu_{1D}(D_g), \gamma_{1D}(D_g)) \\ \vdots \\ (\mu_{t_g D}(D_g), \gamma_{t_g D}(D_g)) \end{pmatrix}, \tag{5}$$

where $(\mu_{xX_e}(x_e), \gamma_{xX_e}(x_e))$ is the IFv of the patient to the $x$-th linguistic label of the feature $X_e$ ($x = 1, \dots, r_e$; $e = 1, \dots, N$), and $(\mu_{yY_f}(y_f), \gamma_{yY_f}(y_f))$ is the IFv of the symptom to the $y$-th linguistic label of the characteristic $Y_f$ ($y = 1, \dots, s_f$; $f = 1, \dots, M$). Finally, $(\mu_{zD}(D_g), \gamma_{zD}(D_g))$ is the IFv of the disease $D_g$ to the $z$-th linguistic label ($z = 1, \dots, t_g$; $g = 1, \dots, p$). MMC-IFRS provides two basic functions:

a) Prediction: determine the values of $(\mu_{zD}(D_g), \gamma_{zD}(D_g))$, $z = 1, \dots, t_g$, $g = 1, \dots, p$.
b) Recommendation: choose $z^* \in \{1, \dots, t_g\}$ which maximizes the expression

$$\sum_{g=1}^{p} w_g \left( \mu_{zD}(D_g) + \mu_{zD}(D_g)\, \pi_{zD}(D_g) \right),$$

where $\pi_{zD}(D_g) = 1 - \mu_{zD}(D_g) - \gamma_{zD}(D_g)$ and $w_g \in [0, 1]$ is the weight of $D_g$ satisfying the constraint $\sum_{g=1}^{p} w_g = 1$.

It is obvious that MMC-IFRS in Definition 6 is a generalization of
MC-IFRS in Definition 1. Consider the example below, which illustrates the new definition for medical diagnosis.

Example 1. In a medical diagnosis system, there are four patients. The feature $X$ is “Age”, consisting of five linguistic labels: “VL = very low”, “L = low”, “M = medium”, “H = high”, “VH = very high”. By using the trapezoidal intuitionistic fuzzy numbers (TIFNs) [1] characterized by $(a_1, a_2, a_3, a_4; a'_1, a'_4)$ with $a'_1 \le a_1 \le a_2 \le a_3 \le a_4 \le a'_4$, the membership (non-membership) functions of patients to the linguistic labels of the feature $X$ are:

$$\mu_{VL}(x) = \begin{cases} 1, & x \le 10 \\ (20 - x)/10, & 10 < x \le 20 \\ 0, & x > 20 \end{cases}, \qquad \gamma_{VL}(x) = \begin{cases} 0, & x \le 10 \\ (x - 10)/10, & 10 < x \le 20 \\ 1, & x > 20 \end{cases}$$

$$\mu_{L}(x) = \begin{cases} 0, & x \le 10,\ x > 40 \\ (x - 10)/10, & 10 < x \le 20 \\ 1, & 20 < x \le 30 \\ (40 - x)/10, & 30 < x \le 40 \end{cases}, \qquad \gamma_{L}(x) = \begin{cases} 1, & x \le 10,\ x > 40 \\ (20 - x)/10, & 10 < x \le 20 \\ 0, & 20 < x \le 30 \\ (x - 30)/10, & 30 < x \le 40 \end{cases}$$

$$\mu_{M}(x) = \begin{cases} 0, & x \le 30,\ x > 60 \\ (x - 30)/10, & 30 < x \le 40 \\ 1, & 40 < x \le 50 \\ (60 - x)/10, & 50 < x \le 60 \end{cases}, \qquad \gamma_{M}(x) = \begin{cases} 1, & x \le 30,\ x > 60 \\ (40 - x)/10, & 30 < x \le 40 \\ 0, & 40 < x \le 50 \\ (x - 50)/10, & 50 < x \le 60 \end{cases}$$

$$\mu_{H}(x) = \begin{cases} 0, & x \le 50,\ x > 80 \\ (x - 50)/10, & 50 < x \le 60 \\ 1, & 60 < x \le 70 \\ (80 - x)/10, & 70 < x \le 80 \end{cases}, \qquad \gamma_{H}(x) = \begin{cases} 1, & x \le 50,\ x > 80 \\ (60 - x)/10, & 50 < x \le 60 \\ 0, & 60 < x \le 70 \\ (x - 70)/10, & 70 < x \le 80 \end{cases}$$

$$\mu_{VH}(x) = \begin{cases} 0, & x \le 70 \\ (x - 70)/10, & 70 < x \le 80 \\ 1, & x > 80 \end{cases}, \qquad \gamma_{VH}(x) = \begin{cases} 1, & x \le 70 \\ (80 - x)/10, & 70 < x \le 80 \\ 0, & x > 80 \end{cases}$$

Based on the membership and non-membership functions, we calculate the information of patients as follows.

Al (18): VL (0.2, 0.8), L (0.8, 0.2), M (0, 1), H (0, 1), VH (0, 1),
Bob (39): VL (0, 1), L (0.1, 0.9), M (0.9, 0.1), H (0, 1), VH (0, 1),
Joe (53): VL (0, 1), L (0, 1), M (0.7, 0.3), H (0.3, 0.7), VH (0, 1),
Ted (74): VL (0, 1), L (0, 1), M (0.6, 0.4), H (0.4, 0.6), VH (0, 1).

The symptom’s characteristic $Y$ is “Temperature”, including three linguistic labels: “C = cold”,
“M = medium”, “H = hot”. Similarly, the membership (non-membership) functions of the symptom to the linguistic labels of the characteristic are defined using TIFNs as follows.

$$\mu_{C}(x) = \begin{cases} 1, & x \le 5 \\ (20 - x)/15, & 5 < x \le 20 \\ 0, & x > 20 \end{cases}, \qquad \gamma_{C}(x) = \begin{cases} 0, & x \le 5 \\ (x - 5)/15, & 5 < x \le 20 \\ 1, & x > 20 \end{cases}$$

$$\mu_{M}(x) = \begin{cases} 0, & x \le 5,\ x > 40 \\ (x - 5)/15, & 5 < x \le 20 \\ 1, & 20 < x \le 35 \\ (40 - x)/5, & 35 < x \le 40 \end{cases}, \qquad \gamma_{M}(x) = \begin{cases} 1, & x \le 5,\ x > 40 \\ (20 - x)/15, & 5 < x \le 20 \\ 0, & 20 < x \le 35 \\ (x - 35)/5, & 35 < x \le 40 \end{cases}$$

$$\mu_{H}(x) = \begin{cases} 0, & x \le 35 \\ (x - 35)/5, & 35 < x \le 40 \\ 1, & x > 40 \end{cases}, \qquad \gamma_{H}(x) = \begin{cases} 1, & x \le 35 \\ (40 - x)/5, & 35 < x \le 40 \\ 0, & x > 40 \end{cases}$$

The information of the symptom is shown as follows.

4°C: C (1, 0), M (0, 1), H (0, 1),
16°C: C (0.267, 0.733), M (0.733, 0.267), H (0, 1),
39°C: C (0, 1), M (0.2, 0.8), H (0.8, 0.2),
25°C: C (0, 1), M (1, 0), H (0, 1).

Table 1. An MMC-IFRS for medical diagnosis, with ∗ being the values to be predicted

Age | Temperature | Flu | Headache
Al (18): VL (0.2, 0.8), L (0.8, 0.2), M (0, 1), H (0, 1), VH (0, 1) | 4°C: C (1, 0), M (0, 1), H (0, 1) | L1 (0.8, 0.1), L2 (0.6, 0.3), L3 (0.2, 0.6), L4 (0.1, 0.9) | L1 (0.1, 0.8), L2 (0.2, 0.7), L3 (0.5, 0.35), L4 (0.6, 0.2), L5 (0.4, 0.5), L6 (0.3, 0.55)
Bob (39): VL (0, 1), L (0.1, 0.9), M (0.9, 0.1), H (0, 1), VH (0, 1) | 39°C: C (0, 1), M (0.2, 0.8), H (0.8, 0.2) | L1 (0.4, 0.5), L2 (0.6, 0.2), L3 (0.3, 0.6), L4 (0.1, 0.9) | L1 (0, 0.9), L2 (0.2, 0.75), L3 (0.4, 0.55), L4 (0.55, 0.35), L5 (0.7, 0.2), L6 (0.6, 0.3)
Joe (53): VL (0, 1), L (0, 1), M (0.7, 0.3), H (0.3, 0.7), VH (0, 1) | 16°C: C (0.267, 0.733), M (0.733, 0.267), H (0, 1) | L1 (0, 1), L2 (0.2, 0.7), L3 (0.4, 0.5), L4 (1, 0) | L1 (0, 0.9), L2 (0.4, 0.6), L3 (0.4, 0.45), L4 (0.7, 0.2), L5 (0.3, 0.6), L6 (0.1, 0.85)
Ted (74): VL (0, 1), L (0, 1), M (0.6, 0.4), H (0.4, 0.6), VH (0, 1) | 25°C: C (0, 1), M (1, 0), H (0, 1) | L1 (∗, ∗), L2 (∗, ∗), L3 (∗, ∗), L4 (∗, ∗) | L1 (∗, ∗), L2 (∗, ∗), L3 (∗, ∗), L4 (∗, ∗), L5 (∗, ∗), L6 (∗, ∗)

The diseases $(D_1, D_2)$ are “Flu” and “Headache”, where $D_1$ contains four linguistic labels: “L1 = Level 1”, “L2 = Level 2”, “L3 = Level 3” and “L4 = Level
4”, $D_2$ contains six linguistic labels: “L1 = Level 1”, “L2 = Level 2”, “L3 = Level 3”, “L4 = Level 4”, “L5 = Level 5” and “L6 = Level 6”. We would like to verify which ages of users and types of temperature are likely to cause the diseases of flu and headache. In this case we have an MMC-IFRS, which is described in Table 1. In this table, the cells having asterisk marks are the intuitionistic fuzzy values $(\mu_{zD}(D_g), \gamma_{zD}(D_g))$ ($z = 1, \dots, t_g$; $g = 1, 2$) that need to be predicted.

A compressed form of MMC-IFRS is shown in Definition 7.

Definition 7. An intuitionistic fuzzy vector (IFV) in MMC-IFRS is defined as follows:

$$V = (v_1, v_2, \dots, v_K),$$

where $K = K_1 + K_2 + K_3$, $K_1 = \sum_{e=1}^{N} r_e$, $K_2 = \sum_{f=1}^{M} s_f$, $K_3 = \sum_{g=1}^{p} t_g$. The first $K_1$ elements of $V$ are $a_{11}, \dots, a_{1r_1}, \dots, a_{e1}, \dots, a_{er_e}, \dots, a_{N1}, \dots, a_{Nr_N}$, where $a_{ex}$ represents an IFv of the patient to the $x$-th linguistic label of feature $X_e$ ($x = 1, \dots, r_e$; $e = 1, \dots, N$). The next $K_2$ elements of $V$ are $b_{11}, \dots, b_{1s_1}, \dots, b_{f1}, \dots, b_{fs_f}, \dots, b_{M1}, \dots, b_{Ms_M}$, where $b_{fy}$ is an IFv of the symptom to the $y$-th linguistic label of characteristic $Y_f$ ($y = 1, \dots, s_f$; $f = 1, \dots, M$). The last $K_3$ elements of $V$ are $c_{11}, \dots, c_{1t_1}, \dots, c_{g1}, \dots, c_{gt_g}, \dots, c_{p1}, \dots, c_{pt_p}$, where $c_{gz}$ is an IFv of the disease $D_g$ to the $z$-th linguistic label ($z = 1, \dots, t_g$; $g = 1, \dots, p$).

3.2 Intuitionistic value similarity measure and intuitionistic vector similarity measure

In the following definition, $\theta$ denotes the set of all intuitionistic fuzzy values (IFvs).

Definition 8 (Intuitionistic value similarity measure, IvSM). Let $\mathbb{R}$ be the set of all real numbers. A function $\mathrm{sim} : \theta \times \theta \to \mathbb{R}$ is called an intuitionistic value similarity measure (IvSM) if it satisfies the following conditions:

(A1) $\mathrm{sim}(u, v) = \mathrm{sim}(v, u)$, for all $u, v \in \theta$;
(A2) $0 \le \mathrm{sim}(u, v) \le 1$, for all $u, v \in \theta$;
(A3) $\mathrm{sim}(u, v) = 1 \Leftrightarrow u = v$, for all $u, v \in \theta$;
(A4) If $u \le v \le w$, then $\mathrm{sim}(u, v) \ge \mathrm{sim}(u, w)$ and $\mathrm{sim}(v, w) \ge \mathrm{sim}(u, w)$, for all $u, v, w \in \theta$ ($u \le v$ means $\mu_u \le \mu_v$ and $\gamma_u \ge \gamma_v$).

Theorem 1. For
all $u, v \in \theta$, we define:

$$\mathrm{sim}_1(u, v) = 1 - \frac{1}{2}\left( |\mu_u - \mu_v| + |\gamma_u - \gamma_v| \right); \tag{6}$$

$$\mathrm{sim}_2(u, v) = \frac{\min\{\mu_u, \mu_v\} + \min\{\gamma_u, \gamma_v\}}{\max\{\mu_u, \mu_v\} + \max\{\gamma_u, \gamma_v\}}; \tag{7}$$

$$\mathrm{sim}_3(u, v) = \frac{\exp\left( -\frac{1}{2}\left( |\mu_u - \mu_v| + |\gamma_u - \gamma_v| \right) \right) - \exp(-1)}{1 - \exp(-1)}; \tag{8}$$

$$\mathrm{sim}_4(u, v) = \frac{\exp\left( -\frac{1}{2}\left( \left|\sqrt{\mu_u} - \sqrt{\mu_v}\right| + \left|\sqrt{\gamma_u} - \sqrt{\gamma_v}\right| \right) \right) - \exp(-1)}{1 - \exp(-1)}. \tag{9}$$

Then, $\mathrm{sim}_1$, $\mathrm{sim}_2$, $\mathrm{sim}_3$ and $\mathrm{sim}_4$ are IvSMs. Notice that, to avoid the denominator being zero, we set $\frac{0}{0} = 1$ in the definition of $\mathrm{sim}_2$.

Proof. We consider $\mathrm{sim}_1$; the remaining cases are proved by analogous calculations. (A1) and (A3) are straightforward.

(A2) We have $0 \le |\mu_u - \mu_v| + |\gamma_u - \gamma_v| \le 2$. It follows that $0 \le \mathrm{sim}_1(u, v) \le 1$.

(A4) We prove $\mathrm{sim}_1(u, v) \ge \mathrm{sim}_1(u, w)$ under the condition $u \le v \le w$. By the definition of the relation $\le$ on IFvs, we get $\mu_u \le \mu_v \le \mu_w$ and $\gamma_u \ge \gamma_v \ge \gamma_w$, which implies

$$\mathrm{sim}_1(u, v) = 1 - \frac{1}{2}\left( (\mu_v - \mu_u) + (\gamma_u - \gamma_v) \right) \ge 1 - \frac{1}{2}\left( (\mu_w - \mu_u) + (\gamma_u - \gamma_w) \right) = 1 - \frac{1}{2}\left( |\mu_w - \mu_u| + |\gamma_w - \gamma_u| \right) = \mathrm{sim}_1(u, w).$$

By a similar argument, we get $\mathrm{sim}_1(v, w) \ge \mathrm{sim}_1(u, w)$.

In the following definition, $\theta^K$ denotes the set of all intuitionistic fuzzy vectors (IFVs) of length $K$ in MMC-IFRS.

Definition 9 (Intuitionistic vector similarity measure, IVSM). Let $\mathrm{SIM} : \theta^K \times \theta^K \to \mathbb{R}$. $\mathrm{SIM}$ is called an intuitionistic vector similarity measure (IVSM) if it satisfies the following conditions:

(B1) $\mathrm{SIM}(U, V) = \mathrm{SIM}(V, U)$, for all $U, V \in \theta^K$;
(B2) $0 \le \mathrm{SIM}(U, V) \le 1$, for all $U, V \in \theta^K$;
(B3) $\mathrm{SIM}(U, V) = 1 \Leftrightarrow U = V$, for all $U, V \in \theta^K$;
(B4) If $U \le V \le T$, then $\mathrm{SIM}(U, V) \ge \mathrm{SIM}(U, T)$ and $\mathrm{SIM}(V, T) \ge \mathrm{SIM}(U, T)$, for all $U, V, T \in \theta^K$ (with $U = (u_1, \dots, u_K)$ and $V = (v_1, \dots, v_K)$, $U \le V$ means $u_\ell \le v_\ell$ for all $\ell = 1, \dots, K$).

Definition 10. Let $U, V \in \theta^K$, let $\mathrm{sim}$ be an IvSM, and let $W = (w_1, \dots, w_K)$ be a weight vector satisfying $w_\ell \ge 0$ ($\ell = 1, \dots, K$) and $\sum_{\ell=1}^{K} w_\ell = 1$. We define:

1) The quadric intuitionistic fuzzy similarity degree between $U$ and $V$:

$$\mathrm{SIM}_Q(U, V) = \sqrt{\sum_{\ell=1}^{K} w_\ell \left( \mathrm{sim}(u_\ell, v_\ell) \right)^2}. \tag{10}$$

2) The arithmetic intuitionistic fuzzy similarity degree between $U$ and $V$:

$$\mathrm{SIM}_A(U, V) = \sum_{\ell=1}^{K} w_\ell\, \mathrm{sim}(u_\ell, v_\ell). \tag{11}$$

3) The geometric intuitionistic fuzzy similarity degree between $U$ and $V$:

$$\mathrm{SIM}_G(U, V) = \prod_{\ell=1}^{K} \left( \mathrm{sim}(u_\ell, v_\ell) \right)^{w_\ell}. \tag{12}$$

4) The harmonic intuitionistic fuzzy similarity degree between $U$ and $V$:

$$\mathrm{SIM}_H(U, V) = \left( \sum_{\ell=1}^{K} \frac{w_\ell}{\mathrm{sim}(u_\ell, v_\ell)} \right)^{-1}. \tag{13}$$

Theorem 2. Let $U, V \in \theta^K$. We have

$$\mathrm{SIM}_Q(U, V) \ge \mathrm{SIM}_A(U, V) \ge \mathrm{SIM}_G(U, V) \ge \mathrm{SIM}_H(U, V).$$

Proof. The proof uses classical inequalities: the Cauchy-Schwarz and the weighted AM-GM inequalities. For example, we consider $\mathrm{SIM}_Q(U, V) \ge \mathrm{SIM}_A(U, V)$. Using the Cauchy-Schwarz inequality,

$$\left( \sum_{\ell=1}^{K} x_\ell^2 \right) \left( \sum_{\ell=1}^{K} y_\ell^2 \right) \ge \left( \sum_{\ell=1}^{K} x_\ell y_\ell \right)^2,$$

for all $(x_1, \dots, x_K), (y_1, \dots, y_K) \in \mathbb{R}^K$, we have

$$\sum_{\ell=1}^{K} w_\ell \left( \mathrm{sim}(u_\ell, v_\ell) \right)^2 = \left( \sum_{\ell=1}^{K} \left( w_\ell^{1/2} \right)^2 \right) \left( \sum_{\ell=1}^{K} \left( w_\ell^{1/2}\, \mathrm{sim}(u_\ell, v_\ell) \right)^2 \right) \ge \left( \sum_{\ell=1}^{K} w_\ell^{1/2}\, w_\ell^{1/2}\, \mathrm{sim}(u_\ell, v_\ell) \right)^2 = \left( \sum_{\ell=1}^{K} w_\ell\, \mathrm{sim}(u_\ell, v_\ell) \right)^2.$$

That means $\left( \mathrm{SIM}_Q(U, V) \right)^2 \ge \left( \mathrm{SIM}_A(U, V) \right)^2$, or $\mathrm{SIM}_Q(U, V) \ge \mathrm{SIM}_A(U, V)$.

Theorem 3. Assume that $w_\ell > 0$ for all $\ell = 1, \dots, K$. Then $\mathrm{SIM}_Q$, $\mathrm{SIM}_A$, $\mathrm{SIM}_G$ and $\mathrm{SIM}_H$ are IVSMs.

Proof. Obviously, $\mathrm{SIM}_Q$, $\mathrm{SIM}_A$, $\mathrm{SIM}_G$ and $\mathrm{SIM}_H$ satisfy (B1).

(B2) Let $U, V \in \theta^K$. By Theorem 2, it is sufficient to prove that $\mathrm{SIM}_Q(U, V) \le 1$. From $\mathrm{sim}(u_\ell, v_\ell) \le 1$, for all $\ell = 1, \dots, K$, we obtain

$$\mathrm{SIM}_Q(U, V) = \sqrt{\sum_{\ell=1}^{K} w_\ell \left( \mathrm{sim}(u_\ell, v_\ell) \right)^2} \le \sqrt{\sum_{\ell=1}^{K} w_\ell} = 1.$$

(B3) For all $U, V \in \theta^K$, it is easy to show that $\mathrm{SIM}_H(U, V) = 1 \Leftrightarrow U = V$. By Theorem 2, if one of the values $\mathrm{SIM}_Q(U, V)$, $\mathrm{SIM}_A(U, V)$ and $\mathrm{SIM}_G(U, V)$ equals 1, then $\mathrm{SIM}_H(U, V)$ equals 1. Then, $\mathrm{SIM}_Q(U, V)$, $\mathrm{SIM}_A(U, V)$ and $\mathrm{SIM}_G(U, V)$ satisfy (B3).

(B4) The condition $U \le V \le T$ yields that $\mathrm{sim}(u_\ell, v_\ell) \ge \mathrm{sim}(u_\ell, t_\ell)$, for all $\ell = 1, \dots, K$. Thus, $w_\ell \left( \mathrm{sim}(u_\ell, v_\ell) \right)^2 \ge w_\ell \left( \mathrm{sim}(u_\ell, t_\ell) \right)^2$, for all $\ell = 1, \dots, K$. Hence

$$\sqrt{\sum_{\ell=1}^{K} w_\ell \left( \mathrm{sim}(u_\ell, v_\ell) \right)^2} \ge \sqrt{\sum_{\ell=1}^{K} w_\ell \left( \mathrm{sim}(u_\ell, t_\ell) \right)^2},$$

or $\mathrm{SIM}_Q(U, V) \ge \mathrm{SIM}_Q(U, T)$. The remaining parts of the proof are analogous.

Definition 11. Let $\mathrm{SIM}$ be an IVSM. The formulas to predict the values of the linguistic labels of the patient $P^*$ for the diseases $D_g$ ($g = 1, \dots, p$) in MMC-IFRS are:

$$\mu_{zD}^{P^*}(D_g) = \frac{\sum_{v=1}^{n} \mathrm{SIM}(P^*, P_v) \times \mu_{zD}^{P_v}(D_g)}{\sum_{v=1}^{n} \mathrm{SIM}(P^*, P_v)}, \qquad \gamma_{zD}^{P^*}(D_g) = \frac{\sum_{v=1}^{n} \mathrm{SIM}(P^*, P_v) \times \gamma_{zD}^{P_v}(D_g)}{\sum_{v=1}^{n} \mathrm{SIM}(P^*, P_v)},$$

for all $z \in \{1, \dots, t_g\}$, $g = 1, \dots, p$.

Theorem 4. For all $z \in \{1, \dots, t_g\}$, $g = 1, \dots, p$ and every patient $P^*$, $\left( \mu_{zD}^{P^*}(D_g), \gamma_{zD}^{P^*}(D_g) \right)$ is an IFv.

Proof. It is easily seen that $\mu_{zD}^{P^*}(D_g) \ge 0$ and $\gamma_{zD}^{P^*}(D_g) \ge 0$. Moreover,

$$\mu_{zD}^{P^*}(D_g) + \gamma_{zD}^{P^*}(D_g) = \frac{\sum_{v=1}^{n} \mathrm{SIM}(P^*, P_v) \times \left( \mu_{zD}^{P_v}(D_g) + \gamma_{zD}^{P_v}(D_g) \right)}{\sum_{v=1}^{n} \mathrm{SIM}(P^*, P_v)}.$$

Since $\left( \mu_{zD}^{P_v}(D_g), \gamma_{zD}^{P_v}(D_g) \right)$ is an IFv, we have $\mu_{zD}^{P_v}(D_g) + \gamma_{zD}^{P_v}(D_g) \le 1$. Thus, $\mu_{zD}^{P^*}(D_g) + \gamma_{zD}^{P^*}(D_g) \le 1$.

4 Evaluation

4.1 Experimental design

In this part, we describe the experimental environment.

Experimental tools: We have implemented 16 variants of the prediction algorithm for medical diagnosis by matching each IvSM function in Equations (6–9) with each IVSM function given in Equations (10–13), in the PHP programming language. Notice that the variant combining Equations (9, 11) is exactly the IFSD function of MC-IFRS defined in Equations (3–4) of Definitions 3 & 4, respectively. Thus, IFSD is a special case of the proposed IVSM functions in this work. The variants are denoted from A1 to A16, with A1 matching Equations (6, 10), A2 matching Equations (6, 11), and A16 matching Equations (9, 13). A14 is replaced with the IFSD function [9], as explained above. Notice that the hybrid similarity degree [11] described in Definition 5 is just a derivative of IFSD supplemented with information from a picture fuzzy clustering method; therefore, for an accurate comparison between the original similarity degrees, it is not considered here. Further hybridization between the IVSM functions and the degree from a picture fuzzy clustering method is considered in another work. These algorithms are executed on a PC with an Intel(R) Core(TM)2 Duo CPU T6400 @ 2.00GHz and 2GB RAM. The results are taken as the average value of 50 runs.

Table 2. The MAE values of the variants by k-fold cross validation

Variant | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9 | k=10
A1 | 0.49778 | 0.49649 | 0.49269 | 0.49117 | 0.4918 | 0.49461 | 0.49481 | 0.49092 | 0.49012
A2 | 0.49593 | 0.4948 | 0.49075 | 0.48895 | 0.4896 | 0.49245 | 0.4923 | 0.48866 | 0.48793
A3 | 0.48619 | 0.48123 | 0.47834 | 0.47588 | 0.47414 | 0.47895 | 0.47481 | 0.47362 | 0.4726
A4 | 0.4946 | 0.49524 | 0.49241 | 0.49131 | 0.49297 | 0.49384 | 0.49705 | 0.4899 | 0.49059
A5 | 0.49781 | 0.49657 | 0.49266 | 0.49115 | 0.49179 | 0.49462 | 0.49484 | 0.49098 | 0.49017
A6 | 0.49569 | 0.49456 | 0.49046 | 0.48867 | 0.48931 | 0.49212 | 0.49212 | 0.48843 | 0.48776
A7 | 0.48365 | 0.4761 | 0.47035 | 0.47284 | 0.46134 | 0.47172 | 0.47531 | 0.46213 | 0.46502
A8 | 0.49275 | 0.49368 | 0.49125 | 0.4898 | 0.49144 | 0.49192 | 0.49599 | 0.48804 | 0.48898
A9 | 0.49779 | 0.49647 | 0.49268 | 0.49115 | 0.49179 | 0.4946 | 0.49482 | 0.49095 | 0.49015
A10 | 0.49573 | 0.49462 | 0.49057 | 0.48872 | 0.48938 | 0.4922 | 0.49213 | 0.48849 | 0.48778
A11 | 0.48508 | 0.48031 | 0.47735 | 0.47471 | 0.47313 | 0.47783 | 0.47373 | 0.47258 | 0.47154
A12 | 0.49318 | 0.49418 | 0.49149 | 0.4903 | 0.49197 | 0.49241 | 0.49629 | 0.48856 | 0.48952
A13 | 0.49798 | 0.49662 | 0.49287 | 0.49138 | 0.49204 | 0.49489 | 0.49512 | 0.4912 | 0.49038
A14 (IFSD) | 0.49621 | 0.49499 | 0.49098 | 0.48921 | 0.48987 | 0.49278 | 0.49268 | 0.48903 | 0.48826
A15 | 0.4867 | 0.48148 | 0.47864 | 0.47614 | 0.47432 | 0.47932 | 0.47519 | 0.47417 | 0.47294
A16 | 0.4953 | 0.49578 | 0.49315 | 0.49191 | 0.49382 | 0.49453 | 0.49794 | 0.49093 | 0.49154
Average | 0.49327 | 0.49144 | 0.48792 | 0.48646 | 0.48617 | 0.4893 | 0.4897 | 0.48491 | 0.48471

Evaluation indices: Mean Absolute Error (MAE) and the computational time.

Datasets: The benchmark medical diagnosis dataset HEART from the UCI Machine Learning Repository [12], consisting of 270 patients characterized by 13 attributes. This dataset was also used for the experiments in [9, 11].

Cross validation: The cross-validation method for the experiments is k-fold validation with k from 2 to 10. Besides testing with k-fold validation, random experiments with the cardinality of the testing set ranging from 10 to 100 random elements are also performed. In order to validate the results with accurate classes, the intuitionistic defuzzification method of [1], as in Example 1, is used for the experimental algorithms.

Parameter setting: The weights of the degrees are set up as in [9, 11].

Objective: To validate the
performance of IVSM functions in terms of accuracy through the evaluation indices.

4.2 Assessment

In Tables 2 and 3, we illustrate the MAE values and the computational time of the variants by k-fold cross validation, respectively. From Table 2, we calculate the average MAE values of the variants by the numbers of folds. This table shows that the MAE values of the A7 variant are the best among all. Besides A7, other variants such as A3, A11 and A15 should be used for the best MAE values of the algorithm. It is clear that a larger number of folds does not necessarily correspond to a better MAE value of the algorithm. For the sake of both the computational time and the MAE values, the number of folds should be selected within the range [8, 10]; in particular, when it is equal to 9, the average and the best MAE values of all variants are 0.484 and 0.462 respectively, which are the best among all trials. In Table 3, the average computational time of all variants by various numbers of folds is illustrated. Apparently, the processing time of these algorithms is from 0.68 to 1.44 seconds (sec).

Table 3. The computational time of the variants by k-fold cross validation (sec)

Variant | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9 | k=10
A1 | 0.62368 | 1.02375 | 1.0243 | 0.96545 | 0.88094 | 0.79568 | 0.72382 | 0.68045 | 0.61986
A2 | 0.53082 | 0.86802 | 0.85899 | 0.81039 | 0.74145 | 0.67237 | 0.61374 | 0.56868 | 0.51953
A3 | 0.6361 | 1.04342 | 1.03488 | 0.97614 | 0.89037 | 0.80626 | 0.7345 | 0.68627 | 0.62579
A4 | 0.64182 | 1.05719 | 1.0515 | 1.18887 | 0.91322 | 0.97744 | 0.75013 | 0.69806 | 0.63991
A5 | 1.11738 | 1.81597 | 1.80621 | 1.70328 | 1.55427 | 1.40876 | 1.2764 | 1.19186 | 1.09598
A6 | 0.99672 | 1.65204 | 1.6414 | 1.55404 | 1.41456 | 1.28346 | 1.16276 | 1.08773 | 0.99844
A7 | 1.04728 | 1.73033 | 1.72061 | 1.71526 | 1.63634 | 1.43476 | 1.48867 | 1.55723 | 1.11612
A8 | 1.11162 | 1.85574 | 1.84013 | 1.73646 | 1.58472 | 1.43311 | 1.2996 | 1.22066 | 1.11776
A9 | 0.74955 | 1.23906 | 1.23272 | 1.16188 | 1.06067 | 0.96116 | 0.87313 | 0.81649 | 0.74981
A10 | 0.65647 | 1.08841 | 1.08387 | 1.02141 | 0.93403 | 0.84421 | 0.76648 | 0.7164 | 0.65793
A11 | 0.73759 | 1.21238 | 1.20848 | 1.14285 | 1.04405 | 0.94368 | 0.85827 | 0.80007 | 0.73414
A12 | 0.77081 | 1.26825 | 1.27078 | 1.20027 | 1.08885 | 0.98992 | 0.89568 | 0.83835 | 0.76777
A13 | 0.87228 | 1.44838 | 1.44172 | 1.36625 | 1.24264 | 1.16437 | 1.02154 | 0.96022 | 0.88082
A14 (IFSD) | 0.87036 | 1.38778 | 1.89513 | 2.20693 | 1.39472 | 1.16454 | 0.94968 | 0.90273 | 0.8214
A15 | 0.88133 | 1.46329 | 1.45699 | 1.37967 | 1.25416 | 1.13861 | 1.03229 | 0.96385 | 0.88472
A16 | 0.87031 | 1.45111 | 1.44352 | 1.36779 | 1.24258 | 1.12948 | 1.02651 | 0.95893 | 0.8783
Average | 0.81963 | 1.35032 | 1.3757 | 1.34356 | 1.17985 | 1.07174 | 0.96707 | 0.9155 | 0.81927

Table 4. The MAE values of the variants by random experiments (columns give the cardinality of the testing set)

Variant | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90
A1 | 0.49872 | 0.49422 | 0.49167 | 0.49055 | 0.49372 | 0.49296 | 0.49359 | 0.49602 | 0.49624
A2 | 0.4966 | 0.49222 | 0.49049 | 0.48833 | 0.49261 | 0.49106 | 0.4916 | 0.49409 | 0.49446
A3 | 0.48595 | 0.48074 | 0.4841 | 0.47383 | 0.4824 | 0.47807 | 0.47891 | 0.48219 | 0.4847
A4 | 0.50024 | 0.49817 | 0.491 | 0.49299 | 0.49411 | 0.49316 | 0.49389 | 0.49624 | 0.49567
A5 | 0.49872 | 0.49421 | 0.4917 | 0.49057 | 0.49365 | 0.49295 | 0.49362 | 0.49606 | 0.49626
A6 | 0.49635 | 0.49202 | 0.49038 | 0.48815 | 0.49241 | 0.49081 | 0.49139 | 0.49392 | 0.49425
A7 | 0.4801 | 0.47099 | 0.47765 | 0.4648 | 0.48028 | 0.46979 | 0.47604 | 0.47761 | 0.48173
A8 | 0.49907 | 0.49757 | 0.48979 | 0.4921 | 0.49356 | 0.49182 | 0.49268 | 0.49488 | 0.494
A9 | 0.4987 | 0.4942 | 0.49168 | 0.49056 | 0.49366 | 0.49295 | 0.4936 | 0.49604 | 0.49624
A10 | 0.49641 | 0.49205 | 0.49038 | 0.48819 | 0.49245 | 0.49089 | 0.49143 | 0.49395 | 0.49427
A11 | 0.48488 | 0.47994 | 0.48348 | 0.47282 | 0.48193 | 0.47711 | 0.47799 | 0.48128 | 0.48379
A12 | 0.49953 | 0.49777 | 0.49027 | 0.49242 | 0.49382 | 0.4922 | 0.49303 | 0.49542 | 0.49462
A13 | 0.49894 | 0.49438 | 0.4919 | 0.49072 | 0.49378 | 0.49321 | 0.49384 | 0.4963 | 0.49646
A14 (IFSD) | 0.49685 | 0.49238 | 0.49081 | 0.48854 | 0.4927 | 0.49138 | 0.49191 | 0.49444 | 0.49473
A15 | 0.48616 | 0.4807 | 0.48444 | 0.47398 | 0.48258 | 0.47843 | 0.47919 | 0.48258 | 0.48501
A16 | 0.50106 | 0.4985 | 0.49165 | 0.49357 | 0.49432 | 0.49402 | 0.49459 | 0.49715 | 0.49632
Average | 0.49489 | 0.49063 | 0.48884 | 0.48576 | 0.4905 | 0.48818 | 0.48921 | 0.49176 | 0.49242

Furthermore, A2 is the best variant in
terms of the computational time. In order to validate the efficiency of the variants, we have conducted experiments with another validation method. Tables 4 and 5 show the MAE values and the computational time of the variants by random experiments, respectively. The remarks about the superiority of A7 and of other variants such as A3, A11 and A15 remain intact. The results clearly show that the ideal cardinality of the testing set should be 40, or within the range [20, 60].

Table 5. The computational time of the variants by random experiments (sec)

Dataset     10       20       30       40       50       60       70       80       90
A1          0.27702  0.50971  0.70282  0.84714  0.9358   0.96953  1.0147   1.03242  1.002
A2          0.22733  0.42011  0.57596  0.70389  0.80834  0.86197  0.89381  0.90212  0.88085
A3          0.27488  0.50733  0.69197  0.82985  0.93538  1.00034  1.04289  1.05397  1.02964
A4          0.27221  0.49151  0.67851  0.82521  0.94585  1.00412  1.04741  1.05864  1.03186
A5          0.46429  0.83186  1.14413  1.39252  1.58242  1.70361  1.76746  1.7912   1.74536
A6          0.43112  0.7901   1.08339  1.32589  1.50208  1.62146  1.6842   1.7068   1.66037
A7          0.48753  0.87638  1.19806  1.466    1.66725  1.78988  1.85808  1.89346  1.84491
A8          0.46789  0.86537  1.18301  1.44347  1.63351  1.75908  1.83507  1.857    1.80743
A9          0.33304  0.61239  0.82699  1.01144  1.1541   1.24604  1.2845   1.30123  1.26806
A10         0.28602  0.52829  0.72495  0.88395  1.00826  1.07822  1.12394  1.14231  1.10832
A11         0.33092  0.60442  0.81761  0.99718  1.12683  1.21263  1.26343  1.27956  1.24964
A12         0.32278  0.59332  0.81626  0.99415  1.1277   1.21261  1.26634  1.28096  1.24248
A13         0.37994  0.69138  0.9456   1.16695  1.32588  1.42422  1.48078  1.49851  1.46627
A14 (IFSD)  0.33401  0.61902  0.84867  1.0418   1.17979  1.26936  1.31853  1.33657  1.30283
A15         0.37816  0.69674  0.95819  1.17251  1.33447  1.43551  1.49154  1.51599  1.47189
A16         0.37537  0.69527  0.95693  1.16493  1.32474  1.42471  1.48255  1.49864  1.46174
Average     0.35266  0.64583  0.88457  1.07922  1.22453  1.31333  1.36595  1.38434  1.34835

5 Conclusions

In this paper, we concentrated on improving the accuracy of medical diagnosis in the health care support system. We have shown that the Intuitionistic Fuzzy Recommender System (IFRS) and the hybridization between IFRS and a picture fuzzy clustering method are efficient tools to achieve this goal. Nonetheless, both methods rely on an important assumption in IFRS, namely that the numbers of intuitionistic linguistic labels in the features of patients, in the characteristics of symptoms and in the diseases are the same, which is unrealistic in practical applications. Therefore, in this paper we proposed a new term, the intuitionistic fuzzy vector (IFV), in place of the existing intuitionistic fuzzy matrix (IFM) in IFRS. Then, a generalization of the existing multi-criteria IFRS, the so-called Modified multi-criteria IFRS (MMC-IFRS), which takes the IFV into account, has been presented. Two new measures, namely the intuitionistic value similarity measure (IvSM) and the intuitionistic vector similarity measure (IVSM), have been defined. Some mathematical properties of these new terms were examined, and several IVSM functions were proposed. The performances of these IVSM functions for medical diagnosis were experimentally validated and compared with the existing similarity degrees of IFRS. Our contributions, including the definition of the new system for medical diagnosis, some interesting theorems and the performance evaluation of 16 IVSM functions, were presented accordingly.

The findings from the research are summarized as follows. Firstly, the modification did improve the accuracy of the system. Several of the new IVSM functions achieve better MAE values than IFSD, which is currently used in IFRS to predict possible diseases for patients. The results from Tables 2 to 5 demonstrate that the values of IFSD in the variant A14 are worse than some values of other variants. Secondly, the best variants in terms of the
accuracy are A7, A3, A11 and A15. Thirdly, the best variants in terms of the computational time are A1, A2, A3, A4 and A10. Fourthly, the ideal number of folds for validation should be selected within the range [8, 9], especially 9. Fifthly, the ideal cardinality of the testing set should be 40, or within the range [20, 60]. Lastly, all variants are stable across the cross validation methods. These findings would help researchers choose appropriate algorithms and variants for specific purposes in the health care support system.

Further work on this research could proceed in the following directions. Firstly, a variation of the IVSM-function-based algorithm that tackles the deficiency in processing small-sized real datasets should be studied. Secondly, a hybrid algorithm with a picture fuzzy clustering method to enhance the accuracy will be considered. Thirdly, more theoretical analyses of the MMC-IFRS, especially its adaptation to other fuzzy operators such as t-norms and t-conorms, will be examined. Lastly, the variants in this paper could be applied to other problems, e.g. time series forecasting and nowcasting. These future works will enrich the knowledge of deploying advanced fuzzy recommender systems for practical problems.
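As a quick check of the second and fifth findings, the following Python sketch ranks a few variants by their mean MAE over the random experiments and reports the testing-set cardinality at which each attains its lowest MAE. It is an illustration only and not part of the original experiments: the numbers are copied from the table of MAE values by random experiments, while the names SIZES, MAE, mean, best_size, ranking and best_sizes are assumptions introduced here.

```python
# Testing-set cardinalities (the column headers of the MAE table
# for random experiments).
SIZES = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# MAE rows copied from that table for the variants named in the findings.
# A14 corresponds to the existing IFSD measure.
MAE = {
    "A3":  [0.48595, 0.48074, 0.4841, 0.47383, 0.4824, 0.47807, 0.47891, 0.48219, 0.4847],
    "A7":  [0.4801, 0.47099, 0.47765, 0.4648, 0.48028, 0.46979, 0.47604, 0.47761, 0.48173],
    "A11": [0.48488, 0.47994, 0.48348, 0.47282, 0.48193, 0.47711, 0.47799, 0.48128, 0.48379],
    "A14": [0.49685, 0.49238, 0.49081, 0.48854, 0.4927, 0.49138, 0.49191, 0.49444, 0.49473],
    "A15": [0.48616, 0.4807, 0.48444, 0.47398, 0.48258, 0.47843, 0.47919, 0.48258, 0.48501],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def best_size(row):
    """Return the cardinality at which the given MAE row is smallest."""
    index = min(range(len(row)), key=row.__getitem__)
    return SIZES[index]


# Variants ordered from the lowest to the highest mean MAE.
ranking = sorted(MAE, key=lambda name: mean(MAE[name]))

# Cardinality with the lowest MAE, per variant.
best_sizes = {name: best_size(row) for name, row in MAE.items()}

if __name__ == "__main__":
    print("ranking by mean MAE:", ranking)
    print("best cardinality per variant:", best_sizes)
```

On these rows, A7 has the lowest mean MAE, A14 (IFSD) has the highest of the five, and every listed variant attains its minimum at a cardinality of 40, in line with the findings summarized above.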