Vietnamese summary: Recommender system based on the statistical implicative field.
UNIVERSITY OF DANANG
UNIVERSITY OF SCIENCE AND TECHNOLOGY

NGUYEN TAN HOANG

RECOMMENDER SYSTEM BASED ON STATISTICAL IMPLICATIVE FIELD

Specialization: Computer Science
Code: 48 01 01

DOCTORAL THESIS SUMMARY

Danang – 2022

The dissertation is completed at: UNIVERSITY OF SCIENCE AND TECHNOLOGY, UNIVERSITY OF DANANG

Academic Instructors:
Associate Professor Huynh Xuan Hiep, PhD
Huynh Huu Hung, PhD

Opponent 1: ……
Opponent 2: ……
Opponent 3: ……

The dissertation will be defended before the Board of thesis review meeting at: University of Science and Technology – The University of Da Nang
At: hour … day … month … year …

The dissertation is available at:
- National Library
- Information and Learning Center, University of Da Nang

PREFACE

1. The urgency of the thesis

In the online world, where information grows exponentially with the rise of e-commerce, online storage services and information delivery services, finding the right information for a given need is a challenge for users who must make the right decisions. For businesses and organizations in services and commerce, earning customers' trust in search results is extremely important and genuinely difficult. Recommender systems have quickly proven to be a very useful tool for providing necessary and relevant information to users and to commercial and service providers in such situations. They support effective decision making and save time and effort. However, to meet the increasing demand for both the quality and the quantity of recommendations, current research focuses on new recommender algorithms and on overcoming the limits and weaknesses of existing recommender system approaches. This thesis proposes a new recommender model based on the statistical implication field in order to improve the accuracy and processing time of the recommendations as well as expand
the recommendation capacity to reflect the relationship between users/items at a given degree of implication. The proposed models are deployed and tested on standard data sets to evaluate and compare the results with other effective models.

2. Objectives, objects and scope of research of the thesis

2.1 Research objectives

The objective of the thesis is to survey recommender systems and study the basic content of statistical implication, especially implicative variation and implication fields, as a basis for researching and proposing an implication rule mining framework (mining association rules that satisfy a statistical implication condition). From that, we propose applying the implication rule mining framework to build a recommender model based on the implication field.

2.2 Research subjects

The subjects of the thesis include: the measures of implication variation in the implication field formed from the process of statistical implication variation; collaborative filtering recommender models based on implication variation; and recommender models based on the statistical implication field.

2.3 Research scopes

The scope of the study is: learning the theory of statistical implication analysis, especially implication variation and the statistical implication field; collaborative filtering recommenders and studies on recommender systems based on statistical implication analysis, which serve as the basis for the proposal; and proposing new recommender models based on the implication field that can be applied to both binary and non-binary data and improve recommender efficiency (as measured by the accuracy of item prediction, the classification of recommended items, and predicted item ratings).

3. Research methodology

Literature review and experiment are the two main research methods used in this dissertation.

4. Contribution of the thesis

- Firstly, propose a set of measures of statistical implication variation (including four measures of implication index variation and four measures of
implication intensity variation) to serve as a basis for building implication rule mining frameworks and recommender models.
- Secondly, propose an implication association rule mining framework (implication rule) that integrates the association rule mining framework (which uses support and confidence) with the implication variation measure.
- Thirdly, propose recommender models, including: (1) the recommender model based on association rule mining using implication variation, which generates recommendations from the implication equipotential surfaces of association rules and is applied to binary data sets; (2) the recommender model based on the statistical implication field, developed from the model in (1) and the implication rule mining framework, which can be applied to both binary and non-binary data according to significance levels of statistical implication on association rules, users, and data items.
- Fourthly, propose data partitioning based on the items evaluated in each transaction, instead of partitioning by the number of transactions in the data set, to improve the quality of training and evaluation of the recommender model; this method is applied to the implication field-based recommender model.
- Finally, develop the implicationfieldRS tool to build, train and evaluate the recommender system, and test scenarios for the proposed recommender model using this tool.

5. Thesis structure

The thesis is organized as follows. The opening part introduces the urgency, objectives, objects, research scope and research methods of the thesis.
Chapter 1: An overview of the statistical implicative field and recommender systems.
Chapter 2: Recommender system models based on the implication field, including a collaborative filtering recommender model based on implication variation and a recommender model based on the statistical implication field.
Chapter 3: Experiments and evaluation of the results.
The conclusion
part includes the main contributions and future work. Appendices include: (1) a proof of the asymmetry of the measures of statistical implication; and (2) a proof of the equivalence of the implication index formulas in the case of binary data.

CHAPTER 1. AN OVERVIEW OF STATISTICAL IMPLICATION FIELD AND RECOMMENDER SYSTEM

1.1 Statistical implication analysis

Statistical implication analysis (SIA) is a method for studying rule-like relationships between variables and/or between variables and rules, proposed by Regis Gras in the 1990s. SIA proposes implication measures that are statistical, asymmetric and nonlinear, and that rely on statistical probability to evaluate the relationship between data variables.

1.1.1 Statistical implication measures

SIA includes two main measures to evaluate the degree of implication of the relationship $a \to b$. The first is the implication index, given by formula (1.1):

\[
q(a,\bar{b}) =
\begin{cases}
\dfrac{n_{A\bar{B}} - \dfrac{n_A n_{\bar{B}}}{n}}{\sqrt{\dfrac{n_A n_{\bar{B}}}{n}}}, & \text{if } a, b \in \{0,1\} \\[3ex]
\dfrac{\sum_{i \in E} a(i)\,\bar{b}(i) - \dfrac{n_A n_{\bar{B}}}{n}}{\sqrt{\dfrac{(n^2 s_A^2 + n_A^2)(n^2 s_{\bar{B}}^2 + n_{\bar{B}}^2)}{n^3}}}, & \text{if } a, b \in [0,1]
\end{cases}
\tag{1.1}
\]

The second is the implication intensity, given by formula (1.2):

\[
\varphi(a,b) =
\begin{cases}
\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{q(a,\bar{b})}^{\infty} e^{-t^2/2}\,dt, & \text{if } n_B \neq n \\[2ex]
0, & \text{otherwise}
\end{cases}
\tag{1.2}
\]

The lower the implication index, the higher the implication intensity and the higher the level of implication.

1.1.2 Implication index variation and implication field

The variation of $q(a,\bar{b})$ with respect to the variables $(n, n_A, n_B, n_{A\bar{B}})$ creates a scalar field $C$ whose differential, in Fréchet's geometric sense, is expressed by formula (1.3):

\[
dq = \frac{\partial q}{\partial n}\,dn + \frac{\partial q}{\partial n_A}\,dn_A + \frac{\partial q}{\partial n_B}\,dn_B + \frac{\partial q}{\partial n_{A\bar{B}}}\,dn_{A\bar{B}} = \operatorname{grad} q \cdot dM
\tag{1.3}
\]

where $M$ is a point with coordinates $(n, n_A, n_B, n_{A\bar{B}})$ in the field $C$, $dM$ is the vector of differential components of the variation, and $\operatorname{grad} q$ is the vector of partial derivatives of the variation. This gradient field satisfies the Schwarz criterion for mixed derivatives for each pair of variables $X, Y \in \{n, n_A, n_B, n_{A\bar{B}}\}$, as in formula (1.4), and is called the implication field in
this thesis:

\[
\frac{\partial}{\partial X}\left(\frac{\partial q(a,\bar{b})}{\partial Y}\right) = \frac{\partial}{\partial Y}\left(\frac{\partial q(a,\bar{b})}{\partial X}\right)
\tag{1.4}
\]

The implication field generated from the variation of the implication index consists of the set of equipotential surfaces of implication rules with the same statistical implication value, determined by equation (1.5):

\[
q(a,\bar{b}) - \frac{n_{A\bar{B}} - \dfrac{n_A n_{\bar{B}}}{n}}{\sqrt{\dfrac{n_A n_{\bar{B}}}{n}}} = 0
\tag{1.5}
\]

1.2 Recommender system

1.2.1 Definition

A recommender system consists of a set of users $U$ and a set of items $I$. The ratings in the system are represented by a matrix $R_{U \times I}$, and the set of possible rating values is $S$ (scores). The recommender system model is built as a function (formula (1.6)):

\[
f: U \times I \to S
\tag{1.6}
\]

Its task is to predict the rating $f(u, i)$ of a user $u \in U$ for a new item $i \in I$. This function is then used to recommend to the target user $u_a$ the item $i^*$ with the highest estimated rating, as in formula (1.7), where $I_u$ is the set of items the target user has already rated:

\[
i^* = \arg\max_{j \in I \setminus I_u} f(u_a, j)
\tag{1.7}
\]

1.2.2 Evaluation

Recommender models are evaluated using one of the following approaches: splitting, bootstrapping and k-fold cross-evaluation. Two groups of measures are commonly used to evaluate recommender systems: rating prediction accuracy measures (MAE, MSE, RMSE) and classification accuracy measures for recommended items (precision, recall, F1).

1.2.3 Classification

In terms of techniques, recommender systems are built using content-based filtering; collaborative filtering, including memory-based (user-based, item-based) and model-based (building machine learning models for recommendation); other techniques; and hybrids of these techniques. Among them, the most commonly used and effective technique is collaborative filtering.

1.2.4 Research status and recommendations

The aim is to learn about the research and development of recommender systems in general and of collaborative filtering recommender systems in particular, especially collaborative filtering recommender systems
based on association rule mining and collaborative filtering models based on statistical implication analysis, and then point out their limitations and propose a research direction for building recommender systems based on the statistical implication field.

1.3 Chapter summary

Chapter 1 focuses on (1) statistical implication analysis, especially implication variation and the statistical implication field; and (2) recommender systems, including their definition, classification, evaluation and application domains. The chapter also presents the weaknesses of existing recommender systems based on rule mining and statistical implication analysis as the basis for sketching the research proposals.

CHAPTER 2. RECOMMENDATION MODELS BASED ON STATISTICAL IMPLICATION FIELD

2.1 Collaborative filtering recommendation model based on implicative variation

2.1.1 The problems of rule-based recommender system

For recommender systems, association rule mining (ARM) algorithms face a number of problems that make the quality of the rules insufficient for recommendation:
(1) the ARM framework only deals with binary data;
(2) the time and rule-quality requirements of the recommendation problem have not been met;
(3) the confidence of a rule is insensitive and does not show the correlation between premise and consequence;
(4) symmetrical measures such as confidence, lift and some other measures are not suitable for recommendation problems, where the roles of items/users are not always the same;
(5) support decreases as the size of the rule increases;
(6) the number of rules generated increases exponentially with the number of items; and
(7) the association rule mining framework does not take the number of counter-examples into account, whereas in fact the higher the number of confirming examples ($n_{AB}$) and the lower the number of counter-examples ($n_{A\bar{B}}$), the stronger the rule.

Therefore, using statistical implication analysis measures is a
possible solution to address these limitations.

2.1.2 Statistical implication variation measure and the threshold of implication variation

Measures are one of the key elements in building recommendation models. For a collaborative filtering recommender model based on association rule mining using implication variation, in addition to the framework measures used to mine the rules, such as support and confidence, it is necessary to build a measure of implication variation that filters out the set of implicative equipotential surfaces of the rules on which the model's recommendations are based.

Statistical implication variation measure

The proposed measures used for the recommender model based on association rule mining using implication variation include measures of the variation of the implication index $q(a,\bar{b})$ and of the implication intensity $\varphi(a,b)$ according to the factors $n$, $n_A$, $n_B$ and $n_{A\bar{B}}$, described in Table 1.1.

Table 1.1 Statistical implication variation measure

Measure | Description | Formula
$q_n$ | Implication index variation according to $n$ | $q(a,\bar{b}) + \Delta q_n = q(a,\bar{b}) + \dfrac{1}{2\sqrt{n}} \cdot \dfrac{n_{A\bar{B}} + \frac{n_A n_{\bar{B}}}{n}}{\sqrt{n_A n_{\bar{B}}}}$
$q_{n_A}$ | Implication index variation according to $n_A$ | $q(a,\bar{b}) + \Delta q_{n_A} = q(a,\bar{b}) - \dfrac{1}{2}\, n_{A\bar{B}} \left(\dfrac{n_{\bar{B}}}{n}\right)^{-1/2} n_A^{-3/2} - \dfrac{1}{2} \left(\dfrac{n_{\bar{B}}}{n}\right)^{1/2} n_A^{-1/2}$
$q_{n_B}$ | Implication index variation according to $n_B$ | $q(a,\bar{b}) + \Delta q_{n_B} = q(a,\bar{b}) + \dfrac{1}{2}\, n_{A\bar{B}} \left(\dfrac{n_A}{n}\right)^{-1/2} (n - n_B)^{-3/2} + \dfrac{1}{2} \left(\dfrac{n_A}{n}\right)^{1/2} (n - n_B)^{-1/2}$
$q_{n_{A\bar{B}}}$ | Implication index variation according to $n_{A\bar{B}}$ | $q(a,\bar{b}) + \Delta q_{n_{A\bar{B}}} = q(a,\bar{b}) + \dfrac{1}{\sqrt{\frac{n_A (n - n_B)}{n}}}$
$\varphi_n$ | Implication intensity variation according to $n$ | $\varphi(a,b) + \Delta\varphi_n = \varphi(a,b) + \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{q_n(a,\bar{b})}^{q(a,\bar{b})} e^{-t^2/2}\,dt$
$\varphi_{n_A}$ | Implication intensity variation according to $n_A$ | $\varphi(a,b) + \Delta\varphi_{n_A} = \varphi(a,b) + \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{q_{n_A}(a,\bar{b})}^{q(a,\bar{b})} e^{-t^2/2}\,dt$
$\varphi_{n_B}$ | Implication intensity variation according to $n_B$ | $\varphi(a,b) + \Delta\varphi_{n_B} = \varphi(a,b) + \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{q_{n_B}(a,\bar{b})}^{q(a,\bar{b})} e^{-t^2/2}\,dt$
$\varphi_{n_{A\bar{B}}}$ | Implication intensity variation according to $n_{A\bar{B}}$ | $\varphi(a,b) + \Delta\varphi_{n_{A\bar{B}}} = \varphi(a,b) + \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{q_{n_{A\bar{B}}}(a,\bar{b})}^{q(a,\bar{b})} e^{-t^2/2}\,dt$

Threshold of statistical implication variation

In the experiment, on a
equipotential surface consisting of a set of rules whose implication values are approximately the same with respect to an implication threshold $\theta$, this threshold of implication variation needs to be determined. Depending on the measure, there is a threshold of implication index variation and a threshold of implication intensity variation.

Figure 2.3 Recommender system model evaluation procedure (components: dataset, training set, testing set, recommender algorithms, recommender model, result of model, evaluating measures, evaluation model, evaluating result)

Here, k-fold cross-evaluation (with k=5), with repetition, is the method used, and the data is divided into training and test sets according to the number of transactions in the data set. The evaluation procedure is depicted in the flowchart in Figure 2.4; the evaluation measures used fall into two groups: (1) predictive accuracy (MAE, MSE and RMSE) and (2) classification accuracy of recommended items (precision, recall and F1).

Figure 2.4 Flowchart of the recommender system evaluation algorithm

2.2 Recommender system model based on statistical implication field

2.2.1 Problems about recommender systems based on statistical implication analysis

Existing recommender models based on statistical implication analysis, including association rule mining recommender models using statistical implication variation, enrich the body of solutions for improving the efficiency of collaborative filtering recommender systems. However, they still have limitations: (1) they only process binary data, which leaves combinatorial explosion and information loss when handling non-binary data as problems to be solved; (2) in the rule mining-based models of these works, implication measures are all applied in the post-processing stage of the rule mining task; as a result, they do not contribute significantly to limiting the combinatorial explosion of rules on large data sets, which require
large processing time and storage space. To overcome these limitations, the recommender model based on the statistical implication field is proposed by developing and improving the recommender model based on association rule mining using implication variation.

2.2.2 Implication rule and implication rule mining framework

The recommender model based on the statistical implication field extends the association rule mining framework into the implication rule mining framework, as follows.

Modeling the quantitative implication rule

To overcome the limitation of the association rule mining framework on non-binary data, the quantitative implication rule (hereinafter referred to as the implication rule) is built on frequent item sets and must satisfy the support and confidence thresholds as well as the implication variation condition during rule generation. This handles non-binary data and effectively limits combinatorial explosion during rule generation. Like an association rule, an implication rule is modeled as in equation (2.5):

\[
\mathcal{R}_{IMP} = \left\{ (n, n_A, n_B, n_{A\bar{B}}) \;\middle|\;
\begin{array}{l}
0 \le n_A \le n_B \le n,\quad 0 \le n_{A\bar{B}} \le n_A, \\
\mathrm{length}(\mathcal{R}_{IMP}) \le k,\quad |\mathrm{rhs}(\mathcal{R}_{IMP})| = 1, \\
\mathit{support} \ge \mathit{minsupp},\quad \mathit{confidence} \ge \mathit{minconf}, \\
\text{SIA measure } \mathit{imp} \;\Re\; \mathit{threshold}
\end{array}
\right\}
\tag{2.5}
\]

where $\Re$ is determined by equation (2.6):

\[
\Re =
\begin{cases}
\text{``}\le\text{''}, & \mathit{imp} \in \left\{ \dfrac{\partial q(a,\bar{b})}{\partial \xi} \;\middle|\; \xi \in (n, n_A, n_B, n_{A\bar{B}}) \right\} \\[3ex]
\text{``}\ge\text{''}, & \mathit{imp} \in \left\{ \dfrac{\partial \varphi(a,b)}{\partial \xi} \;\middle|\; \xi \in (n, n_A, n_B, n_{A\bar{B}}) \right\}
\end{cases}
\tag{2.6}
\]

Modeling the implication rules mining framework

The implication rules are mined by the implication rule mining framework, which is developed from the association rule mining framework as shown in Figure 2.5 and modeled by formula (2.7).

Figure 2.5 Flowchart of the implication rule mining framework algorithm

\[
F_{\mathcal{R}_{IMP}} = \left\{
\begin{pmatrix} \text{IRM algorithms} \\ \mathit{support}\ s \\ \mathit{confidence}\ c \\ \text{SIA measure } \mathit{imp} \end{pmatrix}
\;\middle|\;
\begin{array}{l}
0 \le n_A \le n_B \le n, \\
0 \le n_{A\bar{B}} \le n_A, \\
s_{min} \le s,\quad c_{min} \le c, \\
\mathit{imp} \;\Re\; \mathit{imp}_{min}
\end{array}
\right\}
\tag{2.7}
\]

This framework works in the following steps: (1) use the Apriori algorithm to generate frequent item sets that satisfy the support threshold from the matrix $R_{UI}$ transformed
from the data set $D$; this step inherits the existing algorithm; (2) build the implication variation measures imp and integrate them into the rule mining framework to generate implication rules from frequent itemsets that satisfy both the minimum confidence threshold and the implication variation measures; (3) build and extract the equipotential surfaces according to the threshold of variation $\theta$ for recommendation.

2.2.3 Proposed model

The proposed statistical implication field-based recommender model is shown in Figure 2.6. It evolves from the recommender model based on association rule mining using implication variation through the following developments: (1) the implication rule mining framework, evolved from the association rule mining framework, generates implication rules from binary and non-binary data sets; (2) a data partitioning approach based on the number of items evaluated per transaction is added for building, training and evaluating recommender models, to improve model training and results; (3) the recommender system evaluation algorithm adds a group of evaluation measures based on the ranking position of recommended items (nDCG and RankScore), so that the evaluation reflects the effectiveness of the recommender model more thoroughly.

Figure 2.6 Recommender model based on Implication Field

2.2.4 Evaluation of the proposed model

The evaluation procedure of the recommender model is the same as that of the collaborative filtering recommender model based on implication variation, also using the k-fold cross-evaluation method, but with two important additions. (1) In addition to partitioning the observed data into training and test sets according to the number of transactions in the data set, the model adds a partitioning method according to the number of evaluation items in each transaction, to solve the "bottleneck" in determining the number of known
items in advance for very sparse data in recommendation problems; this increases the efficiency of model training and improves recommendation quality. (2) Position-ranking measures for items in the model's recommendation list, namely nDCG and RankScore, are added to evaluate recommendation quality, as shown in the model evaluation algorithm in Figure 2.7.

Figure 2.7 Flowchart of the recommendation model evaluation algorithm

2.3 Chapter summary

Chapter 2 proposes a new approach based on implication variation in the implication field to mine association rules for the collaborative filtering recommendation problem. The first proposal is a collaborative filtering recommender system model based on implicative variation, which improves the efficiency of rule-based and memory-based collaborative filtering recommender models on binary data sets. Next, the recommender model based on the implication field is proposed by upgrading the first model to handle non-binary data sets, further improving its performance compared with existing collaborative filtering recommender models and recommender models based on statistical implication analysis.

CHAPTER 3. EXPERIMENT AND RESULTS

This chapter presents the organization and implementation of the evaluation experiments. The models proposed in Chapter 2 are compared with memory-based and rule-based collaborative filtering recommender models, and also with previously proposed statistical implication recommender models.

3.2 Experimental tools

Experiments were performed with the implicationfieldRS tool, developed in the R language, which builds on the RecommenderLab package for building and evaluating recommender system models and on the Rchic package for processing statistical implication information.

3.3 Experiment of collaborative filtering recommender model based on implicative variation

The association rule mining recommender model using implicative variation is
built in two variants, item-based and user-based; both are therefore evaluated experimentally and compared with collaborative filtering models of the same two types. The implicationfieldRS tool was used to conduct the experiment.

3.3.1 Item-based recommender model

The model is tested on the MovieLens dataset after binarization with a rating threshold (ratings at or above the threshold are assigned one binary value and all other ratings the other). The model is evaluated offline and compared with collaborative filtering recommender models on two groups of evaluation measures, predictive accuracy (MAE, MSE and RMSE) and classification accuracy of recommendations (precision, recall and F1), according to the following experimental scenarios.

Scenario: Survey and recommendation based on implication variation equipotential surface

The model generates an implication field consisting of a set of implication equipotential surfaces of association rules satisfying the threshold of implication variation. These surfaces have irregular density: rules are concentrated in the equipotential surfaces whose implication index values vary less, and are sparser in the other equipotential surfaces. This shows the agreement of the rules with the trend of the implication index: when the implication index of a rule varies by a certain amount, so that the rule is no longer accepted at a given implication threshold, the rule moves to another equipotential surface with a more suitable implication threshold. This helps to recommend to users the data items with the most appropriate level of implication. A target user is recommended the movie or list of movies that he or she is likely to enjoy, based on the movies previously seen and the rules in the equipotential surfaces.

Scenario: Comparison of recommendation item prediction accuracy with collaborative filtering recommender models

The experimental results show
that the accuracy of predicting recommended items is best for the recommendation model based on association rule mining using implicative variation (ISF): the prediction error measures RMSE, MSE and MAE of the ISF model are the lowest, followed by the user-based collaborative filtering models using the Cosine measure (UBCFcosine) and the Pearson measure (UBCFpeason), and finally the item-based collaborative filtering models using the Cosine measure (IBCFcosine) and the Pearson measure (IBCFpeason). This shows that the measure of implicative variation and the association rule mining framework customized to satisfy the implication measure significantly improve the recommendation results of the association rule mining model.

Scenario: Comparison of classification accuracy with collaborative filtering recommender models

In terms of precision, recall and the ROC curve, the ISF model has better classification accuracy than the IBCFcosine, IBCFpeason and UBCFpeason models and is close to the accuracy of the UBCFcosine model.

3.3.2 User-based recommender model

The user-based association rule mining recommender model using implicative variation is evaluated in the same way, on the MovieLens dataset and on the same scenarios as the item-based model. The experimental results obtained on these scenarios are similar to those of the item-based model. Together, the item-based and user-based experiments show that the association rule mining recommender model using implicative variation contributes significantly to improving collaborative filtering recommendation based on association rule mining.

3.3 Experiment of recommender model based on statistical implication field

The
recommender model based on the statistical implication field was experimentally evaluated by the k-fold cross-validation method (with k = 5), repeated twice, on the binary MSWeb data set and the non-binary MovieLens dataset. These datasets are partitioned by the number of transactions and by the number of items evaluated in each transaction.

3.3.1 Experiment on data partitioned by the number of transactions

Scenario: Comparison of association rule model and implication rule model on binary data set

Compared with the collaborative filtering recommender model based on association rules, the recommender model based on the implication field performs much better on the classification measures precision, recall and F1, as well as on the ROC and precision/recall curves.

Scenario: Comparison of association rule model and implication rule model on quantitative dataset

On the quantitative data set, the classification accuracy of the IFARRS recommender model, measured by precision, recall and F1, is also much better than that of the recommender model based on association rule mining.

Scenario: Recommender performance and timing

This scenario compares the performance and recommendation generation time (including model building time and recommendation item prediction time) of the recommender model based on the implication field and the association rule mining model. Experiments show that the recommender model based on the statistical implication field is faster than the model based on association rule mining by 53% in model building time and by 37% in model execution time, while the generated rule set is reduced to about 9% of the rule set generated by the recommender model based on association rule mining. This meets the time requirement and yields a more manageable rule set for a recommender system.

Scenario: Comparison with collaborative filtering recommender models
on quantitative data set

In terms of classification accuracy, the statistical implication field-based recommender model gives superior results compared with the traditional item-based and user-based collaborative filtering recommender models using the Cosine and Pearson similarity measures.

3.3.2 Experiment on partitioned data according to the evaluation item of the transaction

Analysis of equipotential surfaces in the implication field

The survey results are presented as 3D scatter plots and 3D graphs, in which the most common equipotential surfaces, with implication intensity in the range 0.8 to 1.0, appear in warm colors (red), and the remaining scattered equipotential surfaces, with decreasing implication intensity, shade towards cool colors (blue). The results are also presented as a contour graph, in which the equipotential surfaces with implicative intensity concentrated in the range 0.8 to 1.0 are represented by the gray spectrum and the rest by a green gradient. The 3D view of the implicative intensity variation on the equipotential surfaces shows that implication patterns with high implicative intensity are concentrated on the warm-colored equipotential surfaces and decrease rapidly in the low-intensity region shown in blue. Common recommendations are filtered from high-intensity equipotential surfaces, while recommendations for rare and specific items are drawn from lower-intensity equipotential surfaces.

Scenario: Comparison with traditional recommendation models

In this scenario, the statistical implication field-based recommender system model (ISFRS) is compared with the traditional user-based collaborative filtering models using the Cosine (UBCF_cRS) and Pearson (UBCF_psRS) measures, and with the item-based collaborative filtering models using the Cosine (IBCF_cRS) and Adjusted Cosine
(IBCF_acRS) measures. The dataset used in this experiment is the non-binary MovieLens dataset. To give the collaborative filtering models good results, several neighborhood sizes k = 2, 5, 10, 15 were tested, and k = 15 was found to perform better than the other values. The recommendation models were tested on two groups of measures: classification and ranking. First, the models were tested on the classification accuracy measures, including the ROC curve, the precision/recall curve and F1: the ISFRS model is the best, followed by the user-based collaborative filtering models using the Pearson and Cosine measures, and the weakest are the item-based collaborative filtering models (with both the Pearson and the adjusted Cosine measures). The results of this experiment show that both the proposed ISFRS model and the proposed data partitioning method improve the classification and ranking ability, as well as the quality of the mined rules, compared with the models based on traditional collaborative filtering.

Scenario: Comparison with implication recommendation models

In this scenario, the binary MSWeb dataset is used to compare the implication field recommender system (ISFRS) model with two existing models that apply statistical implication analysis: the model using the implication index and implication intensity (IIIRS), and the model using the implicative measure Phi, the coherence measure Cohesion and the importance measure Gamma (PCGRS), on the same two types of measures as in the previous scenario. First, on the classification accuracy measures, including the precision/recall and ROC curves and F1, the experimental results show the superiority of the ISFRS recommendation model over the PCGRS and IIIRS models, with the IIIRS model the weakest on all measures. Second, on the ranking accuracy measures, the experimental results are quite similar to the
results on the group of classification accuracy measures: the 𝐼𝑆𝐹𝑅𝑆 model gives the best results on the 𝑛𝐷𝐶𝐺 and 𝑅𝑎𝑛𝑘𝑠𝑐𝑜𝑟𝑒 measures, followed by the 𝑃𝐶𝐺𝑅𝑆 model, and the 𝐼𝐼𝐼𝑅𝑆 model is the worst. This suggests that the recommendation system based on the statistical implication field is potentially better in both classification and rating than the existing statistical implication recommendation models. The experiment demonstrates that the proposed 𝐼𝑆𝐹𝑅𝑆 addresses the three problems of these systems. This is therefore a new and promising direction in applying statistical implication analysis theory to the field of recommender systems.

3.4. Chapter summary

This chapter focuses on organizing the experiments that evaluate the models proposed in Chapter 2, including the preparation of the data sets and experimental tools and the execution of the experimental scenarios. The proposed models are evaluated experimentally and compared with memory-based and rule-based collaborative filtering recommender models. They are also compared with the existing recommender models that follow the statistical implicative analysis approach. Experimental results show that the models proposed in the thesis contribute significantly to improving the effectiveness of the recommendation system.

CONCLUSION AND FUTURE WORKS

Results of the study

The thesis makes the following contributions to recommender systems:

Firstly, it proposes a set of measures of statistical implication variation for the collaborative filtering recommendation problem on binary and non-binary data sets.

Secondly, it proposes an implication rule mining framework that inherits the advantages of the association rule mining framework and uses the implication variation measures to improve the accuracy, scalability, and rule-mining time of the recommender model.

Thirdly, it proposes recommendation models based on the implication variation approach. The first is the implication variation-based collaborative
filtering recommendation model on binary datasets. The second is the recommendation model based on the statistical implication field, developed from the initial model to extend processing to non-binary data and to further improve the quality of the recommendations.

Fourthly, it proposes a new method for partitioning the dataset into training and test sets, based on the rated items in each transaction, to improve the efficiency of training and testing recommender models on sparse datasets.

Finally, it builds the implicationfieldRS experimental toolkit in the R language, together with the experimental scenarios used to evaluate the recommendation models in the thesis.

Future works

- Extending the implication rule mining framework to hyper-rules, that is, relationships between implication rules or between data and implication rules in the implication field, for the recommender models.
- Expanding data processing to other data types, such as vector data, for recommendation problems.
- Expanding the application of implication variation to other measures such as cohesion, typicality, and contribution.
- Developing a hybrid recommendation system that combines the implicative field-based recommender with other systems to improve the quality of recommendations.

PUBLISHED WORKS

[1] Hoang Tan Nguyen, Hung Huu Huynh, Hiep Xuan Huynh, Raphaël Couturier (2017), Recommended based on asymmetric user relations using TIMP (temporal implicative) measure, IX International Conference A.S.I. Analyse Statistique Implicative – Statistical Implicative Analysis (ASI9), France, pp. 493-507.
[2] Nguyễn Tấn Hoàng, Huỳnh Hữu Hưng, Huỳnh Xuân Hiệp (2017), “Tư vấn dựa biến thiên số hàm ý trường hàm ý”, Hội thảo quốc gia lần thứ X nghiên cứu ứng dụng Công nghệ thông tin (FAIR’17), Đà Nẵng, pp. 938-950.
[3] Nguyễn Tấn Hoàng, Huỳnh Hữu Hưng, Huỳnh Xuân Hiệp (2017), “Tư vấn lọc cộng tác theo mục dựa độ biến thiên số hàm ý trường hàm ý”, Hội thảo quốc gia lần thứ 20 nghiên cứu ứng dụng Công nghệ thông tin, Quy Nhơn,
pp. 372-379.
[4] Hoang Tan Nguyen, Hung Huu Huynh, Hiep Xuan Huynh (2018), Collaborative filtering recommendation with threshold value of the equipotential plane in implication field, the 2nd International Conference on Machine Learning and Soft Computing (ICMLSC2018), Phu Quoc Island, Vietnam, ISBN: 978-1-4503-6336-5, doi: 10.1145/3184066.3184072 (Scopus index).
[5] Hoang Tan Nguyen, Phan Phuong Lan, Hung Huu Huynh, Hiep Xuan Huynh (2019), Improved collaborative filtering recommendations using quantitative implication rules mining in implication field, the 3rd International Conference on Machine Learning and Soft Computing (ICMLSC2019), Dalat, Vietnam, ISBN: 978-1-4503-6612-0, doi: 10.1145/3310986.3310996 (Scopus index).
[6] Hoang Tan Nguyen, Hung Huu Huynh, Hiep Xuan Huynh (2018), Collaborative filtering recommendation in the implication field, International Journal of Machine Learning and Computing (IJMLC), 2018, doi: 10.18178/ijmlc.2018.8.3.690 (Scopus index).
[7] Hoang Tan Nguyen, Phan Phuong Lan, Hung Huu Huynh, Hiep Xuan Huynh (2019), Recommendation with quantitative implication rules, EAI Endorsed Transactions on Context-aware Systems and Applications, 2019, doi: 10.4108/eai.13-7-2018.156837.
[8] Hoang Tan Nguyen, Phan Phuong Lan, Hung Huu Huynh, Hiep Xuan Huynh (2021), Collaborative recommendation based on implication, International Journal of Advanced Computer Science and Applications, Vol. 12, No. 10, 2021 (Scopus index).
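
Supplementary illustration of the evaluation measures. The comparisons reported in Section 3.3.2 rely on classification accuracy measures (precision, recall, F1) and on rank-aware measures such as nDCG, but this summary does not restate how they are computed. The following is a minimal Python sketch for a single user's top-N list under binary relevance. It is not taken from the thesis's implicationfieldRS toolkit (which is built in R); the function names, the item identifiers, and the binary-gain form of nDCG are assumptions made only for this example.

```python
import math


def precision_recall_f1(recommended, relevant):
    """Classification accuracy of one top-N list against the held-out relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0 when there are no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def ndcg(recommended, relevant):
    """Binary-relevance nDCG: a hit at rank i (1-based) contributes 1 / log2(i + 1)."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended, start=1)
        if item in relevant
    )
    # Ideal ordering: every relevant item that fits in the list is placed at the top.
    ideal_hits = min(len(relevant), len(recommended))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical data: a top-4 recommendation list and the user's held-out items.
recommended = ["i3", "i7", "i1", "i9"]
relevant = {"i1", "i3", "i5"}

p, r, f1 = precision_recall_f1(recommended, relevant)
score = ndcg(recommended, relevant)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} nDCG={score:.3f}")
```

In an evaluation like the one described, such per-user values would typically be averaged over all test users and, for precision/recall curves, recomputed for several list lengths N; how the thesis aggregates them is not stated in this summary.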