A Model to Forecast Learning Outcomes for Students in Blended Learning Courses Based On Learning Analytics

Viet Anh Nguyen
VNU University of Engineering and Technology
E3, 144 Xuan Thuy, Cau Giay, Hanoi
vietanh@vnu.edu.vn

Quang Bach Nguyen
VNU University of Engineering and Technology
E3, 144 Xuan Thuy, Cau Giay, Hanoi
14020652@vnu.edu.vn

Vuong Thinh Nguyen
Vietnam Maritime University
484 LachTray - NgoQuyen - Haiphong
thinhnv@vimaru.edu.vn

ABSTRACT
One of the difficulties experienced by online learners is the lack of regular supervision and of instructions that support the learning process effectively. The analysis of learning data in online courses is becoming increasingly important, not only for forecasting learning outcomes but also for providing effective instructional strategies that help learners get the best results. In this paper, we propose a model that forecasts learning outcomes from learners' interaction with online learning systems and provides a learning analytics dashboard for both learners and teachers to monitor and orient online learners. This approach is mainly based on machine learning and data mining techniques. This research aims to answer two research questions: (1) Is it possible to accurately predict learners' learning outcomes based on their interactive activities? (2) How can learners be monitored and guided effectively in an online learning environment?
To answer these two questions, our model has been developed and tested with learners using the Moodle LMS. The results show that 75% of students have outcomes close to the predicted results with an accuracy of over 50%. These positive results, though obtained on a small scale, can serve as suggestions for studies that use learning analytics to predict learners' outcomes from their learning activities.

CCS Concepts
Applied computing ➝ Education ➝ E-learning

Keywords
Learning analytics, learning activities, learning outcomes, predictive modeling, forecast model

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICSET 2018, August 13–15, 2018, Taipei, Taiwan
© 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-6528-4/18/08…$15.00
DOI: https://doi.org/10.1145/3268808.3268827

INTRODUCTION
Online learning support systems are attracting increasing investment and development. There are not only many online learning websites such as edx.com and coursera.com but also many online learning platforms such as Moodle and Blackboard. Besides its advantages, online learning also has many shortcomings. Learners often do not have information about their learning outcomes at each stage of the learning process, so they cannot adjust their learning method properly. In addition, a learner does not receive timely support from the instructor. These things can make an online training process less effective. Teachers also do not have accurate information about
learners, such as their interaction level and comprehension level. Therefore, they cannot track their learners' progress and so cannot offer timely suggestions and orientation to help learners get the desired result. Moreover, teachers also lack learners' feedback on the lesson content and the learning activities, so they cannot adjust the content to help students get the highest results.
To improve the quality of online learning platforms, we need additional tools that help teachers and learners track the learning process, interactions, suggestions, and feedback, and that offer recommendations for effective learning. These tools must address the shortcomings of the online learning system: suggesting and regulating learners' interaction with the course is essential, and prompts should come in time to help learners adjust their interaction level and learning methods to achieve the desired results. An instructor also needs an overview of trainees' interaction with the course in order to remind them. Furthermore, teachers can rely on learners' interactions to adjust the way they deliver lessons.
In addition, forecasting learners' outcomes at each stage of the course plays an increasingly important role. Based on the forecast results, a learner can anticipate the final result and change his or her study habits to achieve better results in the following stages. In this study, we focus on a predictive model that forecasts student outcomes over time. Predictions of future learner outcomes are based on data about learners' interaction with the system, the learning outcomes of students who took previous courses, and learning analytics techniques.
Learning Analytics (LA) is a topic of increasing interest in the educational research communities [1]. Learning data is generated from more and more learning activities, but most of it is not used effectively. LA creates data mining tools and models that help improve online learning systems. LA includes five steps: data collection, reporting, prediction, action, and improvement [2]. LA provides not only useful information about the learning process and the relationships between learners but also specific actions, suggestions, and warnings that help learners improve their learning outcomes. LA makes it possible to collect and analyze data from various sources to provide information on what works and what does not in teaching and learning. LA allows individuals and organizations to understand learning and make informed decisions about the allocation of resources and the interventions needed to promote learner success [3].
In this paper, we propose a model that predicts learning outcomes over time and alerts learners, based on learning analytics. We use machine learning algorithms to evaluate learners' interactions, predict their learning outcomes, and help teachers give learners suggestions and warnings about their learning progress. In addition, this study also focuses on clarifying the influence of interactions between learners and online learning systems on the learning
outcomes of learners. The rest of the paper is structured as follows:
- The next section presents research related to predicting and forecasting results based on learning analytics.
- In section 3, we describe our model for forecasting results based on learners' interactive activities.
- Results of the research and model testing are presented in section 4, discussion and implications are presented in section 5, and the conclusions come last.

LITERATURE REVIEW
To deal with the variety of collected data, LA offers many techniques for achieving users' goals. The most used technique is building a predictive model, which can forecast uncertain future events [4]. A predictive model helps learners understand their learning process more deeply and suggests what they should do next. Administrators can use forecasting models to predict learning trends and forecast the number of courses that learners need [5]. Commonly used algorithms for forecasting models include Linear Regression, Logistic Regression [6], Decision Trees [7], and Naïve Bayes classifiers. But which algorithm is the most appropriate?
Comparing algorithms can help identify the best algorithm for a forecasting model. I. Babic [3] compared three algorithms: SVM, Classification Tree (CART), and Artificial Neural Networks (RBF model). The classification models were divided into two groups according to SDI (Self Determination Index) with four attributes: V1 - assign view, V2 - forum view discussion, V3 - questionnaire view, V4 - resource view. The RBF model achieved the highest result, 76.92%, making it the most suitable model for dynamic learning prediction in that comparison. As machine learning algorithms develop, forecasting models are also improving, and their reliability is increasing.
The content taught by teachers is becoming more and more diversified. These data contain a lot of information about teaching content, interactive content, and so on. They can help analysts understand learners' trends and the quality of the material, which helps to improve teaching and learning. In LA, content analysis is defined as a set of automated methods to examine, evaluate, index, filter, present, and visualize different types of digital learning content, regardless of its producer (such as instructors or students), with the goal of understanding learning and improving educational practice and research [9].
Recently, many methods and algorithms have been proposed and successfully implemented to predict learning outcomes. Bovo and colleagues [10] created a tool to extract data from Moodle and used clustering algorithms to split learners' data into groups. The research showed that clustering algorithms yield clusters, but the clusters are difficult to distinguish from one another and do not describe the data sufficiently. Consequently, a clustering algorithm alone is not enough to predict student learning outcomes. We need to describe the differences between clusters in order to evaluate them and make predictions. Another predictive model is the logistic regression approach suggested by Barber and Sharkey [6]. In this
study, the user data, which were the input to logistic regression, were divided several times into low-risk and high-risk groups using different attributes. The discrimination rate between the two groups was about 98%. With the logistic regression algorithm, we can predict the probability of passing or failing and provide timely information to the instructor, but learners still cannot know their own learning ability in order to adjust their learning. To classify learners, we can use many other classification algorithms such as Decision Tree and Naive Bayes. Romero [11] compared different classification algorithms to evaluate their performance when predicting learning outcomes from user interaction with an online learning system. The author concludes that no single algorithm achieves the best classification accuracy in all cases. The accuracy of the algorithms is not high, only about 65%, so it is difficult to predict with classification algorithms alone.
Also on forecasting learner performance, the study of Ali Daud and colleagues [12] provides another perspective. The authors argue that family financial factors have a great impact on performance and enthusiasm for learning, so the paper focuses on attributes related to the family's spending and financial ability. The research applies classification algorithms to attributes such as academic performance, family income, family assets, student personal information, and family expenditures. The SVM algorithm was found to be the most suitable for this problem.
Some studies rely on earlier time periods to predict present outcomes. Thakur and colleagues [13] used an autoregressive model, which is based on the idea that present values depend on past values, to analyze time series and to explore and describe relative variability in classes. The authors proposed predictive models based on Logistic Regression and a Feed-Forward Neural Network. The Feed-Forward Neural Network gave
better results, especially in week 1; its accuracy averaged over the three weeks was 84%. Through their observations, they found that the homework attribute influenced the results most, and they added the "post" attribute from the forum. With this attribute added, prediction accuracy was higher for both algorithms. The study by Althaf Hussain Basha and colleagues [14] suggested that the number of students passing a course depends on factors from the previous school year, using the formula OP = Cr * SVOP * 1.05, where OP is the predicted number of students passing, Cr is the number of students enrolled, and SVOP is the ratio of students passing to students taking part in the previous year. The error rate of the predicted number of passing students was 7-8%.
Zacharis [15] used a linear regression model to calculate students' grades. The author found that the four attributes with the highest correlation with the grade, which were used to build the regression model, were reading and posting messages, content creation contribution, quiz effort, and files viewed. Next, the author used Binary Logistic Regression to predict which students were at risk of failing and achieved an accuracy of 81.3%.
From the above studies, we can see some problems that need to be resolved: (1) Algorithms cannot yet accurately predict learners' outcomes. (2) With only a single algorithm, it is difficult to predict learners' outcomes. (3) Data selection and pre-processing (cleaning, standardization, etc.)
are very important and have a great impact on the results achieved.

METHODS
3.1 Participants
The participants in this study are 290 second- and third-year students of a 4-year program who took part in three online courses built on Moodle. All are information technology students at the Faculty of Information Technology of the VNU University of Engineering and Technology, so they are proficient in using computers and online courses. Each course ran for 15 weeks in blended-learning form: 81 students enrolled in the first course, 150 in the second, and 59 in the third.
3.2 The forecast learning outcomes model
Our learning outcomes model focuses on some key elements of student engagement in online learning. The number of views/posts by learners on the materials is used to measure the degree of interactivity, the frequency of use, the level of focus on the materials, and the times at which they are used. To assess the interaction of learners with one another and with teachers, we need more information about the course discussions. The number of views/posts by learners in course forums helps here, as it typifies the interaction of viewing and answering in course topics. Submission deadline: assessment of homework assignments is also quite important. Learners who have mastered the lesson will usually do the assignment and submit it early or on time, whereas those who do not understand the lesson often take more time to complete it, so they submit late or not at all. These can be considered a basis for forecasting results.
The results are forecast at several points throughout the course for each learner to help the learner adjust his or her learning behavior. In addition, an overall student performance forecast is provided for teachers. In particular, they are provided with an overview of student
involvement as well as their results in some phases throughout the course.
The prediction model, described in Figure 1, consists of the following main components:
• Log Analysis Module: This module retrieves data directly from the tables in the Moodle database, collecting data on the number of views/posts per student, course information, student information, submitted assignments, and assignment progress.
• Predictive Module: This module can be viewed as the core component of the predictive model. We use machine learning techniques to analyze the collected information, classify learners, and predict outcomes for learners.

Figure 1. The forecast learning outcomes model (UET Analytics Model). Components: Log Analysis Module, Forecast Module, Notification Module, Forecast Results, Learner, Teacher.

In this model, we apply specific techniques including classification, clustering, and regression to forecast learning outcomes for learners. The basic steps for making the forecast are shown in the predictive module workflow figure. This can be considered a two-stage process: (1) A regression model is used to predict the amount of learners' interaction up to the point at which learning outcomes are to be forecast, for example week 7 or week 15 of the semester. (2) Learners are classified into classes labeled A, B, C, D, F.

Figure. The predictive module workflow. Steps shown: Data Training, Log Database, Clustering, Predict value of interaction factors, Filtering & Labeling, Classifying.

To perform the first stage (1), we collected data from the 3rd, 6th, 7th, 10th, 13th, and 15th weeks, then created regression models for the 3rd to 6th week, 6th to 7th week, 7th to 10th week, 10th to 13th week, and 13th to 15th week. We found that predicting interaction data directly from the 3rd week to the 7th week or the 15th week gave much lower accuracy than predicting between successive collection weeks.
In the second stage (2), our study data are unlabeled, and manual labeling would take a lot of time and effort, so we use the K-means algorithm to cluster the original data and label each cluster automatically. K-means was chosen because it has the largest Silhouette coefficient among the algorithms checked, as shown in Table 1, which indicates that its clusters are dense and distinct.

Table 1. Silhouette coefficient of several clustering algorithms
Algorithm                  Silhouette coefficient
KMeans                     0.6808
Birch                      0.6442
Agglomerative clustering   0.644205

In this step, we grouped learners into three clusters: a cluster of learners at risk of not achieving the desired learning outcomes, a highly interactive cluster, and an intermediate cluster. We distinguish them based on the cluster average. Learners in the at-risk cluster are labeled F. For the remaining two clusters, we again use a clustering algorithm to split each cluster into two smaller clusters. We divide the highly interactive cluster into two small clusters labeled A and B, and the intermediate cluster into two clusters labeled C and D. In both cases, the labels are based on the point average of the cluster. After assigning labels to all clusters, we filter out the actual point values that do not match the score ranges: A from 8-10, B from 6.5-7.9, C from 5.5-6.5, D from to 5.5, F
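
The two-level clustering and labeling step described in this section can be illustrated with a minimal sketch. It assumes each learner is summarized by a single hypothetical interaction score and uses a small, deterministic one-dimensional k-means written only for this example. The names `kmeans_1d`, `rank_clusters`, and `label_students` are illustrative and not part of the authors' system. Clusters are ranked here by the mean of the clustered score, which is an assumption: the paper bases the labels on the point average of each cluster. The later filtering against grade ranges is not reproduced.

```python
from statistics import mean


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on scalars; returns one cluster index per value.

    Centroids start at evenly spaced order statistics so the result is
    deterministic (an illustrative choice, not taken from the paper).
    Expects a non-empty list of numbers.
    """
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    assign = None
    for _ in range(max_iter):
        # Assign every value to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_assign == assign:  # no change: converged
            break
        assign = new_assign
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centroids[c] = mean(members)
    return assign


def rank_clusters(values, assign):
    """Map each cluster index to its rank by cluster mean (0 = lowest)."""
    def cluster_mean(c):
        return mean(v for v, a in zip(values, assign) if a == c)

    ordered_ids = sorted(set(assign), key=cluster_mean)
    return {c: rank for rank, c in enumerate(ordered_ids)}


def label_students(scores):
    """scores: dict of student id -> interaction score (hypothetical)."""
    ids = list(scores)
    values = [scores[s] for s in ids]
    assign = kmeans_1d(values, 3)
    rank = rank_clusters(values, assign)

    groups = {0: [], 1: [], 2: []}
    for s, a in zip(ids, assign):
        groups[rank[a]].append(s)

    # Lowest cluster: at risk, labeled F.
    labels = {s: "F" for s in groups[0]}
    # Split the intermediate cluster into D/C and the top cluster into B/A.
    for level, (low, high) in ((1, ("D", "C")), (2, ("B", "A"))):
        members = groups[level]
        if not members:
            continue
        sub_values = [scores[s] for s in members]
        sub_assign = kmeans_1d(sub_values, 2)
        sub_rank = rank_clusters(sub_values, sub_assign)
        for s, a in zip(members, sub_assign):
            labels[s] = high if sub_rank[a] == 1 else low
    return labels


# Made-up interaction scores for eleven hypothetical students.
example_scores = {
    "s1": 2, "s2": 3, "s3": 4,
    "s4": 20, "s5": 22, "s6": 30, "s7": 32,
    "s8": 60, "s9": 62, "s10": 80, "s11": 82,
}
example_labels = label_students(example_scores)
```

The deterministic initialization keeps the example reproducible. A real pipeline would more likely cluster multi-feature interaction vectors with a library implementation and rank the clusters by students' actual grade averages.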
