
Scientific research report: Predictive modeling for student performance in education


DOCUMENT INFORMATION

Basic information

Title: Predictive Modeling for Student Performance in Education: A Data Mining Approach
Authors: Tran Quoc Dang, Ngo Minh Chau, Nguyen Ngoc Quynh, Van Vu Thanh Phuong, Nguyen Thi Lan Phuong
Advisor: Tran Thi Oanh
School: Vietnam National University, Hanoi
Major: Informatics and Computer Engineering
Type: Student Research Report
Year of publication: 2024
City: Hanoi
Number of pages: 44
File size: 1.29 MB

Content


VIETNAM NATIONAL UNIVERSITY, HANOI

INTERNATIONAL SCHOOL

STUDENT RESEARCH REPORT

PREDICTIVE MODELING FOR STUDENT PERFORMANCE

IN EDUCATION: A DATA MINING APPROACH

Team Leader: Tran Quoc Dang

Hanoi, 2024


TEAM LEADER INFORMATION

- Program: Informatics and Computer Engineering

- Address: Ha Dong - Hanoi

- Phone no /Email: 0332192968/20070819@vnu.edu.vn

II. Academic Results (from the first year to now)

III. Other achievements

1. Received the "Five Good Student" award at the university level in 2020-2021.

2. Recognized for outstanding contributions to student union and youth movement activities in 2021-2022 and 2022-2023.

3. Achieved a top-20 placement in the IStartup 2023 entrepreneurship competition.

4. Delegate/Representative of Vietnam National University in AUN AELP 2023, held in Singapore.

5. Delegate/Representative of Vietnam National University - International School in TF Scale 2024, held in Singapore and Vietnam.

6. Ranked #638 in the global WiDS Datathon 2021.

7. Top-10 placement in the Cybersecurity track of the JunctionX Hackathon 2023.

Team Members (no more than 5 people): Ngo Minh Chau, Nguyen Ngoc Quynh, Van Vu Thanh Phuong, Nguyen Thi Lan Phuong

Advisor

(Sign and write full name)

Tran Thi Oanh

Hanoi, 2024

Team Leader

(Sign and write full name)

Tran Quoc Dang


We are deeply grateful to Mrs. Tran Thi Oanh for her invaluable guidance, support, and insightful contributions throughout the research process. Her expertise and encouragement played a fundamental role in the successful completion of this research paper.

We would like to express our sincere appreciation for the care and support provided by our teacher. From the initial stages of ideation to the final completion phase, she was instrumental in guiding us and providing the motivation we needed to overcome challenges along the way.

Without her unwavering support, we would not have been able to accomplish this research. We extend our heartfelt gratitude for her significant contributions and look forward to future collaborations on upcoming projects.


TABLE OF CONTENTS

CHAPTER 2: INVESTIGATING THE INFLUENCE OF FACTORS ON STUDENT

CHAPTER 3: PRACTICAL APPLICATIONS AND CONSIDERATIONS


LIST OF FIGURES

Figure 1. Predictive Modeling Framework for Student Performance

Figure 4. Educational Data Mining Process diagram

Figure 5. The flowchart of the prediction model

Figure 7. Total score between female and male (Insight 1)

Figure 8. Lunch types comparison (Insight 2)

Figure 10. Random Forest Regressor performance

Figure 14. Average scores for standard vs free/reduced lunch

Figure 15. Average scores grouped by parental education level

Figure 16. Average scores for each race/ethnicity group

Figure 17. Average scores for students with/without test prep

Figure 18. Correlations between math scores and continuous features

LIST OF TABLES


LIST OF ABBREVIATIONS

MAE: Mean Absolute Error


English

Traditional educational settings often lack data-driven tools to proactively support student performance. This study presents a novel predictive modeling framework utilizing machine learning algorithms (e.g., decision trees, random forests) to address this challenge. The model integrates student demographics, academic history, and teaching methodology data to forecast student outcomes in traditional offline classroom settings. Results indicate that a student's parental level of education and participation in test preparation courses were strong predictors of performance, demonstrating the potential of this framework to enhance decision-making and resource allocation in traditional education. These findings highlight the power of predictive modeling to personalize learning approaches and provide data-driven insights to educators working within established offline teaching structures.

Vietnamese (translated into English)

Traditional educational institutions often lack data-driven tools to proactively support students' academic performance. This research presents a new predictive modeling framework that uses machine learning algorithms (e.g., decision trees, random forests) to address this challenge. The model integrates data on student demographics, academic history, and teaching methods to predict learning outcomes in traditional classroom settings. The results show that parental education level and participation in test preparation courses are strong predictors of academic results, indicating the framework's potential to improve decision-making and resource allocation in traditional education. These findings underscore the power of predictive modeling to personalize learning approaches and to provide data-driven insights for educators working within teaching structures.

Keywords

Predictive Modeling, Data-Driven Decision Making, Offline Learning, Student Performance, Academic History, Teaching Methodology, Parental Education Level, Test Preparation Courses, Resource Allocation, Personalized Learning


Tran Quoc Dang, 20070819, ICE2020A, Informatics and Computer Engineering, 4th


I. INTRODUCTION

This chapter introduces the topic of predictive modeling in education, highlighting its significance in addressing student performance challenges. It outlines the research objectives, focusing on identifying factors influencing student math performance and developing predictive models to aid educational decision-making. By framing the research within the context of educational challenges and the potential of predictive modeling, this chapter sets the stage for subsequent discussions, emphasizing the importance of personalized learning and equitable educational outcomes.

2. Motivation

Addressing these challenges is crucial for both individual student success and the betterment of society as a whole. Predictive modeling offers a powerful tool to tackle this issue, yet much research focuses on online or blended learning environments. This study seeks to address this gap by investigating how predictive modeling can empower educators specifically within traditional offline classroom settings. By understanding the factors influencing student performance in this context, educators can proactively tailor their instructional strategies and support systems.

3. Research Methods

This study employs a quantitative approach, utilizing the "Students Performance in Exams" dataset and focusing on variables such as parental education level, lunch program participation, and test preparation. Decision tree or random forest algorithms will be used to build predictive models for math performance. Model interpretability will be prioritized to provide actionable insights for educators, allowing them to understand not just which students are at risk, but why.

● Algorithms: Decision tree and/or random forest algorithms will be prioritized due to their ability to handle diverse data types and provide insights into feature importance.

5. Object and Scope of the Study

● Scope of Research

The primary focus is predicting performance in mathematics using the "Students Performance in Exams" dataset.


Due to potential dataset limitations, the model may not generalize to all student populations or educational contexts.


II. LITERATURE REVIEW

This chapter provides a comprehensive review of literature relevant to predictive modeling in education, synthesizing theoretical and empirical insights. It examines previous research on student performance prediction, identifying key concepts, methodologies, and findings. By summarizing existing knowledge and identifying gaps in the literature, this chapter establishes the theoretical framework for the study, emphasizing the need for further investigation into the factors influencing student achievement and the efficacy of predictive models in educational settings.

● Introduction

Predictive modeling is gaining traction as a powerful tool within the educational landscape, offering the potential to personalize learning experiences and proactively target student support. By leveraging student data and machine learning algorithms, predictive models can reveal patterns and identify factors that influence academic performance. This literature review explores existing research on predictive modeling in education, focusing on the features commonly influencing student outcomes, the effectiveness of various modeling techniques, and the ethical considerations surrounding their implementation. It specifically examines these trends in light of this study's focus on parental education, socioeconomic status (SES), and their impact on math performance.

● Related Works

Research into the application of predictive modeling within education has expanded significantly in recent years. A substantial body of work underscores the impact of socioeconomic status (SES) and parental education on student outcomes. Decision trees and random forests are particularly popular due to their ability to handle complex relationships and their interpretability. The key trends in this research are examined below:

- Focus on Socioeconomic Factors

Numerous studies have demonstrated the predictive power of SES indicators, such as participation in free or reduced-price lunch programs, on student achievement [Durga et al., 2020]. Additionally, a strong correlation exists between parental education levels and students' academic performance. These studies highlight the need to consider equity and social determinants when developing predictive models.

- Varied Algorithms and Outcomes

Researchers have employed a range of machine learning algorithms in educational prediction tasks. Linear models [Cui et al., 2019], support vector machines, and neural networks have been explored alongside decision trees and random forests. These studies have focused on various outcome variables, including overall GPA, success in specific subjects, and likelihood of dropout [Kurni et al., 2023].

- The Need for Nuance and Interpretability

While predictive models hold promise, it's crucial to go beyond simple prediction and aim for interpretability. Understanding which factors have the greatest influence on student outcomes is essential for designing effective interventions. Some research highlights the potential for bias in predictive models, calling for fairness and transparency in their development [Durga et al., 2020].

- Gaps and Opportunities

While predictive modeling in education has made significant strides, there remain opportunities to enhance its impact on understanding and improving student math performance. Existing research often suffers from limitations in dataset size and diversity, potentially hindering the ability to draw generalizable conclusions across different student populations and school settings. Additionally, there's a need to expand the range of features investigated. Focusing on traditional demographics may overlook other influential factors, such as student engagement or access to outside-of-class resources. Finally, ensuring models are not only accurate but also interpretable is crucial. This allows educators to move beyond predictions and towards actionable insights that guide targeted interventions.

This study directly addresses these gaps by utilizing a comprehensive dataset that aims to represent a broader spectrum of students. The inclusion of variables like test preparation participation may reveal previously underappreciated factors significantly impacting math scores. Most importantly, this research emphasizes model interpretability. This will enable educators to understand the reasoning behind predictions and translate data-driven insights into customized support strategies, ultimately enhancing student success in mathematics.

- Contributions of This Study

This study aims to provide a nuanced understanding of the relationship between parental education, socioeconomic factors (indicated by lunch type), test preparation, and student math performance within traditional offline classrooms. Its key contributions are:

+ Quantifying Impact: By using a comprehensive dataset, the model quantifies the relative influence of these specific factors on math scores. This allows educators to prioritize interventions targeting the areas with the highest potential for improvement.

+ Focus on Equity: Investigating socioeconomic indicators as predictors brings attention to the potential disparities in educational outcomes. Understanding how these factors operate in the model is crucial to promote equitable resource allocation and support systems for disadvantaged students.

+ Actionable Insights: The emphasis on model interpretability provides educators with insight into why predictions are made. These insights allow them to tailor their instruction and support strategies to address specific student needs.

● Factors Influencing Student Performance

Educational research consistently highlights the complex ways in which socioeconomic factors and parental education level shape student outcomes. Students from lower-SES backgrounds often face disadvantages due to limited resources, reduced access to learning opportunities, and less potential for academic support at home. Similarly, parental education level can profoundly impact a student's ability to receive help outside of school, particularly in subjects like mathematics, where conceptual understanding is key. This study seeks to quantify the impact of these specific factors on math performance, providing data-driven insights to inform resource allocation and targeted interventions.

● Predictive Modeling in Education

The application of predictive modeling in education is an evolving field with promising results. Decision trees and random forests are widely employed due to their ability to handle diverse data types and their interpretability, both of which are crucial for understanding the factors driving predictions and ensuring educators can apply insights [Smith & McKenna, 2013]. Other algorithms like linear regression, support vector machines, and neural networks have also been explored [Shahiri et al., 2015]. Reported success rates vary, which underscores the importance of careful dataset selection, feature engineering, and model evaluation for educational applications [Kumar et al., 2017].

● Limitations and Gaps

While predictive modeling holds promise, it's crucial to acknowledge limitations in the existing research. Many studies rely on relatively small datasets, potentially hindering model generalizability to diverse student populations [Romero & Ventura, 2010]. Further, there is a need to move beyond traditional demographics, exploring student motivation and engagement to build more nuanced predictive models [Baker & Inventado, 2014]. Doing so is vital to ensure that predictions and subsequent interventions based on those predictions are equitable and avoid perpetuating existing biases.

● Conclusion

Predictive modeling has the potential to transform educational practice by providing data-driven insights into factors influencing student performance. Existing research highlights the importance of features such as parental education, socioeconomic status, and test preparation. Future work can enhance the field by incorporating larger and more diverse datasets, exploring a wider array of student-level features, and continually refining modeling approaches for fairness and effectiveness. This study contributes directly to this evolution, aiming to provide actionable insights specifically focused on math achievement.

● Future Directions

To enhance the robustness and generalizability of the findings, future work could incorporate a larger and more diverse dataset representing a wider range of student populations.


III. METHODOLOGY

This chapter describes the research methodology and data collection process in detail. It outlines the research design, data sources, variables, and statistical techniques employed in the study. By providing a clear overview of the study's methodology, this chapter ensures transparency, replicability, and rigor in the research process. It sets the stage for data analysis and interpretation, laying the groundwork for the empirical findings presented in subsequent chapters.

● Modeling Approach

This study aims to predict student math performance, treated here as a classification problem. Random Forest classifiers were employed due to their ability to handle non-linear relationships and provide insights into feature importance.
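Because math scores are numeric, treating the task as classification requires mapping each score to a class label first. The following is a minimal sketch of one way this could be done. The pass mark of 50 and the function names `to_performance_class` and `label_scores` are illustrative assumptions; the report does not state which threshold or labels were used.

```python
from typing import List

# Hypothetical pass mark; the report does not state the threshold it used.
PASS_MARK = 50


def to_performance_class(math_score: float, pass_mark: float = PASS_MARK) -> str:
    """Map a numeric math score (0-100) to a class label."""
    if not 0 <= math_score <= 100:
        raise ValueError(f"score out of range: {math_score}")
    return "pass" if math_score >= pass_mark else "fail"


def label_scores(scores: List[float]) -> List[str]:
    """Label every score in a list."""
    return [to_performance_class(s) for s in scores]


if __name__ == "__main__":
    print(label_scores([72, 49, 50, 18]))
```

A finer split (for example low, medium, high) would follow the same pattern with more thresholds.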

● Data Preprocessing

Data preprocessing involved addressing missing values, encoding categorical features, and standardizing numerical features for model compatibility.
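The three preprocessing steps named above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the report's actual pipeline: mean imputation and one-hot encoding are assumed choices, the helper names are hypothetical, and the sample values are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values: List[str]) -> Dict[str, List[int]]:
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def standardize(values: List[float]) -> List[float]:
    """Rescale a numeric column to zero mean and unit variance (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Toy columns; the values are invented for illustration.
    lunch = ["standard", "free/reduced", "standard", "standard"]
    reading = [70.0, None, 90.0, 80.0]

    reading_filled = impute_mean(reading)
    print(one_hot(lunch))
    print(reading_filled)
    print(standardize(reading_filled))
```

Tree-based models do not need standardized inputs, so that step matters mainly if the same features are also fed to scale-sensitive models such as linear regression or support vector machines.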

● Model Training and Evaluation

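The models were trained and evaluated using k-fold cross-validation, which gives a performance estimate that does not depend on a single train/test split and helps detect overfitting. The models' performance was evaluated using accuracy, precision, and recall metrics.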
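To make the evaluation procedure concrete, the sketch below shows how k-fold index splits and the three metrics could be computed by hand. The function names are hypothetical, the folds are contiguous and unshuffled for simplicity (shuffling the rows first is common in practice), and the labels in the demo are invented.

```python
from typing import Dict, Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def classification_metrics(y_true: List[str], y_pred: List[str], positive: str) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(n_samples=5, k=2):
        print("train:", train_idx, "test:", test_idx)

    truth = ["pass", "pass", "fail", "fail", "pass"]
    predicted = ["pass", "fail", "fail", "pass", "pass"]
    print(classification_metrics(truth, predicted, positive="pass"))
```

In a full run, a model would be fitted on each training split and scored on the matching test split, and the per-fold metrics would then be averaged.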

● Feature Importance Analysis

Feature importance was determined using the built-in feature_importances_ attribute of the Random Forest classifier to identify factors most influential on student math performance.
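The sketch below illustrates how the `feature_importances_` attribute can be read after fitting scikit-learn's `RandomForestClassifier`. Everything about the data is an assumption made for illustration: the features are synthetic 0/1 flags, the rule generating the label is invented, and the `noise` column exists only to show an uninformative feature. It does not reproduce the report's data or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in features (0/1 flags); real values would come from the dataset.
test_prep = rng.integers(0, 2, size=n)
standard_lunch = rng.integers(0, 2, size=n)
noise = rng.integers(0, 2, size=n)

# Invented rule: the label depends on the first two features, not on `noise`.
score = 55 + 8 * test_prep + 10 * standard_lunch + rng.normal(0, 5, size=n)
y = (score >= 60).astype(int)

X = np.column_stack([test_prep, standard_lunch, noise])
feature_names = ["test_prep", "standard_lunch", "noise"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Impurity-based importances can favor features with many distinct values, so a cross-check such as permutation importance on held-out data is often worthwhile before drawing conclusions.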


CHAPTER 1: PREDICTIVE MODELING IN EDUCATION

1. The Need for Predictive Models

Traditional educational systems often face challenges such as the difficulty of identifying at-risk students early, the inefficient allocation of limited resources, and the lack of personalization in interventions. Predictive modeling offers a powerful tool to address these challenges. By leveraging student data and machine learning algorithms, predictive models can uncover patterns, make predictions about future performance, and provide educators with valuable insights to guide proactive support.

Figure 1. Predictive Modeling Framework for Student Performance.


Predictive modeling employs a range of data analysis and statistical techniques to predict future outcomes. Machine learning, a field within artificial intelligence, centers around algorithms that learn patterns from data without being explicitly programmed. This project will explore decision trees and random forests. These methods offer several advantages in the educational context:

● Decision Trees: These models create a tree-like structure of decisions based on student features, leading to a prediction. Their visual nature aids in understanding the factors most influential in determining student performance.

Figure 2. Simple decision tree diagram.

● Random Forests: An ensemble method combining multiple decision trees, resulting in models that generally achieve higher accuracy and are less susceptible to overfitting.
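The two ideas above can be illustrated together with a deliberately small sketch: each "tree" is a one-split decision stump, and the "forest" trains many stumps on bootstrap samples and combines them by majority vote. This is a simplification for illustration, not a full random forest implementation. The helper names, the 0/1 feature encoding, and the toy data are assumptions.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Row = Tuple[int, int]          # (test_prep, standard_lunch) as 0/1 flags
Stump = Callable[[Row], int]   # a one-split "tree" returning class 0 or 1


def fit_stump(rows: List[Row], labels: List[int], feature: int) -> Stump:
    """Fit a one-split tree: predict the majority label on each side of the split."""
    def majority(side: int) -> int:
        votes = [y for x, y in zip(rows, labels) if x[feature] == side]
        return Counter(votes).most_common(1)[0][0] if votes else 0

    left, right = majority(0), majority(1)
    return lambda row: right if row[feature] == 1 else left


def fit_forest(rows: List[Row], labels: List[int], n_trees: int = 25, seed: int = 0) -> List[Stump]:
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict(forest: List[Stump], row: Row) -> int:
    """Combine the individual tree predictions by majority vote."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    # Invented toy data: label 1 ("pass") if either flag is set.
    rows = [(1, 1), (1, 0), (0, 1), (0, 0)] * 5
    labels = [1, 1, 1, 0] * 5
    forest = fit_forest(rows, labels)
    print(predict(forest, (1, 1)))
```

Averaging over many trees trained on different resamples is what reduces the variance of any single tree, which is the usual explanation for the lower overfitting risk mentioned above.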

Date posted: 08/10/2024, 02:15

References

1. Baker, R. S., & Inventado, S. P. (2014). Educational Data Mining and Learning Analytics. Handbook of Educational Data Mining, 61-75.
2. Hanushek, E. A., Woessmann, L., & Machin, S. (2011). The economics of international differences in educational achievement. Handbook of the economics of education (Vol. 3, pp. 89-200).
3. Hattie, J., & Timperley, H. (2007). The Power of Feedback. Review of Educational Research, 77(1), 81-112. https://journals.sagepub.com/doi/10.3102/003465430298487
4. Kiattisak, R. (n.d.). Students Performance in Exams [Data set]. https://www.kaggle.com/datasets/spscientist/students-performance-in-exams
5. Kumar, R. (2017). Predictive Modeling for Student Performance. Proceedings of the International Conference on Learning Analytics & Knowledge, 24-28.
6. Li, J. (2023, January 9). Dynamic Interaction between Student Learning Behaviour and Learning Environment: Meta-Analysis of Student Engagement and Its Influencing Factors. MDPI. https://www.mdpi.com/2076-328X/13/1/59
7. Romero, C., & Ventura, S. (2010). Educational Data Mining: A Review of the State of the Art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6), 601-618.
8. Shahiri, A. M. (2015). Exploring Predictive Models in Education. International Journal of Information and Education Technology, 5(6), 442-446.
9. Smith, J., & McKenna, P. (2013). Predictive Modeling in Education. Journal of Educational Analytics, 1(1), 1-15.
10. Kurni, M., Mohammed, M. S., & Srinivasa, K. G. (2023). Predictive Analytics in Education. In A Beginner’s Guide to Introduce Artificial Intelligence in Teaching and Learning (pp. 55–81).
11. Cui, Y., Chen, F., Shiri, A., & Fan, Y. (2019). Predictive analytic models of student success in higher education: A review of methodology. Information and Learning Sciences, 120(3/4), 208-227.
12. Durga, V. S., & Thangakumar, J. (2020). Predictive Education—from Idea to Implementation. In Advances in Smart System Technologies (Vol. 1163, pp. 759–764).
