Handbook of Educational Data Mining

Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.

AIMS AND SCOPE
This series aims to capture new developments and applications in data mining and knowledge discovery, while summarizing the computational tools and techniques useful in data analysis. This series encourages the integration of mathematical, statistical, and computational methods and techniques through the publication of a broad range of textbooks, reference works, and handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of data mining and knowledge discovery methods and applications, modeling, algorithms, theory and foundations, data and knowledge visualization, data mining systems and tools, and privacy and security issues.

PUBLISHED TITLES

UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn

TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami

COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda

BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi

CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff

KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT
David Skillicorn

MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang

NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar

DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada

INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis

TEMPORAL DATA MINING
Theophano Mitsa

RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu
KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama

STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION
George Fernandez

THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar

INTRODUCTION TO PRIVACY-PRESERVING DATA PUBLISHING: CONCEPTS AND TECHNIQUES
Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu

GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION
Harvey J. Miller and Jiawei Han

HANDBOOK OF EDUCATIONAL DATA MINING
Cristóbal Romero, Sebastián Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker

Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

Handbook of Educational Data Mining

Edited by
Cristóbal Romero, Sebastián Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2011 by Taylor and Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper

International Standard Book Number: 978-1-4398-0457-5 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

To my wife, Ana, and my son, Cristóbal
Cristóbal Romero

To my wife, Inma, and my daughter, Marta
Sebastián Ventura

To my wife, Ekaterina, and my daughter, Aleksandra
Mykola Pechenizkiy

To my wife, Adriana, and my daughter, Maria
Ryan S.J.d. Baker

Contents

Preface
Editors
Contributors

1 Introduction
    Cristóbal Romero, Sebastián Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker

Part I  Basic Techniques, Surveys and Tutorials

2 Visualization in Educational Environments
    Riccardo Mazza
3 Basics of Statistical Analysis of Interactions Data from Web-Based Learning Environments
    Judy Sheard
4 A Data Repository for the EDM Community: The PSLC DataShop
    Kenneth R. Koedinger, Ryan S.J.d. Baker, Kyle Cunningham, Alida Skogsholm, Brett Leber, and John Stamper
5 Classifiers for Educational Data Mining
    Wilhelmiina Hämäläinen and Mikko Vinni
6 Clustering Educational Data
    Alfredo Vellido, Félix Castro, and Àngela Nebot
7 Association Rule Mining in Learning Management Systems
    Enrique García, Cristóbal Romero, Sebastián Ventura, Carlos de Castro, and Toon Calders
8 Sequential Pattern Analysis of Learning Logs: Methodology and Applications
    Mingming Zhou, Yabo Xu, John C. Nesbit, and Philip H. Winne
9 Process Mining from Educational Data
    Nikola Trčka, Mykola Pechenizkiy, and Wil van der Aalst
10 Modeling Hierarchy and Dependence among Task Responses in Educational Data Mining
    Brian W. Junker

Part II  Case Studies

11 Novel Derivation and Application of Skill Matrices: The q-Matrix Method
    Tiffany Barnes
12 Educational Data Mining to Support Group Work in Software Development Projects
    Judy Kay, Irena Koprinska, and Kalina Yacef
13 Multi-Instance Learning versus Single-Instance Learning for Predicting the Student’s Performance
    Amelia Zafra, Cristóbal Romero, and Sebastián Ventura
14 A Response-Time Model for Bottom-Out Hints as Worked Examples
    Benjamin Shih, Kenneth R. Koedinger, and Richard Scheines
15 Automatic Recognition of Learner Types in Exploratory Learning Environments
    Saleema Amershi and Cristina Conati
16 Modeling Affect by Mining Students’ Interactions within Learning Environments
    Manolis Mavrikis, Sidney D’Mello, Kaska Porayska-Pomsta, Mihaela Cocea, and Art Graesser
17 Measuring Correlation of Strong Symmetric Association Rules in Educational Data
    Agathe Merceron and Kalina Yacef
18 Data Mining for Contextual Educational Recommendation and Evaluation Strategies
    Tiffany Y. Tang and Gordon G. McCalla
19 Link Recommendation in E-Learning Systems Based on Content-Based Student Profiles
    Daniela Godoy and Analía Amandi
20 Log-Based Assessment of Motivation in Online Learning
    Arnon Hershkovitz and Rafi Nachmias
21 Mining Student Discussions for Profiling Participation and Scaffolding Learning
    Jihie Kim, Erin Shaw, and Sujith Ravi
22 Analysis of Log Data from a Web-Based Learning Environment: A Case Study
    Judy Sheard
23 Bayesian Networks and Linear Regression Models of Students’ Goals, Moods, and Emotions
    Ivon Arroyo, David G. Cooper, Winslow Burleson, and Beverly P. Woolf
24 Capturing and Analyzing Student Behavior in a Virtual Learning Environment: A Case Study on Usage of Library Resources
    David Masip, Julià Minguillón, and Enric Mor
25 Anticipating Students’ Failure As Soon As Possible
    Cláudia Antunes
26 Using Decision Trees for Improving AEH Courses
    Javier Bravo, César Vialardi, and Alvaro Ortigosa
27 Validation Issues in Educational Data Mining: The Case of HTML-Tutor and iHelp
    Mihaela Cocea and Stephan Weibelzahl
28 Lessons from Project LISTEN’s Session Browser
    Jack Mostow, Joseph E. Beck, Andrew Cuneo, Evandro Gouvea, Cecily Heiner, and Octavio Juarez
29 Using Fine-Grained Skill Models to Fit Student Performance with Bayesian Networks
    Zachary A. Pardos, Neil T. Heffernan, Brigham S. Anderson, and Cristina L. Heffernan
30 Mining for Patterns of Incorrect Response in Diagnostic Assessment Data
    Tara M. Madhyastha and Earl Hunt
31 Machine-Learning Assessment of Students’ Behavior within Interactive Learning Environments
    Manolis Mavrikis
32 Learning Procedural Knowledge from User Solutions to Ill-Defined Tasks in a Simulated Robotic Manipulator
    Philippe Fournier-Viger, Roger Nkambou, and Engelbert Mephu Nguifo
33 Using Markov Decision Processes for Automatic Hint Generation
    Tiffany Barnes, John Stamper, and Marvin Croy
34 Data Mining Learning Objects
    Manuel E. Prieto, Alfredo Zapata, and Victor H. Menendez
35 An Adaptive Bayesian Student Model for Discovering the Student’s Learning Style and Preferences
    Cristina Carmona, Gladys Castillo, and Eva Millán

Index

...
Purpose of This Book

The goal of this book is to provide an overview of the current state of knowledge of educational data mining (EDM). The primary goal of EDM is to use large-scale educational data ...

... Beck) of the First International Conference on Educational Data Mining, and is an associate editor of the Journal of Educational Data Mining and a founder of the International Working Group on Educational