
Comparing the learning from intelligent tutoring systems, non-intelligent computer-based versions, and traditional classroom instruction.





Michael Mendicino, Educational Psychology, West Virginia University
Neil Heffernan, Computer Science Department, Worcester Polytechnic Institute
Journal of Interactive Learning Research

Title: Comparing the learning from intelligent tutoring systems, non-intelligent computer-based versions, and traditional classroom instruction

Abstract

There have been some studies showing that computer-assisted instructional systems (CAI) can be superior to traditional classroom instruction (Kulik, 1983, 1994, 2003; Bangert-Drowns, Kulik & Kulik, 1985). Other studies have compared new "intelligent tutoring systems" (ITS) to classroom instruction (Koedinger, Anderson, Hadley, & Mark, 1997; Anderson, Corbett, Koedinger, & Pelletier, 1995), while many studies have compared intelligent tutoring systems to CAI-like controls (Carroll & Kay, 1988; Corbett & Anderson, 2001; Mathan, 2003; Schooler & Anderson, 1990). We are aware of no studies that have taken a single ITS and compared it to both 1) classroom instruction and 2) CAI. In this study we compare these three (classroom instruction, CAI, and ITS) using a newly developed ITS (Heffernan & Koedinger, 2003). We seek to quantify the value added of CAI over classroom instruction, versus the value added of ITS on top of CAI. We found evidence that the ITS was much better than the classroom instruction, but with an effect size of only 0.6. Our results in trying to calculate the value added of the CAI over the classroom were mixed, with two studies showing effects but the third not showing statistically reliable differences. The extra value added of the ITS over CAI did seem to be robust across the three studies.

Introduction

There have been some studies showing that traditional computer-assisted instructional systems (CAI) can be superior to traditional classroom instruction (Kulik, 1983, 1994, 2003; Bangert-Drowns, Kulik & Kulik, 1985). Other studies have compared new, so-called "intelligent" tutoring systems (ITS) to classroom instruction (Koedinger & Anderson, 1993; Koedinger, Anderson, Hadley, & Mark, 1997; Anderson, Corbett, Koedinger, & Pelletier, 1995), while many studies have compared intelligent tutoring systems to CAI-like controls (Carroll & Kay, 1988; Corbett & Anderson, 2001; Mathan, 2003; Schooler & Anderson, 1990). We are aware of no studies that have taken a single ITS and compared it to both 1) classroom instruction and 2) CAI. In this study we compare all three with respect to student learning and "motivation" in the algebra domain.

Kulik's (1985 & 1994) studies suggest that CAI systems lead to about 0.3 to 0.5 standard-deviation effect sizes over classroom instruction. The Koedinger et al. (1997) study, which compared a commercially available ITS (Cognitive Tutor) to a classroom control, suggests roughly a one-standard-deviation effect size for experimenter-designed metrics, while for external metrics (the Iowa Algebra Aptitude Test and a subset of the Math SAT) it found an effect size of 0.3; this study, however, may suffer from a confound of the effect of the ITS with a new textbook prepared to go along with the curriculum. We are uncertain how to compare these effect sizes with the Kulik and Kulik effect size of about 0.4, as we do not know whether the metrics in the Kulik and Kulik studies are more like externally designed measures or experimenter-defined measures. In another study, VanLehn et al. (2005) compared an ITS not to classroom instruction but to doing homework in a traditional paper-and-pencil manner. They found results similar to the Cognitive Tutor results mentioned above, with effect sizes of about one standard deviation for their own measures and about 0.4 for what they consider analogous to "externally designed measures."

In this study we compare these three (classroom instruction, CAI, and ITS) using a newly developed ITS (Heffernan & Koedinger, 2003). We seek to quantify the value added of CAI over classroom instruction, versus the value added of ITS on top of CAI. How much more learning does adding an "intelligent tutor" get you over CAI? This question is important because ITSs are complex and costly to build, and we need to understand whether they are worth the investment, or whether CAI is good enough. We do this in the context of a mathematics classroom while teaching the skill of writing algebra expressions for word problems, a skill we call symbolization. In this paper we report three experiments, with three teachers and a total of 160 students. All studies involved analyzing the amount of learning by students within one classroom period, measured by experimenter-constructed pre- and post-tests administered the day before and the day after the experiment. Seven of the items were experimenter-designed questions and two were standardized test questions.

Area of Math we focused on

Students in the United States lag behind many other countries in math skills, particularly at the eighth and twelfth grade levels (TIMSS, 1997). While US eighth grade students showed improvement in math, scoring above the international average (TIMSS, 2003), better math instruction that integrates technology is still needed to ensure continued improvement. One skill students have difficulty with is writing algebra expressions for word problems, a skill we call symbolization. Heffernan & Koedinger (1997) stated that "symbolization is important because if students cannot translate word problems into the language of algebra, they will not be able to apply algebra to solve real world problems." The need for this skill is more crucial now because students have access to graphing calculators and computers that can perform symbol manipulation, but translating word problems into the symbolic language of algebra remains a uniquely human endeavor.

Human Tutoring

Studies indicate that experienced human tutors provide the most effective form of instruction known. For example, Cohen, Kulik & Kulik (1982) performed a meta-analysis of the findings of 65 independent evaluations of school tutoring programs and found that tutoring raised performance on student achievement, as measured on examinations, with an effect size of 0.4 standard deviation units (called sigmas). The authors did not state whether the outcome measures were externally designed or experimenter designed. Bloom (1984), however, found significantly better results: in his study, tutors produced effect sizes of two sigmas over students receiving standard classroom instruction. Human tutors also enhanced learning, with an effect size of about one sigma, over students in a mastery learning condition. Other studies have been conducted to determine what behaviors make human tutoring effective and how these behaviors can be incorporated into computer-based tutoring systems (McArthur, Stasz, & Zmuidzinas, 1990; Merrill, Reiser, Ranney & Trafton, 1992; Graesser & Person, 1994; Chi, Siler, Jeong, Yamauchi, & Hausmann, 2001). For example, Merrill et al. (1992) concluded that a major reason human tutors are effective is that they let students do most of the work in overcoming impasses, while providing only as much assistance as necessary and keeping students from following "garden paths" of reasoning that are unlikely to lead to learning. VanLehn, Siler, & Murray (2003) also found that allowing students to reach impasses correlated with learning gains. Finally, numerous studies (Swanson, 1992; Graesser, Person, & Magliano, 1995; Chi, Siler, Jeong, Yamauchi, & Hausmann, 2001; Katz, Connelly, & Allbritton, 2003) hypothesized that it is the interactive nature of the tutorial dialog (i.e., the interaction hypothesis) that accelerates learning.
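The results above are reported in standard-deviation effect-size units (the "sigmas" Bloom refers to, i.e., Cohen's d). As a point of reference only, the sketch below shows how such an effect size is computed from two groups' scores; the group names and numbers are invented for illustration and are not data from any study cited here.

```python
import math

def cohens_d(treatment, control):
    """Standardized mean difference: effect size in 'sigma' units."""
    mt = sum(treatment) / len(treatment)
    mc = sum(control) / len(control)
    # Pooled standard deviation of the two groups.
    var_t = sum((x - mt) ** 2 for x in treatment) / (len(treatment) - 1)
    var_c = sum((x - mc) ** 2 for x in control) / (len(control) - 1)
    pooled_sd = math.sqrt(((len(treatment) - 1) * var_t + (len(control) - 1) * var_c)
                          / (len(treatment) + len(control) - 2))
    return (mt - mc) / pooled_sd

# Hypothetical exam scores (illustrative only).
tutored = [82, 75, 90, 68, 88, 79]
classroom = [70, 65, 80, 60, 72, 69]
print(round(cohens_d(tutored, classroom), 2))  # about 1.46 with these made-up numbers
```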
Computer-Assisted Instruction

Computer-based tutoring systems appear to hold promise for improving mathematics instruction. The first computer-based tutoring systems appeared over thirty years ago with the goal of approaching the effectiveness of human tutors. According to Corbett & Trask (2000), these systems, called computer-assisted instruction (CAI), afforded one advantage of human tutors: individualized, interactive learning support. While these systems were interactive and provided explicit instruction in the form of long web pages or lectures, they offered no dialog. Studies demonstrated the effectiveness of CAI in mathematics at the elementary level (Burns & Bozeman, 1981), secondary level (Kulik, Bangert, & Williams, 1983), and college level (Kulik, Kulik, & Cohen, 1980). In a meta-analysis of 28 studies involving CAI, Kulik et al. (1985) found that CAI improved student achievement by an average effect size of 0.47 over students receiving conventional instruction. In another meta-analysis, Kulik (1994) summarized 97 studies from the 1980s that compared classroom instruction to computer-based instruction and found an average effect size of 0.32 in favor of computer-based instruction; Kulik claimed that students learned more and learned faster in courses that involved computer-based instruction. Finally, Kulik (2003) summarized the findings of eight meta-analyses covering 61 studies published after 1990. The median effect size for studies using computer tutorials was 0.59, meaning that students who received computer tutorials performed at the 72nd percentile while students receiving conventional instruction performed at the 59th percentile. While these studies suggest that CAI can be an effective instructional aid in both elementary and secondary schools, CAI does not address the main concern of McArthur et al. (1990), who claim that teaching tactics and strategies are the least well developed components of most intelligent tutors.
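The percentile framing used in the Kulik meta-analyses (expressing an effect size as a shift in percentile rank) follows from the normal distribution. The snippet below is a small illustrative conversion, not the meta-analysts' own calculation, and it reports where the median treated student would fall in the control distribution under a normality assumption.

```python
from scipy.stats import norm

def percentile_of_treated_median(effect_size):
    """Percentile rank, within the control distribution, of a student at the
    median of the treatment group, assuming roughly normal score distributions."""
    return 100 * norm.cdf(effect_size)

for d in (0.32, 0.47, 0.59):
    print(d, round(percentile_of_treated_median(d), 1))
# An effect size of 0.59 places the median tutored student near the 72nd
# percentile of the comparison distribution.
```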
Model-tracing ITS

With advances in computer science research on artificial intelligence and in cognitive psychology research on human learning and performance, the next generation of computer-based tutoring systems moved beyond the simple presentation of pages of text or graphics. These new intelligent tutoring systems (ITS), called cognitive tutors, incorporated model-tracing technology: a cognitive model of student problem solving that captures students' multiple strategies and common misconceptions. The cognitive tutor uses model tracing to understand students' input and to indicate when they have made a mistake. With model tracing, cognitive tutors provide students individualized assistance that is just-in-time and sensitive to the student's particular approach to a problem (Anderson, Corbett, Koedinger, & Pelletier, 1995). They also provide canned explanations and hint messages that get more explicit as students continue asking for help, until the tutor is telling the student exactly what to do. The feedback is immediate and step-wise and is structured so as to lead students toward expert-like performance. The tutor intervenes as soon as students deviate from the solution path, but the cognitive tutor does not engage students in dialog by asking new questions. Cognitive tutors also use knowledge-tracing technology, which traces a student's knowledge growth across problem-solving activities and uses this information to select problems and adjust the pacing to adapt to individual student needs.
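As a rough, hypothetical sketch of the two mechanisms just described (matching a student's step against steps the cognitive model could produce, escalating hint messages, and a knowledge-tracing update), consider the following. The rule set, hint texts, and probability parameters are invented for illustration and are not taken from any Cognitive Tutor; the example expression anticipates the symbolization problem discussed later in the paper.

```python
# Minimal sketch: model tracing with escalating hints plus a Bayesian
# knowledge-tracing update. All parameters and messages are illustrative only.

HINTS = [
    "Think about how the number of flute players relates to the number of drummers.",  # pointing hint
    "If there are x drummers and 7 fewer flute players, the flute players are x - 7.",  # teaching hint
    "Type: x + (x - 7)",                                                                # bottom-out hint
]

def model_trace(student_step, acceptable_steps):
    """True if the student's step matches a step on a correct solution path."""
    return student_step.replace(" ", "") in {s.replace(" ", "") for s in acceptable_steps}

def next_hint(hint_level):
    """Hints become more explicit each time the student asks for help."""
    return HINTS[min(hint_level, len(HINTS) - 1)]

def bkt_update(p_known, correct, slip=0.1, guess=0.2, learn=0.15):
    """One knowledge-tracing step: update P(skill known) from a response,
    then allow for learning on this practice opportunity."""
    if correct:
        evidence = p_known * (1 - slip) / (p_known * (1 - slip) + (1 - p_known) * guess)
    else:
        evidence = p_known * slip / (p_known * slip + (1 - p_known) * (1 - guess))
    return evidence + (1 - evidence) * learn

# Example: one wrong attempt, two hint requests, then a correct answer.
p = 0.3
correct_steps = ["x+(x-7)", "(x-7)+x"]
p = bkt_update(p, model_trace("7-x+x", correct_steps))   # wrong step flagged
print(next_hint(0))
print(next_hint(1))
p = bkt_update(p, model_trace("x+(x-7)", correct_steps)) # correct step
print(round(p, 2))
```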
Even though these new cognitive tutors do not engage students in dialog, they have nonetheless had a significant impact on student learning in a variety of domains. For example, Koedinger, Anderson, Hadley, & Mark (1997) compared a cognitive tutor, PAT (Pump Algebra Tutor), to traditional algebra instruction. The PAT intelligent tutor was built to support the Pittsburgh Urban Mathematics Project (PUMP) algebra curriculum, which is centrally focused on mathematical analysis of real-world situations and the use of computational tools. The study evaluated the effect of the PUMP curriculum and PAT tutor use and found that students in the experimental classes outperformed control classes by 100% on assessments of the targeted problem solving and multiple representations. These results also translated into a one-standard-deviation effect size. Recent studies comparing PAT and traditional algebra instruction have found improvements in the 50-100% range, thus replicating the above results (Koedinger, Corbett, Ritter, & Shapiro, 2000). This cognitive tutor is currently used by approximately 375,000 students in over 1000 schools.

Morgan & Ritter (2002) conducted a study comparing the Cognitive Tutor Algebra I course and a traditional Algebra I course, which used a different text, with students in their junior high school system. Dependent measures included the Educational Testing Service (ETS) Algebra I end-of-course exam, course grades, and a survey of attitudes towards mathematics. These measures certainly seem to have the benefit of not having been defined by the experimenters themselves. When restricting the analysis to only those teachers who taught both curricula, the researchers found statistically significant differences on all dependent measures in favor of the cognitive tutor. Morgan and Ritter state that the strongest components of teacher effects have to do with teacher education and professional development and only indirectly with practices. In their study, the curriculum effect they were examining had to do with teacher practices, which would be expected to be relatively small. Therefore, they conclude that the effect size of 0.29 is impressive taken in this context.

Finally, as part of the Andes project, VanLehn et al. (2004) evaluated Andes, an ITS developed to replace paper-and-pencil homework and to increase student learning in introductory college physics courses. Andes provides immediate feedback to student responses and also provides three kinds of help: 1) pop-up error messages when the error is probably due to lack of attention rather than lack of knowledge, 2) What's Wrong Help when the student is essentially asking what is wrong with what they have entered, and 3) Next Step Help when students are not sure what to do next. The What's Wrong and Next Step Help selections generate a hint sequence that includes a pointing hint, a teaching hint, and a bottom-out hint that tells students exactly what to do. Andes was evaluated from 1999 to 2003, and in all years Andes students scored higher than control students, with effect sizes ranging from 0.21 to 0.92. VanLehn et al. compared their results to the results of the Koedinger et al. (1997) study, which they suggest is the benchmark study with respect to tutoring systems. The Koedinger et al. study evaluated the PAT intelligent tutoring system and a novel curriculum (PUMP), which Carnegie Learning distributes as the Algebra I Cognitive Tutor. Koedinger et al. used both experimenter-designed questions and standardized tests. On the experimenter-designed tests they found effect sizes of 1.2 and 0.7, and on the multiple-choice standardized tests an effect size of 0.3. VanLehn et al. found very similar effect sizes (1.21 and 0.69) for their conceptual, experimenter-written tests and a similar effect size, 0.29, for their multiple-choice standardized tests. Thus, both evaluations have similar tests and effect sizes: both show impressive 1.2 and 0.7 effect sizes for conceptual, experimenter-designed tests, and lower effect sizes on standardized, answer-only tests. Given the large difference between experimenter-designed tests and externally designed tests, it makes one wonder how to interpret the Kulik studies that argue that CAI, when compared to classroom instruction, gives between 0.3 and 0.7 effect sizes. The authors of the Andes study stated that their evaluation differed from the Koedinger et al. evaluation in a crucial way: the Andes evaluations manipulated only the way that students did their homework (on Andes vs. on paper). The evaluation of the Pittsburgh Algebra Tutor (PAT) was also an evaluation of the Pittsburgh Urban Mathematics Project curriculum (PUMP), which focused on analysis of real-world situations and the use of computational tools such as spreadsheets and graphers. Therefore, how much gain was due to the tutoring system and how much was due to the new curriculum is not clear. Finally, VanLehn et al. stated that in their study the curriculum was not reformed; therefore, the gains in their evaluation may be a better measure of the power of intelligent tutoring systems per se.
Dialog-based Intelligent tutors

Both CAI and cognitive tutors have proved to be more effective than traditional classroom instruction, yet neither has approached the effectiveness of human tutors. Perhaps they have not captured the features of human tutoring that account for its effectiveness. Researchers have recently developed ITSs that incorporate dialog based on human tutors in specific domains, and preliminary results are promising. We mention two related projects before focusing on Heffernan's system used in this evaluation.

The Tutoring Research Group at the University of Memphis has developed AutoTutor (Graesser et al., 2001), an ITS that helps students construct answers to computer literacy questions and qualitative physics problems by holding a conversation in natural language, thus taking advantage of the interaction hypothesis. AutoTutor attempts to imitate a human tutor by reproducing the dialog patterns and strategies that a human tutor would likely use. AutoTutor "presents questions and problems from a curriculum script, attempts to comprehend learner contributions that are entered by keyboard, formulates dialog moves that are sensitive to the learner's contributions … and delivers the dialog moves with a talking head that simulates facial expressions and speech" to give the impression of a discussion between the tutor and student (Graesser, Wiemer-Hastings, K., Wiemer-Hastings, P., & Kreuz, 1999). AutoTutor has produced gains of 0.4 to 1.5 sigma depending on the learning performance measure, the comparison condition, the subject matter, and the version of AutoTutor (Graesser et al., 2003).

Rosé, Jordan, Ringenberg, Siler, VanLehn, and Weinstein (2001) integrated Atlas and the Andes system to compare a model-tracing ITS with an ITS incorporating dialog. Atlas facilitates incorporating tutorial dialog, while Andes is a model-tracing ITS for quantitative physics that provides immediate feedback by highlighting each step attempted in either red or green to indicate a right or wrong answer; Andes also provides a hint sequence for students asking for help. The researchers were able to compare student learning between the original Andes and the integrated Atlas-Andes with dialog. Atlas-Andes students scored significantly higher on post-test measures, with a difference of 0.9 standard deviations.

Heffernan & Koedinger (2002) and Heffernan (2001) developed an intelligent tutoring system (ITS) that uses dialog to help students write algebra expressions. It models both student behavior and tutorial behavior by combining a cognitive model of student behavior in the domain of symbolization with a tutorial model of strategies observed in human tutors. The cognitive student model has a set of production rules that models the problem-solving skills needed to write algebraic expressions. The tutorial model is based on the observation of an experienced human tutor during an hour-long tutoring session and thus captures tutorial strategies that were observed to be effective for human tutors in the domain of symbolization. It was the first intelligent tutor that had both a model of student thinking and a model of tutorial planning. The system is of the "coached practice" variety, meant to scaffold "learning by doing" while students practice their problem-solving skills. It differs from typical Cognitive Tutors in that it takes its cues more from the dialogs that human tutors have with students, and it is also more flexible in the interactions it has with students. For example, it can acknowledge that part of an answer is correct and then engage a student in a "sub-dialog" to help him or her improve the incorrect path. It "breaks" problems down for students by asking questions and re-phrasing questions, but does not give students answers. Whereas Cognitive Tutors typically cover a full year of algebra, Heffernan's ITS deals only with symbolization problems.
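To make the "break it down" behavior concrete, the sketch below shows one hypothetical way a tutor could check a symbolization answer and fall back to a concrete sub-question. It uses SymPy for expression equivalence and is not the actual implementation of Heffernan's system; the sample tutorial dialog that follows is taken from the paper.

```python
import sympy as sp

def equivalent(student_answer, correct_answer="x + (x - 7)"):
    """Check whether the student's expression is algebraically the same
    as the target symbolization."""
    try:
        return sp.simplify(sp.sympify(student_answer) - sp.sympify(correct_answer)) == 0
    except (sp.SympifyError, SyntaxError):
        return False

def concrete_subquestion(value=10):
    """Fall back to a concrete instance of the troublesome subgoal,
    much as a human tutor might."""
    return (f"Please compute the number of flute players "
            f"if the number of drummers was {value}.")

print(equivalent("7 - x + x"))     # False: this answer is wrong
print(equivalent("(x - 7) + x"))   # True: order of the terms does not matter
print(concrete_subquestion())
```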
The following is a sample tutorial dialog:

Tutor: In the marching band there are 7 fewer flute players than drummers. If the number of drummers is "x", write an expression for the total number of flute players and drummers.
Student: 7-x+x {Correct answer is x+(x-7)}
Tutor: No. We will come back to the total number of flute players and drummers. Let me try to break this down for you. Please compute the number of flute players if the number of drummers was 10.
{Note: the ITS notices that the student computed the number of flute players wrong, so she focuses in on that subgoal.}
Student: 10-7
Tutor: Correct. Now use "x" instead of 10 to again express the number of flute players.
Student: x-7
Tutor: Correct. Now, let's try this question again: please write an expression for the total number of flute players and drummers.
Student: x-7+x
Tutor: Correct. Let us go on to the next problem.

Heffernan (2001) investigated the impact on learning of two versions of his ITS. In the control version, if students answered incorrectly the tutor told them the answer to type and then moved on to another problem; this approximates a common homework situation in which students can look up correct solutions in the back of the book. In the experimental version, the ITS engaged students in tutorial dialog specific to student errors in an attempt to help students construct the correct answer. Students in the experimental condition performed better on post-test measures, showing evidence of learning from the dialogs. Heffernan only controlled for the number of problems in this experiment and not for time; therefore, he could not determine whether the extra time spent in the dialogs was worth the effort. Heffernan (2002) reported on a web-based experiment in which he controlled for time in an attempt to see whether the learning gains students acquired were worth the extra time spent in dialog. Heffernan found that students in the experimental condition completed only half as many problems as students in the control condition, but still showed learning gains over the control condition with an effect size of 0.5. Heffernan also reported a possible motivational benefit of dialog. In summary, Ms. Lindquist seems like one example that supports the hypothesis that incorporating dialog into an ITS can lead to increases in student learning. Heffernan & Croteau (2004) replicated some of these findings, showing that Ms. Lindquist seems to provide some benefit over CAI for some lessons.

The purpose of these experiments is to replicate research comparing normal classroom instruction and CAI and to extend that research by also comparing supposed "intelligent" tutoring instruction to the other conditions. We will test the hypothesis that "intelligent" dialog accounts for more learning than 1) computer-assisted instruction as well as 2) classroom instruction. This investigation will seek to determine how much added value "intelligence" accounts for above computer-assisted instruction when compared to classroom instruction. We will also investigate differences in learning and motivation when comparing classroom instruction, computer-assisted instruction, and intelligent tutoring.

Experiment 1: Compare One Teacher to CAI and ITS

In this experiment, students' learning of symbolization skills is measured from pre-tests and post-tests administered before and after classroom instruction (traditional or cooperative), computer-assisted instruction, or computer instruction with additional intelligent tutoring.

Research Question: The research question for these studies was: Are the effects of computer-delivered instruction significantly better than the effects of classroom instruction on students' ability to learn to symbolize? At a finer-grained level, are the effects of intelligent tutoring feedback different from the effects of the simple non-intelligent tutoring approach of traditional CAI?
Method

Setting and Participants

The study took place in the students' regular algebra classrooms and in a computer lab with 20 computers with internet access. The high school was located in a rural area and served approximately 1200 students. Forty-six percent of the students received either free or reduced-price lunches. According to Department of Education data on NCLB, this school ranked in the bottom half and did not meet AYP due to low socio-economic subgroup scores.

The participants for Experiment 1 were students enrolled in two equivalent algebra inclusion classes during the 2004-2005 school year. The classes were not Honors or Advanced Placement, but were typical classes with students mostly of average ability. One class had twenty-two students and the other had twenty-one. However, a total of seven students, four from one class and three from the other, were not included in the study because they missed at least one day during the experiment. Therefore, a total of thirty-six students participated in the study, twenty-two females and fourteen males. Fourteen were students identified as learning disabled, and twenty-two were typical regular education students. There were thirty freshmen and six sophomores, ranging in age from fourteen to sixteen years. The classes were co-taught by a fully certified regular education math teacher and a highly qualified (math through Algebra 1) special education teacher. Both teachers shared responsibilities for teaching algebra content, lesson planning, and student accommodations. The lead author was the primary instructor for both classes during the experiment, but was not the students' regular teacher. Individual Education Programs were reviewed to ensure that the general classroom placement was the least restrictive and most appropriate Algebra I placement for students with learning disabilities.

Content

The computer curriculum is composed of five sections, starting with relatively easy one-operator problems (e.g., "7x") and progressing up to more difficult four- or five-operator problems (e.g., "3x+5*(20-x)"). The content of the nine-item pre- and post-tests was identical and contained four multiple-choice questions and five questions requiring students to write algebraic expressions (see Appendix A for sample tests).
Seven of the items were experimenter-designed questions and two were standardized test questions. An answer key was constructed and used by the scorer to award one point for each correct answer. The classroom lessons were designed with items of similar content, format, and difficulty level; in fact, problems used in the classroom lessons were isomorphic to the computer lessons, so no group had an unfair advantage (see Appendix B for sample classroom problems).

Procedures

Both the control and experimental conditions took place during the students' regular fifty-minute class periods. The classroom lessons were delivered by the lead author, and the study was conducted over a one-week period, with the pretest, mid-test, and post-test administered on Monday, Wednesday, and Friday and the computer condition presented on Tuesday and Thursday. Prior to the experiment, students in both classes had minimal exposure to algebraic expressions and equations while working in their text (Algebra I, Glencoe Mathematics Series).

During the traditional instruction condition, the classroom activities were divided into two main parts: 1) introduction with in-class examples, and 2) guided practice. The introduction period began with the teacher giving each student a worksheet containing twenty-five word problems ranging in difficulty from simple one-operator problems to complex four-operator problems. After reviewing the objective of the lesson, problems were displayed on an overhead projector while the instructor read a problem and demonstrated how to translate it into an algebraic expression. The instructor used various instructional strategies separately and in combination while demonstrating problems. For example, on one problem the instructor exclusively used the "clue word" method, identifying clue words such as "more than", "less than", and "sum" that indicate mathematical operations and parentheses. On another problem, he used the "clue word" method along with dividing the problem into component parts and solving each part separately. On all problems demonstrated, however, the instructor continually checked for understanding by asking comprehension-gauging questions and eliciting questions and discussion from students. A total of five problems were presented, taking approximately twenty minutes. During guided practice, students were instructed to work on the remaining problems until the end of the class period, approximately thirty minutes. The instructor was available to all students and assisted in the order in which help was requested. The guidance was not interactive in nature, but consisted mainly of prompting students to look for clue words, defining words (e.g., "per" means "divide by", "twice" means "two times a number"), explaining procedures (e.g., "less than" is a backwards construction), and giving hints. All questions were answered regardless of their nature.
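As a toy illustration of the "clue word" heuristic and of why "less than" is a backwards construction, here is a small sketch. The phrase-to-operation mapping is invented for this example and is deliberately far cruder than what students are actually taught.

```python
# Toy illustration of the "clue word" heuristic; the mapping is illustrative only.
CLUE_WORDS = {
    "sum of": "+",
    "more than": "+",
    "less than": "-",   # a "backwards" construction: the stated number comes last
    "per": "/",
    "twice": "* 2",
}

def seven_less_than(expr):
    """'7 less than <expr>' is written with the 7 last, not first."""
    return f"({expr}) - 7"

print(seven_less_than("x"))       # (x) - 7
print(seven_less_than("x + 3"))   # (x + 3) - 7
```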
The cooperative instruction condition also consisted of two parts: introduction with in-class examples and cooperative learning groups. The introduction period followed the same instructional sequence used during the traditional instruction condition and also lasted twenty minutes. However, students were then placed in groups of four and encouraged to work together on the problems with no additional guidance from the instructor. The cooperative learning model had been used on a regular basis in these classes, so students were familiar with the structure and expectations. For example, students understood the concept of peer support inherent in the groupings and the many forms in which it can be manifested, such as clarifying, interpreting, modeling, explaining, and taking responsibility for their own learning as well as the group's learning. When students requested assistance from the instructor, they were reminded to attempt the problem as a group first and were then given indirect support when needed. Students worked on the problems in their groups for thirty minutes.

During the computer-delivered lesson, students logged on to the computer as soon as the class began. This process took five minutes for the majority of students; a few, however, needed more time to fully log on. The computer system then randomly assigned each student to either the ITS or the CAI condition. Students continued working on computer-delivered problems until the end of class. The additional five to seven minutes spent logging on effectively resulted in less instructional time for the students in the computer lesson; therefore, students in the classroom conditions received about eight percent more time on task.

Design

A counterbalanced design, in which all groups received all treatments but in a different order, was used in this study. Each student participated in the experimental condition and in either the cooperative or traditional instruction part of the control condition. For example, students in Group 1 participated in the control condition first, while students in Group 2 participated in the experimental condition first.

[A design table appeared here summarizing the weekly schedule: pretest (~10-15 minutes), the traditional/cooperative control and the computer CAI/ITS conditions (averaging about 45 minutes each), a mid-test, and a post-test on Friday (~10-15 minutes).]

Results from Experiment 1

Results: There was no statistically significant difference (t = -.655, p
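The kind of analysis described here (comparing conditions on pre- to post-test gains with a t statistic and an effect size) can be sketched roughly as follows. The scores and condition labels below are invented placeholders, not the study's data.

```python
# Illustrative sketch of a gain-score comparison between two conditions.
# All numbers are invented placeholders, not data from this study.
from statistics import mean, stdev
from scipy import stats

pre  = {"ITS": [3, 2, 4, 1, 3], "CAI": [2, 3, 3, 2, 4]}
post = {"ITS": [6, 4, 7, 5, 6], "CAI": [4, 5, 5, 3, 6]}

gains = {cond: [b - a for a, b in zip(pre[cond], post[cond])] for cond in pre}

# Independent-samples t-test on gain scores (ITS vs. CAI).
t, p = stats.ttest_ind(gains["ITS"], gains["CAI"])

# Effect size (Cohen's d) on the gain scores, using a pooled standard deviation.
def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled

print(f"t = {t:.3f}, p = {p:.3f}, d = {cohens_d(gains['ITS'], gains['CAI']):.2f}")
```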


