
Helping Children Learn Vocabulary during Computer-Assisted Oral Reading


DOCUMENT INFORMATION

Basic information

Title: Helping Children Learn Vocabulary during Computer-Assisted Oral Reading
Author: Gregory Aist
Committee: Jack Mostow (advisor), Albert Corbett, Alex Rudnicky, Charles Perfetti
Institution: Carnegie Mellon University
Program: Language and Information Technologies
Document type: Dissertation
Year: 2000
City: Pittsburgh
Format:
Pages: 204
Size: 4.01 MB

Structure

  • 1.1 Learning to read
  • 1.2 Knowing a word
  • 1.3 Learning the meaning of a new word
  • 1.4 Assessing vocabulary knowledge
  • 1.5 Learning vocabulary from assisted oral reading
  • 2.1 Description of the baseline 1997-98 Reading Tutor
    • 2.1.1 The Reading Tutor in its social context
    • 2.1.2 A prototypical session with the 1997-1998 Reading Tutor
    • 2.1.3 The Reading Tutor as a software program
  • 2.2 Comparison of baseline 1997-1998 Reading Tutor and other software
  • 2.3 Learning vocabulary in the Reading Tutor
    • 2.3.1 Stories read per unit time
    • 2.3.2 New words seen per story read
    • 2.3.3 New words learned per word seen
  • 2.4 Goal: Help students encounter new words, and learn them
  • 3.1 Revising the story choice policy
    • 3.1.1 Design considerations for revised story choice policy
    • 3.1.2 Story choice policy: Options considered
  • 3.2 Implementing Take Turns
    • 3.2.1 Reading Tutor and student take turns choosing stories
    • 3.2.2 Reading Tutor story choice
    • 3.2.3 Student story choice
  • 3.3 Evaluation of Take Turns story choice policy
    • 3.3.1 Classroom-compatibility
    • 3.3.2 Usability
    • 3.3.3 Acceptance
    • 3.3.4 Efficiency
    • 3.3.5 Effectiveness
  • 3.4 Lessons learned from story choice experiments
  • 4.1 Rationale
    • 4.1.1 Automatic generation of vocabulary assistance
    • 4.1.2 Automatic generation of vocabulary assessment
  • 4.2 Experiment design
  • 4.3 Implementation of the factoid experiment
    • 4.3.1 Selecting target words
    • 4.3.2 Assigning words to conditions for factoid vocabulary assistance
    • 4.3.3 Constructing vocabulary assistance
    • 4.3.4 Presenting vocabulary assistance
    • 4.3.5 Administering the automatically constructed multiple-choice questions
    • 4.3.6 Data integrity
  • 4.4 Results of the factoid experiment
    • 4.4.1 Overall, factoids did not help…
    • 4.4.2 Exploration revealed possible effect for rare single-sense words, tested one or two days later
    • 4.4.3 Further characterization of factoid results
    • 4.4.4 Word recency effect
  • 4.5 Lessons learned from factoid study
  • 5.2 Assignment of students to treatments
  • 5.3 Treatments
  • 5.4 Outcome measures
  • 5.5 Results on Word Comprehension
    • 5.5.1 Did Reading Tutor students gain from pre- to post-test?
    • 5.5.2 Did Reading Tutor students gain more than a national cohort?
    • 5.5.3 Did Reading Tutor students gain more than their peers who received classroom instruction?
    • 5.5.4 Did Reading Tutor students gain more than their peers who received one-on-one human tutoring?
  • 5.6 Relationship between students’ word comprehension gains and distinct words seen
  • 5.7 Lessons learned
  • 6.1 Can (low-reading elementary) students make use of explanations?: The comets and meteors experiment
    • 6.1.1 Texts
    • 6.1.2 Test
    • 6.1.3 Experiment design
    • 6.1.4 Results
    • 6.1.5 Lessons learned from comets and meteors experiment
  • 6.2 Can explanations add to natural contexts?: The limericks experiment
    • 6.2.1 Texts
    • 6.2.2 Tests
    • 6.2.3 Treatment fidelity: 3% of trials affected by bug
    • 6.2.4 Results
    • 6.2.5 Lessons learned from limericks experiment
  • 7.1 Contributions of this dissertation
  • 7.2 Future work
  • 7.3 Concluding remarks

Contents

Learning to read

Reading runs deeper than merely turning print into sound; reading makes meaning from print. What does it take for children to learn to read?

Motivation plays a crucial role in the amount of time a child dedicates to reading. Observing parents, older siblings, or classmates engaged in reading can inspire children to develop an interest in books. Additionally, allowing children to choose their reading material can further enhance their desire to read. Conversely, embarrassment in front of peers or frustration with reading can diminish a child's motivation to engage with texts.

Access to reading materials varies greatly among children: some benefit from parents who read to them, supportive communities, and abundant books at home, while others may encounter few books before starting school. This disparity is especially concerning for students from low socioeconomic backgrounds, who often experience limited exposure to print even within the classroom (Duke 2000).

Skills. Children must acquire a wide range of skills to ultimately comprehend text (NRP 2000).

Phonemic awareness helps children identify and manipulate sounds in spoken words, while knowledge of print conventions teaches them how to navigate text from left to right and top to bottom. Mastery of the alphabetic principle connects sounds to their corresponding letters, and decoding skills enable the transformation of printed letters into sounds. Increased reading fluency enhances speed and automaticity, while background knowledge and vocabulary are essential for comprehension. Additionally, the ability to draw inferences and integrate information from various sources allows readers to derive meaning from text effectively.

We focus in this dissertation on learning vocabulary.

Knowing a word

What does it mean to know a word? A person’s knowledge of a word may range from none at all to complete mastery. Aspects of word knowledge include:

Pronunciation astronaut is pronounced [ ], as written in the International Phonetic Alphabet.

Part of speech astronaut is a noun.

Morphology Inflectional morphology carries agreement. For example, the plural of astronaut is astronauts. Derivational morphology turns one word into another. For example, astronaut (noun) + -ic → astronautic (adjective).

Syntax astronaut refers to a person, so the word astronaut takes he or she as a pronoun.

Lexical semantics The core meaning of astronaut is a space traveler.

Pragmatics cosmonaut shares the core meaning of astronaut, but is used for Russian astronauts.

In this dissertation, we focus on learning the core meanings of words.

Learning the meaning of a new word

How can we help children learn new words? We consider two primary methods: direct instruction and learning through reading; and a hybrid: adding information to text.

Direct instruction in vocabulary can lead to a solid understanding of words, but it requires a significant investment of time. For instance, one study demonstrated that teaching fourth graders 104 words over five months involved 75 lessons, averaging 21 minutes of instruction per word (McKeown et al. 1983). Students engaged in various activities, such as matching definitions and creating contexts, resulting in substantial exposure to the words. Beck and McKeown (1991) argue that this time-consuming approach can be optimized by focusing on the most useful vocabulary, specifically second-tier words that are commonly used and applicable across various subjects (Beck and McKeown 1991; McKeown 1993). While direct instruction is beneficial for critical vocabulary, it is impractical to apply this intensive method to every new word (Zechmeister et al. 1995).

Reading. Children can learn words from written contexts (Nagy, Herman, and Anderson 1985).

According to McKeown (1985) and Gipe and Arnold (1978), learning from context is an incremental process in which each exposure contributes a small amount of knowledge. Despite the modest gains from individual encounters, the cumulative effect can be significant, as noted by Eller, Pappas, and others.

Research indicates that readers with advanced vocabularies are more adept at learning from context due to their extensive prior knowledge of words This enables them to grasp information more effectively, even when the text contains less new information compared to what is available to readers with limited vocabularies.

Reading plays a crucial role in expanding children's vocabulary, especially when they engage with texts that include unfamiliar words. According to Carver (1994), for students to encounter a sufficient number of new words – ideally one to three for every 100 words – they should read books above their independent reading level. Simpler texts often lack the richness of vocabulary necessary for significant improvement, highlighting the importance of selecting appropriate reading materials to foster language development.

Research by McKeown (1985) investigated whether simple exposure to text is enough for all readers to learn new vocabulary, focusing on the experiences of high- and low-ability fifth graders. The study involved 15 fifth-graders whose vocabulary scores at the end of fourth grade ranged from grade equivalent 3.3 to 4.1 on the Stanford Achievement Test. Findings revealed that low-reading fifth graders struggled to learn words from context because they made incorrect inferences about word meanings. Contrary to expectations, using multiple sentence contexts did not eliminate these incorrect inferences for either the low-reading students or their higher-ability peers.

Even the higher-ability readers in the McKeown (1985) study, who had scored above grade equivalent 4.8 on the Stanford Vocabulary subtest, had some trouble integrating multiple sentence contexts to derive meaning.

Research indicates that while there have been efforts to teach children to learn words from context, the primary benefit may stem from simply practicing this skill rather than from specific instructional strategies (Kuhn and Stahl 1998). Kuhn and Stahl suggest that increasing children's reading volume is the most effective way to enhance their understanding of word meanings, independent of any additional contextual training. As Schwanenflugel, Stahl, and McFalls (1997) note, most vocabulary growth is attributed to exposure to words in both written and spoken forms rather than through direct teaching; individual encounters with words in natural contexts often provide limited information about their meanings.

Enhancing the context in which a word appears can significantly improve vocabulary learning, as traditional dictionary definitions often fail to meet learners' needs. Research indicates that providing context-specific definitions in digital formats is particularly beneficial for sixth graders. This approach combines direct instruction with reading: it first presents a text, then integrates concise, targeted word instruction, thereby facilitating effective vocabulary acquisition.

How can we test whether a particular kind of vocabulary assistance helps? We now discuss how to assess vocabulary.

Assessing vocabulary knowledge

Assessing vocabulary knowledge presents challenges, as it requires sampling along two key dimensions: selecting a representative subset of words from the vast English lexicon, and identifying specific aspects of word knowledge to evaluate. For instance, one might randomly choose 20 words from a dictionary and test children's ability to define them or use them in context. The National Reading Panel (2000) advises that customized tests be used for vocabulary assessment instead of relying exclusively on standardized tests.
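The dictionary-sampling idea above can be sketched in code: draw a small random sample of words from a word list and quiz the child on each. A minimal illustration (the mini-lexicon and the sample size are hypothetical, not materials from the dissertation):

```python
import random

def sample_test_words(lexicon, n=20, seed=0):
    """Pick a random subset of n words to assess, drawn uniformly
    from the full lexicon (sorted first so the draw is reproducible)."""
    rng = random.Random(seed)  # fixed seed: same test form every run
    return rng.sample(sorted(lexicon), n)

# Hypothetical mini-lexicon; a real assessment would sample a dictionary.
lexicon = {"astronaut", "comet", "meteor", "limerick", "pail",
           "fetch", "hill", "apple", "lion", "seagull",
           "tiger", "wolf", "cosmonaut", "fluency", "decode",
           "syllable", "phoneme", "vocabulary", "context", "story"}

test_form = sample_test_words(lexicon, n=5)
print(test_form)  # five words to define or use in context
```

The second sampling dimension – which aspect of word knowledge to test – then determines what is asked about each sampled word (define it, match it, or use it in context).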

In this dissertation, we employ diverse methodologies for word selection and evaluation, specifically concentrating on children's understanding of these words. Chapter 4 highlights our examination of words that possess limited meanings – "low-hanging fruit" for automatically generated vocabulary assistance and multiple-choice assessments. Chapter 5 discusses the use of the Word Comprehension subtest of the Woodcock Reading Mastery Test as a reliable measure of vocabulary knowledge. Chapter 6 begins by examining domain-specific content words through a matching task created by the experimenter, followed by an analysis of rare, domain-independent words using experimenter-designed multiple-choice questions.

This dissertation explores the process of learning new vocabulary through reading connected text, enhanced by additional vocabulary support, during computer-assisted oral reading We specifically examine the initial encounter with a word and the subsequent understanding of its meaning.

Learning vocabulary from assisted oral reading

In this section we describe the process of learning vocabulary during assisted oral reading. We describe an informal model: a conceptual framework useful for identifying opportunities to improve vocabulary learning.

We can characterize how many words a student learns in a day of assisted oral reading as shown in Equation 1.1.

New words learned / Day = (Stories read / Day) × (New words seen / Story read) × (New words learned / New words seen)

Equation 1.1 New words learned per day of assisted oral reading.
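As a worked example of Equation 1.1, with illustrative numbers that are not data from the study: a student who reads 3 stories in a day, sees 4 new words per story, and learns 1 in 8 of the new words seen learns 1.5 new words per day.

```python
# Illustrative numbers only -- not measurements from the study.
stories_per_day = 3.0          # stories read per day
new_words_per_story = 4.0      # new words seen per story read
learned_per_seen = 1.0 / 8.0   # new words learned per new word seen

# Equation 1.1: the product gives new words learned per day.
new_words_per_day = stories_per_day * new_words_per_story * learned_per_seen
print(new_words_per_day)  # 1.5
```

The decomposition makes the improvement opportunities explicit: raising any one factor raises the product.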

Our thesis statement focuses on enhancing vocabulary learning during assisted oral reading, as outlined in Equation 1.1. We propose to achieve this by increasing children's exposure to new words and improving their understanding of these words. To increase word exposure, we aim to increase the amount of new material students read, though this is not guaranteed when students have significant control over their interaction with the software. Additionally, we seek to improve vocabulary learning by augmenting text to provide better learning opportunities than unaugmented text, acknowledging that reading alone can effectively build vocabulary. We support these assertions through empirical tests of modifications to the 1997-1998 version of Project LISTEN's Reading Tutor, a program designed to assist children in reading aloud and developing their reading skills (Chapter 2).

This dissertation is structured as follows. Chapter 2 introduces the baseline version of Project LISTEN’s Reading Tutor from 1997-98. In Chapter 3, we discuss modifications made to the Reading Tutor to help children encounter new words, while Chapter 4 focuses on modifications to help children understand the meanings of those words. Chapter 5 compares the effectiveness of the modified Reading Tutor with traditional classroom instruction and one-on-one tutoring by certified teachers: while classroom instruction is the standard, one-on-one tutoring is often considered the "gold standard" despite its high cost. Chapter 6 presents follow-up experiments exploring additional vocabulary support strategies, and Chapter 7 summarizes the contributions of this thesis. The appendices include the materials used in our experiments.

In this section, we will describe the Reading Tutor and then revisit Equation 1.1 to explore the specific roles of each term within the context of the Reading Tutor.

This dissertation is part of the long-standing Project LISTEN, which focuses on enhancing children's reading skills through an interactive Reading Tutor that listens to them read aloud. While a comprehensive history of Project LISTEN is not included here, interested readers can refer to Mostow & Aist (FF 2001) for in-depth information. This section aims to provide sufficient background on previous findings to contextualize our current research.

In 1994, Project LISTEN introduced the Reading Coach, a tool designed to enhance oral reading skills. A study revealed that second graders who used the Reading Coach demonstrated a 40% improvement in comprehending challenging third-grade texts compared to those who did not. This enhancement was measured with a comprehension test given immediately after reading.

In that study, there was no assistive effect for an easier passage.

1996-1997 pilot study. Iterative redesign of the Reading Coach with concurrent usability testing resulted in the 1996 version of the Reading Tutor (Mostow, Hauptmann, and Roth UIST).

In a 1996 pilot study conducted by Mostow and Aist, eight third graders from Fort Pitt Elementary, a low-income urban school, used the 1996 Reading Tutor under the supervision of a school aide. Six of the eight students completed the study, achieving an average reading improvement equivalent to two years within just eight months, as measured by a school-administered Informal Reading Inventory. Further details can be found in Mostow & Aist (FF 2001).

Summer 1997 Reading Clinic. During the summer of 1997, 62 students in grades K-5 used the Reading Tutor during a reading clinic at a low-income urban elementary school (Fort Pitt Elementary). Concurrently, "the Reading Tutor underwent major design revisions of the ‘frame activities’ – logging in and picking a story to read – to enable classroom-based use" (Mostow and Aist FF 2001).

1997-1998 formative and controlled studies. As Mostow and Aist report (FF 2001):

Between 1997 and 1998, a formative study was conducted at Fort Pitt Elementary, an urban elementary school, where students in 11 classrooms used the Reading Tutor. By Spring 1998, a total of 63 students had completed the study, providing valuable insights into the use of the Reading Tutor in a regular classroom environment.

In a controlled study reported by Mostow and Aist (2001), 72 participants used the Reading Tutor, commercial reading software, or conventional instruction. The results indicated that the Reading Tutor group significantly outperformed their peers in the conventional instruction group on the Passage Comprehension subtest of the Woodcock Reading Mastery Test, despite completing only a fraction of the intended 20-minute daily sessions. The study, conducted during the 1997-1998 academic year, evaluated multiple reading skills, including Word Attack, Word Identification, Passage Comprehension, and fluency, but did not assess Word Comprehension.

For this dissertation, we used the 1997-1998 Reading Tutor as our baseline system, focusing on oral reading due to its crucial role in vocabulary development. By integrating vocabulary instruction with oral reading, we aimed to enhance spelling, receptive pronunciation, productive pronunciation, and comprehension simultaneously. We conducted our research using Project LISTEN's Reading Tutor in order to expand its skill set to include vocabulary assistance, to leverage the platform for research purposes, and to conduct experiments efficiently with students already using the Reading Tutor, eliminating the need to develop separate software for each study.

We now move on to describing the baseline 1997-1998 Reading Tutor.

Description of the baseline 1997-98 Reading Tutor

The Reading Tutor in its social context

The 1997-1998 Reading Tutor was designed for independent use in classrooms, with one Reading Tutor computer allocated per classroom. This setup allowed students to read individually while the teacher engaged with the rest of the class.

Figure 2.1 A student reads with the Reading Tutor while the teacher teaches the rest of the class

Teachers should focus on teaching rather than managing software. The 1997-1998 Reading Tutor allowed for minimal teacher intervention, featuring a special “teacher menu” for optional adjustments. The software was primarily designed for independent student use, with teachers providing guidance as needed – a limited role in software interaction sometimes termed “indirect usage.” Project LISTEN staff trained teachers and provided technical support, including on-site visits about once per week to collect recorded data and check each computer for problems.

Tutorial stories introduced students to various aspects of using the Reading Tutor.

Administrative features enabled teachers to enroll new students, modify stories, and log out of the software. During the formative study, teachers established classroom policies for Reading Tutor usage, determining the daily reading minutes and the number of stories each student should complete. In the controlled study, we instructed teachers to implement a 20-minute reading session for students.

The Reading Tutor maintained a user history database to inform tutorial decisions, while also generating log files, capturing student speech, and saving speech recognition output for research. The Project LISTEN team transferred this data onto storage media such as recordable CDs and Jaz™ disks and returned it to Carnegie Mellon.

A prototypical session with the 1997-1998 Reading Tutor

The typical student session with the 1997-98 Reading Tutor involved several key steps: logging on, selecting a story to read, reading part or all of the chosen story, possibly choosing and reading additional stories, and concluding the session by logging off. We describe each of these steps in turn.

To streamline use of the Reading Tutor software, we kept it running throughout the day, allowing students to access it without needing to launch it each time. To facilitate effective data analysis, we ensured that each child's data remained separate. Students initiated their sessions by selecting their name from a list of enrolled participants and confirming their identity by clicking on their birth month, which served as a simple and memorable password.

In the 1997-1998 Reading Tutor, students could select a story from a comprehensive list organized by reading level and title. The system initially displayed stories starting from the last level read, allowing easy navigation to the first story of the chosen level. Upon selecting a story title, the Reading Tutor highlighted it, read it aloud, and provided an estimated difficulty level based on unfamiliar vocabulary, helping students make informed reading choices. After reviewing the difficulty estimate, students could click "Okay" to start reading their selected story.

On the story choice screen of the 1997-1998 Reading Tutor, Level K stories were displayed with a blank level designation, stories from Level A to E were labeled with a single letter, and Help stories were marked with the level "Help".

2.1.2.3 Read all or part of the story aloud

After choosing a story, students read it aloud with the assistance of the Reading Tutor, which displays one sentence at a time. The Tutor listens to the student's reading and offers support for any mispronounced or challenging words. Students have the flexibility to read individual words, complete sentences, or segments of sentences.

Students can navigate through sentences by clicking "Back" to revisit the previous sentence, selecting the help icon for assistance, or clicking "Go" to advance to the next sentence (see Figure 2.3). The Reading Tutor progresses to the next sentence after recognizing that the student has read all content words. Additionally, students have the option to choose a different story by clicking "Story" (available only in the 1997-1998 version) or log out by selecting "Goodbye."
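The advance rule just described – move on once every content word in the sentence has been credited to the student – can be sketched as follows. This is a simplified illustration, not the Reading Tutor's actual implementation; the function-word list and the matching logic are assumptions made for the sketch:

```python
# Simplified sketch of the "advance when all content words are read"
# rule; not the Reading Tutor's actual code.

FUNCTION_WORDS = {"a", "an", "the", "and", "of", "to", "up"}  # assumed list

def content_words(sentence):
    """Words the student must be credited with before advancing."""
    return {w.lower().strip(".,!?") for w in sentence.split()} - FUNCTION_WORDS

def ready_to_advance(sentence, credited):
    """True once the recognizer has credited every content word."""
    return content_words(sentence) <= {w.lower() for w in credited}

sentence = "Jack and Jill went up the hill"
print(ready_to_advance(sentence, ["jack", "jill", "went"]))          # False: "hill" missing
print(ready_to_advance(sentence, ["jack", "jill", "went", "hill"]))  # True
```

Treating function words as optional means a student is not held up by skipping "the" or "and", while skipped content words keep the sentence on screen.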

Figure 2.3 Reading a story in the 1997-1998 Reading Tutor

The Reading Tutor offers timely assistance by responding to student mistakes and requests for help. It provides support through hints and guidance delivered in recorded human voices. This approach aims to balance the student's immediate need to read words or sentences with the broader objective of fostering long-term reading skills (Aist and Mostow CALL 1997, Mostow and Aist CALICO 1999). Help included:

1. Read the entire sentence using a recording of a human narrator’s fluent reading, to model correct reading. While playing the (continuous) recording, the Reading Tutor would highlight each word as it was spoken, which we call word-by-word highlighting.

2. Read the entire sentence by playing back isolated recordings of a single word at a time, to allow students to hear one word read at a time. Because these recordings may be in different voices, we call word-by-word playback “ransom note” help.

3. Recue a word by playing an excerpt from the sentence narration of the words leading up to that word (along with word-by-word highlighting), in order to prompt the student to try (re-)reading the word. For example, if the text is Jack and Jill went up the hill to fetch a pail of water, the Reading Tutor could recue hill by first reading Jack and Jill went up the out loud, and then underlining the word hill to prompt the student to read it.

4. Give a rhyming hint that matches both the sound (phoneme sequence) and the letters (grapheme sequence) of the target word, in order to give a hint on how to read the target word, and to expose the student to related words. For example, if the word is hill, give the word fill as a spoken and displayed rhyming hint, but not the word nil, because its spelling does not match.

5. Decompose a word, syllable by syllable or phoneme by phoneme, to model the process of sounding out words and to call attention to letter-to-sound mappings. For example, for hill, say /h/ while highlighting h, then say /i/ while highlighting i, then say /l/ while highlighting ll.

6. Show a picture for a word, in order to demonstrate word meaning and to increase engagement. For example, if the word is apple, show a drawing of an apple. Fewer than 200 words had pictures in the 1997-1998 version.

7. Play a sound effect, perhaps to demonstrate word meaning but primarily to increase engagement. For example, if the word is lion, play the roar of a lion. Fewer than 50 words had sound effects in the 1997-98 version; most were names of animals, such as seagulls, tiger, and wolf.
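The rhyming-hint constraint in item 4 – the hint must match the target in both sound and spelling – can be sketched with a toy pronunciation table. The phoneme strings, suffix lengths, and selection rule below are illustrative assumptions, not the Reading Tutor's actual hint-selection logic:

```python
# Toy pronunciation lexicon (phoneme strings are illustrative).
PRON = {
    "hill": "HH IH L",
    "fill": "F IH L",
    "nil":  "N IH L",
    "bill": "B IH L",
}

def rhyme_suffix(pron, n=2):
    """Last n phonemes: the part two rhyming words share."""
    return tuple(pron.split()[-n:])

def is_valid_rhyming_hint(target, hint, n_phones=2, n_letters=3):
    """A hint must rhyme in sound AND share the target's spelling of
    that rhyme -- 'fill' works for 'hill', but 'nil' does not."""
    sound_match = rhyme_suffix(PRON[target], n_phones) == rhyme_suffix(PRON[hint], n_phones)
    letter_match = target[-n_letters:] == hint[-n_letters:]
    return sound_match and letter_match

print(is_valid_rhyming_hint("hill", "fill"))  # True: sound and letters match
print(is_valid_rhyming_hint("hill", "nil"))   # False: rhymes, but spelling differs
```

Requiring the grapheme match as well as the phoneme match is what makes the hint useful for decoding: the student can map the shared letters to the shared sounds.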

2.1.2.4 (Perhaps) choose and read more stories

Students could select a new story at any time by clicking the Story button. Once they finished reading a story, the Reading Tutor presented the story choice menu again, allowing seamless navigation between stories.

The Reading Tutor as a software program

We describe the Reading Tutor as a software program: first its inputs and outputs, and then its hardware requirements.

The 1997-1998 Reading Tutor took input from a noise-canceling headset or handset microphone, along with mouse clicks and keyboard entries. It used the Sphinx-II speech recognizer to listen to children as they read aloud (Mostow et al. 1994), monitoring the student's progress, identifying potential errors, and determining when assistance was needed. The Reading Tutor displayed text, graphics, and images on a standard computer monitor and delivered audio through speakers or headphones.

The 1997-98 Reading Tutor was a software application developed mainly in C++, with additional components in Perl, SQL, and MS-DOS batch language. The program ran on Windows NT 4.0 on standard IBM-compatible computers.

The 1997-1998 Reading Tutor ran on a computer priced around $2,000, used 128 megabytes (MB) of memory, and required a full-duplex sound card for simultaneous audio input and output, a Pentium™ processor at 200 MHz or faster, and a screen resolution of 1024x768. The software was installed from compact disc (CD-ROM), with data stored on a hard disk and later transferred using recordable CDs on a compact disc writer.

We now compare the Reading Tutor to other reading software that uses speech recognition.

Comparison of baseline 1997-1998 Reading Tutor and other software

In this section, we situate the baseline Reading Tutor within its research context by comparing it to similar software aimed at aiding reading in a child's first language through speech recognition. For software beyond these parameters, Aist (1999) reviews speech recognition in second language learning, and Schacter (1999) reviews both traditional and software-based reading instruction for a child's first language. Additionally, Whines (1999) presents a comprehensive comparison of several systems relevant to this discussion.

The Speech Training Aid (STAR), created by DRA Malvern, used automatic speech recognition to help children practice reading isolated words (Russell et al. 1996). The 1997-98 Reading Tutor, by contrast, listened to children as they read connected, authentic text.

Talking and Listening Books, also described by Russell et al (1996), used continuous text but employed word spotting techniques to listen for a single word at a time.

Let’s Go Read (Edmark 1997) incorporated speech recognition into a variety of single-phoneme and single-word exercises. The 1997-1998 Reading Tutor focused on assisted reading of authentic text.

A system from IBM (Williams et al. 1998) used speech recognition with continuous text, but in a conventional talking-book style built around visually rich trade books with minimal text. The 1997-1998 Reading Tutor instead displayed stories in large, easy-to-read fonts, prioritizing the text itself over visual distractions.

Having compared the 1997-1998 baseline Reading Tutor to other speech-recognition reading software, we now return to how children learn vocabulary during computer-assisted oral reading, analyzing the terms of Equation 1.1 in the context of the Reading Tutor's features.

Learning vocabulary in the Reading Tutor

Stories read per unit time

Stories read / Day = (Time on Tutor / Day) × (Time reading / Time on Tutor) × (Words read / Time reading) ÷ (Words / Story read)

Equation 2.2 Stories read per time on Tutor.

Assuming that the number of minutes per session is externally determined, we can represent the number of stories read per day as shown in Equation 2.2.

We now discuss each of the terms in Equation 2.2 in turn.

Time reading on Tutor refers to the proportion of each session spent reading stories, distinguishing it from other activities like logging in, selecting stories, composing narratives, or responding to comprehension questions.

Reading rate, measured in words per unit time, is influenced by various factors in the Reading Tutor. Delays in assistance and system response can slow the rate, much as brief distractions slow independent reading. Additionally, students may reread a story faster than they read it the first time.

Words / Story read is story length. Harder stories will of course tend to have more words.
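Putting the terms of Equation 2.2 together with illustrative numbers (not data from the study): 20 minutes per day on the Tutor, 60% of it spent reading, a reading rate of 50 words per minute, and 200-word stories give 3 stories per day.

```python
# Illustrative numbers only -- not measurements from the study.
minutes_per_day = 20.0    # time per day on Tutor
reading_fraction = 0.6    # time reading / time on Tutor
words_per_minute = 50.0   # reading rate
words_per_story = 200.0   # story length

# Equation 2.2: stories read per day on the Tutor.
stories_per_day = (minutes_per_day * reading_fraction * words_per_minute
                   / words_per_story)
print(stories_per_day)  # 3.0
```

Note how story length enters as a divisor: longer (harder) stories reduce stories per day even when reading time and rate are unchanged.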

New words seen per story read

New words seen / Story read = New words in original text + New words in additional text

Equation 2.3 New words seen per story read.

Equation 2.3 describes how many new words a student will see in a story, for both the original text and for any text added by the Reading Tutor (see below).

New words in original text is zero for a previously completed story, and varies with story difficulty for a previously unread story.

New words in additional text is zero if the Reading Tutor adds no text to the original story. It can be nonzero if the Tutor adds text – for example, a comparison between a familiar word and a new word the student has not encountered before.
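Counting new words in a story reduces to a set difference against the words the student has already seen. A minimal sketch (the story text and seen-word history are hypothetical; real word counting would also handle morphology and tokenization more carefully):

```python
def new_words(story_text, seen_before):
    """Distinct words in the story the student has not seen before."""
    story_words = {w.lower().strip(".,!?") for w in story_text.split()}
    return story_words - seen_before

seen = {"jack", "and", "jill", "went", "up", "the", "hill"}
story = "Jack and Jill went up the hill to fetch a pail of water"
print(sorted(new_words(story, seen)))
# ['a', 'fetch', 'of', 'pail', 'to', 'water']

# Rereading a completed story yields zero new words in the original text:
print(new_words(story, seen | new_words(story, seen)))  # set()
```

The second call illustrates the point above: once a story has been completed, its original text contributes no new words, so any further new-word exposure must come from added text or a new story.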

New words learned per word seen

New words learned / New words seen = (New words actually read / New words seen) × (New words learned / New words actually read)

Equation 2.4 New words learned per new word seen.

Equation 2.4 describes the ratio of new words learned to new words seen – equivalently, the probability that a student learns a new word if he or she sees it.

To learn the meaning of a new word, a student must actually read it rather than gloss over it. Struggling readers often avoid challenging words, which hinders their vocabulary development.

Research indicates that students with higher reading ability acquire new vocabulary from context more effectively than students with lower reading ability (McKeown 1985, Shefelbine 1990). However, not all contexts facilitate word learning equally well (Beck, McKeown, and McCaslin 1983); sentences rich in semantic content are more helpful for vocabulary acquisition than bare word lists. Furthermore, adding relevant information to a text may increase the proportion of new words learned per new word actually read.
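The chain in Equation 2.4 can be made concrete with a toy calculation; the probabilities below are invented for illustration, not estimates from the Reading Tutor data.

```python
# Hypothetical sketch of Equation 2.4: new words learned = new words seen
# x P(actually read | seen) x P(learned | actually read).
def new_words_learned(new_words_seen, p_actually_read, p_learned_if_read):
    return new_words_seen * p_actually_read * p_learned_if_read

# E.g., 10 new words seen, 80% actually read, 15% of those learned.
learned = new_words_learned(10, 0.80, 0.15)
```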

Goal: Help students encounter new words, and learn them

This dissertation addresses two factors from Equation 2.1: the number of new words encountered per story, and the probability of learning a new word upon first exposure. To acquire vocabulary, students must both encounter new words and grasp their meanings when those words are introduced. To improve both processes, we modified Project LISTEN's Reading Tutor.

In the chapters that follow, we describe the modifications to the Reading Tutor and evaluate them experimentally. Chapter 3 describes improvements to the Reading Tutor's story choice policy, while Chapter 4 describes experiments that assessed the vocabulary help the Reading Tutor provided.

Story choice must balance students' interests against the Reading Tutor's educational objectives. While the Reading Tutor aims to improve reading skill, children often engage with software for other reasons, such as exploring narratives or playing games. As Hanna et al. note, children's goals may diverge from the intended educational outcomes, leaving a gap between the product's purpose – such as teaching the alphabet – and the child's motivation. The Reading Tutor seeks to develop reading, but students may pursue their own interests, such as choosing particular stories or experimenting with the story menus.

In the 1997-1998 school year, students used the Reading Tutor in a classroom independently.

To promote engagement and a love of reading, the 1997-98 Reading Tutor let students pick from a diverse range of stories, including non-fiction, poetry, and fiction. This freedom had drawbacks, however. Some students repeatedly chose familiar texts, such as excerpts from Martin Luther King's "I Have a Dream" speech or simple nursery rhymes like "Jack and Jill," which limited their exposure to new vocabulary. In addition, many students chose stories that were too easy, further reducing the new words they encountered and could learn. We therefore sought to shape story choice so that students would be exposed to more new vocabulary while still enjoying their reading.

In this chapter, we describe the revised story choice policy. We first lay out the design considerations that informed our approach, then detail the implementation of the revised policy, evaluate the results, and close with lessons learned.

Revising the story choice policy

Design considerations for revised story choice policy

During the process of constructing and analyzing possible story choice policies, we made explicit some desired characteristics for story choice. A good story choice policy is:

Classroom-compatible. The policy should support teachers in guiding students to specific stories, foster healthy competition that encourages students to explore new stories rather than repeatedly reread a few favorites, and allow students to read stories recommended by their friends. In short, it should fit the practical needs and opportunities of the classroom.

Simple. The policy should present clear, straightforward choices that both students and teachers can understand. A confusing or complicated policy undermines its credibility and effectiveness.

Acceptable. Does the policy result in choosing stories that students actually read? If students boycott stories chosen by the computer, the Reading Tutor will be less effective.

Efficient. The revised policy should take no more time than the student-only story choice policy used in the 1997-1998 Reading Tutor, since every minute spent selecting a story is a minute not spent reading. Story selection should therefore remain quick and streamlined.

Effective. The policy should ensure that students regularly read new material, so that they encounter unfamiliar words in context. It should also allow some re-reading, since successful reading programs often incorporate re-reading to build comprehension and retention (Schacter 1999).

Story choice policy: Options considered

We wanted the Reading Tutor to play a stronger role in helping children choose what to read. One might imagine a system in which each student has a personalized reading list managed by a teacher, but in our experience teachers prefer to spend their time on direct instruction rather than adjusting software. They typically interact with the Reading Tutor indirectly, steering students towards effective reading behaviors and occasionally monitoring their progress. For instance, one teacher posted a list of stories on a 3x5 index card atop the Reading Tutor monitor. Our goal was a story choice policy robust enough to function without direct teacher involvement, while still accommodating such indirect guidance, as the original policy did.

Allowing children to choose their own reading material empowers them and encourages engagement with stories that capture their interest. We therefore aimed to preserve student choice, and considered several less extreme alternatives.

Sort the story list. We could place new titles at the top of the story list. However, sorting alone might not be effective: students could still pick familiar stories, ignoring the recommendations.

Restrict the story list. We could restrict the list of stories that students could choose from.

However, heavily restricting children's choices could confuse them, since the limitations might be hard to understand. Allowing students to pick an old favorite only sometimes also raises questions of fairness: why can they choose it on some occasions but not others? Furthermore, such restrictions might prevent students from reading stories recommended by their peers.

Choose from two lists. The Reading Tutor could alternate between offering a list of new stories and a list of old favorites, letting students engage with fresh material while also revisiting beloved tales. This approach is straightforward and fair, since students still make their own choices, but each list would need clear labeling – such as "Old" and "New" – to avoid confusion.

Requiring students sometimes to choose an old story instead of a new one raises a concern as well: why restrict students to re-reading when they may prefer fresh material? Re-reading has its benefits, but it forgoes opportunities to engage with new and more challenging content.

Take Turns. The student and the Reading Tutor could take turns choosing from a single shared list. Students could still explore new stories, revisit favorites, and read stories recommended by peers. A single consistent list is simple for students and teachers to understand. Students would not choose every story, but the choices they did make would be unrestricted. Take Turns is also efficient: it could roughly halve the time spent on story selection, while guaranteeing that every student is regularly exposed to new material.

Among the policies we considered, taking turns had the most benefits and the fewest drawbacks. It is easier to explain than offering multiple lists, it still gives students unrestricted choices on their turns, and it saves selection time, leaving more time for reading stories.

Take Turns offers a further advantage: the Reading Tutor's turn can serve various purposes, such as presenting introductory stories. In 1999-2000, we programmed the Reading Tutor to choose new stories on its turn; in subsequent research (not described in this dissertation), we used its turn for experimental purposes. The Reading Tutor's turn could also introduce other activities, such as passages designed to assess fluency.

Accordingly, we introduced a Take Turns story choice policy for the 1999-2000 Reading Tutor.

Implementing Take Turns

Reading Tutor and student take turns choosing stories

Implementing Take Turns required a mechanism for the Reading Tutor and the student to alternate choosing stories. The algorithm had to incorporate an introductory tutorial, and it had to be robust against software crashes, student attempts to bypass the Reading Tutor's story choice, and differing classroom practices regarding how many stories were read each day.

The story choice algorithm randomly decides each day whether the student or the Reading Tutor chooses the first story, and alternates choices for the rest of the day. Table 3.1 shows the two key parts of the algorithm: what to do when it is time to pick a story, and what to do when the student has just finished reading a story.

When it is time to pick a story…

If tutorial not finished… Then, choose the tutorial – a prespecified story introducing the Reading Tutor.

Otherwise, if this is the first time this student has logged in today…

Pick randomly (50/50) who chooses next story, Reading Tutor or student.

Set who_chooses to result.

Otherwise … Use value of who_chooses: if who_chooses = student, student chooses story; else who_chooses = Reading Tutor, so Reading Tutor chooses story.

When the student has just finished reading a story…

If who_chooses = Reading Tutor Then, set who_chooses = student.

Otherwise, if who_chooses = student Then, set who_chooses = Reading Tutor.

Table 3.1 How to Take Turns choosing stories.

We comment on a few aspects of Take Turns.
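The control flow in Table 3.1 can be sketched as a small state machine. This is a simplified reconstruction: the real system also persisted who_chooses to disk to survive crashes, which is omitted here.

```python
import random

class TakeTurns:
    """Sketch of the Table 3.1 algorithm for alternating story choice."""

    def __init__(self):
        self.tutorial_finished = False
        self.who_chooses = None

    def next_chooser(self, first_login_today):
        """Decide who picks the next story."""
        if not self.tutorial_finished:
            return "tutorial"  # prespecified introductory story
        if first_login_today or self.who_chooses is None:
            # Pick randomly (50/50) who chooses the first story of the day.
            self.who_chooses = random.choice(["student", "tutor"])
        return self.who_chooses

    def story_finished(self):
        # Turns alternate only when a story is completed, so logging out
        # and back in cannot skip the Reading Tutor's turn.
        self.who_chooses = "student" if self.who_chooses == "tutor" else "tutor"
```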

To minimize variation in student behavior due to differing initial training, we used a standardized tutorial story, ensuring that all students received a consistent introduction. When a student logged in, the Reading Tutor checked whether he or she had finished the tutorial; if not, it chose the tutorial story. However, the original 1999-2000 tutorial proved too hard for young readers and was replaced with small-group training led by certified teachers. Newer interactive tutorials for younger children were developed in the Summer of 2000, but are not described in this dissertation.

Who chose the first story of the day was decided randomly, so that every student experienced both the Reading Tutor's choices and his or her own. Random selection avoids the bias that a fixed policy (for example, always letting the student choose first) might introduce. The result of the random selection was stored on disk to protect it against crashes and software restarts.

The who_chooses value persisted until the student finished the story or a full day had elapsed. This prevented "cheating" – getting around the Reading Tutor's story choice by logging out and then logging back in again.

Reading Tutor story choice

The 1999-2000 Reading Tutor assigned each student a recommended reading level based on age, and adjusted it based on performance. When no unread story remained at the recommended level, the Reading Tutor chose a harder story.

The Reading Tutor initially assigned each student a recommended reading level based on age, computed from the student's birth month and day at the time of enrollment. Table 3.2 shows the initial assignments.

Student's age | Initial recommended reading level | Story level grade equivalent
13 years old or older | D | Fourth grade

Table 3.2 Initial assignment to recommended reading level.

The initial level was deliberately low to avoid frustrating poor readers, and relied on level adjustment to quickly reach story levels that would challenge the better students.

The Reading Tutor adjusted each student's recommended reading level to keep difficulty appropriate, which is essential for effective learning. Researchers conventionally distinguish three levels of text difficulty: independent, instructional, and frustration. A book is at a child's independent reading level if the child can read 95-100% of its words accurately, at the instructional level at 90-94% accuracy, and at the frustration level at 89% accuracy or below.
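These three bands amount to a one-line classifier; the thresholds below are exactly those quoted above.

```python
# Classify text difficulty for a child from oral reading accuracy,
# using the conventional independent/instructional/frustration bands.
def difficulty_band(words_correct, total_words):
    accuracy = words_correct / total_words
    if accuracy >= 0.95:
        return "independent"    # 95-100% of words read accurately
    if accuracy >= 0.90:
        return "instructional"  # 90-94%
    return "frustration"        # 89% or below
```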

Difficulty can also be gauged through reading fluency, which is closely linked to comprehension (Pinnell et al., 1995). Using speech recognition to listen to children read aloud allows direct and unobtrusive evaluation of oral reading performance, as demonstrated in work such as that of Mostow and Aist.

In 1999, we revised the level adjustment criteria to use the student's assisted reading rate – the number of words accepted by the Reading Tutor per minute – rather than oral reading accuracy alone. This measure reflects the student's reading fluency more robustly, given the limits of machine identification of errors on individual words.

The level adjustment policy, shown in Table 3.3, was applied only after a student finished a new story. Teachers could override the recommended reading level through the Reading Tutor's administrative mode, but rarely did so. In hindsight, the promotion and demotion thresholds should have been specific to each story level, since reading rates typically rise with each grade level.

Last story completed, compared to recommended reading level | AWPM < 10 | 10 ≤ AWPM < 30 | AWPM ≥ 30
Easier | Move down one level | Stay same | Stay same
At level | Move down one level | Stay same | Move up one level
Harder | Stay same | Stay same | Move up one level
Noncomparable | Stay same | Stay same | Stay same

Table 3.3 Level adjustment policy based on accepted words per minute (AWPM).
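The level adjustment policy amounts to a small lookup; a sketch, with hypothetical function and label names, follows.

```python
# Sketch of the level adjustment policy. AWPM = accepted words per
# minute; returns -1 (move down a level), 0 (stay), or +1 (move up).
def adjust_level(last_story_vs_recommended, awpm):
    if last_story_vs_recommended == "noncomparable":
        return 0
    if awpm < 10:
        # Slow reading on an easier or at-level story: move down.
        return -1 if last_story_vs_recommended in ("easier", "at level") else 0
    if awpm >= 30:
        # Fast reading on an at-level or harder story: move up.
        return +1 if last_story_vs_recommended in ("at level", "harder") else 0
    return 0  # 10 <= AWPM < 30: stay at the same level
```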

Sorting the stories. The Reading Tutor sorted the stories by level as follows. Each story in the Reading Tutor had a story level, assigned by Project LISTEN team members. Levels K, A, B, C, D, and E were used for kindergarten through fifth grade respectively, rather than the (possibly stigmatizing) K, 1-5. The Help level was used for help stories, which described how to operate the Reading Tutor and handle common technical problems. Student-authored stories were assigned to level U, for "unleveled." The story levels K through E were ordered by increasing difficulty, but levels U and Help had no defined difficulty relationship to any other level. The Reading Tutor sorted all of the approximately 200 stories into five categories, as shown in Figure 3.1: (1) previously read stories; (2) new stories below the student's recommended reading level; (3) new stories at the student's recommended reading level; (4) new stories above the student's recommended reading level; and (5) stories at a noncomparable level.

[Figure 3.1 shows example story titles sorted into the five categories for a student at level B, e.g. Help stories ("Read me", "About the headset"); easier stories such as (A) Jack and Jill and (K) Bob got a dog; level B stories such as (B) Windy Nights, (B) The Moon, and (B) Butterflies; and harder stories such as (C) The Wind, (D) Food Groups, and (E) Rumpelstiltzkin.]

Figure 3.1 Sorting stories into levels for Reading Tutor story choice.
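The five-way sort behind Figure 3.1 can be sketched as follows; the numeric mapping of levels is an assumption consistent with the K, A-E ordering described above.

```python
# Sort stories into the five categories of Figure 3.1.
LEVEL_ORDER = {"K": 0, "A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

def categorize(story_level, already_read, recommended_level):
    if already_read:
        return 1  # previously read stories
    if story_level not in LEVEL_ORDER:  # Help or U ("unleveled")
        return 5  # noncomparable level
    diff = LEVEL_ORDER[story_level] - LEVEL_ORDER[recommended_level]
    if diff < 0:
        return 2  # new story below the recommended level
    if diff == 0:
        return 3  # new story at the recommended level
    return 4      # new story above the recommended level
```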

On its turn, the Reading Tutor chose an unread story at the student's recommended reading level. If the student had completed every story at that level, the Fall 1999 version chose randomly from the unread stories at all higher levels, sometimes presenting far too difficult stories. This problem surfaced only after extended use, once some students had finished all the stories at their level. For Spring 2000, we therefore revised Take Turns so that the Reading Tutor sorted stories by level and chose randomly from the unread stories at the next level above the student's recommended reading level.

Student story choice

Students have the freedom to select any story from the Reading Tutor collection or opt to write and optionally narrate their own story.

We wanted to help the student make good story choices. Specifically, we wanted to:

Encourage new stories. To nudge students toward new material, we revised the story choice menu to display how many times the student had read each story. To keep students from treating the read count as a score to maximize, the display showed simply ">4" once a story had been read more than four times.

Provide a varied menu of stories to choose from. To keep students from always picking the first few titles displayed, the story choice menu no longer always started at the top of the alphabetically ordered list; instead it started at a random point, encouraging children to consider a wider range of story options.

Support finding a favorite story. One important aspect of reading is reading stories you like.

To keep favorite stories easy to find, we kept the list in a fixed order rather than randomizing it, so locating a specific story remained straightforward.

Streamline menu interaction. In the 1999-2000 Reading Tutor, picking a story took a single click, instead of the two clicks (select, then confirm) that the previous version required; a Back button let students return to the story choice screen if they changed their minds. This "one-click pick" design (Aist & Mostow ITS-PA 2000) also supported non-readers by reading story titles aloud, while advanced readers could simply click a title on their own.
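The menu behaviors above – fixed alphabetical order, a random starting point, and a read count capped at ">4" – can be sketched together; the function and parameter names here are hypothetical, not from the Reading Tutor source.

```python
import random

def build_menu(titles, times_read, rng=None):
    """Return (title, read-count label) pairs: alphabetical order is
    preserved (but rotated to a random starting point), and counts
    over four display as ">4"."""
    rng = rng or random.Random()
    ordered = sorted(titles)
    start = rng.randrange(len(ordered))          # random starting point
    rotated = ordered[start:] + ordered[:start]  # order preserved

    def label(n):
        return ">4" if n > 4 else str(n)

    return [(t, label(times_read.get(t, 0))) for t in rotated]
```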

The revised story choice screen, shown in Figure 3.2, let students click on a title at any time. The Reading Tutor prompted the student with a message at the top, such as "Greg, choose a level C story to read," and read each title aloud while highlighting it in yellow. If the student did not pick a title, the Reading Tutor repeated the list before eventually logging the student out for inactivity. Clicking "More Level C Stories" (in this example) brought up additional stories at the current reading level.

In the Fall 1999 story choice screen, the title "Life in Space" was a shortened version of the story's opening sentence: "For many years, the United States and Russia worked separately on going into space." The screen also displayed, next to each title, how many times the student had read that story.

Evaluation of Take Turns story choice policy

Classroom-compatibility

Take Turns was designed to let teachers recommend stories to children, to let children choose stories recommended by their friends, and to permit re-reading of favorite stories. To foster healthy competition, when no one was logged in, the "Hello" screen showed how many minutes each student had read that day, and how many new words and new stories each student had seen.

Usability

We tested the Take Turns story choice policy several times in 1999 for usability and acceptance. A preliminary version was tried at Fort Pitt Elementary School in informal one-day sessions in the spring. Children in the CHIkids program at the Conference on Computer-Human Interaction in Pittsburgh then used the Reading Tutor with Take Turns over several days. Additionally, over fifty students at Fort Pitt further contributed to our assessment of the policy.

Use of Take Turns in the 1999 Summer Reading Clinic revealed several concerns. Some students grew frustrated when the Reading Tutor chose material that was too hard. We addressed this by adjusting the reading level thresholds, simulating both proficient and struggling readers and monitoring students' assisted reading rates during their sessions. The adjustments were not flawless, but they proved sufficient for students to accept the Reading Tutor's selections.

Some students were also frustrated simply when the Reading Tutor picked a story they would not have chosen. The Reading Tutor therefore explained its choice: "Read this story, and then YOU'LL choose one." This explanation probably alleviated some frustration, but did not eliminate it.

The Reading Tutor sometimes chose stories that were segments of a longer narrative, such as a multi-part version of Cinderella, each segment sized to be read in one session. Programmer Andrew Cuneo made each segment a prerequisite for the next, modifying the Reading Tutor so that it offered a later segment only after the student had finished the earlier ones. The natural expectation that children read a story's parts in order had to be explicitly programmed.

Acceptance

We now turn to considering acceptance. We looked at several aspects of story choices: story completion, writing stories, story levels, and re-reading stories.

To gauge how well students accepted story choices, we analyzed story completion. Using data logged by the Reading Tutor, we compared the number of stories started with the number completed, focusing on what percentage of the stories students started they went on to finish.

In Fall 1999, a technical issue occurred when the Reading Tutor was unable to log a student's activity of opening the story "There once was a strange little bug" due to a full disk on December 21.

Table 3.4 shows the per-student average number of stories started and finished for the Fall 1999 Centennial data. We categorized stories as new versus old, and by who chose each story – the Reading Tutor or the student. We computed the per-student average by weighting all students equally, rather than letting students who read more stories dominate a simple aggregate.
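The equal-weighting computation described above (a macro-average over students) can be sketched as:

```python
# Per-student mean completion rate: each student's own percentage is
# averaged, so heavy users do not dominate the statistic.
def per_student_mean(records):
    """records maps student -> (stories_finished, stories_started)."""
    rates = [fin / started for fin, started in records.values() if started]
    return sum(rates) / len(rates)

# Two students with completion rates 0.5 and 1.0 average to 0.75,
# regardless of how many stories each one read.
```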

Reading Tutor | Student – new stories | Student – old stories
Total number of stories: 1334 finished out of 1509 started
Percentage of stories finished: per-student mean and standard deviation

Table 3.4 Stories finished out of stories chosen, Fall 1999 data.

Students finished only 39% of the stories the Reading Tutor chose – significantly fewer than the stories they chose themselves. One likely reason is that the Reading Tutor chose harder, longer stories.

Nonetheless, students finished more new stories chosen by the Reading Tutor than new stories they chose themselves, so the Reading Tutor's choices were effective in exposing students to new material.

Many students concentrated on re-reading familiar stories, whereas the Reading Tutor's goal was to encourage new material in order to improve overall reading skill.

Although students rejected many of the Reading Tutor's individual choices, they accepted enough of them to finish stories at close to the intended 50/50 split between student and Reading Tutor. On average, students finished 1.56 stories per day chosen by the Reading Tutor and 1.54 stories per day chosen by the student (Figure 3.3).


Figure 3.3 Average number of stories started and finished per day, by who chose the story.

The 1999-2000 Reading Tutor also let students write their own stories (Mostow & Aist AAAI 1999; Mostow & Aist USPTO 1999). How much did these other activities at level U – reading peers' stories and composing one's own – figure in students' story choices? To find out, we examined the distribution of story levels. Figure 3.4 shows, per student, the number of stories started and finished, who chose them, and at what story level.

The distribution of level U choices was uneven: 14 students chose no level U stories at all, while 14 students chose level U stories as half or more of their story choices.

To separate writing stories from reading other students' stories, we classified each level U choice by the author's identity. A choice was classified as reading another student's story if the author's first name, last initial, and inferred gender did not match the choosing student's recorded name and gender; for instance, a story signed "Jane D." chosen by a student with different recorded initials would be classified as reading another student's story. If the identifiers matched, the choice was classified as writing the student's own story. We classified 996 level U choices in this way.

Of the 996 level U choices, 107 involved students reading stories written by their peers, comprising 69 of 242 distinct student-story pairs for first-time selections. Appendix B says more about student-written stories, but students' writing and reading choices were not the focus of this study. Note that writing a story does not expose a student to new vocabulary, whereas reading another student's story could; we do not pursue this aspect further here.

The Reading Tutor chose harder stories than the students did. On a scale from K (0) to E (5), the average level of stories the Reading Tutor chose was 1.13 ± 0.65, whereas the average level of stories students chose was only 0.61 ± 0.59.

A T-test comparing, for each student, the stories the Reading Tutor chose against the stories the student chose showed that many students finished harder stories when the Reading Tutor chose them, suggesting that students did not challenge themselves enough with their own choices. Moreover, the old stories students chose to re-read were easier on average (0.46 ± 0.66) than the new stories they chose (0.78 ± 0.57), a statistically significant difference (p < .001): students preferred to re-read easier stories.

Figure 3.4 shows the average distribution of story choices. A student might start a story but not finish it for several reasons: choosing a different story, logging off by clicking Goodbye, or being logged out automatically after inactivity. The large percentage of unfinished stories chosen by the Reading Tutor was mostly due to students clicking Back to reject the Reading Tutor's story choice.

[Figure 3.4 panels: story level of stories chosen by the Reading Tutor; story level of new stories chosen by the student; story level of old stories chosen by the student. Legend: stories finished by the student; stories started but not finished; all stories started.]

Figure 3.4 How many new stories were chosen and finished on average.

Figure 3.4 shows the average number of new stories chosen and finished in Fall 1999, separated by whether the student or the Reading Tutor chose them. The counts for level U stories may be underestimates, because stories with identical first lines were counted as the same story, and several distinct stories could share the same Reading-Tutor-supplied opening line. Notably, more than half of the new stories finished were chosen by the Reading Tutor, even though many of its individual choices were rejected.

Efficiency

In prior work (Aist & Mostow ITS-PA 2000), we compared the 1999-2000 Take Turns Reading Tutor with the Spring 1998 student-only story choice Reading Tutor. As described in Chapter 2, Section 1, the 1997-98 Reading Tutor required two clicks to select and confirm a story, whereas the Take Turns Reading Tutor let the student pick a story with a single click, with a Back button for choosing a different story.

To evaluate the efficiency of the Take Turns version of the Reading Tutor against the previous student-only story choice version, we measured the time taken to select a story: the duration from the last sentence of one story to the second sentence of the next. Filtering out browsing behaviors such as story selection and navigation gave a clearer view of the decision-making involved in choosing a story. In a hand-reviewed sample of story choices from the student-only version used in Spring 1998, story selection took an average of 2.0 ± 1.7 minutes (10 examples). Students using the Take Turns Reading Tutor in Fall 1999 chose stories in about 30 seconds (0.44 ± 0.27 minutes; 9 examples). The difference was statistically significant (two-sample T-test assuming unequal variances, p = .02). We attribute this improvement to the combined effects of the one-click pick feature and the Take Turns policy.
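The two-sample comparison with unequal variances is Welch's t-test; a minimal sketch follows. The minute values below are made up for illustration, not the study's measurements:

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances).

    Returns (t, df), where df is the Welch-Satterthwaite approximate
    degrees of freedom.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)
    t = (ma - mb) / se
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical story-selection times in minutes:
old_version = [2.0, 3.5, 1.2, 4.0]   # student-only story choice
new_version = [0.4, 0.5, 0.3]        # Take Turns with one-click pick
t, df = welch_t(old_version, new_version)
```

Welch's form is preferred here because the two samples are small and their variances differ markedly.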

This analysis is limited by its small sample size and by conflating the Reading Tutor's selection time with the student's acceptance time. A better way to assess efficiency is to compare how long students took to choose a story in the 1997-98 Reading Tutor with how long the Reading Tutor itself took to make a selection. The disparity is large, and clearly favors the Reading Tutor.

The 1999-2000 Reading Tutor selects a story in under ten seconds, barely enough time for a student to begin surveying the options. Unlike students, whose reading and processing speed limits how many stories they can consider, a computer can sift through hundreds of stories quickly. Advances in computer speed will only shorten this further, so the speed of the Reading Tutor's choices is not a concern for future applications.

More important than the time for an individual computerized choice is the overall time required to settle on a story. In the Fall 1999 Reading Tutor, delays came from extended browsing of the story menu and from frequent rejection of the Reading Tutor's choices. Potential remedies include giving students more detailed information about the available stories, making the Reading Tutor's selections more appealing, and limiting how many times a student can use the Back button to decline a suggested choice.

What matters most, however, is how much reading students actually did as a result of the stories chosen by them and by the Reading Tutor, so that their time was not spent merely choosing stories. We therefore turn to the core issue: effectiveness.

Effectiveness

To evaluate whether the Take Turns Reading Tutor increased student engagement with new material, we analyzed story choice data from two periods. The first dataset, from Spring 1998, covered 24 students in grades 2, 4, and 5 at Fort Pitt Elementary School, who used a version of the Reading Tutor with student-only story selection. The second dataset, from Fall 1999, covered 60 students in grades 2 and 3 at Centennial Elementary School, who used the Take Turns Reading Tutor with its turn-based story selection. Comparing these two groups from similar educational contexts let us determine whether the new policy led students to encounter more new material. The studies were comparable in several respects:

• both groups of students spent 3-4 months with the Reading Tutor by the end of the period we examined;

• in each study, there was one Reading Tutor per classroom, shared by 8-12 students;

• in both studies, students used the Reading Tutor individually in the classroom;

• at the beginning of each study, the Reading Tutor was new to all the classrooms, although some students in the Spring 1998 study may have used the Reading Tutor the previous summer.

Appendix A summarizes this experiment according to the coding scheme used by the National Reading Panel report Teaching Children to Read (NRP 2000).

To compare the effectiveness of the 1997-98 and 1999-2000 versions, we analyzed the percentage of new sentences students saw rather than simply counting new words. Because the reading materials changed between versions, the proportion of new material students were exposed to is a fairer reflection of the story choice policy's impact.

We calculated each student's percentage of new sentences out of all sentences he or she saw, weighting all students equally. Figure 3.7 shows boxplots of this percentage for participants in the Spring 1998 and Fall 1999 studies. The sentence statistics in the rest of this chapter were derived by counting the files that recorded students' utterances, which cover all sentences displayed.
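The equal-weighting scheme is a macro average: compute each student's new-sentence rate, then average the rates, so a student who read many sentences does not dominate the figure. A sketch with hypothetical event lists (the data structure is an assumption, not the tutor's log format):

```python
def new_sentence_rate(events):
    """events: list of (sentence_id, is_new) pairs for one student."""
    return sum(1 for _, is_new in events if is_new) / len(events)

def mean_new_rate(per_student_events):
    """Macro average: one rate per student, each weighted equally."""
    rates = [new_sentence_rate(ev) for ev in per_student_events]
    return sum(rates) / len(rates)

# Hypothetical data: student A saw 3 sentences (2 new),
# student B saw 10 sentences (5 new).
students = [
    [("s1", True), ("s2", True), ("s3", False)],
    [(f"s{i}", i < 5) for i in range(10)],
]
```

Note that pooling all sentences first (a micro average, 7/13 here) would weight student B more heavily than the macro average (7/12).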

At Centennial Elementary, students continued to use the Reading Tutor for the rest of the school year after Fall 1999. Student-authored Level U stories may well have contributed to vocabulary growth, but their titles and levels were not recorded in the utterance files, so statistics derived from the Reading Tutor's database exclude Level U stories.

Figure 3.7 Boxplot for per-student rate of new sentences seen out of all sentences, Spring 1998 vs. Fall 1999.

The two studies involved partly overlapping grades: grades 2, 4, and 5 in Spring 1998, and grades 2 and 3 in Fall 1999. A univariate analysis of variance (ANOVA) showed a difference favoring Fall 1999 in the rate of new material encountered (F = 3.25, p = .075), significant at the 90% level. We also calculated the average per-student rate of new sentence encounters for each study.

To compute the average rate of encountering new sentences, we took each student's individual rate and then averaged those rates. In Spring 1998, 24 students saw over 10,000 sentences, with an average per-student rate of 60.1% new sentences. In Fall 1999, 60 students saw nearly 35,000 sentences, with an average per-student rate of 64.1%, a 6.7% relative increase. Figure 3.8 shows boxplots of these rates for Spring 1998 and Fall 1999, subdivided by grade.

Figure 3.8 Spring 1998 student-only story choice vs Fall 1999 Take Turns story choice for percentage of new sentences, subdivided by grade.

Take Turns was most effective at increasing the amount of new material read by students who otherwise read the least new material. Students in the bottom half of the Fall 1999 Take Turns group read somewhat more new material than their counterparts in the Spring 1998 student-only story choice group (p = .301, controlling for grade), while top-half students in both groups read about 72% new sentences. Specifically, in Spring 1998 the bottom 12 students averaged 49.6% new sentences and the top 12 averaged 72.1%; in Fall 1999 the 28 students below the median read 54.9% new sentences, and the 32 students at or above the median read 72.2%. Figure 3.9 shows this data.

Figure 3.9 compares Spring 1998 and Fall 1999 students by percentile rank on the percentage of new material encountered, computed as new sentences out of all sentences. Some points may represent multiple students due to tied ranks.

How much did the Reading Tutor's selections increase the number of new stories students read in Fall 1999? We examined the Fall 1999 data in detail, comparing how often new stories were started and finished when chosen by the student alone versus by either the student or the Reading Tutor, and comparing the percentage of new material in student-chosen stories with the overall new-material percentage.

Reading Tutor-chosen stories mattered most for students who chose the fewest new materials themselves. In Fall 1999, among stories students chose and finished, the average new-material percentage was 43%, and the lowest 10% of students averaged below 15%. With Reading Tutor-chosen stories included, nearly all students finished over 50% new material. Take Turns thus helped most the students who on their own chose the lowest percentage of new stories.

Figure 3.10 New-material percentage for started stories, comparing stories chosen by the student (bottom points) vs. stories chosen overall (top points), Fall 1999. (Axes: students ordered by rate of new stories started out of student-chosen stories; rate of new stories started out of all stories started.)

Figure 3.11 New-material percentage for finished stories, comparing student-chosen stories (bottom points) vs. percent new stories finished overall (top points), Fall 1999. (Axes: students ordered by P(new|finished) for stories chosen by the student; P(new|finished) for all stories.)

In the Take Turns version of the Reading Tutor, students encountered a higher percentage of new stories than in the previous student-only story choice version. In 1999-2000, second graders saw between 458 and 1588 distinct words (median 1124) and third graders saw between 856 and 1910 distinct words (median 1224), roughly 11-12 new words per day over about 100 days. Third graders saw significantly more distinct words than second graders (F = 6.54, p = .01), although the two grades did not differ significantly in number of new stories read (F = 0.81, p = .37). The difference in words is attributable to second graders reading easier, shorter stories with fewer distinct words.
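Counting distinct words (not stems) per student amounts to a set union over the sentences the student saw. A minimal sketch; whitespace tokenization and case-folding are our assumptions, not necessarily the tutor's exact procedure:

```python
def distinct_words(sentences):
    """Count distinct word forms (not stems) across a student's
    sentences -- the measure behind the per-student word counts."""
    seen = set()
    for sentence in sentences:
        seen.update(word.lower() for word in sentence.split())
    return len(seen)
```

For example, a student who read "The cat sat" and "the cat ran" saw four distinct word forms.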

Table 3.5 Story level (K=0, A=1, …) by grade. All main effects significant at p < 0.001: grade (FR.7, df=1), start/finish (F0.0, df=1), who chose the story (F.8, df=1), old vs. new story (FB.0, df=1).

Columns: Reading Tutor | Student (new stories) | Student (old stories)
Second grade – Start: 1.65 ± s.e. 0.13; Start: 0.21 ± s.e. 0.08; Finish: 0.18 ± s.e. 0.07
Third grade – Start: 1.96 ± s.e. 0.18

Figure 3.12 Number of distinct words (not stems) seen by students using the Reading Tutor in the 1999-2000 study.

Figure 3.13 Distinct stories seen by students in 1999-2000 study.

Lessons learned from story choice experiments

The story choice experiment showed that the nature of the collected data constrains the comparisons that can be made. Because the Reading Tutor recorded student utterances, year-over-year comparisons were possible despite software changes. Comparing time spent on activities would also have been informative, but the Reading Tutor logged single events rather than well-defined intervals, and timeouts or program crashes made interval boundaries unreliable, so percentages of recorded events were a more trustworthy measure. We therefore relied on percentages computed directly from the raw data to characterize Reading Tutor usage, and complemented the automated analyses with small hand-counted samples to give a more intuitive check on the experimental results.

In summary, Take Turns proved more effective than student-only story selection at promoting the reading of new material: it significantly increased the overall volume of new stories read, and it particularly helped the students who on their own chose the least new content.

Exposing students to new vocabulary is essential, but exposure alone does not ensure that they learn the meanings of those words. We must also find effective ways to help students learn from their encounters with new words. The next chapter focuses on providing vocabulary assistance.

4 Automatically generated vocabulary assistance and assessment: The factoids experiment

To learn new vocabulary, students need reading materials that introduce unfamiliar words, but merely encountering these words in challenging text is not enough. Individual encounters with a new word often lack the depth needed for real understanding. We therefore want to augment the text so that each encounter provides richer information about the word, without significantly disrupting the reading experience.

This chapter describes our comparison of augmented text against unaugmented text, rather than against a "no exposure" control group. The rationale: augmentation takes reading time, so if augmented text provides no benefit over unaugmented text, the augmentation wastes students' time.

We now discuss several design questions for giving vocabulary help, and where our vocabulary experiments stand in relation to each of them.

Which students should receive vocabulary help? Our experiments provided vocabulary assistance to elementary students across a range of grades and reading levels, shedding light on which learners could benefit most from targeted vocabulary help.

Which words should the Reading Tutor help with? Candidate categories include words with a limited number of meanings, words crucial for understanding the overall message of a passage, and rare, domain-independent terms.

What kind of vocabulary help should the Reading Tutor give?

• A conventional definition: “as·tro·naut: A person trained to pilot, navigate, or otherwise participate in the flight of a spacecraft” (American Heritage, 3rd edition, 1996).

Definitions in children's dictionaries can vary significantly in complexity and length For instance, the term "astronaut" is defined simply as "a traveler in a spacecraft," while more complex terms like "comet" and "meteor" have lengthier explanations A comet is described as "a bright heavenly body that develops a cloudy tail as it moves in an orbit around the sun," whereas a meteor is defined as "one of the small bodies of matter in the solar system observable when it falls into the earth’s atmosphere, where the heat of friction may cause it to glow brightly for a short time."

• A comparison to another word “An astronaut is a kind of traveler.”

• A short explanation “An astronaut is someone who goes into outer space.”

• An example sentence “The astronaut went to the Moon in a rocket.” See for example Scott and Nagy (1997).

In this chapter, we describe work on comparisons to other words. (Chapter 6 describes experiments on short explanations.)

When should vocabulary help be given? Memory (1990) found that for high school readers, the timing of vocabulary instruction relative to reading (before, during, or after) may not significantly affect comprehension. If this finding extends to elementary students, a Reading Tutor can provide vocabulary support at various points without hindering learning. In the study reported here, we gave vocabulary help just before the sentence containing the target word; another study, described in Chapter 6, gave help before the story itself.

Who should decide when help is needed: the computer, the student, or some combination? To isolate the effects of vocabulary support from the effects of student-initiated requests, we had the computer control when words were presented. This also ensured that frequent and infrequent clickers received equal assistance.

This chapter presents an experiment focused on vocabulary assistance, which involves the automatic generation and presentation of word comparisons, as well as the creation and administration of assessments We will explore the rationale behind the study, its design and implementation, the results obtained, and the lessons learned throughout the process.

Rationale

Automatic generation of vocabulary assistance

To provide vocabulary assistance for arbitrary words, we used WordNet, the extensive lexical database developed by George Miller and colleagues. WordNet organizes tens of thousands of words into a thesaurus-style hierarchy, which supports both assistance and assessment. By comparing the words in any given text to those in WordNet, we built an automated system for generating vocabulary help.

Comparisons are most effective when a new word is a specialized version of a familiar concept, like "morose", which means very sad. They are less effective for vocabulary naming previously unknown concepts, unless a relatable category exists, such as tuberculosis being a kind of disease.
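The comparison-generating idea can be sketched with a toy hypernym table standing in for WordNet (the real system queried WordNet 1.6; the dictionary below is illustrative only):

```python
# Toy stand-in for the WordNet hierarchy: each word maps to a more
# familiar parent concept (its hypernym). Illustrative entries only.
HYPERNYM = {
    "astronaut": "traveler",
    "pedestrian": "traveler",
    "morose": "sad",
}

def make_factoid(word):
    """Return a comparison factoid in the Reading Tutor's style, or
    None if the word has no known hypernym."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return None
    return f"{word} can be a kind of {parent}. Is it here?"
```

Words without an entry simply get no factoid, mirroring the tutor's behavior of helping only where a suitable comparison exists.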

Automatic generation of vocabulary assessment

To assess the effectiveness of vocabulary assistance, we needed multiple-choice questions. Nagy, Herman, and Anderson (1985) classify multiple-choice vocabulary questions by how semantically close the distractors are to the correct answer:

Level 1: Distractors are a different part of speech from the correct answer. For example, if the target word is astronaut and the correct answer is traveler, Level 1 distractors might be eating, ancient, and happily.

Level 2: Distractors are the same part of speech but semantically quite different. For example, if the target word is astronaut and the correct answer is traveler, Level 2 distractors might be antelope, mansion, and certainty.

Level 3: Distractors are semantically similar to the correct answer. For example, if the target word is astronaut and the correct answer is traveler, Level 3 distractors might be doctor, lawyer, and president. This example illustrates that sometimes it is easier to design the intended answer and the distractors together; a more natural example would be a correct answer of (space) pilot, with the same distractors doctor, lawyer, and president.

We generated vocabulary assessment questions automatically from the WordNet hierarchy, taking Nagy, Herman, and Anderson's Level 3 multiple-choice questions as our goal. Section 4.3.1 provides further details.
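One way to approximate Level 3 distractors from a WordNet-style hierarchy is to take "cousins" of the target word: terms with the same grandparent but a different parent, so they are semantically close to the correct answer without being right. A sketch over a toy child-to-parent map (not the real WordNet data):

```python
# Toy child -> parent map standing in for WordNet 1.6.
PARENT = {
    "astronaut": "traveler", "pedestrian": "traveler",
    "best friend": "friend", "friend": "person",
    "doctor": "professional", "professional": "person",
    "traveler": "person",
}

def level3_distractors(word):
    """Cousins of `word`: same grandparent, different parent."""
    parent = PARENT.get(word)
    grand = PARENT.get(parent) if parent else None
    if grand is None:
        return []
    return sorted(
        w for w in PARENT
        if w != word
        and PARENT[w] != parent                # different parent...
        and PARENT.get(PARENT[w]) == grand     # ...same grandparent
    )
```

For "astronaut" (parent "traveler", grandparent "person"), this yields semantically related but incorrect choices such as "best friend" and "doctor", in the spirit of the questions shown earlier.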

Skeptics may question the use of automated, variable-quality, experimenter-designed questions instead of standardized instruments. However, the National Reading Panel recommends experiment-defined measures for assessing vocabulary: experimenter-constructed measures are more sensitive to small vocabulary gains than standardized tests, which sample only a limited number of words. In our study, all students received assistance and saw words in both experimental and control conditions before taking the multiple-choice tests, allowing within-subject comparisons that standardized tests do not easily support. We also pre-tested students on the widely used Word Comprehension subtest of the Woodcock Reading Mastery Test (WRMT). Students' performance on our context-based multiple-choice questions correlated significantly with their WRMT scores (r = 0.47 for grade 2, r = 0.49 for grade 3). Our measure thus aligns with national research recommendations, can be automatically constructed and scored, and correlates with an established vocabulary assessment.

Experiment design

Figure 4.1 shows the design of the experiment described in this section, intended to contrast seeing a word in a story alone vs seeing a word in a story along with some vocabulary help.

Figure 4.1 Factoid flowchart, showing one example using the target word astronaut

(Desperate, like the other three possible answers, is used as a noun here.)

The flowchart's experimental path was as follows:

1. Student reads story (with Reading Tutor assistance) up to just before the sentence containing the target word. For example:

… For many years the United States and Russia worked separately on going into space.

2.b Student reads factoid, with Reading Tutor assistance. For example:

astronaut can be a kind of traveler. Is it here?

3. Student reads remainder of story, with Reading Tutor assistance. For example:

The Russians took the lead thirty three years ago by sending the first astronaut into space …

4. One or more days pass.

5. Reading Tutor administers multiple-choice vocabulary question at the start of the session on the next day the student logs in. For example:

Which of these do YOU think means the most like astronaut? past master; desperate; best friend; traveler

In summary, a control trial was as follows:

1. First, the student read out loud (with the Reading Tutor's assistance, as described in Chapter 2) the portion of the story up to but not including the sentence containing the target word.

2.a Second – this is the control condition – nothing happened.

3. Third, the student continued reading the story, starting at the sentence containing the target word.

4. Fourth, one or more days elapsed.

5. Finally, the student answered a multiple choice question on the target word.

An experimental trial was as follows:

1 First, the student read out loud (with the Reading Tutor’s assistance) the portion of the story up to but not including the sentence containing the target word.

2.b Second – this is the experimental condition – the student read out loud (with the Reading Tutor's assistance) a factoid comparing the target word to another word. For example: “astronaut can be a kind of traveler. Is it here?”

3 Third, the student continued reading the story, starting at the sentence containing the target word.

4 Fourth, one or more days elapsed.

5 Finally, the student answered a multiple choice question on the target word.

Here is an example of an experimental trial, excerpted from actual Reading Tutor use during Fall 1999. For convenience, events involving the target word astronaut are in boldface.

Time event occurred What happened?

Student (P.O., girl aged 9 years 5 months) chooses Level C story “Life in Space” (adapted from a Weekly Reader passage)

2 seconds later 12:37:12.259 Reading Tutor displays sentence “For many years the United States and

Russia worked separately on going into space.”

Student tries reading sentence out loud.

19 seconds later 12:37:31.106 Student finishes speaking.

Actual utterance: for many years the united states of russia worked s sponidy on going to space

(The displayed sentence was “For many years the United States and Russia worked separately on going into space.”)

< 1 second later 12:37:31.166 Reading Tutor decides to display next sentence of story

24 seconds later 12:37:55.391 Reading Tutor displays first sentence of factoid: “astronaut can be a kind of traveler.”

16 seconds later 12:38:11.464 Student finishes speaking; Reading Tutor heard:

ASTRONAUT CAN BE A KIND OF TRAVELER ASTRONAUT CAN

< 1 second later 12:38:11.524 Reading Tutor decides to go on to the next sentence

3 seconds later 12:38:14.408 Reading Tutor displays second sentence of factoid: “Is it here?”

9 seconds later 12:38:23.571 Student finishes speaking; Reading Tutor heard:

The Reading Tutor's interpretation of a student's reading may differ from the student's actual words To enhance comprehension, the tutor often adds extra words to short sentences, helping to capture the nuances of the student's oral reading, including any insertions or deletions For instance, the word "INDIA" was included to facilitate this process.

< 1 second later 12:38:23.621 Reading Tutor decides to display next sentence

1 second later 12:38:24.843 Reading Tutor displays: “The Russians took the lead thirty three years ago by sending the first astronaut into space.”

Student logs in the next day

2 seconds later 12:28:08.564 Reading Tutor presents student’s name, for student to read as confirmation of identity and to make sure the microphone is working

10 seconds later 12:28:18.098 Student finishes reading name

Reading Tutor presents vocabulary question, displaying the question and the possible answers while reading them aloud: Which of these do YOU think means the most like pail? railway car; paper bag; bucket; piles

16 seconds later 12:28:43.845 Student clicks on bucket (right!)

6 seconds later 12:28:49.713 Reading Tutor presents vocabulary question:

Which of these do YOU think means the most like asparagus? butterfly pea; bog plant; yam plant; herb

20 seconds later 12:29:10.232 Student clicks on herb (right!)

17 seconds later 12:29:36.881 Reading Tutor presents vocabulary question:

Which of these do YOU think means the most like astronaut? past master; desperate; best friend; traveler

17 seconds later 12:29:54.025 Student clicks on traveler (right!)

5 seconds later 12:29:59.013 Reading Tutor presents vocabulary question:

Which of these do YOU think means the most like fetch? bring; project; impact; ferry

14 seconds later 12:30:13.073 Student clicks on impact (wrong!)

4 seconds later 12:30:17.299 Reading Tutor presents vocabulary question:

Which of these do YOU think means the most like silk? material; hill; piece; cross

21 seconds later 12:30:37.708 Student clicks on material (right!)

8 seconds later 12:30:45.760 Reading Tutor chooses Level A story: “The Letter A”

The Reading Tutor sometimes took a while to display a factoid because of database access – up to 24 seconds in the astronaut example. Students also often repeated a sentence when their reading was not accepted at first. In each experimental trial, the Reading Tutor presented one factoid per target word: if two target words in a single sentence were both assigned to the experimental condition, a separate factoid was shown for each. If the same target word appeared twice in one sentence, we intended the factoid to be shown once; due to a bug, these factoids were not shown at all, but this affected only a few sentences.

A minor bug in the Reading Tutor caused multiple factoids to be displayed for certain capitalized words at the beginning of sentences. These words, fewer than ten in total, were excluded from the experimental analysis.

We now discuss in more detail how the Fall 1999 Reading Tutor carried out the factoid experiment.

Implementation of the factoid experiment

Selecting target words

A target word for factoid vocabulary assistance had to meet several conditions.

4.3.1.1 Conditions for generating vocabulary assistance

The Reading Tutor gave automated assistance only for words with a limited number of meanings – three or fewer senses in WordNet across all parts of speech – to avoid overly lengthy or confusing help. For example, “astronaut” qualified, with the single sense “a person trained to travel in a spacecraft.” “Times” did not: it has multiple senses (“multiplication”; “best of times, worst of times”), and its stem “time” has more still. Stemming was performed using WordNet's “morph” function.

The target word also had to be at least three letters long, could not appear on the list of 36 function words from Mostow et al. (1994), and could not be a number written in Arabic numerals, such as 200 or 35.

Table 4.1 Thirty-six function words excluded from vocabulary experiment.

The target word could not be a proper noun: it could not be capitalized, except as the first word of a sentence. This condition excluded most names.
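Taken together, the word-selection conditions amount to a filter. The sketch below uses a small stand-in function-word set, a toy sense-count table, and a toy stem map; the real tutor consulted WordNet (including its "morph" stemmer) and the full 36-word list:

```python
# Stand-ins, illustrative only:
FUNCTION_WORDS = {"the", "a", "an", "and", "of", "to"}
SENSE_COUNT = {"astronaut": 1, "times": 2, "time": 10, "cat": 1}
STEM = {"times": "time"}  # stand-in for WordNet's morph function

def eligible(word, sentence_initial=False):
    """Sketch of the target-word filter described above."""
    if len(word) < 3:                        # at least three letters
        return False
    if word.lower() in FUNCTION_WORDS:       # not a function word
        return False
    if word.isdigit():                       # not an Arabic numeral
        return False
    if word[0].isupper() and not sentence_initial:
        return False                         # probable proper noun
    stem = STEM.get(word.lower(), word.lower())
    if SENSE_COUNT.get(stem, 99) > 3:        # limited number of senses
        return False
    return True
```

Note the stem check: "times" itself has few senses, but its stem "time" has many, so it is rejected, as the text above describes.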

4.3.1.2 Conditions for generating vocabulary assessment

The Reading Tutor had to be able to ask a vocabulary question about the target word. To ask a multiple choice question, the Reading Tutor needed a correct answer and three distractors.

Our objective was to implement the semantically-similar-distractor criterion of Nagy et al. (1985). To build a 4-item multiple-choice vocabulary question, the Reading Tutor needed one correct answer along with three distractors that were incorrect yet semantically related.

For the correct answer, the Reading Tutor used a synonym or hypernym of the target word – for example, “traveler” for “astronaut” – resorting to a sibling term only when neither was available. The distractors were semantically related terms such as “past master” and “best friend”.

In WordNet 1.6, words are organized in a hierarchy whose nodes are synsets – sets of synonyms. For instance, “astronaut” and “pedestrian” are siblings, sharing the parent node “traveller”, while “astronaut” is a cousin of “best friend”: they connect through the grandparent node “person”.

Each question included three distractors closely related to the target word: terms that share a common ancestor with the correct answer but have distinct meanings. Presenting semantically similar options forces students to discern the precise meaning of the target word.

In the actual test, answers were presented in randomized order, to reduce variance and the influence of peers' answers overheard near the Reading Tutor. Each trial also drew its own selection of correct answer and distractors for the word.

To ensure social acceptability, the target word, comparison words, intended answers, and distractors were all screened in two ways: a Project LISTEN team member had to have recorded the target word in natural speech (preventing offensive words from slipping through), and words on an explicit ban list were excluded (guarding against secondary slang meanings). These measures caught many problems but were not foolproof: they could still allow phrases that are individually inoffensive but offensive in combination, such as “white trash”, a cultural slur.

Assigning words to conditions for factoid vocabulary assistance

In the Fall 1999 factoid vocabulary study, the Reading Tutor randomly assigned each target word, separately for each student, to either an experimental condition (factoid plus context) or a control condition (context alone). Thus one student might encounter "astronaut" in the experimental condition while another saw it in the control condition. When a student came across a previously unseen word, the Reading Tutor assigned it to a condition at random. Using the same passages in both conditions, together with a large number of trials, controlled for text and word differences. Future versions might select words based on individual student needs, but the Fall 1999 Reading Tutor kept the assignment blind and random.
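One way to implement a blind, per-student random assignment that stays stable across sessions is to hash the (student, word) pair. This is a sketch of the idea under that assumption, not the Reading Tutor's actual code (which, as noted below, stored assignments in a database).

```python
import hashlib

def assign_condition(student_id, word):
    """Blindly assign a (student, word) pair to 'factoid' or 'control'.
    Hashing makes the coin flip reproducible, so the same student always
    draws the same condition for the same word, with no table to persist."""
    h = hashlib.md5(f"{student_id}:{word}".encode()).hexdigest()
    return "factoid" if int(h, 16) % 2 == 0 else "control"

# Different students may draw different conditions for the same word,
# but each student's own assignment is stable across sessions:
a = assign_condition("student_A", "astronaut")
assert assign_condition("student_A", "astronaut") == a
assert a in ("factoid", "control")
```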

To preserve treatment fidelity with an open-ended set of target words, the Reading Tutor could provide assistance on any new material introduced by teachers, students, or the Project LISTEN team without disrupting the study design. The intention was to keep each word's assignment constant throughout a student's use of the Reading Tutor, so as to analyze long-term effects of repeated exposure. However, a software flaw prevented the assignments from being saved, so we analyzed only the first day a student encountered a word and the vocabulary question that followed.

Constructing vocabulary assistance

The Reading Tutor's factoid vocabulary assistance provides targeted support by offering a brief, automatically generated comparison of the target word to another word in context.

The Reading Tutor constructed factoid texts from templates containing placeholders for the target and comparison words. The templates used in the 1999-2000 study were:

Antonym: “The_Stem may be the opposite of The_Antonym. Is it here?”

Hypernym: “The_Stem can be a kind of The_Hypernym. Is it here?”

Synonym: “Maybe The_Stem is like The_Synonym here. Is it?”

Here the stem is the word's base form: the stem of "astronauts" is "astronaut." A synonym is a word with a similar meaning, and a hypernym names a broader category that includes the word. Antonyms were used less often, since many words have no widely recognized opposite; there is no opposite of "astronaut," for example. Synonyms and hypernyms therefore appeared more frequently than antonyms.
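The three templates above can be filled in mechanically. This sketch assumes the stem and comparison word arrive as plain strings (the function name is illustrative); punctuation follows the templates as restored above.

```python
# The three 1999-2000 factoid templates, keyed by relation type.
TEMPLATES = {
    "antonym":  "{stem} may be the opposite of {comparison}. Is it here?",
    "hypernym": "{stem} can be a kind of {comparison}. Is it here?",
    "synonym":  "Maybe {stem} is like {comparison} here. Is it?",
}

def make_factoid(relation, stem, comparison):
    """Fill the template for the given relation with the two words."""
    return TEMPLATES[relation].format(stem=stem, comparison=comparison)

print(make_factoid("hypernym", "Astronauts", "traveler"))
# Astronauts can be a kind of traveler. Is it here?
```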

Presenting vocabulary assistance

To encourage students to attend to the vocabulary assistance, and to provide additional reading practice, we had students read the assistance aloud with help from the Reading Tutor. Alternatives we considered included speaking the assistance aloud, displaying it briefly in a drop-down window beneath the original sentence, or combining spoken and drop-down text.

Working with the Project LISTEN team, including Kerry Ishizaki and Jack Mostow, and with Human-Computer Interaction Master's student Margaret McCormack, we met several design goals through extensive joint design:

• To distinguish the factoid from the original text, the factoid appears on a yellow background.

• To attribute the factoid to the Reading Tutor rather than to the author of the original text, the factoid appears in a call-out balloon linked to the Reading Tutor's face in the lower left corner of the screen.

• To avoid confusion about what to read, and to simplify layout, the balloon occludes the original text.

• To support students, the factoid is presented as text to read aloud with Reading Tutor assistance (Figure 4.3). Presenting the factoid as text also leaves the door open for future extensions, such as factoids about the factoids themselves.

Figure 4.3 Factoid in popup window.

Administering the automatically constructed multiple-choice questions

We evaluated the effectiveness of the vocabulary assistance by having the Reading Tutor present a vocabulary question for each experimental and control word the next time the student logged in. For example, for the target word "astronaut," the choices included "past master," "desperate," "best friend," and "traveler," with "traveler" the intended answer. To reduce bias and discourage copying, the answers appeared in random order, and the Reading Tutor spoke the prompt and each answer in sequence, highlighting each in yellow as it was read. Students could click an answer at any time, though responding typically took 14 to 21 seconds. Administering the vocabulary questions at the start of the session ensured that they were unaffected by any assistance given in the same session.

Figure 4.4 Multiple-choice question for factoid experiment.

Data integrity

We recorded data from over 3,000 factoid trials in a database. One trial was lost because a hard drive filled up: on March 22, 1999, a student received assistance with the word POUNCE, which appeared in the Reading Tutor's log file but not in the database.

Results of the factoid experiment

Overall, factoids did not help…

The per-student average percentage of correct answers was 37.2% for control trials and 38.5% for experimental trials; the high standard deviations of these averages reflect wide variation among students. In a logistic regression over all 3,359 trials, the coefficient for the effect of assistance on answering correctly was 0.07, indicating no significant improvement overall. One possible reason is flaws in the automatically constructed assistance.
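For interpretation, a logistic regression coefficient converts to an odds ratio by exponentiation; applied to the coefficient reported above:

```python
import math

# A coefficient of 0.07 for the factoid condition corresponds to an odds
# ratio of e^0.07: only about a 7% increase in the odds of answering
# correctly, consistent with "no significant effect" over 3,359 trials.
coef = 0.07
odds_ratio = math.exp(coef)
print(round(odds_ratio, 3))  # 1.073
```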

Exploration revealed possible effect for rare single-sense words, tested one or two days later


We explored the circumstances under which factoids might or might not have helped by analyzing various subsets of the data. Because this analysis is exploratory, the findings below are suggestive rather than definitive. Under what conditions might factoids have been effective?

Some target words have multiple meanings, which can confuse students when a factoid explains a different sense than the one intended in the story. Restricting attention to single-sense words, factoids showed a positive but nonsignificant trend. For easy words, such as "apple," prior knowledge may make factoids unnecessary. Did factoids help on single-sense words that were hard?

We manually classified each target word as hard or not hard, without looking at trial outcomes, so that the classification would not be biased by results. We also identified rare words, defined as those occurring fewer than 20 times in the million-word Brown corpus (Kucera and Francis 1967). The results, while not statistically significant, suggest a positive effect of factoids (Table 4.3).
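The rarity criterion is a simple frequency threshold. The sketch below uses placeholder counts rather than the actual Brown corpus data; note that a word absent from the corpus counts as rare.

```python
# Placeholder frequencies; the real criterion uses counts from the
# million-word Brown corpus (Kucera & Francis, 1967).
BROWN_FREQ = {"apple": 43, "astronaut": 2, "bliss": 5, "tennis": 15}

def is_rare(word, threshold=20):
    """Rare = fewer than `threshold` occurrences; unseen words count as 0."""
    return BROWN_FREQ.get(word, 0) < threshold

print(is_rare("astronaut"))  # True
print(is_rare("apple"))      # False
```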

Students may retain a word for a few days but not over longer gaps such as a weekend. Factoids appeared to help on rare words tested one or two days after assistance, but the effect faded at longer delays. To help students retain word meanings, future work should provide a second exposure to the target word to reinforce it.

Number of trials | How were trials selected? | Per-student average number right | Coefficient in logistic regression model

• 720 trials | Single-sense words | 34.9% ± 23.0% for control
• 191 trials | Single-sense words coded as hard by a certified elementary teacher
• 348 trials | Single-sense words coded as hard by the experimenter
• 317 trials | Single-sense rare words | 35.4% ± 30.5% for control
• 189 trials | Single-sense rare words tested one or two days later | 25.8% ± 29.4% for control | 1.04 ± 0.42, significant at 95% (exploratory, and thus not correcting for multiple comparisons)

Table 4.3 Single-sense difficult words in the factoids experiment.

Word - factoid comparison (Brown corpus frequencies omitted here; see caption):

aluminum - a kind of metal; astronaut - a traveler; bliss - like walking on air; coward - a kind of mortal; crouching - like sitting down; daisy - a kind of flower; eggshell - a natural covering; glittering - like twinkling; headdress - a kind of apparel; hello - like hi or good day; infirmity - a bad condition; outskirts - like a city district; pasta - a kind of food product; pebble - a small stone; plat - a kind of map; plumage - a kind of body covering; pollen - a fine powder; princess - like a blue blood; Rwanda - a kind of African country (answers included African country; England; United States); salad - a kind of dish; tennis - played on a court; wading - like walking; wayside - the edge of a path or road.

Table 4.4 Single-sense rare words tested one or two days later.

As a sanity check, we looked at the 27 words in these trials (Table 4.4).

2 plat appeared in an (apparently) student-written story which included the sentence “slapt flash slise plair clio ciay glass plat”.

Word frequencies in Table 4.4 are from the million-word Brown corpus (Kucera & Francis, 1967). The words "glittering" and "twinkling" happened never to be supplemented with factoids, due to the random assignment of trials. The listed words seem like plausible candidates for explanation: words an elementary school student might not know. However, selecting trials conducted one or two days after assistance introduced a self-selection bias, since frequently absent students contributed fewer such trials. We therefore next characterized factoid results by student attributes, such as grade, that are not subject to this self-selection.

Further characterization of factoid results

To further characterize the effect of factoids on student responses, we analyzed several subdivisions of the data. Table 4.5 shows, for each subset, the per-student average percentage of correct answers and the regression coefficient for the effect of factoids.

Which students? | Which words? | Trials | Percentage correct | Outcome: Coefficient ± 1 s.d.

All students | All words | 3359 | 37.2% ± 16.9% control
All students | Single-sense words | 769 | 36.8% ± 26.6% control
All students | Rare words | 1927 | 35.6% ± 19.5% control vs. 38.3% ± 21.1% expt | No significant effect: 0.13 ± 0.10
All students | Non-rare words | 1427 | 40.0% ± 18.6% control
29 students below median on weighted score of WRMT word comprehension pretest
31 students at or above median on weighted score of WRMT word comprehension pretest

Table 4.5 Further characterization of factoid results.

Word recency effect

In the factoid for "astronaut," the comparison word was "traveler," which was also the expected answer to the multiple-choice question. Overall, 993 of 1,709 trials used the same word as both comparison word and expected answer, while 716 used different words. This overlap raised the concern that any observed effect might be a recency effect: a student might choose the right answer simply from having seen the comparison word on an earlier occasion. Subsequent experiments, which augmented text with definitions, were designed to eliminate recency as a confound.

Lessons learned from factoid study

In summary, the study suggests that augmenting text with factoids can help third graders learn vocabulary, particularly for rare words (p < .10) and for single-sense rare words tested one or two days later (p < .05).

Several issues limited the effectiveness of factoid assistance: target word frequency, multiple word senses, and socially inappropriate content. Furthermore, some comparison words used in the assistance were harder to understand than the target words themselves. Problems with the automated assessment further obscured the effectiveness of factoids.

1 For example, some of the incorrect answers (distractors) were themselves rare – such as butterfly pea – making the question difficult to understand.

2 Or, questions may have relied on uncommon knowledge, such as banana being (botanically) an herb.

These examples illustrate the absence of some constraints that common sense would enforce, but that a computer program must explicitly allow for.

At the end of Fall 1999, we turned off the vocabulary questions, both because of performance issues and to keep the Reading Tutor experience consistent for children. This decision precluded detailed analysis of factoids in Spring 2000; instead, Chapter 5 evaluates the effectiveness of the year's Reading Tutor as a whole against classroom instruction and one-on-one human tutoring.

In conclusion, while factoids can help, effective automatic assistance requires common sense that current algorithms lack. We therefore recommend vocabulary assistance that is machine-generated and then human-refined, or constructed entirely by hand; this is the approach taken in the experiments of Chapter 6.

5 How well did the 1999-2000 Reading Tutor help children learn vocabulary?

Previous chapters described two modifications to computer-assisted oral reading. First, we introduced a Take Turns story choice policy, in which the computer and the student alternate in choosing what to read; this increased the number of new stories students read. Second, we augmented stories with vocabulary assistance in the form of automatically generated factoids.

One such factoid is "Astronauts can be a kind of traveler." Exploratory analysis suggested that factoids helped third graders learn rare words, particularly single-sense rare words tested one or two days later.

How much did these modifications improve on the baseline 1997-98 Reading Tutor? This chapter examines how the modified 1999-2000 Reading Tutor, with Take Turns and factoids, compared to other reading instruction, focusing on vocabulary acquisition. We present results on vocabulary gains from a larger 1999-2000 study, which we helped design and analyze, comparing the modified Reading Tutor to alternative treatments.

In a year-long study during 1999-2000, Project LISTEN compared three treatments: the Reading Tutor, human tutoring consisting of assisted reading and writing, and a control condition of equal-time classroom instruction. The study's overall purpose was to evaluate the Reading Tutor against conventional instruction and one-on-one human tutoring.

For this dissertation, the study serves to evaluate how well the modified Reading Tutor helped students learn vocabulary; we therefore confine our attention to the aspects of the study that bear on vocabulary learning.

This chapter describes the study participants, the treatment conditions, and the outcome measures, then presents the results and discusses their implications. Appendix A summarizes the study following the National Reading Panel's guidelines; see also Aist et al. (AI-ED 2001) and Mostow et al. (AI-ED).

Assignment of students to treatments

The study took place during the 1999-2000 school year at Centennial Elementary School near Pittsburgh, Pennsylvania, with 144 students in grades 2-3. Six of the twelve classrooms were designated to receive a Reading Tutor, chosen by the principal based on expected teacher cooperation. All classroom teachers were experienced veterans and thus roughly comparable. Each classroom followed a basal reading curriculum, with an average class size of 24. In each Reading Tutor classroom, teachers identified the 12 lowest-reading students; two were randomly selected as in-classroom controls, while the other 10 were scheduled for daily 20-minute sessions with the Reading Tutor. Teachers nonetheless put three control students on the Reading Tutor; these students were excluded from the analysis due to insufficient participation. In non-Reading-Tutor classrooms, the lowest-reading students were randomly assigned to receive either regular classroom instruction or one-on-one human tutoring.

Note that assigning students to different treatments could yield different amounts of instruction in specific reading skills: the 20 minutes of each treatment could be distributed differently across reading and writing skills, and in the classroom control condition could include instruction in other subjects altogether.

Treatments

We describe each treatment condition in turn.

Classroom control (labeled “control” in figures). Students in the baseline condition received normal classroom instruction.

Human tutoring (labeled “human” in figures). Students in the human tutoring condition received approximately 20 minutes of daily tutoring at a desk in the hallway outside their classroom. Each control classroom had a dedicated human tutor, with occasional substitutions due to the tutors' other commitments.

Prior research on one-on-one human tutoring in elementary reading has used tutors with widely varying training, from volunteers to certified teachers; Wasik and Slavin (1996) reported stronger outcomes for certified teachers than for paraprofessionals. The tutors in this study were certified elementary teachers with at least a bachelor's degree but minimal teaching experience and no special training in reading. This made the human tutoring condition a realistic, replicable comparison to classroom instruction.

During tutoring sessions, the human tutors used paper copies of the Reading Tutor stories to help students read and write, engaged in other activities such as reviewing individual words, and kept written records. To ensure consistency, tutors were asked not to introduce outside materials. Each tutor kept a “Human Tutor Log” of the stories her students read, and had students write in a journal. Project LISTEN collected the logs and journals at the end of the year, and photographed them on-site with a digital camera during the year.

Reading Tutor (labeled “computer” in figures). There was one Reading Tutor computer per Reading Tutor classroom in the 1999-2000 study. Students in the Reading Tutor condition were scheduled for 20 minutes of individual Reading Tutor use per day.

As Mostow et al. (2001) note, tutoring sessions in both the human and Reading Tutor conditions occupied only a small part of the school day, so most instruction was the same across a classroom. Teachers rotated the scheduling of tutoring sessions so that students did not consistently miss the same subject. Each student received 20 minutes of the same treatment daily throughout the year, alongside 60-70 minutes of classroom reading instruction and related activities. Tutored students thus averaged only 0 to 15 minutes more per day on reading and writing than their classmates.

This chapter thus compares three whole-bodied reading treatments, each of which included some activities serving vocabulary development. A treatment focused specifically on vocabulary, such as drill on matching words to synonyms or definitions, might yield larger vocabulary gains, though perhaps at the cost of learning words in connected context.

Of the 144 students who began the study, 131 completed it. Table 5.1 shows the division of students into conditions.

Grade | Room type | control | human | computer

Table 5.1 Assignment of students to treatments, by grade and classroom, showing the 131 students who completed the study. (One student, in room 208 here, switched from Grade 2 to Grade 3.)

Outcome measures

Project LISTEN used the Woodcock Reading Mastery Test (WRMT) so that results would be comparable to other studies of reading instruction. The WRMT is individually administered and normed by month within grade, with a mean of 100 and a standard deviation of 15. Its subtests target specific reading skills: Word Attack (decoding), Word Identification (reading single words aloud), Word Comprehension (understanding individual words), and Passage Comprehension (understanding short passages). In addition, trained testers measured students' unassisted oral reading fluency on designated passages at grade level and at student level.
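The WRMT's normed scale (mean 100, s.d. 15) is a standard-score scale. As a simplified illustration only: the test's real norming uses month-in-grade lookup tables, not a direct z-score computation.

```python
# Simplified sketch of a standard-score scale with mean 100 and s.d. 15.
# z is the student's position, in standard deviations, relative to the
# norming sample for that student's month in grade (an assumption here).
def standard_score(z):
    return 100 + 15 * z

print(standard_score(0))    # 100: exactly at the norm
print(standard_score(1.0))  # 115: one standard deviation above the norm
```

On this scale, a normed gain of zero means a student grew exactly as fast as the norming sample, which is why the gains reported below are interpreted relative to national norms.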

Results on Word Comprehension

Did Reading Tutor students gain from pre- to post-test?

The difference in raw weighted Word Comprehension scores from pre-test to post-test was 15.72 ± 1.12 (95% confidence interval 13.49 to 17.96). Figure 5.2 shows boxplots of raw Word Comprehension gains by treatment, and Figure 5.3 breaks the gains down by treatment and grade. However, this gain may simply reflect children's general growth over the year, so we next compare the results against national norms to control for such growth.

Figure 5.2 Word Comprehension raw score gains, by treatment 1999-2000.


Figure 5.3 Raw score Word Comprehension gains by grade and treatment.

Did Reading Tutor students gain more than a national cohort?

For normed gains, a gain of zero means a student held steady relative to the norming sample, not that the student learned nothing. Students who used the Reading Tutor gained an average of 4.38 normed points (standard error 0.90; 95% confidence interval 2.58 to 6.18), so they gained relative to national norms. Figures 5.4 and 5.5 show normed score gains by treatment and by grade and treatment; Figures 5.6 and 5.7 show normed pretest, post-test, and gains by grade.


Figure 5.4 Word Comprehension normed score gains, by treatment 1999-2000.


Figure 5.5 Word Comprehension normed score gains, by grade and treatment.

Figure 5.6 Word Comprehension normed pretest, post-test, and gains, Grade 2.

Figure 5.7 Word Comprehension normed pretest, post-test, and gains, Grade 3.

Did Reading Tutor students gain more than their peers who received classroom instruction?

We ran an analysis of variance on Word Comprehension gains by treatment and grade, with a treatment-by-grade interaction term. Table 5.2 gives results for each of the six human tutors separately, with the Reading Tutor broken out for comparison; Table 5.3 gives an overall summary. To adjust for regression to the mean and for differing effects of pretest on gains, we included normed Word Comprehension and Word Identification pretest scores as covariates, choosing the combination of covariates that best fit the data. The analysis showed a marginally significant treatment-by-grade interaction (F=2.47, p=.088), so we analyzed grades 2 and 3 separately.

In grade 2, treatment had no significant effect (F=0.32, p=.731), yet all three treatment groups gained relative to the national norm. Perhaps the extensive classroom reading instruction in grade 2 was enough to boost vocabulary development, leaving little room for additional benefit from human tutors or the Reading Tutor.

In grade 3, there was a significant main effect of treatment (F=4.27, p=.018). Third graders who used the Reading Tutor outgained their classmates who received classroom instruction by 3.90 normed points on Word Comprehension (effect size = 0.56, p=.042). Third graders who received human tutoring also outgained the classroom group, by 4.56 points (effect size = 0.72, p=.039). Because these comparisons are against a control group, regression to the mean cannot explain the effects.

We continued to use this analysis to answer our fourth question.

Did Reading Tutor students gain more than their peers who received one-on-one human tutoring?


In grade 2, there was no significant effect of treatment. In grade 3, human-tutored students gained slightly more than Reading Tutor students, a mean difference of 0.66 normed Word Comprehension points (standard error 1.65), which was not significant. Table 5.3 compares the Reading Tutor with individual human tutors and with classroom instruction on Word Comprehension gains, by grade.

1 Effect size is the adjusted gains difference divided by the average standard deviation of the compared subtests.

2 Incidentally, gains vs. a control group do not permit regression to the mean as an explanation of the effect.

Individual results normed by grade (mean ± s.d.):

Grade 2: 9.67 ± 2.42 HT Room 205 (n=6); 7.83 ± 6.31 CT Room 208 (n=6); 7.00 ± 2.65 CT Room 209 (n=3); 6.17 ± 6.71 HT Room 209 (n=6)

Grade 3: 2.20 ± 6.76 HT Room 310 (n=5); 1.00 ± 5.66 CT Room 301 (n=2); -4.00 ± 4.98 CT Room 310 (n=6); -4.40 ± 5.08 CT Room 309 (n=5)

Individual results normed by grade, from a model including Word Comprehension and Word Identification pretests (mean ± s.e.):

Grade 2: 9.81 ± 2.48 HT Room 205; 6.91 ± 2.49 CT Room 208; 6.56 ± 3.54 CT Room 209; 6.37 ± 4.46 CT Room 212; 6.25 ± 2.48 HT Room 209

Grade 3: 7.80 ± 4.33 CT Room 303; 6.29 ± 2.74 HT Room 305; 5.92 ± 2.75 CT Room 305; 1.57 ± 4.30 CT Room 301; -3.08 ± 2.50 CT Room 310; -3.96 ± 2.73 CT Room 309

Table 5.2 Word Comprehension normed score gains by grade for classroom control (CT), human tutors (HT), and the Reading Tutor (RT). Unadjusted results are mean ± standard deviation; results adjusted for Word Comprehension and Word Identification pretests are mean ± standard error.

Table 5.2 shows that second graders in all three treatments achieved similar gains. Among third graders, Reading Tutor students outgained the classroom group on Word Comprehension, as did human-tutored students. The Reading Tutor and human tutoring were comparable in effectiveness; the small number of students per human tutor precludes a precise statistical ranking, but the Word Comprehension gains of Reading Tutor and human-tutored students were closely matched.

There were notable differences among individual human tutors in second graders' Word Comprehension gains: students tutored by M.B. gained an average of 9.81 points, while students tutored by M.E. averaged -2.22. One avenue for improving the Reading Tutor's help with vocabulary learning might therefore be to analyze and compare the tutoring strategies of M.B. and M.E.

Compare classroom-tutored students, human-tutored students, and students who used the Reading Tutor:

• Overall: marginally significant grade-by-treatment interaction: F = 2.47, p = 0.088.
• Grade 2: no significant main effect of differing treatment conditions.
• Grade 3: significant main effect for treatment. HT – CT = 4.56 ± 1.78, p = 0.039 with Bonferroni correction for multiple comparisons; RT – CT = 3.90 ± 1.54, p = 0.042 with Bonferroni correction. Reading Tutor ≈ human tutor: RT – HT = -0.663 ± 1.646, p = 1.0.

Compare individual human tutors and the Reading Tutor:

• Overall: marginally significant main effect of different tutors.
• Grade 2: marginally significant main effect of different tutors.
• Grade 3: different tutors not significantly different.

Tutors ordered from highest to lowest grade-normed Word Comprehension gain:

• Human tutor AC: 6.2 ± 6.7 (6 students); Reading Tutor: 4.4 ± 7.5 (29 students); human tutor ME: -4.0 ± 7.3 (5 students)
• Human tutor LN: 4.8 ± 5.2 (5 students); Reading Tutor: 4.3 ± 6.3 (29 students); human tutor MM: 3.1 ± 4.3 (7 students)

Table 5.3 Comparing the Reading Tutor to human tutoring, classroom instruction.

5.6 Relationship between students’ word comprehension gains and distinct words seen

Students who used the Reading Tutor gained significantly in Word Comprehension, and third graders outgained their classroom-instructed peers. Our informal model of vocabulary learning holds that the more words students encounter, the more words they learn. How well do students' vocabulary gains match this model?

To test whether students' encounters with words were related to their Word Comprehension gains, we computed the partial correlation between a student's grade-normed Word Comprehension gain and the number of distinct words that student saw in the Reading Tutor, controlling for pretest score. Gains correlated with pretest at -.4034 (p=.002) and with distinct words seen at .22 (p=.102). Controlling for pretest, the partial correlation between Word Comprehension gains and distinct words seen was .18 (p=.178), a weak association. Note that correlation does not establish causality; for example, better-motivated students might both read more words and learn more.
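The first-order partial correlation used here has a standard closed form in terms of the three pairwise correlations:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical numbers for illustration only (the correlation between
# words seen and pretest is not reported in the text):
print(round(partial_corr(0.5, 0.3, 0.4), 4))
```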

Attributing gains to story choice or to factoids must await further analysis. These results, however, are a first direct check of our vocabulary learning model against students' learning outcomes: the number of distinct words students encountered in the Reading Tutor was (weakly) associated with their vocabulary gains, over and above their incoming vocabulary knowledge.

Lessons learned

A year-long study showed that a computer tutor outperformed a classroom control on third graders' vocabulary learning. The 1999-2000 Reading Tutor, featuring Take Turns and factoids, was competitive with one-on-one human tutoring. To further improve the Reading Tutor's effectiveness for vocabulary acquisition, Chapter 6 describes experiments aimed at identifying which students gain the most from which kinds of vocabulary assistance.

6 Follow-on experiments in vocabulary assistance

This chapter describes two follow-on experiments on enhancing vocabulary support with in-context explanations. First, we assessed whether low-reading elementary students could comprehend and make use of the information in such explanations. Second, we conducted an experiment to separate the effect of encountering a word in running text from the effect of encountering it in an explanation. We discuss each experiment in turn.

Can (low-reading elementary) students make use of explanations?: The comets and meteors experiment

Texts

We adapted two nonfiction texts about comets and meteors from the StarChild website, which is aimed at middle school students. The comets text begins, “Scientists believe that comets are made up of material left over from when the Sun and the planets were formed.” To make the two texts approximately equal in length, we removed sentences from one passage and incorporated content from a version written for older students into the other, while preserving the original meaning and flow. The complete original and edited texts appear in Appendix D.

The original Web-formatted texts contained hyperlinks to an online glossary written by the same teachers who wrote the original text. For example, the definition for comet was “COMET: a big ball of dirty ice and snow in outer space.” In our printed version, we inserted either definitions or nonsemantic assistance into the text itself. For instance, the nonsemantic assistance for comet was simply “COMET starts with C.” The full definitions and nonsemantic assistance appear in Appendix D.

The original Web pages included illustrations, such as a girl reading, a boy with a toy spaceship, and pictures of comets and meteors. Because our experiment focused on text-only content, we omitted the illustrations and presented the text on paper in a large, child-friendly font.

Table 6.1 summarizes the texts. (Grade levels shown are only approximate, because readability formulas simply estimate the grade level of a text.)

Adapted text Text plus nonsemantic help Text plus definitions

Table 6.1 Summary of texts used in the comets and meteors experiment. Grade levels calculated with the Flesch-Kincaid Grade Level feature of Microsoft Word (Office 97, Windows NT).
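The grade levels in Table 6.1 come from Microsoft Word's readability statistics, which implement the standard Flesch-Kincaid Grade Level formula. A minimal sketch follows; the vowel-group syllable counter is a crude heuristic, not Word's exact algorithm, so the output will only approximate Word's numbers.

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of vowels; every word gets at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid Grade Level formula
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

sample = ("Scientists believe that comets are made up of material "
          "left over from when the Sun and the planets were formed.")
print(round(flesch_kincaid_grade(sample), 1))
```

Because the formula rewards short sentences and short words, removing long sentences and substituting simpler wording, as we did when editing the passages, lowers the estimated grade level.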

Test

We assessed understanding of the target words with a 5-item matching task for each topic, administered on the same day as the stories. For instance, the match for comet was “A ball of ice in space with a tail.” The matching items shared some words with the in-text definitions but differed at the phrasal level. Students were allowed to refer back to the stories while taking the tests, so the task measured a combination of short-term memory, reading comprehension, and information retrieval. The complete matching tasks for the comets and meteors stories appear in Appendix D.

Chance-level performance on the matching test was complicated by the interdependence of the items: answering one question constrained the choices available for the others. Once a student matched a word with an answer, fewer answers remained for the other words. For instance, if a student matched “atmosphere” with “The air around the Earth,” that answer was no longer available for subsequent words such as “crater.”

We wrote a computer program to enumerate the ways a student could randomly match five items with five answers (the sample answer “The air around the Earth” having already been used). The resulting chance distribution had a median of 1 and a mean of 1 correct, and is clearly not normal; analysis of variance, however, is robust to non-normal data (Brian Junker, Associate Professor of Statistics at Carnegie Mellon University). There are 44 ways to get zero items correct, 45 ways to get exactly one correct, 20 ways to get exactly two correct, 10 ways to get exactly three correct, no way to get exactly four correct, and a single way to get all five correct, for a total of 120 possible matchings.

Figure 6.1 Number of distinct ways to get each number of matches correct on the 5-item matching task, assuming the items were matched randomly without replacement. Note that getting exactly four of the five items correct is impossible, because the fifth choice is completely determined by the previous four selections.
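The chance distribution above can be verified by brute-force enumeration of all 5! matchings, counting fixed points of each permutation:

```python
from itertools import permutations
from collections import Counter

# Enumerate all 5! = 120 ways to match five items to five answers and
# count how many items land on their correct answer (fixed points).
dist = Counter(
    sum(1 for i in range(5) if perm[i] == i)
    for perm in permutations(range(5))
)

print(sorted(dist.items()))  # [(0, 44), (1, 45), (2, 20), (3, 10), (5, 1)]

# The expected number correct under random matching is exactly 1.
mean = sum(k * n for k, n in dist.items()) / sum(dist.values())
print(mean)  # 1.0
```

Note that no permutation yields exactly four fixed points, confirming that a score of 4 out of 5 is impossible.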

Experiment design

The experiment compared the effectiveness of definitions against nonsemantic assistance. Each participant read two passages, one in each condition, with the order of topics and the assignment of topics to conditions counterbalanced. Students read the passages on paper and completed a test on each passage immediately after reading it. Figure 6.2 illustrates the design.

Read comets with nonsemantic help

Test on words from comets passage

Read meteors passage with definitions

Test on words from meteors passage


Counterbalance (a) order of passages, (b) assignment of topics to conditions

Figure 6.2 Flowchart for comets & meteors experiment.

We counterbalanced both the order of topics (comets first or meteors first) and the assignment of topics to conditions (comets with nonsemantic help and meteors with definitions, or the reverse). Each test included one sample item (ice and iron) along with five actual items. Participants were 41 students who had just finished 2nd through 5th grade at a low-income urban elementary school (Fort Pitt Elementary School).
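The counterbalancing amounts to crossing two binary factors, yielding four arms. A sketch, with hypothetical student IDs and a simple rotation through the arms (the actual assignment procedure is not specified in the text):

```python
from itertools import cycle

topics = ("comets", "meteors")

# Four counterbalanced arms: (topic read first, topic paired with definitions).
# The other topic gets nonsemantic help; every student reads both passages,
# taking a matching test immediately after each one.
arms = [(first, with_defs) for first in topics for with_defs in topics]
assert len(arms) == 4  # 2 orders x 2 condition assignments

# Hypothetical rotation of the 41 students through the four arms.
assignments = {f"S{i:02d}": arm for i, arm in zip(range(1, 42), cycle(arms))}
print(assignments["S01"])  # ('comets', 'comets'): comets first, with definitions
```

Rotating through the arms keeps the four cells as balanced as possible with 41 participants.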

Results

Explanations held an advantage over nonsemantic help. An analysis of variance (ANOVA) including a term for age showed a significant effect of definitions on the matching task (p = .041).

A paired Student's t-test likewise showed that definitions helped: students averaged 2.5 items correct on the text with definitions versus 1.8 items correct on the text with nonsemantic help (p = .007). Thus students made use of the information in the definitions, above and beyond any benefit of the extra exposure to the words.

Lessons learned from comets and meteors experiment

The comets and meteors experiment measured the immediate effect of augmented text, comparing definitions to nonsemantic assistance. Our next experiment measured the effect of augmented text on a later day, and compared definitions to natural contexts.

Can explanations add to natural contexts?: The limericks experiment

Texts

The texts were eight children's limericks by the 19th-century author Edward Lear, each containing one target word: dolorous, laconic, imprudent, innocuous, mendacious, oracular, irascible, and vexatious. We controlled the texts along several dimensions:

1 Genre – all the limericks were poems.

2 Author – all the limericks were written by Edward Lear.

3 Intended audience – all the limericks were written for children.

4 Syntax – all of the limericks contained the target word in the last line, as follows: “That target word (old) Person of Place.”

1 The test used the alternate spelling vexacious.

5 Word frequency – all of the target words occurred zero or one time in the Brown corpus (Kucera and Francis 1967), a million-word representative sample of written English fiction and nonfiction in a variety of styles and domains; Francis and Kucera (1971) provide details. The words mendacious and vexatious occurred once; the other target words did not occur.

6 Part of speech – all target words were adjectives.

7 General semantic class – all target words described human personality traits.

Here is an example of a limerick:

There was an Old Man of Cape Horn,

Who wished he had never been born;

So he sat on a chair,

Till he died of despair,

That dolorous Man of Cape Horn.

The limericks are given in full in Appendix E.

We wrote the definitions for the target words to be as syntactically similar to one another as possible. Each definition explained the word in ordinary language, following the advice given in McKeown (1993). For example: “We can say someone is dolorous if they are mournful, or feel really bad.” The definitions are given in full in Appendix E.

Tests

We gave students a paper questionnaire with two questions for each target word: a yes-no familiarity question (“Have you ever seen the word dolorous before?”) and a multiple-choice question about the word's meaning.

Experiment design

There were four conditions (Table 6.2):

To reduce variance from first- or last-item effects, we borrowed a technique from vocabulary assistance research and kept the order of limerick presentation constant. Word and order were thus confounded, but we were not interested in their individual effects. Each participant saw two words per condition, so each student was exposed to 6 of the 8 words during the experiment; the remaining 2 words served as a no-exposure control. The assignment of words to conditions was fixed in advance for each particular Reading Tutor computer.
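One simple way to fix word-to-condition assignments per computer is a rotation scheme. The dissertation fixes the assignment per Reading Tutor computer but does not describe the mechanism, so the following is purely illustrative:

```python
from collections import Counter

# Illustrative only: the actual assignment mechanism is not specified,
# so this rotation scheme is an assumption.
WORDS = ["dolorous", "laconic", "imprudent", "innocuous",
         "mendacious", "oracular", "irascible", "vexatious"]
CONDITIONS = ["no exposure", "limerick only", "explanation only",
              "limerick + explanation"]

def assign(computer_index):
    # Rotate the word list so each computer pairs words with conditions
    # differently; consecutive pairs of words share a condition.
    shift = (2 * computer_index) % len(WORDS)
    rotated = WORDS[shift:] + WORDS[:shift]
    return {word: CONDITIONS[i // 2] for i, word in enumerate(rotated)}

a = assign(0)
print(a["dolorous"])        # no exposure
print(Counter(a.values()))  # two words per condition
```

Whatever the mechanism, the key property is the one asserted here: exactly two words per condition on each computer, with the pairing varying across computers.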

Students read the texts with Reading Tutor assistance on 12 computers, as illustrated in Figure 6.3. Each computer served one student at a time, with five sessions per day and different students in each session. The number of Reading Tutors in use at one time ranged from 5 to 12.

Figure 6.3 Children reading with the Reading Tutor at summer 2000 reading clinic. Photo credit: Mary Beth Sklar, Project LISTEN Educational Research Field Coordinator.

Figure 6.4 shows a flowchart of the design of the vocabulary limerick experiment.

“There was an Old Man of Cape Horn, Who wished he had never been born ;

So he sat on a chair, Till he died of despair, That dolorous Man of Cape Horn.”

“We can say someone is dolorous if they are mournful, or feel really bad.”

Conditions: limerick vs. no limerick, crossed with explanation vs. no explanation

1 Have you ever seen the word dolorous before? Yes No

If someone is dolorous they must be… angry sad tired afraid

Test word familiarity and word knowledge, subsequent day

(questions on other words not shown):

Figure 6.4 Limericks flowchart, showing one (of eight) limericks.

Treatment fidelity: 3% of trials affected by bug

Our experiment had a minor treatment fidelity problem caused by a bug in the July 2000 version of the Reading Tutor: a student who clicked the Go arrow repeatedly while the Reading Tutor was advancing to the next sentence could unintentionally skip sentences. Consequently, in some trials the sentence containing the target word, whether a definition or the line of the original limerick, was skipped rather than read.

To assess the impact of the bug, we checked whether each sentence containing the target word had a corresponding student utterance, that is, whether the Reading Tutor recorded hearing at least one spoken word for that sentence. Out of 232 trials, seven had no recorded student utterance for a sentence containing the target word. Table 6.2 details the affected trials.
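The fidelity check reduces to filtering trials that lack a recorded utterance for the target-word sentence. A sketch with hypothetical trial records (IDs and field names are illustrative, not the actual log format):

```python
# Hypothetical trial records: each notes whether the Reading Tutor
# captured at least one spoken word for the target-word sentence.
trials = [
    {"id": "CW", "word": "dolorous", "heard_target_sentence": False},
    {"id": "MT", "word": "innocuous", "heard_target_sentence": True},
    {"id": "AB", "word": "laconic", "heard_target_sentence": True},
]

# Trials with no recorded utterance may have been skipped via the
# Go-arrow bug, so flag them for exclusion in the re-analysis.
affected = [t["id"] for t in trials if not t["heard_target_sentence"]]
print(affected)  # ['CW']
```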

Columns: ID; Grade; Word; Intended to display limerick?; Reading Tutor heard something for limerick sentence with target word?; Reading Tutor heard something for definition?; answer to “Have you seen this word?”; word knowledge result.

CW 4 dolorous Yes No No No Yes Wrong

JR 2 vexatious No No Yes No Not answered – coded as No

MT 3 innocuous Yes Yes Yes No Yes Wrong

SJ 5 mendacious No No Yes No No Wrong

DG 2 imprudent No No Yes No Yes Wrong

DG 2 oracular Yes No No No No Wrong

DD 2 imprudent Yes No Yes Yes Yes Wrong

Table 6.2 Trials affected by sentence-skipping bug in Summer 2000 limericks experiment.

There are several options for handling the trial data affected by the bug. One is to recode the affected trials as belonging to the conditions the students actually experienced; however, this muddies the analysis, because the original randomization was disturbed by a nonrandom effect. Another is to throw out all trials from any student affected by the bug, but that would shrink the subject pool even though only 6 of 29 students were affected. A third is to delete only the specific trials affected by the bug. Which is most appropriate?

Our analysis of the limericks experiment addressed two questions. First, what actually happened in the experiment, that is, how did students perform in the various conditions, bug included? For this question we analyzed all the data. Second, how might the vocabulary assistance policy the experiment represents perform in the future, once the bug is fixed? To address this question we removed the seven trials affected by the bug and re-analyzed the data. Including or excluding these trials did not substantially change the results, and we report findings from both analyses.

Results

Table 6.3 presents the limericks results crosstabulated by grade in SPSS; Table 6.4 presents the same data in a more readable format. The two tables are complementary: Table 6.3 contains the raw data with subtotals, while Table 6.4 shows marginal probabilities. In two cases, students did not answer a word familiarity question; since they did not answer “Yes,” we coded these cases as “No.” We discuss word familiarity, word knowledge, and the relationship between the two, using a mix of planned statistical comparisons and exploratory informal comparisons, noted as such throughout.

Word familiarity * Word knowledge * Limerick * Explanation * Grade Crosstabulation

Table 6.3 Results from Summer 2000 limericks study in SPSS crosstabulation. Grade here means grade just finished.

Grade 2: no explanation: 9 familiar/18 total (50%), 6 right/18 total (33%); explanation: 12 familiar/18 total (67%), 5 right/18 total (28%)

Grade 3: no explanation: 6 familiar/18 total (33%), 4 right/18 total (22%); explanation: 8 familiar/18 total (44%), 4 right/18 total (22%)

Grade 4: no explanation: 4 familiar/14 total (29%), 2 right/14 total (14%); explanation: 4 familiar/14 total (29%), 4 right/14 total (29%)

Grade 5: no explanation: 1 familiar/8 total (13%), 4 right/8 total (50%); explanation: 5 familiar/8 total (63%), 3 right/8 total (38%)

All grades: no explanation: 20 familiar/58 total (34%), 16 right/58 total (28%); explanation: 29 familiar/58 total (50%)

Table 6.4 Results for Summer 2000 limericks experiment, by grade (just finished) and overall.

Including all students and all trials, results were as follows (Table 6.5).

All students (word familiarity), no explanation, no limerick: 20/58 familiar

Table 6.5 All students’ performance on word familiarity question.

Explanations and familiarity. Explanations had a strong effect on self-reported familiarity, significant in a first-order logistic regression model at p < .001: coefficient 1.08 ± 0.32, with 99.9% confidence interval (0.02, 2.15).

Limericks and familiarity. Limericks showed a trend toward a positive effect on familiarity, but not a significant one: coefficient 0.50 ± 0.32, with 90% confidence interval (-0.02, 1.03).

Excluding the trials affected by the sentence-skipping bug changed the main effects little (explanation 1.08 ± 0.33; limerick 0.50 ± 0.33). Here “effect of explanation” means the overall effect of seeing an explanation, not the effect of particular explanations, and likewise for limericks. Adding an interaction term to the model showed no significant interaction between explanation and limerick, and did not meaningfully change the main effects (explanation 1.26 ± 0.45; limerick 0.67 ± 0.44).
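The confidence intervals reported above are Wald intervals computed from each coefficient and its standard error. A sketch; the small discrepancies from the intervals reported in the text reflect rounding of the published coefficient and standard error:

```python
# Wald confidence interval for a logistic regression coefficient, from
# the reported estimate and standard error. Two-sided z-values hardcoded
# for the confidence levels used in the text.
Z = {0.90: 1.645, 0.999: 3.291}

def wald_ci(coef, se, level):
    z = Z[level]
    return (round(coef - z * se, 2), round(coef + z * se, 2))

# Explanation effect on familiarity: coefficient 1.08 +/- 0.32, 99.9% CI
print(wald_ci(1.08, 0.32, 0.999))  # (0.03, 2.13)
# Limerick effect on familiarity: coefficient 0.50 +/- 0.32, 90% CI
print(wald_ci(0.50, 0.32, 0.90))   # (-0.03, 1.03)
```

Note that the 90% interval for the limerick coefficient barely includes zero, which is why the limerick effect is described as a trend rather than a significant result.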

Word familiarity showed a response bias: students, especially in the lower grades, often claimed to have seen words that the Reading Tutor had never presented to them. Of the two (out of eight) words each student had not in fact seen, 50% of 2nd graders, 33% of 3rd graders, 29% of 4th graders, and 13% of 5th graders reported prior exposure.

There was also a general tendency to answer yes: overall yes-response rates were 65% for 2nd graders, 43% for 3rd graders, 38% for 4th graders, and 58% for 5th graders.

Cell-by-cell comparisons give a qualitative feel for the data. The effect of explanations on familiarity was visible in every grade: in seven of the eight comparisons, cells with explanations showed higher familiarity than the corresponding cells without. For instance, in Grade 2, 9 of 18 responses were familiar without an explanation versus 12 of 18 with one. The effect of limericks on familiarity was similarly visible in seven of eight comparisons.

The results for word knowledge are more nuanced than the results for familiarity.

Explanations and word knowledge. Considering all grades together, the trend favored a positive effect of explanations on word knowledge, but not significantly (0.24 ± 0.31).

Limericks and word knowledge. There was essentially no effect of seeing the target word in the limerick (-0.05 ± 0.31).

Excluding the trials affected by the sentence-skipping bug changed little (explanation 0.29 ± 0.32; limerick -0.05 ± 0.32). Adding an interaction term revealed no significant interaction between explanation and limerick (0.48 ± 0.63), no notable effect of explanation (0.00 ± 0.44), and a weak, unexpected negative trend for limerick (-0.28 ± 0.43). What might cause these results?

Consider the percentage of correct answers in the four conditions, aggregated over all 29 students (Table 6.6).

All students (word knowledge), no explanation, no limerick: 14/58 right

Table 6.6 All students’ performance on word knowledge question.

Performance in all cells hovers around the 25% chance level. Notably, accuracy is surprisingly lower in the explanation/limerick cell than in the explanation/no-limerick cell. This difference may simply reflect random variation in performance.

Broken down by grade, second graders answered 19 of 72 multiple-choice questions correctly (26%), and third graders answered 18 of 72 correctly (25%). In seven of the eight cells for second and third graders, the number correct was either 4 of 18 (22%) or 5 of 18 (28%), essentially chance performance (approximately 25%).

Fourth graders answered 16 of 56 questions correctly (29%), and fifth graders answered 10 of 32 correctly (31%). Performance in both grades varied considerably across cells; fifth graders scored as low as 0% on questions without explanations and as high as 50% with them. In cell-by-cell comparisons, explanation beat no explanation in three of four cases, while limerick beat no limerick in only one of four.

Because they performed similarly, we aggregated the fourth and fifth graders’ data together (Table 6.7).

Fourth and fifth graders, no explanation, no limerick: 18% right

Table 6.7 Fourth and fifth graders’ data, aggregated together.

For fourth and fifth graders, there was a trend favoring explanations (0.89, p < .10). Excluding the trials affected by the sentence-skipping bug did not substantially change the results. With an interaction term added to the model, neither explanation nor limerick showed a significant effect. Explanations appeared more helpful than limericks, but the trends were not statistically significant. Overall, fourth and fifth graders may have learned enough from the explanations of target words to improve their word knowledge somewhat, though the effect was weaker than on the familiarity measure.

6.2.4.3 The relationship between familiarity and word knowledge

We measured the correlation between self-reported familiarity and word knowledge using Kendall’s tau-b, by grade and condition (Table 6.8; + marks p < .10, * p < .05, ** p < .01). Each row gives the correlation for Grades 2, 3, 4, and 5, then for all grades combined:

no exposure: .372 (p = .125); .088 (p = .718); .300 (p = .279); n/a; all grades: .269+ (p = .055)

explanation, no limerick: .094 (p = .697); .194 (p = .423); .091 (p = .742); .258 (p = .495); all grades: .110 (p = .410)

no explanation, limerick: .158 (p = .514); .036 (p = .883); .548* (p = .048); .500 (p = .186); all grades: .288* (p = .031)

explanation, limerick: -.169 (p = .485); -.122 (p = .615); .316 (p = .254); .600 (p = .112); all grades: .085 (p = .526)

all data: .113 (p = .341); .049 (p = .679); .271* (p = .044); .507** (p = .005); all grades: .187** (p = .004)

all data except no exposure condition: explanation alone plus limerick alone plus explanation and limerick

Table 6.8 Correlation between familiarity and word knowledge, by grade and condition.
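Kendall's tau-b (here presumably computed with SPSS) handles the heavy ties in these binary familiarity/knowledge pairs via a tie correction in the denominator. A minimal pure-Python implementation:

```python
from collections import Counter
from math import sqrt

def kendall_tau_b(x, y):
    # Pairwise concordant/discordant counts with tie corrections,
    # suitable for binary variables such as familiar (1/0) and right (1/0).
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    ties_x = sum(c * (c - 1) // 2 for c in Counter(x).values())
    ties_y = sum(c * (c - 1) // 2 for c in Counter(y).values())
    return (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))

# Perfect agreement between familiarity (1 = "yes") and knowledge (1 = correct)
print(kendall_tau_b([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
print(kendall_tau_b([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0
```

For two binary variables, tau-b coincides with the phi coefficient on the 2x2 contingency table.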

The relationship between familiarity and word knowledge was stronger and statistically significant for the higher grades, and negligible and insignificant for the lower grades. Second and third graders showed no correlation between word knowledge and familiarity: they performed at chance on the word knowledge test even though they reported increased familiarity after exposure to a word. Fourth and fifth graders also showed increased familiarity overall, but those who could derive a word's meaning from the limerick apparently retained the word better than those who could not.

The relationship between familiarity and accuracy also varied by condition: the limerick strengthened the relationship between reported familiarity and actual word knowledge (.288, p = .031), while the definition did not (.110, p = .410). Perhaps students had to work harder to derive meaning from the limerick, which helped them remember it, whereas the definition demanded less cognitive effort and was more easily forgotten. This observation is preliminary and primarily identifies questions for future research; note also that a significant correlation is not necessarily reliably larger than a non-significant one.

Lessons learned from limericks experiment

The limericks experiment showed that explanations increased students' familiarity with new words, and helped fourth and fifth graders learn word meanings well enough to answer multiple-choice questions. Exposure to the limericks alone conferred some familiarity, but its effect was weak. Given the exploratory nature of parts of this analysis, the findings about grade-level differences should be considered suggestive rather than conclusive.

Furthermore, we note that reading the limerick might have had advantages beyond those revealed in the multiple-choice test, such as strengthening the relationship between familiarity and word knowledge.

Why did the limericks do so little to teach the target words? The limerick provides contextual information about the target word, but the reader must read several lines, make an inference, and remember the conclusion. A definition, by contrast, states the word's meaning in a single clear sentence. Moreover, a limerick is a poem, inviting aesthetic appreciation rather than signaling that the word is to be learned, whereas a definition makes that purpose explicit. Finally, the target word is more prominent in the definition than in the limerick.

The limericks results are not limited to immediate recall, since the assessment took place on a subsequent day. Furthermore, the gains in word knowledge reflect remembering the meaning of the word itself rather than mere recent exposure, because the answers to the test questions appeared neither in the original text nor in the definitions.

Reading is about comprehension: deriving meaning from text. Vocabulary plays a crucial role in comprehension. We started with computer-assisted oral reading, improved story choice to expose students to more new material, and provided vocabulary assistance with factoids that compared words in the text to other words, helping some students learn word meanings. The Reading Tutor with Take Turns and factoids beat a classroom control on vocabulary learning for third graders, and was competitive with one-on-one human tutoring. Follow-on experiments pointed the way toward improved vocabulary assistance.

The National Reading Panel noted that most vocabulary studies have shown positive effects only on experimenter-designed measures, not on standardized tests such as the Woodcock Reading Mastery Test. Standardized tests sample vocabulary sparsely, which makes it hard to demonstrate vocabulary growth or differences between treatments. This dissertation not only achieved results on the Word Comprehension portion of the Woodcock Reading Mastery Test, but also employed two novel techniques: the introduction of new reading material, and computer-generated, in-context vocabulary questions administered as an experiment embedded in typical Reading Tutor usage.

We made two improvements to Project LISTEN's Reading Tutor. First, we made the Reading Tutor take turns with the student at choosing stories, ensuring that each student encountered at least 50% new material; this especially benefited students who on their own chose few new stories. Second, we added automatically generated vocabulary assistance in the form of factoids, short comparisons to other words, and assessed vocabulary with automatically generated multiple-choice questions. The factoids helped third graders, and helped for rare words, on the multiple-choice questions. The questions were designed to meet Nagy et al.'s (1985) criteria, and their correlation with the Word Comprehension subtest of the Woodcock Reading Mastery Test provided evidence of their validity; however, inspection of example questions revealed constraints that we had not initially anticipated.

Follow-on experiments showed that in-context explanations are a promising way to enhance vocabulary support. Students who had just finished 2nd through 5th grade gained familiarity with words from exposure to them in the Reading Tutor, and 4th and 5th graders gained word knowledge from seeing definitions as well.

Our techniques operated at several time scales, from seconds to months, spanning much of Newell's (1990) time scale of human behavior. Choosing a story took seconds to minutes; reading one took minutes. We analyzed the effect of different story choice policies on cumulative story choice over months. Vocabulary assistance took seconds to construct and present, and seconds to minutes for students to read; we evaluated its effects both immediately and on subsequent days (Chapters 4 and 6). Using the Reading Tutor itself meant about 20 minutes of daily use over an entire school year, evaluated in a year-long study.

Table 7.1 summarizes our experimental results, and Table 7.2 gives further details of the vocabulary experiments. In the year-long study, all three treatment conditions included many activities besides those aimed at vocabulary; we therefore compared three complete treatments with respect to one aspect of reading development, rather than treatments targeted solely at vocabulary growth.

Goal Chapter Methodology Key result

Improve story choice Chapter 3 Modify Reading Tutor to take turns with the student at picking stories. Compare to Spring 1998 student-only story choice policy.

Higher percent of new material chosen in Fall 1999 (64.1%) vs. Spring 1998 (60.1%). Reading Tutor helped lower-performing students more.

Provide automatically generated vocabulary assistance

Chapter 4 Supplement stories with WordNet- extracted factoids; look for effect of factoids on answering multiple-choice questions.

Compare trials with factoid + context to trials with context alone.

Factoids helped for the 189 trials with single-sense rare words tested one or two days later – significant at 95%, but exploratory.

Compare Reading Tutor to other reading instruction

Chapter 5 Analyze Word Comprehension portion of a larger Project LISTEN study comparing Reading Tutor with classroom instruction, one-on-one human tutoring

For third graders, Reading Tutor better than classroom control (effect size = 0.56, p = .042) and competitive with one-on-one human-assisted oral reading.

Explore ways to improve vocabulary assistance

Chapter 6 Compare short explanations to nonsemantic assistance. Two texts with teacher-written definitions or nonsemantic assistance (“COMET starts with C.”)

At least when the test is given in the back of the packet, students perform better on a word-to-definition matching task when supplied with definitions (2.5 items right vs 1.8).

Chapter 6 Adapt limericks to vocabulary experiment.

Compare no exposure vs limerick alone vs definition alone vs limerick plus definition, all in Reading Tutor.

Measure familiarity (“Have you seen this word before?”) and semantics (multiple-choice question on word meaning).

Strong effect of seeing explanations on familiarity.

Trend favoring effect of seeing limericks on familiarity.

Only 4 th and 5 th graders learned enough from definition to answer multiple-choice questions better.

Table 7.1 Summary of experimental results.

41 students who had just finished grades 2 through 5, Fort Pitt Elementary School

29 students in grades 2, 3, 4, and 5, Fort Pitt Elementary School, summer reading clinic setting

Reading Tutor could automatically generate vocabulary assistance

Five domain- specific content words for each topic

Eight domain-independent but very rare adjectives

What kind of help? Comparisons to other words, drawn from

Definitions written by story author (a teacher)

Experimenter-written context- specific explanations

When was help given? Immediately before sentence containing target word

Immediately before sentence containing target words

Prior to limerick containing target word

Reading Tutor-selected using experimenter- written constraints

Two teacher-written nonfiction passages, one about comets and one about meteors

Modality of text Computer-assisted oral reading Independent paper- based reading Computer-assisted oral reading Modality of vocabulary help

Help inserted in yellow pop-up boxes, to be read out loud in computer- assisted oral reading

Definitions inserted seamlessly into text, to be read independently on paper

Explanations inserted seamlessly into text, to be read out loud in computer-assisted oral reading

Automatically generated multiple-choice questions, administered by the Reading Tutor

Five-item matching test, administered on paper, stapled to the text passages

The study involved four multiple-choice questions administered on paper, followed by eight yes-no questions assessing word familiarity and eight questions evaluating word knowledge The results indicated that factoids were beneficial in understanding rare single-sense words, which were tested one or two days later.

(44.1% correct with factoids vs 25.8% correct without).

Factoids also helped for third graders seeing rare words (42.0% with factoids vs 36.2% without).

Definitions helped more than nonsemantic assistance on same- day matching task (2.5 items right vs

All students gained familiarity: 59/116 with limerick vs 49/116 65/116 with definition vs 43/116 Only 4 th and 5 th graders showed increased knowledge, and only for explanations:

13/22 right with limerick vs 14/22 17/22 right with definition vs 10/22

Table 7.2 Summary of vocabulary help experiments.

Contributions of this dissertation

This dissertation has contributed to a number of different fields, including intelligent tutoring systems, artificial intelligence and language technologies, and reading research. In particular:

We have developed and assessed Take Turns, a mixed-initiative approach that balances learner control and system control over time. Taking turns at choosing stories combines the strengths of learner choice and system guidance, and may be useful for picking the next task in other intelligent tutoring systems as well.
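As a minimal illustration of the turn-taking idea (this is a sketch, not the Reading Tutor's actual implementation; the function and variable names are hypothetical): on the tutor's turn the system steers toward unread material, and on the student's turn the student's pick is honored unconditionally.

```python
def choose_stories(stories_read, all_stories, student_choice, n_turns):
    """Alternate turns at picking stories: on the tutor's turn, pick a story the
    student has not read yet (new material); on the student's turn, honor the
    student's own choice, whatever it is."""
    chosen = []
    tutor_turn = True
    for _ in range(n_turns):
        if tutor_turn:
            unread = [s for s in all_stories if s not in stories_read]
            pick = unread[0] if unread else all_stories[0]
        else:
            pick = student_choice(all_stories)
        chosen.append(pick)
        stories_read.add(pick)
        tutor_turn = not tutor_turn  # hand control back and forth
    return chosen

# A student who always re-picks a favorite still sees new material on tutor turns:
session = choose_stories(set(), ["A", "B", "C", "D"], lambda stories: "A", 4)
```

Even in this toy version, the alternation guarantees that roughly half of the sessions introduce new material, while preserving the motivational benefit of letting the student choose on the other turns.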

We have developed automatically generated vocabulary assistance and assessment. We described how the automated help and the automated questions were constructed and operationalized, and identified their flaws and further requirements. Despite those flaws, automated assistance showed its potential by helping learning: adding factoid vocabulary help to computer-assisted oral reading aided vocabulary acquisition, particularly for older students and for rare, single-sense words.

We have shown that computer-assisted oral reading, with Take Turns and factoids, helped third graders learn vocabulary better than classroom instruction did, and almost as well as human-assisted oral reading.

We have identified a collection of authentic yet carefully controlled materials for vocabulary experiments. Detecting subtle learning requires well-balanced texts, and texts originally written for purposes other than research lend face validity. Our materials include freely available texts, specifically Edward Lear's limericks, along with our hand-written explanations and test items, all of which can be used in future vocabulary research.

We have explored which students learn which aspects of vocabulary knowledge from which texts, and clarified the circumstances under which students can learn new words. Notably, the limericks experiment showed that older students used definitions to gain word knowledge, and suggested that narrative context may reinforce the connection between familiarity and word knowledge.

Future work

We discuss future directions for story choice, and for giving help.

The Reading Tutor's story choice policy currently considers only the difficulty level of stories, not their content. Taking story content into account could better match students' preferences while still ensuring that they see new, suitably challenging material. For example, stories could be grouped into pairs by difficulty level and topic, so that the Reading Tutor could pick a story similar to the one the student just chose.

Vocabulary assistance involves choosing which words to help with and what kind of help to give. In the factoids and comets & meteors experiments, the Reading Tutor helped with words for which it could generate assistance automatically, or with words the author had flagged for explanation; the limericks experiment targeted extremely rare words, occurring once or less per million words of text. Future work could revisit fully automated vocabulary help, but a semi-automated approach that combines human expertise with computer assistance seems more promising: for example, manually filtering the synonyms drawn from WordNet, and concentrating help on relatively rare words.
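The semi-automated division of labor just described can be sketched as follows. This is an illustrative sketch only: the function name, frequency table, and blocklist are made up, and the real pipeline would draw synonyms from WordNet and frequencies from a corpus.

```python
def select_help(target_word, candidate_synonyms, freq_per_million,
                rare_threshold=10.0, human_rejects=()):
    """Semi-automated vocabulary help: the computer restricts help to relatively
    rare target words (frequency threshold), while a human filters out
    unhelpful or misleading synonyms by hand (the human_rejects blocklist)."""
    if freq_per_million.get(target_word, 0.0) > rare_threshold:
        return None  # common enough that no help is needed
    kept = [s for s in candidate_synonyms if s not in human_rejects]
    return kept or None

# Made-up frequencies (occurrences per million words) for illustration:
freq = {"dog": 50.0, "obstreperous": 0.2}
help_for_dog = select_help("dog", ["canine"], freq)
help_for_obstreperous = select_help(
    "obstreperous", ["noisy", "defiant", "stroppy"], freq,
    human_rejects=("stroppy",))
```

The design point is that the cheap, scalable decision (is this word rare enough to need help?) is automated, while the judgment-heavy decision (is this synonym actually helpful to a child?) stays with a human.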

Concluding remarks

We set out to demonstrate two claims, which we framed as improvements over factors in Equation 7.1:

new words learned per day = (new words seen per day) × (new words learned / new words seen)

Equation 7.1 New words learned per day on the Reading Tutor.

First, an automated reading tutor that takes turns with the student at picking stories increases how much new material the student reads, especially for students who would otherwise choose little new material. Second, supplementing stories with semantic information about words helps students learn vocabulary beyond what the stories alone teach, showing that brief word explanations can serve as effective vocabulary instruction. The 1999-2000 Reading Tutor, which incorporated both features, produced better Word Comprehension gains for third graders than a classroom control, and was comparable to one-on-one human-assisted oral reading.
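The factorization in Equation 7.1 can be made concrete with a short numerical sketch. The numbers below are hypothetical, chosen only to show how raising each factor raises the product; they are not measurements from the studies.

```python
def new_words_learned_per_day(new_words_seen_per_day, learned_per_word_seen):
    """Equation 7.1 as a product of factors: claim 1 (better story choice)
    raises the first factor; claim 2 (vocabulary help) raises the second."""
    return new_words_seen_per_day * learned_per_word_seen

# Hypothetical numbers, for illustration only:
baseline = new_words_learned_per_day(20, 0.05)  # 20 new words seen/day, 5% learned
improved = new_words_learned_per_day(25, 0.08)  # both factors raised
```

Because the outcome is a product, modest improvements to each factor compound: in the sketch, a 25% gain in words seen and a 60% gain in learning rate double the words learned per day.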

Boxplots were created using SPSS. The thick black line in the center of the box marks the median; the lower edge of the box is the lower quartile (the median of the lower half of the data), and the upper edge is the upper quartile (the median of the upper half). The "T" bars (whiskers) extend to data within 1.5 box-lengths (interquartile ranges) of the edge of the box. Open circles (○) mark outliers between 1.5 and 3 box-lengths from the edge of the box; closed circles (●), displayed as stars (*) in SPSS, mark extreme values more than 3 box-lengths out. Numbers next to such points are SPSS's internal case numbers (Devore 1991, p. 28; SPSS 9.0).
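The quartile and fence arithmetic behind these boxplots can be sketched directly. This sketch uses the medians-of-halves quartile rule described above (SPSS's exact quantile interpolation may differ slightly); the function names are ours.

```python
def quartiles(data):
    """Lower quartile, median, upper quartile, via the medians-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    def median(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2
    lower, upper = xs[:n // 2], xs[(n + 1) // 2:]
    return median(lower), median(xs), median(upper)

def classify(x, q1, q3):
    """Boxplot classification of a point relative to the box [q1, q3]."""
    iqr = q3 - q1
    dist = max(q1 - x, x - q3, 0)  # distance outside the box, 0 if inside
    if dist > 3 * iqr:
        return "extreme"   # closed circle / star in SPSS
    if dist > 1.5 * iqr:
        return "outlier"   # open circle
    return "whisker"       # within the "T" bars
```

For example, for the data 1 through 9, the box runs from 2.5 to 7.5 (IQR 5), so a value of 16 plots as an open-circle outlier and a value of 30 as an extreme value.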

In this dissertation, we used logistic regression models to analyze experiments with a binary outcome variable (such as a yes/no or correct/incorrect response) and categorical independent variables (such as grade). We constructed general loglinear models in SPSS of the form log(p/(1-p)) = α + βb + γc + δd + … + ζz, where p is the conditional probability of the outcome of interest given the factors b through z.

SPSS uses maximum likelihood estimation to fit the coefficients, which describe how changes in each factor affect the probability of the outcome, much as in linear regression (y = a + bx) the coefficient b describes how changes in x affect y.

Including a term for student made the analysis take student identity into account, analogous to McNemar's test.
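The link between the fitted log-odds and the probability p can be shown in a few lines. This is a generic sketch of the logistic model form given above, not SPSS output; the coefficient values in the example are arbitrary.

```python
import math

def predicted_probability(intercept, coefficients, factors):
    """Invert the fitted log-odds, log(p/(1-p)) = intercept + sum(coef * factor),
    back into the probability p of the outcome of interest."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-log_odds))

# A log-odds of log(3) means odds of 3:1, i.e. p = 0.75:
p = predicted_probability(math.log(3), [], [])
```

This inversion is why a positive coefficient on, say, "saw a definition" translates into a higher predicted probability of answering the multiple-choice question correctly.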

An overview and bibliography may be found in Garson (n.d.), "Logistic Regression," http://www2.chass.ncsu.edu/garson/pa765/logistic.htm. See also Menard (1995).

Aist, G. SR-CALL 1999. Speech recognition in computer assisted language learning. In K. C. Cameron (ed.), Computer Assisted Language Learning (CALL): Media, Design, and Applications. Lisse: Swets & Zeitlinger.

Aist, G. S. 1997. Challenges for a mixed initiative spoken dialog system for oral reading tutoring. AAAI 1997 Spring Symposium on Computational Models for Mixed Initiative Interaction. AAAI Technical Report SS-97-04.

Aist, G., and Mostow, J. ITS-PA 2001. Improving story choice in a reading tutor that listens. Proceedings of the Fifth International Conference on Intelligent Tutoring Systems (ITS'2000), p. 645, Montreal, Canada, June 2000. Poster abstract. Available online as http://www.cs.cmu.edu/~aist/ITS2000-story-choice-abstract.doc

Aist, G. S., and Mostow, J. CALL 1997. Adapting human tutorial interventions for a reading tutor that listens: using continuous speech recognition in interactive educational multimedia. In Proceedings of CALL 97: Theory and Practice of Multimedia in Computer Assisted Language Learning. Exeter, UK.

Aist, G., et al. AI-ED 2001. Computer-assisted oral reading helps third graders learn vocabulary better than classroom instruction, about as well as human-assisted oral reading. Tenth Artificial Intelligence in Education Conference, May 2001.

American Guidance Service, n.d Bibliography for Woodcock Reading Mastery Tests – Revised (WRMT-R) http://www.agsnet.com/Bibliography/WRMTRbio.html

The American Heritage® Dictionary of the English Language, Third Edition 1996 Houghton Mifflin Company.

Available online at dictionary.com.

Beck, Isabel, and Margaret McKeown 1991 Conditions of vocabulary acquisition In Handbook of Reading Research vol 2: pp 789-814 Mahwah, New Jersey: Lawrence Erlbaum.

Beck, I L., McKeown, M G., & McCaslin, E S 1983 Vocabulary development: All contexts are not created equal.

California State Board of Education 1996 Teaching Reading: A Balanced, Comprehensive Approach to Teaching

Reading in Prekindergarten Through Grade Three http://www.cde.ca.gov/cilbranch/teachrd.htm ISBN 0-8011-

Carver, Ronald P 1994 Percentage of unknown vocabulary words in text as a function of the relative difficulty of the text: Implications for instruction Journal of Reading Behavior 26(4) pp 413-437.

Devore, Jay L 1991 Probability and Statistics for Engineering and the Sciences 3 rd edition Pacific Grove, California: Brooks/Cole.

Donahue, P L., Voelkl, K E., Campbell, J R., and Mazzeo, J 1999 NAEP 1998 Reading Report Card for the

Nation and the States At http://nces.ed.gov/nationsreportcard/pubs/main1998/1999500.shtml National Center for

Duke, N K 2000 Print environments and experiences offered to first-grade students in very low- and very high-SES school districts Reading Research Quarterly 35(4): 456-457.

Edmark 1997 Let’s Go Read http://www.edmark.com/prod/lgr/island/.

Eller, Rebecca G., Pappas, Christine C., and Brown, Elga 1988 The lexical development of kindergarteners: Learning from written context Journal of Reading Behavior 20(1), pp 5-24.

Fellbaum, Christiane, ed 1998 WordNet: An Electronic Lexical Database Cambridge MA: MIT Press.

Francis, W. N., and Kucera, H. 1971. Brown Corpus Manual. Providence, RI: Brown University. Accessed online November 28, 2000, via the University of Southern Denmark website.

Garson, G David n.d Logistic Regression http://www2.chass.ncsu.edu/garson/pa765/logistic.htm.

Gipe, Joan P., and Richard D Arnold 1978 Teaching vocabulary through familiar associations and contexts.

Hanna, L., et al. 1999. The role of usability research in designing children's computer products. In Allison Druin (ed.), The Design of Children's Technology. San Francisco: Morgan Kaufmann.

Huang, X D., Alleva, F., Hon, H W., Hwang, M Y., Lee, K F., and Rosenfeld, R 1993 The Sphinx-II speech recognition system: An overview Computer Speech and Language 7(2):137-148.

IBM 1998 Watch Me Read http://www.ibm.com/IBM/IBMGives/k12ed/watch.htm.

Juel, Connie 1996 What makes literacy tutoring effective? Reading Research Quarterly 31(3), pp 268-289

Kucera, H & Francis, W N 1967 Computational analysis of present-day American English Brown University Press, Providence, RI.

Kuhn, Melanie R., and Stahl, Steven A 1998 Teaching children to learn word meanings from context: A synthesis and some questions Journal of Literacy Research 30(1): 119-138.

Lear, Edward. 19th c. The Book of Nonsense. Available online from Project Gutenberg at ftp://sailor.gutenberg.org/pub/gutenberg/etext97/nnsns10.txt.

Madden, R., Gardner, E F., Rudman, H C., Karlsen, B., & Merwin, J C 1973 Stanford Achievement Test New York: Harcourt, Brace, Jovanovich, Inc Cited in (McKeown 1985).

McKeown, Margaret G 1993 Creating effective definitions for young word learners Reading Research Quarterly 28(1): 17-31.

McKeown, Margaret G 1985 The acquisition of word meaning from context by children of high and low ability.

McKeown, Margaret G., Isabel L Beck, Richard C Omanson, and Charles A Perfetti 1983 The effects of long- term vocabulary instruction on reading comprehension: A replication Journal of Reading Behavior 15(1): 3-18.

Memory, David M 1990 Teaching technical vocabulary: Before, during or after the reading assignment? Journal of Reading Behavior 22(1), pp 39-53.

Menard, Scott 1995 Applied Logistic Regression Analysis vol 106, Quantitative Applications in the Social Sciences Sage Publications.

Merriam-Webster Student Dictionary, available online at wordcentral.com.

Mostow, J 1996 A Reading Tutor that Listens (5-minute video) Presented at the DARPA CAETI Community Conference, November 19-22, 1996, Berkeley, CA.

Mostow, J., and Aist, G FF 2001 Evaluating tutors that listen In (K Forbus and P Feltovich, Eds.) Smart Machines in Education: The coming revolution in educational technology MIT/AAAI Press 2001

Mostow, J., et al. AI-ED 2001. A controlled evaluation of computer- versus human-assisted oral reading. Poster, Tenth International Conference on Artificial Intelligence in Education, San Antonio, Texas.

Mostow, J., and Aist, G. 1999. Authoring new material in a Reading Tutor that listens. AAAI National Conference / ACL Annual Meeting, 1999.

Mostow, J., and Aist, G. USPTO 1999. Reading and Pronunciation Tutor. United States Patent No. 5,920,838. Filed June 2, 1997; issued July 6, 1999. US Patent and Trademark Office.

Mostow, J., and Aist, G. 1999. Giving help and praise in a reading tutor with imperfect listening. CALICO Journal, special issue on the role of speech recognition in language tutoring systems.

Mostow, J., and Aist, G. S. 1997. Towards automated evaluation of student learning in a Reading Tutor that listens. Proceedings of the National Conference on Artificial Intelligence (AAAI-97), pp. 355-361.

Mostow, J., & Aist, G. PUI 1997. When speech input is not an afterthought: A Reading Tutor that listens. Workshop on Perceptual User Interfaces, Banff, Alberta, Canada, October 1997.

Mostow, J., Hauptmann, A G., Chase, L L., and Roth S 1993 Towards a Reading Coach that listens: Automatic detection of oral reading errors In Proceedings of the Eleventh National Conference on Artificial Intelligence

(AAAI-93), 392-397 Washington DC: American Association for Artificial Intelligence.

Mostow, J., Hauptmann, A., and Roth, S F 1995 Demonstration of a Reading Coach that listens In Proceedings of the Eighth Annual Symposium on User Interface Software and Technology, Pittsburgh PA Sponsored by ACM

SIGGRAPH and SIGCHI in cooperation with SIGSOFT.

Mostow, J., Roth, S F., Hauptmann, A G., and Kane, M 1994 A prototype Reading Coach that listens In

Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle WA Selected as the

Nagy, William E., Herman, Patricia A., and Anderson, Richard C 1985 Learning words from context Reading Research Quarterly 20(2): 233-253.

National Reading Panel 2000 Teaching Children to Read http://www.nichd.nih.gov/publications/nrppubskey.cfm

Newell, A 1990 Unified Theories of Cognition Cambridge MA: Harvard UP.

Pinnell, G.S., Pikulski, J.J, Wixson, K.K., Campbell, J.R., Gough, P.B., & Beatty, A.S 1995 Listening to children read aloud Washington, DC: Office of Educational Research and Improvement U.S Department of Education.

Reinking, David, and Rickman, Sharon Salmon 1990 The effects of computer-mediated texts on the vocabulary learning and comprehension of intermediate-grade learners Journal of Reading Behavior 22(4), pp 395-411.

Russell, M., Brown, C., Skilling, A., Series, R., Wallace, J., Bohnam, B., and Barker, P 1996 Applications of Automatic Speech Recognition to Speech and Language Development in Young Children In Proceedings of the

Fourth International Conference on Spoken Language Processing, Philadelphia PA.

Schacter, John 1999 Reading Programs that Work: A Review of Programs from Pre-Kindergarten to 4 th Grade.

Milken Family Foundation PDF available from www.mff.org.

Scott, Judith A., and Nagy, William E. 1997. Understanding the definitions of unfamiliar verbs. Reading Research Quarterly.

Schwanenflugel, Paula J., Steven A Stahl, and Elisabeth L McFalls 1997 Partial word knowledge and vocabulary growth during reading comprehension Journal of Literacy Research 29(4): 531-553.

Shefelbine, John L 1990 Student factors related to variability in learning word meanings from context Journal of

Singh, S., Kearns, M S., Litman, D J., & Walker, M A 1999 Reinforcement learning for spoken dialogue systems In Proceedings of NIPS*99, to appear as S A Solla, T K Leen, and K.-R Müller, (Editors), Advances in

Neural Information Processing Systems 12 Cambridge, MA: MIT Press.

Snow, Catherine E., Burns, M Susan, and Griffin, Peg, Eds 1998 Preventing Reading Difficulties in Young

Children Washington D.C.: National Academy Press.

SPSS 9.0 1998 Statistical software package www.spss.com.

Wasik, B A., and R E Slavin 1993 Preventing early reading failure with one-to-one tutoring: A review of five programs Reading Research Quarterly 28(2), 178-200.

Whines, N. 1999. Unpublished master's thesis, Master of Arts in Design for Interactive Media, Middlesex University, London.

Williams, S.M., Nix, D., & Fairweather, P 2000 Using Speech Recognition Technology to Enhance Literacy Instruction for Emerging Readers In B Fishman & S O’Connor-Divelbiss (Eds.), Proceedings of the Fourth

International Conference of the Learning Sciences (pp 115-120) Mahwah, NJ: Erlbaum. http://www.umich.edu/~icls/proceedings/pdf/Williams.pdf

Zechmeister, E B., Chronis, A M., Cull, W L., D’Anna, C A., and Healy, N A 1995 Growth of a functionally important lexicon Journal of Reading Behavior 27(2), pp 201-212.

Appendix A: Experiment summaries in National Reading Panel format

This appendix contains a summary of the story choice study, from Chapter 3, and descriptions of the following other studies:

1. Factoids, from Chapter 4;
2. Comparison of Reading Tutor to other reading instruction, from Chapter 5;
3. Comets and meteors, from Chapter 6; and
4. Limericks, from Chapter 6.

Spring 1998 vs. Fall 1999 story choice comparison, summarized using the National Reading Panel coding scheme (NRP 2000).

States or countries represented in sample: Pittsburgh and surrounding communities in western Pennsylvania, USA.

Number of different schools represented in sample: 1 (Fort Pitt Elementary) / 1 (Centennial Elementary).

Number of different classrooms represented in sample.

Whether participants were drawn from urban, suburban, or rural settings.

Pretests administered prior to treatment: Woodcock Reading Mastery Test (WRMT) word attack, word identification, and passage comprehension subtests, plus oral reading fluency / WRMT word attack, word identification, word comprehension, and passage comprehension subtests, plus oral reading fluency.

Socioeconomic status (SES): Low SES / Mixed. 67% received free lunch and 6.7% received reduced lunch, for roughly 75% receiving free or reduced lunch.

Ethnicity: ~35% black and ~65% white; 2 students may have reported multiethnic background (Hispanic/African-American/Hawaiian).

Exceptional learning characteristics: Unknown / 1 student with cerebral palsy and 2 students with significant speech impairments.

First language: All except one or two were native speakers of English / All native speakers of English.

Selection restrictions applied to limit the sample of participants: None / Bottom half of class (as determined by teacher) selected to participate.

Concurrent reading instruction received in classroom: Other reading instruction / Other reading instruction.

How was the sample obtained? By comparing samples from two different studies, each examining the effectiveness of the Reading Tutor vs. other reading instruction.

Number of participants lost per group during the study; was attrition greater for some groups than others? 2 students were unavailable for readministering of the post-test (the post-test was readministered to some students due to an initial error).

Setting of the study: Classroom / Classroom.

The study used a nonequivalent control group design, comparing the behavior of existing groups from previous studies, with the analysis controlling for grade to adjust for nonequivalence. The treatment and control conditions, including the nature of the reading instruction each group received, were as follows.

1998 Reading Tutor / Take-turns 1999-2000 Reading Tutor.

The Reading Tutor supports oral reading with mostly implicit instruction, modeling fluent reading and helping with individual word recognition. It also provides some explicit instruction, such as highlighting specific letter-to-sound rules (for example, that the letter a makes the sound /a/), thereby reinforcing grapheme-to-phoneme connections.

Difficulty level and nature of texts:
- 1998: Authentic text ranging in level from pre-primer through fifth grade, including a mix of fiction and non-fiction; some decodable text included to scaffold learning decoding skills.
- 1999-2000: Authentic text ranging in level from pre-primer through fifth grade, including a mix of fiction and non-fiction; short factoids inserted into text (see Chapter 4).

Duration of treatments:
- 1998: Nominally 20-25 minutes per session, 5 sessions per week, for the entire spring; actual usage ~13 minutes/session, 1 day in 4-8.
- 1999-2000: Nominally 20 minutes per session, 5 sessions per week, for the entire fall; actual usage close to nominal guidelines.

Was fidelity in delivering treatment checked? Weekly visits by Project LISTEN personnel / 2-3x/week visits by Project LISTEN personnel.

Number of trainers who administered treatment: One computer per classroom in study / One computer per classroom in study.

Type of computers: IBM-compatible personal computers running Windows NT (both groups).

Special qualifications: The Reading Tutor listens to children read aloud.

Length of training: Not applicable.

Assignment of trainers to groups.

Cost factors: Personal computer costs ~$2500; cost of software depends on accounting for research and development costs.

List and describe other nontreatment independent variables included in the analysis of effects.

List processes that were taught during training and measured during and at the end of training: Not applicable for comparison of story choice behavior.

List names of reading outcomes measured: Investigator-constructed quantitative measure of how much new material students were seeing (percent new sentences per sentence encountered). No reason to suspect low reliability.

List time points when dependent measures were assessed: Percent new material calculated over the duration of each semester.

Any reason to believe that treatment/control groups might not have been equivalent prior to treatments? Yes: the students came from different schools, grades, and ages. These two groups were chosen for comparison because of their similar supervision, classroom setting, and duration of participation.

Were steps taken in statistical analyses to adjust for any lack of equivalence?

Yes; analysis of variance controlled for grade.

Result: Average percent new sentences per sentence encountered

See Chapter 3 for story choice study; Chapter 4 for factoids; Chapter 5 for Reading Tutor vs classroom instruction vs one-on-one human-assisted oral reading

Difference: treatment mean minus control mean

(Measure is a process variable, not an educational outcome variable.)

Summary statistics used to derive effect size

Number of people providing effect size information

Length of time to code study: Uncertain.

The factoids study was conducted at Centennial Elementary, concurrent with the Fall 1999 Story choice experiment summarized on the previous pages.

Reading Tutor vs classroom instruction vs human- assisted oral reading, from Chapter 5

The yearlong comparison of the Reading Tutor to other reading instruction was conducted at Centennial Elementary, with classroom control and human tutors as the comparison conditions.

Comets and meteors, from Chapter 6

The comets and meteors study was conducted at Fort Pitt Elementary during the spring of 2000.

The limericks study was conducted at Fort Pitt Elementary during a summer reading clinic in July 2000.

Student-written stories from Fall 1999 at Centennial Elementary School.

Last names modified to protect students’ identities thus: “S.” or “S -”.
