Sizing up the Hoosier Mental Lexicon: Measuring the Familiarity of 20,000 Words

[RESEARCH ON SPEECH PERCEPTION Progress Report No. 10 (1984) Indiana University]

Sizing up the Hoosier Mental Lexicon: Measuring the Familiarity of 20,000 Words*

Howard C. Nusbaum, David B. Pisoni, and Christopher K. Davis

Speech Research Laboratory
Department of Psychology
Indiana University
Bloomington, IN 47405

*This research was supported, in part, by NIH grant NS‐12179 to Indiana University. We thank Laura Manous for her assistance in testing subjects and Steven Greenspan and Paul Luce for helpful comments and suggestions.

Measuring Word Familiarity Nusbaum, Pisoni, & Davis

Abstract

The relative frequency of occurrence of words in language has been shown to have extremely powerful effects on perceptual and cognitive processes used in word recognition and lexical access. Although different theories of word perception account for the effects of word frequency with different mechanisms, all current theories assume that a mental representation of the experienced frequency of words exists in some form in the lexicon. However, except for published norms of word frequency counts, no data are available on the familiarity of words. Thus, it is quite important to measure the judged familiarity of words to develop a database of experienced frequency information that would give researchers more precise control of word frequency in perceptual and memory experiments. The present study collected familiarity ratings and response times for 20,000 words in the Merriam‐Webster Pocket Dictionary. Summary statistics on these data are reported. These findings support the claim that although rated word familiarity and frequency counts are related, they are by no means equivalent. This database of familiarity ratings therefore provides a more valid estimate of the psychological correlate of word frequency than computational analyses of corpora of text.
In addition, these familiarity data can be used to provide more direct estimates of the size of the mental lexicon than have been made by previous studies.

The statistical properties of language have been viewed, for quite some time, as important to understanding language and language use (e.g., Miller, 1951; Zipf, 1945). Of all the statistics that have been computed and investigated for linguistic structures, word frequency probably has had the most significant impact on cognitive psychology and psycholinguistics. This is best demonstrated by the observation made by White (1983) that the Thorndike and Lorge (1944) and Kucera and Francis (1967) compilations of word frequencies are among the most frequently cited references in the literature of cognitive psychology.

The reason for this interest in word frequency is quite simple and straightforward: Word frequency has been shown to be a highly reliable and significant factor in studies of visual and auditory perception and memory for linguistic materials. In word identification tasks, the visual threshold duration is lower for high frequency words than low frequency words (e.g., Goldiamond & Hawkins, 1958; Howes & Solomon, 1951; Newbigging, 1961; Solomon & Postman, 1952), the signal‐to‐noise ratios can be as much as 20 dB lower for spoken high frequency words (e.g., Savin, 1963), and spoken high frequency words can be identified with less waveform in the gating paradigm (Grosjean, 1980). In memory studies, high frequency words are recalled better than low frequency words (e.g., Deese, 1960; Sumby, 1963), although low frequency words are recognized better than high frequency words (e.g., Allen & Garten, 1968; Earhard, 1982; Glanzer & Bowles, 1976).
In addition, word frequency affects the speed of lexical decision (Landauer & Freedman, 1968; Rubenstein, Garfield, & Millikan, 1970; Scarborough, Cortese, & Scarborough, 1977). In general, then, high frequency words are responded to more accurately and faster than low frequency words.

Given that the frequency of occurrence of a word in language has a significant effect on the cognitive processing of that word, two questions naturally arise. First, how is word frequency represented in the mental lexicon? Second, through what mechanisms does word frequency operate to affect the processing of words? The major controversy in answering these questions has generally been viewed as a single issue of whether word frequency produces a response bias or a change in perceptual sensitivity (see Broadbent, 1967; Catlin, 1969; Nakatani, 1973).

However, despite this basic controversy, the most widely accepted view of word frequency is that the frequency of a word in language is somehow represented by a data structure that is canonical with a "counter in the head" that reflects the frequency of occurrence of any stimulus, including nonlinguistic materials (see Hasher & Zacks, 1984). Thus, this view asserts that word frequency effects are really not special to words, but are just a special case of a more general phenomenon related to stimulus and event frequency effects. The notion here is that each experienced occurrence of a word in language increments the counter for that particular word (cf. Scarborough, Cortese, & Scarborough, 1977). According to this view, published word frequency counts such as those reported by Kucera and Francis (1967), Thorndike and Lorge (1944), and Carroll, Davies, and Richman (1971), which are based on the frequency of occurrence of words in large corpora of text, provide a measure that is similar (in its derivation) to the mental counter for each word.
Therefore, choosing high and low frequency words for an experiment according to the published frequency counts should be a good method of manipulating the experienced frequency of words.

Although different mechanisms may be used to implement the "counter in the head" in various theories of word recognition, these mechanisms are still all based on the fundamental notion that experimental manipulations of word frequency are canonical with some mental representation of experienced frequency. For example, in Morton's (1969) Logogen model of word recognition, the word frequency counter is represented by the threshold of each logogen unit in the lexicon. Each logogen represents a word in the lexicon, and these logogens accumulate evidence that a given stimulus is a specific word (e.g., by counting features of the stimulus that match the logogen). When the accumulated evidence exceeds the threshold (as determined by word frequency), the word is recognized. Low frequency words have higher thresholds than high frequency words and therefore require more stimulus information for recognition. A different example of the "counter in the head" approach is Forster's (1978) model of word recognition in which words are ordered according to word frequency in a serial list in the lexicon. Recognition occurs by sequentially searching this list, starting with the highest frequency word, until a match is found. Thus, in Forster's model, high frequency words are recognized faster than low frequency words because they are encountered first in the search process.
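The two mechanisms just described can be contrasted in a toy sketch. The Python below is only an illustration under stated assumptions: the function names, the log-linear mapping from frequency to threshold, the evidence increment, and the three placeholder items with their counts are all hypothetical and are not taken from Morton (1969) or Forster (1978).

```python
import math

# Placeholder items with illustrative counts per million words
# (hypothetical values, not drawn from any published frequency norm).
TOY_COUNTS = {"HIGH": 1000, "MID": 100, "LOW": 10}


def logogen_threshold(count, base=10.0, slope=2.0):
    """Evidence needed for recognition; lower for more frequent words.

    The log-linear form and its parameters are assumptions for illustration.
    """
    return base - slope * math.log10(count)


def logogen_steps(word, counts, evidence_per_step=1.0):
    """Count evidence increments until the word's threshold is reached."""
    threshold = logogen_threshold(counts[word])
    evidence = 0.0
    steps = 0
    while evidence < threshold:
        evidence += evidence_per_step
        steps += 1
    return steps


def serial_search_steps(word, counts):
    """Position of the word in a list ordered from highest to lowest count."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return ordered.index(word) + 1


if __name__ == "__main__":
    for item in TOY_COUNTS:
        print(item, logogen_steps(item, TOY_COUNTS),
              serial_search_steps(item, TOY_COUNTS))
```

In both toy functions the more frequent item needs fewer steps, which is the sense in which a threshold account and a serial-search account yield the same ordering from frequency alone.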
Although these and other contemporary models of word recognition use different mechanisms to instantiate the effects of word frequency, they are all basically very similar because the mechanisms used to account for word frequency effects are still based on the same theoretical construct ‐‐ the view that word frequency can be indexed by counting occurrences of words in language. That is, they are all based on the notion of experienced frequency (cf. Eukel, 1980). However, the "counter in the head" analogy is not the only way to view word frequency. One alternative account of word frequency effects is that they reflect structural factors related to the organization of words in the lexicon. By this account, high frequency words are distinguished from low frequency words by the distribution and arrangement of their constituent letters or phonemes (Landauer & Streeter, 1973; Eukel, 1980). If the phonotactic structure of words is related to the relative frequency of use of those words, it would be unnecessary to postulate a counter in the head to account for frequency effects. Moreover, it would be unnecessary to assume that "experienced frequency" is the major factor responsible for the observed effects of word frequency. Instead, frequency effects could be a direct reflection of the structural properties of each word, and perhaps the structural properties of words in relation to other similar words in the lexicon. In order to accurately test predictions derived from the counter in the head hypothesis and the phonotactic structure hypothesis, it is necessary to have ways of designing sets of stimuli that differ on a number of dimensions including experienced frequency. In previous research (e.g., Landauer & Streeter, 1973; Scarborough et al., 1977), the assumption was made that published frequency counts from corpora of text would provide a reasonable estimate of experienced frequency. 
More recently, however, this assumption has been questioned by Gernsbacher (1984), who demonstrated that familiarity ratings elicited from subjects were more accurate predictors of performance in a variety of tasks than published estimates of word frequency based on corpora of text. Furthermore, by using familiarity estimates instead of frequency counts, Gernsbacher was able to resolve several apparently inconsistent findings in visual word recognition research. She claimed, for example, that much of the discrepancy between frequency counts and familiarity judgements could be attributed to sampling errors on low frequency words (Carroll, 1967, 1970). As many researchers know, not all low frequency words are unfamiliar words. For example, in the Francis and Kucera (1982) word frequency count, "violin" occurs only 13 times in one million words, so that by almost any criterion this would be considered a low frequency word and thus harder to process. However, it seems unlikely that "violin" would be either unknown or unfamiliar to most people, particularly the average undergraduate subject. This disparity between frequency count information and judged familiarity suggests the importance of understanding more precisely the relationship between objective counts of word frequency based on analyses of texts and subjective estimates of word familiarity elicited from subjects (see Gernsbacher, 1984).

Previous attempts to measure familiarity or subjective estimates of word frequency have used magnitude estimation (Carroll, 1971; Shapiro, 1969) or rating scales (Gernsbacher, 1984). However, these studies have only measured word familiarity for a very small and highly restricted set of stimuli. To take one example, Gernsbacher (1984) used only 175 words in her study.
Thus, the generality of the conclusions about the relationship between rated familiarity and word frequency can be questioned based on the size of the sample of words and the relation of this sample to the population (i.e., the mental lexicon). In addition, familiarity ratings for very small sets of words are probably insufficient to serve as the kind of research tool that is provided by published word frequency counts. Thus, there is clearly a need for a large corpus of word familiarity estimates that would serve the function provided by frequency counts for so many years. A large database of familiarity ratings would therefore permit researchers to select words for experiments according to a wide range of objective criteria, including phonotactic structure, stress location, graphemic structure, etc. In collecting familiarity ratings for a very large number of words, another issue becomes apparent. In the most extreme case, collecting familiarity ratings for every word (all forms and inflections) that has ever been used in language would be tantamount to investigating directly the entire mental lexicon: Words that are rated as familiar must be part of the lexicon, while words that are unknown must be outside the lexicon. Of course, it is probably impossible to test the entire set of words to see if they are present or absent from the lexicon because the lexicon is constantly changing as new words are added. In this connection, it is interesting to note that previous attempts to measure the size of the mental lexicon have used extremely small samples of words (see Anderson & Freebody, 1981, for a review). In general, the procedure that has been used to investigate the size of the mental lexicon has been to sample a small set of words from a dictionary and ask subjects whether the words are known or unknown. 
Although several variations on this approach have been used, the differences in methodology are minor compared to the common assumption that the size of the lexicon can be determined by simply multiplying the proportion of the sample set that is known to subjects by the size of the original dictionary from which the sample was derived. The result is a number that purportedly indicates how many words in the dictionary would be known by subjects. However, this approach has yielded wildly divergent estimates of the size of the adult lexicon. These estimates range from a low estimate of 15,000 words to a high value of 200,000 words (see Table 1).

One problem with this approach is that the estimates of the size of the lexicon are entirely dependent on the size of the dictionary from which the test words were originally sampled (Hartman, 1941; Lorge & Chall, 1963). It is certainly the case that the larger the dictionary is in relation to the sample, the less representative the sample will be of the entire dictionary. Furthermore, it is probably also true that smaller dictionaries are contained within larger dictionaries. For example, a sample drawn from the 240,000 words of Webster's Third New International Dictionary would almost certainly be contained within the Oxford English Dictionary. But the estimate of lexicon size based on the Oxford English Dictionary would be twice as large as an estimate based on Webster's for the same sample of test words. These observations call into question the basic approach of estimating lexicon size from a very small sample of words. Indeed, it may never be clear how to compute the basic population size of the lexicon. One dictionary may contain the same sample as another and yet be much larger, because it either contains more of the average working lexicon, or because it contains highly unusual technical words and words that are no longer in common use.
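The dependence on dictionary size follows directly from the arithmetic of the sampling estimate. The sketch below is a minimal illustration, not a procedure from any of the cited studies: the function name and the proportion of known words (0.5) are hypothetical, 240,000 is the Webster's Third figure quoted in the text, and the larger value simply stands in for a dictionary twice that size.

```python
def estimate_lexicon_size(proportion_known, dictionary_size):
    """Sampling estimate: proportion of sampled words known x dictionary size."""
    if not 0.0 <= proportion_known <= 1.0:
        raise ValueError("proportion_known must lie between 0 and 1")
    return proportion_known * dictionary_size


# Hypothetical proportion of a test sample that a subject reports knowing.
proportion_known = 0.5

# 240,000 is the dictionary size quoted in the text; the second value is
# an assumed dictionary twice as large that contains the same test sample.
small_dictionary = 240_000
large_dictionary = 2 * small_dictionary

small_estimate = estimate_lexicon_size(proportion_known, small_dictionary)
large_estimate = estimate_lexicon_size(proportion_known, large_dictionary)

if __name__ == "__main__":
    print(small_estimate, large_estimate)
```

With the same sample and the same responses, the estimate doubles when the reference dictionary doubles, which is why the choice of dictionary alone can produce divergent estimates.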
Table 1

Previous estimates of the size of the adult mental lexicon (from Anderson & Freebody, 1981).

Source                          Size
Seashore (1933)                 15,000
Kirkpatrick (1907)              19,000
Seashore & Eckerson (1940)      60,000
Gerlach (1917)                  85,300
Gillette (1927)                127,800
Hartman (1946)                 200,000

An alternative approach to estimating the size of the mental lexicon is to use an extremely large sample of words. As the sample size increases relative to the size of the lexicon, the accuracy of estimation should increase as well. Thus, by measuring familiarity for a very large sample of words, we can also compute an estimate of lexicon size that is based on a much larger sample of words than has been used previously.

Of course, there are still several problems involved in computing the size of the mental lexicon. One of these problems is the basic issue of what constitutes a word (see Anderson & Freebody, 1981; Butterworth, 1983; Nagy & Anderson, 1984). While irregular inflections may be represented mentally as separate entries in the lexicon (e.g., "go" and "went"), it is as yet quite unclear whether regular inflections are stored separately (e.g., the Full Listing Hypothesis, Butterworth, 1983), or whether they are stored as some base lexical form plus inflection rules (e.g., Berko, 1958; Chomsky & Halle, 1968). For this reason, past attempts to measure the size of the lexicon have separated words into two disjoint sets ‐‐ base forms and derived forms.

Another related question arises when we consider what it means to "know a word" (Anderson & Freebody, 1981). Is it sufficient to be able to recognize the orthographic form of a word without knowing how it is pronounced or what it means? Or, is it sufficient to believe the meaning is known, even if the stored meaning is incorrect? Alternatively, a person may know only one of the many meanings or uses for a word and thus have only partial knowledge of the word.
Clearly then, at this point, there is no strong consensus as to what it means to have a word represented in the mental lexicon (see Butterworth, 1980; Miller, 1978). Nonetheless, it is possible to make certain assumptions that do not seem unreasonable as a first attempt to solve some of these problems, at least for the purpose of computing an initial estimate of lexicon size. We can assume that if a subject believes that he or she knows the meaning of a word, that word has some representation in the lexicon. Even if the meaning is incorrect, or only represents one of many possible meanings, a minimum criterion of a single meaning indicates the existence of some lexical representation for a word. Thus, it is not as important to have a subject define a word correctly as it is to find out whether or not the subject believes the meaning is available. In the worst case, this will overestimate the size of the lexicon by attributing lexical knowledge to a subject that is not available. Also, if the subjects are given an opportunity to classify words that are familiar (i.e., words with known meanings) differently from words that are simply recognized (i.e., the stimulus pattern is identifiable as a word, but the meaning is unknown), it becomes possible, at least in principle, to differentiate to some extent levels of lexical knowledge (e.g., episodic recognition of a word pattern may be differentiated from the semantic representation of a word).

The present study reports the results of an investigation of the familiarity of 19,750 words. In carrying out this study, we had three basic goals: First, we wanted to determine the familiarity of a very large sample of words; second, we wanted to measure the degree to which frequency counts and familiarity ratings are related for these words, and third, we wanted to compute a rough estimate of the size of the mental lexicon without having to generalize from a very small sample of words to a much larger population.

Method

Subjects.
The subjects were 600 undergraduate students at Indiana University who participated as partial fulfillment of a course requirement. All of the subjects were native speakers of English and reported having no history of any speech, hearing, or language disorder. In addition, all subjects had normal or corrected‐to‐normal vision.

Stimuli. The stimuli used in this experiment were 19,750 words taken from the intersection of Merriam‐Webster's Pocket Dictionary and Webster's Seventh Collegiate Dictionary. This set of stimuli was divided into 50 subsets, each containing 395 words. The subsets were created by stratified random sampling according to two criteria: word frequency and word length. Of the 19,750 words, only 11,750 words were in the Kucera and Francis (1967) text database. A frequency count was determined for each of these words. Stimulus subsets were created that were matched on the mean frequency for the 235 words in each set with frequency counts. We excluded the 17 highest frequency words (i.e., the, of, and, in, to, a, is, was, he, for, it, with, as, his, on, be, at) from the stimulus sets because their frequencies were so high that they would greatly unbalance the frequency means of any of the subsets and because these words would clearly be familiar to every adult speaker of English.

The mean frequency of the entire stimulus set was 40.8 occurrences per million words of text; the lowest frequency word had a count of 1 per million and the highest frequency word had a count of 5305 per million. Of the 11,750 highest frequency words in the Kucera and Francis (1967) database, 44.9% were in our sample of words with frequency counts. The mean frequency of this set was 86.0 occurrences per million; the lowest frequency word in this set (the intersection of the 11,750 most frequent words with the 11,750 words in the dictionary) was 6 per million.
Thus, the words used in the experiment that were not in the 11,750 most frequent words were somewhat lower in frequency (mean = 4.0, lowest frequency = 1, highest frequency = 5297) than the words that were not used from this set of 11,750 most frequent words (mean = 36.4, lowest frequency = 6, highest frequency = 9489). The remaining 8,000 words (without frequency counts) were then divided among the 50 stimulus sets so that the mean length of these words was matched across sets. The mean word length of the 50 stimulus subsets ranged from a low of 7.2 letters per word to a high of 7.6 letters per word.

Procedure. Experimental sessions were conducted with small groups of six subjects each. Each group was tested on only one of the 50 word sets. The subjects were instructed to rate the familiarity of each of 395 words, presented one at a time. A seven‐point rating scale was used in which a rating of one indicated that the word was unknown and a rating of seven indicated that the word was familiar and its meaning was well known. A rating of four indicated that the stimulus was definitely recognized as a word, but its meaning was unknown. The ratings two, three, five, and six were used to indicate variations between these response categories. For example, a rating of three indicated that the subject might have seen the word before, while a rating of five indicated that the subject recognized the word, but had only the vaguest notion of its meaning. Subjects indicated word familiarity on each trial by pressing one of the appropriately labeled buttons on a computer‐controlled response box. For half of the subjects, the buttons were labeled from one to seven, where one was assigned to the leftmost button and seven was assigned to the rightmost button. For the other half of the subjects, the assignment of rating scale to buttons was reversed. Each word set was presented to two groups of subjects (12 subjects total).
A different random order of presentation was used for each group. One of the groups received one response button order and the other group received the reverse order. On each trial, the word "READY" was presented on a CRT display for 500 msec before the test word. After the "READY" prompt was turned off, a single test word was presented in upper case for a maximum of six seconds. If all the subjects responded before the six‐second time limit, the test word was turned off, and a new trial was initiated. If one or more subjects failed to respond, a new trial began after the six‐second period was finished. The subjects were encouraged to respond to every word. Ratings and response latencies were collected on every trial.

Results

To examine the relationship between word frequency and familiarity, Pearson product‐moment correlations were computed for those words that were in the Kucera and Francis (1967) corpus of text. The correlation between the mean ratings (averaged over 12 subjects) and the log‐transformed frequency was .43 for 11,750 words (p
