Language Learning in the Full, or Why the Stimulus Might Not Be So Poor After All

Morten H. Christiansen
Philosophy-Neuroscience-Psychology Program
Department of Philosophy
Washington University in St. Louis
Campus Box 1073
St. Louis, MO 63130
morten@twinearth.wustl.edu

January 6, 1994

Abstract

Language acquisition is often said to require a massive innate body of language-specific knowledge in order to overcome the poverty of the stimulus. In this picture, language learning merely implies setting a number of parameters in an internal Universal Grammar. But is the primary linguistic evidence really so poor that it warrants such an extreme nativism? Is there no room for a more empiricist-oriented approach to language acquisition? In this paper, I argue against the extreme nativist position, discussing recent results from psycholinguistics and connectionist research on natural language. Specifically, I eschew the competence/performance distinction traditionally presupposed in linguistics, advocating a close relationship between the representation of grammar and the processing mechanism. Abandoning the infinite grammatical competence allows us to focus on regular languages. This significantly restricts the computational complexity that such models will face regarding both learning and processing. The learning task is further facilitated by an inversion of the perspective on the relation between natural language and the human learning mechanism, suggesting that language is an artifact heavily constrained by the human learning mechanism. This is supplemented by evidence concerning the way maturational development alleviates language learning. Construing natural language as a maturationally constrained artifact promises to overcome the predicament following a classic formal language learning proof; viz., that no interesting language can be learned from positive examples alone. Together, these arguments strongly suggest that some sort of non-trivial empiricist language learning is possible after all, because, given the right assumptions, the stimulus is not really that poor.

Introduction

Universal Grammar (UG), understood as a substantial innate endowment of language-specific knowledge, is widely regarded as being a necessary component in accounts of language learning. In this framework, language "learning" is simply a matter of setting a number of parameters (specific to a particular language) in an innate database consisting of universal language-specific principles. Chomsky, the principal architect of UG, has argued that when it comes to the explanation of language acquisition, the postulation of a universal set of linguistic principles (constraints or rules) is the only game in town. In other words, UG is held to be the best explanation of human linguistic behavior. That there are internal constraints on the acquisition of language is hardly controversial, but the nature and extent of these constraints is the focus of much debate. The main argument against empiricist approaches to language learning stems from observations about the poverty of the stimulus that language learners are exposed to. Without substantial constraints, it seems that the fragment of language available to a child is too limited and (perhaps) too degenerate to allow the induction of the appropriate grammar underlying a specific language. In addition, Gold's (1967) proof that context-free languages, and even regular languages, cannot be reliably learned from a finite set of positive examples has been taken as further evidence against empiricist approaches to language acquisition.
Yet, there are a number of holes in the arguments from the poverty of the stimulus to the extreme nativist position of UG. These lacunae center around (1) the appeal to a competence/performance distinction; (2) the perceived relation between language and the human language learning mechanism; (3) a static conception of the learning mechanism; and (4) an over-interpretation of Gold's results. Together, these problems pose a strong challenge to traditional Chomskyan linguistics regarding the psychological reality of its grammars. In this paper, I seek to advance this challenge, presenting arguments based on recent connectionist and psycholinguistic research.

However, before we start, a few clarifications are in order to avoid terminological confusion. In particular, we need to be clear about what is meant by 'grammar' and 'psychological reality'. Regarding the latter, Peacocke (1989) has suggested "that for a rule of grammar to be psychologically real for a given subject is for it to specify the information drawn upon by the relevant mechanisms or algorithms in that subject" (p. 114). Given the connectionist spirit of the present paper, we can come up with a more suitable notion of grammar and the psychological reality thereof by dropping the qualification of grammars as being essentially rule-based. Thus, a grammar consists of a body of information organized in such a way that it can account for a given set of linguistic data. Notice that the mode of organization is not predetermined, so the above notion of grammar can be applied in the discussion of both rule-based and connectionist natural language systems. Rephrasing Peacocke, I suggest that a grammar is psychologically real if it specifies the body of information which, when drawn upon by a particular mechanism or algorithm, is necessary for the causal explanation of a given collection of linguistic and psycholinguistic evidence.

In what follows, I start out by eschewing the distinction between competence and performance typically presupposed in linguistics, stressing performance data as the basis for models of natural language learning and processing. Abandoning the infinite grammatical competence allows us to focus on regular languages, which significantly limits the complexity that our models face. Next, the language learning task is further reduced by observing the way maturational constraints interact with language learning, the former strongly constraining the latter. Finally, suggestions are made that circumvent the predicament following Gold's formal language learning results without succumbing to the extreme nativism of UG. In the conclusion, I suggest that we can get much further than previously assumed using simple statistical and connectionist language models incorporating certain maturational constraints.

Abandoning the Distinction Between Competence and Performance

In modern linguistics, the paradigmatic method of obtaining data is through intuitive grammaticality judgements. However, it is a generally accepted fact that the greater the length and complexity of a particular utterance, the less sure people are in their judgement thereof. To explain this phenomenon, a distinction is made between an infinite linguistic competence and a limited performance. In contrast to the idealized grammatical competence, the performance of a particular individual is limited by memory limitations, attention span, lack of concentration, and so on. This methodological separation of the unbounded linguistic competence from the limited performance of observable natural language behavior has been strongly advocated by Chomsky:
    One common fallacy is to assume that if some experimental result provides counter-evidence to a theory of processing that includes a grammatical theory T and parsing procedure P( ), then it is T that is challenged and must be changed. The conclusion is particularly unreasonable in the light of the fact that in general there is independent (so-called "linguistic") evidence in support of T while there is no reason at all to believe that P is true. (Chomsky, 1981: p. 283)

The main methodological implication of this position is that it leads to what I have elsewhere called the 'Chomskyan paradox'.[1] On the one hand, the competence/performance distinction (C/PD) makes T immune to all empirical falsification, since any falsifying evidence can always be dismissed as a consequence of a false P. On the other hand, all grammatical theories nevertheless rely on grammaticality judgements that (indirectly, via processing) display our knowledge of language. Consequently, it seems paradoxical that only certain kinds of empirical material are accepted, i.e., grammaticality judgements, whereas other kinds are dismissed on what appear to be relatively arbitrary grounds. Indeed, Chomsky does not seem to care much about psycholinguistic results:

    In the real world of actual research on language, it would be fair to say, I think, that principles based on evidence derived from informant judgment have proved to be deeper and more revealing than those based on evidence derived from experiments on processing and the like, although the future may be different in this regard. (Chomsky, 1980: p. 200)

[1] For a more detailed discussion of this and related points, see Christiansen (1992).

In this light, the C/PD provides its proponents with a protective belt that surrounds their grammatical theories and makes them empirically impenetrable to psycholinguistic counter-evidence. As long as the C/PD is upheld, potentially falsifying psycholinguistic evidence can always be explained away by referring to performance errors. This is methodologically unsound insofar as linguists want to claim that their grammars have psychological reality. But it is clear that Chomsky (1986) finds that linguistic grammars are psychologically real when he says that the standpoint of generative grammar "is that of individual psychology" (p. 3). Nevertheless, by invoking the distinction between grammatical competence and observable natural language behavior, thus disallowing negative empirical testing, linguists cannot hope to find other than speculative (or what Chomsky calls 'independent linguistic') support for their theories. In other words, if linguistic theory is to warrant psychological claims, then the C/PD must be abandoned.

In contrast, a connectionist perspective on natural language promises to eschew the C/PD, since it is not possible to isolate a network's representations from its processing. The relation between a grammar, which has been acquired through training, and network processing is as direct as it can be. Instead of being a set of passive representations of declarative rules waiting to be manipulated by a central executive, a connectionist grammar is distributed over the network's memory as an ability to process language (Port & van Gelder, 1991). Notice also that although networks are generally "tailored" to fit the linguistic data, this does not simply imply that a network's failure to fit the data is passed onto the processing mechanism alone. Rather, when you tweak a network to fit a particular set of linguistic data, you are not only changing how it will process the data, but also what it will be able to learn.
That is, any architectural modifications will lead to a change in the overall constraints on a network, forcing it to adapt differently to the contingencies inherent in the data and, consequently, to the acquisition of a different grammar. Thus, since the representation of the grammar is an inseparable and active part of a network's processing, it is impossible to separate a connectionist model's competence from its performance.

It is, furthermore, worth noticing that performance, but not competence, can be described in terms of regular languages produced by finite-state machines. This substantially reduces the complexity of the processing involved and, subsequently, of the learning task, too. This avenue has been pursued by Elman (1991a) and Christiansen & Chater (1994) via connectionist simulations of various aspects of natural language performance. In particular, these models exhibit psychologically realistic behavior when faced with sentences involving center-embedding and cross-dependency. Moreover, as we shall see next, the close fit between the kind of grammar that can be learned and the network configuration is also likely to be characteristic of the human language learning mechanism.
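To make the finite-state point concrete, consider a toy sketch (my illustration, not taken from any of the cited models): unbounded center-embedding of the form a^n b^n demands a stack, but once embedding depth is capped, as human performance effectively caps it, the language becomes regular, and a machine whose only memory is the current depth suffices. The depth bound below is an assumed parameter chosen purely for illustration.

```python
# Toy acceptor for center-embeddings of bounded depth. "a" opens an
# embedding, "b" closes one. With the bound in place the machine needs
# only MAX_DEPTH + 1 states (the current depth), so the accepted
# language is regular; remove the bound and a stack becomes necessary.

MAX_DEPTH = 2  # hypothetical performance limit, for illustration only

def accepts(tokens):
    """True iff every 'a' is closed by a 'b' and depth stays bounded."""
    depth = 0  # the machine's entire memory: one of finitely many states
    for t in tokens:
        if t == "a":
            depth += 1
            if depth > MAX_DEPTH:
                return False  # deeper embedding than performance allows
        elif t == "b":
            if depth == 0:
                return False  # unmatched closing element
            depth -= 1
        else:
            return False  # unknown token
    return depth == 0  # all embeddings closed

print(accepts("aabb"))    # True: depth 2, within the bound
print(accepts("aaabbb"))  # False: depth 3, rejected like a human parser
```

The point is only illustrative: once recursion is bounded, the extra power of context-free grammars does no work, and both processing and learning can target regular languages.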
Language as a Maturationally Constrained Artifact

Cross-linguistic studies have revealed a number of universal patterns that can be found in all human languages. An example of such a universal pattern is the 'head parameter', which names the observation that the head is always positioned in the same way within phrases. For a 'head first' language, such as English, this means that the head always comes first in phrases; for instance, in verb phrases the verb is always first (as in "feeds the cat"). Together with the apparent poverty of the primary linguistic stimulus, this has been taken as evidence that the human language acquisition mechanism must be constrained in such a way that it induces only human languages. This seems to be a considerable feat, since the set of human languages is merely a small subset of the vast range of theoretically possible languages. The combination of innate constraints necessary for such a feat is therefore supposed to be substantial. Indeed, Chomsky has suggested "that in certain fundamental respects we do not really learn language; rather, grammar grows in the mind" (Chomsky, 1980: p. 134). Thus, UG is the proposed explanation of the prima facie surprising fact that humans only learn human languages, and not any of the many possible non-human languages.

However, I suggest that this is the wrong way to perceive the tight connection between language and the human learning mechanism. What we need is a Gestalt switch:[2] instead of saying that humans can only learn a small subset of a huge set of possible languages, we must invert our perspective, observing that natural languages exist only because humans can produce, learn and process them. Natural languages are human artifacts constrained by human learning and processing mechanisms. It is therefore not surprising that we, after all, are so good at learning them. Language is closely tailored just for human learning, not vice versa. Notice, moreover, that the "evolutionary rate" for cultural artifacts, such as language, is much faster than for a group of humans making up a specific speech community.[3] The artifacts are therefore much more likely to adapt to their human producers than the other way round. Languages that are hard for humans to learn simply die out or, more likely, do not come into existence at all. So, in short, the human learning mechanism determines a number of a priori restrictions on the possible human languages; and the set of the latter is a small fraction of the set of theoretically possible languages.

[2] Some of the ideas presented in this section were developed in discussion with Andy Clark.
[3] This was pointed out to me by Nick Chater.

That language has evolved in close relation to the development of the human language mechanism is a phylogenetic point, but it has ontogenetic plausibility, too. Based on evidence from studies of both first and second language learners, Newport (1990) has proposed a "Less is More" hypothesis which suggests "paradoxically, that the more limited abilities of children may provide an advantage for tasks (like language learning) which involve componential analysis" (p. 24). Maturationally imposed limitations in perception and memory force children to focus on certain parts of language depending on their stage of development. Interestingly, it turns out that these limitations make the learning task easier, because they help the children acquire the building blocks necessary for further language learning. In contrast, the superior processing abilities of adults prevent them from picking up the building blocks directly; rather, the building blocks have to be found using complex computations, making language learning more difficult (hence the notion of a critical period in language learning). This means that "because of age differences in perceptual and memorial abilities, young children and adults exposed to similar linguistic environments may nevertheless have very different internal data bases on which to perform linguistic analysis" (Newport, 1990: p. 26). In relation to morphology, Newport discusses whether a learner necessarily needs a priori knowledge akin to UG in order to segment language into the right units corresponding to morphemes. She finds that such a segmentation is, indeed, possible "even without advance knowledge of the morphology, if the units of perceptual segmentation are (at least sometimes) the morphemes which natural languages have developed" (p. 25). Given the above discussion of language as a human artifact, I contend that this point can be applied not only to morphology, but also to syntax. Consequently, some of the universal principles and parameters of UG, or perhaps even all of them, do not necessarily need to be specified innately, but could instead be mere artifacts of a learning mechanism undergoing maturational development. Whether this conjecture will hold is an empirical matter that only future research into the human language learning mechanism can settle. However, recent results from connectionist and statistical language learning do seem to point in the right direction.

In a number of connectionist simulations involving simple recurrent networks, Elman (1991b) showed that a network without any 'maturational' development cannot learn to respond appropriately to sentences derived from a small phrase grammar of some linguistic complexity. However, when Elman introduced constraints, decreasing over time, on the number of words in a sentence that the network was able to 'pay attention to', the network was able to learn the task. This work has recently been extended by Christiansen & Chater (1994), who trained a network to deal with a substantially more complex grammar using the same approach. This practical evidence supports the idea that maturational constraints (of some sort) on the learning mechanism allow it to pick up relatively complex linguistic structure without presupposing any innate language-specific knowledge.
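Elman's manipulation can be caricatured in a few dozen lines. The sketch below (a minimal simple recurrent network in Python/NumPy; the corpus, layer sizes, and schedule are all invented and far simpler than Elman's actual simulations) trains on next-word prediction while the context layer is wiped every `window` words, with the window widening in phases, a crude analogue of maturing memory.

```python
import numpy as np

# Minimal Elman-style simple recurrent network with a "starting small"
# regime: the context layer is reset every `window` words, and the
# window widens across training phases. All sizes and data are
# illustrative stand-ins, not Elman's (1991b) actual setup.

rng = np.random.default_rng(0)
VOCAB = ["boy", "girl", "sees", "runs", "."]
V, H = len(VOCAB), 8
IDX = {w: i for i, w in enumerate(VOCAB)}
corpus = ("boy sees . girl runs . boy runs . girl sees . " * 50).split()

W_xh = rng.normal(0, 0.5, (H, V))   # input   -> hidden
W_hh = rng.normal(0, 0.5, (H, H))   # context -> hidden
W_hy = rng.normal(0, 0.5, (V, H))   # hidden  -> output
lr = 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for window in (2, 4, 8):                 # maturational schedule
    for step in range(len(corpus) - 1):
        if step % window == 0:
            context = np.zeros(H)        # immature memory: wipe context
        x = np.zeros(V); x[IDX[corpus[step]]] = 1.0
        hidden = np.tanh(W_xh @ x + W_hh @ context)
        probs = softmax(W_hy @ hidden)
        target = np.zeros(V); target[IDX[corpus[step + 1]]] = 1.0
        # One step of cross-entropy gradient descent (backpropagation
        # truncated to length 1, enough for a sketch).
        d_out = probs - target
        d_hid = (W_hy.T @ d_out) * (1 - hidden ** 2)
        W_hy -= lr * np.outer(d_out, hidden)
        W_xh -= lr * np.outer(d_hid, x)
        W_hh -= lr * np.outer(d_hid, context)
        context = hidden                 # context units copy hidden layer

# After training, "boy" should be followed mostly by verb predictions:
# the grammar lives in the weights rather than in declarative rules.
x = np.zeros(V); x[IDX["boy"]] = 1.0
hidden = np.tanh(W_xh @ x + W_hh @ np.zeros(H))
print(dict(zip(VOCAB, softmax(W_hy @ hidden).round(2))))
```

The early small windows force the network to master short, clause-internal dependencies first, which is the computational content of Newport's "less is more".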
Still, it might be objected that these connectionist simulations only deal with small, artificially generated corpora and will therefore not be able to scale up to noisy real-world data. This might be true, but recent research in statistical language learning suggests otherwise. Finch & Chater (1994) demonstrated that simple statistics, similar to those the above networks are sensitive to, can filter out noise and induce lexical categories and constituent phrases from a 40 million word corpus extracted from Internet newsgroups. This positive outlook lends support to the idea of construing language as a maturationally constrained human artifact. In addition, the latter might, in turn, pave the way out of the predicament following Gold's learnability proof, as we shall see in the final section.
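Before moving on, the flavor of such distributional statistics is easy to demonstrate in miniature (my toy reconstruction; the corpus is invented and the method far cruder than Finch & Chater's): represent each word by counts of its immediate left and right neighbours, then compare words by the similarity of those context vectors.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy distributional induction of lexical categories: words are
# represented by their left/right neighbour counts and compared by
# cosine similarity. A miniature stand-in for Finch & Chater (1994);
# the corpus below is invented for illustration.

corpus = ("the cat sees a dog . the dog bites a cat . "
          "a boy sees the girl . the girl bites a dog .").split()

contexts = defaultdict(Counter)
for i, w in enumerate(corpus):
    if i > 0:
        contexts[w]["L:" + corpus[i - 1]] += 1   # left neighbour
    if i < len(corpus) - 1:
        contexts[w]["R:" + corpus[i + 1]] += 1   # right neighbour

def cosine(c1, c2):
    dot = sum(c1[k] * c2[k] for k in c1)
    norm = lambda c: sqrt(sum(v * v for v in c.values()))
    return dot / (norm(c1) * norm(c2))

words = sorted(contexts)
pairs = sorted(((cosine(contexts[a], contexts[b]), a, b)
                for a in words for b in words if a < b), reverse=True)
for sim, a, b in pairs[:5]:   # the most distributionally similar pairs
    print(f"{a:5s} {b:5s} {sim:.2f}")
```

Even at this scale the top-ranked pairs are noun with noun and determiner with determiner, a small-scale analogue of the lexical categories Finch & Chater recover from forty million words, with no categories built in.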
Formal Language Learning Results in the Limit

In a now classic paper in the (formal) language learning literature, Gold (1967) proved that not even regular languages can be learned in finite time from a finite set of positive examples. Gold was aware that his proof, combined with the lack of observed negative input found in primary linguistic data, leads to a predicament regarding human language learning. He therefore suggested that his finding must lead to at least one of the following three suppositions. Firstly, it is suggested that the learning mechanism is equipped with information allowing it to constrain the search space dramatically. In other words, innate knowledge will impose strong restrictions on exactly what kinds of grammars generate the proper projections from the input to (only) human languages. This is the approach which goes under the name of UG. Secondly, Gold proposes that children might receive negative input that we are simply not aware of. This would allow the correct projection to only human languages. However, see Pinker (1989) for an extensive discussion and subsequent dismissal of such a proposal (though the prediction task as applied in the previously mentioned language learning simulations by Christiansen & Chater [1994] and Elman [1991a, 1991b] might be construed as a kind of weak negative feedback). Thirdly, it could be the case that there are a priori restrictions on the way the training sequences are put together. For instance, the statistical distribution of words and sentence structures in a particular language could convey information about which sentences are acceptable and which are not (as suggested by, for instance, Finch & Chater, 1994). Regarding such an approach, Gold notes that distributional models are not suitable for this purpose because they lack sensitivity to the order of the training sequences.

So, it seems that prima facie UG is the only way to get language learning off the ground, even though learning has to take second place to innate knowledge. Nevertheless, given our earlier discussions it should be clear that this conclusion is far from inevitable. The way out of Gold's predicament without buying into UG can best be fleshed out by taking a closer look at the two basic assumptions on which the proof is based: "Given any language of the class and given any allowable training sequence for this language, the language will be identified in the limit [i.e., it is learnable in finite time from a finite set of examples]" (Gold, 1967: p. 449; my emphasis and comment).

First of all, Gold is considering all possible permutations from a finite alphabet (of words) into possible languages constrained by a certain language formalism (e.g., context-free or finite-state formalisms). Thus, he stresses that "identifiability (learnability) is a property of classes of languages, not of individual languages" (Gold, 1967: p. 450). This imposes a rather stringent restriction on candidate learning mechanisms, since they would have to be able to learn the whole class of languages that can be derived from the combination of an initial vocabulary and a given language formalism. Considering the above discussion of language as a human artifact, this seems like an unnecessarily strong requirement to impose on a candidate for the human language learning mechanism. In particular, the set of human languages is much smaller than the class of possible languages that can be derived given a certain language formalism. So, all we need to require from a candidate learning mechanism is that it can learn all (and only) human languages, not the whole class of possible languages derivable given a certain language formalism.

Secondly, Gold's proof presupposes that the finite set of examples from which the grammatical knowledge is to be induced can be composed in an arbitrary way. However, if the learning mechanism is not fixed but is undergoing significant changes in terms of what kinds of data it will be sensitive to (as discussed above), then we have a completely different scenario. Specifically, even though the order of the environmentally presented input that a learning mechanism is exposed to might be arbitrary, the composition of the effective training set is not. That is, maturational constraints on the learning mechanism will essentially reconfigure the input in such a way that the training sequence will always end up having the same effective configuration (and this is, in effect, comparable with Gold's third suggestion). Importantly, this is done without imposing any restrictions on the publicly available language, i.e., the language that the child is exposed to.
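To see how restricting the hypothesis class defuses the proof, consider a toy demonstration (my construction, not Gold's or the paper's): over a small finite hypothesis class, the naive strategy of always conjecturing the smallest language consistent with the positive data so far identifies every member of the class in the limit, exactly what Gold proved impossible for the unrestricted class of regular languages.

```python
# Identification in the limit over a *finite* hypothesis class.
# Gold (1967) rules this out for the full class of regular languages,
# but if the learner only ever entertains a small constrained class of
# "human-like" languages, minimal-consistent guessing converges. The
# class below is invented purely for illustration.

HYPOTHESES = {
    "L1": {"aa", "ab"},
    "L2": {"aa", "ab", "ba"},
    "L3": {"aa", "ab", "ba", "bb"},
}

def guess(observed):
    """Smallest language in the class consistent with the data so far."""
    consistent = [n for n, lang in HYPOTHESES.items() if observed <= lang]
    return min(consistent, key=lambda n: len(HYPOTHESES[n]))

# A "text" in Gold's sense: an enumeration of the target language L2.
text = ["aa", "ab", "ba", "ab", "ba", "aa"]
seen = set()
for sentence in text:
    seen.add(sentence)
    print(f"after {sorted(seen)}: guess {guess(seen)}")
# The guesses settle on L2 and never change again. With infinitely many
# nested hypotheses the same strategy can be forced to revise forever,
# which is exactly Gold's negative result.
```

Restricting the hypothesis class is precisely what construing language as a human artifact buys: the learner need only converge on the small set of humanly possible languages.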
Conclusion

In this paper, I have provided evidence and arguments against the extreme nativist approach to language learning that follows from Chomskyan UG. Instead, I have advocated the role of empiricist language learning in accounts of natural language acquisition. I think we can very plausibly entertain Newport's (1990) suggestion that "at least some of the constraints crucial to success in language acquisition are nonlinguistic, and that the maturational changes which lead to more difficulty in language learning occur in these nonlinguistic constraints on perception and memory" (p. 27). Given the rather empiricist spirit of connectionism, I am confident that future connectionist natural language research will push the field of language learning in a more empiricist direction than before. Nevertheless, this does not imply a return to tabula rasa models, but it does strongly suggest that the stimulus is not as poor as previously assumed. Consequently, UG is no longer the only game in town. Connectionist models provide a new way of recasting the old nativism/empiricism debate. This revived debate has already begun, but is far from finished.

Acknowledgements

Thanks especially to Nick Chater, but also to Dave Chalmers and Mark Rollins, for comments on a previous version of this paper. This research was made possible through a McDonnell Postdoctoral Fellowship.

References

Chomsky, N. (1980). Rules and Representations. New York: Columbia University Press.
Chomsky, N. (1981). Lectures on Government and Binding. Dordrecht: Foris Publications.
Chomsky, N. (1986). Knowledge of Language. New York: Praeger.
Christiansen, M. (1992). The (Non)Necessity of Recursion in Natural Language Processing. In Proceedings of the 14th Annual Conference of the Cognitive Science Society, pp. 665-670. Indiana University, Bloomington.
Christiansen, M. & Chater, N. (1994). Natural Language Recursion and Recurrent Neural Networks. PNP Tech Report, Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis.
Church, K. (1982). On Memory Limitations in Natural Language Processing. Bloomington, IN: Indiana University Linguistics Club.
Elman, J. L. (1991a). Distributed Representations, Simple Recurrent Networks, and Grammatical Structure. Machine Learning, 7, 195-225.
Elman, J. L. (1991b). Incremental Learning, or The Importance of Starting Small. In Proceedings of the 13th Annual Conference of the Cognitive Science Society, pp. 443-448. University of Chicago, Chicago.
Finch, S. & Chater, N. (1994). Learning Syntactic Categories: A Statistical Approach. In G. D. A. Brown & M. Oaksford (Eds.), Neurodynamics and Psychology. Academic Press.
Gold, E. M. (1967). Language Identification in the Limit. Information and Control, 10, 447-474.
Newport, E. (1990). Maturational Constraints on Language Learning. Cognitive Science, 14, 11-28.
Peacocke, C. (1989). When is a Grammar Psychologically Real? In A. George (Ed.), Reflections on Chomsky. Basil Blackwell.
Pinker, S. (1989). Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, MA: MIT Press.
Port, R. & van Gelder, T. (1991). Representing Aspects of Language. In Proceedings of the 13th Annual Conference of the Cognitive Science Society, pp. 487-492. Chicago, IL: Cognitive Science Society.
Pulman, S. G. (1986). Grammars, Parsers, and Memory Limitations. Language and Cognitive Processes, 2, 197-225.
