How to Use Corpora in Language Teaching

Studies in Corpus Linguistics

Studies in Corpus Linguistics aims to provide insights into the way a corpus can be used, the type of findings that can be obtained, the possible applications of these findings as well as the theoretical changes that corpus work can bring into linguistics and language engineering. The main concern of SCL is to present findings based on, or related to, the cumulative effect of naturally occurring language and on the interpretation of frequency and distributional data.

General Editor: Elena Tognini-Bonelli
Consulting Editor: Wolfgang Teubert

Advisory Board
Michael Barlow, Rice University, Houston
Robert de Beaugrande, Federal University of Minas Gerais
Douglas Biber, North Arizona University
Chris Butler, University of Wales, Swansea
Sylviane Granger, University of Louvain
M A K Halliday, University of Sydney
Stig Johansson, Oslo University
Susan Hunston, University of Birmingham
Graeme Kennedy, Victoria University of Wellington
Geoffrey Leech, University of Lancaster
Anna Mauranen, University of Tampere
John Sinclair, University of Birmingham
Piet van Sterkenburg, Institute for Dutch Lexicology, Leiden
Michael Stubbs, University of Trier
Jan Svartvik, University of Lund
H-Z Yang, Jiao Tong University, Shanghai

Volume 12
How to Use Corpora in Language Teaching
Edited by John McH Sinclair

John Benjamins Publishing Company
Amsterdam/Philadelphia

The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Cover design: Françoise Berserik
Cover illustration from original painting Random Order by Lorenzo Pezzatini, Florence, 1996.

Library of Congress Cataloging-in-Publication Data
How to use corpora in language teaching / edited by John McH Sinclair.
p. cm. (Studies in Corpus Linguistics, ISSN 1388–0373 ; v. 12)
Includes
bibliographical references and indexes.
Language and languages. Computer-assisted instruction. I. Sinclair, John McHardy, 1933- II. Series.
P53.28 H69 2004
418’.00285-dc22
ISBN 90 272 2282 (Eur.) / 58811 490 (US) (Hb; alk. paper)
ISBN 90 272 2283 (Eur.) / 58811 491 (US) (Pb; alk. paper)
2003067697

© 2004 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Table of contents

List of contributors
Introduction, John Sinclair

The corpus and the teacher
  In the classroom
    Corpora in the classroom: An overview and some reflections on future developments, Silvia Bernardini
  In preparation
    What teachers have always wanted to know – and how corpora can help, Amy B M Tsui
  Corpus variety
    Corpus linguistics, language variation, and language teaching, Susan Conrad

Resources – Corpora
  Spoken – general
    Spoken corpus for an ordinary learner, Anna Mauranen
  Spoken – an example
    The use of concordancing in the teaching of Portuguese, Luísa Alice Santos Pereira
  Learner corpora
    Learner corpora and their potential for language teaching, Nadja Nesselhauf

Research
  Composition
    The use of adverbial connectors in Hungarian university students’ argumentative essays, Gyula Tankó
  Textbooks
    A corpus-driven approach to modal auxiliaries and their didactics, Ute Römer

Resources – Computing
  Basic processing
    Software for corpus access and analysis, Michael Barlow
  Programming
    Simple Perl programming for corpus work, Pernilla Danielsson
  Network
    Learner oral corpora and network-based language teaching: Scope and foundations, Pascual Pérez-Paredes

Prospects
    New evidence, new priorities, new attitudes, John Sinclair

Notes on contributors
Index

List of contributors

Silvia Bernardini
SSLMIT, University of Bologna, Corso della Repubblica 136, 47100 Forlì, Italy

Amy B M Tsui
Chair Professor, Faculty of Education, The University of Hong Kong, Pokfulam Road, Hong Kong SAR

Susan Conrad
Department of Applied Linguistics, PO Box 751, Portland State University, Portland OR 97202-0751, USA

Anna Mauranen
Professor of English, Head of School, School of Modern Languages and Translation Studies, FIN-33014 University of Tampere, Finland

Luísa Alice Santos Pereira
Centro de Linguística da Universidade de Lisboa, Av Prof Gama Pinto, 1649-003 Lisboa, Portugal

Nadja Nesselhauf
English Department, University of Basel, Nadelberg, 4051 Basel, Switzerland

Gyula Tankó
Assistant Lecturer, Department of English Applied Linguistics, Eötvös Loránd University, Ajtósi Dürer sor 19-21, 1146 Budapest, Hungary

Ute Römer
English Department, University of Hanover, Königsworther Platz, 30167 Hannover, Germany

Michael Barlow
Department of Applied Language Studies and Linguistics, The University of Auckland, Fischer Building, 18 Waterloo Crescent, Auckland, New Zealand

Pernilla Danielsson
Centre for Corpus Research, School of Humanities, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK

Pascual Pérez-Paredes
Departamento de Filología Inglesa, Campus de la Merced, Universidad de Murcia, 30071 Murcia, Spain

John Sinclair
via Pandolfini 27, 50122 Firenze, Italy

Introduction
John Sinclair

Substantial collections of language texts in electronic form have been available to scholars for almost forty years, and they offer a view of language structure that has not been available before. While much of it confirms and deepens our knowledge of the way language works, there is also a fascinating area of novelty and unexpectedness – ways of making
meaning that have not previously been taken seriously. Further, in studying corpora we observe a stream of creative energy that is awesome in its wide applicability, its subtlety and its flexibility.

This cornucopia has not been welcomed with open arms, neither by the research community nor the language teaching profession. It has been kept waiting in the wings, and only in the last few years has any serious attention been paid to it by those who consider themselves to be applied linguists. For a quarter of a century, corpus evidence was ignored, spurned and talked out of relevance, until its importance became just too obvious for it to be kept out in the cold.

The reasons for this neglect of vital information need not detain us long. Just as the first electronic corpora were taking shape in the early nineteen-sixties,¹ the focus of linguistic theory was shifting from the study of empirical data to the study of the mental processes that together are often called the language faculty. This approach preoccupied most linguists until recently, and may still be the dominant paradigm world-wide. After a few awkward attempts at the application of mentalist theory to language teaching, its relevance was generally accepted as minimal, and so a gap opened up between the theory of language and the teaching of languages, to the great detriment of the teaching profession. Applied linguists, whose jobs were originally designed to mediate between theory and practice, took on the additional burden of providing quasi-theoretical underpinning for the linguistic side of language pedagogy, but their descriptions were not detailed enough to provide a firm foundation.

New evidence, new priorities, new attitudes
John Sinclair

In the teaching/learning process this does not need to be a big problem. Using a corpus will for some years to come be a voyage of discovery at every level of education – the student, the teacher, the class, the institution, the educational
authority, the curriculum planners, the publishers. As new patterns and relationships emerge, they can be referred to quite informally and provisionally.

In the longer term the organisation of meaning may not become much more fixed, because of the opportunity that a speaker has in every utterance of developing the meaning of an item by manipulating the cotext. We may have to talk in rather general categories, which is not attractive to teachers who want a firm foundation for their statements.

Once again, the much-maligned feature of frequency of occurrence can be used to define levels of granularity in description. Most of the apparent problems in the classification of meaning are in the less common events, and for patterns which are often repeated the computer can find hard evidence, and that evidence, as the pattern grammars begin to show, is often semantically homogeneous.

Lexicographers distinguish between intensive and extensive definition. The former is the usual kind, assigning a word to a superordinate and adding a feature that distinguishes it from its co-hyponyms. Extensive definition, on the other hand, simply enumerates all the items that may be named by the word. So “A liquid is a substance which flows ...” (intensive), but “A substance is a solid, powder, liquid or gas ...” (extensive) (Cobuild 1987). The enumeration of the most frequent members of a class is a simple and effective way of characterising the class, and it is an accurate definition at a specified level of granularity; once we get into fine detail, of course, the lists get too long to be manageable and something more like intensive definition is needed. But that is some way ahead, and in our present discovery mode the extensive definition is quite adequate.

There are many ways of working this kind of exploration into teaching material. Many English word forms can occur both as nouns and as verbs, sometimes with almost the same meanings, and sometimes with subtle differences, such as profit and combat. You
may come across several lexical items all of which suggest doom and gloom, but perhaps of different kinds. Many words have a range of meanings often described as ranging from “literal” to “figurative”. Such factors suggest that useful groupings might be made on an informal basis, and “supersets” (§7) formed. This can be the basis for discussion of similarity and difference in meaning, comparisons of overlapping lists of realisations, and insights into the nature of the vocabulary.

The importance of becoming aware of these semantic groupings and characterising them cannot be overestimated, because these are the operational parameters that eventually determine the appropriacy of a phrasing; a keen understanding of them leads to sensitive interpretation when reading and listening, and guides effective communication in speech and writing. Without access to a corpus and some simple tools for exploring it, this appreciation of the semantic preferences and prosodies can only be acquired through inductive learning over a long period; current reference books give guidance only on the extremely striking and obvious semantic parameters.

Remember that this is the practical side of meaning, and not the abstract classification of thesauruses (including visual ones), lexicons, ontologies, semantic webs, wordnets etc. These deal with another aspect of meaning, one which has little to do with the deployment of words in texts, and consequently is of little use in applications to textual analysis.

Incompleteness IV

Finally we come to the question of the gaps that must exist in any description, and we must try to make a description whose gaps are in places that are not very important to the intended application – in this case language teaching and learning. As §8 points out, an adequate description for all purposes should account for all observed patterns except the one-offs. This
target in turn depends on the size and design of the corpus, because if a corpus doubles in size then we can reasonably expect that some repetitions of the original single occurrences will be found, and lots of new one-offs as well.

The “granularity” model based ultimately on frequency of occurrence is a good basis for dealing with this issue. Something that occurs a thousand times is likely to be of more use to a learner than something that just occurs a few times. This is an extension of the principle of §8, but a justifiable one.

One way of looking at a corpus is as a repository of “used language” (Brazil 1995). The main repetitive lines of its holdings probably reflect fairly closely what is readily available to an average user – his or her stored knowledge of the language. A learner, presumably, has a much smaller, more patchy and less reliable store, and by becoming familiar with the corpus may be able to assimilate quite a lot. §9 claims that there are two ways by which we understand text; one is by referring to our stored knowledge and the other by interpreting those portions of a text which are not explained by the stored knowledge. It follows that a learner needs to develop strategies of interpretation (or adapt them from first language proficiency), and use this experience to feed and expand the store.

Concordances are ideal material for developing interpretive strategies, as Bernardini (this volume) points out. Returning to our example of fire for the last time, let us pick out what might appear to be a strange collocation, requiring some interpretation. This is the phrase friendly fire. If it was from the “burning” set of meanings rather than the “shooting” ones, then it might mean the warm look of a bright fire on a cold night; if it was from the shooting set, then it could mean supportive fire from colleagues in the area. Unfortunately it does not, and we can see from a tiny but unbiased selection:

 the casualties came from misdirected  ‘friendly fire’ from a Russian helicopter gunship
  giving details of what are known as  friendly-fire incidents in which the Americans
determine whether those men died from  friendly fire, a phenomenon which he said
            The men were killed by US  friendly fire They died when an American
        that has been causing so many  ‘friendly fire’ casualties

With slightly wider cotexts, each of these instances makes it clear what the phrase means, e.g. (the fourth line above):

The men were killed by US friendly fire. They died when an American aircraft fired on two British armored personnel carriers by mistake.

The collocational profile of this phrase confirms the interpretation:

by deaths during caused killed were from died casualties victims American British incident Gulf marine hit so-called US tragedy been

The whole story is there; the presence of so-called as a prominent collocate suggests that the latent irony in the phrase is still vivid for many people, though it is used as a technical term in military reports and discussions. An examination of this kind is very likely to add friendly fire to the stored knowledge of the learners, along with its tragic prosody and its provenance of warfare.

Conclusion

I should reiterate that even the biggest corpora available are still small, especially for retrieving information about multi-word lexical items, which more and more are becoming the focus of research. The software available to someone who is not a computer specialist is simple and not very flexible, and sometimes it maddeningly refuses to do the obvious. We can certainly look forward to having much more powerful tools and almost limitless text to try them out on – the Web itself is now being examined by search engines looking for concordances and collocations, and that, while still rather a jungle compared with a reasonably
tidy corpus, is another huge source of language that is available in the classroom or the study at home.

To summarise the results of this investigation, we argued that ambiguity is not an important property of language but merely the result of bad theories; most of it can be dispelled by widening the cotext, and that gives rise to a whole panoply of activities that can derive from a corpus, large or small. On the other hand, variation is a major and essential property of language, but it can be controlled easily using the information provided by the corpus, and need not confuse the learner nor suggest that language is more complicated than it is. The problem of terminology will not be solved in a short article, but the case is made for introducing a less misleading set of terms, which are easily coined, and for moving gradually from one to the other. New terms, which will be needed in large numbers once the new information begins to be codified, can be extensively defined, leaving them open-ended and flexible. Corpus information can be used to control the inevitable gaps in the description, by introducing the notion of granularity into description and starting from the most clearly outlined and most often used patterns.

All the work suggested here can be merged with traditional models of language and language-teaching; the emphasis is different, and gradually we can expect the interests of the students to shift into new areas, but there is nothing revolutionary in my proposals as offered here. As time goes on there will be alternative theoretical positions formulated, and new descriptions begun, but for the present any teacher or student can readily enter the world of the corpus and make the language useful in learning.

Notes

At the time of finalising this paper, The Bank of English consisted of around 450 million words of contemporary English, last updated in 2002, and covering most major types of text, drawn principally from British English but with a substantial
representation of other native-speaker varieties, particularly transatlantic and antipodean English. The corpus contains large amounts of transcribed spoken material. The Bank of English is jointly owned by The University of Birmingham and HarperCollins plc., and can be accessed from the website http://www.cobuild.collins.co.uk. I am grateful for continued access to this unique resource.

Susan Conrad’s paper in this volume deals with variations in usage that are associated primarily with register, or language suited to the occasion of its utterance or dissemination, so I will leave that area to her, and concentrate on the variability that does not entail contrasting varieties.

I am using canonical form to mean the most explicit, full and unambiguous presentation of a lexical item that can be achieved.

Small caps indicates that the word form cited stands for the whole lemma, that is all the usual inflected forms of the word. So SAVE is short for save, saves, saving and saved.

To this I would add expect, and there may be one or two other contenders.

The consequences of this faulty view of language structure go well beyond a few teaching problems; it is now standard opinion in Information Technology that language is quite inadequately structured, especially semantically, and requires the erection of a massive external edifice to “code” the meanings and usages. These are called “lexicons” or “wordnets”, and have consumed huge resources over the last fifteen years. None so far seems to work very well.

In a fully formal grammar there are devices that allow a finite system to match any number of patterns – for example Chomsky’s well-known recursive rules (1965: 6). Although indefinitely large, such a list of sentences will never be complete.

On other occasions it could of course be the single occurrences which are the object of enquiry.

Chomsky’s early phrase structure grammar used
such a device, where the passive was formed by adding (be + en) to the verb phrase structure, and a later transformational rule transferred the “en” to the end of the following word.

I had forgotten where this reference came from, but such is the wonder of e-mail that in a few hours I was reminded of it, courtesy of Anthony Chenells and Bill Louw of Zimbabwe.

The instance is arrived at by choosing all the lines containing the next most numerous collocate and repeating that process until there is only one left. This routine is automated in the program C-lect, and is very useful whenever you want to isolate a very typical example of a pattern.

Halliday (2002) makes the same point with reference to modality.

References

Aijmer, K. (2002). English Discourse Particles [Studies in Corpus Linguistics 10]. Amsterdam: John Benjamins.
Barnbrook, G. & J. Sinclair (2001). Specialised corpus, local and functional grammars. In M. Ghadessy, A. Henry, & R. Roseberry (Eds.), Small Corpus Studies and ELT [Studies in Corpus Linguistics 5] (pp. 237–276). Amsterdam: John Benjamins.
Brazil, D. (1995). A Grammar of Speech. Oxford: Oxford University Press.
Carter, R. (1987). Vocabulary. London: Allen and Unwin.
Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: The MIT Press.
Cobuild (1987). Collins Cobuild English Language Dictionary, ed. J. Sinclair, P. Hanks et al. 2nd edition 1995. London: HarperCollins.
Cobuild (2002). A Dictionary of Idioms. Glasgow: HarperCollins.
Francis, G., S. Hunston, & E. Manning (1996). Grammar Patterns 1: Verbs. London: HarperCollins.
Halliday, M. A. K. (2002). “Judge takes no cap in mid-sentence”: On the complementarity of grammar and lexis. University of Birmingham, Department of English.
Orr, J. (1939). On Homonymics. Studies in French Language and Mediaeval Literature presented to Professor Mildred K Pope, 253–297. Manchester.
Palmer, F. R. (Ed.) (1968). Selected Papers of J R Firth 1952–1959. London: Longman.
Partington, A. (1998). Patterns and Meanings. Using Corpora for English Language Research and Teaching [Studies in Corpus Linguistics 2]. Amsterdam: John Benjamins.
de Saussure, F. (1917). Cours de Linguistique Générale. Paris: Payot.
Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Sinclair, J. (1998). The lexical item. In E. Weigand (Ed.), Contrastive Lexical Semantics [Current Issues in Linguistic Theory 17] (pp. 1–24). Amsterdam: John Benjamins. Reprinted in Sinclair (2004b).
Sinclair, J. (2001). Lexical grammar. In M. Gellerstam, K. Jóhanesson, B. Ralph, & L. Rogström (Eds.), Nordiska Studier I Lexicografi 5, Proceedings of the Fifth Conference on Nordic Lexicography, Göteborg 27th–29th May 1999. Göteborg, Skrifter utgivna as Nordiska förenginen för lexikografia. Meijerbergs Arkiv för Svensk Ordforskning 27. Göteborg, Meijerbergs Institut för Svensk Etymologisk Forskning (pp. 323–343). Göteborg: Göteborgs Universitet. Reprinted in Sinclair (2004b).
Sinclair, J. (2003). Reading Concordances. London: Longman.
Sinclair, J. (2004a). Intuition and annotation – the discussion continues. In B. Altenberg & K. Aijmer (Eds.), Proceedings of the 23rd ICAME Conference. Amsterdam: Rodopi.
Sinclair, J. (2004b). Trust the Text, ed. R. Carter. London: Routledge.

Notes on contributors

Silvia Bernardini currently has a research contract with the Department of Intercultural Studies in Translation, Languages and Culture of the University of Bologna at Forlì, Italy, where she is involved in the construction of the CEXI corpus, a parallel bi-directional corpus of English and Italian. Her research interests include corpus-based translation studies and contrastive linguistics and the didactic applications of corpora in English language teaching and translation. She is co-editor of the journal Languages in Contrast.

Amy B M Tsui is Chair Professor in the Faculty of
Education of The University of Hong Kong. She obtained her PhD in linguistics in 1986 at The University of Birmingham. She has published widely in the areas of discourse analysis, language policy, teacher education and ICT in teacher education. Her most recent publications include three books, Understanding Expertise in Teaching (2003), New York: Cambridge University Press; Medium of Instruction Policies: Which Agenda? Whose Agenda?, Mahwah, NJ: Erlbaum, co-edited with J Tollefson (2004), and Classroom Discourse and the Space of Learning, Mahwah, NJ: Erlbaum, co-authored with F Marton et al (in press).

Susan Conrad is an associate professor in the Department of Applied Linguistics at Portland State University. Her work in corpus linguistics includes collaborations on Corpus Linguistics: Investigating Language Structure and Use (Cambridge University Press), the Longman Grammar of Spoken and Written English, and the edited collection Variation in English: Multi-dimensional Studies (Longman), as well as articles in journals such as TESOL Quarterly, System, and Linguistics and Education.

Anna Mauranen is professor of English at the University of Tampere, Finland. Her major publications are in contrastive rhetoric, discourse analysis and corpus linguistics, including Cultural Differences in Academic Rhetoric. Her current research and publications focus on corpus linguistics, speech corpora, English as lingua franca and translation studies. She is compiling a corpus on spoken English as lingua franca (the ELFA corpus) and running a large research project “Translated Finnish and Translation Universals”.

Luísa Alice Santos Pereira is a high school teacher of Portuguese as a mother tongue, with experience in teaching Portuguese as a foreign language. She works in the Linguistic Research Center of Lisbon University (Centro de Linguística da Universidade de Lisboa – CLUL) as collaborator of
research in the group of Corpus Linguistics. She collaborated with the editorial team of the Dictionary of Contemporary Portuguese Language (2001) (Dicionário da Língua Portuguesa Contemporânea).

Nadja Nesselhauf is an Assistant at the Department of English at the University of Basel, Switzerland, where she teaches courses in corpus linguistics, second language acquisition, and phonology. She has just finished her PhD, which investigates the use of collocations by advanced learners of English on the basis of a learner corpus. Her main research interests are foreign language teaching (in particular the use of corpora for teaching), second language acquisition, and phraseology.

Gyula Tankó is an Assistant Lecturer at the Department of English Applied Linguistics of Eötvös Loránd University in Budapest. He has taught EFL courses, academic writing courses, discourse and corpus analysis courses for teaching purposes at undergraduate level, and methodology courses on the teaching and assessment of writing both at undergraduate level and as part of in-service teacher training courses. His research areas are discourse analysis, corpus linguistics, research methodology and testing. He is currently working on his PhD dissertation, in which he investigates the Rhetorical Move Structure of argumentative essays.

Ute Römer studied English linguistics and literature, Chemistry and Pedagogy at the University of Cologne and now works as a researcher and lecturer in English linguistics at Hanover University. She is currently writing up her doctoral thesis on the functions, contexts and didactics of English progressive verb forms, taking a corpus-driven approach to the topic. Her main research and teaching interests include corpus linguistics and discourse intonation. She has recently co-edited Language: Context and Cognition. Papers in Honour of Wolf-Dietrich Bald’s 60th Birthday (2002) and has published articles on corpus linguistics and language teaching.

Michael Barlow completed his PhD in Linguistics at Stanford in 1988. Since that time he has compiled the Corpus of Spoken Professional American English and has created a variety of text analysis tools including MonoConc, ParaConc and Collocate. The main strands of his other research interests are usage-based models of language and the use of corpora in language teaching.

Pernilla Danielsson (PhD from Gothenburg, 2001) is the Academic Director for the Centre for Corpus Research at Birmingham University. She moved to Birmingham in 2000, to take up the role of project manager of the EU concerted action TELRI II. Since the end of the project in 2002, she has been involved in setting up the new centre as well as lecturing on Birmingham’s master’s degrees in corpus linguistics. She is also involved in running short courses in corpus linguistics, both in Birmingham and at the Tuscan Word Centre. Her own research covers areas of identifying units of meaning in corpora. She is the co-editor of Meaningful Texts (together with Geoff Barnbrook and Michaela Mahlberg) and is working on publishing her monograph ‘Retrieving Meaningful Units from Corpora’.

Pascual Pérez-Paredes has worked as an EFL teacher in Spain since 1989, first in Secondary Schools and later, for eight years, in Escuelas Oficiales de Idiomas (State-run Language Schools). Since 1996 he has been working at the English Department in the University of Murcia, Spain. He completed his doctorate in English Philology in 1999 and currently teaches English Language and Translation. He is also a Sworn Translator.

John McH Sinclair was Professor of Modern English Language at the University of Birmingham for most of his career, and Editor-in-Chief of Cobuild for much of that time. His education and early work was at the University of Edinburgh, where he began his interest in corpus linguistics, stylistics, grammar and discourse analysis. He now lives in Italy, where he is President
of The Tuscan Word Centre. He holds an Honorary Doctorate in Philosophy from the University of Gothenburg, and an Honorary Professorship in the University of Jiao Tong, Shanghai. He is an Honorary Life Member of the Linguistics Association of Great Britain and a member of the Academia Europæa.

Index

A
adverbial connectors – distribution 168
adverbial connectors – position 175
adverbial connectors – semantic relationship 171
adverbial connectors 70, 157seq.
ambiguity 272seq.
arbitrariness of the linguistic sign 274
arrays 240
arreigar-se 116
authenticity 19, 91

B
Bank of English 272, 297
bi-directional corpora 20
BNC 23, 185

C
célebre 116–118
classroom 15, 99
Cobuild 126
cohesion 160
collocates 26, 212seq.
collocation 212seq., 289
communicative utility 94
comparable corpora 20
concession 71
concordance 209, 288
concordancer 243
contrast/concession 71
corpus access and analysis 205seq.
coselection 292
CRPC (Portuguese) 109

D
data-driven learning 16, 126
day after day 48
day by day 49
deduzir 113–115, 121
definite article 53
digital bridge 260
dimensions of variation 75
discovery learning 22

E
elaborated reference 76, 84

F
famoso 116–118
formulaic expressions 95
frequency counter 239
frequency of occurrence 40
friendly fire 296

G
graduar 116

H
hailed 24–25
hands-on 102
high 45–48

I
idiom principle 18
impersonal style 76, 85
imply 57seq.
incompleteness of description 272seq.
infer 57seq.
informational production 76, 83
involved 76, 82

L
language awareness 41
language pedagogy 16
learner autonomy 28, 258
learner corpora – lists 129, 150–152
learner corpora – potential and limitations 131
learner corpora and pedagogic material 137seq.
learner corpora availability 133
learner corpora 125, 127seq.
learner corpus studies 134seq.
learner oral corpora – taxonomy 255
learner oral corpora 129, 152, 249seq.
lexical frameworks 217
lexical item 18, 281–283
lexicogrammar 277
Lexicon of Portuguese 110
linking adverbials 70, 157

M
mark-up 206
MICASE corpus 100
modal auxiliaries – co-occurrence 189
modal auxiliaries – frequency 186
modal auxiliaries – meanings 187
modal auxiliaries 185seq.
modals in EFL teaching 190seq.
modals in textbooks vs modals in corpus 193seq.
multi-dimensional comparisons 74seq.

N
narrative 76, 83
native speaker 30
naturalistic learning 94
network-based language teaching 152seq.
none of 53
notável 116–118

P
parallel corpora 20
pattern grammar 276
Perl – changing access rights 228
Perl interpreter 228
Perl programming 225seq.
persuasion 76, 84
Português Falado 110
prefabricated units 90
prefabs 96
programs – concordance 209
programs – concordancer 243
programs – frequency counter 239
programs – tokeniser 231
programs – word splitter 236
programs – wordlist 207

R
recurrence 287
regular expressions 233
resources for teaching 113
rules and evidence 50

S
schema theory 17
semantic preference 293
semantic prosody 292
serendipity 23
Sir 29
situation-dependent reference 76, 84
sofisticar 116
spoken corpus 89
staunch 24–25
sub-corpus 27
subject-verb agreement 50
suggest(s/ed/ing) 141, 142
supersets 286
synonymous items 44

T
tall 44–46
Telecorpora 43
TeleNex 42
terminology 272seq.
the case 96
there’s 50
thing 210seq.
though 70–73
tokeniser – enhanced 238
tokeniser 231
translation corpora 19

U
used language 91, 295

V
variation 67seq., 272seq.
verbs of perception 278

W
well-experienced 55
word splitter 236
wordlist 207

In the series STUDIES IN CORPUS LINGUISTICS (SCL) the following titles have been published thus far:

1 PEARSON, Jennifer: Terms in Context. 1998.
2 PARTINGTON, Alan: Patterns and Meanings. Using corpora for English language research and teaching. 1998.
3 BOTLEY, Simon and Anthony Mark McENERY (eds.): Corpus-based and Computational Approaches to Discourse Anaphora. 2000.
4 HUNSTON, Susan and Gill FRANCIS: Pattern Grammar. A corpus-driven approach to the lexical grammar of English. 2000.
5 GHADESSY, Mohsen, Alex HENRY and Robert L ROSEBERRY (eds.): Small Corpus Studies and ELT. Theory and practice. 2001.
6 TOGNINI-BONELLI, Elena: Corpus Linguistics at Work. 2001.
7 ALTENBERG, Bengt and Sylviane GRANGER (eds.): Lexis in Contrast. Corpus-based approaches. 2002.
8 STENSTRÖM, Anna-Brita, Gisle ANDERSEN and Ingrid Kristine HASUND: Trends in Teenage Talk. Corpus compilation, analysis and findings. 2002.
9 REPPEN, Randi, Susan M FITZMAURICE and Douglas BIBER (eds.): Using Corpora to Explore Linguistic Variation. 2002.
10 AIJMER, Karin: English Discourse Particles. Evidence from a corpus. 2002.
11 BARNBROOK, Geoff: Defining Language. A local grammar of definition sentences. 2002.
12 SINCLAIR, John McH (ed.): How to Use Corpora in Language Teaching. 2004.
13 LINDQUIST, Hans and Christian MAIR (eds.): Corpus Approaches to Grammaticalization in English. n.y.p.
14 NESSELHAUF, Nadja: Collocations in a Learner Corpus. n.y.p.
15 CRESTI, Emmanuela and Massimo MONEGLIA (eds.): C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages. n.y.p.
