A Corpus-Based Study of the Linguistic Features and Processes Which Influence the Way Collocations Are Formed: Some Implications for the Learning of Collocations

CRAYTON PHILLIP WALKER
University of Birmingham
Birmingham, England

In this article I examine the collocational behaviour of groups of semantically related verbs (e.g., head, run, manage) and nouns (e.g., issue, factor, aspect) from the domain of business English. The results of this corpus-based study show that much of the collocational behaviour exhibited by these lexical items can be explained by examining some of the linguistic features and processes which influence the way collocations are formed. These include the semantics of the individual items themselves, the use of metaphor, semantic prosody, and the tendency for many of the selected items to be part of larger phraseological units. I show that it is possible to explain many of these collocations by considering the linguistic features and processes which have influenced the way they have been formed. My contention is that, if the learner is encouraged to look for an explanation, it makes the process of learning collocations more memorable.

doi: 10.5054/tq.2011.247710

The subject of collocation has received considerable attention in the field of language teaching over recent years. A number of authors (Lewis, 1993, 1997, 2000; McCarthy, 1990; Nation, 2001; Thornbury, 2002; Woolard, 2000) have represented collocations as being either partially or fully arbitrary, and several studies (Benson, 1989; Nesselhauf, 2003, 2005; Smadja & McKeown, 1991) have even used arbitrariness as part of their definition of what constitutes a collocation. Lewis claimed that “collocation is an arbitrary linguistic phenomenon” (Lewis, 1997, p. 32), and, as a consequence, teachers are urged not to attempt to explain collocations to their learners. If collocations are simply arbitrary combinations of words, it means that the foreign language learner has little option but to
memorise large numbers of collocations with very little in the way of explanation or any other help in memorising them. The learner is liable to become very dependent on a dictionary, especially a collocational dictionary, checking whether a particular combination is acceptable or not before using it in his or her writing. If, on the other hand, there is some sort of explanation as to why a particular word is frequently found in the company of one or more others, it means that the foreign language learner is able to understand how and why a particular combination is frequently used by native speakers. Instead of trying to remember large numbers of collocations, the learner would be able to produce some of these combinations by using his or her understanding of the linguistic features and processes which influenced the way they were formed.

TESOL QUARTERLY Vol. 45, No. 2, June 2011

More recently there have been a few publications (Crowther, Dignen, & Lea, 2002; McCarthy & O’Dell, 2005) which have taken the position that not all collocations are arbitrary and have started to present collocations in such a way that students can begin to understand why one particular word is frequently found in the company of another. Unfortunately, there is very little research so far to support this position. Although Kennedy (2003) did not go into the question directly, his corpus-based research concerning the collocational behaviour of adverbs of degree or amplifiers (e.g., absolutely, completely, utterly, rather, about, somewhat) seems to show that the collocations they form are not that arbitrary. Liu (2010) is one of the few studies which critically examined the accepted definition of collocation and found that many collocations can be explained using a combination of techniques drawn from the disciplines of corpus linguistics and cognitive linguistics.

The aim of the current study is to show that collocation is not simply an arbitrary phenomenon but is a process which can be
partially explained by examining some of the linguistic features and processes which influence the way collocations are formed. In order to do this, the study uses a corpus-based methodology to investigate the collocational behaviour of groups of semantically related nouns and verbs taken from the domain of business English. The study found that the process of collocation is influenced by, for example, the precise meaning or meanings of a particular lexical item, the use of metaphor, and any phraseological behaviour or semantic prosody associated with the item.

COLLOCATION

In this article the term a collocation (countable noun) is used to refer to a combination of two or more words which occur together or in close proximity to each other in both written and spoken discourse, whereas the term collocation (uncountable noun) is used in a more general sense to refer to “the habitual co-occurrence of individual lexical items” (Crystal, 2003, p. 82). It is clear from the literature that a collocation is defined in a variety of ways, and that these different definitions reflect differences in approach, the only common denominator being that the term is used to refer to some kind of syntagmatic relationship between words. However, it is possible to group the different definitions into two broad categories: those which use what I call a lexical approach to collocation (Carter, 1987; Cowie, 1998; Howarth, 1996, 1998) and those which use a frequency or statistically based approach (Moon, 1998; Nesselhauf, 2003, 2005; Sinclair, 1991).

Studies which follow a lexical approach use lexical criteria to decide whether a particular combination can be classified as a collocation or not. According to this approach a collocation will typically exhibit a degree of fixedness and/or a lack of transparency in meaning. There is a tendency with this type of approach to create categories (e.g., unrestricted, semirestricted, familiar, and restricted collocations; Carter, 1987, p. 63)
based on the lexical characteristics exhibited by different combinations.

Studies which use a frequency or statistically based approach generally consider a collocation to be a co-occurrence of words within a certain distance of each other. Collocations are seen as being co-occurrences that are “more frequent than could be expected if words combined randomly in a language” (Nesselhauf, 2005, pp. 11–12). Frequency-based approaches are often associated with the work of Sinclair, whose own approach to collocation was, in turn, influenced by the work of Firth (1957, 1968). Collocations are viewed more in terms of probability, where the strength of a particular collocation is assessed on the basis of how frequently it appears in a large representative sample of discourse. According to Halliday, “the native speaker’s knowledge of his language will not take the form of his accepting or rejecting a given collocation: he will react to something as more acceptable or less acceptable on a scale of acceptability” (1966, p. 159). In other words, the question is not whether something is a collocation or not but rather whether a particular collocation is more or less acceptable. This means that there are virtually no impossible collocations, but that some collocations are much more likely to occur than others. However, as Halliday has pointed out, there is a need for at least one cutoff point in order to eliminate combinations which are simply the result of a random distribution of items within the discourse.

[1] The original OSTI report (1970) only had a limited distribution but has recently been republished. This new edition, entitled English Collocation Studies, is edited by Ramesh Krishnamurthy (2004).

Sinclair, writing in the Office of Scientific and Technical Information (OSTI) report (Krishnamurthy, 2004) first circulated in 1970,[1] used the term significant collocations to refer to combinations which co-occur more frequently than “their
respective frequencies and the length of the text in which they appear would predict” (Sinclair, Jones, & Daley, 1970, p. 10). Sinclair (1966) also used three very useful terms for any discussion of collocation:

    We may use the term node to refer to an item whose collocations we are studying, and we may define a span as the number of lexical items on each side of a node that we consider relevant to that node. Items in the environment set by the span we will call collocates. (Sinclair, 1966, p. 415)

Writing in the OSTI report, Sinclair went on to explain that

    there is essentially no difference in status between the node and a collocate: if word A is a node and word B one of its collocates, when B is studied as a node, word A will be one of its collocates. In practice, however, it is convenient to examine the behaviour of one item at a time and the use of the two terms enables a useful distinction to be made when describing results. (Sinclair et al., 1970, p. 10)

Sinclair and Jones (1974) proposed a span of four words on either side of the node word. The following nomenclature is normally used to describe the positions in the span: node –1 to –4 describe the four positions to the left of the node and node +1 to +4 describe the positions to the right, as can be seen in the example below:

    the      long     and      painful  process  of       rebuilding  this     country
    node –4  node –3  node –2  node –1  node     node +1  node +2     node +3  node +4

Although there is some statistical basis for using a span of four words (Mason, 1997, 1999), the distance between a collocate and a node will depend on both lexical and grammatical elements. For example, the distance between the node and the collocate(s) will normally be greater in the case of verb/noun collocations compared with adjective/noun or noun/noun combinations, and consequently it may be necessary to use a wider span when verb/noun collocations are being examined. Arguably, a frequency or statistical approach is more suited to a corpus-based methodology, because it
enables large quantities of spoken or written discourse stored on a computer to be analysed by software programmes (concordancing packages) which can extract the most frequent, or the most statistically significant, collocates associated with a particular node. These programmes can be used to rank collocates according to frequency or statistical significance for each of the different positions within the span. It is also possible to specify a cutoff point, as proposed by Halliday (1966), in order to eliminate combinations which may simply be the result of random distribution. The approach to collocation used in the current study has been influenced by both the lexical and frequency or statistically based approaches.

SEMANTIC PROSODY

I would like to briefly discuss semantic prosody here in the introductory section, because the concept is referred to a number of times later in the article. The term semantic prosody[2] was first used by Louw in an article published in 1993, where he credits Sinclair with having provided him with both the idea and the term in a personal communication. Sinclair (1991) examined the collocational behaviour of the phrasal verb set in and found that most of the subjects associated with it referred to “unpleasant states of affairs” (Sinclair, 1991, p. 74). Louw suggested that semantic prosody is the result of a diachronic process, whereby meaning has been transferred from one word or words to another, and defined semantic prosody as being a “consistent aura of meaning with which a form is imbued by its collocates” (1993, p. 157). The term semantic prosody is also used by some writers (Nelson, 2006; Sinclair, 1996, 2004a, 2004b; Stubbs, 2001, 2009) in a wider sense to describe the way in which a lexical item can develop one of a range of different prosodies such as “‘something nasty’ or ‘something worrying’ or ‘disturbing’ [...] ‘something magnificent’, ‘socially appropriate’ ‘positively constructive’ etc.” (Sinclair, 2004b, p.
173). However, it can be argued that when the term is used in this wider sense, it is simply reflecting the rather complex and multifaceted nature of the meaning of a lexical item. I have chosen to limit the use of the term in this article to Louw’s original notion of a lexical item having either a positive or negative prosody, depending on whether it is frequently associated with collocates which refer to desirable or undesirable items or events.

METHODOLOGY

The main corpus used in the current study was the Bank of English (BoE),[3] which is a large corpus of general English consisting of 450 million words. A second more specialised corpus of business English was also used in order to check that the results obtained from the corpus of general English are valid in the domain of business English. The second corpus, which was made up of commercial and financial data files from the British National Corpus,[4] contains 6.3 million words. In the current study this second more specialised corpus is referred to as the British National Commercial Corpus (BNCc).

[2] For a comprehensive account of semantic prosody, please refer to Stewart (2009).
[3] The Bank of English (BoE) corpus is jointly owned by HarperCollins Publishers and the University of Birmingham. During 2003 to 2006, when most of the research for this study was carried out, the corpus contained 450 million words. http://www.titania.bham.ac.uk

The lexical items selected for study (i.e., the nodes), referred to as the selected items, were chosen for two reasons: first, because they are all high-frequency items in the BNCc, and, second, because each item within a particular group is a partial or close synonym of the other (e.g., process was chosen because it is a close synonym of procedure and system). Experience gained from the pilot studies showed that it was more fruitful to establish the collocational behaviour of a particular selected item by comparing its collocational behaviour with that of a
synonym or near synonym. It was therefore decided to study the collocational behaviour of groups of synonyms or near synonyms rather than of individual items. Table 1 shows the four groups of items selected for study (the table does not include plural forms, which were also studied).

TABLE 1
The Four Groups of Selected Items

    Group  Item                                  Form
    1      issue, aspect, factor                 Noun forms
    2      aim, objective, target, goal*         Noun forms
    3      RUN, HEAD, MANAGE, DEAL with, HANDLE  Verb forms
    4      system, process, procedure            Noun forms

Synonymy, near-synonymy, and frequency were not the only criteria used when selecting the items. Some items were chosen because they are particularly important within the context of teaching business English (e.g., RUN,[5] HEAD, MANAGE), whereas others were selected because, in my experience,[6] learners frequently have difficulties using the item or items appropriately (e.g., issue, aspect, factor). These difficulties are frequently caused by cross-linguistic factors, such as a level of semantic incongruency between items in the learner’s first language and the target language or the fact that the item already exists as a loan word in the first language.

As already mentioned, it is often difficult to attach a precise level of significance to a list of collocates ranked solely according to the number of times they occur (raw frequency) together with the node. For this reason statistical measures such as t-score[7] are used in order to assign a more precise level of significance to each co-occurrence. For example, any collocate with a t-score of 2.00 or above can be regarded as significant (Barnbrook, 1996, p. 98); that is, the way that it combines with the node is not simply the result of random distribution. The t-score value usually reflects how frequently a particular combination occurs in the corpus, that is, the more frequent the collocation, the higher the t-score. Given that there have been a number of reservations expressed about the use of statistical measures in corpus research (Clear, 1993; Stubbs, 1995), both t-score and raw frequency data are included in the current study.

[4] The British National Corpus (BNC) is a 100 million word corpus developed in the 1980s. It is maintained and distributed by the Oxford University Computer Service (OUCS). http://www.natcorp.ox.ac.uk
[5] Capital letters are used to indicate that reference is being made to all members of a lemma. RUN, for example, refers to run, ran, runs, running. In this study the lemma is only used with verb forms, that is, the members of Group 3.
[6] I spent 18 years in Germany teaching business English in large organisations such as Bosch GmbH, Audi AG, Siemens AG, and Deutsche Bank AG.
[7] The t-score is a statistical instrument which is used to measure distribution, or more specifically how the distribution of something deviates from what is standard. For more information regarding t-score, please refer to Barnbrook (1996) and Hunston (2002).

The first stage of the research consisted of establishing a collocational profile for each of the selected items using a corpus of general English, in this case the BoE. This involved identifying the most frequent collocates for each of the positions within a span of four words to the left and right of the node. The easiest way to do this with the BoE is to use the picture function, which identifies and ranks the most frequent collocates for each of the positions within the span. Table 2 shows a t-picture for the node word aspect where the collocates are ranked according to their t-score values, with the highest (i.e., the most significant collocates) at the top of each of the four columns to the left and to the right of the node. During the first stage of the research, all the relevant information from the collocational profile was carefully recorded. This involved listing each of the 20 most frequent collocates together with its t-score value and raw frequency; that is, the number of times the collocate was found to occur with the node in this particular
position. If one takes the first group of selected items (issue, aspect, factor) as an example, an examination of the corpus data reveals both shared collocates, that is, collocates which are frequently associated with all the selected items in the group, and characteristic collocates, that is, collocates which are more frequently associated with one item in the group. It is significant that, as a general rule, shared collocates are more frequent than characteristic ones.

TABLE 2
A t-Picture for the Node Word Aspect Where the Most Frequent Collocates Are Ranked According to Their t-Score Values

    node –4  node –3  node –2  node –1     node    node +1    node +2  node +3    node +4
    is       the      most     every       aspect  Of         The      life       is
    perhaps  is       the      one         aspect  Is         This     s          s
    there    on       a        this        aspect  Ratio      His      game       that
    this     there    an       an          aspect  between    Our      work       which
    not      ,p       on       important   aspect  Which      their    lives      life
    it       about    one      another     aspect  computing  your     is         was
    but      to       only     any         aspect  Call       life     business   work
    has      or       about    other       aspect  however    it       job        and
    ,p       this     every    some        aspect  though     her      policy     system
    focus    was      this     particular  aspect  to         my       relations  write

Once the data had been recorded, they could then be manually examined for characteristic collocations (e.g., controversial issue, worrying aspect, growth factor), which reflect the precise meaning of individual items within the group. The data were also examined for fixed or semifixed phrases (e.g., every aspect of, take issue with), for collocations which reflect either different polysemous or homonymous forms (e.g., the latest issue of, a controversial issue, a share issue), for signs of a particular semantic prosody (e.g., a long and difficult process), and for the use of metaphor (e.g., meet the 3% target).

The aim of the second stage of the research was to establish a collocational profile for each selected item using a corpus of business English. By comparing the two profiles (i.e., the profile obtained from the BNCc and the profile obtained using the BoE) it was possible to
establish whether there are any significant differences in the way the selected items are used in a business domain compared with a more general one. In this particular case, the results showed that there were very few differences in the way the selected items are used in the two domains and, as a result, much of the data used in this article have been taken from the BoE because, as the larger of the two corpora, it is liable to yield more reliable results.

RESULTS AND DISCUSSION

Results from the current study show that there are a number of linguistic features and processes which influence the way in which collocations are formed. The first of these is concerned with semantic and pragmatic features associated with the selected item (i.e., the node word) itself.

Semantics and Usage

The corpus data show how items such as issue, aspect, or factor are frequently used as cohesive devices in both spoken and written discourse. Halliday and Hasan (1976) used the term general noun[8] to refer to a class of nouns (and noun phrases) which are frequently used as cohesive devices in text. They are part of the system of deixis in English and function as proforms, which typically refer to either individual items (e.g., place, man, woman, boy) or to whole stretches of discourse (e.g., situation, question of, issue).

[8] Francis (1994, pp. 83–88) used the terms “advance labels” and “retrospective labels” to refer to nouns or noun groups which are frequently used to label stretches of text. Partington (1998) also examined the way in which these general nouns function as cohesive devices.

Seven out of the ten most frequent node –1 collocates associated with issue, aspect, and factor belong to a group of evaluative adjectives (important, key, main, major, crucial, critical, vital) which seem to have the same semantic function: that of attributing a level of importance to the node word. These shared collocates were found to be associated mainly with the way in which issue,
aspect, and factor are used as general nouns. Here is one example taken from the BoE data, where issue refers forward to what the writer regards as being the most important issue in the presidential election.

    By far the most important issue in the campaign was the state of the national economy. Clinton won because he presented himself as a competent, moderate alternative to a president who was perceived as having failed to manage the economy.

Although the shared collocates are generally the most frequent collocates, it can be argued that the characteristic collocates, which normally occur slightly lower down in any list of collocates ordered by t-score or frequency, are more useful to learners as they highlight slight but significant differences in the way that the selected items from a particular group are used. In the case of the items from Group 1, for example, an issue is frequently seen as something which is contentious and controversial, whereas an aspect is something which can be worrying or disturbing (Table 3). Factor, on the other hand, was found to be frequently associated with more technical usages (e.g., growth factor) but is also used in a kind of pseudotechnical way (e.g., feel-good factor), which may be an attempt to bring a sense of objectivity to something that can only really be measured by more subjective means.

Table 3 shows the most frequent node –1 collocates associated with issue, aspect, and factor. In the collocation sensitive issue, for example, issue is the node and sensitive is the collocate, which appears in the node –1 position. The values in Table 3 show that this collocation occurs 316 times in the BoE, and its t-score value of 17.77 is a measure of the statistical significance of this combination; that is, the collocation is not simply the result of chance, as the t-score is well above 2.00. Although it is clear from the data that all three Group 1 items are frequently used as cohesive devices, it is also clear from the characteristic collocates that they do not all
have the same meaning and associations. By choosing one item over another, the user is obviously making some form of evaluation and is not simply referring to another item or stretch of discourse in a neutral manner. It is precisely these slight but significant differences in usage, and therefore in meaning, that learners need to be aware of in order to use the target language effectively.

TABLE 3
The Most Frequent Characteristic Collocates Associated With Issue, Aspect, and Factor

                   Issue                Aspect               Factor
    Node –1        t-Score  Frequency   t-Score  Frequency   t-Score  Frequency
    sensitive      17.77    316         2.00                 1.00
    contentious    15.45    239         2.45                 1.41
    controversial  15.09    229         6.32     40          1.00
    worrying       2.64                 8.42     71          3.60     13
    disturbing     1.99     4           6.41     41          3.16     10
    pleasing       0.00                 5.29     28          2.23
    risk           1.00                 0.00     0           19.61    386
    feel-good      0.00                 0.00                 18.11    329
    growth         2.00                 0.00                 13.69    189

Note. Data are from the Bank of English.

Polysemy and Homonymy

Where a word has a number of different senses it is normally the collocates in the surrounding cotext which can be used to disambiguate the item. It is possible, for example, to identify three different senses of issue in the corpus data, each associated with different characteristic collocates (e.g., contentious issue, latest issue, share issue). The values in Table 4 show, for example, that there are 535 occurrences of the collocation political issue, 452 of latest issue, and 1,024 of rights issue in the BoE. The corpus data from the current study show that one of the most significant features which influences the way collocations are formed is the semantics of the selected item, and, where the item has two or more distinct senses, each of them is generally associated with a different set of characteristic collocates. Hoey (2005) argued that, where ambiguity is possible, speakers deliberately avoid collocates that increase this ambiguity and generally choose ones which decrease it.

However, it was not always so easy to discern a clear number of
different senses for a particular selected item from the corpus data. An examination of the most frequent node –1 collocates for system, for example, shows how it is used to refer to a variety of different types of system and that it is possible to group these collocates according to the type of system they refer to (Table 5). In this case the collocates have been grouped together to reveal seven different types of system, but this is based on my own rather subjective judgement, and the number of different types of system would seem to vary according to who is doing the grouping. The Collins COBUILD Advanced Learner’s Dictionary (Sinclair, 2006), for example, lists six different types of system, whereas the Oxford Advanced Learner’s Dictionary (Wehmeier, 2005) and the Longman Dictionary of Contemporary English (Summers, 2003) only list three and four different types, respectively.

TABLE 4
The Most Frequent Collocates Associated With Three Meanings of Issue

    Issue (meaning 1)                    Issue (meaning 2)               Issue (meaning 3)
    Node –1        t-Score  Frequency    Node –1  t-Score  Frequency     Node –1    t-Score  Frequency
    political      22.05    535          latest   20.80    452           rights(a)  33.48    1,024
    Palestinian    18.09    332          current  18.57    368           bond       19.41    377
    contentious    15.43    239          next     16.78    282           share      18.66    349
    controversial  14.96    229          special  14.45    240           stock      6.95     49
    thorny         12.59    159          last     14.02    197           currency   6.76     46

Note. Data are from the BoE.
(a) Collocations such as human rights issue or civil rights issue are not included in the values for frequency or t-score.

The corpus data also contained a number of verbal collocates which were associated with specific types of system. For example, verbs such as DEPRESS and STIMULATE were found to be associated with biological systems, INSTALL and ASSEMBLE with technical systems, and REFORM and RESTRUCTURE with social systems. The following concordance lines taken from the BoE serve to illustrate some of these associations.

    key factor, because that can depress the human immune system." Analyses
    a chemical transmitter that stimulates the heart, digestive system and
    s one of the first brewers to install a cellar cooling system free from
    his fever, Professor Saito assembled a temporary distillation system
    ent has unveiled its plan for reform of the banking system Treasury
    office of worker participation, to restructure the social security system, and

TABLE 5
The Most Frequent Node –1 Collocates Associated With Seven Different Types of System

    Type of system        Node –1     t-Score  Frequency
    social systems        legal       39.31    1,554
                          education   36.90    1,380
    political systems     capitalist  18.78    354
                          democratic  18.31    341
    business systems      management  18.78    461
                          accounting  10.56    112
    technical systems     computer    40.11    1,610
                          telephone   17.16    305
    transport systems     transport   23.62    565
                          rail        14.99    225
    biological systems    immune      55.43    3,074
                          nervous     46.67    2,200
    geographical systems  solar       40.17    1,615
                          river       11.04    122

Note. Data are from the Bank of English.

These collocations result from the semantic relationship which exists between the verb and the relevant noun phrase, and the majority of these verbs appear to have precise meanings which limit the number of possible associations. On the other hand, the higher frequency verbs such as DEVELOP, INTRODUCE, and USE, which seem to have less precise meanings, were found to be associated with a larger number of different types of system. INTRODUCE, for example, was found to be associated with at least five different types of system (political, social, technical, business, transport), as can be seen from these concordance lines taken from the BoE data.

    the first country to introduce a state education system 1877: Edison rec
    ity is also planning to introduce a pensions forecasting system that will
    agency had attempted to introduce a new computer system and compulsory passp
    production manager had introduced a daily bonus system but he proposed that
    a positive move towards introducing an integrated
transport system Rod Lit

A comparison of the data for node –1 collocates in the two corpora showed that collocates which refer to business or technical systems (e.g., management, computer) occur more frequently in the BNCc, whereas collocates which refer to biological or geographical systems (e.g., immune, solar) occur more frequently in the BoE. Differences in the frequencies in the two corpora of collocates which refer to either social, political, or transport systems were found to be less significant. These findings would seem to reflect the difference in the content of the two corpora, and it is only to be expected that a corpus of business English will include more occurrences of collocates which refer to business and technical systems.

Semantic Prosody

There is evidence from both corpora to show that the word process may have a negative semantic prosody and that this has a significant influence upon its collocational behaviour. Corpus data for both the singular and plural forms of process show how they are associated more frequently with adjectives which refer to negative attributes rather than with adjectives which refer to positive ones. However, this negative semantic prosody only seems to be associated with the individual items (i.e., process or processes) and not with noun phrases containing process or processes (e.g., learning process, manufacturing process, biological processes, etc.).
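The kind of comparison described above, in which the adjectives immediately preceding process or processes are weighed against each other, can be illustrated with a short Python sketch. It counts the word in the node –1 position for every occurrence of the node and then tallies those counts against two small adjective sets. The two sets, the function names, and the sample text are assumptions made purely for illustration: the study itself worked from BoE collocate lists examined manually, not from a polarity lexicon, and the sample sentences are invented rather than taken from either corpus.

```python
import re
from collections import Counter

# Hypothetical hand-labelled adjective sets (an assumption for illustration;
# the study inspected collocate lists manually rather than using a lexicon).
NEGATIVE = {"long", "lengthy", "slow", "gradual", "complex", "difficult", "painful"}
POSITIVE = {"short", "quick", "fast", "simple", "easy", "painless"}


def node_minus_1(text, node_forms):
    """Count the word immediately to the left of each occurrence of the node."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in node_forms and i > 0:
            counts[tokens[i - 1]] += 1
    return counts


def prosody_tally(counts):
    """Return (negative, positive) totals for the node -1 collocates."""
    negative = sum(n for word, n in counts.items() if word in NEGATIVE)
    positive = sum(n for word, n in counts.items() if word in POSITIVE)
    return negative, positive


# Invented toy text, not corpus data.
SAMPLE = (
    "Rebuilding trust is a slow process. The appeal was a lengthy process, "
    "and a painful process for the family. Registration is a simple process. "
    "The learning process takes time, as do most manufacturing processes."
)

counts = node_minus_1(SAMPLE, {"process", "processes"})
print(prosody_tally(counts))  # (3, 1) for the toy sample above
```

In the toy sample, learning and manufacturing are counted as node –1 collocates but belong to neither set, so noun phrases such as learning process do not contribute to either total.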
The left-hand column of Table 6 shows the most frequent attributive adjectives associated with process (e.g., long, lengthy, slow, gradual + process), whereas the data in the right-hand columns show how their antonyms are less frequently associated with process (e.g., short, quick, fast, painless + process). The values also show how this negative prosody does not occur consistently throughout and that, for example, process is also associated (although not quite so frequently) with more positive adjectival collocates such as simple and easy. Further evidence for this negative prosody can be seen in the pattern adjective and adjective + process. The adjectives which most frequently appear within this pattern are nearly all negative, as can be seen in these concordance lines taken from the BoE data.

    italism could only be a slow and gradual process because of the generali
    the case through the long and expensive process of trial and appeals La
    spent fuel is a dangerous and complex process But there is no law to
    is the beginning of a long and painful process of rebuilding this count
    -optic cable It’s an expensive and slow process There are estimates tha
    scheme cuts out a lengthy and difficult process of obtaining rechecks of

There is also corpus evidence from the current study to show how a negative or positive semantic prosody can only really be attributed to one or more senses of a word or phrase and not to an item as a whole. The data for DEAL with, for example, show how it has at least seven different but related senses, but only three are associated with collocates which refer to negative items or events (e.g., deal with stress, the problem, wrongdoers). It is therefore only possible to attribute a negative semantic prosody to a few of the different senses.

TABLE 6
Most Frequent Node –1 Attributive Adjectives Associated With Process (left-hand columns) and How Frequently Their Antonyms Occur in the Node –1 Position With Process (right-hand column)

    Node –1    t-Score  Frequency    Node –1   t-Score  Frequency
    long       17.42    304          short     1.73
    lengthy    11.74    138
    slow       16.90    286          quick     3.60     13
    gradual    12.20    149          fast      1.73
    complex    13.37    179          simple    9.16     84
    difficult  11.31    128          easy      6.56     43
    painful    12.44    155          painless  3.74     14

Note. Data are from the Bank of English.

Metaphor

Another linguistic feature which influences the process of collocation is the use of metaphor. Data from both corpora show how some of the features associated with the literal senses of target and goal are retained when the items are used metaphorically. For example, in its literal sense, a prototypical target is something which has been identified, something which is to be aimed for, and something which you can either hit or miss. The literal senses of goal refer to either the wooden structure or to something which is scored when the ball enters the area formed by the posts and crossbar. Some of the features associated with the literal senses of goal, such as the fact that a player strives to score a goal during the game of football or that one can generally see if a goal has been scored or not, influence the way the word is used metaphorically. This retention of features can be seen in the way that, for example, the metaphorical senses of target and goal are more frequently associated with verbs such as SET, HIT, MISS, REACH, and MEET (Table 7). However, the data also show how only certain features are mapped (Kövecses, 2002) from the literal onto the metaphorical sense, and that other features, such as the fact that a target is often destroyed or that a goal is rectangular, would seem to be ignored when the items are used metaphorically. Findings from the current study add to the weight of evidence from other corpus-based studies (Deignan, 1997, 2005), which show how only certain features associated with the literal sense of an item are mapped onto the metaphorical. The fact that target, and to a lesser extent goal, were found to be associated with numerical
values supports the proposition that the metaphorical senses of target and goal are frequently associated with exact values and that this feature of exactness has been mapped from the literal to the metaphorical senses of both items. It is also clear from the data that this feature of exactness is largely lacking in the case of aim and objective.

TABLE 7
Data Show How the Verbs SET, HIT, MISS, MEET, and REACH Occur Far More Frequently With Target and Goal

           Aim                  Objective            Goal(a)              Target
Node –3    t-Score  Frequency   t-Score  Frequency   t-Score  Frequency   t-Score  Frequency
SET         1.00      1          2.64      7         18.91    331         11.57    134
HIT         1.41                 1.00                 8.72     76         11.18    125
MEET        2.45                 2.45                 8.72     76          4.00     16
REACH       1.00                 2.45                 7.14     51          6.40     41
MISS        1.00                 1.00                 3.46     12          3.61     13

Note. Data are from the Bank of English.
(a) In the case of goal, only the metaphorical sense has been included in the data.

The following examples taken from the BoE show how numerical values are associated with the metaphorical senses of target and goal.

than anything seen so far to meet that 3% target But, as the prime minist
in January - more than double the 2,500 target for job losses outlined
look forward to surpassing the $1 million goal for the Hospice Endowment
showrooms, with an eventual sales target of 100,000 cars a year -double
they did not wait longer than the target of 18 months Health watchdogs
extra year of life to achieve its goal of 1000 processors Fuchi says

This feature of exactness can also be seen in the way that target is frequently combined with prepositions such as on, above, or below in order to describe, for example, the financial position of a company or project. The following examples taken from the BNCc data illustrate the way these prepositional phrases are used to describe the relationship between the planned and the actual situation.

the bonuses but you tell me I am on target for the large bonus in April
was sorry twenty seven percent above target er for the quarter and most of that
profit levels were 37 per below target in 1949, 19 per below in 1950 a

Phraseology

Some of the selected items were found to be associated with a range of different types of fixed or semi-fixed phraseological units. The preposition of, for example, directly follows aspect and aspects in 75% of all occurrences of the items in the BoE. In the majority of cases this is not because aspect of or aspects of are frequent combinations in themselves, but because they are elements of a whole series of longer sequences (Table 8). The phrases all aspects of, some aspects of, one aspect of, and every aspect of, for example, account for 23% of the total number of occurrences of aspect of and aspects of in the BoE. It was also found that the phrase one aspect of occurs far more frequently in the corpus data than other combinations such as two aspects of or three aspects of, an indication that one aspect of is more than simply a loose grouping of items.

TABLE 8
The Most Frequent Node –1 Collocates Associated With Aspect of and Aspects of

Aspect of
Node –1     t-Score   Frequency   Percentage of total
every       36.94     1,373       15%
one         27.35       786        9%
an          21.85       534        6%
this        21.28       520        6%
important   20.51       426        5%

Aspects of
Node –1     t-Score   Frequency   Percentage of total
all         40.50     1,707       13%
other       30.06       939        7%
many        25.36       666        5%
some        23.45       588        4%
certain     19.66       391        4%

Note. Data are from the Bank of English.

The phraseological units associated with aspect are both fixed and compositional. However, there are also examples in the corpus data of units which are less fixed and only partially compositional, and TAKE issue with is one of these. Although the meaning of the phrase appears to be related to one of the three different senses of issue, the phrase TAKE issue with would also seem to have its own distinct meaning (i.e., to disagree with something someone said). It can be seen from the corpus data that the collocates associated with issue when it appears as part of the phrase TAKE issue with are very different
(e.g., polite, strong, fierce) to those associated with issue when it occurs as a single item (e.g., controversial, contentious, political). The following examples taken from the BoE data serve to illustrate this point.

editor, Ian Black, took polite issue with some of Pilger’s more outla
of the stiffs May I take gentle issue with Morton Schatzman’s pessimism
in the Netherlands - took strong issue with his colleague While he diff
salty but Debs and I took fierce issue with him, having helped ourselves

The fact that a phrase such as TAKE issue with was found to be associated with its own set of characteristic collocates would seem to suggest that the phrase has developed a meaning of its own, probably as a result of some form of lexicalisation process. It is obvious from the corpus data that some of the selected items are associated with various types of phraseological units and that these units generally have their own collocational behaviour. Consequently, any phraseological behaviour associated with a particular lexical item needs to be taken into account when attempting to describe and explain its collocational behaviour.

Some Implications for the Learning of Collocations

Far from being purely arbitrary combinations of words, some collocations can, on the evidence of the current study, be partially or fully explained by considering one or more of the linguistic features or processes which played a part in their formation. In order to make the process of learning collocations more meaningful, and hence more memorable, language learners need to be aware of these explanations. A study of collocational exercises in three course books designed to teach business English (Walker, 2008, p. 198) found that this type of exercise typically asks the learner to “match items on the left with items on the right.” In order to successfully match all the items in an exercise, the learner will often have to take four or five different linguistic features or processes
into account. This type of exercise should focus on one feature or process at a time in order to present not only the collocations themselves but also an explanation of why these words are frequently found together. A contemporary English language teaching course book contains many grammatical exercises that are designed so that the learner is able to derive the rule from the results of the exercise, and current methodology frequently emphasises the importance of allowing learners to deduce grammatical rules for themselves (Brown, 2001; Cook, 2001; Harmer, 2007). There is no reason why many of the exercises which present or practise collocations could not be designed in exactly the same way. Learners would be asked to complete the collocational exercise and to speculate as to the reason why, for example, verbs such as SET, REACH, and MEET are associated with target and goal rather than with aim or objective. A collocational exercise could, for instance, focus on the different senses associated with a polysemous or homonymous item or the way that some of these senses may be associated with a negative semantic prosody. Where possible, collocations should be explained in the language classroom in order to help with their memorability, and encouraging learners to look for an explanation will help them to increase their awareness of the linguistic features and processes which influence the way collocations are formed.

As part of the current study (Walker, 2009), the entries in three learner’s dictionaries and three collocational dictionaries for each of the fifteen selected items were examined and their content compared with the findings from this study. Results from the examination of the learner’s dictionaries⁹ showed that most of the collocations included in the entries were chosen in order to exemplify different aspects of the definition of the headword. Although most of the collocations included in the entries are the same as or similar to those revealed by the current study, it is apparent that these dictionaries have tended to select the most frequent collocates (e.g., important/major/key/crucial factor; Longman Dictionary of Contemporary English [Summers, 2003, p. 561]), whereas findings from the current study show that it would be beneficial for learners if these dictionaries included more characteristic collocates (e.g., risk/growth/feel-good factor). The entries in the three dictionaries often failed to explain important differences in meaning between items such as aim, objective, target, and goal or RUN, HEAD, and MANAGE. If the dictionaries focused less on the most frequent collocates and included more characteristic collocates (i.e., the slightly less frequent collocates), it would help to bring these slight but significant differences in meaning to the fore.

⁹ The three dictionaries examined were the Collins COBUILD Advanced Learner’s English Dictionary (5th edition), the Longman Dictionary of Contemporary English (4th edition), and the Oxford Advanced Learner’s Dictionary (7th edition).

A comparison of the collocates listed in the entries for the selected items in the three collocational dictionaries showed that there is a considerable lack of agreement in the content of the three dictionaries. The results of the comparison showed that, for instance, only 3% of the total number of collocates listed appear in all three dictionaries and that more than 80% appear in only one of the three. This lack of agreement seems to result from differences in what each of the dictionaries regards as a collocation. The BBI Dictionary of English Word Combinations (Benson, Benson, & Ilson, 1997), for example, includes large numbers of grammatical collocations (e.g., concerned about, blockade against, angry at) in its entries, whereas both the Oxford Collocations Dictionary (Crowther et al., 2002) and the Dictionary of Selected Collocations (Hill & Lewis, 2002) concentrate more on lexical collocations. There are also
differences in the way that the three collocational dictionaries order the collocates within an entry. The BBI Dictionary and the Dictionary of Selected Collocations list collocates alphabetically, whereas the Oxford Collocations Dictionary groups collocates with similar or related meanings together. This helps to show how the collocates relate to the different senses of the headword, in exactly the same way that grouping the most frequent collocates of system revealed seven or so different types of system. Listing collocates alphabetically obscures this semantic relationship and, once again, encourages the learner to think of collocations as being arbitrary combinations.

Results from the current study show that some collocations are not simply arbitrary combinations and can, to some extent, be explained. An examination of three business English course books, learner’s dictionaries, and more specialised dictionaries of collocation shows that collocations are often presented and practised with little or no explanation as to why a native speaker frequently uses particular combinations. Liu (2010) showed that, by combining techniques used in corpus linguistics with approaches used in cognitive linguistics, it is possible to demonstrate that many collocations are either partially or fully motivated. Unfortunately for the learner, although a corpus-based cognitive analysis may be successful in explaining collocations which have been formed as a result of polysemy or homonymy, the use of metaphor, or simply the precise semantics of the node and its collocate(s), it may be less successful in explaining collocations which have been influenced by factors such as semantic prosody or the phraseological behaviour of the node.

CONCLUSION

The current study only looked at a total of fifteen lexical items. Although the collocational behaviour of each form within a lemma (a total of twelve different forms) was also examined, this is still a minute
sample of the total number of items in the language, and consequently any findings can only be regarded as preliminary. However, despite the obvious limitations of the study, results still show that not all collocations are arbitrary, and therefore any definition of collocation which sees it as being purely “an arbitrary linguistic phenomenon” (Lewis, 1997, p. 32) has to be something of an overgeneralisation.

ACKNOWLEDGMENTS

I would like to thank Professor Susan Hunston for her very helpful comments on a draft of this manuscript and my three anonymous reviewers for their thoughtful and constructive feedback.

THE AUTHOR

Crayton Walker is a lecturer in applied linguistics at the University of Birmingham in Birmingham, England. He has a background in teaching English, with over 20 years of experience teaching business English in Germany.

REFERENCES

Barnbrook, G. (1996). Language and computers: A practical introduction to the computer analysis of language. Edinburgh, Scotland: Edinburgh University Press.
Benson, M. (1989). The structure of the collocational dictionary. International Journal of Lexicography, 2, 1–14. doi:10.1093/ijl/2.1.1
Benson, M., Benson, E., & Ilson, R. (1997). The BBI dictionary of English word combinations. Amsterdam, The Netherlands: John Benjamins.
Brown, D. H. (2001). Teaching by principles: An interactive approach to language pedagogy. Harlow, England: Longman.
Carter, R. (1987). Vocabulary: Applied linguistic perspectives. London, England: Allen and Unwin.
Clear, J. (1993). From Firth principles: Computational tools for the study of collocation. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John Sinclair (pp. 271–292). Amsterdam, The Netherlands: John Benjamins.
Cook, V. (2001). Second language learning and language teaching. London, England: Arnold.
Cowie, A. P. (1998). Phraseology: Theory, analysis and application. Oxford, England: Clarendon Press.
Crowther, J., Dignen, S., & Lea, D. (Eds.). (2002). Oxford collocations dictionary for students of English. Oxford, England: Oxford University Press.
Crystal, D. (2003). A dictionary of linguistics and phonetics. Oxford, England: Blackwell.
Deignan, A. (1997). A corpus-based study of some linguistic features of metaphor (Unpublished doctoral dissertation). University of Birmingham, Birmingham, England.
Deignan, A. (2005). Metaphor and corpus linguistics. Amsterdam, The Netherlands: John Benjamins.
Firth, J. R. (1957). Papers in linguistics 1934–1951. Oxford, England: Oxford University Press.
Firth, J. R. (1968). Descriptive linguistics and the study of English. In F. R. Palmer (Ed.), Selected papers by J. R. Firth (pp. 96–113). London, England: Longman.
Francis, G. (1994). Labeling discourse: An aspect of nominal-group lexical cohesion. In M. Coulthard (Ed.), Advances in written text analysis (pp. 83–101). London, England: Routledge.
Halliday, M. A. K. (1966). Lexis as a linguistic level. In C. Bazell, J. Catford, M. A. K. Halliday, & R. Robins (Eds.), In memory of J. R. Firth (pp. 148–162). London, England: Longman.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London, England: Longman.
Harmer, J. (2007). The practice of English language teaching (4th ed.). London, England: Longman.
Hill, J., & Lewis, M. (2002). LTP dictionary of selected collocations. Boston, MA: Heinle & Heinle.
Hoey, M. (2005). Lexical priming: A new theory of words and language. London, England: Routledge.
Howarth, P. A. (1996). Phraseology in English academic writing. Tübingen, Germany: Max Niemeyer Verlag.
Howarth, P. A. (1998). Phraseology and second language proficiency. Applied Linguistics, 19, 24–44. doi:10.1093/applin/19.1.24
Hunston, S. (2002). Corpora in applied linguistics. Cambridge, England: Cambridge University Press.
Kennedy, G. (2003). Amplifier collocations in the British National Corpus: Implications for English language teaching. TESOL Quarterly, 37, 467–487. doi:10.2307/3588400
Kövecses, Z. (2002). Metaphor: A practical introduction. Cambridge, England: Cambridge University Press.
Krishnamurthy, R. (2004). English collocation studies: The OSTI report. London, England: Continuum.
Lewis, M. (1993). The lexical approach. Hove, England: Language Teaching Publications.
Lewis, M. (1997). Implementing the lexical approach. Hove, England: Language Teaching Publications.
Lewis, M. (2000). Teaching collocations. Hove, England: Language Teaching Publications.
Liu, D. (2010). Going beyond patterns: Involving cognitive analysis in the learning of collocations. TESOL Quarterly, 44, 4–30. doi:10.5054/tq.2010.214046
Louw, W. (1993). Irony in the text or insincerity of the writer: The diagnostic potential of semantic prosodies. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John Sinclair (pp. 157–176). Amsterdam, The Netherlands: John Benjamins.
McCarthy, M. (1990). Vocabulary. Oxford, England: Oxford University Press.
McCarthy, M., & O’Dell, F. (2005). English collocations in use: Intermediate. Cambridge, England: Cambridge University Press.
Mason, O. (1997). The weight of words: An investigation of lexical gravity. Proceedings of PALC 97 (pp. 361–375). Lodz, Poland: University of Lodz.
Mason, O. (1999). Parameters of collocation: The word in the centre of gravity. In J. Kirk (Ed.), Corpora galore. Amsterdam, The Netherlands: Rodopi.
Moon, R. E. (1998). Fixed expressions and idioms in English: A corpus-based approach. Oxford, England: Clarendon Press.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge, England: Cambridge University Press.
Nelson, M. (2006). Semantic associations in business English: A corpus-based analysis. English for Specific Purposes, 25, 217–234. doi:10.1016/j.esp.2005.02.008
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24, 223–242. doi:10.1093/applin/24.2.223
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam, The Netherlands: John Benjamins.
Partington, A. (1998). Patterns and meanings. Amsterdam, The Netherlands: John Benjamins.
Sinclair, J. (1966). Beginning the study of lexis. In C. Bazell, J. Catford, M. A. K. Halliday, & R. Robins (Eds.), In memory of J. R. Firth (pp. 410–430). London, England: Longman.
Sinclair, J. (1991). Corpus, concordance and collocation. Oxford, England: Oxford University Press.
Sinclair, J. (1996). The search for units of meaning. Textus, 9, 75–106.
Sinclair, J. (2004a). The lexical item. In J. Sinclair & R. Carter (Eds.), Trust the text: Language, corpus and discourse (pp. 131–148). London, England: Routledge.
Sinclair, J. (2004b). Lexical grammar. In J. Sinclair & R. Carter (Eds.), Trust the text: Language, corpus and discourse (pp. 164–176). London, England: Routledge.
Sinclair, J. (2006). Collins COBUILD advanced learner’s English dictionary (5th ed.). Glasgow, Scotland: Harper Collins Publishers.
Sinclair, J., Jones, S., & Daley, R. (1970). The OSTI report. Birmingham, England: University of Birmingham.
Sinclair, J., & Jones, S. (1974). English lexical collocations: A study in computational linguistics. In J. Foley (Ed.), J. M. Sinclair on lexis and lexicography (pp. 110–128). Singapore: University of Singapore Press.
Smadja, F., & McKeown, K. (1991). Using collocations for language generation. Computational Intelligence, 7, 229–239. doi:10.1111/j.1467-8640.1991.tb00397.x
Stewart, D. (2009). Semantic prosody: A critical evaluation. London, England: Routledge.
Stubbs, M. (1995). Collocations and semantic profiles: On the cause of the trouble with quantitative studies. Functions of Language, 2, 1–33.
Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Oxford, England: Blackwell.
Stubbs, M. (2009). The search for units of meaning: Sinclair on empirical semantics. Applied Linguistics, 30, 115–137. doi:10.1093/applin/amn052
Summers, D. (2003). Longman dictionary of contemporary English (4th ed.). Harlow, England: Longmans.
Thornbury, S. (2002). How to teach vocabulary. Harlow, England: Longmans.
Walker, C. (2008). A corpus-based study of the linguistic features and processes which influence the way collocations are formed (Unpublished doctoral dissertation). University of Birmingham, Birmingham, England.
Walker, C. (2009). The treatment of collocations by learners’ dictionaries, collocational dictionaries and dictionaries of business English. International Journal of Lexicography, 22, 281–299. doi:10.1093/ijl/ecp016
Wehmeier, S. (2005). Oxford advanced learner’s dictionary (7th ed.). Oxford, England: Oxford University Press.
Woolard, G. (2000). Collocations: Encouraging learner independence. In M. Lewis (Ed.), Teaching collocations (pp. 28–46). Hove, England: Language Teaching Publications.