COLLEGE OF FOREIGN LANGUAGES DEPARTMENT OF GRADUATE STUDIES
M.A Combined Programme Thesis
Field: English linguistics Code: 602215
HANOI, APRIL 2008
Supervisor: Dr Kieu Thi Thu Huong
STATEMENT OF AUTHORSHIP
This work contains no material which has been accepted for the award of any other degree in any university or other tertiary institution and, to the best of my knowledge and belief, contains no material previously published or written by another person, except where due reference has been made in the text.
Hanoi, April 2008
Nguyen Thi Van Hanh
ACKNOWLEDGEMENTS
I would like first and foremost to express my sincere and deep gratitude to my supervisor, Dr Kieu Thi Thu Huong, for her deliberate guidance and invaluable critical feedback and suggestions during the writing of this study. Her constant support, encouragement and patience are highly appreciated. But for her help, this work would not have been completed.
I would like to take this opportunity to express my sincere thanks for the support and encouragement from Assoc. Prof. Dr Le Hung Tien toward the completion of my thesis.
I would also like to thank all teachers from the English Department at Hanoi University of Pharmacy for their unconditional support and their useful ideas for my study. In particular, I owe my thanks to Mrs Nguyen Do Thu Hoai, Head of the English Department, who has continuously encouraged me and shared with me her experience of teaching and learning ESP at HUP.
My appreciation also goes to the professors who participated in my inter-rater reliability check for their valuable feedback.
I am also indebted to all other people whose suggestions, support and encouragement have contributed to the completion of my thesis.
ABSTRACT
English for pharmacy at Hanoi University of Pharmacy (HUP) has been taught for three decades; however, there has been little empirical research on the medico-pharmaceutical English texts which are used for this English for Specific Purposes (ESP) course. This research has been conducted in order to provide teachers and students at HUP with a detailed analysis of the lexical and morphological characteristics of the corpus of texts they are working with and to draw implications for teaching and learning.
To achieve the above aims, this corpus-based study investigates lexical characteristics of the corpus of medico-pharmaceutical texts used in a pilot ESP course at HUP. This is carried out by classifying vocabulary into four levels using primarily the RANGE program (Nation, 2006) and the four-point rating scale by Chung and Nation (2003), and by exploring the morphological characteristics of this ESP corpus mainly with the Simple Concordance Program (Reed, 1997-2008). The results show that the size and the coverage of technical vocabulary are in line with the results of previous similar studies, strongly suggesting that the coursebook materials are manageable for students. The morphological analysis presents the frequency, origin, formation, meanings and functions of the most frequently used affixes in the corpus, revealing that a high proportion of the technical words in the corpus share the same origin and formation by means of their affixes. The morphological characteristics are, therefore, important in helping students to acquire technical vocabulary.
The results of the lexical and morphological analyses in this study suggest various implications for course design, materials evaluation, and materials development, as well as for teaching, learning, and testing ESP at HUP in a narrow focus and for EFL teaching and learning in a wider context. The tools and methods employed in this study are also intended to assist teachers and researchers in the field of ESP in dealing with technical vocabulary.
TABLE OF CONTENTS
Acknowledgements
Abstract
PART ONE: INTRODUCTION
1 Rationale
2 Aims of the study
3 Research questions
4 Research methods
5 Scope of the study
6 Significance of the study
7 Structure of the thesis
PART TWO: DEVELOPMENT
CHAPTER 1: THEORETICAL BACKGROUND
1.1 An overview of lexicon
1.1.1 Some basic concepts
1.1.1.1 Word and lexeme
1.1.1.2 Word classes
1.1.1.3 Closed system versus open classes
1.1.2 Lexical relations
1.1.2.1 Collocation
1.1.2.2 Polysemy and homonymy
1.1.3 Word types, word tokens and lemmas
1.2 An overview of morphology
1.2.1 Some basic concepts
1.2.2 Inflection, derivation and compounding
1.2.2.1 Inflection
1.2.2.2 Derivation
1.2.2.3 Compounding
1.2.2 The historical sources of English word formation
1.2.3 Characteristics of Germanic and non-Germanic derivation
1.3 Text analysis
1.3.1 Quantitative versus qualitative text analysis
1.3.2 Corpus linguistics and corpus-based approach to text analysis
1.3.4 Tools for corpus-based analyses
1.4 ESP texts
1.4.1 ESP texts and technical vocabulary
1.4.3 Corpus-based approach and analysis tools in ESP
1.5 English for medicine and pharmacy
CHAPTER 2: LEXICAL CHARACTERISTICS OF MEDICO-PHARMACEUTICAL TEXTS AT HANOI UNIVERSITY OF PHARMACY
2.1 Methodology
2.1.2 The selection of texts
2.1.3 Major methods for data analysis
2.1.4 Major tools for data analysis
2.1.5 The inter-rater reliability check
2.1.5.1 Introduction of the inter-rater reliability check
2.1.5.2 The results of the inter-rater reliability check
2.2 Lexical features of the corpus of texts at HUP
2.2.1 Initial description and discussion of the data
2.2.1.1 General statistics of the corpus
2.2.1.2 Processing of the data against the first 2,000 most frequent words in GSL
2.2.1.3 Processing of the data against the AWL
2.2.1.4 Processing of the data from word list 4
2.2.2 In-depth description and discussion of technical vocabulary
2.2.2.1 The size of technical vocabulary in the ESP texts
2.2.2.2 The importance of technical vocabulary in the ESP texts
CHAPTER 3: MORPHOLOGICAL CHARACTERISTICS OF MEDICO-PHARMACEUTICAL TEXTS AT HANOI UNIVERSITY OF PHARMACY
3.1 Methodology
3.2 Discussion of inflectional suffixes in the corpus
3.2.1 Suffix -ed
3.2.2 Suffix -ing
3.3 Discussion of derivational affixation in the corpus
3.3.1 Suffix –tion
3.3.2 Suffix –al
3.3.3 Suffix –ic, -ical and -ous
3.3.4 Suffix -ine, -ium and -ia
PART THREE: CONCLUSION
1 Conclusion
2 Major findings
2.1 Major findings concerning lexical characteristics
2.2 Major findings concerning morphological characteristics
3 Implications
3.1 Implications for course designers, materials evaluators and materials developers
3.1.2 For course designers
3.1.3 For materials evaluators
3.1.4 For materials developers
3.2 Implications for EFL/ESP teaching and learning
3.2.1 Implications for teachers
3.2.2 Implications for students
3.3 Implications for testing
3.4 Other implications
4 Suggestions for further research
REFERENCES
APPENDIX 1
APPENDIX 2
LIST OF TABLES AND FIGURES
Table 1 Typical differences between lexical words and function words
Table 2 Germanic and non-Germanic derivation
Table 3 Association patterns in language use
Table 4 Percentage of each vocabulary level in academic language courses
Table 5 Effectiveness of the four ways of identifying technical terms
Table 6 Sample classification in the inter-rater reliability check
Table 7 Marked words for the inter-rater reliability check
Table 8 Inter-rater reliability accuracy score calculated by the number of words assigned to four steps by rater 1 and by the researcher
Table 9 Inter-rater reliability accuracy score calculated by the number of words assigned to four steps by rater 2 and by the researcher
Table 10 Coverage of texts by the various levels of vocabulary types and tokens by RANGE program
Table 11 Ratio between number of input files and number of types found
Table 12 Word classes vs word list 1
Table 13 The most frequent words vs word list 1
Table 14 The most frequent words vs word list 2
Table 15 The most frequent words vs word list 3
Table 16 The most frequent words vs word list 4
Table 17 Coverage of levels of vocabulary types in the corpus of ESP texts
Table 18 Coverage of levels of vocabulary frequency in the corpus of ESP texts
Table 19 A sample of raw data for developing a glossary of technical words
Table 20 A sample of raw data for developing a glossary from low frequency words
Table 21 Past participles and their frequency of occurrences
Table 22 Present participle/gerund and their frequency of occurrences
Table 23 The most common suffixes in the corpus
Table 24 Words with suffix –ation and their frequency
Table 23 Words with suffix –al and their frequency
Table 25 Words with suffix –ic and their frequency
Table 26 Summary of the most frequently met suffixes
Table 27 Sample of an exercise applicable to teaching technical vocabulary
Figure 1 Antonymy and synonymy for polysemic and homonymic words
Figure 2 Word morphological structure
Figure 3 A sample of word morphological structure
Figure 4 A sample of concordance of words with suffix -ed
LIST OF ABBREVIATIONS
Abbreviations
AWL : Academic Word List
EFL : English as a Foreign Language
ESP : English for Specific Purposes
GE : General English
GSL : General Service List
HUP : Hanoi University of Pharmacy
TTR : Type-Token Ratio
SCP : Simple Concordance Program
PART ONE: INTRODUCTION
As a matter of fact, the major tasks throughout the course are concerned with reading comprehension. Other activities such as speaking/presentation or writing are included, but are not dominant. The lessons in class can only provide students with a rough comprehension of the texts in which the content is pharmacy-oriented, or those in which the content is both pharmacy-oriented and medicine-oriented (hereinafter called medico-pharmaceutical texts). It is notable that the students have undertaken few courses on professional subjects in their curriculum, which indicates that their background knowledge of their major is scattered and insufficient. Accordingly, the texts used during the ESP course are only at a moderate level of difficulty regarding specialist knowledge, so that students can understand them without previous specialist background. Despite this, the students cannot commit themselves to understanding the texts thoroughly, and therefore they do not acquire enough knowledge to perform the comprehension tasks. Although they are instructed to deal with the texts in a basic way, they still struggle to comprehend the linguistic characteristics of the texts. Therefore, it is due to these difficulties and the learning needs of the students that a more thorough analysis is required of the English texts they study in class.
A literature search revealed that research into linguistic characteristics, especially lexical and morphological features, which are the two significant distinguishing features of English texts for medicine and pharmacy, is still modest compared with other branches of study in applied linguistics in general and in the study of ESP in particular. Specifically, at the College of Foreign Languages, there has been MA research on designing a theme-based ESP reading syllabus for students at Hanoi Medical University (Nguyen Thi Thuy Huong, 2004) and another on students’ evaluation of the current course at Hanoi University of Pharmacy (Nguyen Do Thu Hoai, 2004). This indicates that ESP for pharmacists in the context of Vietnam calls for more applied linguistic research.
These factors justify a research study which examines the lexical and morphological features of pharmaceutical and medical texts.
2 Aims of the study
The study is aimed at:
1 finding out the features of the corpus of texts at the lexical and morphological levels, and
2 drawing implications on the basis of the analysis of the lexical and morphological characteristics which have been examined
3 Research questions
There are two questions which will be answered through the data discussion and pedagogical implications:
1 What are the lexical and morphological characteristics of the texts?
2 How are these characteristics valuable to various aspects of teaching and learning ESP at HUP in particular as well as teaching and learning ESP in general?
4 Research methods
There are a number of approaches to text analysis, particularly the discourse approach, which includes the genre-based approach. A wide variety of applied linguistic research has been carried out using these approaches. In the field of English for Specific Purposes, the popular method is genre analysis (Dudley-Evans, 1994). A newer approach to ESP text analysis is the corpus-based approach of corpus linguistics. This approach, with both statistical and linguistic methods, and both automatic and interactive techniques, produces useful information on the size, the importance and other characteristics of technical vocabulary in the corpus, and other applicable results such as input materials for course design and revision, and glossaries and vocabulary lists tailored to the specific goals set for each type of list for the course.
Given these advantages of a corpus-based approach to studying the pilot corpus of ESP texts used at HUP, the researcher carries out the study using the methods of corpus-based text analysis. Various tools, some of which are computer-based, are used in corpus linguistics. In particular, the analysis of lexical features investigates the vocabulary of the corpus by identifying four types of vocabulary, and the main tool for carrying out this analysis is the RANGE and FREQUENCY program (hereinafter called RANGE) developed by Nation (2006). The methods that Chung and Nation (2003) applied in their own study on recognising and analysing technical vocabulary play an important role in the lexical analysis. At the morphological level, derivational and inflectional affixes which appear in the corpus are analysed mainly with the Simple Concordance Program (Reed, 1997-2008). However, it is anticipated that there will be some problematic features in the data in the light of the framework followed, as the lexico-morphological features do have exceptions. At this point, some modifications will be applied to the corpus. The validity of the linguistic analysis will be supported by observation and informal interviews with both teachers and students at HUP; the latter, however, are secondary sources of information.
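The idea of sorting the vocabulary of a text into levels can be illustrated with a minimal sketch. It is not the RANGE program itself; it only mimics the general procedure of checking each running word against a series of base word lists and reporting how many tokens and types fall into each list. The tiny word lists, the sample sentence, the function names and the assignment of lists to levels are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical miniature word lists; real GSL and AWL lists are far larger,
# and the mapping of lists to levels here is an assumption.
WORD_LISTS = {
    "list 1": {"the", "is", "a", "of", "in", "and", "to", "by"},
    "list 2": {"medicine", "pain"},
    "list 3": {"analysis", "process"},
}
NOT_IN_LISTS = "not in the lists"


def tokenize(text):
    """Lowercase the text and return its running words (tokens)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def classify(token):
    """Return the name of the first list containing the token."""
    for name, words in WORD_LISTS.items():
        if token in words:
            return name
    return NOT_IN_LISTS


def coverage(text):
    """Count tokens and types per list and the token coverage in percent."""
    tokens = tokenize(text)
    token_counts = Counter(classify(t) for t in tokens)
    type_counts = Counter(classify(t) for t in set(tokens))
    return {
        name: {
            "tokens": token_counts[name],
            "types": type_counts[name],
            "token_pct": round(100 * token_counts[name] / len(tokens), 1),
        }
        for name in [*WORD_LISTS, NOT_IN_LISTS]
    }


sample = ("The analysis of the drug is a process in medicine "
          "and the drug is absorbed by the liver")
report = coverage(sample)
for name, row in report.items():
    print(name, row)
```

In this invented sample, the words that fall outside all three lists (drug, absorbed, liver) are the candidates that would then need to be rated by hand for technicality.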
Before reporting on the analysis, there are some umbrella terms that require definition, such as lexicon, morphology, lexeme, morpheme, English for Specific Purposes, text analysis, corpus linguistics, corpus-based approach and other sub-concepts. The conceptualisation of the above key terms is based on publications by, for example, Biber et al. (1998), Biber et al. (1999), Celce-Murcia and Larsen-Freeman (1983), Carstairs-McCarthy (2002), Bauer (1983), Hutchinson and Waters (1987), Chung and Nation (2004), and Lankamp (1988).
5 Scope of the study
It is worth noting again that the study focuses on the linguistic characteristics of medico-pharmaceutical texts at the levels of lexicon and morphology. Ideally, the linguistic levels of medical English in general are investigated in terms of the following levels (Lankamp, 1988):
[…] of this corpus are chosen as the focus of analysis of a corpus of texts, which consists of 8 texts from the newly designed pilot coursebook by the English Department at HUP.
6 Significance of the study
The texts analysed in this study are currently taught and learnt in impressionistic ways; the findings of this study will therefore hopefully serve as a modest reference source which helps teachers and students at HUP to gain a systematic grasp of the corpus of technical words and of ways to deal with them.
7 Structure of the thesis
The study is divided into three parts: Introduction, Development and Conclusion. Part One - Introduction - presents the rationale, the aims, the research questions, the applicable methods, the scope, the significance and the structure of the study. Part Two - Development, which is the main part of the study, consists of three chapters. Chapter 1 provides a theoretical background for the research. This chapter reviews the published literature on basic concepts in lexicon and morphology as well as text analysis, corpus linguistics, the corpus-based approach and ESP. Chapter 2 and Chapter 3 respectively describe the lexical and morphological features of medico-pharmaceutical texts used in teaching ESP at HUP. In these two chapters, lexical and morphological features of the target corpus of texts are analysed with a corpus-based approach using the methods and tools described in the first section of each chapter. Part Three - Conclusion - summarises the major findings from this study and suggests implications for course design, materials development, teaching, learning and testing. This part also proposes some suggestions for further research.
PART TWO: DEVELOPMENT
CHAPTER 1: THEORETICAL BACKGROUND
Understanding and analysing the lexical and morphological characteristics of the selected corpus of medico-pharmaceutical texts requires various concepts and theoretical background from the fields of lexicology, morphology, text analysis, corpus linguistics and English for medicine and pharmacy. This chapter deals with the basic concepts and ideas that set the theoretical background for the analyses carried out later in this study.
There are different definitions of and discussions on some basic concepts from various authors; however, the most significant publication on which this study is based is Biber et al. (1999). This book gives a comprehensive account of English grammar based on different large-scale grammar books; however, the feature that distinguishes it from other grammar books is that it describes not only the nature of language but also the actual use of each grammatical feature, based on corpus-analytic research into four registers: conversation, fiction, newspaper language and academic prose. The comprehensiveness of the description of English grammar in the book and the fact that the book itself is a corpus-based study with data on the real use of linguistic features are the major factors which account for the heavy reliance on the book in this study.
1.1 An overview of lexicon
Lexicon is defined in Richards et al. (1992:212) as “a set of all the words and idioms of any language”. In the Oxford English Dictionary (Oxford University, 1989), lexicon is “the complete set of meaningful units in a language”. Other basic concepts within lexicon will be presented in the following section.
1.1.1 Some basic concepts
1.1.1.1 Word and lexeme
In a language, a grammatical unit consists of one or more elements. The hierarchy of grammatical units is such “that clause consists of one or more phrases, a phrase consists of one or more words, a word consists of one or more morphemes, etc.” (Biber et al., 1999:50). It is the words of a language that are the focus when the vocabulary of that language is spoken of.
Biber et al. (1999:51) define words as “the basic elements of language”, adding that “they are clearly shown in writing; they are the units which dictionaries are organised around”. Carstairs-McCarthy’s definition reinforces Biber et al.’s definition of word as follows:
“words…are units of language which are basic in two senses, both
listed in dictionaries, and
are formed”
(Carstairs-McCarthy, 2002:5)
A simple description of the characteristics of words is presented in Biber et al. (1999:51). According to them, words, phonologically, may be preceded and followed by a pause; orthographically, they are separated by spaces or punctuation marks; syntactically, they may be used alone as a single utterance; and semantically, they can be given one or more meanings in a dictionary.
Another term frequently used in lexicology is lexeme. Whereas words are understood as orthographic words, which are word forms separated by spaces in written texts and the corresponding forms in speech as discussed above, lexemes are the smallest units of a lexicon, but may also occur in the form of a phrase, a compound word, or in special combinations. Biber et al. define lexeme as “a group of word forms that share the same basic meaning and belong to the same word class” (Biber et al., 1999:54). A lexeme may be abstract, but the notion can be simplified by saying that a lexeme allows different inflections to attach to it to make words. For example, speak is a lexeme, while speaks and speaking are inflected forms of speak. The dictionary information on a lexeme as a dictionary entry generally includes its pronunciation, part of speech, inflected forms, and various meanings, generally grouped according to its senses and sub-senses.
Every lexeme or lexical item in the language must be entered in the lexicon (which is a comprehensive list of all words and productive derivational affixes in the language) and represented on a number of levels, which include at least the following, according to Celce-Murcia and Larsen-Freeman (1983:49):
5 morphological regularity or irregularity
According to Celce-Murcia and Larsen-Freeman (1983:50), these different types of information serve different functions. “Orthographical information is used when we alphabetise things, phonological information is used when we make words rhyme, and syntactic information is used when we match determiners and nouns appropriately”, and “semantic information is used when we accept a lexical item in certain constructions as meaningful.”
- Function words: bind the text together. Function words serve two major roles: indicating relationships between lexical words or larger units, or indicating the way in which a lexical word or larger unit is to be interpreted. Function words belong to closed systems. They have high frequency and tend to occur in any text, whereas the occurrence of lexical words varies greatly in frequency. The differences between lexical words and function words are shown in the following table:
Features Lexical words Function words
Table 1 Typical differences between lexical words and function words
1.1.1.3 Closed system versus open classes
Both Biber et al. (1999) and Celce-Murcia and Larsen-Freeman (1983) provide a clear description of closed systems and open classes. Words, according to them, belong to one of these two classes:
- Closed systems: contain a limited number of members, and new members are not easily added. These are mainly function words.
- Open classes: membership is indefinite and unlimited. These are generally lexical words.
Every lexical item from either closed systems or open classes belongs to a part of speech. They are nouns, auxiliary verbs, verbs, adjectives, adverbs, determiners, intensifiers, or prepositions. The major parts of speech (nouns, verbs, adverbs and adjectives) constitute open lexical categories. The other parts of speech (e.g., determiners, intensifiers, prepositions, and auxiliary verbs) constitute closed lexical categories, since they contain far fewer items than the open ones and they do not readily add new items or discard old ones (Celce-Murcia and Larsen-Freeman, 1983:49).
Biber et al. (1999:56) also state that the number of function words in closed categories does not increase very quickly, whereas new lexical words in open categories may be created at any time by using the regular word formation processes of the language.
1.1.2 Lexical relations
There are several ways in which lexical units, especially lexical items, are related to each other. However, given the nature of the words in the corpus of texts to be studied, three main kinds of relatedness in terms of word meaning are considered here: collocation, polysemy and homonymy.
1.1.2.1 Collocation
“Collocations are the associations between lexical words so that the words co-occur more frequently than expected by chance” (Biber et al., 1999:988). A collocation is an association of words, and some words are more firmly associated with each other than others. Collocations, according to Biber et al. (1999:988), do not only depend on the meaning of the associated words themselves, but largely depend on the contexts in which they occur. The individual words in collocations retain their own meaning, and they obtain their extended meaning through associating with other words. Some examples are:
make a laugh, but not *do a laugh,
big problem, but not *large problem
Therefore, a laugh is a collocate of make, and problem is a collocate of big. These combinations show that words with similar meanings can be distinguished by their preferred collocations: make, rather than do, prefers to collocate with a laugh, and big, rather than large, prefers to collocate with problem.
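The notion of co-occurring "more frequently than expected by chance" can be made concrete with a rough sketch. It compares the observed number of times one word is immediately followed by another with the number expected if the two words were distributed independently, approximated here as count(w1) x count(w2) / N. The token sequence is invented purely for illustration, the function name is hypothetical, and this simple comparison is only one of several possible association measures.

```python
from collections import Counter


def bigram_association(tokens, w1, w2):
    """Return (observed, expected) counts of w1 immediately followed by w2."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(w1, w2)]
    # Expected count under independence (a common approximation).
    expected = unigrams[w1] * unigrams[w2] / n
    return observed, expected


# Invented toy text, not data from the corpus studied in this thesis.
tokens = ("a big problem is a big problem and a large house "
          "is a large house but a big house is rare").split()

obs_big, exp_big = bigram_association(tokens, "big", "problem")
obs_large, exp_large = bigram_association(tokens, "large", "problem")
print(obs_big, round(exp_big, 2))      # 2 0.29
print(obs_large, round(exp_large, 2))  # 0 0.19
```

In this toy text big problem occurs far more often than chance would predict, whereas large problem does not occur at all, which mirrors the preference described above.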
1.1.2.2 Polysemy and homonymy
Polysemy and homonymy are closely related lexical relations, and it is not easy to distinguish between them. Theoretical linguistics treats them as two kinds of lexical ambiguity.
Polysemy
A word is polysemous (or polysemic) when it has two or more related meanings (Finegan, 2000:195). For example, the word plain can have several related meanings as follows:
(1) “easy, clear” (plain English)
(2) “undecorated” (plain white shirt)
(3) “not good-looking” (plain Jane)
(Finegan, 2000:195)
Apresjan (1974:16) classifies polysemy into two types:
(a) metaphor: senses are related by analogy
E.g.: The word table has different meanings related to each other:
(1) a thin flat piece of stone/metal/wood with four legs
(2) part of a machine tool on which work is operated
(3) a level area, a plateau
(4) the people seated at a table
(5) the food on the table
(Vo Dai Quang, 2003:26)
(b) metonymy: senses are related by connectedness. The second meaning is formed on the basis of the first, the third is based on the second, and so on.
E.g.: “Rabbit” has polysemic senses as “the animal” and “the meat of that animal”; the meaning of the latter is based on that of the former.
Polysemy exists only in written language, not in speech. A word can only have one meaning in speech. Therefore in reading texts, polysemy is a common phenomenon and it causes difficulty for non-native readers.
Homonymy
“Words are homonymic when they have the same written or spoken form but different senses” (Finegan, 2000:196). They are not connected semantically; for instance, “punch 1” means “blow with a fist” while “punch 2” means “a drink”. There are two types of homonymy, according to either word sound form or word meaning. There are several sub-types of homonymy; however, because of the nature of the written language that this study is dealing with, only some sub-types are summarised as follows (Vo Dai Quang, 2003):
(a) Homonymy according to sound form:
-Full/absolute homonyms: These homonyms are identical in both pronunciation and spelling and are of the same part of speech.
E.g.: seal (n): a design printed on paper by means of a stamp vs seal (n): a sea animal
bank (n): a financial institution vs bank (n): a sloping side of a river
-Partial homonyms: are words identical in pronunciation or spelling, and are homonymous only in some forms of their respective paradigms. They may be of the same or different parts of speech.
E.g.: still (adj): quiet
still (adv): yet
(b) Homonymy according to types of meaning:
-Lexical homonyms: words of the same part of speech but of different meanings and there is no semantic relationship between them
-Grammatical homonyms: words of different parts of speech
E.g.: light (v) – light (n), asked (simple past) – asked (past participle)
Lyons (1995:58) concludes “homonymy (whether absolute or partial) is a relation that holds between two or more lexemes, polysemy is a property of a single lexeme.” A difficulty, however, arises in distinguishing between polysemy and homonymy, i.e., how to know whether the words are separate lexical items rather than a single word with different senses. According to Finegan (2000:196), a clear distinction between polysemy and homonymy must involve several criteria, none of which by itself is sufficient. The first criterion according to Lyons (1995:59) is etymology, or a word’s historical origin. As an example of homonymy, bank meaning “financial institution” is a word borrowed from Italian, while bank meaning “sloping side of a river” is traced back to a Scandinavian word. Another criterion to distinguish between polysemy and homonymy is to judge whether the words are semantically related (Lyons, 1995:28). There is usually a semantic relatedness when metaphorical extension appears, as in the case of such words as foot meaning “terminal part of a body”, which also came to mean “the lowest part of a hill or a mountain”. Both senses refer to the lowest part, which suggests they have commonality and therefore are senses of one word. The same polysemic word, moreover, may share the same synonyms and antonyms; however, this type of word is limited in number, i.e., not all words have synonyms and/or antonyms. Let us take a look at this example:
[Figure 1 Antonymy and synonymy for polysemic and homonymic words; labels include “easy, clear”, “undecorated” and “stretch of water”]
1.1.3 Word types, word tokens and lemmas
In a sentence, a word may appear twice or more, such as the words the and is:
The sun is shining and the girl is playing with her toys under the shade of a tree
Such repeated words in the example are distinct tokens of a single type (Carstairs-McCarthy, 2002:5). Thus, in the above sentence, there are 18 tokens and 15 types (the and is are repeated). Put more simply, one may say that two performances of the same tune, or two copies of the same book, are distinct tokens of one type.
According to Biber et al. (1999), the relationship between the number of different word forms, or types, and the number of running words, or tokens, is called the type-token ratio (or TTR):
TTR = (types / tokens) x 100
Biber et al. (1999) also note that TTR varies with the length of the text: longer texts have many more repeated words and therefore a much lower TTR, and the same relationship between TTR and text length is found in all registers. Surprisingly, the TTR in academic prose is somewhat lower than in fiction and news, according to Biber et al. (1999:53).
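As a worked illustration of the formula, the following minimal sketch counts tokens and types in the example sentence above and computes its TTR. The function name and the simple tokenisation rule (lowercased alphabetic strings) are assumptions made for this sketch and are not taken from any of the tools used in the study.

```python
import re


def type_token_ratio(text):
    """Return (number of tokens, number of types, TTR in percent)."""
    tokens = re.findall(r"[a-z']+", text.lower())  # running words
    types = set(tokens)                            # distinct word forms
    return len(tokens), len(types), 100 * len(types) / len(tokens)


sentence = ("The sun is shining and the girl is playing with her toys "
            "under the shade of a tree")
n_tokens, n_types, ttr = type_token_ratio(sentence)
print(n_tokens, n_types, round(ttr, 1))  # 18 15 83.3
```

For this one sentence the TTR is 15 / 18 x 100, or about 83.3; in a longer text, where more words are repeated, the figure would be lower.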
Another concept is the lemma, which consists of a headword and its inflected forms (Chung and Nation, 2004:253). In the example below, plays and playing contain the same headword but with different inflections:
Tom usually plays tennis in the afternoon but he is playing football this afternoon
Plays and playing, therefore, are both inflected forms of the lemma play.
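The difference between counting word types and counting lemmas can be sketched as follows. The lookup table mapping inflected forms to headwords is hypothetical and covers only the forms needed for the example sentence; a real study would rely on a full lemma list.

```python
import re
from collections import defaultdict

# Hypothetical lookup table: inflected form -> headword.
HEADWORDS = {"plays": "play", "playing": "play", "is": "be"}


def group_by_lemma(text):
    """Group the word types of a text under their headwords."""
    groups = defaultdict(list)
    for token in re.findall(r"[a-z]+", text.lower()):
        lemma = HEADWORDS.get(token, token)  # unknown forms stand for themselves
        if token not in groups[lemma]:
            groups[lemma].append(token)
    return dict(groups)


sentence = ("Tom usually plays tennis in the afternoon "
            "but he is playing football this afternoon")
lemmas = group_by_lemma(sentence)
print(lemmas["play"])  # ['plays', 'playing']
print(len(lemmas))     # 12
```

The sentence has 14 tokens and 13 types (afternoon is repeated), but only 12 lemmas, because plays and playing are counted together under play.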
It should be noted that in some studies the lemma can be used as the counting unit instead of word types or word tokens. Even within the same study, word types, word tokens and lemmas can all be counting units for different sections; for example, in the study by Chung and Nation (2004), the unit of counting is the lemma. The choice of counting unit depends on the purpose and the scope of a study. However, only word types and word tokens will be the counting units in chapters 2 and 3 of this study.
1.2 An overview of morphology
1.2.1 Some basic concepts
As defined by Bauer (1983:13), “morphology as a sub-branch of linguistics deals with the internal structure of word-forms”. The basic units of analysis recognised in morphology are morphemes. Bauer (1983:13) also gives an intensive discussion of the word form untouchables. This word form can be segmented into its smallest constituent elements: un touch able s. None of these segments can be subdivided into smaller segments which function in the same kind of way as they do, and each of them represents a morpheme. “A morpheme may be defined as the minimal unit of grammatical analysis” (Bauer, 1983:14). “Morphemes are abstract elements of analysis, meanwhile what realises morphemes are segmentable is the phonetic form, which is termed morph…A morph can be defined as a segment of a word form which represents a particular morpheme” (Bauer, 1983:15). As such, the segmented portions of the word form un touch able s are morphs, and these morphs represent morphemes.
Besides morphemes and morphs, a third term, “allomorph”, is required for morphological analysis. “An allomorph is a phonetically, lexically or grammatically conditioned member of a set of morphs representing a particular morpheme” (Bauer, 1983:15). For example, the plural morpheme “s”, in its regular forms, has three different phonological realisations, /iz/, /s/ and /z/, depending on the phonetic environment in which the morpheme occurs; that is, it is phonetically conditioned.
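Phonetic conditioning of the regular plural can be sketched as a simple rule on the final sound of the base. The phoneme sets below follow the standard textbook description of English and are a simplification assumed for illustration; the function name and the one-symbol-per-phoneme input are likewise assumptions.

```python
# Final sounds that condition each regular plural allomorph (simplified).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}

def plural_allomorph(final_phoneme):
    # After a sibilant the plural is realised as /iz/ (buses, churches).
    if final_phoneme in SIBILANTS:
        return "/iz/"
    # After any other voiceless consonant it is realised as /s/ (cats, books).
    if final_phoneme in VOICELESS_NON_SIBILANTS:
        return "/s/"
    # After vowels and other voiced sounds it is realised as /z/ (dogs, days).
    return "/z/"

for word, final in [("bus", "s"), ("cat", "t"), ("dog", "g")]:
    print(word, plural_allomorph(final))
```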
Carstairs-McCarthy (2002:21) and Delahunty and Garvey (1994:97) support the definition of allomorph presented in Bauer (1983). However, although both Bauer (1983) and Carstairs-McCarthy (2002) agree that morphological units fall into two categories, namely free and bound, Bauer (1983) uses the term “morph” whereas Carstairs-McCarthy (2002) uses the term “morpheme”. A morpheme may be termed “free” when it can stand on its own in an appropriate context and constitute an utterance by itself, whereas “bound” morphemes cannot stand in isolation. For example, help in helpfulness can stand alone; -ful and -ness, by contrast, cannot.
Functional words are invariable and cannot be decomposed into smaller meaning-carrying units. Most functional words consist of a single morpheme. However, there are some exceptions such as throughout, nevertheless, even if and moreover. Lexical words, on the other hand, may consist of a single morpheme, but they are usually more complex than that.
A word can be formed with the constituents in the following diagram:
derivational prefix + root + derivational suffix + inflectional suffix
Figure 2 Word morphological structure
(adapted from Bauer, 1983:20)
According to the diagram above, the word untouchables can be analysed as in the following diagram:
un (derivational prefix) + touch (root) + able (derivational suffix) + s (inflectional suffix)
Figure 3 A sample of word morphological structure
(Bauer, 1983:20)
Bauer (1983:20) notes that the terms root, base and stem are used in the literature to refer to the part of the word that remains when all affixes have been removed. However, there have been attempts to redefine these terms. The definitions proposed by Lyons (1977:513) are summarised in Bauer (1983:20). A root is a form which is not further analysable, in terms of either derivational or inflectional morphology. A base is any form to which affixes of any kind can be added. This means that a derivationally analysable form to which derivational affixes are added can only be referred to as a base; the word part touchable, for example, is an analysable base. A stem is involved only when dealing with inflectional morphology. In this way, untouchable is a stem.
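The distinction between root and stem can be made concrete with a small sketch of the untouchables example. The data structure and function names are assumptions introduced for illustration; the segmentation is supplied by hand and is not computed.

```python
from dataclasses import dataclass

@dataclass
class Morph:
    form: str
    kind: str  # "root", "derivational" or "inflectional"

def root_of(morphs):
    # The root is what remains when all affixes are removed.
    return "".join(m.form for m in morphs if m.kind == "root")

def stem_of(morphs):
    # The stem is what remains when only inflectional affixes are removed.
    return "".join(m.form for m in morphs if m.kind != "inflectional")

untouchables = [
    Morph("un", "derivational"),
    Morph("touch", "root"),
    Morph("able", "derivational"),
    Morph("s", "inflectional"),
]

print(root_of(untouchables), stem_of(untouchables))
```

On these definitions, touchable is the base to which un- is added, and untouchable is both a base and the stem to which the inflectional -s is added.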
1.2.2 Inflection, derivation and compounding
In order to truly know how to use a word appropriately in English, a speaker needs to know more than simply the “meaning” of the word. In addition to fairly structured information, the lexicon also contains rules governing the three productive processes of English word formation. The following is a brief description, based on Biber et al. (1999:57), of the ways in which a word is formed through three main processes: inflection, derivation and compounding. The examples in the following sections are adapted from the description of these processes by Biber et al. (1999), supplemented with other considerations from the literature.
1.2.2.1 Inflection
According to Biber et al. (1999:57), inflection signals meaningful relationships similar to those expressed by function words.
E.g.: the child’s toys = the toys of the child
The role of inflection is limited in English compared with many other languages; relationships are more commonly expressed by function words or by word order. There are only eight productive inflectional affixes (suffixes) in English:
(Delahunty and Garvey, 1994:98)
In English, only one inflectional affix can be used on any word. Inflectional suffixes in English are highly productive; that is, they are used repeatedly: for example, a new noun will take the plural -s and the past form of a new verb will take -ed. They are, however, overworked, because one suffix often serves several purposes (e.g., -s marks both plural and 3rd person singular; -ed marks both past tense and some adjective endings), and they are potentially ambiguous to second language learners.
Other forms of inflection such as the following are not productive, and words entering the language will not take these forms:
(adapted from Barnard, 2005:530)
According to the eight types of inflectional suffixes above, the following parts of speech are marked by inflection:
Nouns: base (boy) – plural (boys) – genitive (boy’s/boys’)
Verbs: base (live, write) – third person singular present indicative (lives, writes) – past tense (lived, wrote) – past participle (lived, written) – ing-participle (living, writing)
Adjectives: base (dark) – comparative (darker) – superlative (darkest)
Adverbs: base (soon) – comparative (sooner) – superlative (soonest)
1.2.2.2 Derivation
Derivation is used to form new lexemes (Biber et al., 1999:57). In this process, derivational affixes, either prefixes or suffixes, are added to a base.
E.g.: Prefixes: ex-president, reread, unknown
Suffixes: boyhood, centralise, greenish, derivation
This process changes the meaning of the stem that is affixed. Some affixes retain the part of speech of the words to which they attach, whereas many others change the word class. For example, prefixes like un-, pre- and dis- (Finegan, 2004:52) change the meaning of words but not their word class. The prefix un-, for example, is added to an adjective to create an adjective with the opposite meaning; adjectives such as reliable, friendly, familiar and successful thus become unreliable, unfriendly, unfamiliar and unsuccessful. In other cases, a suffix such as -ation changes not only the meaning but also the part of speech of the stem. For instance, verbs such as concentrate, imitate, classify and form become the corresponding nouns concentration, imitation, classification and formation.
Words can be built up using a number of prefixes and suffixes for the same stem, and may become very complex
E.g.: pre-industr-ial, industr-ial-ise, industr-ial-is-ation
1.2.2.3 Compounding
In compounding, independently existing bases are combined to form new lexemes (Biber et al., 1999:58). The following are common compounding patterns:
Noun + noun: chairman, girlfriend
Adjective + noun: flatfish, Englishman
Verb + noun: playboy, washing machine
Noun + adjective: care-free, user-friendly
Noun + verb-er: baby sitter, screw-driver
Adjective/adverb + noun-ed: bow-legged, short-sighted
Directional particle + verb: overstate, underrate
In another approach to word formation, Celce-Murcia and Larsen-Freeman (1983:52) contend that words are formed by compounding, affixation and incorporation. Incorporation occurs when some element in the sentence becomes part of another element. In one kind of incorporation, a noun is incorporated into the verb to show that something is being added, taken away or used for doing something:
He put butter on his bread. He buttered his bread.
He poured water over the plants. He watered the plants.
According to Bauer (1983:32), morphology can be divided into two main branches: inflectional morphology and word-formation (or lexical morphology). In word-formation, there are two main types of process, namely derivation and compounding. Derivation can in turn be classified into class-maintaining and class-changing derivation. Class-maintaining derivation produces new lexemes of the same part of speech as their bases, whereas class-changing derivation produces lexemes which belong to a different part of speech from their bases. However, Bauer (1983:32) is doubtful about the status of incorporation (which he terms “conversion”) within word-formation, since some other linguists (Lyons, 1977, cited in Bauer, 1983:32) consider it a branch of derivation, also termed zero-derivation. He nevertheless regards conversion as a method of forming words, although he is cautious about the dispute regarding this term. Also, in contrast to the three categories of Biber et al. (1999), he combines “complex” formations (which cover forms produced both by compounding and by derivation) and “compound” formations, and classifies them in a single category, “complex, compound”, which seems to reflect a fuller view of the types of word formation.
In all, the classification of morphology and word formation suggested by Bauer (1983) is well-grounded and therefore will be adopted as the modalities of word formation in this study
1.2.2 The historical sources of English word formation
Carstairs-McCarthy (2002:100) provides an elaborate description of the historical sources of English word formation. English is a West Germanic language, related closely to the other West Germanic languages (Dutch, German, Frisian) and less closely to the North Germanic languages (Norwegian, Danish, Swedish, Icelandic). On the other hand, England was conquered by the Normans, which led to the use of French in law and administration for a long period. That is why English contains a high proportion of words borrowed from French. French is a Romance language descended from Latin, along with Portuguese, Spanish and Italian. Most words borrowed from French therefore come originally from Latin; however, Latin words have also entered English directly.
Furthermore, the Romans revered Greek culture: most classical Latin literature emulates Greek models, and the Romans created their own terms to translate Greek words. The Greek influence on English did not arise until Western Europeans began to learn about Greek culture in the fifteenth century, and since then the main influence of Greek has been its use in the invention of scientific and technical words.
A striking feature is that the inherited Germanic forms, such as heart and bear, are free, whereas in the forms borrowed from Latin, French or Greek the cognate roots are bound. This highlights an important morphological difference between inherited and borrowed words, although there are some exceptions to the rule: some borrowed roots are free and a few inherited ones are bound.
If, for example, a noun is borrowed from a source language that also distinguishes singular and plural inflectionally, then the foreign inflected plural form may be borrowed too.
E.g.: phenomenon (singular) – phenomena (plural)
cactus (singular) – cacti (plural)
1.2.3 Characteristics of Germanic and non-Germanic derivation
In the same book, Carstairs-McCarthy (2002:104) discusses the characteristics of Germanic and non-Germanic derivation. He concludes that native Germanic affixes generally attach to free bases, while affixes that attach to bound bases are generally borrowed.
The following is a list of some common derivational affixes, classified according to their origin. This classification will be useful in the detailed analysis of a corpus of ESP texts in the next chapters of the study.
(Carstairs-McCarthy, 2002:107)
1.3 Text analysis
1.3.1 Quantitative versus qualitative text analysis
Generally speaking, social scientists distinguish quantitative from qualitative text analysis in two ways (Roberts, 1997). On the one hand, quantitative analyses can be differentiated from qualitative analyses according to the level of measurement of the variables being analysed. On the other hand, social scientists also characterise their methods as quantitative or qualitative: whereas quantitative methods are more deductive, statistical and confirmatory, qualitative methods are more inductive, non-statistical and exploratory. This study employs elements of both, as set out in the following sections.
1.3.2 Corpus linguistics and corpus-based approach to text analysis
A corpus is defined in the Concise Oxford English Dictionary as a ‘body, collection of writings’. Aston and Burnard (1998:4, cited in Rayson, 2002:2) note that the second edition of the Oxford English Dictionary lists five distinct senses for the word, only two of which refer particularly to language. However, preliminary standards guidelines distinguish between the terms corpus and collection or archive, of which only corpus is related to some linguistic purpose (Sinclair, 1996, cited in Rayson, 2002:2).
In language study, investigations are made to compare the language of different texts or groups of texts used in different situations. The varieties of language that we use in different situations are referred to as registers, and describing the characteristics of these registers is an important area of study that provides some clues for characterising the language used in these varieties. For all such studies of language use, analysts attempt to reach goals such as (i) assessing the extent to which a pattern is found, and (ii) analysing the contextual factors that influence the variability. To achieve these goals, corpus linguistics has emerged, with empirical investigations of corpora which shed new light on the study of linguistics. In recent years, corpus-based studies have become common and text analysis tools have become increasingly accessible.
The term corpus linguistics has been described (McEnery and Wilson, 1996) in simple terms as the study of language based on examples of ‘real life’ language use. Corpus-based analyses are empirical investigations of language use.
Unlike syntax, semantics or pragmatics, corpus linguistics is not a branch of linguistics that concentrates on describing or explaining some aspect of language use. It is a methodology that can be applied to a wide range of linguistic study.
Corpus-based approach is a term used in corpus linguistics. Corpus-based methods are those which use corpora of texts, whether written or spoken, to provide genuine examples of language in use (Scott and Tribble, 2006:3). Frequency profiling is one of the two main methods in corpus linguistics, the other being the use of concordance lines. A set of concordance lines presents instances of a word or phrase, usually in the centre, with the words that come before and after it to the left and right. The following are the characteristics of a corpus-based analysis:
“corpus”, as the basis for analysis;
3 It makes extensive use of computers for analysis, using both automatic and interactive techniques;
(Biber et al., 1998:5)
Earlier studies using a corpus-based approach compared the frequency of a particular lexical item or syntactic structure, resulting in simple stylistic indicators of language use. However, if properly exploited, a corpus can provide many additional kinds of information about language use, such as association patterns. Linguistic analyses have traditionally focused on a particular linguistic feature, either a word or a grammatical construction; yet the use of such features can be further investigated by considering their systematic associations with other linguistic and non-linguistic features. The two main kinds of association, linguistic and non-linguistic, are as follows:
A Investigating the use of a linguistic feature (lexical or grammatical, etc.)
(i) Linguistic associations of the feature
- Lexical associations ( associations with particular words)
- Grammatical associations (associations with particular grammatical constructions)
(ii) Non-linguistic associations of the feature
- Distribution across registers
- Distribution across dialects
- Distribution across time periods
B Investigating varieties or texts (e.g., registers, dialects, historical periods)
(iii) Linguistic association patterns
- Individual linguistic features or classes of features
- Co-occurrence patterns of linguistic features
Table 3 Association patterns in language use
(Biber et al., 1998:6)
It is important to realise that linguistic and non-linguistic association patterns are not independent. For example, the words big, large and great are considered in terms of how many times they occur in the corpus and which words they often co-occur with in a text, but their combinability with other linguistic and non-linguistic features across registers, for example, is also considered.
The role of quantitative analysis is strongly emphasised in corpus-based studies (Biber et al., 1998:8). Starting from quantitative analyses, a crucial part of the corpus-based approach goes beyond the quantitative patterns to propose functional interpretations explaining why the patterns exist. As a result, a large amount of effort in corpus-based studies is devoted to explaining and exemplifying why the patterns exist. This represents the qualitative analysis in corpus-based studies. Quantitative analyses help us see “the extent of variation in texts and analyse the complex interactions among linguistic features” (Conrad, 1996:301), whereas qualitative interpretations guarantee that the communicative functions underlying linguistic features are appropriately understood.
Various studies have been carried out to give an in-depth understanding of how language is used. For example, some studies have addressed the difference between intuitive word use and actual patterns in authentic language (Altenberg, 1994; Kennedy, 1991; and Sinclair, 1991, cited in Conrad, 1996:301). Complex grammatical patterns of use have been studied in the light of corpus linguistics (Mair, 1990, on infinitival complement clauses; Meyer, 1992, on apposition; and Tottie, 1991, on negation, cited in Conrad, 1996). Most relevant for this study, corpus-based techniques have also been used to investigate patterns in language features across different types of texts (Biber, 1988, and Biber and Finegan, 1994, cited in Conrad, 1996). The methodologies of these studies - and in some cases, even specific findings - are directly applicable to understanding variation in academic discourse.
The corpus-based approach can also be applied to almost any area of linguistics, from morphology to lexicography to syntax. It is also notable that corpus-based studies are applicable to educational linguistics (Biber et al., 1998:12), where they help in designing effective materials and activities for the classroom. In this study, the corpus-based approach is applied to carry out an analysis of lexical and morphological features of a corpus of ESP texts used in the study of pharmacy.
Language use can also be studied through detailed analyses of specific linguistic features in particular texts, complementing findings from analyses of large corpora. Micro-analysis of small segments of conversation in conversation analysis, for example, can provide different perspectives on language use that are not covered by a corpus-based approach. This is the case of the methods applied to a lexical analysis of the English of electronics by Farrell (1990), or that of Chung and Nation (2003) on an ESP corpus of texts, which will be mentioned later in Chapter 3.
1.3.4 Tools for corpus-based analyses
There are various kinds of computational tools available for corpus-based analyses. In terms of accessibility, the two major kinds are commercially available packages and open-source packages. Commercial packages are widely available; however, their costs are sometimes prohibitive. Wordsmith Tools (Scott, 1998), for example, is a commercial computer program which provides tools for analysing word frequency and producing simple concordances. On the other hand, there is an increasing number of open-source tools for such analyses. Corpus linguistics researchers now have a similarly featured, yet open-source, alternative to Wordsmith, namely Corsis (Sert, 2007). Two other useful programs are the VocabProfilers, which can be found online and were created by a group of authors (Nation, 2006), and Simple Concordance Program (Reed, 1997-2008), which can be easily installed on a computer. Only the open-source software mentioned above is used in the data analysis of this study.
According to their functions, there are two types of tools. The first type comprises a wide range of ready-made, mainly generic tools, available either for free or for a small fee, which assist researchers in carrying out simple statistical research such as frequency counts and concordancing. The second type is usually specially designed by researchers or corpus linguists for a specific research purpose. For example, Conrad (1999) carried out a study on the use of linking adverbials across different registers. At first she used some automatic and interactive computer programs which identify and code adverbials and then provide initial analyses of each adverbial (e.g., semantic category, grammatical structure, clause position). When the interactive coding was complete, she wrote other computer programs and used statistical packages to compile frequency counts and analyse the patterns of association among the characteristics of the linking adverbials. The programs which Conrad wrote belong to the second type: they are personal programs which served only the purpose of her own study.
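The two basic operations offered by generic tools, frequency counts and concordancing, can be sketched in a few lines. This is a minimal illustration under stated assumptions (a letters-only tokeniser, a fixed context window counted in words); it does not reproduce the behaviour of any of the programs named above.

```python
import re
from collections import Counter

def tokenize(text):
    # Simplifying assumption: a token is any run of letters, lower-cased.
    return re.findall(r"[a-z]+", text.lower())

def frequency_list(tokens, top=None):
    # Word types ranked by number of occurrences, most frequent first.
    return Counter(tokens).most_common(top)

def concordance(tokens, node, width=3):
    # Key-word-in-context lines: (left context, node word, right context).
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

tokens = tokenize("The drug binds the receptor. The drug inhibits the enzyme.")
top_word = frequency_list(tokens, top=1)
kwic = concordance(tokens, "drug", width=2)
for left, node, right in kwic:
    print(f"{left:>15} | {node} | {right}")
```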
1.4 ESP texts
1.4.1 ESP texts and technical vocabulary
In the corpus-based approach to ESP, Nation (2001) describes vocabulary, for the purpose of designing the vocabulary component of a language course, as falling into four levels: high frequency words; academic words; technical words; and low frequency words. The following table summarises the percentage of each level in a typical academic corpus:
Levels of vocabulary Percentage in an academic corpus
(approx.)
Table 4 Percentage of each vocabulary level in academic language courses
(Adapted from Chung and Nation, 2003:104)
According to Chung and Nation (2003:104), high frequency words cover around 80% of the running words of academic texts and newspapers, and around 90% of conversation and novels. Academic words cover, on average, 8.5% of academic text, 4% of newspapers, and less than 2% of the running words of novels. This vocabulary is common to a wide range of academic fields, but it is not high frequency vocabulary; nor is it technical, since it is not typically associated with just one field, although it is more closely related to high frequency vocabulary than to technical vocabulary. Technical words cover about 5% of the running words in specialised texts; they are words that occur frequently in a specialised text or subject area but do not occur, or are of very low frequency, in other fields. The fourth level of vocabulary consists of all the remaining words of English: the low frequency words, which cover around 5% of the running words in texts.
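Coverage figures of this kind are obtained by assigning every running word to one level and expressing each level's count as a percentage of all tokens. The sketch below assumes tiny, hypothetical word lists and a made-up sentence purely for illustration; its output bears no relation to the figures reported by Chung and Nation (2003).

```python
def coverage_by_level(tokens, word_lists):
    # word_lists maps a level name to a set of words; the first matching
    # level wins, and unmatched tokens count as low frequency.
    counts = {level: 0 for level in word_lists}
    counts["low frequency"] = 0
    for tok in tokens:
        for level, words in word_lists.items():
            if tok in words:
                counts[level] += 1
                break
        else:
            counts["low frequency"] += 1
    total = len(tokens)
    if total == 0:
        return {level: 0.0 for level in counts}
    return {level: 100 * c / total for level, c in counts.items()}

# Hypothetical word lists (assumptions, not real frequency lists).
word_lists = {
    "high frequency": {"the", "of", "shows", "that", "drug"},
    "academic": {"analysis", "inhibits"},
    "technical": {"aspirin"},
}
tokens = "the analysis of the drug shows that aspirin inhibits cyclooxygenase".split()
coverage = coverage_by_level(tokens, word_lists)
print(coverage)
```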
There has been research into the nature and coverage of high frequency and academic words; however, there has been little investigation of technical vocabulary and low frequency words (Chung and Nation, 2003:104). One of the reasons for this is that there has been little agreement about what technical vocabulary is and about how to count it reliably. Chung and Nation (2003) have critiqued the study by Nation (2001), stating that it lacked a reliable method of classifying words at the time of the research and therefore has limitations in the way the data was justified. In their own study, Chung and Nation (2003) used a rating scale, as mentioned above, to create four classes of vocabulary with two types of text. They conclude that technical vocabulary makes up a very large proportion of running text. To ensure a clear understanding of the findings, the term technical vocabulary is hereinafter used to refer to both technical words and low frequency words, to distinguish it from non-technical vocabulary, which refers to both high frequency words (the first 2,000 most frequent words) and academic words.
1.4.3 Corpus-based approach and analysis tools in ESP
Chung and Nation (2004) compare different methods for researching technical vocabulary. The following summarises their basic arguments about the methods usable in this area. They note that there are generally four approaches to the identification of technical vocabulary in ESP texts, namely using a rating scale, using a dictionary, using clues provided in the texts and using a computer-based approach:
(i) Using a rating scale
Chung and Nation (2004:253) state that a rating scale is used to decide whether or not the individual meanings of words are specialised. It depends on the ability of researchers to draw on their own domain knowledge and to make inferences from domain information within the context in order to make the right decisions on the individual meanings of words. A careful design of the rating scale is desirable to ensure that words are classified based on intuition. An inter-rater reliability check should be carried out to guarantee the reliability of the four-point rating scale. Inter-rater reliability is used to estimate whether there is reasonable agreement among different raters on which level of the scale a lexical item falls. Training raters on the same research materials, where possible, can help to check whether the rating works efficiently.
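One way to quantify agreement between two raters on a four-point scale is sketched below. The choice of statistics (simple percent agreement and Cohen's kappa) is an assumption made for illustration, since the text above does not specify which measure is used, and the ratings are invented.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    # Proportion of items placed on the same scale point by both raters.
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    # Agreement corrected for the agreement expected by chance.
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Invented ratings of eight words on a four-point scale (1-4).
rater_a = [1, 2, 3, 4, 1, 2, 3, 4]
rater_b = [1, 2, 3, 4, 1, 2, 3, 3]
agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(agreement, round(kappa, 3))
```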
(ii) Using a technical dictionary
One way to decide the meaning of a word in the technical corpus is to use a technical dictionary. There are numerous technical dictionaries, and most established specialist fields have at least one. The criterion for deciding whether or not a word is a technical word is whether it occurs in a technical dictionary for that specialist area. Since technical words are supposed to occur in a technical dictionary, this step requires an excellent technical dictionary. The largest dictionary is not necessarily the best, since it may also contain non-technical terms.
(iii) Using clues provided in the text
Writers of specialised texts sometimes signal in the text that a word is a technical term by explicitly providing a definition for the word and marking it with bolding or italics. Sometimes this is done through the use of a synonym in brackets, which helps readers understand the term, for example:
The long aliphatic chain portion of the molecule, however is a nonpolar hydrocarbon and is therefore lipophilic (fat loving)
Another signal that words are likely to be technical terms is their use as labels in diagrams. However, as stated in Chung and Nation (2004:257), it is not easy to find clues in a text. Firstly, definitions can take a variety of forms, and semi-formal definitions are not always easy to recognise. Secondly, signals such as brackets and marking can have other functions in addition to indicating that a word is a technical term. Thirdly, not all labels on diagrams are technical terms, since function words like the, of and a can serve as parts of labels.
(iv) Using a computer-based approach
Computer scientists are constantly developing new computer software in order to obtain more accurate results. The process is called automatic term extraction (Heid, 1998/1999; Pazienza, 1998/1999, cited in Chung and Nation, 2004:258), or computer-assisted term acquisition (Gamper and Stock, 1998/1999, cited in Chung and Nation, 2004:258). Typically, term extraction software has used two different approaches: statistical and linguistic (Biber et al., 1998:6). Statistical approaches basically compare the number of occurrences of a word in a technical corpus with the number of occurrences in a comparison corpus. These comparisons are based on the fact that the frequency and range of word forms are different in different types of text. Simple formulas use the difference between the raw frequency and range of word forms in specialised texts and in general language texts.
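The statistical comparison described above can be sketched as a ratio of relative frequencies in the two corpora. The ratio measure, the add-one smoothing for words absent from the comparison corpus and the threshold value are all assumptions chosen for illustration; they are not the formula of any particular term extraction program, and the two "corpora" are toy sentences.

```python
from collections import Counter

def candidate_terms(technical_tokens, comparison_tokens, min_ratio=5.0):
    # Flags words whose relative frequency in the technical corpus is at
    # least min_ratio times their relative frequency in the comparison corpus.
    tech = Counter(technical_tokens)
    comp = Counter(comparison_tokens)
    n_tech, n_comp = len(technical_tokens), len(comparison_tokens)
    result = {}
    for word, count in tech.items():
        tech_rate = count / n_tech
        # Add-one smoothing (assumption) avoids division by zero for words
        # that never occur in the comparison corpus.
        comp_rate = (comp[word] + 1) / (n_comp + 1)
        ratio = tech_rate / comp_rate
        if ratio >= min_ratio:
            result[word] = ratio
    return result

technical = "the drug binds the receptor and the drug inhibits the enzyme".split()
comparison = "the cat sat on the mat and the dog sat on the rug".split()
terms = candidate_terms(technical, comparison, min_ratio=2.0)
print(terms)
```

With corpora this small the smoothing dominates the result, so the sketch shows only the shape of the calculation; a real comparison would use large corpora and would also take range across texts into account.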