Personal Pronoun We and Other Key Items in Non-Native English Learners Academic Writing: A Corpus-Driven Study

A DISSERTATION FOR THE DOCTOR OF PHILOSOPHY
DEPARTMENT OF ENGLISH, TAMKANG UNIVERSITY

ADVISOR: DR. AI-LING WANG (王藹玲)

Personal Pronoun We and Other Key Items in Non-Native English Learners Academic Writing: A Corpus-Driven Study
(Chinese title: Personal Pronoun Use by Non-Native English Learners: A Corpus Study)

Graduate Student: TRINH NGOC THANH (鄭玉成)

January 2019 (Republic of China year 108)

Acknowledgement

I would like to dedicate the completion of this doctoral dissertation to Professor Ai-Ling Wang, my advisor, for her excellence and patience in supervision. Her comments during the process of writing and revising were truly the source of the richness in academic content and value of the dissertation itself. I also want to express my thankfulness to Professor Lilie Tsay, Professor Wilson Lin, Professor Stella Hsu, and Professor Tzu-Shan Chang, the other members of the thesis committee, for their helpful feedback on the day of the defense.

Doctoral Dissertation Abstract (translated from the Chinese)

Title of thesis: Personal Pronoun Use by Non-Native English Learners: A Corpus Study
Total pages: 119
Name of institute: Tamkang University, Department of English, Doctoral Program, Division B (TESOL)
Graduation term: Academic year 107
Graduate student: 鄭玉成
Advisor: Dr. 王藹玲

Abstract: This study applies a corpus-driven approach mainly to analyze the academic writing of non-native English learners. Following the methodological notion of triangulation, the study first adopts a mixed methods research design to study the specified linguistic objects. Second, the research proposes linguistic patterns through quantitative analysis and then explains the quantitative results with qualitative analysis. Third, in combining quantitative results with qualitative interpretation, the mixed methods design aims to increase the reliability of the research results. The focus of the study covers the use of the first person pronoun we and other lexical items. The academic writing texts come from ICNALE SW 1.1 (2015), from which the author extracted Chinese learners, other non-native English learners (including Thai, Japanese, and Korean learners), and a group of native English speakers. The scope of the research is divided into two parts: (1) in the Chinese learner corpus, the analysis of the overuse of we and of we-sentences; (2) in the multiple learner corpora, the observation of personal pronouns (including first, second, and third person pronouns) and first person pronoun (I and we) n-grams. The significance of the study lies in proposing that frequency-based analysis is an important element of the corpus-driven approach.

Keywords: corpus-driven; personal pronoun; word frequency; mixed methods; academic writing

Doctoral Dissertation Abstract (English)

Title of thesis: Personal Pronoun We and Other Key Items in Non-Native English Learners Academic Writing: A Corpus-Driven Study
Total pages: 119
Key words: corpus-driven; personal pronoun; word frequency; mixed methods; academic writing
Name of institute: Tamkang University, Department of English, Division B (TESOL)
Graduate date: January 2019
Degree conferred: Doctor of Philosophy
Name of student: TRINH NGOC THANH (鄭玉成)
Advisor: DR. AI-LING WANG (王藹玲)

Abstract: The present study implements a corpus-driven approach to analyze non-native English learners' academic writing. On the notion of triangulation, this study uses a mixed methods research design to study the specified linguistic objects. In particular, the research first goes through quantitative analysis to propose a linguistic model before explaining the quantitative results in combination with qualitative analysis. In combining quantitative results and qualitative interpretation, the mixed methods research design aims to increase the reliability of the research results. The focus of this study includes the first person plural pronoun we and other key items. The academic writing text data is from ICNALE SW 1.1 (2015), from which the author extracted the written modules of Chinese learners, other non-native English learners (including Thai, Japanese, and Korean learners), and a group of native English speakers. The scope of investigation is divided into two sections: (i) the analysis of the excessive use of we and the we-clause by Chinese learners and (ii) the analysis of various personal pronouns (including first, second, and third person pronouns) and first person pronoun (I and we) n-grams from multiple learner corpora. The research findings draw on frequency-based analysis to suggest important elements for the corpus-driven approach.

TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
1.1 Research Purposes
1.2 Research Background
1.3 Statement of the Problems
1.4 Research Summary
1.5 Research Contributions
CHAPTER 2: LITERATURE REVIEW
2.1 Research Paradigms in Corpus Linguistics
2.2 Research Implementations of Corpus-Driven Approach
2.3 Empirical Corpus Findings on Person Pronouns
CHAPTER 3: RESEARCH METHODOLOGY
3.1 Review of Methods of Investigation
3.2 Review of Research Issues
3.3 Research Procedures
3.4 Pilot study
CHAPTER 4: FINDINGS
4.1 The Overuse of We in Chinese Learner’s English Essay Corpus
4.2 The Distribution of We-Clause in Chinese Learner’s English Essay Corpus
4.3 Person Pronouns from Multiple Corpora Comparison
4.4 Exploring I and We Tri-Grams from Multiple Corpora Sources
CHAPTER 5: DISCUSSION
5.1 Patterns of Pronoun Usage
5.2 Patterns of Lexical Items Usage
5.3 Patterns of n-grams
CHAPTER 6: CONCLUSION
6.1 Efficiency of Frequency-Based Findings
6.2 Limitations of the Study
6.3 Direction for Future Studies
REFERENCES
APPENDIX

LIST OF FIGURES

Figure 1: Methodological triangulation for frequency data (form & presentation) and data analysis
Figure 2: Top 20 high-frequency word list of the pilot study
Figure 3: N-gram of we
Figure 4: Pattern of we
Figure 5: Top 50 high-frequency word list in Study 4.1
Figure 6: The scatter plot presentation for areas of frequency of we in Study 4.1
Figure 7: First, second, and third singular and plural pronouns in Study 4.3
Figure 8: The scatter plot presentation on pronoun usage of CHN learners in Study 4.1
Figure 9: The scatter plot presentation of key items in top 50 high-frequency word list in Study 4.1
Figure 10: Extracted n-grams data of money and society in Study 4.1
Figure 11: Frequent tri-grams of I, you, and they from THA and ENS learners in Study 4.3
Figure 12: Concordance data of I think it on the writing topic about smoking in Study 4.4
Figure 13: Concordance data of I think that and I agree with from CHN learners in Study 4.4
Figure 14: Concordance data of We can get and We can learn from CHN learners in Study 4.4
Figure 15: Concordance data of We can get and We can learn from JPN learners in Study 4.4

LIST OF TABLES

Table 1: An outline of corpus-driven studies in the dissertation
Table 2: Frequency data of writing samples in the pilot study
Table 3: Normalized frequency data of we and part-time job(s)
Table 4: x contingency table of we-clause and its indexes
Table 5: 2x2 contingency table on observed frequency of we-clause and its index
Table 6: The descriptive value distribution for frequency of we in Study 4.1
Table 7: The z-score distribution for frequency of we in Study 4.1 after outliers excluded
Table 8: One-sample KS tests examining normalized frequency of we into areas of word count level
Table 9: Distribution of frequency of we into four word count levels
Table 10: 2x2 contingency table on expected frequency (left) and Pearson residuals (right)
Table 11: One sample KS test and Mann-Whitney U tests on testing the mean difference of we-clause
Table 12: Corpus data of CHN, JPN, KOR, THA, and ENS learners in Study 4.3
Table 13: Two predictive models for the difference in pronoun usage between THA and ENS learners
Table 14: Corpus data of CHN, JPN, and THA learners in Study 4.4
Table 15: I tri-gram data from corpus data of CHN, JPN, and THA learners
Table 16: We tri-gram data from corpus data of CHN, JPN, and THA learners

CHAPTER 1: INTRODUCTION

1.1 Research Purposes

The general aim of the dissertation is to examine the role of frequency data in the inductive analysis of the corpus-driven approach. In particular, the present dissertation deals with the practical concern of how corpus-driven analysis can be applied to uncover linguistic patterns emerging from frequency data and to achieve reliability in data interpretation based on frequency distribution and recurrent linguistic patterns. On this consideration, the theoretical question guiding the research in this dissertation is: To what extent can corpus-driven analysis reveal linguistic patterns and enhance the reliability of
data interpretation?

Taking the stated theoretical question as guidance, the dissertation addresses three theoretical concerns regarding the corpus-driven approach to text data analysis. First, frequency data is viewed as the primary concern in the corpus-driven approach (Biber, 2009); however, reliance on frequency-based criteria could exclude meaningful corpus evidence when corpus-driven linguists set a minimum frequency as a filter for data extraction (Xiao, 2009). Second, there is the extent to which pre-existing theories are involved in inductive corpus-driven analysis, especially given the claim that the corpus-driven extreme overstates its independence from preconceived theories and its rejection of intuition (Xiao, 2009). Third, there is the effectiveness of combining the qualitative and quantitative paradigms in corpus linguistics research. For instance, when there are no clues about the context of the writing samples, corpus linguists can proceed with quantitative analysis as a bottom-up approach to discover textual patterns in the compiled writing samples; however, without a follow-up qualitative analysis, further insights into the written discourse and genres cannot be retrieved (Handford, 2010).

The major emphasis on personal pronoun usage in this dissertation is reserved for the case of we and the phenomenon of its overuse. Apart from implementing the corpus-driven approach to investigate the overuse of we, other key items (including other types of personal pronouns and lexical items) are also selected to explore corpus data from student academic writing. As a first step in setting up the theoretical platform of the dissertation, the next sections briefly review the research background and the research problems. While section 1.2 on research background covers the
notion of corpus representativeness and the interaction between writers and the writing topic, section 1.3 on the statement of the problems identifies non-native English learners as the research group and the overuse of we as the research emphasis for the studies in the dissertation.

1.2 Research Background

a. Corpus Representativeness

This section builds the research background on the notion of corpus representativeness. Corpus representativeness is defined from both the traditional and the modern views, in which the main concern is how the selection and the size of writing samples can assist the investigation of distinctive features in corpus data.

The traditional view of corpus representativeness

The concept of corpus representativeness traditionally refers to the extent to which the corpus can cover the variability of language produced by a certain group or by different groups of a population. The selected samples in the corpus, therefore, should demonstrate two dimensions: (a) the balance between the genres featured in the text types of the collected samples and (b) the characteristics of the population in the corpus.

Regarding the balance between the genres featured in the text types of the collected samples, Biber (1993) proposed two general criteria for evaluating corpus representativeness: (a) the range of text types in the language and (b) the range of linguistic distribution in the language. In terms of evaluation, the former criterion involves the variability of the sources (e.g., newspapers, fiction, novels, academic writing, etc.)
where the text data is sampled, while the latter is determined by the distribution of linguistic features at the internal level (i.e., within the text itself) and the external level (i.e., across the text itself and across text types).

Meanwhile, focusing on the population to define how representative the corpus is, Atkins, Clear, and Ostler (1992) discussed generalizations about language and sampling method as two key concerns in maintaining the specialization of the collected writing samples. Generalization about language concerns the extent to which the results obtained from the constructed samples represent the whole population with respect to internal and external criteria. According to Atkins et al. (1992), an inclusion of internal criteria filled with linguistic features excludes the relationship between language and context, while non-linguistic features from the external criteria (e.g., social, situational, or extra-linguistic) would not yield any liability to variation of the texts themselves but would motivate attention to contextual factors when analyzing textual features in general.

Regarding sampling method, Atkins et al. (1992) distinguished receptive language (i.e., language that is heard and read) from productive language (i.e., language that is spoken and written) as defining the connection between the sample and the population. In further explaining the receptive and productive aspects, they make a point about the proportion of language users in the community, in the sense that one language activity (e.g., writing) may be rare in a large population; moreover, the forms (e.g., email, business correspondence, private conversations, etc.) in which the samples are collected also help determine whether including these samples is a valuable contribution to representing the distinctive features of the corpus.

The modern view of corpus
representativeness

Corpus representativeness from a traditional viewpoint focuses mainly on avoiding Chomsky’s (1957) criticism that a corpus is only a snapshot of a population and thus lacks representativeness, because certain forms of natural language from the population are not covered by the samples collected in the corpus. To this end, a representative corpus should have three characteristics: first, the corpus data should be selected from an appropriate sampling frame of the available text genres and forms; second, the corpus should clearly define the level and the context of the population; and third, the corpus should index appropriate text lengths and variation of categories (McEnery & Wilson, 1996).

Regarding the relationship between human language intuitions and corpus representativeness, the first consideration of the modern view is the problem of measuring the representativeness of a corpus, since there is no standard for measuring population and size. On the assumption that humans acquire language in a constant stream between language production and reproduction, Rapp (2014) proposed that corpus representativeness can be measured by the following statistics: (a) word frequency (i.e., the raw frequency of a single word), (b) word co-occurrence (i.e., corpus statistics concerning the association of one single word with the appearance of other single words), and (c) common contexts of words (i.e., corpus statistics ...
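The first two statistics attributed to Rapp (2014), raw word frequency and word co-occurrence, can be illustrated with a minimal Python sketch. The sample sentences, the function names, and the two-token co-occurrence window are assumptions made purely for illustration; they are not taken from the dissertation, from Rapp (2014), or from ICNALE.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency(texts):
    """Raw frequency of each single word across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cooccurrence(texts, window=2):
    """Count unordered word pairs found within `window` tokens of each other.

    The window size is an illustrative assumption, not a value from the source.
    """
    pairs = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


# Invented sample sentences (hypothetical, not corpus data).
sample = [
    "We can get money from a part-time job.",
    "We can learn many things from society.",
]

freq = word_frequency(sample)
pairs = cooccurrence(sample, window=2)

print(freq["we"])              # raw frequency of "we"
print(pairs[("can", "we")])    # how often "we" and "can" occur close together
```

Pairs are stored in alphabetical order so that "we ... can" and "can ... we" are counted as the same association; a study interested in word order would keep the pairs ordered instead.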

Title	Personal Pronoun We and Other Key Items in Non-Native English Learners Academic Writing: A Corpus-Driven Study
Author	Trinh Ngoc Thanh
Advisor	Dr. Ai-Ling Wang
Institution	Tamkang University
Major	English
Type	dissertation
Year of publication	2019
City	Taipei

Format
Number of pages	126
File size	5.74 MB