Characterization of Vietnamese intonation for questions
MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF TECHNOLOGY

THESIS FOR THE DEGREE OF MASTER OF SCIENCE

CHARACTERIZATION OF VIETNAMESE INTONATION FOR QUESTIONS

NINH KHÁNH DUY

Supervisor: Dr ERIC CASTELLI

HA NOI 2005

Acknowledgments

Firstly, I would like to express my gratitude to my supervisor, Dr Eric Castelli, whose expertise, understanding, patience and constructively critical eye added considerably to my graduate experience.

Special thanks go to Dr Nguyen Trong Giang and Dr Pham Thi Ngoc Yen for providing me with the most favorable conditions during my time working at the International Research Center MICA.

I would like to thank the PhD students Nguyen Viet Tung, Tran Do Dat, Vu Minh Quang and Le Xuan Hung, who helped me a lot in finishing the thesis.

I would also like to thank my family, especially my parents, for the support they have given me throughout my life; without their care and encouragement I would not have finished this thesis.

Finally, thanks go to all of my colleagues who helped me while I worked on this thesis.

Table of contents

Acknowledgments
List of Figures
List of Tables
Chapter 1 INTRODUCTION
Chapter 2 SPEECH PRODUCTION PROCESS
  2.1 Introduction
  2.2 Sound
  2.3 Speech production
    2.3.1 Articulators
    2.3.2 The voicing mechanism
Chapter 3 AN OVERVIEW OF PROSODY
  3.1 The concepts of prosody and intonation
  3.2 Levels of representation of prosodic phenomena
  3.3 The functions of prosody
  3.4 Applications of intonation
Chapter 4 PROSODY IN VIETNAMESE
  4.1 General characteristics of Vietnamese language
    4.1.1 Phoneme system
    4.1.2 Syllable structure
    4.1.3 The tonal system
    4.1.4 Tones in context
    4.1.5 Modality, attitude and morphosyntactic structures
  4.2 Some studies on Vietnamese prosody
Chapter 5 FUNDAMENTAL FREQUENCY DETECTION
  5.1 Introduction
  5.2 Some pitch detection algorithms
    5.2.1 The autocorrelation method
    5.2.2 The average magnitude difference function method
    5.2.3 The simple inverse filtering tracking method
    5.2.4 The cepstrum-based method
  5.3 The Praat pitch tracker
    5.3.1 Introduction
    5.3.2 Windowing and sampling problems
    5.3.3 Evaluation
Chapter 6 EXPERIMENTAL INTONATION ANALYSIS
  6.1 Objective
  6.2 Speech corpus
  6.3 Hypotheses
  6.4 Experiments
    6.4.1 First experiment
    6.4.2 Second experiment
    6.4.3 Third experiment
Chapter 7 CONCLUSION AND PERSPECTIVES
References
Appendix
  A List of questions in the corpus
  B List of statements in the corpus

List of Figures

Figure 2.1 The underlying determinants of speech generation and understanding. The gray boxes indicate the corresponding computer system components for spoken language processing [1]
Figure 2.2 Application of sound energy causes alternating compression/rarefaction of air molecules, described by a sine wave [1]
Figure 2.3 A schematic diagram of the human speech production apparatus
Figure 2.4 Schematic representation of the complete physiological mechanism of speech production [2]
Figure 2.5 A section of the waveform of the utterance “sa”: the unvoiced sound “s” in the first part and the voiced sound “a” in the second part
Figure 2.6 Vocal fold cycling at the larynx: (a) closed with sub-glottal pressure buildup; (b) trans-glottal pressure differential causing folds to blow apart; (c) pressure equalization and tissue elasticity forcing temporary reclosure of vocal folds, ready to begin next cycle [1]
Figure 2.7 Glottal airflow and the resulting sound pressure at the mouth [2]
Figure 4.1 Example of the contours of six tones (female subject PNY), as described in [7]
Figure 4.2 F0 variations of typical pairs of sentences in [9]
Figure 5.1 Autocorrelation function for (a) and (b) voiced speech, and (c) unvoiced speech [10]
Figure 5.2 Example of waveforms and correlation function: (a) no clipping, (b) center clipped [10]
Figure 5.3 AMDF function for same speech segments as in Figure 5.1 [10]
Figure 5.4 Block diagram of the SIFT algorithm [10]
Figure 5.5 Cepstrum of an example segment of: (a) voiced speech, (b) unvoiced speech
Figure 5.6 Windowing a signal and estimating the ACF of a signal segment from the ACF of its windowed version [15]
Figure 5.7 Some F0 points are detected in the unvoiced consonant “kh” of the word “không” (female subject HT)
Figure 5.8 Pitch halving errors in the middle of the word “trà” (female subject LH)
Figure 5.9 Some F0 points are missed in the voiced consonant “b” of the word “biết” (female subject LH)
Figure 5.10 Some F0 points are missed in the middle of the word “rõ” (female subject VL)
Figure 5.11 Some F0 points are missed at the end of the word “vậy” (male subject VN)
Figure 6.1 Speech waveform (in the background) and F0 contour (blue dotted line) of the utterance “Bây anh đâu?” (male subject ND). The final syllable “đâu” is bounded by two vertical lines
Figure 6.2 F0 contour (blue dotted line) and proposed intonation contour (red dotted line) of the utterance “Hiện anh làm việc đâu?” (male subject VN)
Figure 6.3 The intonation contour (red dotted line) of the statement “Bà làm giáo viên.” (male subject VN)
Figure 6.4 The intonation contour (red dotted line) of the question “Bà có nhìn rõ không?” (male subject VN)
Figure 6.5 F0 level of all speakers for questions (Q) and statements (S)
Figure 6.6 Time waveform (top), F0 contour (middle) and the position of representative points

List of Tables

Table 3.1 Links between levels of representation of prosodic phenomena [3]
Table 3.2 Information conveyed by prosody, ‘*’ marking feature discussed in this study [4]
Table 4.1 Vietnamese vowels
Table 4.2 Vietnamese consonants
Table 4.3 Arrangement of Vietnamese consonants
Table 4.4 The phonological hierarchy of Vietnamese syllables with total numbers of each phonetic unit [6]
Table 4.5 The six Vietnamese tones
Table 5.1 Praat PDA evaluation for male speech and female speech
Table 6.1 Speakers’ information
Table 6.2 Statistics on F0 level of all speakers for questions (Q) and statements (S) including: mean, minimum (min), maximum (max) and standard deviation (std)
Table 6.3 Representative values of “ngang” tone in final position of questions and statements for all speakers

Chapter 1 INTRODUCTION

Vocal technologies are important and strategic in the development of information technology. The increasing demand for speech applications in man-machine communication, in areas ranging from telephony, telematics and automated translation to aids for the handicapped, requires sophisticated technology for the recognition and synthesis of speech. However, to build automatic speech synthesis or speech recognition modules for a given language, it is essential to know the characteristics of that language thoroughly, particularly in terms of phonetics and phonology.

Since intonation forms such a central part of human speech communication, conveying not only diverse linguistic information but also information about the speaker and the speaker’s mood and attitude, it ought to be useful in the applications mentioned above. In the field of speech recognition, the more the task develops from the recognition of single words in a limited vocabulary towards the understanding of complex utterances, the more suprasegmental features like intonation have to be taken into account. These are important cues for the segmentation and classification (question vs statement, for instance) of utterances. In speech synthesis, modeling intonational features is indispensable for increasing the intelligibility and naturalness of synthetic speech. This is the reason I chose to study the characteristics of Vietnamese intonation in questions.

The thesis is organized as follows. Chapter 2 gives a brief review of the human speech production system and an introduction to some related fundamental concepts. An overview of prosody, which includes intonation, is presented in Chapter 3. Chapter 4 describes the
general characteristics of the Vietnamese language and some studies on Vietnamese prosody. Fundamental frequency, the acoustic correlate of intonation, and the problem of its estimation are discussed in Chapter 5. Chapter 6 presents the experiments carried out in this work and the results obtained. Finally, the conclusion and the perspectives of the study are given in Chapter 7.

... syllable is concerned. To represent the shape of this F0 contour, the F0 values at five evenly spaced points in time (named begin, mid1, mid2, mid3 and end), including the beginning and ending points of the voiced section, were stored as representative F0 values. An example is shown in Figure 6.6.

Figure 6.6 Time waveform (top), F0 contour (middle) and the position of representative points

These representative F0 values were averaged over the questions of a given speaker according to their positions in the final syllable, i.e. begin, mid1, mid2 and so on. The mean values were then used to represent the “ngang” tone in the final position of questions for that speaker. A similar procedure was carried out for the statements. The resulting representative values of the “ngang” tone in the final position of questions and statements for all speakers are shown in Table 6.3.

Speaker  Type  begin  mid1  mid2  mid3  end
VL       Q     216    210   205   206   213
VL       S     221    221   219   209   205
LH       Q     257    265   270   272   273
LH       S     239    260   263   256   244
ND       Q     149    149   157   158   156
ND       S     128    130   132   131   123
VN       Q     182    188   191   193   183
VN       S     175    186   188   184   174
AB       Q     138    133   133   134   133
AB       S     130    135   136   132   128
DN       Q     243    241   241   234   222
DN       S     245    243   243   235   227
HT       Q     232    231   236   237   244
HT       S     228    230   230   225   220
NA       Q     139    138   141   146   154
NA       S     130    137   141   140   124
PT       Q     235    239   241   246   249
PT       S     225    229   232   230   219
PH       Q     247    246   248   248   243
PH       S     234    246   246   240   235

Table 6.3 Representative values (F0 in Hz) of “ngang” tone in final position of questions and statements for all speakers

From the above table, the representative contours of the final “ngang” tone in questions and statements for all speakers were plotted in the figures below. In each figure, Q stands for question, S for statement, B for begin, M1 for mid1, M2 for mid2, M3 for mid3 and E for end.

[Figures: F0 (Hz) at the points B, M1, M2, M3 and E for questions (Q) and statements (S), one panel per speaker. Left-hand panels: LH, ND, NA, HT and PT. Right-hand panels: VL, VN, DN, AB and PH.]

6.4.3.4 Discussion and conclusion

From the above figures, it can be seen that the F0 contour of the final “ngang” tone in questions differs from that in statements for all speakers, with the sole exception of the female speaker DN (in this case, the two contours are nearly the same). The difference is realized in two aspects: the shape of the F0 contour and/or the register, i.e. the F0 level.

The clearest differences are found for the speakers LH, ND, HT, NA and PT (their figures are the left-hand panels above). With these speakers, the difference between question and statement is realized in both the contour and the register: the contour is rising in questions and falling in statements, and the register of the final “ngang” tone is higher in questions than in statements. In my auditory perception, these speakers utter questions and statements with highly different intonations. This perceptual difference is consistent with the difference in acoustic realization analyzed above.

With the speakers VL, VN, AB and PH, the difference between question and statement is either in the contour or in the register. In my auditory perception, these speakers
utter questions and statements with only slightly different intonations.

For the speaker DN in particular, the F0 contours in questions and statements are nearly the same. In my auditory perception, this speaker utters questions and statements with no difference in intonation.

From the discussion above, it can be said that there is a correlation between the perceived intonation difference and the acoustic difference realized in the F0 contour of the final “ngang” tone: the greater the perceived intonation difference, the greater the acoustic difference. In the favorable case where a speaker utters questions and statements with highly different intonations, the F0 contour of the final “ngang” tone can be used to distinguish question from statement: if it has a rising form with a high register, the sentence is likely a question; if it has a falling form with a low register, the sentence is likely a statement.

More generally, intonation influences the tone in the final position of the sentence, and the resulting tonal variants can be used to characterize the sentence type. The results are consistent with those of a similar study on spontaneous speech [16].

Chapter 7 CONCLUSION AND PERSPECTIVES

This thesis aims to characterize Vietnamese intonation in questions. To facilitate the study, questions were compared with statements to see whether there is any difference in intonation between the two sentence types.

Based on the experiments carried out, I conclude that questions and statements are differentiated by the way intonation modifies the F0 contour of the tone in the final position of a sentence. The F0 contour of the final-syllable tone is raised in questions, whereas it is lowered in statements.

The discussion in section 6.4.3.4 also suggests that a perceptual test should be run on a read-speech corpus before intonation analysis. If a speaker reads questions and statements with only slight intonation differences, it will be difficult to find acoustic differences.

The study also has remaining problems. The representation of the intonation contour proposed in section 6.4.1.2 is based largely on a signal-processing point of view. It should instead be based on a quantitative model that is physiologically and/or perceptually motivated, for instance Fujisaki’s model. For a tonal language such as Vietnamese, understanding tone and intonation in speech requires identifying their functional components. Separating the functional components of tone and intonation from observed F0 patterns remains a future challenge.

In the second experiment, sentences with different tonal configurations were compared, which makes the result less meaningful. The comparison of F0 level should be made for statement-question pairs whose tonal configurations are (nearly) the same. Another solution is proposed in [16], which used the “ngang” tone as a reference and examined the height of F0 at different positions of the sentence, i.e. initial, non-final and final.

In the third experiment, only the “ngang” tone was investigated. The remaining tones should also be examined so that the general conclusion about the influence of intonation on the final-syllable tone is better supported.

Finally, the study was based on read speech, which more or less causes a loss of intonation information. A study on spontaneous speech will be carried out in the future.

References

[1] Huang X., Acero A., et al. (2001), Spoken language processing: A guide to theory, algorithm and system development, Prentice Hall PTR.
[2] Rabiner L. and Juang B.H. (1993), Fundamentals of speech recognition, Prentice Hall.
[3] Dutoit T. (1997), An Introduction to Text-to-Speech Synthesis, Springer.
[4] Mixdorff H. (1998), Intonation patterns of German - Model-based quantitative analysis and synthesis of F0 contours, PhD thesis, TU Dresden.
[5] Nguyen H.Q. (2001), Ngữ pháp tiếng Việt, Nhà xuất từ điển Bách Khoa.
[6] Tran D.D., Castelli E., et al.
(2005), "Influence of F0 on Vietnamese syllable perception", Interspeech.
[7] Nguyen Q.C. (2002), Reconnaissance de la parole en langue Vietnamienne, PhD thesis, Institut National Polytechnique de Grenoble.
[8] Do T.D., Tran T.H., et al. (1998), Intonation in Vietnamese, in Hirst and Di Cristo (eds.), Intonation Systems - A Survey of Twenty Languages (chap. 22), Cambridge University Press.
[9] Nguyen T.T.H. and Boulakia G. (1999), "Another look at Vietnamese intonation", ICPhS'99.
[10] Rabiner L.R. and Schafer R.W. (1978), Digital processing of speech signals, Prentice Hall.
[11] de Cheveigné A. and Kawahara H. (2001), "Comparative evaluation of F0 estimation algorithms", Eurospeech.
[12] Govender N., Barnard E., et al. (2005), "Fundamental frequency and tone in isiZulu: initial experiments", Interspeech.
[13] Bagshaw P.C., Hiller S.M., et al. (1993), "Enhanced pitch tracking and the processing of F0 contours for computer aided intonation teaching", Eurospeech.
[14] Praat toolkit's website: http://www.fon.hum.uva.nl/praat/
[15] Boersma P. (1993), "Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound", Institute of Phonetic Sciences, University of Amsterdam, Proceedings 17.
[16] Nguyen T.T.H. (2004), Contribution à l'étude de la prosodie du vietnamien: variations de l'intonation dans les modalités - assertive, interrogative et impérative, PhD thesis, Université Paris.

Appendix

A List of questions in the corpus

Code  Utterance
Q_01  Chị hỏi vậy?
Q_02  Cô uống trà chứ?
Q_03  Vậy cô muốn dùng gì?
Q_04  Bao nhiêu lâu có chuyến tàu?
Q_05  Mấy máy bay từ Pháp đến?
Q_06  Nhưng có lên không?
Q_07  Khỏe anh?
Q_08  Em sao?
Q_09  Tên bác gái?
Q_10  Bác nhà báo phải không?
Q_11  Ông cần giấy tờ không?
Q_12  Vợ ông người Pháp hay người Việt?
Q_13  Tối hôm qua chị thức khuya phải không?
Q_14  Còn anh?
Q_15  Sách lúc bán không ông?
Q_16  Sao anh không mở cửa?
Q_17  Anh đứng tìm vậy?
Q_18  Chỗ có ngồi không?
Q_19  Các bạn uống gì?
Q_20  Cái gì?
Q_21  Bà không nhìn à?
Q_22  Thật sao?
Q_23  Bà có nhìn rõ không?
Q_24  Mấy anh đi?
Q_25  Vậy sáng mai đánh thức anh dậy vào lúc sáu nhé?
Q_26  Tin vui chứ?
Q_27  Có trước cửa nhà tôi chị muốn gửi bảo đảm?
Q_28  Còn tuổi anh sao?
Q_29  Bây anh đâu?
Q_30  Hiện anh làm việc đâu?
Q_31  Thế hôm ăn đâu?
Q_32  Cậu ăn chưa?
Q_33  Nhà rộng mét?
Q_34  Có phòng tất cả?
Q_35  Khu phố tên gì?
Q_36  Ông làm ơn cho chúng tôi trường tiểu học Giảng võ không?
Q_37  Cậu làm việc gần không?
Q_38  Thế cậu làm gì?
Q_39  Chúng ta chơi công viên bách thú?
Q_40  Ngày mai em có rảnh không?
Q_41  Thế thứ bẩy?
Q_42  Sao anh lại hỏi vậy?
Q_43  Năm hai bác tuổi rồi?
Q_44  Tại anh nhanh vậy?
Q_45  Anh chậm chút có không?
Q_46  Anh hẹn người ta giờ?
Q_47  Sáng làm gì?
Q_48  Hôm qua ngày bao nhiêu?
Q_49  Mà em hỏi kỳ vậy?
Q_50  Em nói vậy?
Q_51  Chị có mua dưa lê không?
Q_52  Thế em biết nấu chưa?
Q_53  Thế chị thích ăn thịt bò thăn bắp bò?
Q_54  Xin lỗi có thấy người phụ nữ qua không?
Q_55  Nhưng anh tìm người nào?
Q_56  Anh hút có nhiều không?
Q_57  Chị biết nấu chè bắp không?
Q_58  Nước cốt dừa có cần không?
B List of statements in the corpus

Code    Utterance
NQ_03   Tôi tên Hương, mẹ bé Mi
NQ_06   Tôi vào gọi nhà
NQ_100  Giới thiệu ông nhà
NQ_103  Khoảng trăm mét vuông ông
NQ_104  Bảy phòng
NQ_105  Một phòng ăn, hai phòng khách bốn phòng ngủ
NQ_107  Đây khu Giảng võ
NQ_110  Trường học nằm phía bên tay phải, đối diện vườn hoa
NQ_112  Tớ làm việc ngoại ô, cách khoảng ba mươi số
NQ_114  Tôi xe máy
NQ_122  Em phải học từ ngày mai thứ sáu
NQ_123  Em chưa biết
NQ_125  Anh muốn đến nhà em chơi
NQ_127  Đây bố tôi còn bên cạnh mẹ tôi
NQ_128  Bên trái em gái
NQ_130  Bố sáu mươi còn mẹ tôi năm mươi lăm
NQ_132  Chúng ta bị trễ hẹn
NQ_133  Mười
NQ_134  Mà chín năm nhăm
NQ_137  Ngày mồng mười tháng tư
NQ_138  Anh người hay quên
NQ_139  Anh không hiểu
NQ_14   Cách hai tiếng
NQ_140  Hôm qua ngày sinh nhật em
NQ_142  Bốn mươi nghìn đồng chị
NQ_15   Bà chờ chừng nửa tiếng chuyến sau tới
NQ_150  Em muốn nấu phở cho nhà sáng mai
NQ_152  Em đọc sách hướng dẫn
NQ_154  Thế
NQ_156  Phụ nữ qua nhiều
NQ_158  Tôi có thấy bà cao gầy mặc áo mầu xanh quần trắng
NQ_159  Sáng sáng cần tách cà phê điếu thuốc đủ
NQ_16   Tôi ghét máy bay phi trường thường xa thành phố
NQ_160  Cho nên anh gầy
NQ_161  Không nhiều đâu
NQ_162  Hai ngày gói
NQ_163  Buổi sáng, tôi phải ăn
NQ_165  Một bát phở đĩa bánh
NQ_166  Không ăn bữa sáng không chịu
NQ_17   Đi xe lửa mệt
NQ_170  Có ngon
NQ_171  Không có không
NQ_18   Các ga gần trung tâm thành phố
NQ_19   Nhưng máy bay nhanh xe lửa
NQ_21   Đáng lý đến
NQ_22   Nhưng trời xấu nên có lẽ đến muộn khoảng chừng hai tiếng
NQ_23   Thời tiết xấu máy bay không xuống
NQ_24   Cũng không
NQ_26   Anh bình thường
NQ_27   Dạ, em không khỏe
NQ_28   Em mệt
NQ_29   Bà tên Trần Thị Lan
NQ_31   Bà giáo viên
NQ_36   Mùa xuân đẹp cỏ nhiều hoa nhiều
NQ_37   Theo tôi, đẹp mùa thu
NQ_40   Mùa đông lạnh còn mùa hè trời lại quá nóng
NQ_41   Nhưng có lẽ mùa hè mùa hay người nghỉ hè
NQ_42   Anh đoán
NQ_43   Tối hôm qua tôi xem phim muộn
NQ_44   Tôi xem kịch
NQ_45   Lúc trời mưa to
NQ_46   Tôi phải đợi đến hết mưa
NQ_48   Không tạm
NQ_49   Hôm muốn mua vài truyện
NQ_50   Chìa khoá anh
NQ_51   Anh không nhớ để đâu
NQ_53   Anh tìm không thấy
NQ_56   Không có hết
NQ_59   Tớ cần ly cà phê đen
NQ_60   Còn em tớ ly sữa đá
NQ_62   Đây tượng người đàn ông
NQ_63   Chỗ đầu với hai tai
NQ_64   Còn hai tay hai chân
NQ_68   Tôi không khoẻ
NQ_69   Buổi sáng, tôi thường chóng mặt không dậy
NQ_70   Tôi hay bị nhức đầu
NQ_71   Có rõ, có không
NQ_72   Như bà bị cận thị
NQ_73   Ngày mai, phải sớm
NQ_75   Càng sớm tốt
NQ_76   Chúng phải bốn năm trăm số
NQ_77   Nếu trễ trời nóng mệt
NQ_79   Tôi vừa nhận thư nhà
NQ_80   Bây phải đến bưu điện để gửi thư
NQ_81   để túi xách hai ngày
NQ_82   Chị bỏ vào thùng thư
NQ_86   Năm nay, em làm không thành công
NQ_87   Ông thầy bói nói
NQ_88   Năm tới tuổi Thìn hơn, tuổi Dần
NQ_89   Anh tuổi Tý
NQ_90   Cũng năm tốt anh đâu có tin thầy bói
NQ_91   Tôi chỗ cũ, số bốn hai phố Trần Hưng Đạo
NQ_92   Tôi làm việc trường Đại học Bách khoa
NQ_96   Quán cơm Tám cuối góc phố
NQ_98   Tớ ăn nhiều lần
NQ_99   ăn ngon
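
As an illustration only, the representative-point procedure of section 6.4.3 (five evenly spaced F0 values per final syllable, averaged position by position for each speaker) could be sketched in Python as follows. The function names, the nearest-frame rounding, and the two summary measures (end minus begin as a crude indication of contour direction, difference of means as the register gap) are assumptions made for this sketch and are not taken from the thesis. The sample values are the rows for speaker PT in Table 6.3.

```python
from statistics import mean

# Names of the five representative points used in section 6.4.3.
POINTS = ("begin", "mid1", "mid2", "mid3", "end")


def representative_points(f0_track):
    """Pick five evenly spaced F0 values from the voiced section of a syllable.

    f0_track holds F0 values in Hz for the voiced frames only, in time order,
    so its first and last items are the begin and end points.
    """
    if len(f0_track) < len(POINTS):
        raise ValueError("need at least five voiced frames")
    last = len(f0_track) - 1
    # Evenly spaced positions, rounded to the nearest frame (an assumption).
    indices = [round(i * last / (len(POINTS) - 1)) for i in range(len(POINTS))]
    return [f0_track[i] for i in indices]


def mean_contour(contours):
    """Average several five-point contours position by position."""
    return [mean(values) for values in zip(*contours)]


def describe(question, statement):
    """Summarize two mean contours with illustrative measures.

    Slope is end minus begin; register gap is the difference of mean F0.
    """
    return {
        "question_slope_hz": question[-1] - question[0],
        "statement_slope_hz": statement[-1] - statement[0],
        "register_gap_hz": mean(question) - mean(statement),
    }


# Mean contours of speaker PT, copied from Table 6.3 (Hz).
PT_Q = [235, 239, 241, 246, 249]
PT_S = [225, 229, 232, 230, 219]

summary = describe(PT_Q, PT_S)
print(summary)
```

For speaker PT this gives a rise of 14 Hz in questions, a fall of 6 Hz in statements, and a register about 15 Hz higher in questions, which agrees with the description of this speaker in section 6.4.3.4. End minus begin is a deliberately simple measure: it ignores the shape between the two end points, so it should not be read as the criterion used in the thesis.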