Modeling the Prosody of Vietnamese Language for Speech Synthesis

MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF TECHNOLOGY

Thesis for the degree of MASTER OF SCIENCE

Modeling the prosody of Vietnamese language for speech synthesis

Speciality: “Information processing and Communication”
Code: 23.04.3898

MẠC ĐĂNG KHOA

Supervisor: Prof. PHẠM THỊ NGỌC YẾN

Hanoi, 2007

Faculty of Information Technology
International research center of Multimedia Information, Communication and Application

Acknowledgment

Many people gave me generous help and inspiration during my time as a master's student.

First, I would like to express my deep respect and gratitude to my supervisors, Dr. Eric Castelli and Prof. Phạm Thị Ngọc Yến. Thank you very much for orienting and guiding my research in the speech processing domain, and for all your useful advice, honest criticism and patience during my master's research.

Special thanks also go to Mrs. Geneviève Caelen-Haumont, to the PhD students Trần Đỗ Đạt and Vũ Minh Quang, and to all members of MICA’s speech group. I could not have completed this thesis without your support. Thank you all for your suggestions and your sincere remarks on every part of my research.

I would like to thank Ms. Đoàn Thị Ngọc Hiền, who guided me in recording the corpus. I would also like to thank the many MICA members who spent much of their time recording and testing for my research.

I am grateful to Prof. Nguyễn Trọng Giảng and MICA’s directorate for providing me with the best possible working conditions during my time at the International Research Center MICA.

Finally, I owe a great deal to my parents and my sister for their continued support. I also give very special thanks to my girlfriend for her constant encouragement, which gave me strength and motivation in my work and in my life.

Abstract

A Text-To-Speech (TTS) system is a computer system that produces speech from text. In a TTS system, the naturalness of the produced speech depends greatly on the variation of pitch,
duration and energy during speech. We call this the “prosody controlling ability”. A TTS system with good prosody controlling ability can simulate human speech prosody appropriate to the speaking context. In tonal languages such as Vietnamese, the prosody of an utterance results from the combination of two components: "micro-prosody", corresponding to the tone of each syllable in a sentence, and "macro-prosody", corresponding to the whole sentence.

The main goal of this thesis is to model the characteristics of Vietnamese prosody for speech synthesis. It focuses on the influence of macro-prosody on micro-prosody in three types of sentence: assertive, interrogative and imperative. The first task was to set up a “prosody corpus” and extract all possible prosody parameters. Based on the extracted data, we defined seventy-two simple prosody patterns for Vietnamese syllables in the three types of sentence. These patterns were then applied to synthesize some simple sentences. Finally, perception experiments were carried out to evaluate the synthesized sentences. The results showed that the proposed patterns can be applied successfully to generate the prosody of simple sentences.

This is our preliminary work on Vietnamese prosody, and it considers only the sentence type and the position of the syllable in the sentence. In the future, we expect to continue this research with more factors of Vietnamese prosody, improve our patterns and apply them to a Vietnamese TTS system.

List of Figures

Figure 1-1: Category of methods for predicting syllable duration [6]
Figure 2-1: Example of the contours of six tones, as described in [21]
Figure 2-2: The shape of Tone with female and male voice [18]
Figure 2-3: The shape of Tone with female and male voice [18]
Figure 2-4: The shape of Tone with female and male voice [18]
Figure 2-5: The shape of Tone with female and male voice [18]
Figure 2-6: The shape
of Tone with female and male voice [18]
Figure 2-7: The shape of Tone 5b with female and male voice [18]
Figure 2-8: The shape of Tone with female and male voice [18]
Figure 2-9: The shape of Tone 6b with female and male voice [18]
Figure 2-10: Sentence classification by structure [20]
Figure 2-11: The sentences “Lan thích ăn cơm không” in
Figure 2-12: The sentences “B o c g ng t p i” in
Figure 2-13: The sentences “Tân b i ch ” in
Figure 2-14: The differences of F0 contour between Assertive and Interrogative sentence [16]
Figure 3-1: A general function diagram of TTS system [13]
Figure 3-2: Fujisaki model
Figure 3-3: Fujisaki model for tonal language [19]
Figure 3-4: Function diagram of proposal TTS system
Figure 3-5: Prosody generation module
Figure 4-1: Key-syllable segmentation
Figure 4-2: Extracting F0 contour using PRAAT
Figure 4-3: An example of prosody pattern
Figure 5-1: An example of synthesized non-sense phrase
Figure 5-2: Perception test
Figure 5-3: An example of synthesized multi-type sentences
Figure 5-4: Interface for Perception test
Figure 5-5: Correct recognition rate with tones of last syllable
Figure 5-6: Correct recognition rate (%) with other types of sentences
Figure 5-7: Result comparison of three experiments

List of Tables

Table 1.1: Prosody functions
Table 1.2: Links between levels of representation of prosodic phenomena [13]
Table 1.3: Intonation model classification
Table 2.1: Vietnamese vowels
Table 2.2: Vietnamese consonants
Table 2.3: Arrangement of Vietnamese consonants
Table 2.4: The phonological hierarchy of Vietnamese syllables with total numbers of each phonetic unit [14]
Table 2.5: The six Vietnamese tones
Table 3.1: Comparison between direct pattern and model pattern
Table 4.1: Prosody corpus structure
Table 4.2: Prosody corpus text information
Table 4.3:
Recording information of Prosody corpus
Table 5.1: Confusion matrix (in %) for tones with male voice
Table 5.2: Confusion matrix (in %) for tones with female voice
Table 5.3: Confusion matrix (%) of sentence types with male voice
Table 5.4: Confusion matrix (%) of sentence types with female voice
Table 5.5: Test data for Experiment
Table 5.6: Confusion matrix (in %) of sentence types (with male voice)
Table 5.7: Confusion matrix (in %) of sentence types (with female voice)
Table 5.8: Confusion matrix (in %) of sentence types (average of Male and Female)
Table 5.9: Correct recognition rate (%) with other types of sentences
Table 5.10: Result of three experiments

Table of contents

Acknowledgment
Abstract
List of Figures
List of Tables
Table of contents
INTRODUCTION
1 PROSODY AND PROSODIC MODEL
  1.1 Overview of prosody
    1.1.1 The concept of prosody
    1.1.2 Major components of prosody
    1.1.3 The functions of prosody
    1.1.4 Levels of representation of prosodic phenomena
  1.2 Prosody modeling
    1.2.1 Intonation models
    1.2.2 Duration modeling
    1.2.3 This thesis work approach
2 VIETNAMESE LANGUAGE AND PROSODY
  2.1 Vietnamese language
    2.1.1 Vietnamese characteristics
    2.1.2 Vietnamese phoneme system
    2.1.3 Syllable structure
  2.2 Vietnamese prosody
    2.2.1 Micro-prosody and tones system in Vietnamese
    2.2.2 Macro-prosody and sentence types in Vietnamese
    2.2.3 Some special phenomena in Vietnamese prosody
3 TTS SYSTEM AND PROSODY GENERATION
  3.1 An overview of TTS system
  3.2 Prosody generation
    3.2.1 Overview of prosody generation
    3.2.2 From text to prosody
  3.3 Other researches and our proposal
4 PROSODY PATTERNS EXTRACTION
  4.1 Prosody corpus

Overall, the result of this experiment is better than that of the previous one, which used nonsense sentences (50% correct recognition). In all types of sentence, the correct
recognition rate of Experiment 2 is approximately 10% higher than in Experiment 1. This can be explained as follows: in the first experiment, we used the same tone for all syllables in the sentence but did not simulate co-articulation phenomena, so the synthesized sentences were very different from natural speech. In the test sentences of Experiment 2, all syllables except the final one were in the unmarked tone (the "non-tone"), so the co-articulation phenomenon is minimized and the sentences are more natural.

Compared with Experiment M, the result of Experiment 2 is better for assertive sentences but worse for interrogative sentences. The test sentences in Experiment M were synthesized by simulating the prosody of actual sentences, which is why they could be more natural than the test sentences in Experiment 2, which used our proposed prosody patterns. The worse result in Experiment 2 is therefore understandable and acceptable.

In this chapter, we have presented some experiments to evaluate our prosody patterns. These patterns are currently very simple, based only on the position of the syllable in the sentence and the type of the containing sentence. However, the experimental results show that the proposed patterns can be applied to predict the prosody of simple sentences.

Chapter 6: Conclusion and Perspectives

The subject of this thesis is “Modeling the prosody of Vietnamese language for speech synthesis”. As we all know, however, finding a complete model of Vietnamese prosody is currently a large and complex problem, which requires much research in linguistics, acoustics and speech processing. Therefore, within the scope of a master thesis, we have studied some basic factors of Vietnamese prosody, characterized them and tried to apply them to speech synthesis.

In chapter 3, we proposed a method and a structure for the prosody generation module. The simplest way to generate the prosody of a whole sentence is to concatenate the direct prosody patterns of its syllables. Hence, in chapter 4, we set up a
prosody corpus, analyzed it and proposed 72 prosody patterns for syllables, corresponding to the initial, middle and final positions of the syllable in three types of sentence (assertive, interrogative and imperative). Although these patterns are not enough to model all cases of Vietnamese prosody, they can be applied to generate the prosody of simple sentences. This was shown by the results of the experiments in chapter 5. In the perception tests, listeners correctly identified 64% of assertive sentences, 60% of interrogative sentences and 50% of imperative sentences.

These are our first prosody patterns for Vietnamese syllables. They were extracted from a small prosody corpus and consider only three factors: tone, position of the syllable and sentence type. In future work, we expect to improve these patterns by setting up and analyzing a larger corpus. Additionally, we will study other factors, such as syntactic and lexical-meaning factors or glottalization and co-articulation phenomena, and integrate them into our patterns. Moreover, these prosody patterns can be represented not only as absolute values of F0, duration and intensity but also as a set of parameters of another prosody model (such as the Fujisaki model). In that way, we will be able to generate the prosody description for other types of prosodic model and apply it to other types of TTS system.

The following is a summary of the work done in this master thesis, its limitations and future approaches:

• The work we have done:
  - Proposed a simple method of prosody generation and a structure for the prosody generation module in a TTS system.
  - Set up a corpus for researching prosody.
  - Proposed 72 prosody patterns for Vietnamese syllables.
  - Applied the proposed prosody patterns to synthesize some simple sentences and evaluated these sentences by perception tests.
• The limitations:
  - The corpus is small.
  - The proposed patterns for syllables are simple and consider only three factors: tone,
position of the syllable and sentence type.
  - Intensity control in speech synthesis was done manually and was not very accurate.
  - The listeners in the perception tests were few (5 males and females).
• Future approaches:
  - Set up a larger corpus that covers other factors and phenomena in Vietnamese prosody.
  - Define prosody patterns from that corpus.
  - Research other prosodic models and convert these patterns into the parameters of those models.
  - Develop a complete prosody generation module and integrate it into a TTS system.

The last words

This work was carried out at the MICA center, and I hope its results can be applied in the MICA speech synthesis system. It is also my preliminary work in the domain of speech processing. With the guidance of my supervisors and the instruction and support of others in MICA’s speech processing group, my knowledge and skills in speech processing have steadily improved. That knowledge is the necessary background for my future studies and research.

Once again, thank you all very much!
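The pattern-concatenation approach summarized in this chapter can be sketched in a few lines of Python. This is only a minimal illustration, not the thesis implementation: it assumes that the 72 patterns correspond to eight tone labels (taken from the corpus codes of Appendix A) times three positions times three sentence types, and that each contour is stored as 20 samples. The names `ProsodyPattern`, `position_of` and `generate_prosody` are hypothetical, and the pattern values are flat placeholders, not measurements from the corpus.

```python
from dataclasses import dataclass
from itertools import product

# Tone labels follow the corpus codes of Appendix A; the position and
# sentence-type names are illustrative choices for this sketch.
TONES = ("1", "2", "3", "4", "5", "6", "5b", "6b")
POSITIONS = ("initial", "middle", "final")
SENTENCE_TYPES = ("assertive", "interrogative", "imperative")
N_POINTS = 20  # assumed number of samples per contour


@dataclass(frozen=True)
class ProsodyPattern:
    f0_hz: tuple          # F0 contour samples
    intensity_db: tuple   # intensity contour samples
    duration_ms: float    # syllable duration


def placeholder_patterns():
    """Build a dummy table with one flat pattern per (tone, position, type).

    The values are placeholders, NOT data from the thesis corpus.
    """
    flat = ProsodyPattern((150.0,) * N_POINTS, (70.0,) * N_POINTS, 200.0)
    return {key: flat for key in product(TONES, POSITIONS, SENTENCE_TYPES)}


def position_of(index, n_syllables):
    """Map a syllable index to its position class (hypothetical rule)."""
    if index == n_syllables - 1:
        return "final"  # a one-syllable sentence counts as final here
    if index == 0:
        return "initial"
    return "middle"


def generate_prosody(tones, sentence_type, patterns):
    """Concatenate per-syllable patterns into sentence-level contours."""
    f0, intensity, boundaries_ms = [], [], []
    elapsed = 0.0
    for i, tone in enumerate(tones):
        pattern = patterns[(tone, position_of(i, len(tones)), sentence_type)]
        f0.extend(pattern.f0_hz)
        intensity.extend(pattern.intensity_db)
        elapsed += pattern.duration_ms
        boundaries_ms.append(elapsed)  # end time of each syllable
    return f0, intensity, boundaries_ms


PATTERNS = placeholder_patterns()
f0, intensity, boundaries_ms = generate_prosody(
    ["1", "3", "5b"], "interrogative", PATTERNS
)
print(len(PATTERNS), len(f0), boundaries_ms)  # 72 60 [200.0, 400.0, 600.0]
```

A plain lookup table keyed by (tone, position, sentence type) keeps the sketch close to the "direct pattern" idea. Factors listed under future work, such as co-articulation, would need either extra key fields or a smoothing step at syllable boundaries.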
References

[1] Chilin Shih, Greg Kochanski, “Prosody and Prosodic Models”, www.prosodies.org
[2] Do T.D., Tran T.H., et al. (1998), “Intonation system - A survey of twenty languages”, chap. 22, Cambridge University Press.
[3] Dung Tien Nguyen, Hansjörg Mixdorff, Mai Chi Luong, Huy Hoang Ngo, Bang Kim Vu (2005), “Fujisaki Model based F0 contours in Vietnamese TTS”, Eurospeech proceeding.
[4] H. Fujisaki, S. Ohno, C. Wang (1974), “A command-response model for F0 contour generation in multilingual speech synthesis”, Journal of Phonetics, vol. 2, pp. 223-232.
[5] Mixdorff H. (1998), “Intonation patterns of German - Model-based quantitative analysis and synthesis of F0 contours”, PhD thesis, TU Dresden.
[6] Mixdorff H. (2001), “An Integrated Approach to Modeling German Prosody”, TU Dresden.
[7] Mixdorff H., Nguyen Hung Bach, Hiroya Fujisaki and Mai Chi Luong (2003), “Quantitative Analysis and Synthesis of Syllabic Tones in Vietnamese”, Eurospeech proceeding.
[8] Nguyen T.T.H. and Boulakia G. (1999), “Another look at Vietnamese intonation”, ICPhS'99.
[9] Ninh Khanh Duy (2005), “Characterization of Vietnamese intonation for questions”, Master Thesis, Hanoi University of Technology.
[10] Paul Alexander Taylor (1992), “A Phonetic Model of English Intonation”, PhD thesis, University of Edinburgh.
[11] Sami Lemmetty (1999), “Review of Speech Synthesis Technology”, MSc thesis, Helsinki University of Technology.
[12] Thierry Dutoit (1993), “High Quality Text-To-Speech Synthesis of the French Language”, PhD thesis, Faculte Polytechnique de Mons, TCTS Lab, Belgium.
[13] Thierry Dutoit (1997), “An Introduction to Text-to-Speech Synthesis”, Kluwer Academic Publishers.
[14] Tran D.D., Castelli E., et al. (2005), “Influence of F0 on Vietnamese syllable perception”, Interspeech.
[15] Tran Do Dat (2003), “Building a large Vietnamese Speech Database”, Master Thesis, Hanoi University of Technology.
[16] Vu M.Q., Tran D.D., Castelli E.
(2006), “Prosody of Interrogative and Affirmative Sentences in Vietnamese Language: Analysis and Perceptive Results”.
[17] Vu M.Q., Tran D.D. & Castelli E. (2006), “Intonation des phrases interrogatives et affirmatives en langue vietnamienne”, JEP2006, XXVIes Journées d’Etude sur la Parole, Manoir de la Vicomté, Dinard, France.
[18] Nguyen Quoc Cuong (2002), “Reconnaissance de la parole en langue vietnamienne”, Thèse, INPG, UJF Grenoble, France, Juin 2002.
[19] Bạch Hưng Nguyên, Nguyễn Tiến Dũng (2005), “Mô hình Fujisaki áp dụng phân tích điệu tiếng Việt”.
[20] Mai Ngọc Chừ, Vũ Đức Nghiệu, Hoàng Trọng Phiến (2005), “Cơ sở ngôn ngữ học và tiếng Việt”, NXB Giáo dục.
[21] Nguyễn Hữu Quỳnh (2001), “Ngữ pháp tiếng Việt”, Nhà xuất bản Từ điển Bách khoa, pp. 11-86, Hà Nội.

Appendix A: Text for prosody corpus

Code Role Sentences
1_As_In_A 1_As_Mid_A 1_As_Fi_A A A A Bên ta theo ch n t n c Bên ch b bên ta theo n t n c Bên ch b ánh b t kh i c bên ta
1_Int_In_A 1_As_In_B Context A B K t thúc tr n ánh, th trư ng h i: Này c u Bên ta theo ch n t n c à? Vâng Bên ta theo bên ch n t n c
1_Int_Mid_A 1_As_Mid_B A B Này c u Bên ch b bên ta theo n t n c à? úng Bên ch b bên ta theo n t n c
1_Int_Fi_A 1_As_Fi_B A B Này c u Bên ch ã b ánh b t kh i c bên ta? úng Bên ch b ánh b t kh i c bên ta
1_Int_In_B 1_Imp_In_A Context B A Trong tr n ánh, c p dư i h i th trư ng Bên ta theo anh có c n bám theo ch không? Bên ta theo ch cho tôi!
1_Int_Mid_B 1_Imp_Mid_A B A ch rút bên ta theo có c không anh? À Th bên ta theo cho tôi!
1_Imp Fi_A Context A Trong tr n ánh, th trư ng l nh: G i quân c u vi n bên ta! Nhanh lên !

Code Role Sentence
2_As_In_A 2_As_Mid_A 2_As_Fi_A A A A Trên tà thêu m t hoa màu Áo c a ch tà thêu m t hoa Áo c a ch có thêu m t bông hoa tà
2_Int_In_A 2_As_In_B Context A B M t ch i t may áo dài Th may h i: Trên tà thêu không h ch ? À Trên tà thêu m t hoa màu
2_Int_Mid_A 2_As_Mid_B A B Áo ch tà thêu không th ?
À Cái áo y tà thêu hoa màu M c ăng Khoa - 96 - Master thesis 2_Int_Fi_A 2_As_Fi_B A B Áo ch thêu tà ? À Cái áo y thêu m t bơng hoa 2_Int_In_B 2_Imp_In_A B A Trên tà thêu khơng h ch ? Gi ng kìa.Trên tà thêu y th cho ch ! 2_Int_Mid_B 2_Imp_Mid_A B A Áo ch t tà thêu khơng ch ? Gi ng l n trư c ý Cái áo tà thêu th em! 2_Int_Fi_B 2_Imp Fi_A B A Áo ch t thêu tà? Gi ng l n trư c C thêu cho ch m t hoa tà! Code Role Sentence 3_As_In_A 3_As_Mid_A 3_As_Fi_A A A A Trên tã thêu hình m t qu bóng Ch nhìn th y tã thêu hình qu bóng Ch nhìn th y hình m t qu bóng c thêu tã 3_Int_In_A 3_As_In_B Context A B Ch ng v nói v tã m i c a em bé: Trên tã thêu hình th em ? À Trên tã thêu hình m t qu bóng anh 3_Int_Mid_A 3_As_Mid_B A B Em có th y tã thêu hình khơng ? À Em th y tã thêu hình qu bóng anh 3_Int_Fi_A 3_As_Fi_B A B Có c hình qu bóng tã ? Vâng Em th y có hình m t qu bóng tã 3_Int_In_B 3_Imp_In_A Context B A V ang thêu tã cho con, h i ch ng Trên tã thêu hình ây h anh? Con thích bóng Trên tã thêu hình qu bóng i! 3_Int_Mid_B 3_Imp_Mid_A B A Không bi t tã thêu h anh? Con có v thích bóng.Theo anh tã thêu bóng i em! 3_Int_Fi_B 3_Imp Fi_A B A Anh ơi! Th thêu tã? Con thích bóng.Th em thêu cho m t qu bóng tã! Code Role Sentence 4_As_In_A 4_As_Mid_A A A Bên t theo khuynh hư ng b o th ng tr bên t theo khuynh hư ng b o th tà M c ăng Khoa - 97 - Master thesis 4_As_Fi_A A B o th khuynh hư ng c a 4_Int_In_A 4_As_In_B Context A B Hai ngư i bàn v tr Bên t theo khuynh hư ng c u nh ? À Bên t theo khuynh hư ng b o th 4_Int_Mid_A 4_As_Mid_B A B À 4_Int_Fi_A 4_As_Fi_B A B Anh ng h ng tr bên t ? À Tôi ng h ng bên t 4_Imp_In_A Context A Trong bu i t ng t i u hành Ngư i ch huy hô: Bên t theo sau i tr ng! Nhanh lên! 
4_Imp_Mid_A A Chú ý Toàn 4_Imp_Fi_A A T t c theo sau Code Role Sentence 5_As_In_A A 5_As_Mid_A 5_As_Fi_A A A t phong c p l n này, lên tá theo anh ch ng khó khăn Anh ta c thăng lên tá theo cách không bi t Cu i anh c thăng lên tá Context 5_Int_In_A A 5_As_In_B B ng tr bên t ng tr bên t theo khuynh hư ng th ? ng tr bên t theo khuynh hư ng b o th i bên t theo sau i tr ng! i bên t ! Nhanh lên Trong cu c h p bàn v phong c p m t ơn v quân i: t phong c p l n này, lên tá theo anh có khó l m không? À Lên tá theo ch ng khó khăn 5_Int_Mid_A 5_As_Mid_B A B Th trư ng h i v m t trư ng h p v a c phong lên c p tá Anh ta lên tá theo quy t nh th ? À Anh ta c lên tá theo cách ch ng bi t 5_Int_Fi_A 5_As_Fi_B A B Sao l i c lên tá? Thưa anh, không bi t l i c lên tá 5_Int_In_B 5_Imp_In_A Context B A C p dư i h i th trư ng Cịn ng chí Nam, lên tá theo anh có nên khơng? Nên ch Lên tá theo m i ngư i i ! 5_Int_Mid_B B 5_Imp_Mid_A A Context ng chí Nam mà phong lên tá theo m i ngư i có c không? c Phong lên tá theo h i! M c ăng Khoa - 98 - Master thesis 5_Int_Fi_B B 5_Imp_Fi_A A Anh không ng h vi c ng chí ó s c phong lên tá ? úng Các anh ng có phong lên tá ! Code Role Sentence 6_As_In_A 6_As_Mid_A 6_As_Fi_A A A A Lên t theo úng hư ng d n cách t p luy n r t t t Cách t t nh t t p lên t theo úng hư ng d n T t c v n ng viên u ph i t p lên t 6_Int_In_A 6_As_In_B Context A B V n ng viên h i hu n luy n viên v t p lên t : Lên t theo theo anh có t t khơng ? Có ch Lên t theo úng hư ng d n cách t p luy n r t t t 6_Int_Mid_A 6_As_Mid_B A B Theo anh t p lên t theo hư ng d n có t t khơng ? Có ch Cách t t nh t t p lên t theo úng hư ng d n 6_Int_Fi_A 6_As_Fi_B A B Li u em có ph i t p lên t ? Có M i v n ng viên u ph i t p lên t 6_Int_In_B 6_Imp_In_A B A Lên t theo cách h anh ? À Lên t theo úng hư ng d n cho tôi! 6_Int_Mid_B 6_Imp_Mid_A B A Th em có ph i t p lên t bây gi khơng? Có C u t p lên t theo ngay! 6_Int_Fi_B 6_Imp_Fi_A B A Th em ph i t p c lên t ? Ch cịn n a C theo hư ng d n lên t ! 
Code Role Sentence 5b_As_In_A A 5b_As_Mid_A 5b_As_Fi_A A A Trong cu c thi c a nhà nông, bên tát theo cách m i ã giành th ng l i Chi n th ng thu c v bên tát theo cách m i Cu i cùng, ph n th ng ã thu c v bên tát Context 5b_Int_In_A 5b_As_In_B A B Trong m t cu c thi c a nhà nông Hai khán gi h i nhau: Bên tát theo cách m i bên th anh? À Bên tát theo cách m i bên m c áo M c ăng Khoa - 99 - Master thesis 5b_Int_Mid_A 5b_As_Mid_B A B Bên bên tát theo cách m i h anh? À Bên bên tát theo cách m i 5b_Int_Fi_A 5b_As_Fi_B A B Trong l n thi này, li u ph n th ng có thu c v bên tát? Có Tơi nghĩ ph n th ng s thu c v bên tát 5b_Int_In_B Context B 5b_Imp_In_A A Hai anh em ang dư i ru ng, nói chuy n v i nhau: Ơ b m ang tát nư c Lên tát theo b m khơng anh? Có Lên tát theo b m i ! 5b_Int_Mid_B B 5b_Imp_Mid_A A 5b_Int_Fi_B B 5b_Imp_Fi_A A Code Role Sentence 6b_As_In_A A 6b_As_Mid_A 6b_As_Fi_A A A Trong cu c thi iêu kh c, bên t c theo cách m i ã hoàn thành b c tư ng s m Chi n th ng thu c v bên t c theo cách m i Cu c thi làm tư ng nhanh di n gi a bên úc bên t c Ơ, b m ang tát nư c Mình có lên tát theo b m khơng? Có, dư i làm xong r i Mình lên tát theo b m i! B m tát nư c Hay anh em lên tát? , dư i xong r i Nào anh em lên tát ! 6b_Int_In_A 6b_As_In_B A B Trong m t cu c thi iêu kh c, hai khán gi trò chuy n v i nhau: Bên t c theo cách m i bên th anh? À Bên t c theo cách m i bên áo 6b_Int_Mid_A 6b_As_Mid_B A B Bên bên t c theo cách m i th ? À Bên áo bên t c theo cách m i 6b_Int_Fi_A 6b_As_Fi_B A B Theo anh, li u ph n th ng có thu c v bên t c? Có Tơi nghĩ ph n th ng s thu c v bên t c 6b_Int_In_B 6b_Imp_In_A Context B A Trong m t xư ng iêu kh c, th t c h i ngư i th c : Có hai m u m i cũ T c theo m u h anh ? À T c theo m u m i cho ! 6b_Int_Mid_B 6b_Imp_Mid_A B A Anh ơi! Lô tư ng bên t c theo m u h anh ? À Lô bên t c theo m u m i i! Context M c ăng Khoa - 100 - Master thesis 6b_Int_Fi_B 6b_Imp_Fi_A V t c tư ng ph t núi ã em có ph i lên t c? Có C c u ph i lên t c! B A th chưa h anh? 
Li u B: Datasheet of prosody patterns Female Assertive sentence Male Initial part F0 F0 (Hz) (Semitone) 162.65 8.23 163.03 8.27 162.96 8.25 162.76 8.23 162.37 8.19 162.07 8.15 161.80 8.13 161.84 8.14 161.78 8.13 161.42 8.09 161.55 8.11 161.49 8.10 161.27 8.08 161.07 8.06 160.69 8.03 160.42 8.00 160.02 7.96 159.48 7.91 158.63 7.82 157.04 7.65 Duration(ms) Syllable Voiced part 295.52 18.65 292.33 18.47 289.19 18.28 287.26 18.16 285.73 18.07 284.65 18.01 283.93 17.97 283.23 17.92 282.77 17.90 282.00 17.85 281.85 17.84 282.13 17.85 282.72 17.89 283.01 17.90 283.29 17.92 283.33 17.92 283.32 17.92 282.34 17.85 281.63 17.81 278.94 17.65 Middle part Intensity (dB) 70.99 71.53 71.82 71.92 71.87 71.70 71.48 71.25 71.04 70.84 70.62 70.33 69.94 69.40 68.65 67.65 66.34 64.73 62.83 60.75 149 99 69.11 69.54 69.77 69.89 69.93 69.92 69.85 69.72 69.51 69.22 68.85 68.48 68.13 67.75 67.25 66.54 65.53 64.11 62.21 59.87 Duration Syllable 168 F0 100 F0 F0 (Hz) (Semitone) 180.21 9.54 180.42 9.52 178.30 9.30 177.88 9.27 177.40 9.23 177.01 9.20 176.65 9.16 176.48 9.15 176.58 9.15 176.85 9.16 177.00 9.16 176.94 9.14 176.84 9.11 176.89 9.10 176.78 9.09 176.53 9.07 174.97 8.97 172.82 8.81 171.33 8.68 169.78 8.54 Duration(ms) Syllable Voiced part 264.90 16.55 261.16 16.32 258.42 16.14 257.24 16.07 256.46 16.02 255.35 15.95 254.52 15.89 253.64 15.83 253.43 15.82 253.37 15.82 253.47 15.82 254.04 15.86 254.42 15.89 254.88 15.92 255.35 15.94 255.78 15.97 255.10 15.90 254.54 15.85 252.85 15.74 249.09 15.47 Final part Intensity (dB) 69.90 70.80 71.26 71.43 71.35 71.07 70.70 70.34 70.04 69.77 69.46 69.05 68.53 67.91 67.15 66.18 65.04 63.64 61.75 59.26 177 121 70.87 71.71 72.10 72.21 72.14 71.98 71.78 71.53 71.24 70.96 70.71 70.43 70.02 69.41 68.53 67.29 65.62 63.47 60.98 58.81 Duration Syllable 196 F0 126 F0 F0 (Hz) (Semitone) 153.85 7.28 153.42 7.23 153.01 7.17 152.51 7.11 152.42 7.09 152.38 7.08 152.45 7.09 152.22 7.06 151.97 7.03 151.27 6.96 150.76 6.90 150.17 6.84 149.58 6.77 149.10 6.73 149.37 
6.80 147.78 6.62 146.89 6.52 145.75 6.40 146.23 6.45 146.34 6.47 Duration(ms) Syllable Voiced part 240.01 14.65 236.88 14.42 235.57 14.32 235.05 14.28 235.11 14.28 235.47 14.31 235.45 14.32 235.21 14.30 234.78 14.27 233.65 14.20 232.86 14.13 232.18 14.09 231.20 14.02 229.98 13.93 229.68 13.91 229.24 13.88 228.43 13.82 225.83 13.63 224.13 13.51 225.79 13.63 Intensity (dB) 71.28 71.94 72.14 72.12 71.99 71.84 71.66 71.37 71.01 70.61 70.14 69.45 68.36 66.81 65.00 63.27 61.54 59.80 57.75 55.54 320 210 70.26 72.06 72.57 72.53 72.32 72.10 71.83 71.37 70.83 70.27 69.62 68.81 67.77 66.65 65.43 64.07 62.26 60.17 57.80 55.54 Duration Syllable 361 F0 229 M c ăng Khoa - 101 - Female Interrogative sentence Male Master thesis 181.86 10.18 180.69 10.06 179.54 9.93 179.11 9.89 178.00 9.78 177.18 9.69 176.87 9.65 176.28 9.59 175.96 9.55 175.81 9.53 175.40 9.48 175.06 9.45 174.67 9.41 174.38 9.38 174.17 9.36 173.86 9.33 173.25 9.26 171.56 9.10 170.71 9.00 169.21 8.85 Duration(ms) Syllable Voiced part 312.38 19.63 307.61 19.39 305.55 19.27 302.99 19.14 301.12 19.03 299.87 18.96 298.81 18.90 298.09 18.86 297.63 18.83 297.28 18.81 297.09 18.80 297.07 18.80 297.01 18.80 297.24 18.81 297.68 18.84 298.50 18.88 298.50 18.88 297.57 18.83 296.96 18.79 294.52 18.66 73.56 74.16 74.50 74.69 74.75 74.69 74.51 74.25 73.92 73.55 73.12 72.63 72.05 71.34 70.47 69.36 67.95 66.20 64.18 62.07 132 86 72.34 73.23 73.80 74.14 74.34 74.42 74.44 74.36 74.20 73.94 73.57 73.10 72.51 71.80 70.93 69.83 68.42 66.62 64.38 61.75 Duration Syllable 145 F0 87 172.43 9.23 172.60 9.26 171.89 9.16 170.61 9.00 170.23 8.96 169.71 8.91 169.35 8.87 169.06 8.85 168.96 8.84 168.82 8.82 168.66 8.80 168.54 8.79 168.47 8.78 168.31 8.77 167.87 8.72 167.13 8.66 166.43 8.59 166.45 8.59 165.36 8.47 163.67 8.29 Duration(ms) Syllable Voiced part 298.25 18.82 294.71 18.63 292.32 18.49 290.69 18.40 289.01 18.29 287.49 18.21 286.66 18.16 285.86 18.11 285.43 18.09 285.56 18.09 285.83 18.11 286.29 18.14 286.90 18.18 287.53 18.22 288.06 
18.25 288.71 18.29 289.32 18.33 289.54 18.34 288.86 18.30 286.81 18.18 72.50 73.09 73.25 73.25 73.17 73.00 72.76 72.52 72.29 72.07 71.82 71.52 71.08 70.46 69.66 68.66 67.33 65.51 63.02 59.91 185 127 70.57 71.83 72.73 73.36 73.78 74.01 74.09 74.06 73.93 73.72 73.42 73.04 72.59 72.07 71.40 70.46 69.14 67.35 65.07 62.32 Duration Syllable 168 F0 106 169.79 8.85 168.78 8.78 167.25 8.63 166.22 8.53 165.54 8.47 165.42 8.46 165.44 8.47 165.22 8.45 165.00 8.43 164.76 8.41 164.93 8.42 165.48 8.47 165.51 8.47 166.50 8.56 166.36 8.53 166.20 8.51 166.44 8.52 164.40 8.25 164.01 8.20 164.35 8.24 Duration(ms) Syllable Voiced part 291.53 18.42 286.30 18.12 283.53 17.95 281.86 17.86 281.04 17.81 280.33 17.77 280.58 17.79 280.23 17.77 280.17 17.76 280.29 17.77 281.01 17.80 282.34 17.88 283.69 17.96 285.85 18.08 288.17 18.21 288.25 18.22 291.93 18.42 291.64 18.40 292.02 18.41 290.68 18.33 71.48 72.34 72.38 71.95 71.32 70.92 70.64 70.34 70.14 69.99 69.74 69.22 68.35 67.23 66.13 65.13 64.03 62.55 60.67 58.35 281 187 70.55 72.38 73.19 73.44 73.45 73.27 73.00 72.61 72.13 71.63 71.08 70.35 69.49 68.61 67.63 66.31 64.50 62.22 59.60 57.14 Duration Syllable 281 F0 199 M c ăng Khoa - 102 - Female Imperative sentence Male Master thesis 178.07 9.72 177.27 9.64 176.55 9.58 175.76 9.50 174.98 9.42 174.36 9.35 174.08 9.33 173.94 9.32 173.88 9.32 173.76 9.31 173.48 9.28 173.27 9.26 173.16 9.24 173.06 9.23 172.97 9.23 172.81 9.21 172.68 9.20 172.54 9.18 172.31 9.16 171.42 9.07 Duration(ms) Syllable Voiced part 269.43 16.50 266.25 16.34 263.43 16.18 261.32 16.05 259.70 15.95 258.46 15.87 257.49 15.81 256.97 15.78 256.61 15.75 256.85 15.77 257.22 15.79 257.56 15.81 258.24 15.85 258.67 15.88 259.16 15.90 259.81 15.95 259.58 15.93 259.13 15.90 254.95 15.66 250.14 15.36 74.22 74.47 74.61 74.67 74.69 74.69 74.65 74.55 74.38 74.12 73.77 73.36 72.90 72.36 71.72 70.97 70.08 68.98 67.64 66.00 129 75 71.84 72.66 73.17 73.44 73.54 73.50 73.37 73.16 72.89 72.54 72.11 71.62 71.05 70.38 69.54 68.46 67.05 65.35 
63.37 61.20 Duration Syllable 0.154 F0 0.092 166.28 8.52 165.50 8.43 165.40 8.41 164.87 8.35 164.13 8.26 162.93 8.11 161.65 7.95 161.41 7.94 162.59 8.14 162.85 8.20 162.68 8.18 162.43 8.15 162.09 8.11 162.02 8.10 161.85 8.09 161.67 8.07 161.44 8.04 161.17 8.01 160.71 7.96 159.67 7.86 Duration(ms) Syllable Voiced part 293.55 18.56 289.61 18.34 287.29 18.21 285.88 18.12 284.37 18.03 283.76 17.99 282.70 17.93 281.91 17.88 281.38 17.84 281.21 17.83 281.18 17.83 281.21 17.82 280.92 17.80 281.13 17.81 281.27 17.82 281.49 17.83 281.07 17.80 281.60 17.83 280.33 17.75 280.52 17.76 72.52 73.52 74.14 74.47 74.65 74.74 74.76 74.68 74.53 74.28 73.97 73.59 73.15 72.63 71.97 71.11 69.96 68.41 66.46 64.15 142 96 71.79 72.91 73.65 74.12 74.35 74.36 74.22 74.01 73.75 73.46 73.15 72.81 72.39 71.83 71.10 70.16 68.92 67.31 65.32 63.02 Duration Syllable 0.182 F0 0.110 178.57 9.71 178.89 9.77 178.16 9.70 177.91 9.67 177.80 9.66 177.57 9.63 177.27 9.60 177.12 9.59 177.00 9.59 176.24 9.51 174.95 9.39 173.77 9.28 172.74 9.19 171.43 9.07 169.59 8.89 167.57 8.68 166.24 8.55 165.10 8.43 162.74 8.15 162.07 8.04 Duration(ms) Syllable Voiced part 311.46 19.61 306.69 19.35 304.61 19.23 303.23 19.16 302.40 19.11 302.06 19.09 301.84 19.08 301.71 19.08 301.29 19.05 300.63 19.01 299.39 18.94 298.05 18.86 297.06 18.81 295.21 18.70 295.00 18.69 293.48 18.60 294.19 18.65 292.11 18.52 285.62 18.12 282.03 17.91 75.21 77.01 77.78 77.91 77.65 77.28 76.86 76.38 75.80 75.05 74.21 73.36 72.41 71.22 69.79 68.28 66.72 65.00 62.88 60.61 291 195 72.53 74.05 74.48 74.62 74.50 74.19 73.80 73.49 73.20 72.74 72.22 71.69 71.19 70.59 69.79 68.61 66.97 64.72 61.72 58.59 Duration Syllable 0.317 F0 0.205 M c ăng Khoa ... this thesis is to model the characteristics of Vietnamese prosody for speech synthesis It focuses on the influences of the macro -prosody on the micro -prosody, in three types of sentence: assertive,... characteristics of Vietnamese prosody to generate the ? 
"prosody description" for speech synthesis. In this thesis, we just focus on the differences of Vietnamese tones in different positions in the sentence... "naturalness" of synthesized speech depends on the ability to control macro-prosody during the speech synthesis process. Objectives and Tasks. This thesis is part of MICA speech synthesis research

Date posted: 19/02/2014, 08:58
