Building a database for good-quality Vietnamese speech synthesis

Journal of Science & Technology 101 (2014) 179-181

Building Databases for Good Quality Vietnamese Synthesis

Trinh Van Loan (1)*, Dinh Dong Luong (2), Pham Thi Kim Ngoan (2), Le Xuan Thanh (1)
(1) Hanoi University of Science and Technology, No. 1, Dai Co Viet Str., Hai Ba Trung, Ha Noi, Viet Nam
(2) Nha Trang University
Received: March 05; accepted: April 22, 2014

Abstract

Vietnamese is a monosyllabic and tonal language. Therefore, in order to produce high-quality synthesized Vietnamese units, it is necessary to synthesize six tones whose characteristics are as close to natural speech as possible. In this paper, we propose a new approach to building Vietnamese databases for synthesizing the tones of Vietnamese with good quality. In addition, the databases can be used for other Vietnamese synthesis applications that use the concatenation synthesis method.

Keywords: Vietnamese database, good quality, tonal, concatenation

1. Introduction

Until now, Vietnamese synthesis using the concatenation method has achieved some initial results [1-3]. However, these results are still limited. Through practice and research, we have recognized that the quality of Vietnamese synthesis depends mostly on the quality of tonal synthesis and of the database. We have therefore built a database that satisfies both of these factors, with tonal synthesis coming first, for good quality Vietnamese synthesis.

In this article, the first part introduces some basic characteristics of Vietnamese phonetics as a background for our paper. The next part describes the stages carried out to build a database for good quality Vietnamese synthesis using sound unit concatenation, and the last part is the assessment.

The scientific significance of a database built by our method is that we can implement a synthesizer with unlimited vocabulary for any individual voice once we have built the database of that speaker. Another method of building a Vietnamese database for a synthesizer using sound unit concatenation did not use natural tones but synthesized tones [4].
With our method, the quality of Vietnamese tones is quite natural. A trial of this method using a limited vocabulary has demonstrated its advantage [5].

2. Basic characteristics of Vietnamese phonetics

Vietnamese is a monosyllabic language [6]. Words do not change their morphology or take endings (tails) to indicate grammatical categories. In terms of word structure, Vietnamese does not use affixes or small morphemes. Vietnamese is an analytic language without any boundary between syllables and morphemes: each syllable is one morpheme. Depending on the number of morphemes they are formed from, Vietnamese words are called monosyllables, disyllables or polysyllables.

Vietnamese is a tonal language with six tones: level (no mark), hanging, sharp, heavy, asking and tumbling. In Vietnamese, tone plays a role in syllable and word formation and in distinguishing word meaning. Moreover, tone is the factor that gives Vietnamese its specific characteristics.

3. Vietnamese phonemes and syllable structure system

Based on the development of modern Vietnamese, the basic phoneme system includes 14 vowels and 22 consonants [6]. Each vowel can stand alone or combine with one or two others to form a rhyme.

In its full form, each Vietnamese syllable includes five parts: initial, onset, nucleus, coda and tone. Apart from the initial, the segmental parts (onset, nucleus and coda) are called the final, or rhyme. They work together as shown in the table below.

Table 1. Syllabic structure in Vietnamese

+-----------------------------------------+
| Tone                                    |
+---------+-------------------------------+
| Initial | Final (Rhyme)                 |
|         +---------+---------+-----------+
|         | Onset   | Nucleus | Coda      |
+---------+---------+---------+-----------+

For example, the syllable "toán" is analyzed as follows: initial /t/, onset /o/, nucleus /a/, coda /n/ and sharp tone.

3.1 Initial consonant system

Vietnamese has 22 initial consonants, as listed in Table 2.

[Table 2. Vietnamese consonants. The table entries are illegible in the source.]

3.2 Onset

Onsets, which have the function of tonal depression, are semi-vowels. Vietnamese has two semi-vowels: /i/ and /u/.

3.3 Nucleus

Vietnamese has 16 vowels categorized into 14 groups, as listed in Table 3.

[Table 3. Nucleus system in Vietnamese. The table entries are illegible in the source.]

3.4 Coda

Apart from the /zero/ coda, the Vietnamese coda system consists of consonants and semi-vowels, as listed in Table 4.

[Table 4. Coda system in Vietnamese. Legible entries: m /m/, n /n/, c/ch /k/, ng/nh, p, i/y, o/u; the remaining cells are illegible in the source.]

4. Database building

We have constructed the database to synthesize good quality Vietnamese with the aim of recreating the most natural tones. Tonal quality, rather than the size of the database, has been given top priority. To construct a database that meets the requirements above, two issues should be considered: the database must allow tones to be synthesized that match the natural voice, and the speech signals in the database must be recorded under good conditions.

Furthermore, to make the database better suited to synthesis, we need to solve problems such as building a complete database that satisfies the requirements, choosing the voices to record and organizing the script. The choice of speaker depends only on which type of voice (male, female, old, young, and so on) we want to synthesize.

To meet these requirements, the database should be formed from sound files, each of which corresponds to one syllable. Each recorded syllable has a defined target unit to synthesize. Following this idea, each syllable is divided into two units: the initial unit and the coda unit. The main parts of the initial and coda units correspond to the initial and nucleus shown in Table 1 (syllabic structure in Vietnamese).

According to the results of the research in [7] and the division into sound units, the initial units are relevant to the level tone only, while the remaining coda units are relevant to all tones. Therefore, when building the database for initial units, we record only the sounds with the level tone. For coda units, we record all tones.

4.1 Syllable list foundation in database

Based on the Vietnamese syllable structure and using a computer, we have established the complete list of syllables that need recording. The list is built by a combinatorial method with the purpose of covering every possible Vietnamese syllable. After the combination stage, we eliminate the cases that do not exist in Vietnamese and filter the list of sounds to record manually. Syllables are recorded according to the defined number of initial and coda units.

Initial units: by combining initials with nucleus vowels, we get 324 combinations. After the manual elimination phase, 294 combinations remain. For instance, some combinations that do not exist in Vietnamese are "ce", "ci" and "nghu".

Coda units: by combining the onsets, nuclei and codas in the table of Vietnamese syllable structure, we finally get 721 combinations that exist in Vietnamese. In particular, by combining onsets with nuclei and removing non-existing combinations, we get 187 combinations. Combining these 187 combinations with the codas gives 2244 combinations. Next, we exclude the combinations that do not exist in Vietnamese, and 721 combinations remain. [The examples of eliminated combinations are illegible in the source.]

In total, 1015 combinations have been established (294 initial units and 721 coda units). These combinations are joined with the necessary characters to form the list of syllables to record, in which some pronunciations coincide. Accordingly, we only have to record 976 syllables.

4.2 Recording scripts

After finishing the syllable list, we need to prepare a recording script that gives the best results. For coda combinations, we place /n/ or /t/ in front of the unit. For example, in order to obtain the coda combinations "uong" and "oan", we record the syllables "tuong" and "toan", or "nuong" and "noan". The consonant /n/ (or any voiced consonant) or /t/ has been chosen as the first phoneme in a recorded syllable because the coda unit can be extracted more easily from syllables constructed with these consonants. This method allows the extraction of syllables and sound units to be automatic or semi-automatic.

In order to reduce coarticulation to the lowest level, the recorded syllables should be displayed independently on the computer screen. At any moment, just one syllable is shown, for 1 second.

4.3 Recording

The recording equipment is the Computerized Speech Lab Model 4500 (CSL Model) from KayPENTAX, which is designed for speech recording and analysis. The recording room is isolated from external noise. Recording was carried out in the studio of the School of Information Technology and Communications at Hanoi University of Science and Technology. The sampling frequency is 16000 Hz with 16 bits per sample. The speaker reads the syllables evenly, clearly and decisively. With an average duration of about 250 ms per syllable, the recorded speech amounts to 244000 ms (244 seconds).

Initially, we recorded three voices: a man's, a woman's and a child's. The continuous recording time for a 976-syllable set is 20 minutes (breaks between syllables included). The total size of the 1015 units is 10 MB for each voice. This is the database we built for research purposes. For practical applications, after the initial or coda unit has been extracted for synthesis, the rest of each recording should be discarded; the total size then reduces to 5.8 MB. According to the calculated results, the average signal-to-noise ratio is 21 dB, which is good and acceptable [4].

5. Conclusion

In summary, we have introduced a method of building databases for good quality Vietnamese synthesis. The initial results suggest that the synthesized voice is satisfactory. We believe that building the database by this method creates favorable conditions for synthesizing Vietnamese dialects and any voice that we want to synthesize. In addition, the database that we build can also be used for other synthesis applications, especially Vietnamese synthesis using the concatenation method.

* Corresponding author. Tel.: (+84) 903.277.732; Email: loantv@soict.hust.edu.vn

References

[1] Tran Do Dat, Eric Castelli, Serignat Jean-Francois, Trinh Van Loan, Le Xuan Hung, "Linear F0 Contour Model for Vietnamese Tones and Vietnamese Syllable Synthesis with TD-PSOLA", Proc. TAL 2006, La Rochelle, April 2006.
[2] Nguyen Thanh Kien, Nguyen Duc Thang, Le Thai Hoa, Trinh Van Loan, "DSP-based Embedded System for Text to Speech Synthesis of Vietnamese", Proceedings of the 2nd Asia Pacific International Conference on Information Science and Technology, Hanoi, December (2007) 215-219.
[3] Hansjorg Mixdorff, Nguyen Hung Bach, Hiroya Fujisaki, Mai Chi Luong, "Quantitative Analysis and Synthesis of Syllabic Tones in Vietnamese", EuroSpeech 2003, Geneva.
[4] Tran Do Dat, Eric Castelli, Trinh Van Loan, Le Viet Bac, "Building a large Vietnamese Speech Database", Tap chi Khoa hoc va Cong nghe (ISBN 0868-3980), Vol. 46/47 (2004) 13-17.
[5] La The Vinh, Trinh Van Loan, "Vietnamese Recognition and Synthesis with T-engine Embedded System", Proceedings of the 2nd Asia Pacific International Conference on Information Science and Technology, Hanoi, (2007) 133-137.
[6] [Entry illegible in the source.]
[7] Tran Do Dat, Eric Castelli, Serignat Jean-Francois, Le Xuan Hung, Trinh Van Loan, "Influence of F0 on Vietnamese syllable perception", Proc. of Interspeech 2005, Lisbon, (2006) 1697-1700.
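
Illustrative sketch for Section 4.1 (not part of the original paper). The combine-then-eliminate construction of the initial units might look roughly like the Python below. The inventories are assumptions: 27 orthographic initials and 12 single vowel letters, chosen only because their product matches the 324 combinations reported; the paper does not list what was actually combined. The spelling rules in is_plausible are a hypothetical stand-in for the authors' manual elimination, so the number of survivors is not expected to reproduce the reported 294.

```python
from itertools import product

# Assumed orthographic inventories (hypothetical, not taken from the paper).
INITIALS = ["b", "c", "k", "q", "ch", "d", "gi", "đ", "g", "gh", "h", "kh",
            "l", "m", "n", "nh", "ng", "ngh", "p", "ph", "r", "s", "t",
            "th", "tr", "v", "x"]
VOWELS = ["a", "ă", "â", "e", "ê", "i", "o", "ô", "ơ", "u", "ư", "y"]

# Vowel letters that select the k / gh / ngh spellings.
FRONT = {"e", "ê", "i", "y"}


def is_plausible(initial, vowel):
    """Rough spelling filter standing in for the manual elimination step."""
    if initial in {"k", "gh", "ngh"}:
        return vowel in FRONT
    if initial in {"c", "g", "ng"}:
        return vowel not in FRONT
    if initial == "q":
        return vowel == "u"
    return True


def build_initial_units():
    """Return (all candidate strings, candidates that pass the filter)."""
    pairs = list(product(INITIALS, VOWELS))
    candidates = [i + v for i, v in pairs]
    kept = [i + v for i, v in pairs if is_plausible(i, v)]
    return candidates, kept


candidates, kept = build_initial_units()
print(len(candidates))  # 324
print("ce" in kept, "ci" in kept, "nghu" in kept)  # False False False
```

The same pattern (full Cartesian product first, filtering second) would apply to the coda units, where 187 onset-nucleus combinations are crossed with the codas before elimination; the coda inventory is not legible in the source, so it is not sketched here.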

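
Illustrative check for Section 4.3 (not part of the original paper). The recording figures can be cross-checked with simple arithmetic. The sketch below assumes uncompressed mono PCM with no file headers, which the paper does not state; the function and constant names are hypothetical.

```python
# Figures quoted in Section 4.3.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16
SYLLABLES = 976
AVG_SYLLABLE_MS = 250


def speech_duration_ms(n_syllables, avg_ms):
    """Total duration of the spoken material, excluding pauses."""
    return n_syllables * avg_ms


def pcm_bytes(duration_ms, sample_rate_hz, bits_per_sample):
    """Size of mono PCM audio of the given duration (assumption: no headers)."""
    samples = duration_ms * sample_rate_hz // 1000
    return samples * bits_per_sample // 8


duration_ms = speech_duration_ms(SYLLABLES, AVG_SYLLABLE_MS)
size_bytes = pcm_bytes(duration_ms, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(duration_ms)  # 244000
print(size_bytes)   # 7808000
```

Under these assumptions the speech itself occupies about 7.8 MB, the same order of magnitude as the 10 MB reported per voice. The paper does not break that figure down, so the gap is left unexplained here.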