Huấn luyện mô hình âm học

Để có thể huẩn luyện mô hình âm học, ta cần các công cụ sau - Python 2.7.5

- Active Perl - SphinxTrain

Cấu trúc file để xây dựng cơ sở dữ liệu cho việc nhận dạng tiếng nói có dạng như sau: Etc – Thư mục cấu hình

- Lenh.dic - Phonetic dictionary - Lenh.phone - Phoneset file

- Lenh.lm.DMP – Mô hình ngôn ngữ - Lenh.filler – Danh sách tạp âm

- Lenh _train.fileids – Danh sách các file huấn luyện

- Lenh _train.transcription – Danh sách các câu dành cho huấn luyện - Lenh _test.fileids – Danh sách các file cho kiểm thử

55 Wav- Thư mục chứa file ghi âm

- Người nói _1

o file_1.wav - Recording of speech utterance - người nói_2

o file_2.wav

Trước khi bắt đầu huấn luyện, ta cần xây dựng một cơ sở dữ liệu như trên. - Từ điển âm học được sinh ra tại mục 4.3: Lenh.dic

- File Phoneset: là tập các âm xuất hiện trong từ điển âm học và âm SIL. Để đơn giản hóa quá trình tạo lập file phoneset, tôi đã xây dựng chương trình hxgenphone nhằm xây dựng file phoneset từ một từ điển âm học có sẵn. Ta thực hiện lệnh sau để xây dựng phoneset:

Hxgenphone Lenh.dic Lenh.phone

Kết quả ta thu được file phoneset có nội dung như sau:

Bảng 13 File Phoneset A B C D E G H I K L M N O P Q R S SIL T U V X Y

- Mô hình ngôn ngữ được xây dựng tại mục 4.2 ta thu được file Lenh.gram và Lenh.lm.DMP

- File tạp âm: Lenh.filler chứa các âm không phải là ngôn ngữ như tiếng thở, tiếng cười…, Lệnh.filler nội dung như sau:

Bảng 14 File tạp âm

<s> SIL </s> SIL <sil> SIL

Để nhanh chóng sinh ra file tạp âm chúng ta sử dụng lệnh

hxgenfiller Lenh.filler

- Các file transcription được sinh ra nhờ các lệnh sau:

hxgentranscription Lenh.txt Users.txt Lenh_train.transcription hxgentranscription Lenh.txt TestUsers.txt Lenh _test.transcription hxgenfileids Lenh.txt Users.txt Lenh _train.fileids

hxgenfileids Lenh.txt TestUsers.txt Lenh _test.fileids

Trong đó các chương trình hxgentranscription, hxgenfileids được xây dựng bởi tác giả nhằm đơn giản hóa cách thức xây dựng transcription và file id. Các chương trình này được xây dựng sử dụng ngôn ngữ lập trình C# và .NET Framework 3.5. Do đó để thực thi, môi trường phải hỗ trợ các nền tảng này. .NET Framework có thể dễ dàng cài đặt trên hệ điều hành Windows. Trên môi trường Linux, ví dụ Ubuntu, để có thể thực thi các chương trình này, các gói thư viện MONO cần phải được cài đặt trước.

File Users.txt liệt kê danh sách các thư mục chứa file wave tham gia huấn luyện, Users.txt có nội dung tương tự như sau:

Hình 27 Nội dung file Users.txt

File TestUsers.txt liệt kê danh sách các thư mục tham gia quá trình giải mã, TestUsers.txt có nội dung như sau:

Hình 28 Nội dung file TestUsers.txt

58 - Khởi tạo cấu hình huấn luyện (Windows)

python ../sphinxtrain/scripts/sphinxtrain -t Lenh setup

Sau khi lệnh trên được thực hiện thành công, thư mục etc sẽ được khởi tạo cùng với file cấu hình các tham số huấn luyện: sphinx_train.cfg. Các thông số quan trọng trong file cấu hình có thể được thay đổi để phù hợp với yêu cầu của chương trình.

Tham số định dạng file ghi âm:

Với mô hình đang xây dựng, chúng ta sử dụng định dạng wav

Bảng 15 Tham số định dạng file ghi âm

# Audio waveform and feature file information

$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav"; $CFG_WAVFILE_EXTENSION = 'wav';

$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw $CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";

$CFG_FEATFILE_EXTENSION = 'mfc'; $CFG_VECTOR_LENGTH = 13;

Tham số cấu hình file: đường dẫn đến các file filler, transcription, fileids, mô hình ngôn ngữ…

Bảng 16 Tham số cấu hình file

# Variables used in main training of models

$CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic"; $CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone"; $CFG_FILLERDICT = "$CFG_LIST_DIR/$CFG_DB_NAME.filler"; $CFG_LISTOFFILES = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids"; $CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription" Tham số kiểu mô hình và tham số mô hình:

Mô hình chúng ta đang huấn luyện là mô hình liên tục, do đó cấu hình được lựa chọn là .cont

Bảng 17 Tham số kiểu mô hình và tham số mô hình

$CFG_HMM_TYPE = '.cont.'; # Sphinx 4, PocketSphinx #$CFG_HMM_TYPE = '.semi.'; # PocketSphinx

#$CFG_HMM_TYPE = '.ptm.'; # PocketSphinx (larger data sets) Tham số đặc trưng âm thanh:

Bảng 18 Tham số đặc trưng âm thanh

# Feature extraction parameters

$CFG_WAVFILE_SRATE = 16000.0;

$CFG_NUM_FILT = 40; # For wideband speech it's 40, for telephone 8khz reasonable value is 31

$CFG_LO_FILT = 200; # For telephone 8kHz speech value is 200 $CFG_HI_FILT = 3500; # For telephone 8kHz speech value is 3500 Các tham số cho quá trình huấn luyện:

Bảng 19 Tham số quá trình huấn luyện

# Variables used in main training of models

$CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic"; $CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone"; $CFG_FILLERDICT = "$CFG_LIST_DIR/$CFG_DB_NAME.filler"; $CFG_LISTOFFILES = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids"; $CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription"; $CFG_FEATPARAMS = "$CFG_LIST_DIR/feat.params"; Cấu hình tham số giải mã:

Bảng 20 Tham số giải mã

# Variables used in main training of models

$CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic"; $DEC_CFG_DICTIONARY = "$CFG_BASE_DIR/etc/$CFG_DB_NAME.dic"; $DEC_CFG_FILLERDICT = "$CFG_BASE_DIR/etc/$CFG_DB_NAME.filler"; $DEC_CFG_LISTOFFILES = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}_test.fileids"; $DEC_CFG_TRANSCRIPTFILE = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}_test.transcription"; $DEC_CFG_RESULT_DIR = "$CFG_BASE_DIR/result";

- Huấn luyện: sau khi cài đặt các tham số phù hợp, ta tiến hành huấn luyện

python ../sphinxtrain/scripts/sphinxtrain run

Hình 29 Huấn luyện

Kết quả huấn luyện được lưu trong thư mục model_params/Lenh.cd_cont_200 với nội dung như sau:

Hình 30 Kết quả huấn luyện

Chúng ta có thể thực nhiện nhanh các bước trên băng cách thực hiện lần lượt các script được viết sẵn: clear.bat, gen_predata.bat, train.bat

61 Sơ đồ khối và nội dung file clear.bat

Bảng 21 Nội dung file clear.bat

set project=Lenh set base=Z:\Dropbox\LuanVan\HuanLuyen\ set tools=%base%SphinxTools\ set sphinxtrain=%base%\SphinxTrain\ set hxutils=%base%\HxUtils\ del %project%.arpa del %project%.dic del %project%.filler del %project%.html del %project%.idngram del %project%.lm.DMP del %project%.phone del %project%.vocab del %project%_test.fileids del %project%_test.transcription Bắt đầu

Thiết lập biến môi trường

Xóa dữ liệu huấn luyện cũ

Kết thúc

62 del %project%_train.fileids del %project%_train.fileids del %project%_train.transcription rmdir /Q /S bwaccumdir rmdir /Q /S etc rmdir /Q /S feat rmdir /Q /S logdir rmdir /Q /S model_architecture rmdir /Q /S model_parameters rmdir /Q /S qmanager rmdir /Q /S result rmdir /Q /S trees

63 Bắt đầu

Thiết lập biến môi trường

Xóa dữ liệu huấn luyện cũ

Sinh dữ liệu từ vựng

Sinh dữ liệu ngữ pháp

Sinh các file transcription & cấu

hình

Kết thúc

64 Bảng 22 File gen_predata.bat set project=Lenh set base=Z:\Dropbox\LuanVan\HuanLuyen\ set tools=%base%SphinxTools\ set sphinxtrain=%base%\SphinxTrain\ set hxutils=%base%\HxUtils\ del %project%.arpa del %project%.dic del %project%.filler del %project%.html del %project%.idngram del %project%.lm.DMP del %project%.phone del %project%.vocab del %project%_test.fileids del %project%_test.transcription del %project%_train.fileids del %project%_train.fileids del %project%_train.transcription rmdir /Q /S bwaccumdir rmdir /Q /S etc rmdir /Q /S feat rmdir /Q /S logdir rmdir /Q /S model_architecture rmdir /Q /S model_parameters rmdir /Q /S qmanager rmdir /Q /S result rmdir /Q /S trees

%tools%text2wfreq < %project%.txt | %tools%wfreq2vocab > %project%.vocab %tools%text2idngram -vocab %project%.vocab -idngram %project%.idngram < %project%.txt

%tools%idngram2lm -vocab_type 0 -idngram %project%.idngram -vocab %project%.vocab -arpa %project%.arpa

%tools%sphinx_lm_convert -i %project%.arpa -o %project%.lm.DMP %hxutils%hxgendic %project%.vocab %project%.dic

%hxutils%hxgenphone %project%.dic %project%.phone %hxutils%hxgentranscription %project%.txt Users.txt %project%_train.transcription

%hxutils%hxgentranscription %project%.txt TestUsers.txt %project%_test.transcription

%hxutils%hxgenfileids %project%.txt Users.txt %project%_train.fileids %hxutils%hxgenfileids %project%.txt TestUsers.txt %project%_test.fileids %hxutils%hxgenfiller %project%.filler

python %sphinxtrain%scripts\sphinxtrain -t %project% setup

Sơ đồ khối và nội dung file train.bat

Bảng 23 File train.bat set project=Lenh set base=Z:\Dropbox\LuanVan\HuanLuyen\ set tools=%base%SphinxTools\ set sphinxtrain=%base%\SphinxTrain\ set hxutils=%base%\HxUtils\

copy %project%.dic etc\%project%.dic copy %project%.filler etc

Bắt đầu

Thiết lập biến môi trường

Sao chép các file dữ liệu cần thiết

Huấn luyện

Kết thúc

copy %project%.lm.DMP etc copy %project%.phone etc copy %project%_test.fileids etc

copy %project%_test.transcription etc copy %project%_train.fileids etc

copy %project%_train.transcription etc python %sphinxtrain%scripts\sphinxtrain run pause

Đánh giá và lựa chọn giải pháp