Design of an embedded system for Vietnamese sentence recognition, applied to route guidance at the University of Technology, Vietnam National University, Ho Chi Minh City

VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
UNIVERSITY OF TECHNOLOGY

TRẦN VĂN HOÀNG

DESIGN OF AN EMBEDDED SYSTEM FOR VIETNAMESE SENTENCE RECOGNITION, APPLIED TO ROUTE GUIDANCE AT THE UNIVERSITY OF TECHNOLOGY, VNU-HCM

Major: Electronics Engineering
Code: 60520203

MASTER'S THESIS

HO CHI MINH CITY, JULY 2014

THIS WORK WAS COMPLETED AT THE UNIVERSITY OF TECHNOLOGY, VNU-HCM

Scientific supervisor: Dr. Hoàng Trang
Reviewer 1: Dr. Huỳnh Phú Minh Cường
Reviewer 2: Dr. Lê Chí Thông
Reviewer 3: Dr. Nguyễn Minh Hoàng
Reviewer 4: Dr. Lê Ngọc Phú

The master's thesis was defended at the University of Technology, VNU-HCM, on 15 July 2014.

The thesis evaluation committee consisted of:
- Dr. Huỳnh Phú Minh Cường (Chair)
- Dr. Lê Chí Thông (Secretary)
- Dr. Nguyễn Minh Hoàng (Reviewer)
- Dr. Lê Ngọc Phú (Reviewer)
- Dr. Hoàng Trang (Member)

Confirmation by the Chair of the thesis evaluation committee and the Dean of the faculty managing the major, after the thesis has been revised (if applicable).

COMMITTEE CHAIR
DEAN OF THE FACULTY OF ELECTRICAL AND ELECTRONICS ENGINEERING

VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
UNIVERSITY OF TECHNOLOGY
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

MASTER'S THESIS ASSIGNMENT

Student name: Trần Văn Hoàng
Student ID: 12140019
Date of birth: 25/08/1989
Place of birth: Thừa Thiên Huế
Major: Electronics Engineering
Code: 60520203

I. THESIS TITLE: Design of an embedded system for Vietnamese sentence recognition, applied to route guidance at the University of Technology, VNU-HCM.

II. TASKS AND CONTENT: Design an embedded system that recognizes Vietnamese sentences. The device can communicate with its user through a spoken exchange, with route guidance at the University of Technology, VNU-HCM, as the target application.

III. ASSIGNMENT DATE (per the topic assignment decision): 24/06/2013

IV. COMPLETION DATE (per the topic assignment decision): 23/05/2014

V. SUPERVISOR: Dr. Hoàng Trang

Ho Chi Minh City, day ... month ... year 20...

SUPERVISOR (full name and signature)
HEAD OF THE TRAINING DEPARTMENT (full name and signature)
DEAN OF THE FACULTY (full name and signature)

ACKNOWLEDGMENTS

While carrying out this thesis I ran into many difficulties. With the dedicated guidance of Dr. Hoàng Trang and the favorable conditions he provided, I was able to resolve many of them and complete the thesis. I would like to express my sincere thanks to him.

I would like to thank the members of the master's thesis committee for their accurate and useful comments, evaluations, and suggestions, which helped me revise and improve the thesis.

I would like to thank the Department of Electronics Engineering, Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology, where I work, for giving me the time and equipment needed to carry out this thesis.

Finally, I thank my family and loved ones for standing by me and for their encouragement and moral support throughout the work.

Ho Chi Minh City, 30 July 2014
Student
Trần Văn Hoàng

ABSTRACT

The thesis “Design of an embedded system for Vietnamese sentence recognition, applied to route guidance at the University of Technology, VNU-HCM” aims to design an embedded system, running an operating system, that can communicate with its user through Vietnamese sentences, with route guidance at the University of Technology, VNU-HCM, as the target application.
- The system recognizes Vietnamese sentences.
- The device can communicate with its user through a spoken exchange.
- A simple map of an area (specifically the University of Technology, VNU-HCM) is built together with a route-finding algorithm, so that route guidance within the university can be given by voice.
- The application is built on an embedded system with an operating system.

SUMMARY OF THESIS

The thesis “Designing an embedded system for Vietnamese continuous speech recognition, oriented toward giving directions at the University of Technology, Vietnam National University, Ho Chi Minh City” aims to design an embedded Linux system capable of communicating with users through Vietnamese sentences, oriented toward giving directions at the University of Technology, Vietnam National University, Ho Chi Minh City.
- The system is able to recognize Vietnamese sentences.
- The device is able to communicate with users through spoken sentences.
- A simple map of an area (namely the University of Technology, Ho Chi Minh City) is developed and route-finding algorithms are studied.
- The application is built on an embedded Linux system.

DECLARATION

I declare that everything I have done and presented here is entirely truthful and is the result of my own work.

Student
Trần Văn Hoàng

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
SUMMARY OF THESIS
DECLARATION
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: OVERVIEW
1.1 Introduction to the topic
1.2 Overview of domestic and international research
1.2.1 Speech recognition systems
1.2.2 Embedded operating systems for real-time applications [92]
1.2.3 Route-finding applications
1.3 Objectives of the thesis
1.3.1 General objective
1.3.2 Specific objectives
CHAPTER 2: VIETNAMESE ISOLATED-WORD RECOGNITION SYSTEM
2.1 MFCC feature extraction algorithm
2.1.1 Pre-emphasis
2.1.2 Frame Blocking
2.1.3 Windowing
2.1.4 FFT
2.1.5 Mel Frequency Bank
2.1.6 Cepstral Analysis
2.1.7 Energy Calculation
2.1.8 Delta Coefficient
2.1.9 Conclusion
2.2 Hidden Markov models
2.2.1 Discrete Markov processes
2.2.2 Hidden Markov models
2.2.3 The three problems of HMMs
2.2.4 Types of HMMs
2.2.5 Continuous observation densities in HMMs
2.2.6 Conclusion
CHAPTER 3: CONTINUOUS SENTENCE RECOGNITION
3.1 Introduction to continuous sentence recognition based on subword units
3.1.1 Subword speech units and the phonetic model for Vietnamese
3.1.2 Subword unit models based on HMMs
3.1.3 Training subword units
3.1.4 Language models for large-vocabulary speech recognition
3.1.5 Statistical language models
3.1.6 Complexity of the language model
3.1.7 Overall recognition system based on subword units
3.2 Building a Vietnamese sentence recognition program on the Linux operating system
3.2.1 Preprocessing and feature-vector construction block
3.2.2 Training data
3.2.3 Phonetic model
3.2.4 Vocabulary model
3.2.5 Language model
3.2.6 Recognition block
CHAPTER 4: ROUTING ALGORITHMS
4.1 Introduction
4.2 Dijkstra's algorithm
4.2.1 Description of the algorithm
4.2.2 Flowchart of the algorithm
4.3 The A* algorithm
4.3.1 Description of the algorithm
4.3.2 Flowchart of the algorithm
4.3.3 Optimizing the computation time of the A* algorithm
4.3.4 Conclusion
4.4 Application to a real map
CHAPTER 5: THE BEAGLEBOARD-xM KIT AND THE EMBEDDED LINUX OPERATING SYSTEM
5.1 The BeagleBoard-xM kit
5.1.1 Overview of the BeagleBoard-xM
5.1.2 Features of the BeagleBoard-xM
5.1.3 System structure of the BeagleBoard-xM
5.2 The embedded Linux operating system
5.2.1 Structure of the Linux operating system and its interaction with hardware
CHAPTER 6: RESULTS AND DISCUSSION
6.1 Survey of the recognition accuracy of the isolated-word recognition system
6.1.1 Survey of the coefficient a in the pre-emphasis block
6.1.2 Survey of the number of filters in the Mel frequency filter bank block
6.1.3 Survey of the number of feature coefficients
6.1.4 Survey of the number of mixtures

Student: Trần Văn Hoàng
Supervisor: Dr. Hoàng Trang

CHAPTER 6: RESULTS AND DISCUSSION

- tôi muốn đến ban quản lí mạng (I want to go to the network administration office)
- tôi muốn đến ban giáo trình (I want to go to the textbook office)
- tôi muốn đến ban thi đua (I want to go to the emulation office)
- tôi muốn đến ban thanh tra giáo dục (I want to go to the education inspection office)

The setup and evaluation parameters are as follows:
- Each sentence is trained with 100 readings. With 15 sentences, this gives 1500 training utterances.
- For these 15 sentences, the vocabulary consists of 50 single words, namely: “án, ban, bảo, bị, chất, chính, chức, công, giáo, đào, đại, đảm, đến, đối, đua, dục, dự, hành, hệ, hoạch, học, kế, khoa, lí, lượng, mạng, muốn, nghệ, ngoại, phòng, quan, quản, quốc, sau, sinh, tài, tạo, tác, tế, thanh, thi, thiết, thư, tôi, tổ, tra, trình, trị, và, viện, viên”.
- Because the system targets route guidance at the University of Technology, it must recognize sentences with a flexible syntax such as:
  o Tôi muốn đến phòng tổ chức hành chính
  o Muốn đến phòng tổ chức hành chính
  o Đến phòng tổ chức hành chính
  o Phòng tổ chức hành chính
- Recognition accuracy is evaluated on sentences with the syntax above. Each test sentence is recognized over roughly 20 to 30 readings, for a total of 355 test utterances.
- The utterances used for training and for recognition are different samples, recorded from the same speaker under the same conditions but at different times.
- The utterances used to evaluate accuracy were recorded in advance and then run automatically for offline recognition.
- Recognition results:
  o Sentences recognized correctly: 100%
  o Words recognized correctly: 100%

The method relies on a “language model”, covering grammar and semantics, to detect and record which words usually occur together and which words come before or after a given word to form a meaningful sentence or phrase. A simple grammar model is shown in the following figure:

[Figure 6.3-1: Grammar model for the continuous-speech application. Word graph from “Bắt đầu câu” (sentence start) through “muốn”, “đến” and the destination words beginning with “phòng”, “ban” or “thư viện” to “Kết thúc câu” (sentence end).]

Note: nodes marked with a diamond are words that may or may not appear in the recognized sentence. Some words occur in several sentences, for example “chính” in “tổ chức hành chính” and in “kế hoạch tài chính”; every such occurrence is recorded for that word.

Conclusion: using a language model increases the recognition accuracy for Vietnamese sentences. In addition, similar-sounding words such as “quan” and “quản”, “án” and “ban”, or “viện” and “viên” are recognized correctly once the context in which each word appears is taken into account.

6.4 Scientific papers published in international conferences and journals

While carrying out this thesis, I published scientific papers in international conferences and an international journal:

Tran Van Hoang, Nguyen Ly Thien Truong, Hoang Trang, Xuan-Tu Tran, “Design and Implementation of a SoPC System for Speech Recognition”, Lecture Notes in Electrical Engineering, Multimedia and Ubiquitous Engineering MUE 2013, Springer.

Hoang Trang, Tran Van Hoang, “Proposed Efficient Architectures and Design Choices in
SoPC System for Speech Recognition”, Journal of IKEEE, vol. 17, no. 3, pp. 241-247, 2013.

Tran Xuan Thien, Hoang Trang, Tran Van Hoang, Nguyen Huu Phuoc, “The effect of MFCC extraction methods on speech recognition embedded system”, ICGHIT 2014 (International Conference on Green and Human Information Technology 2014), 2014, Ho Chi Minh City, Vietnam.

CHAPTER 7: CONCLUSION AND FUTURE WORK

7.1 Conclusion

What the thesis achieved:
- Studied speech recognition systems, the embedded Linux operating system, and route-finding algorithms.
- Built a Vietnamese isolated-word recognition system and a Vietnamese continuous sentence recognition system running on Linux, on both a PC and the ARM-based BeagleBoard-xM kit.
- Rebuilt several components of the embedded Linux operating system to reduce storage footprint and increase speed.
- Built an application running on the BeagleBoard-xM kit that recognizes continuous sentences for route guidance.
- Surveyed several parameters of the speech recognition system as applied to Vietnamese.
- Published scientific papers in international conferences and an international journal.

What the thesis did not achieve:
- Integration of a GPS module into the system was not finished in time to be included in the thesis report (hopefully it will be ready by the defense date so that the results can be added).
- The recognition application depends on the voice of the speaker used for training, so practical use requires extending it to many speakers.
- The system cannot yet be deployed in practice because the power supply, cabling, and display are not portable (a desktop monitor, a mains-powered adapter, untidy cables, and so on).

7.2 Future work

The work can be continued and extended in the following directions:
- Develop a speaker-independent Vietnamese sentence recognition system with a larger vocabulary, so that it can serve more users and broader application settings.
- Use GPS and 3G technology to retrieve data online when the offline application's database is insufficient, making the system more flexible.
- Refine the product so that it is compact, easy to use, and portable enough for practical use.

REFERENCES

[1] Bạch Hưng Khang, “Nghiên cứu phát triển công nghệ nhận dạng, tổng hợp và xử lý ngôn ngữ tiếng Việt” [Research and development of recognition, synthesis, and processing technology for the Vietnamese language], national-level project KC01-03, 2001-2004.
[2] Lương Chi Mai, “Nghiên cứu, phát triển một số sản phẩm tiêu biểu thiết yếu về xử lý tiếng nói và văn bản tiếng Việt” [Research and development of several typical, essential products for Vietnamese speech and text processing], national-level project KC01.01/06-10, 2006-2010.
[3] Nguyễn Văn Giáp, Trần Việt Hồng, “Kỹ thuật nhận dạng tiếng nói và ứng dụng trong điều khiển” [Speech recognition techniques and their application in control], Department of Mechatronics, Faculty of Mechanical Engineering, Ho Chi Minh City University of Technology.
[4] Lê Tiến Thường (principal investigator), Hoàng Đình Chiến, Trần Tiến Đức…, “Ứng dụng Wavelets nhận dạng tiếng nói tiếng Việt trong điều khiển và thông tin” [Applying wavelets to Vietnamese speech recognition in control and communications], acceptance report of a key VNU-HCM research project, 28-01-2004.
[5] Lê Tiến Thường, Hoàng Đình Chiến, “Vietnamese Speech Recognition Applied to Robot Communications”, AU Journal of Technology, January 2004, Assumption University (ABAC), Hua Mak, Bangkok, Thailand.
[6] Carlos Asmat, David López Sanzo, Kanwen Wu, “Speech Recognition Using FPGA Technology”, Master thesis, McGill University, 2007.
[7] Online: http://www.datasheetcatalog.com/datasheets_pdf/H/M/2/0/HM2007.shtml
[8] Lawrence Rabiner, Biing-Hwang Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993, ISBN 0-13-015157-2.
[9] Joe Tebelskis, “Speech Recognition using Neural Networks”, PhD thesis, Carnegie Mellon University, 1995.
[10] Fang Zheng, Guoliang Zhang, Zhanjiang Song, “Comparison of Different Implementations of MFCC”, J. Computer Science & Technology, 16(6):582-589, Sept. 2001.
[11] Chin Kim On, Paulraj M. Pandiyan, Sazali Yaacob, Azali Saudi, “Mel-Frequency Cepstral Coefficient Analysis in Speech Recognition”, International Conference on Computing & Informatics (ICOCI '06), 2006, pp. 1-5.
[12] Sung-Nam Kim, In-Chul Hwang, Young-Woo Kim, Soo-Won Kim, “A VLSI chip for isolated speech recognition system”, IEEE Transactions on Consumer Electronics, vol. 42, no. 3, 1996, pp. 458-467.
[13] Han Wei, “A Speech Recognition IC with an Efficient MFCC Extraction Algorithm and Multi-mixture Models”, Doctor of Philosophy thesis, The Chinese University of Hong Kong, September 2006.
[14] J. Markel and A.H. Gray, Jr., Linear Prediction of Speech, Springer-Verlag, New York, New York, USA, 1980.
[15] B.S. Atal, “Predictive Coding of Speech at Low Bit Rates”, IEEE Transactions on Communications, vol. 30, no. 4, pp. 600-614, April 1982.
[16] J. Campbell and T.E. Tremain, “Voiced/Unvoiced Classification of Speech with Applications to the U.S. Government LPC-10E Algorithm”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 473-476, Tokyo, Japan, April 1986.
[17] B.S. Atal and J.R. Remde, “A New Model of LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 614-617, Paris, France, 1982.
[18] V.R. Viswanathan and J. Makhoul, “Quantization Properties of Transmission Parameters in Linear Predictive Systems”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 23, no. 3, pp. 309-321, June 1975.
[19] B.S. Atal, “Linear Prediction For Speaker Identification”, Journal of the Acoustical Society of America, vol. 55, no. 6, pp. 1304-1311, June 1974.
[20] G.R. Doddington, J. Picone, and J.J. Godfrey, “The LPC trace as an HMM development tool”, Journal of the Acoustical Society of America, vol. 84, p. S1-561A, Fall 1988.
[21] Joseph W. Picone, “Signal Modeling Techniques in Speech Recognition”, Proceedings of the IEEE, vol. 81, no. 9, 1993, pp. 1215-1247.
[22] R.G. Lyons, “Understanding Digital Signal Processing”, Addison Wesley, 1997, ISBN 0-201-63467-8.
[23] M.A. Anusuya, S.K. Katti, “Speech Recognition by Machine: A Review”, International Journal of Computer Science and Information Security, 2009.
[24] Kenneth Thomas Schutte, “Parts-based Models and Local Features for Automatic Speech Recognition”, B.S., University of Illinois at Urbana-Champaign (2001).
[25] Hoang Trang, Vo Quoc Viet, Nguyen Ly Thien Truong, “FPGA Architecture of HMM-based Decoder Module in Speech Recognizer”, IEEE International Conference on Control, Automation and Information Sciences (ICCAIS), 2012.
[26] L.R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”, Proceedings of the IEEE, vol. 77, no. 2, pp. 257-285, February 1989.
[27] J.G. Wilpon, C.H. Lee, and L.R. Rabiner, “Application of Hidden Markov Models for Recognition of a Limited Set of Words in Unconstrained Speech”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 254-257, Glasgow, Scotland, 1989.
[28] J. Pihl, T. Svendsen, M.H. Johnsen, “A VLSI Implementation of Pdf Computations in HMM-Based Speech Recognition”, Proceedings of the IEEE Region Ten Conference (TENCON'96) on Digital Signal Processing Applications, Nov. 1996.
[29] Jer Min Jou, Yeu-Horng Shiau, Chen-Jen Huang, “An Efficient VLSI Architecture for HMM-Based Speech Recognition”, The 8th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2001), vol. I, 2-5 Sept. 2001.
[30] S.J. Melnikoff, S.F. Quigley, M.J. Russell, “Performing Speech Recognition on Multiple Parallel Files Using Continuous Hidden Markov Models on an FPGA”, International Conference on Field Programmable Logic and Applications, 2002, pp. 202-211.
[31] Wei Han, Kwok-Wai Hon, Cheong-Fat Chan, Tan Lee, Chiu-Sing Choy, Kong-Pang Pun, P.C. Ching, “An HMM-Based Speech Recognition IC”, Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), May 25-28, 2003, pp. 744-747.
[32] Meng Yuan, Tan Lee, P.C. Ching, “Speech Recognition on DSP: Issues on Computational Efficiency and Performance Analysis”, Proceedings of the 2005 International Conference on Communications, Circuits and Systems, vol. 2, 27-30 May 2005.
[33] Sujuan Ke, Yibin Hou, Zhangqin Huang, Hui Li, “A HMM Speech Recognition System Based on FPGA”, IEEE Computer Society, 2008.
[34] S. Yoshizawa, Y. Miyanaga, and N. Wada, “A low power VLSI design of a HMM based speech recognition system”, PhD thesis, Hokkaido University, Sapporo, Japan, pp. 489-492, 2001.
[35] J. Picone, “Continuous Speech Recognition Using Hidden Markov Models”, IEEE ASSP Magazine, vol. 7, no. 3, pp. 26-41, July 1990.
[36] G.D. Forney, “The Viterbi algorithm”, Proc. IEEE, vol. 61, Mar. 1973, pp. 268-278.
[37] J. Picone, “The Demographics of Speaker Independent Digit Recognition”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, 1990, pp. 105-108.
[38] E.L. Bocchieri and G.R. Doddington, “Frame Specific Statistical Features for Speaker-Independent Speech Recognition”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 4, 1986, pp. 755-764.
[39] A. Lee, T. Kawahara, and K. Shikano, “Julius - An open source, real-time, large-vocabulary recognition engine”, Proceedings of European Conference on Speech Communication Technology (EUROSPEECH), Denmark, 2001, pp. 1691-1694.
[40] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K.J. Lang, “Phoneme Recognition Using Time-Delay Neural Networks”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, 1989, pp. 328-339.
[41] K. Kita, T. Kawabata, H. Saito, “HMM Continuous Speech Recognition Using Predictive LR Parsing”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 703-706, May 1989.
[42] J.G. Wilpon, C.H. Lee, and L.R. Rabiner, “Improvements in Connected Digit Recognition Using Higher Order Spectral and Energy Features”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 349-352, 1991.
[43] H. Ney, “Experiments on Mixture-Density Phoneme Modelling for the Speaker-Independent 1000-Word Speech Recognition DARPA Task”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 713-716, Albuquerque, New Mexico, USA, April 1990.
[44] Y.L. Chow, M.O. Dunham, O.A. Kimball, M.A. Krasner, G.F. Kubala, J. Makhoul, P.J. Price, S. Roucos, and R.M. Schwartz, “BYBLOS: The BBN Continuous Speech Recognition System”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 89-92, Dallas, Texas, USA, April 1987.
[45] F. Kubala, M. Feng, J. Makhoul, and R. Schwartz, “Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System”, Proceedings of the DARPA Speech and Natural Language Workshop, Morgan Kaufmann Publishers, Inc., Palo Alto, California, USA, pp. 100-105, February 1989.
[46] M.M. Hochberg, L.T. Niles, J.T. Foote, and H.F. Silverman, “Hidden Markov Model/Neural Network Training Techniques for Connected Alphadigit Speech Recognition”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 109-112, Toronto, Ontario, Canada, April 1991.
[47] T.D. Harrison and F. Fallside, “A Connectionist Model for Phoneme Recognition in Continuous Speech”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 417-420, Glasgow, Scotland, May 1989.
[48] K.F. Lee, Automatic Speech Recognition: the Development of the SPHINX System, Kluwer Academic Publishers, Boston, Massachusetts, USA, 1989.
[49] P. Haffner, M. Franzini, and A. Waibel, “Integrating Time Alignment and Neural Networks for High Performance Continuous Speech Recognition”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, 1991, pp. 105-109.
[50] L. Fissore, P. Laface, G. Micca, “Comparison of Discrete and Continuous HMMs in a CSR Task Over the Telephone”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 253-256, Toronto, Ontario, Canada, April 1991.
[51] L. Fissore, P. Laface, G. Micca, and R. Pieraccini, “Lexical Access to Large Vocabularies for Speech Recognition”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 8, pp. 1197-1213, August 1989.
[52] S. Kimura, “100,000-Word Recognition Using Acoustic-Segment Networks”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 61-64, Albuquerque, New Mexico, USA, April 1990.
[53] A. Averbuch, L. Bahl, R. Bakis, P. Brown, A. Cole, G. Daggett, S. Das, K. Davies, S. De Gennaro, P. de Souza, E. Epstein, D. Fraleigh, F. Jelinek, S. Katz, B. Lewis, R. Mercer, A. Nadas, D. Nahamoo, M. Picheny, G. Shichman, and P. Spinelli, “An IBM-PC Based Large-Vocabulary Isolated Utterance Speech Recognizer”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 53-56, Tokyo, Japan, April 1986.
[54] F. Jelinek, “The Development of an Experimental Discrete Dictation Recognizer”, Proceedings of the IEEE, vol. 73, no. 11, pp. 1616-1624, November 1985.
[55] V.N. Gupta, M. Lennig, and P. Mermelstein, “Integration of Acoustic Information in a Large Vocabulary Word Recognizer”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 697-700, Dallas, Texas, USA, April 1987.
[56] L. Deng, V. Gupta, M. Lennig, P. Kenny, and P. Mermelstein, “Acoustic Recognition Component of an 86,000-word Speech Recognizer”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 741-784, Albuquerque, New Mexico, USA, April 1990.
[57] J. Koo, C.K. Un, H.S. Lee, H.R. Kim, and M.W. Koo, “A Recognition Time Reduction Algorithm for Large-Vocabulary Speech Recognition”, Proceedings of the International Conference on Spoken Language Processing, pp. 253-256, Kobe, Japan, November 1990.
[58] D. Paul, “A Speaker-Stress Resistant Isolated Word Recognizer”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 713-716, Dallas, Texas, USA, April 1990.
[59] D. Paul, “The Lincoln Robust Continuous Speech Recognizer”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 556-559, Glasgow, Scotland, May 1989.
[60] S. Seneff, “A Joint Synchrony/Mean-Rate Model of Auditory Speech Processing”, Journal of Phonetics, vol. 16, no. 1, pp. 55-76, January 1988.
[61] V. Zue, J. Glass, M. Phillips, and S. Seneff, “Acoustic Segmentation and Phonetic Classification in the SUMMIT System”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 389-392, Glasgow, Scotland, May 1989.
[62] K. Shirai, N. Hosaka, E. Kitagawa, and T. Endou, “Speaker Adaptable Phoneme Recognition Selecting Reliable Acoustic Features based on Mutual Information”, Proceedings of the International Conference on Spoken Language Processing, pp. 353-356, Kobe, Japan, November 1990.
[63] S. Mizuta and K. Kakajima, “An Optimal Discriminative Training Method for Continuous Mixture Density HMMs”, Proceedings of the International Conference on Spoken Language Processing, 1990, pp. 245-248.
[64] K. Yoshida, T. Watanabe, and S. Koga, “Large Vocabulary Word Recognition Based on Demisyllable Hidden Markov Model Using Small Amount of Training Data”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1-4, Glasgow, Scotland, 1989.
[65] D. Lubensky, “Word Recognition Using Neural Nets, Multi-State Gaussian and K-Nearest Neighbor Classifiers”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, 1991, pp. 141-144.
[66] S. Furui, “On the Use of Hierarchical Spectral Dynamics in Speech Recognition”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 789-792, Albuquerque, New Mexico, USA, April 1990.
[67] S. Matsunaga, S. Sagayama, S. Homma, and S. Furui, “A Continuous Speech Recognition System Based on a Two-Level Grammar Approach”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 589-592, Albuquerque, New Mexico, USA, April 1990.
[68] T. Matsuoka and K. Shikano, “Robust HMM Phoneme Modeling for Different Speaking Styles”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, 1991, pp. 265-268.
[69] Y. Zhao and H. Wakita, “Experiments with a Speaker-Independent Continuous Speech Recognition System on the TIMIT Database”, Proceedings of the International Conference on Spoken Language Processing, pp. 697-700, Kobe, Japan, November 1990.
[70] V. Steinbiss, A. Noll, A. Paeseler, H. Ney, H. Bergmann, C. Dugast, H.H. Hamer, H. Piotrowski, H. Tomaschewski, A. Zielinski, “A 10,000-Word Continuous Speech Recognition System”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 57-60, Albuquerque, New Mexico, USA, April 1990.
[71] M.J. Russel, K.M. Ponting, S.M. Peeling, S.R. Browning, J.S. Bridle, R.K. Moore, I. Galiano, P. Howell, “The ARM Continuous Speech Recognition System”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 69-72, Albuquerque, New Mexico, USA, April 1990.
[72] M. Weintraub, H. Murveit, M. Cohen, P. Price, J. Bernstein, G. Baldwin, and D. Bell, “Linguistic Constraints in Hidden Markov Model Based Speech Recognition”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 699-702, Glasgow, Scotland, 1989.
[73] H. Murveit and M. Weintraub, “1000-Word Speaker-Independent Continuous-Speech Recognition Using Hidden Markov Models”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 115-118, New York, NY, USA, April 1988.
[74] W.S. Meisel, M.T. Anikst, S.S. Pirzadeh, J.E. Schumacher, M.C. Soares, and D.J. Trawick, “The SSI Large Vocabulary Speaker-Independent Continuous Speech Recognition System”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, 1991, pp. 273-276.
[75] J. Picone, “The Demographics of Speaker Independent Digit Recognition”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 105-108, Albuquerque, New Mexico, USA, April 1990.
[76] G.R. Doddington, “Phonetically Sensitive Discriminants for Improved Speech Recognition”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 556-559, Glasgow, Scotland, May 1989.
[77] E.L. Bocchieri and G.R. Doddington, “Frame Specific Statistical Features for Speaker-Independent Speech Recognition”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 755-764, August 1986.
[78] S. Makino, A. Ito, M. Endo, and K. Kido, “A Japanese Text Dictation System Based on Phoneme Recognition and a Dependency Grammar”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 273-276, Toronto, Ontario, Canada, April 1991.
[79] B. Dautrich, L.R. Rabiner, T.B. Martin, “On the Effects of Varying Filter Bank Parameters on Isolated Word Recognition”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 31, no. 4, pp. 793-807, August 1983.
[80] James W. Cooley, John W. Tukey, “An algorithm for the machine calculation of complex Fourier series”, Mathematics of Computation, vol. 19, pp. 297-301, 1965.
[81] J.N. Mitchell Jr., “Computer Multiplication and Division Using Binary Logarithms”, IRE Trans. Electronic Computers, vol. 11, pp. 512-517, Aug. 1962.
[82] A.M. Noll, “Problems of Speech Recognition in Mobile Environments”, Proceedings of the International Conference on Spoken Language Processing, pp. 1133-1136, Kobe, Japan, November 1990.
[83] Y. Nakadai and N. Sugamura, “A Speech Recognition Method for Noise Environments Using Dual Inputs”, Proceedings of the International Conference on Spoken Language Processing, pp. 1141-1144, Kobe, Japan, November 1990.
[84] S. Tamura, “An Analysis of a Noise Reduction Neural Network”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2001-2004, Glasgow, Scotland, May 1989.
[85] J. Picone, G.R. Doddington, and B.G. Secrest, “Robust Pitch Detection in a Noisy Telephone Environment”, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1442-1445, Dallas, Texas, USA, April 1987.
[86] P. Gomez, A. Alvaren, R. Martinez, M. Perez, “A DSP based modular architecture for noise cancellation and speech recognition”, Proceedings of IEEE ISCAS98, 1998, pp. 178-181.
[87] Online: http://aurora.hsnr.de/
[88] Online: http://www.ldc.upenn.edu/Catalog/docs/LDC93S10/tidigits.readme.html
[89] Shingo Yoshizawa, Naoya Wada, Noboru Hayasaka, and Yoshikazu Miyanaga, “VLSI architecture for HMM-based speech recognition system and its verification platform”, Proceedings of the International Symposium on Communications and Information Technologies (ISCIT 2004), pp. 700-703.
[90] Y. Choi, K. You, J. Choi, and W. Sung, “A real-time FPGA-based speech recognizer with optimized DRAM access”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 8, 2010, pp. 2119-2131.
[91] Guangji He, Takanobu Sugahara, Yuki Miyamoto, Tsuyoshi Fujinaga, Hiroki Noguchi, Shintaro Izumi, Hiroshi Kawaguchi, and Masahiko Yoshimoto, “A 40 nm 144 mW VLSI Processor for Real-Time 60-kWord Continuous Speech Recognition”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 8, pp. 1656-1666, August 2012.
[92] Prof. Vivek Agarwal, “Embedded Operating Systems for Real-Time Applications”, M.Tech credit seminar report, Electronic Systems Group, EE Dept., IIT Bombay, submitted November 2002.
[93] Michael Barr, “Programming Embedded Systems in C and C++”, CA: O'Reilly & Associates, 1999.
[94] Hermann Ney, Stefan Ortmanns, “Dynamic Programming Search for Continuous Speech Recognition”, IEEE Signal Processing Magazine, 1999, RWTH Aachen - University of Technology, D-52056 Aachen, Germany.
[95] J.O. Hamblen, T.S. Hall, “Using System-on-a-Programmable-Chip Technology to Design Embedded Systems”, IJCA, vol. 13, no. 3, Sept. 2006.
[96] Karim Yaghmour, Jon Masters, Gilad Ben-Yossef, Philippe Gerum, Michael Opdenacker, Steven Rostedt, “Building Embedded Linux Systems”, O'Reilly, Aug. 2008.
[97] Wei Han, Cheong-Fat Chan, Chiu-Sing Choy and Kong-Pang Pun, “An Efficient MFCC Extraction Method in Speech Recognition”, IEEE International Symposium on Circuits and Systems (ISCAS 2006), May 2006, pp. 145-148.
[98] Wei Han, Cheong-Fat Chan, Chiu-Sing Choy and Kong-Pang Pun, “A Speech Recognizer with Selectable Model Parameters”, IEEE International Symposium on Circuits and Systems (ISCAS 2005), May 2005, pp. 5842-5845.

APPENDIX

Papers published in international conferences and journals:

Tran Van Hoang, Nguyen Ly Thien Truong, Hoang Trang, Xuan-Tu Tran, “Design and Implementation of a SoPC System for Speech Recognition”, Lecture Notes in Electrical Engineering, Multimedia and Ubiquitous Engineering MUE 2013, Springer.

Hoang Trang, Tran Van Hoang, “Proposed Efficient Architectures and Design Choices in SoPC System for Speech Recognition”, Journal of IKEEE, vol. 17, no. 3, pp. 241-247, 2013.

Tran Xuan Thien, Hoang Trang, Tran Van Hoang, Nguyen Huu Phuoc, “The effect of MFCC extraction methods on speech recognition embedded system”, ICGHIT 2014 (International Conference on Green and Human Information Technology 2014), 2014, Ho Chi Minh City, Vietnam.

CURRICULUM VITAE

Personal details:
Full name: Trần Văn Hoàng
Gender: Male
Date of birth: 25/08/1989
Place of birth: Thừa Thiên Huế
Contact address: 1/4 Nguyễn Trung Trực, Ward 5, Bình Thạnh District, Ho Chi Minh City
Contact phone: 0937459776
Email: tvhoang@hcmut.edu.vn; hoangvan300489@gmail.com

Education:
Undergraduate:
Period: 09/2007 to 01/2012
Mode of study: Full-time
Institution: University of Technology, VNU-HCM
Field of study: Electronics and Telecommunications
Graduate:
Period: 09/2012 to present
Institution: University of Technology, VNU-HCM
Field of study: Electronics Engineering

Work history:
From 06/2012 to present: working at the Department of Electronics, Faculty of Electrical and Electronics Engineering, University of Technology, VNU-HCM.
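Supplementary sketch for the flexible command syntax evaluated in Chapter 6, where a destination may be spoken alone or preceded by “đến”, “muốn đến”, or “tôi muốn đến”. The Python below is only a minimal illustration of such a grammar constraint applied to an already recognized word sequence. It is not the thesis implementation. The names `OPTIONAL_PREFIX`, `DESTINATIONS`, and `parse_destination` are hypothetical, the destination list is a small subset taken from sentences quoted in the thesis, and the rule that only these four prefix variants are accepted is an assumption.

```python
# Illustrative sketch, not the thesis code. All names are hypothetical and the
# destination list is only a subset of the places quoted in Chapter 6.
OPTIONAL_PREFIX = ["tôi", "muốn", "đến"]

DESTINATIONS = {
    ("phòng", "tổ", "chức", "hành", "chính"),
    ("ban", "quản", "lí", "mạng"),
    ("ban", "giáo", "trình"),
    ("ban", "thi", "đua"),
}


def parse_destination(sentence):
    """Return the destination phrase if the sentence fits the grammar, else None."""
    words = sentence.lower().split()
    # Assumption: the lead-in may start at any of its words ("tôi muốn đến",
    # "muốn đến", "đến") or be absent, but it must then run through to "đến".
    for start in range(len(OPTIONAL_PREFIX) + 1):
        prefix = OPTIONAL_PREFIX[start:]
        if words[:len(prefix)] == prefix:
            rest = tuple(words[len(prefix):])
            if rest in DESTINATIONS:
                return " ".join(rest)
    return None


if __name__ == "__main__":
    for text in ("Tôi muốn đến phòng tổ chức hành chính", "Đến ban thi đua"):
        print(text, "->", parse_destination(text))
```

In a setup like this, the returned destination phrase would be the natural input to the route-finding step, while a result of `None` could trigger a spoken request to repeat the command.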
