Feng Chia University (逢甲大學)
Department of Information Engineering and Computer Science (資訊工程學系)
Doctoral Dissertation

A Multifunctional Embedded System Based on Deep Learning for Assisting the Cognition of Visually Impaired People
(基於深度學習來輔助視覺障礙者認知之多功能嵌入式系統)

Advisors: Chyi-Ren Dow (竇其仁), Feng-Cheng Lin (林峰正)
Graduate student: 吳友輝
January 2021 (Republic of China year 110)

Acknowledgement

First and foremost, I would like to express my sincere gratitude to my advisor, Prof. Chyi-Ren Dow, for his motivation, extensive experience, and immense knowledge. I am very grateful for all his ideas, time, and funding contributions that laid the foundation for my research experience. The passion and enthusiasm he has for research were inspiring and motivating to me, especially during tough periods in the Ph.D. pursuit. I am also thankful for the excellent advice he has offered as an outstanding professor. This advice has been a valuable lesson throughout my research and will remain so in the future. It is an honor for me to be one of his Ph.D. students. Again, I would like to convey my heartfelt gratitude to him.

I would like to express my sincere gratitude to my co-advisor, Prof. Feng-Cheng Lin, for his scientific advice, knowledge, and valuable guidance. I am very grateful for all his ideas, his time, and the devices he provided. He always encouraged and helped me to build on the strengths of my research, and he especially appreciated our research results. I am also thankful for his excellent advice. It is an honor for me to be his first Ph.D. student. From the bottom of my heart, I would like to express my sincere gratitude to him again.

I would like to thank my dissertation committee: Prof. Hsiao-Hsi Wang, Prof. Tsung-Chuan Huang, Prof. Lin-Huang Chang, Prof. Cheng-Min Lin, and Prof. Hsi-Min Chen, for their meaningful suggestions, which helped me continue to improve and develop my research.

In addition, I am indebted to Szu-Yi Ho (Toni), who offered me numerous insightful discussions and suggestions. She supported me and worked with me to resolve the
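challenges in related research issues, especially in publishing papers.

FCU e-Theses & Dissertations (2021) A Multifunctional Embedded System Based on Deep Learning for Assisting the Cognition of Visually Impaired People

I will always remember my fellow labmates at the Mobile Computing lab for the inspiring discussions, unconditional support, friendship, and all the fun we have had over the last four years. In particular, my gratitude goes to Ms. Yu-Yun Chang (Amber) and Mr. Kuan-Chieh Wang (Rich) for providing essential local support during these years.

Last but not least, I am grateful to my family members for all their encouragement and faith in me. They gave me the moral support, encouragement, and motivation to accomplish my personal goals. Most of all, I thank my parents, who raised me with unconditional love and gave me unlimited support in every decision I have made.

摘要 (Chinese Abstract)

People with visual impairment face many difficulties in daily life, such as unassisted navigation, access to information, and context awareness. Although many smart devices are available to help visually impaired people, most only provide navigation assistance and obstacle avoidance. In this study, we focus on context awareness and the recognition of surrounding objects. Unlike most studies, which rely on a client-server architecture or on computation on a single desktop computer, we propose a multifunctional embedded system based on deep learning to help visually impaired people perceive their surroundings. The proposed system also overcomes regional limits on use and enhances the capability of navigation tasks. We use an embedded device (NVIDIA Jetson AGX Xavier) as the main processor module and connect it to external peripherals (such as a webcam, Bluetooth speaker, monitor, mouse, and Bluetooth audio adapter). It performs nearly all the system functions a host should provide, including image collection, image processing, and result presentation. First, the system's webcam captures the user's current scene. Then, the image is processed by the function selected through a remote controller. Finally, the system converts the description of the current scene from text to speech, and the Bluetooth speaker delivers it to the user. The three main functions of the system are face recognition and emotion classification (the first function), age and gender classification (the second function), and object detection (the third function). The system is built on different deep learning models, which may make it challenging for visually impaired people to use. Therefore, we also propose a process for selecting functions efficiently, to reduce the complexity of system control for visually impaired people. Finally, a prototype was designed, fabricated, and tested for experimental validation. Experimental results obtained on the prototype demonstrate the reliability of the proposed system's performance. Results based on recognition and classification accuracy, computing time, and practical applicability show that the system is feasible and can be used effectively to help visually impaired people.

Keywords: age classification, emotion classification, face recognition, gender classification, object detection.

Abstract

Individuals with visual impairment face many difficulties in daily life, for example,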
unassisted navigation, access to information, and context awareness. Although many smart devices have been designed to assist visually impaired people, most of them aim to provide navigation assistance and obstacle avoidance. In this study, we focus on context awareness and surrounding object recognition. Unlike most studies, which were implemented on servers or laptop computers, we propose a multifunctional embedded system based on deep learning for assisting the cognition of visually impaired people. The proposed system also overcomes limits on the area in which it can be used and enhances the capabilities of navigation tasks. An embedded device (NVIDIA Jetson AGX Xavier) is employed as the central processor module and is connected to peripheral devices (webcam, speaker, monitor, mouse, and Bluetooth audio transmitter adapter). It performs almost all the system functions, including image collection, image processing, and result description. First, the webcam captures the current scene of the user. Then, this image is processed by the function that the user selects through a remote controller. Lastly, the system converts the result description of the current scene from text to voice and delivers it to the user through the speaker. The three main functions of this system are face recognition and emotion classification (the first function), age and gender classification (the second function), and object detection (the third function). The system is built on different deep learning models, which may make it challenging for visually impaired people to operate. Therefore, we also propose a process that selects functions efficiently, to ease the complexity of system control for visually impaired people. Finally, a prototype is designed, fabricated, and tested for experimental validation. The performance of the proposed system is demonstrated using results obtained from experiments on the prototype. Results based on recognition and classification accuracy, computing time, and practical applicability show that the proposed system is feasible and can be effectively used to assist visually impaired people.

Keywords: Age Classification, Emotion Classification, Face Recognition, Gender Classification, Object Detection

Table of Contents

Acknowledgement
摘要
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Overview of Research
  1.3 Dissertation Organization
Chapter 2 Related Work
  2.1 Face Recognition
  2.2 Gender, Age and Emotion Classification
  2.3 Object Detection
  2.4 Smart Healthcare
Chapter 3 System Overview
  3.1 System Architecture
  3.2 Function Selection
    3.2.1 Remote Controller
    3.2.2 Function Selection Process
  3.3 NVIDIA Jetson AGX Xavier
    3.3.1 NVIDIA Jetson Family Introduction
    3.3.2 Technical Specification of NVIDIA Jetson AGX Xavier
Chapter 4 Face Recognition Function
  4.1 Overview of Face Recognition Function
  4.2 Dataset Collection
  4.3 Model Architectures
  4.4 Enrolling a New Person
Chapter 5 Gender, Age and Emotion Classification Function
  5.1 Overview of Gender, Age and Emotion Classification Function
  5.2 Gender Classification Schemes
  5.3 Age Classification Schemes
  5.4 Emotion Classification Schemes
Chapter 6 Object Detection Function
  6.1 Overview of Object Detection Function
  6.2 Object Detection Schemes
    6.2.1 Two-Stage Detectors
    6.2.2 One-Stage Detectors
  6.3 Arrangement of Result Description
Chapter 7 System Prototype and Implementation
  7.1 Devices in System Implementation
  7.2 Initialization Program in Embedded System
  7.3 Dataset Collection
Chapter 8 Experimental Results
  8.1 Evaluation of Face Recognition Results
    8.1.1 Results Evaluation in Terms of Precision and Recall
    8.1.2 Analysis Results of Face Recognition
    8.1.3 Results Comparison in Multiple Standard Datasets
  8.2 Examination Results of Gender, Age and Emotion Classification
    8.2.1 Evaluation Results of Gender Classification
...

1.1 Motivation

Visually impaired people confront numerous visual challenges every day, such as...

(2) Gender, age and emotion classification issues

Gender classification