(Master's thesis) A study on deep learning techniques for human action representation and recognition with skeleton data


MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

PHAM DINH TAN

A STUDY ON DEEP LEARNING TECHNIQUES FOR HUMAN ACTION REPRESENTATION AND RECOGNITION WITH SKELETON DATA

Major: Computer Engineering
Code: 9480106

DOCTORAL DISSERTATION IN COMPUTER ENGINEERING

Supervisors: Assoc. Prof. Vu Hai, Assoc. Prof. Le Thi Lan

Hanoi, 2022

DECLARATION OF AUTHORSHIP

I, Pham Dinh Tan, declare that the dissertation titled "A study on deep learning techniques for human action representation and recognition with skeleton data" has been entirely composed by myself. I assure the following points:

- This work was done wholly or mainly while in candidature for a Ph.D. research degree at Hanoi University of Science and Technology.
- The work has not been submitted for any other degree or qualification at Hanoi University of Science and Technology or any other institution.
- Appropriate acknowledgment has been given within this dissertation where reference has been made to the published work of others.
- The dissertation submitted is my own, except where work done in collaboration has been included. The collaborative contributions have been indicated.

Hanoi, March 08, 2022
Ph.D. Student

Supervisors: Assoc. Prof. Vu Hai, Assoc. Prof. Le Thi Lan

ACKNOWLEDGEMENT

This dissertation was composed during my Ph.D. at the Computer Vision Department, MICA Institute, Hanoi University of Science and Technology. I am grateful to all the people who contributed in different ways to my Ph.D. journey. First, I would like to express my sincere thanks to my supervisors, Assoc. Prof. Vu Hai and Assoc. Prof. Le Thi Lan, for their guidance and support. I would like to thank all MICA members for their help during my Ph.D. study. My sincere thanks to Dr. Nguyen Viet Son, Assoc. Prof. Dao Trung Kien, and Assoc. Prof. Tran Thi Thanh Hai for giving me a lot of support and valuable advice. Many thanks to Dr. Nguyen Thuy Binh, Nguyen Hong Quan, Hoang Van Nam, Nguyen Tien Nam, and Pham Quang Tien for their support. I would like to thank my colleagues at Hanoi University of Mining and Geology for all their support during my Ph.D. study. Special thanks to my family for understanding my hours glued to the computer screen.

Hanoi, March 08, 2022
Ph.D. Student

ABSTRACT

Human action recognition (HAR) from color and depth (RGB-D) sensors, and especially from derived information such as skeleton data, is receiving the research community's attention due to its wide range of applications. HAR has many practical applications, such as abnormal event detection in camera surveillance, gaming, human-machine interaction, elderly monitoring, and virtual/augmented reality. In addition to their advantages of fast computation, low storage, and invariance to human appearance, skeleton data have shortcomings: pose estimation errors, skeleton noise in complex actions, and incompleteness due to occlusion. Moreover, action recognition remains challenging due to the diversity of human actions, intra-class variations, and inter-class similarities. The dissertation focuses on methods to improve the performance of action recognition using skeleton data. The proposed methods are evaluated on public skeleton datasets collected by RGB-D sensors: MSR-Action3D/MICA-Action3D (datasets with high-quality skeleton data), CMDFALL (a challenging dataset with noisy skeleton data), and NTU RGB+D (a worldwide benchmark among large-scale datasets). These datasets therefore cover different dataset scales as well as different levels of skeleton-data quality. To overcome the limitations of skeleton data, the dissertation presents techniques following different approaches. First, as joints have different levels of engagement in each action, techniques for selecting joints that
play an important role in human actions are proposed, including both preset joint subset selection and automatic joint subset selection. Two frameworks are evaluated to show the performance of using a subset of joints for action representation. The first framework employs Dynamic Time Warping (DTW) and the Fourier Temporal Pyramid (FTP), while the second applies covariance descriptors extracted from both joint positions and joint velocities. Experimental results show that joint subset selection helps improve action recognition performance on datasets with noisy skeleton data. However, HAR based on hand-designed features cannot exploit the inherent graph structure of the human skeleton. Recent Graph Convolutional Networks (GCNs) are studied to handle these issues. Among GCN models, the Attention-enhanced Adaptive Graph Convolutional Network (AAGCN) is used as the baseline model. AAGCN achieves state-of-the-art performance on large-scale datasets such as NTU RGB+D and Kinetics. However, AAGCN employs only joint information. Therefore, a Feature Fusion (FF) module is proposed in this dissertation; the new model is named FF-AAGCN. The performance of FF-AAGCN is evaluated on the large-scale NTU RGB+D dataset and on CMDFALL. The evaluation results show that the proposed method is robust to noise and invariant to skeleton translation. In particular, FF-AAGCN achieves remarkable results on challenging datasets. Finally, as the computing capacity of edge devices is limited, a lightweight deep learning model is desirable for application deployment. A lightweight GCN architecture is proposed to show that the complexity of a GCN architecture can be further reduced depending on the dataset's characteristics. The proposed lightweight model is suitable for application development on edge devices.

CONTENTS

DECLARATION OF AUTHORSHIP
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
ABBREVIATIONS
SYMBOLS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
CHAPTER 1.
LITERATURE REVIEW
1.1 Introduction
1.2 An overview on action recognition
1.3 Data modalities for action recognition
  1.3.1 Color data
  1.3.2 Depth data
  1.3.3 Skeleton data
  1.3.4 Other modalities
  1.3.5 Multi-modality
1.4 Skeleton data collection
  1.4.1 Data collection from motion capture systems
  1.4.2 Data collection from RGB+D sensors
  1.4.3 Data collection from pose estimation
1.5 Benchmark datasets
  1.5.1 MSR-Action3D
  1.5.2 MICA-Action3D
  1.5.3 CMDFALL
  1.5.4 NTU RGB+D
1.6 Skeleton-based action recognition methods
  1.6.1 Handcraft-based methods
    1.6.1.1 Joint-based action recognition
    1.6.1.2 Body part-based action recognition
  1.6.2 Deep learning-based methods
    1.6.2.1 Convolutional Neural Networks
    1.6.2.2 Recurrent Neural Networks
1.7 Research on action recognition in Vietnam
1.8 Conclusion of the chapter

CHAPTER 2. JOINT SUBSET SELECTION FOR SKELETON-BASED HUMAN ACTION RECOGNITION
2.1 Proposed methods
  2.1.1 Preset Joint Subset Selection
    2.1.1.1 Spatial-Temporal Representation
    2.1.1.2 Dynamic Time Warping
    2.1.1.3 Fourier Temporal Pyramid
  2.1.2 Automatic Joint Subset Selection
    2.1.2.1 Joint weight assignment
    2.1.2.2 Most informative joint selection
    2.1.2.3 Human action recognition based on MIJ joints
2.2 Experimental results
  2.2.1 Evaluation metrics
  2.2.2 Preset Joint Subset Selection
  2.2.3 Automatic Joint Subset Selection
2.3 Conclusion of the chapter

CHAPTER 3. FEATURE FUSION FOR THE GRAPH CONVOLUTIONAL NETWORK
3.1 Introduction
3.2 Related work on Graph Convolutional Networks
3.3 Proposed method
3.4 Experimental results
3.5 Discussion
3.6 Conclusion of the chapter

CHAPTER 4. THE PROPOSED LIGHTWEIGHT GRAPH CONVOLUTIONAL NETWORK
4.1 Introduction
4.2 Related work on Lightweight Graph Convolutional Networks
4.3 Proposed method
4.4 Experimental results
4.5 Application demonstration
4.6 Conclusion of the chapter

CONCLUSION AND
FUTURE WORKS
PUBLICATIONS
BIBLIOGRAPHY

ABBREVIATIONS

2D: Two-Dimensional
3D: Three-Dimensional
AAGCN: Attention-enhanced Adaptive Graph Convolutional Network
AMIJ: Adaptive number of Most Informative Joints
AGCN: Adaptive Graph Convolutional Network
AS: Action Set
AS-GCN: Actional-Structural Graph Convolutional Network
BN: Batch Normalization
BPL: Body Part Location
CAM: Channel Attention Module
CCTV: Closed-Circuit Television
CNN: Convolutional Neural Network
CovMIJ: Covariance Descriptor on Most Informative Joints
CPU: Central Processing Unit
CS: Cross-Subject
CV: Cross-View
DFT: Discrete Fourier Transform
DTW: Dynamic Time Warping
FC: Fully Connected
FF: Feature Fusion
FLOP: Floating Point OPeration
FMIJ: Fixed number of Most Informative Joints
fps: frames per second
FTP: Fourier Temporal Pyramid
GCN: Graph Convolutional Network
GCNN: Graph-based Convolutional Neural Network
GPU: Graphical Processing Unit
GRU: Gated Recurrent Unit
HAR: Human Action Recognition
HCI: Human-Computer Interaction
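The abstract describes a second handcrafted framework that applies covariance descriptors over both joint positions and joint velocities (the CovMIJ idea in the abbreviation list). The sketch below is not the dissertation's code; it is a minimal illustration, under the common formulation of such descriptors, of how a fixed-length covariance feature can be computed from a variable-length skeleton sequence:

```python
import numpy as np

def covariance_descriptor(joints):
    """Illustrative covariance descriptor for one skeleton sequence.

    joints: array of shape (T, J, 3) -- T frames, J joints, 3-D positions.
    Per-frame feature vectors stack joint positions and joint velocities
    (frame-to-frame differences); the descriptor is the upper-triangular
    part of their covariance matrix, so its length does not depend on T.
    """
    T, J, D = joints.shape
    positions = joints.reshape(T, J * D)
    # velocities via first-order differences; first frame gets zero velocity
    velocities = np.diff(positions, axis=0, prepend=positions[:1])
    features = np.concatenate([positions, velocities], axis=1)  # (T, 2*J*D)
    cov = np.cov(features, rowvar=False)                        # (2*J*D, 2*J*D)
    iu = np.triu_indices(cov.shape[0])
    return cov[iu]  # fixed-length vector regardless of sequence length T

# toy usage: 20 frames, 15 joints -> 90-dim frames -> 90*91/2 = 4095 entries
desc = covariance_descriptor(np.random.rand(20, 15, 3))
print(desc.shape)  # (4095,)
```

Because the descriptor length depends only on the number of joints, sequences of different durations become directly comparable, which is one reason covariance features pair well with joint subset selection.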
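The GCN-based chapters build on spatial graph convolution over the skeleton's joint graph. The following is a rough sketch of one such layer in its widely used normalized-adjacency form; the function name, edge list, and shapes are illustrative assumptions, not the AAGCN or FF-AAGCN implementation:

```python
import numpy as np

def skeleton_gcn_layer(x, edges, w):
    """One spatial graph-convolution step on skeleton joints (sketch).

    Uses the common normalization A_hat = D^(-1/2) (A + I) D^(-1/2).
    x: (J, C_in) per-joint features; edges: list of (i, j) bone pairs;
    w: (C_in, C_out) learnable weight matrix.
    """
    J = x.shape[0]
    A = np.eye(J)                      # self-loops (A + I)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0        # undirected bones
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))  # symmetric degree normalization
    return np.maximum(A_hat @ x @ w, 0.0)  # aggregate neighbors, project, ReLU
```

In adaptive variants such as AAGCN, the fixed bone-graph adjacency above is augmented with learned and attention-driven adjacency terms; this sketch keeps only the fixed-graph core.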

Date posted: 30/11/2022, 12:43
