
Combination of Appearance and Motion Information in Human Action Representation Using Convolutional Neural Network





The problem of human action recognition can be defined as follows.
∙ Input: a video or a sequence of consecutive frames that contains a human action.
∙ Output: the label of the action that …
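To make this input/output definition concrete, here is a minimal Python sketch; it is an illustration, not the thesis's implementation. It assumes the video is stored as a NumPy array of shape (T, H, W, C), cuts it into non-overlapping 16-frame clips (the clip length used by the original C3D network, treated here as an assumption), scores each clip with a classifier, and averages the clip scores into one video-level label. Averaging clip scores is an assumed convention, and the names `split_into_clips`, `predict_video`, and the `clip_classifier` callable are hypothetical stand-ins for a trained network.

```python
import numpy as np

def split_into_clips(video, clip_len=16, stride=16):
    """Cut a video of shape (T, H, W, C) into clips of `clip_len` frames.

    Hypothetical helper. Clips start every `stride` frames; trailing frames
    that do not fill a whole clip are dropped.
    """
    num_frames = video.shape[0]
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")
    starts = range(0, num_frames - clip_len + 1, stride)
    return np.stack([video[s:s + clip_len] for s in starts])

def predict_video(video, clip_classifier, clip_len=16):
    """Return the index of the predicted action label for a whole video.

    `clip_classifier` is a hypothetical stand-in for a trained network: it maps
    one clip of shape (clip_len, H, W, C) to a 1-D array of class scores.
    Clip scores are averaged (an assumed convention) before taking the argmax.
    """
    clips = split_into_clips(video, clip_len=clip_len, stride=clip_len)
    scores = np.stack([clip_classifier(clip) for clip in clips])  # (num_clips, num_classes)
    return int(np.argmax(scores.mean(axis=0)))

# Usage with a dummy 3-class "classifier" whose second score is the clip's mean intensity.
video = np.ones((40, 8, 8, 3))
label = predict_video(video, lambda clip: np.array([0.1, clip.mean(), 0.3]))
print(label)  # 1
```

A 40-frame video yields two 16-frame clips here, and the last 8 frames are ignored; passing a smaller `stride` to `split_into_clips` gives overlapping clips, which may suit short videos better.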

MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Khổng Văn Minh

COMBINATION OF APPEARANCE AND MOTION INFORMATION IN HUMAN ACTION REPRESENTATION USING CONVOLUTIONAL NEURAL NETWORK

Field of study: Information Systems

MASTER'S THESIS IN INFORMATION SYSTEMS

SUPERVISOR: Dr. Trần Thị Thanh Hải

Hanoi, 2018

SĐH.QT9.BM11
SOCIALIST REPUBLIC OF VIETNAM
Independence, Freedom, Happiness

CONFIRMATION OF MASTER'S THESIS REVISIONS

Author: Khổng Văn Minh
Thesis title: Combining appearance and motion features in human action representation using convolutional neural networks
Major: Information Systems
Student ID: CBC17021

The author, the scientific supervisor, and the thesis examination committee confirm that the author has revised and supplemented the thesis according to the minutes of the committee meeting held on … with the following content:
…

Date: …
Supervisor          Chair of the Committee          Author

Abstract

In this thesis, I focus on solving the action recognition problem in a video or a stack of consecutive frames. This problem plays an important role in surveillance systems, which are widespread today. There are two main approaches to this problem: hand-crafted features and features learned with deep learning. Both approaches have pros and cons; the one studied in this thesis belongs to the second category.

Recently, advanced techniques relying on convolutional neural networks have produced impressive improvements over traditional techniques based on hand-crafted features. Prior studies have also shown that using different streams of data helps to increase recognition performance. This thesis proposes a method that exploits both RGB and optical flow for human action recognition. Specifically, I deploy a two-stream convolutional neural network that takes as input the RGB stream and the optical flow computed from it. Each stream uses the architecture of an existing 3D convolutional neural network (C3D), which has been shown to be compact yet efficient for action recognition from video. The two streams work independently, and their outputs are then combined by early fusion or late fusion to produce the recognition result. I show that the proposed two-stream 3D convolutional neural network (2stream C3D) outperforms single-stream C3D on two benchmark datasets, UCF101 (from 82.79% to 89.11%) and HMDB51 (from 45.71% to 60.87%), as well as on CMDFALL (from 65.35% to 71.77%).

Acknowledgments

Firstly, I would like to express my deep gratitude to my supervisor, Dr. Tran Thi Thanh Hai, for supporting my research direction, which allowed me to explore new ideas in the field of computer vision and machine learning. I would like to thank her for her supervision, encouragement, motivation, and support; her guidance helped me throughout the research work and the writing of this thesis.

I would like to acknowledge the International Research Institute MICA, HUST, for providing me with a great research environment. I wish to express my gratitude to the teachers in the Computer Vision Department, MICA, for giving me the opportunity to work and gain valuable research experience. I would like to acknowledge the School of Information and Communication Technology for providing me with knowledge and the opportunity to study. I would like to thank my friends for supporting me in my studies. Last but not least, I would like to convey my deepest gratitude to my family for their support and sacrifices during my studies.

Contents

1 Introduction to Human Action Recognition
1.1 Human Action Recognition problem
1.2 Overview of human action recognition approaches
1.2.1 Hand-crafted feature based methods
1.2.2 Deep learning based methods
1.2.3 Purpose of the thesis
2 State of the art in HAR using CNNs
2.1 Introduction to Convolutional Neural Networks
2.2 2D Convolutional Neural Networks
2.3 3D Convolutional Neural Networks
2.4 Multistream Convolutional Neural Networks
3 Proposed method for HAR using multistream C3D
3.1 General framework
3.2 RGB stream
3.3 Optical flow stream
3.4 Fusion of multistream 3D CNN
3.4.1 Early fusion
3.4.2 Late fusion
4 Experimental Results
4.1 Datasets
4.1.1 UCF101 dataset
4.1.2 HMDB51 dataset
4.1.3 CMDFALL dataset
4.2 Experiment setup
4.3 Single stream
4.4 Multiple streams
5 Conclusion
5.1 Pros and Cons
5.2 Discussion

List of Figures

1-1 Human Action Recognition problem
1-2 Human Action Recognition phases
1-3 Hand-crafted feature based method for Human Action Recognition
1-4 Deep learning method for the Human Action Recognition problem
2-1 Main layers in Convolutional Neural Networks
2-2 Fusion techniques used in [1]
2-3 3D convolution operator
2-4 Two-stream architecture for Human Action Recognition in [2]
3-1 General framework for human action recognition
3-2 Early fusion by concatenating two L2-normalized feature vectors
3-3 Late fusion by averaging class scores
4-1 The class labels in the UCF101 dataset
4-2 The class labels in the HMDB51 dataset
4-3 Experiment steps for each dataset
4-4 Steps of using C3D in the experiments
4-5 C3D clip and video prediction
4-6 Confusion matrix of the two-stream network on UCF101
4-7 Confusion matrix of the two-stream network on HMDB51
4-8 Confusion matrix of the two-stream network on CMDFALL
4-9 In HMDB51, the most confused action in the RGB stream is swing baseball: 60% of its videos are confused with throw
4-10 Classes in UCF101 that benefit most from combining the streams, compared to the RGB stream
4-11 Classes in HMDB51 that benefit most from combining the streams, compared to the RGB stream
4-12 Classes in HMDB51 that benefit most from combining the streams, compared to the RGB stream
4-13 Classes of UCF101 in which the RGB stream performs better
4-14 Classes of UCF101 in which the flow stream performs better
4-15 Classes of HMDB51 in which the RGB stream performs better
4-16 Classes of HMDB51 in which the flow stream performs better
4-17 Classes of CMDFALL in which the RGB stream performs better
4-18 Classes of CMDFALL in which the flow stream performs better

Acronyms

3DCNN  3D Convolutional Neural Networks
CNN  Convolutional Neural Networks
HAR  Human Action Recognition
HOG  Histogram of Oriented Gradients
MBH  Motion Boundary Histograms
SIFT  Scale-Invariant Feature Transform
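Both C3D streams and Figure 2-3 rely on the 3D convolution operator. The naive NumPy sketch below is written for clarity rather than speed and is not taken from the thesis; it handles a single channel with stride 1 and no padding, and the function name `conv3d_valid` is hypothetical. It shows the key difference from 2D convolution: the kernel also slides along the time axis, so the output keeps a temporal dimension and can respond to changes between frames. C3D itself stacks many such filters over multi-channel inputs, with 3×3×3 kernels in the original design.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive single-channel 3D convolution (cross-correlation, as in CNN layers).

    volume: array of shape (T, H, W); kernel: array of shape (kt, kh, kw).
    No padding and stride 1, so the output has shape
    (T - kt + 1, H - kh + 1, W - kw + 1).
    """
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Weighted sum over the spatio-temporal patch covered by the kernel.
                patch = volume[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(patch * kernel)
    return out

# Usage: frame t of this toy video has constant intensity t.
video = np.arange(5, dtype=float)[:, None, None] * np.ones((5, 4, 4))
temporal_diff = np.zeros((3, 1, 1))
temporal_diff[0, 0, 0] = -1.0  # weight on frame t
temporal_diff[2, 0, 0] = 1.0   # weight on frame t + 2
response = conv3d_valid(video, temporal_diff)
print(response.shape)  # (3, 4, 4); every value is 2.0
```

The (3, 1, 1) kernel with weights -1, 0, 1 along time computes the difference between frames t + 2 and t, a crude motion detector that a 2D kernel applied to single frames cannot express.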

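The two fusion schemes named in the abstract and in Figures 3-2 and 3-3 can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the thesis's code: each stream is assumed to have already produced either a feature vector (for early fusion) or a vector of per-class scores (for late fusion), and the function names are hypothetical. Early fusion L2-normalizes the RGB and optical-flow feature vectors and concatenates them; the fused vector would then go to a separate classifier, which is not shown. Late fusion averages the two streams' class scores and takes the highest-scoring class.

```python
import numpy as np

def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit L2 norm; `eps` guards against a zero vector."""
    return vec / max(np.linalg.norm(vec), eps)

def early_fusion(rgb_feat, flow_feat):
    """Concatenate the L2-normalized RGB and optical-flow feature vectors.

    The fused vector would then be classified by a separate classifier (not shown).
    """
    return np.concatenate([l2_normalize(rgb_feat), l2_normalize(flow_feat)])

def late_fusion(rgb_scores, flow_scores):
    """Average the per-class scores of the two streams and return (scores, label)."""
    fused = (np.asarray(rgb_scores, dtype=float) + np.asarray(flow_scores, dtype=float)) / 2.0
    return fused, int(np.argmax(fused))

# Usage with made-up numbers (not results from the thesis).
fused_feat = early_fusion(np.array([3.0, 4.0]), np.array([0.0, 2.0]))
# fused_feat holds the unit-norm halves [0.6, 0.8] and [0.0, 1.0]
fused_scores, label = late_fusion([0.7, 0.2, 0.1], [0.3, 0.6, 0.1])
print(label)  # 0
```

Normalizing each stream before concatenation keeps one stream from dominating the fused vector merely because its features have a larger magnitude; late fusion needs no extra training because it only combines scores the streams already produce.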

