Research and development of skeleton-based human activity representation and recognition methods


MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Tien Nam NGUYEN

SKELETON-BASED HUMAN ACTIVITY REPRESENTATION AND RECOGNITION

Speciality: Information System
Master of Science Thesis in Information System
Supervisor: Assoc. Prof. Thi Lan LE
Cohort: 2018B
Hanoi - 2019

SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

CONFIRMATION OF MASTER'S THESIS REVISION

Author: Nguyen Tien Nam (student ID: CBC18019)
Thesis title: Research and development of skeleton-based human activity representation and recognition methods
Speciality: Information System

The author, the scientific supervisor, and the thesis examination committee confirm that the author has revised and supplemented the thesis according to the minutes of the committee meeting of 26/10/2019, with the following contents:

1. Committee requirement: Merge chapters.
   Revision: The chapters have been merged into a single chapter entitled "Experimental results".

2. Committee requirement: Explain the rationale for choosing the recognition method used in the thesis.
   Revision: The student has added details on the rationale for the method choice in the corresponding chapter and section.

3. Committee requirement: Add the evaluation metrics Precision, Recall, and F1.
   Revision: The student has added information on how the evaluation metrics are computed, presented in the "Evaluation metric" section. Precision, Recall, and F1 score are used to evaluate recognition systems; however, in the thesis, to allow comparison with previously proposed methods, the metric depends on the dataset: MSRAction3D uses accuracy while CMDFall uses the F1 score. In the revised thesis, besides the metric used for each dataset, Table 4.7 reports the recognition results with all metrics on both test datasets.

Hanoi, 7 November 2019
Supervisor          Committee Chairman          Thesis author

Acknowledgements

I would first like to thank my thesis advisor, Associate Professor Le Thi Lan, head of the Computer Vision Department at MICA Institute. The door of Assoc. Prof. Lan's office was always open whenever I ran into a trouble spot or had a question about my research or writing. She consistently allowed this thesis to be my own work, but steered me in the right direction whenever she thought I needed it.

I would also like to thank the experts who were involved in the validation survey for this thesis: Dr. Vu Hai, Assoc. Prof. Tran Thi Thanh Hai, and PhD student Pham Dinh Tan, who participated and gave me much useful information. Without their passionate participation and input, the validation survey could not have been successfully conducted.

I would also like to acknowledge the School of Information and Communication Technology (SoICT), where I was given the best conditions to complete this master thesis, and I am gratefully indebted to the teachers of SoICT for their very valuable comments on this thesis.

Finally, I must express my very profound gratitude to my parents, my sister, and also to my colleagues at Toshiba Software Development Vietnam (Nhu Dinh Duc, Pham Van Thanh, and many other colleagues) for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you!
Abstract

Human action recognition (HAR), which aims to predict what action a person is performing, is currently receiving increasing attention from computer vision researchers due to its wide range of potential applications in fields such as human-computer interaction, surveillance cameras, robotics, and health care. Recently, the release of cost-effective depth cameras such as the Microsoft Kinect and Asus Xtion PRO LIVE has opened new opportunities for HAR, as these sensors provide richer information about the scene: besides color images, depth and skeleton information are also available. Moreover, the latest research results on human pose estimation in RGB video show that the human pose and skeleton can be accurately estimated even in complex scenes. Using skeleton information for HAR has several advantages compared with approaches based on color and depth information. As a result, a wide range of skeleton-based HAR methods have been introduced [1].

The methods proposed for skeleton-based HAR can be categorized into two groups: hand-crafted features and deep learning. Each has its own advantages and disadvantages. Deep learning techniques obtain impressive results on several benchmark datasets, but they usually require large datasets and high-performance computing hardware. Among hand-crafted descriptors for action representation, Cov3DJ, a covariance matrix of 3D joint positions, has proved both effective and computationally efficient [2]. To take into account the duration variation of actions, a temporal hierarchy representation with multiple layers was introduced. However, Cov3DJ has the disadvantage of using all joints in the skeleton, which causes a computational burden and may become ineffective since each joint has a different level of engagement in an action. Moreover, the authors employ only joint positions as joint features, which is not discriminative enough to represent actions. Therefore, an additional joint feature (joint velocity) is investigated and combined with joint positions to create a more discriminative feature for each action.

This thesis improves the Cov3DJ method presented in [2] in two ways: (1) proposing two different schemes to select the most informative joints for action representation, and (2) combining velocity information with joint positions for action representation. To evaluate the effectiveness of the proposed method, extensive experiments have been performed on two public datasets (MSRAction3D [3] and CMDFall [4]). On MSRAction3D, the experimental results show that the proposed method obtains a 6.17% improvement over the original method and outperforms many state-of-the-art methods. On the CMDFall dataset, the proposed method, with an F1 score of 0.64, outperforms the deep learning networks ResTCN (F1 score: 0.39) [4] and CNN-LSTM (F1 score: 0.46) [5]. The contributions of the thesis have been published in an international conference.
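To make the descriptor idea concrete, the following minimal NumPy sketch builds a Cov3DJ-style representation from a skeleton sequence, approximating joint velocities by temporal differences and adding a simple two-level temporal hierarchy. The function names and the window-splitting scheme are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def joint_features(skeleton, use_velocity=True):
    """skeleton: (T, J, 3) array of 3D joint positions over T frames.
    Returns a (T, D) per-frame feature matrix (positions, optionally + velocities)."""
    T, J, _ = skeleton.shape
    pos = skeleton.reshape(T, J * 3)
    if not use_velocity:
        return pos
    vel = np.gradient(pos, axis=0)           # finite-difference velocity estimate
    return np.concatenate([pos, vel], axis=1)

def cov_descriptor(feats):
    """Covariance of per-frame features; since the matrix is symmetric,
    the upper triangle carries all the information and is used as the descriptor."""
    C = np.cov(feats, rowvar=False)
    iu = np.triu_indices(C.shape[0])
    return C[iu]

def temporal_hierarchy_descriptor(feats, levels=2):
    """Cov3DJ-style temporal hierarchy: level l splits the sequence into 2**l
    windows; the descriptors of all windows at all levels are concatenated."""
    parts = []
    T = feats.shape[0]
    for l in range(levels):
        for w in np.array_split(np.arange(T), 2 ** l):
            parts.append(cov_descriptor(feats[w]))
    return np.concatenate(parts)

# Example: a random 40-frame sequence of a 20-joint skeleton
seq = np.random.randn(40, 20, 3)
desc = temporal_hierarchy_descriptor(joint_features(seq), levels=2)
print(desc.shape)
```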
Contents

List of Figures
List of Tables
List of Acronyms
1 Introduction
  1.1 Motivations
  1.2 Challenges and open issues in skeleton-based HAR
  1.3 Objectives and Contributions
  1.4 Outline of the thesis
2 State of the Art
  2.1 Overview of skeletal data and skeleton-based human action recognition
  2.2 Pre-processing techniques
  2.3 Hand-crafted features-based approach
    2.3.1 Spatial-temporal descriptors
    2.3.2 Geometric descriptors
  2.4 Deep learning based approaches
  2.5 Conclusions
3 Proposed Method
  3.1 The proposed approach
  3.2 The most informative joints detection
    3.2.1 Strategy 1 (FMIJ) for most informative joints detection
      3.2.1.1 Detect candidate joints for each action
      3.2.1.2 Select the most informative joints of each action
    3.2.2 Strategy 2 (AMIJ) for most informative joints detection
  3.3 Action representation by covariance descriptor
    3.3.1 Temporal covariance descriptor with position information
    3.3.2 Temporal covariance descriptor with velocity information
    3.3.3 Temporal hierarchy covariance descriptor
  3.4 Classification with support vector machine
    3.4.1 Linearly separable training
    3.4.2 Non-linearly separable training
4 Experimental results
  4.1 Datasets
    4.1.1 MSRAction3D
    4.1.2 CMDFall
  4.2 Evaluation metric
  4.3 Experiment environments
  4.4 Evaluation of features used for joint representation
    4.4.1 Results on MSRAction3D dataset
      4.4.1.1 ActionSet1
      4.4.1.2 ActionSet2
      4.4.1.3 ActionSet3
    4.4.2 Results on CMDFall dataset
  4.5 Evaluation of the most informative joints selection
    4.5.1 The effect of the number of most informative joints
    4.5.2 Comparison between two strategies
  4.6 Comparison with state-of-the-art methods
  4.7 Time computation
5 Conclusions
  5.1 Conclusions
  5.2 Future works
Publications
References

4.4.2 Results on CMDFall dataset

This can be explained by the fact that this dataset was collected from 50 different subjects. In this dataset, due to the different action-performing styles of the subjects, the velocity feature alone is not robust for action representation. However, when combined with the position feature, velocity helps to improve the recognition rates, as can be seen in the confusion matrices in Fig. 4.11.

Figure 4.11: Confusion matrices on CMDFall: (a) position, (b) velocity, (c) combination.

Looking at the confusion matrices for this dataset, the position feature gives good results for classes 2, 6, 17, 18, and 19, but makes many mistakes on class 3 and several others. The velocity feature gives somewhat better results on those classes than the position feature, and with the combined feature the results on these classes improve significantly compared with position alone. In general, however, recognition on the remaining classes is still very low. Once again, the combined feature shows clear effectiveness.

4.5 Evaluation of the most informative joints selection

4.5.1 The effect of the number of most informative joints

In these experiments, I want to observe the effect of the number of most informative joints. For this purpose, I use only ActionSet1 of the MSRAction3D dataset and employ the t-SNE method [31] for data visualization. As shown in Fig. 4.12, with covariance features computed from the positions of all 20 joints, the instances of some classes (e.g., class 3) overlap with each other, while others (e.g., class 10) are clearly separated. In the same way, with covariance features computed from the velocities of the 20 joints (see Fig. 4.13), the instances of some classes lie close together; nevertheless, the distributions of other classes (e.g., class 18) appear more separated. When position and velocity features are combined, the instances of the action classes are more clearly separated.

Figure 4.12: Distribution of each class based on covariance features of position in ActionSet1 of MSRAction3D: (a) all 20 joints, (b) a small number of MIJs, (c) 10 MIJs.

Figure 4.13: Distribution of each class based on covariance features of velocity in ActionSet1 of MSRAction3D: (a) all 20 joints, (b) a small number of MIJs, (c) 10 MIJs.
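Projections such as those in Figs. 4.12-4.14 can be produced along the following lines; a minimal scikit-learn sketch, with placeholder data standing in for the real covariance descriptors and action labels:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder data: replace X and y with the real covariance descriptors
# and action class labels of ActionSet1.
rng = np.random.default_rng(0)
X = rng.normal(size=(160, 300))      # (n_samples, descriptor_dim)
y = rng.integers(0, 8, size=160)     # action class labels

# Project the high-dimensional descriptors to 2D for visual inspection.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.figure(figsize=(6, 5))
sc = plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=15)
plt.legend(*sc.legend_elements(), title="class", fontsize=8)
plt.title("t-SNE of covariance descriptors")
plt.show()
```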
When using a small number of most informative joints (1 or 2), the instances of the classes are scattered with no structure; it is therefore difficult to distinguish these classes, and the separation is even worse than when using all 20 joints. However, when the number of most informative joints increases (for instance to 10 joints), the separation of the classes becomes more distinct, even though the separation of some classes is still unclear. In Fig. 4.12c, some classes are separated completely, while some instances of other classes remain scattered. Especially when moving to the velocity feature, we can easily see that the points of the same class are clustered almost perfectly. This distribution is better than the distribution of the data when using all 20 joints.

Figure 4.14: Distribution of each class based on covariance features of the combination of position and velocity in ActionSet1 of MSRAction3D: (a) all 20 joints, (b) a small number of MIJs, (c) 10 MIJs.

These results can be explained as follows. When the number of MIJs is small, the MIJ sets of different classes are nearly equivalent, so it is difficult to distinguish those classes. When the number of MIJs increases, using these MIJs for action representation allows capturing the specific characteristics of each action. More significantly, when the MIJs are used instead of all joints for covariance matrix extraction, we can reduce the noise and focus on the joints that have a strong impact on the action representation. However, when the number of MIJs is too big (close to 20 joints), the feature distribution gradually becomes unclear again, which makes the recognition performance decrease.

4.5.2 Comparison between two strategies

These experiments aim to compare the performance of the two strategies proposed for most informative joint detection. The results for each strategy are calculated under the same conditions (one layer; feature type: combination of position and velocity). Although the principal metric for the CMDFall dataset is the F1 score, the two figures are plotted on an accuracy scale for ease of objective comparison between CMDFall and MSRAction3D. The detailed F1 scores are given in Section 4.6.

Figures 4.15 and 4.16 give a thorough insight into the two strategies. On MSRAction3D, AMIJ gives slightly better performance than FMIJ. In the case of a small number of MIJs, the AMIJ strategy (with a bigger threshold) outputs significantly better results. For example, using on average approximately 4.7, 3.6, and 3.8 joints per class, AMIJ reaches 95.7%, 92%, and 96.2% accuracy on AS1, AS2, and AS3 respectively, compared with 94.6%, 90.2%, and 91.4% for FMIJ with 4 joints. However, both strategies converge to similar results when the number of joints increases (i.e., the threshold decreases); FMIJ even outperforms AMIJ in some cases.

Table 4.4: The joints selected on AS3 of MSRAction3D by the two strategies (one list of joint indices per action).

Various observations on the CMDFall dataset are also used to assess the two proposed strategies. No MIJ set suits AMIJ with a big threshold, as no joints play a common role across instances of the same action; therefore, more joints are needed to cover an action well. This favors FMIJ with a small number of MIJs, but the results are still not good because the model cannot recognize all actions well. If 10 most informative joints are used, the performance is relatively better and similar for both strategies. From 10 to 20 joints (equivalently, thresholds from 0.4 down to 0), the results are stable, and AMIJ works as well as FMIJ. Although the two strategies tend to choose similar joints in the best case, the flexibility and weighted selection of AMIJ make the two distinguishable (see Table 4.4), with larger differences for some classes, such as class 18. Consequently, the extracted features are more discriminative for those classes. From the results on both datasets, the AMIJ strategy is clearly better than FMIJ; it also requires shorter training time, since it works with a lower feature dimension.

Figure 4.15: Fixed MIJ (FMIJ).
Figure 4.16: Adaptive MIJ (AMIJ).
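This excerpt does not reproduce the exact voting-based detection algorithm of Chapter 3, but the contrast between the two regimes can be sketched as follows: FMIJ keeps a fixed number of top-ranked joints per action, while AMIJ keeps every joint whose normalized score passes a threshold, so the number of selected joints adapts to the action. The per-joint score used below (mean motion energy) is only an assumed stand-in for the thesis's score:

```python
import numpy as np

def joint_scores(sequences):
    """Per-joint informativeness for one action class.
    sequences: list of (T, J, 3) skeleton clips of the same action.
    The score here is mean per-joint motion energy -- a stand-in for the
    thesis's voting-based score, which is not given in this excerpt."""
    scores = np.zeros(sequences[0].shape[1])
    for seq in sequences:
        disp = np.linalg.norm(np.diff(seq, axis=0), axis=2)   # (T-1, J)
        scores += disp.mean(axis=0)
    scores /= len(sequences)
    return scores / scores.max()                              # normalize to [0, 1]

def fmij(scores, k=4):
    """FMIJ: keep a fixed number k of top-scoring joints for every action."""
    return np.argsort(scores)[::-1][:k]

def amij(scores, threshold=0.6):
    """AMIJ: keep all joints whose normalized score exceeds the threshold;
    the number of selected joints differs from action to action."""
    return np.flatnonzero(scores >= threshold)

clips = [np.random.randn(30, 20, 3) for _ in range(5)]  # toy clips of one action
s = joint_scores(clips)
print("FMIJ:", fmij(s, k=4), " AMIJ:", amij(s, threshold=0.6))
```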
4.6 Comparison with state-of-the-art methods

Method                               AS1      AS2      AS3      Mean
Action Graph, CVPRW 2010 [3]         72.9     71.9     79.2     74.7
Lie Group, CVPR 2014 [20]            95.29    83.87    98.22    92.46
Moving Poselet, ICCVW 2015 [22] *    89.81    93.57    97.03    93.50
SMIJ, CVPRW 2012 [19]                N/A      N/A      N/A      47.06
SCs with IJs, Sig. Proc. 2015 [9]    88.7     87.7     88.5     88.3
Hierarchical RNN, CVPR 2015 [11]     93.33    94.64    95.50    94.77
Cov3DJ, IJCAI 2013 [2]               88.04    89.29    94.29    90.53
CovP3DJ, VISAPP 2018 [8]             93.48    84.82    94.29    90.98
CovFMIJ (proposed method)            95.7     94.6     98.1     96.1
CovAMIJ (proposed method)            96.7     94.6     99.0     96.7

Table 4.5: Accuracy (%) of state-of-the-art methods on the MSRAction3D dataset. The three best results are in bold. (*) The authors did not evaluate on all action sets.

Table 4.5 compares the results with the state-of-the-art methods on the MSRAction3D dataset. The features used in the method of [3] are simple, so its results are not very good. The Lie group approach obtains quite high results on AS1 and AS3, but its results on AS2 are very poor. Among the informative-joints approaches, the method in [19] obtains bad recognition results since it performs direct matching. The last three lines show the comparison among the approaches that use covariance features: the proposed method obtains the highest average accuracy. For a fair assessment, in comparison with the original method [2] and the improved method [8], I used the best results of the three sets at the same layer; the proposed method obtains a +6% improvement in accuracy.

The results obtained on the CMDFall dataset are poorer than those on MSRAction3D due to the challenging nature of the dataset. Nevertheless, the proposed method improves the F1 score significantly, by 0.18 and 0.25 in comparison with the methods in [5] and [4] respectively (see Table 4.6). For this dataset, the deep learning approaches seem ineffective due to the variation of action durations and the noise in the skeletal data. It is worth noting that the proposed method does not need to normalize the duration, thanks to the covariance computation. Moreover, the most informative joints detection strategies help to reduce noisy data and can lessen the effect of wrongly labeled samples by keeping slack in the classification phase. Hence, the proposed method outperforms all of the state-of-the-art methods on this dataset. However, the obtained F1 score for this dataset still needs to be improved.

Method                        F1 score
ResTCN, ICPR 2018 [4]         0.39
CNN-LSTM, MAPR 2019 [5]       0.46
Cov3DJ, IJCAI 2013 [2]        0.61
CovFMIJ (proposed method)     0.63
CovAMIJ (proposed method)     0.64

Table 4.6: F1 score of state-of-the-art methods on the CMDFall dataset.

As mentioned in the metric section, besides computing the results with the same metric as the other authors on each dataset, I also computed the other metrics for the best configuration of each proposed method on each dataset; details are given in Table 4.7. The precision and recall values are approximately equal, so the F1 score and the accuracy are similar as well.

Method     Dataset               Precision   Recall   F1 score   Accuracy (%)
CovFMIJ    AS1 (MSRAction3D)     0.97        0.95     0.95       95.7
CovFMIJ    AS2 (MSRAction3D)     0.94        0.95     0.94       94.6
CovFMIJ    AS3 (MSRAction3D)     0.98        0.97     0.98       98.1
CovFMIJ    CMDFall               0.64        0.63     0.63       —
CovAMIJ    AS1 (MSRAction3D)     0.97        0.96     0.97       96.7
CovAMIJ    AS2 (MSRAction3D)     0.94        0.94     0.94       94.6
CovAMIJ    AS3 (MSRAction3D)     0.99        0.98     0.99       99.0
CovAMIJ    CMDFall               0.64        0.64     0.64       64.4

Table 4.7: Results of the proposed methods on the two datasets with all evaluation metrics.
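The metrics in Tables 4.5-4.7 follow the standard definitions (see Section 4.2). A short sketch of how precision, recall, F1 score, accuracy, and a confusion matrix such as Fig. 4.11 can be computed with scikit-learn, on toy labels and assuming macro averaging over classes:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 3])   # toy ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 2, 3, 3])   # toy predictions

# Macro averaging weights every action class equally, which matters for
# imbalanced datasets such as CMDFall.
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
acc = accuracy_score(y_true, y_pred)
print(f"precision={prec:.2f} recall={rec:.2f} F1={f1:.2f} accuracy={acc:.2f}")
print(confusion_matrix(y_true, y_pred))       # rows: true class, cols: predicted
```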
4.7 Time computation

Table 4.8 shows the computation time required for action recognition in the testing phase. This time is computed under the assumption that the models and the test data are ready: the method takes the input and predicts the action label. The values in the table are averages over all samples in the test set of the CMDFall dataset. Using only the position feature requires the smallest computation time; with two layers, the method takes only 0.02 s on average to recognize one action. We can also observe how the computation time changes with the number of most informative joints used for action recognition. With the smaller number of MIJs, all settings take less than 0.1 s per prediction, except the combination feature with two layers (overlapping and non-overlapping). However, with 10 MIJs, the computation time increases significantly for the combination feature.

Number of MIJs   Layers     Position   Velocity   Combination
—                L=1        0.01       0.04       0.05
—                L=2        0.02       0.06       0.10
—                L=2, OL    0.04       0.07       0.13
10               L=1        0.05       0.07       0.19
10               L=2        0.50       0.23       1.13
10               L=2, OL    0.73       0.31       1.85

Table 4.8: Computation time (s) in the testing phase (L: number of layers; OL: overlapping windows).

These results can be explained as follows. For the position feature, the data can be fed directly to the model; for velocity, the velocity must first be computed from the positions, which increases the duration slightly. Moreover, when combining the features, the feature vector is twice as large as with a single feature, so the prediction step of the SVM requires more time. Besides that, the number of layers has a significant effect on the computation time of the proposed method. So, depending on the problem and the desired performance, one can decide whether or not to use more layers.
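The timing protocol can be mimicked as below: train once, then average the prediction time over the test set. scikit-learn's LinearSVC in a one-vs-rest wrapper stands in here for the LIBSVM models of Section 3.4, and the data and dimensions are toy values:

```python
import time
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier

# Toy descriptors standing in for the covariance features.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 500)), rng.integers(0, 8, 200)
X_test = rng.normal(size=(50, 500))

# One binary SVM per action class, as in the one-vs-all scheme.
clf = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=5000)).fit(X_train, y_train)

t0 = time.perf_counter()
pred = clf.predict(X_test)
dt = (time.perf_counter() - t0) / len(X_test)
print(f"avg prediction time per sample: {dt * 1000:.2f} ms")
```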
Chapter 5
Conclusions

5.1 Conclusions

The thesis has proposed a new method for action recognition from skeletons by (1) introducing two strategies for determining the most informative joints for each action and (2) combining position and velocity features for action representation. The most informative joints detection strategies have proved their effectiveness for action representation: the representation keeps enough information related to the actions while at the same time reducing the noise in the skeleton data. Furthermore, the idea of using velocity information for computing the covariance feature has shown its potential for the human action recognition problem, and the combination of velocity and position offers a new way of building a discriminative description of actions.

The performance of the proposed method has proved its robustness in comparison with the original method as well as the state-of-the-art methods. On the MSRAction3D dataset, the obtained results (approximately 96%) represent a +6% improvement over the baseline method [2], and the proposed method also outperforms various state-of-the-art methods. On the CMDFall dataset, although the results are not as good as those on MSRAction3D, the method significantly improves the results in comparison with the baseline method [5]. A part of the thesis results has been published at the 10th IEEE International Conference on Knowledge and Systems Engineering (KSE 2018).

The experimental results obtained in the thesis confirm that using the most informative joints to describe human actions gives truly promising results for the current models. Although it brings several significant advances, the proposed method still has limitations. Firstly, thanks to the voting technique in the most informative joint detection step, the proposed method helps to reduce noise in the skeletal data; however, if the noise occurs in a majority of the action instances performed by different subjects, the proposed method may produce incorrect recognitions. Secondly, since the set of most informative joints is different for each action, many one-vs-all SVM models have to be built (one model per action), which increases the computation time in both the training and the testing phase. Finally, it has been shown in the thesis that the combination of position and velocity helps to achieve high recognition accuracy; however, to build more discriminative features for action representation, other features still have to be investigated.

5.2 Future works

This section points out the future works of the thesis. First, due to the limited time of the thesis, experiments have been performed on only two benchmark datasets; to better evaluate the proposed method, I will conduct experiments on other datasets with more actions and more samples. Second, since each action has its own set of most informative joints, the proposed method requires training many models; in the future, I will study and propose a global representation for all actions, which will help to reduce model errors and computation time. Third, I will investigate other joint features, such as angular velocity and acceleration, and combine them with the features used in this thesis through different fusion schemes. Finally, to deploy the model in practice, the first phase also needs to be explored: spotting actions and then connecting them to the recognition phase (the current work), as well as handling the input data coming from the camera.

Publications

[1] Tien-Nam Nguyen, Dinh-Tan Pham, Thi-Lan Le, Hai Vu, Thanh-Hai Tran, "Novel skeleton-based action recognition using covariance descriptors on most informative joints," 10th International Conference on Knowledge and Systems Engineering (KSE 2018), November 2018, Ho Chi Minh City, Vietnam.

References

[1] L. Lo Presti and M. La Cascia, "3D skeleton-based human action classification: A survey," Pattern Recognition, vol. 53, Dec. 2015.
[2] M. Hussein, M. Torki, M. Gowayyed, and M. El-Saban, "Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations," International Joint Conference on Artificial Intelligence (IJCAI), 2013.
[3] W. Li, Z. Zhang, and Z. Liu, "Action recognition based on a bag of 3D points," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 9-14, June 2010.
[4] T. H. Tran, T. L. Le, D. T. Pham, V. N. Hoang, V. M. Khong, Q. T. Tran, T. S. Nguyen, and C. Pham, "A multi-modal multi-view dataset for human fall analysis and preliminary investigation on modality," ICPR, pp. 1947-1952, Aug. 2018.
[5] T. Hoang, V. Nguyen, T. Le, H. Vu, and T. Tran, "3D skeleton-based action recognition with convolutional neural networks," in 2nd International Conference on Multimedia Analysis and Pattern Recognition (MAPR), May 2019.
[6] E. Ghorbel, R. Boutteau, J. Boonaert, X. Savatier, and S. Lecoeuche, "Kinematic spline curves: A temporal invariant descriptor for fast action recognition," Image and Vision Computing, vol. 77, June 2018.
[7] J. Wang, Z. Liu, Y. Wu, and J. Yuan, "Mining actionlet ensemble for action recognition with depth cameras," in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1290-1297, June 2012.
[8] H. El-Ghaish, A. Shoukry, and M. Hussein, "CovP3DJ: Skeleton-parts-based covariance descriptor for human action recognition," VISAPP, Jan. 2018.
[9] M. Jiang, J. Kong, G. Bebis, and H. Huo, "Informative joints based human action recognition using skeleton contexts," Signal Processing: Image Communication, vol. 33, pp. 29-40, 2015.
[10] C. Li, Q. Zhong, D. Xie, and S. Pu, "Skeleton-based action recognition with convolutional neural networks," CoRR, vol. abs/1704.07595, 2017.
[11] Y. Du, W. Wang, and L. Wang, "Hierarchical recurrent neural network for skeleton based action recognition," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1110-1118, June 2015.
[12] A. Shahroudy, J. Liu, T. Ng, and G. Wang, "NTU RGB+D: A large scale dataset for 3D human activity analysis," CoRR, vol. abs/1604.02808, 2016.
[13] M. Rhif, H. Wannous, and I. Farah, "Action recognition from 3D skeleton sequences using deep networks on Lie group features," ICPR, pp. 3427-3432, Aug. 2018.
[14] A. Veeraraghavan, A. Srivastava, A. Roy-Chowdhury, and R. Chellappa, "Rate-invariant recognition of humans and their activities," IEEE Transactions on Image Processing, vol. 18, pp. 1326-1339, May 2009.
[15] H. B. Zhang, Y. X. Zhang, and B. Zhong, "A comprehensive survey of vision-based human action recognition methods," Sensors, 2019.
[16] Z. Zhang, "Microsoft Kinect sensor and its effect," IEEE MultiMedia, vol. 19, pp. 4-12, April 2012.
[17] G. Johansson, "Visual perception of biological motion and a model for its analysis," Perception & Psychophysics, vol. 14, pp. 201-211, June 1973.
[18] F. Han, B. Reily, W. Hoff, and H. Zhang, "Space-time representation of people based on 3D skeletal data: A review," CoRR, vol. abs/1601.01006, 2016.
[19] F. Ofli, R. Chaudhry, G. Kurillo, R. Vidal, and R. Bajcsy, "Sequence of the most informative joints (SMIJ): A new representation for human skeletal action recognition," in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 8-13, June 2012.
[20] R. Vemulapalli, F. Arrate, and R. Chellappa, "Human action recognition by representing 3D skeletons as points in a Lie group," 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 588-595, June 2014.
[21] M. A. Bagheri, Q. Gao, S. Escalera, T. B. Moeslund, H. Ren, and E. Etemad, "Locality regularized group sparse coding for action recognition," Computer Vision and Image Understanding, vol. 158, pp. 106-114, May 2017.
[22] L. Wang, J. Zhang, L. Zhou, C. Tang, and W. Li, "Beyond covariance: Feature representation with nonlinear kernel matrices," in 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4570-4578, Dec. 2015.
[23] D. Carbonera Luvizon, H. Tabia, and D. Picard, "Learning features combination for human action recognition from skeleton sequences," Pattern Recognition Letters, vol. 99, pp. 13-20, Nov. 2017.
[24] R. Vemulapalli and R. Chellappa, "Rolling rotations for recognizing human actions from 3D skeletal data," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4471-4479, June 2016.
[25] W. Ding, K. Liu, B. Xu, and F. Cheng, "Skeleton-based human action recognition via screw matrices," Chinese Journal of Electronics, vol. 26, no. 4, pp. 790-796, 2017.
[26] S. Suriani, S. Noor, F. Ahmad, R. Tomari, W. Nurshazwani Wan Zakaria, and M. N. Haji Mohd, "Human activity recognition based on optimal skeleton joints using convolutional neural network," Journal of Engineering Science and Technology, vol. 7, pp. 48-57, July 2018.
[27] S. Yan, Y. Xiong, and D. Lin, "Spatial temporal graph convolutional networks for skeleton-based action recognition," CoRR, vol. abs/1801.07455, 2018.
[28] H. Liu, J. Tu, and M. Liu, "Two-stream 3D convolutional neural network for skeleton-based action recognition," ArXiv, May 2017.
[29] "Support Vector Machine," https://machinelearningcoban.com/2017/04/09/smv/, 2019. [Online; accessed 10 March 2019].
[30] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[31] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579-2605, 2008.
[32] M. L. McLaren, Improving automatic speaker verification using SVM techniques. PhD thesis, Queensland University of Technology, 2009.
[33] D. Tran, L. D. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "C3D: Generic features for video analysis," CoRR, vol. abs/1412.0767, 2014.
[34] T. Nguyen, D. Pham, T. Le, H. Vu, and T. Tran, "Novel skeleton-based action recognition using covariance descriptors on most informative joints," in 2018 10th International Conference on Knowledge and Systems Engineering (KSE), pp. 50-55, Nov. 2018.
