A method for 2D human posture recognition in images and videos

Journal of Science & Technology 101 (2014) 155-158

A 2D Human Posture Recognition Method in Images and Videos

Thi-Lan Le*, Thanh-Mai Nguyen
Hanoi University of Science and Technology, No. 1 Dai Co Viet Str., Ha Noi, Viet Nam
Received: March 04, 2013; accepted: April 22, 2014
* Corresponding author: Tel.: (+84) 904 412 844; Email: Thi-Lan.Le@mica.edu.vn

Abstract

Human posture recognition (HPR) aims at determining the pose of a human at a given instant. It is an important topic that attracts a considerable amount of research because of its wide range of applications, such as healthcare systems and human-robot interaction, to name just a few. In this paper, we propose a human posture recognition method based on Speeded Up Robust Features (SURF) and the Bag of Words model. To evaluate the performance of our method, we have carried out both independent and dependent experiments. The experimental results demonstrate the robustness of the proposed method.

Keywords: HPR; Image analysis; 2D

1. Introduction

Human posture recognition (HPR) is an attractive topic in computer vision because of its wide range of applications. HPR can be viewed as a subfield of gesture recognition, since a posture is a "static gesture". In practice, posture recognition usually lies at the crossroads between people detection and gesture recognition. Sometimes we are only interested in the posture at a given time, which can be determined by a people detector [1]. In other cases, posture detection can be considered a first step for gesture recognition, for instance by associating postures with the states of a Finite State Machine (FSM) [2]. The challenges of posture recognition are essentially the same as those of gesture recognition, except that the temporal aspect is not taken into account. In our work, we are interested in detecting key human postures that can be used for abnormal event modeling and recognition in a health monitoring system.

2. Related works

A number of methods have been proposed in the literature for human posture recognition. Zhao and Liu [3] used a centroid radii model as a shape descriptor to represent the human posture in each frame. A non-linear SVM (Support Vector Machine) decision tree is then used to recognize human postures such as standing, lying, bending and sitting.

In [4], Chella et al. proposed a system for simultaneous people tracking and posture recognition in cluttered environments in the context of human-robot interaction. As soon as the tracking algorithm identifies a person to track, the system also estimates that person's posture. The recognition adopts a modified eigenspace technique to classify seven different postures: standing, pointing left, pointing right, stop (pointing both left/right), left arm raised, right arm raised and both arms raised.

In [5], a combination of the MPEG-7 contour-based shape descriptor and a projection histogram was used to recognize the main postures and the view of a human, based on the binary object mask obtained by the segmentation process. The recognition is treated as a typical pattern recognition task and is carried out through a hierarchy of classifiers. The system can recognize four main postures (standing, sitting, bending and lying) with four views (front, back, left and right).

SURF (Speeded Up Robust Features) has been shown to be robust in different applications such as object recognition and plant identification. However, to the best of our knowledge, there is no work dedicated to human posture recognition using SURF. In this paper, we propose a human posture recognition method using SURF and analyze the robustness of this feature for human posture recognition.

3. Proposed approach

3.1 Overview

The flowchart of our work is illustrated in Fig. 1. Our human posture recognition module has four main components: preprocessing, SURF extraction, Bag of Words calculation and SVM classification. It is important to note that the human posture recognition module takes the output of human detection and tracking as input. For human detection and tracking, any algorithm from the state of the art can be applied. In the context of our work, we used our own human detection and tracking method, based on background subtraction combined with Histogram of Oriented Gradients and SVM [6].

Fig. 1. Overview of our approach

First, in the preprocessing step, we convert the training image to a grayscale image and use a thresholding technique to produce a binary image of the human shape. The binary image created by the preprocessing step is then used as a mask in the feature extraction step, where we detect and extract the SURF keypoints and descriptors within the scope of the mask. To obtain a smaller and more descriptive feature set for training the classifier, we use the Bag-of-Words (BOW) model as a dimension reduction method: all extracted SURF descriptors are clustered into different sets, known as the feature dictionary, by an unsupervised learning technique. The system then computes the BOW descriptor as a histogram, where each bin counts the number of SURF descriptors closest to the respective visual word of the dictionary. Finally, an SVM classifier is used for both training and classification.

3.2 SURF detection and extraction

SURF was first introduced in [7] as a more robust feature detector than the Scale-Invariant Feature Transform (SIFT). The idea of SURF is based on sums of 2D Haar wavelet responses and makes efficient use of integral images. With an integral image, it uses an integer approximation to the determinant-of-Hessian blob detector. For descriptors, it uses the sum of the Haar wavelet responses around the point of interest, computed with the aid of the integral image. SURF has proven to be fast and robust when applied to local feature extraction tasks. In this paper, we use both the SURF keypoint detector and the SURF descriptor extractor. With these, a keypoint is represented by a 64-dimensional vector. The application of the SURF keypoint detector is illustrated in Fig. 2.

Fig. 2. Detected SURF keypoints on posture image

3.3 BOW model and SVM

Since the number of SURF keypoints detected in each image can be very high, we use the BOW model to choose the relevant SURF keypoints. The BOW model originated in the natural language processing and information retrieval fields [8]. In our work, the BOW model is defined by a dictionary of visual words, or feature clusters, each of which represents a cluster of SURF features obtained by applying the K-means clustering method to all detected features. Finally, each image is represented by a histogram of visual words. These histograms become the feature vectors for training the SVM. Since SVM is a well-known classification method, we do not describe it in this paper.

4. Experimental results

4.1 Evaluation measure

To evaluate the posture recognition system, we measure the two criteria below for each posture type:

\[ \mathrm{Sensitivity} = \frac{TP}{TP + FN} \]

\[ \mathrm{F.A.R} = \frac{FP}{TP + FP} \]

where TP (True Positive) is the number of correct postures detected, FP (False Positive) is the number of wrong postures detected, and FN (False Negative) is the number of lost postures. Sensitivity measures the proportion of actual positives that are correctly identified as such, while F.A.R (false alarm rate) measures the ratio of FP to the total number of predicted positives. Both are reported as percentages in the tables. The smaller the F.A.R and the greater the Sensitivity, the better the system.

4.2 Results and Discussions

We have performed our main experiments on two databases. The first database contains postures that are manually segmented, while the second one consists of postures that are automatically determined by object detection and tracking. The main objective is to evaluate the performance of the posture recognition system with different qualities of object detection. For each database, we have carried out two kinds of test: dependent and independent tests. In each test, in order to analyze the effect of the dictionary size on posture recognition, we choose four different dictionary sizes: 256, 512, 1024 and 2048.

The first database contains images of postures extracted from a posture video database. To determine the posture region in an image, we use the Object Marker tool. There are four types of posture: bending (P1), lying (P2), sitting (P3) and standing (P4). We select 960 images of postures for training (30 images/posture/subject × 4 postures × 8 subjects), 960 images of postures for the dependent test and 240 images for the independent test (30 images/posture/subject × 4 postures × 2 subjects).

The second database consists of images of postures that are determined automatically by our human detection and tracking algorithm, as described in [6]. This database is built in the framework of the elderly monitoring system. In this framework, we ask people to play a predefined scenario that does not contain the bending posture. Therefore, we have three postures: lying, sitting and standing. The total number of images for each test is shown in Table III. The first database is relatively simple, since the postures in the training and testing sets have the same view, while the second database is more challenging. In this database, we do not control the [...]

Fig. 3. Examples of posture regions determined by object detection algorithm for (a) lying, (b) standing and (c) sitting

The results obtained in the two tests with the first database are shown in Table I and Table II. They show that our method is able to detect the four main postures with high accuracy in both independent and dependent tests.

Table I. Obtained F.A.R and Sensitivity (%) for four postures in the dependent test with the first database

Size       256            512            1024           2048
Measure    FAR    Sens    FAR    Sens    FAR    Sens    FAR    Sens
P1         0      100     0      100     0      100     0      100
P2         0      100     0      100     0      100     0      100
P3         0      100     0      100     0      100     0      100
P4         0      100     0      100     0      100     0      100
Ave        0      100     0      100     0      100     0      100

Table II. Obtained F.A.R and Sensitivity (%) in the independent test with the first database

Size       256            512            1024           2048
Measure    FAR    Sens    FAR    Sens    FAR    Sens    FAR    Sens
P1         0      100     4.76   100     3.23   100     3.23   100
P2         0      100     0      95      0      100     0      96.67
P3         3.23   100     0      100     0      96.67   0      100
P4         0      96.67   0      100     0      100     0      100
Ave        0.81   99.17   1.19   98.8    0.81   99.2    0.81   99.17

The results obtained with the second database are presented in Table IV and Table V, respectively. They show again that our method obtains very good results in the dependent test and good results in the independent test. For the independent test, the dictionary with 512 words gives the best result in terms of both F.A.R and Sensitivity.

Table III. Images in the second database

Dependent test            Independent test
Posture     #images       Posture     #images
Lying       693           Sitting     129
Sitting     96            Standing    120
Standing    580           Total       249
Total       1369

Table IV. Obtained F.A.R and Sensitivity (%) in the dependent test with the second database

Size       256            512            1024           2048
Measure    FAR    Sens    FAR    Sens    FAR    Sens    FAR    Sens
P2         3.57   97.69   1.75   97.25   ?.04   96.24   ?.03   97.11
P3         1.12   91.67   1.14   90.63   ?      92.71   ?      93.75
P4         4.15   95.52   4.71   97.76   ?.45   98.79   ?.43   99.66
Ave        2.95   95      2.53   95.2    2.16   95.9    1.53   96.8

(? marks digits that are illegible in the source.)

Table V. Obtained F.A.R and Sensitivity (%) in the independent test with the second database

Size       256            512            1024           2048
Measure    FAR    Sens    FAR    Sens    FAR    Sens    FAR    Sens
P2         6.42   79.07   1.69   89.92   ?      75.97   1.08   71.32
P4         19.42  93.33   9.92   98.33   20.81  98.33   23.72  99.17
Ave        12.92  86.2    5.81   94.13   11.4   87.15   12.4   85.24

(? marks digits that are illegible in the source.)

Fig. 4 and Fig. 5 illustrate some results with correct and incorrect recognition.

Fig. 4. Some examples of correct recognition with images in the second database

Fig. 5. Some examples of incorrect recognition with images in the second database

In Fig. 5, the sitting and the lying postures are recognized as the standing posture. The main reason is that our model does not take into account the spatial information of the keypoints. The set of keypoints detected for these images is similar to that of the standing posture; however, their positions are different.

5. Conclusions and future works

In this paper, we have presented a human posture recognition algorithm based on SURF and BOW. The experimental results show that our algorithm is capable of detecting four main human postures with a high value of sensitivity and a low value of false alarm rate. However, this algorithm is not invariant to the point of view. In the future, we would like to extend this work by taking viewpoint invariance into account and by combining it with other information, such as the skeleton information provided by Kinect.

Acknowledgments

The research leading to this paper was supported by the National Project B2013.01.41 "Study and develop an abnormal event recognition system based on computer vision techniques".

References

[1] Zuniga, M., Incremental Learning of Events in Video using Reliable Information, 2008, Universite de Nice-Sophia Antipolis.
[2] Bernard, B., Human posture recognition for behaviour understanding, 2007, Universite de Nice-Sophia Antipolis.
[3] Zhao, H. and Z. Liu, Recognizing Human Activities Using Non-linear SVM Decision Tree, Journal of Computational Information Systems, (2011) 2461-2468.
[4] Chella, A., H. Dindo, and I. Infantino, People Tracking and Posture Recognition for Human-Robot Interaction, Workshop on Vision based human-robot interaction, Italy, March 18, 2006.
[5] Goldmann, L., M. Karaman, and T. Sikora, Human Body Posture Recognition Using MPEG-7 Descriptors, Visual Communications and Image Processing, vol. 5308, 2004.
[6] Le, T.-L. and T.-H. Tran, Abnormal events detection combining motion templates and object localization, in The first NAFOSTED Conference on Information and Computer Science 2014 (NICS'14), 2014, Hanoi, Vietnam.
[7] Bay, H., A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-Up Robust Features (SURF), Computer Vision and Image Understanding, 110 (2008) 346-359.
[8] Harris, Z., Distributional structure, Word, 10 (1954) 146-162.
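As a rough illustration of the preprocessing and masking step described in Section 3.1, the sketch below thresholds a grayscale image into a binary mask and keeps only the keypoints that fall inside it. The threshold of 128, the function names, the assumption that the human shape is brighter than the background, and the (x, y) keypoint format are all choices made for this example; the paper does not specify them, and real keypoints would come from a SURF detector rather than a hand-written list.

```python
def binary_mask(gray, threshold=128):
    """Threshold a grayscale image (list of rows) into a 0/1 mask.

    The threshold of 128 is an assumed value, not one given in the paper.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray]


def keypoints_in_mask(keypoints, mask):
    """Keep the (x, y) keypoints whose pixel lies inside the mask."""
    height, width = len(mask), len(mask[0])
    return [
        (x, y)
        for x, y in keypoints
        if 0 <= y < height and 0 <= x < width and mask[y][x] == 1
    ]


# Toy 3x4 image: bright pixels stand in for the human silhouette.
gray = [
    [10, 200, 210, 12],
    [11, 220, 230, 13],
    [9, 8, 7, 6],
]
mask = binary_mask(gray)
# Hypothetical keypoint positions; the last one lies outside the image.
kept = keypoints_in_mask([(1, 0), (0, 0), (2, 1), (3, 2), (5, 5)], mask)
```

Restricting keypoints to the mask means that only features on the human shape contribute to the later dictionary and histogram, which is the purpose the paper gives for the mask.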
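Section 3.2 notes that SURF makes efficient use of integral images when summing Haar wavelet responses. The sketch below shows the underlying idea only: once the integral image is built, the sum over any rectangle needs four lookups, and a horizontal Haar-like response is the difference of two such sums. The function names and the toy image are assumptions for illustration; this is not the SURF implementation used in the paper.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_x_response(ii, top, left, size):
    """Right half minus left half of a size x size window (size even)."""
    half = size // 2
    left_part = box_sum(ii, top, left, top + size, left + half)
    right_part = box_sum(ii, top, left + half, top + size, left + size)
    return right_part - left_part


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)          # whole image
lower_right = box_sum(ii, 1, 1, 3, 3)    # 5 + 6 + 8 + 9
response = haar_x_response(ii, 0, 0, 2)  # (2 + 5) - (1 + 4)
```

The cost of each box sum is independent of the box size, which is why filters of any scale can be evaluated at the same speed.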
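The BOW descriptor of Sections 3.1 and 3.3 is a histogram in which each bin counts the descriptors closest to one visual word. A minimal sketch of that counting step follows, assuming the dictionary has already been learned (for example by K-means, as in the paper) and using Euclidean distance. The toy 2-D vectors stand in for 64-dimensional SURF descriptors, and the optional normalization is an assumption added here, not something the paper states.

```python
import math


def nearest_word(descriptor, dictionary):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(
        range(len(dictionary)),
        key=lambda i: math.dist(descriptor, dictionary[i]),
    )


def bow_histogram(descriptors, dictionary, normalize=False):
    """Count, for each visual word, the descriptors assigned to it.

    With normalize=True the counts are divided by the number of
    descriptors; this option is an assumption, not part of the paper.
    """
    counts = [0] * len(dictionary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, dictionary)] += 1
    if normalize and descriptors:
        return [count / len(descriptors) for count in counts]
    return counts


# Toy dictionary of two visual words and three toy descriptors.
dictionary = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0)]
histogram = bow_histogram(descriptors, dictionary)
```

Whatever the number of keypoints in an image, the histogram length equals the dictionary size (256 to 2048 in the experiments), which gives the SVM a fixed-length feature vector.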
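The two evaluation measures of Section 4.1 can be computed per posture from the true and predicted labels. The sketch below follows the prose definitions in that section: Sensitivity is the share of actual positives that are correctly identified, TP / (TP + FN), and F.A.R is the ratio of FP to all predicted positives, FP / (TP + FP). The function names, the label strings and the convention of returning 0 when a denominator is empty are assumptions of this example.

```python
def sensitivity(tp, fn):
    """Proportion of actual positives that are correctly identified."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_alarm_rate(tp, fp):
    """Ratio of false positives to all predicted positives."""
    return fp / (tp + fp) if (tp + fp) else 0.0


def per_posture_measures(true_labels, predicted_labels):
    """Return {posture: (F.A.R in %, Sensitivity in %)}."""
    pairs = list(zip(true_labels, predicted_labels))
    measures = {}
    for posture in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(1 for t, p in pairs if t == posture and p == posture)
        fp = sum(1 for t, p in pairs if t != posture and p == posture)
        fn = sum(1 for t, p in pairs if t == posture and p != posture)
        measures[posture] = (
            100 * false_alarm_rate(tp, fp),
            100 * sensitivity(tp, fn),
        )
    return measures


# Toy labels: one lying image is mistaken for standing.
true_labels = ["lying", "lying", "sitting", "standing"]
predicted_labels = ["lying", "standing", "sitting", "standing"]
measures = per_posture_measures(true_labels, predicted_labels)
```

In this toy case the missed lying image lowers the Sensitivity of lying and raises the F.A.R of standing, which mirrors the kind of confusion the paper reports for the second database.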
