Available online at www.sciencedirect.com

ScienceDirect

IERI Procedia (2013) 181 – 187

2013 International Conference on Electronic Engineering and Computer Science

A Practical and Automated Image-based Framework for Tracking Pedestrian Movements from a Video

Halimatul Saadiah Md Yatim 1, Abdullah Zawawi Talib 1, Fazilah Haron 1,2

1 School of Computer Sciences, Universiti Sains Malaysia, 11800 USM Penang, Malaysia
2 Department of Computer Science, Taibah University, Al-Madinah Al-Munawwarah, Kingdom of Saudi Arabia

Abstract

Video tracking of pedestrian movements can be used to gain a better understanding of crowd features and behaviors, as nowadays pedestrian safety is a very important consideration in many situations. The existing works on video tracking have some limitations. For example, some works focus on a specific event and a specific place, and some require a lot of human intervention. In this paper, a practical and automated image-based framework for tracking pedestrian movements from a video is presented. The proposed framework consists of several steps such as detection, tracking and extracting characteristics of a pedestrian from a video. Consecutive frames in the video are processed to extract moving objects of interest, which are assumed to be pedestrians. The extracted data are filtered to remove unwanted objects, and the remaining objects are then labelled for identification. These steps are automated and little human effort is needed. The centroid positions of all objects are composed to obtain the movement vectors, which are used to plot a graph and visualize the movement path of a pedestrian. It is also possible to estimate the speed of a pedestrian. The results of the preliminary experiment for the proposed framework using three videos with different scenarios are presented in the paper. The pedestrian movement is plotted accurately and the maximum and minimum numbers of pedestrians in the video are recorded correctly. However, the speed of the pedestrian is slightly inaccurate.

© 2013 The Authors. Published by Elsevier B.V. Selection and peer review under responsibility of Information Engineering Research Institute.

Keywords: Object tracking; video; pedestrian; monitoring; movement path; speed

doi:10.1016/j.ieri.2013.11.026

1. Introduction

For the past few decades, videos have been used mainly for watching recorded events. Later, they have been used in monitoring and surveillance through closed-circuit television (CCTV). Manual human monitoring through video recordings is not always practical. In order to help and automate video monitoring, research on surveillance and monitoring [1] has become a growing research area. These applications can help to minimize human effort as they can be run automatically or semi-automatically to meet specific objectives. The main idea in automated monitoring is to extract and analyze macroscopic and even microscopic data from the images of the video camera automatically, without manual inspection of the video and therefore with little or no human effort.
Some research applications are aimed at obtaining pedestrian characteristics from a video, such as the speed and trajectory of moving objects and the density of pedestrians in a specific area. The data obtained could be used to validate and calibrate simulation models for safety purposes, enhance the architectural design of a building, or alert security personnel to anomalous events. In this paper, a practical and automated framework for the entire process of pedestrian tracking from video footage is proposed. The framework can assist in understanding crowd characteristics and behaviors with little or no human effort.

2. Related Work

There exist a number of works on extracting data from video footage with varying focus and targets [1]. The works can be classified into extracting the density of a crowd in an image [2-5], counting the number of pedestrians [2,6,7] and extracting trajectories of moving objects from video footage [4,5,8,9]. In terms of detecting trajectories of an object, the object can be a pedestrian [10], a vehicle [11], a human fingertip [12] and many more. Different researchers have used different methods to obtain the object's trajectory, such as supervised learning [8], unsupervised learning [13] and, in some cases, clustering of trajectories [14]. From the trajectory, it can be observed whether the movement pattern of the object of interest is abnormal or normal [14,15].

In order to extract data of a pedestrian from video footage, the first and most important step is to detect the individual pedestrian in the video. This step is quite challenging, and existing methods focus on specific and controlled situations. A popular method for pedestrian detection is background subtraction [16-18]. However, it cannot guarantee that a detected object is the object of interest. Therefore, there are methods that combine background subtraction with another method [19,20]. Object classification is another method for detection. For example, there is a method that classifies pedestrians based on their color [21,22]. Once the pedestrian is properly detected, tracking of the detected pedestrian throughout the video frames takes place. A number of tracking algorithms are available, such as the Kalman filter [6,21], the particle filter [23], feature-based tracking [9,22] and active contour-based tracking [24]. The existing automatic detection and tracking methods focus on some specific events with some limitations. Therefore, it is not guaranteed that they can be applied in different situations.

After detection and tracking, the next step is to extract the position of a pedestrian accurately in order to obtain information from the video footage, especially speed measurements. Because of the camera placement and the lens characteristics, an accurate pedestrian position can hardly be obtained due to the distortion of the image. This phenomenon, called geometric distortion [25], must be corrected through a technique called image calibration [7]. The data are then further analyzed to obtain the speed, trajectory and number of pedestrians in the images. The conventional way to obtain the speed of a pedestrian is to calculate it manually; another way requires a GPS device [26]. However, this device requires a pedestrian to carry it along in order to obtain speed measurements. Extracting pedestrian data from a sequence of images is an alternative way of obtaining speed measurements without the need for the pedestrian to carry anything. Nonetheless, this is a new area of study.
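The related work above notes that geometric distortion [25] must be corrected through image calibration [7] before accurate positions can be measured. As a concrete illustration, the sketch below shows one common way of performing such a correction with a planar checkerboard, in the spirit of [25], using OpenCV. It is only an assumed example: the board size, folder and file names are placeholders, and it is not the calibration procedure used by any of the cited works.

```python
# Minimal sketch of checkerboard-based lens distortion correction (cf. [25]).
# Illustrative only; the board size (9x6 inner corners) and file names are
# assumptions, not taken from the paper.
import glob
import cv2
import numpy as np

BOARD = (9, 6)  # inner corners per row and column (assumed)

# 3D reference points for the checkerboard corners (z = 0 plane).
obj_ref = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
obj_ref[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)

obj_points, img_points, size = [], [], None
for path in glob.glob("calibration/*.jpg"):      # hypothetical image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        obj_points.append(obj_ref)
        img_points.append(corners)
        size = gray.shape[::-1]

# Estimate the camera matrix and lens distortion coefficients.
_, mtx, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, size, None, None)

# Undistort a video frame before pedestrian positions are measured on it.
frame = cv2.imread("frame.jpg")                  # hypothetical video frame
undistorted = cv2.undistort(frame, mtx, dist)
```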
3. The Proposed Framework

The purpose of the framework is to automate the entire process of detecting, tracking and extracting pedestrian characteristics from video footage. Therefore, the overall framework, as shown in Figure 1, consists of object detection, object tracking and lastly extracting and visualizing the object's characteristics. Object here refers to a pedestrian. Object detection is implemented by obtaining the background image from the video [27] and then applying the frame differencing method [28] in order to extract the moving objects in the video. We assume that a moving object in the video is a pedestrian and that no other objects are present in the background image. Then, each frame undergoes image processing techniques to extract individual objects. The objects are filtered, and unwanted objects (e.g. very small objects) are removed. They are then labeled in such a way that each specific object can be identified using the same label in every frame. After obtaining the desired objects, the centroid coordinates [16] of each object are identified and the movement vector for each of them is extracted. The next step is extracting characteristics from the video footage, such as the speed, the trajectory and the number of pedestrians detected. The steps used in the implementation of the framework are adapted from existing works of other researchers. The techniques used in our approach include: image processing, extracting objects and filtering unwanted objects, labeling and identifying objects, extracting the movement vectors, and finally plotting the movement vectors to visualize the trajectory, measuring the speed and counting the number of pedestrians.

Fig. 1. The proposed framework: detection, tracking, and extracting and visualizing characteristics.

3.1 Detection

The first step in the framework involves extracting the background image [27] from a video frame. In order to detect objects in a video, consecutive frames are processed using frame differencing techniques [29] between the video frames and the background image in order to obtain the objects of interest. Moving objects in the scene are identified as the objects of interest. Pixels which change during the video are grouped as foreground blobs. The foreground blobs are the objects present in the image [30]. They are then further processed using image processing techniques [31], namely converting the images to grayscale and thresholding them using the method by Otsu [32]. As the objects in the image might not be filled completely (in the form of a blob), the holes present in each object are filled. We also apply morphological closing to obtain a more precise object. This process produces a binary image that contains the background and the foreground.

Fig. 2. The overall process for detection: image (RGB), converting to grayscale and thresholding, filling the holes of the objects of interest, and removing objects of non-interest.

Since the focus is on detecting pedestrians, objects of non-interest which have been detected during frame differencing are removed. The filtering process is applied in order to remove small objects and unwanted lines in every frame. Thus, we can assume that the remaining objects are humans and that there is no false detection caused by shadows, reflections or other reasons. Figure 2 shows the overall detailed process for detection.
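To make the detection step concrete, the following is a minimal sketch of the pipeline just described: frame differencing against the background image, Otsu thresholding [32], hole filling, morphological closing and removal of small objects of non-interest. It assumes the background image has already been extracted as in [27]; the minimum-area threshold and all identifiers are our own illustrative choices rather than the authors' implementation.

```python
# Sketch of the detection step of Section 3.1: frame differencing against a
# background image, Otsu thresholding, hole filling, morphological closing
# and removal of small (non-interest) objects. MIN_AREA is an assumed value.
import cv2
import numpy as np

MIN_AREA = 200  # minimum blob area in pixels (assumed threshold)

def detect_pedestrian_blobs(frame_bgr, background_bgr):
    """Return a cleaned binary mask and the centroids of the detected blobs."""
    # Grayscale frame differencing between the current frame and the background.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    bg_gray = cv2.cvtColor(background_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, bg_gray)

    # Threshold with Otsu's method [32] to obtain the foreground blobs.
    _, mask = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Fill holes inside the blobs: flood-fill from a corner assumed to be
    # background, invert, and merge with the original mask.
    flood = mask.copy()
    h, w = mask.shape
    cv2.floodFill(flood, np.zeros((h + 2, w + 2), np.uint8), (0, 0), 255)
    mask = cv2.bitwise_or(mask, cv2.bitwise_not(flood))

    # Morphological closing for more precise object shapes.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Label connected components and drop objects of non-interest
    # (very small blobs and thin lines).
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    keep = [i for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= MIN_AREA]
    clean = np.where(np.isin(labels, keep), 255, 0).astype(np.uint8)
    return clean, [tuple(centroids[i]) for i in keep]
```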
3.2 Tracking

The object tracking method that we have applied in the implementation of the proposed framework is based on blob tracking. The detected blobs are labeled for identification. The objects in subsequent frames are labeled in such a way that each object has the same label in all frames. The movement of each blob is tracked by its centroid coordinates [16]. The coordinates obtained for each object in each frame are stored as a sequence of coordinates throughout the video frames. These coordinates, compiled for each object across the video frames, give the movement vector of the object.

3.3 Extracting and visualizing characteristics

From the movement vector obtained from the video, the graph of the pedestrian movement is plotted. Hence, the movement pattern of a pedestrian can be visualized automatically by plotting the direction of the movement of the pedestrians. The speed of the pedestrians is one of the useful pieces of information that can be extracted from a raw video. Speed is obtained from the distance traveled by the object in pixels per time in seconds. The speed that we obtain is therefore measured in pixels per second, which depends on the size of the pixels in the image frame. The time is extracted directly from the video duration in seconds, since the video is captured in real time and the object is assumed to be present in the video scene throughout the video duration. The distance d is calculated as follows:

d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}     (1)

where the initial point is (x1, y1) and the final point is (x2, y2). The speed is averaged over every two seconds of the video for a more accurate calculation. This is to avoid obtaining the result from the straight-line distance from the initial point to the final point, since a pedestrian might move in random directions from one point to another. Thus, the average speed over each two-second interval is more practical and reliable for a more accurate speed calculation. It is computed using the following equation:

\text{Average speed} = \frac{\text{total distance}}{\text{total time}}     (2)

The speed in pixels per second is hard to validate. Thus, a real measurement of speed in meters per second is suggested. It is calculated by mapping the pixel distance to a real-world distance (in centimeters). In order to perform the pixel mapping, a user must choose two points on the image and provide the real measured distance between these two points. This step requires some effort from the user. The framework also identifies the maximum and minimum numbers of pedestrians detected in the video frames.
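As an illustration of Sections 3.2 and 3.3, the sketch below labels blobs across frames by nearest-centroid matching, builds the movement vector for each label, averages the speed over two-second windows following Equations (1) and (2), and maps pixels per second to meters per second using two user-selected reference points with a known real-world distance. The 25 fps rate and the two-second window come from the paper; the matching rule, its distance threshold and all identifiers are assumptions for illustration only, not the authors' code.

```python
# Sketch of the tracking and characteristic-extraction steps (Sections 3.2
# and 3.3). Nearest-centroid matching and MATCH_DIST are assumptions; the
# 25 fps rate and 2 s averaging window follow the paper.
import math

FPS = 25                 # video frame rate (as in the experiments)
WINDOW_S = 2.0           # averaging window of two seconds
MATCH_DIST = 50.0        # max pixel distance to keep a label (assumed)

def track_centroids(centroids_per_frame):
    """Assign a persistent label to each blob by nearest-centroid matching
    and return {label: [(x, y), ...]} movement vectors."""
    tracks, last_pos, next_label = {}, {}, 0
    for centroids in centroids_per_frame:
        new_last = {}
        for (x, y) in centroids:
            # Reuse the label of the closest centroid from the previous frame.
            best = min(last_pos.items(),
                       key=lambda kv: math.hypot(x - kv[1][0], y - kv[1][1]),
                       default=(None, None))
            label = best[0]
            if label is None or math.hypot(x - best[1][0], y - best[1][1]) > MATCH_DIST:
                label, next_label = next_label, next_label + 1
            tracks.setdefault(label, []).append((x, y))
            new_last[label] = (x, y)
        last_pos = new_last
    return tracks

def average_speed(track, fps=FPS, window_s=WINDOW_S):
    """Average speed in pixels per second: Eq. (1) over each two-second
    window, then Eq. (2) over the whole track."""
    step = int(fps * window_s)
    total_dist, total_time = 0.0, 0.0
    for start in range(0, len(track) - step, step):
        x1, y1 = track[start]
        x2, y2 = track[start + step]
        total_dist += math.hypot(x2 - x1, y2 - y1)            # Eq. (1)
        total_time += window_s
    return total_dist / total_time if total_time else 0.0     # Eq. (2)

def pixels_to_meters(speed_pps, ref_a, ref_b, real_distance_m):
    """Convert pixels/second to meters/second using two user-selected
    reference points and the known real-world distance between them."""
    pixel_dist = math.hypot(ref_b[0] - ref_a[0], ref_b[1] - ref_a[1])
    return speed_pps * (real_distance_m / pixel_dist)
```

Plotting the (x, y) sequence stored for each label then gives the movement-path graph used in the experiments.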
4. Preliminary Experimental Results and Discussion

For the purpose of a preliminary test of the framework, we captured several videos taken from the top view using a single fixed camera. We used three different scenarios, all recorded at the same place. Table 1 shows the results of the experiment, which consist of the trajectories, the maximum and minimum numbers of pedestrians, the speeds in pixels per second (pps) and the speeds in meters per second (mps) for the three different scenarios. We have also indicated the actual speed of the pedestrians. The graph shows an accurate movement path of the pedestrian for all scenarios. For the counting part, in all three scenarios the exact maximum and minimum numbers of pedestrians detected were obtained, namely one and zero respectively. As shown in the table, the speed of the pedestrian is slightly inaccurate for all three scenarios. This may be due to the geometric distortion phenomenon, which has not been tackled in the implementation.

Table 1. The results of the experiment (the sample images of the videos and the trajectory plots in the original table are not reproduced here).

Scenario | Video Information | Description | Number of Pedestrians | Speed
1 | – sec, 25 fps | A pedestrian walks in a straight line | Max 1, Min 0 | 64.86 pps; 0.42 mps (actual 0.69 mps)
2 | – sec, 25 fps | A pedestrian walks in a straight line but slower than in Scenario 1 | Max 1, Min 0 | 39.58 pps; 0.33 mps (actual 0.46 mps)
3 | – sec, 25 fps | A pedestrian walks in a zigzag manner | Max 1, Min 0 | 80.40 pps; 0.72 mps (actual 0.78 mps)

5. Conclusion and Future Work

The framework presented in this paper can be used to obtain the movement path and speed of the pedestrians and the maximum and minimum numbers of pedestrians detected in a video, and it also provides some analysis and visualization of the results in an automated manner with less human effort. In all cases accurate or exact results were obtained. However, the result for the speed is slightly inaccurate due to the geometric distortion of the video frames. We have also successfully developed and implemented this practical framework by using and adapting some existing methods and techniques in order to reduce and minimize the constraints and limitations of automating the entire process of detecting, tracking, and extracting and visualizing the characteristics of a pedestrian. For our future work, we plan to improve the speed measurement by applying geometric distortion correction. The framework should also be enhanced for dense crowds which appear in places like Masjid al-Haram in Saudi Arabia. To ensure the robustness of the system, a wider variety of videos will be used to test it. More effort will be focused on occlusion handling, placement of the video camera, and proper testing and validation.

Acknowledgements

The authors would like to acknowledge the support of the Ministry of Higher Education Malaysia for this research under the Fundamental Research Grant Scheme entitled "More Accurate Models for Movements of Pedestrians in Big Crowds".

References

[1] W. Hu, T. Tan, L. Wang, S. Maybank, A survey on visual surveillance of object motion and behaviors, IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews) 34 (2004) 334–352.
[2] R. Ma, L. Li, W. Huang, Q. Tian, On pixel count based crowd density estimation for visual surveillance, in: IEEE Conference on Cybernetics and Intelligent Systems, 2004, pp. 170–173.
[3] X. Liu, W. Song, J. Zhang, Extraction and quantitative analysis of microscopic evacuation characteristics based on digital image processing, Physica A: Statistical Mechanics and Its Applications 388 (2009) 2717–2726.
[4] J. Zheng, D. Yao, Intelligent pedestrian flow monitoring systems in shopping areas, in: 2010 2nd International Symposium on Information Engineering and Electronic Commerce, 2010, pp. 1–4.
[5] B. Steffen, A. Seyfried, Methods for measuring pedestrian density, flow, speed and direction with minimal scatter, Physica A: Statistical Mechanics and Its Applications 389 (2010) 1902–1910.
[6] H. Celik, A. Hanjalic, E. Hendriks, Towards a robust solution to people counting, in: IEEE International Conference on Image Processing, 2006, pp. 2401–2404.
[7] D. Conte, P. Foggia, G. Percannella, F. Tufano, M. Vento, A method for counting moving people in video surveillance videos, EURASIP Journal on Advances in Signal Processing 2010 (2010) 231–240.
[8] J. Albusac, J.J. Castro-Schez, L.M. Lopez-Lopez, D. Vallejo, L. Jimenez-Linares, A supervised learning approach to automate the acquisition of knowledge in surveillance systems, Signal Processing 89 (2009) 2400–2414.
[9] M. Boltes, A. Seyfried, B. Steffen, A. Schadschneider, Automatic extraction of pedestrian trajectories from video recordings, in: W.W.F. Klingsch, C. Rogsch, A. Schadschneider, M. Schreckenberg (Eds.), Pedestrian and Evacuation Dynamics 2008, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 43–54.
[10] D. Makris, T. Ellis, Path detection in video surveillance, Image and Vision Computing 20 (2002) 895–903.
[11] Z. Zhang, K. Huang, T. Tan, L. Wang, Trajectory series analysis based event rule induction for visual surveillance, in: 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[12] D. Ren, J. Li, Vision-based dynamic tracking of motion trajectories of human fingertips, in: Robotic Welding, Intelligence and Automation, 2007, pp. 429–435.
[13] N. Johnson, D. Hogg, Learning the distribution of object trajectories for event recognition, Image and Vision Computing 14 (1996) 609–615.
[14] C. Piciarelli, G. Foresti, L. Snidaro, Trajectory clustering and its applications for video surveillance, in: Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance 2005 (AVSS 2005), IEEE, 2005, pp. 40–45.
[15] A. Fernández-Caballero, J.C. Castillo, J.M. Rodriguez-Sanchez, A proposal for local and global human activities identification, in: Articulated Motion and Deformable Objects, Springer Berlin Heidelberg, 2010, pp. 78–87.
[16] H. Yue, C. Shao, Y. Zhao, X. Chen, Study on moving pedestrian tracking based on video sequences, Journal of Transportation Systems Engineering and Information Technology (2007) 47–51.
[17] C. Hao-li, S. Zhong-ke, F. Qing-hua, The study of the detection and tracking of moving pedestrian using monocular-vision, in: Computational Science – ICCS 2006, 2006, pp. 878–885.
[18] Y. Dedeoglu, B.U. Toreyin, U. Gudukbay, A.E. Cetin, Silhouette-based method for object classification and human action recognition in video, Computer Vision in Human-Computer Interaction 3979 (2006) 64–77.
[19] L. Bazzani, D. Bloisi, V. Murino, A comparison of multi hypothesis Kalman filter and particle filter for multi-target tracking, in: 11th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS 2009), 2009, pp. 47–55.
[20] J. Berclaz, A. Shahrokni, F. Fleuret, Evaluation of probabilistic occupancy map people detection for surveillance systems, in: 11th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS 2009), 2009, pp. 55–62.
[21] S. Hoogendoorn, W. Daamen, P.H.L. Bovy, Extracting microscopic pedestrian characteristics from video data, in: 82nd Annual Meeting of the Transportation Research Board, 2003, pp. 1–15.
[22] J. Ma, W. Song, Z. Fang, S. Lo, G. Liao, Experimental study on microscopic moving characteristics of pedestrians in built corridor based on digital image processing, Building and Environment 45 (2010) 2160–2169.
[23] I. Ali, M.N. Dailey, Multiple human tracking in high-density crowds, Image and Vision Computing (2012) 540–549.
[24] A. Yilmaz, O. Javed, M. Shah, Object tracking, ACM Computing Surveys 38 (2006) 1–45.
[25] S. Lee, S. Lee, J. Choi, Correction of radial distortion using a planar checkerboard pattern and its image, IEEE Transactions on Consumer Electronics 55 (2009) 27–33.
[26] S. Bandini, M. Federici, S. Manzoni, A qualitative evaluation of technologies and techniques for data collection on pedestrians and crowded situations, in: Proceedings of the 2007 Summer Computer Simulation Conference, Society for Computer Simulation International, 2007, pp. 1057–1064.
[27] S.S. Cheung, C. Kamath, Robust techniques for background subtraction in urban traffic video, in: Proceedings of SPIE, SPIE, 2004, pp. 881–892.
[28] M. Karaman, L. Goldmann, D. Yu, T. Sikora, Comparison of static background segmentation methods, in: S. Li, F. Pereira, H.-Y. Shum, A.G. Tescher (Eds.), Visual Communications and Image Processing 2005, 2006, pp. 1–12.
[29] C. Zhan, X. Duan, S. Xu, Z. Song, M. Luo, An improved moving object detection algorithm based on frame difference and edge detection, in: Fourth International Conference on Image and Graphics (ICIG 2007), 2007, pp. 519–523.
[30] D. Kong, D. Gray, A viewpoint invariant approach for crowd counting, in: 18th International Conference on Pattern Recognition (ICPR'06), 2006, pp. 1187–1190.
[31] N. Hussain, H.S.M. Yatim, N.L. Hussain, J.L.S. Yan, F. Haron, CDES: A pixel-based crowd density estimation system for Masjid al-Haram, Safety Science 49 (2011) 824–833.
[32] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man, and Cybernetics (1979) 62–66.