
2019 International Conference on Advanced Technologies for Communications (ATC)

A video-based tracking system for football player analysis using Efficient Convolution Operators

Nguyen Hong Thinh1, Hoang Hong Son1,2, Chu Thi Phuong Dzung1, Vu Quang Dzung3, and Luu Manh Ha1,2∗

Abstract—Computer vision has been applied to sports analysis in response to demand from the media and from training activities. This paper presents a system for tracking multiple football players in video streams. The task poses several challenges: the players are relatively small in the video and move chaotically; processing must be fast enough for the analyzed data to be reported during the match while remaining sufficiently accurate; and the hardware must be highly mobile. To overcome these challenges, we apply Efficient Convolution Operators (ECO) as the core tracking method to track the targets on two synchronized laptops, and the data are then merged in a post-processing stage. In addition, interactive functions are provided to help the operators correct failed tracks. The tracking method is quantitatively evaluated on videos from professional football matches at two resolution settings. The number of user interactions needed to correct failed tracks and the processing time are chosen as the evaluation criteria. The results show that ECO tracking outperforms several well-known tracking methods, with less than one tracking loss per two minutes on average at a processing rate of 12-17 fps. In conclusion, the proposed system is a promising tool for football player tracking and statistical analysis in practice.

Index Terms—ECO tracking, football player tracking system, user interface functions

I. INTRODUCTION

Recently, the use of artificial intelligence (AI) has become increasingly popular. In the field of sport, many smart systems based on image and video processing and machine learning techniques have been designed to assist managers in monitoring and analyzing
actions and behaviours of each player, as well as in providing important statistical information about the match [1]. In a match, information about each player's movement plays an important role. Using AI technology, such a system provides statistical information such as the traveled distance, range of activity and dynamic level of the players. Information obtained from these analyses also allows the coach to make important assessments of the physical and tactical records of teams and of each player's personal development direction.

Several player tracking systems have been applied in football [2]–[5]. During the game, these systems collect high-quality video and data from fixed cameras around the football pitch [3], [5]. The systems then perform detailed processing and provide analyzed results after a few hours or a few days. Because they are not limited by processing time, these systems usually offer high accuracy and need little technician involvement. However, for an online purpose such as TV sports news, the match analysts usually need results immediately after each round in order to provide commentary to the audience. In such a case, these systems are not feasible. Moreover, the systems require multiple fixed cameras around the pitch, which leads to a large amount of data to be processed. Consequently, they demand a high cost for cameras and computational systems.

∗ Corresponding author: Manh Ha Luu, halm@vnu.edu.vn
1 FET, VNU University of Engineering and Technology, Hanoi, Vietnam
2 AVITECH, VNU University of Engineering and Technology, Hanoi, Vietnam
3 R&D Department, Ecovision, Hanoi, Vietnam
978-1-7281-2392-9/19/$31.00 ©2019 IEEE

In football matches, the players run chaotically on the field with randomly changing speed, so their trajectories are highly complicated. In addition, the lighting conditions may vary considerably, which affects the video quality. Also, the position of the player relative to the camera changes, resulting in a considerable change in shape
with a small size of the player in the video. Moreover, the players often cross one another, so tracking algorithms may perform incorrectly, resulting in loss of tracking.

In this paper, we present a system that handles the problems of tracking players on the football field. The proposed system tracks a predetermined number of players on the field and provides analyzed results during the break after each round. The system is designed to be maneuverable, compact and highly accurate. The main contributions of this work are:
• We successfully apply the ECO tracking method [6] to the problem of tracking football players and embed it in the system.
• We evaluate the performance of the tracking method and compare it to several other well-known tracking methods using a user-interaction counting scheme.

The remaining parts of this paper are organized as follows. Section II reviews prior work on tracking and identification systems for football players. Our proposed system is described in detail in Section III. Section IV presents the experiments and evaluation performed on practical football matches to test and verify the proposed framework. Finally, Section V contains discussions and conclusions of the main findings.

II. LITERATURE REVIEW

Tracking multiple football players on the field is an application of Multiple Object Tracking (MOT), a popular object tracking problem in computer vision. Generally, an MOT tracker contains multiple single-object trackers operating at the same time, each accounting for an individual object. Single-object tracking is usually addressed with two categories of methods: tracking-by-detection and learning-to-track.

In the first strategy, objects of a specific category, such as humans or cars, are detected, and detection is the key factor in the tracking. There are
a variety of methods to detect targets, such as background subtraction, colour segmentation, and applying a trained detector based on visual features such as HOG features or deep-learning features. The detected objects are then linked to form trajectories of the tracked targets. The predictions are associated with the detections, and thus an incorrect detection may lead to incorrect tracking.

Several football-player tracking systems such as [5], [7]–[9] belong to this type. In [5], Schlipsing et al. presented a real-time football analysis system based on background subtraction, to detect the players, and on the Kalman filter. They also used an SVM to classify the football players in order to connect each target to its trajectory. The Kalman filter requires continuous detection to correct the tracking; however, the background subtraction method is sensitive to lighting conditions, which may lead to incorrect player detection. Furthermore, the Kalman filter is mainly suitable for tracking objects with linear movement, which is not the case for the chaotic movement of football players. Baysal et al. [8] introduced Sentioscope, a football player tracking system, to track the players in real time. The system utilizes a particle-filter-based method which effectively handles the occlusion problem and has been shown to outperform several other tracking methods. The accuracy of the tracking method depends heavily on the resolution of the particles on the football field: the higher the resolution, the better the accuracy. However, a high particle resolution comes at the cost of computation time, which may affect the real-time property when the method is implemented on a low-cost computational system. In [9], Kim et al. described a tracking method for multiple football players. The method uses background subtraction and edge information to detect the players. A multi-scale sampling strategy with the block matching method is used to find the best match among the detected players.

The
other strategy, learning-to-track, has recently become a trend. Several well-known methods, such as Multiple Instance Learning (MIL) [10], Generic Object Tracking Using Regression Networks (GOTURN) [11], trackers based on correlation filters [12]–[14], and ECO tracking [6], belong to this type. In general, learning-to-track methods can be categorized into online learning and offline learning. In offline mode, the training of model-based trackers, which are designed to track a specific class of objects, is performed before the actual tracking takes place. These trackers are trained offline, but they are limited because they are static and can only track a specific class of target. In addition, when tracking multiple small objects of the same type, such as players of the same team, the tracker cannot resolve ambiguities or re-assign missed or occluded targets. Therefore, this strategy may learn from inaccurate information. Those errors accumulate and lead to the tracking drift problem. In online learning, the trackers are typically trained entirely online, starting from the first frame of a video, using foreground and background patches around the targets. These patches are used to train an object-background classifier, which is then used to estimate the new location of the target object in the next frame [13], [15].

Fig. 1. The structure of the proposed system. The upper path contains the processing blocks for the left camera, while the lower path contains the processing blocks for the right camera. The tracks are merged and analyzed on the laptop before being sent to the data center.

A survey on football player tracking can be found in [16]. It reports that “Detection-based trackers gave poor performance since detection was not reliable”. To solve the problem of false detection in tracking, recent research [17], [18] proposed combining extra information from future frames in the video sequence to identify the target. However, such a noncausal system is not
suitable for online tracking applications. Thus, we exclude detection-based trackers from the application of football player tracking. Online tracking is suitable for outdoor tracking problems such as varying lighting conditions. Moreover, because the appearance of the players to be tracked is typically unknown (e.g., different clothing colors for each game), offline training is not feasible. For these reasons, we apply the ECO tracking method in the proposed football player tracking system.

III. PROPOSED SYSTEM

A. Proposed system in general

The main structure of the system is shown in Fig. 1. In terms of hardware, the system consists of two parts: the vision part, with two cameras (Left view and Right view), and the processing part, which includes two laptops (see Section III-C). In our framework, we use video streams collected from two independent cameras that remain fixed during the match (Fig. 2). The cameras are chosen with a wide angle, so that each covers half of the playing field, and with 2.5K resolution, so that players are captured clearly at all positions on the pitch. The core processing of the system is installed on the two laptops, each of which is responsible for processing video data from one camera. Unlike the system in [5], we do not merge the Left and Right videos to create a single, large, full view of the football field. The reason is that a large processing area slows down the processing speed dramatically. Furthermore, it is difficult to observe the players running in full view on a single laptop screen. Therefore, we process each video independently before combining the track lists on a single laptop.

Fig. 2. IP cameras mounted on a tripod cover the whole field of view of the football field. Both are connected to the processing laptops via a LAN.

The core of the processing part, as shown in Fig. 1, includes several main modules: Video buffer, Tracking, Manual ReID, and Merge track-list. The operating principle of the processing system can be summarized as follows:
• Video buffer: The system is designed to operate online with as small a delay as possible. The system is also semi-automatic and still needs supervisors to correct tracking errors. Thus, instead of using a real-time framework, which may cause many errors, or waiting for the full video of the game, which is not suitable for the online purpose, we split and save the video stream into two-minute segments. Processing is carried out continuously on the split videos. In this way, the analyzed results can be updated after each two-minute video segment is completed.
• Tracking and Manual ReID: The main purpose of the system is to track players and provide statistical results such as traveled distance and traveled position. For this, we build a semi-automatic system using a stop-and-go method. The ECO tracking method [6] is used as the core of the tracking system; details of the ECO algorithm are described in Section III-B. In case of tracking loss, manual re-identification (ReID) is performed through the user interface to correct the track loss.
• Merge track-list Left-Right: The above Tracking and ReID are performed for each half of the pitch, not for the whole field. This gives the technicians a large view of each player; however, it also has a limitation when players move from one half of the pitch to the other. To solve this problem, we combine the track lists of the Left view and the Right view based on the track-list information, including the time and the location of each player being tracked. This process is completed by the Merge Track-list L-R module after each video segment has been processed.

B. ECO Tracking

The ECO tracking algorithm, which is based on correlation filters, was originally introduced in [6] for generic visual object tracking. We adapt the method to football player tracking. The main ideas of the tracking scheme are summarized in Fig. 3. Typically, the correlation operation is used to locate an object and its translations in the video sequence. To reduce computational complexity, the correlation is performed in the frequency domain using the FFT. Furthermore, a cosine window is usually applied to reduce the spatial boundary effect of the selected regions around the target. Since the background may adversely affect the correlation computed between image intensity blocks, discriminative correlation filters are applied in order to construct a classifier that distinguishes the target from its background. The ECO tracking method builds on the Minimum Output Sum of Squared Error (MOSSE) filter [12]. Subsequently, instead of using image intensity directly, visual features (such as the ColorName feature [13] or the HOG feature [19]) can be extracted to better describe the image. Additionally, based on Continuous Convolution Operators for Tracking (CCOT) [14], the target features are learned using multi-resolution feature maps in a continuous sequence. Thus, the ECO tracking method is an extension of CCOT. Unlike the CCOT algorithm, ECO tracking does not update the model on every frame; instead, it combines features from N frames, and the final model is refined using a Gaussian mixture model (GMM) and Conjugate Gradient iterations [20]. In summary, ECO tracking has several important properties that make it more efficient for football player tracking:
• Learning target features from multiple tracks in the video sequence, so ECO tracking is suitable for objects that deform and change size due to fast movement.
• Applying the GMM to group similar targets into a few components, which decreases the number of learned classifiers and thus reduces the computation cost of the tracking step.
• Utilizing a factorized convolution operator to reduce the dimensionality of the feature space.

C. Hardware configuration

The hardware
of the system contains the following components:
• Two Dahua cameras, model DH-IPC-HFW4631, with H.264 stream encoding, 30/25 fps at FullHD/2.5K resolution, a 2.7-13.5 mm adjustable lens and a shutter speed of 1/10 to 1/100,000 s. The two cameras are mounted on a tripod with a gap between them (see Fig. 2).
• Two Dell Precision 7510 laptops, each with an Intel Core i7-6820HQ, 16 GB of RAM, a 512 GB SSD and a 15.6-inch Full HD screen.
The two laptops connect to the two cameras via a LAN, which also enables a connection to the data center via the Internet.

Fig. 3. General workflow of the ECO tracking method. The convolution between the image and the filter is performed by element-wise multiplication in the frequency domain, using the Fast Fourier Transform (FFT) and the Inverse Fast Fourier Transform (IFFT). The filter weights are trained on multiple target features from multi-resolution feature maps in a continuous sequence.

IV. TRACKING EVALUATIONS AND RESULTS

A. Data

We use ten videos, each up to 12 minutes long, from matches recorded at Hang Day stadium in the V-LEAGUE tournament 2018-2019. The videos were obtained from two fixed cameras looking at the two sides of the pitch (see Fig. 4). The original resolution of the videos is 2.5K at 25 frames per second. Seven of them were resized to FullHD in order to examine the dependency of tracker performance on video resolution. The videos were recorded under several conditions, such as in the afternoon and in the evening, and in clear and cloudy weather. As in the system pipeline, all of the videos are split into consecutive two-minute videos.

Fig. 4. The interface showing analyzed video frames obtained from the two cameras at the Left side and the Right side of the football field. For each team, the number of each player is shown together with the traveled distance and the number of manual ReID interventions.

B. Evaluation

To evaluate the performance of the tracking algorithms, we implemented several methods and verified them on actual video data obtained from the football matches. The tracking algorithms compared are Median Flow [21], KCF [22], MIL [10], Boosting [23] and ECO tracking [6]. Of these, KCF and ECO tracking are reimplemented in C++ in a Linux environment following the original papers. For the other methods, we use the source code in the OpenCV library directly. The algorithms are validated on two video resolutions, FullHD and 2.5K.

The purpose of this evaluation is to check how well the tracking algorithms conform to the design of the system. Hence, two criteria are given: whether the performance of the tracking algorithm is suitable for the hardware of the system, and the efficiency of the algorithm under match conditions on different videos. First, we use frame rate as the evaluation parameter. The tracking methods are tested at the two video resolutions and with one to three players to track. Second, although the system supports semi-automatic operation, the level of human interaction should be as low as possible. Therefore, we use the degree of semi-automation as the second evaluation criterion [8]. For this, the average number of track losses in a two-minute video represents the accuracy of a tracking method: the fewer the track losses, the better the method. In practice, for each tracking loss, the operator has to perform a manual ReID for the track at the time the loss occurs. As a result, the number of manual ReID interactions can be counted as the number of track losses.

For the first evaluation, the frame rates are shown in Table I. When the number of players tracked at the same time increases from one to three, the performance of the system decreases significantly for all tracking methods. In most cases, Median Flow has the highest frame rate, followed by KCF and then Boosting. The ECO tracking algorithm has reasonable performance, varying from 12 fps to 17 fps (about half of the original video frame rate). The MIL algorithm has the lowest frame rate. The video resolution also affects system performance. At the higher resolution, the frame rate increases slightly for the Median Flow, KCF and Boosting algorithms. In contrast, processing slows down significantly for the MIL and ECO tracking algorithms.

TABLE I
AVERAGE PROCESSING SPEED (FPS) OF THE TRACKING ALGORITHMS

                   Number of tracks (FullHD)          Number of tracks (2.5K)
Tracking method    1 player  2 players  3 players     1 player  2 players  3 players
Median Flow        26.5      23.4       21.3          28        25.9       20.8
Boosting           21.7      14.5       14.1          22.9      18.3       14.6
MIL                11.4      6.9        5.1           10, 5.1 (one of the three values is missing)
KCF                24.1      18.9       18.1          25.9      20.6       17.7
ECO                17        15.4       12.8          15.6      13.7       12.2

For the second evaluation, the average numbers of track losses in two-minute videos are reported in Fig. 6. The result demonstrates that ECO tracking performs best among the tracking methods, with less than one track loss per two minutes on average. In addition, there are fewer track losses in 2.5K video than in FullHD video, which can be explained by the fact that the higher resolution provides more detailed features for the tracking methods. An example video sequence with all of the tracking methods applied to an accelerating player is illustrated in Fig. 5. MIL and Median Flow cannot follow the player after 10 frames; Boosting and KCF start to lose track at frame 180, while ECO tracking still fits the target.

Fig. 5. Illustration of ECO tracking, Median Flow, KCF, MIL and Boosting on the same football player during an accelerated movement.

V. DISCUSSIONS AND CONCLUSION

We have built a system for football player tracking based on the criteria of mobility, accuracy, and online ability. The core tracking method, the ECO tracker, was
quantitatively evaluated and compared to other well-known tracking methods using video data from several professional football matches. The results showed that, for the same number of tracked players, the ECO tracker performs best in terms of track-loss measurement. However, a drawback of the system is that it cannot operate in real time: the processing rate for three-player tracking, i.e., 12 fps, is lower than the frame rate of the input video stream. This drawback can be mitigated by a frame-dropping technique, but this may reduce the accuracy of the tracking. Still, we expect that, once the learning stage of ECO tracking is computed on a compact GPU, the performance of the system will improve. However, in the case that all players on the field need to be tracked at once, the method seems far from practical application due to the expensive computation. Moreover, the system relies entirely on manual ReID for correction. Although it was shown that, for three-player tracking, there is less than one track loss every two minutes on average, this is inconvenient when several players are out of tracking at the same time. Nevertheless, we believe that, with the current development of deep learning techniques for person recognition, automatic player identification will soon achieve good results. Another drawback of the proposed system is that the two mounted cameras can only provide a view from one bleacher. When occlusion occurs, there is not sufficient information to predict the positions of the overlapping players. Two cameras placed on the opposite bleacher can compensate for this drawback; however, this may increase the complexity of the system. In conclusion, the proposed system has been tested in practice and shows potential to be applied in football data analysis.

Fig. 6. Average number of track losses in two-minute video segments. The tracked players are randomly initialized, from one to three trackers at a time, and we count the number of track losses during each run on the video segments.

ACKNOWLEDGMENT

This work has been supported by VNU University of Engineering and Technology under project number CN18.14. We would like to thank VTVcab company for supporting us in data collection and experiments at Hang Day stadium.

REFERENCES

[1] G. Thomas, R. Gade, T. B. Moeslund, P. Carr, and A. Hilton, “Computer vision for sports: Current applications and research topics,” Computer Vision and Image Understanding, vol. 159, pp. 3–18, 2017.
[2] W.-L. Lu, J.-A. Ting, K. P. Murphy, and J. J. Little, “Identifying players in broadcast sports videos using conditional random fields,” in CVPR 2011, pp. 3249–3256, IEEE, 2011.
[3] C.-W. Lu, C.-Y. Lin, C.-Y. Hsu, M.-F. Weng, L.-W. Kang, and H.-Y. M. Liao, “Identification and tracking of players in sport videos,” in Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service, pp. 113–116, ACM, 2013.
[4] A. Al-Ali and S. Almaadeed, “A review on soccer player tracking techniques based on extracted features,” in 2017 6th International Conference on Information and Communication Technology and Accessibility (ICTA), pp. 1–6, IEEE, 2017.
[5] M. Schlipsing, J. Salmen, M. Tschentscher, and C. Igel, “Adaptive pattern recognition in real-time video-based soccer analysis,” Journal of Real-Time Image Processing, vol. 13, no. 2, pp. 345–361, 2017.
[6] M. Danelljan, G. Bhat, F. Shahbaz Khan, and M. Felsberg, “ECO: Efficient convolution operators for tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6638–6646, 2017.
[7] W.-L. Lu, J.-A. Ting, J. J. Little, and K. P. Murphy, “Learning to track and identify players from broadcast sports videos,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 7, pp. 1704–1716, 2013.
[8] S. Baysal and P. Duygulu, “Sentioscope: a soccer player tracking system using model field particles,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 7, pp. 1350–1362, 2015.
[9] W. Kim, S.-W. Moon, J. Lee, D.-W. Nam, and C. Jung, “Multiple player tracking in soccer videos: an adaptive multiscale sampling approach,” Multimedia Systems, vol. 24, no. 6, pp. 611–623, 2018.
[10] B. Babenko, M.-H. Yang, and S. Belongie, “Visual tracking with online multiple instance learning,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 983–990, IEEE, 2009.
[11] D. Held, S. Thrun, and S. Savarese, “Learning to track at 100 fps with deep regression networks,” in European Conference on Computer Vision, pp. 749–765, Springer, 2016.
[12] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, “Visual object tracking using adaptive correlation filters,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2544–2550, IEEE, 2010.
[13] M. Danelljan, F. Shahbaz Khan, M. Felsberg, and J. Van de Weijer, “Adaptive color attributes for real-time visual tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1090–1097, 2014.
[14] M. Danelljan, A. Robinson, F. S. Khan, and M. Felsberg, “Beyond correlation filters: Learning continuous convolution operators for visual tracking,” in European Conference on Computer Vision, pp. 472–488, Springer, 2016.
[15] S.-H. Bae and K.-J. Yoon, “Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1218–1225, 2014.
[16] M. Manafifard, H. Ebadi, and H. A. Moghaddam, “A survey on player tracking in soccer videos,” Computer Vision and Image Understanding, vol. 159, pp. 19–46, 2017.
[17] A. A. Butt and R. T. Collins, “Multi-target tracking by Lagrangian relaxation to min-cost network flow,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1846–1853, 2013.
[18] L. Leal-Taixé, M. Fenzi, A. Kuznetsova, B. Rosenhahn, and S. Savarese, “Learning an image-based motion context for multiple people tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3542–3549, 2014.
[19] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” 2005.
[20] P. Li, D. Wang, L. Wang, and H. Lu, “Deep visual tracking: Review and experimental comparison,” Pattern Recognition, vol. 76, pp. 323–338, 2018.
[21] Z. Kalal, K. Mikolajczyk, and J. Matas, “Forward-backward error: Automatic detection of tracking failures,” in 2010 20th International Conference on Pattern Recognition, pp. 2756–2759, IEEE, 2010.
[22] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, “High-speed tracking with kernelized correlation filters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2014.
[23] H. Grabner, M. Grabner, and H. Bischof, “Real-time tracking via on-line boosting,” in BMVC, vol. 1, p. 6, 2006.
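
As an illustration of the frequency-domain correlation and cosine window summarized in Section III-B, the following minimal Python sketch locates a shifted one-dimensional target by multiplying spectra element-wise. It is not the ECO implementation: the signal is one-dimensional, a naive DFT stands in for the FFT, no filter is learned, and all names (dft, hann_window, circular_correlation, locate_shift, make_bump) and the Gaussian test bump are assumptions introduced only for this example.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform, standing in for an FFT/IFFT."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        result.append(total / n if inverse else total)
    return result


def hann_window(n):
    """Cosine (Hann) window that tapers a patch towards zero at its borders."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def circular_correlation(template, patch):
    """Circular cross-correlation computed as a product of spectra.

    response[s] = sum over t of patch[t + s] * template[t], indices modulo n.
    """
    t_hat = dft(template)
    p_hat = dft(patch)
    product = [p * t.conjugate() for p, t in zip(p_hat, t_hat)]
    return [c.real for c in dft(product, inverse=True)]


def locate_shift(template, patch):
    """Return the shift (in samples) at which the correlation response peaks."""
    window = hann_window(len(template))
    response = circular_correlation(
        [w * v for w, v in zip(window, template)],
        [w * v for w, v in zip(window, patch)],
    )
    return max(range(len(response)), key=response.__getitem__)


def make_bump(center, n=32, sigma=1.5):
    """Hypothetical 1-D 'target': a Gaussian bump centred at `center`."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]


if __name__ == "__main__":
    # The target moved three samples to the right between the two patches.
    print(locate_shift(make_bump(16), make_bump(19)))
```

A real tracker applies the same idea to two-dimensional feature maps and replaces the raw template with a learned discriminative filter; the sketch only shows why the response peak encodes the translation of the target.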

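
The Merge track-list Left-Right step and the traveled-distance statistic described in Section III-A can be sketched in a few lines of Python. The paper does not specify the track-list format, so the representation below is an assumption: each track is a time-sorted list of hypothetical (time in seconds, x in meters, y in meters) tuples for one player in a common pitch coordinate system, and the function names are illustrative only.

```python
import math
from heapq import merge


def merge_track_lists(left, right):
    """Merge two time-sorted track lists of one player into a single track.

    Each point is an assumed (time_s, x_m, y_m) tuple. When both views
    report the same instant, only one of the two points is kept.
    """
    merged = []
    for point in merge(left, right):  # tuples are ordered by time first
        if merged and merged[-1][0] == point[0]:
            continue  # duplicate timestamp from the overlapping views
        merged.append(point)
    return merged


def traveled_distance(track):
    """Sum of straight-line distances between consecutive track points."""
    return sum(math.dist(a[1:], b[1:]) for a, b in zip(track, track[1:]))


if __name__ == "__main__":
    # A player crosses from the left half to the right half of the pitch.
    left_view = [(0.0, 50.0, 30.0), (1.0, 53.0, 34.0)]
    right_view = [(1.0, 53.0, 34.0), (2.0, 56.0, 38.0)]
    track = merge_track_lists(left_view, right_view)
    print(len(track), traveled_distance(track))
```

Keying the merge on time and location mirrors the information the paper says the module uses; how the actual system maps image positions to pitch coordinates or resolves conflicting positions is not stated in the source and is left out here.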
Date posted: 24/03/2022, 09:35