2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

VNAnomaly: A novel Vietnam surveillance video dataset for anomaly detection

Tu N Vu, Vietnam National University, University of Information Technology, Ho Chi Minh City, Vietnam, 18520184@gm.uit.edu.vn
Toan T Dinh, Vietnam National University, University of Information Technology, Ho Chi Minh City, Vietnam, 18521504@uit.edu.vn
Nguyen D Vo, Vietnam National University, University of Information Technology, Ho Chi Minh City, Vietnam, nguyenvd@uit.edu.vn
Tung Minh Tran, Vietnam National University, University of Information Technology, Ho Chi Minh City, Vietnam, tungtm.ncs@grad.uit.edu.vn
Khang Nguyen, Vietnam National University, University of Information Technology, Ho Chi Minh City, Vietnam, khangnttm@uit.edu.vn

978-1-6654-1001-4/21/$31.00 ©2021 IEEE

Abstract: Surveillance systems have long been considered an effective tool for capturing realistic abnormal actions or events in domains such as traffic management and security. With the development of smart cities, thousands of installed surveillance cameras play a vital role in the detection and prevention of dangerous events. However, there is a lack of anomaly datasets for developing automatic anomaly detection systems in Vietnam. In this study, we introduce a new dataset named VNAnomaly for anomaly detection in Vietnam. Moreover, we conduct a thorough evaluation of current state-of-the-art unsupervised anomaly detection methods based on deep architectures, including MLEP, Future frame prediction, MNAD, and MNAD with modified inference, on benchmark datasets and on our dataset. Experimental results indicate that the proposed method almost always outperforms the competitors and achieves the best performance on our dataset, with an Area Under the Curve (AUC) score of 61.14%.

Index Terms: Anomaly, Anomaly Detection, Deep Learning, VNAnomaly, Autoencoder

I. INTRODUCTION

Nowadays, with advances in artificial intelligence, integrating surveillance cameras has emerged as an efficient tool for complicated urban management tasks such as road traffic monitoring or anomalous event detection. An abnormal event in a surveillance camera is defined as an event that does not conform to expected behavior [1], [2]. The anomaly detection problem takes a sequence of frames as input and returns the label of each frame (Normal, Abnormal), see Figure 1. With the development of smart cities in Vietnam, it is reasonable to build a surveillance system that can identify abnormal events such as crimes or illegal activities. However, few studies provide a decent data resource for anomaly detection in Vietnam. Therefore, this study provides a novel dataset that focuses on human-related events for urban management.

Fig. 1. The anomaly detection model takes a video (sequence of frames) as input and returns the label of each frame in the video: normal or abnormal. Sample images are taken from the VNAnomaly dataset.

One of the biggest challenges of anomaly detection is the ambiguity of the anomaly definition. Identifying an anomaly depends not only on the activities and appearance of the objects but also on the context in the surveillance video [2], [3]. Several events are normal in some contexts but abnormal in others. For example, riding a motorbike in a pedestrian zone is considered an anomaly, but on a city road this is a normal event [2]. To avoid this ambiguity, the scope of this work mainly focuses on urban street scenes in Vietnam and some unusual events that often occur in this context.

Currently, there are two main approaches to anomaly detection, unsupervised learning and weakly-supervised learning, which are distinguished by the experimental setting of the training data [3]. One of the most important challenges of anomaly detection is the scarcity of anomalous events, which leads to an imbalanced dataset. Abnormal samples are usually expensive to collect, and there is always some unknown, new kind of anomaly. In the unsupervised learning approach, models are trained with only normal video frames and validated with both normal and anomalous frames. In weakly-supervised learning approaches, on the other hand, models are trained with normal data and only a very small amount of anomalous data. There are many benchmark datasets for both approaches. However, most of the unsupervised datasets are single-scene only, so they are not close to real-world scenes.

We summarize our contributions as follows:

• We introduce a novel dataset for the task of unsupervised anomaly detection on streets in Vietnam.
• We conduct a thorough evaluation of current state-of-the-art methods for unsupervised anomaly detection on the dataset.
• We suggest a way to modify the inference stage of unsupervised approaches, which increases the MNAD method's result by about 0.5%-2%, compared to the previous state-of-the-art methods.

The rest of the paper is organized as follows. In Section II, we summarize the related works. Section III describes the collecting and annotating process and gives detailed information about our dataset. In Section IV, we discuss the evaluation methods and the proposed modification for our problem. In Section V, the evaluation and the outcomes obtained from different detection methods are presented. The paper ends with a conclusion and some directions for future work.

II. RELATED WORKS

A. Anomaly detection

Anomaly detection is a binary classification between the normal and the anomalous classes, and it is one of the most challenging and long-standing problems in computer vision [4]. For video surveillance applications, there have been many attempts to detect abnormalities as well as violence in videos. Overall, there are two main approaches to this problem: (1) unsupervised learning; and (2) weakly-supervised learning.

1) Unsupervised learning: In contrast to the abundance of normal events, the probability of abnormal events appearing is very low. Furthermore, it is almost infeasible to gather all kinds of abnormal events. Therefore, in unsupervised learning methods [1], [5], [6], models are trained with only normal video frames because of the availability of benchmark datasets. The collected anomaly frames are only used for validation purposes. These methods focus on learning the pattern of normal frames and use the reconstruction or prediction loss to determine whether a frame is anomalous at inference, see Figure 2. That is, they try to reconstruct or predict the current frame and use the reconstruction or prediction error to calculate the anomaly score.

Fig. 2. An example of an inference process of an unsupervised method: Future frame prediction [1] (stages: reconstructing module, comparing module, anomaly score).

2) Weakly supervised learning: For weakly-supervised learning approaches, the anomalous datasets are mainly collected from social media platforms such as YouTube and Facebook [3]. The diversity and enormous video capacity of these platforms allow researchers to access and collect a large number of anomaly videos. In these approaches [4], [7], abnormal events are explicitly predefined and collected in various contexts from numerous sources. Moreover, models are trained with normal data and only a very small amount of anomalous data to learn to distinguish between normal and abnormal events. These approaches usually comprise three main modules, described in Figure 3: i) arrange training instances, to preprocess the video-level ground truth; ii) feature extraction, to extract the video's features; and iii) a fully connected network, to classify whether a frame is anomalous.

Fig. 3. An example of a weakly supervised method from Sultani et al. [4] (stages: arrange training instances, feature extraction, fully connected module, anomaly score).

B. Existing anomaly datasets

Many studies have proposed anomaly datasets for anomaly detection in recent years. These datasets can be separated into two types: single-scene and multi-scene datasets [2]. We discuss each of these below and summarize them in Table I.

TABLE I
CHARACTERISTICS OF VIDEO ANOMALY DETECTION DATASETS FOR UNSUPERVISED APPROACHES

Dataset                  Total frames  Training  Testing  Abnormal events  Scenes  Anomaly types  Resolution
UCSD Ped 1 [8]           14,000        6,800     7,200    54               1       …              158 x 238
UCSD Ped 2 [8]           4,560         2,500     2,010    23               1       …              240 x 360
Subway Entrance [9]      144,246       20,000    124,246  66               1       …              384 x 512
Subway Exit [9]          38,940        4,500     34,440   19               1       …              384 x 512
CUHK Avenue [10]         30,652        15,328    15,324   47               1       …              360 x 640
ShanghaiTech campus [1]  315,306       274,515   40,791   130              13      …              480 x 856
VNAnomaly (Ours)         588,941       578,609   75,214   110              36      4              340 x 672

1) Single-scene datasets: Single-scene datasets contain only a limited number of scenes, usually fewer than three. Because collecting surveillance camera videos used to be difficult, it was reasonable to take a long video captured by one camera. Therefore, many single-scene datasets have been introduced. However, they might not be general enough to satisfy real-world surveillance applications. Samples from some popular single-scene datasets are displayed in Figure 4.

• Subway dataset [9] was captured at the entrance and exit gates of a subway station and comprises one video for each. The entrance gate video sequence is 1 hour 36 minutes long, whereas the exit gate video footage is 43 minutes long, with a resolution of 384 x 512. Anomalous activities mainly include people jumping or trying to get through the turnstiles without payment, and walking in the wrong direction.
• UCSD Pedestrian dataset [8] consists of two subsets: UCSD Pedestrian 1 (Ped 1) and UCSD Pedestrian 2 (Ped 2). Ped 1 contains 34 training videos and 36 evaluating videos with 40 anomalous events. Most of the outliers in this dataset are related to cyclists and car drivers entering the pedestrian zone. Ped 2 consists of 16 training videos and 12 evaluating videos with 12 anomalous events. The definition of an anomaly in Ped 2 is similar to the one in Ped 1. The main differences between these two subsets are the viewpoints, the dataset's size, and the resolution (158 x 238 in Ped 1 and 240 x 360 in Ped 2). Both subsets contain only 1 scene each.
• CUHK Avenue dataset [10] comprises 16 training videos and 21 evaluating videos (resolution 360 x 640 pixels) with 47 anomalous events in total, including throwing an object, running, and jumping. This dataset captures only one scene, but the size of people may change because of the distance and angle of the camera.

Fig. 4. Some samples including normal and abnormal frames in the single-scene datasets (UCSD Ped, Subway Entrance, CUHK Avenue; example anomalies: riding a bike, entering without payment, throwing an object). Red boxes denote anomalies in abnormal frames.

2) Multi-scene datasets: In recent years, the popularity of surveillance cameras and the rise of video-sharing platforms have enabled an increase in the number of scenes in anomaly datasets. Datasets used in recent studies include the ShanghaiTech dataset and the UCF-Crime dataset. Some samples are shown in Figure 5.

• ShanghaiTech Campus dataset [1] contains 330 training videos and 107 evaluating videos (resolution 480 x 856 pixels) recorded on a campus. Presenting mostly person-based anomalies, it contains 130 abnormal events captured in 13 different scenes with complex lighting conditions and camera angles. Some of its anomaly events, such as riding a bike or skateboarding, do not relate to security purposes in general.
• UCF-Crime dataset [4] is a large-scale complex dataset that spans over 128 hours of videos of 240 × 320 resolution and contains 13 different classes of real-world anomalies. Its training split contains 800 normal and 810 anomalous videos, while the test split has 150 normal and 140 anomalous videos. This dataset is intended for a very different formulation of video anomaly detection, referred to as weakly supervised anomaly detection.

Fig. 5. Some samples including normal and abnormal frames in the multi-scene datasets (ShanghaiTech, UCF-Crime; example anomalies: riding a bike, shoplifting, fighting, running, chasing). Red boxes denote anomalies in abnormal frames.

III. VNANOMALY DATASET

A. Dataset description

To tackle the limitations of existing datasets, we present the VNAnomaly dataset. It consists of 90 training videos and 127 evaluating videos that include real-world anomalies on Vietnamese streets. The anomaly types are common human-related anomalies on the streets of Vietnam, including fighting, assault, vandalism, and robbery. In Figure 6, we show normal and abnormal frames with a nearly similar scene. Because VNAnomaly is an unsupervised dataset, we do not explicitly define these anomaly types. We choose the above anomaly types because they are more common than other ones. In addition, these anomaly types are relevant to the safety of public lives and assets in Vietnam's urban environments. However, some unusual events, such as traffic accidents, are not covered. Therefore, we will continue to provide more anomaly types soon.

Fig. 6. The normal and abnormal frames from the VNAnomaly dataset.

Our dataset surpasses existing unsupervised datasets in the following three aspects.

1) To the best of our knowledge, this is one of the first unsupervised datasets that capture Vietnamese scenes.
2) Larger data volume: VNAnomaly has 578,609 training frames and 75,214 evaluating frames, which is bigger than existing unsupervised benchmark datasets.
3) Higher scene diversity: Our dataset contains 36 scenes that differ in aspects such as camera angle and time of day. Furthermore, to ensure that the context the model learns in the training stage matches the testing phase, the scenes in the training set are required to match those in the testing set. Compared with the ShanghaiTech dataset [1], the number of scenes in our dataset is nearly triple. Some different scenes are shown in Figure 7.

Fig. 7. The diversity of scenes in the VNAnomaly dataset (anomalies shown: fighting, vandalism, assault, robbery; views: right angle at day time, right angle at full-color night time, left angle at day time, left angle at black and white night time).

B. Collecting process

To enhance the dataset's quality, we constrain the context in this dataset to streets in Vietnam captured by surveillance cameras. We search for videos on YouTube using text search queries in the Vietnamese language. In addition, to ensure the videos collected from YouTube are surveillance videos, we closely inspect the context of each video and only choose videos that are not related to commercial purposes and are captured by surveillance cameras. After that, following the collecting process in [4], we discard videos that fall into any of the following categories: manually edited, prank videos, not captured by CCTV cameras, taken from news, captured using a hand-held camera, or containing compilations. Additionally, we try to collect training videos that have some similarities with the videos in the testing set.

The training sequences last seven and a half hours, whereas the testing sequences last approximately 1 hour with 110 anomalous events. In Figure 6, we show normal and abnormal frames with a nearly similar scene. More specifically, our dataset contains street scenes in three different surveillance video types related to time: daytime, full-color nighttime, and black and white nighttime videos. Our dataset has in total … hours 19 minutes 59 seconds of daytime videos and … hours 49 seconds of nighttime videos. The time length of each surveillance video type in the training and testing sets is shown in Figure 8. Moreover, the street scenes are captured from different camera angles: left angle (3 hours 39 minutes and 53 seconds) and right angle (4 hours 19 minutes and 52 seconds). The diversity of scenes, view angles, and light conditions reflects the real-world environment, which also poses a challenge for anomaly detection.

Fig. 8. The time-length of different surveillance video types: (a) training set; (b) testing set.

C. Annotation

Ground truths for each testing video are annotated in temporal form. We divide the VNAnomaly dataset into separate parts for annotators. Then we have the annotators cross-check the labels, and we re-check the labels ourselves to catch mistakes. Our annotation form follows the annotation form in [1], [4], i.e., the start and end frames of the anomalous event in each testing video.

IV. EVALUATION

A. Methods

In this study, we evaluate three state-of-the-art unsupervised anomaly detection methods: Future frame prediction [1], Margin learning embedded prediction (MLEP) [5], and Learning Memory-guided Normality for Anomaly Detection (MNAD) [6].

• Future frame prediction [1] is one of the most popular baselines for anomaly detection problems. It uses a generative adversarial network (GAN) [11] to exploit normal patterns and predict the next frame. This method focuses on predicting the future frame and uses the prediction error to calculate the anomaly score. The process is illustrated in Figure 2.
• MLEP [5] contains an encoder and a decoder based on ConvLSTM [12]. In this method, the feature corresponding to the hidden state of the last input in the ConvLSTM is fed to the margin learning module. This module learns a more compact normal data distribution and enlarges the margin between normal and abnormal events, which improves the model's ability to discriminate between normal and abnormal frames.
• MNAD [6] is one of the most robust state-of-the-art unsupervised methods. This model adds a new memory module that records prototypical patterns of normal data in its memory items. Moreover, it proposes feature compactness and separateness losses to train the memory module.

B. Measurement

Evaluation scores in most works on anomaly detection [1], [5], [6] are calculated based on the receiver operating characteristic (ROC) curve. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The final score is the area under the ROC curve (AUC score). AUC measures the entire two-dimensional area underneath the ROC curve from (0,0) to (1,1). The higher the AUC score, the better the model distinguishes between normal and anomalous frames. AUC scores of the experiment results are visualized in Figure 9.

Fig. 9. AUC score visualization of state-of-the-art methods on VNAnomaly.

C. Modifying inference stage

When inspecting the visualization of the results of MNAD, we notice a considerable fluctuation of the anomaly scores. The model can predict the anomalous frames because of the powerful representation capacity of CNNs. Therefore, the scores only drop to the bottom at the beginning of an anomaly. After that, they start to increase as the model becomes able to predict the abnormal frames, see Figure 10.

Fig. 10. The anomaly scores visualization of MNAD and MNAD with modified inference on video Right_Robbery_971: (a) MNAD, AUC = 69.27; (b) MNAD w/ modified inference, AUC = 73.66. In this figure, the ground-truth score is 1 for a normal frame and the opposite for an abnormal frame.

To mitigate the above issue, we alter the inference stage to stabilize the scores. Instead of using four real previous frames as the input, we mix the nearest frame with the frame that was predicted for it in the earlier step, see Figure 11. If there are consecutive anomalous frames, this alteration increases the prediction error in the later frames and stabilizes the anomaly scores while frames are anomalous. On the other hand, when the generator returns a high-quality frame, the inference process is similar to the base one. Moreover, this alteration should not affect the generator's ability to predict the normal frames because the frames are mixed with a low ratio (1:3). The mixing strategy is formulated below:

$$I'_i = \lambda_1 \hat{I}_i + \lambda_2 I_i$$

where $I'_i$ denotes the mixed frame, $\hat{I}_i$ is the predicted frame, and $I_i$ is the real frame. $\lambda_1$ and $\lambda_2$ are chosen using a grid search: $\lambda_1 = 0.25$, $\lambda_2 = 0.75$. If the share of predicted-frame information is higher than that of the actual frame, the anomaly score becomes more stable. However, this limits the discriminative ability of the model. Thus, the 1:3 ratio is determined by grid search to balance the contributions of the predicted and actual frames. This alteration slightly increases the effectiveness of MNAD, by 0.5% to 1.14% AUC score.

Fig. 11. The demonstration of the modified inference process.

V. EXPERIMENT

A. Experiment setup

We follow the setting of the Future frame prediction method [1] to split the existing datasets for the experiment. In addition, the training and testing sets of the VNAnomaly dataset are split similarly to other unsupervised datasets' settings: the training set contains normal videos, whereas the testing set contains abnormal videos. The whole process is implemented on a GeForce RTX 2080 Ti GPU with 11019 MiB of memory. For the VNAnomaly dataset, we train the Future frame prediction, MLEP, and MNAD models for five epochs with batch sizes of 8, 8, and 4, respectively. Other hyper-parameters are set to their defaults. For the benchmark datasets, we follow the settings proposed in the original papers.

B. Experimental results and discussion

Following the aforementioned experiment setting, we intensively experiment with three state-of-the-art methods: Future frame prediction [1], MLEP [5], and MNAD [6]. Table II summarizes the frame-level AUC of these methods on VNAnomaly and on other publicly available datasets, namely UCSD Ped 1 [8], UCSD Ped 2 [8], Subway Entrance [9], CUHK Avenue [10], and ShanghaiTech [1].

TABLE II
AUC OF DIFFERENT METHODS ON THE PED 1, PED 2, SUBWAY ENTRANCE, AVENUE, SHANGHAITECH AND VNANOMALY DATASETS

Method                              Ped 1 [8]  Ped 2 [8]  Subway Entrance [9]  CUHK Avenue [10]  ShanghaiTech [1]  VNAnomaly (ours)
Future frame prediction [1]         82.33%     95.40%     71.72%               85.10%            72.80%            58.00%
MLEP [5]                            75.37%     _          77.30%               82.29% (92.8%)*   70.43% (76.8%)*   53.71%
MNAD [6]                            80.37%     96.97%     69.37%               88.57%            70.30%            60.04%
MNAD w/ modified inference (ours)   82.15%     96.92%     72.14%               89.30%            70.71%            61.14%

*The result achieved in the original paper is evaluated in different settings.

Overall, our modified inference positively affects the MNAD [6] model: it achieves competitive results compared to the current state-of-the-art methods on the benchmark datasets and outperforms the competitors on VNAnomaly. On CUHK Avenue [10], our method is the highest at 89.30%, whereas the MLEP method [5] is the lowest at 82.29%. Additionally, the proposed method is slightly higher than the other methods, by margins ranging from nearly 0.5% to approximately 3%. On the VNAnomaly dataset, the proposed method has the highest score at 61.14%, outperforming MNAD [6] by over 1%, Future frame prediction [1] by about 3%, and MLEP [5] by over 7%. On the Ped 1 dataset [8], the proposed method reaches 82.15%, higher than the MLEP method [5] at 75.37% and the MNAD method [6] at 80.37%, but slightly lower than Future frame prediction [1], by about 0.18%. On the Ped 2 dataset [8], the proposed method is comparable to the other methods at nearly 97.00%. Similarly, the proposed method is slightly higher than the Future frame prediction method [1] and the MNAD method [6] on the Subway Entrance dataset [9], at 72.14%, 71.72%, and 69.37%, respectively. Likewise, the proposed method is similar to the MLEP method [5] and the MNAD method [6] on the ShanghaiTech dataset [1] at about 70.50%, but this is slightly lower than the Future frame prediction method [1], by about 2%. It is important to point out that unsupervised anomaly detection in videos is still challenging and depends on the specific context. It is noteworthy that the runtime difference between the base and modified methods on VNAnomaly is negligible: 1743.144 seconds for the base method and 1744.257 seconds for the modified inference method.

VI. CONCLUSION

In this paper, we introduce a new dataset named VNAnomaly for the anomaly detection problem in videos. The dataset contains different scenes, multiple objects, and common anomaly types on Vietnamese streets. It is one of the first anomaly datasets to capture scenes in Vietnam and to reflect the real-world challenge with a variety of angles and times of day. Furthermore, we conduct extensive experiments using three unsupervised anomaly detection methods, wherein we adopt a new modified inference for the MNAD method. Our experiments demonstrate improvements over state-of-the-art methods on real-world datasets. In future work, we plan to collect more anomaly types in Vietnam to increase the diversity of the dataset, continue improving anomaly detection models, and try to deploy them on edge devices.

ACKNOWLEDGEMENT

This work was supported by the Multimedia Processing Lab (MMLab) at the University of Information Technology, VNUHCM.

REFERENCES

[1] W. Liu, W. Luo, D. Lian, and S. Gao, “Future frame prediction for anomaly detection - a new baseline,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 6536–6545.
[2] B. Ramachandra, M. J. Jones, and R. R. Vatsavai, “A survey of single-scene video anomaly detection,” CoRR, vol. abs/2004.05993, 2020. arXiv: 2004.05993. [Online]. Available: https://arxiv.org/abs/2004.05993
[3] S. Zhu, C. Chen, and W. Sultani, “Video anomaly detection for smart surveillance,” CoRR, vol. abs/2004.00222, 2020. arXiv: 2004.00222. [Online]. Available: https://arxiv.org/abs/2004.00222
[4] W. Sultani, C. Chen, and M. Shah, “Real-world anomaly detection in surveillance videos,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Jun. 2018. [Online]. Available: https://doi.org/10.1109/cvpr.2018.00678
[5] W. Liu, W. Luo, Z. Li, P. Zhao, and S. Gao, “Margin learning embedded prediction for video anomaly detection with a few anomalies,” in IJCAI, 2019.
[6] H. Park, J. Noh, and B. Ham, “Learning memory-guided normality for anomaly detection,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Jun. 2020. [Online]. Available: https://doi.org/10.1109/cvpr42600.2020.01438
[7] M.-I. Georgescu, A. Barbalau, R. T. Ionescu, F. S. Khan, M. Popescu, and M. Shah, “Anomaly detection in video via self-supervised and multi-task learning,” CoRR, vol. abs/2011.07491, 2020. arXiv: 2011.07491. [Online]. Available: https://arxiv.org/abs/2011.07491
[8] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, “Anomaly detection in crowded scenes,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 1975–1981.
[9] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz, “Robust real-time unusual event detection using multiple fixed-location monitors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 555–560, Mar. 2008. [Online]. Available: https://doi.org/10.1109/tpami.2007.70825
[10] C. Lu, J. Shi, and J. Jia, “Abnormal event detection at 150 fps in matlab,” in 2013 IEEE International Conference on Computer Vision, 2013, pp. 2720–2727.
[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, Oct. 2020. [Online]. Available: https://doi.org/10.1145/3422622
[12] S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” in Advances in neural information processing systems, 2015, pp. 802–810.
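
Supplementary sketch (modified inference, Section IV-C). The mixing rule I'_i = λ1 Î_i + λ2 I_i with λ1 = 0.25 and λ2 = 0.75 can be illustrated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: `predict_next` is a hypothetical stand-in for the trained generator, the mean squared error is an illustrative choice of prediction error, and feeding the mixed frame back into the four-frame input window is one possible reading of the description in Section IV-C.

```python
import numpy as np


def mix_frames(predicted, real, lam_pred=0.25, lam_real=0.75):
    """Blend a predicted frame with the real frame: I' = lam_pred * I_hat + lam_real * I."""
    predicted = np.asarray(predicted, dtype=np.float64)
    real = np.asarray(real, dtype=np.float64)
    if predicted.shape != real.shape:
        raise ValueError("predicted and real frames must have the same shape")
    return lam_pred * predicted + lam_real * real


def run_modified_inference(frames, predict_next, window=4,
                           lam_pred=0.25, lam_real=0.75):
    """Return one prediction error per frame, starting at index `window`.

    Assumption: `predict_next(history)` is a hypothetical callable that maps
    the last `window` input frames to a prediction of the next frame.
    """
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    history = list(frames[:window])  # the first window holds real frames only
    errors = []
    for i in range(window, len(frames)):
        predicted = predict_next(history[-window:])
        # Illustrative error measure (MSE) between prediction and real frame.
        errors.append(float(np.mean((predicted - frames[i]) ** 2)))
        # The mixed frame, not the raw real frame, enters the input window.
        history.append(mix_frames(predicted, frames[i], lam_pred, lam_real))
    return errors


if __name__ == "__main__":
    # Toy sequence: the scene changes abruptly at frame 4.
    toy = [np.full((2, 2), v) for v in [0.0, 0.0, 0.0, 0.0, 4.0, 4.0]]
    copy_last = lambda history: history[-1]  # trivial stand-in predictor
    print(run_modified_inference(toy, copy_last))
    print(run_modified_inference(toy, copy_last, lam_pred=0.0, lam_real=1.0))
```

Setting `lam_pred=0.0` and `lam_real=1.0` recovers inference on real frames only, which makes it easy to compare both variants on the same toy sequence: with mixing, the error on the second changed frame stays above zero because part of the earlier (wrong) prediction remains in the input.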

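Supplementary sketch (frame-level AUC, Sections III-C and IV-B). The temporal annotations (start and end frame of each anomalous event) can be expanded into per-frame labels and scored with the area under the ROC curve. The following is a minimal NumPy sketch, not the evaluation code used in the paper: the function names are hypothetical, annotated intervals are assumed to be 0-indexed and inclusive, and scores are assumed to be anomaly scores where a higher value means more anomalous (a regularity score, where normal frames score high, would be negated first).

```python
import numpy as np


def intervals_to_labels(num_frames, intervals):
    """Expand inclusive (start, end) anomalous frame ranges into 0/1 labels."""
    labels = np.zeros(num_frames, dtype=np.int64)
    for start, end in intervals:
        labels[start:end + 1] = 1
    return labels


def frame_level_auc(labels, scores):
    """Area under the ROC curve (TPR against FPR) using the trapezoidal rule."""
    labels = np.asarray(labels).astype(np.int64)
    scores = np.asarray(scores, dtype=np.float64)
    num_pos = int(labels.sum())
    num_neg = len(labels) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("AUC needs at least one normal and one abnormal frame")

    # Sweep the decision threshold from the highest score to the lowest.
    order = np.argsort(-scores, kind="stable")
    sorted_labels = labels[order]
    sorted_scores = scores[order]
    true_pos = np.cumsum(sorted_labels)
    false_pos = np.cumsum(1 - sorted_labels)

    # Keep one ROC point per distinct score so tied frames share a threshold.
    last_of_each = np.r_[np.where(np.diff(sorted_scores) != 0)[0],
                         len(scores) - 1]
    tpr = np.r_[0.0, true_pos[last_of_each] / num_pos]
    fpr = np.r_[0.0, false_pos[last_of_each] / num_neg]

    # Trapezoidal integration of TPR over FPR from (0, 0) to (1, 1).
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


if __name__ == "__main__":
    labels = intervals_to_labels(6, [(2, 3)])  # frames 2 and 3 are anomalous
    print(frame_level_auc(labels, [0.1, 0.2, 0.9, 0.4, 0.3, 0.7]))
```

In the toy call above, one abnormal frame (score 0.4) is ranked below one normal frame (score 0.7), so seven of the eight abnormal/normal pairs are ordered correctly and the result is 0.875.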