2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

UIT-DroneFog: Toward High-performance Object Detection Via High-quality Aerial Foggy Dataset

Minh T. Tran, Bao V. Tran, Nguyen D. Vo, Khang Nguyen
University of Information Technology, Vietnam National University Ho Chi Minh City, Viet Nam
18520314@gm.uit.edu.vn, 18520499@gm.uit.edu.vn, nguyenvd@uit.edu.vn, khangnttm@uit.edu.vn

Abstract—In recent years, although various research has been performed on object detection in clear-weather images, little attention has been paid to object detection in foggy aerial images. In this paper, we address the problem of detecting objects in foggy aerial images. Firstly, we create the UIT-DroneFog dataset by applying a fog simulator (taken from the imgaug library) to 15,370 aerial images collected from the UIT-Drone21 dataset. The dataset's distinguishing characteristic is the dense motorbike traffic of Vietnam, with four object classes: Pedestrian, Motor, Car, and Bus. Secondly, we evaluate two state-of-the-art object detection methods: Guided Anchoring and Double Heads. The experimental results show that Double Heads achieves the higher mAP score, 33.20%. Additionally, we propose a method called CasDou, which combines Cascade R-CNN, Double Heads, and Focal Loss. CasDou remarkably improves the mAP score up to 34.70%. The comprehensive evaluation points out the advantages and limitations of each method, providing a foundation for further work.

Index Terms—Foggy Aerial Images, Focal Loss, Object Detection, UIT-DroneFog

I. INTRODUCTION

Nowadays, thanks to the rapid development of Deep Learning in the object detection field, people have useful applications that are widely used in daily life, such as surveillance, rescue, traffic tracking, and automated cars. The two main elements that receive considerable attention in the computer vision community as well as from artificial intelligence companies are the images captured by cameras or sensors and the object detection algorithms. While these sensors and algorithms are constantly being improved, they are mainly designed to operate in clear weather conditions. Additionally, some work has been done on object detection in clear aerial images [4, 6]. In practice, however, outdoor applications usually have to deal with "bad" weather; the case considered in this work is the presence of fog. Fog directly degrades the quality of input images in terms of sharpness, color, contrast, etc. This problem causes difficulties not only for human observers but also for computer vision algorithms, which leads to inefficient detection results.

Understanding the importance of this problem, we decided to create a foggy dataset that can be used to improve the results of detection models. However, we were not able to collect images in real foggy conditions. Therefore, we use the fog augmentation from the imgaug library (https://imgaug.readthedocs.io/en/latest/) to create our UIT-DroneFog dataset by adding fog to haze-free images from the UIT-Drone21 dataset. The reasons for choosing UIT-Drone21 are the lack of a Vietnamese aerial foggy image dataset and its distinguishing feature, namely the high density of small objects in a single image.
Moreover, we also implemented two state-of-the-art (SOTA) methods (Guided Anchoring [13] and Double Heads [14]) and a proposed method to evaluate the UIT-DroneFog dataset. We hope that our dataset will make important contributions to future work. Our foggy object detection task is illustrated in Figure 1: the detector receives a foggy image as input and returns the locations of objects (if any) in that image.

The main contributions of this paper are:
- We introduce the UIT-DroneFog dataset with its distinguishing features.
- We explore the performance of two SOTA methods, Guided Anchoring and Double Heads, to evaluate the challenges of this dataset.
- We investigate the effect of replacing the loss function of detection models and propose a method called CasDou, which achieves the highest mAP score when evaluated on our dataset.

We believe this work can serve as a reference point for the development of future algorithms for the given problem. The rest of the paper is organized as follows. Section II presents the related work. Section III introduces our UIT-DroneFog dataset, followed by Section IV, which describes the methods we use to evaluate the proposed dataset. Section V covers the experiments and results. Finally, Section VI concludes the paper.

Fig. 1: Foggy object detection. (a) Input: a foggy image; (b) Output: the locations of possible objects in the image.

II. RELATED WORK

In this section, we briefly introduce some existing foggy datasets and current methods used to solve the problem.

A. Existing Datasets

There are many foggy datasets with different features, including real or synthetic datasets captured in indoor or outdoor scenes. FRIDA [11], introduced in 2010, consists of 90 synthetic images captured from 18 road scenes in urban areas. These images are used to test visibility enhancement algorithms related to visibility and contrast restoration. FRIDA2 [12], introduced two years later, consists of 330 synthetic images covering 66 assorted road scenes and ten camera viewpoints, used with the same algorithms. The Foggy Cityscapes and Foggy Driving datasets [9] contain driving views inside cities, with 20,550 and 101 foggy images, respectively. However, these two datasets contain repeated objects. RESIDE [5] is the largest dataset; it contains five subsets with 429,292 images obtained both indoors and outdoors, with both real and synthetic fog. Each subset of this dataset was made for a different purpose. The O-Haze dataset [1], introduced by Ancuti et al. in 2018, comprises 45 pairs of foggy and ground-truth outdoor images captured over eight weeks. This dataset includes pictures of slides, trees, and benches. The details of the datasets mentioned above are described in Table I.

TABLE I: Statistics of publicly available foggy datasets
Dataset                              | Images  | Context         | Fog type        | Year
FRIDA [11]                           | 90      | Outdoor         | Synthetic       | 2010
FRIDA2 [12]                          | 330     | Outdoor         | Synthetic       | 2012
Foggy Cityscapes & Foggy Driving [9] | 20,651  | Outdoor         | Synthetic       | 2016
RESIDE [5]                           | 429,292 | Outdoor, Indoor | Synthetic, Real | 2018
O-Haze [1]                           | 45 sets | Outdoor         | Real            | 2018
UIT-DroneFog (Ours)                  | 15,370  | Outdoor         | Synthetic       | 2021

B. Current approaches

Semantic understanding of outdoor foggy scenes allows applications to work not only in clear but also in foggy weather conditions. Typical examples are vehicle detection and road and lane detection. There are different approaches to detection in foggy weather. Some methods work directly on fog detection [10], while others use a dehazing method such as FFA-Net [8] to remove fog before performing the detection task. Furthermore, the classification of scenes into foggy and fog-free has also been tackled [7]. In this paper, in order to evaluate the UIT-DroneFog dataset, we used SOTA methods and worked directly on the foggy images without applying any dehazing.
III. UIT-DRONEFOG DATASET

UIT-DroneFog is an aerial image dataset captured by drones. It was created by applying a fog simulator to each image of the UIT-Drone21 dataset (https://uit-together.github.io/datasets/).

A. UIT-Drone21 dataset

UIT-Drone21 consists of 15,370 aerial images captured by drones, with about 0.6 million bounding boxes of various means of transportation and pedestrians. There are four classes in this dataset: Pedestrian, Motor, Car, and Bus. The dataset is divided into three subsets: a training set (8,580 images), a validation set (1,061 images), and a testing set (5,729 images).

B. Fog simulation

In this work, the imgaug library (https://imgaug.readthedocs.io) was used to create synthetic fog for our dataset. We simulated fog on the UIT-Drone21 dataset using this library's Fog class, which has different pre-defined parameters. This class renders a single fog layer per image with a configuration that produces fairly dense fog with low-frequency patterns. However, in order to simulate fog suited to the image sizes of the chosen dataset, we adjusted two parameters, alpha_min = 0.75 and density_multiplier = 0.7, and kept the other parameters at the library defaults. The alpha_min parameter (default range (0.7, 0.9)) indicates the minimum alpha value when blending fog noise with the image; a higher value leads to fog being "everywhere". Meanwhile, the density_multiplier parameter (default range (0.4, 0.9)) is a late multiplier for the alpha mask; setting it higher produces denser fog wherever fog is visible.

Fig. 2: Example images in UIT-DroneFog.
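The snippet below is a minimal sketch of this kind of fog synthesis. Note that imgaug's Fog augmenter does not expose alpha_min or density_multiplier directly, so the sketch uses the lower-level CloudLayer augmenter; the two adjusted values follow the paper, while the remaining values are our assumption of Fog's documented defaults and should be checked against the installed imgaug version. The file paths are placeholders, and this is not the authors' released script.

```python
import imageio
import imgaug.augmenters as iaa

# Fog-like layer built from imgaug's CloudLayer. Values other than alpha_min and
# density_multiplier approximate the Fog augmenter's defaults (assumed, please verify).
fog = iaa.CloudLayer(
    intensity_mean=(220, 255),            # brightness of the fog noise
    intensity_freq_exponent=(-2.0, -1.5),
    intensity_coarse_scale=2,
    alpha_min=0.75,                       # paper: fixed instead of the default (0.7, 0.9)
    alpha_multiplier=0.3,
    alpha_size_px_max=(2, 8),
    alpha_freq_exponent=(-4.0, -2.0),
    sparsity=0.9,
    density_multiplier=0.7,               # paper: fixed instead of the default (0.4, 0.9)
)

image = imageio.imread("uit_drone21/train/000001.jpg")      # hypothetical input path
foggy = fog.augment_image(image)                             # add a single fog layer
imageio.imwrite("uit_dronefog/train/000001.jpg", foggy)      # hypothetical output path
```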
C. Dataset Description

UIT-DroneFog can be considered the foggy version of UIT-Drone21, which means it inherits all of its properties (the numbers of images, classes, and bounding boxes mentioned in Section III-A). Moreover, UIT-DroneFog has the following distinguishing highlights:
- Variety of high-quality images: The fog simulation process used images captured by high-end drones at different resolutions (3840x2160, 1920x1080, 1440x1080), so our foggy images are all of high quality and are not blurred, distorted, skewed, or obscured.
- Variety of context: Every image in our dataset is unique. The images differ in fog distribution, capturing angle, and height. Moreover, we simulated fog not only in one place but in various locations in different cities in Vietnam.
- Challenging class distribution: Because this dataset was collected on Vietnamese streets, the vast majority of objects are motorbikes. This data imbalance is a tough challenge for detectors. Besides, motorbikes appear densely on the roads and are very small, which makes detecting them quickly difficult.

Example images of our dataset are shown in Figure 2. We also counted the number of instances of each class, as reported in Figure 3.

Fig. 3: Statistics of the UIT-DroneFog dataset.

IV. METHOD

A. Object Detectors

In this research, two SOTA object detectors and a proposed method were used to evaluate the UIT-DroneFog dataset. The details are as follows.

1) Guided Anchoring [13]: Guided Anchoring introduced an anchoring scheme that does not use a predefined set of scales and aspect ratios. The authors proposed a method that jointly predicts the locations where the centers of objects of interest are likely to exist through a probability map derived from the input feature map. Then, the scales and aspect ratios centered at the corresponding locations are predicted. On top of the predicted anchor shapes, a feature adaptation module is employed to mitigate feature inconsistency. With 90% fewer anchors than the RPN baseline, the authors achieved 9.1% higher recall on MS COCO in their experiments.

2) Double Heads [14]: In R-CNN-based detectors, two head structures are widely used for the localization and classification tasks: a convolution head and a fully connected head. However, the two head structures have opposite preferences towards the two tasks. Specifically, the convolution head (conv-head) is more suitable for the localization task, while the fully connected head (fc-head) is more suitable for the classification task. Based on these findings, Wu et al. proposed the Double Head R-CNN method, which has a convolution head for bounding box regression and a fully connected head focusing on classification. Their method gains 3.5 and 2.8 AP on the MS COCO dataset over FPN baselines with ResNet-50 and ResNet-101 backbones, respectively.

3) Cascade R-CNN [2]: Cascade R-CNN is a multi-stage object detection method. It comprises a series of detectors trained with increasing IoU thresholds in order to be sequentially more selective against false positives. Moreover, the output of each detector is used by the later ones as a good distribution for training higher-quality detectors in the following stages. This method also optimizes each regressor for the bounding box distribution generated by the previous stage rather than for the initial distribution. Cascade R-CNN achieved success through step-wise improvement of the predictions and adaptive processing of the training distributions.

B. Loss Function

The loss function plays an important role in the performance of object detectors. One of the most widely used loss functions is the cross-entropy (CE) loss, which is based on the idea of penalizing wrong predictions more than rewarding right predictions. The CE loss is defined as:

L_{CE}(p_t) = -\log(p_t)    (1)

where p_t is the predicted probability of the ground-truth class. Recently, the Focal Loss (FL), which can handle the class imbalance issue by assigning more weight to hard or easily misclassified examples, was proposed as a variant of the CE loss. The FL loss is defined as:

L_{FL}(p_t) = -\alpha (1 - p_t)^{\gamma} \log(p_t),    (2)

where \alpha is the balancing factor of the Focal Loss, with a default value of 0.25, and \gamma is used to compute the modulating factor, with a default value of 2.0.
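A minimal PyTorch sketch of Eq. (2) for multi-class classification is shown below. The function name and the softmax-based formulation are our own illustration (MMDetection's built-in FocalLoss, used later in the experiments, is sigmoid-based), not the authors' code.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Multi-class focal loss, Eq. (2): FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t).

    logits:  (N, C) raw class scores
    targets: (N,)   ground-truth class indices
    """
    log_probs = F.log_softmax(logits, dim=-1)                        # log(p) for every class
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)    # log(p_t) of the true class
    pt = log_pt.exp()                                                # p_t
    loss = -alpha * (1.0 - pt) ** gamma * log_pt                     # per-sample focal loss
    return loss.mean()

# Setting gamma = 0 and alpha = 1 recovers the standard cross-entropy loss of Eq. (1).
```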
C. Proposed method

As described above, Double Heads is a module that can easily be attached to other detectors, such as Faster R-CNN. Therefore, we decided to attach the Double Heads module to Cascade R-CNN and evaluated this combination on the UIT-DroneFog dataset. After analyzing the results, we further changed the default classification loss from CE to FL and call the resulting method CasDou.

V. EXPERIMENTS AND RESULTS

A. Experimental Setting

Our UIT-DroneFog dataset is divided into training (8,580 images), validation (1,061 images), and testing (5,729 images) sets as mentioned above. All experiments were conducted on a GeForce RTX 2080 Ti GPU with 11,018 MiB of memory. We trained the models with the MMDetection framework v2.10.0 [3]. For each model, we used the highest-mAP configuration provided in the MMDetection GitHub repository (https://github.com/open-mmlab/mmdetection) that can run on a single GeForce RTX 2080 Ti GPU. For Guided Anchoring, we used the default GA-Faster R-CNN configuration with the X-101-32x4d-FPN backbone for 12 epochs, and the R-50-FPN backbone was used for Double Heads for 12 epochs. Our proposed method also used the R-50-FPN backbone for 12 epochs in order to make a fair comparison with the default Double Heads.
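To make the loss replacement concrete, the fragment below sketches how the classification loss of an MMDetection v2.x Double Heads configuration might be switched from cross-entropy to Focal Loss. The base config name and loss weight are assumptions, and this is not the authors' released configuration. For CasDou, the same Double Heads-style head and Focal Loss settings would be applied to the stage heads of a Cascade R-CNN, which typically requires a custom RoI head; that glue code is omitted here.

```python
# Hypothetical MMDetection v2.x config fragment (our sketch, not the authors' file):
# start from the stock Double Heads config and replace the classification loss.
_base_ = './dh_faster_rcnn_r50_fpn_1x_coco.py'   # assumed name of the stock Double Heads config

model = dict(
    roi_head=dict(
        bbox_head=dict(
            num_classes=4,               # Pedestrian, Motor, Car, Bus
            loss_cls=dict(
                _delete_=True,           # replace, rather than merge with, the base CE settings
                type='FocalLoss',        # Eq. (2); replaces the default CrossEntropyLoss
                use_sigmoid=True,
                gamma=2.0,
                alpha=0.25,
                loss_weight=1.0))))
```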
B. Evaluation metric

We used the weights that performed best on the validation set to predict and report results on the testing set. To evaluate the performance of the models, we use the mAP measure, the same metric as in the object detection track on the MS COCO dataset. The AP score is averaged over 10 IoU thresholds ranging from 50% to 95% in steps of 5%. Besides, the results at the two specific thresholds of 50% and 75% (AP@.50 and AP@.75) are also reported.
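Assuming the ground truth and detections are stored in COCO-format JSON files, these metrics can be computed with pycocotools as sketched below; the file names are placeholders.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder file names: COCO-format ground truth and detection results.
coco_gt = COCO("annotations/uit_dronefog_test.json")
coco_dt = coco_gt.loadRes("detections/casdou_test_bbox.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()

# evaluator.stats[0] -> mAP averaged over IoU 0.50:0.95 (step 0.05)
# evaluator.stats[1] -> AP@.50, evaluator.stats[2] -> AP@.75
print("mAP:", evaluator.stats[0], "AP@.50:", evaluator.stats[1], "AP@.75:", evaluator.stats[2])
```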
C. Analysis

In general, the experimental results in Table II show that Guided Anchoring yielded lower results than Double Heads. In terms of mAP, Guided Anchoring achieved the worst performance with 31.90%; however, it had the best results for detecting Pedestrian (2.60%) and Motor (35.10%). Meanwhile, Double Heads was more efficient at detecting Car and Bus, especially Bus with 39.20% (5.40% higher than Guided Anchoring). Qualitative results are shown in Figure 4.

Furthermore, as Double Heads achieved higher detection results, we decided to improve this model by combining it with Cascade R-CNN, calling the result CasDou. The reasons for this combination are that Cascade R-CNN has an architecture similar to Faster R-CNN (the default base detector of Double Heads) and is usually more efficient than Faster R-CNN with the same backbone. We then conducted an extensive experiment with Double Heads and CasDou. However, these two models differ by only 0.1% in mAP. This means that attaching the Double Heads module to Cascade R-CNN instead of Faster R-CNN, as in the default configuration, did not improve detection performance as much as expected. Moreover, we noticed that all the models still struggled with the imbalanced data. Pedestrian and Motor, the classes with the largest shares of instances (about 13.31% and 77.84% of the UIT-DroneFog dataset), were commonly confused with each other. In addition, Bus, the class that appears least in the dataset (see Figure 3), had a score of 33.80% with Guided Anchoring but remarkably increased to 39.20% when using CasDou.

Consequently, we shifted from the cross-entropy loss (the default configuration) in Double Heads and CasDou to the Focal Loss, which previous studies have shown to reduce confusion between classes. As expected, both models achieved higher performance in each class and in the mAP score; in particular, the mAP of CasDou increased to 34.70% (shown in Table III). The wrong detections of small objects and the missed detections of Car and Bus were significantly reduced (demonstrated in Figure 5). Overall, we improved the effectiveness in three classes: Pedestrian, Car, and Bus. These results and visualizations demonstrate the benefit of using the Focal Loss function for object detection in aerial foggy traffic images.

TABLE II: Experimental results with the default configuration. The best performance is marked in boldface.
Method           | Pedestrian | Motor | Car   | Bus   | mAP   | AP@.50 | AP@.75
Guided Anchoring | 2.60       | 35.10 | 56.10 | 33.80 | 31.90 | 46.50  | 36.70
Double Heads     | 1.60       | 33.20 | 58.70 | 39.20 | 33.20 | 47.50  | 38.90

Fig. 4: Example prediction cases using the default configuration: (a) Guided Anchoring, (b) Double Heads. The orange bounding boxes are predictions and the green bounding boxes are ground truth.

TABLE III: Experimental results with different loss functions. The best performance is marked in boldface.
Method       | Cls Loss      | Pedestrian | Motor | Car   | Bus   | mAP   | AP@.50 | AP@.75
Double Heads | Cross-entropy | 1.60       | 33.20 | 58.70 | 39.20 | 33.20 | 47.50  | 38.90
Double Heads | Focal loss    | 2.20       | 34.10 | 57.70 | 41.00 | 33.70 | 49.30  | 39.00
CasDou       | Cross-entropy | 2.30       | 34.50 | 57.20 | 39.20 | 33.30 | 47.80  | 39.00
CasDou       | Focal loss    | 2.70       | 34.20 | 59.30 | 42.50 | 34.70 | 50.20  | 40.30

Fig. 5: Comparison between the results of Double Heads and CasDou with the Cross-Entropy and Focal Loss functions. The orange bounding boxes are predictions and the green bounding boxes are ground truth.

VI. CONCLUSION

In this paper, we have introduced an aerial foggy dataset, UIT-DroneFog, with four classes (Pedestrian, Motor, Car, and Bus), a total of 15,370 images, and about 0.6 million corresponding bounding boxes. We conducted experiments with two SOTA methods, Guided Anchoring and Double Heads, and a proposed method called CasDou on our dataset. Through these experiments, CasDou achieved the highest mAP score of 34.70%. In the future, we will continue to expand and develop the UIT-DroneFog dataset to a larger volume with more levels of fog by applying other algorithms. Furthermore, we will build a mobile application that can directly detect vehicles in foggy weather, which can be used for different purposes.

ACKNOWLEDGEMENT

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number DS202126-01. We would like to express our sincere thanks to the Multimedia Communications Laboratory (MMLab), VNU-HCM, and the UIT-Together Research Group for supporting our team in this research.

REFERENCES

[1] Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte, and Christophe De Vleeschouwer. "O-HAZE: A dehazing benchmark with real hazy and haze-free outdoor images". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018, pp. 754-762.
[2] Zhaowei Cai and Nuno Vasconcelos. "Cascade R-CNN: Delving into high quality object detection". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 6154-6162.
[3] Kai Chen et al. "MMDetection: Open MMLab detection toolbox and benchmark". In: arXiv preprint arXiv:1906.07155 (2019).
[4] Quynh M. Chung et al. "Data Augmentation Analysis in Vehicle Detection from Aerial Videos". In: 2020 RIVF International Conference on Computing and Communication Technologies (RIVF). 2020, pp. 1-3. DOI: 10.1109/RIVF48685.2020.9140740.
[5] Boyi Li et al. "Benchmarking single-image dehazing and beyond". In: IEEE Transactions on Image Processing 28.1 (2018), pp. 492-505.
[6] Khang Nguyen et al. "Detecting Objects from Space: An Evaluation of Deep-Learning Modern Approaches". In: Electronics 9.4 (2020), p. 583.
[7] Mario Pavlic, Gerhard Rigoll, and Slobodan Ilic. "Classification of images in fog and fog-free scenes for use in vehicles". In: 2013 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2013, pp. 481-486.
[8] Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, and Huizhu Jia. "FFA-Net: Feature fusion attention network for single image dehazing". In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. 07. 2020, pp. 11908-11915.
[9] Christos Sakaridis, Dengxin Dai, and Luc Van Gool. "Semantic foggy scene understanding with synthetic data". In: International Journal of Computer Vision 126.9 (2018), pp. 973-992.
[10] Rita Spinneker, Carsten Koch, Su-Birm Park, and Jason Jeongsuk Yoon. "Fast fog detection for camera based advanced driver assistance systems". In: 17th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, 2014, pp. 1369-1374.
[11] Jean-Philippe Tarel, Nicolas Hautiere, Aurélien Cord, Dominique Gruyer, and Houssam Halmaoui. "Improved visibility of road scene images under heterogeneous fog". In: 2010 IEEE Intelligent Vehicles Symposium. IEEE, 2010, pp. 478-485.
[12] Jean-Philippe Tarel et al. "Vision enhancement in homogeneous and heterogeneous fog". In: IEEE Intelligent Transportation Systems Magazine 4.2 (2012), pp. 6-20.
[13] Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, and Dahua Lin. "Region Proposal by Guided Anchoring". In: IEEE Conference on Computer Vision and Pattern Recognition. 2019.
[14] Yue Wu et al. "Rethinking Classification and Localization for Object Detection". In: arXiv preprint arXiv:1904.06493 (2019).
