VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE & ENGINEERING

BACHELOR THESIS
Study and Improve Few-shot Learning Techniques in Computer Vision Application

Major: Computer Engineering
Council: Computer Engineering
Supervisors: Dr. Le Thanh Sach, Dr. Nguyen Ho Man Rang
Reviewer: Dr. Nguyen Duc Dung
Student: Nguyen Duc Khoi (1752302)
HO CHI MINH CITY, 8/2021

THESIS TASK SHEET (translated from Vietnamese)
Student: NGUYEN DUC KHOI — Student ID: 1752302 — Major: Computer Engineering
Thesis title:
EN: A study on few-shot learning for computer vision applications
VN: Nghiên cứu cải tiến kỹ thuật học với số mẫu làm nhãn cho ứng dụng thị giác máy tính
Tasks (required content and initial data):
• Study deep learning and review the literature on few-shot learning;
• Propose a learning technique for training deep models (in computer vision) with popular datasets available on the Internet;
• Apply few-shot learning to a computer vision application, from training and tuning to deploying the trained model on embedded systems supported by NVIDIA's technologies.
Date of assignment: 01/01/2021. Date of completion: 01/08/2021.
Supervisors: 1) Le Thanh Sach (co-supervisor); 2) Nguyen Ho Man Rang (co-supervisor).
The content and requirements of the thesis have been approved by the department.

THESIS DEFENSE EVALUATION SHEET — for the supervisor (translated from Vietnamese), dated 09/08/2021
Student: NGUYEN DUC KHOI — Student ID: 1752302 — Major: Computer Engineering
Topic: EN: A study on few-shot learning for computer vision applications. VN: Nghiên cứu cải tiến kỹ thuật học với số mẫu làm nhãn cho ứng dụng thị giác máy tính
Supervisor: Dr. Le Thanh Sach
Main strengths of the thesis:
• The author masters the techniques required for designing deep learning models and for training, tuning, and deploying models to GPU cards with NVIDIA's technologies.
• The thesis comprises a scientific task and an engineering task related to deep learning:
(a) Science: improve a selected few-shot learning technique for computer vision. The author proposed an idea based on episodic training and dense convolution. The proposal was evaluated on the popular datasets reserved for this research field and gains some improvements. The results have been submitted to an international conference and await the reviewers' conclusions.
(b) Engineering: apply few-shot learning to train a model for a selected computer vision task and deploy it to an embedded GPU system. The author selected the application "drowsiness detection", used few-shot learning to train YOLOv5, and successfully deployed the trained model to an NVIDIA Jetson TX2. The demo application runs and detects drowsiness
live.
Main shortcomings of the thesis:
• The publication was not yet available at the time of the defense, as planned.
Recommendation: Eligible to defend.
Overall assessment (in words: excellent, good, average): 10 (ten).
Signed: Le Thanh Sach

THESIS DEFENSE EVALUATION SHEET — for the supervisor/reviewer (translated from Vietnamese), dated 01/08/2021
Student: Nguyen Duc Khoi — Student ID: 1752302 — Major: Computer Engineering
Topic: Research and Apply Few-shot Learning Techniques in Drowsiness Detection
Reviewer: Nguyen Duc Dung
Main strengths of the thesis: The thesis focuses on detecting drowsiness from the human face using deep learning approaches. The team proposed using ResNet blocks instead of the standard convolutional blocks in the YOLOv5 network to improve detection accuracy. The team also deployed the model to an embedded system (Jetson TX2) for real-time performance. The results show some improvement in detection accuracy.
Main shortcomings of the thesis: Replacing convolutional blocks with ResNet blocks has been in use for a while, which makes this contribution a bit weak. The drowsiness detection problem could also be solved by other vision techniques, which can be very fast and real-time; the choice of the current approach is biased and should be reconsidered in the future. The few-shot learning scheme is not clearly connected to the main topic under discussion.
Recommendation: Eligible to defend.
Questions the student must answer before the council:
a. Why don't you use other vision algorithms to detect drowsiness, even if they would give much better performance compared to YOLO?
b. Explain why few-shot learning matters; the discussion needs to be improved.
Overall assessment (in words: excellent, good, average): Excellent. Score: 10/10.
Signed: Nguyen Duc Dung

Declaration
We hereby declare that this thesis, titled "Research and Apply Few-shot Learning Techniques in Computer Vision Application", and the work presented in it are our own. We confirm that:
• This work was done wholly or mainly while in candidature for a degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where we have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely our own work.
• We have acknowledged all main sources of help.
• Where the thesis is based on work done by ourselves jointly with others, we have made clear exactly what was done by others and what we have contributed ourselves.

Acknowledgments
First and foremost, I am tremendously grateful to my advisers, Dr. Sach Le Thanh and Dr. Rang Nguyen Ho Man, for their continuous support and guidance throughout my project, and for giving me the freedom to work on a variety of problems. Second, I take this opportunity to express my gratitude to all members of the Faculty of Computer Science and Engineering for their help and support. I also thank my parents for their unceasing encouragement, support, and attention.

Abstract
Artificial intelligence for driving is receiving increasing attention, and drowsiness detection is one of the smaller tasks that improve the driving experience: a drowsiness detector can detect and warn drivers when they fall asleep, preventing accidents caused by drivers' drowsiness. A simple approach is to treat drowsiness detection as an object detection problem. In this thesis, we adopt a powerful object detector called YOLOv5, one of the most popular publicly released object detection frameworks. In our experiments, the YOLOv5 framework achieves excellent detection performance with abundant supervised data. In terms of speed, we deploy the trained model to the Jetson TX2 using TensorRT, which significantly outperforms the released PyTorch implementation. In practice, we cannot always access an abundant amount of labeled data, and a limited number of training examples can lead to severely deficient performance, as shown in our experiments. We propose to pretrain the model with other datasets to improve overall performance without introducing any computational inference cost. We introduce a pretraining method from few-shot learning that achieves state-of-the-art results on widely used few-shot learning benchmarks, and we conduct extensive experiments with several pretraining methods to analyze their transfer performance to object detection tasks.

Contents
1 Introduction
  1.1 Motivation
  1.2 The Scope of the Thesis
  1.3 Organization of the Thesis
2 Foundations
  2.1 Probabilities and Statistics Basics
    2.1.1 Random Variables
    2.1.2 Probability Distributions
    2.1.3 Discrete Random Variables - Probability Mass Function
    2.1.4 Continuous Random Variables - Probability Density Function
    2.1.5 Marginal Probability
    2.1.6 Conditional Probability
    2.1.7 Expectation and Variance
    2.1.8 Sample
    2.1.9 Confidence Intervals
  2.2 Machine Learning Basics
    2.2.1 Supervised Learning
    2.2.2 Unsupervised Learning
    2.2.3 Semi-supervised Learning
  2.3 Few-shot Learning
  2.4 Object Detection
3 Related Work
  3.1 Few-shot Learning
    3.1.1 Meta-Learning
    3.1.2 Metrics-Learning
    3.1.3 Boosting Few-shot Visual Learning with Self-supervision
  3.2 Object Detection
    3.2.1 Two-stage Detectors
    3.2.2 One-stage Detectors
4 Methods
  4.1 Problem Formulation
  4.2 Bag of Freebies
  4.3 A Strong Baseline for Few-Shot Learning
    4.3.1 Joint Training of Episodic and Standard Supervised Strategies
    4.3.2 Revisiting Pooling Layer
  4.4 YOLOv5
    4.4.1 YOLOv5 Architecture
    4.4.2 ResNet-50-YOLOv5
5 Experiments
  5.1 Datasets
  5.2 Results of Training ResNet-50-YOLOv5 from Scratch with Abundant Annotations
    5.2.1 Implementation Details
    5.2.2 Quantitative Results
    5.2.3 Qualitative Results
  5.3 Performance of Deploying ResNet-50-YOLOv5 with TensorRT
    5.3.1 Comparison between TensorRT and PyTorch
    5.3.2 Effect of Image Resolution on Performance
  5.4 Results of Baseline on Few-shot Benchmarks
    5.4.1 Implementation Details
    5.4.2 Results
  5.5 Results of Training ResNet-50-YOLOv5 with Limited Annotations
    5.5.1 Results of Training ResNet-50-YOLOv5 from Scratch with Limited Annotations
    5.5.2 Results of Training Pretrained ResNet-50-YOLOv5 with Limited Annotations
6 Conclusion
7 Appendix
  7.1 Network Architecture Terminology
  7.2 Jetson TX2
    7.2.1 Jetson TX2 Developer Kit
    7.2.2 JetPack SDK
  7.3 TensorRT
    7.3.1 Developing and Deploying with TensorRT

Figure 5.6: Precision of mini-ImageNet-pretrained ResNet-50-YOLOv5 on the validation set (moving average smoothed with coefficient 0.9).
Figure 5.7: Recall of mini-ImageNet-pretrained ResNet-50-YOLOv5 on the validation set (moving average smoothed with coefficient 0.9).
Figure 5.8: mAP@0.5 of mini-ImageNet-pretrained ResNet-50-YOLOv5 on the validation set (moving average smoothed with coefficient 0.9).
Figure 5.9: mAP@0.5:0.95 of mini-ImageNet-pretrained ResNet-50-YOLOv5 on the validation set (moving average smoothed with coefficient 0.9).

Pretraining method     Precision  Recall  mAP@0.5  mAP@0.5:0.95
None                   0.454      0.673   0.594    0.442
Standard Supervised    0.643      0.714   0.619    0.476
Barlow Twins [41]      0.561      0.364   0.513    0.345
SWAV [5]               0.586      0.506   0.522    0.365

Table 5.8: Performance of ImageNet-pretrained ResNet-50-YOLOv5: precision, recall, mAP@0.5, and mAP@0.5:0.95 on the testing set.

Pretrain Backbone on ImageNet
We also consider pretraining the backbone on ImageNet. We pretrain the backbone with standard supervised training, Barlow Twins [41], and SWAV [5]. While standard supervised training requires labeled samples, the other two pretrain the model in an unsupervised manner. The ImageNet-pretrained ResNet-50 from standard supervised training is provided through torchvision; the Barlow Twins and SWAV pretrained ResNet-50 models are both provided through the torch hub. Due to the lack of computational resources, we do not apply our baseline to ImageNet.
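For concreteness, the following is a minimal sketch of how these three pretrained backbones can be loaded; the torch hub repository names are assumptions based on the official Barlow Twins and SwAV releases and are not specified in this thesis.

import torch
import torchvision

# Standard supervised ImageNet weights shipped with torchvision.
supervised = torchvision.models.resnet50(pretrained=True)

# Self-supervised weights from the torch hub; the repository names below
# are assumed from the public facebookresearch releases.
barlow_twins = torch.hub.load('facebookresearch/barlowtwins:main', 'resnet50')
swav = torch.hub.load('facebookresearch/swav:main', 'resnet50')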
Evaluations on the validation set during training are shown in Figures 5.10, 5.11, 5.12, and 5.13; results on the testing set are shown in Table 5.8. Pretraining the backbone with standard supervised training still gives the best precision, recall, mAP@0.5, and mAP@0.5:0.95, while pretraining with Barlow Twins or SWAV gives no performance gain on the testing set. Since our validation and training sets are both split from the original training data of the dataset, the two sets might share some properties; models pretrained with Barlow Twins, SWAV, or our baseline appear to overfit to the original training set. On the other hand, pretraining with standard supervised training generally performs well on both the validation and test sets.

Figure 5.10: Precision of ImageNet-pretrained ResNet-50-YOLOv5 on the validation set (moving average smoothed with coefficient 0.9).
Figure 5.11: Recall of ImageNet-pretrained ResNet-50-YOLOv5 on the validation set (moving average smoothed with coefficient 0.9).
Figure 5.12: mAP@0.5 of ImageNet-pretrained ResNet-50-YOLOv5 on the validation set (moving average smoothed with coefficient 0.9).
Figure 5.13: mAP@0.5:0.95 of ImageNet-pretrained ResNet-50-YOLOv5 on the validation set (moving average smoothed with coefficient 0.9).

Chapter 6
Conclusion

YOLOv5 is a strong object detection framework. Experiments on our dataset demonstrate that training YOLOv5 with abundant supervised data achieves remarkable performance; however, when the number of training samples is reduced, YOLOv5 struggles to detect drowsiness. We propose to pretrain the backbone of the object detector with an external dataset. We introduce a novel few-shot learning baseline that achieves state-of-the-art results on popular benchmarks and adopt it to pretrain the model for the object detection task. Experiments showed that while our baseline significantly outperforms standard supervised training on few-shot benchmarks, it gives no performance gains in object detection. As a result, we suggest pretraining the backbone of the object detector with standard supervised training when a large amount of weakly supervised data is available. Finally, we deploy the trained model to the Jetson TX2 with TensorRT, optimizing speed performance.

Chapter 7
Appendix

7.1 Network architecture terminology
This section describes common layers used in defining object detection networks, with PyTorch-style pseudocode for each.

Focus
A focus block first generates four sparsely subsampled versions of the input and concatenates them into a single tensor. A convolutional layer then further transforms the tensor.

import torch
import torch.nn as nn

class Focus(nn.Module):
    # Focus width-height information into channel space.
    # Conv and autopad are defined in the next listing.
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)

    def forward(self, x):
        # Concatenate the four pixel-parity slices along the channel axis.
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))

Algorithm 1: PyTorch-style pseudocode for the Focus layer.

Conv
A conv block consists of a convolutional layer, a batch normalization layer, and an activation function. It has two forward modes, i.e., with or without the batch normalization layer.

def autopad(k, p=None):
    # Default to 'same' padding when no explicit padding is given.
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p

class Conv(nn.Module):
    # Standard convolution: conv + batch norm + activation.
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        # Mode 1: with batch normalization.
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        # Mode 2: without batch normalization (after conv-bn fusion).
        return self.act(self.conv(x))

Algorithm 2: PyTorch-style pseudocode for the Conv layer.
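As a quick shape check (a hypothetical usage sketch, not part of the original listings), passing a dummy image through a Focus block shows how spatial resolution is traded for channels before the convolution maps to the requested width:

focus = Focus(c1=3, c2=32, k=3)   # uses Conv and autopad defined above
x = torch.randn(1, 3, 640, 640)   # dummy RGB input
y = focus(x)                      # four parity slices -> 12 channels -> conv
print(y.shape)                    # torch.Size([1, 32, 320, 320])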
Bottleneck
This block has one convolutional layer with a 1 × 1 kernel followed by another convolutional layer with a 3 × 3 kernel. The number of intermediate channels is typically smaller than that of the input. An optional shortcut adds the input to the output of the two convolutional layers.

class Bottleneck(nn.Module):
    # Standard bottleneck: 1x1 reduction followed by a 3x3 convolution.
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):
        # ch_in, ch_out, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))

Algorithm 3: PyTorch-style pseudocode for the Bottleneck layer.

Cross stage partial bottleneck
Cross stage partial (CSP) connections reduce gradient computation by splitting an input feature map into two halves: one half goes through a usual block, e.g., a dense block or a bottleneck block, while the other is skipped to the output. The splitting operation is implemented as a 1 × 1 convolutional layer. A CSP bottleneck uses bottleneck blocks as its usual block.

class BottleneckCSP(nn.Module):
    # CSP bottleneck: one branch through n bottlenecks, the other
    # skipped to the output via a 1x1 convolution.
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(cv2, cv3)
        self.act = nn.LeakyReLU(0.1, inplace=True)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        y1 = self.cv3(self.m(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), dim=1))))

Algorithm 4: PyTorch-style pseudocode for the CSP bottleneck layer.

Spatial pyramid pooling
Spatial pyramid pooling (SPP) aggregates multi-scale context from a feature map. It first transforms the input with a 1 × 1 convolutional layer, then concatenates the outputs of a series of max pooling layers with increasing kernel sizes. A final convolutional layer maps the result to the desired number of channels.

class SPP(nn.Module):
    # Spatial pyramid pooling over kernel sizes k.
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))

Algorithm 5: PyTorch-style pseudocode for the SPP layer.
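To see how these blocks compose, here is a minimal sketch with hypothetical channel sizes (the actual ResNet-50-YOLOv5 configuration is described in Chapter 4): a CSP bottleneck followed by SPP remaps channels while preserving spatial size.

x = torch.randn(1, 64, 80, 80)     # dummy feature map
csp = BottleneckCSP(64, 128, n=2)  # two stacked bottlenecks on one branch
spp = SPP(128, 256)                # pooling kernels 5, 9, 13
y = spp(csp(x))
print(y.shape)                     # torch.Size([1, 256, 80, 80])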
7.2 Jetson TX2
Jetson TX2 is a fast, power-efficient embedded device released by NVIDIA that provides a high-level AI solution for edge computing. The specifications of the Jetson TX2 are as follows:
• GPU: 256-core NVIDIA Pascal™ GPU architecture with 256 NVIDIA CUDA cores
• CPU: Dual-core NVIDIA Denver 64-bit CPU and quad-core Arm® Cortex®-A57 MPCore
• Memory: 8 GB 128-bit LPDDR4, 1866 MHz, 59.7 GB/s
• Storage: 32 GB eMMC 5.1
• Power: 7.5 W / 15 W

7.2.1 Jetson TX2 Developer Kit
The Jetson TX2 module is included in the Jetson TX2 Developer Kit, which provides an easy and compact way to get hands-on with the hardware and software of the Jetson TX2. The board ships with a Linux development environment pre-installed. Figure 7.1 shows the kit's components. Included in the box are:
• Jetson TX2 module
• Reference carrier board
• Power supply with AC cord
• USB Micro-B to USB A cable
• USB Micro-B to female USB A cable
• (2x) WLAN/Bluetooth antenna

Figure 7.1: The Jetson TX2 Developer Kit is a compact way to get started with the hardware and software of the Jetson TX2 module.

7.2.2 JetPack SDK
JetPack SDK, provided by NVIDIA, is a complete package for AI developers working with Jetson products. It consists of the latest OS images, libraries and APIs, samples, developer tools, and documentation.

Components
Summary of JetPack components:
• OS image
• Libraries and APIs
• Sample applications
• Developer tools
The set of libraries and APIs includes TensorRT, cuDNN, CUDA, VisionWorks, OpenCV, etc. For more details about JetPack's components, please refer to https://docs.nvidia.com/jetson/jetpack/index.html.

Installation
An easy way to install JetPack on your Jetson TX2 is to use NVIDIA Software Development Kit (SDK) Manager, which provides a complete solution for installing environments on NVIDIA hardware development platforms. Figure 7.2 shows how to install packages on your NVIDIA device via a host machine with NVIDIA SDK Manager.

Figure 7.2: NVIDIA SDK Manager.

NVIDIA SDK Manager requires a host machine; refer to the NVIDIA SDK Manager System Requirements for details. Follow the steps below to get your host machine ready:
• Download NVIDIA SDK Manager from https://developer.nvidia.com/nvidia-sdk-manager to your host machine.
• Install NVIDIA SDK Manager on your host machine: from your terminal, install the Debian package with: sudo apt install ./sdkmanager_[version]-[build#]_amd64.deb
• Launch SDK Manager from the Ubuntu launcher or from your terminal with: sdkmanager
• Log in and run SDK Manager. See https://docs.nvidia.com/sdk-manager/download-run-sdkm/index.html for more details.
Connect your Jetson TX2 to your host machine and follow the tutorial at https://docs.nvidia.com/sdk-manager/install-with-sdkm-jetson/index.html to set up the development environment on your Jetson TX2.

7.3 TensorRT
TensorRT is a C++ library that provides a solution for high-performance inference on NVIDIA GPUs. More specifically, a trained model written in a general-purpose framework, e.g., TensorFlow or PyTorch, can be optimized and deployed with TensorRT to boost inference performance.

7.3.1 Developing and Deploying with TensorRT
Typically, there are three phases for developing and deploying a deep learning model: training, developing a deployment solution, and deploying. At training time, the model is usually written in a general-purpose deep learning library such as TensorFlow or PyTorch. These libraries are designed to maximize flexibility in designing model architectures, training algorithms, etc., and they provide APIs in high-level programming languages, e.g., Python, which gives access to a large number of libraries for different purposes; a deep learning model may also be a small part of a larger system. In the second phase, one uses TensorRT to compile the trained model into an inference engine. Engine building can take considerable time, so the engine is usually serialized to a plan file for later use. Note that a plan file is not portable across platforms or TensorRT versions and is specific to the exact GPU model it was built on. In the final phase, the plan file is deserialized into an engine before running any inference. TensorRT is usually used asynchronously, so one constructs two buffers, for input and output respectively, and input data is enqueued into the input buffer.
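As a concrete sketch of this build-then-deploy flow (assuming an ONNX export of the trained model; the file paths are hypothetical, and the Python API shown matches the TensorRT 7.x generation shipped with JetPack — exact calls differ across TensorRT versions):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_plan(onnx_path="model.onnx", plan_path="model.plan"):
    # Phase 2: compile the trained model into a serialized inference engine.
    builder = trt.Builder(TRT_LOGGER)
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30             # 1 GiB of layer workspace
    engine = builder.build_engine(network, config)  # slow step; do this offline
    with open(plan_path, "wb") as f:
        f.write(engine.serialize())  # plan is specific to this GPU and TRT version

def load_engine(plan_path="model.plan"):
    # Phase 3: deserialize the plan into an engine before running inference.
    runtime = trt.Runtime(TRT_LOGGER)
    with open(plan_path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())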
Bibliography
[1] Kelsey Allen, Evan Shelhamer, Hanul Shin, and Joshua Tenenbaum. Infinite mixture prototypes for few-shot learning. In International Conference on Machine Learning, pages 232–241. PMLR, 2019.
[2] Antreas Antoniou, Harrison Edwards, and Amos Storkey. How to train your MAML. arXiv preprint arXiv:1810.09502, 2018.
[3] Luca Bertinetto, Joao F. Henriques, Philip H.S. Torr, and Andrea Vedaldi. Meta-learning with differentiable closed-form solvers. arXiv preprint arXiv:1805.08136, 2018.
[4] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.
[5] Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882, 2020.
[6] Guneet S. Dhillon, Pratik Chaudhari, Avinash Ravichandran, and Stefano Soatto. A baseline for few-shot image classification. arXiv preprint arXiv:1909.02729, 2019.
[7] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.
[8] Chelsea Finn and Sergey Levine. Meta-learning: From few-shot learning to rapid reinforcement learning. In ICML, 2019.
[9] Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, and Matthieu Cord. Boosting few-shot visual learning with self-supervision. In Proceedings of the IEEE International Conference on Computer Vision, pages 8059–8068, 2019.
[10] Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4367–4375, 2018.
[11] Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.
[12] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[14] Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, and Xilin Chen. Cross attention network for few-shot classification. arXiv preprint arXiv:1910.07677, 2019.
[15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25:1097–1105, 2012.
[16] Yann LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
[17] Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, and Stefano Soatto. Meta-learning with differentiable convex optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10657–10665, 2019.
[18] Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-SGD: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835, 2017.
[19] Yann Lifchitz, Yannis Avrithis, Sylvaine Picard, and Andrei Bursuc. Dense classification and implanting for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9258–9267, 2019.
[20] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer, 2016.
[21] Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sung Ju Hwang, and Yi Yang. Learning to propagate labels: Transductive propagation network for few-shot learning. In International Conference on Learning Representations, 2018.
[22] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
[23] Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. TADAM: Task dependent adaptive metric for improved few-shot learning. Advances in Neural Information Processing Systems, 31:721–731, 2018.
[24] Limeng Qiao, Yemin Shi, Jia Li, Yaowei Wang, Tiejun Huang, and Yonghong Tian. Transductive episodic-wise adaptive metric for few-shot learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3603–3612, 2019.
[25] Siyuan Qiao, Chenxi Liu, Wei Shen, and Alan L. Yuille. Few-shot image recognition by predicting parameters from activations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7229–7238, 2018.
[26] Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. 2016.
[27] Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto. Few-shot learning with embedded class models and shot-free meta training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 331–339, 2019.
[28] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[29] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7263–7271, 2017.
[30] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
[31] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497, 2015.
[32] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[33] Andrei A. Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. arXiv preprint arXiv:1807.05960, 2018.
[34] Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In International Conference on Machine Learning, pages 1842–1850, 2016.
[35] Adam Santoro, David Raposo, David G. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems, pages 4967–4976, 2017.
[36] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pages 4077–4087, 2017.
[37] Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H.S. Torr, and Timothy M. Hospedales. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1199–1208, 2018.
[38] Yonglong Tian, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, and Phillip Isola. Rethinking few-shot image classification: A good embedding is all you need? arXiv preprint arXiv:2003.11539, 2020.
[39] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630–3638, 2016.
[40] Han-Jia Ye, Hexiang Hu, De-Chuan Zhan, and Fei Sha. Few-shot learning via embedding adaptation with set-to-set functions. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8805–8814. IEEE, 2020.
[41] Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow Twins: Self-supervised learning via redundancy reduction. arXiv preprint arXiv:2103.03230, 2021.
[42] Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 13001–13008, 2020.