
Research report topic face mask detection major computer engineering


HCM UNIVERSITY OF TECHNOLOGY AND EDUCATION (HCMUTE)
FACULTY FOR HIGH-QUALITY TRAINING
DEPARTMENT OF COMPUTER AND COMMUNICATIONS ENGINEERING

RESEARCH REPORT
TOPIC: FACE MASK DETECTION
MAJOR: COMPUTER ENGINEERING

Group 10:
Phạm Minh Quân 18161031
Nguyễn Hoài Phương Uyên 18119053
Supervisor: Dr. Trương Ngọc Sơn

Ho Chi Minh City, Sunday, November 28, 2021

INSTRUCTOR'S COMMENT TABLE
No. | Implementation content | Comment
General comment:

CONTENTS
LIST OF PICTURES
LIST OF TABLES
ABBREVIATIONS
CHAPTER 1: INTRODUCTION
1.1 Introduction
1.2 Topic goal
1.3 Topic limit
1.4 Research Method
1.5 Object and Scope of Study
1.6 Report book layout
CHAPTER 2: THEORY
2.1 Overview
2.2 Architecture of Yolo
2.2 Yolo's output
2.2.1 Predict on feature map
2.2.2 Anchor Box
2.2.3 Loss Function
2.3 Prediction on the bounding box
2.3.1 Non-max suppression
2.4 YOLOv5 Architecture
2.5 Face Mask Detection
CHAPTER 3: DESIGN SOFTWARE
3.1 THE ACTIVE FUNCTION OF SOFTWARE
3.1.1 Data Collection:
3.2 The training processing
3.2.1 Start training processing
CHAPTER 4: RESULTS
CHAPTER 5: CONCLUSION AND DEVELOPMENTS
5.1 CONCLUSION
5.2 DEVELOPMENTS
APPENDIX
REFERENCES

LIST OF PICTURES
Image 2.1: YOLO's Architecture
Image 2.2: The layers in the Darknet-53 network
Image 2.3: How YOLO operates
Image 2.4: The architecture of YOLO's output
Image 2.5: Feature maps in YOLOv3 with a 416x416 input; the output feature maps are 13x13, 26x26, and 52x52
Image 2.6: Identify anchor box of an object
Image 2.7: The algorithm that determines the class for a cell
Image 2.8: The formula estimates bounding box from anchor box
Image 2.9: Non-max suppression. The initial bounding boxes are reduced to the final bounding box
Image 3.1: Use roboflow.ai to create a dataset and augmentation method
Image 3.2: Clone repository and set up all dependencies in YOLOv5
Image 3.3 + 3.4: Use URL path to link directly to dataset in roboflow.ai
Image 3.5: Dataset is contained in content's folder
Image 3.6: Figure of data.yaml file
Image 3.7: Download the model to train
Image 3.8: Figure of training process
Image 3.9: Display results after training process
Image 3.10: Figure of detecting process
Image 4.1: Results of training process
Image 4.2: Results of detecting process

LIST OF TABLES

ABBREVIATIONS
CNN: Convolutional Neural Network
ReLU: Rectified Linear Unit
YOLO: You Only Look Once
SSD: Single Shot Detector
IoU: Intersection over Union
CSPNet: Cross Stage Partial Network
PANet: Path Aggregation Network
FPN: Feature Pyramid Network
OpenCV: Open Source Computer Vision Library

CHAPTER 1: INTRODUCTION
1.1 Introduction
On March 11, 2020, the World Health Organization (WHO) declared COVID-19 a global pandemic.
To prevent the rapid spread of the pandemic, and in addition to the WHO's recommendation to wear masks in crowded places, the Government of Vietnam has required people to wear masks in public areas. However, monitoring compliance with the Government's instructions using traditional methods is challenging and expensive because resources are limited. To support and improve the monitoring and reminding of people, our team builds a program that automatically detects people who are not wearing masks in real time.

Today, artificial intelligence (AI) is increasingly popular and is profoundly changing many aspects of daily life. Computer vision (CV) is an important area of AI that covers acquiring, processing, analyzing, and recognizing digital images. Deep learning is a field that studies algorithms and computer programs that let computers learn and make predictions as humans do. It is applied in science, engineering, and other areas of life, including classification and object detection. A typical example is the convolutional neural network (CNN), which is applied to automatic recognition and learns discriminative patterns from images by stacking layers on top of each other. In many applications, CNNs are now considered strong image classifiers and underpin computer vision technologies that rely on machine learning.

However, CNNs consume substantial resources such as bandwidth, memory, and hardware processing capacity to classify an object. To reduce this consumption, more and more algorithms and models have been introduced over time, including the YOLOv5 model for the recognition problem, which is applied here to the topic "Face mask detection."
1.2 Topic goal
• Apply basic knowledge about the process of training neural networks.
• Understand the theoretical and architectural basis of the YOLOv5 model for the object recognition problem.
• Build a model that can be trained on different face mask detection datasets (Kaggle's face mask detection dataset and a self-generated face mask detection dataset).
• Recognize faces with and without a mask.

1.3 Topic limit
• The dataset has a relatively small number of images, which affects accuracy.
• The system is a research prototype and has not yet been applied commercially.

1.4 Research Method
• Build on the knowledge learned about training a neural network.
• Collect documents and refer to previous related applications.
• Consult and follow the instructor's guidance.

1.5 Object and Scope of Study
Identify people who are wearing masks and people who are not wearing masks in the dataset.

1.6 Report book layout
The thesis has a total of five chapters:
• Chapter 1 - Overview: this chapter presents the issues that motivate the topic, together with the goals and limitations that the project team has set.
• Chapter 2 - Theoretical Basis: an introduction to the background knowledge, technology, and software used in the project, including image processing, neural network theory, and how to train a dataset in YOLOv5.
• Chapter 3 - System Design: the plan for using the sample set, the interpretation of the model's parameters, the training process, and the process of testing a face mask recognition system on the YOLOv5 platform.
• Chapter 4 - Results: the results of the training process and the recognition process.
• Chapter 5 - Conclusion and development direction.

2.2.2 Anchor Box
From cell i, we can identify the green-bordered anchor boxes shown in the figure. All of these anchor boxes intersect the object's bounding box. However, only the anchor box with the thickest blue border is selected as the anchor box for the object, because its IoU with the ground-truth bounding box is the highest.
• Each object in the training
image is assigned to the cell on the feature map that contains the object's midpoint. For example, the image in the figure is assigned to the red cell because its midpoint falls into this cell. From the cell, we define the anchor boxes surrounding the given image.

Image 2.7: The algorithm that determines the class for a cell

Thus, when defining an object, we need to identify the two components associated with it, the (cell, anchor box) pair, not just the cell or just the anchor box. In some cases, two objects have the same midpoint. Although this is very rare, both objects then map to the same cell, and the algorithm will find it very hard to determine the class for them.

2.2.3 Loss Function
After defining the information that the model needs to predict and the architecture of the CNN model, we define the error (loss) function. To calculate the error of the model, YOLO uses a squared-error function between the prediction and the label. The total error is the sum of the three sub-errors below:
• Classification loss: the error in predicting the object's class label.
• Localization loss: the error in predicting the position, width, and height of the bounding box.
• Confidence loss: the error in predicting whether a grid square contains an object.

We want the loss function to behave as follows:
• During training, the model looks at the squares that contain objects and increases the classification score of the object's correct class.
• Then, for the same square, it finds the best bounding box among the predicted boxes.
• It increases the localization score of that bounding box and adjusts the box parameters to approximate the label.
• For squares that do not contain an object, it lowers the confidence score, and the classification and localization scores of these squares are ignored.

2.3 Prediction on the bounding box
To predict the bounding box for an object, we rely on a transformation from the anchor box and the cell. YOLOv2 and YOLOv3
predict the bounding box so that it does not deviate too far from the cell's location. If the predicted bounding box could be placed anywhere in the image, as in a region proposal network, model training becomes unstable.

Given an anchor box of size $(p_w, p_h)$ at a cell on the feature map whose top-left corner is $(c_x, c_y)$, the model predicts four parameters $(t_x, t_y, t_w, t_h)$: the first two are the offsets from the top-left corner of the cell, and the last two are ratios relative to the anchor box. These parameters determine the predicted bounding box $b$ with center $(b_x, b_y)$ and size $(b_w, b_h)$ through the sigmoid function $\sigma$ and the exponential function, as in the formulas below:

$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w e^{t_w} \\
b_h &= p_h e^{t_h}
\end{aligned}
$$

In addition, because the coordinates are normalized by the width and height of the image, they must always lie within [0, 1]. Applying the sigmoid function keeps the offsets from exceeding these bounds.

Image 2.8: The formula estimates bounding box from anchor box

The outer dashed rectangle is the anchor box of size $(p_w, p_h)$. The coordinates of a bounding box are determined from both the anchor box and the cell it belongs to. This keeps the predicted bounding box close to the position of the cell and the anchor box, without going too far outside their bounds. Therefore, the training process is much more stable than in YOLOv1.

2.3.1 Non-max suppression
Because the YOLO algorithm predicts many bounding boxes on an image, boxes predicted from cells that are close to each other are very likely to overlap. In that case, YOLO needs non-max suppression to substantially reduce the number of boxes generated.

Image 2.9: Non-max suppression. The initial bounding boxes are reduced to the final bounding box

Steps of non-max suppression:
• Step 1: First, we reduce the
number of bounding boxes by filtering out all boxes whose probability of containing the object is below a certain threshold, usually 0.5.
• Step 2: Among intersecting bounding boxes, non-max suppression selects the box with the highest probability of containing the object and then computes its IoU with each of the remaining boxes. If the IoU is greater than the threshold, the two boxes overlap heavily. We remove the boxes with lower probability and keep the box with the highest probability. Finally, we obtain a unique bounding box for each object.

2.4 YOLOv5 Architecture
YOLOv5 builds on YOLOv1 through YOLOv4 and is a state-of-the-art, real-time object detector. It has achieved strong results on two official object detection datasets: Pascal VOC (Visual Object Classes) and Microsoft COCO (Common Objects in Context).

First, YOLOv5 integrates the cross stage partial network (CSPNet) into Darknet, creating CSPDarknet as the network's backbone. CSPNet addresses the problem of repeated gradient information in large-scale backbones by integrating gradient changes into the feature map, which reduces model parameters and FLOPs (floating-point operations), preserving inference speed and accuracy while also reducing model size.

Second, to improve information flow, YOLOv5 uses a path aggregation network (PANet) as its neck. PANet adopts a new feature pyramid network (FPN) topology with an enhanced bottom-up path to improve low-level feature propagation. At the same time, adaptive feature pooling, which connects the feature grid to all feature levels, ensures that useful information from each feature level reaches the following subnetwork. PANet improves the use of precise localization signals in lower layers, which can significantly improve the object's location accuracy.

Finally, YOLOv5's head, the YOLO layer, generates three
different sizes of feature maps (18 x 18, 36 x 36, 72 x 72) to provide multi-scale prediction, allowing the model to handle small, medium, and large objects.

2.5 Face Mask Detection
To detect whether a face is wearing a mask, we use a webcam through OpenCV to run in real time. Videos are made up of still images called frames, and face detection is carried out on every frame of the video. For face detection, we use a pre-trained YOLOv5 model. It is an algorithm for detecting objects in real time: it trains quickly and returns relatively good accuracy. Furthermore, it is designed to distinguish objects in a video or image. Here, we use it to classify whether a face is wearing a mask.

First, face mask detection needs many different images. We label the face frames in each image, then pass them to the model for training and obtain the results. Because the faces variable contains the height and width of the rectangle enclosing each face, as well as its top-left corner coordinates, it can be used to draw a face frame. The preprocessing approach is the same as the procedure for training the model in the second section. The next step is to draw a rectangle over the face and label it according to the prediction. In this way, we learned how to create a model that can detect masked faces and how to recognize faces in real time. Using this concept, a face detector can be turned into a mask detector.

As we know, deep learning has steadily advanced detection algorithms, and face detection is one of the important detection tasks. Detection is the first step of identity authentication and pattern recognition. Although YOLO and its variants are not as accurate as Faster R-CNN, they outperform their counterparts by a wide margin in terms of speed. When dealing with standard-sized objects, YOLO performs admirably; however, it has difficulty detecting small
objects. When dealing with faces whose scale varies widely, the accuracy drops significantly. To overcome this, we choose Ultralytics' open-source object detection method YOLOv5, because it can detect small-scale objects and hence improves face detection performance. The technique uses anchor boxes that are better suited to face detection and a more accurate regression loss function. The enhanced detector improves accuracy significantly while maintaining a fast detection speed.

CHAPTER 3: DESIGN SOFTWARE
3.1 THE ACTIVE FUNCTION OF SOFTWARE
3.1.1 Data Collection:
To prepare for the labeling and training process, we use a dataset of 120 images of people wearing and not wearing masks. Each image is given a label (mask or nomask). Then, we use the roboflow.ai software to apply augmentation to the dataset. Because the dataset is small, we need to add more images to it in order to achieve higher accuracy. The augmentation methods we used include saturation, noise, bounding box crop, and bounding box brightness. Each original image yields several augmented images. Diversifying the input images increases accuracy.

Image 3.1: Use roboflow.ai to create a dataset and augmentation method

3.2 The training processing
We use the Google Colab platform to carry out the training process for face mask detection, with YOLOv5 as the model. We begin by cloning the YOLOv5 repository and setting up the dependencies required to run YOLOv5.

Image 3.2: Clone repository and set up all dependencies in YOLOv5

Next, we link the dataset in roboflow.ai through a URL.

Image 3.3 + 3.4: Use URL path to link directly to dataset in roboflow.ai

The following image shows the dataset once it is fully linked in the content folder.

Image 3.5: Dataset is contained in content's folder

Now, the class labels for the images are declared in the data.yaml file.

Image 3.6: Figure of data.yaml file

In this
step, we download the pre-trained weights to the server for the training process.

Image 3.7: Download the model to train

3.2.1 Start training processing
The model is trained for 50 epochs on the classes of the dataset (mask and nomask).

Image 3.8: Figure of training process

Image 3.9: Display results after training process

This code displays the results after the model has been trained on the dataset. Then, we run the detect script with the previously trained weights to label the images and report the confidence of each detection.

Image 3.10: Figure of detecting process

CHAPTER 4: RESULTS
Here is an image of the results after the training process.

Image 4.1: Results of training process

Here is an image of the results after the detection process.

Image 4.2: Results of detecting process

CHAPTER 5: CONCLUSION AND DEVELOPMENTS
5.1 CONCLUSION
The model detects whether a face is wearing a mask. Furthermore, it can detect masks of different colors. However, the number of images in the dataset is still small, so the accuracy is not high. If multiple faces appear in the same image, the model misses some of them, so the results are less accurate than when detecting a single face in an image.

5.2 DEVELOPMENTS
In the future, we hope to develop the model so that it detects multiple faces at the same time. In addition, we want to create an app that integrates the face mask detection model so that the system can be used to identify people with or without masks in public.

APPENDIX
    # Clone the YOLOv5 repository and install its dependencies (Google Colab cell syntax)
    !git clone https://github.com/ultralytics/yolov5.git
    !pip install -r /content/yolov5/requirements.txt

    # Download and unpack the dataset exported from roboflow.ai
    %cd /content
    !curl -L "https://app.roboflow.com/ds/BmeW3iHGqh?key=hnPs1Fo6Ew" > roboflow.zip; unzip roboflow.zip; rm roboflow.zip

    # Inspect the dataset configuration and record the image folders
    %cd /content/
    %cat data.yaml
    train_path = '/content/train/images'
    valid_path = '/content/valid/images'
    test_path = '/content/test/images'

    # Download the pre-trained YOLOv5s weights
    !wget https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt

    # Train for 50 epochs at 416x416; the batch size value is not given in the report
    !python /content/yolov5/train.py --img 416 --batch <batch_size> --epochs 50 --data '/content/data.yaml' --weights /content/yolov5s.pt --nosave --cache

    # display
    # result images
    import glob
    from IPython.display import Image, display
    for imageName in glob.glob('/content/yolov5/runs/train/exp2/*.jpg'):  # assuming JPG
        display(Image(filename=imageName))
        print("\n")

    # Run detection on the test images with the weights from training
    !python /content/yolov5/detect.py --source /content/test/images --weights /content/yolov5/runs/train/exp2/weights/last.pt --img 416 --save-txt --save-conf

    # display result images
    import glob
    from IPython.display import Image, display
    for imageName in glob.glob('/content/yolov5/runs/detect/exp2/*.jpg'):  # assuming JPG
        display(Image(filename=imageName))
        print("\n")

REFERENCES
[1] YOLO Architecture
[2] Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios
[3] How to train YOLOv5 on a custom dataset
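
Supplementary sketch for Section 2.2.2 (Anchor Box). The section describes two choices made for every training object: the cell that contains the object's midpoint, and the anchor box with the highest IoU against the ground-truth box. The Python sketch below illustrates both. The function names, the normalized [0, 1) coordinates, and the convention of comparing shapes as if box and anchor shared the same center are assumptions made for illustration; they are not taken from the report or from the YOLOv5 code base.

```python
from typing import List, Tuple


def cell_for_midpoint(x_center: float, y_center: float, grid_size: int) -> Tuple[int, int]:
    """Return (column, row) of the grid cell that contains a normalized midpoint."""
    # Clamp so that a midpoint exactly on the right or bottom edge stays inside the grid.
    col = min(int(x_center * grid_size), grid_size - 1)
    row = min(int(y_center * grid_size), grid_size - 1)
    return col, row


def shape_iou(box_wh: Tuple[float, float], anchor_wh: Tuple[float, float]) -> float:
    """IoU of two boxes compared by width and height only (assumed to share a center)."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


def best_anchor(box_wh: Tuple[float, float], anchors: List[Tuple[float, float]]) -> int:
    """Index of the anchor whose shape has the highest IoU with the ground-truth box."""
    return max(range(len(anchors)), key=lambda i: shape_iou(box_wh, anchors[i]))


if __name__ == "__main__":
    # Hypothetical object: midpoint at the image center, 13x13 feature map.
    print(cell_for_midpoint(0.5, 0.5, 13))
    # Hypothetical anchors given as (width, height) fractions of the image.
    print(best_anchor((0.2, 0.4), [(0.1, 0.1), (0.2, 0.5), (0.6, 0.3)]))
```

The object is then represented by the (cell, anchor box) pair returned by these two functions, which matches the pairing described in the section.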
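
Supplementary sketch for Section 2.3 (Prediction on the bounding box). The formulas in that section map the raw network outputs (tx, ty, tw, th) to a box center and size using the cell corner (cx, cy) and the anchor size (pw, ph). The short Python function below is a direct transcription of those four formulas, written only to make the arithmetic concrete. The function name and the tuple-based interface are hypothetical, and the units are grid cells, as in the YOLOv2 and YOLOv3 formulation described in the section.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    """Logistic function; maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(
    t: Tuple[float, float, float, float],
    cell: Tuple[float, float],
    anchor: Tuple[float, float],
) -> Tuple[float, float, float, float]:
    """Turn raw predictions (tx, ty, tw, th) into a box (bx, by, bw, bh)."""
    tx, ty, tw, th = t
    cx, cy = cell      # top-left corner of the cell on the feature map
    pw, ph = anchor    # anchor box width and height
    bx = sigmoid(tx) + cx   # the center is offset from the cell corner by less than one cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # the size is a positive multiple of the anchor size
    bh = ph * math.exp(th)
    return bx, by, bw, bh


if __name__ == "__main__":
    # With all-zero predictions the box sits at the cell center and has the anchor's size.
    print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), anchor=(2.0, 5.0)))
```

Because the sigmoid output stays between 0 and 1, the decoded center cannot leave the cell that predicts it, which is the stabilizing property the section attributes to this parameterization.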
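
Supplementary sketch for Section 2.3.1 (Non-max suppression). The two steps listed in that section can be written as a short pure-Python routine: drop boxes whose score is below a threshold, then repeatedly keep the highest-scoring box and discard the remaining boxes that overlap it too much. This is a minimal sketch, not the routine YOLOv5 itself uses. The function names, the (x1, y1, x2, y2) corner format, and the default IoU threshold of 0.5 are assumptions; only the score threshold of 0.5 comes from the report.

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Sequence[float]],
    scores: List[float],
    score_threshold: float = 0.5,  # Step 1 threshold, as stated in the report
    iou_threshold: float = 0.5,    # assumed value; the report does not give one
) -> List[int]:
    """Return the indices of the boxes that survive non-max suppression."""
    # Step 1: discard boxes with a low probability of containing an object.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)

    kept: List[int] = []
    # Step 2: keep the best remaining box and drop the boxes that overlap it heavily.
    while candidates:
        best = candidates.pop(0)
        kept.append(best)
        candidates = [i for i in candidates if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept


if __name__ == "__main__":
    # Hypothetical detections: two overlapping boxes, one separate box, one weak box.
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
    demo_scores = [0.9, 0.8, 0.7, 0.3]
    print(non_max_suppression(demo_boxes, demo_scores))
```

Sorting by score once and filtering the remainder on each pass keeps the logic close to the two steps in the text; production implementations usually vectorize the same idea and apply it per class.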

Date posted: 23/05/2023, 15:05
