2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

An Embedded Machine Learning System For Real-time Face Mask Detection And Human Temperature Measurement

Lien Nguyen∗, Trang N.M Cao†, Lam Huynh-Anh‡, Hanh Dang-Ngoc§
Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Email: ∗lien.nguyen1812799@hcmut.edu.vn, †trang.cao1814391@hcmut.edu.vn, ‡lam.huynh05042000@hcmut.edu.vn, §hanhdn@hcmut.edu.vn

Abstract: In this paper, an efficient embedded machine learning system is proposed to automatically detect face masks and measure human temperature in real time. In particular, our system uses a Raspberry-Pi camera to collect real-time video in public places and detects face masks by running a classification model on the Raspberry Pi. The face mask detector is built on MobileNetV2 with ImageNet pre-trained weights and handles three cases: correctly wearing, incorrectly wearing, and not wearing a mask. We also design a human temperature measurement framework by deploying a temperature sensor on the Raspberry Pi. The numerical results demonstrate the practicality and effectiveness of our embedded system compared with some state-of-the-art studies. The accuracy in detecting the three cases of wearing a face mask is 98.61% in training and 97.63% in validation. Meanwhile, our proposed system needs only seconds for each person to pass through the whole process of face mask detection and forehead temperature measurement.

Index Terms: COVID-19, embedded machine learning system, face mask detection, human temperature measurement

I. INTRODUCTION

The COVID-19 virus mainly spreads through droplets that emerge from a person infected with the coronavirus (SARS-CoV-2) and poses a risk to others. The risk of transmission is highest in public places [1]. After one person gets infected, it takes almost fourteen days for the virus to grow
in the body of its host and affect them. In the meantime, it spreads to almost everyone who is in contact with that person. One of the best ways to stay safe from infection is to wear a face mask in open areas, as indicated by the World Health Organization [2], [3]. Furthermore, elevated body temperature can be a common symptom of COVID-19 [4], but the usual practice of measuring temperature with handheld devices at a close distance of less than meters may itself spread the infection. Therefore, a stand-alone system for both face mask detection and non-contact forehead temperature measurement in public places has become a crucial embedded machine learning application for tackling this global problem.

978-1-6654-1001-4/21/$31.00 ©2021 IEEE

Many computer vision-based systems have been deployed since December 2019, when SARS-CoV-2 spread around the world from Wuhan (China). The authors in [1]–[4] used MobileNetV2 and OpenCV for their face mask detection frameworks and reported high training accuracy. Their frameworks followed two main steps: detecting and auto-cropping human faces, then labeling all the images for each person. A bounding box was used to indicate whether people were wearing face masks. Other researchers used YOLOv3 and Haar cascade classifiers [5], with an accuracy of 90.1%. In [6], the authors proposed a transfer learning method based on the combination of MobileNetV2 and a support vector machine. However, these methods were designed as initial studies toward an automatic face mask detection system and might not be practical for day-to-day operation without human supervision. In fact, the main shortcoming of [2]–[6] was that the researchers used datasets that contained only two classes, with and without mask, which would cause missed detections for people who wore masks incorrectly or intentionally covered their faces with scarves or handkerchiefs.

In the embedded machine learning research field, the authors in [7]
proposed a system that includes three phases: person detection, safe distance measurement between detected people, and face mask detection using single shot object detection with MobileNetV2 and OpenCV. Other authors proposed a subsystem installed at the entrance door for temperature detection and face mask detection, with a smartphone application for security guards [8]. They used an Arduino UNO with an infrared thermal camera to measure human temperature and sent alert messages to the security guards through an ESP8266 Wi-Fi module. Despite their promising results, the system consisted of various hardware components connected to a laptop that ran the corresponding software. This complicated deployment made the system in [8] not flexible enough to work 24/7 in public places.

In this paper, we propose an embedded machine learning system deployed on the Raspberry Pi for automatically detecting face masks and measuring human forehead temperature with an MLX90614 temperature sensor. We divide our detection problem into two classes: “Mask” for cases of correctly wearing masks and “No mask” for cases of incorrectly wearing or not wearing masks. In order to ensure the efficiency of our model, we use both real-time video streamed by the Raspberry-Pi camera and four videos of people in public collected from the internet for the testing phases. After the face mask detection step, the human forehead temperature is measured using a temperature sensor on the Raspberry Pi 3, then the results of these two phases are displayed on the LCD screen. The Raspberry Pi gives a warning sound if an overheated case is detected. The whole system can work 24/7 using a power adapter.

The remainder of this paper is organized as follows. Section I reviews some state-of-the-art studies in the field of computer vision-based systems, then discusses some aspects that need to be improved in some related embedded
machine learning studies. Our proposed system is presented in Section II with two main stages: building the face mask detection model and deploying the whole system on the Raspberry Pi. The experimental results and discussion are presented in Section III. Finally, Section IV concludes the paper and summarizes the contributions of our work.

II. METHODOLOGY

There are two main stages in our proposed system: (i) training and testing a machine learning model to detect cases of wearing face masks, and (ii) deploying the face mask detection model along with human temperature measurement on the Raspberry Pi. Our dataset has 5481 images in total with various sizes and comprises three cases of wearing masks: 1915 images of correctly wearing masks, 1782 images of incorrectly wearing masks¹ and 1784 images of not wearing masks², as shown in Fig. 1. All the images of correctly wearing masks are labeled as “Mask”. Both the incorrectly wearing and not wearing cases are labeled as “No mask”. We use 80% of the dataset for training and the remaining 20% for testing, as shown in Table I.

Fig. 1: Dataset for face mask detection model. (a) Correctly wearing a mask; (b) Correctly wearing a pattern mask; (c) Incorrectly wearing a mask; (d) Incorrectly wearing masks in public; (e) Not wearing a mask; (f) Not wearing masks in public.

TABLE I: Dataset for training and testing phases

                  Mask    No mask
Training phase    1532    2852
Testing phase      383     714
Total images      1915    3566

¹ “MaskedFace-Net - A dataset of correctly/incorrectly masked face images in the context of COVID-19”, Adnane Cabani, Karim Hammoudi, Halim Benhabiles, and Mahmoud Melkemi, Smart Health, ISSN 2352-6483, Elsevier, 2020, https://doi.org/10.1016/j.smhl.2020.100144
² “Masked Face Recognition Dataset and Application”, Zhongyuan Wang, Guangcheng Wang, Baojin Huang, Zhangyang Xiong, Qi Hong, Hao Wu, Peng Yi, Kui Jiang, Nanxi Wang, Yingjiao Pei, Heling Chen, Yu Miao, Zhibing Huang, Jinbi Liang, abs/2003.09093, 2020, https://arxiv.org/abs/2003.09093

A. The Face Mask Detection Model

1) Pre-processing: The pre-processing steps include resizing each image to 224×224 pixels, converting it into array format, and scaling the pixel intensities to the range [-1, 1]. Then, one-hot encoding is applied to the labels to represent the categorical variables as binary vectors. Essentially, this process converts our two labels, “Mask” and “No mask”, into specific vectors: a training image of the “Mask” class is encoded as [1, 0], and a “No mask” image as [0, 1]. In the next step, we split the data into 80% for training and 20% for testing. In the data augmentation step, we use the ImageDataGenerator to rotate, zoom, shift, shear, and horizontally flip the images in the training set.

2) Training Model: MobileNetV2 builds upon the ideas of MobileNet [9], using depthwise separable convolutions as efficient building blocks [10]. The key idea of a depthwise separable convolution is to replace a full convolutional operator with a factorized version that splits the convolution into two separate layers. The first layer, called a depthwise convolution, performs lightweight filtering by applying a single convolutional filter per input channel. The second layer is a 1×1 convolution, called a pointwise convolution, which is responsible for building new features by computing linear combinations of the input channels.

All the layers used in the proposed model are implemented using the Keras layers API. In order to improve the accuracy of the pre-trained model for face mask detection, we combine some Keras layers with MobileNetV2 loaded with pre-trained ImageNet weights. MobileNetV2 is used as the base model, with its fully connected head left off. The input shape for MobileNetV2 as the base model is 224×224 with three channels. Then we construct the layers that will replace the head of the base model, which are Keras layers such
as Average Pooling 2D, Flatten, Dense, and Dropout. The Average Pooling 2D layer averages the outputs of each feature map in the previous layer and, in order to prevent overfitting, we use a 50% dropout rate. The fully connected head model is placed on top of the base model; this is the actual model that will be trained. Finally, the model is compiled with the Adam optimizer and the binary cross-entropy loss function.

Fig. 2: Block diagram of our proposed system

3) Testing Model: In the testing phase, OpenCV is used for object detection. We use both videos collected from YouTube and real-time video captured by the Raspberry Pi camera. For each video in the testing set, the dimensions of each frame are grabbed and the frame is converted into a blob. The blob is then passed through the network to obtain bounding boxes for the face detection step. Finally, the faces found in each frame of the video stream and their bounding boxes are added to their respective lists. The mask prediction is performed only if at least one face was detected. The detected face locations and their corresponding predictions are looped over, and the output frame shows the class label “Mask” or “No mask” on the bounding box rectangle.

B. Face Mask Detection and Human Temperature Measurement on Raspberry Pi

This section describes the coordination of face mask detection and temperature measurement. These two functions are combined on the Raspberry Pi, which is connected to peripherals such as a Raspberry-Pi camera, a temperature sensor, a 16x2 LCD, and a buzzer to complete the system shown in Fig. 2. First, real-time video from the Raspberry-Pi camera is the input for the trained model to detect face masks. In this initial step, the Raspberry-Pi camera also helps to detect human presence. Whenever the face detection step is done, the non-contact temperature sensor MLX90614 is turned on by
the human presence signal, so that it can send the human forehead temperature to the Raspberry Pi for analysis. The default temperature threshold is set at 37 degrees Celsius, since this is the common temperature of a healthy person [11]. Meanwhile, the buzzer gives a warning sound if any “No mask” case is detected. Afterward, the measured temperature is displayed on the 16x2 LCD, and the buzzer gives a warning sound if an overheated case is detected.

III. RESULTS AND DISCUSSION

The experimental setup computer has an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz with 16.0GB RAM.

A. Training/Testing the face mask detection model using images in the dataset

The parameters are initialized as follows: the learning rate is 0.0001, the number of epochs is 20 and the batch size is 32. Our proposed framework for face mask detection uses 80% of the total 5481 images for the training phase. As shown in Fig. 3, the training loss and validation loss reached 4.66% and 5.56%, respectively.

A total of 1097 images are used for the testing phase. The results are classified into four categories: true positive, true negative, false positive and false negative. True positives (TP) and true negatives (TN) are the observations that were detected correctly. A false positive (FP) is a sample whose detected category is inconsistent with its actual category, and a false negative (FN) is a sample of the actual class that is detected as the opposite class or not detected at all. Since the model predicts (TP + FP) positive cases in total, the proportion of them that are real positives (TP) is called the precision rate, as shown in equation (1). As shown in Table II, our model achieves precision rates of 96% and 99% for the “Mask” and “No mask” cases, respectively. These promising results show the model's ability to accurately detect the considered positive class relative to
the other, such as “Mask” versus “No mask” and vice versa.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

Fig. 3: Training accuracy/loss vs number of epochs

TABLE II: Performance results of our proposed system

            Testing images    Precision    Recall    F1-score
Mask        383               0.96         0.98      0.97
No mask     714               0.99         0.98      0.98

Accuracy rate (all 1097 testing images): 0.98

The recall rate measures the ability of the model to correctly predict positive observations out of all observations in the actual class [5], as written in (2). It is also called the sensitivity of the model in detecting the considered positive images among all the images labeled as positive. As shown in Table II, our model achieves a recall rate of 98% for both the “Mask” and “No mask” cases, which reflects its ability to find the actual positive images, that is, the correctly predicted positives together with the positives incorrectly detected as negative.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

Equations (1) and (2) show that there is a trade-off between precision rate and recall rate. The F1-score is the harmonic mean of the precision rate and recall rate, as shown in equation (3), so it takes both FP and FN cases into account. As shown in Table II, our model achieves F1-scores of 97% and 98% for the “Mask” and “No mask” cases, respectively. Because our classification problem also considers the TP and TN cases, we report the accuracy rate as well, which reaches 98% in Table II.

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

B. Testing the face mask detection model by video collected from the internet

After training and testing the face mask detection model with 5481 images from the dataset, we continue to test our model with some videos collected from YouTube before letting the model work with real-time video. Fig. 4 shows some captured frames of successfully detected results from four videos collected from YouTube³,⁴,⁵,⁶. As shown in Fig. 4(a), Fig. 4(b), Fig. 4(c) and
Fig. 4(d), our model gives high accuracy for “No mask” and “Mask” cases with both a single person and multiple people. Our proposed model can also correctly recognize whether human faces are covered with scarves, handkerchiefs or fabric face masks, as in Fig. 4(e) and Fig. 4(h). Furthermore, by combining the incorrectly wearing and not wearing images under one label named “No mask”, our model gives high accuracy in detecting incorrect cases where the face mask does not fully cover the face, as shown in Fig. 4(f) and Fig. 4(g). Last but not least, Fig. 4(i) and Fig. 4(j) show some successfully detected results, which demonstrates the effectiveness of our proposed model in recognizing incorrect and correct mask wearing.

Looking into some missed detections in Fig. 5, there are some captured frames in which face masks are not recognized. Since we aim to deploy our model to process people one by one in real time, this can be improved by setting an appropriate distance between our system and people, which helps the camera capture all the details of faces and masks. Based on our experimental results, the most appropriate distance from our system to people is centimeters, which helps both the Raspberry-Pi camera and the MLX90614 temperature sensor work at their best in collecting information from human faces and human temperature.

C. Testing the face mask detection model by real-time video captured by a built-in camera on Raspberry Pi

We use a Raspberry Pi 3 Model B with a 1.2 GHz 64-bit quad-core ARMv8 CPU to deploy the face mask detection model after training it on our experimental setup computer. In order to deploy the model in public places, we conduct a testing phase with real-time video taken from the built-in camera on the Raspberry Pi. In this step, we use VNC Viewer, a cross-platform screen sharing system, to remotely control the Raspberry Pi from our experimental setup laptop and collect some successfully detected cases, as shown in Fig. 6. Our
model can detect whether people use different common types of face masks, such as medical face masks or fabric pattern face masks, as shown in Fig. 6(a) and Fig. 6(b), respectively. Besides, because of the variety of our dataset, we can also evaluate cases in which people wear both a face mask and sunglasses, as shown in Fig. 6(c) and Fig. 6(d), which also give high accuracy. Furthermore, our model can detect people who do not wear a mask or who use their hand to cover their face and label them as “No mask”, as shown in Fig. 6(e) and Fig. 6(f). For cases in which people wear a mask incorrectly, as shown in Fig. 6(g) and Fig. 6(h), our model detects them as “No mask” with high accuracy, and the buzzer is then turned on as a warning.

³ Coronavirus outbreak: Mixed messaging about mandatory face masks, https://www.youtube.com/watch?v=hekZBf8oUq0
⁴ DWCRA Women Face to Face - Making Masks In Tirupati - Chittoor District - Sakshi TV, https://www.youtube.com/watch?v=EIY9xJc4s0Q
⁵ Lagosians On The Use Of Face Mask, https://www.youtube.com/watch?v=nf4bZgHsa5E
⁶ Nigerians React To Wearing Face Mask - Street Login, https://www.youtube.com/watch?v=V54hhnyAntU

Fig. 4: Example of successful detected cases. Panels (a) to (j): frames from the collected videos.

Fig. 5: Example of missed detected cases. Panels (a) and (b): frames from the collected videos.

Fig. 6: Example of successful detected cases. (a) Mask; (b) Fabric mask; (c) Mask with sunglasses; (d) Fabric mask with sunglasses; (e) Not wearing face mask; (f) Face covered by hand; (g) Incorrectly wearing mask; (h) Incorrectly wearing mask.

Fig. 7: Example of failed detected cases. (a) Face covered by a notebook; (b) Face covered by a handkerchief.

Looking into some failed detections shown in Fig. 7, there are some missed cases when people
intentionally try to cover their face with objects other than masks, such as a notebook or a handkerchief. Fig. 7(a) and Fig. 7(b) show the “Mask” label with a low accuracy rate, which suggests that these issues can be addressed by enlarging our dataset with more incorrectly wearing mask cases.

Fig. 8: Experimental proposed system

D. The face mask detection and human forehead temperature measurement real-time application

As described in Fig. 2, we use a Raspberry Pi with some connected peripherals to complete the whole system, as shown in Fig. 8. This packaged system is mounted on a camera stick holder at an appropriate height of 1.6 meters. In the experiment, people pass by our system indoors under good lighting conditions. They are required to step up one by one to a distance of 1.5 meters in front of the system for face mask detection. Results show that it takes about seconds to detect face mask wearing. After that, the person being tested steps closer to the system for forehead temperature measurement. The appropriate distance between the MLX90614 sensor and the person being tested is about centimeters. This distance can be lengthened by using a higher quality temperature sensor. The signal from face mask detection activates the temperature sensor, and the temperature measurement process takes about second. The face mask detection and measured temperature results are finally displayed on the 16x2 LCD as shown in Fig. If the system detects a “No mask” and/or an overheated case, a buzzer is turned on and gives a warning sound for seconds. The warning sounds for the two cases are different so that they can be distinguished.

In summary, our system takes about seconds for each person to complete the whole process of face mask detection and forehead temperature measurement. Our system can be deployed at the entrance of public places with good lighting conditions. Because it uses an RGB camera, our system needs extra lighting to work at night. Last but not least, in order to ensure the practicality of our system, we use a 220V AC - 5V DC power adapter, so that our system can work 24/7 for a long time without human supervision.

IV. CONCLUSION

In this paper, we propose an embedded machine learning system for automatically detecting face mask wearing and measuring human temperature. The face mask detector is built based on TensorFlow, MobileNetV2 and Keras, then embedded on the Raspberry Pi with a Raspberry-Pi camera for real-time detection. Our proposed system demonstrates the effectiveness of a practical, high-accuracy, and low-cost contactless system for handling multiple tasks in the era of COVID-19 compared to previous works. It takes only seconds for each person to have their face mask detected and temperature measured. Our proposed system can be further developed with higher quality peripherals to obtain better results.

ACKNOWLEDGMENT

This research is funded by Ho Chi Minh City University of Technology – VNUHCM, under grant number SVCQ-2020DDT-118.

REFERENCES

[1] A. Das, M. W. Ansari, and R. Basak, “Covid-19 face mask detection using tensorflow, keras and opencv,” 12 2020, pp. 1–5.
[2] H. Adusumalli, D. Kalyani, R. Sri, M. Pratapteja, and P. V. R. D. P. Rao, “Face mask detection using opencv,” in 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 2021, pp. 1304–1309.
[3] S. A. Sanjaya and S. Adi Rakhmawan, “Face mask detection using mobilenetv2 in the era of covid-19 pandemic,” in 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), 2020, pp. 1–5.
[4] P. Nagrath, R. Jain, A. Madan, R. Arora, P. Kataria, and J. Hemanth, “Ssdmnv2: A real time dnn-based face mask detection system using single shot multibox detector and mobilenetv2,” Sustainable Cities and Society, vol. 66, p. 102692, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2210670720309070
[5] T. Q. Vinh and N. T. N. Anh, “Real-time face mask detector using yolov3 algorithm and haar cascade classifier,” in 2020 International Conference on Advanced Computing and Applications (ACOMP), 2020, pp. 146–149.
[6] K. Suresh, M. Palangappa, and S. Bhuvan, “Face mask detection by using optimistic convolutional neural network,” in 2021 6th International Conference on Inventive Computation Technologies (ICICT), 2021, pp. 1084–1089.
[7] S. Yadav, “Deep learning based safe social distancing and face mask detection in public areas for covid-19 safety guidelines adherence,” International Journal for Research in Applied Science and Engineering Technology, vol. 8, pp. 1368–1375, 07 2020.
[8] A. M., S. K., S. K. R., and Y. I., “Contactless temperature detection of multiple people and detection of possible corona virus affected persons using enabled ir sensor camera,” in 2021 Sixth International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 2021, pp. 166–170.
[9] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” 2017.
[10] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” 2019.
[11] N. S. Yamanoor, S. Yamanoor, and K. Srivastava, “Low cost design of non-contact thermometry for diagnosis and monitoring,” in 2020 IEEE Global Humanitarian Technology Conference (GHTC), 2020, pp. 1–6.
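
As a supplementary illustration of the precision, recall and F1-score definitions in equations (1) to (3) of Section III-A, the following minimal Python sketch computes the four rates from confusion counts for one class treated as the positive class. The names `ConfusionCounts`, `precision`, `recall`, `f1_score` and `accuracy` are hypothetical, and the example counts are illustrative values chosen for easy arithmetic; they are not results from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion counts for one class treated as positive (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def precision(c: ConfusionCounts) -> float:
    # Equation (1): TP / (TP + FP); returns 0.0 when nothing was predicted positive.
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: ConfusionCounts) -> float:
    # Equation (2): TP / (TP + FN); returns 0.0 when there are no actual positives.
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1_score(c: ConfusionCounts) -> float:
    # Equation (3): harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c: ConfusionCounts) -> float:
    # Share of all samples (TP and TN) that were classified correctly.
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Illustrative counts only (assumption, not taken from Table II).
example = ConfusionCounts(tp=90, fp=10, fn=30, tn=70)

if __name__ == "__main__":
    print(f"precision={precision(example):.3f}")
    print(f"recall={recall(example):.3f}")
    print(f"f1={f1_score(example):.3f}")
    print(f"accuracy={accuracy(example):.3f}")
```

In a two-class setting such as “Mask” versus “No mask”, the same functions would be applied once per class, swapping which label counts as positive, which is how per-class rows like those in Table II are usually obtained.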