2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

Detecting Drivers Falling Asleep Algorithm Based on Eye and Head States

Phat Nguyen Huu and Trang Pham Thi Thu
School of Electrical and Electronics Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
Email: phat.nguyenhuu@hust.edu.vn; trang.ptt172862@sis.hust.edu.vn

978-1-6654-1001-4/21/$31.00 ©2021 IEEE

Abstract—In the paper, we present an algorithm for detecting driver drowsiness based on computer-vision techniques. The algorithm issues a warning signal when drivers are dozing, based on the state of the closing eyes as well as the tilt of the head, with a camera used to observe their faces. The proposed method determines six landmarks of the eyes to detect eye closure and uses binocular coordinates to calculate the head tilt angle. The results show that the proposed algorithm achieves an accuracy of up to 95.25% at 16 frames per second, which makes it suitable for development in real applications.

Index Terms—Drowsiness, closing eyes, tilted head, landmarks, coordinates of eyes

I. INTRODUCTION

Falling asleep is a common symptom of fatigue, and driving ability is significantly reduced when drivers doze in dangerous situations. According to [1], lack of sleep is one of the main causes of traffic accidents in the world: from 10 to 15% of car accidents are related to sleep deprivation. The authors survey 19 European countries and report high rates of drowsiness, up to 17% on average. 10.8% of people fell asleep while driving at least once a month, 7% had a traffic accident due to drowsiness, and 18% nearly caused an accident [1]. Early warning of driver drowsiness limits many risks and protects human life as well as material wealth. It is therefore necessary to build algorithms that detect dozing drivers.

The proposed algorithm has three main points. First, we suggest using the available shape predictor trained on the 68 face-landmark dataset of the dlib library to determine the face structure. Second, we propose using the eye-landmark method to calculate the distance between the two eyelids; we set a threshold on this distance that determines whether the eye is open or closed and triggers an alert. Third, we use the binocular-coordinate determination method to calculate the head tilt angle: the angle formed by the binocular coordinates is computed with geometric formulas, and we set a threshold on the head tilt angle to decide whether to warn the driver.

The rest of the paper includes five parts and is organized as follows. Section I presents an overview of the problem. Section II presents several related works. Section III presents the proposal system. Section IV evaluates the proposed model and analyzes the results. In the final section, we give conclusions and future research directions.

II. RELATED WORK

In recent years, there have been many theoretical and experimental systems that detect driver drowsiness through eye state [2]–[14]. The authors of [2] proposed a real-time doze-driver monitoring algorithm to detect faces, with a regression feature-extraction method to determine the facial structure; a drawback of the system is that it cannot determine the eye state when the face is not in front of the camera. In [14], the authors present three classes of measures and discuss the advantages and limitations of each: although the accuracy of physiological measures for detecting drowsiness is high, they are hard to apply in real applications. In [3], the authors describe a method to track eyes and detect their state; however, its accuracy depends on light, shadow, and slightly dark backgrounds. In [4], fatigue-detection software runs on a smartphone instead of a laptop in a car; the results show that the system retains the advantages of fast, real-time face and eye tracking and tolerates fast head/face movement. In [5], the authors outline the design of a very simple and economical system that deals with a driver falling asleep at the wheel or becoming too drowsy; the results show that the system is suitable for real applications. The authors of [6] proposed an algorithm to recognize driving fatigue based on eye tracking: it issues a warning signal when the driver shows signs of fatigue based on the state of the closing eyes, with a camera pointed directly at the face. Experimental results show that the algorithm gives a fatigue warning if the eyes are closed for 50 consecutive frames; however, the method cannot determine the eye state if drivers tilt their faces away or wear glasses. In [7], the authors propose an algorithm based on respiratory rate variability (RRV) to detect the fight against falling asleep; it achieved a specificity of 96.6% and a sensitivity of 90.3% on average across all subjects through leave-one-subject-out cross-validation, showing that it is a valuable vehicle-safety system for drowsiness alerts. In [8], a drowsy-status warning system is proposed consisting of three main steps, namely enhancing image brightness, extraction by morphologies and image segmentation, and eye determination by SVM; the accuracy of the system is 97.86% and the average recognition time is 0.106 seconds per image. The authors of [9] built a driver-drowsiness detection system that determines the distance between the eyes and eyebrows and the curvature of a line connecting the two eyelids to determine the state of closing eyes; however, the method has not resolved the confusion between raising the eyebrows and closing the eyes, since the two actions change the eye-eyebrow distance similarly.

III. PROPOSAL SYSTEM

A. Overview of system

1) Facial landmarks algorithm: We use the dlib and OpenCV libraries to detect markers of human face parts. Points are marked to define prominent areas of the face, including the eyes, eyebrows, nose, mouth, and facial contour. Dlib was created by [20]; unlike OpenCV, whose purpose is to provide algorithmic infrastructure for image-processing and computer-vision applications, dlib is designed for machine-learning and artificial-intelligence applications, with the main sub-libraries including:

1) Classification: classification techniques mainly based on two basic methods, the k-nearest neighbors algorithm (k-NN) and SVM [21];
2) Data transformation: algorithms that reduce the number of dimensions, remove redundant data, and enhance the distinctiveness of retained features;
3) Clustering: clustering techniques;
4) Regression: regression techniques;
5) Structure prediction: structured-prediction algorithms.

Facial landmarking is a subset of the shape-prediction problem: we have to identify the key points that make up the shape of the subject of an image. For facial landmarks, these are the main points of an image that make up the shape of a human face. Facial landmarks are an input for many other problems such as predicting head positions, face swapping, detecting blinks, and reshaping faces [15]. The identification of facial landmarks includes two steps: in the first step, we locate the face in an image; in the second step, we identify the points that make up the structure of the face. Face localization can be performed in many ways, from as simple as the Haar-cascades algorithm [16] to as complex as the HOG algorithm combined with a linear support vector machine (SVM) specially trained for face detection [17]. The goal is to obtain an area defined by (x, y) coordinates surrounding the face in the image. After identifying the face, we will determine the structure
of the face. There are many different types of facial structures; the components include the mouth, right and left eyebrows, right and left eyes, nose, and jaw. The facial-landmark detection of the dlib library is based on [18]. The method starts from a training set of labeled facial markers: the images are manually annotated with the (x, y) coordinates of the regions around each facial structure. A regression-tree algorithm is then used to estimate the facial landmark positions directly from pixel intensities with the training data.

2) Euclidean distance: The Euclidean distance between two points A and B is the length of the line segment AB. In Cartesian coordinates, if A = (A_1, A_2, ..., A_n) and B = (B_1, B_2, ..., B_n) are two points in n-dimensional Euclidean space, the distance from A to B is [22]

d(A, B) = \sqrt{(A_1 - B_1)^2 + (A_2 - B_2)^2 + ... + (A_n - B_n)^2} = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}.   (1)

The Euclidean norm is the distance from a point to the origin of the Euclidean space, defined as

||A|| = \sqrt{A_1^2 + A_2^2 + ... + A_n^2} = \sqrt{A \cdot A},   (2)

where A \cdot A in Eq. (2) is the dot product; this is the length of A when we consider it a Euclidean vector. The distance between A and B can then be written as

||A - B|| = \sqrt{(A - B) \cdot (A - B)} = \sqrt{||A||^2 + ||B||^2 - 2 A \cdot B}.   (3)

B. Proposal algorithm

We propose the algorithm flowchart shown in Fig. 1. In Fig. 1, image data is captured from the camera and then pre-processed. The program next runs a face-detection algorithm; if it detects a face, it continues with eye recognition. After eye recognition, we compare the head tilt angle with a threshold of 30°: if the angle α ≥ 30°, the system gives a warning. We also set a threshold of 0.2 on the eyelid-distance ratio: if the ratio β ≤ 0.2, the system counts the following frames, and if β stays at or below 0.2 for 50 consecutive frames (a few seconds at the measured frame rates), the system gives a warning. If the camera is not able to detect a face, or the distance
ratio between the eyelids is greater than 0.2, or the closed-eye state has not yet lasted 50 consecutive frames and the head tilt angle is less than 30°, the current working cycle finishes and the next working cycle begins.

The facial-marker detector is pre-trained in the dlib library and is used to estimate the positions of the 68 (x, y) coordinates that make up the human face; the indices of the 68 coordinates are described in [19]. Face markers can be successfully applied to face alignment, head-pose estimation, face swapping, blink detection, and other related problems. In addition to the OpenCV library, we also use dlib, another open-source library, for the system installation [20].
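To make the warning cycle concrete, the per-frame decision logic described by the flowchart can be sketched as follows. This is a minimal illustration, not the paper's code: the function and variable names are ours, and a real system would obtain the eyelid ratio β and tilt angle α from the landmark detector each frame.

```python
# Thresholds taken from the paper's description of the flowchart.
EYE_RATIO_THRESHOLD = 0.2   # beta: eyelid-distance ratio for "closed"
TILT_THRESHOLD_DEG = 30.0   # alpha: head-tilt angle that triggers a warning
CLOSED_FRAMES_LIMIT = 50    # consecutive closed-eye frames before a warning

def update(closed_frames, beta, alpha_deg):
    """One working cycle: returns (new closed-frame count, warning flag)."""
    if alpha_deg >= TILT_THRESHOLD_DEG:
        return 0, True                      # head tilted too far -> warn now
    if beta <= EYE_RATIO_THRESHOLD:
        closed_frames += 1                  # one more closed-eye frame
        return closed_frames, closed_frames >= CLOSED_FRAMES_LIMIT
    return 0, False                         # eyes open -> reset the counter

# Simulate 50 consecutive frames of closed eyes with the head upright.
count = 0
for _ in range(50):
    count, warn = update(count, beta=0.15, alpha_deg=5.0)
print(warn)  # True
```

Resetting the counter whenever the eyes reopen is what makes the 50-frame condition mean *consecutive* closed frames rather than a running total.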
Fig. Processing steps of the proposal algorithm.
Fig. Flowchart of the proposal algorithm.

C. Deploying algorithm

First, we set up a face-tracking camera. To detect whether a person is sleepy, we only need the eye area: once we have it, we apply the six-eye-landmark method to determine whether the eyes are open or closed, and rely on the binocular coordinates to calculate the head tilt angle. If the eyes have been closed for a long time or the head has tilted beyond the threshold, we conclude that the driver is drowsy and issue a warning. The steps to implement the algorithm, shown in Fig., are the following.

1) Capturing an image from the camera: to access the camera, we install the imutils library; the program establishes a connection with the computer and takes each image frame for processing.
2) Pre-processing the input data: we resize the input image to a width of 450 pixels and convert it to grayscale.
3) Face detection: we use the face-detection system of the dlib library to find and detect faces in images via detector = dlib.get_frontal_face_detector().
4) Marking the facial structure using facial landmarks: we apply the 68-point dlib structure-marking algorithm to locate each important area of the face, including the eyebrows, eyes, nose, mouth, and facial contours. The data used for training is shape-predictor-68-face-landmarks; we then convert the result to a NumPy array.
5) Extracting the eye area: using NumPy array slicing, we extract the (x, y) coordinates of the left and right eyes.
6) Calculating the eye ratio: given the (x, y) coordinates of both eyes, we calculate the eye ratio. In [23], each eye is represented by (x, y) coordinates starting from the left corner of the eye and marked in turn clockwise. There is a relationship between the width and height of these coordinates
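As a concrete illustration of the eye-area slicing step, the following minimal sketch pulls the six per-eye points out of a 68-point landmark list; these are the points whose width and height the eye ratio compares. The index ranges 36–41 (right eye) and 42–47 (left eye) follow the common 68-point annotation used by dlib's pre-trained predictor, and the landmark list here is a made-up placeholder rather than real detector output.

```python
# Slice the left- and right-eye landmarks out of a 68-point list.
# In the 68-point convention used by dlib's shape predictor, the right
# eye occupies indices 36-41 and the left eye indices 42-47 (0-based).

def extract_eyes(landmarks):
    """landmarks: list of 68 (x, y) tuples -> (right_eye, left_eye)."""
    right_eye = landmarks[36:42]
    left_eye = landmarks[42:48]
    return right_eye, left_eye

# Placeholder "landmarks": 68 dummy (x, y) points, not real detections.
points = [(i, 2 * i) for i in range(68)]
right_eye, left_eye = extract_eyes(points)
print(len(right_eye), len(left_eye))  # 6 6
```

In the real pipeline the same slicing is applied to the NumPy array produced in step 4.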
. In [23], we can derive an equation that reflects this relationship, called the eye aspect ratio (EAR):

EAR = (||p_2 - p_6|| + ||p_3 - p_5||) / (2 ||p_1 - p_4||),   (4)

where p_1, ..., p_6 are the six landmark positions of the eye. The numerator of Eq. (4) measures the distances between the vertical landmarks, while the denominator measures the distance between the horizontal landmarks. A denominator weight of 2 is appropriate since there is only one set of horizontal points but two sets of vertical points. The eye ratio is large and approximately constant while the eye is open, but drops dramatically toward zero when a blink occurs.

Fig. Doze detection based on eye state: (a) enough and (b) lack of light.
Fig. Binocular coordinates and head tilt angle α expressed in the coordinate system (Oxy).
Fig. Detecting dozing based on head tilt: (a) enough and (b) lack of light.

7) Detecting doze by eye ratio: We first set an eye threshold to recognize the closing or opening state of the eye; we choose a threshold of 0.2 [23]. We then check the previously calculated EAR: if it is less than 0.2 for 50 consecutive frames, the person is assumed to be dozing and an alarm is issued.

8) Determining eye coordinates to calculate head tilt angle: In the (Oxy) coordinate system, when the face is in equilibrium, we call the first eye coordinate A(x_1, y_1) and the second eye coordinate B(x_2, y_1). When the head tilts, we keep the eye at point A as a landmark and the second eye moves to point C(x_3, y_3). The projection of point C onto AB is the point D(x_3, y_1). The head tilt angle α is determined from the lengths of AD and CD, as shown in Fig., by the formulas

AD = \sqrt{(x_1 - x_3)^2 + (y_1 - y_1)^2} = \sqrt{(x_1 - x_3)^2},   (5)

CD = \sqrt{(x_3 - x_3)^2 + (y_1 - y_3)^2} = \sqrt{(y_1 - y_3)^2},   (6)

α = arctan(CD / AD) = arctan( \sqrt{(y_1 - y_3)^2} / \sqrt{(x_1 - x_3)^2} ).   (7)

9)
Detecting doze by the eye-coordinate determination method: We set the threshold angle to 30°. If α is greater than or equal to 30°, the system issues a warning. This check is performed during doze detection.

IV. SIMULATION AND RESULT

A. Setup

The simulation is performed on a computer running the Windows 10 operating system, with a webcam to capture video. The programming language is Python, and the training dataset is shape-predictor-68-face-landmarks. Simulation scenarios include three cases: in the first, we detect doze based on eye state; in the second, we detect drowsiness based on head tilt; in the third, we detect dozing while closing the eyes and tilting the head. The evaluation parameters are shown in Tab. I.

TABLE I. SIMULATION PARAMETERS

No | Metric | Description
1 | True Positive (TP) | Detecting the right sleep state
2 | True Negative (TN) | Detecting the right state of not falling asleep
3 | False Positive (FP) | Detecting the wrong state of not falling asleep
4 | False Negative (FN) | Detecting the wrong state of dozing
5 | Accuracy (ACC) | ACC = (TP + TN) / (TP + TN + FP + FN)

B. Result

The simulation results are shown in Figs. 4, 5, and 6. Fig. 4 shows the case of detecting a drowsy driver based on eye state: in both conditions, enough light (a) and lack of light (b), when the algorithm detects an eye ratio less than 0.2 for 50 consecutive frames, a message is displayed on the screen. Fig. 5 shows detection of driver drowsiness based on head tilt: in both lighting conditions, when the algorithm detects a head tilt angle greater than 30°, a message is displayed on the screen. Fig. 6 shows detection of drowsiness while closing the eyes and tilting the head: in both lighting conditions, when the algorithm detects an eye ratio less than 0.2 and a head tilt angle greater than 30°, a message is displayed on the screen.

Fig. Detecting dozing while closing eyes and tilting head: (a) enough and (b) lack of light.

To evaluate correctly, we test many different conditions: head tilt angle, closing and opening eyes, with and without glasses, and enough versus lack of light. The results are shown in Tabs. II, III, IV, and V.

TABLE II. RESULTS OF RECOGNIZING THE OPENING AND CLOSING STATES OF EYES

Eye state | Glasses, strong light (%) | No glasses, strong light (%) | Glasses, low light (%) | No glasses, low light (%)
Closing eyes | 97 | 99 | 87 | 89
Opening eyes | 98 | 99 | 88 | 91
Average accuracy | 97.5 | 99 | 87.5 | 90

TABLE III. RESULTS OF HEAD TILT ANGLE DETECTION

Case | Strong light (%) | Low light (%)
Wearing glasses | 91 | 88
Without glasses | 92 | 88
Average accuracy | 91.5 | 88

TABLE IV. RESULTS OF THE COMBINATION OF CLOSING EYES AND TILTING HEAD

Case | Strong light (%) | Low light (%)
Wearing glasses | 96 | 92
Without glasses | 98 | 95
Average accuracy | 97 | 93.5

TABLE V. RECOGNITION SPEED OF THE ALGORITHM (FRAMES/S)

Case | Strong light | Low light
Wearing glasses | 16–20 | 10–18
Without glasses | 16–20 | 10–18
Average | 16–20 | 10–18

In Tab. II, we find that the average accuracy is up to 93.5%, and accuracy is similar with and without glasses under the same lighting conditions. Recognition accuracy is 98.25% and 88.75% in the case of strong and weak light, respectively; the results show that light affects the accuracy of eye detection. Table III shows the results of head-tilt-angle detection: accuracy is almost the same with and without glasses, the average accuracy of the algorithm is 89.75%, and it achieves 91.5% and 88% in strong and weak light, respectively. In Tab. IV, we find that the average accuracy of the combined algorithm reaches 95.25%: 97% in strong light and 93.5% in low light, with little difference between wearing and not wearing glasses under similar lighting. In Tab. V, the average recognition rate of the algorithm is 16 frames per second (frames/s). We notice a difference in recognition speed between strong and low light, with average speeds of 18 and 14 frames/s, respectively, while under similar lighting the speed with and without glasses is similar.

Based on the implementation results, we find that light affects both the accuracy and the speed of the algorithm. With sufficient light, accuracy is good and detection speed is high; a lack of light makes it difficult to detect faces and mark the eyes, reducing both recognition speed and accuracy. The problem can be overcome by using an infrared camera to capture the input images. Under the same lighting conditions, wearing clear glasses does not greatly affect accuracy or recognition speed. Besides, the best camera distance for the algorithm to recognize is from 0.4 to 1.5 meters.

V. CONCLUSION

The paper proposes a method that determines eye landmarks to calculate the distance between the two eyelids and uses eye coordinates to calculate the head tilt angle, from which a drowsy driver is detected. In the future, we will therefore develop the algorithm to recognize drowsy driving through head-down postures, optimize recognition speed and accuracy, and deploy the system on a Raspberry Pi kit so that it can be installed in cars and developed into applications for smart devices.

REFERENCES

[1] C. Mattice, R. Brooks, and T. Lee-Chiong, Fundamentals of Sleep Technology. USA: American Association of Sleep Technologists, 2012. ISBN 978-1451132038.
[2] J. W. Baek, B.-G. Han, K.-J. Kim, Y.-S. Chung, and S.-I. Lee, "Real-time drowsiness detection algorithm for driver state monitoring systems," in 2018 Tenth International Conference on Ubiquitous and Future Networks (ICUFN), 2018, pp. 73–75.
[3] M. S. Devi, M. V. Choudhari, and P. Bajaj, "Driver drowsiness detection using skin color algorithm and circular Hough transform," in 2011 Fourth International Conference on Emerging Trends in Engineering Technology, 2011, pp. 129–134.
[4] M. Abulkhair, A. H. Alsahli, K. M. Taleb, A. M. Bahran, F. M. Alzahrani, H. A. Alzahrani, and L. F. Ibrahim, "Mobile platform detect and alerts system for driver fatigue," Procedia Computer Science, vol. 62, pp. 555–564, 2015, Proceedings of the 2015 International Conference on Soft Computing and Software Engineering (SCSE'15).
[5] K. Mukherjee, R. Karmakar, and S. Das, "Effective estimation of driver drowsiness based on eye status detection and analysis," in 2014 International Conference on Devices, Circuits and Communications (ICDCCom), 2014, pp. 1–4.
[6] M. S. Devi and P. R. Bajaj, "Driver fatigue detection based on eye tracking," in Proceedings of the 2008 First International Conference on Emerging Trends in Engineering and Technology, ser. ICETET '08. USA: IEEE Computer Society, 2008, pp. 649–652.
[7] F. Guede-Fernandez, M. Fernandez-Chimeno, J. Ramos-Castro, and M. Garcia-Gonzalez, "Driver drowsiness detection based on respiratory signal analysis," IEEE Access, vol. 7, pp. 81826–81838, 2019.
[8] D. T. Nhan, T. Q. Bao, and T. Q. Dinh, "A study on warning system about drowsy status of driver," in 2017 Seventh International Conference on Information Science and Technology (ICIST), 2017, pp. 215–222.
[9] Q.-D. Truong and N. Quang, "System for detecting drowsiness of driver," Journal of Science, Can Tho University, 2015.
[10] T. H. Nguyen, T. V. Chien, H. Q. Ngo, X. N. Tran, and E. Bjornson, "Pilot assignment for joint uplink-downlink spectral efficiency enhancement in massive MIMO systems with spatial correlation," IEEE Transactions on Vehicular Technology, vol. 70, no. 8, pp. 8292–8297, 2021.
[11] A. L. Ha, T. Van Chien, T. H. Nguyen, W. Choi, and V. D. Nguyen, "Deep learning-aided 5G channel estimation," in 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM), 2021, pp. 1–7.
[12] T. N. Duong, T. A. Vuong, D. M. Nguyen, and Q. H. Dang, "Utilizing an autoencoder-generated item representation in hybrid recommendation system," IEEE Access, vol. 8, pp. 75094–75104, 2020.
[13] N. Duong Tan, T. Duc, T. Vuong, T. Tran, Q. H. Dang, and H. Pham, A Novel Hybrid Recommendation System Integrating Content-Based and Rating Information, Jan. 2020, pp. 325–337.
[14] S. A., S. K., and M. M., "Detecting driver drowsiness based on sensors: a review," Sensors, vol. 21, no. 2, pp. 16937–16953, Jan. 2012.
[15] A. Sarsenov and K. Latuta, "Face recognition based on facial landmarks," in 2017 IEEE 11th International Conference on Application of Information and Communication Technologies (AICT), 2017, pp. 1–5.
[16] B. Tej Chinimilli, A. T., A. Kotturi, V. Reddy Kaipu, and J. Varma Mandapati, "Face recognition based attendance system using Haar cascade and local binary pattern histogram algorithm," in 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI), 2020, pp. 701–704.
[17] P. Patil and R. Thaware, "Real-time face detection and recognition with SVM and HOG features," EEWeb, vol. 06, May 2018.
[18] V. Kazemi and J. Sullivan, "One millisecond face alignment with an ensemble of regression trees," in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1867–1874.
[19] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "300 faces in-the-wild challenge: The first facial landmark localization challenge," in 2013 IEEE International Conference on Computer Vision Workshops, 2013, pp. 397–403.
[20] D. E. King, "Dlib-ml: A machine learning toolkit," J. Mach. Learn. Res., vol. 10, pp. 1755–1758, Dec. 2009.
[21] A. Bhusari, N. Gupta, T. Kambli, and S. Kulkami, "Comparison of SVM and kNN classifiers for palm movements using sEMG signals with different features," in 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), 2019, pp. 881–885.
[22] L. Wang, Y. Zhang, and J. Feng, "On the Euclidean distance of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.
[23] T. Soukupová and J. Cech, "Real-time eye blink detection using facial landmarks," in 21st Computer Vision Winter Workshop, Feb. 2016, pp. 1–8.