MINISTRY OF EDUCATION AND TRAINING HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION GRADUATION THESIS AUTOMOTIVE ENGINEERING TECHNOLOGY IN-CABIN SENSING: RESEARCH AND DEVELOP A
INTRODUCTION
Topic background
In the constantly developing automotive industry, safety and comfort have become the top priorities for both automakers and consumers. Manufacturers have implemented various external monitoring devices to help drivers control their vehicles better, such as blind spot detection, parking assist, dashcams, etc. However, there is growing interest among automakers in monitoring the inside of the vehicle, specifically the driver and passengers. This industry segment is referred to as in-cabin sensing, which has been the subject of multiple studies focused on enhancing accuracy and reliability to improve overall safety and comfort for drivers and passengers.
Driving necessitates cognitive and physical interaction with the vehicle, which can be challenging to sustain over extended periods, particularly for truck drivers who drive long distances. Maintaining focus is crucial, as it directly impacts the driver's ability to respond to unforeseen circumstances. The driver's mental state plays a pivotal role in ensuring their preparedness and safety on the road.
Drowsiness and distraction are the two most significant factors affecting the driver's performance. Fatigue or drowsiness can result in delayed reactions to changes in the surroundings, contributing to over 17% of fatal crashes in the USA [1]. In Vietnam, there were a total of 72 accidents attributed to drowsiness in 2023 [2], and many more have not been reported. In contrast to fatigue or drowsiness, distraction is less clearly defined because it includes any activity that takes a driver's attention away from driving. Examples include conversing with passengers, using mobile phones, and so on. Distraction is a significant cause of traffic accidents, responsible for a staggering 29% of all crashes in the USA [3].
Ensuring passenger safety, especially for children, is crucial. In the USA, leaving children unattended in vehicles has led to tragic consequences, with over 1083 fatalities since 1990, including 54 in 2018 alone. This safety concern extends globally, as illustrated by an incident in Vietnam where a student fell asleep during transit and was left in the vehicle for an extended period of 8 hours, ultimately passing away in the heat [5].
Considerable efforts have been made in in-vehicle sensing, which can be categorized into three main groups based on application: occupancy detection, fatigue/drowsiness detection, and distraction detection. Given the above statistics, we find that each of these scenarios is equally important when considering safety in a vehicle, which leads us to choose this area as our graduation thesis topic: “In-cabin sensing: Research and develop a real-time driver and occupant monitoring system using computer vision and radar imaging.”
- Research and develop a driver monitoring system using computer vision that detects drowsiness and distraction
- Research and develop an occupancy monitoring system using radar that detects occupants' positions, shows them in their respective seats, and develops the fundamentals for a child presence detection application
- Combine the systems into one embedded solution with a graphic user interface for demonstration purposes
- Document and present the results
- Learn and use Python programming language
- Apply computer vision and machine learning to embedded systems
- Understand radar signal processing and UART protocol
- Learn system and graphic user interface design
- Practice testing and tuning parameters
- Obtain documentation and presentation skills.
Related works
In-vehicle sensing has emerged as a promising safety enhancement, garnering increasing attention due to its ability to monitor the driver's status continuously. Significant efforts have been devoted to in-vehicle sensing, which, according to application scenarios, can be classified into three main categories: occupancy detection, fatigue or drowsiness detection, and distraction detection. In-cabin sensing is a broad field that employs diverse technologies and algorithms, and a general comparison of in-cabin sensing technologies may lack clarity compared to reviewing each aspect separately. Therefore, this section is divided into three parts, each focusing on one of these categories, followed by currently commercially available options from companies.
In-vehicle occupancy detection involves detecting the number of occupied seats in a car and identifying the type of object present at each seat. Existing studies on occupancy detection employ various techniques, each with advantages and drawbacks.
The concept of “Wireless AI in Smart Cars: How Smart a Car Can Be?”, introduced in 2020, revolutionizes in-car monitoring systems by utilizing commercial Wi-Fi devices for data collection [6]. This system enables the identification of authorized drivers based on radio biometric information and monitors the vital signs of occupants through wireless signals. Additionally, it can detect the presence of unattended children in the vehicle, showcasing its potential to enhance passenger safety and security.
CarOSense, as detailed in "CarOSense: Car Occupancy Sensing with the Ultra-Wideband Keyless Infrastructure" (2023), leverages ultra-wideband (UWB) technology for precise car occupancy sensing. By integrating UWB into the keyless entry infrastructure, CarOSense employs the MaskMIMO deep learning model to identify per-seat occupancy with high accuracy. This approach adheres to regulations and elevates the user experience by personalizing vehicle settings based on occupant count.
The DFKI Cabin Simulator, introduced in 2021, serves as a comprehensive test platform for visual in-cabin scene analysis and occupant monitoring. It features a wide-angle camera system and a ground truth reference sensor system, allowing for the creation of realistic vehicle interior environments. This platform facilitates the development and testing of in-car monitoring algorithms and systems, enabling researchers and developers to evaluate and enhance their performance in a simulated environment.
In 2020, the paper “In-Cabin Monitoring Systems for Autonomous Vehicles” proposed a robust in-cabin monitoring system tailored for autonomous vehicles [9]. This system addresses safety, security, and privacy concerns by combining monitoring cameras with onboard artificial intelligence. By employing on-device AI, the system ensures improved privacy protection for users while effectively monitoring the vehicle environment.
Lastly, a 2019 study, “Thermal Imaging for Occupancy Detection,” introduces a novel approach to in-vehicle occupancy detection using convolutional networks on thermal images [10]. By creating a dedicated thermal image dataset and implementing a tiny convolutional model, this study demonstrates the feasibility of thermal imaging for passenger counting, highlighting its potential for enhancing in-car monitoring systems.
Table 1.1 Occupant Monitoring System related works
Technology / Work | Drawbacks
Wi-Fi sensing (Wireless AI in Smart Cars) | Low initial accuracy; maximum number of passengers required
UWB (CarOSense: Car Occupancy Sensing with the Ultra-Wideband Keyless Infrastructure) | Requires multiple nodes for better accuracy
3D, IR camera | Privacy concerns; dedicated hardware required
AI camera | Privacy concerns; dedicated hardware required
Thermal camera | Heat signal interfered with by objects
Detecting drowsiness or fatigue is critical to monitoring a driver's state. When drivers are not sufficiently alert, their ability to react to unexpected events is compromised, increasing the likelihood of accidents. Given the growing number of people gaining access to private vehicles and public transportation, advancements in this field are crucial for enhancing road safety.
In their study from 2018, "A Smartphone-Based Drowsiness Detection and Warning System for Automotive Drivers", the authors propose a non-intrusive approach using smartphone sensors [11]. Their three-stage framework involves facial analysis, speech data analysis, and touch response to detect drowsiness in real time. By utilizing the percentage of eyelid closure (PERCLOS) from front camera images and analyzing speech features, this system offers accessibility and effectiveness, particularly in automotive settings.
The 2019 study "Convolutional Two-Stream Network Using Multi-Facial Feature Fusion for Driver Fatigue Detection" presents a comprehensive fatigue detection algorithm that integrates convolutional neural networks (CNNs) and multi-task cascaded CNNs. This model effectively extracts static and dynamic facial features for fatigue classification, relying on information obtained from facial images and optical flow analysis. The research demonstrates the capabilities of advanced neural networks in driver fatigue detection.
Contrasting these visual-based approaches, the 2010 study "A Real-Time Wireless Brain-Computer Interface System for Drowsiness Detection" introduces a wireless EEG-based BCI system for real-time cognitive state monitoring [13]. This system enhances driver awareness by detecting drowsiness through EEG signals and providing biofeedback to the driver, potentially offering a more direct and sensitive approach to drowsiness detection than visual cues alone.
Lastly, the 2019 paper "Driver Drowsiness Detection Based on Steering Wheel Data Applying Adaptive Neuro-Fuzzy Feature Selection" proposes a novel method leveraging steering wheel data [14]. Relevant features are extracted and used for drowsiness classification through adaptive neuro-fuzzy feature selection and a support vector machine (SVM) classifier. The integration of fuzzy logic and optimization algorithms enhances the accuracy and effectiveness of steering wheel-based drowsiness detection systems.
The issue of distraction has garnered significant attention in driver monitoring, particularly in light of the recent increase in accidents caused by drivers using their phones while behind the wheel. Unlike cases of fatigue or drowsiness, these drivers are cognitively alert but are directing their attention toward activities unrelated to driving.
The 2015 study “Driver Gaze Tracking and Eyes Off the Road Detection System” introduces an approach centered on the accurate detection of Eyes Off the Road (EOR) incidents [15]. The system precisely detects EOR incidents by tracking facial features, estimating head pose, and analyzing 3-D geometry from a steering wheel-mounted camera.
In a similar area, “Real-time detection of driver attention: Emerging solutions based on robust iconic classifiers and dictionary of poses” (2014) proposes an innovative solution for real-time driver attention detection using binary classifiers and neural networks [16]. By simplifying the learning process with a small dictionary of poses, the system effectively identifies lapses in driver attention, contributing to improved driving assistance and safety.
Commercial products
Smart Eye's Driver Monitoring System has been integrated into over a million vehicles worldwide, contributing to saving lives daily [18]. Smart Eye's Driver Monitoring Systems provide a detailed analysis of the driver's condition and actions using sensors such as in-car cameras, computer vision, and artificial intelligence. Their AI-driven DMS technology facilitates numerous features that enhance road safety and driver comfort, leveraging Affectiva's Emotion AI to detect subtle emotions, reactions, and facial expressions in real time.
Bosch's interior monitoring systems [19] utilize innovative sensor technology to enhance vehicle safety by detecting critical situations such as driver distraction and drowsiness early on. These systems consist of several modules, including a cabin sensing radar, a driver monitoring camera, an occupant monitoring camera, and a steering angle sensor. The radar detects living beings in the vehicle, while the cameras monitor the driver and occupants for signs of distraction or fatigue. The steering angle sensor analyzes the driver's behavior to determine fatigue levels and prompts a warning telling the driver to take a break if necessary.
Figure 1.2 Bosch’s interior monitoring system
Mercedes-Benz's Attention Assist system monitors driving patterns through 70 parameters to detect drowsiness and fatigue. Employing advanced algorithms, it considers external factors like road conditions, crosswinds, and control interactions. Activated above 60 km/h, the system informs the driver of the driving duration and issues alerts upon detecting reduced attention. Customizable sensitivity levels (Standard and Sensitive) allow drivers to tailor the system's responsiveness based on their journey length.
Figure 1.3 Mercedes-Benz Attention Assist
Study range and research method
- Study range: This project encompasses the research and design of an embedded system running the In-cabin sensing application. More specifically, it covers the theoretical research of the subsidiary systems, OMS and DMS, along with the relevant hardware and algorithms
- Research method: During the topic's implementation, the author team utilized experimentation as the primary research method. Other techniques, such as data collection, analysis, comparison, and compilation, are also applied in the report-writing process.
Expected outcome
- A demonstration of the system running the In-cabin sensing application, including the subsystems performing their respective functions
- Documentation and presentation of the design progress and implementation results.
Structure of this thesis report
This report is structured into seven chapters:
- The first chapter introduces the project and provides background information, objectives, and a literature review
- The second chapter discusses theoretical foundations and concepts relevant to In-cabin monitoring systems. This includes relevant theories, principles, and technologies that form the basis of the project
- Chapters three and four shift the focus to the two most important systems, the DMS and the OMS. They provide more detailed explanations, studies, and information on system functionality, flow charts, and implementation
- Chapter five describes the overall In-cabin sensing system, including a step-by-step guide for setting it up. Additionally, the graphical user interface for the demonstration is described
- The sixth chapter outlines the testing procedures used to evaluate the In-cabin monitoring system's effectiveness and accuracy. It also discusses any refinements or improvements made based on the testing results
- Finally, the report summarizes key findings and insights from the project. We also discuss potential future directions for research or the development of real products.
THEORETICAL BASIS
In-cabin sensing system theory
In-cabin sensing is the technology used to monitor and understand the conditions and occupants within a vehicle's cabin. In-vehicle sensing has emerged as a promising safety enhancement, garnering increasing attention. When monitoring people in the cabin, two major areas are considered: the driver and the occupants. Because of the crucial role of the driver, considerable endeavors have been dedicated to understanding driver behavior and condition. Monitoring the occupants offers minimal value during vehicle operation; instead, the issues become significant during stationary periods, such as trespassers or children left in hot vehicles.
2.1.1 Driver drowsiness detection definition and approach
Drowsiness, an intermediate state between sleep and wakefulness, is characterized by physiological changes that impair cognitive function and alertness. Drowsy driving, also known as fatigued driving, occurs when a person operates a vehicle while experiencing reduced alertness and cognitive function due to insufficient sleep or prolonged wakefulness. This phenomenon significantly increases accident risk, compromises driving performance, and lengthens reaction times. As major safety concerns, drowsiness and fatigued driving are challenging to measure directly, despite their serious consequences for road safety, particularly on monotonous routes where minimal driver input is required.
Key characteristics contributing to drowsy driving incidents include time-on-task, sleep deprivation, and external conditions. Findings suggest that time-on-task is a reliable predictor of drowsiness. Increasing driving duration is related to higher self-reported sleepiness and fatigue levels, longer blink durations, and more significant steering wheel movements. Sleep deprivation also stands out as an important factor, with even mild levels (such as 2 hours of sleep loss) leading to impaired driving performance. Additionally, external conditions such as light settings, landscape monotony, traffic, and temperature play crucial roles in the development of drowsiness. Sociodemographic factors like age and gender further shape susceptibility to fatigue. Studies indicate that young drivers are more vulnerable to fatigue due to ongoing cognitive and physical development and irregular sleep patterns, increasing the risk of drowsiness while driving. Men and women also exhibit significant differences in drowsiness development, potentially influenced by biological and psychological factors; women may experience a greater need for sleep, leading to higher self-reported sleepiness levels. These insights underscore the complex interplay of internal and external factors contributing to driver drowsiness [22].
Our approach to mitigating this issue uses fatigue detection based on facial features. Image/video-based drowsiness detection utilizing facial features has gained popularity because it does not require wearable sensors. Examples of this type of detection include face recognition, eye detection, and combinations of features taken from the face, eyes, mouth, etc. A facial landmarks detection module is first applied to refine the input image, remove redundant information outside the region of interest, and extract a 3D face model. The next stage is to use the extracted face mesh to observe fatigue-related features such as eye open/close and mouth open/close states. The measured parameters are the Eye Aspect Ratio (EAR), the percentage of eyelid closure over the pupil over time (PERCLOS), and the yawn frequency, which will be discussed in the “Facial features” section.
2.1.2 Driver distraction detection definition and approach
Driver inattention and distraction are also primary contributors to road accidents. While there is no universally agreed-upon definition, distraction is broadly understood as any activity diverting a driver's focus from the road. Distractions range from conversing with passengers to smartphone use and even emotional states like anger or anxiety. Efforts to mitigate distractions have taken various forms, including legislative bans on mobile device use while driving and automakers integrating controls for common tasks into steering wheels to minimize hand movement. Some vehicles even turn off certain entertainment features while in motion, such as Tesla's restriction on playing video games. However, there is a dilemma between streamlining features and meeting user demands for functionality. Manufacturers introduce more functions to enhance the driving experience, potentially increasing distraction risks. Thus, there is a need for automatic distraction detection systems capable of alerting or correcting drivers in real time when distraction is detected, balancing safety and user comfort.
To understand this behavior, scholars categorize distraction detection methods into visual, manual, and cognitive domains [23]. Often, these distractions involve a combination of these types, amplifying the risk further. While verbal distractions have been reported, they are not significantly associated with accident responsibility.
Our proposed solution uses a machine learning model to determine the direction of the driver's gaze. From the observed driver's face mesh (using the same pipeline as in driver drowsiness detection), the head pose and eye gaze are collected; these data are passed into a custom machine-learning model to obtain information about where the driver is looking. The measured parameters include the head pose (Euler angles: pitch, roll, yaw) and the eye gaze score.
2.1.3 Occupant monitoring description and concept
The Occupant Monitoring System (OMS), or in-vehicle occupancy detection, is installed in the vehicle to determine the occupancy information of the seats within the cabin. This information includes identifying which seats are occupied and discerning the type of occupant present, whether an adult, child, pet, or inanimate object. The applications of the OMS primarily focus on automotive safety. Understanding which seats are occupied can be crucial in scenarios such as airbag optimization; for maintenance cost-saving purposes, an airbag might be deactivated if a seat is unoccupied.
It is essential to distinguish whether the front seat is occupied by an adult, a child in a rear-facing seat, or is vacant in order to implement optimal safety measures. Particularly if there is a child in the front seat, the sudden deployment of the airbag could potentially pose more harm than protection [24]. Identifying children also prevents them from being left unattended inside a vehicle, thereby mitigating the risk of accidents related to heatstroke or suffocation.
Aside from safety, OMS can be used for more comfort-oriented functions. One example is a passive anti-theft system, which can monitor and detect suspicious movement inside or outside the vehicle. Another is modifying climate control according to passenger location to increase fuel efficiency.
The proposed system employs a radar-based method of detecting occupants in the cabin. From the collected radar information, the occupancy state of each seat in the chosen 5-seater vehicle is then assessed using criteria such as data size, data quality, and noise from the surrounding environment.
Facial features extraction and estimation methods
2.2.1 MediaPipe framework for facial landmarks detection
Among the plethora of computer vision technologies available, Google's MediaPipe emerges as a robust framework for building on-device custom machine learning applications [25]. MediaPipe is a collection of versatile, platform-agnostic machine-learning solutions that are highly customizable and extremely lightweight. Some notable advantages of this solution include:
- Providing a fast inference solution: Google asserts that this toolkit can run smoothly on most common hardware configurations
- Easy installation and deployment: Installation is extremely straightforward and convenient, and it can be deployed on various platforms, such as Mobile (Android/iOS), Desktop/Cloud, Web, and IoT devices
- Open-source and free: The entire source code is publicly available on MediaPipe, allowing users to use and customize it directly to fit their needs
MediaPipe Solutions are available across multiple platforms, and each solution includes one or more customizable models: object detection and tracking, hand tracking, pose estimation, facial recognition and tracking, gesture recognition, audio recognition, text recognition, etc. We want to highlight the face landmarks solution.
Facial landmarks detection is a fundamental task in computer vision. MediaPipe's facial landmarks solution employs a sophisticated neural network architecture trained on vast datasets to detect and track key facial landmarks accurately. Facial landmarks are specific points detected on a human face, representing key features:
The current version of MediaPipe is 0.10.14. The face landmark machine learning pipeline employs two real-time deep neural network models that work together: a detector that identifies faces within the image and a 3D face landmark model that predicts the approximate 3D surface of these detected faces. Precisely cropping the face reduces the necessity for common data augmentations like rotations and scale changes, allowing the network to focus on coordinate prediction accuracy. The face landmark solution normally functions as follows (a minimal usage sketch in Python follows the list):
- Face detection: Finding and locating a face in an image or video frame is the first step in the process. Convolutional neural networks (CNNs) that have been specially trained for face detection are frequently used for this task
- Landmark detection: After a face is identified, the Face Landmark Solution uses deep neural networks, often based on convolutional or graph architectures, to locate important facial landmarks. These networks are trained on extensive datasets of annotated facial images to learn the spatial correlations between landmarks. The model then generates an estimate of 478 three-dimensional landmarks, providing a precise representation of facial features
- Landmark tracking: The system uses advanced algorithms to follow facial landmarks over time when faces are moving or when perspective and lighting fluctuate. Even under difficult circumstances, its tracking guarantees stability and precision
- Output visualization: The identified facial landmarks are usually superimposed onto the original image or video frame to represent the facial features and their movements visually
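As a minimal usage sketch (our own illustration, not the thesis code), the following Python snippet shows how the MediaPipe Face Mesh solution can be run on camera frames to obtain the 478 landmarks; the camera index and confidence values are assumptions.

```python
import cv2
import mediapipe as mp

# Create the face mesh solution; refine_landmarks=True adds iris points (478 total).
mp_face_mesh = mp.solutions.face_mesh
face_mesh = mp_face_mesh.FaceMesh(
    max_num_faces=1,
    refine_landmarks=True,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)

cap = cv2.VideoCapture(0)                      # any camera source works for this sketch
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB images.
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        landmarks = results.multi_face_landmarks[0].landmark
        # Each landmark holds normalized x, y (and a relative z) coordinate.
        print(len(landmarks), landmarks[1].x, landmarks[1].y)
cap.release()
```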
Figure 2.2 Overview of model architecture [27]
The MediaPipe Face Landmark Solution is well suited to driver monitoring systems because of the advantages it brings:
- Real-time performance: It supports low-latency applications, which are essential for live driver tracking
- Accuracy and robustness: The solution's deep learning models and sophisticated algorithms enable it to detect and track facial landmarks with accuracy in a variety of conditions, such as changing lighting, positions, and expressions
- Cross-platform compatibility: MediaPipe is adaptable and widely available because it is compatible with several platforms, including desktop, mobile, and embedded devices
2.2.2 Understanding the Eye Aspect Ratio (EAR)
Eye Aspect Ratio (EAR) is a computer vision technique that measures drowsiness based on facial landmarks, particularly the positions of the eyes. Introduced by Soukupová and Čech in 2016, EAR quantifies changes in eye movement patterns, including eyelid closure and eye openness. These changes indicate varying levels of drowsiness, providing a non-invasive method for drowsiness assessment.
The EAR method operates as follows: as drowsiness sets in, the frequency and duration of eye closures increase, leading to a decrease in the calculated EAR value. Using this, the onset of drowsiness can be detected and timely alerts can be sent to the driver, thereby mitigating the risk of accidents.
The EAR value is computed using coordinates surrounding the eyes, as depicted in the Figure below and described by the EAR equation:
Figure 2.3 Eye landmarks

EAR equation:

$$EAR = \frac{\|P_2 - P_6\| + \|P_3 - P_5\|}{2\,\|P_1 - P_4\|}$$
Specifically, landmarks P2, P3, P5, and P6 are used to measure the eye height, while P1 and P4 are used for the eye width. When the eyes are closed, the EAR value rapidly decreases to nearly zero, whereas it remains roughly constant when the eyes are open. This sensitivity to the eye state makes EAR a valuable tool for blink detection and eye state analysis [29].
The average EAR value of the right eye and left eye is calculated and then compared with a predetermined threshold value for eye closure estimation:

$$EAR = \frac{EAR_{left} + EAR_{right}}{2}$$
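As an illustration of the two equations above, the sketch below computes the EAR of one eye from its six landmark points and averages both eyes; the function names and the use of pixel coordinates are illustrative assumptions, not the exact thesis implementation.

```python
import numpy as np

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|P2-P6| + |P3-P5|) / (2 * |P1-P4|), points given as (x, y) arrays."""
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def average_ear(left_pts, right_pts, threshold=0.1):
    """Mean of left- and right-eye EAR, compared against the closure threshold
    (0.1 is the value used later in the thesis)."""
    ear = (eye_aspect_ratio(*left_pts) + eye_aspect_ratio(*right_pts)) / 2.0
    return ear, ear < threshold
```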
The EAR algorithm provides us with many advantages:
- Accurate blink detection: Ensures precise detection of blinks, including subtle ones indicating drowsiness, enabling effective assessment of alertness levels
- Efficient and real-time processing: EAR operates efficiently, processing eye movements in real-time and providing immediate updates on drowsiness status, facilitating prompt interventions
- Versatility and adaptability: EAR equation is adaptable across various platforms and devices, seamlessly integrating into different environments and scenarios, ensuring versatility in its applications
- Non-intrusive and user-friendly: EAR-based drowsiness detection does not require specialized equipment, enhancing user comfort and ease of use and fostering widespread adoption and compliance
2.2.3 Percentage of eyelid closure over the pupil over time (PERCLOS)
PERCLOS is defined in a 1994 driving simulator study as the “proportion of time in a minute that the eyes are at least 80 percent closed, excluding normal blinks time” [30]
$$PERCLOS = \frac{t}{T_1} \times 100\%$$

where t is the total time the eyes were closed and T1 is the time interval (1 minute)
While eye blinks are quick actions that usually last 300-400 milliseconds, extended eye closures happen when a person is drowsy. The prolonged duration of closed-eye states indicates inattentiveness or drowsiness, hence the need for PERCLOS evaluation.
PERCLOS is currently found to be the most reliable parameter for a real-time driver alertness indicator [31]. A higher PERCLOS value indicates increased drowsiness levels, while lower values suggest heightened alertness. By quantifying the proportion of time the eyes remain closed, PERCLOS serves as a reliable indicator of an individual's level of drowsiness. This makes it suitable for an early warning system, alerting users about their drowsy state before full fatigue sets in and enabling timely interventions such as taking breaks or resting to maintain alertness.
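A minimal sketch of how PERCLOS can be accumulated over a rolling one-minute window of per-frame eye states is shown below; the frame rate is an assumed value, and normal blinks are not excluded here for simplicity.

```python
from collections import deque

class Perclos:
    """Tracks the percentage of frames with closed eyes over a rolling window."""
    def __init__(self, fps=30, window_seconds=60):
        self.window = deque(maxlen=fps * window_seconds)

    def update(self, eyes_closed: bool) -> float:
        """Add one frame's eye state and return the current PERCLOS value (%)."""
        self.window.append(1 if eyes_closed else 0)
        return 100.0 * sum(self.window) / len(self.window)

# usage: perclos = Perclos(); score = perclos.update(ear < 0.1)
```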
Yawning has long been recognized as a compelling indicator of fatigue and drowsiness [32]. Systems designed to monitor yawning rates aim to offer early warnings of impending drowsiness by precisely identifying and interpreting yawning behavior. This proactive approach gives drivers crucial opportunities to intervene and avoid accidents, enhancing overall road safety.
The system collects cues related to the mouth region to assess the mouth's height (HM) and width (WM). The ratio RM = HM / WM is then calculated for yawning indication:
When the mouth is closed, the ratio of mouth height to width decreases. However, when the mouth is open, this ratio increases. Setting a threshold (ThY) allows for the distinction between speaking and yawning: if the ratio (RM) exceeds ThY, the mouth is considered wide open due to yawning. Consecutive yawning frames indicate drowsiness, which is determined by the number of yawns (YN). Initially, YN is set to 0 and is updated according to the provided equation.
The yawn frequency over the observation period T, $YF = \frac{Y_N}{T}$, is then used for fatigue detection
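The following sketch illustrates one way to count consecutive wide-open-mouth frames as yawns; the threshold ThY and the minimum number of open frames are illustrative values, not the tuned parameters of the thesis.

```python
class YawnCounter:
    """Counts yawns: a yawn is a run of consecutive wide-open-mouth frames."""
    def __init__(self, th_y=0.6, min_frames=15):
        self.th_y = th_y              # ratio threshold separating speech from yawning
        self.min_frames = min_frames  # frames the mouth must stay wide open
        self.open_frames = 0
        self.yawn_count = 0

    def update(self, r_m: float) -> int:
        """r_m is the mouth height/width ratio for the current frame."""
        if r_m > self.th_y:
            self.open_frames += 1
            if self.open_frames == self.min_frames:   # count each yawn only once
                self.yawn_count += 1
        else:
            self.open_frames = 0
        return self.yawn_count
```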
Radar theory
Radar technology has undergone significant advancements since its early stages, when its primary functions were limited to detecting targets and determining their range. Initially, "radar" was short for "Radio Detection and Ranging". However, contemporary radar systems have transformed into sophisticated transducer/computer setups. These modern systems detect targets, collect their range, and possess capabilities to track, identify, image, and categorize targets. Moreover, they can suppress unwanted interference such as environmental echoes (clutter) and countermeasures (jamming). These advanced radar functions find application across various domains, from traditional military and civilian tasks like tracking aircraft and vehicles to tasks such as two- and three-dimensional mapping, collision avoidance, and Earth resources monitoring.
The upcoming section introduces radio waves and their attributes, along with a basic outline of a radar system. Subsequent segments will go deeper into the specific type of radio wave utilized in the project and the methodology employed to estimate motion values for one or multiple objects using the chosen hardware.
A radar functions as an electrical system that emits radio-frequency (RF) electromagnetic (EM) waves towards a specific area and captures and detects these EM waves upon their reflection from objects within that area. Through the periodic transmission and reception of EM waves, the radar can determine the distance from itself to the target and the target's velocity. This involves measuring the time delay between the transmitted radio pulse and its echoed counterpart or computing the variance in frequency between them.
The speed of radio-frequency energy, approximately 3 × 10^8 meters per second, enables radar systems to determine target distances. The time required for an energy pulse to travel to the target and return is used as a measure of the distance; only half of the round-trip time is used for the calculation, since the pulse must complete a full round trip. This time measurement is then incorporated into the radar's calibration.
Radio-frequency pulses travel at an incredibly fast speed, completing over seven circuits around the Earth's equator in a second. This necessitates precise time measurements, hence the use of microseconds (µs) in radar applications to quantify the travel times between two points.
Radio waves share characteristics with other types of wave motion, such as ocean waves. Wave motion involves a sequence of crests and troughs that occur at regular intervals and travel at a consistent pace. Similar to waves in the ocean, radar waves possess energy, frequency, amplitude, wavelength, and velocity.
The range R to a detected target can be calculated by considering the time ΔT it takes for the EM waves to travel to the target and back at the speed of light. Given that distance equals speed multiplied by time, and the distance the EM wave traverses to the target and back is 2R:
$$R = \frac{c\,\Delta T}{2}$$

Here, c represents the speed of light in meters per second (c ≈ 3 × 10^8 m/s), ΔT denotes the time in seconds for the round-trip travel, and R signifies the distance in meters to the target
Electromagnetic waves are electric and magnetic field oscillations, vibrating at the carrier frequency. The electric field E oscillates in one plane, while the magnetic field B is perpendicular to the electric field. According to the right-hand rule, this electromagnetic wave's propagation direction through space (at the speed of light c) is perpendicular to the plane formed by the E and B fields. Specifically, the electric field is aligned along the y-axis, the magnetic field along the x-axis, and the direction of propagation along the z-axis.
Figure 2.8 Propagation of an electromagnetic wave in space
Mathematically, the amplitude of the x or y component of the electric field E of an EM wave propagating along the z-axis can be represented as:

$$E(z, t) = E_0 \sin(\omega t - kz + \phi)$$
In this context, E0 represents the peak amplitude and φ denotes the initial phase. The wave number k and the angular frequency ω are given by the following equations:
$$k = \frac{2\pi}{\lambda} \ (\text{radians/m}), \qquad \omega = 2\pi f \ (\text{radians/s})$$

where λ is the wavelength in meters and f is the carrier frequency in hertz
As the EM wave propagates through space, the amplitude of E for a linearly polarized wave, measured at a specific time, follows a sinusoidal pattern, as depicted in the figure below. The wavelength λ of the wave represents the distance from any point on the sinusoid to the next corresponding point. For instance, it could be measured from peak to peak or from trough to trough.
If, on the other hand, a fixed location was chosen and the amplitude of E was observed as a function of time at that point in space, the result would be a sinusoid as a function of time, as shown in the figure below
Figure 2.10 The cycle of a wave
The period T0 of the wave represents the time from any point on the sinusoid to the next corresponding point. For instance, it could be measured from peak to peak or from null to null. Essentially, the period signifies the duration for the EM wave to complete one cycle. If the period is expressed in seconds, then the frequency denotes the number of cycles the wave undergoes in 1 second:
$$f = \frac{1}{T_0}$$

where frequency is expressed in hertz; 1 Hz equals one cycle per second
Indeed, the wavelength λ and frequency f of an EM wave are not independent; their product is equal to the speed of light c in free space, as expressed by the equation:
$$\lambda f = c$$

Therefore, if either the frequency or the wavelength is known, the other can be derived
Various types of EM waves and their respective frequencies are shown in the figure below, spanning from radio telegraphy to gamma rays. Despite all being categorized as EM waves, their characteristics can vary significantly based on their frequency. Radars typically function within the frequency range of 3 MHz to 300 GHz, with the majority operating between approximately 300 MHz and 35 GHz. This frequency range is subdivided into several radio frequency (RF) bands.
Figure 2.11 Radar frequencies in the frequency scale [35]
The quantity φ in the electric field equation is often referred to as the fixed or initial phase. It is arbitrary because it relies on the electric field's initial conditions, specifically the value of E at arbitrarily chosen spatial and temporal positions corresponding to z = 0 and t = 0.
The relative phase is the phase difference between two waves. When two waves have a relative phase of zero, they are considered to be in phase with each other. By altering one or both waves' wavelength, frequency, or absolute phase, they can be given a non-zero phase difference, making them out of phase. Additionally, if two waves originally in phase travel different path lengths, they can become out of phase.
Figure 2.12 Two waves with the same frequency but are out of phase
The electromagnetic waves emitted and received by a radar interact with a variety of matter on their travel path, such as the radar's antenna, the atmosphere, and the target. These interactions are governed by physical principles such as diffraction (antenna), attenuation, refraction, depolarization (atmosphere), and reflection (target). However, this thesis will focus solely on the principle of reflection.
Driver monitoring system description and design
System framework
The proposed system consists of major modules for driver visual data capturing, image processing and landmark extraction, feature calculation, driver's state estimation, and user alert. The Python programming language and OpenCV are the main tools used in this project. Python is one of the most popular programming languages for computer vision tasks due to its simplicity and rich ecosystem. OpenCV (Open-Source Computer Vision Library) is a state-of-the-art library that supports computer vision and machine learning algorithms.
The camera captures real-time infrared video footage of the driver. Each frame is then passed to MediaPipe for face detection, face tracking, and landmark extraction. With refine_landmarks turned on, we get 478 3D landmarks as output; however, only some of the IDs are our main focus:
- Procrustes landmarks (face geometry for pose estimation): 4, 6, 10, 33, 54, 67,
Figure 3.2 Face landmarks ID position
From these coordinates, the EAR, PERCLOS, yawning rate, and the pitch, roll, and yaw angles are calculated (each computation module will be explained in its own section). Once the driver's visual behavior is captured, processed, extracted, and stored in declared memory, the information is used for Status of Attention (SoA) estimation over the past few frames. Decision-making (whether to alert the driver or not) is based mostly on the SoA equation:
$$SoA = f(EAR, PERCLOS, YF, GD) = \begin{cases} \text{Asleep}, & \text{if } time(EAR \le th_{EAR}) \ge th_{time\text{-}EAR} \\ \text{Tired}, & \text{else if } PERCLOS \ge th_{PERCLOS} \\ \text{Fatigue}, & \text{else if } YF \ge th_{YF} \\ \text{Distracted}, & \text{else if } GD \text{ is not focused} \\ \text{Normal}, & \text{otherwise} \end{cases}$$
The asleep state is based on the EAR value of the driver's eyes. A first threshold value th_EAR is used to determine whether the eyes are closed or not. From multiple experiments, th_EAR in this system is set to 0.1.
When the EAR value falls below this threshold (which means both eyes are closed), we use one more threshold to conclude that the closed time is too long and the driver's state is asleep. The exact duration of eye closure that indicates a driver is asleep can vary depending on the context and the specific guidelines that different organizations or researchers follow. Commonly, a predefined threshold of a few seconds (1-2 seconds) is set to trigger an alert that notifies the driver to stay attentive or take a break.
The tired state is based on the PERCLOS score. The input of this function is the EAR score, and the output boolean value indicates whether the driver is tired or not. The PERCLOS threshold indicates the maximum time of eye closure allowed in 60 seconds (the default is 0.2, i.e., 20% of 1 minute). Below is a flowchart for driver-tired estimation:
To estimate whether the driver is in a fatigue condition, this module counts the yawns (yawn_count) in a predefined period (yawn_time_period). When the number of yawns reaches a certain threshold (yawn_thres) over the time cycle, the fatigue condition is assumed to be true.
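A compact sketch of the SoA decision logic described above is given below; th_EAR = 0.1 and the PERCLOS threshold of 0.2 follow the text, while the eye-closure duration and yawn-frequency thresholds are assumed values.

```python
def estimate_soa(ear, eyes_closed_seconds, perclos, yawn_freq, gaze_focused,
                 th_ear=0.1, th_time_ear=2.0, th_perclos=0.2, th_yf=3):
    """Status-of-Attention decision following the SoA equation above.
    perclos is a fraction (0.2 = 20% of the last minute); th_yf is the
    assumed yawn-frequency threshold per observation period."""
    if ear <= th_ear and eyes_closed_seconds >= th_time_ear:
        return "Asleep"
    if perclos >= th_perclos:
        return "Tired"
    if yawn_freq >= th_yf:
        return "Fatigue"
    if not gaze_focused:
        return "Distracted"
    return "Normal"
```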
Once all the key points and landmarks for the head pose are detected using MediaPipe, we algebraically calculate the head pose Euler angles: pitch (β), yaw (α), and roll (γ). These parameters are then passed to a custom machine-learning model for gaze zone mapping and distraction estimation.
Figure 3.6 Head pose estimation flowchart
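One common way to obtain the pitch, yaw, and roll angles from 2D landmark positions is OpenCV's solvePnP, sketched below as an assumption; the thesis computes the angles algebraically from the face mesh, so this is only an equivalent illustration with an approximate camera matrix.

```python
import cv2
import numpy as np

def head_pose_euler(image_pts_2d, model_pts_3d, frame_w, frame_h):
    """Estimate (pitch, yaw, roll) in degrees from matched 2D/3D landmark sets.
    image_pts_2d: Nx2 pixel coordinates of the pose landmarks.
    model_pts_3d: Nx3 reference (model-space) coordinates of the same landmarks."""
    focal = frame_w                                   # rough focal-length guess
    cam_matrix = np.array([[focal, 0, frame_w / 2],
                           [0, focal, frame_h / 2],
                           [0, 0, 1]], dtype=np.float64)
    dist = np.zeros((4, 1))                           # assume no lens distortion
    ok, rvec, _ = cv2.solvePnP(model_pts_3d.astype(np.float64),
                               image_pts_2d.astype(np.float64),
                               cam_matrix, dist)
    rot, _ = cv2.Rodrigues(rvec)                      # rotation vector -> matrix
    angles, *_ = cv2.RQDecomp3x3(rot)                 # Euler angles in degrees
    return angles
```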
The gaze score of both eyes is also collected to enhance driver gaze estimation. This is especially important in situations where the head pose may not accurately reflect the driver's gaze direction, such as when the head is turned but the eyes are focused elsewhere.
Figure 3.7 Gaze score estimation flowchart
Driver gaze tracking
Normally, most gaze tracking systems are based on a very simple technique: using thresholds on yaw and pitch to determine whether the driver is looking left, right, up, or down. However, such a method has many disadvantages: poses differ between people and between cars, a unified threshold is difficult to find, and it is hard to decide whether the driver is distracted based only on whether they are looking right or left. Hence, it is almost unrealistic to apply this approach in real life.
We propose a method that divides the driver's gaze into 7 main areas (as described in the figure below): the front windshield, the left, right, and rear-view mirrors, the dashboard, and the infotainment area. Then, based on which area the gaze falls into, we can decide whether the driver is distracted or not.
Figure 3.8 Driver's gaze areas break down
To achieve that, a custom machine learning (ML) model is built to map the head pose and gaze scores to predefined area labels. With head pose and gaze score data, we can precisely decode the head action to see where the driver is looking. Below are the steps to train an ML gaze classification model for gaze mapping:
- Step 1: Capture data and export to CSV
- Step 2: Train ML model using Scikit Learn
- Step 3: Make detections with the model
3.2.1 Capture head pose Euler angles and export to CSV file
The initial step involves creating a comma-separated values (CSV) file. The first column should contain the class names (labels). Subsequent columns should include the head pose Euler angles and gaze scores: roll, pitch, yaw, left eye gaze score, and right eye gaze score.
Table 3.1 Gaze data capture to CSV

| class | roll | pitch | yaw | left_gaze_score | right_gaze_score |
| labels | roll angle | pitch angle | yaw angle | left eye gaze score | right eye gaze score |
The labels, according to cabin area breakdown, include looking at the front, left mirror, right mirror, rearview mirror, dashboard, and infotainment system
The next step is to capture data using the methods presented in the previous chapter and append these data to the CSV file. Multiple recordings of the driver naturally gazing at a specific area are captured in this data collection step.
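A minimal sketch of this data-capture step is shown below; the file name and helper function are illustrative, and the column order follows Table 3.1.

```python
import csv

CSV_PATH = "gaze_data.csv"   # illustrative file name

def append_sample(label, roll, pitch, yaw, left_gaze, right_gaze):
    """Append one labelled gaze sample in the Table 3.1 column order."""
    with open(CSV_PATH, "a", newline="") as f:
        csv.writer(f).writerow([label, roll, pitch, yaw, left_gaze, right_gaze])

# usage while recording a clip of the driver looking at, e.g., the left mirror:
# append_sample("left_mirror", roll, pitch, yaw, left_score, right_score)
```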
3.2.2 Train machine learning gaze classification model
Scikit-learn is a popular open-source machine learning library for the Python programming language. Because of its many features for data analysis and model assessment, along with its ease of use and efficiency, it is widely used in data science. Key features:
- Easy to use and effective tools: Scikit-learn offers an array of effective and easy-to-use tools that address different facets of statistical modeling and machine learning
- Diverse algorithms: It offers numerous algorithms for dimensionality reduction, grouping, regression, and classification
For this project, we try multiple classification algorithms and then pick the one with the best result: Logistic Regression, Ridge Classifier, Random Forest Classifier, and Gradient Boosting Classifier. Detailed explanations of each algorithm are not our focus here; we simply utilize and test the performance of each.
The pipeline for training, evaluating, and saving ML models using Scikit-learn is as follows (a minimal sketch is given after the list):
- Environment setup, import dependencies:
  o pickle for saving the trained models
  o pandas for data manipulation and analysis
  o train_test_split for splitting the dataset into training and testing sets
  o make_pipeline for creating machine learning pipelines
  o StandardScaler for standardizing the features
  o accuracy_score for evaluating model accuracy
- Data loading: Loads the dataset from a csv file
- Data preparation: Splits the dataset into features ('X') and labels ('y'). Further splits the data into training and test sets with a 70-30 split
- Pipeline creation: Creates pipelines for each classification algorithm Each pipeline includes a StandardScaler to standardize the features and a classifier
- Model training: Trains each pipeline on the training data and stores the fitted models in a dictionary
- Model evaluation: Evaluates each trained model on the test data and computes an accuracy score
Figure 3.9 The accuracy score of each model with a different algorithm
- Model saving: Saves each trained model as a pkl file using pickle
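The sketch below mirrors the pipeline described in the list; the CSV file name, pipeline keys, and random seed are assumptions.

```python
import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Data loading and preparation (70-30 split).
df = pd.read_csv("gaze_data.csv")                    # illustrative file name
X, y = df.drop("class", axis=1), df["class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=1234)

# One pipeline per algorithm: standardize features, then classify.
pipelines = {
    "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "rc": make_pipeline(StandardScaler(), RidgeClassifier()),
    "rf": make_pipeline(StandardScaler(), RandomForestClassifier()),
    "gb": make_pipeline(StandardScaler(), GradientBoostingClassifier()),
}

fit_models = {}
for name, pipeline in pipelines.items():
    fit_models[name] = pipeline.fit(X_train, y_train)           # train
    acc = accuracy_score(y_test, fit_models[name].predict(X_test))
    print(name, acc)                                             # evaluate
    with open(f"{name}.pkl", "wb") as f:                         # save
        pickle.dump(fit_models[name], f)
```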
3.2.3 Make detections using the model
Real-time captured gaze data is then passed through the saved pkl model to predict the gaze area. The output of each prediction is the gaze-area label and the probability of that prediction.
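A minimal sketch of this prediction step, assuming the Random Forest model file from the previous section, is shown below.

```python
import pickle
import pandas as pd

with open("rf.pkl", "rb") as f:          # whichever saved model performed best
    model = pickle.load(f)

def predict_gaze_area(roll, pitch, yaw, left_gaze, right_gaze):
    """Return the predicted gaze-area label and its probability (if available)."""
    row = pd.DataFrame([[roll, pitch, yaw, left_gaze, right_gaze]],
                       columns=["roll", "pitch", "yaw",
                                "left_gaze_score", "right_gaze_score"])
    label = model.predict(row)[0]
    prob = model.predict_proba(row).max() if hasattr(model, "predict_proba") else None
    return label, prob
```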
Choosing camera for DMS
The Intel RealSense D435i is used in this project mainly for its built-in infrared (IR) capabilities, which are especially advantageous for night-time or low-light conditions. By utilizing IR technology, the RealSense D435i can accurately track facial features, head movements, and eye gaze even in challenging lighting scenarios, enabling continuous monitoring of the driver's behavior and alertness. Some of its specifications are listed below:
- Depth output resolution: up to 1280 x 720
- Depth frame rate: up to 90 fps
- RGB sensor technology: global shutter
- RGB sensor resolution: up to 1920 x 1080
- RGB frame rate: up to 90 fps
- Infrared Projector: Enhances depth perception, especially in low-light environments
Criteria for camera set-up:
- Give the best view of the driver’s face
- Should focus only on the driver
- Not blocked by the steering wheel
- Easy to set up and remove when needed
The distance between the driver’s face and the camera in this setup is 50-58 cm, depending on the seat position.
Occupant monitoring system design
Hardware Overview
This project uses the AWRL6432BOOST evaluation radar board from Texas Instruments. The AWRL6432BOOST is a user-friendly, low-power 60 GHz mmWave sensor evaluation kit designed for the AWRL6432, featuring an FR4-based antenna. This board enables access to point-cloud data and offers power-over-USB interfaces for added convenience.
Figure 4.1 AWRL6432BOOST evaluation board
Figure 4.2 Block diagram of AWRL6432
TI provides valuable resources for evaluating its products. This guide helps set up the board for operation; it outlines the necessary steps for a successful setup, ensuring users can leverage the full capabilities of TI's radar products.
- Step 1: Download and install the packages provided by Texas Instruments (TI). These include a comprehensive software development kit (SDK) for working with mmWave low-power devices, as well as a Radar Toolbox for evaluating device performance. These resources greatly simplify the process of starting development on TI boards.
- Step 2: Set up the board
After plugging the Micro-USB connector into the computer, setting up the UART terminal is necessary. Go to Device Manager, open Ports (COM & LPT), and check for two ports named XDS110. If the ports do not show up as in the figure below, download the latest XDS Emulation Software (EMUPack). Note that the COM port numbers change from PC to PC.
Figure 4.3 COM Ports of the AWRL6432BOOST
Although the board shows up as two ports in the Device Manager, the device only needs a single UART port for both device configuration and communication of processed data to the PC, which is COM4 in this case.
The SOP switches must be set to flashing mode to flash the initial software onto the board. Then, press the reset button directly below S1 to lock in the mode.
Figure 4.4 SOP switches S1 and S4 from left to right
Figure 4.5 Reset button

Table 4.1 Flash mode SOP configuration

| Mode | S1.1 | S1.2 | S1.3 | S1.4 | S1.5 | S1.6 | S4.1 | S4.2 | S4.3 | S4.4 |
| Flash | Off | Off | Off | Off | On | On | Off | Off | On | - |
Then, prepare the binary flash file. There are two ways to obtain this flash file: one is to use the file from the packages downloaded in Step 1, and the other is to create the file using the flash tool.
Each flash file provided in the SDK is created based on the flash file above. However, some demos and examples have slight differences that prevent the flashed software from working correctly with every example. Therefore, using the APPIMAGE file provided in the same folder as the example the user is working with is advised.
In the case of this project, we opted for the Capon2D example as the base software for our system, which requires the given Capon2D app image file in the same folder
The Low Power Visualizer is the tool used to flash the board with the initial software. This tool is located at:
\tools\visualizer\Low_power_visualizer_5.3.0.0
Running the visualizer will require Google Chrome as the default browser; otherwise, do the following:
Figure 4.8 Chrome is not found

After going to the URL mentioned, the Visualizer should open like so:
Figure 4.9 Visualizer home page

Now, navigate to the Flash tab and follow the steps in the app to flash:
Figure 4.10 Select the UART COM port
Figure 4.11 Select the device type
Make sure the SOP Configurations are in flashing mode, as mentioned previously
Figure 4.12 SOP Configuration

Select the prepared flash file and start flashing. The success screen should look like this:
After flashing, switch the SOP configuration to functional mode as shown in the success screen or in the table below. Then press the reset button to save the file to the on-board memory.
Table 4.2 Functional mode SOP configuration

| Mode | S1.1 | S1.2 | S1.3 | S1.4 | S1.5 | S1.6 | S4.1 | S4.2 | S4.3 | S4.4 |
| Functional | On | Off | Off | Off | On | On | Off | Off | On | - |
The USER_LED should light up, indicating successful flashing. After this, the board is ready for operation, and the visualizer should be shut down.
Figure 4.14 USER_LED lighting up
UART communication and processing data
To start using a flashed program on the board, a specific sequence of commands is necessary to initiate its operation. This sequence is stored in a file that is loaded and transmitted to the serial port via the command line. These commands configure the operating modes and other parameters, such as the baud rate, for the board. Additional details regarding this configuration file can be found in the SDK.
After the configuration commands are sent, ending with the command sensorStart 0 0 0 0, the board goes into operation mode, sending point cloud and other relevant data through UART. Deciphering and processing this data is the main point of this section.
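A minimal sketch of sending the configuration file over UART using the pyserial package is shown below; the COM port, baud rate, and file name are assumptions and must match the actual setup and SDK documentation.

```python
import time
import serial   # pyserial

def send_config(port="COM4", cfg_path="config.cfg", baud=115200):
    """Send the radar configuration file to the board line by line over UART."""
    with serial.Serial(port, baud, timeout=1) as uart, open(cfg_path) as cfg:
        for line in cfg:
            line = line.strip()
            if not line or line.startswith("%"):   # skip blanks and comment lines
                continue
            uart.write((line + "\n").encode())
            time.sleep(0.05)                       # give the device time to parse
            print(uart.read(uart.in_waiting or 1).decode(errors="ignore"))
```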
A byte of data (8 bits) sent through UART is depicted above, and a structured series of bytes received is called a packet. The contents of the packet sent from the device are as follows:
For every packet of data the board transmits, many types of data can be extracted, most importantly the coordinates of the detected points in 3D space. The packet components are explained below.
Sync/Start: The start of every packet is the same, called the “Magic Word”, with 8 bytes of data representing the hexadecimal values:
Identifying this specific series of numbers is crucial for the recognition of any packet
The header comprises 32 bytes of data, with each set of 4 bytes representing a different parameter. These parameters include the version, total packet length, platform, frame number, CPU cycle time, number of detected objects, number of TLVs, and sub-frame number. The values are received in this same order.
The main data is presented to the user as TLVs, each of which comprises three main parts:
- Type: An indicator value, usually an ID for the data type
- Length: Indicating the size of the following Value field in bytes
The information in the Value field depends on the Type field, as it indicates what kind of data is in the Value field. In this project's case, the value of the Type field is 301, indicating that the Value field contains point cloud data, according to the SDK.
Type: 301; Length: 20 bytes + (10 bytes × number of detected objects); Value: compressed version of the point cloud in Cartesian coordinates
The Value field comprises a Unit value and Coordinate values. The final world coordinates are determined by multiplying the Unit value with the Coordinate values.
Table 4.5 Unit value format and coordinates value format

Coordinates value (10 bytes × point number): field types uint16_t, uint16_t, uint16_t, uint16_t, uint8_t, uint8_t
Unit value: field types float, float, float, float, uint16_t
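Putting the packet description together, the sketch below finds the magic word, unpacks the 32-byte header, and walks the TLVs, splitting a type-301 value into its 20-byte unit block and 10-byte point records; the magic-word bytes and the assumption that Type and Length are 4-byte integers must be checked against the SDK.

```python
import struct

# The 8-byte "Magic Word" that starts every packet; take the exact value
# from the SDK documentation (the bytes below are only a placeholder).
MAGIC_WORD = bytes.fromhex("0201040306050807")

HEADER_FMT = "<8I"          # 8 x uint32: version, totalPacketLen, platform,
HEADER_LEN = 32             # frameNumber, cpuCycles, numDetObj, numTLVs, subFrame

def parse_packet(buf: bytes):
    """Return (header fields, list of TLVs) from a raw UART buffer, or None."""
    start = buf.find(MAGIC_WORD)
    if start < 0:
        return None
    pos = start + len(MAGIC_WORD)
    header = struct.unpack_from(HEADER_FMT, buf, pos)
    num_tlvs = header[6]
    pos += HEADER_LEN
    tlvs = []
    for _ in range(num_tlvs):
        tlv_type, tlv_len = struct.unpack_from("<2I", buf, pos)   # assumed 4-byte fields
        value = buf[pos + 8: pos + 8 + tlv_len]
        if tlv_type == 301:                        # compressed point cloud
            unit, points = value[:20], value[20:]  # 20-byte unit + 10 bytes per point
            tlvs.append(("point_cloud", unit,
                         [points[i:i + 10] for i in range(0, len(points), 10)]))
        pos += 8 + tlv_len                         # length counts the Value field only
    return header, tlvs
```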
Zone mapping and occupancy detection
The output point cloud format contains the detected points' x, y, and z values alongside other relevant information such as velocity and signal-to-noise ratio (detection accuracy). However, these point cloud coordinates are distances from the detected point to the radar. To correctly map the points and use these coordinates for later processing, they should be converted to global coordinates.
Figure 4.17 Sensor coordinates to world coordinates
The conversion depends on the sensor position, which is predefined in the configuration file. The result is an array containing the information related to each detected point's location and its signal-to-noise ratio (SNR).
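A minimal sketch of such a conversion is shown below, assuming the sensor pose is described only by a mounting height and a tilt about the x-axis; the actual transform follows the values in the configuration file.

```python
import numpy as np

def to_world(points_xyz, sensor_height=1.0, tilt_deg=-10.0):
    """Convert radar-frame points (N x 3) to cabin (world) coordinates.
    sensor_height and tilt_deg are illustrative; in practice both are read
    from the configuration file."""
    t = np.radians(tilt_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(t), -np.sin(t)],
                      [0, np.sin(t),  np.cos(t)]])
    world = points_xyz @ rot_x.T        # undo the sensor tilt
    world[:, 2] += sensor_height        # shift heights from sensor to floor level
    return world
```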
| Detected points | Point 1 | Point 2 | Point 3 | … | Point [n] |
| x | x1 | x2 | x3 | … | xn |
| y | y1 | y2 | y3 | … | yn |
| z | z1 | z2 | z3 | … | zn |
The zone is defined by a series of cuboids, which are essentially rectangular volumes characterized by depth (y direction), width (x direction), and height (z direction). These cuboids serve to approximate the space where occupants may be seated.
A detected point in the point cloud residing within any of these cuboids will be considered in the calculations for determining occupancy. For conventional vehicle seats, footwell areas are included to check for the presence of children and pets. Consequently, three cuboids are defined:
- Cuboid 1: Head and chest area
Figure 4.18 Cuboid locations in a seat zone
The number of zones (seats) and their cuboid dimensions are predefined in the configuration file. The points are then assessed to determine whether they are within the cuboids of each zone using simple comparisons: if x_min ≤ x_point ≤ x_max AND y_min ≤ y_point ≤ y_max AND z_min ≤ z_point ≤ z_max, the point lies within the cuboid of that zone. The output is an array where each element indicates whether a point is inside a zone's cuboid (1) or not (0).
After assigning the detected points to their respective zones, the occupancy of each zone can be assessed. To summarize the detection process, the program compares the number of points detected in a zone and that zone's average signal-to-noise ratio (SNR) with their respective predefined thresholds in the configuration file. The zone state is updated with each data frame received if the conditions are satisfied.
Figure 4.19 Occupancy detection for each zone
In simple terms, when the state is 0, the system actively searches for movement signals to transition the state to 1. Once the zone becomes occupied, the system continuously monitors for signals to maintain that occupied state. However, if no movement is detected for a certain period, the state is updated back to 0.
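The sketch below illustrates the point-in-cuboid test and the per-zone state update described above; the point-count, SNR, and hold-time thresholds are illustrative, since the real values come from the configuration file.

```python
def point_in_cuboid(p, cuboid):
    """cuboid = (xmin, xmax, ymin, ymax, zmin, zmax); p = (x, y, z)."""
    return (cuboid[0] <= p[0] <= cuboid[1] and
            cuboid[2] <= p[1] <= cuboid[3] and
            cuboid[4] <= p[2] <= cuboid[5])

def update_zone_state(state, empty_frames, zone_points, zone_snrs,
                      min_points=5, min_avg_snr=10.0, max_empty_frames=30):
    """One frame of the per-zone state machine: returns (new_state, empty_frames)."""
    detected = (len(zone_points) >= min_points and
                sum(zone_snrs) / max(len(zone_snrs), 1) >= min_avg_snr)
    if detected:
        return 1, 0                                   # occupied, reset the timeout
    empty_frames += 1
    if state == 1 and empty_frames < max_empty_frames:
        return 1, empty_frames                        # hold the occupied state briefly
    return 0, empty_frames                            # no movement for too long
```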
As previously mentioned, detecting and alerting a parent or guardian about a child left behind in a vehicle is critical in occupancy detection. This section outlines our proposal for detecting a child in the cabin. To simulate a child's presence, we use a device with a battery-powered electric motor that drives the main shaft in an oscillating motion. This commercially available product is usually placed inside baby dolls to replicate a child's breathing. The goal of this system is to detect the small movements of this device.
Figure 4.20 Breath simulation device extended (left) and retracted (right)
Figure 4.21 Baby dummy for CPD testing
In-cabin Sensing System Set-up and Graphic User Interface (GUI)
Embedded hardware for In-cabin sensing demonstration
Data captured from the AWRL6432 radar and the Intel RealSense D435i camera are sent to an embedded computer running the In-cabin sensing main software with DMS and OMS features. Graphic plots are the output of this demo setup.
Software flowchart for In-cabin sensing demonstration
The main software for the In-cabin sensing solution integrates all features of the DMS and OMS. The real-time system simultaneously takes RGB and infrared frames from the camera and the point cloud from the radar, processes these data, and displays the output on the GUI.
Graphic user interface (GUI)
The basic GUI used to display both subsystems of the In-cabin sensing system is described in this section One side displays the output of the OMS, showing occupant locations in the vehicle The other side displays the output of the DMS with its relevant parameters Python and OpenCV remain the main tools for designing this module.
Figure 5.3 GUI for the In-cabin sensing system
The OMS main function outputs a list containing six values, each representing one zone's seat occupancy state These values are either 0 for unoccupied or 1 for occupied Using this data, rectangles are drawn over a top-view image of the Ford Focus to indicate occupancy
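A minimal OpenCV sketch of this overlay; the image file name and the pixel rectangles for the six zones are placeholders chosen for illustration, not the actual layout used in the GUI:

import cv2

# Placeholder pixel rectangles (x0, y0, x1, y1) for the six zones on the top-view image
SEAT_RECTS = [(60, 40, 140, 120), (60, 160, 140, 240),     # front seats
              (200, 30, 270, 100), (200, 110, 270, 180),   # rear seats
              (200, 190, 270, 260), (150, 110, 190, 180)]  # rear seat / footwell

def draw_occupancy(top_view, zone_states):
    """Draw a green rectangle over every zone reported as occupied (state == 1)."""
    frame = top_view.copy()
    for (x0, y0, x1, y1), occupied in zip(SEAT_RECTS, zone_states):
        if occupied:
            cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)
    return frame

car_img = cv2.imread("ford_focus_top_view.png")        # placeholder image path
overlay = draw_occupancy(car_img, [1, 0, 0, 1, 0, 0])   # example OMS output list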
The DMS provides real-time monitoring of the driver's facial features, including eye aspect ratio (EAR), eyelid closure duration (PERCLOS), head pose Euler angles, and yawn counts It also estimates the driver's condition, such as asleep, tired, or fatigued, as well as the direction of their gaze These results are displayed on the screen for easy monitoring.
During development, we discovered that the radar transmits data at a slower rate than the camera, and integrating both radar and camera data within the same window significantly impacts the frame rate and overall performance This bottleneck occurs because the loop that runs the GUI cannot advance to the next cycle until it receives the next package from the radar, even if the camera has already transmitted its next package Because of this, we employed multiprocessing for our GUI window, running two separate processes simultaneously, which effectively eliminates the issue.
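A reduced sketch of that structure using Python's multiprocessing module, with one queue per sensor so the radar's slower frame rate can no longer stall the camera path; the reader and render functions here are stand-ins for the real acquisition and drawing code:

import multiprocessing as mp
import time

def read_radar_frame():
    time.sleep(0.1)                 # stand-in for UART parsing at roughly 10 Hz
    return {"points": []}

def read_camera_frame():
    time.sleep(0.033)               # stand-in for a roughly 30 Hz camera grab
    return "camera frame"

def radar_worker(queue):
    while True:
        queue.put(read_radar_frame())

def camera_worker(queue):
    while True:
        queue.put(read_camera_frame())

def gui_loop(radar_queue, camera_queue):
    latest_radar = None
    while True:
        frame = camera_queue.get()              # camera frames drive the display rate
        while not radar_queue.empty():
            latest_radar = radar_queue.get()    # take radar data whenever it arrives
        print("render", frame, latest_radar)    # stand-in for the OpenCV draw calls

if __name__ == "__main__":
    radar_q, camera_q = mp.Queue(maxsize=2), mp.Queue(maxsize=2)
    mp.Process(target=radar_worker, args=(radar_q,), daemon=True).start()
    mp.Process(target=camera_worker, args=(camera_q,), daemon=True).start()
    gui_loop(radar_q, camera_q)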
Testing and refinement
Camera calibration using Python and OpenCV
In computer vision applications, precise input data is essential, and camera calibration typically marks the initial phase in setting up a computer vision pipeline Camera calibration involves estimating the parameters of a camera to establish an accurate relationship between a 3D point in the real world and its corresponding 2D projection (pixel) in the image This process considers internal parameters, such as the focal length, optical center, and radial distortion coefficients of the lens, as well as external parameters, such as the rotation and translation of the camera with respect to a real-world coordinate system
Libraries used in this process include OpenCV and Numpy (for working with arrays)
The process of calibration using Python and OpenCV is as follows:
- Step 1: Define real-world coordinates - measure 3D real points of a checkerboard pattern
- Step 2: Take images of the checkerboard from different viewpoints
Figure 6.1 Images of checkerboard for camera calibration
- Step 3: Detect chessboard corners - use OpenCV's ‘findChessboardCorners()’ method to find the pixel coordinates (u, v) for each 3D point in the captured images
- Step 4: Calibrate the camera - use the ‘calibrateCamera()’ method to find the camera parameters (a short code sketch of the full pipeline follows the output list below)
The output of this process includes:
- Camera Matrix: Transforms 3D object points to 2D image points
- Distortion Coefficients: Model the distortion introduced by the camera lens
- Rotation and Translation Vectors: Describe the camera's position and orientation in the world.
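Putting the steps and outputs above together, a minimal calibration sketch with OpenCV; the checkerboard size, square size, and image folder are assumptions for illustration, and calibration images are assumed to exist in that folder:

import glob
import cv2
import numpy as np

CHESSBOARD = (9, 6)            # inner corner count of the pattern (assumed)
SQUARE_SIZE = 0.025            # checkerboard square edge in meters (assumed)

# Step 1: real-world coordinates of the corners, on the z = 0 plane
objp = np.zeros((CHESSBOARD[0] * CHESSBOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CHESSBOARD[0], 0:CHESSBOARD[1]].T.reshape(-1, 2) * SQUARE_SIZE

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.png"):                       # Step 2: captured viewpoints
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, CHESSBOARD)   # Step 3
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Step 4: estimate the camera matrix, distortion coefficients, and extrinsics
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("Camera matrix:\n", mtx)
print("Distortion coefficients:", dist.ravel())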
Driver monitoring experimental results
Precision and accuracy are two different metrics used to evaluate the performance of a machine learning model, particularly in classification tasks
Accuracy is the ratio of correctly predicted instances to the total instances, giving an overall measure of the model's performance:
Accuracy = (C_F / T_F) × 100 %
where T_F is the total number of frames recorded in each test case, and C_F is the number of frames that are correctly identified
Precision is the ratio of correctly predicted positive observations to the total predicted positives, focusing on the quality of positive predictions:
Precision = TP / (TP + FP)
where TP is the number of true positive predictions and FP is the number of false positive predictions
While accuracy is used for the driver drowsiness monitoring system as a general performance measure, precision is used for driver gaze tracking to check the quality of each prediction.
Two drivers (both licensed, with an average age of 22, both wearing glasses) spent two weeks tuning and testing this system, each spending at least 2 hours behind the wheel of a stationary vehicle under different lighting conditions
6.2.1 Accuracy of driver drowsiness monitoring system
Four driver statuses (normal, asleep, tired, and fatigued) were tested based on the SoA equation, which relates to driver conditions Each test case lasted 2 minutes and was checked multiple times using a frame counter in the software The results were recorded and analyzed to assess the impact of driver fatigue on driving performance.
Figure 6.2 Normal state and Asleep state
Figure 6.3 Tired state and fatigued state
The accuracy of the drowsiness system was checked with various driver behaviors and evaluated using the accuracy equation defined above:
Table 6.1 Testing with both eyes closed
Columns: Participant, Test case, T_F, Accuracy (%), False alarm
Columns of the yawn-detection test table: Participant, Test case, Total yawns, Yawns detected, False alarm
6.2.2 Precision of driver gaze tracking
The precision of each trained model is plotted for evaluation:
Figure 6.4 Precision scores of different models
The precision scores for each class, differentiated by model, allow us to compare the performance of different models across each class:
Figure 6.5 Precision scores for each class by different models
To further evaluate the performance of the custom machine learning model, the Confusion Matrix is used to visualize the performance of the classification algorithm by showing the counts of true positive, true negative, false positive, and false negative predictions:
Figure 6.6 Confusion matrix for Logistic Regression algorithm pipeline
Figure 6.7 Confusion matrix for Ridge Classifier algorithm pipeline
Figure 6.8 Confusion matrix for Random Forest Classifier algorithm pipeline
Figure 6.9 Confusion matrix for Gradient Boosting Classifier algorithm pipeline
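For reference, a confusion matrix and per-class precision scores of the kind shown in Figures 6.5 to 6.9 can be generated as in the sketch below; it assumes scikit-learn (consistent with the algorithm names above) and uses a synthetic dataset in place of the real landmark features:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix, precision_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Synthetic stand-in for the gaze dataset: 6 classes for the 6 gaze regions
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Counts of correct and incorrect predictions per class, as in Figures 6.6 to 6.9
ConfusionMatrixDisplay(confusion_matrix(y_test, y_pred)).plot(cmap="Blues")
plt.show()

# Per-class precision scores, as compared in Figure 6.5
print(precision_score(y_test, y_pred, average=None))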
Gaze tracking is achieved using a trained machine-learning model that analyzes each captured frame For each frame, the model predicts the gaze by estimating the area being looked at and the probability of that estimate:
Figure 6.10 Looking front and rear-view mirror
Figure 6.11 Looking left and right mirror
Figure 6.12 Looking at the Dashboard and Infotainment area
Figure 6.13 Testing in low-light conditions
At night, the lighting conditions in the cabin are quite poor To address this, the Driver Monitoring System (DMS) utilizes infrared imaging to improve visibility While the system's performance at night is not entirely equivalent to its daytime operation, it still detects all facial landmarks and maintains full functionality of all its features However, some limitations were observed:
- Wearing colored sunglasses blocks all features that need eye landmark data
- The face cannot be detected if it is covered, and the system sometimes detects a passenger's face instead
- Wearing a mask disables the fatigue features, which rely on the yawn count.
Occupancy Monitoring System testing
This part covers the methods and processes involved in tuning the performance of the OMS This step primarily entails adjusting parameters in the Command Line Interface (CLI) configuration file responsible for the operation of the board, as well as the reference values used in detection When determining the presence of passengers inside a cabin, three factors must be considered: the cabin parameters, the quality of the detected signal, and the processing parameters Each of these factors is discussed in the following sections
To map the detected points to their dedicated zones, the reference point of the seating area must be defined The origin of the car coordinates is assumed to be behind the center console and across from the gas pedal; the x-axis is the car's width with positive towards the driver's side, the y-axis is the length of the vehicle, and the z-axis is the height
Figure 6.14 Origin of the car coordinates
First, the cabin dimensions are measured and defined in the file using the command "interiorBounds" After measuring, the cabin size of the Ford Focus is 1.4 × 2.1 × 1.16 meters Therefore, the syntax of the command is as follows:
Table 6.3 Cabin dimension CLI configuration
interiorBounds -0.7 0.7 0 2.1
Arguments, in order: min X (meters), max X (meters), min Y (meters), max Y (meters)
Crucially, the sensor's position must be measured to enable the conversion of radar-relative coordinates to car coordinates for zone mapping The reference orientation of the radar mount is zero degrees of rotation in all planes.
Figure 6.15 Sensor mounting position at 0 degrees
In this project's case, the radar board is mounted on the vehicle's roof, facing downwards, making the rotation angle in the y-z plane 90 degrees The measurements and the command syntax are as follows:
Table 6.4 Radar mount location CLI configuration
sensorPosition 0 1.26 1.12 90 0 0
Arguments, in order: x (meters), y (meters), z (meters), y-z plane rotation (degrees), x-y plane rotation (degrees), x-z plane rotation (degrees)
Figure 6.16 Sensor mount at 90 degrees in the y-z plane
Finally, the zones and their cuboid dimensions are specified These measurements are important for defining the region of interest, allowing for focused detection within specific areas and thus simplifying the detection process In particular, the areas under the seat and the space directly in front of the occupant are disregarded, as most individuals sit with their backs against the seat backrest
Figure 6.17 Setting the region of interest
Measuring parameters along the x-axis is straightforward since the seats are symmetrical about the cabin's centerline Thus, only one side of the vehicle needs to be measured, and the other side can be inferred by symmetry
The y and z axes are more complicated because their values differ at different parts of the human body For instance, the footwell area lies further forward than the body cuboids on the y-axis, and the head area of a rear seat differs from that of a front seat
Figure 6.19 Measuring y and z axis dimensions
Minimizing zone overlap and inaccurate point mapping requires separating the cuboid boundaries This tuning step is crucial, as slight offsets can lead to missed detections and false detections, particularly in vibrating vehicles In the Ford Focus, we employed six primary zones: two front seats (each with three cuboids), three back-seat zones, and a footwell zone Precise measurements ensure correct zone cuboid syntax and results.
Table 6.5 Cuboid dimensions CLI configuration
cuboidDef 1 1 0.13 0.7 0.77 1.07 0.7 1
Arguments, in order: zone no., cuboid no., x min, x max, y min, y max, z min, z max
Table 6.6 Cuboid dimensions, in meters
Columns: zone no., cuboid type, cuboid no., x min, x max, y min, y max, z min, z max
During testing, the rear seat zones often overlapped due to their close proximity The presence of a person in seat 3, with a leg extending into seat 4, could trigger presence detection in both zones To compensate, the sensing zones had to be narrowed to mitigate this issue.
As indicated in the SDK, the command responsible for filtering the point cloud, i.e., the number of points detected excluding environmental noise, is cfarCfg This CLI command regulates the CFAR (Constant False Alarm Rate) algorithm, which sets the power threshold that determines whether a return signal is likely from a target or from noise
Three main parameters in this command's syntax are considered, as noted below:
Table 6.7 Signal quality CLI configuration
cfarCfg 2 8 4 3 0 9.5 0 1 4 6 2 9.5 0.85 0 0 0 1 1 1 1 0.5 1.5 0.15
The parameters of interest are thresholdScale[0], thresholdScale[1], and sidelobeThreshold
Detection performance is highly dependent on the threshold settings A lower threshold can lead to less reliable detections, resulting in more noise around the target On the other hand, a higher threshold risks missing valuable detection points This loss of information impacts the signal-to-noise ratio (SNR) of the detected target points, ultimately affecting detection quality and accuracy.
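To illustrate this trade-off, the sketch below implements a simple one-dimensional cell-averaging CFAR (not the exact algorithm inside the TI SDK, and using a linear threshold factor rather than the SDK's dB-scaled parameters): lowering the threshold produces extra detections from the noise floor, while raising it keeps only the strongest returns.

import numpy as np

def ca_cfar(power, guard=2, train=8, threshold_scale=9.5):
    """1-D cell-averaging CFAR: flag cells whose power exceeds the local
    noise estimate (from the training cells) by the threshold factor."""
    detections = []
    for i in range(train + guard, len(power) - train - guard):
        left = power[i - guard - train:i - guard]
        right = power[i + guard + 1:i + guard + 1 + train]
        noise = np.mean(np.concatenate([left, right]))
        if power[i] > threshold_scale * noise:
            detections.append(i)
    return detections

rng = np.random.default_rng(0)
range_profile = rng.exponential(1.0, 256)    # noise floor
range_profile[100] += 40.0                   # one strong target return
print("High threshold:", ca_cfar(range_profile, threshold_scale=9.5))
print("Low threshold: ", ca_cfar(range_profile, threshold_scale=2.0))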
The following test was performed with a person sitting in the driver seat, with no engine vibration; the detected points are shown using the MATLAB visualizer provided by TI to demonstrate the impact of thresholding As can be seen from the number of points plotted on the zone map, there is significantly more noise in the second picture
Figure 6.20 Detected points at higher thresholds (more accurate)
Figure 6.21 Detected points at lower thresholds (less accurate)
This section is also responsible for the performance of the CPD system Due to the small size of the oscillating device, the movement replicating a child's breathing cycle could be mistaken by the radar for background noise Thus, it is necessary to balance precise, noise-filtered occupancy detection against child presence detection that is sensitive enough to detect this device
These parameters are the thresholds for the occupancy detection step described in section 4.3.3 The command responsible for these parameters is occStateMach, with the syntax shown below:
Table 6.8 Processing parameters CLI configuration
occStateMach 0 7 9.0 3 23 1 2 10.0 15 1 100000.0
Conclusion and future works
Conclusion
Through the exploration of “In-cabin sensing: Research and development of a real-time driver and occupant monitoring system using computer vision and radar imaging”, this study has made significant progress toward achieving its research objectives The report successfully addressed the following efforts:
- Present the theoretical basis of computer vision, radar signal processing, and their related fields; the concept and applications of an In-cabin sensing system are also defined
- Design a real-time driver monitoring system that detects drowsiness and distraction using RGB and infrared images
- Design an Occupant Monitoring System that detects individuals in their seats and includes Child Presence Detection functionality by utilizing the number of reflected points and the signal quality
- Develop a combined system with a basic Graphical User Interface for display
- Successfully demonstrate the operations of the system on a Ford Focus 2018.
Future works
Due to time constraints, several aspects of the project require further refinement Below are potential development directions aimed at enhancing the performance and effectiveness of the In-cabin sensing system:
- Design a proper alarm system and human-machine interface
- CAN Integration: Integrate with the Controller Area Network for door lock, in-cabin temperature, and HVAC information to add additional conditions for Child Presence Detection activation
- Performance Tuning: Fine-tune performance to adapt to various environmental conditions such as day, night, music playing, and engine operation
- Camera-Radar Fusion: Utilize camera-radar fusion technology for improved accuracy and verification between systems to optimize power usage
- Cognitive distraction of drivers remains a very challenging topic, as mental behavior is very difficult to address
References
[1] B. C. Tefft, "Drowsy Driving in Fatal Crashes, United States, 2017–2021," AAA Foundation for Traffic Safety, Mar. 2024. Accessed: May 10, 2024. [Online]. Available: https://aaafoundation.org/drowsy-driving-in-fatal-crashes-united-states-2017-2021/
[2] "Ngăn tai nạn do ‘giấc ngủ trắng’ của tài xế" [Preventing accidents caused by drivers' microsleep]. Accessed: May 10, 2024. [Online]. Available: https://atgt.baogiaothong.vn/ngan-tai-nan-do-giac-ngu-trang-cua-tai-xe-192240404233524722.htm
[3] "U Drive U Text U Pay Campaign Kickoff | NHTSA." Accessed: May 10, 2024. [Online]. Available: https://www.nhtsa.gov/speeches-presentations/distracted-driving-campaign-kickoff
[4] "U.S. Child Hot Car Death Data Analysis from the Kids and Car Safety National Database (1990–2023)." [Online]. Available: www.KidsandCarSafety.org
[5] News reports on the death of a first-grader from Gateway International School who was left unattended on a school bus.
[6] Q. Xu, B. Wang, F. Zhang, D. S. Regani, F. Wang, and K. J. Ray Liu, "Wireless AI in Smart Car: How Smart a Car Can Be?," IEEE Access, vol. 8, pp. 55091–
[7] Y. Ma, Y. Zeng, and V. Jain, "CarOSense: Car Occupancy Sensing with the Ultra-Wideband Keyless Infrastructure," Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol. 4, no. 3, Sep. 2020, doi: 10.1145/3411820
[8] H. Feld, B. Mirbach, J. Katrolia, M. Selim, O. Wasenmüller, and D. Stricker, "DFKI Cabin Simulator: A Test Platform for Visual In-Cabin Monitoring Functions," pp. 417–430, 2021, doi: 10.1007/978-3-658-29717-6_28
[9] A. Mishra, S. Lee, D. Kim, and S. Kim, "In-Cabin Monitoring System for Autonomous Vehicles," Sensors, vol. 22, no. 12, Jun. 2022, doi:
[10] F. E. Nowruzi, W. A. El Ahmar, R. Laganiere, and A. H. Ghods, "In-Vehicle Occupancy Detection with Convolutional Networks on Thermal Images." [Online]. Available: http://www.site.uottawa.ca/research/viva/projects/thermal-
[11] A. Dasgupta, D. Rahman, and A. Routray, "A Smartphone-Based Drowsiness Detection and Warning System for Automotive Drivers," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 11, pp. 4045–4054, Nov.
[12] W. Liu, J. Qian, Z. Yao, X. Jiao, and J. Pan, "Convolutional two-stream network using multi-facial feature fusion for driver fatigue detection," Future Internet, vol. 11, no. 5, May 2019, doi: 10.3390/fi11050115
[13] C. T. Lin, C. J. Chang, B. S. Lin, S. H. Hung, C. F. Chao, and I. J. Wang, "A real-time wireless brain-computer interface system for drowsiness detection," IEEE Trans. Biomed. Circuits Syst., vol. 4, no. 4, pp. 214–222, Aug. 2010, doi:
[14] S. Arefnezhad, S. Samiee, A. Eichberger, and A. Nahvi, "Driver drowsiness detection based on steering wheel data applying adaptive neuro-fuzzy feature selection," Sensors (Switzerland), vol. 19, no. 4, Feb. 2019, doi:
[15] F. Vicente, Z. Huang, X. Xiong, F. De La Torre, W. Zhang, and D. Levi, "Driver Gaze Tracking and Eyes off the Road Detection System," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 2014–
[16] G. L. Masala and E. Grosso, "Real time detection of driver attention: Emerging solutions based on robust iconic classifiers and dictionary of poses," Transp. Res. Part C Emerg. Technol., vol. 49, pp. 32–42, Dec. 2014, doi:
[17] J. Jo, "Vision-based method for detecting driver drowsiness and distraction in driver monitoring system," Optical Engineering, vol. 50, no. 12, p. 127202,
[18] "Driver Monitoring System (DMS) - Smart Eye." Accessed: May 10, 2024. [Online]. Available: https://www.smarteye.se/solutions/automotive/driver-monitoring-system/
[19] "Interior sensing solutions." Accessed: May 10, 2024. [Online]. Available: https://www.bosch-mobility.com/en/solutions/interior/interior-sensing-solutions/
[20] "What is ATTENTION ASSIST®? | Mercedes-Benz Safety Features | Fletcher Jones Motorcars." Accessed: May 10, 2024. [Online]. Available: https://www.fjmercedes.com/mercedes-benz-attention-assist/
[21] M. W. Johns, "A sleep physiologist's view of the drowsy driver," 2000. [Online]. Available: www.elsevier.com/locate/trf
[22] S. Soares, T. Monteiro, A. Lobo, A. Couto, L. Cunha, and S. Ferreira, "Analyzing driver drowsiness: From causes to effects," Sustainability (Switzerland), vol. 12, no. 5, Mar. 2020, doi: 10.3390/su12051971
[23] A. Kashevnik, R. Shchedrin, C. Kaiser, and A. Stocker, "Driver Distraction Detection Methods: A Literature Review and Framework," IEEE Access, vol. 9, Institute of Electrical and Electronics Engineers Inc., pp. 60063–60076,
[24] P. Kuchár, R. Pirník, A. Janota, B. Malobický, J. Kubík, and D. Šišmišová, "Passenger Occupancy Estimation in Vehicles: A Review of Current Methods and Research Challenges," Sustainability, vol. 15, no. 2, p. 1332, Jan. 2023, doi: 10.3390/SU15021332
[25] "MediaPipe | Google for Developers." Accessed: May 10, 2024. [Online]. Available: https://developers.google.com/mediapipe
[26] "Face landmark detection guide | Edge | Google for Developers." Accessed: May 19, 2024. [Online]. Available: https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker
[27] "google-ai-edge/mediapipe: Cross-platform, customizable ML solutions for live and streaming media." Accessed: May 19, 2024. [Online]. Available: https://github.com/google-ai-edge/mediapipe
[28] T. Soukupová, "Real-Time Eye Blink Detection using Facial Landmarks,"
[29] C. Dewi, R. C. Chen, C. W. Chang, S. H. Wu, X. Jiang, and H. Yu, "Eye Aspect Ratio for Real-Time Drowsiness Detection to Improve Driver Safety," Electronics (Switzerland), vol. 11, no. 19, Oct. 2022, doi: 10.3390/electronics11193183
[30] W. W. Wierwille, S. S. Wreggit, C. L. Kirn, L. A. Ellsworth, and R. J. Fairbanks, "Research on Vehicle-Based Driver Status/Performance Monitoring; Development, Validation, and Refinement of Algorithms For Detection of Driver Drowsiness," 1994
[31] T. Abe, "PERCLOS-based technologies for detecting drowsiness: current evidence and future directions," SLEEP Advances, vol. 4, no. 1, Oxford University Press, 2023, doi: 10.1093/sleepadvances/zpad006
[32] D. Arbuck, "Is yawning a tool for wakefulness or for sleep?," Open J. Psychiatr., vol. 03, no. 01, pp. 5–11, 2013, doi: 10.4236/ojpsych.2013.31002
[33] E. N. A. Neto et al., "Real-time head pose estimation for mobile devices," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012, pp. 467–
[34] F. Vicente, Z. Huang, X. Xiong, F. De La Torre, W. Zhang, and D. Levi, "Driver Gaze Tracking and Eyes off the Road Detection System," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 2014–
[35] J. David, "Radar Fundamentals." [Online]. Available: http://www.nps.navy.mil/faculty/jenn
[36] B.-C. Wang, Digital Signal Processing Techniques and Applications in Radar Image Processing, John Wiley, 2008
[37] S. Rao, Texas Instruments, "Introduction to mmwave Sensing: FMCW Radars."
[38] "FMCW Radar Part 2 - Velocity, Angle and Radar Data Cube | Wireless Pi." Accessed: May 10, 2024. [Online]. Available: https://wirelesspi.com/fmcw-radar-part-2-velocity-angle-and-radar-data-cube/