Research and design of a system for early fire detection using machine learning on indoor laboratory data

Motivation

Fires pose a significant threat to safety and property worldwide, and Hanoi, like many urban areas in Vietnam, is no exception[1][10][28][26][27]. Every year, fires cause extensive damage, resulting in the loss of homes, valuable assets, and, tragically, lives. Despite advancements in fire-fighting technologies and equipment, timely detection and response remain critical challenges. The densely populated areas and rapid urbanization of Hanoi exacerbate these risks, making early fire detection not just a matter of safety, but a pressing necessity.

In recent years, Hanoi has seen a number of high-profile fire incidents that have highlighted the limitations of current fire detection systems. These incidents often escalate rapidly due to delayed detection, leading to devastating consequences. Traditional fire detection methods, which rely primarily on smoke and heat sensors, often fail to provide the early warning needed to prevent such disasters. This is particularly concerning in residential areas and commercial buildings where the concentration of people and flammable materials is high.

The motivation for this thesis stems from the urgent need to enhance fire detection capabilities in Hanoi and across Vietnam. By leveraging modern technologies such as machine learning and advanced sensor systems, there is potential to significantly improve early fire detection and response times. This research aims to create and implement a fire detection system that can provide accurate and timely alerts, thereby mitigating the risks and impact of fires.

Furthermore, the choice to concentrate on indoor laboratory data for this study is driven by the desire to create a controlled environment that allows for precise calibration and testing of fire detection algorithms. This approach helps ensure that the system is robust and reliable before deployment in real-world scenarios. The ultimate goal is to contribute to a safer environment in Hanoi and other urban areas in Vietnam, reducing the incidence and severity of fire-related incidents through innovative technological solutions.

The integration of machine learning into fire detection systems represents a significant advancement over traditional methods. Machine learning algorithms are capable of examining patterns and identifying anomalies in environmental data, providing early warning signals that might be missed by conventional sensors. This thesis seeks to explore these capabilities, developing a system that not only detects fires more quickly but also reduces false alarms, thereby increasing the overall efficiency and reliability of fire response strategies.

In conclusion, the motivation for this research is rooted in the need to address the persistent and growing threat of fires in Hanoi and Vietnam. By harnessing the power of machine learning and advanced sensor technology, this thesis aims to pioneer a new approach to fire detection, offering a solution that is both innovative and impactful.

Objectives

The main goal of this thesis is to design, develop, and evaluate an advanced early fire detection system using machine learning techniques and a comprehensive set of sensors. The system is intended to detect fires at their inception in indoor environments more effectively and reliably than conventional detectors, thereby enhancing safety and mitigating damage.

Approach and Methodology

The approach undertaken in this research is systematic and multi-faceted. Initially, data collection involves the meticulous gathering of sensor readings from controlled fire scenarios in a laboratory environment. This includes sensors measuring CO2 levels, humidity, temperature, MQ139, TVOC, and eCO2, ensuring a diverse set of data points for model training. The subsequent stage involves comprehensive data preprocessing, which encompasses data cleaning, normalization, and addressing any missing values to maintain the integrity and suitability of the dataset for machine learning applications. The research then progresses to model selection, where algorithms such as Support Vector Machine (SVM) and Artificial Neural Network (ANN) are identified for evaluation. The dataset is systematically partitioned into training, validation, and test sets to support robust model development and performance evaluation. Performance metrics, including accuracy, F1-score, and ROC-AUC, are used to compare the models and determine the most effective method for early fire detection. The final stage involves the implementation of the best-performing model in a real-time detection system, followed by an in-depth analysis of the results to discuss the findings and pinpoint areas for future research and enhancement.
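The comparison metrics named above can be made concrete with a short sketch. It is an illustration only, not the evaluation code used in this thesis: the function names and the toy labels and scores are hypothetical, and ROC-AUC is computed with the pairwise-ranking definition rather than a library routine.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = fire and 0 = no fire."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random fire sample scores above a random non-fire sample.

    Ties count as one half. This is the rank-based definition of ROC-AUC.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: model scores thresholded at 0.5 to obtain labels.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
scores = [0.1, 0.2, 0.6, 0.3, 0.9, 0.4, 0.8, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))   # 0.75
print(f1_score(y_true, y_pred))   # 2/3
print(roc_auc(y_true, scores))    # 14/15
```

Accuracy alone can look favorable on an imbalanced dataset in which fire samples are rare, which is why F1-score and ROC-AUC are reported alongside it.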

Scope and Limitation

This thesis focuses on designing, developing, and evaluating an early fire detection system using machine learning techniques tailored for indoor environments. The system integrates various sensors, including those measuring temperature, humidity, gases, and flames, and employs machine learning algorithms, specifically Artificial Neural Networks and Support Vector Machines. Data will be gathered via controlled fire experiments to train and validate these models. The system's performance will be evaluated using metrics such as precision, sensitivity, specificity, accuracy, and ROC-AUC, with real-world testing to ensure practical effectiveness.

However, this research has certain limitations. The controlled laboratory environment for data collection and initial testing may not fully represent real-world conditions, potentially impacting the system's generalizability. The reliability and accuracy of the sensors used are critical, and any limitations in sensor performance could affect the overall effectiveness of the system. Despite testing various fire scenarios, not all potential fire situations can be covered, which may limit the system's applicability. Technical constraints related to the ESP8266 microcontroller and LPWAN for data transmission might also pose challenges. Additionally, the system may experience false positives or negatives, necessitating further refinement. Scaling the system to larger or more complex settings, such as industrial sites or extensive commercial buildings, may require additional adjustments and optimizations.

Challenges

The research encounters several challenges:

Collecting high-quality data is difficult due to controlled environment limitations, sensor precision issues, and data imbalance. Laboratory settings may not capture real-world fire variability, and precise sensor calibration is crucial to avoid inaccurate readings. Additionally, fire incidents are rare compared to non-fire events, leading to an imbalanced dataset that can bias the model.

Model selection and training also pose challenges. There is a balance to be struck between complexity and performance, with more sophisticated models such as deep neural networks requiring more resources and being prone to overfitting. Hyperparameter tuning is time-consuming, and selecting relevant features from sensor data is critical for model accuracy.

Sensor integration involves technical difficulties such as varying communication protocols and data formats. Ensuring seamless data transmission and synchronization, maintaining energy efficiency, and dealing with environmental interferences like humidity and dust are essential for system reliability.

Long-term reliability and maintenance are crucial. Minimizing false positives and negatives is vital for system credibility, requiring continuous sensor calibration and maintenance. The system must also be scalable to cover large areas or multiple buildings, posing challenges in data transmission and uniform performance.

Fire detection technologies

Smoke detectors

Smoke detectors operate based on the detection of airborne particulate matter, which is a byproduct of combustion. There are primarily two types based on operating principles: ionization smoke detectors and photoelectric (optical) smoke detectors. Ionization smoke detectors utilize a minimal quantity of radioactive material, typically Americium-241, to ionize the air within the sensing chamber, creating a flow of ions between two electrodes. Smoke particles disrupt this ionization process, reducing the current and triggering the alarm. Ionization detectors are more sensitive to flaming fires, which generate smaller smoke particles.

The process can be summarized with these steps:

1. The ionization chamber contains ions generated by radioactive Americium-241.

2. Normally, these ions maintain a steady current between two electrodes.

3. Smoke particles entering the chamber attach to the ions, reducing the current.

4. The reduced current is detected by an electronic circuit, which activates the alarm.

Figure 2.1: Inside view of smoke detectors, showing the chamber and electronic components: (a) ionization smoke detector; (b) photoelectric smoke detector[29]

Optical (photoelectric) smoke detectors employ a light-emitting diode (LED) as a light source and a photocell housed in a sensing chamber. When smoke enters the chamber, it scatters the light beam, directing it towards the photocell and triggering the alarm. This type of detector is particularly effective at identifying smoldering fires, which produce larger smoke particles. The operational steps are:

1. The LED emits a light beam within the sensing chamber.

2. In the absence of smoke, the light beam does not reach the photocell.

3. Smoke entering the chamber scatters the light, directing it to the photocell.

4. The photocell detects the scattered light, and an electronic circuit activates the alarm.

Heat detectors

Heat detectors react to temperature changes and are especially suitable for settings where smoke detectors may frequently trigger false alarms due to dust, vapors, or other airborne particles, such as kitchens and garages. Fixed temperature heat detectors activate an alarm when the surrounding temperature surpasses a set threshold, typically around 135°F (57°C). The scientific basis for these detectors lies in the thermal expansion of materials or the change in resistance of thermistors used within the device. Rate-of-rise heat detectors activate when the temperature increases rapidly, usually about 15°F (8.3°C) per minute. This rapid temperature increase can indicate a developing fire, prompting an early warning.

The operational mechanism of heat detectors is based on the response of a thermistor to temperature changes. A thermistor is a resistor whose resistance changes significantly with temperature. The key principle behind the operation of a heat detector is that the resistance of the thermistor decreases as the temperature increases. This relationship can be expressed mathematically by the following equation for the thermistor's resistance $R$:

\[ R = R_0 \exp\left( \beta \left( \frac{1}{T} - \frac{1}{T_0} \right) \right) \tag{2.1} \]

where $R_0$ represents the resistance at a reference temperature $T_0$, $T$ is the current temperature (both temperatures in kelvin), and $\beta$ is a material-specific constant.

As the temperature in the environment rises, the thermistor detects this change through the decrease in its resistance. The heat detector triggers a fire alarm signal based on one of two criteria:

1. Fixed Temperature: The heat detector activates when the ambient temperature reaches a predefined threshold, typically 60°C.

2. Rate-of-Rise: The heat detector triggers an alarm if there is a rapid increase in temperature, such as a 30°C rise within one minute.

These detection methods ensure that the heat detector responds promptly to potential fire conditions, providing an early warning system that is less susceptible to false alarms from non-fire-related particulates.
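The two alarm criteria, together with the thermistor relationship, can be sketched as follows. This is a minimal illustration under stated assumptions, not the firmware of any particular detector: the thermistor parameters (10 kΩ at 25°C, β = 3950) and the function names are hypothetical, while the 60°C and 30°C-per-minute thresholds are the values quoted in the text.

```python
import math


def thermistor_temperature_c(resistance_ohm, r0_ohm=10_000.0, t0_c=25.0, beta=3950.0):
    """Invert R = R0 * exp(beta * (1/T - 1/T0)) and return the temperature in Celsius.

    The default R0, T0 and beta are assumed example values, not taken from the thesis.
    """
    t0_k = t0_c + 273.15
    inv_t = 1.0 / t0_k + math.log(resistance_ohm / r0_ohm) / beta
    return 1.0 / inv_t - 273.15


def heat_alarm(temps_c, sample_period_s,
               fixed_threshold_c=60.0, ror_threshold_c_per_min=30.0):
    """Return 'fixed', 'rate-of-rise', or None for a series of temperature samples."""
    # Number of samples that span one minute.
    window = max(1, round(60.0 / sample_period_s))
    for i, t in enumerate(temps_c):
        if t >= fixed_threshold_c:
            return "fixed"
        # Compare against the sample taken one minute earlier.
        if i >= window and t - temps_c[i - window] >= ror_threshold_c_per_min:
            return "rate-of-rise"
    return None


# A lower resistance corresponds to a higher temperature.
print(thermistor_temperature_c(10_000.0))  # about 25 degrees Celsius
print(thermistor_temperature_c(5_000.0))   # about 41.5 degrees Celsius

# Samples every 10 s: a 33-degree jump within a minute triggers rate-of-rise
# before the fixed 60-degree threshold is reached.
print(heat_alarm([25, 30, 36, 42, 48, 53, 58], sample_period_s=10.0))
```

Checking the fixed threshold first means a slowly warming room still raises an alarm once it reaches 60°C, even though the rate-of-rise criterion never fires.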

Feature | Fixed Temperature | Rate-of-Rise
Activation Criteria | Specific temperature threshold (60°C) | Rapid increase in temperature (30°C increase within one minute)
Application Areas: Stable temperature environments

Table 2.1: Fixed temperature and Rate-of-Rise (RoR) features

Flame detectors

Flame detectors are sophisticated sensors engineered to identify and react to the presence of a flame or fire. They are widely used in various industrial and commercial applications due to their ability to detect fires even before they produce significant amounts of smoke or heat.

Flame detectors operate by sensing specific wavelengths of light emitted by flames. There are three primary types of flame detectors, each based on a different detection principle: Ultraviolet (UV), Infrared (IR), and combined UV/IR flame detectors. These detectors utilize photoelectric circuits, signal conditioning circuits, and microprocessors to detect the presence of flames.

Infrared flame detectors identify the characteristic IR radiation from hydrocarbon-based fires, which emit strongly in the IR spectrum. These detectors can sense fires from considerable distances, making them effective in large industrial settings.

Ultraviolet flame detectors respond to UV radiation, which is a component of the flame spectrum. These detectors offer near-instantaneous detection but can be susceptible to false alarms from other UV sources like welding arcs or sunlight reflections.

A flame monitoring system primarily consists of the following components:

• Detector: The core component that senses the flame. It includes photoelectric detection circuits, signal conditioning circuits, and microprocessor systems.

• Signal Processing: The signal from the sensor is converted into an electrical signal, processed by a microprocessor to determine the presence of a flame.

• Output Circuits: These circuits trigger an appropriate response, such as sounding an alarm, shutting off fuel lines, or activating a fire suppression system.

Despite their widespread use, traditional fire detection technologies have several limitations. Smoke and heat detectors may not provide sufficiently early warnings in large or well-ventilated spaces, where smoke and heat can dissipate before reaching the detectors. Additionally, these detectors can be prone to false alarms from non-fire sources such as cooking smoke, steam, or dust. Regular maintenance, including battery replacements and cleaning, is required to ensure proper functioning. Furthermore, single-type detectors may not be effective in all fire scenarios. For example, ionization detectors may miss smoldering fires, while photoelectric detectors may be slower to detect fast-flaming fires.

These limitations underscore the need for more advanced fire detection solutions. By integrating machine learning and sensor fusion technologies, it is possible to enhance the reliability and timeliness of fire detection, addressing many of the shortcomings associated with traditional methods. Machine learning algorithms can analyze complex patterns in sensor data, improving detection accuracy and reducing false alarms, thereby offering a significant advancement in fire safety technology.

Feature | UV Flame Detector | IR Flame Detector | UV/IR Flame Detector
Detection Method | UV Radiation | IR Radiation | Both UV and IR
Speed of Response | High | Moderate | High
Types of Fires Detected | Hydrocarbon, hydrogen, metal fires | Hydrocarbon fires | Hydrocarbon fires, broader spectrum
Typical Applications | | |

Table 2.2: Comparison of UV, IR, and UV/IR flame detectors

Machine Learning

Overview of Machine Learning

There are three primary categories of machine learning: supervised learning, unsupervised learning, and reinforcement learning. Each category addresses different types of problems and data structures, making them suitable for various applications. Additionally, ML algorithms can be categorized based on their function, such as classification, regression, clustering, and dimensionality reduction.

1. Supervised learning involves training algorithms on labeled datasets, where each input is paired with a known output. The objective is to learn a mapping from inputs to outputs that enables prediction of labels for new, unseen data. In fire detection, supervised learning can classify sensor data into fire and non-fire events. Commonly used supervised learning algorithms include Support Vector Machines (SVMs), decision trees, and neural networks.

2. Unsupervised learning algorithms operate on unlabeled data to uncover hidden patterns or intrinsic structures within it. Tasks such as clustering and dimensionality reduction are typical in unsupervised learning. For fire detection, unsupervised learning can be useful for anomaly detection, where the system learns the normal patterns of sensor data and identifies deviations that may indicate a fire.

3. Reinforcement learning entails an agent interacting with an environment to acquire optimal behaviors through trial and error, where the agent receives rewards or penalties based on its actions and learns to maximize cumulative rewards over time. While reinforcement learning is not extensively applied in fire detection, it has potential applications in optimizing fire response strategies and resource allocation. Typical algorithms include Q-learning and deep reinforcement learning techniques.

Figure 2.2: (a) Distance from a point to a line in 2D space, (b) Distance from a point to a plane in 3D space.

Machine learning algorithms can also be categorized based on their function:

1. Classification: Assigning inputs to predefined categories, such as classifying sensor readings as "fire" or "no fire". Common classification algorithms include SVM, decision trees, and neural networks.

2. Regression: Predicting continuous outputs based on input data. In fire detection, regression can predict the rate of fire spread or temperature increase. Examples include linear regression and neural networks.

3. Clustering: Grouping data points with similar features, which is essential for detecting patterns and anomalies in sensor data. Techniques like k-means clustering and hierarchical clustering are commonly used for this purpose.

4. Dimensionality Reduction: Reducing the number of input features while retaining essential information, simplifying data analysis and improving algorithm efficiency. Common techniques include PCA and t-SNE.

Support Vector Machine

SVM[7] is a widely used classification algorithm renowned for its robustness and effectiveness, particularly in cases where the dataset is linearly separable. The fundamental principle of SVM revolves around finding the hyperplane that best divides a dataset into two classes while maximizing the margin between the classes.

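To understand the working of SVM, it is crucial to grasp the geometric concepts underlying it. In a two-dimensional space, the distance from a point $(x_0, y_0)$ to a line defined by the equation $w_1 x + w_2 y + b = 0$ is given by:

\[ \frac{|w_1 x_0 + w_2 y_0 + b|}{\sqrt{w_1^2 + w_2^2}} \tag{2.2} \]

This concept can be extended to higher dimensions. In $d$-dimensional space, the distance from a point $\mathbf{x}_0$ to a hyperplane defined by $\mathbf{w}^T \mathbf{x} + b = 0$ is:

\[ \frac{|\mathbf{w}^T \mathbf{x}_0 + b|}{\|\mathbf{w}\|_2} \tag{2.3} \]

where $\|\mathbf{w}\|_2$ denotes the Euclidean norm of $\mathbf{w}$.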
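The distance formula applies unchanged in any number of dimensions, as the following small sketch shows. The function name and the example points are illustrative choices, not part of the thesis.

```python
import math


def distance_to_hyperplane(x0, w, b):
    """Distance |w^T x0 + b| / ||w||_2 from the point x0 to the hyperplane w^T x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:
        raise ValueError("w must be non-zero")
    return abs(sum(wi * xi for wi, xi in zip(w, x0)) + b) / norm_w


# 2D: distance from (3, 4) to the line 3x + 4y - 5 = 0.
print(distance_to_hyperplane((3.0, 4.0), (3.0, 4.0), -5.0))            # 4.0

# 3D: distance from (1, 2, 3) to the plane x + 2y + 2z - 3 = 0.
print(distance_to_hyperplane((1.0, 2.0, 3.0), (1.0, 2.0, 2.0), -3.0))  # 8/3
```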

The main goal of SVM is to locate a hyperplane that not only separates the two classes but achieves this with the maximum possible margin. This problem can be visualized by considering the simplest case of a linearly separable dataset in a two-dimensional space.

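Given a collection of training data points $(\mathbf{x}_i, y_i)$, where $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i \in \{1, -1\}$, the aim is to determine a hyperplane represented by $\mathbf{w}^T \mathbf{x} + b = 0$ such that the margin

\[ \min_i \frac{y_i(\mathbf{w}^T \mathbf{x}_i + b)}{\|\mathbf{w}\|_2} \tag{2.4} \]

is maximized. This problem can be expressed as:

\[ (\mathbf{w}, b) = \arg\max_{\mathbf{w}, b} \left\{ \frac{1}{\|\mathbf{w}\|_2} \min_i y_i(\mathbf{w}^T \mathbf{x}_i + b) \right\} \tag{2.5} \]

By setting the constraint that for the closest points to the hyperplane, $y_i(\mathbf{w}^T \mathbf{x}_i + b) = 1$, the problem transforms into:

\[ (\mathbf{w}, b) = \arg\min_{\mathbf{w}, b} \frac{1}{2} \|\mathbf{w}\|_2^2 \quad \text{subject to} \quad y_i(\mathbf{w}^T \mathbf{x}_i + b) \geq 1 \;\; \text{for all } i \tag{2.6} \]

This formulation ensures that the hyperplane is placed equidistant from the nearest points of both classes, maximizing the margin.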
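To see what maximizing the margin means in practice, the sketch below evaluates the margin of two candidate hyperplanes on a hand-made, linearly separable toy dataset and keeps the better one. The dataset, the candidate hyperplanes and the function name are hypothetical; an actual SVM solves the optimization problem rather than comparing a fixed list of candidates.

```python
import math


def margin(points, labels, w, b):
    """Geometric margin min_i y_i (w^T x_i + b) / ||w||_2.

    The value is negative if the hyperplane misclassifies any point.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        for x, y in zip(points, labels)
    ) / norm_w


# Toy data: two "fire" points (+1) and two "no fire" points (-1).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Both candidates separate the classes, but with different margins.
candidates = {
    "vertical line x = 1": ((1.0, 0.0), -1.0),
    "diagonal x + y = 2": ((0.5, 0.5), -1.0),
}

margins = {name: margin(points, labels, w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

print(margins)  # vertical line: 1.0, diagonal: sqrt(2)
print(best)     # diagonal x + y = 2
```

The diagonal candidate is scaled so that the closest points satisfy $y_i(\mathbf{w}^T \mathbf{x}_i + b) = 1$, in which case the margin equals $1 / \|\mathbf{w}\|_2$.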

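Solving the primal problem directly can be computationally intensive, especially for large datasets. Therefore, the dual formulation is often employed, leveraging Lagrange multipliers. The dual problem is expressed as:

\[ \boldsymbol{\lambda} = \arg\max_{\boldsymbol{\lambda}} \left\{ \sum_{i} \lambda_i - \frac{1}{2} \sum_{i} \sum_{j} \lambda_i \lambda_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j \right\} \quad \text{subject to} \quad \lambda_i \geq 0 \;\; \text{for all } i, \quad \sum_{i} \lambda_i y_i = 0 \tag{2.7} \]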

In this context, $\boldsymbol{\lambda}$ represents the Lagrange multipliers corresponding to each training point. The Karush-Kuhn-Tucker (KKT) conditions aid in identifying support vectors, which are pivotal data points that determine the position of the hyperplane. Importantly, many $\lambda_i$ values are zero, highlighting that only a subset of the training data (the support vectors) significantly impacts the final model.

Figure 2.3: Support vectors (circled points) and the optimal decision boundary (solid line). The margins are shown as dashed lines.
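The role of the multipliers can be illustrated on a four-point toy problem small enough to solve by hand. In the sketch below the multipliers are hand-derived for this toy data rather than produced by a solver, and the function names are hypothetical. The hyperplane is recovered from the multipliers via $\mathbf{w} = \sum_i \lambda_i y_i \mathbf{x}_i$, and $b$ follows from the condition $y_i(\mathbf{w}^T \mathbf{x}_i + b) = 1$ that holds on the support vectors.

```python
def recover_primal(points, labels, lambdas, tol=1e-9):
    """Recover w, b and the support-vector indices from dual multipliers."""
    dim = len(points[0])
    w = [
        sum(lam * y * x[d] for lam, y, x in zip(lambdas, labels, points))
        for d in range(dim)
    ]
    # Only points with a non-zero multiplier are support vectors.
    support = [i for i, lam in enumerate(lambdas) if lam > tol]
    # On a support vector y_i (w^T x_i + b) = 1, hence b = y_i - w^T x_i for y_i = +/-1.
    b_values = [
        labels[i] - sum(wd * xd for wd, xd in zip(w, points[i])) for i in support
    ]
    b = sum(b_values) / len(b_values)
    return w, b, support


def dual_objective(points, labels, lambdas):
    """Sum of multipliers minus half the quadratic form of the dual problem."""
    n = len(points)
    quad = sum(
        lambdas[i] * lambdas[j] * labels[i] * labels[j]
        * sum(a * c for a, c in zip(points[i], points[j]))
        for i in range(n)
        for j in range(n)
    )
    return sum(lambdas) - 0.5 * quad


points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]
lambdas = [0.25, 0.0, 0.25, 0.0]  # hand-derived for this toy data

w, b, support = recover_primal(points, labels, lambdas)
primal_value = 0.5 * sum(wd * wd for wd in w)

print(w, b)      # [0.5, 0.5] -1.0
print(support)   # [0, 2]
print(primal_value, dual_objective(points, labels, lambdas))  # both 0.25
```

The two points with zero multipliers could be removed from the dataset without changing the hyperplane, and the equal primal and dual objective values confirm that the hand-derived multipliers are optimal for this example.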

Principal Component Analysis

Dimensionality reduction plays a crucial role in machine learning and data analysis. In real-world applications, feature vectors can have very high dimensions, sometimes reaching thousands. Moreover, the number of data points is often large. Direct storage and computation on such high-dimensional data can be challenging due to storage constraints and computational speed. Hence, reducing the dimensionality of data is a crucial step in many applications, and can also be considered a method of data compression. PCA, or Principal Component Analysis, is a statistical method employed for reducing dimensionality while preserving the essential variability present in the data. PCA transforms the data into a new coordinate system where the highest variances in any projection of the data align with the first coordinates, known as the principal components. Subsequent coordinates capture decreasing variances in sequence.

Dimensionality reduction seeks a function that transforms an initial high-dimensional data point $\mathbf{x} \in \mathbb{R}^D$ into a new data point $\mathbf{z} \in \mathbb{R}^K$, where $K < D$. Principal Component Analysis (PCA) is one of the simplest and most commonly employed linear methods for dimensionality reduction.

Figure 2.4: The primary concept of PCA is to identify a new orthonormal system where the most significant components are preserved in the first K components [6]

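In the context of PCA, it is crucial to grasp the concept of norms, especially the 2-norm of a matrix. The 2-norm of a matrix $\mathbf{A}$ is:

\[ \|\mathbf{A}\|_2 = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|\mathbf{A}\mathbf{x}\|_2}{\|\mathbf{x}\|_2} \tag{2.8} \]

where $\mathbf{x}$ is a vector. This definition implies that the 2-norm of a matrix is the maximum stretching factor by which the matrix can scale a vector.

The covariance matrix is a critical concept in PCA. For a dataset with $N$ data points represented as vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$, the covariance matrix $\mathbf{S}$ is defined as:

\[ \mathbf{S} = \frac{1}{N} \hat{\mathbf{X}} \hat{\mathbf{X}}^T \tag{2.9} \]

where $\hat{\mathbf{X}}$ is the data matrix (one data point per column) with the mean vector $\bar{\mathbf{x}}$ subtracted from each data point. The covariance matrix $\mathbf{S}$ is symmetric and positive semi-definite.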

The core idea behind PCA is to discover a new orthogonal basis for the data such that the projections of the data along the basis vectors (principal components) capture the maximum variance. The steps involved in PCA are:

1. Center the data by subtracting the mean of each feature, so that the data is centered around the origin.

2. Calculate the covariance matrix $\mathbf{S}$ using the centered data.

3. Perform eigen decomposition to determine the eigenvalues and eigenvectors of the covariance matrix. The eigenvectors denote the principal components, while the eigenvalues indicate the variance captured by each principal component.

4. Form the transformation matrix $\mathbf{U}_K$ by selecting the $K$ eigenvectors associated with the $K$ largest eigenvalues.

5. Transform the data: Project the data onto the new $K$-dimensional subspace using the transformation matrix $\mathbf{U}_K$. Mathematically, if $\hat{\mathbf{X}}$ is the centered version of the original data matrix $\mathbf{X}$, the new data matrix $\mathbf{Z}$ is given by:

\[ \mathbf{Z} = \mathbf{U}_K^T \hat{\mathbf{X}} \]
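The PCA procedure (centering, covariance, eigen decomposition, selection of the leading eigenvectors, projection) can be sketched without external libraries. Two simplifications are assumed here and are not the method of this thesis: the eigenvectors are obtained by power iteration with deflation, swapped in for a library eigen-solver to keep the example dependency-free, and data points are stored as rows rather than as columns of $\hat{\mathbf{X}}$. All function names are hypothetical.

```python
import math
import random


def center(points):
    """Subtract the per-feature mean from every data point."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [[p[j] - mean[j] for j in range(d)] for p in points], mean


def covariance(centered):
    """Covariance matrix S = (1/N) * sum of outer products of centered points."""
    n, d = len(centered), len(centered[0])
    return [[sum(p[i] * p[j] for p in centered) / n for j in range(d)] for i in range(d)]


def top_eigenpairs(s, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration and deflation."""
    d = len(s)
    s = [row[:] for row in s]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        # A random start avoids being exactly orthogonal to the leading eigenvector.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(s[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue, i.e. the variance along v.
        lam = sum(v[i] * s[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        # Deflation removes the component that was just found.
        s = [[s[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def pca(points, k):
    """Return the projected data (N rows, k columns) and the (eigenvalue, eigenvector) pairs."""
    centered, _ = center(points)
    pairs = top_eigenpairs(covariance(centered), k)
    z = [[sum(u[j] * p[j] for j in range(len(p))) for _, u in pairs] for p in centered]
    return z, pairs


# Toy data with variance 2 along the first axis and 0.5 along the second.
points = [(7.0, 3.0), (3.0, 3.0), (5.0, 4.0), (5.0, 2.0)]
z, pairs = pca(points, k=1)
print(pairs[0][0])  # approximately 2.0
```

The sign of each eigenvector is arbitrary, so the projected coordinates are only defined up to a sign flip per component.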

Artificial Neural Network

Artificial Neural Networks (ANNs) are computational models inspired by the biological neural networks of the human brain. These networks are designed to simulate the way biological neurons communicate and process information.

Researchers have translated these insights into artificial neural networks, which can be implemented on computers. An ANN is composed of interconnected units called neurons, modeled after biological neurons. The following describes the fundamental components and functionality of a single artificial neuron:

1. Synapses (Connecting Links): These connections between neurons are characterized by weights ($w_{kj}$), which can be positive or negative, unlike the strictly positive weights in biological synapses. These weights determine the strength and direction of the signal transmitted from neuron $j$ to neuron $k$.

2. Adder: This component aggregates the input signals at each neuron. The sum of weighted input signals and a bias term ($b_k$) forms the neuron's input to the activation function.

3. Activation Function: The activation function ($\varphi$) processes the input signal and produces the neuron's output signal ($y_k$). This function maps the input to a specific range of output values, enabling the network to capture non-linear relationships.

The mathematical representation of a neuron's operation is given by:

\[ u_k = \sum_{j=1}^{m} w_{kj} x_j, \qquad y_k = \varphi(u_k + b_k) \]

Here, $x_1, x_2, \ldots, x_m$ denote the input signals, $w_{k1}, w_{k2}, \ldots, w_{km}$ are their corresponding weights, $u_k$ represents the output of the linear combiner, $b_k$ is the bias term, and $\varphi(\cdot)$ denotes the activation function.
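A single neuron as described above amounts to a weighted sum followed by an activation function. The sketch below assumes a sigmoid activation purely for illustration, since the text does not fix a particular choice of activation function; the function names and example values are hypothetical.

```python
import math


def sigmoid(v):
    """Logistic activation, mapping any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(x, w_k, b_k, phi=sigmoid):
    """Compute y_k = phi(u_k + b_k) with u_k = sum_j w_kj * x_j."""
    if len(x) != len(w_k):
        raise ValueError("one weight is required per input signal")
    u_k = sum(w * xj for w, xj in zip(w_k, x))  # output of the linear combiner
    return phi(u_k + b_k)


# Two inputs whose weighted contributions cancel, with zero bias.
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5

# A positive bias shifts the output above 0.5.
print(neuron_output([1.0, 2.0], [0.5, -0.25], 1.0))
```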

Related works on Machine Learning approach

Fire detection systems have been widely researched and developed to mitigate the risks associated with fire hazards. Traditional systems primarily rely on smoke, heat, and gas sensors. These sensors trigger alarms when they detect conditions indicative of a fire, such as increased temperatures, presence of smoke, or elevated levels of specific gases like carbon monoxide (CO) and carbon dioxide (CO2). However, traditional methods often suffer from delayed responses, particularly in detecting smoldering fires, which can result in significant property damage and loss of life before detection.

Figure 2.5: Model of an artificial neuron labeled k[19]

To enhance early fire detection capabilities, researchers have explored the integration of multiple sensors and data fusion techniques[30]. Multi-sensor fusion combines information from various types of sensors to increase detection accuracy and reduce false alerts. For example, combining temperature, smoke concentration, and CO measurements can provide a more comprehensive picture of the fire conditions and improve the robustness of the detection system.

The effectiveness of multi-sensor fusion is supported by studies demonstrating that individual sensors, such as temperature sensors, often require high thresholds to avoid false alarms caused by environmental changes. However, when used in conjunction with other sensors, these thresholds can be lowered without increasing false alarm rates[30]. For instance, smoke detectors are prevalent due to their direct response to fire indicators, yet their performance can be significantly enhanced when integrated into a multi-sensor system that includes chemical gas sensors capable of detecting volatile organic compounds (VOCs) released during the early stages of combustion.

Recent improvements in machine learning (ML) and deep learning (DL) have further advanced fire detection systems. ML models have been employed to classify fire events based on sensor data. These models typically require extensive feature engineering and are often used in combination with data fusion techniques to improve detection accuracy.

Figure 2.6: The data are initially recorded from the onset of the fire until the fire alarm is triggered. As the fire starts, the risk level progressively escalates from low to medium and then to high.[18]

Deep learning models, particularly RNNs[16][21][15], have shown promise in modeling the sequential nature of fire sensor data. These models can capture temporal dependencies and provide more accurate predictions of fire events by analyzing patterns in the sensor data over time.

Transformer-based models have been employed for fire prediction tasks[14]. These models leverage multi-head self-attention to better capture the sequential patterns in time-series data.

Numerous empirical studies have validated the effectiveness of these advanced fire detection approaches. For example, Nazir et al. (2022)[18] conducted controlled fire experiments in indoor laboratory environments to collect data on humidity, temperature, VOCs, and eCO2 levels. Their analysis demonstrated the potential of using these sensor readings for early forecasting of smoldering fires, which are typically challenging to detect with conventional systems.

Publicly available datasets, such as those from the National Institute of Standards and Technology (NIST)[17], have been instrumental in developing and testing ML and DL models for fire detection. These datasets provide comprehensive sensor readings from various fire scenarios, enabling researchers to train and validate their models under diverse conditions.

Data Acquisition

The Indoor Laboratory Fire Dataset (ILFD)

Figure 3.1: Hardware Deployment setup for Experiments: (a) setup physically situated in the middle of the laboratory room; (b) experimental setup showing sensors connected to the microcontroller motherboard and the LoRa Node wireless system connecting it to the cloud.[18]

The Indoor Laboratory Fire Dataset[18] consists of controlled fire experiments conducted in a laboratory environment. The primary focus was on capturing sensor data under various fire conditions. The dataset includes measurements from multiple sensors, such as temperature, humidity, CO2, and volatile organic compounds (VOCs). The experiments were designed to simulate different types of fire scenarios (Figure 3.2). Each experiment was carefully monitored, and data was recorded at regular intervals to capture the dynamics of fire development. The dataset provides a detailed representation of sensor readings during the progression of fire incidents.

Figure 3.2: Indoor Laboratory Fire Dataset conducted experiments[18]

The NIST Report of Test FR 4016 Manufactured Home

Figure 3.3: NIST experiments and activation time for non-modified smoke alarms, heat alarms, and sprinkler

The NIST Fire Dataset[17] was obtained from the Building and Fire Research Laboratory at the National Institute of Standards and Technology. This dataset includes comprehensive data from a series of fire tests conducted in a manufactured home. The measurements cover various parameters, including:

• Concentrations of CO, CO2, and

• Temperature at multiple locations within the structure

The dataset includes 27 tests (Figure 3.3), covering different ignition sources and fire conditions, such as smoldering and flaming mattress in bedroom, smoldering and flaming chair in living room, and cooking oil fire on kitchen stove.

Figure 3.4: Data acquisition system using IoT analytics platform service to aggregate live data streams on cloud

Experimental Household Data

The third dataset, referred to as the Experimental Household Data, was collected by the author in a series of controlled fire experiments conducted to mimic common household fire scenarios. The experiments included various fire sources: electrical faults, cooking mishaps, and combustibles like clothing and paper.

Key measurements included humidity, temperature, MQ139, TVOC (Total Volatile Organic Compounds), and eCO2 (estimated CO2). Data was collected using sensor devices (Figure 3.4), and the experiments were designed to simulate realistic fire scenarios.

Research design

The primary focus of this research is on the a binary classification problem where the objective is to predict the occurrence of a fire outburst in an experimental setup. The label y represents the binary outcome of the experiment, where y = 0 indicates a normal state and y = 1 signifies a fire outburst The dataset is defined by a matrix

X of dimensions [n×m], where m denotes count of sensors, and n signifies count of seconds measured Each matrix X is flattened into a fixed-length vector X.

The problem is framed as a binary classification task whose inputs are time-series data from multiple sensors and whose output is a binary label indicating the occurrence of a fire.

Each experiment generates a matrix X with dimensions [n×m], where each row represents a timestamp and each column represents a sensor reading. To standardize the input, X is flattened into a vector of length n×m, which gives a uniform input size across experiments as long as every experiment is recorded with the same n and m.
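The flattening step can be illustrated with a minimal NumPy sketch. The function name flatten_experiment and the toy readings are assumptions made for illustration. The row-major ordering (all sensors at second 0, then all sensors at second 1, and so on) is also an assumption, since the text does not state the order used.

```python
import numpy as np


def flatten_experiment(X):
    """Flatten an [n x m] experiment matrix into a vector of length n * m."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("expected a 2-D matrix of shape (n_seconds, n_sensors)")
    # Row-major order: all sensor readings at t = 0, then t = 1, and so on.
    return X.reshape(-1)


# Hypothetical experiment: n = 3 seconds, m = 2 sensors.
experiment = [[21.0, 400.0],
              [21.5, 410.0],
              [22.0, 450.0]]
x = flatten_experiment(experiment)

# Experiments with identical n and m can be stacked into one design matrix.
dataset = np.stack([x, flatten_experiment(np.zeros((3, 2)))])
```

Each row of dataset is then one experiment, paired with its label y.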

Spline Interpolation

Spline interpolation is a mathematical method for constructing a smooth curve that passes through a specified set of data points. In this research, spline interpolation is used to resample the sensor data to a uniform frequency, which is crucial for maintaining consistency and reliability in the subsequent machine learning processes.
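A possible way to perform this resampling is sketched below using SciPy's CubicSpline. The choice of a cubic spline, the 1-second target step, the function name resample_uniform, and the sample timestamps are all assumptions; the text does not specify the spline order or the target frequency.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def resample_uniform(t, values, step=1.0):
    """Resample irregularly timed readings onto a uniform time grid."""
    t = np.asarray(t, dtype=float)
    values = np.asarray(values, dtype=float)
    spline = CubicSpline(t, values)  # t must be strictly increasing
    # Number of whole steps that fit between the first and last timestamp.
    n_steps = int(np.floor((t[-1] - t[0]) / step + 1e-9))
    t_uniform = t[0] + step * np.arange(n_steps + 1)
    return t_uniform, spline(t_uniform)


# Hypothetical sensor that reports at slightly irregular intervals (seconds).
t_raw = np.array([0.0, 0.9, 2.1, 3.0, 4.2, 5.0])
v_raw = 2.0 * t_raw + 1.0  # a linear trend, reproduced exactly by the spline
t_u, v_u = resample_uniform(t_raw, v_raw, step=1.0)
```

After this step every sensor channel shares the same time grid, so the channels can be placed side by side as the columns of X.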

z-score Normalization

Sensor readings were normalized using z-score normalization:

$$ z = \frac{X - \mu}{\sigma} \tag{3.1} $$

where X is a sensor reading, μ is the mean, and σ is the standard deviation of the sensor readings. This transformation gives the data a mean of 0 and a standard deviation of 1, bringing all measurements onto a common scale.
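Equation (3.1) can be applied per sensor column as in the following sketch. The function name zscore, the guard for constant columns, and the sample values are assumptions added for illustration.

```python
import numpy as np


def zscore(X):
    """Apply z = (X - mu) / sigma to each column (sensor) of X."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    # Assumption: leave constant columns at zero instead of dividing by 0.
    sigma = np.where(sigma == 0.0, 1.0, sigma)
    return (X - mu) / sigma


# Hypothetical readings: column 0 is temperature, column 1 is eCO2.
readings = np.array([[21.0, 400.0],
                     [22.0, 800.0],
                     [23.0, 1200.0]])
Z = zscore(readings)
```

Both columns end up on the same scale even though the raw eCO2 values are far larger than the temperatures.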

Wavelet Transform

The Wavelet Transform is used for analyzing time-series data and is especially useful for capturing both time and frequency information. Unlike the Fourier Transform, which only provides frequency-domain information, the Wavelet Transform allows for a multi-resolution analysis: it can localize features in both the time and frequency domains, which makes it particularly suitable for non-stationary signals such as those collected from the sensors in our experiments.

Figure 3.6: Time, frequency & wavelet transform. (a) Discrete wavelet transform; (b) Time and frequency domain.

The Wavelet Transform breaks down a signal into a series of basis functions known as wavelets, which are localized in both the time and frequency domains. The most common types are the Continuous Wavelet Transform (CWT) and the Discrete Wavelet Transform (DWT). In this dissertation, the DWT is employed due to its computational efficiency and suitability for discrete data.

The Discrete Wavelet Transform is a sampled version of the CWT, which makes it computationally feasible for practical applications. It breaks down the signal hierarchically using a sequence of low-pass and high-pass filters.

The DWT of a signal f(t) is given by:

$$ f(t) = \sum_{k} c_{j_0,k}\, \phi_{j_0,k}(t) + \sum_{j \ge j_0} \sum_{k} d_{j,k}\, \psi_{j,k}(t) \tag{3.2} $$

where $\phi_{j_0,k}(t)$ are the scaling functions (low-pass components), $\psi_{j,k}(t)$ are the wavelet functions (high-pass components), $c_{j_0,k}$ are the approximation coefficients at scale $j_0$, and $d_{j,k}$ are the detail coefficients at scale $j$ and position $k$.

The DWT is implemented using filter banks. The signal is processed through two filters: a low-pass filter L that captures coarse approximations, and a high-pass filter H that captures the detail components. This process is iterated on the low-pass filter output to obtain multi-level decompositions.

• Low-pass filter (Approximation coefficients): $$ c_{j+1}[n] = \sum_{k} c_j[k]\, L[2n-k] \tag{3.3} $$

• High-pass filter (Detail coefficients): $$ d_{j+1}[n] = \sum_{k} c_j[k]\, H[2n-k] \tag{3.4} $$

In the context of predicting fire outbursts, the Wavelet Transform is applied to each sensor time series to extract both high-frequency (detail) and low-frequency (approximation) components. This dual representation captures transient phenomena (such as sudden temperature spikes) as well as more gradual changes (like slowly increasing TVOC/eCO2 levels).

• Decomposition: The sensor data is decomposed into multiple levels using the DWT, separating the signal into various frequency bands.

• Feature Extraction: The coefficients obtained from the DWT (both approximation and detail) are used as features for the classification model. These coefficients provide a rich representation of the signal's behavior over time.

• Noise Reduction: By focusing on significant wavelet coefficients and discarding those corresponding to noise, the Wavelet Transform also aids in denoising the data, improving the robustness of the predictive model.
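The filter-bank form of equations (3.3) and (3.4) and the decomposition and feature-extraction steps above can be sketched with NumPy. The Haar wavelet, the two decomposition levels, the function names, and the toy signal are assumptions, because the text does not state which wavelet family or depth was used. A dedicated library such as PyWavelets would normally be used instead of hand-written filters.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
LOW_PASS = np.array([1.0, 1.0]) / SQRT2    # Haar filter L
HIGH_PASS = np.array([-1.0, 1.0]) / SQRT2  # Haar filter H


def dwt_level(c):
    """One filter-bank level: convolve with L and H, then keep every 2nd sample."""
    c = np.asarray(c, dtype=float)
    if c.size % 2 != 0:
        raise ValueError("this sketch expects an even-length signal")
    # Odd output indices pair samples (2n, 2n + 1); this is the sum in
    # eqs. (3.3)/(3.4) up to a one-sample shift of the downsampling grid.
    approx = np.convolve(c, LOW_PASS)[1::2]
    detail = np.convolve(c, HIGH_PASS)[1::2]
    return approx, detail


def wavedec(signal, levels):
    """Iterate dwt_level on the low-pass output to get a multi-level DWT."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = dwt_level(approx)
        details.append(detail)
    return approx, details


# Hypothetical 8-sample sensor trace decomposed over 2 levels.
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
approx, details = wavedec(signal, levels=2)

# Feature vector: coarsest approximation followed by coarse-to-fine details.
features = np.concatenate([approx] + details[::-1])
```

Because the Haar filters are orthonormal, the coefficients carry the same total energy as the signal, so zeroing small detail coefficients (the noise-reduction step) removes only a small, controlled part of it.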

Kalman Filtering

Kalman filtering is an algorithm that is particularly effective for time-series data: it filters out noise and produces a smoother signal, improving the quality of the data used for subsequent analysis and prediction tasks.

The Kalman filter is an optimal, recursive linear estimator that works in two phases, prediction and update, to estimate the state of a dynamic system from noisy measurements.

The Kalman filter relies on a mathematical model made up of two equations:

Figure 3.7: Example of original and filtered data

1. State Equation: Describes how the state of the system evolves over time.

$$ x_k = A_{k-1} x_{k-1} + B_{k-1} u_{k-1} + w_{k-1} \tag{3.5} $$

where $x_k$ is the state vector at time step $k$, $A_{k-1}$ is the state transition matrix, $B_{k-1}$ is the control input matrix, $u_{k-1}$ is the control vector, and $w_{k-1}$ is the process noise, assumed to be Gaussian with zero mean and covariance $Q$.

2. Measurement Equation: Relates the state vector to the measurement vector by incorporating observations into the state estimation.

$$ z_k = H_k x_k + v_k \tag{3.6} $$

where $z_k$ is the measurement vector at time step $k$, $H_k$ is the observation matrix, and $v_k$ is the measurement noise, assumed to follow a Gaussian distribution with zero mean and covariance $R$.
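The prediction and update phases can be sketched for a single sensor channel as follows. This is a simplified scalar case that assumes A = 1, B = 0, and H = 1 in equations (3.5) and (3.6), in other words a slowly drifting reading observed directly. The noise variances q and r, the function name kalman_1d, and the simulated data are illustrative assumptions, not values from the thesis.

```python
import numpy as np


def kalman_1d(z, q=1e-3, r=1.0, p0=1.0):
    """Scalar Kalman filter with A = 1, B = 0, H = 1 (random-walk state)."""
    z = np.asarray(z, dtype=float)
    x = z[0]   # initial state estimate: the first measurement
    p = p0     # initial estimate variance
    filtered = np.empty_like(z)
    for k, zk in enumerate(z):
        # Prediction: the state estimate is carried over, uncertainty grows by Q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + r)
        x = x + gain * (zk - x)
        p = (1.0 - gain) * p
        filtered[k] = x
    return filtered


# Hypothetical noisy temperature trace around a constant 20 degrees.
rng = np.random.default_rng(0)
raw = 20.0 + rng.normal(scale=1.0, size=200)
smooth = kalman_1d(raw, q=1e-3, r=1.0)
```

A small q relative to r tells the filter to trust its own prediction more than each new measurement, which gives stronger smoothing at the cost of a slower reaction to real changes.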

PCA-SVM Implementation

PCA

PCA is used to decrease the dimensionality of the sensor data while retaining maximal variability. Initially, sensor readings are normalized using z-score normalization to standardize the features, ensuring that the PCA is not biased towards features with larger scales. The normalized data is then transformed using PCA, retaining the components that explain 95% of the variance. This reduces the data's dimensionality while maintaining its essential characteristics.

Figure 3.8: Cumulative % variance explained by principal components
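The 95% rule can be made concrete with a short NumPy sketch based on the singular value decomposition. The function name pca_keep_variance and the synthetic three-feature data are assumptions for illustration. scikit-learn's PCA offers equivalent behavior when n_components is given as a fraction.

```python
import numpy as np


def pca_keep_variance(X, variance_to_keep=0.95):
    """Z-score X, then project onto the fewest components reaching the target."""
    X = np.asarray(X, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions; S**2 is proportional to variance.
    _, S, Vt = np.linalg.svd(Xz, full_matrices=False)
    cumulative = np.cumsum(S ** 2) / np.sum(S ** 2)
    # First index whose cumulative share reaches the target, as a count.
    k = min(int(np.searchsorted(cumulative, variance_to_keep)) + 1, len(S))
    return Xz @ Vt[:k].T, cumulative


# Synthetic data: feature 1 nearly duplicates feature 0, feature 2 is independent.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.01 * rng.normal(size=200), rng.normal(size=200)])
Z, cumulative = pca_keep_variance(X)
```

Here two components are enough, because the second feature adds almost no information beyond the first.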

SVM is used for binary classification, aiming to find the optimal hyperplane that separates the two classes of data. Two SVM versions are trained: one on the original normalized data and another on the PCA-transformed data. An SVM with a radial basis function (RBF) kernel is chosen for its effectiveness in handling non-linearities.

The dataset is split into training and testing sets; the details are given in Table. The performance of both models is evaluated using accuracy. The evaluation shows that the SVM model trained on the PCA-transformed data performs better than the one trained on the original data:

Table 3.1: Performance results of SVMs

(a) k-fold cross-validation (b) Grid-search

Fine-tuning

To further enhance the performance of the SVM model, hyperparameters are fine-tuned using grid search and cross-validation. The hyperparameters include the C parameter, the kernel type, the gamma parameter for the RBF kernel, and the polynomial degree for the polynomial kernel. Specifically, the following ranges for hyperparameters were explored during the grid search:

• Kernel type: [’linear’, ’rbf’, ’poly’]
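A grid search of this kind could look like the following scikit-learn sketch. Only the kernel list comes from the text. The value ranges for C, gamma, and degree, the 3-fold cross-validation, the pipeline step names, and the synthetic data from make_classification are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the flattened sensor vectors and fire labels.
X, y = make_classification(n_samples=120, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# z-score normalization -> PCA keeping 95% of the variance -> SVM.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95, svd_solver="full")),
    ("svc", SVC()),
])

# One sub-grid per kernel so gamma and degree are only tried where they apply.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10],
     "svc__gamma": [0.01, 0.1, 1]},
    {"svc__kernel": ["poly"], "svc__C": [0.1, 1, 10], "svc__degree": [2, 3]},
]

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)
```

Keeping the scaler and PCA inside the pipeline means they are refitted on each training fold, so the cross-validation scores are not inflated by information from the validation fold.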

ANN Implementation

The model consisted of an input layer matching the number of features in the dataset, followed by one or more hidden layers with a varying number of neurons. Each hidden layer employed the ReLU activation function, which aids in learning complex patterns by introducing non-linearities. Dropout layers were introduced to mitigate overfitting by randomly deactivating a fraction of input units, setting them to zero during training, so that the model generalizes better to unseen data. The output layer employed a sigmoid activation function to generate a probability score for the binary classification task.

The model was compiled using a suitable optimizer and loss function. For this task, the Adam optimizer was chosen due to its effectiveness and its capability to manage sparse gradients on noisy problems. The binary cross-entropy loss function was chosen as it is appropriate for binary classification tasks, evaluating the model's performance by comparing the predicted probabilities with the actual binary outcomes.

Hyperparameter tuning was conducted to find the optimal settings for the model. The final model was evaluated using accuracy and ROC-AUC.
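The forward pass and loss described above can be written out in plain NumPy to show how the pieces fit together. This sketch covers only inference and the loss value, not training with Adam. The single hidden layer of 8 units, the 0.5 dropout rate, the weight initialization, and all function names are assumptions for illustration.

```python
import numpy as np


def init_params(n_features, n_hidden, rng):
    """Small random weights for one hidden layer and one output unit."""
    W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
    W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
    return W1, np.zeros(n_hidden), W2, np.zeros(1)


def forward(X, params, dropout_rate=0.0, rng=None):
    """ReLU hidden layer, optional dropout, sigmoid output probability."""
    W1, b1, W2, b2 = params
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU
    if dropout_rate > 0.0 and rng is not None:
        # Training only: zero a random fraction of units and rescale the rest.
        keep = rng.random(h.shape) >= dropout_rate
        h = h * keep / (1.0 - dropout_rate)
    logits = h @ W2 + b2
    return (1.0 / (1.0 + np.exp(-logits))).ravel()  # sigmoid


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Mean of -[y log p + (1 - y) log(1 - p)]."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))  # 5 hypothetical samples, 4 features
params = init_params(n_features=4, n_hidden=8, rng=rng)
p_train = forward(X, params, dropout_rate=0.5, rng=rng)  # dropout active
p_eval = forward(X, params)                              # dropout disabled
loss = binary_cross_entropy([1, 0, 1, 0, 1], p_eval)
```

Dropout is applied only when a rate and a random generator are passed, which mirrors the usual distinction between training and evaluation.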
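For reference, the two evaluation metrics can be computed as in the sketch below. The 0.5 decision threshold, the function names, and the four example scores are assumptions. ROC-AUC is computed here as the fraction of (fire, normal) pairs in which the fire sample receives the higher score, with ties counted as one half.

```python
import numpy as np


def accuracy(y_true, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    preds = (np.asarray(scores, dtype=float) >= threshold).astype(int)
    return float(np.mean(preds == np.asarray(y_true)))


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


# Hypothetical labels (1 = fire) and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
acc = accuracy(y_true, scores)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is why the two are reported together.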

Accuracy ROC-AUC Accuracy ROC-AUC

Table 3.2: Performance result of ANNs

Results on collected Experimental Household Data

In this section, we present the results of our fire detection system, developed using the datasets described earlier. The ILFD and NIST datasets were used to train the machine learning models, while the Experimental Household Data was used as the testing set. The primary aim was to validate the effectiveness of the machine learning models in accurately detecting fire incidents using realistic household fire data.

Table 3.3: SVMs (fine-tuned) on collected Experimental Household Data

Conclusion

The primary objective of this research was to develop and implement an effective early fire detection system utilizing advanced machine learning techniques. The focus was on leveraging sensor data to accurately identify potential fire events, thereby providing timely alerts to prevent extensive damage and enhance safety. This research demonstrated the viability of using PCA combined with SVM and ANN for this purpose. A comprehensive system architecture was developed, integrating various sensors, data collection units, preprocessing modules, machine learning models, and an alert system. Z-score normalization and PCA were applied to preprocess the sensor data, standardizing it and reducing its dimensionality, thereby enhancing performance and efficiency.

The PCA-SVM and ANN models provided robust frameworks for detecting fire events. The SVM model, optimized through hyperparameter tuning, and the ANN model, trained with extensive sensor data, demonstrated high accuracy and reliability. Both models were rigorously evaluated using accuracy, F1-score, and ROC-AUC. The PCA-SVM model achieved an accuracy of 87%. Both models successfully predicted fire events on real data measured using the designed system, confirming the practical effectiveness of the research. The system provides timely and accurate detection of fire events, enabling prompt response and potentially saving lives and property. Its modular design makes it easy to scale and to tailor to various indoor environments.

Despite the successful outcomes, the research faced challenges such as ensuring high-quality sensor data, balancing model complexity with computational efficiency, and implementing the system in real-world scenarios. This research demonstrated that machine learning, specifically PCA-SVM and ANN, can be effectively applied to develop an early fire detection system. The findings underscore the importance of combining data science with traditional sensor technology to enhance safety systems. This research contributes to the field of fire detection and safety systems, offering a robust framework for future development and broader applications.

Future work

While this research has successfully developed an effective early fire detection system utilizing machine learning techniques, several areas for future work have been identified to enhance the system’s accuracy, reliability, and applicability.

One significant area for future work is the completion, fabrication, and delivery of the complete software and hardware solution. Finalizing the design and production of the integrated system will ensure that it meets practical requirements and can be deployed effectively in various environments. This includes refining the sensor integration, optimizing the data processing algorithms, and ensuring the system's robustness and reliability.

Developing comprehensive training, management, and operating instruction documents is also essential. These documents will help users understand and utilize the full potential of the system, providing detailed guidelines on system installation, operation, and maintenance. Training materials should cover the use of the software interface, interpretation of alerts, and routine system checks to ensure optimal performance. Management documents should include protocols for regular updates and troubleshooting guides.

Deploying and analyzing the solution in practical settings is the next critical step. Implementing the system in real-world environments will allow for extensive testing and evaluation. This practical deployment will help identify any issues or limitations of the system that were not apparent during controlled testing. Analyzing the performance data collected during these deployments will provide valuable insights for further refinement and optimization of the system. Additionally, real-world feedback from users will be instrumental in improving the usability and effectiveness of the system.

[1] “2023 Hanoi building fire - Wikipedia”. In: (2023). url: https://en.wikipedia.org/wiki/2023_Hanoi_building_fire.

[2] 4 Smart Fire Protection Technologies to Safeguard Your Home - Rescu Saves Lives. 2024. url: https://www.rescusaveslives.com/blog/4-smart-fire-protection-technologies-to-safeguard-your-home/.

[3] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[4] Leo Breiman. “Random Forests”. In: Machine Learning 45.1 (2001), pp. 5–32.

[5] Kyunghyun Cho et al. “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014, pp. 1724–1734.

[6] Duy Coban. Principal Component Analysis (PCA). https://machinelearningcoban.com/2017/06/15/pca/. Accessed: 2024-06-16. 2017.

[7] Corinna Cortes and Vladimir Vapnik. “Support-vector networks”. In: Machine Learning 20.3 (1995), pp. 273–297.

[8] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. John Wiley & Sons, 2012.

[9] Fire Detection and Management through a Multi-Sensor Network for the Protection of Cultural Heritage Areas from the Risk of Fire and Extreme Weather Conditions. 2023. url: https://cordis.europa.eu/project/id/244088/reporting/fr.

[10] Fire hazard stalks Vietnam’s tube houses. 2022. url: https://e.vnexpress.net/news/life/trend/fire-hazard-stalks-vietnam-s-tube-houses-4258889.html.

[11] Ian Goodfellow et al. “Generative Adversarial Nets”. In: Advances in Neural Information Processing Systems (NIPS). 2014, pp. 2672–2680.

[12] Sepp Hochreiter and Jürgen Schmidhuber. “Long Short-Term Memory”. In: Neural Computation 9.8 (1997), pp. 1735–1780.

[13] Crowcon Detection Instruments. What is a Flame Detector and How Does it Work? https://www.crowcon.com/blog/what-is-a-flame-detector-and-how-does-it-work/. Accessed: 2024-06-16. 2024.

[14] Young-Seob Jeong et al. “Sensor-Based Indoor Fire Forecasting Using Transformer Encoder”. In: Sensors 24.7 (2024), p. 2379. doi: 10.3390/s24072379. url: https://www.mdpi.com/1424-8220/24/7/2379.

[15] James L. McClelland and David E. Rumelhart. “Serial order: a parallel distributed processing approach”. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2. Ed. by David E. Rumelhart, James L. McClelland, and the PDP Research Group. Cambridge, MA, USA: MIT Press, 1986, pp. 318–362.

[16] Marvin Minsky and Seymour Papert. Perceptrons: An Introduction to Computational Geometry. https://archive.org/embed/perceptronsintro00mins. Accessed: 2024-06-16. 1969.

[17] National Institute of Standards and Technology. NIST Report of Test FR 4016. https://www.nist.gov/el/nist-report-test-fr-4016. Accessed: 2024-06-

[18] Amril Nazir et al. “Early Fire Detection: A New Indoor Laboratory Dataset and Data Distribution Analysis”. In: Fire 5.1 (2022). issn: 2571-6255. doi: 10.3390/fire5010011. url: https://www.mdpi.com/2571-6255/5/1/11.

[19] Truong Long Nguyen. Lý thuyết về mạng nơ-ron nhân tạo (Artificial Neural Network - ANN). https://nguyentruonglong.net/ly-thuyet-ve-mang-no-ron-nhan-tao-artificial-neural-network-ann.html. Accessed: 2024-06-16. 2024.

[20] F. Pedregosa et al. “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.

Title	Research and Design of a System for Early Fire Detection Using Machine Learning on Indoor Laboratory Data
Author	Trần Thảo Chi
Supervisor	Assoc. Prof. Dr. Ha Manh Hung
Institution	Vietnam National University, Hanoi International School
Major	Business Data Analytics
Type	Graduation Project
Year of publication	2024
City	Hanoi
