Deep learning models, such as convolutional neural networks (CNNs), have demonstrated their ability to learn and extract meaningful features from wildfire-related images, enabling accurate fire detection.
INTRODUCTION
Objective and Scope
Train machine learning models: The goal is to create machine learning algorithms capable of proficiently detecting and categorizing wildland fires. This entails acquiring and preprocessing labeled datasets comprising both fire and non-fire areas, and leveraging these datasets to train models adept at precisely classifying images associated with fires. This process will involve utilizing datasets sourced from [9] to enhance the algorithm's ability to recognize and distinguish fire-related patterns.
Enhance accuracy and efficiency: The goal is to improve the accuracy and efficiency of wildfire detection and classification by leveraging the power of deep learning models. Deep learning algorithms, such as convolutional neural networks (CNNs), can learn complex features from images, enabling more accurate detection and classification of fires.
Adaptability and scalability: The aim is to develop models and algorithms that can adapt to different environments and are scalable to handle large datasets.
Integration with Web Backend: Integrate the ML and DL models into the web backend, ensuring smooth communication between the frontend and backend components for image processing.
Result Visualization on Web Interface: Present the detection results on the web interface in a visually interpretable manner, providing users with immediate feedback on the presence or absence of wildland fires in the uploaded images.
Machine learning algorithms play a pivotal role in the analysis of both infrared (IR) and RGB images, offering a sophisticated approach to detecting early signs of potential fire outbreaks. The efficacy of these algorithms is notably enhanced through meticulous training on diverse datasets, allowing them to discern and recognize intricate patterns across a spectrum of environmental conditions. This adaptability is crucial in achieving a higher level of accuracy in fire detection, transcending the limitations posed by varying atmospheric and geographical factors.
In addition to their fundamental function of classifying fire images, contemporary studies are increasingly focusing on more advanced vision tasks within the realm of fire detection. A notable example is the real-time identification and precise localization of fire or smoke areas, often represented with bounding boxes. This evolution in the capabilities of machine learning algorithms transcends traditional fire detection methodologies, paving the way for a more nuanced understanding of fire dynamics and facilitating proactive intervention strategies.
Furthermore, the utilization of established object detection frameworks has proven to be instrumental in achieving real-time fire detection across diverse environments. This is particularly evident in the realm of surveillance videos, where the algorithms can dynamically identify and track potential fire occurrences, contributing to enhanced situational awareness and rapid response mechanisms.
In summary, the multifaceted application of machine learning algorithms in fire detection not only involves their ability to classify fire images accurately but extends to advanced vision tasks such as real-time identification and localization of fire or smoke areas. The continuous evolution of these algorithms, coupled with their adaptability to diverse datasets, signifies a significant stride towards more effective and sophisticated fire detection methodologies, particularly in dynamic and complex environmental scenarios.
Thesis Structure

This thesis is divided into four chapters, as follows:
• Chapter 1: Introduction
• Chapter 2: Theoretical background and related works
• Chapter 3: Experiments and results
• Chapter 4: Conclusions
THEORETICAL BACKGROUND AND RELATED WORKS
Theoretical Background
2.1.1 Wildfire Classification Using Deep Learning
We classify frames within our FLAME2 dataset into four categories: Flames with Smoke (YY), Flames with No Smoke (YN), Smoke with No Flames (NY), and No Flames No Smoke (NN). We employed several well-established machine learning and deep learning classification models on the dataset, including ResNet, DenseNet121, and SqueezeNet. Figure 2.1 outlines the training process for these models.
[Flowchart: split the data into train and test sets, then train the model using ResNet, SqueezeNet, etc.]
FIGURE 2.1 Flow chart of training process.
Label | Number of IR images | Number of RGB images | Class
No Fire, No Smoke | 13,700 | 13,700 | NN
No Fire, Smoke | 0 | 0 | NY
FIGURE 2.2 A neuron in a feedforward layer computes a = f(w1·x1 + w2·x2 + w3·x3 + w4·x4 + b), where f is the activation function.
As depicted in Figure 2.2, it becomes evident that within a feedforward neural network, each layer is equipped with its distinctive weight matrix and biases, symbolizing the weights associated with individual neurons within the layer. The nn.Linear layer realizes the matrix multiplication between the input data and the weight matrix, coupled with the addition of the bias term for each respective layer [17].
FIGURE 2.3 ReLU layer [17].
The ReLU (Rectified Linear Unit) layer is a fundamental component in neural networks, serving as an activation function applied to the output of each neuron in a specific layer. It introduces non-linearity to the model by transforming negative inputs to zero and leaving positive inputs unchanged. Mathematically, the ReLU activation function is defined as f(x) = max(0, x), where x is the input to the neuron. The ReLU layer aids in capturing complex patterns and relationships within the data, allowing the neural network to learn and adapt during the training process.
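To make these two building blocks concrete, the following minimal PyTorch sketch (illustrative only; the layer sizes are assumptions, not the thesis configuration) chains nn.Linear and ReLU:

    import torch
    import torch.nn as nn

    # A tiny feedforward block: y = ReLU(W·x + b)
    layer = nn.Sequential(
        nn.Linear(4, 8),  # weight matrix W (8x4) and bias b, as in Figure 2.2
        nn.ReLU(),        # f(x) = max(0, x), applied elementwise
    )

    x = torch.randn(1, 4)  # one sample with four features
    y = layer(x)           # matrix multiply + bias, then ReLU
    print(y.shape)         # torch.Size([1, 8])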
With RGB images, each pixel is represented by three color channels: Red, Green, and Blue The model processes RGB images using convolutional layers, pooling layers, and fully connected layers to learn features from the images.
Explanation of how a model can handle RGB images (a minimal code sketch follows this list):
• Convolutional Layer (Conv): this layer applies filters to the image to extract features such as edges, corners, or color patterns.
• Activation Layer (ReLU): typically added after the Conv layer to introduce non-linearity into the model, allowing it to learn more complex features.
• Pooling Layer (MaxPooling or AveragePooling): this layer reduces the size of the output from the Conv layer by retaining the maximum or average values within each region.
• Fully Connected Layer (FC) or Linear Layer: this layer transforms the output from the Pooling layer into a suitable form for classification. It often involves linear transformations and activation functions.
• Softmax Layer: if the task is classification, a Softmax layer is often used at the end of the model to convert the output into predicted probabilities for each class.
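A minimal sketch of such a pipeline for 3-channel RGB input (the layer sizes and class count are assumptions for illustration):

    import torch
    import torch.nn as nn

    rgb_cnn = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3 input channels: R, G, B
        nn.ReLU(),                                   # non-linearity after the Conv layer
        nn.MaxPool2d(2),                             # halve the spatial dimensions
        nn.Flatten(),
        nn.Linear(16 * 127 * 127, 4),                # 254x254 input -> 127x127 after pooling
        nn.Softmax(dim=1),                           # class probabilities
    )

    x = torch.randn(1, 3, 254, 254)  # one RGB image at the dataset's 254x254 size
    probs = rgb_cnn(x)               # shape (1, 4), rows sum to 1

In practice the Softmax is often folded into the cross-entropy loss rather than applied inside the model.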
With infrared (IR) images, the handling is quite similar to RGB images, with some differences in terms of the input channels and the nature of the data. IR images typically have a single channel, representing the intensity of infrared radiation; a short sketch of this channel change follows the list below.
• Convolutional Layer (Conv): similar to RGB, convolutional layers are used to extract features from the IR images. However, since IR images usually have a single channel, the number of input channels for the Conv layer is one.
• Activation Layer (ReLU): applied after the Conv layer to introduce non-linearity.
• Pooling Layer (MaxPooling or AveragePooling): reduces the spatial dimensions of the feature maps.
• Fully Connected Layer (FC) or Linear Layer: transforms the output from the Pooling layer for classification. The number of input neurons in the first FC layer is determined by the flattened size of the feature maps.
• Softmax Layer: used for classification tasks to produce probability scores for each class.
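The only structural change relative to the RGB sketch above is the first convolution's input-channel count (again illustrative):

    import torch
    import torch.nn as nn

    ir_conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
    x_ir = torch.randn(1, 1, 254, 254)  # one single-channel IR frame
    features = ir_conv(x_ir)            # shape (1, 16, 254, 254)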
During training, the model learns to capture relevant patterns from the intensity values in the IR images. The training process involves adjusting the weights of the model to minimize the difference between predicted outputs and actual labels.
When combining both RGB and IR modalities in a dual-stream model, the information from both streams is fused, allowing the model to leverage the complementary information from different imaging modalities for improved performance. The specific architecture of the dual-stream model, such as the ResNet50_two_stream, aims to capture features from both RGB and IR images and make joint predictions.
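A simplified sketch of the fusion idea (this is not the exact ResNet50_two_stream definition; the head sizes and concatenation-based fusion are assumptions):

    import torch
    import torch.nn as nn
    from torchvision import models

    class DualStream(nn.Module):
        def __init__(self, num_classes=4):
            super().__init__()
            self.rgb_stream = models.resnet50(pretrained=True)
            self.rgb_stream.fc = nn.Linear(2048, 256)      # RGB feature head
            self.ir_stream = models.resnet50(pretrained=True)
            self.ir_stream.fc = nn.Linear(2048, 256)       # IR feature head
            self.classifier = nn.Linear(512, num_classes)  # joint prediction on fused features

        def forward(self, x_rgb, x_ir):
            # x_ir is assumed replicated to 3 channels so the pretrained stem can be reused
            fused = torch.cat([self.rgb_stream(x_rgb), self.ir_stream(x_ir)], dim=1)
            return self.classifier(fused)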
FIGURE 2.4 Overview of how a dual-stream model works.
ResNet-50 is a deep neural network architecture that belongs to the ResNet (Residual Network) family. It was introduced in the paper titled "Deep Residual Learning for Image Recognition" by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, published in 2016. ResNet-50 is specifically designed for image classification tasks and has 50 layers, including convolutional layers, residual blocks, and fully connected layers [13].
The ResNet architecture with 50 layers comprises the following components, as outlined in the table provided:
• A convolution with a 7×7 kernel and 64 filters, using a stride of 2.
• A max pooling layer with a stride of 2.
• Nine additional layers, consisting of a 3×3 convolution with 64 kernels, followed by two more layers, one with 1×1 kernels (64 in number) and another with 1×1 kernels (256 in number); these three layers are repeated three times.
• Twelve more layers, featuring 1×1 kernels (128 in number), 3×3 kernels (128 in number), and 1×1 kernels (512 in number), repeated four times.
• Eighteen additional layers, incorporating 1×1 kernels (256 in number), 3×3 kernels (256 in number), and 1×1 kernels (1024 in number), iterated six times.
• Nine more layers, including 1×1 kernels (512 in number), 3×3 kernels (512 in number), and 1×1 kernels (2048 in number), repeated three times.
• Average pooling, followed by a fully connected layer containing 1000 nodes and a softmax activation function.
[Table: ResNet architecture configurations for the 18-, 34-, 50-, 101-, and 152-layer variants from [13]. conv1 is a 7×7, 64-filter convolution with stride 2 (output 112×112); conv2_x operates at 56×56; all variants end with global average pooling, a 1000-d fully connected layer, and softmax.]
SqueezeNet was collaboratively developed by researchers from DeepScale, the University of California, Berkeley, and Stanford University.
The input to SqueezeNet is typically an RGB image. The standard input size is 224×224 pixels with three color channels (red, green, and blue). This size and channel configuration aligns with common image datasets, such as ImageNet, on which SqueezeNet was often trained.
The output of SqueezeNet is a probability distribution over the different classes in the classification task. For example, if SqueezeNet is trained on the ImageNet dataset, which consists of 1,000 classes, the output will be a vector of probabilities with 1,000 elements. Each element represents the likelihood of the input image belonging to a particular class.
Layers breakdown (a sketch of the fire module follows this list):
• Layer 1: regular convolution layer
• Layers 2-9: fire module (squeeze + expand layer)
• Layer 10: regular convolution layer
• Layer 11: softmax layer
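The fire module is the distinctive piece: a 1×1 "squeeze" convolution reduces the channel count, and parallel 1×1 and 3×3 "expand" convolutions are concatenated. A minimal sketch (the channel counts are parameters, not the exact SqueezeNet configuration):

    import torch
    import torch.nn as nn

    class Fire(nn.Module):
        def __init__(self, in_ch, squeeze_ch, expand_ch):
            super().__init__()
            self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)  # reduce channels
            self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
            self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            s = self.relu(self.squeeze(x))
            # concatenate the two expand branches along the channel dimension
            return torch.cat([self.relu(self.expand1x1(s)),
                              self.relu(self.expand3x3(s))], dim=1)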
Related Work
This section discusses the many research efforts that have been conducted to build models for detecting fire and smoke. With the growth of AI, numerous research attempts have been made to detect the presence of fire/smoke in images using machine learning and deep learning models. In this work, however, we examined CNN-based models for fire/smoke detection. As an example, Krizhevsky introduced AlexNet, a deep convolutional neural network that demonstrated outstanding performance during the 2012 ImageNet Challenge [3]. Moreover, a multitude of CNN variations have demonstrated outstanding performance in the categorization of images [4]. A survey investigated Convolutional Neural Networks (CNNs) in the context of smoke and fire detection [5]; this undertaking also delved into current datasets and provided overviews of contemporary computer vision methodologies, and the authors emphasized the challenges and possible remedies to advance the progress of CNNs in this domain. Mahmoud created an efficient fire detection system that utilized CNNs and transfer learning [6]; the model employed a CNN architecture with a computing time suitable for real-time applications, and the authors claimed that the proposed approach demanded less training and classification time than established models in the literature, thanks to the incorporation of transfer learning. Bari employed a curated dataset drawn from online sources and recorded videos to fine-tune the InceptionV3 and MobileNetV2 models for fire detection [7]; the researchers observed that models trained with transfer learning outperformed fully trained models when using a small dataset. The researchers in [8] devised a method employing a Fast Regional Convolutional Neural Network (Fast R-CNN).

A selective search technique was applied to identify candidate images from the sample images; as demonstrated by the outcomes, the Fast R-CNN smoke detection exhibited an elevated detection rate and a reduction in false alarms. Vinicius suggested a fire detection system based on CNNs suitable for devices with power constraints [9]; in an effort to reduce the computational burden of a deep detection network while preserving its initial performance, this approach entails training the network and subsequently removing less essential convolutional filters. Dampage introduced a system and methodology that utilizes a wireless sensor network to detect forest fires at their inception [10]; furthermore, to enhance the precision of fire detection, the proposal includes the use of a machine learning regression model. In their study, Dogan recommended employing deep learning models such as ResNet and InceptionNet for fire detection in images [11]; these models were utilized to extract features, and the classification of these features was accomplished using Support Vector Machines (SVM). The authors illustrated that ResNet exhibited superior performance in their experiments. X. Chen [12] suggested the utilization of deep learning architectures such as ResNet, Logistic, LeNet, VGG, and MobileNet for the detection of fires in images, specifically using the Flame2 [2] dataset gathered by drone systems.
By delving into the existing literature, our aim is to combine deep learning models with pixel segmentation techniques, allowing for the development of a holistic wildfire detection and classification system. This system has the capability to efficiently analyze real-time data from diverse sources, evaluate the presence and intensity of wildfires, and deliver timely alerts and visualizations to firefighting agencies. The synergistic utilization of deep learning models and pixel segmentation techniques equips these systems to promptly and accurately identify wildfires, thereby supporting rapid response and the implementation of effective wildfire management strategies.
After reviewing many documents to choose a suitable solution for our scenario, we decided to implement ResNet, SqueezeNet, and DenseNet for the detection of fires. The dataset we use is Flame2 [2], which includes both IR and RGB images. We use MSER and NMS for object recognition, in particular framing regions in the pictures by drawing boxes, because this is more appropriate in scenarios such as image segmentation or post-processing steps in object detection pipelines. The reason is that pre-trained models are trained on large-scale datasets for general tasks like image classification; leveraging these pre-trained models as a starting point allows us to benefit from the knowledge gained during their training. Also, training deep neural networks from scratch can be computationally expensive and time-consuming; pre-trained models serve as a good initialization point, reducing the time and resources required to train a model for a specific task.
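As a rough illustration of this box-drawing step (not the exact thesis pipeline; the uniform scoring is a placeholder), OpenCV's MSER detector can propose regions that NMS then prunes:

    import cv2
    import numpy as np

    def propose_boxes(image_bgr, nms_thresh=0.4):
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        mser = cv2.MSER_create()
        _, bboxes = mser.detectRegions(gray)       # candidate boxes as (x, y, w, h)
        boxes = [list(map(int, b)) for b in bboxes]
        scores = [1.0] * len(boxes)                # placeholder; a real pipeline would
                                                   # score each box with the classifier
        keep = cv2.dnn.NMSBoxes(boxes, scores, 0.0, nms_thresh)
        return [boxes[i] for i in np.array(keep).flatten()]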
EXPERIMENTS AND RESULTS
Data Collection and Dataset Generation
Methods for wildfire detection and modeling using drones offer a distinct advantage over traditional remote monitoring systems, such as satellites. They provide high-precision, real-time monitoring, enabling swift intervention and effective management strategies. Drones, with their easy deployment, maneuverability, and robust sensing capabilities, prove especially valuable for early wildfire detection in challenging environments. However, the development of drone-based monitoring systems has been hindered by a scarcity of well-annotated aerial wildfire datasets, largely due to UAV flight regulations for prescribed burns and wildfires. The accompanying dataset comprises infrared and visible spectrum video pairs captured by drones during a prescribed fire in Northern Arizona in 2021. The frames are classified by two independent evaluators with binary labels indicating the presence of fire or smoke. The dataset is a crucial resource for advancing drone-based wildfire monitoring, offering valuable context through supplementary weather information, the burn plan, a geo-referenced RGB point cloud, an RGB orthomosaic, and additional references.
Initial Resolution Frame Pairs: comprising 53,451 juxtaposed RGB/IR frame pairs (jpg), derived from the original video pairs within the dataset. These frames were reduced to 254 × 254 pixels and cropped to ensure comparable Field of View (FOV) and perspective for both RGB and IR frames. The dataset contains 53,451 RGB frames, occupying 3.04 GB, and 53,451 IR frames, occupying 5.30 GB. Each frame pair is annotated with labels indicating "Fire/No Fire" and "Smoke/No Smoke".
Label | Number of IR images | Number of RGB images
The dataset images carry annotations from two human experts, yielding two binary labels based on their shared consensus. The "Fire/No Fire" label captures whether the experts perceived fire in either the RGB or IR frame. In contrast, the "Smoke/No Smoke" label reflects the experts' judgment that smoke is visible within the RGB frame. Human judgment was chosen over a fixed threshold out of the conviction that experts can make this call more consistently and accurately than alternative thresholds would allow. In light of the loose definition of the smoke/no-smoke label and the partial transparency of smoke, users are urged to interpret the label as indicating a pronounced presence of smoke within the frame. It is important to underscore that the two labels are related and not mutually exclusive; the dataset was labeled by two distinct individuals, and only upon consensus between both experts were the final labels fixed.
The incorporation of dual RGB/IR data for labeling enhances accuracy compared to relying solely on RGB data. The IR frames remain unaffected by smoke, enabling precise Fire/No Fire labeling irrespective of the degree of smoke obscuring the RGB frame. Figure 3.1 illustrates multiple examples of frame pairs where the integration of both RGB and IR data elevates labeling accuracy and provides more comprehensive information than what a single RGB camera would offer.
In Figure 3.2, 'A' belongs to No Fire/No Smoke, 'B' belongs to Fire/Smoke, and 'C' belongs to Fire/No Smoke.
React
We use React for the front-end to build a visually appealing and responsive user interface for displaying the processed images and results.
We use React because it has a large and active community, providing a wealth of resources, libraries, and third-party components. This supportive ecosystem accelerates development and problem-solving.
Python is a good option for processing the images because it boasts a vast ecosystem of libraries and tools for data processing, analysis, and machine learning. Popular libraries like NumPy, pandas, and scikit-learn provide powerful capabilities for handling large datasets and implementing machine learning algorithms. Python also offers robust machine learning libraries such as TensorFlow and PyTorch for deep learning, as well as scikit-learn for traditional machine learning; these libraries provide extensive functionality for building and training models on large datasets.
Experimental Processes
For the per-frame fire classification in each experiment, we exclusively utilized a randomly sampled 20% of the data, dividing it into an 80% training set and a 20% test set.
Splitting a dataset into an 80% training set and a 20% test set is common in machine learning The training set teaches the model patterns, and the test set assesses how well the model generalizes to new data, preventing overfitting and ensuring a fair evaluation of real-world performance.
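A sketch of this sampling-and-splitting step with scikit-learn (the variable names and dummy data are illustrative, not the thesis code):

    from sklearn.model_selection import train_test_split

    # Dummy frame index standing in for the FLAME2 file list (assumption)
    all_paths = [f"frame_{i}.jpg" for i in range(1000)]
    all_labels = [i % 2 for i in range(1000)]  # binary fire / no-fire labels

    # Keep a randomly sampled 20% of the data ...
    _, subset_paths, _, subset_labels = train_test_split(
        all_paths, all_labels, test_size=0.20, stratify=all_labels, random_state=42
    )
    # ... then split that subset into an 80% train set and a 20% test set
    train_paths, test_paths, train_labels, test_labels = train_test_split(
        subset_paths, subset_labels, test_size=0.20, stratify=subset_labels, random_state=42
    )
    print(len(train_paths), len(test_paths))  # 160, 40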
Table 3.2 An 80% training set and a 20% test set
All experiments are performed on the following computer configuration: CPU (Central Processing Unit): AMD Ryzen 5 5500U, 2.1 GHz, 6 cores, 12 threads; RAM (Random Access Memory): 8 GB DDR4 3200 MHz; GPU (Graphics Processing Unit): AMD Radeon(TM) Graphics.
We employ the ADAM [16] optimizer with a learning rate set to 1e-3 for our models. A learning rate of 1e-3 is often used as a starting point for fine-tuning or transfer-learning scenarios; it allows for gradual adjustments to pre-trained weights without causing drastic changes. The training process spans 10 epochs, representing complete cycles through the entire training dataset. Throughout training, we utilize a batch size of 64. A batch size of 64 is often chosen for computational efficiency: it strikes a balance between processing a reasonable number of samples in parallel, leveraging GPU capabilities efficiently, and managing memory constraints. Label smoothing is incorporated with a probability of 0.2 to enhance model robustness by preventing overconfidence, encouraging calibration, and reducing overfitting. This regularization technique introduces controlled uncertainty in predictions, which is particularly beneficial for handling noisy labels and improving generalization to unseen data. The choice of a probability value, such as 0.2, allows for tuning the trade-off between encouraging uncertainty and maintaining model accuracy.
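These hyperparameters translate into a training loop along the following lines (a minimal sketch with a stand-in model and random data; the thesis trains ResNet/DenseNet/SqueezeNet on FLAME2 frames):

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 254 * 254, 4))  # stand-in model
    train_dataset = TensorDataset(torch.randn(256, 3, 254, 254),
                                  torch.randint(0, 4, (256,)))        # stand-in data

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM with lr = 1e-3
    criterion = nn.CrossEntropyLoss(label_smoothing=0.2)       # label smoothing prob 0.2
    loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

    for epoch in range(10):                                    # 10 full passes over the data
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()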
Table 3.3 Training parameters in the experiment
Parameter | Value
Learning Rate | 1.00E-03
Momentum | 0.1
Batch Size | 64
Epochs | 10
Image size | 254×254
Moving beyond exclusive attention to classification accuracy, our main focus centers on broader metrics at the macro level, including macro F1 score, macro recall, and macro precision. This emphasis is pivotal in practical situations where instances of wildfires are infrequent, making it challenging to accurately assess the model's performance based solely on classification accuracy. For instance, in a dataset comprising 2000 images, the occurrence of a wildfire sample might be as rare as one instance; if the model predicts all samples as "no fire," the reported accuracy could reach 99.9%. To evaluate performance for each class, we calculate recall (R), precision (P), and F1 score.
TP represents True Positive, TN represents True Negative, FP represents False Positive, and FN represents False Negative. The macro-level metrics compute the average of the class-wise metrics, irrespective of the sample count in each class.
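Since the metric equations did not survive extraction, we restate the standard definitions used here:

    P = TP / (TP + FP)
    R = TP / (TP + FN)
    F1 = 2 · P · R / (P + R)

The macro variants average these class-wise values over the C classes, e.g. macro-P = (1/C) Σ P_c.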
The results are shown in Table 3.4. Both ResNet50 in IR mode and DenseNet121 in RGB mode achieve the highest accuracy, F1 score, precision, and recall; the other model/mode combinations also score highly. The logging snippet shown in Figure 3.4 (reconstructed below from the partially legible screenshot) appends one row of metrics per run to a CSV file:

    import csv

    with open(log_path, "a", newline="") as f:
        csv_write = csv.writer(f)
        csv_write.writerow(data_row)  # one row of F1/P/R values per run
FIGURE 3.4 Save F1 score, P and R.
This code segment is part of a larger process of logging model evaluation metrics to a CSV file. It helps keep track of model performance over multiple runs and provides a structured way to store and analyze the evaluation results. The metrics recorded in the CSV file can later be used for comparisons, visualizations, or further analysis.
    import numpy as np

    # loss_list, train_acc, and test_acc come from the training loop
    results_array = np.zeros(3, dtype=object)
    results_array[0] = loss_list
    results_array[1] = train_acc
    results_array[2] = test_acc
    np.save(args.log_loss_path + file_name + ".npy", results_array)
This code is used to save the training loss, training accuracy, and test accuracy in a NumPy array and store it as a binary file for later use or analysis. The saved file contains the information necessary to review or compare different training sessions of the model.
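For later analysis, such a file can be reloaded as follows (the filename is illustrative):

    import numpy as np

    # allow_pickle is required because the array holds Python objects
    loss_list, train_acc, test_acc = np.load("resnet50_ir.npy", allow_pickle=True)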
TABLE 3.4 F1 Score, Precision, Recall, Accuracy
Model name | Mode | F1 Score | Precision | Recall | Accuracy
    import torch.nn as nn
    from torchvision.models import densenet121

    def DenseNet121(classes_num):
        model = densenet121(pretrained=True)
        # Replace the default 1000-class classifier with a new linear layer
        # (reconstructed; DenseNet121's feature size is 1024)
        model.classifier = nn.Linear(1024, classes_num)
        return model
In the figure above, the function starts by loading the pre-trained DenseNet121 model from the torchvision.models module. pretrained=True means that the model is loaded with weights pre-trained on a large dataset (usually ImageNet).
After loading the pre-trained model, the function modifies the classifier part of the model. In DenseNet121, the classifier is typically a fully connected (linear) layer that outputs predictions. The code replaces this classifier with a new linear layer.
Finally, the modified DenseNet121 model is returned from the function.
    import torch.nn as nn
    from torchvision import models

    class SqueezeNet_two_stream(nn.Module):
        def __init__(self):
            super().__init__()
            # Stream 1 with SqueezeNet
            self.stream1 = models.squeezenet1_1(pretrained=True)
            for param in self.stream1.parameters():
                param.requires_grad = False  # freeze the pre-trained backbone
            # Modify the final convolutional layer
            self.stream1.classifier[1] = nn.Sequential(nn.Conv2d(512, 256, kernel_size=1))
            # Stream 2 with SqueezeNet
            self.stream2 = models.squeezenet1_1(pretrained=True)
FIGURE 3.7 SqueezeNet_two_stream model
In Figure 3.7, the two streams process different input data. The final output is a combination of the features extracted from both streams.

    def SqueezeNet(classes_num):
        model = models.squeezenet1_0(pretrained=True)  # you can also use squeezenet1_1
        # Modify the classifier so the output matches classes_num
        model.classifier[1] = nn.Conv2d(512, classes_num, kernel_size=1)
        model.num_classes = classes_num
        return model
    class ResNet50_two_stream(nn.Module):
        def __init__(self):
            super().__init__()
            self.stream1 = models.resnet50(pretrained=True)
            for parameter in self.stream1.parameters():
                parameter.requires_grad = False  # freeze the pre-trained backbone
            # out_features reconstructed as 256; the screenshot is partly illegible
            self.stream1.fc = nn.Sequential(nn.Linear(in_features=2048, out_features=256))
            self.stream2 = models.resnet50(pretrained=True)
            for parameter in self.stream2.parameters():
                parameter.requires_grad = False
            self.stream2.fc = nn.Sequential(nn.Linear(in_features=2048, out_features=256))
    def ResNet50(classes_num):
        model = models.resnet50(pretrained=True)
        # Modify the classifier
        model.fc = nn.Sequential(
            nn.Linear(2048, 512),
            nn.Dropout(0.5),
            nn.Linear(512, classes_num),
        )
        return model
FIGURE 3.10 ResNet50_two_stream model
FIGURE 3.11 Accuracy results of different methods for each model
FIGURE 3.12 F1 score results of different methods for each model
FIGURE 3.13 Precision results of different methods for each model
FIGURE 3.14 Recall results of different methods for each model
FIGURE 3.15 Confusion Matrix — DenseNet121 — mode RGB
In Figure 3.15, the instances with numerous correct predictions (176 cases) correspond to Fire/Smoke, and this can be attributed to the model being extensively trained on Fire/Smoke data.
FIGURE 3.16 Confusion Matrix — DenseNet121 — mode IR
In Figure 3.16, the instances with numerous correct predictions (170 cases) correspond to Fire/Smoke, and this can be attributed to the model being extensively trained on Fire/Smoke data. Conversely, the cases with numerous incorrect predictions involve 1 instance wrongly classified as Fire/No Smoke when it was actually Fire/Smoke. This misclassification may result from limitations in the training process with Fire/No Smoke data, and another possible factor is that the characteristics of these cases are not fully described.
FIGURE 3.17 Confusion Matrix — ResNet50 — mode IR
In Figure 3.17, the instances with numerous correct predictions (159 cases) correspond to Fire/Smoke, and this can be attributed to the model being extensively trained on Fire/Smoke data. Conversely, the cases with numerous incorrect predictions involve 2 instances wrongly classified as Fire/No Smoke when they were actually Fire/Smoke. This misclassification may result from limitations in the training process with Fire/No Smoke data, and another possible factor is that the characteristics of these cases are not fully described.
FIGURE 3.18 Confusion Matrix — ResNet50 — mode RGB