HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY
——————– o0o ———————
PROJECT REPORT INTRODUCTION TO ARTIFICIAL INTELLIGENCE
Facial recognition and classification
between Vietnamese and foreigner
Lecturer: Prof. Pham Van Hai, Ph.D.
Class: 139410 (IT3160E)
Hanoi, 2023
1 Introduction
1.1 History of facial recognition technology
1.2 Customer classification
1.3 Problem description
2 Model
2.1 Data Collection and Data preprocessing
2.1.1 Data sources
2.1.2 Processing steps
2.2 Neural network structure
2.2.1 ResNet50 model
2.2.2 Residual Block
2.2.3 Optimizer and Loss function
3 Analysis
4 Conclusion
4.1 Evaluation
4.2 Experimental result
4.3 Future development
Facial recognition technology is a rapidly advancing field that has a significant impact on many industries. This technology enables critical emerging applications such as security and surveillance, authentication, and human-computer interaction. As the technology continues to evolve, it will continue to transform various fields in the future.
In the "Introduction to Artificial Intelligence – IT3160E" course, we gained fundamental knowledge about facial recognition technology. In our project, our goal was to distinguish between Vietnamese and foreign faces using facial recognition algorithms.
This report provides an explanation of the theoretical basis of the algorithms we used and how we implemented them. We then analyze the effectiveness of the models in correctly classifying Vietnamese and foreign faces and identify areas for improvement.
Our experiments involved using machine learning algorithms like convolutional neural networks (CNNs) and support vector machines (SVMs). CNNs can automatically learn relevant features from image data, while SVMs perform classification based on learned patterns.
We trained and tested the models on a dataset containing photos of Vietnamese and foreign faces. We evaluated the models based on their accuracy and ability to generalize to new images. However, we faced some challenges, such as the need for a larger and more diverse dataset to improve performance.
Keywords: Facial Recognition – ResNet50
Chapter 1
Introduction
1.1 History of facial recognition technology
Facial recognition technology is a powerful tool that uses facial features to identify and verify individuals. It has numerous applications, including access control, surveillance, law enforcement, and targeted advertising. One of its primary functions is access control, where it is used to verify the identity of individuals. This technology has also become more accurate with the help of machine learning and AI.
However, there are concerns about potential biases, privacy, and security issues. Facial recognition algorithms can recognize facial landmarks like the eyes, eyebrows, nose, and mouth shape, and by measuring the distances and relative sizes of these features, a mathematical representation of the face is created for comparison and matching. These algorithms can also introduce biases and errors in the identification process. As a result, governments and organizations must establish guidelines and regulations to ensure ethical and responsible development and use of the technology.
In conclusion, facial recognition technology has the potential to transform various fields, but it also raises important questions and concerns. While it has numerous applications in access control, surveillance, law enforcement, and targeted advertising, there are concerns about potential biases, privacy, and security issues. By establishing ethical guidelines and responsible development, we can ensure that this technology is used in a way that benefits society as a whole.
1.2 Customer classification
Facial recognition could potentially allow businesses to automatically categorize customers as Vietnamese or foreign upon entry, in real time. While this raises privacy and bias concerns, if implemented responsibly it could supply data to optimize service for different nationalities.
To build an effective categorization system, companies will need a large and diverse dataset of Vietnamese and foreign customer facial images to train their systems. They must also implement policies, disclosures and consent processes to gain customer trust and address ethical issues around the technology's use. This will be key to realizing the benefits of customer categorization while minimizing potential harm.
1.3 Problem description
The intended functionality of the final model is to provide key facial attributes for any given input image, including:
• Age
• Gender
• Ethnicity as either Vietnamese or foreigner
• Emotional state
Through extensive training and testing, the goal is to optimize the model's performance metrics such as facial recognition accuracy and classification accuracy when distinguishing between Vietnamese and foreign faces. This designed functionality has promising applications for use cases such as targeted advertising, access control, and customer insights and segmentation.
Chapter 2
Model
Our project will utilize three pre-trained model weights: age detection, gender detection, and emotion detection. We will specifically train the models to differentiate between Vietnamese and foreign individuals.
To train the model for distinguishing between Vietnamese and foreign individuals, we need to gather and process the data, followed by model construction and training.
1. Data collection: We will collect a diverse dataset consisting of images or relevant data from both Vietnamese and foreign individuals. This dataset should adequately represent the characteristics and variations present in both groups.
2. Data preprocessing: The collected data will undergo preprocessing steps such as image resizing, normalization, and data augmentation techniques to enhance the quality and variety of the dataset. This ensures that the model can generalize well to unseen data.
3. Model construction: We will design and build a suitable neural network architecture for the task of differentiating between Vietnamese and foreign individuals. This architecture may involve various layers, such as convolutional layers for image analysis, followed by fully connected layers for classification.
4. Model training: The constructed model will be trained using the preprocessed dataset. The training process involves feeding the model with input data and adjusting its internal parameters through an optimization algorithm (e.g., stochastic gradient descent). The goal is to minimize the difference between the model's predictions and the ground truth labels in the training dataset.
During training, it's crucial to validate the model's performance on a separate validation dataset to monitor its progress and prevent overfitting. Fine-tuning the model's hyperparameters, such as learning rate and regularization techniques, may also be necessary to optimize its performance.
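One common way to act on validation results is early stopping. The sketch below is only an illustration of that idea: the report does not state that early stopping was used, and the function name, the `patience` parameter, and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. If that never happens, the index of
    the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses: they improve at first, then rise again,
# which is the typical signature of overfitting.
val_losses = [0.70, 0.55, 0.48, 0.50, 0.53, 0.57, 0.60]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
```

With these made-up numbers the best loss is reached at epoch 2 and training would halt at epoch 5, after three epochs without improvement.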
2.1 Data Collection and Data preprocessing
2.1.1 Data sources
We collected images of Vietnamese people and foreigners from these sources:
• Vietnamese images: Image Vietnamese
• Foreign images: Image Foriegner
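2.1.2 Processing steps
    import cv2
    import face_recognition
    import torchvision.transforms as transforms
    from PIL import Image

    # img initially holds the path of the image file
    img = cv2.imread(img)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    boxes_face = face_recognition.face_locations(img)
    face_image = None
    if len(boxes_face) != 0:
        for box_face in boxes_face:
            # The loop body was lost in extraction; a crop is assumed here.
            # face_locations returns boxes as (top, right, bottom, left).
            top, right, bottom, left = box_face
            face_image = img[top:bottom, left:right]

    if face_image is not None:
        # Transform the image to a tensor that can be fed into the model
        data_transform = transforms.Compose([
            transforms.Resize((128, 128)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])
        input_tensor = data_transform(Image.fromarray(face_image)).unsqueeze(0)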
Since the main framework we use to build our model is TensorFlow, we need to transform the NumPy array data into tensors so that they can be fed into the model. The data_transform function is used to complete this task:
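    import torchvision.transforms as transforms
    from torch.utils.data import DataLoader
    from torchvision.datasets import ImageFolder

    # data_transform is the transforms.Compose pipeline defined above
    # (its definition is truncated at this point in the source).
    train_data = ImageFolder("data/train", transform=data_transform)
    train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
    test_data = ImageFolder("data/test", transform=data_transform)
    # The test loader wraps test_data (the original passed train_data)
    test_loader = DataLoader(test_data, batch_size=32, shuffle=False)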
After completing all these steps, we can finally train our deep learning model with the processed data.
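To make the effect of the normalization step concrete, the following sketch reproduces the arithmetic of ToTensor followed by Normalize with mean 0.5 and standard deviation 0.5 for a single pixel value. It is a plain-Python illustration, not the torchvision implementation, and the helper names are ours.

```python
def normalize(value, mean=0.5, std=0.5):
    """Apply the per-channel rule used by Normalize: (value - mean) / std."""
    return (value - mean) / std


def pixel_to_model_input(pixel_8bit):
    """Scale an 8-bit pixel to [0, 1], as ToTensor does, then normalize it."""
    return normalize(pixel_8bit / 255.0)


darkest = pixel_to_model_input(0)      # maps to -1.0
brightest = pixel_to_model_input(255)  # maps to 1.0
```

With these settings every channel ends up in the range [-1, 1], centred on zero.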
2.2 Neural network structure
2.2.1 ResNet50 model
ResNet (short for Residual Network) is a popular deep learning model architecture known for its ability to train very deep neural networks effectively. It introduces skip connections that allow the network to learn residual mappings, enabling the training of deeper models without suffering from the vanishing gradient problem.
In summary, the code defines a ResNet model architecture with multiple residual blocks for image classification.
Initial convolutional layer
This initial block takes the input tensor and prepares it before it is fed into the residual blocks:
• Convolution: maps the input tensor from 3 channels to 32 channels
• Batch Normalization: normalizes the activations over each mini-batch
• Activation: ReLU function
• Max Pooling: takes the maximum value within each (2, 2) kernel window of the receptive field (per channel of the tensor)
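The max pooling step can be illustrated on a single channel. This is a minimal pure-Python sketch that assumes a stride of 2 (the report only gives the (2, 2) kernel size); the function name and the sample values are illustrative.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on one channel (a list of rows)."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = (feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled


channel = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 4],
]
pooled = max_pool_2x2(channel)  # [[4, 5], [6, 7]]
```

Under this stride assumption, each spatial dimension is halved while the number of channels stays the same.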
Residual block
A residual block is a fundamental building block used in residual neural networks (ResNet). It is designed to address the vanishing gradient problem and enable the training of very deep neural networks effectively. The key idea behind a residual block is to introduce skip connections that allow the network to learn residual mappings, which are the difference between the input and output of a block (this construction will be explained below).
This network uses 8 residual blocks, 4 of which downsample the feature maps to half their size.
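The residual mapping can be written compactly. In the notation below, which is ours and follows the usual ResNet formulation rather than anything stated in the report, x is the block input, H(x) is the mapping the block should ideally represent, and F(x) is what the stacked layers actually learn:

```latex
F(x) = H(x) - x, \qquad y = \mathrm{ReLU}\bigl(F(x) + x\bigr)
```

Because the input x is added back through the skip connection, the layers only need to learn the difference F(x), and the gradient has a direct path through the addition.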
Global Average Pooling
Takes the mean value of each channel of the tensor:
• This layer converts each channel of the residual blocks' output into one node of the fully connected layer (512 channels = 512 nodes)
• Like the Flatten function, Global Average Pooling prepares the feature maps for the fully connected layer. Unlike Flatten, which transforms the input into a one-dimensional vector containing every value, Average Pooling reduces the spatial dimensions of the input by taking the average value within each pooling region
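The difference between the two operations shows up clearly on a toy tensor. The sketch below uses nested Python lists with three small channels instead of the 512 channels of the real network; the function names and values are illustrative only.

```python
def global_average_pool(feature_maps):
    """Reduce each channel (a 2-D list) to its mean: one value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def flatten(feature_maps):
    """Concatenate every value of every channel into one long vector."""
    return [value for channel in feature_maps for row in channel for value in row]


# Three channels of size 2x2
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [8.0, 0.0]],
    [[5.0, 5.0], [5.0, 5.0]],
]
pooled = global_average_pool(feature_maps)  # [2.5, 2.0, 5.0]
flat = flatten(feature_maps)                # 12 values
```

Global average pooling yields one value per channel regardless of the spatial size, whereas the flattened vector grows with height and width.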
Fully Connected Network
The prediction is a vector of predicted class probabilities, with each element in the vector corresponding to a class. The predicted probabilities fall within the range (0, 1). The number of elements in the vector is equal to the number of classes in the problem. Hence, the sigmoid function is used as the activation function.
Additionally, the final layer of the network, which produces the predictions, has the same number of nodes as the number of classes. Each node in the final layer corresponds to a specific class, and the output of the network represents the probabilities of each class based on the input data. For example:
• pred1 = [[0.873, 0.233]] → pred1[0][0] > pred1[0][1] → Vietnamese
• pred2 = [[0.551, 0.773]] → pred2[0][0] < pred2[0][1] → Foreigner
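The comparison in the two examples above amounts to picking the index of the largest score. A minimal sketch, assuming the class order (index 0 = Vietnamese, index 1 = Foreigner) implied by those examples; the helper name is hypothetical.

```python
CLASS_NAMES = ["Vietnamese", "Foreigner"]  # index order inferred from the examples


def predicted_label(pred):
    """Map a batch-of-one prediction such as [[0.873, 0.233]] to a class name."""
    scores = pred[0]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return CLASS_NAMES[best_index]


label1 = predicted_label([[0.873, 0.233]])  # "Vietnamese"
label2 = predicted_label([[0.551, 0.773]])  # "Foreigner"
```

Note that with a sigmoid output the two scores are independent and need not sum to 1, as the example values show.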
2.2.2 Residual Block
• 1 × 1 convolution to match the dimensions of the shortcut with the number of filters
• Batch normalization is applied to the shortcut
Addition and activation
The output of the identity block is added to the shortcut, and ReLU activation is applied to the sum. Dropout regularization is then applied.
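The shortcut projection and the addition step can be sketched at a single pixel, where a 1 × 1 convolution reduces to a matrix-vector product over channels. This is a simplified illustration: batch normalization and dropout are left out, and the weights and values are made up.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]


def conv1x1(vector, weights):
    """A 1x1 convolution at one pixel is a matrix-vector product over channels."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_output(x, main_path_output, shortcut_weights=None):
    """Add the shortcut to the main path and apply ReLU to the sum.

    If shortcut_weights is given, the shortcut is first projected with a
    1x1 convolution so its channel count matches the main path.
    """
    shortcut = x if shortcut_weights is None else conv1x1(x, shortcut_weights)
    return relu([m + s for m, s in zip(main_path_output, shortcut)])


x = [1.0, -2.0]                          # 2 input channels
main_path_output = [0.5, -1.0, 2.0]      # main path produced 3 filters
shortcut_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # projects 2 -> 3 channels
y = residual_output(x, main_path_output, shortcut_weights)  # [1.5, 0.0, 1.0]
```

When the input and output already have the same number of channels, the projection can be skipped and the input is added directly.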
2.2.3 Optimizer and Loss function
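    from tensorflow.keras.optimizers import Adam  # assumed import path

    input_shape = (128, 128, 3)
    num_classes = 2
    optimizer = Adam(learning_rate=0.0003)
    # resnet_model is the project's model-building function (not shown here)
    model = resnet_model(input_shape, num_classes)
    # The remaining compile arguments are truncated in the source
    model.compile(optimizer=optimizer)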
Learning rate (custom): 0.0003, because we found that this value fits the data best.
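To show what the learning rate controls, here is a sketch of one textbook Adam update for a single parameter. Only the learning rate 0.0003 comes from the report; the values of beta1, beta2 and eps are the commonly used defaults and are assumptions here, as are the function name and the sample gradient. The loss function is not recoverable from the source, so it is not shown.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.0003, beta1=0.9, beta2=0.999, eps=1e-7):
    """Perform one Adam update and return the new (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First update (t = 1) of a parameter at 1.0 with a hypothetical gradient of 4.0
param, m, v = adam_step(param=1.0, grad=4.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate itself (here from 1.0 to about 0.9997), independent of the gradient's magnitude.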
Chapter 3
Analysis
In this section, we will discuss the execution and flow of the project, focusing on the face recognition application. The application offers two modes: image mode and real-time camera mode. In image mode, when the "input image" parameter is received, the application automatically retrieves the image file by accessing the provided file path. In real-time camera mode, the application activates the device's camera to capture live video frames for processing.
The image or video frames are then passed into the f_face_info class, which contains several sub-classes responsible for different tasks. These sub-classes include vietnamese_detection, emotion_detection, gender_detection, and age_prediction. Each sub-class loads its corresponding model and provides predictions based on the input data.
    # instantiate detectors
    age_detector = f_my_age.Age_Model()
    gender_detector = f_my_gender.Gender_Model()
    race_detector = f_my_race.Race_Model()
    emotion_detector = f_emotion_detection.predict_emotions()
Within the f_face_info class, there is a face cropping function that extracts the facial region from the input image. This step ensures that the processed image aligns with the face images used during model training, enhancing the accuracy of predictions.
Once the image is processed and the face is cropped, the sub-classes, including the pre-trained models, perform their respective tasks. For example, the vietnamese_detection sub-class predicts the ethnicity of the detected face, while the emotion_detection sub-class predicts the emotional state, and the gender_detection sub-class predicts the gender. The age_prediction sub-class estimates the age range of the individual.
However, it's important to note that the age, emotion, and gender models used in our application rely on pre-trained weights. Therefore, the input to these models must conform to their specific requirements to generate accurate predictions.
Additionally, the application includes a box bound function, which creates a bounding box around the detected face. This bounding box serves as a visual indicator on the image or real-time camera feed, highlighting the recognized object.
    for data_face in out:
        box = data_face["bbx_frontal_face"]
Chapter 4
Conclusion
4.1 Evaluation
• The main goal is to achieve high accuracy in correctly classifying Vietnamese and foreign individuals.
2. Dataset:
• The dataset consists of facial images of both Vietnamese and foreign individuals.
• It comprises 13,000 images, with 5,000 images belonging to Vietnamese individuals and 6,500 images belonging to foreign individuals.
• The dataset is divided into a training set (11,000 images) and a test set (2,000 images).
3. Evaluation Metrics:
• Accuracy is used as the primary metric to evaluate the performance of the model.
• Accuracy is calculated as the ratio of correctly classified instances to the total number of instances in the test set.
4. Evaluation Results:
• The model achieved an accuracy of approximately 90% on the test set.
• Out of 2,000 test images, the model correctly classified 1,780 images, while incorrectly predicting 220 images.
• These results demonstrate the model's ability to distinguish between Vietnamese and foreign individuals with high accuracy.
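The accuracy definition given earlier can be applied directly to the reported counts. The helper below is an illustrative one-liner; only the numbers 1,780 and 2,000 come from the report.

```python
def accuracy(correct, total):
    """Ratio of correctly classified instances to all instances."""
    return correct / total


test_accuracy = accuracy(1780, 2000)   # 0.89
misclassified = 2000 - 1780            # 220, matching the reported error count
```

The counts therefore correspond to 89%, marginally below the roughly 90% headline figure, which suggests that one of the two numbers is rounded or comes from a different run.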
5. Comparison with Other Models:
• To assess the performance of the developed model, a comparison was made between two existing state-of-the-art models in the field: ResNet50 (which is used in our project) and VGG16.
• Both ResNet and VGG16 demonstrated good performance in distinguishing between Vietnamese and foreign individuals.
• ResNet outperformed VGG16 with a slightly higher accuracy of 90% compared to 89%.
• This suggests that ResNet is slightly better suited for the specific task of differentiating between Vietnamese and foreign individuals based on facial images in this dataset.
7. Evaluation against Objectives:
• The initial objective was to achieve high accuracy in classifying Vietnamese and foreign individuals.