
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

——————– o0o ———————

PROJECT REPORT: INTRODUCTION TO ARTIFICIAL INTELLIGENCE

Facial recognition and classification

between Vietnamese and foreigners

Lecturer: Prof. Pham Van Hai, Ph.D.

Class: 139410 (IT3160E)

Hanoi, 2023


1 Introduction

1.1 History of facial recognition technology

1.2 Customer classification

1.3 Problem description

2 Model

2.1 Data Collection and Data preprocessing

2.1.1 Data sources

2.1.2 Processing steps

2.2 Neural network structure

2.2.1 ResNet50 model

2.2.2 Residual Block

2.2.3 Optimizer and Loss function

3 Analysis

4 Conclusion

4.1 Evaluation

4.2 Experimental result

4.3 Future development


Facial recognition technology is a rapidly advancing field with a significant impact on many industries. It enables critical emerging applications such as security and surveillance, authentication, and human-computer interaction. As the technology continues to evolve, it is likely to keep transforming these fields.

In the “Introduction to Artificial Intelligence – IT3160E” course, we gained fundamental knowledge about facial recognition technology. In our project, our goal was to distinguish between Vietnamese and foreign faces using facial recognition algorithms.

This report explains the theoretical basis of the algorithms we used and how we implemented them. We then analyze how effectively the models classify Vietnamese and foreign faces and identify areas for improvement.

Our experiments involved machine learning algorithms such as convolutional neural networks (CNNs) and support vector machines (SVMs). CNNs can automatically learn relevant features from image data, while SVMs perform classification based on learned patterns.

We trained and tested the models on a dataset containing photos of Vietnamese and foreign faces, and evaluated them on their accuracy and their ability to generalize to new images. However, we faced some challenges, such as the need for a larger and more diverse dataset to improve performance.

Keywords: Facial Recognition – ResNet50


Chapter 1

Introduction

1.1 History of facial recognition technology

Facial recognition technology is a powerful tool that uses facial features to identify and verify individuals. It has numerous applications, including access control, surveillance, law enforcement, and targeted advertising. One of its primary functions is access control, where it is used to verify the identity of individuals. The technology has also become more accurate with the help of machine learning and AI.

However, there are concerns about potential biases, privacy, and security issues. Facial recognition algorithms locate facial landmarks such as the eyes, eyebrows, nose, and mouth; by measuring the distances and relative sizes of these features, they create a mathematical representation of the face that can be used for comparison and matching. These algorithms can also introduce biases and errors into the identification process. As a result, governments and organizations must establish guidelines and regulations to ensure the ethical and responsible development and use of the technology.

In conclusion, facial recognition technology has the potential to transform various fields, but it also raises important questions and concerns. While it has numerous applications in access control, surveillance, law enforcement, and targeted advertising, there are concerns about potential biases, privacy, and security issues. By establishing ethical guidelines and responsible development practices, we can help ensure that this technology is used in a way that benefits society as a whole.
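The landmark-based representation described above can be illustrated with a minimal plain-Python sketch. The landmark names and pixel coordinates are hypothetical, and the signature used here (pairwise landmark distances divided by the distance between the eyes) is one simple choice for illustration. Practical systems obtain landmarks from a detector and usually compare learned embeddings instead.

```python
import math
from itertools import combinations


def face_signature(landmarks):
    """Scale-invariant list of pairwise landmark distances.

    landmarks maps a landmark name to an (x, y) point. Every distance is divided
    by the distance between the eyes, so the same face photographed at a
    different size yields the same signature.
    """
    scale = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    names = sorted(landmarks)
    return [math.dist(landmarks[a], landmarks[b]) / scale
            for a, b in combinations(names, 2)]


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures; smaller means more similar faces."""
    return math.dist(sig_a, sig_b)


# Hypothetical landmark coordinates (pixels) for one face.
face = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60), "mouth": (50, 80)}
face_2x = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}  # same face, twice the size

print(signature_distance(face_signature(face), face_signature(face_2x)))  # close to 0
```

Normalizing by the inter-eye distance is what makes the comparison independent of image resolution; moving a single landmark (for example the mouth) changes the signature and increases the distance.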

1.2 Customer classification

Facial recognition could potentially allow businesses to automatically categorize customers as Vietnamese or foreign upon entry, in real time. While this raises privacy and bias concerns, if implemented responsibly it could supply data to optimize service for different nationalities.

To build an effective categorization system, companies will need a large and diverse dataset of Vietnamese and foreign customer facial images to train their systems. They must also implement policies, disclosures, and consent processes to gain customer trust and address ethical issues around the technology's use. This will be key to realizing the benefits of customer categorization while minimizing potential harm.

1.3 Problem description

The intended functionality of the final model is to provide key facial attributes for any given input image, including:

• Age

• Gender

• Ethnicity as either Vietnamese or foreigner


• Emotional state

Through extensive training and testing, the goal is to optimize the model's performance metrics, such as facial recognition accuracy and classification accuracy when distinguishing between Vietnamese and foreign faces. This functionality has promising applications in use cases such as targeted advertising, access control, and customer insights and segmentation.


Chapter 2

Model

Our project will use three sets of pre-trained model weights: age detection, gender detection, and emotion detection. We will train a separate model specifically to differentiate between Vietnamese and foreign individuals.

To train the model to distinguish between Vietnamese and foreign individuals, we need to gather and process the data, then construct and train the model:

1. Data collection: We will collect a diverse dataset of images or other relevant data from both Vietnamese and foreign individuals. This dataset should adequately represent the characteristics and variations present in both groups.

2. Data preprocessing: The collected data will undergo preprocessing steps such as image resizing, normalization, and data augmentation to improve the quality and variety of the dataset. This helps the model generalize well to unseen data.

3. Model construction: We will design and build a suitable neural network architecture for differentiating between Vietnamese and foreign individuals. This architecture may involve various layers, such as convolutional layers for image analysis followed by fully connected layers for classification.

4. Model training: The constructed model will be trained on the preprocessed dataset. Training involves feeding the model input data and adjusting its internal parameters through an optimization algorithm (e.g., stochastic gradient descent). The goal is to minimize the difference between the model's predictions and the ground-truth labels in the training dataset.
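The training step in item 4, adjusting parameters to shrink the gap between predictions and labels, can be sketched with NumPy on a toy problem. This is a stand-in under stated assumptions, not the project's ResNet: a two-feature logistic regression trained by full-batch gradient descent on binary cross-entropy, using synthetic data. Stochastic gradient descent applies the same update to mini-batches instead of the whole dataset.

```python
import numpy as np


def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Fit weights w and bias b by full-batch gradient descent on binary cross-entropy."""
    w = np.zeros(X.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities in (0, 1)
        eps = 1e-12                               # avoids log(0)
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(loss)
        grad_logit = (p - y) / len(y)             # d(loss)/d(logit) for sigmoid + BCE
        w -= lr * (X.T @ grad_logit)              # adjust the internal parameters
        b -= lr * grad_logit.sum()
    return w, b, losses


# Synthetic two-class data: two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b, losses = train_logistic_regression(X, y)
acc = np.mean(((X @ w + b) > 0) == y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy {acc:.2f}")
```

The loss starts at log 2 (the model predicts 0.5 for everything) and falls as the parameters move against the gradient; a deep network is trained the same way, only with many more parameters and gradients computed by backpropagation.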

During training, it is crucial to evaluate the model on a separate validation dataset to monitor its progress and prevent overfitting. Tuning the model's hyperparameters, such as the learning rate and the strength of regularization, may also be necessary to optimize its performance.
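One common way to use the validation set against overfitting is early stopping. The sketch below assumes a patience-based rule and uses made-up validation losses purely for illustration. In Keras, the same behaviour is available through `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) under patience-based early stopping.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the weights from best_epoch are the ones kept.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Illustrative validation losses: falling, then rising as the model overfits.
history = [0.69, 0.52, 0.41, 0.37, 0.38, 0.40, 0.43, 0.47]
print(early_stopping_epoch(history))  # (3, 6)
```

With this history the validation loss stops improving after epoch 3, so training halts at epoch 6 and the epoch-3 weights would be restored.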

2.1 Data Collection and Data preprocessing

2.1.1 Data sources

We collected images of Vietnamese and foreign people from the following sources:

• Vietnamese images: Image Vietnamese

• Foreign images: Image Foreigner

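import cv2
import face_recognition
from PIL import Image
import torchvision.transforms as transforms

img = cv2.imread(img_path)                         # img_path: path to the input image; OpenCV loads BGR
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)         # face_recognition expects RGB
boxes_face = face_recognition.face_locations(img)  # list of (top, right, bottom, left) boxes
face_image = None
if len(boxes_face) != 0:
    for box_face in boxes_face:
        # Crop the face region (assumed: these lines are missing from the original listing)
        top, right, bottom, left = box_face
        face_image = img[top:bottom, left:right]
if face_image is not None:
    # Transform the image to a tensor that can be fed into the model
    data_transform = transforms.Compose([
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    input_tensor = data_transform(Image.fromarray(face_image)).unsqueeze(0)  # add batch dimension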

Since the main framework we use to build our model is TensorFlow, we need to transform the NumPy array data into tensors so that it can be fed into the model. The torchvision data transforms are used to complete this task:

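import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
data_transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
train_data = ImageFolder("data/train", transform=data_transform)  # one sub-folder per class
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
test_data = ImageFolder("data/test", transform=data_transform)
test_loader = DataLoader(test_data, batch_size=32, shuffle=False)  # evaluate on the held-out test set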
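To make the `ToTensor()` and `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` steps concrete, the sketch below reproduces them with NumPy (the `Resize` step is left out). It assumes an H × W × C uint8 RGB array like the `face_image` crop. With mean and standard deviation both 0.5, pixel values end up in [-1, 1].

```python
import numpy as np


def to_normalized_chw(image_hwc_uint8, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """NumPy equivalent of ToTensor() followed by Normalize(mean, std)."""
    x = image_hwc_uint8.astype(np.float32) / 255.0   # ToTensor: scale to [0, 1]
    x = np.transpose(x, (2, 0, 1))                     # ToTensor: H x W x C -> C x H x W
    mean = np.asarray(mean, dtype=np.float32)[:, None, None]
    std = np.asarray(std, dtype=np.float32)[:, None, None]
    return (x - mean) / std                            # Normalize: (x - mean) / std per channel


pixel = np.array([[[0, 128, 255]]], dtype=np.uint8)   # one RGB pixel, shape 1 x 1 x 3
out = to_normalized_chw(pixel)
print(out.shape, out.ravel())
```

A value of 0 maps to -1, 255 maps to 1, and mid-grey lands near 0. The `unsqueeze(0)` call in the listing then adds the leading batch dimension, giving a 1 × 3 × 128 × 128 tensor.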

After completing all these steps, we can finally train our deep learning model on the processed data.
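`ImageFolder` reads one sub-folder per class under `data/train` and `data/test`. Below is a minimal sketch of how the images could be split into those folders. It assumes hypothetical class folder names `vietnamese` and `foreigner`, uses hypothetical file names, and sets a test fraction matching the report's split of 2,000 test images out of 13,000.

```python
import random


def split_per_class(files_by_class, test_fraction=2000 / 13000, seed=42):
    """Split each class's file names into train and test lists.

    Returns {"train": {cls: [...]}, "test": {cls: [...]}}, mirroring the
    data/train/<class>/ and data/test/<class>/ layout that ImageFolder reads.
    """
    rng = random.Random(seed)               # fixed seed: reproducible split
    splits = {"train": {}, "test": {}}
    for cls, files in files_by_class.items():
        files = sorted(files)               # copy in a deterministic order
        rng.shuffle(files)
        n_test = round(len(files) * test_fraction)
        splits["test"][cls] = files[:n_test]
        splits["train"][cls] = files[n_test:]
    return splits


# Hypothetical file names for the two class folders.
files = {"vietnamese": [f"vn_{i:04d}.jpg" for i in range(130)],
         "foreigner": [f"fg_{i:04d}.jpg" for i in range(260)]}
splits = split_per_class(files)
print({split: {cls: len(names) for cls, names in d.items()} for split, d in splits.items()})
```

Splitting each class separately keeps the class proportions the same in both sets. The files could then be copied into `data/<split>/<class>/` with `shutil.copy`. Note that `ImageFolder` assigns class indices in sorted folder-name order, so the folder names decide which output index corresponds to which class.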

2.2 Neural network structure

2.2.1 ResNet50 model

ResNet (short for Residual Network) is a popular deep learning architecture known for its ability to train very deep neural networks effectively. It introduces skip connections that allow the network to learn residual mappings, which makes it possible to train deeper models while mitigating the vanishing gradient problem.
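The residual idea can be written compactly. In the standard ResNet formulation (the notation here is assumed, not taken from this report), a block adds its input x back to the output of a learned residual function F before the final ReLU. Applying the chain rule through that sum shows why the skip connection counters vanishing gradients:

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\nabla_{\mathbf{x}} \mathcal{L}
  = \left( J_{\mathcal{F}}(\mathbf{x}) + I \right)^{\top} \nabla_{\mathbf{y}} \mathcal{L}
```

Here J_F is the Jacobian of F and I is the identity matrix. Even when J_F is small, the identity term passes the upstream gradient through unchanged. When F(x) and x have different shapes, x is replaced by a projection W_s x, which corresponds to the 1 × 1 convolution on the shortcut described in Section 2.2.2.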

In summary, the project code defines a ResNet model architecture with multiple residual blocks for image classification.

Initial convolutional layer

This initial block takes the input tensor and prepares it before it is fed into the residual blocks:

• Convolution: maps the input tensor from 3 channels to 32 channels

• Batch Normalization: normalizes the distribution of each channel across the mini-batch

• Activation: ReLU

• Max Pooling: takes the maximum value in each (2, 2) window of every channel, downsampling the feature map
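The normalization, activation, and pooling steps of the initial block can be sketched with NumPy. This illustration rests on assumptions: random feature maps stand in for the output of the 3 → 32 convolution, batch normalization omits its learnable scale and shift, and max pooling uses a 2 × 2 window with stride 2 (the Keras `MaxPooling2D` default when `strides` is not set).

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize each channel over the batch and spatial axes (x: N x H x W x C).

    The learnable scale (gamma) and shift (beta) are omitted for brevity.
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x):
    return np.maximum(x, 0.0)


def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 (x: N x H x W x C with even H and W)."""
    n, h, w, c = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))


# Random stand-in for the 32-channel output of the 3 -> 32 convolution.
features = np.random.default_rng(0).normal(size=(2, 128, 128, 32))
out = max_pool_2x2(relu(batch_norm(features)))
print(out.shape)  # (2, 64, 64, 32)
```

Under these assumptions, a 128 × 128 input leaves the initial block as a 64 × 64 map with 32 channels.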

Residual block

A residual block is a fundamental building block of residual neural networks (ResNet).

It is designed to address the vanishing gradient problem and allow very deep neural networks to be trained effectively. The key idea behind a residual block is to introduce skip connections that let the network learn residual mappings, that is, the difference between the block's desired output and its input (this construction is explained below).

This network uses 8 residual blocks, 4 of which downsample the feature map to half its size.


Global Average Pooling

Takes the mean value of each channel of the tensor:

• This layer converts the channels of the last residual block's output into the nodes of the fully connected layer (512 channels = 512 nodes).

• Like the Flatten function, Global Average Pooling turns the feature map into a vector for the fully connected layer. Unlike Flatten, which reshapes the whole input into a one-dimensional vector, Global Average Pooling reduces the spatial dimensions of the input by taking the average value over each channel.
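The difference between Flatten and Global Average Pooling shows up directly in the output shapes. The sketch assumes the last residual block outputs a 4 × 4 × 512 feature map per image (the 4 × 4 size is hypothetical; only the 512 channels come from the text above).

```python
import numpy as np

# Hypothetical last-block output for a batch of 2 images: 4 x 4 positions, 512 channels.
features = np.random.default_rng(0).normal(size=(2, 4, 4, 512))

flattened = features.reshape(features.shape[0], -1)  # Flatten: keeps every value
pooled = features.mean(axis=(1, 2))                   # Global Average Pooling: one mean per channel

print(flattened.shape)  # (2, 8192)
print(pooled.shape)     # (2, 512)
```

Global Average Pooling gives one value per channel regardless of the spatial size, so the fully connected layer after it has far fewer input weights than it would after Flatten.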

Fully Connected Network

The prediction is a vector of per-class scores, with each element corresponding to one class (the ground-truth labels, by contrast, are one-hot vectors). The number of elements in the vector equals the number of classes in the problem. Each predicted probability must fall within the range (0, 1), so the output layer uses the sigmoid activation function, which maps each output independently into that range.


Additionally, the final layer of the network, which produces the predictions, has the same number of nodes as there are classes. Each node in the final layer corresponds to a specific class, and the network's output represents the probability of each class given the input data. The class with the higher score is the prediction, for example:

• pred1 = [[0.873, 0.233]] → pred1[0][0] > pred1[0][1] → Vietnamese

• pred2 = [[0.551, 0.773]] → pred2[0][0] < pred2[0][1] → Foreigner
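The decision rule in the two examples (pick the class whose output node scores higher) can be sketched in plain Python. The class order (index 0 = Vietnamese, index 1 = Foreigner) is inferred from the examples, and the logits are hypothetical. Because sigmoid squashes each output independently, the two scores need not sum to 1, which is consistent with pred1 above (0.873 + 0.233 > 1).

```python
import math

CLASS_NAMES = ["Vietnamese", "Foreigner"]  # index order inferred from the examples


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(scores):
    """Return the name of the class whose output node has the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASS_NAMES[best]


logits = [1.93, -1.19]                # hypothetical raw outputs of the final layer
probs = [sigmoid(z) for z in logits]  # each squashed independently into (0, 1)
print(probs, predict_label(probs))

print(predict_label([0.873, 0.233]))  # pred1 -> Vietnamese
print(predict_label([0.551, 0.773]))  # pred2 -> Foreigner
```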

2.2.2 Residual Block


• 1 × 1 convolution to match the dimensions of the shortcut with the number of filters

• Batch normalization is applied to the shortcut


Addition and activation

The output of the identity block is added to the shortcut, and ReLU activation is applied to the sum. Dropout regularization is then applied.
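Putting the pieces of the residual block together, a NumPy sketch of one forward pass is shown below. It is a simplified illustration under stated assumptions: 1 × 1 convolutions (matrix products over the channel axis) stand in for the block's convolutions, no stride-2 downsampling is performed, batch normalization has no learnable parameters, and the dropout rate of 0.2 is hypothetical. The order of operations (main path plus shortcut, then ReLU, then dropout) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, w):
    """1 x 1 convolution: a matrix product over the channel axis (x: N x H x W x C_in)."""
    return x @ w  # w has shape C_in x C_out


def batch_norm(x, eps=1e-5):
    """Per-channel normalization over batch and spatial axes (no learnable scale/shift)."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x):
    return np.maximum(x, 0.0)


def dropout(x, rate, training):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def residual_block(x, w1, w2, w_shortcut=None, rate=0.2, training=True):
    # Main path: conv -> BN -> ReLU -> conv -> BN (1 x 1 convs as a simplification).
    h = relu(batch_norm(conv1x1(x, w1)))
    h = batch_norm(conv1x1(h, w2))
    # Shortcut: identity, or 1 x 1 conv + BN when the channel counts differ.
    shortcut = x if w_shortcut is None else batch_norm(conv1x1(x, w_shortcut))
    # Addition and activation, then dropout.
    return dropout(relu(h + shortcut), rate, training)


x = rng.normal(size=(2, 8, 8, 32))
w1, w2 = rng.normal(size=(32, 64)), rng.normal(size=(64, 64))
w_sc = rng.normal(size=(32, 64))  # projection shortcut: 32 -> 64 channels
y = residual_block(x, w1, w2, w_shortcut=w_sc)
print(y.shape)  # (2, 8, 8, 64)
```

When input and output channel counts match, `w_shortcut` can be left as `None` and the input is added back unchanged; the projection is only needed when the main path changes the number of channels.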

2.2.3 Optimizer and Loss function

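from tensorflow.keras.optimizers import Adam
input_shape = (128, 128, 3)
num_classes = 2
optimizer = Adam(learning_rate=0.0003)
model = resnet_model(input_shape, num_classes)  # project-defined ResNet builder (not shown)
model.compile(optimizer=optimizer,
              # remaining arguments (loss, metrics) are cut off in the original listing
              )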

Learning rate (custom): 0.0003, chosen because this value gave the best fit to our data in our experiments.
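To show what the learning rate controls inside Adam, the sketch below implements the Adam update rule from the original Adam paper for a single scalar parameter. The toy objective (x − 3)^2 is hypothetical; beta1, beta2, and epsilon are set to Keras' default values, and the learning rate is the 0.0003 used above.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.0003, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Run Adam on one scalar parameter and return its final value."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3), starting from x = 0.
x_final = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_final)
```

Because Adam divides by the running gradient magnitude, each step is roughly the learning rate in size when gradients keep the same sign. After 1,000 steps at 0.0003, x has moved only about 0.3 of the way from 0 toward 3: a small learning rate trades speed for stable, gradual updates.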


Chapter 3

Analysis

In this section, we discuss the execution and flow of the project, focusing on the face recognition application. The application offers two modes: image mode and real-time camera mode. In image mode, when the “input image” parameter is received, the application retrieves the image file from the provided file path. In real-time camera mode, the application activates the device's camera to capture live video frames for processing.

The image or video frames are then passed into the f_face_info class, which contains several sub-classes responsible for different tasks. These sub-classes include Vietnamese detection, emotion detection, gender detection, and age prediction. Each sub-class loads its corresponding model and provides predictions based on the input data.

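import f_emotion_detection
import f_my_age
import f_my_gender
import f_my_race
# Instantiate the detectors (project modules; each one loads its own model)
age_detector = f_my_age.Age_Model()
gender_detector = f_my_gender.Gender_Model()
race_detector = f_my_race.Race_Model()
emotion_detector = f_emotion_detection.predict_emotions()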
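How the face-info class could tie these detectors together is sketched below as a small class. Everything here is hypothetical: the class name `FaceInfo`, the face locator, and the stub predictors. It assumes boxes in `face_recognition`'s (top, right, bottom, left) order and stores each box under the `bbx_frontal_face` key that appears in the project's listing, whose real format is not shown.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (top, right, bottom, left), face_recognition order


class FaceInfo:
    """Hypothetical stand-in for the face-info pipeline: locate faces, crop, run every detector."""

    def __init__(self, locate_faces: Callable[[np.ndarray], List[Box]],
                 detectors: Dict[str, Callable[[np.ndarray], object]]):
        self.locate_faces = locate_faces
        self.detectors = detectors

    def analyze(self, image: np.ndarray) -> List[dict]:
        results = []
        for top, right, bottom, left in self.locate_faces(image):
            face = image[top:bottom, left:right]  # face cropping step
            info = {name: predict(face) for name, predict in self.detectors.items()}
            info["bbx_frontal_face"] = (top, right, bottom, left)
            results.append(info)
        return results


# Demo with a blank image, one fake face box and stub predictors.
image = np.zeros((100, 100, 3), dtype=np.uint8)
pipeline = FaceInfo(
    locate_faces=lambda img: [(10, 60, 70, 20)],
    detectors={
        "race": lambda face: "Vietnamese",
        "gender": lambda face: "female",
        "age": lambda face: "20-30",
        "emotion": lambda face: "happy",
    },
)
out = pipeline.analyze(image)
print(out[0]["race"], out[0]["bbx_frontal_face"])
```

In the real application, `locate_faces` would be a face detector such as `face_recognition.face_locations`, and each detector entry would wrap one of the loaded models.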


Within the f_face_info class, a face cropping function extracts the facial region from the input image. This step ensures that the processed image matches the face images used during model training, which improves the accuracy of the predictions.

Once the image is processed and the face is cropped, the sub-classes, including the pre-trained models, perform their respective tasks. For example, the Vietnamese detection sub-class predicts the ethnicity of the detected face, the emotion detection sub-class predicts the emotional state, and the gender detection sub-class predicts the gender. The age prediction sub-class estimates the age range of the individual.

However, it is important to note that the age, emotion, and gender models used in our application rely on pre-trained weights. Therefore, the input to these models must conform to their specific requirements to generate accurate predictions.

Additionally, the application includes a box bound function, which creates a bounding box around the detected face. This bounding box serves as a visual indicator on the image or real-time camera feed, highlighting the recognized face.

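for data_face in out:  # out: per-face results returned by the f_face_info pipeline
    box = data_face["bbx_frontal_face"]
    # (the rest of this loop is not shown in the original listing)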
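Drawing the bounding box itself takes only a few array writes. Below is a NumPy sketch, assuming the box is stored as (top, right, bottom, left). In an OpenCV application, the roughly equivalent call is `cv2.rectangle(image, (left, top), (right, bottom), color, thickness)`.

```python
import numpy as np


def draw_box(image, box, color=(0, 255, 0), thickness=2):
    """Draw a rectangle outline on image in place; box is (top, right, bottom, left)."""
    top, right, bottom, left = box
    h, w = image.shape[:2]
    top, bottom = max(top, 0), min(bottom, h)      # clamp the box to the image
    left, right = max(left, 0), min(right, w)
    image[top:top + thickness, left:right] = color          # top edge
    image[bottom - thickness:bottom, left:right] = color    # bottom edge
    image[top:bottom, left:left + thickness] = color        # left edge
    image[top:bottom, right - thickness:right] = color      # right edge
    return image


frame = np.zeros((100, 100, 3), dtype=np.uint8)
draw_box(frame, (10, 60, 70, 20))
```

Only the border pixels change, so the face inside the box stays visible in the image or camera frame.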

Chapter 4

Conclusion

4.1 Evaluation

• The main goal is to achieve high accuracy in correctly classifying Vietnamese and foreign individuals.

2. Dataset:

• The dataset consists of facial images of both Vietnamese and foreign individuals

• It comprises 13,000 images, with 5,000 images belonging to Vietnamese individuals and 6,500 images belonging to foreign individuals

• The dataset is divided into a training set (11,000 images) and a test set (2,000 images)

3. Evaluation Metrics:

• Accuracy is used as the primary metric to evaluate the performance of the model


• Accuracy is calculated as the ratio of correctly classified instances to the total number of instances in the test set

4. Evaluation Results:

• The model achieved an accuracy of approximately 90% on the test set

• Out of 2,000 test images, the model correctly classified 1,780 images and incorrectly classified 220 images

• These results demonstrate the model's ability to distinguish between Vietnamese and foreign individuals with high accuracy
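Accuracy as defined above can be checked against the reported counts. The labels below are synthetic and only reproduce the totals reported above (1,780 correct and 220 incorrect out of 2,000).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the ground-truth labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Synthetic labels that only reproduce the reported totals: 1,780 right, 220 wrong.
y_true = ["Vietnamese"] * 2000
y_pred = ["Vietnamese"] * 1780 + ["Foreigner"] * 220
print(accuracy(y_true, y_pred))  # 0.89
```

The reported counts give 1,780 / 2,000 = 0.89, i.e. 89%.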

5. Comparison with Other Models:

• To assess the performance of the developed model, we compared two well-established architectures on this task: ResNet50 (the model used in our project) and VGG16

• Both ResNet and VGG16 demonstrated good performance in distinguishing between Vietnamese and foreign individuals

• ResNet outperformed VGG16 with a slightly higher accuracy of 90% compared to 89%

• This suggests that ResNet is slightly better suited to the specific task of differentiating between Vietnamese and foreign individuals based on facial images in this dataset

7. Evaluation against Objectives:

• The initial objective was to achieve high accuracy in classifying Vietnamese and foreign individuals


Title	Introduction to Artificial Intelligent Facial Recognition and Classification Between Vietnamese and Foreigner
Authors	Dang Phuc Khoa, Nguyen Trong Duy, Nguyen Dinh Dung, Le Phu Tai, Nguyen Tien Thanh
Supervisor	Prof. Pham Van Hai, Ph.D.
Institution	Hanoi University of Science and Technology, School of Information and Communication Technology
Major	Information and Communication Technology
Type	Project Report
Year	2023
City	Hanoi
