SCHOOL OF ELECTRICAL AND ELECTRONIC ENGINEERING
DIGITAL IMAGE PROCESSING
PROJECT REPORT Topic: ANIME FACE RECOGNITION
Instructor: Prof Trn Th Thanh Hi
Japanese anime plays a major role in entertainment and is one of the three indispensable components of ACG (Anime, Comic, and Game). Despite the widespread consumption of anime, viewers often have difficulty recognizing characters from their faces alone. This challenge forms the basis of our project.
In this project, the input is an image containing anime characters, and a trained machine learning model determines the full names of the characters in the image. To address this problem, we have divided the project into three main tasks: data collection, face detection, and character recognition. Once these tasks are complete, we expect the model to locate the faces of the characters and display their full names.
The three tasks (data collection, anime face detection, and character identification) are described in turn below.
The initial stage involves gathering a diverse dataset containing various images of different anime characters, along with a dataset containing the facial features of these characters. The first dataset is used to train the model to recognize the complex and distinctive features of each character. The second dataset supports the second task, face detection.
Using Support Vector Machines (SVM) and Histogram of Oriented Gradients (HOG), we trained a model to detect anime faces in images. HOG extracts the facial features, while the SVM separates positive examples (containing anime faces) from negative examples (without anime faces). This combination is intended to detect anime faces reliably despite their distinctive and often intricate features.
The third task focuses on character identification using the VGG16 model. VGG16, a well-known deep neural network for image recognition, has been fine-tuned for anime character identification. The model was trained on our dataset to capture the distinguishing details of each character.
The integration of SVM/HOG for face detection and VGG16 for character identification forms a complete pipeline. The expected result is a model that locates anime faces in images and frames each one within a window. This window acts like an image crop, displaying only the face of the character instead of the entire image, which makes character identification easier and more precise.
Below are images for a clearer visualization of the results:
RELATED STUDIES
Below are some works related to our project:
1 The HOG (Histogram of Oriented Gradients) characterization method:
HOG stands for Histogram of Oriented Gradients, a type of feature descriptor. The purpose of a feature descriptor is to abstract an object by extracting its characteristics and discarding irrelevant information. HOG is primarily used to describe the shape and appearance of an object in an image.
The essence of the HOG method lies in using the distribution of intensity gradients, or edge directions, to describe local objects in an image. HOG is computed by dividing an image into sub-regions called "cells"; for each cell, a histogram of gradient directions is computed over the pixels within that cell. Concatenating these histograms gives a representation of the original image. To improve recognition performance, the local histograms can be contrast-normalized: a measure of intensity is computed over a region larger than the cell, called a "block", and that value is used to normalize all cells within the block. The normalized feature vector is more invariant to changes in lighting conditions.
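As a sketch of the quantities involved, the per-pixel gradient and the block normalization can be written as follows. This is the commonly used HOG formulation, not something taken from the project code: the symbols I (grey-level image), G_x, G_y, v (concatenated histograms of one block) and the small constant ε are notation introduced here, and centred differences with L2 normalization are assumed.

```latex
\begin{align*}
G_x(x,y) &= I(x+1,y) - I(x-1,y), &
G_y(x,y) &= I(x,y+1) - I(x,y-1) \\
\lvert G(x,y) \rvert &= \sqrt{G_x(x,y)^2 + G_y(x,y)^2}, &
\theta(x,y) &= \operatorname{atan2}\bigl(G_y(x,y),\, G_x(x,y)\bigr) \bmod 180^\circ \\
v_{\text{norm}} &= \frac{v}{\sqrt{\lVert v \rVert_2^2 + \varepsilon^2}}
\end{align*}
```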
There are five basic steps to construct a HOG vector for an image, including:
a Preprocessing
b Computing gradients
c Calculating feature vectors for each cell
d Normalizing blocks
e Computing the HOG vector
We do not go into the full details of the standard HOG computation here. The HOG computation in this project follows the standard process, and the methodology section describes how it is calculated.
2 Support Vector Machine (SVM):
Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or nonlinear classification, regression, and outlier detection. SVMs can be used for a variety of tasks, such as text classification, image classification, spam detection, handwriting identification, gene expression analysis, face detection, and anomaly detection. SVMs are adaptable and efficient in many applications because they can handle high-dimensional data and nonlinear relationships.
The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional feature space that separates the data points of different classes. The hyperplane is chosen so that the margin between it and the closest points of each class is as large as possible. The dimension of the hyperplane depends on the number of features. If there are two input features, the hyperplane is a line. If there are three, the hyperplane is a 2-D plane. With more than three features it becomes difficult to visualize.
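To make the two-feature case concrete, the sketch below evaluates a hand-picked line as the hyperplane and measures how far two points lie from it. The weights, bias, and points are hypothetical values chosen for illustration; they are not learned from the project data.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical 2-D example: the hyperplane is the line x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
positive_point = [3.0, 2.0]  # lies on the positive side
negative_point = [1.0, 0.0]  # lies on the negative side

scores = [decision_value(w, b, p) for p in (positive_point, negative_point)]
distances = [distance_to_hyperplane(w, b, p) for p in (positive_point, negative_point)]
```

If these two points were the closest ones of each class, the margin the SVM tries to maximize would be the sum of the two distances.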
In this project, an SVM is used to classify positive and negative examples when detecting anime faces in images. Specifically, the SVM classifies image windows that contain the facial features of anime characters (positive examples) and windows that do not contain those features (negative examples).
VGG16 is a convolutional neural network (CNN) model proposed by the Visual Geometry Group (VGG) at the University of Oxford. The name "VGG16" combines "VGG" with the number "16," which refers to the number of weight layers in the model.
The model is known for its deep architecture of 16 weight layers, made up of convolutional layers and fully connected layers. The architecture is designed to learn high-level features from images, and it has performed well in a range of image recognition and classification tasks. VGG16 has become one of the key computer vision models and is widely used in both research and practical applications.
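The layer count behind the name can be checked with a few lines. The list below is the standard published VGG16 layout (3x3 convolutions with the given output channels, "M" for max pooling); the variable names are introduced here for illustration, and the project's fine-tuned classification head may differ from the original three fully connected layers.

```python
# Standard VGG16 layout: numbers are output channels of 3x3 convolutional layers,
# "M" marks a 2x2 max-pooling layer (pooling has no weights).
VGG16_CONV_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                     512, 512, 512, "M", 512, 512, 512, "M"]
NUM_FC_LAYERS = 3  # fully connected layers in the original network

num_conv_layers = sum(1 for item in VGG16_CONV_CONFIG if item != "M")
num_pool_layers = VGG16_CONV_CONFIG.count("M")
num_weight_layers = num_conv_layers + NUM_FC_LAYERS  # the "16" in VGG16
```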
For anime face recognition, my partner uses the VGG16 neural network for multi-class classification to identify the character.
The Sliding Window Algorithm and Bounding Boxes are used to determine the precise location of anime faces in the image. Below is a description of how they are integrated:
Sliding Window Algorithm:
Purpose: Slide a fixed-size window over the entire image to check which positions contain an anime face.
Execution: Iterate through all possible window positions in the image and apply the detection model to decide whether each region contains an anime face.
Benefits: Examines each small part of the image, which helps locate the face accurately.
Bounding Boxes:
Benefits: The bounding box helps visualize the position and shape of the face and provides information for the subsequent detection and display steps.
In summary, combining the Sliding Window Algorithm and Bounding Boxes lets the model locate anime faces in the image and present visual results to the user.
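A minimal sketch of the fixed-size sliding window is shown below. The function name, the square window, and the tiny image size are assumptions made for illustration; in the project each generated window would be passed to the detection model.

```python
def sliding_windows(image_width, image_height, window_size, stride):
    """Yield (x, y, w, h) for every square window that fits inside the image."""
    for y in range(0, image_height - window_size + 1, stride):
        for x in range(0, image_width - window_size + 1, stride):
            yield (x, y, window_size, window_size)

# Toy example: an 8x6 image scanned with a 4x4 window and a stride of 2 pixels.
windows = list(sliding_windows(image_width=8, image_height=6, window_size=4, stride=2))
```

Each tuple is already a bounding box: (x, y) is the top-left corner and (w, h) the size, so a positive window can be drawn directly on the image.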
All data used to construct the dataset are images. The dataset has been carefully curated and preprocessed; on investigation, we found that it was processed manually. The author used images from various sources such as:
Detection:
The dataset includes two classes: positive and negative. The positive class has 6,098 files for training and 146 files for testing. Similarly, the negative class has 19,670 files for training and 218 files for testing.
The positive files are images of a character's face, while the negative files are images that do not contain a face.
Ex: positive - negative
Recognition:
The dataset for the recognition problem is a collection of images of 10 different anime characters, with a total of 789 files for training and 185 files for testing.
1 This step reveals the intensity changes and edges in the image.
Cell Division: Divide the image into cells, 8x8 pixels by default.
Gradient Information: Calculate the gradient for each pixel in the cell, giving a magnitude and a direction per pixel. With the default settings, each cell yields 128 numbers (8x8x2).
Histogram Creation: Form a histogram from the gradient information of each cell. With 9 bins per cell, each histogram is an array whose size equals the number of bins.
Block Normalization: Normalize the histogram vectors by blocks (default size 16x16 pixels) to reduce the effect of lighting and contrast variations. This step makes the feature vector more robust.
Final Vector: Concatenate all the normalized histogram vectors to obtain the final feature vector.
In summary, the HOG feature extraction process involves capturing gradient information, forming histograms, normalizing by blocks, and concatenating the results to create a comprehensive feature vector for the given image.
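The histogram step for a single cell can be sketched in plain Python as below. This is a simplified illustration rather than the project's implementation: border pixels are skipped, unsigned orientations (0 to 180 degrees) are assumed, and each pixel votes for one bin only, whereas library implementations usually split the vote between neighbouring bins. The synthetic cell is made up for the example.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted orientation histogram for one cell (2-D list of grey levels)."""
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal centred difference
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical centred difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# Synthetic 8x8 cell with a vertical edge: dark left half, bright right half.
cell = [[0] * 4 + [255] * 4 for _ in range(8)]
hist = cell_histogram(cell)
```

Because the only intensity change is horizontal, all of the gradient magnitude falls into the first orientation bin and the other eight bins stay empty.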
After obtaining the Histogram of Oriented Gradients (HOG) feature vectors from the images, the next step uses a Support Vector Machine (SVM) to classify them. SVM is a robust machine learning algorithm commonly used for binary classification problems. Here, an SVC (Support Vector Classification) is chosen.
The process includes the following steps:
Selection of SVM Algorithm: Choose an SVC, the SVM variant for classification tasks. It is well suited to separating data into two classes.
Training Data Preparation: Select a number of training examples with their labels. Each label states whether the example belongs to the positive or the negative class.
Training the SVM: Fit the SVM with the HOG feature vectors of the training examples. The SVM algorithm generates a hyperplane that separates the two classes. The objective is to maximize the margin, i.e., the distance between the hyperplane and the closest points of each class.
Hyperplane Generation: The SVM produces a hyperplane in the feature space that best discriminates between the positive and negative classes. This hyperplane serves as the decision boundary for classifying new, unseen data.
In summary, the combination of HOG feature vectors and SVM classification, using the SVC variant, forms an effective approach for separating anime faces from non-face regions in the given images.
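To illustrate what fitting a linear SVM involves, the sketch below trains one by sub-gradient descent on the regularised hinge loss. This is a swapped-in teaching technique, not the SVC solver used in the project: the function names, the hyperparameters (lam, lr, epochs), and the 2-D toy points standing in for HOG vectors are all assumptions. Labels are -1/+1 here, whereas the project uses 0/1.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the L2-regularised hinge loss; labels must be -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Sample is inside the margin: move the hyperplane towards classifying it.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Sample is safely classified: only apply the regularisation shrinkage.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data standing in for HOG feature vectors.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

The learned (w, b) pair defines the decision boundary described above; new samples are classified by the sign of w.x + b.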
3 Sliding Window Algorithm & Bounding Boxes:
In this project, the Sliding Window Algorithm and Bounding Boxes are central to detecting anime faces with the trained SVM classifier. The process is as follows:
Setting Thresholds: After training the SVM classifier, the next step is to apply it to an input image to locate anime faces. First, thresholds are set for the minimum and maximum width (w), the minimum and maximum height (h), the size stride, and the sliding stride.
Generating Bounding Boxes: Using these thresholds, a list of bounding boxes is generated from the sizes (width and height) of the sliding window and their locations (x and y coordinates). Each tuple (x, y, w, h) represents a bounding box, where (x, y) denotes the top-left corner of the window.
Feature Extraction with HOG: For each window, the algorithm extracts features using the Histogram of Oriented Gradients (HOG) technique. These features are then fed into the trained linear classifier to predict whether the window contains an anime face.
Filtering Based on IOU: Once all potentially positive windows are obtained, windows with a high Intersection over Union (IOU) with at least one other bounding box are removed. This eliminates redundant, overlapping detections.
In summary, the Sliding Window Algorithm scans the image at different positions and scales, and for each window the HOG features are used to make a prediction with the SVM classifier. The final step filters out redundant detections based on the IOU metric to refine the results.
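The IOU metric used in the filtering step can be computed as in the sketch below, assuming boxes in the (x, y, w, h) form described above with (x, y) as the top-left corner. The function name and the example boxes are illustrative only.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes with a top-left origin."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0

overlapping = iou((0, 0, 4, 4), (2, 2, 4, 4))  # 2x2 overlap: 4 / (16 + 16 - 4)
disjoint = iou((0, 0, 4, 4), (10, 10, 4, 4))   # no overlap
identical = iou((0, 0, 4, 4), (0, 0, 4, 4))    # same box
```

An IOU of 1 means the boxes coincide and 0 means they do not overlap, so a threshold between the two controls how much overlap counts as a duplicate detection.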
For anime face recognition, my partner uses the VGG16 neural network for multi-class classification to identify the character.
This is the diagram of the workflow that follows:
The methods have been validated and widely applied in recognition and identification tasks. In particular, the VGG16 model has been validated by various related recognition projects. For detection, however, we encountered several challenges, which are discussed later.
Procedures
1 Read and transform the datasets
1 Use cv2 to read the images from both the positive training data (anime faces) and the negative training data (not anime faces), and limit the number of positive and negative images if necessary
2 For each image, do the following in order:
(a) Resize the image to a suitable size (64x128 by default)
(b) Convert to a HOG feature descriptor with block size 16x16, cell size 8x8, and 9 bins (default setting)
(c) Add the HOG feature to the dataset
(d) If the image is from the positive training set, set "1" as the label; if it is from the negative set, set "0" as the label
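The length of each HOG feature vector follows from the settings in step (b). The sketch below works it out for the default 64x128 window; the block stride of 8 pixels is an assumption (a common default that the report does not state), and the function name is introduced here for illustration.

```python
def hog_descriptor_length(win_w=64, win_h=128, block=16, block_stride=8, cell=8, n_bins=9):
    """Number of values in the HOG vector of one window."""
    blocks_x = (win_w - block) // block_stride + 1   # block positions per row
    blocks_y = (win_h - block) // block_stride + 1   # block positions per column
    cells_per_block = (block // cell) ** 2           # 2x2 cells in a 16x16 block
    return blocks_x * blocks_y * cells_per_block * n_bins

length = hog_descriptor_length()  # 7 * 15 * 4 * 9
```

Under these assumptions every image, positive or negative, contributes one row of this length to X and one 0/1 entry to Y.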
2 Train and Test the classifier
1 Pick an SVC (Support Vector Classification) for the SVM model [6] (i.e., choose hyperparameters), and set the limit on the number of positive and negative training data if needed
2 Put X (HOG features) and Y (labels) into the SVM model
3 Fit the model
4 Test the model on the given test images (both negative and positive)
5 Test separately on the positive and negative images from the test data, giving two accuracies: one for identifying positive images and one for identifying negative images
6 Save the support vector classifier coefficients to a pkl file
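Step 5 reports one accuracy per class. A minimal sketch of that computation is given below; the function name is hypothetical and the label lists are made-up values for illustration, not results from the project.

```python
def per_class_accuracy(true_labels, predicted_labels):
    """Return (accuracy on positive images, accuracy on negative images) for 0/1 labels."""
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    pos_acc = correct[1] / total[1] if total[1] else float("nan")
    neg_acc = correct[0] / total[0] if total[0] else float("nan")
    return pos_acc, neg_acc

# Made-up labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
pos_acc, neg_acc = per_class_accuracy(y_true, y_pred)
```

Reporting the two figures separately is useful here because the negative set is much larger than the positive set, so a single overall accuracy could hide poor performance on faces.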
1 Load the trained support vector classifier coefficients from the pkl file
2 Set the minimum, maximum, and stride for the window side (usually depending on the size of the original image), and the stride for sliding the window over the image
3 Gather the top-left corner location, width, and height (square by default) of all windows into a list ordered by width, height, x location, and y location
4 For each window, do the following in order:
(a) Extract the patch from the image using the window's information
(b) Resize the patch to a suitable size (64x128 by default)
(c) Convert to a HOG feature descriptor with block size 16x16, cell size 8x8, and 9 bins (default setting)
(d) Input the HOG descriptor and predict the label with the trained classifier
(e) Save the label to a label list
5 For each entry in the label list that is predicted as an anime face, save its index to a positive index list, which collects the windows classified as anime faces
6 For each window predicted as an anime face, do the following in order:
(a) Calculate its intersection over union (IOU), as rectangles, with the other windows predicted as anime faces
(b) If the window has a high IOU (greater than a threshold, 0.3 by default) with other rectangles, set its label to "0" in the label list
(c) Update the positive index list from the current label list
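Steps 3, 5 and 6 can be sketched as below. This is one possible reading of the procedure, not the project's code: the function names are hypothetical, the windows are square, and the labels in the example are supplied by hand where the trained classifier from step 4 would normally provide them.

```python
def iou(a, b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    union = a[2] * a[3] + b[2] * b[3] - iw * ih
    return iw * ih / union if union > 0 else 0.0

def generate_windows(img_w, img_h, min_size, max_size, size_stride, slide_stride):
    """Step 3: square windows (x, y, w, h), ordered by width, height, x, then y."""
    windows = []
    for size in range(min_size, max_size + 1, size_stride):
        for x in range(0, img_w - size + 1, slide_stride):
            for y in range(0, img_h - size + 1, slide_stride):
                windows.append((x, y, size, size))
    return windows

def filter_by_iou(windows, labels, threshold=0.3):
    """Steps 5-6: clear the label of a positive window that overlaps another positive one."""
    labels = list(labels)
    positive_idx = [i for i, lab in enumerate(labels) if lab == 1]
    for i in list(positive_idx):
        others = [j for j in positive_idx if j != i]
        if any(iou(windows[i], windows[j]) > threshold for j in others):
            labels[i] = 0
            # Step 6(c): rebuild the positive index list from the current labels.
            positive_idx = [k for k, lab in enumerate(labels) if lab == 1]
    return [windows[k] for k in positive_idx]

all_windows = generate_windows(8, 8, min_size=4, max_size=8, size_stride=4, slide_stride=4)

# Hand-written detections: the first two overlap heavily, the third is far away.
detections = [(0, 0, 4, 4), (1, 1, 4, 4), (20, 20, 4, 4)]
kept = filter_by_iou(detections, labels=[1, 1, 1])
```

Because the positive index list is refreshed after each suppression, the last window of an overlapping group survives; with windows ordered by width, that tends to be the largest one.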