digital image processing project report topic anime face recognition

SCHOOL OF ELECTRICAL AND ELECTRONIC ENGINEERING

DIGITAL IMAGE PROCESSING

PROJECT REPORT

Topic: ANIME FACE RECOGNITION

Instructor: Prof Trn Th Thanh Hi

Japanese anime plays a major role in entertainment and is one of the three indispensable components of ACG (Anime, Comic, and Game). Despite the widespread consumption of anime, viewers often find it difficult to recognize characters from their faces alone. This challenge forms the basis of our project.

In the overall context of the project, the input is an image containing anime characters. Using machine learning models trained for this purpose, our system should determine the full names of the characters in the image. To address this problem, we divided the project into three main tasks: data collection, face detection, and character recognition. By completing these tasks, we expect the model to locate the characters' faces accurately and display their full names.

GENERAL INTRODUCTION

As mentioned, our project undertakes the challenging task of developing a robust system for anime face recognition. After brainstorming various approaches together, we concluded that extracting histogram of oriented gradients (HOG) feature descriptors and training a classifier on them is a competitive strategy for detecting anime faces compared to other methods. Anime faces, being 2D art rather than projections of 3D scenes, exhibit clearer shapes and patterns within a given region. Descriptors such as SIFT, which describe individual keypoints rather than whole regions, were therefore deemed unsuitable for this task.

We have divided the problem into three main tasks: data collection, anime face detection, and character identification.

The initial stage involves gathering a diverse dataset containing various images of different anime characters, along with a dataset containing the facial features of these characters. The first dataset is crucial for training the model to recognize the complex and unique features of each character accurately. The second dataset supports the second task, face detection.

Using Support Vector Machines (SVM) and Histogram of Oriented Gradients (HOG), we trained a model to detect anime faces in images. HOG extracts the features of each image region, while the SVM classifies those features into positive examples (containing anime faces) and negative examples (without anime faces). This combination aims to detect anime character faces precisely, despite the distinctive and often intricate facial features involved.

The third and crucial task focuses on character identification using the VGG16 model. VGG16, a renowned deep neural network for image recognition, has been fine-tuned to meet the specific requirements of anime character identification. The model has been trained on our diverse and extensive dataset to capture the intricate details of each character.

The integration of SVM/HOG for face detection and VGG16 for character identification forms a complete pipeline. The expected result is an efficient model that accurately determines the position of anime faces in an image and frames each one within a window. This window acts as an image crop, displaying only the character's face instead of the entire image, which makes character identification precise and easy.

Below are images for a clearer visualization of the results:

RELATED STUDIES

Below are some works related to our project:

1. The HOG (Histogram of Oriented Gradients) feature description method:

HOG stands for Histogram of Oriented Gradients, a type of feature descriptor. The purpose of a feature descriptor is to abstract an object by extracting its characteristics and discarding irrelevant information. HOG is therefore used primarily to describe the shape and appearance of an object in an image.

The essence of the HOG method is to use the distribution of intensity gradients, or edge directions, to describe local objects in an image. The HOG operator divides an image into small sub-regions called "cells" and, for each cell, computes a histogram of gradient directions over the pixels in that cell. Concatenating these histograms yields a representation of the original image. To improve recognition performance, the local histograms can be contrast-normalized by computing a measure of intensity over a region larger than a cell, called a "block", and using that value to normalize all cells within the block. The normalized result is a feature vector that is more invariant to changes in lighting conditions.
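To make the lighting-invariance claim concrete, here is a minimal NumPy sketch (not the project's code) of a single 16x16 block: it builds a 9-bin, magnitude-weighted orientation histogram for each 8x8 cell and L2-normalizes the concatenated block vector. The function names and the simple hard binning are illustrative assumptions. Multiplying the pixel values by a constant, a crude model of a contrast change, scales the raw histograms but leaves the normalized block vector unchanged.

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg).
    For brevity, gradients are computed inside the cell only (hard binning)."""
    gy, gx = np.gradient(cell.astype(float))          # d/drow, d/dcol
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((angle // (180.0 / n_bins)).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)

def block_descriptor(block, cell=8, eps=1e-6):
    """Concatenate the 2x2 cell histograms of a 16x16 block and L2-normalize."""
    hists = [cell_histogram(block[r:r + cell, c:c + cell])
             for r in range(0, block.shape[0], cell)
             for c in range(0, block.shape[1], cell)]
    v = np.concatenate(hists)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

rng = np.random.default_rng(0)
block = rng.uniform(0, 100, size=(16, 16))

# Doubling the contrast doubles the raw cell histogram ...
print(np.allclose(cell_histogram(2 * block[:8, :8]), 2 * cell_histogram(block[:8, :8])))  # True
# ... but leaves the normalized block descriptor unchanged.
print(np.allclose(block_descriptor(2 * block), block_descriptor(block)))  # True
```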

There are five basic steps to construct a HOG vector for an image:

a. Preprocessing

b. Computing gradients

c. Calculating feature vectors for each cell

d. Normalizing blocks

e. Computing the HOG vector

We will not go into the details of the standard HOG calculation here. The HOG computation in this project follows the standard process, and we describe how it is calculated in the methodology section.

2. Support Vector Machine (SVM):

Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or nonlinear classification, regression, and even outlier detection. SVMs are used for a variety of tasks, such as text classification, image classification, spam detection, handwriting identification, gene expression analysis, face detection, and anomaly detection. They are adaptable and efficient in many applications because they can handle high-dimensional data and nonlinear relationships.

The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional feature space that separates data points belonging to different classes. The hyperplane is chosen so that the margin, the distance between it and the closest points of each class, is as large as possible. The dimension of the hyperplane depends on the number of features: with two input features the hyperplane is just a line, and with three it becomes a 2-D plane. It becomes difficult to visualize when the number of features exceeds three.
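For reference, the standard textbook formulation of a linear SVM is written out below; the report does not state it, and the notation (weights w, bias b, labels y_i in {-1, +1}, slack variables ξ_i, penalty C) is ours. With N = 2 features, the equation w^T x + b = 0 is exactly the separating line described above, and the margin width assumes the usual scaling in which the closest points satisfy |w^T x + b| = 1.

```latex
% Decision boundary and margin width
\mathbf{w}^\top \mathbf{x} + b = 0, \qquad \text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}

% Soft-margin training problem over n examples (x_i, y_i), y_i \in \{-1, +1\}
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0

% Prediction for a new feature vector x
\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left(\mathbf{w}^\top \mathbf{x} + b\right)
```

The report labels faces 1 and non-faces 0; these correspond to y_i = +1 and y_i = -1 here. Minimizing the norm of w maximizes the margin, and C trades margin width against training errors (it is the `C` parameter of scikit-learn's `SVC`).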

In the project, SVM is used to separate positive and negative examples when detecting anime faces in images. Specifically, the SVM classifies image regions (windows) whose HOG features contain the facial features of anime characters (positive examples) and regions that do not contain those features (negative examples).

VGG16 is a convolutional neural network (CNN) model proposed by the Visual Geometry Group (VGG) at the University of Oxford. The name "VGG16" combines "VGG" with the number 16, which refers to the number of weight layers in the model.

The model is known for its deep architecture of 16 weight layers: 13 convolutional layers and 3 fully connected layers. The architecture is designed to learn high-level features from images, and it has performed strongly in many image recognition and classification tasks. VGG16 has become one of the key computer vision models and is widely used in both research and practical applications.
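The "16" counts only layers with trainable weights. The small pure-Python sketch below, based on the published VGG16 configuration (3x3 convolutions with padding 1, 2x2 max-pooling, a 224x224 input), counts those layers and derives the 7x7x512 feature map that feeds the first fully connected layer. The list layout and function names are illustrative, not the project's code.

```python
# VGG16 convolutional configuration: numbers are output channels of
# 3x3 conv layers (padding 1, so H and W are preserved); "M" is a 2x2 max-pool.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]  # fully connected layers (1000 ImageNet classes)

def count_weight_layers(conv_cfg, fc_cfg):
    """Return (#conv layers, #fully connected layers, total weight layers)."""
    n_conv = sum(1 for v in conv_cfg if v != "M")
    return n_conv, len(fc_cfg), n_conv + len(fc_cfg)

def flattened_size(conv_cfg, input_size=224, in_channels=3):
    """Size of the flattened feature map entering the first FC layer."""
    side, channels = input_size, in_channels
    for v in conv_cfg:
        if v == "M":
            side //= 2          # each max-pool halves height and width
        else:
            channels = v        # a conv layer only changes the channel count
    return side * side * channels

print(count_weight_layers(VGG16_CONV, VGG16_FC))  # (13, 3, 16)
print(flattened_size(VGG16_CONV))                 # 25088 (= 7 * 7 * 512)
```

When fine-tuning for the 10 characters in the recognition dataset, the usual approach is to replace the final 1000-way layer with a 10-way one; the report does not say exactly which layers were retrained.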

For anime face recognition, my partner uses the VGG16 neural network for multi-class classification to identify which character appears in the image.

The Sliding Window Algorithm and Bounding Boxes are employed to determine the precise location of anime faces in the image. Below is a detailed description of how they are integrated:

Sliding Window Algorithm:

Purpose: Used to slide over the entire image with a fixed-size window to check which position of the image contains an anime face.

Execution: Iterate through all possible positions on the image with a sliding window and apply the detection model to determine whether that region contains an anime face or not.

Benefits: Allows us to examine each small part of the image, capturing the accurate position of the face and enhancing the algorithm's accuracy.

Bounding Boxes:

Purpose: Used to mark and define the region containing the anime face as determined by the algorithm.

Execution: When the Sliding Window algorithm identifies a position containing a face, a bounding box is drawn around that region to mark the boundaries of the face.

Benefits: The bounding box helps visualize the position and shape of the face, providing information for the subsequent steps in the detection and display process.

In summary, combining the Sliding Window Algorithm and Bounding Boxes lets the model capture the accurate position of anime faces in the image and present visual results to the user.
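A minimal single-scale version of the scan described above might look like the following. It assumes a grayscale NumPy image and a hypothetical `classify` callback standing in for the trained HOG + SVM detector; the 64x128 window and 16-pixel step are illustrative values, not settings taken from the report.

```python
import numpy as np

def sliding_window_detect(image, classify, win_w=64, win_h=128, step=16):
    """Slide a fixed-size window over a grayscale image and return the
    bounding boxes (x, y, w, h) of windows that `classify` labels as faces.

    `classify` is a placeholder for the trained detector: it receives the
    window's pixels and returns True for "anime face"."""
    boxes = []
    height, width = image.shape[:2]
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            window = image[y:y + win_h, x:x + win_w]
            if classify(window):
                boxes.append((x, y, win_w, win_h))  # (x, y) = top-left corner
    return boxes

# Toy usage: a bright 64x128 patch at x=32, y=16 and a trivial "detector".
image = np.zeros((256, 192))
image[16:144, 32:96] = 1.0
print(sliding_window_detect(image, lambda w: w.mean() > 0.9))  # [(32, 16, 64, 128)]
```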

All data used to construct the dataset are images. The dataset has been carefully curated and preprocessed; our investigation indicates that it was processed manually. Its author drew on images from several sources:

1. Anime Faces

2. Anime Face Dataset: https://www.kaggle.com/splcher/animefacedataset

3. Safebooru - Anime Image Metadata: https://www.kaggle.com/alamson/safebooru

4. Tagged Anime Illustrations: https://www.kaggle.com/mylesoneill/tagged-anime-illustrations

Since the project involves two tasks, detection and recognition, the dataset has been divided into two sets, one for each task.

Detection:

The dataset includes two types of images: positive and negative. The positive set contains 6,098 training files and 146 test files, while the negative set contains 19,670 training files and 218 test files.

The positive files are images of characters' faces, while the negative files are images that do not contain a face.

Example: positive - negative

Recognition:

The dataset for the recognition problem is a collection of images of 10 different anime characters, with a total of 789 files for training and 185 files for testing.

Cell Division: Divide the image into cells, usually of size 8x8 pixels by default.

Gradient Information: Calculate the gradient for each pixel in the cell, giving a magnitude and a direction per pixel. With the default settings, each 8x8 cell yields a set of 128 numbers (8x8x2).

Histogram Creation: Form histograms from the gradient information, using 9 bins per cell; each cell's histogram is an array whose size equals the number of bins.

Block Normalization: Normalize the histograms within each block (default size 16x16 pixels, i.e. 2x2 cells) to compensate for variations in illumination and contrast. This step makes the feature vector more robust.

Final Vector: Concatenate all the normalized histogram vectors to obtain the final feature vector.

In summary, the HOG feature extraction process involves capturing gradient information, forming histograms, normalizing by blocks, and concatenating the results to create a comprehensive feature vector for the given image.
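The five steps can be condensed into a short NumPy sketch. It follows the default settings listed above (8x8 cells, 9 unsigned orientation bins, 16x16 blocks of 2x2 cells) and assumes a one-cell (8-pixel) block stride, plain L2 normalization, and hard binning; these choices and the function name are our assumptions rather than details from the report. Library implementations add refinements (OpenCV's `cv2.HOGDescriptor`, for example, uses L2-Hys normalization by default), so their values will differ, but for a 64x128 window the descriptor length is the same: 15 x 7 blocks x 36 values = 3780.

```python
import numpy as np

def hog_descriptor(img, cell=8, block=2, n_bins=9, eps=1e-6):
    """Simplified HOG for a grayscale image: unsigned gradients, 9-bin cell
    histograms (hard binning), 2x2-cell blocks with a one-cell stride, L2 norm."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # centred [-1, 0, 1] filter; borders stay 0
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((angle // (180.0 / n_bins)).astype(int), n_bins - 1)

    # One magnitude-weighted orientation histogram per cell.
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, n_bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            hist[cy, cx] = np.bincount(bins[sl].ravel(),
                                       weights=magnitude[sl].ravel(),
                                       minlength=n_bins)

    # Normalize overlapping 2x2-cell blocks, then concatenate everything.
    features = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            features.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(features)

window = np.random.default_rng(0).uniform(0, 255, size=(128, 64))  # a 64x128 window
print(hog_descriptor(window).shape)  # (3780,)
```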

After obtaining the Histogram of Oriented Gradients (HOG) feature vectors from the images, the next step is to classify them with a Support Vector Machine (SVM). SVM is a robust machine learning algorithm commonly used for binary classification problems; in this context, an SVC (Support Vector Classification) model is chosen.

The process includes the following steps:

Selection of SVM Algorithm: Choose an SVC, a variant of SVM suited to classification tasks. SVC is particularly effective when the goal is to categorize data into two classes.

Training Data Preparation: Select a number of training examples with corresponding labels. The labels indicate whether each example belongs to the positive or the negative class.

Training the SVM: Fit the SVM with the HOG feature vectors of the training examples. The SVM algorithm generates a hyperplane that effectively separates the two classes. The objective is to maximize the margin, i.e., the distance between the hyperplane and the closest points of each class.

Hyperplane Generation: The SVM produces a hyperplane in the feature space that optimally separates the positive and negative classes. This hyperplane serves as the decision boundary for classifying new, unseen data.

In summary, the combination of HOG feature vectors and SVM classification, particularly using the SVC variant, forms a powerful approach for identifying and distinguishing patterns in the given images.
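The report trains an SVC; the sketch below instead uses the Pegasos stochastic sub-gradient method, a swapped-in solver for the same linear hinge-loss SVM objective, so that the hyperplane (w, b) can be seen emerging from the training loop without external libraries. The regularization strength `lam`, the epoch count, and the toy two-cluster data are illustrative assumptions; a linear `SVC` in scikit-learn would expose the equivalent parameters as `coef_` and `intercept_`.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal linear-SVM
    objective  lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i))).
    Labels follow the report's convention (1 = face, 0 = non-face).
    The bias is folded into w via a constant feature, so it is regularized
    too (a small deviation from the textbook SVM)."""
    rng = np.random.default_rng(seed)
    X = np.hstack([X, np.ones((len(X), 1))])   # constant feature for the bias
    y_pm = np.where(y == 1, 1.0, -1.0)          # {0, 1} -> {-1, +1}
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)               # Pegasos step size
            violates_margin = y_pm[i] * (X[i] @ w) < 1
            w *= 1.0 - eta * lam                # step for the L2 term
            if violates_margin:
                w += eta * y_pm[i] * X[i]       # hinge-loss sub-gradient step
    return w[:-1], w[-1]

def predict(X, w, b):
    """1 if the point lies on the positive side of the hyperplane, else 0."""
    return (X @ w + b > 0).astype(int)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (50, 2)), rng.normal(-2.0, 0.5, (50, 2))])
y = np.array([1] * 50 + [0] * 50)
w, b = train_linear_svm(X, y)
print((predict(X, w, b) == y).mean())  # training accuracy on this toy data
```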

3. Sliding Window Algorithm & Bounding Boxes:

In the context of the project, the Sliding Window Algorithm and Bounding Boxes play a crucial role in detecting anime faces using the trained SVM classifier Here's an explanation of the process:

Setting Thresholds: After the SVM classifier is trained, it is applied to an input image to locate anime faces. First, thresholds are set for the minimum and maximum window width (w), the minimum and maximum window height (h), the size stride, and the sliding stride.

Generating Bounding Boxes: Using these thresholds, a list of bounding boxes is generated from the size (width and height) of the sliding window and its locations (x and y coordinates). Each tuple (x, y, w, h) represents a bounding box, where (x, y) denotes the top-left corner of the window.
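One way to turn the four thresholds into the list of (x, y, w, h) tuples is sketched below. The report does not say whether width and height vary independently or keep a fixed aspect ratio; this sketch assumes they vary independently, and all argument names are hypothetical. Because an SVM trained on 64x128 HOG vectors expects fixed-length input, each candidate window would presumably be resized to that size before feature extraction (not shown).

```python
def generate_boxes(img_w, img_h, min_w, max_w, min_h, max_h,
                   size_stride, slide_stride):
    """List every candidate window (x, y, w, h); (x, y) is the top-left corner.
    Window sizes grow by `size_stride`; positions advance by `slide_stride`."""
    boxes = []
    for w in range(min_w, max_w + 1, size_stride):
        for h in range(min_h, max_h + 1, size_stride):
            for y in range(0, img_h - h + 1, slide_stride):
                for x in range(0, img_w - w + 1, slide_stride):
                    boxes.append((x, y, w, h))
    return boxes

boxes = generate_boxes(100, 100, min_w=40, max_w=60, min_h=40, max_h=60,
                       size_stride=20, slide_stride=20)
print(len(boxes))  # 49
print(boxes[0])    # (0, 0, 40, 40)
```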

Feature Extraction with HOG: For each window, the algorithm extracts features using the Histogram of Oriented Gradients (HOG) technique. These features are then fed into a linear classifier to predict whether the window contains an anime face.

Filtering Based on IOU: Once all potentially positive windows are obtained, a filtering process is applied to remove windows with a high Intersection over Union (IOU) with at least one of the other bounding boxes This helps eliminate redundant or overlapping detections.
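The report removes windows with a high IoU against other boxes but does not say which of two overlapping windows survives. A common choice, assumed here, is greedy non-maximum suppression: rank windows by classifier score (for example, the SVM decision value) and keep a window only if it does not overlap an already-kept one too much. The 0.3 threshold and the example boxes are illustrative.

```python
def iou(a, b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def filter_overlaps(boxes, scores, iou_threshold=0.3):
    """Greedy non-maximum suppression: visit boxes from highest to lowest score
    and keep a box only if its IoU with every kept box is <= the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

boxes = [(10, 10, 64, 64), (14, 12, 64, 64), (200, 50, 64, 64)]
scores = [0.9, 0.7, 0.8]   # e.g. SVM decision values
print(filter_overlaps(boxes, scores))  # [(10, 10, 64, 64), (200, 50, 64, 64)]
```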

In summary, the Sliding Window Algorithm scans the image at different positions and scales, and for each window, the HOG features are used to make predictions with the SVM classifier The final step involves filtering out redundant detections based on the IOU metric to refine the results.

For anime face recognition, my partner uses the VGG16 neural network for multi-class classification to identify which character appears in the image.

The following diagram shows the overall workflow:

These methods have been validated and widely applied in recognition and identification tasks. In particular, the VGG16 model has proven itself in various related recognition projects. For detection, however, we encountered several challenges, which I will discuss later.

Procedures

1. Read and transform the datasets

1. Use cv2 to read the images from both the positive training data (anime faces) and the negative training data (not anime faces), limiting the number of positive and negative images if necessary.

2. For each image, do the following in order:

(a) Resize the image to a suitable size (64x128 by default).

(b) Convert it to a HOG feature descriptor with block size 16x16, cell size 8x8, and 9 bins (default settings).

(c) Add the HOG feature to the dataset.

(d) If the image comes from the positive training set, set "1" as its label; if it comes from the negative set, set "0" as its label.
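The reading-and-transforming step can be sketched as follows. To keep the sketch runnable without OpenCV, images are passed in as NumPy arrays (in the project they are read with `cv2`, e.g. `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`), and a nearest-neighbour resize stands in for `cv2.resize(img, (64, 128))`; note that OpenCV's size argument is (width, height). The `extract` argument is a hypothetical hook where the HOG function would be plugged in.

```python
import numpy as np

def resize_nearest(img, width=64, height=128):
    """Nearest-neighbour resize; stand-in for cv2.resize(img, (width, height))."""
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows[:, None], cols[None, :]]

def build_dataset(pos_images, neg_images, extract, max_pos=None, max_neg=None):
    """Return (X, y): one feature row per image, label 1 = face, 0 = non-face.
    `max_pos` / `max_neg` optionally cap how many images of each kind are used."""
    X, y = [], []
    for images, label, limit in ((pos_images, 1, max_pos), (neg_images, 0, max_neg)):
        for img in images[:limit]:            # images[:None] keeps all of them
            X.append(extract(resize_nearest(img)))
            y.append(label)
    return np.array(X), np.array(y)

# Toy usage with a simple stand-in feature instead of HOG.
pos = [np.full((100, 80), 255, dtype=np.uint8)] * 3
neg = [np.zeros((50, 50), dtype=np.uint8)] * 5
X, y = build_dataset(pos, neg, extract=lambda im: [im.mean(), *im.shape], max_neg=2)
print(X.shape, y.tolist())  # (5, 3) [1, 1, 1, 0, 0]
```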

2. Train and test the classifier

1. Pick an SVC (Support Vector Classification) for the SVM model [6] (i.e. choose its hyperparameters), and set a limit on the number of positive and negative training examples if needed.

2. Put X (the HOG features) and Y (the labels) into the SVM model.

3. Fit the model.

4. Test the model on the given test images (both negative and positive).

5. Test separately on the positive and negative test images, obtaining two accuracies: one for identifying positive images and one for identifying negative images.

6. Save the support vector classifier coefficients to a .pkl file.
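Steps 5 and 6 can be sketched as below. Reporting the two accuracies separately matters because the detection test set is unbalanced (146 positive vs. 218 negative images), so a single overall figure could hide weak performance on one class. The pickle layout (a dict with `coef` and `intercept`) and the file name are assumptions; the report only says the coefficients are saved to a pkl file. With a linear scikit-learn `SVC`, the values would come from `coef_` and `intercept_`.

```python
import os
import pickle
import tempfile

import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Accuracy on positive (label 1) and negative (label 0) test images separately."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos_acc = float(np.mean(y_pred[y_true == 1] == 1))
    neg_acc = float(np.mean(y_pred[y_true == 0] == 0))
    return pos_acc, neg_acc

def save_linear_classifier(path, coef, intercept):
    """Pickle the hyperplane parameters so the detector can reuse them later."""
    with open(path, "wb") as f:
        pickle.dump({"coef": np.asarray(coef), "intercept": float(intercept)}, f)

def load_linear_classifier(path):
    with open(path, "rb") as f:
        params = pickle.load(f)
    return params["coef"], params["intercept"]

print(per_class_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]))  # (0.75, 1.0)

path = os.path.join(tempfile.mkdtemp(), "svm_coefficients.pkl")  # hypothetical file name
save_linear_classifier(path, [0.5, -1.25], 0.1)
print(load_linear_classifier(path))
```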

Posted: 18/06/2024, 17:23