HO CHI MINH UNIVERSITY OF TECHNOLOGY
FACULTY OF APPLIED SCIENCE
BK
TP.HCM
LINEAR ALGEBRA (MT1007)
REPORT
Project 11: Projections, eigenvectors, Principal Component Analysis and face recognition algorithms
Lecturer: Dr Dau The Phiet
Class DTQ1 — Group 6
Ho Chi Minh City, August 2024
Assignment Table
No | Name | Student ID | Assigned Tasks | Assessment
1 | Nguyễn Văn Thắng | 2353123 | Do assignment [illegible] | 100%
2 | Phan Nguyễn Thành Trung | 2353252 | Do assignment [illegible] | 100%
4 | Nguyễn Gia Minh | 2352748 | Do assignment [illegible], theory | 100%
5 | Nguyễn Hai Anh | 2352040 | Do assignment [illegible], theory | 100%
Table of Contents
I Introduction
II PCA (Principal Components Analysis)
1 Theory
2 Feature
3 Mathematics basis
4 Principal Component Analysing step by step
5 Applications of principal component analysis in fields of expertise
6 Application PCA in face recognition
III Matlab
Results
I Introduction
With the rapid development of technology and the Internet, keeping information confidential is essential. Every year, thousands of cases of identity theft, imposter scams, phishing, money transfer fraud and similar crimes occur around the world. As a result, large amounts of important information are lost and millions of dollars are stolen, owing to high-tech thieves and outdated security technology. Many security methods have therefore been developed. One of the most prominent is the facial recognition system, which uses the user's biological features to pass the security check. A classical facial recognition system is based on an algorithm called PCA (Principal Components Analysis).
II PCA (Principal Components Analysis)
1 Theory
Principal component analysis is a technique that statistical analysts frequently use to reduce the dimensionality of high-dimensional datasets, or "big data," while retaining the essential information needed for building models. To represent the variability of the data as well as possible, the technique maps the data from a high-dimensional space into a new space with fewer dimensions.
2 Feature
• Dimensionality Reduction: Reduces data complexity while retaining most of the variance.
• Variance Maximization: Captures directions of maximum variance in the data.
• Orthogonal Components: Produces uncorrelated principal components.
• Eigenvectors and Eigenvalues: Defines components based on eigenvectors and their corresponding eigenvalues.
• Sensitivity to Scaling: Requires data standardization for accurate results.
• Linear Assumption: Assumes linear relationships between variables.
3 Mathematics basis
- Mean (expected value):
The expected value is simply the arithmetic mean of all values. For N values x_1, x_2, ..., x_N:
$$E[X] = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
- Variance:
The variance measures how widely a group of numbers is spread out. It is one of several descriptors of a probability distribution and characterizes how much the values deviate from the mean (expected value) $\bar{x}$:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$$
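The two definitions above can be checked numerically. The following is a minimal sketch, assuming the 1/N (population) normalization; the sample values are made up for illustration.

```matlab
% Mean and variance of a small illustrative sample (values are hypothetical).
x = [2 4 4 4 5 5 7 9];
N = numel(x);

x_bar  = sum(x) / N;                 % arithmetic mean
sigma2 = sum((x - x_bar).^2) / N;    % variance with 1/N normalization

% var(x, 1) uses the same 1/N normalization, so it should agree with sigma2.
fprintf('mean = %g, variance = %g, var(x,1) = %g\n', x_bar, sigma2, var(x, 1));
```

Note that `var(x)` without the second argument normalizes by N - 1 instead, which gives a slightly larger value on the same data.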
- Covariance:
The covariance measures how much two random variables vary together. The covariance is positive if larger values of one variable mostly correspond to larger values of the other, and smaller values to smaller values, meaning that the variables tend to behave similarly. In the opposite case, the covariance is negative because larger values of one variable mostly correspond to smaller values of the other, i.e., the variables tend to behave in opposite ways.
$$\operatorname{cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})$$
NOTE:
1 If X and Y are independent, then COV(X, Y) = 0 and they are said to be uncorrelated. The converse is not true in general.
2 If X and Y are jointly Gaussian random variables and COV(X, Y) = 0, then X and Y are independent.
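A small numerical illustration of the covariance and of the first note may help. This is a sketch with made-up data: `y` is fully determined by `x`, yet their covariance is zero, while `z` moves together with `x` and has a positive covariance.

```matlab
% Covariance with 1/N normalization on illustrative data.
x = [-2 -1 0 1 2];
y = x.^2;          % dependent on x, but uncorrelated with it
z = 2 * x + 1;     % increases with x
N = numel(x);

cov_xy = sum((x - mean(x)) .* (y - mean(y))) / N;
cov_xz = sum((x - mean(x)) .* (z - mean(z))) / N;

% cov(a, b, 1) returns the 2-by-2 covariance matrix normalized by N;
% its off-diagonal entry is the covariance of a and b.
C = cov(x, z, 1);
fprintf('cov(x,y) = %g, cov(x,z) = %g, built-in = %g\n', cov_xy, cov_xz, C(1, 2));
```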
- Covariance matrix:
The covariance matrix of an m × 1 random vector is the m × m matrix whose (i, j) element is the covariance between the ith and jth random variables.
Given N data points described by column vectors x_1, x_2, ..., x_N, the expected vector and the covariance matrix of the whole data set are defined as follows:
$$\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad S = \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})(x_n - \bar{x})^T$$
• S is a symmetric, non-negative definite (positive semidefinite) m × m matrix.
• The diagonal elements of S are the variances of the individual components.
• The off-diagonal elements of S are the covariances between components.
The off-diagonal element s_ij (i ≠ j) is the covariance between the ith and jth components of the data, which measures how they are correlated. This number may be zero, positive, or negative. The two components i and j are said to be uncorrelated when the value is equal to 0. When the covariance matrix is diagonal, there is no correlation at all between the dimensions of the data.
Illustration of the covariance matrix for two variables x and y:
$$S = \begin{bmatrix} \operatorname{var}(x) & \operatorname{cov}(x, y) \\ \operatorname{cov}(y, x) & \operatorname{var}(y) \end{bmatrix}$$
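The properties listed above can be verified on a toy data set. The sketch below assumes the 1/N normalization and stores each data point as one column of `X`; the numbers are illustrative only.

```matlab
% Covariance matrix of N = 5 two-dimensional points (one point per column).
X = [1 2 3 4 5;
     2 4 5 4 5];
N = size(X, 2);

x_bar = mean(X, 2);                  % expected vector (m-by-1)
Xc    = X - x_bar * ones(1, N);      % centered data
S     = (Xc * Xc') / N;              % m-by-m covariance matrix

symmetry_error = norm(S - S');       % should be zero
smallest_eig   = min(eig(S));        % should be non-negative
disp(S);
```

The diagonal of `S` holds the two variances and the off-diagonal entry holds the covariance between the two components; the built-in call `cov(X', 1)` should give the same matrix.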
- Eigenvalue, Eigenvector of covariance matrix:
Given a square matrix A, if a scalar λ and a nonzero vector v satisfy:
$$Av = \lambda v$$
then λ is an eigenvalue of A, and v is an eigenvector corresponding to that eigenvalue.
Every eigenvalue is a root of the characteristic equation:
$$\det(A - \lambda I) = 0$$
One eigenvalue can have several eigenvectors, but each eigenvector is associated with only one eigenvalue. Every n × n matrix has n eigenvalues (counting repeated values), and these can be complex numbers. The eigenvalues of a real symmetric matrix are all real numbers. All eigenvalues of a positive semidefinite matrix are non-negative, and all eigenvalues of a positive definite matrix are positive.
Solution to find eigenvalues and eigenvectors:
Step 1: To find the eigenvalues, solve the characteristic equation:
$$\det(A - \lambda I) = 0$$
Step 2: For each eigenvalue λ, find the matching eigenvectors by solving the following system:
$$(A - \lambda I)v = 0$$
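As a quick check of the two steps, consider the symmetric matrix A = [2 1; 1 2], chosen here only for illustration. Its characteristic equation is λ² − 4λ + 3 = 0, so the eigenvalues are 1 and 3. The sketch below lets MATLAB's `eig` do both steps and verifies that Av = λv holds.

```matlab
% Eigenvalues and eigenvectors of a small symmetric matrix.
A = [2 1;
     1 2];

[V, D] = eig(A);          % columns of V are eigenvectors, diag(D) the eigenvalues
lambda = diag(D);

% A*V should equal V*D column by column (A*v = lambda*v for each pair).
residual = norm(A * V - V * D);
fprintf('eigenvalues: %g and %g, residual = %g\n', lambda(1), lambda(2), residual);
```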
4 Principal Component Analysing step by step
Step 1: Compute the mean value $\bar{X}$ of the N × m data matrix X (each row is one observation).
Step 2: Form the centered data $\hat{X} = X - \bar{X}$ and compute the covariance matrix $S = \frac{1}{N-1}\hat{X}^T\hat{X}$.
Step 3: Compute the eigenvalues of S, sort them in decreasing order $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$, and determine the corresponding unit eigenvectors.
Step 4: Choose the k unit eigenvectors that correspond to the k largest eigenvalues. Using the selected eigenvectors as its columns, create a matrix A. Matrix A is the transformation being sought.
Step 5: Compute the image $A^T\hat{X}^T$ of $\hat{X}$. Each column of $A^T\hat{X}^T$ contains the coordinates of one row of $\hat{X}$ in the basis formed by the columns of A, which are taken from the matrix P of eigenvectors. The original data X are then approximated by $X \approx \hat{X}AA^T + \bar{X}$.
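The five steps can be sketched in MATLAB as follows. This is an illustrative sketch rather than a definitive implementation: the data values, the choice k = 1 and the variable names are assumptions, and the sample normalization 1/(N − 1) is used.

```matlab
% PCA step by step on a small data table (rows = observations, columns = variables).
X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0; 2.3 2.7];
[N, m] = size(X);
k = 1;                                   % number of components to keep

% Step 1: mean of each variable
X_bar = mean(X, 1);
% Step 2: center the data and form the covariance matrix
X_hat = X - ones(N, 1) * X_bar;
S = (X_hat' * X_hat) / (N - 1);
% Step 3: eigenvalues in decreasing order with their unit eigenvectors
[P, D] = eig(S);
[lambda, order] = sort(diag(D), 'descend');
P = P(:, order);
% Step 4: keep the k leading eigenvectors
A = P(:, 1:k);
% Step 5: coordinates in the new basis and approximation of the data
Y = A' * X_hat';                         % k-by-N, one column per observation
X_approx = (A * Y)' + ones(N, 1) * X_bar;

fprintf('kept %.1f%% of the total variance\n', 100 * sum(lambda(1:k)) / sum(lambda));
```

The squared reconstruction error `norm(X - X_approx, 'fro')^2` equals (N − 1) times the sum of the discarded eigenvalues, which is one way to decide how large k needs to be.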
NOTE:
1 S is a real symmetric matrix, and its eigenvalues are non-negative real numbers.
2 S is always orthogonally diagonalizable.
3 The variances of the vectors x_1, x_2, ..., x_m (the columns of the data table) are on S's diagonal. The element s_ij is the covariance of x_i and x_j. The total variance of the data table is the sum of the entries on S's diagonal. Assume that S = PDP^T. The eigenvalues of S are on D's diagonal. The sum of S's eigenvalues equals the sum of S's diagonal elements, which is S's trace.
4 Matrix P is an orthogonal matrix. Every orthogonal matrix corresponds to a rotation (possibly combined with a reflection). The columns of matrix P form an orthonormal system. If we take the column vectors of P as an orthonormal basis, we can build a new coordinate system based on these vectors, and the change from the original coordinate system to the new one is a rotation.
5 $S = \frac{1}{N-1}\hat{X}^T\hat{X}$ if sample data is used; $S = \frac{1}{N}\hat{X}^T\hat{X}$ if population data is used.
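The notes above can be checked numerically. The following sketch uses a made-up 4 × 3 data table and the sample normalization; it verifies that P is orthogonal, that S = PDP^T, that the eigenvalues are non-negative, and that their sum equals the trace of S.

```matlab
% Numerical check of the properties of S on illustrative data.
X = [2 0 1; 4 1 3; 6 1 2; 8 4 6];        % N = 4 observations, m = 3 variables
N = size(X, 1);
X_hat = X - ones(N, 1) * mean(X, 1);
S = (X_hat' * X_hat) / (N - 1);

[P, D] = eig(S);
orth_error   = norm(P' * P - eye(3));    % P is orthogonal
diag_error   = norm(S - P * D * P');     % S = P*D*P'
trace_gap    = abs(trace(S) - sum(diag(D)));
smallest_eig = min(diag(D));             % non-negative up to rounding

fprintf('orth %.1e, diag %.1e, trace gap %.1e\n', orth_error, diag_error, trace_gap);
```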
5 Applications of principal component analysis in fields of expertise
Data science and machine learning: By removing redundant features, PCA is frequently used for feature extraction and dimensionality reduction, which can improve the performance of machine learning models. It also supports pattern recognition by making it possible to present high-dimensional data in 2D or 3D plots.
Finance: PCA is used in finance to simplify large datasets, such as when analyzing stock market fluctuations. It supports portfolio risk management, the development of predictive models for financial forecasting, and the identification of underlying factors that influence asset price fluctuations.
Genomics and bioinformatics: PCA is used to examine gene expression data, helping to identify the genes that contribute most to the variation in the data. It helps with the classification of various biological problems, such as distinguishing cancer types according to their genetic profiles.
Computer vision and image processing: PCA is used for image recognition and image compression. It reduces the dimensionality of image data, which makes processing and analysis easier, especially in applications involving pattern detection and facial recognition systems.
6 Application PCA in face recognition
Principal Component Analysis (PCA) is a foundational technique used in face recognition systems, primarily for its ability to reduce the dimensionality of image data while preserving the key features that distinguish different faces. PCA is applied in face recognition as follows:
6.1 Dimensionality Reduction: Face images are high-dimensional: since every pixel represents a dimension, even a 100x100 pixel image has 10,000 dimensions. To address this, PCA lowers the dimensionality by converting the high-dimensional image data into a lower-dimensional "eigenspace." By preserving the most important features, this method captures the fundamental facial traits that set each face apart.
6.2 Eigenfaces: "Eigenfaces" are the principal components in PCA-based face recognition, obtained from the covariance matrix of the training face images. They represent the most important aspects of the faces. The procedure consists of calculating the mean face, calculating the difference between each training face and the mean, generating eigenfaces by applying PCA to these differences, and representing each face as a linear combination of the eigenfaces.
6.3 Face Recognition: In PCA-based face recognition, a new face image is projected into the eigenspace using the eigenfaces, producing a set of weights that describe the face. These weights are then compared with the weights of the known faces in the database. The match is the face with the closest weights, usually determined with a similarity metric such as the Euclidean distance.
6.4 Advantages:
- Efficiency: PCA significantly reduces the amount of data, which speeds up face recognition algorithms without sacrificing important information.
- Noise Reduction: By concentrating on the most important components, PCA reduces the effect of noise and extraneous information in the images, which improves identification accuracy.
6.5 Challenges:
- Sensitivity to Variations: PCA-based face recognition systems may be sensitive to changes in viewing angle, illumination, and facial expression. Improvements and hybrid strategies can, however, lessen these problems.
- Data Dependency: The quality and variety of the training data have a major impact on PCA's effectiveness. For accurate recognition, the eigenfaces have to be generated from a representative dataset.
In general, PCA played a crucial role in the development of early face recognition systems and helped to pave the way for later, more sophisticated approaches. Its ability to represent and recognize faces efficiently makes it a useful tool for both practical and academic applications.
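The matching step described in 6.3 can be sketched as a nearest-neighbour search over weight vectors. The weights below are hypothetical numbers, not values from the report's database, and the variable names `known_weights` and `new_weights` are introduced only for this illustration.

```matlab
% Nearest-match search in eigenspace using the Euclidean distance.
% Each column of known_weights is the (hypothetical) weight vector of one known face.
known_weights = [ 1.0  4.0 -2.0;
                  0.5 -1.0  3.0;
                 -2.0  0.0  1.0];
new_weights = [3.8; -0.9; 0.2];          % weights of the face to be recognized

num_faces = size(known_weights, 2);
diffs     = known_weights - new_weights * ones(1, num_faces);
distances = sqrt(sum(diffs.^2, 1));      % one distance per known face

[min_distance, best_match] = min(distances);
fprintf('closest face: %d (distance %.2f)\n', best_match, min_distance);
```

In practice a threshold on `min_distance` would typically be added so that a face that is far from every known face is reported as unknown rather than matched to the nearest one.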
III Matlab
1 Code:
% Simple face recognition algorithm
%% Input database files into Matlab
clear;
Database_Size = 30;
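% Reading images from the database located in the subfolder 'database'
% (imread finds them only if that folder is the current folder or on the MATLAB path)
for j = 1:Database_Size
    image_read = imread(['person' num2str(j) '.pgm']);
    [m, n] = size(image_read);
    P(:, j) = reshape(image_read, m*n, 1);   % store each image as one column of P
end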
%% Computing and displaying the mean face
mean_face = mean(P, 2);
imshow(uint8(reshape(mean_face, m, n)))
%% Subtract the mean face
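P = double(P);
P = P - mean_face * ones(1, Database_Size);
%% Compute the covariance matrix of the set and its eigenvalues and eigenvectors
% P' * P is the small Database_Size-by-Database_Size matrix; multiplying its
% eigenvectors by P gives eigenvectors of the large matrix P * P'
[Vectors, Values] = eig(P' * P);
EigenVectors = P * Vectors;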
%% Display the set of eigenfaces
EigenFaces = [];
for j = 2:Database_Size
    if j == 2
        EigenFaces = reshape(EigenVectors(:, j) + mean_face, m, n);
    else
        EigenFaces = [EigenFaces reshape(EigenVectors(:, j) + mean_face, m, n)];
    end
end
EigenFaces = uint8(EigenFaces);
figure;
imshow(EigenFaces);
%% Verify orthogonality of eigenvectors
Products = EigenVectors' * EigenVectors;
%% Recognition of an altered image (sunglasses)
image_read = imread(['person30altered1.pgm']);
U = reshape(image_read, m*n, 1);
NormsEigenVectors = diag(Products);
% Element-wise division (./) scales each projection by the squared norm of its eigenvector
W = (EigenVectors' * (double(U) - mean_face)) ./ NormsEigenVectors;
U_approx = EigenVectors * W + mean_face;
image_approx = uint8(reshape(U_approx, m, n));
figure;
imshow([image_read, image_approx]);
%% Unmask the image person30altered2.pgm
image_read = imread(['person30altered2.pgm']);
U = reshape(image_read, m*n, 1);   % reshape the newly read image before projecting it
W = (EigenVectors' * (double(U) - mean_face)) ./ NormsEigenVectors;
U_approx = EigenVectors * W + mean_face;
image_approx = uint8(reshape(U_approx, m, n));
figure;
imshow([image_read, image_approx]);
%% Approximate the image person31.pgm
image_read = imread(['person31.pgm']);
U = reshape(image_read, m*n, 1);
W = (EigenVectors' * (double(U) - mean_face)) ./ NormsEigenVectors;
U_approx = EigenVectors * W + mean_face;
image_approx = uint8(reshape(U_approx, m, n));
figure;
imshow([image_read, image_approx]);
2 Explain code:
- Input Database Files into MATLAB:
Database_Size = 30;
The code sets the size of the database, i.e., the number of face images, to 30
for j = 1:Database_Size
This loop reads each image from the database, which is assumed to be in the subfolder named 'database'. The images are named sequentially (person1.pgm, person2.pgm, etc.).
P(:,j) = reshape(image_read, m*n, 1);
Each image is reshaped into a column vector and stored in matrix P
- Computing and Displaying the Mean Face:
mean_face = mean(P, 2);
The mean of all the images (the mean face) is calculated by averaging all the columns of matrix P.
imshow(uint8(reshape(mean_face, m, n)));
The mean face is reshaped back into the original image size and displayed.
- Subtract the Mean Face:
P = P - mean_face * ones(1, Database_Size);
The mean face is subtracted from all the images in the database to center the data around the origin
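The effect of this centering step is easy to see on a toy matrix. The sketch below reuses the names `P` and `mean_face` from the listing, but the 2 × 3 matrix is made up: it stands for three "images" of two pixels each.

```matlab
% Centering a tiny image matrix: each column is one image.
P = [1 2 3;
     4 6 8];
mean_face = mean(P, 2);                              % average column

% Replicate the mean face across all columns and subtract it.
P_centered = P - mean_face * ones(1, size(P, 2));

row_means = mean(P_centered, 2);                     % zero after centering
disp(P_centered);
```

In MATLAB R2016b and later, `P - mean_face` gives the same result through implicit expansion; the explicit product with `ones` also works in older releases.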
- Compute the Covariance Matrix and Its Eigenvalues and Eigenvectors:
[Vectors, Values] = eig(P' * P);
Instead of the large matrix P * P' (of size mn × mn), the code forms the much smaller Database_Size × Database_Size matrix P' * P and finds its eigenvalues and eigenvectors.
EigenVectors = P * Vectors;
The eigenvectors of P * P', which live in the original data space, are then recovered: if P' * P * v = λv, then P * P' * (P * v) = λ(P * v), so P * v is an eigenvector of P * P' with the same eigenvalue.
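The relation between the eigenvectors of P' * P and those of P * P' can be checked on random data. This is a sketch with an assumed 6 × 3 random matrix standing in for the image matrix; it also shows that centering forces one eigenvalue to be numerically zero, with a vanishing vector P * v.

```matlab
% Eigenvectors of the small matrix P'*P mapped to eigenvectors of P*P'.
rng(0);                                   % reproducible random data
P = randn(6, 3);                          % 6 "pixels", 3 "images"
P = P - mean(P, 2) * ones(1, 3);          % subtract the mean column

[Vectors, Values] = eig(P' * P);          % 3-by-3 problem
EigenVectors = P * Vectors;               % candidates for P*P'

% (P*P') * EigenVectors should equal EigenVectors * Values.
residual = norm((P * P') * EigenVectors - EigenVectors * Values);

% Centering makes the columns of P sum to zero, so one eigenvalue is ~0.
[smallest, idx] = min(abs(diag(Values)));
zero_vector_norm = norm(EigenVectors(:, idx));
fprintf('residual %.1e, smallest |eigenvalue| %.1e\n', residual, smallest);
```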
- Display the Set of Eigenfaces:
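for j = 2:Database_Size
The code loops through the eigenvectors to display the eigenfaces. The first eigenvector is skipped: for a symmetric matrix, eig typically returns the eigenvalues in ascending order, and because the mean face has been subtracted, the smallest eigenvalue of P' * P is numerically zero, so the first column of EigenVectors is essentially the zero vector and carries no information.
EigenFaces = [EigenFaces reshape(EigenVectors(:, j) + mean_face, m, n)];
Each eigenface is added to the mean face, reshaped to the image size and appended to the matrix EigenFaces, which is then displayed.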
- Verify Orthogonality of Eigenvectors:
Products = EigenVectors' * EigenVectors;
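This step verifies the orthogonality of the eigenvectors: the off-diagonal entries of Products are the dot products between different eigenvectors and should be zero (or close to zero), while the diagonal entries are the squared norms of the eigenvectors.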
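One way to turn this check into a single number is to look at the largest off-diagonal entry of Products. The sketch below does this on assumed random data (an 8 × 4 matrix in place of the image matrix); `max_off` is a name introduced for the illustration.

```matlab
% Measuring how far the mapped eigenvectors are from being orthogonal.
rng(1);
P = randn(8, 4);
P = P - mean(P, 2) * ones(1, 4);

[Vectors, ~] = eig(P' * P);
EigenVectors = P * Vectors;
Products = EigenVectors' * EigenVectors;

% Remove the diagonal (squared norms) and keep only the cross products.
off_diagonal = Products - diag(diag(Products));
max_off = max(abs(off_diagonal(:)));
fprintf('largest off-diagonal dot product: %.1e\n', max_off);
```

The diagonal of Products is not the identity here because the columns of EigenVectors are orthogonal but not normalized, which is why the later projection divides by NormsEigenVectors.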
- Recognition of an Altered Image (sunglasses):
image_read = imread(['person30altered1.pgm']);
The code reads an altered image (e.g., a person wearing sunglasses).
W = (EigenVectors' * (double(U) - mean_face)) ./ NormsEigenVectors;
The code projects the altered image onto the eigenface space, computes its weights (W) by dividing each dot product element-wise by the squared norm of the corresponding eigenvector, and reconstructs the approximate image.
imshow([image_read, image_approx]);
The original and reconstructed images are displayed side by side for comparison
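The projection and reconstruction can be followed by hand on a tiny example. In this sketch two orthogonal, non-unit vectors stand in for the eigenfaces and all numbers are made up; it shows why the division must be element-wise (`./`), since each weight is scaled by the squared norm of its own basis vector.

```matlab
% Projection onto an orthogonal (not orthonormal) basis and reconstruction.
EigenVectors = [2 0;
                0 3;
                0 0];                     % two orthogonal basis vectors in R^3
mean_face = [1; 1; 1];
U = [5; 7; 4];                            % "image" to approximate

NormsEigenVectors = diag(EigenVectors' * EigenVectors);   % squared norms: [4; 9]
W = (EigenVectors' * (U - mean_face)) ./ NormsEigenVectors;
U_approx = EigenVectors * W + mean_face;

disp([U, U_approx]);                      % original next to its approximation
```

The first two components are reproduced exactly, while the third falls back to the mean because no basis vector spans that direction; this is the same mechanism by which details absent from the training faces, such as sunglasses, are smoothed out in the reconstruction.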
- Unmask the Image person30altered2.pgm:
Similar to the previous step, but with a different altered image, this section aims to "unmask"
or recognize the altered image using eigenfaces
- Approximate the Image person31.pgm: