Linear Subspace Techniques

Figure 11.17 The first six eigenfaces.

Figure 11.18 Recognition accuracy with PCA.

where $S_b$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix, defined as:

$$S_w = \sum_{i=1}^{c} P(C_i)\, S_i \qquad (11.18)$$

$$S_b = \sum_{i=1}^{c} P(C_i)\, (\mu_i - \mu)(\mu_i - \mu)^T \qquad (11.19)$$

where $c$ is the number of classes ($c = 15$) and $P(C_i)$ is the probability of class $i$. Here, $P(C_i) = 1/c$, since all classes are equally probable.

Pose-invariant Face Recognition

$S_i$ is the class-dependent scatter matrix, defined as:

$$S_i = \frac{1}{N_i} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T, \qquad i = 1, \ldots, c \qquad (11.20)$$

One method for solving the generalized eigenproblem is to take the inverse of $S_w$ and solve the following eigenproblem for the matrix $S_w^{-1} S_b$:

$$S_w^{-1} S_b W = W \Lambda \qquad (11.21)$$

where $\Lambda$ is the diagonal matrix containing the eigenvalues of $S_w^{-1} S_b$. However, this approach is numerically unstable, as it involves the direct inversion of a very large matrix that is probably close to singular. An alternative method for solving the generalized eigenvalue problem is to simultaneously diagonalize both $S_w$ and $S_b$ [21]:

$$W^T S_w W = I, \qquad W^T S_b W = \Lambda \qquad (11.22)$$

The algorithm can be outlined as follows:

1. Find the eigenvectors of $P_b^T P_b$ corresponding to the largest $K$ nonzero eigenvalues, $V_{c \times K} = [e_1, e_2, \ldots, e_K]$, where $P_b$ is of size $n \times c$ and $S_b = P_b P_b^T$.
2. Deduce the first $K$ most significant eigenvectors and eigenvalues of $S_b$:

$$Y = P_b V \qquad (11.23)$$

$$D_b = Y^T S_b Y = (Y^T P_b)(P_b^T Y) \qquad (11.24)$$

3. Let $Z = Y D_b^{-1/2}$, which projects $S_b$ and $S_w$ onto a subspace spanned by $Z$. This results in:

$$Z^T S_b Z \equiv I \quad \text{and} \quad Z^T S_w Z \qquad (11.25)$$

4. We then diagonalize $Z^T S_w Z$, which is a small matrix of size $K \times K$:

$$U^T (Z^T S_w Z)\, U = \Lambda_w \qquad (11.26)$$

5. We discard the large eigenvalues and keep the smallest $r$ eigenvalues, including the zeros. The corresponding eigenvector matrix $R$ is of size $K \times r$.
6. The overall LDA transformation matrix becomes $W = ZR$.

Notice that we have diagonalized both the numerator and the denominator in the Fisher criterion.
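The six steps of the simultaneous diagonalization can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lda_simultaneous_diagonalization`, the factor `P_w` (chosen so that $S_w = P_w P_w^T$), the tolerance `tol`, and the use of the mean of the class means as the overall mean (which follows from equal priors $1/c$) are all choices made for the example.

```python
import numpy as np

def lda_simultaneous_diagonalization(X, labels, K=None, r=None, tol=1e-10):
    """Sketch of steps 1-6. X is n x N (one sample per column); labels has length N."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    c = classes.size

    # Class means (n x c) and overall mean under equal priors P(C_i) = 1/c (assumption).
    means = np.stack([X[:, labels == k].mean(axis=1) for k in classes], axis=1)
    mu = means.mean(axis=1, keepdims=True)

    # Factors with S_b = P_b P_b^T and S_w = P_w P_w^T; the n x n matrices are never formed.
    P_b = (means - mu) / np.sqrt(c)
    P_w = np.hstack([
        (X[:, labels == k] - means[:, [j]]) / np.sqrt(c * np.sum(labels == k))
        for j, k in enumerate(classes)
    ])

    # Step 1: eigenvectors of the small c x c matrix P_b^T P_b, largest nonzero eigenvalues first.
    lam, V = np.linalg.eigh(P_b.T @ P_b)
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    keep = lam > tol * lam[0]
    if K is not None:
        keep[K:] = False
    V = V[:, keep]

    # Step 2: Y = P_b V and the diagonal of D_b = (Y^T P_b)(P_b^T Y).
    Y = P_b @ V
    D_b = np.sum((Y.T @ P_b) ** 2, axis=1)

    # Step 3: Z = Y D_b^{-1/2}, so that Z^T S_b Z = I.
    Z = Y / np.sqrt(D_b)

    # Step 4: diagonalize the small matrix Z^T S_w Z (eigh returns ascending eigenvalues).
    M = Z.T @ P_w
    lam_w, U = np.linalg.eigh(M @ M.T)

    # Step 5: keep the r smallest eigenvalues and their eigenvectors.
    r = lam_w.size if r is None else r
    R = U[:, :r]

    # Step 6: overall transformation W = Z R.
    return Z @ R, lam_w[:r]
```

Working with the factors `P_b` and `P_w` means that only a $c \times c$ and a $K \times K$ eigenproblem are solved, which is the point of the method when $n$ (the number of pixels) is large. With this construction $W^T S_b W$ is the identity and $W^T S_w W$ is diagonal.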
4.2.1 Experimental Results

We have also performed a leave-one-out experiment on the Yale faces database [20]. The first six Fisher faces are shown in Figure 11.19. The eigenvalue spectra of the between-class and within-class covariance matrices are shown in Figure 11.20. We notice that 14 Fisher faces are enough to reach the maximum recognition accuracy of 93.33 %. Recognition accuracy as a function of the number of Fisher faces is shown in Figure 11.21.

Figure 11.19 The first six LDA basis vectors.

Figure 11.20 Eigenvalue spectrum of between-class and within-class covariance matrices. (Two plots of eigenvalue magnitude against the number of eigenvalues, one for the between-class covariance and one for the within-class covariance.)

Figure 11.21 Recognition accuracy with LDA. (Plot of percentage recognition against the number of Fisher faces.)

4.3 Independent Component Analysis

Since PCA only considers second-order statistics, it lacks information on the complete joint probability density function and higher-order statistics. Independent Component Analysis (ICA) accounts for such information and is used to identify independent sources from their linear combination. In face recognition, ICA is used to provide an independent, rather than an uncorrelated, image decomposition. In deriving the ICA algorithm, two main assumptions are made:

1. The input components are independent.
2. The input components are non-Gaussian.

Non-Gaussianity, in particular, is measured using the kurtosis function. In addition to the above assumptions, ICA has three main limitations:

1. The variances of the independent components can only be determined up to a scaling.
2. The order of the independent components cannot be determined (it is only determined up to a permutation).
3. The number of separated components cannot be larger than the number of observation signals.

The four main stages of the ICA algorithm are: preprocessing, whitening, rotation and normalization. The preprocessing stage consists of centering the data matrix $X$ by removing the mean vector from each of its column vectors. The whitening stage consists of linearly transforming the mean-removed input vector $\tilde{x}_i$ so that a new vector is obtained whose components are uncorrelated and have unit variance. The rotation stage is the heart of ICA. This stage performs source separation to find the independent components (basis face vectors) by minimizing the mutual information. A popular approach for estimating the ICA model is maximum likelihood estimation, which is connected to the infomax principle and the concept of minimizing the mutual information.

A fast ICA implementation has been proposed in [22]. FastICA is based on a fixed-point iteration scheme for finding a maximum of the non-Gaussianity of $W^T X$. Starting with a certain activation function $g$, such as:

$$g(u) = \tanh(a_1 u) \quad \text{or} \quad g(u) = u\, e^{-u^2/2} \quad \text{or} \quad g(u) = u^3 \qquad (11.27)$$

the basic iteration in FastICA is as follows:

1. Choose an initial (random) transformation $W$.
2. Let $W^+ = W + \eta\, [I + g(y)\, y^T]\, W$, where $\eta$ is the learning rate and $y = Wx$.
3. Normalize $W^+$ and repeat until convergence.

The last stage in implementing ICA is the normalization operation, which derives unique independent components in terms of orientation, unit norm and order of projections.

Figure 11.22 The first six ICA basis vectors.

Figure 11.23 Recognition accuracy with ICA. (Plot of percentage recognition against the number of ICA faces.)

4.3.1 Experimental Results

We have performed a leave-one-out experiment on the Yale faces database [20], the same experiment as performed for LDA and PCA.
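For readers who want to set up an experiment of this kind, the four ICA stages of Section 4.3 (centering, whitening, rotation and normalization) might be sketched as follows. This is an illustration under assumptions, not the implementation used in the chapter: the rotation stage uses the standard symmetric fixed-point FastICA update with $g(u) = \tanh(a_1 u)$ rather than the learning-rate form of the iteration, the normalization stage simply orders the components by the magnitude of their excess kurtosis, and the names `whiten`, `sym_decorrelate` and `fastica` are hypothetical.

```python
import numpy as np

def whiten(X):
    """Stages 1-2: center X (features x samples) and whiten it to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    keep = d > 1e-12 * d.max()                 # drop numerically null directions
    V = (E[:, keep] / np.sqrt(d[keep])).T      # whitening matrix D^{-1/2} E^T
    return V @ Xc, V

def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^{-1/2} W."""
    s, U = np.linalg.eigh(W @ W.T)
    return (U / np.sqrt(s)) @ U.T @ W

def fastica(X, a1=1.0, n_iter=200, tol=1e-6, seed=0):
    """Return (components, unmixing matrix acting on the centered data)."""
    Z, V = whiten(X)
    m, N = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((m, m)))   # random initial rotation
    for _ in range(n_iter):
        # Stage 3 (rotation): fixed-point update w+ = E{z g(w^T z)} - E{g'(w^T z)} w per row.
        Y = W @ Z
        G = np.tanh(a1 * Y)
        G_prime = a1 * (1.0 - G ** 2)
        W_new = sym_decorrelate(G @ Z.T / N - G_prime.mean(axis=1)[:, None] * W)
        # Converged when every row is (up to sign) unchanged.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    Y = W @ Z
    # Stage 4 (normalization, assumed form): order by |excess kurtosis|; rows already have unit variance.
    order = np.argsort(-np.abs(np.mean(Y ** 4, axis=1) - 3.0))
    return Y[order], (W @ V)[order]
```

In a face recognition setting, each row of the data matrix passed to `fastica` would be one pixel position (or one PCA coefficient) and each column one image; which of the two arrangements the chapter uses is not stated here, so the choice is left to the reader.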
The first six ICA basis vectors are shown in Figure 11.22 and the curve for recognition accuracy is shown in Figure 11.23.

5. A Pose-invariant System for Face Recognition

The face recognition problem has been studied for more than two decades. In most systems, however, the input image is assumed to be a fixed-size mug shot with a clear background. A robust face recognition system should nevertheless allow flexibility in pose, lighting and expression. Facial images are high-dimensional data, and facial features have similar geometrical configurations. As such, under general conditions where pose, lighting and expression vary, the face recognition task becomes more difficult. Reducing that variability through a preliminary classification step enhances the performance of face recognition systems.

Pose variation is a nontrivial problem to solve, as it introduces nonlinear transformations (Figure 11.24). A number of techniques have been proposed to overcome the problem of varying pose in face recognition. One of these is the application of growing Gaussian Mixture Models (GMMs) [23], which are applied after reducing the data dimensions using PCA. The problem is that, since GMM is a probabilistic approach, it requires a sufficient number of training faces, which are usually not available (for example, 50 faces to fit five GMMs). One alternative is to use a three-dimensional model of the face [15]. However, 3D models are expensive and difficult to develop.

Figure 11.24 A subject in different poses.

The view-based eigenspaces of Moghaddam and Pentland [3] have also shown that separate eigenspaces perform better than a combined eigenspace of the pose-varying images. This approach essentially consists of several discrete systems (multiple observers). We extend this method and apply it using linear discriminant analysis. In our experiments, we will show that view-based LDA performs better than view-based PCA.
We also demonstrate that LDA can be used for pose estimation.

5.1 The Proposed Algorithm

We propose here a new system that is invariant to pose. The system consists of two stages. In the first stage, the pose is estimated. In the second stage, a view-specific subspace analysis is used for recognition. The block diagram is shown in Figure 11.25. To train the system, we first organize the images from the database into three different views and find the subspace transformation for each of these views. In the block diagram, we show the sizes of the matrices at the different stages to give a sense of the dimensionality reduction. The matrices XL, XR and XF are of size 60 × 2128 (three images per person, 20 people, 2128 pixels per image). WL, WR and WF are the transformation matrices, each containing K basis vectors (where K = 20). YL, YR and YF are the transformed matrices, called template matrices, each of size 60 × K.

5.2 Pose Estimation using LDA

Pose estimation is composed of a learning stage and an estimation stage. In this work, we consider three possible classes for the pose: the left pose at 45° ($C_1$), the front pose ($C_2$) and the right pose at 45° ($C_3$). Some authors have considered five and seven possible rotation angles, but our experiments have shown that the three angles mentioned above are enough to capture the main features of the face.

Each of the faces in the training set is seen as an observation vector $x_i$ of a certain random vector $x$. These are denoted as $x_1, x_2, \ldots, x_N$. Each of these is a face vector of dimension $n$, concatenated from a $p \times p$ facial image ($n$ is the number of pixels in the facial image; for the faces in the UMIST database, $n = 2128$).
An estimate of the expected value of $x$ can be obtained using the average:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i \qquad (11.28)$$

Figure 11.25 Block diagram of the pose-invariant subspace system. (a) View-specific subspace training; (b) pose estimation and matching. (Diagram contents: in (a), the left (45°), right (45°) and front databases XL, XR and XF, each 60 × 2128, lead to the subspaces WL, WR and WF, each 20 × 2128, and to the templates YL, YR and YF, each 60 × 20. In (b), an image from the test database passes through pose estimation, with outputs L, R or F, and then matching.)

In this training set, we have $N$ observation vectors $x_1, x_2, \ldots, x_N$, $N_1$ of which belong to class $C_1$, $N_2$ to class $C_2$, and $N_3$ to class $C_3$. These classes represent the left pose at 45°, the front pose and the right pose at 45°, respectively. After subtracting the mean vector from each of the image vectors, we combine the vectors, side by side, to create a data matrix of size $n \times N$:

$$X = [x_1, x_2, \ldots, x_N] \qquad (11.29)$$

Using linear discriminant analysis, we desire to find a linear transformation from the original image vectors to the reduced-dimension feature vectors as:

$$Y = W^T X \qquad (11.30)$$

where $Y$ is the $d \times N$ feature vector matrix, $d$ is the dimension of the feature vectors and $W$ is the transformation matrix. Note that $d \ll n$. As mentioned in Section 4, linear discriminant analysis (LDA) attempts to reduce the dimension of the data and maximize the difference between classes. To find the transformation $W$, a generalized eigenproblem is solved:

$$S_b W = S_w W \Lambda \qquad (11.31)$$

where $S_b$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix. Using the transformation $W$, each of the images in the database is transformed into a feature vector of dimension $d$. To estimate the pose of a given image, the image is first projected over the columns of $W$ to obtain a feature vector $z$.
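The centering and projection steps of Equations (11.28) to (11.30) can be written in a few lines of NumPy. The sketch below only shows the bookkeeping of shapes: the helper names `fit_projection` and `project_face` are hypothetical, the sizes `N = 30` and `d = 2` are made up for the demonstration, and `W` is a random orthonormal stand-in for the LDA solution of Equation (11.31), not a trained transformation.

```python
import numpy as np

def fit_projection(faces, W):
    """faces: n x N raw face vectors (one per column); W: n x d transformation matrix."""
    mu = faces.mean(axis=1, keepdims=True)   # sample mean, Eq. (11.28)
    X = faces - mu                           # centered data matrix, Eq. (11.29)
    Y = W.T @ X                              # d x N feature matrix, Eq. (11.30)
    return mu, Y

def project_face(face, mu, W):
    """Project one raw face vector of length n to a d x 1 feature vector z."""
    return W.T @ (np.asarray(face).reshape(-1, 1) - mu)

# Toy usage with made-up data; n = 2128 matches the UMIST image size quoted in the text.
rng = np.random.default_rng(0)
n, N, d = 2128, 30, 2
faces = rng.random((n, N))
W, _ = np.linalg.qr(rng.standard_normal((n, d)))   # hypothetical stand-in for the LDA W
mu, Y = fit_projection(faces, W)
z = project_face(faces[:, 0], mu, W)
print(Y.shape, z.shape)  # (2, 30) (2, 1)
```

Note that the same mean `mu` estimated from the training set is subtracted from a test image before projection, so that training and test feature vectors live in the same coordinate system.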
The Euclidean distance is then used to compare the test feature vector to each of the feature vectors from the database. The class of the image corresponding to the minimum distance is then selected as the pose of the test image.

5.3 Experimental Results for Pose Estimation using LDA and PCA

The experiments were carried out on the UMIST database, which contains 20 people and a total of 564 faces in varying poses. Our aim was to identify whether a subject was in the left, right or front pose, so that we could use the appropriate view-based LDA. We performed pose estimation using both techniques, LDA and PCA. The experiments were carried out using three poses for each of the 20 people. We trained the system using ten people and tested it using the remaining ten people. The mean images from the three different poses are shown in Figure 11.26.

Figure 11.26 Mean images of faces in front, left, and right poses.

Similarly, we trained the 'pose estimation using PCA' algorithm, but here we did not use any class information. Hence, we used the training images in three different poses: left 45 degrees, right 45 degrees and front. The results are shown in Table 11.3. We noticed that LDA outperformed PCA in pose estimation. The reason is the ability of LDA to separate classes, whereas PCA, which uses no class information, only finds the directions of greatest variance in the features. As mentioned above, LDA maximizes the ratio of between-class variance to within-class variance.

Table 11.3 Experimental results of pose estimation.

No. of test images    PCA         LDA
90                    90 %        100 %
180                   88.333 %    98.888 %

5.4 View-specific Subspace Decomposition

Following the LDA procedure discussed above, we can derive an LDA transformation for each of the views. As such, using the images from each of the views and for all individuals, we obtained three transformation matrices: WL, WR and WF for the left, right and front views, respectively.

5.5 Experiments on the Pose-invariant Face Recognition System

We carried out our experiments on view-based LDA and compared the results to other algorithms. In the first experiment, we compared view-based LDA (VLDA) to traditional LDA (TLDA) [21]. The Fisher faces for the front, left and right poses are displayed in Figures 11.27, 11.28 and 11.29, respectively. Figure 11.30 shows the Fisher faces obtained using a unique LDA (TLDA). The performance results are presented in Figure 11.31. We noticed an improvement of 7 % in recognition accuracy. The reason for this improvement is that we managed to reduce within-class correlation by training different view-specific LDAs. This resulted in an improved Fisher criterion. For the same reason, we see that VLDA performs better than TPCA [19] (Figure 11.32).

Figure 11.27 Fisher faces trained for front faces (View-based LDA).

Figure 11.28 Fisher faces trained for left faces (View-based LDA).

Figure 11.29 Fisher faces trained for right faces (View-based LDA).

Experiments were also carried out on view-based PCA (VPCA) [3], and the results were compared to those of traditional PCA [19]. We found that there is not much improvement in the results and the recognition accuracy remains the same as we increase the number of eigenfaces (Figure 11.33). The reason for this could be that PCA relies only on the covariance matrix of the data, so training view-specific PCAs does not help much in improving the separation. For all experiments, we see that the proposed view-based LDA performs better than traditional LDA and traditional PCA. Since the performance of LDA gets better if we have larger databases, we expect

Figure 11.30 Fisher faces trained for traditional LDA.
Figure 11.31 View-based LDA vs. traditional LDA. (Plot of percentage recognition against the number of Fisher faces for traditional LDA and view-based LDA.)

[...] comprehensive multimodal biometric recognition systems.

References

[1] International Biometric Group, Market report 2000–2005, http://www.biometricgroup.com/, September 2001.
[2] Jain, L. C., Halici, U., Hayashi, I., Lee, S. B. and Tsutsui, S. Intelligent Biometric Techniques in Fingerprint and Face Recognition. CRC Press, 1990.
[3] Moghaddam, B. and Pentland, A. “Face recognition using view-based and modular eigenspaces,” ...
[14] Brunelli, R. and Poggio, T. “Face Recognition: Features versus Templates,” IEEE Transactions on PAMI, 15(10), pp. 1042–1052, 1993.
[15] Wiskott, N. K. L., Fellous, J. M. and von der Malsburg, C. “Face recognition by elastic bunch graph matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), pp. 775–779, 1997.
[16] Taylor, C. J., Edwards, G. J. and Cootes, T. F. “Face recognition using ...

... physical growth

Computer-Aided Intelligent Recognition Techniques and Applications. Edited by M. Sarfraz. © 2005 John Wiley & Sons, Ltd

Recognition of Human Faces by Humanoid Robots

... stage. Then, what about mental growth? Mental growth of humanoid robots is an area with intense challenges and research value for investigation. Babies can adaptively learn and acquire knowledge by various means, and gain ‘intelligence’ ...

... face recognition on systems,” in Fourth International Conference on Automatic Face and Gesture Recognition, 2000.
[28] Schrater, P. R. “Bayesian data fusion and credit assignment in vision and fMRI data analysis,” Computational Image Proceedings of SPIE, 5016, pp. 24–35, 2003.
[29] Moghaddam, B. “Principal manifolds and probabilistic subspaces for visual recognition,” IEEE Transactions on Pattern Analysis and ...
Table 11.4 Summary of results.

Algorithm          Time (in s)    Max recognition accuracy    Memory usage
View-based LDA     183 650        88.333                      514 429
Traditional LDA    288 719        83.333                      514 429
View-based PCA     90 4850        84.444                      320 200
Traditional PCA    140 9060       85                          320 200

6 Concluding Remarks

In this chapter, we have presented an overview of biometric techniques and focused in particular on face recognition. ...

... represented as follows:

1. R, G and B values in RGB primary color space are converted into X, Y and Z tristimulus values defined by CIE in 1931:

$$\begin{aligned} X &= 2.7690R + 1.7518G + 1.1300B \\ Y &= 1.0000R + 4.5907G + 0.0601B \\ Z &= 0.0000R + 0.0565G + 5.5943B \end{aligned} \qquad (12.1)$$

2. $L^* a^* b^*$ values are obtained by a cube-root transformation of the X, Y and Z values:

$$\begin{aligned} L^* &= 116\,(Y/Y_0)^{1/3} - 16 \\ a^* &= 500\,[(X/X_0)^{1/3} - (Y/Y_0)^{1/3}] \\ b^* &= 200\,[(Y/Y_0)^{1/3} - (Z/Z_0)^{1/3}] \end{aligned}$$

...

... http://www.ifp.uiuc.edu/speech/
[25] “Dna,” http://library.thinkquest.org/16985/dnamain.htm
[26] Hong, L. and Jain, A. K. “Integrating faces and fingerprints for personal identification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, pp. 1295–1307, 1998.
[27] Akamatsu, S., Vonder, C., Isburg, M. and Okada, M. K. “Analysis and synthesis of pose variations of human faces by a linear pcmap model and its application ...

... Numerous techniques have been proposed for face recognition; however, all have advantages and disadvantages. The choice of a certain technique should be based on the specific requirements for the task at hand and the application of interest.
• Although numerous algorithms exist, robust face recognition is still difficult.
• A major step in developing face recognition algorithms is testing and benchmarking. Standard ...
... 24, pp. 1–10, 1991.
[34] Hyvärinen, A. and Oja, E. Independent component analysis: A tutorial, Helsinki University of Technology, Laboratory of Computer and Information Science, 1999.
[35] Huber, P. “Projection pursuit,” The Annals of Statistics, 13(2), pp. 435–475, 1985.
[36] Jones, M. and Sibson, R. “What is projection pursuit?,” Journal of the Royal Statistical Society, ser. A(150), pp. 1–36, 1987.
[37] Hyvärinen, ...
... Whitaker, C. J., Shipp, C. A. and Duin, R. P. W. “Is Independence Good for Combining Classifiers?,” International Conference on Pattern Recognition (ICPR), pp. 168–171, 2000.
[12] Jain, A. K. and Ross, A. “Information fusion in biometrics,” Pattern Recognition Letters, 24, pp. 2115–2125, 2003.
[13] Lu, X. Image analysis for face recognition, Department of Computer Science and Engineering, Michigan ...