Face tracking and indexing in video sequences


Face Tracking and Indexing in Video Sequences

Student: Ngoc-Trung Tran
Supervisors: M. Fakhreddine Ababsa, M. Maurice Charbit
Reporters: M. Serge Miguet, M. Farid Melgani
Examiners: Mme Bernadette Dorizzi, M. Malik Mallem, M. Sami Romdhani, M. Kevin Bailly

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in the LTCI Department, Telecom ParisTech, July 2015.

Abstract

Face tracking in video sequences is an important problem in computer vision because of its many applications in domains such as video surveillance, human-computer interaction and biometrics. Despite many recent advances in this field, it remains a challenging topic, in particular when the degrees of freedom (DOF), i.e. the three-dimensional (3D) translation and rotation, or the fiducial points (which capture the facial animation) need to be estimated at the same time. The challenge comes mainly from the following factors: illumination variations, wide head rotations, expressions, occlusions, cluttered backgrounds, etc.

In this thesis, contributions are made to address two of these major difficulties: expression and wide head rotation. We aim to build a 3D tracking framework for monocular cameras that estimates the DOF and the facial animation accurately while remaining robust to wide head rotations, even up to profile views. Our method adopts a 3D face model, represented as a set of 3D vertices, to compute continuous DOF values over video sequences. Tracking wide 3D head poses raises significant difficulties in data collection: the pose and/or a large number of fiducial points are generally required in order to build statistical shape and appearance models. Pose ground truth is expensive to collect because it requires specific devices or sensors, while the manual annotation of correspondences on large databases is tedious and error-prone. A further problem of correspondence annotation is the difficulty of locating points hidden by self-occlusion on profile faces. As a result, general approaches that tackle out-of-plane rotations usually have to use view-based or adaptive models. The problem matters because the number of fiducial points differs between views, for example profile and frontal, which causes a tracking gap between them when a single model is used.

The first main contribution of this thesis is the use of a large synthetic dataset to overcome this problem. Leveraging such data, which covers the full range of head poses, together with the on-line information correlating consecutive frames, we propose to combine the two in order to make the tracking more robust. Local features are adopted in this thesis to reduce the high variation of facial appearance when learning on a large dataset.

To track the facial animation and the out-of-plane rotations simultaneously, and to move towards in-the-wild operation, we propose to combine synthetic and real datasets to train a cascaded-regression tracking model. From the perspective of cascaded regression, the local descriptor is crucial for high performance. We therefore use feature learning to obtain representations of local patches that are more discriminative than hand-crafted descriptors. Furthermore, the proposed method modifies the later stages of the traditional cascaded approach in order to boost the quality of the detected fiducial points, or correspondences.

Lastly, we propose the Unconstrained 3D Pose Tracking (U3PT) dataset, our own recordings, for the evaluation of in-the-wild face tracking by the community. The dataset was captured from ten subjects (five videos per subject) in an office environment with a cluttered background. The subjects move comfortably in front of the camera and rotate their heads in three directions (Yaw, Pitch and Roll), even up to 90 degrees of Yaw, with some occlusions and expressions. In addition, these videos were recorded under different lighting conditions. With the
available ground truth of the 3D pose, computed using an infrared camera system, the robustness and accuracy of a tracking framework that aims to work in-the-wild can be evaluated precisely.

Keywords: head pose tracking, pose estimation, 3D face tracking, in-the-wild face tracking, Bayesian tracking, face alignment, cascaded regression, synthetic data

Publications

During this study, the following papers were published or submitted:

- U3PT: A New Dataset for Unconstrained 3D Pose Tracking Evaluation. Ngoc-Trung Tran, Fakhreddine Ababsa and Maurice Charbit. International Conference on Computer Analysis of Images and Patterns (CAIP), 2015.
- Cascaded Regression of Learning Features for Face Alignment. Ngoc-Trung Tran, Fakhreddine Ababsa, Sarra Ben Fredj and Maurice Charbit. Advanced Concepts for Intelligent Vision Systems (ACIVS), 2015.
- Towards Pose-Free Tracking of Non-Rigid Face using Synthetic Data. Ngoc-Trung Tran, Fakhreddine Ababsa and Maurice Charbit. International Conference on Pattern Recognition Applications and Methods (ICPRAM), 2015.
- A Robust Framework for Tracking Simultaneously Face Pose and Animation using Synthesized Faces. Ngoc-Trung Tran, Fakhreddine Ababsa and Maurice Charbit. Pattern Recognition Letters, 2014.
- 3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach. Ngoc-Trung Tran, Fakhreddine Ababsa, Jacques Feldmar, Maurice Charbit, Dijana Petrovska-Delacretaz and Gerard Chollet. International Symposium on Visual Computing (ISVC), 2013.
- 3D Face Pose Tracking from Monocular Camera via Sparse Representation of Synthesized Faces. Ngoc-Trung Tran, Jacques Feldmar, Maurice Charbit, Dijana Petrovska-Delacretaz and Gerard Chollet. International Conference on Computer Vision Theory and Applications (VISAPP), 2013.
- Towards In-the-wild 3D Head Tracking. Ngoc-Trung Tran, Fakhreddine Ababsa and Maurice Charbit. Journal of Machine Vision and Applications (MVA), 2015 (submitted).

Abbreviations

2D Two-Dimensional
3D Three-Dimensional
AAM Active Appearance Model
ASM Active Shape Model
CLM Constrained Local Model
DOF Degrees of Freedom
GMM Gaussian Mixture Model
IC Inverse Compositional
IRLS Iteratively Re-weighted Least Squares
LDM Linear Deformable Model
KF Kalman Filter
KLT Kanade-Lucas-Tomasi
ML Maximum Likelihood
MAE Mean Average Error
PCA Principal Component Analysis
RANSAC RANdom SAmple Consensus
RGB Red, Green and Blue
RMS Root Mean Square
POSIT Pose from Orthography and Scaling with ITerations
SDM Supervised Descent Method
SfM Structure from Motion
SSD Sum of Squared Differences
SVD Singular Value Decomposition
SVM Support Vector Machine
3DMM 3D Morphable Model

Contents

Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Challenges
1.3 Objectives
1.4 Overview
1.5 Notation

2 State-of-the-art
2.1 Face Models
2.1.1 Linear Deformable Models (LDMs)
2.1.2 Rigid Models
2.2 Tracking Approaches
2.2.1 Optical Flow Regularization
2.2.2 Manifold Embedding
2.2.3 Template Models
2.2.4 Local Matching
2.2.5 Discriminative
2.2.6 Regression
2.2.7 Hybrid
2.3 Multi-view Tracking
2.3.1 On-line Adaptive Models
2.3.2 View-based Models
2.3.3 3D Synthetic Models
2.4 Discussion
2.4.1 Off-line Learning vs. On-line Adaptation
2.4.2 Real vs. Synthetic Data
2.4.3 Global vs. Local Features
2.5 Databases

3 A Baseline Framework for 3D Face Tracking
3.1 The General Bayesian Tracking Formulation
3.2 Face Tracking using Covariance Matrices of Synthesized Faces
3.2.1 Face Representation
3.2.2 Baseline Framework
3.2.2.1 Data Generation
3.2.2.2 Training
3.2.2.3 Tracking
3.2.3 Experiments
3.3 Face Tracking using Feature Reconstruction through Sparse Representation
3.3.1 Reconstructed-Error Function
3.3.2 Codebook
3.3.3 Experiments
3.4 Face Tracking using Adaptive Local Models
3.4.1 Adaptive Local Models
3.4.2 Experiments
3.4.2.1 Pose Accuracy
3.4.2.2 Landmark Precision
3.4.2.3 Out-of-plane Tracking
3.5 Conclusion
4 Pose-Free Tracking of Non-rigid Face in Controlled Environment
4.1 Introduction
4.2 Wide Baseline Matching as Initialization
4.3 Large-rotation Tracking via Pose-wise Appearance Models
4.3.1 View-based Appearance Model
4.3.2 Matching Strategy by Keyframes
4.3.3 Rigid and Non-rigid Estimation using View-based Appearance Model
4.3.4 Experimental Results
4.4 Flexible Tracking via Pose-wise Classifiers
4.4.1 Templates for Matching using Support Vector Machine
4.4.2 Large Off-line Synthetic Dataset
4.4.3 Local Appearance Models
4.4.4 Fitting via Pose-wise Classifiers
4.4.5 Experimental Results
4.4.5.1 Pose Accuracy
4.4.5.2 Landmark Precision
4.4.5.3 Out-of-plane Tracking
4.5 Conclusion

5 Towards In-the-wild 3D Face Tracking
5.1 Cascaded Regression of Learning Features for Landmark Detection
5.1.1 Introduction
5.1.2 Cascaded Regression
5.1.3 Local Learning Feature
5.1.4 Coarse-to-Fine Correlative Cascaded Regression
5.1.5 Experimental Results
5.1.5.1 Learning Features on Inner vs. Boundary Points
5.1.5.2 Coarse-to-Fine Correlative Regression (CFCR)
5.1.6 Conclusion
5.2 In-the-wild 3D Face Tracking
5.2.1 Introduction
5.2.2 Real and Synthetic Data for In-the-wild Tracking
5.2.3 Unconstrained 3D Pose Tracking Dataset
5.2.3.1 Recording System
5.2.3.2 Calibration
5.2.3.3 Dataset
5.2.4 Experiments
5.2.4.1 Public Datasets
5.2.4.2 Datasets for Visualization
5.2.4.3 Our Dataset: Unconstrained Pose Tracking Dataset
5.2.5 Conclusion

6 Conclusion and Future Works

A Algorithms in Use
A.1 Nelder-Mead Algorithm
A.2 POSIT
A.3 RANSAC
A.4 Local Binary Patterns
A.5 3D Morphable Model

Bibliography

List of Figures

1.1 Three head orientations: Yaw, Pitch and Roll
1.2 Some challenging conditions of 3D face tracking: illumination, head rotation, cluttered background in one sample video sequence
2.1 Some 3D face models, from left to right: AAM, Cylinder, Mesh and 3DMM
3.1 The frontal and profile views of the Candide-3 model
3.2 One example of the perspective projection in our method to obtain 2D projected landmarks from the current model given the parameters
3.3 The diagram of the baseline framework for 3D face tracking: data generation, training and tracking stages
3.4 Landmark initialization, 3D model fitting using POSIT and the rendering of synthesized images
3.5 The landmarks used in the framework, with examples of means and covariance matrices at the eye and mouth corners
3.6 The visualization of the first two and three components of the descriptor projection using PCA, learned from synthetic images of some feature points
3.7 Some examples of BUFT videos at different views
3.8 The framework using reconstructed features via sparse representation, with two modifications from the baseline
3.9 The sparse representation to estimate the coefficients of the positive (green) and negative (red) local patches of one eyebrow corner
3.10 The construction of positive and negative patches of eye corners
3.11 The framework using adaptive local models based on the baseline
3.12 A sample video of the BUFT dataset using the local adaptive model
3.13 Some tracking examples on the BUFT dataset using the local adaptive model (green) and FaceTracker (yellow)
3.14 The visualization of our (Yaw, Pitch, Roll) estimation (blue) and the ground truth (red) on the video jam7.avi using the local adaptive model
3.15 The visualization of our (Yaw, Pitch, Roll) estimation (blue) and the ground truth (red) on the video vam2.avi using the local adaptive model
3.16 The RMS error of 12 selected points for tracking in our framework (red) compared to [97] (blue). The vertical axis is the RMS error (in pixels) and the horizontal axis is the frame number
4.1 The basic idea of wide baseline matching
4.2 Computing the 3D points Lk as 3D intersection points
4.3 From left to right: the 2D SIFT keypoints of the keyframe, SIFT matching, and outliers removed by RANSAC
4.4 The pipeline of the Pose-wise Appearance Model based framework
4.5 The structure of the mapping table, with the three orientations as key and the local descriptors as content
4.6 The cross-validation to select two parameters: the number of nearest neighbors Nq and the threshold kHub
4.7 The validation of a) first row: the number of nearest poses, b) second row: kHub for |Yaw| <= 30 degrees, and c) third row: kHub for |Yaw| > 30 degrees. The vertical axis is the RMS error
4.8 Tracking examples on VidTIMIT and Honda/UCSD
4.9 How positive (blue) and negative (red) samples are picked, and the response map computed at the mouth corner after training
4.10 Some weight matrices, or templates, of local patches T(x) in our implementation
4.11 From left to right of the training process: 143 frontal images, landmark annotation and 3D model alignment, synthesized image rendering, and pose-wise SVM training
4.12 a) The Candide-3 model with the facial points used in our method. b) The computation of the response map at the mouth corner using three descriptors via SVM templates
4.13 The pipeline of the tracking process from frame t to t + 1
4.14 The RMS error of our framework (red curve) and FaceTracker [97] (blue curve). The vertical axis is the RMS error (in pixels) and the horizontal axis is the frame number
4.15 Our tracking method on some sample videos of VidTIMIT and Honda/UCSD
5.1 Some results of face alignment on some challenging images of the 300-W dataset
5.2 The visualization of weight matrices when learning local descriptors of eye and mouth corners using RBM
5.3 The overview of our approach. In training, we learn sequentially the RBM models and the coarse-to-fine regression (correlative and global regression)
5.4 CED evaluation of local correlative regression on specific landmarks; i -> j means using the i-th landmark to detect the j-th landmark
5.5 The cross-validation of the feature size, the number of hidden units, the number of random
samples and the number of regressors
5.6 The evaluation of the effect of inner and boundary points on the cross-validation set of the 300-W dataset
5.7 Some results on the 300-W dataset
5.8 The annotation of 51 landmarks in synthetic images and their automatic rendering in different views
5.9 Examples of landmark annotation of Multi-PIE at different views
5.10 Examples of automatic rendering of synthetic images
5.11 Landmark detection on some large rotations using synthetic training data

Algorithms in Use

A.2 POSIT

POSIT combines i) POS (Pose from Orthography and Scaling), which computes approximate rotation and translation matrices, and ii) POSIT (POS with ITerations), which uses iterations to compute the projection of the feature points more accurately. POSIT is easy to implement and converges to an accurate pose in a few iterations. It does not require an initial guess, and it can use many feature points in order to be less sensitive to measurement errors and image noise. The details of the algorithm can be found in [33]; here we present only its key idea.

Fig. A.1 shows the geometric setting used to build the algorithm: O is the center of projection and f is the focal length. Ox, Oy and Oz point along the rows, the columns and the optical axis, respectively, and their unit vectors are called i, j and k. Mi is a feature point of the object in the field of view of the camera. G is the image plane and K is the parallel plane crossing the object center M0. The coordinates (Ui, Vi, Wi) of the point Mi in the object coordinate system are known. The image of the point Mi is named mi, and its coordinates (xi, yi) are known. The goal is to find the rotation matrix R and the translation vector T of the object, which give its coordinates in the camera coordinate system. T is equal to (Z0/f) Om0, and R can be written:

        | iu  iv  iw |
    R = | ju  jv  jw |        (A.1)
        | ku  kv  kw |

where iu, iv and iw are the coordinates of i in the coordinate system of the object, and similarly for j and k. We need only i and j to compute the rotation
because k can be obtained as the cross-product i x j. Therefore, the object pose is defined by finding i, j and Z0, and the equations of the problem can be written:

    M0Mi . I = xi (1 + ei) - x0        (A.2)
    M0Mi . J = yi (1 + ei) - y0        (A.3)

where I = (f/Z0) i, J = (f/Z0) j and ei = (1/Z0) M0Mi . k. The algorithm is a two-step method run for a few iterations: given the ei, compute I and J; then, with I and J fixed, re-compute the ei. Notice that i and j are obtained by normalizing I and J.

Figure A.1: The pinhole camera model of POSIT (courtesy of [33]).

A.3 RANSAC

The RANdom SAmple Consensus (RANSAC) algorithm was first proposed by [39] and was developed within the computer vision community. It is a general approach to parameter estimation designed to cope with a large proportion of outliers in the input data. It is a re-sampling technique that uses a small number of observations to build candidate solutions from which the model parameters are estimated. In our context, we use RANSAC to remove outliers among matched points and to estimate the homography matrix between two frames. The algorithm is presented as follows:

Algorithm: RANSAC for homography estimation and inlier selection
  INPUT: two sets ({pi}, {p'i}) of matching points between two frames
  OUTPUT: the homography matrix H and the well-matched points
  1: repeat
  2:    Select four pairs of points at random from the two sets
  3:    Compute the homography H from the selected pairs using SVD
  4:    Determine the inliers among all points with a predefined tolerance e: SSD(p'i, H pi) < e
  5:    Keep the largest set of inliers
  6:    Re-compute the least-squares estimate of H on all of the inliers
  7: until the number of iterations exceeds a predefined limit

Figure A.2: How to compute the LBP of a pixel (courtesy of [90]).

A.4 Local Binary Patterns

Local Binary Patterns (LBP) is a very simple yet efficient descriptor that is well known in computer vision. It is robust to gray-scale variations and invariant to monotonic transformations of gray scale.
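For illustration, the traditional 3x3 LBP computation can be sketched as follows; the clockwise neighbor ordering and the toy pixel values are assumptions for the example, not fixed by the text:

```python
import numpy as np

def lbp_3x3(patch):
    """Traditional LBP of the center pixel of a 3x3 patch:
    threshold the 8 neighbors against the center (s(x) = 1 iff x >= 0)
    and pack the results into an 8-bit code with weights 2^p."""
    center = patch[1, 1]
    # Clockwise neighbor order starting at the top-left corner
    # (one common convention; any fixed ordering works).
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for p, g in enumerate(neighbors):
        if g >= center:        # s(g_p - g_c) = 1
            code |= 1 << p     # add weight 2^p
    return code

# Toy example (hypothetical values): the result is an integer in [0, 255].
patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_3x3(patch))  # 241 for this ordering
```

Sliding this function over every interior pixel yields an LBP image of the same size as the input (excluding the one-pixel boundary), which is the property exploited in our framework.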
There are many variations of LBP, but we propose to use only the traditional one. The idea of LBP is to encode the relations between the center pixel of a 3x3 neighborhood and its neighbors into a binary pattern. The value of each pixel is given by:

    LBP(c) = Sum(p=0..Nn-1) s(gp - gc) 2^p        (A.4)

where s(x) = 1 if x >= 0 and 0 otherwise, Nn is the number of neighbors (equal to 8 in our study), LBP(c) is the LBP value at the pixel c (the center of the 3x3 neighborhood), and gc and gp are the intensity values at the center c and at its neighbor p. Fig. A.2 shows a sample computation of the LBP of a center pixel. We use this descriptor because it returns an LBP image of the same size (excluding the image boundary) as the input image, which is needed to learn templates and to quickly compute an approximate response map using a linear SVM. Notice that the LBP images are normalized before being used in our methods.

A.5 3D Morphable Model

In our study, the 3D Morphable Model (3DMM) was used to generate synthetic 3D faces. The 3DMM is a statistical 3D facial model introduced by Blanz and Vetter [14]. A vertex-to-vertex correspondence of all 3D faces is established so that Principal Component Analysis (PCA) can be applied to the registered 3D faces acquired to build the 3DMM. The geometry of the 3DMM is represented as a shape vector S = (X1, Y1, Z1, ..., Xn, Yn, Zn) in R^3n, where (Xi, Yi, Zi) are the coordinates of the i-th vertex. Similarly, the texture of a face is represented as a texture vector T = (R1, G1, B1, ..., Rn, Gn, Bn) in R^3n that contains the RGB values of the vertices. The 3DMM is constructed using the USF Human ID 3-D Database [14] of 3D face exemplars. Let Si and Ti be the shape vector and texture vector of the i-th exemplar. The shape model Smod and texture model Tmod of the 3DMM can be expressed as linear combinations of the shapes and textures of the Nm = 200 exemplar faces:

    Smod = S_bar + Sum(i=1..Nm-1) ai si,    Tmod = T_bar + Sum(i=1..Nm-1) bi ti        (A.5)

where S_bar and T_bar are the averages of the shapes and textures of the Nm exemplars {Si, Ti}, and si and ti (in descending order of eigenvalues) are the principal components learned by PCA from the shapes and textures {Si, Ti}.
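The linear combination of Eq. (A.5) can be sketched as follows; the two-vertex model, its components and its eigenvalues are hypothetical toy values standing in for what PCA would produce from the exemplar scans (the actual 3DMM uses Nm = 200 faces):

```python
import numpy as np

# Toy model: n = 2 vertices, so shape vectors live in R^6 (X1,Y1,Z1,X2,Y2,Z2).
S_bar = np.zeros(6)                              # mean shape S_bar
s = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # principal components s_i
              [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]])
sigma = np.array([2.0, 0.5])                     # sqrt of the PCA eigenvalues

def synthesize_shape(alpha):
    """Smod = S_bar + sum_i alpha_i * s_i (shape part of Eq. A.5)."""
    return S_bar + alpha @ s

# Random faces: draw alpha_i ~ N(0, sigma_i^2), cf. the density p(alpha).
rng = np.random.default_rng(0)
alpha = rng.normal(0.0, sigma)
S_mod = synthesize_shape(alpha)   # one synthetic shape vector in R^6
```

The texture model follows the same pattern with the beta coefficients; rendering such randomly sampled faces under varying poses is what produces the synthetic training images used in the thesis.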
Notice that all exemplars are zero-mean normalized beforehand. The 3DMM is thus defined as a set of faces parameterized by the shape coefficients a = (a1, a2, ..., aNm-1)^T and the texture coefficients b = (b1, b2, ..., bNm-1)^T. In our study, we create the synthetic database by varying the parameters a and b randomly according to multivariate normal distributions:

    p(a) ~ exp[ -(1/2) Sum(i=1..Nm-1) (ai/si)^2 ]        (A.6)

where si^2 are the eigenvalues of the shape covariance matrix computed by PCA; p(b) is defined similarly.

Bibliography

[1] Ralph Adolphs. Recognizing emotion from facial expressions: psychological and neurological mechanisms. Behavioral and Cognitive Neuroscience Reviews, 1(1):21-62, March 2002.
[2] Gaurav Aggarwal, Ashok Veeraraghavan, and Rama Chellappa. 3D facial pose tracking in uncalibrated videos. In Proceedings of the First International Conference on Pattern Recognition and Machine Intelligence, 2005.
[3] Jörgen Ahlberg. Candide-3: an updated parameterised face. Technical report, Dept. of Electrical Engineering, Linköping University, Sweden, 2001.
[4] Jose Alonso, Franck Davoine, and Maurice Charbit. A linear estimation method for 3D pose and facial animation tracking. In CVPR, 2007.
[5] Kwang Ho An and Myung-Jin Chung. 3D head tracking and pose-robust 2D texture map-based face recognition using a simple ellipsoid model. In IROS, pages 307-312, September 2008.
[6] S. Asteriadis, K. Karpouzis, and S. Kollias. Visual focus of attention in non-calibrated environments using gaze estimation. IJCV, 2014.
[7] Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, and Maja Pantic. Robust discriminative response map fitting with constrained local models. In CVPR, 2013.
[8] Silèye O. Ba and Jean-Marc Odobez. Probabilistic head pose tracking evaluation in single and multiple camera setups. In Classification of Events, Activities and Relationships Evaluation and Workshop, 2007.
[9] S. Basu, I. Essa, and A. Pentland. Motion regularization for model-based head tracking. In ICPR, 1996.
[10] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3), June 2008.
[11] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), March 2009.
[12] Peter N. Belhumeur, David W. Jacobs, David J. Kriegman, and Neeraj Kumar. Localizing parts of faces using a consensus of exemplars. In CVPR, June 2011.
[13] P. J. Besl and N. D. McKay. A method for registration of 3-D shapes. PAMI, 14:239-256, 1992.
[14] Volker Blanz and Thomas Vetter. A morphable model for the synthesis of 3D faces. In SIGGRAPH, pages 187-194, 1999.
[15] Manuel Blum, Jost Tobias Springenberg, Jan Wülfing, and Martin Riedmiller. A learned feature descriptor for object recognition in RGB-D data. In ICRA, 2012.
[16] J.-Y. Bouguet. Camera calibration toolbox for Matlab, 2003.
[17] M. E. Brand and R. Bhotika. Flexible flow for 3D nonrigid tracking and shape recovery. In CVPR, volume 1, pages 315-322, December 2001.
[18] X. P. Burgos-Artizzu, P. Perona, and P. Dollár. Robust face landmark estimation under occlusion. In ICCV, 2013.
[19] Xudong Cao, Yichen Wei, Fang Wen, and Jian Sun. Face alignment by explicit shape regression. In CVPR, 2012.
[20] Marco La Cascia, Stan Sclaroff, and Vassilis Athitsos. Fast, reliable head tracking under varying illumination: an approach based on registration of texture-mapped 3D models. TPAMI, 22(4):322-336, 2000.
[21] Jin-xiang Chai, Jing Xiao, and Jessica Hodgins. Vision-based control of 3D facial animation. In ACM SIGGRAPH, 2003.
[22] Yisong Chen and Franck Davoine. Simultaneous tracking of rigid head motion and non-rigid facial animation by analyzing local features statistically. In BMVC, 2006.
[23] Sukwon Choi and Daijin Kim. Robust head tracking using 3D ellipsoidal head model in particle filter. Pattern Recognition, 41(9):2901-2915, September 2008.
[24] T. F. Cootes and C. J. Taylor. Active shape models: "smart snakes". In Proceedings of the British Machine Vision Conference, 1992.
[25] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models: their training and application. CVIU, 61(1):38-59, January 1995.
[26] Timothy F. Cootes, Gareth J. Edwards, and Christopher J. Taylor. Active appearance models. TPAMI, 23(6):681-685, June 2001.
[27] Timothy F. Cootes, Gavin V. Wheeler, Kevin N. Walker, and Christopher J. Taylor. View-based active appearance models. IVC, 20(9-10):657-664, 2002.
[28] D. Cristinacce and T. F. Cootes. Facial feature detection using AdaBoost with shape constraints. In Proceedings of the British Machine Vision Conference, 2003.
[29] D. Cristinacce and T. F. Cootes. Feature detection and tracking with constrained local models. In BMVC, 2006.
[30] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[31] M. Dantone, J. Gall, G. Fanelli, and L. Van Gool. Real-time facial feature detection using conditional regression forests. In CVPR, 2012.
[32] Douglas DeCarlo and Dimitris N. Metaxas. Optical flow constraints on deformable models with applications to face tracking. IJCV, 38(2):99-127, 2000.
[33] Daniel F. DeMenthon and Larry S. Davis. Model-based object pose in 25 lines of code. IJCV, 15:123-141, June 1995.
[34] P. Dollár, P. Welinder, and P. Perona. Cascaded pose regression. In CVPR, June 2010.
[35] Yanchao Dong, Zhencheng Hu, Keiichi Uchimura, and Nobuki Murayama. Driver inattention monitoring system for intelligent vehicles: a review. IEEE Transactions on Intelligent Transportation Systems, 12(2):596-614, 2011.
[36] Fadi Dornaika and Jörgen Ahlberg. Fast and reliable active appearance model search for 3-D face tracking. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(4):1838-1853, 2004.
[37] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: a library for large linear classification. JMLR, 9:1871-1874, 2008.
[38] G. Fanelli, J. Gall, and L. Van Gool. Real time head pose estimation with random regression forests. In CVPR, June 2011.
[39] Martin A. Fischler and Robert C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395, June 1981.
[40] G. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. SIAM Journal on Numerical Analysis, 2(2):205-224, 1965.
[41] Ralph Gross, Iain Matthews, Jeffrey F. Cohn, Takeo Kanade, and Simon Baker. Multi-PIE. IVC, 28(5):807-813, 2010.
[42] Leon Gu and Takeo Kanade. A generative shape regularization model for robust face alignment. In ECCV, 2008.
[43] Lie Gu and T. Kanade. 3D alignment of face in a single image. In CVPR, 2006.
[44] Chris Harris and Mike Stephens. A combined corner and edge detector. In Fourth Alvey Vision Conference, pages 147-151, 1988.
[45] Tal Hassner. Viewing real-world faces in 3D. In ICCV, December 2013.
[46] K. Heinzmann and A. Zelinsky. 3D facial pose and gaze point estimation using a robust real-time tracking paradigm. In FG, 1998.
[47] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313, 2006.
[48] Geoffrey Hinton and Ruslan Salakhutdinov. Reducing the dimensionality of data with neural networks, 2006.
[49] Dong Huang, Markus Storer, Fernando De la Torre, and Horst Bischof. Supervised local subspace learning for continuous head pose estimation. In CVPR, 2011.
[50] Jun-Su Jang and Takeo Kanade. Robust 3D head tracking by online feature registration. In FG, September 2008.
[51] Andrew Jones, Magnus Lang, Graham Fyffe, Xueming Yu, Jay Busch, Ian McDowall, Mark Bolas, and Paul Debevec. Achieving eye contact in a one-to-many 3D video teleconferencing system. ACM Transactions on Graphics, 28(3):64:1-64:8, July 2009.
[52] Vahid Kazemi and Josephine Sullivan. Face alignment with part-based modeling. In BMVC, 2011.
[53] Vahid Kazemi and Josephine Sullivan. One millisecond face alignment with an ensemble of regression trees. In CVPR, 2014.
[54] Minyoung Kim, Sanjiv Kumar, Vladimir Pavlovic, and Henry A. Rowley. Face tracking and recognition with visual constraints in real-world videos. In CVPR, 2008.
[55] Martin Koestinger, Paul Wohlhart, Peter M. Roth, and Horst Bischof. Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.
[56] Vuong Le, Jonathan Brandt, Zhe Lin, Lubomir D. Bourdev, and Thomas S. Huang. Interactive facial feature localization. In ECCV, 2012.
[57] K. C. Lee, J. Ho, M. H. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. In CVPR, volume 1, pages 313-320, 2003.
[58] S. Lefevre and J.-M. Odobez. Structure and appearance features for robust 3D facial actions tracking. In ICME, 2009.
[59] V. Lepetit, J. Pilet, and P. Fua. Point matching as a classification problem for fast and robust object pose estimation. In CVPR, 2004.
[60] Vincent Lepetit and Pascal Fua. Monocular model-based 3D tracking of rigid objects. Foundations and Trends in Computer Graphics and Vision, 1:1-89, 2005.
[61] Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. EPnP: an accurate O(n) solution to the PnP problem. IJCV, 81(2), February 2009.
[62] J. P. Lewis. Fast normalized cross-correlation. In Vision Interface, pages 120-123, 1995.
[63] H. Li, P. Roivainen, and R. Forcheimer. 3-D motion estimation in model-based facial image coding. PAMI, 15(6), June 1993.
[64] Jianguo Li, Eric Li, Yurong Chen, and Lin Xu. Visual 3D modeling from images and videos. Technical report, Intel Labs China, 2010.
[65] Lin Liang, Rong Xiao, Fang Wen, and Jian Sun. Face alignment via component-based discriminative search. In ECCV, 2008.
[66] Wei-Kai Liao and G. Medioni. 3D face tracking and expression inference from a 2D sequence using manifold learning. In CVPR, 2008.
[67] Zicheng Liu, Zhengyou Zhang, Chuck Jacobs, and Michael F. Cohen. Rapid modeling of animated faces from video. Journal of Visualization and Computer Animation, 12(4):227-240, 2001.
[68] David G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60:91-110, 2004.
[69] Le Lu, Zhengyou Zhang, Heung-Yeung Shum, Zicheng Liu, and Hong Chen. Model- and exemplar-based robust head pose tracking under occlusion and varying expression. In CVPR Workshop, 2001.
[70] M. F. Valstar, B. Martinez, X. Binefa, and M. Pantic. Facial point detection using boosted regression and graph models. In CVPR, pages 2729-2736, 2010.
[71] Iain Matthews and Simon Baker. Active appearance models revisited. IJCV, 60(2):135-164, November 2004.
[72] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre. XM2VTSDB: the extended M2VTS database. In Second International Conference on Audio and Video-based Biometric Person Authentication, March 1999.
[73] Krystian Mikolajczyk and Cordelia Schmid. Scale and affine invariant interest point detectors. IJCV, 60(1), October 2004.
[74] Krystian Mikolajczyk and Cordelia Schmid. A performance evaluation of local descriptors. TPAMI, 27(10):1615-1630, 2005.
[75] Stephen Milborrow and Fred Nicolls. Locating facial features with an extended active shape model. In ECCV, 2008.
[76] Thomas B. Moeslund, Adrian Hilton, Volker Krüger, and Leonid Sigal, editors. Visual Analysis of Humans: Looking at People. Springer, 2011.
[77] L. Morency, P. Sundberg, and T. Darrell. Pose estimation using 3D view-based eigenspaces. In IEEE International Workshop on Analysis and Modeling of Faces and Gestures, 2003.
[78] Louis-Philippe Morency, Jacob Whitehill, and Javier Movellan. Monocular head pose estimation using generalized adaptive view-based appearance model. IVC, 28(5):754-761, May 2010.
[79] Louis-Philippe Morency, Jacob Whitehill, and Javier R. Movellan. Generalized adaptive view-based appearance model: integrated framework for monocular head pose estimation. In FG, 2008.
[80] E. Muñoz, J. M. Buenaposada, and L. Baumela. A direct approach for efficiently tracking with 3D morphable models. In ICCV, pages 1615-1622, September 2009.
[81] Erik Murphy-Chutorian and Mohan Manubhai Trivedi. HyHOPE: hybrid head orientation and position estimation for vision-based driver head tracking. In IEEE Intelligent Vehicles Symposium, 2008.
[82] Erik Murphy-Chutorian and Mohan Manubhai Trivedi. Head pose estimation in computer vision: a survey. PAMI, 31(4), April 2009.
[83] J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, pages 308-313, 1965.
[84] Timo Ojala, Matti Pietikäinen, and David Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51-59, 1996.
[85] Mustafa Ozuysal, Michael Calonder, Vincent Lepetit, and Pascal Fua. Fast keypoint recognition using random ferns. PAMI, 32(3), March 2010.
[86] Maja Pantic. Machine analysis of facial behaviour: naturalistic and dynamic behaviour. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1535), December 2009.
[87] Ulrich Paquet. Convexity and Bayesian constrained local models. In CVPR, 2009.
[88] J. A. Paterson and A. W. Fitzgibbon. 3D head tracking using non-linear optimization. In BMVC, volume 2, pages 609-618, September 2003.
[89] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2:559-572, 1901.
[90] M. Pietikäinen. Local Binary Patterns. Scholarpedia, 5(3):9775, 2010.
[91] B. Raytchev, I. Yoda, and K. Sakaue. Head pose estimation by nonlinear manifold learning. In ICPR, 2004.
[92] Shaoqing Ren, Xudong Cao, Yichen Wei, and Jian Sun. Face alignment at 3000 FPS via regressing local binary features. In CVPR, June 2014.
[93] S. Romdhani and T. Vetter. Efficient, robust and accurate fitting of a 3D morphable model. In ICCV, pages 59-66, October 2003.
[94] Christos Sagonas, Georgios Tzimiropoulos, Stefanos Zafeiriou, and Maja Pantic. 300 Faces In-the-Wild Challenge: the first facial landmark localization challenge. In ICCV Workshops, 2013.
[95] Enrique Sanchez-Lozano, Fernando De la Torre, and D. Gonzalez-Jimenez. Continuous regression for non-rigid image alignment. In ECCV, 2012.
[96] Conrad Sanderson. The VidTIMIT database. Technical report, IDIAP, 2002.
[97] Jason M. Saragih, Simon Lucey, and Jeffrey F. Cohn. Deformable model fitting by regularized landmark mean-shift. IJCV, 91:200-215, January 2011.
[98] Cordelia Schmid, Roger Mohr, and Christian Bauckhage. Evaluation of interest point detectors. IJCV, 37(2), June 2000.
[99] Rainer Stiefelhagen, Keni Bernardin, Rachel Bowers, R. Travis Rose, Martial Michel, and John S. Garofolo. The CLEAR 2007 evaluation. In Multimodal Technologies for Perception of Humans, International Evaluation Workshops CLEAR, pages 3-34, 2007.
[100] Jacob Ström. Model-based real-time head tracking. EURASIP Journal on Applied Signal Processing, 2002(1):1039-1052, 2002.
[101] Yanchao Su, Haizhou Ai, and Shihong Lao. Multi-view face alignment using 3D shape model for view estimation. In Proceedings of the Third International Conference on Advances in Biometrics, 2009.
[102] Yi Sun, Xiaogang Wang, and Xiaoou Tang. Deep convolutional network cascade for facial point detection. In CVPR, 2013.
[103] Jaewon Sung, Takeo Kanade, and Daijin Kim. Pose robust face tracking by combining active appearance models and cylinder head models. IJCV, 80(2):260-274, November 2008.
[104] Carlo Tomasi and Takeo Kanade. Detection and tracking of point features. Technical report, Carnegie Mellon University, 1991.
[105] Federico Tombari, Samuele Salti, and Luigi Di Stefano. Unique signatures of histograms for local surface description. In ECCV, 2010.
[106] Lorenzo Torresani, Aaron Hertzmann, and Christoph Bregler. Nonrigid structure-from-motion: estimating shape and motion with hierarchical priors. PAMI, pages 878-892, 2008.
[107] Kentaro Toyama. "Look, ma - no hands!": hands-free cursor control with real-time 3D face tracking. In Proceedings of the Workshop on Perceptual User Interfaces, January 1998.
[108] Ngoc-Trung Tran, Fakhreddine Ababsa,
Maurice Charbit, Jacques Feldmar, Dijana Petrovska-Delacr´etaz, and Gerard Chollet 3d face pose and animation tracking via eigen-decomposition based bayesian approach In ISVC, July 2013 [109] J Tu, T Huang, and Hai Tao Face as mouse through visual face tracking In Computer and Robot Vision, pages 339–346, May 2005 [110] G Tzimiropoulos and M Pantic Optimization problems for fast aam fitting inthe-wild In ICCV 2013, 2013 [111] G Tzimiropoulos and M Pantic Gauss-newton deformable part models for face alignment in-the-wild In CVPR 2014, 2014 [112] Luca Vacchetti, Vincent Lepetit, and Pascal Fua Stable real-time 3d tracking using online and offline information TPAMI, 26(10):1385–1391, 2004 [113] S Valente and J Dugelay Face tracking and realistic animations for telecommunicant clones In International Conference on Multimedia Computing and Systems, volume 2, pages 678–683 vol.2, Jul 1999 [114] A Wahi, S Sundaramurthy, and P Abinaya A survey on automatic drowsy driver detection system in image processing International Journal Of Innovative Science And Applied Engineering Research, 2014 [115] Haibo Wang, Franck Davoine, Vincent Lepetit, Christophe Chaillou, and Chunhong Pan 3-d head tracking via invariant keypoint learning IEEE Transactions on Circuits and Systems for Video Technology, 22(8):1113–1126, 2012 [116] Nannan Wang, Xinbo Gao, Dacheng Tao, and Xuelong Li Facial feature point detection: A comprehensive survey CoRR, abs/1410.1037, 2014 [117] Qiang Wang, Weiwei Zhang, Xiaoou Tang, and Heung-Yeung Shum Real-time bayesian 3-d pose tracking IEEE Transactions on Circuits and Systems for Video Technology, 16(12):1533–1541, Dec 2006 Bibliography 118 [118] Yang Wang, Simon Lucey, and Jeffrey Cohn Enforcing convexity for improved alignment with constrained local models In CVPR, June 2008 [119] Yang Wang, Simon Lucey, Jeffrey Cohn, and Jason M Saragih Non-rigid face tracking with local appearance consistency constraint In FG, September 2008 [120] John Wright, Allen Y Yang, 
Arvind Ganesh, S Shankar Sastry, and Yi Ma Robust face recognition via sparse representation PAMI, 31(2):210–227, February 2009 [121] D Ramanan X Zhu Face detection, pose estimation, and landmark localization in the wild In CVPR, 2012 [122] Jing Xiao, Simon Baker, Iain Matthews, and Takeo Kanade Real-time combined 2d+3d active appearance models In CVPR, volume 2, pages 535 – 542, June 2004 [123] Jing Xiao, Tsuyoshi Moriyama, Takeo Kanade, and Jeffrey Cohn Robust fullmotion recovery of head by dynamic templates and re-registration techniques International Journal of Imaging Systems and Technology, 13:85 – 94, September 2003 [124] Xuehan Xiong and Fernando De la Torre Frade Supervised descent method and its applications to face alignment In CVPR, May 2013 [125] Junjie Yan, Zhen Lei, Dong Yi, and Stan Z Li Learn to combine multiple hypotheses for accurate face alignment In ICCV Workshops, December 2013 [126] Yi Yang and D Ramanan Articulated pose estimation with flexible mixtures-ofparts In CVPR, 2011 [127] Lijun Yin, Xiaozhou Wei, Yi Sun, Jun Wang, and M.J Rosato A 3d facial expression database for facial behavior research In FG, 2006 [128] Xiang Yu, Junzhou Huang, Shaoting Zhang, Wang Yan, and Dimitris N Metaxas Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model In ICCV, December 2013 [129] Jie Zhang, Shiguang Shan, Meina Kan, and Xilin Chen Coarse-to-fine autoencoder networks (CFAN) for real-time face alignment In ECCV, 2014 [130] Wei Zhang, Qiang Wang, and Xiaoou Tang Real time feature based 3-d deformable face tracking In ECCV, pages 720–732, 2008 Bibliography 119 [131] Ye Zhang and Chandra Kambhamettu 3d head tracking under partial occlusion Pattern Recognition, 35(7):1545–1557, 2002 [132] Zhengyou Zhang, Rachid Deriche, Olivier Faugeras, and Quang-Tuan Luong A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry Artificial Intelligence, 78(1-2), October 1995 
[133] Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, and Xilin Chen. Cascaded shape space pruning for robust facial landmark detection. In ICCV, 2013.
[134] Erjin Zhou, Haoqiang Fan, Zhimin Cao, Yuning Jiang, and Qi Yin. Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In ICCV Workshops, December 2013.
[135] Zhiwei Zhu and Qiang Ji. Real time 3d face pose tracking from an uncalibrated camera. In CVPR Workshop, pages 73–73, June 2004.

[…] depending on the tracking technique's point of view, more precisely on the technique used to align the face model onto the 2D frames of the video sequence. In practice, each group of tracking methods uses one specific 3D model, feature descriptor, or training dataset. For example, discriminative methods for face tracking often use Linear Deformable Models (LDMs) together with local descriptors, and the face model is trained from […]

[…] fragment to the tracking system.
• Moreover, face tracking has to run in real time for many applications.
In addition, several stages must be rigorously integrated to build an automatic and robust 3D face tracking system for video sequences. First, face detection localizes the face in the first frame. Second, the alignment stage fits the face model to the 2D facial image. Third, the tracking stage infers the face […]

[…] disgusted, surprised, and fearful, etc. Indeed, the non-rigid parameters are usually handled by detecting and tracking facial points. These points are known as fiducial points, feature points, or landmarks in the face-processing community. The word indexing in our report means that both rigid and non-rigid parameters are estimated from video frames. Our aim is to read a video (or from […]
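The three stages listed above (detection, alignment, tracking) can be sketched as a minimal tracking loop. Everything below is a hypothetical skeleton: the detector, aligner, and tracker are placeholder stubs, not the thesis implementation.

```python
import numpy as np

def detect_face(frame):
    """Hypothetical detector: returns a face bounding box (x, y, w, h)."""
    h, w = frame.shape[:2]
    return (w // 4, h // 4, w // 2, h // 2)  # placeholder: a centered box

def align_model(frame, box):
    """Hypothetical alignment: fits the face model inside the box and
    returns an initial 6-DOF pose vector [tx, ty, tz, yaw, pitch, roll]."""
    x, y, w, h = box
    return np.array([x + w / 2.0, y + h / 2.0, 1.0, 0.0, 0.0, 0.0])

def track_pose(frame, prev_pose):
    """Hypothetical tracker: refines the previous pose on the new frame."""
    return prev_pose  # placeholder: constant-pose prediction

def run_pipeline(frames):
    """Detect and align on the first frame, then track on the rest."""
    box = detect_face(frames[0])
    pose = align_model(frames[0], box)
    poses = [pose]
    for frame in frames[1:]:
        pose = track_pose(frame, pose)
        poses.append(pose)
    return poses

frames = [np.zeros((120, 160, 3)) for _ in range(5)]
poses = run_pipeline(frames)
print(len(poses))  # one 6-DOF pose per frame
```

The point of the skeleton is the control flow: detection and alignment run once, on the first frame, while the tracker exploits temporal continuity on every subsequent frame.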
[…] alarm while driving a car [35, 114]; human-machine interaction to control computers or devices using face gestures [107, 109]. Face tracking and pose estimation are also useful for building teleconferencing and Virtual Reality interface systems [51, 113]. Face tracking is a compulsory stage of face recognition in video-surveillance systems that track and identify people, because the faces have […]

[…] determine the direction in which people are likely looking in video sequences. It makes sense that people may be focusing on someone or something while talking. Furthermore, head movements carry important meaning as a form of gesturing in conversation. For example, nodding or shaking the head indicates that people understand or misunderstand, or agree or disagree, with what is being […]

[…] tracking framework using 3D models. The choice of head model determines which characteristics of the face can be interpreted during tracking.

2.1.1 Linear Deformable Models (LDMs)
The most popular face models in the literature are Linear Deformable Models (LDMs). Some instances of this model represent deformability in shape only; others represent it in both shape and appearance. The main point […]

[…] views in the 3D face tracking problem: i) using on-line adaptive models, ii) using view-based models, and iii) using 3D synthetic models.

2.3.1 On-line adaptive models
Most popular approaches for large rotations rely on the cylindrical model. The initialization of the model at the first frame can be manual when using a 3D non-rigid model, or automatic when using the cylindrical model. During tracking, the single […]
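The shape part of an LDM is classically a PCA model, s = s̄ + P b, where s̄ is the mean shape and the columns of P are the main deformation modes. A minimal sketch on synthetic landmark data follows; the dataset, its dimensions, and the function names are illustrative, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: 50 faces, 10 landmarks each, flattened to 20-D vectors.
shapes = rng.normal(size=(50, 20))

# Build the linear model: mean shape plus principal deformation modes.
mean_shape = shapes.mean(axis=0)
centered = shapes - mean_shape
# SVD of the centered data gives the PCA basis (rows of Vt).
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
k = 5                # keep the k strongest deformation modes
P = Vt[:k].T         # (20, k) basis of shape variation

def synthesize(b):
    """Generate a shape from k deformation parameters: s = mean + P @ b."""
    return mean_shape + P @ b

def project(shape):
    """Recover the deformation parameters of a shape: b = P.T @ (s - mean)."""
    return P.T @ (shape - mean_shape)

b = project(shapes[0])
recon = synthesize(b)
print(recon.shape)  # (20,) -- a flattened landmark vector
```

An appearance model, when present, is built the same way from pixel intensities sampled in the mean-shape frame, which is what distinguishes shape-only instances (e.g. ASM-style) from shape-and-appearance instances (e.g. AAM-style).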
[103] combined two face models, the AAM and the cylindrical model. The cylindrical model keeps the tracking robust in profile views and can provide a good initialization for AAM fitting in near-frontal views. [6] combined skin detection, distance vector fields, and a Convolutional Neural Network (CNN) to track and recognize head pose and gaze direction. The hybrid approach is a promising direction […]

[…] methods track and update the 3DMM on-line, which can accumulate errors over long video sequences. For landmark detection, [43] used view-based patches from an off-line synthetic dataset to detect landmarks at large yaw angles. The use of a 3DMM is a promising direction [43] for several reasons: creating a good tracking dataset costs a lot of effort, and annotating the hidden landmarks […]

[…] parameters) before applying pre-processing stages such as normalizing the face or estimating its frontal view. Many monitoring systems involve information associated with the face position. Beyond security, they can also serve marketing purposes. One example is face tracking combined with recognition of human activity, which can be used in supermarkets to analyze customer behavior. This kind of system
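Tracking with a set of 3D vertices, as in the 3DMM-based approaches discussed above, ultimately requires projecting the model under a 6-DOF pose hypothesis and comparing the result with the image. Below is a minimal sketch of that projection step, assuming an Euler-angle rotation and a simple pinhole camera; the focal length, units, and angle convention are arbitrary choices for illustration, not the thesis's camera model.

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Compose a rotation from Euler angles (radians) in Rz @ Ry @ Rx order."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch about x
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # yaw about y
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # roll about z
    return Rz @ Ry @ Rx

def project_vertices(vertices, pose, focal=500.0):
    """Apply a 6-DOF pose [tx, ty, tz, yaw, pitch, roll] to the 3D model
    vertices and project them with a pinhole camera."""
    tx, ty, tz, yaw, pitch, roll = pose
    R = rotation_matrix(yaw, pitch, roll)
    cam = vertices @ R.T + np.array([tx, ty, tz])
    # Perspective division (assumes all points lie in front of the camera).
    return focal * cam[:, :2] / cam[:, 2:3]

# Three hypothetical model vertices and a pose with the head slightly turned.
verts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
pose = np.array([0.0, 0.0, 5.0, 0.3, 0.0, 0.0])  # tz = 5 model units ahead
pts2d = project_vertices(verts, pose)
print(pts2d.shape)  # (3, 2)
```

A tracker built on this projection estimates the pose by minimizing the distance between the projected vertices and their observed image locations, which is exactly where the error accumulation of on-line model updates becomes visible over long sequences.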
