2012 Fourth International Conference on Knowledge and Systems Engineering

Shift error analysis in image based 3D skull feature reconstruction

Thi Chau Ma, The Duy Bui, Trung Kien Dang
Human Machine Interaction Laboratory
University of Engineering and Technology
Vietnam National University, Hanoi, Vietnam
chaumt@vnu.edu.vn

Abstract

A 3D skull is crucial in skull-based 3D facial reconstruction [1, 2, 3, 4, 5, 6, 7, 8]. In 3D reconstruction, and especially in skull-based 3D facial reconstruction, features usually play an important role, because the accuracy of feature detection strongly affects the accuracy of the final 3D model. In this paper, we concentrate on the accuracy of the 3D reconstructed skull, an important part of skull-based 3D facial reconstruction. We discuss a cause of errors, called shift errors, that arises when taking a sequence of skull images. In addition, we analyse the effect of shift errors on 3D reconstruction and propose a solution to limit that effect.

Keywords: feature detection, accuracy, 3D facial reconstruction

1 Introduction

In skull-based 3D facial reconstruction, given 3D anthropometric landmarks on the 3D skull, it is possible to morph a 3D template to fit the skull according to the soft tissue thickness at the landmarks, and so obtain the final 3D face. The landmarks are mostly determined manually on a scanned skull when anthropometric information is supplied (Figure 1). Therefore, instead of the whole skull, only the 3D skull landmarks are used. Moreover, scanning a skull requires expensive equipment. To overcome this problem, we use skull images rather than a scanned 3D skull.

Figure 1. A scanned 3D skull image and landmarks on the skull.

The whole process is shown in Figure 2. We take a sequence of images by moving the camera around the skull to capture all important landmarks. The landmarks, included in the set of detected features, can be found with any automatic feature detector. These features are matched to create correspondences between successive images, and 3D skull features are then generated from the corresponding features. The question is how errors of the feature detection process affect the accuracy of the 3D reconstructed skull, and how we can improve the accuracy of the 3D skull by analysing and fixing those errors.

Figure 2. Skull-images-based 3D facial reconstruction.

People are sensitive in face recognition. Biederman [9] concluded that face recognition and object recognition are different. Contrast, illumination, size, and transformation (especially rotation) strongly affect face recognition, while almost all of those factors have little effect on the recognition of other objects. Moreover, while distinctions between objects are easily named, a small difference between two faces is easy to recognise but hard to name. Hence, face reconstruction requires high accuracy for people to recognise the result. Recently, the authors of [1] showed that "54%, 65%, and 77% of the three facial reconstruction surfaces had less than 2.5 mm of error when compared to the relevant target face" in their experiments. An improvement of 1-2 mm in the accuracy of the 3D reconstructed skull or the 3D reconstructed face is therefore significant.

In this paper, we assess the errors in feature detection from 2D skull images that affect 3D facial reconstruction. We first analyse the effect of these errors on 3D skull features. This analysis is then used to adjust the 3D skull features in order to increase the accuracy of the 3D reconstructed face.

The rest of the paper is organized as follows. We review related work in the next section. Then, we discuss the cause of errors in feature detection in Section 3, and from that we derive how an adjustment can be applied to the 3D skull features. In Section 4, we perform experiments to confirm these theoretical results.
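Before moving to related work, we note that the detect-and-match step of Figure 2 can be realised with standard tools. The following is a minimal sketch, assuming OpenCV's SIFT detector and brute-force matching; the file names and the ratio-test threshold of 0.75 are illustrative assumptions, not parameters of our system.

```python
# Minimal sketch of the detect-and-match step between two successive
# skull images, assuming OpenCV (cv2). File names are illustrative.
import cv2

img1 = cv2.imread("skull_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("skull_010.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching with Lowe's ratio test to keep reliable pairs.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in knn if m.distance < 0.75 * n.distance]

# Corresponding image coordinates, later used for triangulation.
pts1 = [kp1[m.queryIdx].pt for m in good]
pts2 = [kp2[m.trainIdx].pt for m in good]
print(f"{len(good)} correspondences between the two views")
```

The coordinate pairs pts1, pts2 are the correspondences from which 3D skull features are triangulated.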
2 Related works

For 3D facial reconstruction from a skull, most studies have made use of 3D scanned skulls as inputs [1, 2, 3, 4, 5, 6, 7, 8]. In [2], Archer used the standard system of 32 dowels mounted on the skull at the locations of landmarks, with lengths taken from statistical data on the soft tissue thickness of African-American men. Besides, the author added 89 dowels whose lengths were interpolated from the 32 standard lengths. A surface in the form of a hierarchical B-spline was transformed to match the skull, with the transformation divided into levels. In this study, the interpolated dowels were not really accurate in position or length, so the final face was swollen or collapsed in unwanted places. The author made adjustments based on the B-spline surface tangent at the control points. The result received positive feedback, but it also showed some limitations. In terms of anatomy, the mouth was wide, the eye position and size were not correct, and the nose shape was not appropriate, because the author did not take advantage of the relationships between anthropometric measurements.

In [3, 4], the authors used a reference face and the corresponding skull. They then tried to find a transformation from the reference skull to the source skull, and applied that transformation to the reference face to obtain the target face. This approach exploited little anthropometric information. It is easy to see that, with such a transformation, the soft tissue thickness of the reconstructed faces showed no variation; the reconstructed face therefore clearly carried no identity information.

Other studies [5, 8] also transformed a face model to match dowels on the target skull. In [5], the soft tissue thickness was calculated from statistical data based on the soft tissue database of Rhine and Moore [6]. Initially, the landmarks were transformed using the Procrustes method; then a combination of RBFs and Procrustes transformation was applied to all the remaining points of the face. The face template was obtained from a typical database scan, so the generated face depended on the face database. Moreover, the resulting face was flawed because of the lack of information about soft tissue thickness. In [8], the authors used soft tissue thickness calculated from skull measurements instead of the statistical one.

Kahler et al. [7] built expressive faces from skulls. Their method combined surgical data and soft tissue thickness. The skull was scanned, and 40 dowels corresponding to the statistical standard thickness of soft tissue [6] were attached. A mesh consisting of 8164 triangles was used to represent the skull, and radial basis functions were used to deform the face template to match it. The authors also interpolated the soft tissue thickness between landmarks and used additional anthropometric rules to support shaping the nose and mouth.

Recently, in [1], the faces of the subjects were reconstructed according to facial soft tissue depth data for living Korean adults. The authors used 3D deformation tools to alter the shape, along with a number of guides to predict facial components such as the eyes, nose, mouth and ears.

Skull-based 3D facial reconstruction is thus not new, and a number of accuracy studies of traditional manual 3D methods have demonstrated good levels of likeness to the target faces [10, 11, 12, 13, 14, 15]. However, for most computer-aided 3D facial reconstruction systems, the accuracy of the final 3D results is not quantified; the results are usually evaluated through feedback from anthropologists or forensic experts. Recently, there was an accuracy study of a computer-aided facial reconstruction system [1], in which the authors compared the reconstructed faces with the target faces using the Geomagic Qualify software and gave a quantitative comparison of the reconstructed faces.

For 3D reconstruction from images, and especially 3D facial reconstruction from images, the reconstruction requires one image, more than one image, or a sequence of images. A 2D image is a 2D array of pixels; each pixel holds the intensity at that location, usually represented as a mix of the three colors red, green and blue. The intensity values are used to calculate the depth information (z coordinates) of the objects in the images. With a single input image, the shape of the objects has to be extracted to conduct the modeling. Techniques to retrieve the shape include shape from shading [16], shape from texture [17], shape from specularity [18], shape from contour [19], and shape from 2D edge gradients [20]. However, with a single input image the computational complexity is high; moreover, the final models cannot be observed from various angles. Furthermore, if direct intensity values are used to calculate the depth, the result is poor, because the observed brightness depends on many factors such as the surface color, geometry, material, orientation of the object, and the light.

Approaches based on an input image sequence are divided into two types: (i) photometric stereo, in which the images of the objects are taken from one angle but under different lighting [21, 22]; and (ii) stereopsis, in which the images are taken from different angles. In photometric stereo methods, the surface orientation of each surface patch is calculated from the light intensity of the corresponding points in the different images, and the depth of the objects is then worked out from this orientation information. Photometric stereo requires a good setup of the light sources and an understanding of the relevant laws of light. In stereopsis methods, the main problem is to find matches between pairs of image features in order to determine the structure of the objects. With sparse corresponding feature points, 3D features are calculated. Researchers typically use extra 3D templates of the objects and deform the templates to fit the 3D features, obtaining the final 3D model. The templates are often in the form of a generic model [23, 24, 25] or a 3D morphable model [26, 27, 28, 29, 30].
3 Shift error: cause, effect and solution in 3D skull reconstruction

3.1 Cause of shift error

When taking pictures all around an object (by moving the camera), images are obtained from different views along the horizontal (x) direction (Figure 3). In each pair of successive images, features in the second image are shifted by a distance e in the x direction, while they remain almost unchanged in the y direction. The experiments in Section 4 confirm this conclusion.

Figure 3. Camera setup for 3D reconstruction.

Features on the object appear in the camera images. However, under a perspective projection, the location of a feature in the camera image is not the location of the projected feature. This error in localization is called shift error. When we take pictures from two views (Figure 4), they will in general be rotated and translated relative to each other. This can be modeled by a 2D rotation and translation of the origin. The difference between the locations of features in the camera images and the projected features is the shift error mentioned above.

3.2 Effect of shift error in 3D reconstruction

Figure 4 depicts how shift error affects 3D reconstruction. We assume that the image planes lie between the 3D object and the cameras. The small circles representing the 2D points in images Ii and Ii+1 are projected from the big circle representing the 3D point X. In fact, when the picture is taken, the object is rotated in the x direction, so the projected point in image Ii+1 is the small square point, not the small circle. As a result, the reconstructed 3D point (the big square point) does not coincide with the original 3D point: the reconstructed point is pushed far away from the object. Clearly, the shift error causes a wrong back projection.

Figure 4. Shift error pushes the reconstructed point far away from the object.
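The push-away effect of Figure 4 can be reproduced numerically. The sketch below is our own illustration, not code from the original experiments: it triangulates a 3D point from two views by linear (DLT) triangulation, once with the correct coordinates in the second image and once with a small x-shift, and reports how far the reconstructed point moves. The intrinsics, baseline, and the 3-pixel shift are assumed values.

```python
# Numerical illustration of Section 3.2 (our own sketch): a small x-shift
# in the second image pushes the triangulated 3D point away from the truth.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two cameras: identical intrinsics, the second translated along x (baseline).
K = np.array([[800.0, 0, 585], [0, 800.0, 432], [0, 0, 1]])  # illustrative
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-100.0], [0], [0]])])

X_true = np.array([30.0, 20.0, 1000.0])           # a 3D point in mm
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]

X_ok = triangulate(P1, P2, x1, x2)                # no shift: recovers X_true
X_bad = triangulate(P1, P2, x1, x2 + [3.0, 0.0])  # 3-pixel shift in x
print("error without shift:", np.linalg.norm(X_ok - X_true))
print("error with 3px shift:", np.linalg.norm(X_bad - X_true))
```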
3.3 Accuracy improvement in 3D reconstruction

We define the notation as in Figure 4: $C_1$, $C_2$ are the locations of the camera at two successive positions, i.e. the centers of projection. Given a 3D point $X$, $x_1$ and $x_2$ are the theoretical images of $X$ through the projections with centers $C_1$, $C_2$ respectively, and $x_1'$, $x_2'$ are the images of $X$ detected by a certain feature detector. We assume $x_1' = x_1$; nevertheless, $x_2' \neq x_2$ because of the shift error. $X'$ is the 3D point reconstructed from the corresponding pair $(x_1, x_2')$, and the angle $\angle C_1 X C_2 = \alpha$. If the displacement $XX'$ is estimated, then $X$, instead of $X'$, can be completely reconstructed: $XX'$ is estimated in order to pull $X'$ back to $X$. We have

$$\frac{XX'}{e} = \frac{C_1 X}{C_1 C_2} \quad (1)$$

so

$$XX' = e \, \frac{C_1 X}{C_1 C_2} \quad \text{(pixels)} \quad (2)$$

Figure 5 depicts the relation between a 3D point and its projection. Let us look at the 3D point $X(x_s, y_s, z_s)$ and its image $x(x_i, y_i, f)$. In the camera with center at $C$, $X$ is projected as $(x_i, y_i, f)$. The transformation is as follows:

$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_s \\ y_s \\ z_s \\ 1 \end{bmatrix} \quad (3)$$

where $x_i = u/w$ and $y_i = v/w$, so that

$$x_i = f \frac{x_s}{z_s}, \qquad y_i = f \frac{y_s}{z_s} \quad (4)$$

Figure 5. Relation between a 3D point and its 2D projection.

In image plane coordinates (Figure 6), the origin $(x_0, y_0)$ is at the center of the image, and the image of $X$ in pixel units is

$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix} [I_3 \,|\, 0_3] \begin{bmatrix} x_s \\ y_s \\ z_s \\ 1 \end{bmatrix} = K [I_3 \,|\, 0_3] \begin{bmatrix} x_s \\ y_s \\ z_s \\ 1 \end{bmatrix} \quad (5)$$

where $\alpha_x = f k_x$ and $\alpha_y = f k_y$. $K$ is the calibration matrix; with skew $s \approx 0$ and square pixels, $\alpha_x = \alpha_y = f k_s$.

Figure 6. Image plane coordinates.

From (4) and (5) we have, for the x coordinate,

$$x_{pix} = \frac{u}{w} = \frac{f k_s x_s + x_0 z_s}{z_s} = k_s x_i + x_0 \quad (6)$$

and therefore

$$\frac{x_{pix}}{x_i} = \frac{f k_s x_s + x_0 z_s}{z_s} \cdot \frac{z_s}{f x_s} = k_s + \frac{x_0}{x_i} \quad \left(\frac{\text{pixel}}{\text{mm}}\right) \quad (7)$$

From formulas (2) and (7), we have

$$XX' = e \, \frac{C_1 X}{C_1 C_2} \left( k_s + \frac{x_0}{x_i} \right)^{-1} \quad \text{(mm)} \quad (8)$$

or, since $C_1 C_2 = 2\, C_1 X \sin\frac{\alpha}{2}$ when $C_1 X \approx C_2 X$,

$$XX' = \frac{e}{2 \sin\frac{\alpha}{2}} \left( k_s + \frac{x_0}{x_i} \right)^{-1} \quad \text{(mm)} \quad (9)$$

As shown in formula (9), we can easily determine the 3D point $X$ from the point $X'$ reconstructed under shift error.
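Formula (9) is straightforward to apply in code. The following minimal sketch is our own illustration; the function and parameter names are ours. It converts a measured pixel shift e into the distance $XX'$, in mm, by which the reconstructed point $X'$ is pulled back.

```python
# Sketch of the pull-back distance of formula (9); our own illustration.
import math

def pullback_distance_mm(e_pixels, alpha_rad, k_s, x0, x_i):
    """Distance XX' (mm) by which to pull the reconstructed point X' back to X.

    e_pixels  -- measured shift error in the second image (pixels)
    alpha_rad -- angle C1-X-C2 between the two viewing rays (radians)
    k_s       -- pixels per mm on the sensor (alpha_x = f * k_s)
    x0        -- x coordinate of the principal point (pixels)
    x_i       -- x coordinate of the projection on the image plane (mm)
    """
    pixels_per_mm = k_s + x0 / x_i          # conversion factor of formula (7)
    return e_pixels / (2.0 * math.sin(alpha_rad / 2.0)) / pixels_per_mm

# Example with illustrative numbers: a 3-pixel shift and a 10-degree step.
print(pullback_distance_mm(e_pixels=3.0, alpha_rad=math.radians(10),
                           k_s=100.0, x0=585.0, x_i=2.0))
```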
4 Experiments

We use scanned skulls for the experiments. The skulls are rotated in small angle steps, here set at 10 degrees, using MeshLab. For each skull, images are captured over a range of 100 degrees at a resolution of 1170 x 864.

To compute the projected relative shift, we first extract the ground truth homography between each pair of images. Features are detected with the Harris [31] and SIFT [32] detectors and matched between pairs of successive images. Transferring the coordinates of a feature in the first image into the second image using the ground truth homography gives us the coordinates of its ideal match. The difference between the coordinates of the ideal match and the coordinates of the actual match is the projected relative shift error. Figures 7 and 8 show that, for all data sets, the projected shift error is indeed mainly in the x direction and about zero in the y direction. This result fits the analysis of shift error in Section 3.

Figure 7. Shift error on Skull 1.

Figure 8. Shift error on Skull 2.
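For reference, here is a minimal sketch of this measurement, assuming OpenCV and that the ground truth homography is available as a 3x3 array H_gt; the variable names are ours, and pts1, pts2 are matched feature coordinates as in the earlier sketch.

```python
# Sketch of the projected-shift measurement, assuming OpenCV (cv2).
# H_gt is the ground truth 3x3 homography between the two images;
# pts1, pts2 are matched feature coordinates from the two views.
import numpy as np
import cv2

def projected_shift(H_gt, pts1, pts2):
    """Per-match shift (dx, dy) between ideal and detected locations."""
    p1 = np.float32(pts1).reshape(-1, 1, 2)
    ideal = cv2.perspectiveTransform(p1, H_gt).reshape(-1, 2)
    return np.float32(pts2) - ideal   # one (dx, dy) row per match

# Example usage: the mean shift should be mainly in x and near zero in y.
# shifts = projected_shift(H_gt, pts1, pts2)
# print("mean dx, dy:", shifts.mean(axis=0))
```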
Given the ground truth poses of the cameras, we triangulate to recover the 3D locations of the features (left of Figure 9). For each scanned skull sample, the distance between a 3D reconstructed point and the corresponding 3D ideal point is 1.09 ~ 1.93 mm. We therefore suggest pulling the 3D features closer by approximately 1.5 mm (right of Figure 9).

Figure 9. 3D recovered features before (left) and after (right) pulling.

To give an assessment, we compare the mean and max errors of the 3D recovered features, before and after pulling, against the original scanned skulls, using the following formulas. Let $S_1$ be the set of 3D detected features, and $S_2$ be the set of vertices of the scanned skull. The distance from a point $p$ to a surface $S$ is estimated as

$$e(p, S) = \min_{p' \in S} d(p, p') \quad (10)$$

where $d(\cdot)$ is the Euclidean distance. The mean error is the average distance between $S_1$ and $S_2$:

$$E(S_1, S_2) = \frac{\sum_{p \in S_1} e(p, S_2)}{\|S_1\|} \quad (11)$$

The max error is the maximum distance between $S_1$ and $S_2$:

$$E_{max}(S_1, S_2) = \max_{p \in S_1} e(p, S_2) \quad (12)$$
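Formulas (10)-(12) reduce to nearest-neighbour queries against the vertex set of the scanned skull. A minimal sketch using SciPy's k-d tree follows; this is our own illustration, with the surface approximated by its vertices exactly as in formula (10).

```python
# Sketch of the mean/max point-to-surface errors of formulas (10)-(12),
# approximating the skull surface by its vertex set; our own illustration.
import numpy as np
from scipy.spatial import cKDTree

def mean_max_error(features, skull_vertices):
    """features: (N, 3) array S1; skull_vertices: (M, 3) array S2, in mm."""
    tree = cKDTree(skull_vertices)
    dists, _ = tree.query(features)   # e(p, S2) for every feature p
    return dists.mean(), dists.max()  # E(S1, S2) and Emax(S1, S2)

# Example usage with random stand-in data:
# E, Emax = mean_max_error(np.random.rand(100, 3), np.random.rand(5000, 3))
```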
Table 1 shows the errors of the recovered 3D features before and after pulling by 1.5 mm.

Table 1. Mean and max errors between the scanned skulls and the 3D recovered features, before and after pulling (mm).

Skull | E_before | E_after | E_max,before | E_max,after
  1   |  0.7271  | 0.6271  |    3.1314    |   2.0312
  2   |  0.7903  | 0.5903  |    2.9004    |   2.3032

The mean and max errors are small. When the 3D skull features are not pulled, the mean error is about 13% of the average soft tissue thickness of an adult Vietnamese skull (~5.895 mm, the average soft tissue thickness at 22 anthropometric landmarks on the Vietnamese skull: a vertex, a trichion, two supraorbitals, a glabella, a nasion, two exocanthions, two endocanthions, a rhinion, two infraorbitals, two zygomatics, two alares, a subnasale, two molars, a stomion and a mental), and the max error is about 50%. After pulling the features, these errors reduce significantly: to about 10.3% for the mean error and 37.2% for the max error. Obviously, the pulled 3D features are better than the reconstructed ones.

5 Conclusions

When taking pictures by moving the camera around an object, we have found an error, which we call shift error, in the same direction as the viewpoint movement. We analysed the cause of the shift error and showed mathematically its effect on the 3D reconstructed landmarks of the skull. We then proposed an effective solution to the problem, which reduces the error of the reconstructed 3D landmarks relative to the original scanned skull. The experiments consolidate these conclusions.

Acknowledgment

This work is supported by the project Towards a Model of an "Intelligent Office Environment", No. QGTD.10.23.

References

[1] W.-J. Lee, C. M. Wilkinson, and H.-S. Hwang, "An accuracy assessment of forensic computerized facial reconstruction employing cone-beam computed tomography from live subjects," Journal of Forensic Sciences, 2011.
[2] K. M. Archer, "Craniofacial reconstruction using hierarchical B-spline interpolation," Master's thesis, University of British Columbia, Department of Electrical and Computer Engineering, 1997.
[3] S. Michael and M. Chen, "The 3D reconstruction of facial features using volume distortion," in Proc. 14th Eurographics UK Conference, pp. 297-305, 1996.
[4] G. Quatrehomme, S. Cotin, G. Subsol, H. Delingette, Y. Garidel, G. Grevin, M. Fidrich, P. Bailet, and A. Ollier, "A fully three dimensional method for facial reconstruction based on deformable models," Journal of Forensic Science, pp. 649-652, 1997.
[5] P. Vanezis, M. Vanezis, G. McCombe, and T. Niblett, "Facial reconstruction using 3-D computer graphics," Journal of Forensic Science, vol. 81, no. 2, pp. 81-95, 2000.
[6] J. S. Rhine and C. E. Moore, "Tables of facial tissue thickness of American Caucasoids in forensic anthropology," Maxwell Museum Technical Series 1, 1984.
[7] K. Kähler, J. Haber, and H.-P. Seidel, "Reanimating the dead: Reconstruction of expressive faces from skull data," ACM TOG (SIGGRAPH conference proceedings), vol. 23, no. 3, July 2003.
[8] Q. H. Dinh, C. T. Ma, T. D. Bui, T. T. Nguyen, and D. T. Nguyen, "Facial soft tissue thicknesses prediction using anthropometric distances," in Proceedings of ACIIDS 2011, 2011.
[9] I. Biederman and P. Kalocsai, "Neural and psychophysical analysis of object and face recognition," in Face Recognition: From Theory to Applications, NATO ASI Series F, Springer-Verlag, 1998.
[10] J. Prag and R. Neave, Making Faces. London, UK: British Museum Press, 1997.
[11] C. Snow, B. Gatliff, and K. McWilliams, "Reconstruction of facial features from the skull: an evaluation of its usefulness in forensic anthropology," Am. J. Phys. Anthropol., vol. 33, no. 2, 1970.
[12] M. Gerasimov, The Face Finder. New York, NY: Lippincott, 1971.
[13] R. Helmer, S. Röhricht, D. Petersen, and F. Möhr, "Assessment of the reliability of facial reconstruction," in Forensic Analysis of the Skull: Craniofacial Analysis, Reconstruction, and Identification. New York: Wiley-Liss Publishers, pp. 75-83, 1993.
[14] C. Wilkinson and D. Whittaker, "Juvenile forensic facial reconstruction: a detailed accuracy study," in Proceedings of the 10th Conference of the International Association of Craniofacial Identification, pp. 11-14, September 2002.
[15] G. Quatrehomme, T. Balaguer, P. Staccini, and V. Alunni-Perret, "Assessment of the accuracy of three-dimensional manual craniofacial reconstruction: a series of 25 controlled cases," Int. J. Legal Med., vol. 121, no. 6, pp. 469-475, 2007.
[16] B. Horn and M. Brooks, Shape from Shading. Cambridge, MA: MIT Press, 1989.
[17] J. Aloimonos, "Shape from texture," Biological Cybernetics, vol. 58, no. 5, pp. 345-360, 1988.
[18] G. Healey and T. O. Binford, "Local shape from specularity," Computer Vision, Graphics, and Image Processing, vol. 42, pp. 62-86, 1988.
[19] F. Ulupinar and R. Nevatia, "Shape from contour: Homogeneous generalized cylinders and constant cross section generalized cylinders," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 2, 1995.
[20] S. Winkelbach and F. M. Wahl, "Shape from 2D edge gradients," in Proceedings of the 23rd DAGM Symposium on Pattern Recognition, 2001.
[21] F. Solomon and K. Ikeuchi, "Extracting the shape and roughness of specular lobe objects using four light photometric stereo," IEEE, 1992.
[22] J. Meng and J. Zhu, "Recovering 3D face models by a USB camera and a lamp," CS682 Digital Image Processing Term Project Report, 2006.
[23] R. L. Hsu and A. K. Jain, "Face modeling for recognition," in Proc. Int'l Conf. Image Processing (ICIP), vol. 2, pp. 693-696, 2001.
[24] A. Ansari and M. Abdel-Mottaleb, "3-D face modeling using two views and a generic face model with application to 3-D face recognition," in IEEE Conf. on Advanced Video and Signal Based Surveillance, pp. 203-222, 2003.
[25] M. Z. Linna, M. Xiangyong, and Z. Yangsheng, "Image-based 3D face modeling," in Proc. of Int'l Conf. on Computer Graphics, Imaging and Visualization, pp. 165-168, July 2004.
[26] V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces," in Proc. of SIGGRAPH '99, pp. 187-194, August 1999.
[27] H. Guo, J. Jiang, and L. Zhang, "Building a 3D morphable face model by using thin plate splines for face reconstruction," LNCS, vol. 3338, pp. 258-267, 2004.
[28] Y. Hu, D. Jiang, S. Yan, L. Zhang, and H. Zhang, "Automatic 3D reconstruction for face recognition," in Proc. 6th IEEE Int'l Conf. on Automatic Face and Gesture Recognition, pp. 843-848, 2004.
[29] Z. Zhang, Z. Liu, D. Adler, M. F. Cohen, E. Hanson, and Y. Shan, "Robust and rapid generation of animated faces from video images: A model-based modeling approach," International Journal of Computer Vision, vol. 58, no. 2, pp. 93-119, 2006.
[30] T. Russ, C. Boehnen, and T. Peters, "3D face recognition using 3D alignment for PCA," in IEEE Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 1391-1398, 2006.
[31] C. Harris and M. Stephens, "A combined corner and edge detector," in Alvey Vision Conference, pp. 147-152, 1988.
[32] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.