BUILD THE DEPTH IMAGE USING STEREO IMAGES
University: University of Engineering and Technology, Vietnam National University, Hanoi
Entry for the "Student Scientific Research" Award, 2012
Student name: Nguyễn Văn Hoan (Nguyen Van Hoan)
Gender: Male
Class: K53CA, QHI 2008
Faculty: Computer Science
Supervisor: Dr. Lê Thanh Hà (Le Thanh Ha)
Ha Noi, 2012

Abstract

Building depth maps from images plays an extremely important role in the field of machine vision, and stereo vision is one of the most direct ways of obtaining them from image data. Stereo vision has a wide range of potential application areas, including three-dimensional reconstruction and robot vision. Depending on the characteristics of the image data set, a set of constraints can be used to guide the correspondence algorithms. For example, if camera calibration is available, epipolar constraints can be used. If the images are captured under constrained lighting conditions, they can display photometric properties that allow direct pixel matching. All of these factors influence the quality of the depth map produced by an algorithm. Past research has successfully solved a wide variety of stereo matching problems, but by using combinations of the above constraints. Stereo matching without these constraints therefore remains a hard problem. In this report, I give an overview of the depth map building process, evaluate some successful stereo matching algorithms, and develop a solution for building depth maps from images captured by a Minoru webcam without any of these constraints.

Outline

I. Introduction
  I.1 Depth Map Concept
  I.2 Application of Depth Map (3D Image)
  I.3 Some Three Dimensional Data Acquisition Systems
II. Stereo Vision Technique
  II.1 Stereo Vision Concept
  II.2 Disparity Map
  II.3 Correspondence
  II.4 Epipolar Constraint
III. Stereo Correspondence Algorithms
  III.1 Stereo Correspondence Algorithm Classification
  III.2 Vision Correspondence Algorithm Evaluation
IV. Build Depth Map of Minoru Stereo Camera's Image
  IV.1 Minoru Stereo Camera Overview
  IV.2 Build Depth Map Using Minoru Image
    a. Calibration and Rectification Process
    b. Algorithm for Build Depth Map
    c. Experimental Result
V. Conclusion
Reference

I. Introduction

I.1 Depth Map Concept

In three-dimensional computer graphics, a depth map (or 3D image) is an image that contains information about the distance of the surfaces of scene objects from the viewpoint of a camera. A depth map is similar to a grayscale image, except that each pixel stores z (depth) information in place of intensity. The "Z" here relates to the convention that the central axis of view of a camera is in the direction of the camera's Z axis, and not to the absolute Z axis of a scene.

Example: a cubic structure and its depth map. Surfaces nearer the camera are brighter.

I.2 Application of Depth Map (3D Image)

A depth map or 3D image has advantages over a 2D image:

Explicit geometry: Two-dimensional images give limited information about the physical shape and size of an object in a scene. Three-dimensional images express the geometry in terms of three-dimensional coordinates, so the size and shape of an object in a scene can be computed directly from its coordinates.

A depth map has a number of uses, including:
- Simulating the effect of uniformly dense semi-transparent media within a scene.
- Making the rendering of 3D scenes more efficient.
- Locating objects hidden from view and building a visual map of the real world. These applications span areas including architecture, medicine, robotics and augmented reality.
- Shadow mapping, part of one process used to create shadows cast by illumination in 3D computer graphics.
- Providing the distance information needed to create the illusion of 3D viewing through stereoscopy.

I.3 Some Three Dimensional Data Acquisition Systems

With recent technological advances in camera optics, camera calibration, laser sensors and stereo techniques, many three-dimensional data acquisition systems have been developed. The depth maps they produce are reliable and accurate.

Stereo vision technique: Two or more cameras are positioned, calibrated and rectified so that they acquire snapshots of the subject simultaneously. The depth of each point can be calculated from geometrical models by solving a correspondence problem. This method has the lowest cost and is the easiest to use.

Laser sensor: This technique is more accurate, but slower and more expensive than the stereo technique above. The acquisition of a single 3D house scan can take several minutes. These are restricting factors for systems that use this technique.

II. Stereo Vision Technique

II.1 Stereo Vision Concept

As mentioned in the previous section, in the stereo vision technique two or more cameras that are positioned, calibrated and rectified acquire snapshots of the subject at the same time. The depth of each point can be calculated from geometrical models by solving a correspondence problem. The strategy of this technique is based on the functioning of the human visual system, which obtains three-dimensional information from two images and is called stereoscopic or binocular vision. Each eye projects an image on the retina, which is transmitted to the brain, where three-dimensional information is obtained from the disparity between the images.
The real process is clearly more complex than that, since it also involves factors such as reasoning and prior knowledge of objects [Jurgen Leitner]. In the case of two images, for example, depth information is obtained by determining the disparity between corresponding points in the two images.

Figure 2 - Pair of stereo images.

When working with more than two images, disparity is calculated pairwise, and the process is carried out successively for each pair of images. The result is commonly presented as a disparity map in which light gray tones represent near objects and dark tones represent distant objects (Figure 3). It is a two-dimensional representation of the three-dimensional scene, where depth is presented as grayscale.

Figure 3 - Map of disparities.

II.2 Disparity Map

In computer vision, disparity is often treated as synonymous with inverse depth. More recently, several researchers have defined disparity as a three-dimensional projective transformation of 3-D space (X, Y, Z). The enumeration of all possible matches in such a generalized disparity space can be easily achieved with a plane sweep algorithm, which for every disparity d projects all images onto a common plane.

Let P be a point, and let P_L and P_R be its images in the left and right pictures. Let f be the focal length of the two cameras and b the distance between them. Suppose that P has depth Z and lateral displacement X relative to the left camera, and that P_L has coordinate X_L and P_R has coordinate X_R. Figure 4 illustrates this situation.

Figure 4 - Geometry of two identical, parallel stereo cameras with focal length f and distance b between them.
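The setup of Figure 4 can be checked numerically. The following is a minimal sketch, assuming an ideal pinhole model and the sign convention of this section, in which P lies at lateral offset X from the left camera and X + b from the right camera. The function names and the sample values are hypothetical and are not taken from the project.

```python
def project_point(X, Z, f, b):
    """Return (x_left, x_right), the image coordinates of a point P.

    Assumes two identical, parallel pinhole cameras with focal length f
    separated by distance b, with P at lateral offset X from the left
    camera and depth Z (the convention of Figure 4).
    """
    x_left = f * X / Z
    x_right = f * (X + b) / Z
    return x_left, x_right


def depth_from_disparity(d, f, b):
    """Recover depth Z from disparity d using Z = f * b / d."""
    if d <= 0:
        raise ValueError("disparity must be positive to recover a finite depth")
    return f * b / d


# Hypothetical values: f in pixels, b, X and Z in metres.
x_left, x_right = project_point(X=0.3, Z=2.0, f=700.0, b=0.06)
disparity = x_right - x_left
print(disparity, depth_from_disparity(disparity, f=700.0, b=0.06))
```

Because f and b are fixed for a given camera pair, halving the depth doubles the disparity, which is why a disparity map can stand in for an inverse depth map.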
By similar triangles, it follows that

$$\frac{X_L}{f} = \frac{X}{Z} \tag{2}$$

$$\frac{X_R}{f} = \frac{X + b}{Z} \tag{3}$$

For the disparity d, we have:

$$d = X_R - X_L = \frac{f\,b}{Z} \tag{4}$$

[Stefano Mattoccia]

Note that the disparity is directly proportional to the focal length f and to the distance b between the cameras, and is inversely proportional to the depth Z. Since the focal length and the distance between the cameras are constant for a given pair of images, a disparity map provides an inverse representation of the depths of the scene.

II.3 Correspondence

But how can we conclude that, in a pair of images, a point P_L and a point P_R correspond to the same point P? Solving this search for matches is the difficult part of stereo vision. The difficulties come from ambiguities, which include repetitive textures and structures, regions of uniform intensity, uncertainty caused by noise, and partial occlusion, where a point is visible to only one camera because of parallax.

There are basically two ways to look for matches. The first, which gives a higher degree of certainty, is the idea behind feature-based stereo algorithms: the search for correspondences is restricted to features such as edges and corners. This produces a sparse disparity map, which must be interpolated if depth information is needed at points that were not processed. The second searches the image by regions (or areas) that contain enough information to provide an unambiguous correspondence. This approach is known as area-based. It has the advantage of producing a full map without gaps, but with possible errors, since points far from edges and corners have a lower degree of certainty in the correspondence [Elas].

II.4 Epipolar Constraint

The main work of correspondence is finding the matching points between two images. Given a point in one image, where should we search for the corresponding point in the other image? Assume two cameras whose centers of projection are C_L and C_R and whose image planes are L and R.
If P_L is the projection of a point P onto L, then P must lie on the line defined by P_L and C_L, so its projection onto R must lie on a straight line in R, known as the epipolar line (Figure 5). The epipolar geometry is the relationship between points in one image and their corresponding epipolar lines in the other image. This assumption is made by almost all stereo vision algorithms.

[...]... and then build the depth map. Note that I use only the snapshot captured from the Minoru camera, and this snapshot has not been calibrated or rectified.

Figure 6: Minoru image that has not been calibrated or rectified

IV.2 Build Depth Map Using Minoru Image

a. Calibration and Rectification Process

As mentioned above, the epipolar constraint on the input images plays a vital role in the correspondence process for building...

natural image input. A natural image input means an image that has not been calibrated or rectified. However, because standard or ground-truth depth maps have not yet been produced, it is hard to evaluate the experimental result. Another issue is that my project has only carried out the rectification step for each individual lens of the stereo camera. On the other hand, the resulting depth map is quite good and sharp. This can be the input of other...

employed. In my project, I use a chessboard consisting of 10 rows and 7 columns.

Figure 7: The chessboard of size 10*7

The method has two main steps. The first step is shift rectification. In this step, the vertical alignment of the two images is adjusted so that they mostly satisfy the epipolar constraint, meaning that some areas of the image satisfy the epipolar constraint, as seen in the figure below:

Figure 8: The left is before shift...
rectification: Input image: Depth result:

After applying shift rectification: Input image: Depth image: sharper and better.

After shift rectification and calibration of the two individual lenses: Input image: Depth image: sharper still.

Because I cannot create the ground truth of the scene, the depth at each step is only evaluated visually.

V. Conclusion

In conclusion, my work has achieved some success when using calibration...

Nguyen] and the image data should cover all orientations. After many attempts, the method obtained good intrinsic and distortion matrices, since the final result (the depth map) is quite good.

Figure 10: Intrinsic and distortion matrices of the right and left lenses of the Minoru camera

b. Algorithm for Build Depth Map

The algorithm I use in my project to build the depth image is a function in the OpenCV library. The function...

faster than the global algorithms.

IV. Build Depth Map of Minoru Stereo Camera's Image

IV.1 Minoru Stereo Camera Overview

Unlike most webcams, which capture regular photos, this new gadget captures still photos and live video in 3D. The three-dimensional effect is achieved thanks to the recipient's special 3D glasses. The webcam uses two lenses, which enable 3D capture. The reason for the dual input...

FindStereoCorrespondence
Calculates disparity for a stereo pair.

    cvFindStereoCorrespondence(
        const CvArr* leftImage, const CvArr* rightImage,
        int mode, CvArr* depthImage, int maxDisparity,
        double param1, double param2, double param3,
        double param4, double param5 );

c. Experimental Result

After applying my method for calibrating and rectifying the camera, the depth image output is better than the result before using...
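To make concrete what a local, window-based correspondence search does in principle, the following is a minimal sketch of sum-of-absolute-differences (SAD) block matching along image rows. It is not the OpenCV implementation and is not the code used in this project. It assumes rectified grayscale images stored as lists of rows, and the common convention that a left-image pixel at column x matches the right-image pixel at column x - d; all names are hypothetical.

```python
def sad(left, right, y, x_left, x_right, half):
    """Sum of absolute differences between two square windows.

    The windows have side 2 * half + 1 and are centred on
    (y, x_left) in the left image and (y, x_right) in the right image.
    """
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x_left + dx] - right[y + dy][x_right + dx])
    return total


def block_match(left, right, max_disparity, half=1):
    """Return a disparity map as a list of rows; border pixels stay 0."""
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(half, height - half):
        for x in range(half, width - half):
            best_d, best_cost = 0, None
            # Limit candidates so the right-image window stays inside the image.
            for d in range(0, min(max_disparity, x - half) + 1):
                cost = sad(left, right, y, x, x - d, half)
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disparity[y][x] = best_d
    return disparity
```

The two parameters that have to be tuned per image pair are visible here: max_disparity bounds the search range, and half sets the window size. A larger window is more robust to noise but blurs depth edges.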
rectification, the right is after shift rectification.

Many regions still do not satisfy the constraint: one real point lies at different vertical positions in the right and left images. These are effects of distortion and of the lack of calibration.

Figure 9: Many regions do not satisfy the epipolar constraint

The second step calculates some intrinsic parameters of the camera. This step allows for accurate alignment of the images...

enables the evaluation of the disparity map. All the stereo correspondence algorithms are in the OpenCV library. For data sets, I collected several standard two-frame stereo data sets with ground truth from the website http://vision.middlebury.edu/stereo/

About the OpenCV library: OpenCV, which stands for Open Source Computer Vision, is an open library of programming functions for computer vision. The...

correspondence algorithms in the following:
- The best settings for many parameters of many correspondence algorithms vary depending on the input image pair, so I often have to compromise and select a value that works reasonably well for several of the images tested. Examples are how to choose the best max disparity in some global algorithms and the best window size in some local algorithms.
- The local correspondence