Vision Systems: Applications (Part 5)

Bearing-Only Vision SLAM with Distinguishable Image Features

[...] landmark in the database are provided by the frame memory. For every new observation of a landmark, its descriptor is compared to the existing ones and is used to augment the descriptor list if it is sufficiently different. SIFT point descriptors are not globally unique (see Figure 2 again), so matching a single observation to a landmark is doomed to cause false matches in a realistic indoor environment. However, using a large number of SIFT descriptors has proven to give robust matching results in object recognition applications. This is why we store, along with the landmark descriptor associated with the location of the landmark, the rest of the descriptors extracted from the same frame and use these for verification. We refer to the rest of the feature points in a frame as recognition features, to distinguish them from the location feature associated with the location of the landmark.

The structure of the database is shown on the right-hand side of Figure 3. Each landmark F_1, F_2, ..., F_N has a set of location descriptors, shown in the dashed box. A KD-tree representation and a Best-Bin-First search (Beis & Lowe, 1997) allow for real-time matching between new image feature descriptors and those in the database. Each location descriptor has a set of recognition descriptors, shown to the right. When we match to the database, we first look for a match between a single descriptor in the new frame and the location descriptors of the landmarks (dashed box in Figure 3). As a second step, we match all descriptors in the new frame to the recognition descriptors associated with candidate location descriptors, for verification. As a final test, we require that the displacement in image coordinates of the two location features (new frame and database) is consistent with the transformation between the two frames estimated from the matched recognition descriptors (new frame and database). This ensures not merely that two similar structures exist in the scene but that they are at the same position as well. Currently, the calculation is simplified by checking the 2D image point displacement. This final confirmation eliminates matches that are close in the environment and thus share recognition descriptors, as would be the case with the glass windows in Figure 2.
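A minimal sketch of this two-stage matching pipeline is given below. All names and thresholds are illustrative assumptions rather than the authors' implementation; the chapter uses a KD-tree with Best-Bin-First search for the first stage, which is replaced here by a brute-force nearest-neighbour search, and the final 2D displacement consistency check is omitted for brevity.

```python
import numpy as np

class Landmark:
    """Hypothetical database entry: one location feature plus the
    recognition features extracted from the same frames."""
    def __init__(self):
        self.location_descriptors = []     # one descriptor per distinct view
        self.recognition_descriptors = []  # all other descriptors from those frames

def nearest_distance(descriptor, candidates):
    """Distance to the closest candidate descriptor (brute force; the
    chapter uses a KD-tree with Best-Bin-First search instead)."""
    diffs = np.asarray(candidates) - np.asarray(descriptor)
    return float(np.min(np.linalg.norm(diffs, axis=1)))

def match_to_database(frame_descriptors, database, loc_thresh=0.3, min_verified=8):
    """Stage 1: match each new descriptor against the landmarks' location
    descriptors.  Stage 2: verify a candidate landmark by matching *all*
    descriptors of the new frame against its recognition descriptors."""
    matches = []
    for j, desc in enumerate(frame_descriptors):
        for lm_id, lm in database.items():   # database: {landmark_id: Landmark}
            if nearest_distance(desc, lm.location_descriptors) < loc_thresh:
                verified = sum(
                    1 for f in frame_descriptors
                    if nearest_distance(f, lm.recognition_descriptors) < loc_thresh)
                if verified >= min_verified:  # enough supporting recognition features
                    matches.append((j, lm_id))
    return matches
```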
7. SLAM

The previous sections have explained how we track features between frames to determine which make good landmarks, and how these are added to, represented in, and matched against the database. In our current system, we use an EKF-based implementation of SLAM. It is, however, important to point out that the output from the frame memory could be used as input to any number of different SLAM algorithms. It is possible to use a standard EKF, despite its complexity limitations, because most features extracted from the frames have been discarded by the matching and quality assessment process in the frame memory. Even though hundreds of features are extracted in each frame, only a fraction of these are used for estimation. We are also able to supply the approximate 3D location of new landmarks, so that no special arrangement for this has to be added to the SLAM algorithm. This also makes plug-and-play of SLAM algorithms easier. We use the same SLAM implementation that was used in (Folkesson et al., 2005), which is part of the freely available CURE/toolbox software package. In (Folkesson et al., 2005) it was used for vision SLAM with a camera pointing up at the ceiling.

To summarize, the division is such that the SLAM process is responsible for estimating the location of a landmark and the database for its appearance.

8. Experimental Evaluation

Figure 4. The PowerBot platform with the Canon VC-C4 camera

The camera used in the experimental evaluation is a Canon VC-C4 mounted at the front of a PowerBot platform from MobileRobotics Inc (see Figure 4). The experimental robot platform has a differential-drive base with two rear caster wheels. The camera was tilted slightly upward to reduce the amount of floor visible in the image. The field of view of the camera is about 45 degrees in the horizontal plane and 35 degrees in the vertical plane, which is relatively small. In addition, the optical axis is aligned with the direction of motion of the platform so that the camera can also be used for other navigation tasks. The combination of a small field of view and motion predominantly along the optical axis makes it hard to generate large baselines for triangulation.

The experimental evaluation shows how we are able to build a map of the environment with few but high-quality landmarks, and how loop-closing detection is performed. The setting for the experiment is an area around an atrium that consists of loops of varying sizes. We let the robot drive 3 laps, following approximately, but not exactly, the same path. Each lap is about 30 m long. The trajectory, along with the resulting map, is shown in Figure 5. The landmarks are shown as small squares. Overlaid on the vision-based map is a map built using a laser scanner (the lines). This second map is provided only as a reference for the reader; the laser scanner was not used at all in the vision experiments. Figure 6 shows the situation when the robot closes the loop for the first time. The lines protruding from the camera point out the points that are matched. Figure 7 shows one of the first acquired images, along with the image in which the two matches shown in Figure 6 were found just as the loop is closed for the first time.

There are a number of important observations to be made. First, there are far fewer landmarks than typically seen in maps built using point landmarks and vision, see e.g. (Sim et al., 2005, Se et al., 2002). We can also see that the landmarks are well localized, as they fall close to the walls. Notice that some of the landmarks are found on lamps hanging from the ceiling and that the area in the upper left corner of Figure 6 is quite cluttered. It is a student study area and has structures at many different depths; a photo of this area is shown in Figure 8. The line picked up by the laser scanner is the lower part of the bench where people sit, not the wall behind it. This explains why many of the points in this area do not fall on the laser-based line. Some of the spread of the points can also be explained by the small baseline: the depth error is inversely proportional to the baseline (Hartley & Zisserman, 2000).

Figure 5. The landmark map with the trajectory and reference laser-based map
Figure 6. Situation when the first loop is closed. Lines show matched points

Another observation is that the final map contained 113 landmarks and that most of these (98) were added to the map during the first loop. This indicates that landmarks were matched to the database rather than added to the map as new ones; had this not been the case, one would have expected to see roughly 3 times the number of landmarks.
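The claim above that the depth error is inversely proportional to the baseline can be made concrete with the standard first-order two-view model (a textbook result, not taken from the chapter; here f is the focal length, b the baseline, and d the disparity):

```latex
% Depth from disparity, and first-order propagation of a disparity error
\[
  Z = \frac{f\,b}{d}
  \quad\Longrightarrow\quad
  \delta Z \approx \left|\frac{\partial Z}{\partial d}\right|\,\delta d
          = \frac{f\,b}{d^{2}}\,\delta d
          = \frac{Z^{2}}{f\,b}\,\delta d .
\]
```

For a fixed pixel-level matching error δd, the depth uncertainty thus grows with the square of the depth and shrinks only linearly with the baseline, which is why a forward-looking, small-baseline configuration spreads the estimated points in depth.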
As many as half of the features in each frame typically do not match any of the old features in the frame memory and are therefore matched against the database. A typical landmark in the database has around 10 descriptors, acquired from different viewing angles. Matching to the database uses the KD-tree in the first step, which makes that step fast and typically results in only a few possible matching candidates.

Figure 7. One of the matched points in the first loop detection (compare to Figure 6)
Figure 8. Cluttered area in upper right corner of Figure 5

In the experiments, an image resolution of 320x240 was used and images were grabbed at 10 Hz. Images were added to the frame buffer when the camera had moved more than 3 cm and/or turned more than 1 degree. The entire experimental sequence contained 2611 images, of which roughly half were processed. The total time for the experiment was 8 min 40 s and the processing time was 7 min 7 s on a 1.8 GHz laptop, which shows that the system can operate under real-time conditions.

9. Conclusions and Future Work

To enable autonomy in robotic systems, we have to equip them with the ability to build a map of the environment using natural landmarks and to use that map for localization. Most robotic systems capable of SLAM presented so far in the literature have relied on range sensors such as laser scanners and sonars. For large-scale, complex environments with natural landmarks, SLAM is still an open research problem. More recently, the use of cameras and machine vision as the only exteroceptive sensor has become one of the most active areas of research in SLAM.

The main contributions presented in this chapter are the feature selection and matching mechanisms that allow for real-time performance even with an EKF implementation of SLAM. One of the key insights is to use few, well-localized, high-quality landmarks to acquire good 3D position estimates, and then use the power of the many in the matching process by including all features in a frame for verification. Another contribution is our use of a rotationally variant feature descriptor to better deal with the symmetries that are often present in indoor environments. An experimental evaluation was presented on data collected in a real indoor environment. Comparing the landmarks in the map built using vision with a map built using a laser scanner showed that the landmarks were accurately positioned.

As part of future research, we plan to investigate how the estimation process can be improved by active control of the pan-tilt degrees of freedom of the camera on the robot. Through such coupling, the baseline can actively be made larger to improve triangulation and estimation results. It would also allow the system to use good landmarks, otherwise not in the field of view, to improve the localization accuracy and thus the map quality.

10. References

Beis, J.S.; Lowe, D.G. (1997). Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1000-1006.
Castellanos, J.A.; Tardós, J.D. (1999). Mobile Robot Localization and Map Building: A Multisensor Fusion Approach, Kluwer Academic Publishers.
Davison, A.J. (2003). Real-time simultaneous localisation and mapping with a single camera. Proceedings of the International Conference on Computer Vision (ICCV).
Dissanayake, G.; Newman, P.; Clark, S.; Durrant-Whyte, H.F.; Csorba, M. (2001). A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17, 3, 229-241.
Folkesson, J.; Christensen, H.I. (2004). Graphical SLAM - a self-correcting map. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
Folkesson, J.; Jensfelt, P.; Christensen, H.I. (2005). Vision SLAM in the measurement subspace. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
Frese, U.; Schröder, L. (2006). Closing a million-landmarks loop. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Goncalves, L.; di Bernardo, E.; Benson, D.; Svedman, M.; Ostrowski, J.; Karlsson, N.; Pirjanian, P. (2005). A visual front-end for simultaneous localization and mapping. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 44-49.
Gutmann, J.; Konolige, K. (1999). Incremental mapping of large cyclic environments. Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, pp. 318-325.
Hartley, R.; Zisserman, A. (2000). Multiple View Geometry in Computer Vision, Cambridge University Press, ISBN: 0521623049.
Kwok, N.M.; Dissanayake, G.; Ha, Q.P. (2005). Bearing-only SLAM using a SPRT-based Gaussian sum filter. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
Lemaire, T.; Lacroix, S.; Sola, J. (2005). A practical 3D bearing-only SLAM algorithm. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2757-2762.
Lowe, D.G. (1999). Object recognition from local scale-invariant features. Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1150-1157.
Luke, R.H.; Keller, J.M.; Skubic, M.; Senger, S. (2005). Acquiring and maintaining abstract landmark chunks for cognitive robot navigation. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Mikolajczyk, K.; Schmid, C. (2001). Indexing based on scale invariant interest points. Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 525-531.
Mikolajczyk, K.; Schmid, C. (2003). A performance evaluation of local descriptors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 257-263.
Newman, P.; Ho, K. (2005). SLAM-loop closing with visually salient features. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 644-651.
Nistér, D.; Stewénius, H. (2006). Scalable recognition with a vocabulary tree. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Se, S.; Lowe, D.G.; Little, J. (2002). Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. Journal of Robotics Research, 21, 8, 735-758.
Sim, R.; Elinas, P.; Griffin, M.; Little, J. (2005). Vision-based SLAM using the Rao-Blackwellised particle filter. Proceedings of the Workshop on Reasoning with Uncertainty in Robotics (IJCAI).
Tardós, J.D.; Neira, J.; Newman, P.M.; Leonard, J.J. (2002). Robust mapping and localization in indoor environments using sonar data. Journal of Robotics Research, 4.
Thrun, S.; Fox, D.; Burgard, W. (1998). A probabilistic approach to concurrent mapping and localization for mobile robots. Autonomous Robots, 5, 253-271.
Thrun, S.; Liu, Y.; Koller, D.; Ng, A.; Ghahramani, Z.; Durrant-Whyte, H. (2004). SLAM with sparse extended information filters. Journal of Robotics Research, 23, 8, 690-717.
Vidal-Calleja, T.; Davison, A.J.; Andrade-Cetto, J.; Murray, D.W. (2006). Active control for single camera SLAM. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 1930-1936.

10. An Effective 3D Target Recognition Imitating Robust Methods of the Human Visual System

Sungho Kim and In So Kweon
Korea Advanced Institute of Science and Technology, Korea

1. Introduction

Object recognition is an important research topic in computer vision. Not only is it the ultimate goal of computer vision, it is also useful in many applications, such as automatic target recognition (ATR), mobile robot localization, visual servoing, and guiding visually impaired people. Great progress has been made in this field during the last 30 years. From 1970 to 1990, research focused on the recognition of machine parts or polyhedral objects using edge or line information (Lowe, 1987, Faugeras & Hebert, 1986). 2D invariant-feature and hashing-based object recognition was popular during the 1990s (Mundy & Zisserman, 1992, Rothwell, 1993). Since the mid 1990s, view- or appearance-based methods have become a popular approach in computer vision (Murase & Nayar, 1995). Current issues include how to select features, handle occlusion, and cope with photometric and geometric image distortions. Recently, object recognition methods based on local visual patches have shown successful performance under such environmental changes (Lowe, 2004, Rothganger et al., 2004, Fergus et al., 2003). But these approaches work only on textured, complex objects and do not provide 3D pose information for the objects of interest.

The goal of our research is to obtain the identity and pose of 3D objects or targets from either a visible or an infrared band sensor in a cluttered environment. The conventional approaches mentioned above do not provide satisfactory results. To achieve this goal more effectively, we pay attention to the perception mechanism of the human visual system (HVS), which shows the best efficiency and robustness to the above-mentioned problems. In particular, we focus on the components of HVS robustness.

2. Robust Properties of HVS

How do humans recognize objects robustly in a severe environment? What mechanisms make the successful recognition of 3D objects possible? Motivated by these questions, we surveyed recent psychophysical, physiological, and neurobiological evidence and drew the following conclusions.

2.1 Visual object representation in the human brain

The HVS uses both view-based and model-based object representations (Peters, 2000). Initially, novel views of an object are memorized, and an object-centered model is generated through training on many view-based representations. Further support for this comes from the observation that different visual tasks may require different types of representations: for identification, view-based representations are sufficient, whereas 3D volume-based (or object-centered) representations are especially useful for visual guidance of interactions with objects, such as grasping them. In this paper, the goal is object identification and pose estimation for grasping by a service robot.
Therefore, both representations are suitable for our task.

2.2 Cooperative bottom-up and top-down information

According to (Nichols & Newsome, 1999), not only the bottom-up process but also top-down information plays a crucial role in object recognition. The bottom-up process, also called the image-based, data-driven, or discriminative process, begins with the visual information, analyses smaller perceptual elements, and then moves to higher levels. The top-down process is called knowledge-based, task-dependent, or generative perception. This process, which includes high-level context information (e.g., place information) and expectations of global shape, has an influence on object recognition (Siegel et al., 2000, Bar, 2004). An image-based model is thus appropriate for the bottom-up process and place context, while an object-centered 3D model is suitable for the top-down process. Spatial attention is used to integrate separate feature maps in each process. Detailed investigations in physiology and anatomy have disclosed many important functions of the bottom-up process. Although the understanding of the neural mechanisms of top-down effects is still poor, it is certain that object recognition is affected by both processes, guided by the attention mechanism.

2.3 Robust visual feature extraction

(1) Hierarchical visual attention (Treisman, 1998): The HVS utilizes three kinds of hierarchical attention: spatial, feature, and object. We utilize these attentions in the proposed system: spatial attention is performed on high-curvature points such as Harris corners (a minimal sketch of the corner response follows this list), feature attention operates on local Zernike moments, and 3D object attention is handled by the top-down process.
(2) Feature binding (Treisman, 1998): The binding problem concerns the way in which we select and integrate the separate features of objects in the correct combinations. Separate feature maps are bound by spatial visual attention. In the bottom-up process, we bind an edge map with a selected corner map and generate local structural parts. In the top-down process, we bind a gradient orientation map with a gradient magnitude map, focusing on a CAD model position.
(3) Contrast mechanism (VanRullen, 2003): The important information is not the amplitude of a visual signal but the contrast between this amplitude at a given point and at the surrounding locations. This holds throughout the whole recognition process.
(4) Size-tuning process (Fiser et al., 2001): During object recognition, the visual system can tune in to an appropriate size sensitive to spatial extent, rather than to variations in spatial frequency. We use this concept for the automatic scale selection of the Harris corner.
(5) Part-based representation (Biederman, 1987): Visual perception can be performed from part information, as supported by RBC (recognition-by-components) theory. This is related to the properties of the V4 receptive field, where convex parts are used to represent visual information (Pasupathy & Connor, 2001). A part-based representation is very robust to occlusion and background clutter. We represent visual appearance by a set of robust visual parts.
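The sketch below gives a minimal Harris corner response (Harris & Stephens, 1988), the spatial attention cue referenced in item (1). The smoothing scale sigma stands in for the automatic scale selection mentioned in item (4); all parameter values are conventional defaults, not the authors' settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, sigma=1.5, k=0.04):
    """Harris corner response map; large values mark high-curvature points."""
    img = image.astype(float)
    ix = sobel(img, axis=1)                 # horizontal image gradient
    iy = sobel(img, axis=0)                 # vertical image gradient
    # Entries of the second-moment matrix, smoothed over a Gaussian window.
    sxx = gaussian_filter(ix * ix, sigma)
    syy = gaussian_filter(iy * iy, sigma)
    sxy = gaussian_filter(ix * iy, sigma)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace          # classic Harris cornerness measure
```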
Motivated by these facts, many computational models have been proposed in computer vision. Researchers in model-based vision regarded the bottom-up/top-down processes as hypothesis/verification paradigms (Kuno et al., 1988, Zhu et al., 2000). To reduce computational complexity, a visual attention mechanism has been used (Milanese et al., 1994), and top-down constraints have been used to recognize faces and poses (Kumar, 2002). Recently, an interesting computational model (HMAX) was proposed, based on the tuning and max operations of simple cells and complex cells, respectively (Serre & Riesenhuber, 2004). In the computer vision community, Tu et al. proposed a method unifying segmentation, detection, and recognition using boosting and learned prior information (Tu et al., 2005). Although these approaches have their own advantages, they model only partial evidence of human visual perception and do not look closely at the robust properties of the HVS. In this paper, we propose a computationally plausible model of 3D object recognition that imitates the above properties of the HVS. Bottom-up and top-down information is processed by a visual attention mechanism and integrated under a statistical framework.

3. Graphical Model of 3D Object Recognition

3.1 Problem definition

A UAV (unmanned aerial vehicle) system, such as a guided missile, has to recognize an object ID (identity) and its pose from a single visible or infrared band sensor. The goal of this paper is to recognize the target ID and its pose in a UAV system using a forward-looking visible or infrared camera; the object pose information is necessary for precise targeting. We want to find the object identity ($\theta_{ID}$), the object pose ($\theta_C$: $\theta_{yaw}$, $\theta_{pitch}$, $\theta_{roll}$) relative to the camera coordinates in the 3D world, and the object position ($\theta_P$: $\theta_x$, $\theta_y$) and scale ($\theta_D$) in the 2D image. This information is useful in various applications. Similar processes exist in the primary visual cortex: the ventral stream (the "what" pathway) and the dorsal stream (the "where" pathway). The recognition problem can be formulated as Bayesian inference:

\[
P(\theta \mid I) = P(\theta \mid Z_L, Z_C)
\propto P(Z_L \mid \theta, Z_C)\, P(\theta \mid Z_C)
= P(Z_L \mid \theta_{ID}, \theta_C, \theta_D, \theta_P, Z_C)\,
  P(\theta_{ID}, \theta_C, \theta_D, \theta_P \mid Z_C),
\qquad I = \{Z_L, Z_C\}
\tag{1}
\]

where $\theta$ denotes the parameter set explained above, and the input image $I$ is composed of two sets: $Z_L$, the object-related local features, and $Z_C$, the place- or scene-related contextual features. The first factor of equation (1), the likelihood $P(Z_L \mid \theta, Z_C)$, represents the distribution of local features, such as local structural patches and edge information, given the parameters and the contextual information. There is a great deal of possible contextual information, but we restrict it to place context and a 3D global shape, in line with our final goal; this information reduces the search space and provides accurate pose information. The second factor, $P(\theta \mid Z_C)$, provides context-based priors on object ID and pose, which are related to the scene information by learning. This can be represented as a graphical model in the general form of Figure 1 (Borgelt et al., 2001). Scene context information can be estimated in a discriminative way using the contextual features $Z_C$. Using the learned priors relating scenes and objects, initial object probabilities can be obtained from the sensor observation. Initial pose information is also estimated discriminatively. Given those initial parameters, fine pose tuning is performed using the 3D global shape and sensor measurements, such as gradient magnitude and gradient orientation.
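A hypothetical sketch of how a candidate parameter set could be scored under the factorization in equation (1) follows. The likelihood and context-prior functions stand for models that the chapter describes only qualitatively; all names here are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Params:
    """The 7-dimensional parameter set theta from Section 3.1."""
    obj_id: int      # theta_ID: object identity
    yaw: float       # theta_C: pose relative to the camera
    pitch: float
    roll: float
    x: float         # theta_P: 2D image position
    y: float
    scale: float     # theta_D: 2D image scale

def posterior_score(theta, z_local, z_context, likelihood, context_prior):
    """Unnormalized posterior of equation (1):
    P(theta | I) is proportional to P(Z_L | theta, Z_C) * P(theta | Z_C)."""
    return likelihood(z_local, theta, z_context) * context_prior(theta, z_context)
```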
Figure 1. Graphical model of context-based object recognition: shaded circles denote observations and clear circles denote hidden variables. (The figure's nodes comprise the place context $Z_C$, the input features $Z_{L1}, \ldots, Z_{Ln}$, objects $O_i$, $O_j$, view indices $V_{i1}, \ldots, V_{ik}$, and part indices $P_{i1}, P_{i2}, \ldots$)

In the above graphical model, the final parameters can be inferred by a discriminative method (bottom-up reasoning, along the directed arrows) and a generative method (top-down reasoning) with contextual information. To find an optimal solution of equation (1), a MAP (maximum a posteriori) method is generally used. However, it is difficult to obtain a correct posterior in a high-dimensional parameter space (in our case, 7 dimensions). We bypass this problem with a statistical technique, drawing samples using Markov Chain Monte Carlo (MCMC) (Green, 1996). The MCMC method is theoretically well founded and is a suitable global optimization tool for combining bottom-up and top-down information; it shows superiority to genetic algorithms or simulated annealing, although those have some analogies to Monte Carlo methods (Doucet et al., 2001). An MCMC-like mechanism may not exist in the HVS, but it is a practically plausible inference technique in a high-dimensional parameter space. Proposal samples generated from the bottom-up process achieve fast optimization and reduce burn-in time.

3.2 Basics of MCMC

A major problem of Bayesian inference is that obtaining the posterior distribution often requires the integration of high-dimensional functions. The Monte Carlo (or sampling) method approximates the posterior distribution with weighted particles or samples (Doucet et al., 2001, Ristic et al., 2004). The simplest kind is importance sampling, where random samples x are generated from P(X), the prior distribution of the hidden variables, and then weighted by their likelihood P(y|x). A more efficient approach in high dimensions is Markov Chain Monte Carlo (MCMC), closely related to particle filtering. "Monte Carlo" refers to the use of samples, and "Markov chain" means that the transition probability of each sample depends only on the most recent sample value. The theoretical [...]
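As a generic illustration of the sampling scheme just introduced, the following is a textbook random-walk Metropolis-Hastings sketch. It is not the authors' data-driven sampler, whose proposals are generated by the bottom-up process to shorten burn-in; all names and the step size are illustrative.

```python
import numpy as np

def metropolis_hastings(log_posterior, theta0, n_samples, step=0.1, rng=None):
    """Random-walk Metropolis-Hastings: each proposal depends only on the
    current sample (the Markov property described above)."""
    rng = rng or np.random.default_rng()
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.shape)
        logp_new = log_posterior(proposal)
        # Accept with probability min(1, P(proposal) / P(current)).
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
        samples.append(theta.copy())
    return np.asarray(samples)
```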
References

Fiser, J.; Subramaniam, S.; Biederman, I. (2001). Size tuning in the absence of spatial frequency tuning in object recognition. Vision Research, Vol 41, No 15, 1931-1950.
Green, P. (1996). Reversible jump Markov Chain Monte Carlo computation and Bayesian model determination, Chapman and Hall, London.
Harris, C.J.; Stephens, M. (1988). A combined corner and edge detector. Proceedings of the 4th Alvey Vision Conference, 147-151, Manchester.
Kim, S.; Kweon, I.S. (2005). Automatic model-based 3D object recognition ...
Lindeberg, T. (1998). Feature detection with automatic scale selection. International Journal of Computer Vision, Vol 30, No 2, 77-116.
Lowe, D.G. (1987). Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, Vol 31, No 3, 355-395.
Lowe, D.G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, Vol 60, No 2, 91-110.
Milanese, R.; Wechsler, H.; Gil, S.; Bost, J.M.; Pun, T. (1994). Integration of bottom-up and top-down cues for visual attention using non-linear relaxation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 781-785, Seattle, USA, June.
Mundy, J.; Zisserman, A. (1992). Geometric invariance in computer vision, 335-460, MIT Press, Cambridge, MA.
Murase, H.; Nayar, S. (1995). Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision, Vol 14, 5-24.
Nichols, M.J.; Newsome, W.T. (1999). The neurobiology of cognition. Nature, Vol 402, No 2, C35-C38.
Parkhurst, D.; Law, K.; Niebur, E. (2002). Modeling the role of salience in the allocation of overt visual attention. Vision Research, Vol 42, 107-123.
Pasupathy, A.; Connor, C.E. (2001). Shape representation in area V4: position-specific tuning for boundary conformation. Journal of Neurophysiology, Vol 86, No 5, 2505-2519.
Peters, G. (2000). Theories of three-dimensional object perception - a survey. Transworld Research Network, Part-I, Vol 1, 179-197.
Reisfeld, D.; Wolfson, H.; Yeshurun, Y. (1995). Context-free attentional operators: the generalized symmetry transform. International Journal of Computer Vision, Vol 14, No 2, 119-130.
Ristic, B.; Arulampalam, S.; Gordon, N. (2004). Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House Publishers, London, 35-62.
Siegel, M.; Körding, K.P.; König, P. (2000). Integrating top-down and bottom-up sensory processing by somato-dendritic interactions. Journal of Computational Neuroscience, Vol 8, 161-173.
Treisman, A. (1998). Feature binding, attention and object perception. Philosophical Transactions: Biological Sciences, Vol 353, No 1373, 1295-1306.
Tu, Z.; Chen, X.; Yuille, A.; Zhu, S.C. (2005). Image parsing: unifying segmentation, detection and recognition. International Journal of Computer Vision, Vol 63, No 2, 113-140.
VanRullen, R. (2003). Visual saliency and spike timing in the ventral visual pathway. Journal of Physiology (Paris), 97, 365-377.
Zhu, S.C.; Zhang, R.; Tu, Z. (2000). Integrating bottom-up/top-down for object recognition by data driven Markov Chain Monte Carlo. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 738-745, Hilton Head, SC, USA, June.
Robots and Systems (IROS) Vision Systems: Applications 156 Goncavles, L.; di Bernardo, E.; Benson, D.; Svedman, M.; Ostrovski, J.; Karlsson, N.; Pirjanian, P. (20 05) A visual fron-end for simultaneous. and neuro-biological evidences and conclude the following facts: Vision Systems: Applications 158 2.1 Visual object representation in human brain The HVS uses both view-based and model-based
