Visual Odometry and Mapping for Underwater Autonomous Vehicles

4.4. Topological Maps

Tests were performed to validate the proposed mapping system. For example, during a navigation task a set of 1026 frames was captured, and a total of 40,903 feature vectors were extracted from these frames with the SIFT algorithm. To build the map, the 1026 frames and 40,903 keypoints are presented to the SOM. Figure 11 shows the final 2D map, which discretizes the input space of the training samples.

Fig. 11. Topological map generated by ROVFURGII in movement.

a) Building the map: when a new keypoint arrives, the topological map determines the feature vector of the reference node that best matches the input vector. The Growing Cell Structures (GCS) method allows nodes to be created and removed during the learning process. Table 2 shows intermediate GCS adaptation steps, with the number of frames, keypoints and SOM nodes at each step. After the training stage (1026 frames), the Kohonen map represents the relevant, noise-tolerant descriptor space with a reduced number of nodes. This SOM can then be used to locate the robot during navigation.

Frames   Keypoints   Nodes
 324      20353       280
 684      35813       345
1026      44903       443

Table 2. Building the map with the GCS algorithm.

b) Locating the robot on the map: new frames are captured during navigation, and the trained SOM is used to map/locate the robot in the environment. Figure 12 shows the estimated position during a navigation task in which the robot crosses the position 0.0 three times. The figure plots the position estimated both by the SOM map (blue) and by visual odometry alone (red). For these crossings, Table 3 shows the normalized positioning error of each method. The reduced error associated with the SOM localization validates the robustness of the topological approach.

Fig. 12. Distance Y generated by ROVFURGII in movement.

Visual Odometry   SOM
0.33              0.09
0.68              0.35
1.00              0.17

Table 3. Normalized localization errors of visual odometry alone and of the SOM.
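To make the localization step in b) concrete, the sketch below shows one way a frame's SIFT descriptors could be matched against the SOM reference nodes with a best-matching-unit search, and the winning nodes' stored positions averaged into a position estimate. It is a minimal illustration of the idea rather than the chapter's implementation; the node count, the 128-dimensional descriptors, the random placeholder data and the averaging strategy are all assumptions.

```python
import numpy as np

# Hypothetical trained SOM: each node keeps a 128-D SIFT-like codebook vector
# and the 2-D position of the robot at which that node was learned.
NUM_NODES, DESC_DIM = 443, 128            # e.g. 443 nodes after GCS training
node_descriptors = np.random.rand(NUM_NODES, DESC_DIM)
node_positions = np.random.rand(NUM_NODES, 2)

def best_matching_unit(descriptor, codebook):
    """Return the index of the node whose codebook vector is closest to the input."""
    distances = np.linalg.norm(codebook - descriptor, axis=1)
    return int(np.argmin(distances))

def locate(frame_descriptors):
    """Estimate the robot position from one frame's keypoints by averaging
    the stored positions of the winning nodes (one simple strategy)."""
    winners = [best_matching_unit(d, node_descriptors) for d in frame_descriptors]
    return node_positions[winners].mean(axis=0)

# Usage: descriptors of the keypoints detected in the current frame.
current_frame = np.random.rand(40, DESC_DIM)
print("estimated (x, y):", locate(current_frame))
```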
5. Conclusion

This work proposed a new approach to visual odometry and mapping of an underwater robot using only online visual information. The system can be used both for autonomous inspection tasks and to assist the robot's closed-loop control when a human remote operator is in charge. A set of tests was performed under different underwater conditions. The effectiveness of the proposal was evaluated in real scenarios with different levels of turbidity, marine snow, non-uniform illumination and noise, among other conditions. The results show the advantages of SIFT over other methods, such as KLT, due to its invariance to illumination conditions and perspective transformations. The estimated localization is robust when compared with the real pose of the vehicle. Considering time performance, the proposal can be used for online AUV SLAM, even in very extreme sea conditions. The correlations of interest points provided by SIFT were satisfactory, even in the presence of many outliers, i.e., false correlations. To remove these outliers, the fundamental matrix is estimated in a robust way using the RANSAC and LMedS algorithms. The original integration of SIFT and topological maps with GCS for AUV navigation is a promising field. The topological mapping based on Kohonen networks and GCS showed potential for underwater SLAM applications using visual information, due to its robustness to sensory imprecision and its low computational cost. The GCS stabilizes at a limited number of nodes, sufficient to represent a large number of descriptors over a long sequence of frames. The SOM localization shows good results, validating its use together with visual odometry.
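The robust estimation step mentioned above (fundamental matrix with RANSAC or LMedS to discard false SIFT correspondences) can be sketched with OpenCV, which exposes both estimators. This is a generic illustration, not the authors' code; the image file names and the 0.75 ratio-test threshold are arbitrary assumptions.

```python
import cv2
import numpy as np

# Hypothetical pair of consecutive frames from an underwater sequence.
img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Tentative correspondences via a ratio test; some of them are still outliers.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Robust fundamental-matrix estimation; the mask flags the surviving inliers.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
# Swap cv2.FM_RANSAC for cv2.FM_LMEDS to use least-median-of-squares instead.
inliers = mask.ravel() == 1
print(f"{len(good)} tentative matches, {int(inliers.sum())} inliers kept")
```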
6. References

[Arredondo05] Miguel Arredondo and Katia Lebart. A methodology for the systematic assessment of underwater video processing algorithms. Oceans - Europe, 1:362–367, June 2005.
[Bay06] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In 9th European Conference on Computer Vision, pages 404–417, 2006.
[Booij07] O. Booij, B. Terwijn, Z. Zivkovic, and B. Krose. Navigation using an appearance based topological map. In IEEE International Conference on Robotics and Automation, pages 3927–3932, April 2007.
[Centeno07] Mario Centeno. ROVFURG-II: Projeto e construção de um veículo subaquático não tripulado de baixo custo. Master's thesis, Engenharia Oceânica - FURG, 2007.
[Dechter85] Rina Dechter and Judea Pearl. Generalized best-first search strategies and the optimality of A*. Journal of the Association for Computing Machinery, 32(3):505–536, July 1985.
[Dijkstra59] Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.
[Fischler81] Martin Fischler and Robert Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
[Fleischer00] Stephen D. Fleischer. Bounded-Error Vision-Based Navigation of Autonomous Underwater Vehicles. PhD thesis, Stanford University, 2000.
[Fritzke93] Bernd Fritzke. Growing cell structures - a self-organizing network for unsupervised and supervised learning. Technical report, University of California - Berkeley, International Computer Science Institute, May 1993.
[Garcia01] Rafael Garcia, Xavier Cufi, and Marc Carreras. Estimating the motion of an underwater robot from a monocular image sequence. In IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 3, pages 1682–1687, 2001.
[Garcia05] Rafael Garcia, V. Lla, and F. Charot. VLSI architecture for an underwater robot vision system. In IEEE Oceans Conference, volume 1, pages 674–679, 2005.
[Gracias02] N. Gracias, S. Van der Zwaan, A. Bernardino, and J. Santos-Victor. Results on underwater mosaic-based navigation. In IEEE Oceans Conference, volume 3, pages 1588–1594, October 2002.
[Gracias00] Nuno Gracias and Jose Santos-Victor. Underwater video mosaics as visual navigation maps. Computer Vision and Image Understanding, 79(1):66–91, July 2000.
[Hartley04] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2004.
[Kohonen01] Teuvo Kohonen. Self-Organizing Maps. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2001.
[Lowe04] David Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[Mahon04] I. Mahon and S. Williams. SLAM using natural features in an underwater environment. In Control, Automation, Robotics and Vision Conference, volume 3, pages 2076–2081, December 2004.
[Nicosevici07] T. Nicosevici, R. García, S. Negahdaripour, M. Kudzinava, and J. Ferrer. Identification of suitable interest points using geometric and photometric cues in motion video for efficient 3-D environmental modeling. In International Conference on Robotics and Automation, pages 4969–4974, 2007.
[Plakas00] K. Plakas and E. Trucco. Developing a real-time, robust, video tracker. In MTS/IEEE OCEANS Conference and Exhibition, volume 2, pages 1345–1352, 2000.
[Rousseeuw84] Peter Rousseeuw. Least median of squares regression. Journal of the American Statistical Association, 79(388):871–880, December 1984.
[Se02] Stephen Se, David Lowe, and James Little. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. The International Journal of Robotics Research, 21(8):735–758, 2002.
[Se05] Stephen Se, David Lowe, and James Little. Vision-based global localization and mapping for mobile robots. IEEE Transactions on Robotics, 21(3):364–375, June 2005.
[Shi94] Jianbo Shi and Carlo Tomasi. Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition, pages 593–600, 1994.
[Tomasi91] Carlo Tomasi and Takeo Kanade. Detection and tracking of point features. Technical report, Carnegie Mellon University, April 1991.
[Tommasini98] T. Tommasini, A. Fusiello, V. Roberto, and E. Trucco. Robust feature tracking in underwater video sequences. In IEEE OCEANS Conference and Exhibition, volume 1, pages 46–50, 1998.
[Torr97] P. H. S. Torr and D. W. Murray. The development and comparison of robust methods for estimating the fundamental matrix. International Journal of Computer Vision, 24(3):271–300, 1997.
[Xu97] Xun Xu and Shahriar Negahdaripour. Vision-based motion sensing for underwater navigation and mosaicing of ocean floor images. In MTS/IEEE OCEANS Conference and Exhibition, volume 2, pages 1412–1417, October 1997.
A Daisy-Chaining Visual Servoing Approach with Applications in Tracking, Localization, and Mapping

S. S. Mehta, W. E. Dixon (University of Florida, Gainesville, USA)
G. Hu (Kansas State University, Manhattan, USA)
N. Gans (University of Texas, Dallas, USA)

1. Introduction

Recent advances in image processing, computational technology and control theory are enabling vision-based control, localization and mapping to become more prevalent in autonomous vehicle applications. Instead of relying solely on a global positioning system (GPS) or inertial measurement units (IMUs) for navigation, image-based methods are a promising approach to provide autonomous vehicles with position and orientation (i.e., pose) information. Specifically, rather than obtaining an inertial measurement of an autonomous vehicle, vision systems can be used to recast the navigation, localization, control and mapping problems in terms of the image space. Applications involving localization and mapping with a camera as the sensor are often described as Visual Simultaneous Localization and Mapping (VSLAM) (Davison et al., 2007; Eustice et al., 2005; Goncalves et al., 2005; Jensfelt et al., 2006; Jung & Lacroix, 2003; Kim & Sukkarieh, 2003; Se et al., 2002), wherein the camera is the main sensor used to estimate the location of a robot in the world, as well as to estimate and maintain estimates of the surrounding terrain or features.

There are many overlapping ways to categorize VSLAM approaches. Some authors (e.g., (Eustice et al., 2005; Jensfelt et al., 2006; Se et al., 2002)) make a distinction between "local VSLAM" and "global VSLAM". Many VSLAM approaches use probabilistic filters (e.g., an extended Kalman filter or a particle filter) (Davison et al., 2007; Eustice et al., 2005; Jensfelt et al., 2006; Jung & Lacroix, 2003; Kim & Sukkarieh, 2003), typically estimating a state vector composed of the camera/robot position, orientation and velocity, and the 3D coordinates of visual features in the world frame. An alternative to a filter-based approach is the use of epipolar geometry (Goncalves et al., 2005; Se et al., 2002). A final possible categorization separates methods that build a true 3D map (i.e., a map that is easily interpreted by a human being, such as walls or topography) (Eustice et al., 2005; Jensfelt et al., 2006; Jung & Lacroix, 2003; Kim & Sukkarieh, 2003; Se et al., 2002) from those that build a more abstract map designed to allow the camera/robot to accurately navigate and recognize its location, but not designed for human interpretation.
From the navigation perspective, vision-based pose estimation has motivated results such as (Baker & Nayar, 1999; Burschka & Hager, 2001; Chen et al., 2006; Das et al., 2001; Dixon et al., 2001; Fang et al., 2005; Hagar et al., 1998; Kim et al., 2001; Ma et al., 1999; Song & Huang, 2001) and others, where a camera provides feedback information to enable autonomous navigation of a control agent. See (Chen et al., 2006) for a detailed review of these and other related results. Typically these results focus on the regulation problem, and in all of them either the targets are static with respect to the moving camera, or the camera is stationary and records images of the moving control agent.

Vision-based cooperative control methods can involve a moving camera supplying regulation/tracking control input to a moving control agent. A practical example of this scenario is an airborne camera attached to a remote-controlled aircraft that is used to capture a desired video of an unmanned ground vehicle (UGV) moving in a terrain; another moving camera (which does not have to follow the same trajectory as the previous camera) is then used to relate and control the pose of a moving UGV with respect to the recorded video. The challenge here is to account for the relative velocity between the moving camera and the moving UGV. Also, the reference objects (or features) used to evaluate the pose can leave the camera's field-of-view (FOV) while new reference objects enter the FOV. In this scenario, the vision-based system should be intelligent enough to switch from the leaving reference object to a new reference object to keep providing pose information to the controller.

This chapter uses a new daisy-chaining method for visual servo tracking control of a rigid-body object, such as a UGV, while providing localization of the moving camera and moving object in the world frame, and mapping the location of static landmarks in the world frame. Hence, this approach can be used in control and local VSLAM of the UGV, with applications toward path planning, real-time trajectory generation, obstacle avoidance, multi-vehicle coordination control, task assignment, etc. By using the daisy-chaining strategy, the coordinates of static features outside the FOV can also be estimated. The estimates of static features can be maintained as a map, or can be used as measurements in existing VSLAM methods.

Section 2 introduces the imaging and geometric models used in this chapter, and presents the daisy-chaining method as applied to the case of controlling a six-DOF planar object through visual data from a moving camera and a fixed reference camera. These results are extended to the case of a UGV with nonholonomic constraints and a moving camera and moving reference camera in Section 3. The efforts of the previous sections are then brought to bear on a tracking and mapping application, where the UGV is controlled to track a trajectory that takes the vehicle outside of the initial FOV of the camera. The daisy-chaining approach must be extended to allow new fixed landmarks to enter the FOV and be related to previous landmarks and the UGV.

2. Daisy-Chaining Based Tracking Control

In this section, a visual servo tracking controller is developed for a moving six-DOF agent based on daisy-chained image feedback from a moving camera. The objective is to enable a controlled agent to track a desired trajectory determined by a sequence of prerecorded images from a stationary camera.
To achieve this result, several technical issues must be resolved, including: discriminating the relative velocity between the moving camera and the moving agent, compensating for the unknown time-varying distance measurement from the camera to the agent, relating the unknown attitude of the control agent to some measurable signals, and using the unit quaternion to formulate the rotation motion and the rotation error system. The relative velocity issue is resolved by utilizing multi-view image geometry to daisy-chain homography relationships between the moving camera frame and the moving agent coordinate frames. By using the depth ratios obtained from the homography decomposition, the unknown depth information is related to an unknown constant that can be compensated for by a Lyapunov-based adaptive update law. Lyapunov-based methods are provided to prove the adaptive asymptotic tracking result.

2.1 Problem Scenario

Over the past decade, a variety of visual servo controllers have been addressed for both camera-to-hand and camera-in-hand configurations (e.g., see (Allen et al., 1993; Hager et al., 1995; Hutchinson et al., 1996; Wiiesoma et al., 1993)). Typical camera-to-hand and camera-in-hand visual servo controllers have required that either the camera or the target remain stationary, so that an absolute velocity can be determined and used in the control development. For the problem of a moving camera tracking a moving target (i.e., control of relative pose/velocity), integral control or predictive Kalman filters have been used to overcome the unknown target velocity (Bensalah & Chaumette, 1995; Papanikolopoulos et al., 1993). In contrast to these methods, the development in this section and our previous preliminary work in (Hu, Mehta, Gans & Dixon, 2007; Mehta, Dixon, MacArthur & Crane, 2006; Mehta, Hu, Gans & Dixon, 2006) is motivated by the problem in which both the camera and the target are moving. A practical example of this scenario is an airborne camera attached to a remote-controlled aircraft that is used to determine pose measurements of a UGV and then relay the information to the UGV for closed-loop control.

Fig. 1. Geometric model for a moving camera, moving target and stationary reference camera.

The scenario examined in this section is depicted in Fig. 1, where various coordinate frames are defined as a means to develop the subsequent Euclidean reconstruction and control methods. In Fig. 1, a stationary coordinate frame I_R is attached to a camera and a time-varying
coordinate frame F_d is attached to some mobile agent (e.g., an aircraft, a ground vehicle, a marine vessel). The agent is identified in an image through a collection of feature points that are assumed (without loss of generality¹) to be coplanar and non-collinear (i.e., a planar patch of feature points). The camera attached to I_R a priori records a series of snapshots (i.e., a video) of the motion of the coordinate frame F_d until F_d comes to rest. A stationary coordinate frame F* is attached to another planar patch of feature points that are assumed to be visible in every frame of the video recorded by the camera. For example, the camera attached to I_R is on board a stationary satellite that takes a series of snapshots of the relative motion of F_d with respect to F*. Therefore, the desired motion of F_d can be encoded a priori as a series of relative translations and rotations with respect to the stationary frame F*. Spline functions or filter algorithms can then be used to generate a smooth desired feature point trajectory, as described in (Chen et al., 2005).

¹ Image processing techniques can often be used to select coplanar and non-collinear feature points within an image. However, if four coplanar target points are not available, the subsequent development can also exploit the virtual parallax method (Boufama & Mohr, 1995; Malis & Chaumette, 2000), where the non-coplanar points are projected onto a virtual plane.

Fig. 1 also depicts a time-varying coordinate frame I that is attached to another camera (e.g., a camera attached to a remote-controlled aircraft), and a time-varying coordinate frame F that is attached to the current pose of the planar patch.
The camera attached to I captures snapshots of the planar patches associated with F and F*, respectively. The a priori motion of F_d represents the desired trajectory of the coordinate system F, where F and F_d are attached to identical objects, but at different points in time. The camera attached to I_R can be a different camera (with different calibration parameters) than the camera attached to I. Based on these coordinate frame definitions, the problem considered in this section is to develop a kinematic controller for the object attached to F so that the time-varying rotation and translation of F converge to the desired time-varying rotation and translation of F_d, where the motion of F is determined from the time-varying overhead camera attached to I.

2.2 Geometric Relationships

Relationships between the various coordinate frames are summarized in Table 1. In Table 1, R(t), R*(t), R_r(t), R̄(t), R_rd(t), R*_r ∈ SO(3) denote rotation matrices, and x_fr(t), x̄_fr(t), x_frd(t), x*_fr ∈ R³ denote translation vectors. From Fig. 1, the translation x̄_fr(t) and the rotation R̄(t) can be expressed as

\bar{x}_{fr} = x^*_{fr} + R^*_r R^{*T} \left( x_f - x^*_f \right), \qquad \bar{R} = R^*_r R^{*T} R .   (1)

Motion                    Frames
R(t), x_f(t)              F to I, expressed in I
R*(t), x*_f(t)            F* to I, expressed in I
R_r(t), x_fr(t)           I to I_R
R̄(t), x̄_fr(t)            F to I_R, expressed in I_R
R*_r, x*_fr               F* to I_R, expressed in I_R
R_rd(t), x_frd(t)         F_d to I_R, expressed in I_R

Table 1. Coordinate frame relationships.

As illustrated in Fig. 1, π, π_d and π* denote the planes of feature points associated with F, F_d and F*, respectively. The constant Euclidean coordinates of the i-th feature point in F (and also in F_d) are denoted by s_1i ∈ R³ ∀i = 1, 2, ..., n (n ≥ 4), and s_2i ∈ R³ ∀i = 1, 2, ..., n denotes the constant Euclidean coordinates of the i-th feature point in F*. From the geometry between the coordinate frames depicted in Fig. 1, the following relationships can be developed:

\bar{m}_i = x_f + R\, s_{1i}, \qquad \bar{m}_{rdi} = x_{frd} + R_{rd}\, s_{1i}   (2)
\bar{m}^*_{ri} = x^*_{fr} + R^*_r\, s_{2i}, \qquad \bar{\bar{m}}_i = \bar{x}_{fr} + \bar{R}\, s_{1i}   (3)
\bar{m}^*_i = x^*_f + R^*\, s_{2i} .   (4)

In (2)-(4), m̄_i(t), m̄*_i(t) ∈ R³ denote the Euclidean coordinates of the feature points on π and π*, respectively, expressed in I as

\bar{m}_i(t) \triangleq \begin{bmatrix} x_i(t) & y_i(t) & z_i(t) \end{bmatrix}^T   (5)
\bar{m}^*_i(t) \triangleq \begin{bmatrix} x^*_i(t) & y^*_i(t) & z^*_i(t) \end{bmatrix}^T ,   (6)

\bar{\bar{m}}_i(t), m̄_rdi(t) ∈ R³ denote the actual and desired time-varying Euclidean coordinates, respectively, of the feature points on π expressed in I_R as

\bar{\bar{m}}_i(t) \triangleq \begin{bmatrix} \bar{x}_i(t) & \bar{y}_i(t) & \bar{z}_i(t) \end{bmatrix}^T   (7)
\bar{m}_{rdi}(t) \triangleq \begin{bmatrix} x_{rdi}(t) & y_{rdi}(t) & z_{rdi}(t) \end{bmatrix}^T ,   (8)

and m̄*_ri ∈ R³ denotes the constant Euclidean coordinates of the feature points on the plane π* expressed in I_R as

\bar{m}^*_{ri} \triangleq \begin{bmatrix} x^*_{ri} & y^*_{ri} & z^*_{ri} \end{bmatrix}^T .   (9)
After some algebraic manipulation, the expressions in (2)-(4) can be rewritten as

\bar{m}^*_i = \bar{x}_n + R_n\, \bar{m}_i   (10)
\bar{m}_i = \bar{x}_f + \bar{R}\, \bar{m}^*_i, \qquad \bar{m}_{rdi} = \bar{x}_{frd} + \bar{R}_{rd}\, \bar{m}^*_{ri}   (11)
\bar{m}^*_{ri} = x_{fr} + R_r\, \bar{m}^*_i, \qquad \bar{\bar{m}}_i = x_{fr} + R_r\, \bar{m}_i .   (12)
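As a concrete reading of (1), the short sketch below composes the daisy-chained pose of F with respect to I_R from the pose of F in I and the pose of F* in I and I_R. In the chapter these quantities come from the homography relationships; here they are arbitrary placeholder values, and the helper rot_z is only a convenient way to build valid rotation matrices.

```python
import numpy as np

def rot_z(theta):
    """Convenience helper: rotation about the z-axis, used to build SO(3) matrices."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Placeholder poses (in the chapter they are obtained from homography decomposition):
R, x_f = rot_z(0.30), np.array([1.0, 2.0, 5.0])         # F  to I, expressed in I
Rs, xs_f = rot_z(0.10), np.array([0.5, 1.0, 5.0])       # F* to I, expressed in I
Rs_r, xs_fr = rot_z(-0.20), np.array([2.0, 0.0, 10.0])  # F* to I_R, expressed in I_R

# Equation (1): pose of F with respect to I_R, daisy-chained through the
# reference patch F*.
R_bar = Rs_r @ Rs.T @ R
x_bar_fr = xs_fr + Rs_r @ Rs.T @ (x_f - xs_f)

print("R_bar =\n", R_bar)
print("x_bar_fr =", x_bar_fr)
```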