Sensor-based learning for practical planning of fine motions in robotics
Enric Cervera*, Angel P. del Pobil
Department of Computer Science and Engineering, Jaume-I University, Castelló, Spain
Information Sciences 145 (2002) 147–168
Received 4 July 2001; received in revised form 8 October 2001; accepted 28 November 2001
* Corresponding author. E-mail addresses: ecervera@icc.uji.es (E. Cervera), pobil@icc.uji.es (A.P. del Pobil).
Abstract
This paper presents an implemented approach to part-mating of three-dimensional non-cylindrical parts with a 6 DOF manipulator, considering uncertainties in modeling, sensing and control. The core of the proposed solution is a reinforcement learning algorithm for selecting the actions that achieve the goal in the minimum number of steps. Position and force sensor values are encoded in the state of the system by means of a neural network. Experimental results are presented for the insertion of different parts – circular, quadrangular and triangular prisms – in three dimensions. The system exhibits good generalization capabilities for different shapes and locations of the assembled parts. These results significantly extend most of the previous achievements in fine motion tasks, which frequently model the robot as a polygon translating in the plane in a polygonal environment or do not present actual implemented prototypes.
© 2002 Elsevier Science Inc. All rights reserved.
Keywords: Robotics; Neural nets; Reinforcement learning
1. Introduction
We present a practical framework for fine motion tasks, particularly the
insertion of non-cylindrical parts with uncertainty in modeling, sensing and
control. The approach is based on an algorithm which autonomously learns a
relationship between sensed states and actions. This relationship allows the robot to select those actions which attain the goal in the minimum number of steps. A feature extraction neural network complements the learning algorithm, forming a practical sensing-action architecture for manipulation tasks.
In the type of motion planning problems addressed in this work, interactions between the robot and objects are allowed, or even mandatory, for operations such as compliant motions and parts mating. We restrict ourselves to tasks which do not require complex plans; however, they are significantly difficult to attain in practice due to uncertainties. Among these tasks, the peg-in-hole insertion problem has been broadly studied, but very few results can be found in the literature for three-dimensional non-cylindrical parts in an actual implementation.
We believe that practicality, although an important issue, has been vastly underestimated in fine motion methods, since most of these approaches are based on geometric models which become complex for non-trivial cases, especially in three dimensions [1].
The remainder of this paper is structured as follows. Section 2 reviews some
related work and states the key contributions of our work. In Section 3, we
describe the components of the architecture. Thorough experimental results are
then presented in Section 4. Finally, Section 5 discusses a number of issues
regarding the proposed approach, and draws some conclusions.
2. Background and motivation
2.1. Related research
Though the peg-in-hole problem has been exhaustively studied for a long time [2–4], most of the implementations have been limited to planar motions or cylindrical parts [5–7]. Caine et al. [8] pointed out the difficulties of inserting prismatic pegs. To our knowledge, our results are the first for a system which learns to insert non-cylindrical pegs (see Fig. 1) in a real-world task with uncertainty in position and orientation.
Parts mating in real-world industry is frequently performed by passive
compliance devices [4], which support parts and aid their assembly. They are
capable of high-speed precision insertions, but they lack the flexibility of
software methods.
A difficult issue in parts mating is the need for nonlinear compliance for chamferless insertions, which was demonstrated by Asada [2], who proposed a supervised neural network for learning the nonlinear relationship between sensing and motion in a two-dimensional frictionless peg-in-hole task. The use of a supervised network presents a great difficulty in real-world three-dimensional problems, since a proper training set has to be generated.
Lozano-Pérez [9] first proposed a formal approach to the synthesis of compliant-motion strategies from geometric descriptions of assembly operations and explicit estimates of errors in sensing and control. In an extension to this approach, Donald [10] presented a formal framework for computing motion strategies which are guaranteed to succeed in the presence of three kinds of uncertainty (sensing, control and model). Experimental verification is described in [11], but only for planar tasks. Following Donald's work, Briggs [12] proposed an O(n² log n) algorithm, where n is the number of vertices in the environment, for the basic problem of manipulating a point from a specified start region to a specified goal region amidst planar polygonal obstacles where control is subject to uncertainty. Latombe et al. [13] describe two practical methods for computing preimages for a robot having a two-dimensional Euclidean configuration space. Though the general principles of the planning methods immediately extend to higher dimensional spaces, the geometric algorithms do not, and only simulated examples of planar tasks are shown. LaValle and Hutchinson [14] present another framework for manipulation planning under uncertainty, based on preimages, though they consider such an approach to be reasonable only for a few dimensions. Their computed examples are restricted to planar polygonal models.
A different geometric approach is introduced by McCarragher and Asada [15], who define a discrete event in assembly as a change in contact state reflecting a change in a geometric constraint. The discrete event modeling is accomplished using Petri nets. Dynamic programming is used for task-level planning to determine the sequence of desired markings (contact states) for discrete event control that minimizes a path-length and uncertainty performance measure. The method is applied to a dual peg-in-hole insertion task, but the motion is kept planar.
Fig. 1. Diagram of the insertion task.

Learning methods provide a framework for autonomous adaptation and improvement during task execution. An approach to learning a reactive control strategy for peg-in-hole insertion under uncertainty and noise is presented in [16]. This approach is based on active generation of compliant behavior using a
nonlinear admittance mapping from sensed positions and forces to velocity
commands. The controller learns the mapping through repeated attempts at
peg insertion. A two-dimensional version of the peg-in-hole task is implemented on a real robot. The controller consists of a supervised neural network with stochastic units. In [5] the architecture is applied to a real ball-balancing task and a three-dimensional cylindrical peg-in-hole task. Kaiser and Dillman [17] propose a hierarchical approach to learning the efficient application of robot skills in order to solve complex tasks. Since people can carry out manipulation tasks with no apparent difficulty, they develop a method for the
acquisition of sensor-based robot skills from human demonstration. Two
manipulation skills are investigated: peg insertion and door opening. Distante
et al. [18] apply reinforcement learning techniques to the problem of target
reaching by using visual information.
2.2. Motivation
Approaches based on geometric models are far from being satisfactory:
most of them are restricted to planar problems, and a plan might not be found
if the part geometries are complex or the uncertainties are large. Many frameworks do not account for incorrect modeling, nor do they address robustness.
Though many of the approaches have been implemented in real-world environments, they are frequently limited to planar motions. Furthermore, cylinders are the most utilized workpieces in three-dimensional problems.
If robots can be modeled as polygons moving amid polygonal obstacles in a planar world, and a detailed model is available, a geometric framework is adequate. However, since such conditions are rarely found in practice, we argue that a robust, adaptive, autonomous learning architecture for robot manipulation tasks – particularly part mating – is a necessary alternative in real-world environments, where uncertainties in modeling, sensing and control are unavoidable.
3. A practical adaptive architecture
Fig. 2 depicts the three components of the adaptive architecture: two sensor-based motions – guarded and compliant – and an additional subsystem combining learning and exploration.
This architecture relies on two types of sensor: position (x) and force (f).
Throughout this work, position and orientation of the tool frame are obtained
from the robot joint angles using the kinematic equations. Force measurements
are obtained from a wrist-mounted strain gauge sensor. It is assumed that all
sensors are calibrated, but uncertainty cannot be absolutely eliminated due to
sensor noise and calibration imprecision. The system's output is the end-effector velocity (v) in Cartesian coordinates, which is translated to joint coordinates by a resolved motion rate controller:

$$\dot{\theta} = J^{-1}\,v, \qquad \text{where } v = \dot{x}. \qquad (1)$$
Since the work space of the fine motion task is limited to a small region, the
singularities of J are not important in this framework.
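As an illustration of Eq. (1), the following NumPy sketch maps a commanded Cartesian velocity to joint velocities; it is not taken from the paper, and the `jacobian` function is a placeholder for the robot-specific kinematics.

```python
import numpy as np

def resolved_rate_step(q, v, jacobian, dt=0.01):
    """One resolved motion rate control step: joint velocities from Eq. (1).

    q        : current joint angles, shape (n,)
    v        : commanded Cartesian end-effector velocity, shape (6,)
    jacobian : callable returning the 6 x n manipulator Jacobian at q (assumed)
    dt       : control period in seconds
    """
    J = jacobian(q)
    # Least-squares solve; equals J^-1 v for a square, non-singular Jacobian.
    q_dot, *_ = np.linalg.lstsq(J, v, rcond=None)
    return q + dt * q_dot  # integrated joint command for this control period
```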
3.1. The insertion plan
Uncertainty in the location of the part and the hole prevents the success of a
simple position-based plan. Contact between parts has to be monitored, and
different actions are needed to perform a correct insertion. Other approaches
have tried to build a plan by considering all the possible contact states, but they
have only succeeded in simple planar tasks. In addition, uncertainty poses
difficulties for identifying the current state.
The proposed insertion plan consists of three steps, which are inspired by
intuitive manipulation skills:
(1) Approach hole until a contact is detected.
(2) Move compliantly around the hole until contact is lost (hole found).
(3) Move into the hole until a contact is detected (bottom of the hole).
This strategy differs from a pure random search in that an adaptation procedure is performed during the second step. The system learns a relationship between sensing and action, in an autonomous way, which guides the exploration towards the target. Initially, the system relies heavily on exploration. As a result of experience, an insertion skill is learned, and the mean insertion time for the task is considerably improved.
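To make the plan concrete, a hypothetical control loop for the three steps is sketched below; the robot interface (`move_guarded`, `move_compliant`, `sense_discrete_state`, etc.) is assumed and is not the authors' implementation.

```python
def insertion_plan(robot, policy, timeout=20.0):
    """Three-step insertion plan: approach, explore compliantly, insert.

    robot  : assumed interface exposing guarded/compliant motion primitives
    policy : maps a discretized (position, force) state to an exploration action
    """
    # Step 1: approach the hole until a contact is detected.
    robot.move_guarded(direction="-z", until=robot.contact_gained)

    # Step 2: move compliantly around the hole until contact is lost (hole found).
    t0 = robot.time()
    while robot.in_contact():
        if robot.time() - t0 > timeout:
            return False                      # timeout: abort and start a new trial
        state = robot.sense_discrete_state()  # position cell + SOM force cell
        action = policy(state)                # learned (or exploratory) lateral motion
        robot.move_compliant(action)

    # Step 3: move into the hole until contact with the bottom is detected.
    robot.move_guarded(direction="-z", until=robot.contact_gained)
    return True
```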
3.2. Guarded motions
In guarded motions, the system is continuously monitoring a condition, which
usually stops the motion, e.g. a force value going beyond a fixed threshold.
Fig. 2. Subsystems of the adaptive architecture.
In the above insertion plan, all the steps are force-guarded. Starting from a free state and due to the geometry of the task, a contact is gained if |Fz| rises above 0.1 kgf, and the contact is lost if |Fz| falls below 0.05 kgf. This dual threshold accounts for small variations in the contact force due to friction, or uncertainty in the measurements.
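The dual threshold is a simple hysteresis on |Fz|. A minimal sketch, with the 0.1/0.05 kgf values from the text and everything else illustrative:

```python
class ContactMonitor:
    """Hysteresis on |Fz| to debounce contact detection."""

    GAIN_THRESHOLD = 0.10   # kgf: |Fz| must rise above this to declare contact
    LOSE_THRESHOLD = 0.05   # kgf: |Fz| must fall below this to declare contact lost

    def __init__(self):
        self.in_contact = False

    def update(self, fz):
        """Update the contact state from the latest Z-force reading and return it."""
        if not self.in_contact and abs(fz) > self.GAIN_THRESHOLD:
            self.in_contact = True
        elif self.in_contact and abs(fz) < self.LOSE_THRESHOLD:
            self.in_contact = False
        return self.in_contact
```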
It is not impossible that the part is inserted during this first step, so additional information is required to determine whether the contact has been caused by the surface. A position value is enough, since the depth of the hole is usually much greater than the uncertainty in location. Another possibility is making small lateral motions: if large forces are detected, the part has already been inserted into the hole.
3.3. Compliant motions
Once a contact is achieved, motion is restricted to a surface. In practice, two degrees of freedom (X, Y) are position-controlled, while the third one (Z) is force-controlled. Initially, random compliant motions are performed, but a relationship between sensed forces and actions is learned, which decreases the time needed to insert the part.

During the third step, a complementary compliant motion is performed. In this task, when the part is inserted, Z is position-controlled, while (X, Y) are force-controlled.
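A minimal sketch of such a hybrid position/force law for the compliant search (step 2) follows; the gains, the force set-point and the sign convention are assumptions, not values from the paper.

```python
import numpy as np

def compliant_velocity(pos_error_xy, fz, fz_ref=0.08, kp=2.0, kf=0.5):
    """Cartesian velocity for surface-compliant motion: X, Y track position, Z regulates force.

    pos_error_xy : desired minus measured (x, y) position, in metres
    fz           : measured contact force along Z, in kgf
    fz_ref       : force set-point keeping the part pressed on the surface (assumed value)
    kp, kf       : proportional gains for the position and force loops (assumed values)
    """
    vx, vy = kp * np.asarray(pos_error_xy)    # X, Y position-controlled
    # Assuming the surface lies in the -Z direction: push down when the contact
    # force is below the set-point, back off when it is above.
    vz = -kf * (fz_ref - abs(fz))
    return np.array([vx, vy, vz])
```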
3.4. Exploration and learning
Random search has been proposed in the literature as a valid tool for
dealing with uncertainties [19]. However, the insertion time greatly increases
when the clearance ratio decreases. In the proposed architecture (see Fig. 3), an
adaptation process learns a relationship between sensed states and actions,
which guides the insertion task towards completion with the minimum number
of actions.
Fig. 3. Learning subsystem. Exploration is embedded in the action selection block.

A sensed state consists of a discretized position and force measurement, as described below. A value is stored in a look-up table for each pair of state and action. This value represents the amount of reinforcement which is expected in the future, starting from the state, if the action is performed.
The reinforcement (or cost) is a scalar value which measures the quality of the performed action. In our setup, a negative constant reinforcement is generated after every motion. The learning algorithm adapts the values of the table so that the expected reinforcement is maximized, i.e., the number of actions (cost) to achieve the goal is minimized.

The discrete nature of the reinforcement learning algorithm poses the necessity of extracting discrete values from the sensor signals of force and position. This feature extraction process, along with the basis of the learning algorithm, is described below.
3.4.1. Feature extraction
Force sensing is introduced to compensate for the uncertainty in positioning
the end-effector. It does a good job when a small displacement causes a contact,
since a big change in force is detected. However, with only force signals it is
not always possible to identify the actual contact state, i.e., different contacts
produce similar force measurements, as described in [20].
The adopted solution is to combine the force measurements with the relative
displacement of the end-effector from the initial position, i.e., that of the first
contact between the part and the surface.
The next problem is the discretization of the inputs, which is a requirement
of the learning algorithm. There is a conflict between size and fineness. With a
fine representation, the number of states is increased, thus slowing down the
convergence of the learning algorithm. Solutions are problem-dependent, using
heuristics for finding a good representation of manageable size.
We have obtained good results by dividing the exploration space into three intervals along each position-controlled degree of freedom. For cylindrical parts, the XY-plane of the exploration space is divided into nine regions – a 3 × 3 grid. For non-cylindrical parts, the rotation around the Z-axis has to be considered too, so the total number of states is 27. Region limits are fixed according to the estimated uncertainty and the radius of exploration.
Though the force space could be partitioned in a similar way, an unsupervised clustering scheme is used. In a previous work [20] we pointed out the feasibility of unsupervised learning algorithms, particularly Kohonen's self-organizing maps (SOMs) [21], for extracting feature information from sensor data in robotic manipulation tasks.
An SOM is a lattice of units, or cells. Each unit is a vector with as many components as inputs to the SOM. Though there is a neighborhood relationship between units in the lattice, it is only used during the training of the map and not in our scheme.
SOMs perform a nonlinear projection of the probability density function of the input space onto the two-dimensional lattice of units. Though all six force and torque signals are available, the practical solution adopted is to use only the three torque signals as inputs to the map. The reason for this is the strong correlation between the force and the torque; thus, adding those correlated signals does not add any new information to the system.
The SOM is trained with sensor samples obtained during insertions. After training, each cell or unit of the map becomes a prototype or codebook vector, which represents a region of the input space. The discretized force state is the codebook vector nearest (in Euclidean distance) to the analog force values.
The number of units must be chosen a priori, seeking a balance between size and fineness. In the experiments, a 6 × 4 map is used, giving a total of 24 discrete force states. Since the final state consists of position and force, there are 9 × 24 = 216 discrete states in the cylindrical insertion, and 27 × 24 = 648 discrete states in the non-cylindrical task.
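The sketch below shows one way the discrete state index could be formed from the position grid and the SOM force cell; the 3-interval grid and the 6 × 4 map follow the text, while the helper functions, interval limits and index ordering are assumptions.

```python
import numpy as np

def position_cell(dx, dy, dtheta=None, limit=2.5e-3, ang_limit=0.05):
    """Map the displacement from the first-contact position to a grid cell.

    Each axis is split into three intervals (below -limit, inside, above +limit),
    giving 9 cells for cylindrical parts and 27 when rotation about Z is added.
    The interval limits here are illustrative, not taken from the paper.
    """
    edges = [-limit, limit]
    ix = int(np.digitize(dx, edges))
    iy = int(np.digitize(dy, edges))
    if dtheta is None:
        return ix * 3 + iy                       # 0..8
    it = int(np.digitize(dtheta, [-ang_limit, ang_limit]))
    return (ix * 3 + iy) * 3 + it                # 0..26

def force_cell(torques, codebook):
    """Index of the SOM codebook vector nearest to the three torque readings."""
    d = np.linalg.norm(codebook - np.asarray(torques), axis=1)
    return int(np.argmin(d))                     # 0..23 for a 6 x 4 map

def state_index(pos_cell, f_cell, n_force=24):
    """Combine the position cell and the force cell into one discrete state."""
    return pos_cell * n_force + f_cell           # 0..215 or 0..647
```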
3.4.2. Reinforcement learning
The advantage of the proposed architecture over other random approaches
is the ability to learn a relationship between sensed states and actions. As the
system becomes skilled, this relationship is more intensely used to guide the
process towards completion with the minimum number of steps.
The system must learn without a teacher. The skill measurement is the time
or number of steps required to perform a correct insertion and is expressed in
terms of cost or negative reinforcement.
Sutton [22] defined reinforcement learning (RL) as the learning of a mapping from situations to actions so as to maximize a scalar reward or reinforcement signal.
Q-learning [23] is an RL algorithm that can be used whenever there is no
explicit model of the system and the cost structure. This algorithm learns the
state–action pairs which maximize a scalar reinforcement signal that will be
received over time. In the simplest case, this measure is the sum of the future
reinforcement values, and the objective is to learn an associative mapping that
at each time step selects, as a function of the current state, an action that
maximizes the expected sum of future reinforcement.
In Q-learning, a look-up table of Q-values is stored in memory, one Q-value for each state–action pair. The Q-value is the expected amount of reinforcement if, from that state, the action is performed and, afterwards, only optimal actions are chosen. In our setup, when the system performs any action (motion), a negative constant reinforcement is signalled. This reinforcement represents the cost of the motion. Since the learning algorithm tends to maximize the reinforcement, cost will be minimized, i.e., the system will learn those actions which lead to the goal with the minimum number of steps.
The basic learning step consists in updating a single Q-value. If the system senses state s, performs action a, receives reinforcement r and then senses a new state s', the Q-value for (s, a) is updated as follows:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\Big[\,r + \gamma \max_{a' \in A(s')} Q(s',a')\,\Big], \qquad (2)$$

where α is the learning rate and γ is a discount factor, which weighs the value of future reinforcement. The table converges to the optimal values as long as all the states are visited infinitely often. In practice, a good solution is obtained with a few thousand trials of the task.
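A hedged tabular implementation of the update rule in Eq. (2); the parameter values are illustrative, not those used in the experiments.

```python
import numpy as np

class QTable:
    """Tabular Q-learning with the update rule of Eq. (2)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        self.q = np.zeros((n_states, n_actions))
        self.alpha = alpha      # learning rate (illustrative value)
        self.gamma = gamma      # discount factor (illustrative value)

    def update(self, s, a, r, s_next):
        """Blend the old Q-value with the new estimate r + gamma * max_a' Q(s', a')."""
        target = r + self.gamma * np.max(self.q[s_next])
        self.q[s, a] = (1.0 - self.alpha) * self.q[s, a] + self.alpha * target
```

With the constant negative reinforcement used in the paper (e.g. r = −1 per motion), maximizing the expected return is equivalent to minimizing the number of motions needed to reach the goal.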
3.4.3. Action selection and exploration
During the learning process, there is a conflict between exploration and
exploitation. Initially, the Q-values are meaningless and actions should be
chosen randomly, but as learning progresses, better actions should be chosen to
minimize the cost of learning. However, exploration cannot be completely
turned off, since the optimal action might not yet be discovered.
Some heuristics for exploration and exploitation can be found in the literature. In the implementation, we have chosen Boltzmann exploration: the Q-values are used for weighing exploitation against exploration. The probability of selecting an action a in state s is

$$p(s,a) = \frac{\exp\big(Q(s,a)/T\big)}{\sum_{a'} \exp\big(Q(s,a')/T\big)}, \qquad (3)$$

where T is a positive value which controls the degree of randomness, often referred to as the temperature. It gradually decays from an initial value, and exploration is turned off when it is close to zero, since the best action is then selected with probability 1.
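Eq. (3) is a softmax over the Q-values of the current state. A minimal sketch, with an assumed exponential temperature schedule:

```python
import numpy as np

def boltzmann_action(q_row, temperature):
    """Sample an action with probability proportional to exp(Q(s,a)/T) (Eq. 3)."""
    z = q_row / max(temperature, 1e-6)   # guard against T -> 0
    z -= z.max()                         # subtract max for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return np.random.choice(len(q_row), p=p)

def temperature(trial, t0=1.0, decay=0.999):
    """Illustrative exponential decay from an assumed initial temperature."""
    return t0 * decay ** trial
```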
4. Experimental results
The system has been implemented on a robot arm equipped with a wrist-mounted force sensor (Fig. 4). The task is the insertion of pegs of different shapes (circular, square and triangular section) into their appropriate holes. Pegs are made of wood, and the platform containing the holes is made of a synthetic resin.
Uncertainty in the position and orientation is greater than the clearance
between the pegs and holes. The nominal goal is specified by a vector and a
rotation matrix relative to an external fixed frame of reference. This location is
supposed to be centered above the hole, so the peg would be inserted just by
moving straight along the Z axis with no rotation if there were no uncertainty
present. After positioning over the nominal goal, the robot performs a guarded
motion towards the hole.
If the insertion fails, the robot starts a series of perception and action cycles.
First, sensors are read, and a state is identified; depending on such state, one
action or another is chosen, and the learning mechanism updates the internal
parameters of decision. The robot performs compliant motions, i.e., it keeps
the contact with the surface while moving, so that it can detect the hole by a
sudden force change due to the loss of contact.
To avoid long exploration cycles, a timeout is set which stops the process if
the hole is not found within that time. In this case a new trial is started.
4.1. Case of the cylindrical peg
The peg is 29 mm in diameter, while the hole is chamferless and 29.15 mm in diameter. The clearance between the peg and the hole is 0.075 mm, thus the clearance ratio is 0.005. The peg has to be inserted to a depth of 10 mm into the hole.
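As a consistency check (not stated explicitly in the text), the radial clearance and the clearance ratio follow from the two diameters:

$$c = \frac{29.15 - 29}{2} = 0.075\ \text{mm}, \qquad \frac{c}{r} = \frac{0.075}{14.5} \approx 0.005.$$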
The input space of the self-organizing map is defined by the three filtered torque components. The map has 6 × 4 units. The map is trained off-line with approximately 70,000 data vectors extracted from previous random trials.
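For completeness, a minimal NumPy sketch of off-line SOM training on the torque samples is given below; the 6 × 4 grid matches the text, while the initialization, learning-rate and neighborhood schedules are assumed.

```python
import numpy as np

def train_som(samples, rows=6, cols=4, epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM on torque samples; returns a (rows*cols, dim) codebook.

    samples : array of shape (n, 3) with the filtered torque readings
    The learning-rate and neighborhood schedules are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n, dim = samples.shape
    codebook = samples[rng.choice(n, rows * cols, replace=False)].astype(float)
    # Grid coordinates of each unit, used for the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for x in samples[rng.permutation(n)]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5      # decaying neighborhood radius
            winner = np.argmin(np.linalg.norm(codebook - x, axis=1))
            dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
            codebook += lr * h[:, None] * (x - codebook)
            step += 1
    return codebook
```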
Once the map is trained, the robot performs a sequence of trials, each of
which starts at a random position within an uncertainty radius of 3 mm. To
ensure absolutely that the goal is within the exploration area, this area is set to
a 5 mm square, centered at the real starting position. Exploration motions are [...]

Fig. 4. Zebra Zero robot arm, grasping a peg over the platform.
[...] degrees of freedom (X, Y) being position-controlled and other degrees (Z) being force-controlled. The complexity of the motion is transferred to the control modules, and the learning process is simplified.

4.1.1. Learning results

The learning update step consists in modifying the Q-value of the previous state and the performed action according to the reinforcement and the value of the next state. The agent [...]

Fig. 5. Smoothed insertion time taken on 4000 trials of the cylinder task.

Fig. 6. Evolution of the probability of successful insertion during the training process of 4000 consecutive trials is shown. The timeout is set to 20 s in each trial. The smoothed curve was obtained by filtering the data using a moving-average window of 100 consecutive [...]

[...] rest of the architecture and the training procedure remains unchanged. The increased difficulty of the task is shown by the low percentage of successful insertions that are achieved randomly at the beginning of the learning process.

4.2.1. Learning results

Fig. 11 depicts the insertion time during 8000 learning trials. One should take into account that any failed insertion is rated at an untrue value of 30 [...]

[...] 210 s (31 min). The difference is more dramatic than in the case of the cylinder, since the random controller, even for a long time, is only capable of performing a low percentage of trials (about 45%), whereas the learned controller achieves more than 90% of the trials. As far as we know, this is the best performance achieved for this task using a square peg. In [5] only results for the cylinder are [...]

[...] insertion only reaches about 60% of success after the training process, whereas 80% of successful insertions were attained in the cube example. This is quite surprising, since initially the probability of insertion for the triangle is higher, and that means that it is easier to insert the triangle randomly than the cube. However, it is more difficult to improve these skills based on the sensed forces for [...]

[...] projection of the SOM on dimensions (Mx, My), for the triangle task. [...] this absence of results in the literature might be indicative of the difficulties for properly performing and learning this task.

4.3.2. Learning using the previous SOM

An interesting generalization test is to use an SOM trained with samples from insertions of the square [...] is an interesting result which demonstrates the generalization capabilities of the SOM for extracting features which are suitable for different tasks.

5. Conclusion

A practical sensor-based learning architecture has been presented. We have indicated the need for a robust representation of the task state, to minimize the effects of uncertainty. The implemented system is fully autonomous, and incrementally [...]

References

[1] [...] robot motion planning problems, in: 28th IEEE Symposium on Foundations of Computer Science, 1987, pp. 49–70.
[2] H. Asada, Representation and learning of nonlinear compliance using neural nets, IEEE Transactions on Robotics and Automation 9 (6) (1993) 863–867.
[3] R.J. Desai, R.A. Volz, Identification and verification of termination conditions in fine motion in presence of sensor errors and geometric uncertainties, [...]
[13] [...] motion planning with uncertainty in control and sensing, Artificial Intelligence 52 (1) (1991) 1–47.
[14] S.M. LaValle, S.A. Hutchinson, An objective-based framework for motion planning under sensing and control uncertainties, International Journal of Robotics Research 17 (1) (1998) 19–42.
[15] B.J. McCarragher, H. Asada, A discrete event controller using Petri nets applied to assembly, in: Proceedings of [...]
[18] [...] reaching by using visual information and Q-learning controllers, Autonomous Robots 9 (2000) 41–50.
[19] M.A. Erdmann, Randomization in robot tasks, International Journal of Robotics Research 11 (5) (1992) 399–436.
[20] E. Cervera, A.P. del Pobil, E. Marta, M.A. Serna, Perception-based learning for motion in contact in task planning, Journal of Intelligent and Robotic Systems 17 (1996) 283–308.
[21] T. Kohonen, in: [...]