Sensor-based learning for practical planning of fine motions in robotics
Enric Cervera*, Angel P. del Pobil
Department of Computer Science and Engineering, Jaume-I University, Castelló, Spain
Information Sciences 145 (2002) 147–168
Received 4 July 2001; received in revised form 8 October 2001; accepted 28 November 2001
* Corresponding author. E-mail addresses: ecervera@icc.uji.es (E. Cervera), pobil@icc.uji.es (A.P. del Pobil).
Abstract
This paper presents an implemented approach to part-mating of three-dimensional non-cylindrical parts with a 6 DOF manipulator, considering uncertainties in modeling, sensing and control. The core of the proposed solution is a reinforcement learning algorithm for selecting the actions that achieve the goal in the minimum number of steps. Position and force sensor values are encoded in the state of the system by means of a neural network. Experimental results are presented for the insertion of different parts – circular, quadrangular and triangular prisms – in three dimensions. The system exhibits good generalization capabilities for different shapes and locations of the assembled parts. These results significantly extend most of the previous achievements in fine motion tasks, which frequently model the robot as a polygon translating in the plane in a polygonal environment or do not present actual implemented prototypes.
© 2002 Elsevier Science Inc. All rights reserved.
Keywords: Robotics; Neural nets; Reinforcement learning
1. Introduction
We present a practical framework for fine motion tasks, particularly the
insertion of non-cylindrical parts with uncertainty in modeling, sensing and
control. The approach is based on an algorithm which autonomously learns a
relationship between sensed states and actions. This relationship allows the robot to select those actions which attain the goal in the minimum number of steps. A feature extraction neural network complements the learning algorithm, forming a practical sensing-action architecture for manipulation tasks.
In the type of motion planning problems addressed in this work, interactions between the robot and objects are allowed, or even mandatory, for operations such as compliant motions and parts mating. We restrict ourselves to tasks which do not require complex plans; however, they are significantly difficult to attain in practice due to uncertainties. Among these tasks, the peg-in-hole insertion problem has been broadly studied, but very few results can be found in the literature for three-dimensional non-cylindrical parts in an actual implementation.
We believe that practicality, although an important issue, has been vastly underestimated in fine motion methods, since most of these approaches are based on geometric models which become complex for non-trivial cases, especially in three dimensions [1].
The remainder of this paper is structured as follows. Section 2 reviews some
related work and states the key contributions of our work. In Section 3, we
describe the components of the architecture. Thorough experimental results are
then presented in Section 4. Finally, Section 5 discusses a number of issues
regarding the proposed approach, and draws some conclusions.
2. Background and motivation
2.1. Related research
Though the peg-in-hole problem has been exhaustively studied for a long time [2–4], most of the implementations have been limited to planar motions or cylindrical parts [5–7]. Caine et al. [8] pointed out the difficulties of inserting prismatic pegs. To our knowledge, our results are the first for a system which learns to insert non-cylindrical pegs (see Fig. 1) in a real-world task with uncertainty in position and orientation.
Parts mating in real-world industry is frequently performed by passive
compliance devices [4], which support parts and aid their assembly. They are
capable of high-speed precision insertions, but they lack the flexibility of
software methods.
A difficult issue in parts mating is the need for nonlinear compliance for chamferless insertions, which was demonstrated by Asada [2], who proposed a supervised neural network for learning the nonlinear relationship between sensing and motion in a two-dimensional frictionless peg-in-hole task. The use of a supervised network presents a great difficulty in real-world three-dimensional problems, since a proper training set has to be generated.
Lozano-Pérez [9] first proposed a formal approach to the synthesis of compliant-motion strategies from geometric descriptions of assembly operations and explicit estimates of errors in sensing and control. In an extension to this approach, Donald [10] presented a formal framework for computing motion strategies which are guaranteed to succeed in the presence of three kinds of uncertainty (sensing, control and model). Experimental verification is described in [11], but only for planar tasks. Following Donald's work, Briggs [12] proposed an O(n² log n) algorithm, where n is the number of vertices in the environment, for the basic problem of manipulating a point from a specified start region to a specified goal region amidst planar polygonal obstacles where control is subject to uncertainty. Latombe et al. [13] describe two practical methods for computing preimages for a robot having a two-dimensional Euclidean configuration space. Though the general principles of the planning methods immediately extend to higher dimensional spaces, the geometric algorithms do not, and only simulated examples of planar tasks are shown. LaValle and Hutchinson [14] present another framework for manipulation planning under uncertainty, based on preimages, though they consider such an approach to be reasonable only for a few dimensions. Their computed examples are restricted to planar polygonal models.
A different geometric approach is introduced by McCarragher and Asada [15], who define a discrete event in assembly as a change in contact state reflecting a change in a geometric constraint. The discrete event modeling is accomplished using Petri nets. Dynamic programming is used for task-level planning to determine the sequence of desired markings (contact states) for discrete event control that minimizes a path-length and uncertainty performance measure. The method is applied to a dual peg-in-hole insertion task, but the motion is kept planar.
Fig. 1. Diagram of the insertion task.

Learning methods provide a framework for autonomous adaptation and improvement during task execution. An approach to learning a reactive control strategy for peg-in-hole insertion under uncertainty and noise is presented in [16]. This approach is based on active generation of compliant behavior using a
nonlinear admittance mapping from sensed positions and forces to velocity
commands. The controller learns the mapping through repeated attempts at
peg insertion. A two-dimensional version of the peg-in-hole task is implemented on a real robot. The controller consists of a supervised neural network with stochastic units. In [5] the architecture is applied to a real ball-balancing task and a three-dimensional cylindrical peg-in-hole task. Kaiser and Dillman [17] propose a hierarchical approach to learning the efficient application of robot skills in order to solve complex tasks. Since people can carry out manipulation tasks with no apparent difficulty, they develop a method for the
acquisition of sensor-based robot skills from human demonstration. Two
manipulation skills are investigated: peg insertion and door opening. Distante
et al. [18] apply reinforcement learning techniques to the problem of target
reaching by using visual information.
2.2. Motivation
Approaches based on geometric models are far from being satisfactory:
most of them are restricted to planar problems, and a plan might not be found
if the part geometries are complex or the uncertainties are large. Many frameworks do not account for incorrect modeling, nor do they address robustness.
Though many of the approaches have been implemented in real-world environments, they are frequently limited to planar motions. Furthermore, cylinders are the most utilized workpieces in three-dimensional problems.
If robots can be modeled as polygons moving amid polygonal obstacles in a planar world, and a detailed model is available, a geometric framework is adequate. However, since such conditions are rarely found in practice, we argue that a robust, adaptive, autonomous learning architecture for robot manipulation tasks – particularly part mating – is a necessary alternative in real-world environments, where uncertainties in modeling, sensing and control are unavoidable.
3. A practical adaptive architecture
Fig. 2 depicts the three components of the adaptive architecture: two sensor-based motions – guarded and compliant – and an additional subsystem combining learning and exploration.
This architecture relies on two types of sensor: position (x) and force (f).
Throughout this work, position and orientation of the tool frame are obtained
from the robot joint angles using the kinematic equations. Force measurements
are obtained from a wrist-mounted strain gauge sensor. It is assumed that all
sensors are calibrated, but uncertainty cannot be absolutely eliminated due to
sensor noise and calibration imprecision. The system's output is the end-effector velocity (v) in Cartesian coordinates, which is translated to joint coordinates by a resolved motion rate controller:

$$\dot{\theta} = J^{-1}\,v, \qquad \text{where } v = \dot{x}. \qquad (1)$$
Since the work space of the fine motion task is limited to a small region, the
singularities of J are not important in this framework.
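As an illustration of Eq. (1), the following NumPy sketch maps a commanded Cartesian velocity to joint velocities; it is not taken from the paper, and the `jacobian` function is a placeholder for the robot-specific kinematics.

```python
import numpy as np

def resolved_rate_step(q, v, jacobian, dt=0.01):
    """One resolved motion rate control step: joint velocities from Eq. (1).

    q        : current joint angles, shape (n,)
    v        : commanded Cartesian end-effector velocity, shape (6,)
    jacobian : callable returning the 6 x n manipulator Jacobian at q (assumed)
    dt       : control period in seconds
    """
    J = jacobian(q)
    # Least-squares solve; equals J^-1 v for a square, non-singular Jacobian.
    q_dot, *_ = np.linalg.lstsq(J, v, rcond=None)
    return q + dt * q_dot  # integrated joint command for this control period
```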
3.1. The insertion plan
Uncertainty in the location of the part and the hole prevents the success of a
simple position-based plan. Contact between parts has to be monitored, and
different actions are needed to perform a correct insertion. Other approaches
have tried to build a plan by considering all the possible contact states, but they
have only succeeded in simple planar tasks. In addition, uncertainty poses
difficulties for identifying the current state.
The proposed insertion plan consists of three steps, which are inspired by
intuitive manipulation skills:
(1) Approach hole until a contact is detected.
(2) Move compliantly around the hole until contact is lost (hole found).
(3) Move into the hole until a contact is detected (bottom of the hole).
This strategy differs from a pure random search in that an adaptation procedure is performed during the second step. The system learns a relationship between sensing and action, in an autonomous way, which guides the exploration towards the target. Initially, the system relies heavily on exploration. As a result of experience, an insertion skill is learned, and the mean insertion time for the task is considerably improved.
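To make the plan concrete, a hypothetical control loop for the three steps is sketched below; the robot interface (`move_guarded`, `move_compliant`, `sense_discrete_state`, etc.) is assumed and is not the authors' implementation.

```python
def insertion_plan(robot, policy, timeout=20.0):
    """Three-step insertion plan: approach, explore compliantly, insert.

    robot  : assumed interface exposing guarded/compliant motion primitives
    policy : maps a discretized (position, force) state to an exploration action
    """
    # Step 1: approach the hole until a contact is detected.
    robot.move_guarded(direction="-z", until=robot.contact_gained)

    # Step 2: move compliantly around the hole until contact is lost (hole found).
    t0 = robot.time()
    while robot.in_contact():
        if robot.time() - t0 > timeout:
            return False                      # timeout: abort and start a new trial
        state = robot.sense_discrete_state()  # position cell + SOM force cell
        action = policy(state)                # learned (or exploratory) lateral motion
        robot.move_compliant(action)

    # Step 3: move into the hole until contact with the bottom is detected.
    robot.move_guarded(direction="-z", until=robot.contact_gained)
    return True
```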
3.2. Guarded motions
In guarded motions, the system is continuously monitoring a condition, which
usually stops the motion, e.g. a force value going beyond a fixed threshold.
Fig. 2. Subsystems of the adaptive architecture.
In the above insertion plan, all the steps are force-guarded. Starting from a free state and due to the geometry of the task, a contact is gained if |Fz| rises above 0.1 kgf, and the contact is lost if |Fz| falls below 0.05 kgf. This dual threshold accounts for small variations in the contact force due to friction, or uncertainty in the measurements.
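The dual threshold is a simple hysteresis on |Fz|. A minimal sketch, with the 0.1/0.05 kgf values from the text and everything else illustrative:

```python
class ContactMonitor:
    """Hysteresis on |Fz| to debounce contact detection."""

    GAIN_THRESHOLD = 0.10   # kgf: |Fz| must rise above this to declare contact
    LOSE_THRESHOLD = 0.05   # kgf: |Fz| must fall below this to declare contact lost

    def __init__(self):
        self.in_contact = False

    def update(self, fz):
        """Update the contact state from the latest Z-force reading and return it."""
        if not self.in_contact and abs(fz) > self.GAIN_THRESHOLD:
            self.in_contact = True
        elif self.in_contact and abs(fz) < self.LOSE_THRESHOLD:
            self.in_contact = False
        return self.in_contact
```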
It is not impossible that the part is inserted during this first step, so additional information is required to determine whether the contact has been caused by the surface. A position value is enough, since the depth of the hole is usually much greater than the uncertainty in location. Another possibility is making small lateral motions: if large forces are detected, the part has already been inserted into the hole.
3.3. Compliant motions
Once a contact is achieved, motion is restricted to a surface. In practice, two degrees of freedom (X, Y) are position-controlled, while the third one (Z) is force-controlled. Initially, random compliant motions are performed, but a relationship between sensed forces and actions is learned, which decreases the time needed to insert the part.

During the third step, a complementary compliant motion is performed. In this task, when the part is inserted, Z is position-controlled, while (X, Y) are force-controlled.
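A minimal sketch of such a hybrid position/force law for the compliant search (step 2) follows; the gains, the force set-point and the sign convention are assumptions, not values from the paper.

```python
import numpy as np

def compliant_velocity(pos_error_xy, fz, fz_ref=0.08, kp=2.0, kf=0.5):
    """Cartesian velocity for surface-compliant motion: X, Y track position, Z regulates force.

    pos_error_xy : desired minus measured (x, y) position, in metres
    fz           : measured contact force along Z, in kgf
    fz_ref       : force set-point keeping the part pressed on the surface (assumed value)
    kp, kf       : proportional gains for the position and force loops (assumed values)
    """
    vx, vy = kp * np.asarray(pos_error_xy)    # X, Y position-controlled
    # Assuming the surface lies in the -Z direction: push down when the contact
    # force is below the set-point, back off when it is above.
    vz = -kf * (fz_ref - abs(fz))
    return np.array([vx, vy, vz])
```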
3.4. Exploration and learning
Random search has been proposed in the literature as a valid tool for
dealing with uncertainties [19]. However, the insertion time greatly increases
when the clearance ratio decreases. In the proposed architecture (see Fig. 3), an
adaptation process learns a relationship between sensed states and actions,
which guides the insertion task towards completion with the minimum number
of actions.
Fig. 3. Learning subsystem. Exploration is embedded in the action selection block.

A sensed state consists of a discretized position and force measurement, as described below. A value is stored in a look-up table for each pair of state and action. This value represents the amount of reinforcement which is expected in the future, starting from the state, if the action is performed.
The reinforcement (or cost) is a scalar value which measures the quality of the performed action. In our setup, a negative constant reinforcement is generated after every motion. The learning algorithm adapts the values of the table so that the expected reinforcement is maximized, i.e., the number of actions (cost) to achieve the goal is minimized.

The discrete nature of the reinforcement learning algorithm poses the necessity of extracting discrete values from the sensor signals of force and position. This feature extraction process, along with the basis of the learning algorithm, is described below.
3.4.1. Feature extraction
Force sensing is introduced to compensate for the uncertainty in positioning
the end-effector. It does a good job when a small displacement causes a contact,
since a big change in force is detected. However, with only force signals it is
not always possible to identify the actual contact state, i.e., different contacts
produce similar force measurements, as described in [20].
The adopted solution is to combine the force measurements with the relative
displacement of the end-effector from the initial position, i.e., that of the first
contact between the part and the surface.
The next problem is the discretization of the inputs, which is a requirement
of the learning algorithm. There is a conflict between size and fineness. With a
fine representation, the number of states is increased, thus slowing down the
convergence of the learning algorithm. Solutions are problem-dependent, using
heuristics for finding a good representation of manageable size.
We have obtained good results by dividing the exploration space into three intervals along each position-controlled degree of freedom. For cylindrical parts, the XY-plane of the exploration space is divided into nine regions – a 3 × 3 grid. For non-cylindrical parts, the rotation around the Z-axis has to be considered too, so the total number of states is 27. Region limits are fixed according to the estimated uncertainty and the radius of exploration.
Though the force space could be partitioned in a similar way, an unsupervised clustering scheme is used. In a previous work [20] we pointed out the feasibility of unsupervised learning algorithms, particularly Kohonen's self-organizing maps (SOMs) [21], for extracting feature information from sensor data in robotic manipulation tasks.
An SOM is a lattice of units, or cells. Each unit is a vector with as many components as inputs to the SOM. Though there is a neighborhood relationship between units in the lattice, it is only used during the training of the map and not in our scheme.
SOMs perform a nonlinear projection of the probability density function of the input space onto the two-dimensional lattice of units. Though all six force and torque signals are available, the practical solution adopted is to use only the three torque signals as inputs to the map. The reason for this is the strong correlation between the force and the torque; thus, adding those correlated signals does not add any new information to the system.
The SOM is trained with sensor samples obtained during insertions. After training, each cell or unit of the map becomes a prototype or codebook vector, which represents a region of the input space. The discretized force state is the codebook vector nearest (in Euclidean distance) to the analog force values.
The number of units must be chosen a priori, seeking a balance between size and fineness. In the experiments, a 6 × 4 map is used, giving a total of 24 discrete force states. Since the final state consists of position and force, there are 9 × 24 = 216 discrete states in the cylindrical insertion, and 27 × 24 = 648 discrete states in the non-cylindrical task.
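The sketch below shows one way the discrete state index could be formed from the position grid and the SOM force cell; the 3-interval grid and the 6 × 4 map follow the text, while the helper functions, interval limits and index ordering are assumptions.

```python
import numpy as np

def position_cell(dx, dy, dtheta=None, limit=2.5e-3, ang_limit=0.05):
    """Map the displacement from the first-contact position to a grid cell.

    Each axis is split into three intervals (below -limit, inside, above +limit),
    giving 9 cells for cylindrical parts and 27 when rotation about Z is added.
    The interval limits here are illustrative, not taken from the paper.
    """
    edges = [-limit, limit]
    ix = int(np.digitize(dx, edges))
    iy = int(np.digitize(dy, edges))
    if dtheta is None:
        return ix * 3 + iy                       # 0..8
    it = int(np.digitize(dtheta, [-ang_limit, ang_limit]))
    return (ix * 3 + iy) * 3 + it                # 0..26

def force_cell(torques, codebook):
    """Index of the SOM codebook vector nearest to the three torque readings."""
    d = np.linalg.norm(codebook - np.asarray(torques), axis=1)
    return int(np.argmin(d))                     # 0..23 for a 6 x 4 map

def state_index(pos_cell, f_cell, n_force=24):
    """Combine the position cell and the force cell into one discrete state."""
    return pos_cell * n_force + f_cell           # 0..215 or 0..647
```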
3.4.2. Reinforcement learning
The advantage of the proposed architecture over other random approaches
is the ability to learn a relationship between sensed states and actions. As the
system becomes skilled, this relationship is more intensely used to guide the
process towards completion with the minimum number of steps.
The system must learn without a teacher. The skill measurement is the time
or number of steps required to perform a correct insertion and is expressed in
terms of cost or negative reinforcement.
Sutton [22] defined reinforcement learning (RL) as the learning of a mapping from situations to actions so as to maximize a scalar reward or reinforcement signal.
Q-learning [23] is an RL algorithm that can be used whenever there is no
explicit model of the system and the cost structure. This algorithm learns the
state–action pairs which maximize a scalar reinforcement signal that will be
received over time. In the simplest case, this measure is the sum of the future
reinforcement values, and the objective is to learn an associative mapping that
at each time step selects, as a function of the current state, an action that
maximizes the expected sum of future reinforcement.
In Q-learning, a look-up table of Q-values is stored in memory, one Q-value for each state–action pair. The Q-value is the expected amount of reinforcement if, from that state, the action is performed and, afterwards, only optimal actions are chosen. In our setup, when the system performs any action (motion), a negative constant reinforcement is signalled. This reinforcement represents the cost of the motion. Since the learning algorithm tends to maximize the reinforcement, cost will be minimized, i.e., the system will learn those actions which lead to the goal with the minimum number of steps.
The basic learning step consists in updating a single Q-value. If the system senses state s, performs action a, receives reinforcement r and then senses a new state s', the Q-value for (s, a) is updated as follows:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\Big[\,r + \gamma \max_{a' \in A(s')} Q(s',a')\,\Big], \qquad (2)$$

where α is the learning rate and γ is a discount factor, which weighs the value of future reinforcement. The table converges to the optimal values as long as all the states are visited infinitely often. In practice, a good solution is obtained with a few thousand trials of the task.
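A hedged tabular implementation of the update rule in Eq. (2); the parameter values are illustrative, not those used in the experiments.

```python
import numpy as np

class QTable:
    """Tabular Q-learning with the update rule of Eq. (2)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        self.q = np.zeros((n_states, n_actions))
        self.alpha = alpha      # learning rate (illustrative value)
        self.gamma = gamma      # discount factor (illustrative value)

    def update(self, s, a, r, s_next):
        """Blend the old Q-value with the new estimate r + gamma * max_a' Q(s', a')."""
        target = r + self.gamma * np.max(self.q[s_next])
        self.q[s, a] = (1.0 - self.alpha) * self.q[s, a] + self.alpha * target
```

With the constant negative reinforcement used in the paper (e.g. r = −1 per motion), maximizing the expected return is equivalent to minimizing the number of motions needed to reach the goal.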
3.4.3. Action selection and exploration
During the learning process, there is a conflict between exploration and
exploitation. Initially, the Q-values are meaningless and actions should be
chosen randomly, but as learning progresses, better actions should be chosen to
minimize the cost of learning. However, exploration cannot be completely
turned off, since the optimal action might not yet be discovered.
Some heuristics for exploration and exploitation can be found in the literature. In the implementation, we have chosen Boltzmann exploration: the Q-values are used for weighing exploitation against exploration. The probability of selecting an action a in state s is

$$p(s,a) = \frac{\exp\big(Q(s,a)/T\big)}{\sum_{a'} \exp\big(Q(s,a')/T\big)}, \qquad (3)$$

where T is a positive value which controls the degree of randomness, often referred to as the temperature. It gradually decays from an initial value, and exploration is turned off when it is close to zero, since the best action is then selected with probability 1.
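Eq. (3) is a softmax over the Q-values of the current state. A minimal sketch, with an assumed exponential temperature schedule:

```python
import numpy as np

def boltzmann_action(q_row, temperature):
    """Sample an action with probability proportional to exp(Q(s,a)/T) (Eq. 3)."""
    z = q_row / max(temperature, 1e-6)   # guard against T -> 0
    z -= z.max()                         # subtract max for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return np.random.choice(len(q_row), p=p)

def temperature(trial, t0=1.0, decay=0.999):
    """Illustrative exponential decay from an assumed initial temperature."""
    return t0 * decay ** trial
```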
4. Experimental results
The system has been implemented on a robot arm equipped with a wrist-mounted force sensor (Fig. 4). The task is the insertion of pegs of different shapes (circular, square and triangular section) into their appropriate holes. Pegs are made of wood, and the platform containing the holes is made of a synthetic resin.
Uncertainty in the position and orientation is greater than the clearance
between the pegs and holes. The nominal goal is specified by a vector and a
rotation matrix relative to an external fixed frame of reference. This location is
supposed to be centered above the hole, so the peg would be inserted just by
moving straight along the Z axis with no rotation if there were no uncertainty
present. After positioning over the nominal goal, the robot performs a guarded
motion towards the hole.
If the insertion fails, the robot starts a series of perception and action cycles.
First, sensors are read, and a state is identified; depending on such state, one
action or another is chosen, and the learning mechanism updates the internal
parameters of decision. The robot performs compliant motions, i.e., it keeps
the contact with the surface while moving, so that it can detect the hole by a
sudden force change due to the loss of contact.
To avoid long exploration cycles, a timeout is set which stops the process if
the hole is not found within that time. In this case a new trial is started.
4.1. Case of the cylindrical peg
The peg is 29 mm in diameter, while the hole is chamferless and 29.15 mm in diameter. The clearance between the peg and the hole is 0.075 mm, thus the clearance ratio is 0.005. The peg has to be inserted to a depth of 10 mm into the hole.
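As a consistency check (not stated explicitly in the text), the radial clearance and the clearance ratio follow from the two diameters:

$$c = \frac{29.15 - 29}{2} = 0.075\ \text{mm}, \qquad \frac{c}{r} = \frac{0.075}{14.5} \approx 0.005.$$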
The input space of the self-organizing map is defined by the three filtered torque components. The map has 6 × 4 units. The map is trained off-line with approximately 70,000 data vectors extracted from previous random trials.
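For completeness, a minimal NumPy sketch of off-line SOM training on the torque samples is given below; the 6 × 4 grid matches the text, while the initialization, learning-rate and neighborhood schedules are assumed.

```python
import numpy as np

def train_som(samples, rows=6, cols=4, epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM on torque samples; returns a (rows*cols, dim) codebook.

    samples : array of shape (n, 3) with the filtered torque readings
    The learning-rate and neighborhood schedules are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n, dim = samples.shape
    codebook = samples[rng.choice(n, rows * cols, replace=False)].astype(float)
    # Grid coordinates of each unit, used for the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for x in samples[rng.permutation(n)]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5      # decaying neighborhood radius
            winner = np.argmin(np.linalg.norm(codebook - x, axis=1))
            dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
            codebook += lr * h[:, None] * (x - codebook)
            step += 1
    return codebook
```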
Once the map is trained, the robot performs a sequence of trials, each of
which starts at a random position within an uncertainty radius of 3 mm. To
ensure absolutely that the goal is within the exploration area, this area is set to
a 5 mm square, centered at the real starting position. Exploration motions are [...]

Fig. 4. Zebra Zero robot arm, grasping a peg over the platform.
[...] degrees of freedom (X, Y) being position-controlled and other degrees (Z) being force-controlled. The complexity of the motion is transferred to the control modules, and the learning process is simplified.

4.1.1. Learning results

The learning update step consists in modifying the Q-value of the previous state and the performed action according to the reinforcement and the value of the next state. The agent [...]

Fig. 5. Smoothed insertion time taken on 4000 trials of the cylinder task.

Fig. 6. Evolution of the probability of successful insertion during the training process of 4000 consecutive trials is shown. The timeout is set to 20 s in each trial. The smoothed curve was obtained by filtering the data using a moving-average window of 100 consecutive [...]

[...] rest of the architecture and the training procedure remains unchanged. The increased difficulty of the task is shown by the low percentage of successful insertions that are achieved randomly at the beginning of the learning process.

4.2.1. Learning results

Fig. 11 depicts the insertion time during 8000 learning trials. One should take into account that any failed insertion is rated at an untrue value of 30 [...]

[...] 210 s (31 min). The difference is more dramatic than in the case of the cylinder, since the random controller, even for a long time, is only capable of performing a low percentage of trials (about 45%), whereas the learned controller achieves more than 90% of the trials. As far as we know, this is the best performance achieved for this task using a square peg. In [5] only results for the cylinder are [...]

[...] insertion only reaches about 60% of success after the training process, whereas 80% of successful insertions were attained in the cube example. This is quite surprising, since initially the probability of insertion for the triangle is higher, and that means that it is easier to insert the triangle randomly than the cube. However, it is more difficult to improve these skills based on the sensed forces for [...]

[...] projection of the SOM on dimensions (Mx, My), for the triangle task. [...] this absence of results in the literature might be indicative of the difficulties for properly performing and learning this task.

4.3.2. Learning using the previous SOM

An interesting generalization test is to use an SOM trained with samples from insertions of the square [...] is an interesting result which demonstrates the generalization capabilities of the SOM for extracting features which are suitable for different tasks.

5. Conclusion

A practical sensor-based learning architecture has been presented. We have indicated the need for a robust representation of the task state, to minimize the effects of uncertainty. The implemented system is fully autonomous, and incrementally [...]

References

[1] [...] robot motion planning problems, in: 28th IEEE Symposium on Foundations of Computer Science, 1987, pp. 49–70.
[2] H. Asada, Representation and learning of nonlinear compliance using neural nets, IEEE Transactions on Robotics and Automation 9 (6) (1993) 863–867.
[3] R.J. Desai, R.A. Volz, Identification and verification of termination conditions in fine motion in presence of sensor errors and geometric uncertainties, [...]
[13] [...] motion planning with uncertainty in control and sensing, Artificial Intelligence 52 (1) (1991) 1–47.
[14] S.M. LaValle, S.A. Hutchinson, An objective-based framework for motion planning under sensing and control uncertainties, International Journal of Robotics Research 17 (1) (1998) 19–42.
[15] B.J. McCarragher, H. Asada, A discrete event controller using Petri nets applied to assembly, in: Proceedings of [...]
[18] [...] reaching by using visual information and Q-learning controllers, Autonomous Robots 9 (2000) 41–50.
[19] M.A. Erdmann, Randomization in robot tasks, International Journal of Robotics Research 11 (5) (1992) 399–436.
[20] E. Cervera, A.P. del Pobil, E. Marta, M.A. Serna, Perception-based learning for motion in contact in task planning, Journal of Intelligent and Robotic Systems 17 (1996) 283–308.
[21] T. Kohonen, in: [...]