Robot Learning 2010, Part 7

Uncertainty in Reinforcement Learning — Awareness, Quantisation, and Control

While the full-matrix UP is the more fundamental and theoretically more sound method, its computational cost is considerable (see table 3). If used with care, however, DUIPI and DUIPI-QM constitute valuable alternatives that have proven themselves in practice. Although our experiments are rather small, we expect DUIPI and DUIPI-QM to perform well on larger problems too.

8.3 Increasing the expected performance

Incorporating uncertainty in RL can even improve the expected performance for concrete MDPs in many practical and industrial environments, where exploration is expensive and only allowed within a small range. The available amount of data is hence small, and exploration takes place in a way that is, in part, extremely unsymmetrical: data is collected particularly in areas where operation is already preferable. Many of the insufficiently explored so-called on-border states are undesirable in expectation but might, by chance, give a high reward in a singular case. If the border is sufficiently large, this might happen at least a few times, and such an outlier might suggest a high expected reward. Note that in general the size of the border region increases with the dimensionality of the problem. Carefully incorporating uncertainty prevents the agent from preferring those outliers in its final operation.

We applied the joint iteration to a simple artificial archery benchmark exhibiting this "border phenomenon". The state space represents an archer's target (figure 7). Starting in the target's middle, the archer can move the arrowhead in all four directions or shoot the arrow. Exploration was performed randomly with short episodes. The dynamics were simulated with two different underlying MDPs: the arrowhead's moves are either stochastic (25 percent chance of choosing another action) or deterministic. The event of making a hit after shooting the arrow is stochastic in both settings, with the highest hit probability when the arrowhead is in the target's middle. The border is explored so rarely that a hit there misleadingly causes the respective estimator to estimate a high reward and thus the agent to finally shoot from this place.

0.06  0.17  0.28  0.17  0.06
0.17  0.28  0.39  0.28  0.17
0.28  0.39  0.50  0.39  0.28
0.17  0.28  0.39  0.28  0.17
0.06  0.17  0.28  0.17  0.06

Fig. 7. Visualisation of the archery benchmark. The target consists of 25 states, shown together with their hitting probabilities.
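To make the benchmark concrete, the following is a minimal Python sketch of such an archery MDP. The grid of hit probabilities follows figure 7, while the class interface, the reward convention (1 for a hit, 0 otherwise) and the handling of the 25-percent action slip are illustrative assumptions rather than the original implementation.

import random

# Hit probabilities of the 25 target states (figure 7); the centre has the highest chance.
HIT_PROB = [
    [0.06, 0.17, 0.28, 0.17, 0.06],
    [0.17, 0.28, 0.39, 0.28, 0.17],
    [0.28, 0.39, 0.50, 0.39, 0.28],
    [0.17, 0.28, 0.39, 0.28, 0.17],
    [0.06, 0.17, 0.28, 0.17, 0.06],
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ACTIONS = list(MOVES) + ["shoot"]

class ArcheryMDP:
    """Toy archery benchmark: move the arrowhead on a 5x5 target, then shoot."""

    def __init__(self, stochastic_moves=True, slip_prob=0.25):
        self.stochastic_moves = stochastic_moves  # stochastic vs. deterministic setting
        self.slip_prob = slip_prob                # chance of executing another move
        self.state = (2, 2)                       # episodes start in the target's middle

    def reset(self):
        self.state = (2, 2)
        return self.state

    def step(self, action):
        # In the stochastic setting a move may slip to another move
        # (whether the slip can also trigger a shot is not specified; here it cannot).
        if action != "shoot" and self.stochastic_moves and random.random() < self.slip_prob:
            action = random.choice([a for a in MOVES if a != action])
        if action == "shoot":
            row, col = self.state
            reward = 1.0 if random.random() < HIT_PROB[row][col] else 0.0
            return self.state, reward, True       # shooting ends the episode
        drow, dcol = MOVES[action]
        self.state = (min(4, max(0, self.state[0] + drow)),
                      min(4, max(0, self.state[1] + dcol)))
        return self.state, 0.0, False

Random exploration with short episodes then amounts to repeatedly resetting the environment and sampling actions uniformly until a shot ends the episode.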
Setting                  Model                          Discr.  # Obs.  ξ=0    ξ=0.5  ξ=1    ξ=2    ξ=3    ξ=4    ξ=5
Archery (Stochastic)     Frequentist                            100     0.14   0.16   0.13   0.05   0.05   0.04   0.04
                                                                500     0.17   0.20   0.25   0.22   0.10   0.05   0.04
                                                                1000    0.21   0.26   0.29   0.27   0.22   0.11   0.07
                                                                2500    0.27   0.29   0.31   0.31   0.30   0.28   0.24
Archery (Deterministic)  Dirichlet prior, ∀i: α_i = 0           100     0.35   0.38   0.23   0.17   0.12   0.11   0.09
                                                                500     0.32   0.38   0.39   0.41   0.27   0.18   0.11
                                                                1000    0.35   0.41   0.44   0.45   0.44   0.30   0.14
                                                                2500    0.44   0.46   0.48   0.49   0.50   0.50   0.48
Turbine                  Frequentist                    coarse  10^4    0.736  0.758  0.770  0.815  0.837  0.848  0.855
                                                        medium  10^4    0.751  0.769  0.784  0.816  0.833  0.830  0.815
                                                        fine    10^4    0.767  0.785  0.800  0.826  0.837  0.840  0.839
Turbine                  Maximum entropy                coarse  10^4    0.720  0.767  0.814  0.848  0.851  0.854  0.854
                         Dirichlet prior, ∀i: α_i = 1   medium  10^4    0.713  0.731  0.749  0.777  0.787  0.780  0.771
                                                        fine    10^4    0.735  0.773  0.789  0.800  0.800  0.786  0.779

For comparison (reference methods Ref. Con., RQL, RPS, RFuzzy, RNRR, RPGNRR, RCNN; 10^5 observations):
Turbine                  coarse  10^5    0.680  0.657  0.662
                         medium  10^5    0.53   0.687  0.745  0.657  0.851  0.861  0.859
                         fine    10^5    0.717  0.729  0.668

Table 4. Average reward for the archery and gas turbine benchmark.

Table 4 lists the performance for the transition-probability estimators, averaged over 50 trials (two-digit precision), for the frequentist setting (in the stochastic case) and the deterministic prior (in the deterministic case). The table shows that the performance indeed increases with ξ up to a maximum and then decreases rapidly. The position of the maximum apparently increases with the number of observations, which can be explained by the decreasing uncertainty. The performance of the theoretically optimal policy is 0.31 for the stochastic archery benchmark and 0.5 for the deterministic one. These optima are achieved on average by the certain-optimal policy based on 2500 observations with 1 ≤ ξ ≤ 2 in the stochastic case and with 3 ≤ ξ ≤ 4 in the deterministic case.

8.4 An industrial application

We further applied the uncertainty propagation together with the joint iteration to gas turbine control (Schaefer et al., 2007), an application with a continuous state space and a finite action space where the "border phenomenon" can be assumed to appear as well. We discretised the internal state space with three different precisions (coarse (4^4 = 256 states), medium (5^4 = 625 states), fine (6^4 = 1296 states)); here the high-dimensional state space has already been reduced to a four-dimensional approximate Markovian state space, called the "internal state space". A detailed description of the problem and the construction of the internal state space can be found in Schaefer et al. (2007). Note that the Bellman iteration and the uncertainty propagation are computationally feasible even with 6^4 states, since P and Cov((P,R)) are sparse. We summarise the averaged performances (50 trials with short random episodes starting from different operating points, leading to three-digit precision) in table 4, using the same uninformed priors as in section 8.3. The rewards were estimated with an uninformed normal-gamma distribution as conjugate prior with σ = ∞ and α = β = 0. In contrast to the archery benchmark, we kept the number of observations constant and changed the discretisation. The finer the discretisation, the larger the uncertainty. Therefore the position of the maximum tends to increase with a decreasing number of states. The performance is largest using the coarse discretisation.
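The certain-optimal policies evaluated in table 4 act greedily on the estimated Q-values penalised by ξ times their propagated uncertainty. The following is a minimal sketch of such a joint Bellman and uncertainty iteration in the diagonal (DUIPI-style) approximation, assuming that estimates of the transition probabilities, rewards and their variances are already available; the variable names and the fixed number of sweeps are illustrative.

import numpy as np

def certain_optimal_policy(P, R, var_P, var_R, gamma=0.9, xi=1.0, n_iter=1000):
    """Joint iteration of Q-values and their variances; greedy on Q - xi * sigma_Q.

    P[s, a, s2], R[s, a, s2]         : estimated transition probabilities and rewards
    var_P[s, a, s2], var_R[s, a, s2] : variances of those estimates
    """
    n_s, n_a, _ = P.shape
    Q = np.zeros((n_s, n_a))
    var_Q = np.zeros((n_s, n_a))
    for _ in range(n_iter):
        # policy improvement on the uncertainty-penalised values (certain-optimality)
        pi = np.argmax(Q - xi * np.sqrt(var_Q), axis=1)
        V = Q[np.arange(n_s), pi]
        var_V = var_Q[np.arange(n_s), pi]
        target = R + gamma * V[None, None, :]            # R(s,a,s') + gamma V(s')
        Q_new = np.sum(P * target, axis=2)               # Bellman update
        # first-order uncertainty propagation, keeping only the diagonal terms
        var_Q_new = np.sum(
            target ** 2 * var_P                          # contribution of uncertain P
            + P ** 2 * var_R                             # contribution of uncertain R
            + (gamma * P) ** 2 * var_V[None, None, :],   # contribution of uncertain V
            axis=2)
        Q, var_Q = Q_new, var_Q_new
    return np.argmax(Q - xi * np.sqrt(var_Q), axis=1), Q, np.sqrt(var_Q)

The full-matrix uncertainty propagation of theorem 1 (see the appendix) instead propagates the complete covariance matrix via D C D^T, which captures the correlations between states but is considerably more expensive.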
Indeed, averaged over all discretisations, the results for the frequentist setting tend to be better than for the maximum entropy prior. The overall best performance is achieved with the coarse discretisation and the frequentist setting with ξ = 5, but using the maximum entropy prior leads to comparable results already with ξ = 3. The theoretical optimum is not known, but for comparison we show the results of recurrent Q-learning (RQL), prioritised sweeping (RPS), fuzzy RL (RFuzzy), neural rewards regression (RNRR), policy gradient NRR (RPGNRR), and the control neural network (RCNN) (Schaefer et al., 2007; Appl & Brauer, 2002; Schneegass et al., 2007). The highest observed performance is 0.861 using 10^5 observations, which is almost achieved by the best certain-optimal policy using only 10^4 observations.

9. Conclusion

A new approach incorporating uncertainty in RL has been presented, following the path from awareness to quantisation and control. We applied the technique of uncertainty propagation (awareness) not only to understand the reliability of the obtained policies (quantisation) but also to achieve certain-optimality (control), a new optimality criterion in RL and beyond. We exemplarily implemented the methodology on discrete MDPs, but want to stress its generality, also in terms of the applied statistical paradigm. We demonstrated how to realistically deal with large-scale problems without a substantial loss of performance. In addition, we have shown that the method can be used to guide exploration (control): by changing a single parameter, the derived policies change from certain-optimal policies for quality assurance to policies that are certain-optimal in a reversed sense and can be used for information-seeking exploration.

Current and future work considers several open questions, such as the application to other RL paradigms and to function approximators like neural networks and support vector machines. Another important issue is the utilisation of the information contained in the full covariance matrix rather than only the diagonal. This enhancement can be seen as a generalisation from a local to a global measure of uncertainty: it can be shown that the guaranteed minimal performance for a specific selection of states depends on the covariances between the different states, i.e., the non-diagonal entries of the covariance matrix. Last but not least, we strongly aspire to apply the method to further industrial environments. Since several laboratory conditions, such as the possibility of extensive exploration or access to a sufficiently large number of observations, are typically not fulfilled in practice, we conclude that the knowledge of uncertainty and its intelligent utilisation in RL is vitally important for handling control problems of industrial scale.

10. References

Abbeel, P., Coates, A., Quigley, M. & Ng, A. Y. (2006). An application of reinforcement learning to aerobatic helicopter flight, Proc. of the 20th Conference on Neural Information Processing Systems, MIT Press, pp. 1–8.
Antos, A., Szepesvári, C. & Munos, R. (2006). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Proc. of the Conference on Learning Theory, pp. 574–588.
Appl, M. & Brauer, W. (2002). Fuzzy model-based reinforcement learning, Advances in Computational Intelligence and Learning, pp. 211–223.
Bertsekas, D. P. & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming, Athena Scientific.
& Tennenholtz, M. (2003). R-Max - a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research 3: 213– 231. Coppersmith, D. & Winograd, S. (1990). Matrix multiplication via arithmetic progressions, Journal of Symbolic Computation 9: 251–280. D’Agostini, G. (2003). Bayesian Reasoning in Data Analysis: A Critical Introduction, World Scientific Publishing. Dearden, R., Friedman, N. & Andre, D. (1999). Model based Bayesian exploration, Proc. of the Conference on Uncertainty in Artificial Intelligence, pp. 150–159. Dearden, R., Friedman, N. & Russell, S. J. (1998). Bayesian Q-learning, Proc. of the Innovative Applications of Artificial Intelligence Conference of the Association for the Advancement of Artificial Intelligence, pp. 761–768. Uncertainty in Reinforcement Learning — Awareness, Quantisation, and Control 87 Delage, E. & Mannor, S. (2007). Percentile optimization in uncertain Markov decision processes with application to efficient exploration, Proc. of the International Conference on Machine Learning, pp. 225–232. Engel, Y., Mannor, S. & Meir, R. (2003). Bayes meets Bellman: The Gaussian process approach to temporal difference learning, Proc. of the International Conference on Machine Learning, pp. 154–161. Engel, Y., Mannor, S. & Meir, R. (2005). Reinforcement learning with Gaussian processes, Proc. of the International Conference on Machine learning, pp. 201–208. Geibel, P. (2001). Reinforcement learning with bounded risk, Proc. of the 18th International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA, pp. 162–169. Ghavamzadeh, M. & Engel, Y. (2006). Bayesian policy gradient algorithms, Advances in Neural Information Processing Systems 19, pp. 457–464. Ghavamzadeh, M. & Engel, Y. (2007). Bayesian actor-critic algorithms, Proc. of the International Conference on Machine learning, pp. 297–304. Hans, A. & Udluft, S. (2009). Efficient uncertainty propagation for reinforcement learning with limited data, Proc. of the International Conference on Artificial Neural Networks, Springer, pp. 70–79. Hans, A. & Udluft, S. (2010). Uncertainty propagation for efficient exploration in reinforcement learning, Proc. of the European Conference on Artificial Intelligence Heger, M. (1994). Consideration of risk in reinforcement learning, Proc. 11th International Conference on Machine Learning, Morgan Kaufmann, pp. 105–111. ISO (1993). Guide to the Expression of Uncertainty in Measurement, International Organization for Standardization. Kaelbling, L. P., Littman, M. L. & Moore, A. W. (1996). Reinforcement learning: A survey, Journal of Artificial Intelligence Research 4: 237–285. Kearns, M., Mansour, Y. & Ng, A. Y. (2000). Approximate planning in large POMDPs via reusable trajectories, Advances in Neural Information Processing Systems 12. Kearns, M. & Singh, S. (1998). Near-optimal reinforcement learning in polynomial time, Proceedings of the 15th International Conference on Machine Learning, pp. 260–268. Lagoudakis, M. G. & Parr, R. (2003). Least-squares policy iteration, Journal of Machine Learning Research pp. 1107–1149. Lee, H., Shen, Y., Yu, C H., Singh, G. & Ng, A. Y. (2006). Quadruped robot obstacle negotiation via reinforcement learning, Proc. of the 2006 IEEE International Conference on Robotics and Automation, ICRA 2006, May 15-19, 2006, Orlando, Florida, USA, pp. 3003–3010. MacKay, D. J. C. (2003). Information Theory, Inference, and Learning Algorithms, Cambridge University Press, Cambridge. Merke, A. & Riedmiller, M. A. 
Merke, A. & Riedmiller, M. A. (2001). Karlsruhe Brainstormers - a reinforcement learning approach to robotic soccer, RoboCup 2001: Robot Soccer World Cup V, Springer, pp. 435–440.
Mihatsch, O. & Neuneier, R. (2002). Risk-sensitive reinforcement learning, Machine Learning 49(2–3): 267–290.
Munos, R. (2003). Error bounds for approximate policy iteration, Proc. of the International Conference on Machine Learning, pp. 560–567.
Peshkin, L. & Mukherjee, S. (2001). Bounds on sample size for policy evaluation in Markov environments, Proc. of the Annual Conference on Computational Learning Theory (COLT) and the European Conference on Computational Learning Theory, Vol. 2111, Springer, Berlin, pp. 616–629.
Peters, J. & Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients, Neural Networks 21(4): 682–697.
Poupart, P., Vlassis, N., Hoey, J. & Regan, K. (2006). An analytic solution to discrete Bayesian reinforcement learning, Proc. of the International Conference on Machine Learning, pp. 697–704.
Puterman, M. L. (1994). Markov Decision Processes, John Wiley & Sons, New York.
Rasmussen, C. E. & Kuss, M. (2003). Gaussian processes in reinforcement learning, Advances in Neural Information Processing Systems 16, pp. 751–759.
Schaefer, A. M., Schneegass, D., Sterzing, V. & Udluft, S. (2007). A neural reinforcement learning approach to gas turbine control, Proc. of the International Joint Conference on Neural Networks.
Schneegass, D., Udluft, S. & Martinetz, T. (2007). Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification, Proc. of the International Conference on Artificial Neural Networks, pp. 109–118.
Schneegass, D., Udluft, S. & Martinetz, T. (2008). Uncertainty propagation for quality assurance in reinforcement learning, Proc. of the International Joint Conference on Neural Networks, pp. 2589–2596.
Stephan, V., Debes, K., Gross, H.-M., Wintrich, F. & Wintrich, H. (2000). A reinforcement learning based neural multi-agent-system for control of a combustion process, Proc. of the International Joint Conference on Neural Networks, pp. 217–222.
Strehl, A. L. & Littman, M. L. (2008). An analysis of model-based interval estimation for Markov decision processes, Journal of Computer and System Sciences 74(8): 1309–1331.
Sutton, R. S. & Barto, A. G. (1998). Reinforcement Learning: An Introduction, MIT Press, Cambridge.
Wiering, M. & Schmidhuber, J. (1998). Efficient model-based exploration, Proc. of the 5th International Conference on Simulation of Adaptive Behavior: From Animals to Animats 5, MIT Press/Bradford Books, Montreal, pp. 223–228.

Appendix

Theorem 1. Suppose a finite MDP M = (S, A, P, R) with discount factor 0 < γ < 1 and C^0 an arbitrary initial symmetric and positive definite covariance matrix. Then the function

(Q^m, C^m) = \left( T Q^{m-1},\; D^{m-1} C^{m-1} (D^{m-1})^T \right)    (35)

provides a unique fixed point (Q^*, C^*) almost surely, independent of the initial Q, for policy evaluation and policy iteration.

Proof: It has already been shown that Q^m = T Q^{m-1} converges to a unique fixed point Q^* (Sutton & Barto, 1998). Since Q^m does not depend on C^k or the Jacobi matrix D^k for any iteration k < m, it remains to show that C^* unambiguously arises from the fixed point iteration. We obtain

C^m = \left( \prod_{i=0}^{m-1} D^i \right) C^0 \left( \prod_{i=0}^{m-1} D^i \right)^T    (36)

after m iterations.
Due to convergence of Q^m, D^m converges to D^* as well, which leads to

C^* = \left( \prod_{i=0}^{\infty} D^* \right) C_{\mathrm{conv}} \left( \prod_{i=0}^{\infty} D^* \right)^T    (37)

with C_conv the covariance matrix after convergence of Q. By successive matrix multiplication we obtain

(D^*)^n =
\begin{pmatrix}
\big((D^*)_{Q,Q}\big)^n & \Big(\sum_{i=0}^{n-1}\big((D^*)_{Q,Q}\big)^i\Big)(D^*)_{Q,P} & \Big(\sum_{i=0}^{n-1}\big((D^*)_{Q,Q}\big)^i\Big)(D^*)_{Q,R} \\
0 & I & 0 \\
0 & 0 & I
\end{pmatrix}    (38)

eventually leading to

(D^*)^\infty =
\begin{pmatrix}
\big((D^*)_{Q,Q}\big)^\infty & \Big(\sum_{i=0}^{\infty}\big((D^*)_{Q,Q}\big)^i\Big)(D^*)_{Q,P} & \Big(\sum_{i=0}^{\infty}\big((D^*)_{Q,Q}\big)^i\Big)(D^*)_{Q,R} \\
0 & I & 0 \\
0 & 0 & I
\end{pmatrix}    (39)

=
\begin{pmatrix}
0 & \big(I-(D^*)_{Q,Q}\big)^{-1}(D^*)_{Q,P} & \big(I-(D^*)_{Q,Q}\big)^{-1}(D^*)_{Q,R} \\
0 & I & 0 \\
0 & 0 & I
\end{pmatrix}    (40)

since all eigenvalues of (D^*)_{Q,Q} are strictly smaller than 1 and I - (D^*)_{Q,Q} is invertible for all but finitely many (D^*)_{Q,Q}. Therefore, almost surely, (D^*)^\infty exists, which implies that C^* exists as well. We finally obtain

(C^*)_{Q,Q} = \big(I-(D^*)_{Q,Q}\big)^{-1}
\begin{pmatrix} (D^*)_{Q,P} & (D^*)_{Q,R} \end{pmatrix}    (41)

\times
\begin{pmatrix}
\mathrm{Cov}(P,P) & \mathrm{Cov}(P,R) \\
\mathrm{Cov}(P,R)^T & \mathrm{Cov}(R,R)
\end{pmatrix}
\begin{pmatrix}
\big((D^*)_{Q,P}\big)^T \\
\big((D^*)_{Q,R}\big)^T
\end{pmatrix}
\big(I-\big((D^*)_{Q,Q}\big)^T\big)^{-1}.    (42)

The fixed point C^* depends solely on the initial covariance matrices Cov(P), Cov(R), and Cov(P,R), but not on Cov(Q,Q), Cov(Q,P), or Cov(Q,R), and is therefore independent of the operations necessary to reach the fixed point Q^*. □

5 Anticipatory Mechanisms of Human Sensory-Motor Coordination Inspire Control of Adaptive Robots: A Brief Review

Alejandra Barrera
Mexico's Autonomous Technological Institute (ITAM)
Mexico City, Mexico

1. Introduction

Sensory-motor coordination involves the study of how organisms make accurate goal-directed movements based on perceived sensory information. Two problems are associated with this process: sensory feedback is noisy and delayed, which can make movements inaccurate and unstable, and the relationship between a motor command and the movement it produces is variable, as the body and the environment can both change. Nevertheless, we can observe every day our ability to perform accurate movements, which is due to a nervous system that adapts to those existing limitations and continuously compensates for them. How does the nervous system do it? By anticipating the sensory consequences of motor commands.

The idea that anticipatory mechanisms guide human behaviour, i.e., that predictions about future states directly influence current behavioural decision making, has been increasingly appreciated over the last decades. Various disciplines have explicitly recognized anticipations. In cognitive psychology, the ideo-motor principle states that an action is initiated by the anticipation of its effects, and before this advanced action mechanism can be used, a learning phase has to take place, advising the actor about several actions and their specific effects (Stock and Stock, 2004). In biorobotics, anticipation plays a major role in the coordination and performance of adaptive behaviour (Butz et al., 2002), the field being interested in designing artificial animals (animats) able to adapt to environmental changes efficiently by learning and drawing inferences.

What are the bases of human anticipation mechanisms? Internal models of the body and the world. Internal models can be classified into (Miall & Wolpert, 1996):
a. forward models, which are predictive models that capture the causal relationship between actions and outcome, translating the current system state and the current motor commands (efference copy) into predictions of the future system state, and
b. inverse models, which generate, from inputs about the system state and state transitions, an output representing the causal events that produced that state.

Forward models are further divided into (Miall & Wolpert, 1996):
i. forward dynamic models, estimating future system states after current motor commands,
ii. forward sensory models, predicting the sensory signals resulting from a given current state, and
iii. forward models of the physical properties of the environment, anticipating the behaviour of the external world.

Hence, by cascading accurate forward dynamic and forward sensory models, the transformation of motor commands into sensory consequences can be achieved, producing a lifetime of calibrated movements. The accuracy of forward models is maintained through adaptive processes driven by sensory prediction errors.

Plenty of neuroscientific studies in humans provide evidence of anticipatory mechanisms based on the concept of internal models, and several robotic implementations of predictive behaviours have been inspired by those biological mechanisms in order to achieve adaptive agents. This chapter provides an overview of such neuroscientific evidence, as well as the state of the art of corresponding implementations in robots.

The chapter starts by reviewing several behavioural studies that have demonstrated anticipatory and adaptive mechanisms in human sensory-motor control based on internal models underlying tasks such as eye–hand coordination, object manipulation, eye movements, balance control, and locomotion. Then, after describing the neuroscientific bases that point to the cerebellum as a site where internal models are learnt, allocated and maintained, the chapter summarises different computational systems that may be developed to achieve predictive robot architectures, and presents specific implementations of adaptive behaviours in robots, including anticipatory mechanisms in vision, object manipulation, and locomotion. The chapter also discusses the implications of endowing a robot with the capability of exhibiting an integral predictive behaviour while performing tasks in real-world scenarios, in terms of the several anticipatory mechanisms that should be implemented to control the robot. Finally, the chapter concludes by suggesting an open challenge in the biorobotics field: to design a computational model of the cerebellum as a unitary module able to learn and operate the diverse internal models necessary to support advanced perception-action coordination of robots, showing a human-like robust reactive behaviour improved by integral anticipatory and adaptive mechanisms, while dynamically interacting with the real world during typical real-life tasks.

2. Neuroscientific bases of anticipatory and adaptive mechanisms

This section reviews diverse neuroscientific evidence of human anticipatory and adaptive mechanisms in sensory-motor control, including the consideration of the cerebellum as a prime candidate module involved in sensory prediction.
2.1 Behavioral evidences

Several behavioural studies have demonstrated anticipatory and adaptive mechanisms in human sensory-motor control based on internal models underlying tasks such as eye–hand coordination (Ariff et al., 2002; Nanayakkara & Shadmehr, 2003; Kluzik et al., 2008), object manipulation (Johansson, 1998; Witney et al., 2004; Danion & Sarlegna, 2007), eye movements (Barnes & Asselman, 1991), balance control (Huxham et al., 2001), and locomotion (Grasso et al., 1998), as described in the following subsections.

[...] actuators based on sensory inputs. Learning to control the agent consists in learning to associate the correct set of outputs with any set of inputs that the agent may experience. The most common way to perform such learning consists in using the back-propagation algorithm, which computes, for ...

goal-directed reaching movements while holding the handle of a robotic arm that produced forces perturbing trajectories. The authors compared subjects' adaptation between three trial conditions: with robot forces turned off in an unannounced manner, with robot forces turned off in an announced manner, and free-space trials holding the handle but detached from the robot. When forces increased abruptly and in a single step, ...

predictive behaviours in robots include anticipatory mechanisms in vision (Hoffmann, 2007; Datteri et al., 2003), object manipulation (Nishimoto et al., 2008; Laschi et al., 2008), and locomotion (Azevedo et al., 2004; Gross et al., 1998), as described in the following subsections.

3.1 Vision

In (Hoffmann, 2007), results are presented from experiments with a visually-guided four-wheeled mobile robot carrying out ... behavioural meaning. The robot learns a forward model by moving randomly within arrangements of obstacles and observing the changing visual input. For perceptual judgment, the robot stands still, observes a single image, and internally simulates the changing images given a sequence of movement commands (wheel speeds) as specified by a certain movement plan. With this simulation, the robot judges the distance ...

associated proprioception of the robotic manipulator. If the system prediction is correct, full processing of the sensory input is not needed at this stage. Only when expected perceptions do not match incoming sensory data is full perceptual processing activated. Experimental results from a feeding task, where the robotic arm places a spoon in its Cartesian space, showed the robot's capability to monitor the ...

programming techniques applied to the learnt predictive model and the sign list.
• Anticipatory learning classifier systems that, similar to the schema mechanism and SRS/E, contain an explicit prediction component, and the predictive model consists of a set of rules (classifiers) which are endowed with an "effect" part to predict the next situation the agent will encounter if the action specified by the ...

(2007) monitored grip force while subjects transported a hand-held object to a visual target that could move unexpectedly. They found that subjects triggered fast arm movement corrections to bring the object to the new target location, and initiated grip force adjustments before or in synchrony with arm movement corrections. Throughout the movement, grip force anticipated the mechanical consequences ...
intermediate and lateral cerebellum with a forward internal model of the arm predicting the consequences of arm movements, specifically the position, direction of movement, and speed of the limb.

Internal models are useful in sensory-motor coordination only if their predictions are generally accurate. When an accurate representation has been learnt, e.g., a forward model of how motor ...

order to maintain a desired level of performance, the brain needs to be "robust" to those changes by updating or adapting the internal models (Shadmehr et al., 2010). According to Lisberger (2009), the theory of cerebellar learning could be an important facet of the operation of internal models in the cerebellum. In this theory, errors in movement are signaled by consistently timed spikes on ...

before the time the climbing fiber input arrived. The extension of the cerebellar learning theory to cerebellar internal models proposes that depression of the parallel fiber to Purkinje cell synapses corrects the internal model in the cerebellum, so that the next instance of a given movement is closer to perfection.

3. Robotic implementations of predictive behaviours

Anticipatory animats involve agent ...
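The robotic systems reviewed above share one core ingredient: a forward dynamic model that maps the current state and an efference copy of the motor command to a predicted next state, and that is adapted from the sensory prediction error. The following is a minimal sketch of that idea; the linear model, the variable names, and the learning rule are illustrative assumptions and do not reproduce any of the cited implementations.

import numpy as np

class ForwardDynamicModel:
    """Predicts the next sensory state from the current state and motor command.

    A single linear map trained on the sensory prediction error, loosely mirroring
    how internal-model accuracy is maintained by error-driven adaptation.
    """

    def __init__(self, state_dim, command_dim, learning_rate=0.01):
        self.W = np.zeros((state_dim, state_dim + command_dim))
        self.lr = learning_rate

    def predict(self, state, command):
        # Efference copy (the motor command) plus the current state -> predicted next state.
        return self.W @ np.concatenate([state, command])

    def adapt(self, state, command, observed_next_state):
        # The sensory prediction error drives the update of the internal model.
        x = np.concatenate([state, command])
        error = observed_next_state - self.W @ x
        self.W += self.lr * np.outer(error, x)
        return error

def simulate(model, state, command_sequence):
    # Internal simulation: roll the model forward over a planned command sequence
    # without executing it, e.g. to judge the outcome of a movement plan.
    trajectory = [state]
    for command in command_sequence:
        state = model.predict(state, command)
        trajectory.append(state)
    return trajectory

Rolling such a model forward over a planned command sequence, as in the simulate helper, corresponds to the kind of internal simulation used for perceptual judgement in the vision experiments described above; cascading it with a forward sensory model would translate motor commands into expected sensory consequences.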
