
DSpace at VNU: Reinforcement learning-based intelligent tracking control for wheeled mobile robot


Transactions of the Institute of Measurement and Control
http://tim.sagepub.com/

Reinforcement learning-based intelligent tracking control for wheeled mobile robot

Nguyen Tan Luy¹, Nguyen Thien Thanh² and Hoang Minh Tri²

Transactions of the Institute of Measurement and Control 1–10, published online 10 March 2014
© The Author(s) 2014
DOI: 10.1177/0142331213509828
The online version of this article can be found at: http://tim.sagepub.com/content/early/2014/03/10/0142331213509828
Published by: http://www.sagepublications.com
On behalf of: The Institute of Measurement and Control
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
OnlineFirst Version of Record - Mar 10, 2014
Downloaded from tim.sagepub.com at Afyon Kocatepe Universitesi on May 16, 2014

Abstract

This paper proposes a new method to design a reinforcement learning-based integrated kinematic and dynamic tracking control algorithm for a nonholonomic wheeled mobile robot without knowledge of the system's drift tracking dynamics. The actor-critic structure in the control scheme uses only one neural network to reduce computational cost and storage resources. A novel tuning law for a single neural network is designed to learn an online solution of a tracking Hamilton–Jacobi–Isaacs (HJI) equation. The HJI solution is used to approximate an $H_\infty$ optimal tracking performance index function and an intelligent tracking control
law in the case of the worst disturbance. The laws guarantee closed-loop stability in real time. The convergence and stability of the overall system are proved by Lyapunov techniques. The simulation results on a non-linear system and a wheeled mobile robot verify the effectiveness of the proposed controller.

Keywords

Actor critic, Hamilton–Jacobi–Isaacs equation, neural network, wheeled mobile robot

Introduction

An important motion control problem for wheeled mobile robots (WMRs) is trajectory tracking. This problem has been studied extensively in the past few decades. Generally, a variety of control algorithms for the trajectory tracking problem have been developed in the form of adaptive control (Fierro and Lewis, 1998; Marvin et al., 2009; Mohareri et al., 2012), where back-stepping techniques are used. The kinematic controllers are designed using the available models, and the dynamic controllers are designed based on neural networks (NNs). They are considered indirect adaptive controllers. Besides, they do not minimize any long-term performance function and hence are not optimal. $H_\infty$ adaptive control for a WMR based on inverse optimality is proposed in Miyasato (2008), but it is an offline control scheme. A specific characteristic of WMR models is that they can be presented as a non-linear system in strict-feedback form, but until now, to the best knowledge of the authors, tracking control methods for a WMR using this form have been considered only in adaptive back-stepping (Chwa, 2010) or adaptive feedback linearization schemes (Khoshnam et al., 2011), without any optimality.

In the other direction, thanks to the online adaptive learning abilities of reinforcement learning (RL) methods in optimal control, RL-based tracking control methods for WMRs have been studied. The adaptive critic structures in RL are exploited to learn discrete controllers (Lin and Yang, 2008; Zenon and Marcin, 2011) or a continuous controller without disturbance using the learned solution of the
Hamilton–Jacobi–Bellman (HJB) equation (Luy, 2012). These controllers not only overcome the drawbacks of the other methods, such as the need for a fuzzy-system domain expert or for existing controllers to generate training samples for NNs, but also optimize utility functions, in contrast to the NN-based adaptive controllers, which consider only the tracking error at the current time instant. However, these methods rely on a known explicit model of the WMR and ignore the disturbance, so they are not robust adaptive control methods.

¹ Industrial University of Ho Chi Minh City, Ho Chi Minh City, Vietnam
² Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Corresponding author: Nguyen Tan Luy, Division of Automation Electronics, 305A BaHom District, Kienthanh Building, Ho Chi Minh City, Ho Chi Minh 70000, Vietnam
Email: luynguyentan@yahoo.com

To control a non-linear system, i.e. a WMR system, with optimality related to disturbances using RL, the solutions of the Hamilton–Jacobi–Isaacs (HJI) equation in the $H_\infty$ optimal control problem must be learned (Dierks and Jagannathan, 2010). The integral RL-based direct adaptive control algorithm for a class of general non-linear systems has been studied in Vamvoudakis et al. (2011) to solve the HJI equation. The most favourable part of this algorithm is that NNs can be trained synchronously to approximate the optimal control input and the worst-case disturbance without knowledge of the system drift dynamics terms. However, it requires three NNs in the same structure: one for the critic and the others for the actors. The number of neurons in the hidden layers should be at least $(n + 1)n/2$, where $n$ is the number of state variables. In practical applications, e.g. robotics, the number of state variables measured from sensors for feedback may be relatively large. With three NNs, the number of NN weights and the activation functions representing
the elements in combination of the states will significantly increase. If applied directly to a non-linear system, the algorithm may lead to high computational complexity and resource consumption. In contrast, a method using a single online approximator (SOLA) in Dierks and Jagannathan (2010) to solve the HJI equation can reduce the number of NNs but, unfortunately, it is a type of model-based RL.

From the aforementioned problems, there are three main contributions in the paper. The first involves the derivation of tracking dynamics formed from a non-linear strict-feedback model of the WMR, the purpose of which is to design an integrated kinematic and dynamic RL-based intelligent controller, i.e. an integrated kinematic and dynamic robust direct adaptive tracking controller with optimality, without explicit knowledge of the system's drift dynamics. Secondly, the actor-critic structure in the RL scheme uses only one NN for the critic law. The last contribution is the tuning law for the critic NN so that solutions of the tracking HJI equation are learned, and optimal values of the tracking performance index function and the robust direct adaptive control law as well as the worst-case disturbance law are approximated without accessing the system's drift dynamics. By Lyapunov techniques, the closed-loop system state and the critic NN error are proved to be uniformly ultimately bounded, and the system parameters are shown to converge asymptotically to optimal target values.

The paper is organized as follows. The next section provides the theoretical background of the WMR to establish the non-linear WMR system in strict-feedback form, and then the new tracking dynamics are derived. Then we design the integrated kinematic and dynamic robust direct adaptive tracking control scheme with optimality, along with the tuning law for the critic NN, and give a proof of stability and convergence. The results of simulation on the WMR verify the effectiveness of the proposed algorithm, and conclusions are drawn.
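The storage argument made in the introduction can be illustrated with a small calculation. The sketch below evaluates the lower bound $(n + 1)n/2$ on the number of hidden-layer neurons and compares a three-network structure with a single-network one. It rests on two assumptions that are not taken from the paper: that the activation functions are the quadratic products of the states, and that each neuron carries exactly one weight. The function names are hypothetical.

```python
from itertools import combinations_with_replacement


def min_hidden_neurons(n: int) -> int:
    """Lower bound (n + 1) * n / 2 quoted in the text for n state variables."""
    return (n + 1) * n // 2


def quadratic_basis(x):
    """Assumed activation vector: all products x_i * x_j with i <= j."""
    return [a * b for a, b in combinations_with_replacement(x, 2)]


def total_weights(n: int, num_networks: int) -> int:
    """Weights to store, assuming one weight per neuron (illustrative only)."""
    return num_networks * min_hidden_neurons(n)


if __name__ == "__main__":
    # Compare three networks (one critic, two actors) with a single network.
    for n in (3, 5, 10):
        print(n, min_hidden_neurons(n), total_weights(n, 3), total_weights(n, 1))
```

Under these assumptions the weight count grows quadratically with the number of states, and using one network instead of three reduces it by a factor of three.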
Strict-feedback kinematic and dynamic model

A WMR with differentially driven wheels mounted on a driving axle can move and rotate on the horizontal plane thanks to two independent actuators. Torque from the actuators is transmitted to the left and the right wheels to drive the robot. The mass of the WMR, including the mass of the platform without the wheels and the mass of the wheels, is concentrated at a central point. The distance between the driven wheels is $b_1$. The radius of each wheel is $r_1$. The distance from the centre point to the driving axle is $l$. Without loss of generality, it can be assumed that $l = 0$. The WMR is considered a mechanical system with $n$ generalized configuration variables $q$ subject to $m$ constraints ($m < n$) and represented by the following equation (Khoshnam et al., 2011):

$$H_{k,j}(q, \dot{q}) = \sum_{i=1}^{n} h_{k,ji}(q, \dot{q})\,\dot{q}_i = 0, \quad j = 1, \ldots, m \tag{1}$$

where the numbers of holonomic and non-holonomic constraints are $k$ and $m - k$, respectively. The constraints are independent of time and can be written as $A_k(q)\dot{q} = 0$, where $A_k$

Posted: 16/12/2017, 02:29
