FUTURE DIRECTIONS

The robust adaptive reinforcement learning control algorithm researched and developed in this thesis has narrowed the gap between the fields of machine learning and control. However, compared with the ever-advancing field of intelligent control systems, the results presented are still modest, and several directions remain open:

• Extend ORADP to more general nonlinear systems: systems whose dynamics are entirely unknown in advance, or whose structure is unknown in advance.

• Extend ORADP to develop control theory for output-feedback nonlinear systems.

• Extend ORADP within hierarchical reinforcement learning to increase convergence speed.

• Test ORADP experimentally on cooperative multi-robot systems or on various swarm agents.

LIST OF PUBLISHED WORKS

[1]. Luy N.T., Thanh N.T., and Tri H.M. (2014), “Reinforcement learning-based intelligent tracking control for wheeled mobile robot,” Transactions of the Institute of Measurement and Control (SCIE), 36(7), pp. 868-877.

[2]. Luy N.T., Thanh N.T., and Tri H.M. (2013), “Reinforcement learning-based robust adaptive tracking control for multi-wheeled mobile robots synchronization with optimality,” IEEE Workshop on Robotic Intelligence In Informationally Structured Space, pp. 74-81.

[3]. Luy N.T. (2012), “Reinforcement learning-based tracking control for wheeled mobile robot,” IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 462-467.

[4]. Luy N.T. (2012), “Reinforcement learning-based optimal tracking control for wheeled mobile robot,” IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems, pp. 371-376.

[5]. Luy N.T., Thanh N.D., Thanh N.T., and Ha N.T.P. (2010), “Robust reinforcement learning-based tracking control for wheeled mobile robot,” IEEE International Conference on Computer and Automation Engineering, 1, pp. 171-176.

[6]. Nguyễn Tấn Lũy, Nguyễn Thiện Thành, Nguyễn Thị Phương Hà (2010), “Điều khiển thích nghi bền vững sử dụng học củng cố cho hệ phi tuyến có ngõ vào bị ràng buộc bão hòa” [Robust adaptive control using reinforcement learning for nonlinear systems with saturation-constrained inputs], Tạp chí khoa học và công nghệ các trường đại học kỹ thuật [Journal of Science and Technology of Technical Universities], no. 75, pp. 36-43.

[7]. Luy N.T., Thanh N.T., and Ha N.T.P. (2009), “Robust adaptive control using reinforcement learning for nonlinear system with input constraints,” Journal of Science and Technology Development, Vietnam National University - Ho Chi Minh City, 12, pp. 5-18.

[8]. Nguyễn Thị Phương Hà, Nguyễn Thiện Thành, Nguyễn Tấn Lũy (2008), “Nghiên cứu xấp xỉ hàm trong học giám sát và học củng cố” [A study of function approximation in supervised learning and reinforcement learning], Tạp chí khoa học và công nghệ các trường đại học kỹ thuật [Journal of Science and Technology of Technical Universities], no. 68, pp. 16-21.
