VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING

GRADUATION THESIS

APPLYING REINFORCEMENT LEARNING FOR AUTONOMOUS ROBOT NAVIGATION IN UNKNOWN ENVIRONMENTS

Major: COMPUTER ENGINEERING

Thesis committee: COMPUTER ENGINEERING
Reviewer: PHAM HOANG ANH, Ph.D.
Supervisors: TRAN VAN HOAI, Assoc. Prof., Ph.D.
             TRAN THANH BINH, M.S.
Students: BUI QUANG DUC - 1852323
          LE NGUYEN ANH TU - 1751110

HO CHI MINH CITY, 12/2022

Declaration

We declare that this is our own research work, supervised by Assoc. Prof., Ph.D. Tran Van Hoai and M.S. Tran Thanh Binh. The materials and results of the research in this topic are legitimate and have never been published in any format. The contents of the analysis, survey and evaluation were gathered by the writers from various sources and are cited in the reference section. In addition, figures and models from other groups of writers were used in the study, and their sources were cited and annotated. If any fraud is discovered, we undertake to accept full responsibility for the substance of our research; Ho Chi Minh City University of Technology has nothing to do with any copyright infringement.

Acknowledgement

Firstly, we want to sincerely thank our advisors, M.S. Tran Thanh Binh and Assoc. Prof. Tran Van Hoai, for their enthusiasm and patience. They have given us excellent supervision and guidance, which have helped us tremendously at all times during our research. Additionally, we would like to thank all of the lecturers at the Faculty of Computer Science and Engineering in particular, and all of the faculty members of Ho Chi Minh City University of Technology in general, for their dedication to teaching and for helping the group acquire the knowledge and skills it needed. They showed the group the excitement in the lessons, so that its members can continue to pursue their passions and achieve the group's goals.

Abstract

In recent years, mobile and autonomous robots have become more and more common in industry. One of the most crucial areas of the field is the study of path-finding techniques that enable a robot to move efficiently from a starting point to a goal while avoiding obstacles. Although mathematics and algorithms have been applied to solve this task in known static environments, a lot of additional research is required to adapt autonomous robots to unknown static environments and even dynamic environments. The Reinforcement Learning (RL) approach is therefore studied as one of the potential solutions to such challenges.

In this thesis, we describe our research on path-planning and sampling-based planning algorithms for an autonomous robot with a limited vision range that must find a path to a pre-determined goal in static environments while avoiding 2D polygonal obstacles. In addition, we researched the basics of a real autonomous robot (TurtleBot 3), which is capable of performing tasks in an unknown environment by itself without explicit human control. Most importantly, we studied the principles of RL and then implemented an RL-based mechanism that allows the autonomous robot to move in unknown static environments.
The challenge of applying RL to different maps or to dynamic environments is an option for further research, but not an urgent one. Based on the analysis of the simulation results, we demonstrate the feasibility and efficiency of the proposed approach in comparison with the approaches published in [1], opening up opportunities in the future to add more features and create robots that are appropriate for moving in dynamic situations with an unknown set of obstacles.

This thesis is arranged into five chapters:
Chapter 1: Introduction
Chapter 2: General knowledge
Chapter 3: Methodology and implementation for the RL system
Chapter 4: Metrics and simulation results in comparison with other approaches
Chapter 5: Conclusion

Contents
Declaration
Acknowledgement
Abstract
List of Figures
Terms
1 Introduction
1.1 General
1.2 Background
1.3 Scope of Work
2 Preliminary Studies
2.1 Path Planning
2.1.1 Global Path Planning
2.1.2 Local Path Planning
2.2 RRT-based algorithms
2.2.1 RRT
2.2.2 RRT*
2.3 Reinforcement Learning
2.3.1 Introduction

Experimental Results

This chapter evaluates all of the work that we have done so far using performance criteria. Moreover, it shows our results and comparisons through pictures and graphs.

4.1 Selecting Metrics

This section describes the metrics that will be used to evaluate our work; the results are shown in detail in the following sections:
• Convergence
• Driven Distance
• Waypoints

4.2 Simulation and Comparison in Python Environment

4.2.1 Testing environments

In this section, we test whether our algorithm, RRT* applying RL, can make the robot reach the goal after training over two types of obstacle: dead ends and convex polygon obstacles.

• Dead ends: We trained the robot from start position (8,9) to goal position (65,65) over the maps with dead ends shown in the figures below:

Figure 4.1: Map with dead ends
Figure 4.2: Map with dead ends
Figure 4.3: Map with dead ends

As a result, after training, the robot can reach the goal on only one of the maps, and on that map it takes many steps to escape the dead end. Therefore, our algorithm does not work well on maps that have dead ends as obstacles.

• Convex polygon obstacles: Next, we also trained the robot from start position (8,9) to goal position (65,65) over the maps below, which contain only convex polygon obstacles:

Figure 4.4: Map with convex polygon obstacles
Figure 4.5: Map with convex polygon obstacles
Figure 4.6: Map with convex polygon obstacles

The figures above show that RRT* applying RL can be trained to make the robot reach the goal when the obstacles are convex polygons. Moreover, we also created a map containing 10 convex polygon obstacles to test whether the robot can reach the goal when the obstacle density is high:

Figure 4.7: Extended obstacle

As the figure above shows, the robot cannot reach the goal after training. Therefore, if the obstacle density is too high, the obstacles cover almost all of the neighbor nodes of the robot at a specific state, so the robot cannot jump to any further node; it gets stuck and cannot reach the goal.

After testing these maps, we recommend applying our algorithm to maps that contain only convex polygon obstacles, and avoiding maps that have dead ends or a very high obstacle density. Therefore, in the further testing in this chapter, we will compare our algorithm with others over maps with convex polygon obstacles, limiting the number of convex polygon obstacles to fewer than 10 in a 70x70 map.
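To make the effect of obstacle density more concrete, the following is a minimal collision-checking sketch, not the implementation used in the thesis. It assumes that obstacles are represented as convex polygons using the shapely library, and the function and variable names are our own illustration: a candidate neighbor node is considered reachable only if the straight segment from the current node to it does not intersect any obstacle. When the obstacle density is high, this reachable set can become empty, which is exactly the stuck situation described above.

from shapely.geometry import Polygon, LineString

def free_neighbors(current, candidates, obstacles):
    # current    : (x, y) position of the robot's current node
    # candidates : list of (x, y) candidate neighbor nodes
    # obstacles  : list of shapely Polygons (convex polygon obstacles)
    reachable = []
    for node in candidates:
        segment = LineString([current, node])
        # Keep the node only if the connecting segment avoids every obstacle.
        if not any(segment.intersects(obstacle) for obstacle in obstacles):
            reachable.append(node)
    return reachable

# Toy example on a 70x70 map: one square obstacle blocks the move to the right.
obstacles = [Polygon([(10, 5), (14, 5), (14, 15), (10, 15)])]
print(free_neighbors((8, 9), [(12, 9), (8, 13), (5, 9)], obstacles))
# -> [(8, 13), (5, 9)]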
4.2.2 Testing Convergence

The convergence metric ensures that the path length of the agent improves after training. To obtain the optimized path length, we need the total rewards to converge. In this section, we train the robot from start position (8,9) to goal position (65,65) in the map below:

Figure 4.8: Map with obstacles

The graph below shows the total rewards the agent received in each episode:

Figure 4.9: Total rewards per episode converging during the training process

According to Figure 4.9, the total rewards the agent received each episode gradually converge to a certain value during the training process. In other words, the agent initially takes random actions and thus receives unstable total rewards in each episode; the agent then applies its learned knowledge to obtain the maximum total rewards, so the total rewards are considered to have converged. The figure below shows the training result every 100 episodes for the example above:

Figure 4.10: The path of the agent every 100 episodes

As can easily be seen, in the initial episodes the agent moves extremely wildly. However, the agent then applies its learned knowledge to move more efficiently in later episodes. After about 500 episodes, the agent walks nearly alongside the obstacles, as expected, and the path length stays stable.

4.2.3 Comparing with RRTX

In this section, we compare our algorithm, RRT* applying RL, with the RRTX algorithm of M.S. Tran Thanh Binh over maps with convex polygon obstacles in order to evaluate the path length and waypoints metrics.
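Before presenting the comparisons, here is a minimal sketch of how the driven distance and waypoint metrics can be computed from a planned path. This is our own illustration under the assumption that a path is a list of 2D waypoints; the function name is hypothetical and is not taken from either implementation.

import math

def path_metrics(waypoints):
    # waypoints: list of (x, y) points from the start position to the goal.
    # Driven distance is the sum of the Euclidean lengths of the path segments.
    driven_distance = sum(
        math.dist(a, b) for a, b in zip(waypoints, waypoints[1:])
    )
    return driven_distance, len(waypoints)

# Example: a path with three waypoints and two unit-length segments.
length, count = path_metrics([(0, 0), (1, 0), (1, 1)])
print(length, count)  # 2.0 3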
4.2.3.1 Known Maps

Firstly, we compare with RRTX on the known maps below:

Figure 4.11: Maps with convex polygon obstacles: (a), (b), (c)

In each map, we have two examples to compare with RRTX, using different start positions ((8,9) and (5,42)) and the same goal (65,65). Note that our robot has been trained on these maps with these start positions and the obstacles are unchanged.

First map with convex polygon obstacles:
• Start position (8,9):
(a) RRTX  (b) RRT* applying RL
Figure 4.12: Map with obstacles, example 1
• Start position (5,42):
(a) RRTX  (b) RRT* applying RL
Figure 4.13: Map with obstacles, example 2

According to Figure 4.12, the RRTX algorithm takes 106.17 units of path length and 29 waypoints to the goal, in comparison with 115.72 units of path length and 34 waypoints for RRT* applying RL. On the other hand, the RRTX method needs 70.83 units of path length and 19 waypoints to reach the target from the second start position (Figure 4.13), while RRT* applying RL needs 75.39 units of path length and 22 waypoints.

Second map with convex polygon obstacles:
• Start position (8,9):
(a) RRTX  (b) RRT* applying RL
Figure 4.14: Map with obstacles, example 1
• Start position (5,42):
(a) RRTX  (b) RRT* applying RL
Figure 4.15: Map with obstacles, example 2

According to Figure 4.14, the RRTX algorithm takes 95.22 units of path length and 24 waypoints to the goal, in comparison with 99.01 units of path length and 30 waypoints for RRT* applying RL. On the other hand, in Figure 4.15, the RRTX method produces a path length of 76.26 units and 16 waypoints, while RRT* applying RL produces 78.14 units of path length and 24 waypoints.

Third map with convex polygon obstacles:
• Start position (8,9):
(a) RRTX  (b) RRT* applying RL
Figure 4.16: Map with obstacles, example 1
• Start position (5,42):
(a) RRTX  (b) RRT* applying RL
Figure 4.17: Map with obstacles, example 2

According to Figure 4.16, the RRTX algorithm takes 92.18 units of path length and 22 waypoints to the goal, in comparison with 96.63 units of path length and 31 waypoints for RRT* applying RL. To reach the destination from the second start position (Figure 4.17), the RRTX technique requires 66.58 units of path length and 15 waypoints, while RRT* applying RL requires 67.54 units of path length and 20 waypoints.

Overall, M.S. Binh's algorithm implementation performs better on the path length and waypoints metrics, producing a more efficient path for the robot to reach the goal.

4.2.3.2 Unknown Maps

Secondly, we compare our algorithm with RRTX on unknown maps that are based on the known map below:

Figure 4.18: Known map with obstacles

By changing the obstacles, we create unknown maps based on the map that the robot has been trained on. We then let the robot run in these new maps without any further training (the robot still has the knowledge of the known map) to see whether it can still reach the goal with RRT* applying RL after the environment changes, and we also compare our algorithm with the RRTX algorithm on these new maps. For the first unknown map, we shift the bottom-left obstacle downward along the y-axis. On the second map, we add a new obstacle to the map. For the last map, we add many more obstacles to the map.

Figure 4.19: Unknown maps derived from the known map: (a) shifting an obstacle, (b) adding an obstacle, (c) adding many obstacles

Next, we show the comparison on those maps, given that the start position is (8,9) and the goal position is (65,65):

• Shifting an obstacle:
(a) RRTX  (b) RRT* applying RL
Figure 4.20: Map with the bottom-left obstacle shifted
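As a closing illustration, the sketch below shows one way the unknown-map variants described above could be generated from the known map, assuming obstacles are stored as shapely polygons. The helper names and coordinates are our own and only illustrate the idea of shifting an existing obstacle or appending new ones; they are not the scripts used in the thesis.

from shapely.affinity import translate
from shapely.geometry import Polygon

def shift_obstacle(obstacles, index, dx=0.0, dy=0.0):
    # Return a copy of the obstacle list with one obstacle translated.
    shifted = list(obstacles)
    shifted[index] = translate(obstacles[index], xoff=dx, yoff=dy)
    return shifted

def add_obstacles(obstacles, extra):
    # Return a copy of the obstacle list extended with additional obstacles.
    return list(obstacles) + list(extra)

# Known map with two convex polygon obstacles (illustrative coordinates only).
known_map = [
    Polygon([(10, 10), (20, 10), (20, 20), (10, 20)]),  # bottom-left obstacle
    Polygon([(40, 40), (50, 40), (50, 50), (40, 50)]),
]

shifted_map = shift_obstacle(known_map, 0, dy=-5)  # shift the bottom-left obstacle downward
extended_map = add_obstacles(known_map, [Polygon([(30, 55), (35, 55), (32, 60)])])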