www.ags.uni-sb.de/~omega, 3–28 4.1

Copyright © 2011, W. Ertel

Function approximation, Generalization and Convergence

• Continuous state variables ⇒ infinite state space
• A table of V- or Q-values cannot be stored explicitly

Solution:
• The Q(s, a) table is replaced by a neural network with
• input variables s, a and the Q-value as output
• Finite representation of the (infinite) function Q(s, a)!
• Generalization (from finitely many training samples)

Attention: there is no longer a convergence guarantee, because Theorem 10.2 only holds if all state-action pairs are visited infinitely often.

Alternative: any other function approximator

POMDP

POMDP: partially observable Markov decision process:
• Many different states are recognized as one particular state
• Many states in the real world are mapped to one observation
• Convergence problems with value iteration or Q-learning
• Possible solutions: ¹, Observation Based Learning ²

¹ Sutton, R./Barto, A. Reinforcement Learning. MIT Press, 1998.
² Lauer, M./Riedmiller, M. Generalisation in Reinforcement Learning and the Use of Observation-Based Learning. In Kokai, Gabriella/Zeidler, Jens, editors, Proceedings of the FGML Workshop 2002. 2002.

Application: TD-Gammon

• TD-Learning (Temporal Difference Learning) uses states that lie further in the future
• TD-Gammon: a program for playing backgammon
• TD-Learning using a backpropagation network with 40 to 80 hidden neurons
• The only reward: the score at the end of the game
• TD-Gammon was trained in 1.5 million games against itself
• Beats backgammon grandmasters!
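The core idea of TD learning, updating a value estimate from the difference between successive predictions when the only reward arrives at the end, can be illustrated with a minimal sketch. It is not TD-Gammon itself: instead of a backpropagation network it uses a plain table, and instead of backgammon it assumes a hypothetical 1-D random walk in which leaving on the right counts as a win (reward 1) and leaving on the left as a loss (reward 0). The function name `td0_random_walk` and all parameter values are assumptions chosen for illustration.

```python
import random


def td0_random_walk(n_states=5, episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) value estimation on a hypothetical 1-D random walk.

    States 1..n_states are non-terminal; states 0 and n_states + 1 are terminal.
    The only reward is 1.0 when the walk ends on the right, otherwise 0.0.
    """
    rng = random.Random(seed)
    values = [0.0] * (n_states + 2)  # terminal entries are never updated
    for _ in range(episodes):
        s = (n_states + 1) // 2  # every episode starts in the middle
        while 0 < s < n_states + 1:
            s_next = s + rng.choice((-1, 1))  # random policy: step left or right
            reward = 1.0 if s_next == n_states + 1 else 0.0
            # TD error: difference between the new and the old prediction
            td_error = reward + gamma * values[s_next] - values[s]
            values[s] += alpha * td_error
            s = s_next
    return values[1:-1]  # estimates for the non-terminal states only


estimates = td0_random_walk()
print([round(v, 2) for v in estimates])
```

For this walk the true value of state k is k/6 (the probability of ending on the right), so the printed estimates should lie close to 1/6, 2/6, ..., 5/6. Replacing the table by a function approximator, as TD-Gammon does with its network, gives up the tabular convergence guarantee.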

Other applications

• RoboCup: a policy for the robot is learned with reinforcement learning, e.g. dribbling ³
• Inverted pendulum
• Control of a quadrocopter

Problems in robotics:
• Extreme computation times in higher-dimensional problems (many variables/actions)
• Feedback of the environment on real robots is [Russ Tedrake, IROS 08]

Curse of dimensionality

• Problem: high-dimensional state and action spaces

Solution methods:
• Learning in nature happens on many abstraction layers
• Computer science: every learned skill is encapsulated in a module
• The action space is scaled down
• States are abstracted
• Hierarchical learning [Barto/Mahadevan]
• Distributed learning (centipede: a brain for each leg)

Curse of dimensionality, other ideas

• The human brain is not a tabula rasa at birth
• Good initial policy for a robot?
• Classical programming
• Reinforcement learning
• Trainer offers additional feedback
• or:
• Learning from demonstration (learning with a teacher)
• Reinforcement learning [Billard et al.]
• Trainer offers additional feedback

Current state of research

• Fitted Value Iteration
• Connecting reinforcement learning with imitation learning
• Policy Gradient Methods
• Actor Critic Methods
• Natural Gradient Methods

Fitted Value Iteration

Randomly sample m states $s^{(1)}, \dots, s^{(m)}$ from the MDP
Ψ = 0
n = the number of available actions in A
repeat
    for i = 1 → m
        for j = 1 → n
            $q(a_j) = R(s^{(i)}) + \gamma V(s'^{(j)})$
        end for
        $y^{(i)} = \max_j q(a_j)$
    end for
    $\Psi = \arg\min_{\Psi} \sum_{i=1}^{m} \left( y^{(i)} - \Psi^T \Phi(s^{(i)}) \right)^2$
until Ψ converges

Here $s'^{(j)}$ denotes the successor state reached from $s^{(i)}$ by action $a_j$, and the value function is approximated as $V(s) = \Psi^T \Phi(s)$.

www.teachingbox.org

• Reinforcement learning algorithms:
• value iteration
• Q(λ), SARSA(λ)
• TD(λ)
• tabular and function approximation versions
• actor critic
• tile coding
• locally weighted regression
• Example environments:
• mountain car
• gridworld (with editor), windy gridworld
• dicegame
• n-armed bandit
• pole swing-up

Literature

• First introduction: Mitchell, T. Machine Learning. McGraw Hill, 1997
• Standard work: Sutton, R./Barto, A. Reinforcement Learning. MIT Press, 1998
• Overview: Kaelbling, L.P./Littman, M.L./Moore, A.P. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, 1996

... Philipp Reclam, 1994 (document)

Chapter Introduction

What is Artificial Intelligence (AI)

• What is intelligence?
• How can intelligence be measured?
• How does our...

