In my previous post, I discussed the Value Iteration algorithm for Grid World Games. Policy Iteration is another algorithm commonly used for solving MDP problems. Here I will discuss the Policy Iteration algorithm for Grid World Games.
Consider the same problem as in the previous post. Both Value Iteration and Policy Iteration are initialized with arbitrary, random values that are far from the optimal solution. These initializations do not take the specification of the state set S and the action set A into account, which can slow down convergence. At each iteration, the algorithms sweep over the entire action set A defined in the MDP to improve the current policy, which may cause unnecessary computation. The transition model used here is fixed, but, as we shall see later, in robotics a model of this type can degrade the results and disagree with the assumptions of MDP theory.
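To make the discussion concrete, here is a minimal sketch of Policy Iteration on a small grid world. The grid size (4x4), the single terminal cell, the reward of -1 per step, and the deterministic transition model are assumptions chosen for illustration, not necessarily the exact setup used in the previous post; the alternation between full policy evaluation and greedy policy improvement is the part that carries over.

```python
import numpy as np

# Assumed toy problem: 4x4 grid, state 0 (top-left) is terminal.
# Actions: 0=up, 1=right, 2=down, 3=left. Transitions are deterministic;
# moving off the grid leaves the agent in place. Each non-terminal step costs -1.
N = 4
N_STATES = N * N
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]
GAMMA = 1.0
TERMINAL = 0

def next_state(s, a):
    """Deterministic transition model for the toy grid."""
    if s == TERMINAL:
        return s
    r, c = divmod(s, N)
    dr, dc = ACTIONS[a]
    nr, nc = r + dr, c + dc
    if 0 <= nr < N and 0 <= nc < N:
        return nr * N + nc
    return s  # bumping into a wall keeps the agent in place

def policy_evaluation(policy, V, theta=1e-6):
    """Sweep the states until the value of the current policy converges."""
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            s2 = next_state(s, policy[s])
            v_new = -1.0 + GAMMA * V[s2]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

def policy_improvement(policy, V):
    """Make the policy greedy with respect to V; report whether it changed."""
    stable = True
    for s in range(N_STATES):
        if s == TERMINAL:
            continue
        q = [-1.0 + GAMMA * V[next_state(s, a)] for a in range(len(ACTIONS))]
        best = int(np.argmax(q))
        if best != policy[s]:
            policy[s] = best
            stable = False
    return policy, stable

def policy_iteration():
    # Arbitrary initialization: values at zero, policy always "up".
    V = np.zeros(N_STATES)
    policy = np.zeros(N_STATES, dtype=int)
    while True:
        V = policy_evaluation(policy, V)
        policy, stable = policy_improvement(policy, V)
        if stable:
            return policy, V

if __name__ == "__main__":
    policy, V = policy_iteration()
    print(V.reshape(N, N))
    print(policy.reshape(N, N))
```

Note how the improvement step evaluates every action in A for every state, even actions that are obviously useless in a given cell; this is exactly the exhaustive sweep mentioned above that can cause unnecessary computation in larger problems.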