MDPs and Reinforcement Learning
Prove that the always-west policy [for all states s, π(s) = West] is better than the always-east policy [for all states s, π(s) = East].
Hint: you can prove it by showing that, for each state, its value (expected discounted reward) under always-west is higher than its value under always-east.
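The hint's proof technique is to evaluate both policies and compare them state by state. The MDP figure for this question is not available, so the sketch below uses a hypothetical stand-in: a deterministic 1-D corridor with cells 0..4 and exits at both ends. The exit rewards (+10 west, +1 east) and the discount factor 0.9 are assumptions chosen for illustration. It runs iterative policy evaluation, V(s) = R(s, π(s), s') + γ·V(s'), for each policy. It then checks the hint's criterion: the value under always-west is at least the value under always-east in every non-terminal state, and strictly higher in at least one.

```python
# Hypothetical MDP (the original figure is not available): a deterministic
# 1-D corridor with cells 0..4. Cells 0 and 4 are terminal exits.
N_CELLS = 5
TERMINALS = {0: 10.0, 4: 1.0}  # assumed reward for stepping INTO each exit
GAMMA = 0.9                    # assumed discount factor


def step(s: int, action: str) -> tuple[int, float]:
    """Deterministic transition: move one cell West or East, clipped to the grid."""
    s_next = s - 1 if action == "West" else s + 1
    s_next = max(0, min(N_CELLS - 1, s_next))
    return s_next, TERMINALS.get(s_next, 0.0)


def evaluate(policy: dict[int, str], tol: float = 1e-10) -> list[float]:
    """Iterative (in-place) policy evaluation: V(s) <- R + GAMMA * V(s')."""
    V = [0.0] * N_CELLS
    while True:
        delta = 0.0
        for s in range(N_CELLS):
            if s in TERMINALS:
                continue  # terminal states keep value 0
            s_next, r = step(s, policy[s])
            new_v = r + GAMMA * V[s_next]
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


def dominates(v_a: list[float], v_b: list[float], states: list[int]) -> bool:
    """True if v_a >= v_b in every state and v_a > v_b in at least one."""
    return all(v_a[s] >= v_b[s] for s in states) and any(
        v_a[s] > v_b[s] for s in states
    )


states = [s for s in range(N_CELLS) if s not in TERMINALS]
always_west = {s: "West" for s in states}
always_east = {s: "East" for s in states}

v_west = evaluate(always_west)
v_east = evaluate(always_east)
for s in states:
    print(f"state {s}: V_west = {v_west[s]:.3f}, V_east = {v_east[s]:.3f}")
print("always-west better:", dominates(v_west, v_east, states))
```

In this toy corridor, always-west earns 10, 9 and 8.1 in cells 1, 2 and 3, and always-east earns 0.81, 0.9 and 1. The written proof for the actual MDP follows the same pattern: write the Bellman equation for each state under each policy, solve for both value functions, and compare them state by state.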