* It turns out that these techniques are insufficient for other games
* [Pitfall](https://ale.farama.org/environments/pitfall/) has a state space that is too large
* The problem dimensionality makes it too easy to find a "new" state
The Atari game, Pitfall.
---
## Concepts
* We went over some high-level concepts
* Rollout and the policy improvement theorem
* MCTS
* Policy Learning
* Latent spaces
* Those are hopefully fresher in your minds
* I can't ask you a mathematical question about Z that you can do by hand, so expect concept questions
---
## Policy Learning
* Estimating Q may actually be harder than estimating a policy
* After all, $\pi$ just needs to know that one thing is preferable to another, not their exact values
* And a policy can be stochastic in a way not supported by a Q value alone
* So learning the policy directly may be a simpler problem than learning Q
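
As a small illustration of the points above, the sketch below uses made-up numbers (the arrays `q_true`, `q_rough`, and `prefs` are hypothetical, not from the lectures). It shows that a badly scaled value estimate still yields the same greedy action as long as the ordering is preserved, and that a softmax over preferences gives a stochastic policy directly.

```python
import numpy as np

def greedy_action(q_values):
    """Return the index of the highest-valued action."""
    return int(np.argmax(q_values))

def softmax(prefs):
    """Turn arbitrary action preferences into a probability distribution."""
    z = prefs - np.max(prefs)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical Q values for three actions in a single state.
q_true = np.array([1.0, 2.5, 0.3])
# A rescaled, shifted estimate: the exact values are wrong, the ordering is right.
q_rough = 10.0 * q_true - 7.0

# Both estimates pick the same action, so the exact values were not needed.
same_choice = greedy_action(q_true) == greedy_action(q_rough)

# Hypothetical preferences: the policy mixes evenly between actions 0 and 1.
prefs = np.array([0.0, 0.0, -2.0])
policy = softmax(prefs)
```
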
---
## Policy Learning
* We discussed REINFORCE
* And Actor-Critic training, where training is stabilized by a second model that is a past copy of the one being trained
* Both use the same general form of the policy update
* $\theta \leftarrow \theta + \alpha \cdot \frac{\partial \log \left[ \pi(a|s_t, \theta) \right]}{\partial \theta}G_t$
* Actions followed by high positive returns should become more likely
* Actions followed by large negative returns should become less likely
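
The update above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming a tabular softmax policy with one row of preferences per state; the names `grad_log_pi` and `reinforce_update`, the learning rate, and the one-step episodes are illustrative assumptions, not the code used in the lectures.

```python
import numpy as np

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    z = prefs - np.max(prefs)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def grad_log_pi(theta, state, action):
    """Gradient of log pi(action | state, theta) for a tabular softmax policy.

    theta has shape (n_states, n_actions); only the row for `state` is nonzero.
    """
    probs = softmax(theta[state])
    grad = np.zeros_like(theta)
    grad[state] = -probs
    grad[state, action] += 1.0
    return grad

def reinforce_update(theta, episode, alpha=0.1, gamma=1.0):
    """Apply theta <- theta + alpha * grad log pi * G_t for every step.

    episode is a list of (state, action, reward) tuples from one rollout.
    """
    theta = theta.copy()
    # Compute the return G_t for each step by scanning the episode backwards.
    G = 0.0
    returns = []
    for _, _, reward in reversed(episode):
        G = reward + gamma * G
        returns.append(G)
    returns.reverse()
    for (state, action, _), G_t in zip(episode, returns):
        theta += alpha * grad_log_pi(theta, state, action) * G_t
    return theta

# One state, two actions, starting from a uniform policy.
theta0 = np.zeros((1, 2))
theta_good = reinforce_update(theta0, [(0, 1, 1.0)])   # action 1 was rewarded
theta_bad = reinforce_update(theta0, [(0, 1, -1.0)])   # action 1 was punished
```

After a positive return, `softmax(theta_good[0])[1]` rises above 0.5. After a negative return, `softmax(theta_bad[0])[1]` falls below it.
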
---
## State Compression
* The solution is to use latent representations to compress the observable state into a smaller relevant state
* This is a broad, powerful idea
* That is difficult to compress into a slide or two
* So if you did not follow this concept, look over the last four lectures
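
As a rough reminder of what compressing the observable state means, here is a minimal sketch that swaps in a linear encoder fit by PCA for the learned latent models from the lectures. The synthetic data, the dimensions, and the function names (`fit_linear_encoder`, `encode`, `decode`) are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_encoder(observations, latent_dim):
    """Fit a linear encoder with PCA: keep the top `latent_dim` directions."""
    mean = observations.mean(axis=0)
    centered = observations - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:latent_dim]  # shape (latent_dim, obs_dim)
    return mean, components

def encode(obs, mean, components):
    """Map observations to the smaller latent state."""
    return (obs - mean) @ components.T

def decode(z, mean, components):
    """Map latent states back to observation space."""
    return z @ components + mean

rng = np.random.default_rng(0)
# Synthetic world: 2 hidden factors generate a 50-dimensional observation.
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 50))
observations = hidden @ mixing

mean, components = fit_linear_encoder(observations, latent_dim=2)
z = encode(observations, mean, components)
reconstruction = decode(z, mean, components)
```

Because the synthetic observations really are driven by two factors, the 2-dimensional latent state reconstructs them up to floating-point error. Real observations such as game frames generally need a nonlinear encoder.
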