By now we are all familiar with the Continuous Mountain Car problem. If you follow that link, you’ll find information about setting up and installing the RL gym.
A starting template is provided in gym_mountain_car_start.py.
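This assignment asks you to create two agents: one that learns a Q-function through exploration, and a discretized agent that uses value iteration. This is a good exercise to kickstart your progress on the final project if you haven’t started training an agent for it yet.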
First, create an agent that learns an estimate of the action-value function Q(s, a). Use this agent with an epsilon-greedy policy to explore the state space and reach the goal. The default rewards are not enough for the agent to converge reliably to a good policy, so design new rewards of your own. The modifyReward function in the template is a starting point.
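Below is a minimal sketch of one way to structure the first agent. It assumes a linear Q-function over Gaussian radial-basis features and a small set of discrete forces; both are choices made for this sketch, not requirements of the assignment. `features`, `LinearQAgent`, and `shaped_reward` are hypothetical names. `shaped_reward` is only a stand-in for the template's modifyReward, whose actual signature may differ. It uses potential-based reward shaping with the car's mechanical energy as the potential, so gaining height or speed earns a small bonus.

```python
import math
import random

ACTIONS = (-1.0, 0.0, 1.0)  # assumption: a few discrete forces for the Box action space
# Hypothetical 5 x 5 grid of RBF centers spanning position [-1.2, 0.6]
# and velocity [-0.07, 0.07].
CENTERS = [(p, v) for p in (-1.2, -0.75, -0.3, 0.15, 0.6)
           for v in (-0.07, -0.035, 0.0, 0.035, 0.07)]
WIDTHS = (0.45, 0.035)  # hypothetical RBF widths (position, velocity)


def features(position, velocity):
    """Bias term plus one Gaussian radial-basis feature per center."""
    phi = [1.0]
    for cp, cv in CENTERS:
        d2 = ((position - cp) / WIDTHS[0]) ** 2 + ((velocity - cv) / WIDTHS[1]) ** 2
        phi.append(math.exp(-0.5 * d2))
    return phi


class LinearQAgent:
    """Q(s, a) is approximated as dot(w[a], features(s))."""

    def __init__(self, alpha=0.05, gamma=0.99, epsilon=0.1, seed=0):
        self.n = len(CENTERS) + 1
        self.w = [[0.0] * self.n for _ in ACTIONS]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def q(self, phi, a):
        return sum(wk * xk for wk, xk in zip(self.w[a], phi))

    def act(self, phi):
        """Epsilon-greedy: a random action with probability epsilon, else argmax Q."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.q(phi, a))

    def update(self, phi, a, reward, phi_next, done):
        """Semi-gradient Q-learning step toward r + gamma * max_a' Q(s', a')."""
        target = reward
        if not done:
            target += self.gamma * max(self.q(phi_next, b) for b in range(len(ACTIONS)))
        td_error = target - self.q(phi, a)
        for k in range(self.n):
            self.w[a][k] += self.alpha * td_error * phi[k]
        return td_error


def shaped_reward(reward, position, velocity, next_position, next_velocity,
                  gamma=0.99, scale=1000.0):
    """Hypothetical stand-in for modifyReward: potential-based shaping,
    reward + scale * (gamma * E(s') - E(s)), where E is mechanical energy."""
    def energy(p, v):
        # Kinetic term plus the potential whose slope gives the -0.0025*cos(3p) force.
        return 0.5 * v * v + (0.0025 / 3.0) * math.sin(3.0 * p)
    return reward + scale * (gamma * energy(next_position, next_velocity)
                             - energy(position, velocity))
```

In a training loop you would compute `phi = features(*obs)`, pick `a = agent.act(phi)`, pass `np.array([ACTIONS[a]])` to `env.step`, and call `agent.update` with the shaped reward. Shaping of the form gamma * Phi(s') - Phi(s) is, in theory, known to leave the optimal policy unchanged, which makes it a relatively safe first choice; `scale` will likely need tuning.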
The discretized agent requires no training. Instead, implement the discretizeState function in the template and perform value iteration. The documentation describes the layout and dynamics of the mountain car environment, which gives value iteration the model it needs. The key here is choosing the right level of discretization. You could, for example, simply divide the world in half: left of the valley and right of the valley. Then you could classify the current velocity as going left or going right. The continuous world has now been discretized into 4 states (2 position regions × 2 velocity directions). Is that sufficient?
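The sketch below shows one possible shape for discretizeState plus value iteration. It assumes a uniform grid, three representative forces, and a deterministic model built by simulating the documented dynamics from the center of each cell. The grid size (`N_POS`, `N_VEL`), the action set, and `REPEAT` (holding each action for several steps so that a transition usually leaves its cell) are hypothetical tuning knobs; `cell_center`, `build_model`, `q_value`, and `value_iteration` are illustrative names, and the `discretizeState` signature here is a guess that may not match the template.

```python
import math

# Constants taken from the documented MountainCarContinuous-v0 dynamics.
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS, GOAL_VEL = 0.45, 0.0
POWER = 0.0015

N_POS, N_VEL = 40, 40        # hypothetical grid resolution: the knob to tune
ACTIONS = (-1.0, 0.0, 1.0)   # assumption: three representative forces
REPEAT = 4                   # hypothetical: hold each action for a few steps


def step(position, velocity, force):
    """One environment step, following the documented update equations."""
    force = max(-1.0, min(1.0, force))
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the car stops when it hits the left wall
    done = position >= GOAL_POS and velocity >= GOAL_VEL
    return position, velocity, done


def discretizeState(position, velocity):
    """Map (position, velocity) to a single grid index (signature is an assumption)."""
    i = int((position - MIN_POS) / (MAX_POS - MIN_POS) * N_POS)
    j = int((velocity + MAX_SPEED) / (2 * MAX_SPEED) * N_VEL)
    return min(max(i, 0), N_POS - 1) * N_VEL + min(max(j, 0), N_VEL - 1)


def cell_center(s):
    """Representative continuous state at the center of grid cell s."""
    i, j = divmod(s, N_VEL)
    position = MIN_POS + (i + 0.5) * (MAX_POS - MIN_POS) / N_POS
    velocity = -MAX_SPEED + (j + 0.5) * 2 * MAX_SPEED / N_VEL
    return position, velocity


def build_model():
    """Deterministic model: model[s, a] = (reward, next cell or None at the goal)."""
    model = {}
    for s in range(N_POS * N_VEL):
        for a, force in enumerate(ACTIONS):
            position, velocity = cell_center(s)
            steps, done = 0, False
            while steps < REPEAT and not done:
                position, velocity, done = step(position, velocity, force)
                steps += 1
            # Reward of -1 per environment step, so shorter paths score higher.
            model[s, a] = (-steps, None if done else discretizeState(position, velocity))
    return model


def q_value(model, V, s, a, gamma):
    reward, s_next = model[s, a]
    return reward + (0.0 if s_next is None else gamma * V[s_next])


def value_iteration(model, gamma=0.95, tol=1e-4):
    """In-place value iteration; returns state values and a greedy policy."""
    n = N_POS * N_VEL
    V = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n):
            best = max(q_value(model, V, s, a, gamma) for a in range(len(ACTIONS)))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = [max(range(len(ACTIONS)), key=lambda a: q_value(model, V, s, a, gamma))
              for s in range(n)]
    return V, policy
```

At run time the agent would look up `a = policy[discretizeState(position, velocity)]` and send `np.array([ACTIONS[a]])` to the environment, ideally holding it for `REPEAT` steps to match the model. If the grid is too coarse, many cells transition back to themselves and the greedy policy can stall; holding actions only partly hides this, so the resolution is worth experimenting with.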
Your discretized agent should be able to guide the vehicle to the goal with extremely high reliability.
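One way to check reliability is to run many episodes from the environment's documented reset distribution (position uniform in [-0.6, -0.4], velocity 0) under the 999-step limit of MountainCarContinuous-v0, and count how many reach the goal. This self-contained sketch re-implements the dynamics instead of calling the environment. `success_rate` and `pump_energy` are hypothetical names, and `pump_energy` (push in the direction of motion) is only a placeholder policy so the harness runs; you would pass your own agent instead.

```python
import math
import random

# Constants from the documented MountainCarContinuous-v0 dynamics.
MIN_POS, MAX_POS, MAX_SPEED = -1.2, 0.6, 0.07
GOAL_POS, POWER = 0.45, 0.0015


def step(position, velocity, force):
    """One environment step; returns (position, velocity, reached_goal)."""
    velocity += max(-1.0, min(1.0, force)) * POWER - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0
    return position, velocity, position >= GOAL_POS and velocity >= 0.0


def success_rate(policy, episodes=100, max_steps=999, seed=0):
    """Fraction of episodes, started like a reset, that reach the goal in time.

    `policy` is any function mapping (position, velocity) to a force in [-1, 1].
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        position, velocity = rng.uniform(-0.6, -0.4), 0.0
        for _ in range(max_steps):
            position, velocity, done = step(position, velocity, policy(position, velocity))
            if done:
                successes += 1
                break
    return successes / episodes


def pump_energy(position, velocity):
    """Placeholder policy: always push in the current direction of motion."""
    return 1.0 if velocity >= 0 else -1.0
```

For example, `success_rate(lambda p, v: 0.0)` scores a car that never pushes, which cannot climb out of the valley, while a policy that actually solves the task should score close to 1.0.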
Submit your code on Canvas as homework3.py.