Imitation Learning on Enduro
Introduction
In Imitation Learning, the model learns to reproduce the actions taken by a human expert.
Imitation Learning can be framed as a supervised learning problem in which the data is generated by the expert: the game frames are the inputs and the expert's actions are the labels.
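Concretely, this behavior-cloning setup reduces to classification over the 9 discrete actions: the policy is trained with a cross-entropy loss between its predicted action distribution and the expert's chosen action. A minimal PyTorch sketch (the `policy` network and tensor names here are placeholders, not our exact code):

```python
import torch
import torch.nn as nn

def bc_loss(policy: nn.Module,
            frames: torch.Tensor,           # (batch, 3, 210, 160) screens
            expert_actions: torch.Tensor    # (batch,) integer action labels
            ) -> torch.Tensor:
    # Behavior cloning: treat (observation, expert action) pairs as an
    # ordinary supervised classification dataset over the 9 actions.
    logits = policy(frames)                 # (batch, 9) action logits
    return nn.functional.cross_entropy(logits, expert_actions)
```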
About the Environment
Enduro consists of maneuvering a race car in the National Enduro, a long-distance endurance race. The objective is to pass a certain number of cars each day; doing so allows the player to continue racing on the next day. The driver must avoid other racers and pass 200 cars on the first day and 300 cars on each following day. An episode ends after 150 seconds per level, and the game also ends and resets if the player reaches 999.99 km.
Observation Space
In this environment, the observation is an RGB image of the screen, an array of shape (210, 160, 3). Each action is repeated for k frames, where k is uniformly sampled from {2, 3, 4}. The underlying Atari emulation runs at 60 fps.
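The shape can be verified directly; a quick sketch assuming the ALE environments that ship with Gymnasium (older gym versions use a different environment id such as `Enduro-v0`, and frame-skip defaults vary between versions):

```python
import gymnasium as gym

env = gym.make("ALE/Enduro-v5")   # requires ale-py and the Atari ROMs
obs, info = env.reset(seed=0)
print(obs.shape)                  # (210, 160, 3) RGB screen
print(env.observation_space)      # Box(0, 255, (210, 160, 3), uint8)
```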
Action Space
This environment defines a total of 9 discrete actions: NOOP, FIRE/ACCELERATE, RIGHT, LEFT, DOWN/DECELERATE, DOWNRIGHT, DOWNLEFT, RIGHTFIRE, and LEFTFIRE.
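The action labels can be listed through the ALE interface (same assumed environment id as above):

```python
import gymnasium as gym

env = gym.make("ALE/Enduro-v5")
print(env.action_space)                      # Discrete(9)
print(env.unwrapped.get_action_meanings())
# ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'DOWN',
#  'DOWNRIGHT', 'DOWNLEFT', 'RIGHTFIRE', 'LEFTFIRE']
```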
Reward
In the gym package, a reward of +1 is given for each car passed and -1 for each car that passes the agent; however, the net reward cannot drop below 0.
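One way to inspect this reward signal is to roll out a random policy and accumulate the per-step rewards (a sketch, same assumed environment id as above):

```python
import gymnasium as gym

env = gym.make("ALE/Enduro-v5")
obs, info = env.reset(seed=0)
episode_return, done = 0.0, False
while not done:
    action = env.action_space.sample()     # random policy, just for inspection
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward               # +1 per car passed, -1 per car passing us
    done = terminated or truncated
print(episode_return)
```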
Dataset
We played Enduro ourselves and recorded gameplay for the model to learn from: three separate sessions totaling about 15 minutes.
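A sketch of how such (frame, action) pairs can be captured with Gymnasium's keyboard-play utility; the callback signature follows the current Gymnasium API, and the output file name is an assumption rather than our exact tooling:

```python
import numpy as np
import gymnasium as gym
from gymnasium.utils.play import play

frames, actions = [], []

def record(obs_t, obs_tp1, action, reward, terminated, truncated, info):
    # Store the screen the player saw and the action they chose.
    frames.append(obs_t)
    actions.append(action)

env = gym.make("ALE/Enduro-v5", render_mode="rgb_array")
play(env, callback=record, zoom=3)

# Hypothetical output path for one recorded session.
np.savez_compressed("enduro_session_01.npz",
                    frames=np.asarray(frames), actions=np.asarray(actions))
```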
Model Architecture
We tried 3 different architectures (an illustrative sketch of the SimpleNet baseline appears after the list):
- SimpleNet (LeNet Architecture)
- BigNet
- ResNet18
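The exact layer configurations are not reproduced here; as an illustration, a LeNet-style SimpleNet for 9-way action classification could look like the following (channel widths, input preprocessing, and the use of `LazyLinear` are assumptions of the sketch, not our verbatim model):

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """LeNet-style CNN mapping a (3, 210, 160) screen to 9 action logits."""

    def __init__(self, num_actions: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(120), nn.ReLU(),   # lazy layer infers the flattened size
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = SimpleNet()(torch.zeros(1, 3, 210, 160))   # -> torch.Size([1, 9])
```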
Results
Model | Optimizer | Learning Rate | Rank (Level 1) | Rank (Level 2) | Cars Passed |
---|---|---|---|---|---|
SimpleNet | Adam | 10^-5 | 2 | - | 198 |
BigNet | SGD | 50^-3 | 1 | 100 | 400 |
ResNet18 | Adam | 10^-3 | 1 | 150 | 350 |