Imitation Learning on Enduro
Introduction
In Imitation Learning, the model learns to reproduce the actions taken by a human expert.
Imitation Learning can be framed as a supervised learning problem in which the data is generated by the expert: the game frames are the inputs and the expert's actions are the labels.
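Concretely, this behavior-cloning setup reduces to classification over the 9 discrete actions: the policy is trained with a cross-entropy loss between its predicted action distribution and the expert's chosen action. A minimal PyTorch sketch (the `policy` network and tensor names here are placeholders, not our exact code):

```python
import torch
import torch.nn as nn

def bc_loss(policy: nn.Module,
            frames: torch.Tensor,           # (batch, 3, 210, 160) screens
            expert_actions: torch.Tensor    # (batch,) integer action labels
            ) -> torch.Tensor:
    # Behavior cloning: treat (observation, expert action) pairs as an
    # ordinary supervised classification dataset over the 9 actions.
    logits = policy(frames)                 # (batch, 9) action logits
    return nn.functional.cross_entropy(logits, expert_actions)
```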
About the Environment
Enduro consists of maneuvering a race car in the National Enduro, a long-distance endurance race. The objective is to pass a certain number of cars each day; doing so allows the player to continue racing on the next day. The driver must avoid other racers and pass 200 cars on the first day and 300 cars on each following day. An episode ends after 150 seconds per level, and the game also ends and resets if the player reaches 999.99 km.
Observation Space
In this environment, the observation is an RGB image of the screen, an array of shape (210, 160, 3). Each action is repeated for k frames, where k is uniformly sampled from {2, 3, 4}. The underlying Atari emulation runs at 60 fps.
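The shape can be verified directly; a quick sketch assuming the ALE environments that ship with Gymnasium (older gym versions use a different environment id such as `Enduro-v0`, and frame-skip defaults vary between versions):

```python
import gymnasium as gym

env = gym.make("ALE/Enduro-v5")   # requires ale-py and the Atari ROMs
obs, info = env.reset(seed=0)
print(obs.shape)                  # (210, 160, 3) RGB screen
print(env.observation_space)      # Box(0, 255, (210, 160, 3), uint8)
```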
Action Space
This environment defines a total of 9 discrete actions: NOOP, FIRE/ACCELERATE, RIGHT, LEFT, DOWN/DECELERATE, DOWNRIGHT, DOWNLEFT, RIGHTFIRE, and LEFTFIRE.
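The action labels can be listed through the ALE interface (same assumed environment id as above):

```python
import gymnasium as gym

env = gym.make("ALE/Enduro-v5")
print(env.action_space)                      # Discrete(9)
print(env.unwrapped.get_action_meanings())
# ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'DOWN',
#  'DOWNRIGHT', 'DOWNLEFT', 'RIGHTFIRE', 'LEFTFIRE']
```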
Reward
In the gym package, a reward of +1 is given for each car passed and -1 for each car that passes the agent; however, the net reward cannot drop below 0.
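One way to inspect this reward signal is to roll out a random policy and accumulate the per-step rewards (a sketch, same assumed environment id as above):

```python
import gymnasium as gym

env = gym.make("ALE/Enduro-v5")
obs, info = env.reset(seed=0)
episode_return, done = 0.0, False
while not done:
    action = env.action_space.sample()     # random policy, just for inspection
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward               # +1 per car passed, -1 per car passing us
    done = terminated or truncated
print(episode_return)
```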
Dataset
We played Enduro ourselves and recorded gameplay for the model to learn from: three separate sessions totaling about 15 minutes.
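A sketch of how such (frame, action) pairs can be captured with Gymnasium's keyboard-play utility; the callback signature follows the current Gymnasium API, and the output file name is an assumption rather than our exact tooling:

```python
import numpy as np
import gymnasium as gym
from gymnasium.utils.play import play

frames, actions = [], []

def record(obs_t, obs_tp1, action, reward, terminated, truncated, info):
    # Store the screen the player saw and the action they chose.
    frames.append(obs_t)
    actions.append(action)

env = gym.make("ALE/Enduro-v5", render_mode="rgb_array")
play(env, callback=record, zoom=3)

# Hypothetical output path for one recorded session.
np.savez_compressed("enduro_session_01.npz",
                    frames=np.asarray(frames), actions=np.asarray(actions))
```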
Model Architecture
We tried 3 different architectures (an illustrative sketch of the SimpleNet baseline appears after the list):
- SimpleNet (LeNet Architecture)
- BigNet
- ResNet18
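The exact layer configurations are not reproduced here; as an illustration, a LeNet-style SimpleNet for 9-way action classification could look like the following (channel widths, input preprocessing, and the use of `LazyLinear` are assumptions of the sketch, not our verbatim model):

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """LeNet-style CNN mapping a (3, 210, 160) screen to 9 action logits."""

    def __init__(self, num_actions: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(120), nn.ReLU(),   # lazy layer infers the flattened size
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = SimpleNet()(torch.zeros(1, 3, 210, 160))   # -> torch.Size([1, 9])
```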
Results
Model | Optimizer | Learning Rate | Rank (Level 1) | Rank (Level 2) | Cars Passed |
---|---|---|---|---|---|
SimpleNet | Adam | 10^-5 | 2 | - | 198 |
BigNet | SGD | 50^-3 | 1 | 100 | 400 |
ResNet18 | Adam | 10^-3 | 1 | 150 | 350 |