First Steps with RL in 0 A.D.

Introduction

Since r23917, the Pyrogenesis engine features a dedicated interface for reinforcement learning.

Machine learning and reinforcement learning have been making impressive strides across a variety of domains, from video games to robotics. In this post, we will show how you can get up and running with reinforcement learning within 0 A.D., an open-source RTS game! We will assume some background knowledge of the key concepts in reinforcement learning and familiarity with OpenAI Gym. Another good resource for learning about state and action spaces is available on the OpenAI Gym website!
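If you need a quick refresher, the standard Gym interaction loop looks like the following minimal sketch, shown here with a stock environment purely for illustration (depending on your gym version, the exact reset/step signatures may differ slightly):

import gym

# Minimal OpenAI Gym interaction loop with a stock environment, for illustration only.
env = gym.make("CartPole-v1")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random policy
    obs, reward, done, info = env.step(action)  # observations and rewards come from step()
env.close()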

Installation

First, we will need to install the following:

  • 0 A.D. (r23917 or later, i.e., a build that includes the RL interface)
  • the Python client package for the RL interface (included in the 0 A.D. codebase)

Installing 0 A.D. itself should be straightforward. Next, we will install the Python package using pip. At the time of this writing, the package has yet to be published to PyPI and needs to be installed from the 0 A.D. codebase. Once it is installed, we can check that everything works by running one of the example scripts included with it, and we will have the essential building blocks for creating RL agents in 0 A.D.!
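For reference, installing from a source checkout might look something like the following; the path (source/tools/rlclient/python) and the package name (zero_ad) are assumptions based on the codebase layout, so check your checkout:

cd 0ad/source/tools/rlclient/python    # assumed location of the Python client in the 0 A.D. sources
pip install -e .
python -c "import zero_ad"             # quick sanity check (assumes the package is named zero_ad)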

For our first steps, we will be using an existing Python package, zero_ad_rl, which wraps RLlib (a library containing implementations of many state-of-the-art RL algorithms) and registers OpenAI gym environments for 0 A.D. Note that there isn’t a single gym environment for 0 A.D.: there are many different scenarios we may want to explore, each with its own goals. Furthermore, there isn’t a single obvious representation for the state and action spaces; simple representations are often easier to learn, whereas representations like minimaps tend to generalize better to new scenarios but can slow down learning significantly.

As we may want to create our own environments in the future, we will first clone the repo and install an editable copy locally:

git clone "https://github.com/brollb/zero_ad_rl"
cd "zero_ad_rl"
pip install -e .

If the above command fails to find a distribution matching “ray[rllib]”, you may need to install it manually.
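For example (add a version constraint if zero_ad_rl requires a specific one):

pip install "ray[rllib]"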

Finally, we need to install the mod containing the scenario that we will use for training our agent, 0ad_rl_maps.zip!
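One possible way to do this on Linux is sketched below; the mods directory differs per platform and the exact packaging of the archive may vary, so treat the paths as assumptions. The folder name should match the --mod flag we will pass to pyrogenesis later:

mkdir -p ~/.local/share/0ad/mods/rl-scenarios                  # assumed Linux user mod directory
unzip 0ad_rl_maps.zip -d ~/.local/share/0ad/mods/rl-scenarios  # extract the mod so the engine can find it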

Scenario

Before we start training the agent, let’s look at the scenario it will be playing. The scenario is quite simple and designed for the agent to learn to kite: the player controls a small group of cavalry javelinists and is trying to defeat a larger group of infantry. A screenshot of the scenario is shown below:

As mentioned earlier, the state and action spaces need to be defined. We will be using very simple representations; as a result, the agent should learn quite quickly but will not generalize to other scenarios. Specifically, our state space will be a single number: the distance between the center of the player’s units and the center of the opposing units. The action space will consist of 2 discrete actions: attack or retreat. These are perhaps the simplest state and action space representations sufficient for learning to kite, as the policy simply needs to learn that if the input is below some (learned) threshold, it should retreat; otherwise, it should attack!
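In OpenAI gym terms, these spaces might be declared roughly as follows (an illustrative sketch only, not the actual definitions in zero_ad_rl):

import numpy as np
from gym import spaces

# State: a single number, the distance between the centers of the two groups of units.
observation_space = spaces.Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32)

# Actions: 2 discrete choices, e.g. 0 = retreat, 1 = attack.
action_space = spaces.Discrete(2)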

Training an Agent!

Now that we understand the scenario, we simply need to train the agent! There are many different training algorithms that we could use; to keep things simple, we will start by training the agent with DQN. In a nutshell, DQN tries to learn a Q function, that is, a function that predicts the return (cumulative reward) of taking a given action from the current state. Given the exact Q function, the optimal policy is trivial: we simply compute the return for each action and select the one with the largest cumulative reward! More information about learning the Q function can be found here.
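As a rough sketch of the idea (illustrative only, not RLlib’s implementation): the learned network outputs one Q-value per action, the greedy policy picks the action with the largest value, and training repeatedly nudges Q(s, a) toward the one-step target r + γ · max over a' of Q(s', a'):

import numpy as np

# Greedy action selection from predicted Q-values; a sketch, not RLlib's implementation.
def greedy_action(q_values):
    # Pick the action with the largest predicted return.
    return int(np.argmax(q_values))

# In our scenario, q_values might look like [Q(s, retreat), Q(s, attack)],
# so the policy attacks whenever the second value is the larger of the two.
print(greedy_action([1.2, 3.4]))  # -> 1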

Anyway, let’s get started training our agent! First we need to start 0 A.D. with the RL interface enabled. We will run it headlessly to speed things up a bit:

pyrogenesis --rl-interface=127.0.0.1:6000 --autostart-nonvisual --mod=rl-scenarios --mod=public

Now that we have installed zero_ad_rl, we should be able to run the train script using:

python -m zero_ad_rl.train --help

This command essentially wraps the RLlib equivalent but ensures that the 0 A.D. gym environments are registered with RLlib. We can now use it to train a simple agent:

python -m zero_ad_rl.train --env CavalryVsSpearmen --run DQN --checkpoint-freq 25 --experiment-name MyFirstAgent

After running this command, we should see a number of RLlib logs, including the reward the agent received during training episodes. By default, the experiment metrics are stored in ~/ray_results and can be viewed with TensorBoard as follows:

tensorboard --logdir=~/ray_results

Evaluating the Agent

Once the TensorBoard metrics suggest that our agent has reached a reasonable level of performance, we can see what it has learned by watching it play (i.e., generating rollouts in the environment). Just as the zero_ad_rl package wraps RLlib’s train command line tool, it also wraps the rollout command:

python -m zero_ad_rl.rollout --help

Now, we can simply select a checkpoint that we would like to evaluate and we should be all set! However, before we jump right in, we will want to restart 0 A.D. without the --autostart-nonvisual flag so we can actually watch the agent play the game. (If the game minimizes when it loses focus, you may need to set SDL_MINIMIZE_ON_FOCUS_LOSS=0 in your environment.)

pyrogenesis --rl-interface=127.0.0.1:6000 --mod=rl-scenarios --mod=public

Finally, we can select a checkpoint from ~/ray_results/MyFirstAgent/<experiment name> and tell it to play 25 games (replace the checkpoint path below with the path to the checkpoint you would like to view):

python -m zero_ad_rl.rollout ~/ray_results/MyFirstAgent/<experiment name>/checkpoint_150/checkpoint-150 --env CavalryVsSpearmen --run DQN --episodes 25

After starting the script, you should see the newly trained RL agent playing the scenario upon switching back to 0 A.D.!

Next Steps

Although we were able to effectively train an RL agent from scratch to learn to play our small skirmish scenario, there is still plenty of room for improvement! A few ideas include:

  • make the RL agent generalize better by using more expressive state and action spaces
  • train the agent where the enemy units are spawned in different locations
  • train it using a different scenario
  • train the agent via imitation from human demonstrations first

Stay tuned for a future post on how to get started on some of these!
