Changes between Version 1 and Version 2 of GettingStartedReinforcementLearning


Timestamp: Feb 5, 2021, 12:32:08 PM
Author: Stan
Comment: Version from here http://brollb.github.io/machine-learning/games/2021/02/02/0ad-rl/


[[TOC]]

= First Steps with RL in 0 A.D. =
== Introduction ==

Since r23917, the Pyrogenesis engine features a dedicated interface for reinforcement learning.

Machine learning and reinforcement learning have been making impressive strides across a variety of domains, from video games to robotics. In this guide, we will show how you can get up and running with reinforcement learning in 0 A.D., an open-source RTS game! Before we start, we will assume some background knowledge of the [https://spinningup.openai.com/en/latest/spinningup/rl_intro.html key concepts] in reinforcement learning and familiarity with OpenAI gym. Another good resource for learning about [https://gym.openai.com/docs/#spaces state and action spaces] is available on the OpenAI gym website!

== Installation ==

First, we will need to install the following:

* [https://play0ad.com 0 A.D.] (at least version 0.24, or [BuildInstructions built from source])
* [https://github.com/0ad/0ad/tree/master/source/tools/rlclient/python zero_ad], the Python client for interfacing with the game engine.
* [http://github.com/brollb/zero_ad_rl zero_ad_rl], a suite of tools for RL in 0 A.D. built on RLlib. This includes a 0 A.D. mod containing our starter scenario.

Installation of 0 A.D. should be straightforward by following the links above. Next, we will install the zero_ad Python package using pip. At the time of this writing, the package has not yet been published to PyPI and needs to be installed from the [https://github.com/0ad/0ad/tree/master/source/tools/rlclient/python 0 A.D. codebase]. We can check that the installation worked by running one of the [https://github.com/0ad/0ad/tree/master/source/tools/rlclient/python/samples example scripts]. Now we should have the essential building blocks for creating RL agents in 0 A.D.!
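
Once the game is running with the RL interface enabled (see the pyrogenesis command in the Training section below), a short Python session also works as a quick smoke test. This is only a minimal sketch based on the bundled sample scripts; the scenario config path is illustrative and exact method names may differ between versions of the client:
{{{#!python
import zero_ad

# Connect to a running pyrogenesis instance started with --rl-interface=127.0.0.1:6000
game = zero_ad.ZeroAD('http://127.0.0.1:6000')

# Load a scenario config shipped with the sample scripts (path is illustrative)
with open('samples/arena.json') as f:
    scenario_config = f.read()

state = game.reset(scenario_config)  # start the scenario and return the initial game state
print('Connected; received initial game state:', state is not None)
}}}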

For our first steps, we will use an existing Python package, [https://github.com/brollb/zero_ad_rl zero_ad_rl]. It simply wraps RLlib, which provides implementations of many state-of-the-art RL algorithms, and registers an OpenAI gym environment for 0 A.D. Since we may want to create our own environments later, we will first clone the repository and install an editable copy locally. Note: there isn't a single OpenAI gym environment for 0 A.D. because there are many different scenarios we may want to explore, each with different goals. Furthermore, there isn't a single obvious representation for the state and action spaces; simple representations are often easier to learn, whereas representations like minimaps often generalize better to new scenarios but can slow down learning significantly.
{{{#!bash
git clone "https://github.com/brollb/zero_ad_rl"
cd "zero_ad_rl"
pip install -e .
}}}
If the above command fails to find a distribution matching "ray[rllib]", you may need to [https://docs.ray.io/en/master/installation.html install it manually].

Finally, we simply need to [/wiki/Modding_Guide#Howtoinstallmods install the mod] containing the scenario, 0ad_rl_maps.zip, which we will use for training our agent!

== Scenario ==

Before we actually start training the agent, let's look at the scenario the agent will be playing. The scenario is quite simple and is designed for the agent to learn to kite: the player controls a small group of cavalry javelinists and is trying to defeat a larger group of infantry. A screenshot of the scenario is shown below:

[[Image(CavalryVsSpearmen.jpg, 25%)]]

As mentioned earlier, the state and action spaces need to be defined. We will be using very simple representations; as a result, the agent should learn pretty quickly but will not generalize to other scenarios. Specifically, our state space will be a single number: the distance between the center of the player's units and the center of the opposing units. The action space will consist of 2 discrete actions: attack or retreat. These are perhaps the simplest state and action space representations sufficient for learning to kite, as the policy simply needs to learn that if the input is below some (learned) threshold, it should retreat; otherwise, it should attack!
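
For reference, in OpenAI gym terms this representation would look roughly like the following (a sketch only; the actual spaces used by the CavalryVsSpearmen environment are defined in zero_ad_rl):
{{{#!python
import numpy as np
from gym import spaces

# Observation: a single number, the distance between the centers of the two groups
observation_space = spaces.Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32)

# Actions: two discrete choices, e.g. 0 = attack, 1 = retreat
action_space = spaces.Discrete(2)
}}}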

== Training an Agent! ==

Now that we understand the scenario, we simply need to train the agent! There are [https://docs.ray.io/en/master/rllib-algorithms.html many different training algorithms] that we could use. To keep things simple, we will start by training the agent with DQN. In a nutshell, DQN tries to learn a Q function: a function that predicts the return (cumulative reward) of taking a given action from the current state. Given the (exact) Q function, the optimal policy is trivial: we simply compute the predicted return for every action and select the one with the largest cumulative reward! More information about learning the Q function can be found [https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html#what-to-learn here]!
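
In other words, once we have a good estimate of the Q function, acting optimally reduces to taking an argmax over the predicted action values (an illustrative sketch, not code from zero_ad_rl):
{{{#!python
import numpy as np

def greedy_action(q_values):
    """Return the index of the action with the largest predicted return."""
    return int(np.argmax(q_values))

# e.g. q_values[0] = predicted return of attacking, q_values[1] = predicted return of retreating
print(greedy_action(np.array([1.3, 2.7])))  # -> 1, i.e. retreat
}}}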

Anyway, let's get started training our agent! First, we need to start 0 A.D. with the RL interface enabled. We will run it headlessly to speed things up a bit:
{{{#!bash
pyrogenesis --rl-interface=127.0.0.1:6000 --autostart-nonvisual --mod=rl-scenarios --mod=public
}}}
Now that we have installed zero_ad_rl, we should be able to run the train script using:
{{{#!bash
python -m zero_ad_rl.train --help
}}}
This command essentially wraps the [https://docs.ray.io/en/master/rllib-training.html RLlib equivalent] but ensures that the 0 A.D. gym environments are registered with RLlib. We can now use this command to train a simple agent:
{{{#!bash
python -m zero_ad_rl.train --env CavalryVsSpearmen --run DQN --checkpoint-freq 25 --experiment-name MyFirstAgent
}}}
After running this command, we should see a number of RLlib logs, including information about the reward the agent received during each training episode. By default, the experiment metrics will be stored in ~/ray_results and can be viewed with tensorboard as follows:
{{{#!bash
tensorboard --logdir=~/ray_results
}}}
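
For reference, the train script is roughly equivalent to driving RLlib's Python API directly. The sketch below assumes the CavalryVsSpearmen environment has already been registered with RLlib (the zero_ad_rl train script takes care of this), and the stopping condition is only illustrative:
{{{#!python
import ray
from ray import tune

ray.init()

# Assumes the CavalryVsSpearmen gym environment has already been registered with RLlib,
# as the zero_ad_rl train script does before launching training.
tune.run(
    "DQN",
    name="MyFirstAgent",
    checkpoint_freq=25,
    stop={"training_iteration": 200},  # illustrative stopping condition
    config={"env": "CavalryVsSpearmen"},
)
}}}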

== Evaluating the Agent ==

Once the agent appears to have achieved reasonable performance according to the tensorboard metrics, we can view what it has learned by watching the agent play (i.e., generating rollouts in the environment). Just as the zero_ad_rl package wraps the train command line tool from RLlib, it also wraps the [https://docs.ray.io/en/master/rllib-training.html#evaluating-trained-policies rollout] command:

{{{#!bash
python -m zero_ad_rl.rollout --help
}}}

Now, we simply need to select a checkpoint that we would like to evaluate and we should be all set! However, before we jump right in, we will want to restart 0 A.D. without the --autostart-nonvisual flag so we can actually watch the agent playing the game. (If the game minimizes upon losing focus, you may need to set SDL_MINIMIZE_ON_FOCUS_LOSS=0.)

{{{#!bash
pyrogenesis --rl-interface=127.0.0.1:6000 --mod=rl-scenarios --mod=public
}}}

Finally, we can select a checkpoint from ~/ray_results/MyFirstAgent/<experiment name> and tell it to play 25 games (replace the checkpoint path below with the path to the checkpoint you would like to view):
{{{#!bash
python -m zero_ad_rl.rollout ~/ray_results/MyFirstAgent/<experiment name>/checkpoint_150/checkpoint-150 --env CavalryVsSpearmen --run DQN --episodes 25
}}}
After starting the script, you should see the newly trained RL agent playing the scenario when you switch back to 0 A.D.!

== Next Steps ==

Although we were able to train an RL agent from scratch to play our small skirmish scenario, there is still plenty of room for improvement! A few ideas include:

* make the RL agent generalize better by using more expressive state and action spaces (see the sketch after this list)
* train the agent with the enemy units spawned in different locations
* train it using a different scenario
* train the agent via imitation from human demonstrations first
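
As a starting point for the first two ideas, a custom environment can be registered with RLlib by name and then referenced when training. The sketch below is hypothetical (the class name, observation shape, and reward are placeholders, and a real implementation would talk to the game through zero_ad), but register_env is RLlib's standard mechanism for making a custom environment available by name:
{{{#!python
import gym
import numpy as np
from gym import spaces
from ray.tune.registry import register_env

class MinimapKiteEnv(gym.Env):
    """Hypothetical environment with a richer, minimap-style observation."""

    def __init__(self, env_config=None):
        # A coarse 32x32 single-channel minimap instead of a single distance value
        self.observation_space = spaces.Box(0.0, 1.0, shape=(32, 32, 1), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0 = attack, 1 = retreat

    def reset(self):
        # A real implementation would reset the scenario via zero_ad and build the minimap
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        # A real implementation would issue the action through zero_ad and compute a reward
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward, done, info = 0.0, True, {}
        return obs, reward, done, info

# Make the environment available to RLlib under a name that can be passed to the train script
register_env("MinimapKite", lambda env_config: MinimapKiteEnv(env_config))
}}}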

Stay tuned for a future post on how to get started on some of these!