[[TOC]]

= First Steps with RL in 0 A.D. =

== Introduction ==

Since r23917, the Pyrogenesis engine features a dedicated interface for reinforcement learning. Machine learning and reinforcement learning have been making impressive strides across a variety of domains, from video games to robotics. In this post, we will show how you can get up and running with reinforcement learning within 0 A.D., an open-source RTS game!

We will assume some background knowledge of the [https://spinningup.openai.com/en/latest/spinningup/rl_intro.html key concepts] in reinforcement learning and some familiarity with OpenAI Gym. Another good resource for learning about [https://gym.openai.com/docs/#spaces state and action spaces] is available on the OpenAI Gym website!

== Installation ==

First we will need to install the following:
 * [https://play0ad.com 0 A.D.] (at least version 0.24 or [BuildInstructions built from source])
 * [https://github.com/0ad/0ad/tree/master/source/tools/rlclient/python zero_ad], the Python client for interfacing with the game engine.
 * [http://github.com/brollb/zero_ad_rl zero_ad_rl], a suite of tools for RL in 0 A.D. using RLlib. This includes a 0 A.D. mod containing our starter scenario.

Installation of 0 A.D. should be straightforward following the links above. Next, we will install the Python package using pip. At the time of this writing, the package has not yet been published to PyPI and needs to be installed from the [https://github.com/0ad/0ad/tree/master/source/tools/rlclient/python 0 A.D. codebase]. We can check that the installation worked by running one of the [https://github.com/0ad/0ad/tree/master/source/tools/rlclient/python/samples example scripts].

Now we should have the essential building blocks for creating RL agents in 0 A.D.! For our first steps, we will use an existing Python package, [https://github.com/brollb/zero_ad_rl zero_ad_rl], which wraps RLlib (a library containing implementations of many state-of-the-art RL algorithms) and registers an OpenAI Gym environment for 0 A.D. As we may want to create our own environments in the future, we will first clone the repository and install an editable copy locally.

Note: there isn't a single OpenAI Gym environment because there are many different scenarios that we may want to explore, each with its own goals. Furthermore, there isn't a single obvious representation for the state and action spaces: simple representations are often easier to learn, whereas representations like minimaps will often generalize to new scenarios but can slow down learning significantly.

{{{#!bash
git clone "https://github.com/brollb/zero_ad_rl"
cd "zero_ad_rl"
pip install -e .
}}}

If the above command fails to find a distribution matching "ray[rllib]", you may need to [https://docs.ray.io/en/master/installation.html install it manually].

Finally, we simply need to [/wiki/Modding_Guide#Howtoinstallmods install the mod] containing the scenario, 0ad_rl_maps.zip, which we will use for training our agent!

== Scenario ==

Before we start training the agent, let's look at the scenario the agent will be playing. The scenario is quite simple and is designed for the agent to learn to kite: the player controls a small group of cavalry javelinists and is trying to defeat a larger group of infantry. A screenshot of the scenario is shown below:

[[Image(CavalryVsSpearmen.png, 75%)]]
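To get a feel for the underlying interface before wrapping everything in a Gym environment, here is a rough sketch of what direct interaction through the zero_ad client looks like. It is modeled on the bundled [https://github.com/0ad/0ad/tree/master/source/tools/rlclient/python/samples example scripts]; the scenario config path and player numbers are placeholders, and the exact call signatures may differ between versions, so treat the samples as the authoritative reference.

{{{#!python
import zero_ad

# Connect to a game started with --rl-interface=127.0.0.1:6000 (see below)
game = zero_ad.ZeroAD('http://127.0.0.1:6000')

# Load a scenario configuration (JSON); 'scenario.json' is a placeholder path
with open('scenario.json') as f:
    config = f.read()

state = game.reset(config)

# Query our units and the enemy's (assuming we are player 1 and the enemy is
# player 2), then order an attack on one of the enemy units
my_units = state.units(owner=1)
enemy_units = state.units(owner=2)
state = game.step([zero_ad.actions.attack(my_units, enemy_units[0])])
}}}

Of course, for training we won't script behavior by hand; the zero_ad_rl package wraps this interface in an OpenAI Gym environment so that RLlib can interact with the game.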
As mentioned earlier, the state and action spaces need to be defined. We will use very simple representations; as a result, the agent should learn fairly quickly, but the policy will not generalize to other scenarios. Specifically, our state space will be a single number specifying the distance between the center of the player's units and the center of the opposing units. The action space will consist of 2 (discrete) actions: attack or retreat. These are perhaps the simplest state and action space representations sufficient for learning to kite, as the policy only needs to learn that if the input distance is below some (learned) threshold it should retreat; otherwise, it should attack!
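To make this concrete, here is roughly what such spaces look like in OpenAI Gym terms. This is only an illustrative sketch of the representation described above; the actual definitions live in the zero_ad_rl package and may differ in detail.

{{{#!python
import numpy as np
from gym import spaces

# Observation: a single non-negative number, the distance between the
# center of our units and the center of the enemy units.
observation_space = spaces.Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32)

# Action: one of two discrete choices (the ordering here is hypothetical).
action_space = spaces.Discrete(2)  # e.g. 0 = attack, 1 = retreat
}}}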
== Training an Agent! ==

Now that we understand the scenario, we simply need to train the agent! There are [https://docs.ray.io/en/master/rllib-algorithms.html many different training algorithms that we could use]. To keep things simple, we will start by training the agent using DQN. In a nutshell, DQN tries to learn a Q function, that is, a function that predicts the return (cumulative reward) of taking a given action from the current state. Given the (exact) Q function, the optimal policy is trivial: compute the predicted return for every action and select the action with the largest one! More information about learning the Q function can be found [https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html#what-to-learn here]!

Let's get started training our agent! First we need to start 0 A.D. with the RL interface enabled. We will run it headlessly to speed things up a bit:

{{{#!bash
pyrogenesis --rl-interface=127.0.0.1:6000 --autostart-nonvisual --mod=rl-scenarios --mod=public
}}}

Now that we have installed zero_ad_rl, we should be able to run the train script using:

{{{#!bash
python -m zero_ad_rl.train --help
}}}

This command essentially wraps the [https://docs.ray.io/en/master/rllib-training.html RLlib equivalent] but ensures that the 0 A.D. gym environments are registered with RLlib. We can now use this command to train a simple agent:

{{{#!bash
python -m zero_ad_rl.train --env CavalryVsSpearmen --run DQN --checkpoint-freq 25 --experiment-name MyFirstAgent
}}}

After running this command, we should see a number of RLlib logs, including information about the reward the agent received during training episodes. By default, the experiment metrics are stored in ~/ray_results and can be viewed with TensorBoard as follows:

{{{#!bash
tensorboard --logdir=~/ray_results
}}}

== Evaluating the Agent ==

Once the agent appears to have reached reasonable performance according to the TensorBoard metrics, we can see what it has learned by watching it play (i.e., generating rollouts in the environment). Just as the zero_ad_rl package wraps the train command-line tool from RLlib, it also wraps the [https://docs.ray.io/en/master/rllib-training.html#evaluating-trained-policies rollout] command:

{{{#!bash
python -m zero_ad_rl.rollout --help
}}}

Now, we simply need to select a checkpoint to evaluate and we should be all set! However, before we jump right in, we will want to restart 0 A.D. without the --autostart-nonvisual flag so we can actually watch the agent playing the game. (If the game minimizes when it loses focus, you may need to set SDL_MINIMIZE_ON_FOCUS_LOSS=0.)

{{{#!bash
pyrogenesis --rl-interface=127.0.0.1:6000 --mod=rl-scenarios --mod=public
}}}

Finally, we can select a checkpoint from ~/ray_results/MyFirstAgent/ and tell it to play 25 games (replace the checkpoint path below with the path to the checkpoint you would like to view):

{{{#!bash
python -m zero_ad_rl.rollout ~/ray_results/MyFirstAgent/checkpoint_150/checkpoint-150 --env CavalryVsSpearmen --run DQN --episodes 25
}}}

After starting the script, you should see the newly trained RL agent playing the scenario when you switch back to 0 A.D.!

== Next Steps ==

Although we were able to train an RL agent from scratch to play our small skirmish scenario, there is still plenty of room for improvement! A few ideas include:
 * make the RL agent generalize better by using more expressive state and action spaces (see the sketch below for one way to plug a custom environment into RLlib)
 * train the agent with the enemy units spawned in different locations
 * train the agent on a different scenario
 * first train the agent via imitation learning from human demonstrations

Stay tuned for a future post on how to get started on some of these!
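In the meantime, if you want to experiment with custom state and action spaces yourself, the usual approach is to define your own Gym environment and register it with RLlib so it can be selected by name. The sketch below assumes a hypothetical MyKitingEnv class and module; zero_ad_rl registers its built-in environments in a broadly similar way, though the details may differ.

{{{#!python
from ray.tune.registry import register_env

# Hypothetical environment: a gym.Env subclass defining your own
# observation/action spaces and reward on top of the zero_ad client.
from my_envs import MyKitingEnv  # placeholder module name

# Register the environment under a name that can then be passed as --env
# (assuming the training script imports the module that performs this call).
register_env('MyKitingEnv', lambda env_config: MyKitingEnv(env_config))
}}}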