Profiling

This page aims to give an overview of the various tools available for profiling the game (i.e. measuring speed and resource usage), and some details on how to use them.

In-game profiler

When the game is running, press F11 once to display the profiler. This is hierarchical: some rows have a digit to their left, and pressing the corresponding key will drill down into that row and show timings for the sub-sections within that. (Press 0 to go back up a level). Rows in white are from C++, rows in red are from scripts. Only code running on the main thread is counted.

The columns are:

  • calls/frame - number of times that section has been entered in a single frame (averaged over the past 30 frames). A frame corresponds to a single iteration of the main game loop, usually clamped to a maximum 60fps by vsync.
  • msec/frame - total amount of time spent inside that section per frame (summed for all calls; averaged over the past 30 frames).
  • mallocs/frame - number of memory allocations inside that section per frame. Only works when the game is compiled in debug mode - in release mode it's always 0. Might not work on Windows at all.
  • calls/turn - number of times called in a single simulation turn (not averaged). A simulation turn occurs typically every 200ms or 500ms or so, and runs all of the gameplay update code, and corresponds to a variable number of frames, so this is more useful than calls/frame for measuring code that only gets called during the simulation turn.
  • msec/turn - same idea.
  • mallocs/turn - same idea again.

To use this profiler in code, do:

#include "ps/Profile.h"

then

{
  PROFILE("section name");
  ... code to measure ...
}

and it will measure all code from the PROFILE until the end of the current scope. (You can also use PROFILE_START("foo"); ... PROFILE_END("foo"); which automatically add scoping braces.)

The profiler is relatively expensive; you usually shouldn't put PROFILE in code that will be called thousands of times per frame, since the profiling overhead will dominate the code you're trying to measure.

Pressing F11 multiple times will toggle through different profiler modes (script data, network data, renderer data).

Pressing Shift+F11 will save profile.txt in the game's logs folder (see GameDataPaths). Pressing it multiple times (without restarting the game) will append new measurements to that file.

Profiler2

The in-game profiler is designed to show average performance data at an instant in time. While this is adequate for some purposes, it's not as good for analyzing rapid fluctuations in game performance. It also doesn't support multiple threads, so only the engine thread can be analyzed. To solve these shortcomings, proflier2 was created.

Profiler2 collects profiling data across multiple threads and runs a small web server in the engine. When enabled, an HTML page and script can request, analyze and render this profiling data. Profiler2 is currently enabled by pressing F11 while the game is open, similar to the in-game profiler. Then the HTML page included with the profiler2 tool (source/tools/profiler2/profiler2.html) is opened in a web browser supporting HTML5.

TODO screenshot and describe it

Using profiler2 in code works similarly to the in-game profiler, except the scoped macro is named PROFILER2:

#include "ps/Profiler2.h"
...
{
  PROFILE2("section name");
  ... code to measure ...
}

and it will measure all code from the PROFILE2 until the end of the current scope. There is also PROFILE2_EVENT to record events in profiler2 and PROFILE2_ATTR to add printf-style strings to the current region or event (seen in tooltips when hovering the profiler2 rendered output). This allows displaying more contextual data than would typically be possible in a profiler. For convenience, using PROFILE3 will measure with both the in-game profiler and the new profiler2.

For more detailed GPU profiling data on some systems, PROFILE2_GPU and PROFILE3_GPU can be used similarly.

Low-overhead timer

A less fancy way to measure sections of code is:

#include "lib/timer.h"
...
{
  TIMER(L"description");
  ... code to measure ...
}

which will record the time from TIMER until the end of the current scope, then immediately print it via debug_printf (which goes to stdout on Unix, and the debugger Output window on Windows).

To measure cumulative time spent in a section of code:

#include "lib/timer.h"
TIMER_ADD_CLIENT(tc_SomeDescription);
...
{
  TIMER_ACCRUE(tc_SomeDescription);
  ... code to measure ...
}

which will sum the time spent, then print it via debug_printf once when the game is shutting down. The output will be like

TIMER TOTALS (9 clients)
-----------------------------------------------------
  tc_SomeDescription: 10.265 Mc (12x)
  ...

saying the cumulative time, measured in CPU cycles (kc = kilocycles (103), Mc = megacycles (106), etc). (It measures cycles since that's cheaper and more reliable than converting to seconds, given that clock frequency can change while the game is running. Divide by your CPU frequency manually to get real time). This should have significantly lower overhead than PROFILE, so you can use it in functions that get called more often.

Simulation replay mode

If you're not measuring graphics or GUI code, replay mode lets you run the simulation (the gameplay code and scripts and the AI etc) at maximum speed with all graphics disabled. This should be fully deterministic (i.e. you'll get the same results on each run) and doesn't need user interaction (so you can easily run in Valgrind). It even allows you to create nice graphs for comparing the performance before and after your changes!

First, play the game normally, in either single-player or multiplayer. It will generate a file sim_log/${PID}/commands.txt in the game's logs path (see GameDataPaths) based on the process ID; looking for the newest directory should find it. This contains the map setup data and a list of all players' inputs, so the game can be replayed. You might want to copy the commands.txt to somewhere memorable.

Run the game like

./pyrogenesis -replay=path/to/commands.txt

(or something equivalent on Windows). It will print its status to stdout (or the Output window on Windows). It gives the MD5 checksum of the complete simulation state once it's finished, which is handy if you're changing the code and want to make sure you're not affecting the simulation behaviour. Run in a profiler to measure whatever you want to measure.

Creating graphs

The replay mode also stores the in-game profiler state in profile.txt every 20 turns. There's a script in source/tools/replayprofile/ which can plot a graph from that file.

  1. Check source/tools/replayprofile/extract.pl if it points to the right profile.txt (you can also change it to point to another working copy, but remember not to accidentally commit this change later)
  2. Optionally apply the attached patch and define a filter for some of the events (again, don't commit your changes!)
  3. Run perl extract.pl > data.js
  4. Optionally make a second measurement for your modified code using the same commands.txt and extracting the data to data_1.js
  5. Optionally copy and paste lines from data_1.js to data.js and give them a sensible name (of course you can paste as many lines from as many other data files as you want)
  6. Open graph.html

It will look similar to this:
example profile graph

Valgrind

Callgrind with KCacheGrind is quite nice for detailed (though not hugely accurate) profiling. Valgrind often doesn't like SpiderMonkey's JITs so you may have to use valgrind --smc-check=all ... if it otherwise crashes.

Very Sleepy

Win32 tool very fast and easy to use and setup download

Graphic performance tools

Nowadays, GPU profiler are specific tools, allowing very good insight on where,how time is spent by GPU Intel gpa, AMD codeXL, Nvidia nsight, microsoft xperf

Last modified 11 months ago Last modified on Dec 29, 2013 10:45:23 AM

Attachments (2)

Download all attachments as: .zip