12 | | |
13 | | * [http://www.microsoft.com/visualstudio Visual Studio] - the basic tool for debugging the game on Windows. Break into the debugger on a breakpoint, on a crash or assertion failure, or any other time. Visual C++ Express is free and contains similar debugging features. Can be used to analyze crash dumps and get a useful call stack. |
14 | | * [http://msdn.microsoft.com/en-us/windows/hardware/gg463009.aspx WinDbg] - part of the Windows SDK, a very powerful debugging suite which is primarily command line driven, unlike Visual Studio. Analyze crashes in more detail than VS. |
15 | | * [http://technet.microsoft.com/en-us/sysinternals/bb896647.aspx DebugView] - If you don't run the process in a debugger, !DebugView lets you view its normally hidden debug output. Users can install and run this much more easily than a full debugging suite. |
16 | | * [http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx VMMap] - Free tool from Microsoft to analyze the virtual memory usage of a process; shows fragmentation, can be useful for observing memory leaks or finding why a large allocation fails. |
17 | | * [http://www.gremedy.com/ gDEBugger] - Debug and profile OpenGL applications. Useful for debugging GL errors and finding unexpected behavior. |
18 | | * [http://notepad-plus-plus.org/ Notepad++] - small, simple, powerful text editor. You need a decent text editor on Windows. |
19 | | * A hex editor, like [http://home.gna.org/bless/ Bless] - useful for examining binary simulation state dumps, either for saved games or serialization errors. |
| 10 | * [http://www.microsoft.com/visualstudio Visual Studio] - the basic tool for debugging the game on Windows. Break into the debugger on a breakpoint, on a crash or assertion failure, or any other time. Visual C++ Express is free and contains similar debugging features. Can be used to analyze crash dumps and get a useful call stack. |
| 11 | * [http://msdn.microsoft.com/en-us/windows/hardware/gg463009.aspx WinDbg] - part of the Windows SDK, a very powerful debugging suite which is primarily command line driven, unlike Visual Studio. Analyze crashes in more detail than VS. |
| 12 | * [http://technet.microsoft.com/en-us/sysinternals/bb896647.aspx DebugView] - If you don't run the process in a debugger, !DebugView lets you view its normally hidden debug output. Users can install and run this much more easily than a full debugging suite. |
| 13 | * [http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx VMMap] - Free tool from Microsoft to analyze the virtual memory usage of a process; shows fragmentation, can be useful for observing memory leaks or finding why a large allocation fails. |
| 14 | * [http://www.gremedy.com/ gDEBugger] - Debug and profile OpenGL applications. Useful for debugging GL errors and finding unexpected behavior. |
| 15 | * [http://notepad-plus-plus.org/ Notepad++] - small, simple, powerful text editor. You need a decent text editor on Windows. |
| 16 | * A hex editor, like [http://home.gna.org/bless/ Bless] - useful for examining binary simulation state dumps, either for saved games or serialization errors. |
44 | | Important info to gather: |
45 | | * Build environment - custom build, SVN autobuild, or release package? Which compiler version? |
46 | | * Hardware (e.g. `system_info.txt`) |
47 | | * Operating system (e.g. `system_info.txt`) |
48 | | * Which version of the game was the user playing? |
49 | | * What was the user doing when the crash occurred? |
50 | | * Were there any errors or visible problems before the crash? (e.g. `interestinglog.html`) |
51 | | * What are the minimal steps to get the crash? Is it consistent? |
| 39 | * Build environment - custom build, SVN autobuild, or release package? Which compiler version? |
| 40 | * Hardware (e.g. `system_info.txt`) |
| 41 | * Operating system (e.g. `system_info.txt`) |
| 42 | * Which version of the game was the user playing? |
| 43 | * What was the user doing when the crash occurred? |
| 44 | * Were there any errors or visible problems before the crash? (e.g. `interestinglog.html`) |
| 45 | * What are the minimal steps to get the crash? Is it consistent? |
61 | | * It can be opened in Visual Studio or !WinDbg. For this to be useful, '''you need to have the debug symbols and source code matching the affected build of the game'''. Most users use the autobuild version of the game, in which case you can simply download the correct autobuilt binaries from SVN. For a release, install that particular version of the game and acquire the matching source package from [http://releases.wildfiregames.com/]. In the future we should automate this process, see #290. |
62 | | * You also need to set up symbol paths and a cache location for Microsoft symbol server, see [http://support.microsoft.com/kb/311503 Use the Microsoft Symbol Server to obtain debug symbol files]. |
63 | | * In Visual Studio, after setting up your debug symbol paths, open the `crashlog.dmp` and choose to debug natively. You should get a crash of some kind, then you can break into the debugger. The call stack window will show you which functions were being called at that point. Note that in release builds, some data will be optimized out and not easily viewable. |
64 | | * In !WinDbg, after setting up your debug symbol and source code paths, open the crash dump and use the `~*kp` command to get a full call stack of each thread. See [http://www.windbg.info/doc/1-common-cmds.html this extremely helpful article] for more useful commands in !WinDbg. For example, `.frame 3` sets the current stack frame to !#3 (frames are numbered from 0 at the top of the call stack, so this is the fourth frame); you can then use the source code window to see exactly the line of code matching this function call, and the locals window to see the variables in that function. Note that !WinDbg can often open dump files that VS fails to open. |
| 55 | |
| 56 | * It can be opened in Visual Studio or !WinDbg. For this to be useful, '''you need to have the debug symbols and source code matching the affected build of the game'''. Most users use the autobuild version of the game, in which case you can simply download the correct autobuilt binaries from SVN. For a release, install that particular version of the game and acquire the matching source package from [http://releases.wildfiregames.com/]. In the future we should automate this process, see #290. |
| 57 | * You also need to set up symbol paths and a cache location for Microsoft symbol server, see [http://support.microsoft.com/kb/311503 Use the Microsoft Symbol Server to obtain debug symbol files]. |
| 58 | * In Visual Studio, after setting up your debug symbol paths, open the `crashlog.dmp` and choose to debug natively. You should get a crash of some kind, then you can break into the debugger. The call stack window will show you which functions were being called at that point. Note that in release builds, some data will be optimized out and not easily viewable. |
| 59 | * In !WinDbg, after setting up your debug symbol and source code paths, open the crash dump and use the `~*kp` command to get a full call stack of each thread. See [http://www.windbg.info/doc/1-common-cmds.html this extremely helpful article] for more useful commands in !WinDbg. For example, `.frame 3` sets the current stack frame to !#3 (frames are numbered from 0 at the top of the call stack, so this is the fourth frame); you can then use the source code window to see exactly the line of code matching this function call, and the locals window to see the variables in that function. Note that !WinDbg can often open dump files that VS fails to open. |
67 | | * If `crashlog.txt` was created and it contains a call stack, you may have enough information there to find the source of the crash. |
68 | | * However, if the crashlog or stack dump failed for some reason, you either need to reproduce the crash locally, or get the user to run the game in a debugger (e.g. !WinDbg). |
69 | | * A third option on newer versions of Windows is to have the user create a memory dump from Windows task manager. The user can find `pyrogenesis.exe` in the task manager, right-click it and choose '''Create Dump File'''. Beware the resulting `MEMORY.DMP` will be very large as it contains all memory pages being accessed by the process at the time, but it may be compressed with e.g. [http://www.7-zip.org/ 7-Zip] down to a more reasonable size. |
| 62 | |
| 63 | * If `crashlog.txt` was created and it contains a call stack, you may have enough information there to find the source of the crash. |
| 64 | * However, if the crashlog or stack dump failed for some reason, you either need to reproduce the crash locally, or get the user to run the game in a debugger (e.g. !WinDbg). |
| 65 | * A third option on newer versions of Windows is to have the user create a memory dump from Windows task manager. The user can find `pyrogenesis.exe` in the task manager, right-click it and choose '''Create Dump File'''. Beware the resulting `MEMORY.DMP` will be very large as it contains all memory pages being accessed by the process at the time, but it may be compressed with e.g. [http://www.7-zip.org/ 7-Zip] down to a more reasonable size. |
81 | | Debug symbols relate a binary build of the game (and its libraries) to the source code used to compile the binary. Obviously, if the source code changes, the debug symbol info will also change. Debug symbols are invaluable in debugging the game, as they provide a "friendly" view of what a running process is doing or was doing. Function calls, parameters and variable contents can be inspected with the aid of debug symbols. Acquiring and setting up the matching symbols is critical to debugging crashes. This is a brief overview of where symbols come from and how to get them. |
82 | | * Debug symbols can contain a lot of data (10+ MB each is not uncommon), and most users aren't interested in debugging software, so often the symbols are omitted from release packages. This is very common with Linux packages. |
83 | | * For Windows builds, WFG manages the distribution of the game via SVN and alpha releases. Some but not all debug symbols are distributed as PDB files, which Visual Studio and !WinDbg can read. These are generated and committed by the autobuild process. |
84 | | * Symbols for Windows libraries are distributed by Microsoft and can be acquired from their [http://support.microsoft.com/kb/311503 public symbol server]. |
85 | | * Symbols for proprietary drivers (e.g. graphics drivers) are typically ''not'' publicly distributed. |
86 | | * If symbols are missing: this is generally the case with the game's libraries (SpiderMonkey, FCollada, NVTT, etc.) if they were built by WFG. In this case, you may have no choice but to rebuild the library in question and the game, then try to reproduce the crash once debug symbols are obtained. (In the future, we should distribute the PDBs for all open source libraries, to aid debugging.) |
87 | | * For Linux builds, the package maintainers handle the distribution of the game. It is up to them to choose how or whether they will distribute the debug symbols. |
88 | | * If symbols are missing: first check if there is a "debug" package of the module available. If not, the same advice applies as for Windows: try to build the library with debug symbols and reproduce the crash. |
| 77 | * Debug symbols can contain a lot of data (10+ MB each is not uncommon), and most users aren't interested in debugging software, so often the symbols are omitted from release packages. This is very common with Linux packages. |
| 78 | * For Windows builds, WFG manages the distribution of the game via SVN and alpha releases. Some but not all debug symbols are distributed as PDB files, which Visual Studio and !WinDbg can read. These are generated and committed by the autobuild process. |
| 79 | * Symbols for Windows libraries are distributed by Microsoft and can be acquired from their [http://support.microsoft.com/kb/311503 public symbol server]. |
| 80 | * Symbols for proprietary drivers (e.g. graphics drivers) are typically ''not'' publicly distributed. |
| 81 | * If symbols are missing: this is generally the case with the game's libraries (SpiderMonkey, FCollada, NVTT, etc.) if they were built by WFG. In this case, you may have no choice but to rebuild the library in question and the game, then try to reproduce the crash once debug symbols are obtained. (In the future, we should distribute the PDBs for all open source libraries, to aid debugging.) |
| 82 | * For Linux builds, the package maintainers handle the distribution of the game. It is up to them to choose how or whether they will distribute the debug symbols. |
| 83 | * If symbols are missing: first check if there is a "debug" package of the module available. If not, the same advice applies as for Windows: try to build the library with debug symbols and reproduce the crash. |
94 | | === Out of sync === |
95 | | |
96 | | Out of sync (OOS) and serialization errors are generally difficult to debug, but knowing where to look can make this process simpler. An OOS error occurs in a multiplayer game when one player's serialized simulation state isn't identical to another player's serialized simulation state (breaking the concept of network synchronization). The following data are useful to collect in this case: |
97 | | * `oos_dump.txt` - a human readable snapshot of the simulation state at the point of OOS, created on each player's computer. Found in the logs folder, see GameDataPaths. Each player should zip these files and send them to the person troubleshooting the bug. |
98 | | * Each player's game version - these have to match. While the game is in its alpha phase, the simulation changes constantly and there is no backward compatibility. For releases, this means using the same alpha release; for SVN users, it means using the same SVN revision (with few exceptions). |
99 | | * OS and hardware info for each player (`system_info.txt`) - Some serialization bugs are platform specific, so knowing the systems involved is key to reproducing the error. |
100 | | * `commands.txt` - the commands issued by each player during the game, which can be used to replay the game exactly as it happened. |
| 91 | * `oos_dump.txt` - a human readable snapshot of the simulation state at the point of OOS, created on each player's computer. Found in the logs folder, see GameDataPaths. Each player should zip these files and send them to the person troubleshooting the bug. |
| 92 | * Each player's game version - these have to match. While the game is in its alpha phase, the simulation changes constantly and there is no backward compatibility. For releases, this means using the same alpha release; for SVN users, it means using the same SVN revision (with few exceptions). |
| 93 | * OS and hardware info for each player (`system_info.txt`) - Some serialization bugs are platform specific, so knowing the systems involved is key to reproducing the error. |
| 94 | * `commands.txt` - the commands issued by each player during the game, which can be used to replay the game exactly as it happened. |
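
Once each player's `oos_dump.txt` has been collected, the first step is usually a plain text diff. The following is a minimal sketch in Python using the standard `difflib` module. The helper name `diff_dumps`, the labels and the tiny stand-in dump contents are hypothetical and only illustrate the idea; real dumps are much larger and their exact layout is not reproduced here.

```python
import difflib


def diff_dumps(text_a, text_b, name_a="playerA/oos_dump.txt", name_b="playerB/oos_dump.txt"):
    """Return unified-diff lines between two dump texts (empty list if identical)."""
    return list(difflib.unified_diff(
        text_a.splitlines(),
        text_b.splitlines(),
        fromfile=name_a,
        tofile=name_b,
        lineterm="",  # lines were split without newlines, so add none to the headers
        n=2,          # lines of context around each change
    ))


# Hypothetical stand-in contents; in practice, read each player's file from disk.
player_a = "entity 42\n  pos: 10.5\n  hp: 100\n"
player_b = "entity 42\n  pos: 10.5\n  hp: 99\n"

for line in diff_dumps(player_a, player_b):
    print(line)
```

An empty result means the two text dumps are identical, which is the "no diff" case discussed further down the page.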
106 | | However, it may be that you won't see any diff, or maybe it will be huge and affect many entities and components. If there's no diff, that means the simulation state differed, but the difference doesn't affect the debug serializer. There are a few reasons why this could happen, but most likely the [wiki:JSON JSON] representation of the dump doesn't allow the actual value or (for a JavaScript value) the difference is in SpiderMonkey's internal representation of the data. This has been reported before with e.g. `NaN`, having a single bit difference depending on the JIT behavior (see #1879). |
| 100 | However, it may be that you won't see any diff, or maybe it will be huge and affect many entities and components. If there's no diff, that means the simulation state differed, but the difference doesn't affect the debug serializer. There are a few reasons why this could happen, but most likely the [wiki:JSON] representation of the dump doesn't allow the actual value or (for a JavaScript value) the difference is in SpiderMonkey's internal representation of the data. This has been reported before with e.g. `NaN`, having a single bit difference depending on the JIT behavior (see #1879). |
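
To illustrate how two states can differ without producing a text diff, the sketch below builds two `NaN` values whose 64-bit patterns differ by a single bit. Both print identically, yet their binary forms are not equal. This is a generic IEEE 754 illustration in Python, assuming a platform with IEEE doubles; it is not a description of how SpiderMonkey or the game's serializer stores values, and the helper names are hypothetical.

```python
import struct


def double_from_bits(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def bits_from_double(value):
    """Reinterpret an IEEE 754 double as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


# Two quiet NaNs that differ only in the lowest payload bit.
nan_a = double_from_bits(0x7FF8000000000000)
nan_b = double_from_bits(0x7FF8000000000001)

# The textual form hides the difference...
print(repr(nan_a), repr(nan_b))
# ...while the raw bit patterns expose it.
print(hex(bits_from_double(nan_a)), hex(bits_from_double(nan_b)))
```

A hash over the binary state would see these as different values, while a text dump would show the same string for both.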
131 | | * SpiderMonkey has some JIT compiler issues (see #2000 for example). If you want to rule out the JIT compiler, you can disable it by uncommenting the lines in ScriptInterface.cpp that set the options JSOPTION_JIT and JSOPTION_METHODJIT. If it is the JIT compiler, you should still try to find the exact location of the problem because disabling JIT compiling completely is bad for performance. |
132 | | * Uninitialized variables can cause different behaviour because their value depends on random memory content. Valgrind helps detect these errors (more detailed explanation needed). |
133 | | * Data affecting the simulation is kept past the runtime of one game (see #2285 for example). In this case OOS errors typically occur if one or more players have previously played another game without shutting down the engine afterwards. It's difficult to troubleshoot with replays because they always start a fresh instance of the engine. |
134 | | * Due to SpiderMonkey internals, floating point operations can differ slightly on different machines and different architectures (see [https://bugzilla.mozilla.org/show_bug.cgi?id=531915 Bugzilla #531915]). |
| 125 | * SpiderMonkey has some JIT compiler issues (see #2000 for example). If you want to rule out the JIT compiler, you can disable it by uncommenting the lines in ScriptInterface.cpp that set the options JSOPTION_JIT and JSOPTION_METHODJIT. If it is the JIT compiler, you should still try to find the exact location of the problem because disabling JIT compiling completely is bad for performance. |
| 126 | * Uninitialized variables can cause different behaviour because their value depends on random memory content. Valgrind helps detect these errors (more detailed explanation needed). |
| 127 | * Data affecting the simulation is kept past the runtime of one game (see #2285 for example). In this case OOS errors typically occur if one or more players have previously played another game without shutting down the engine afterwards. It's difficult to troubleshoot with replays because they always start a fresh instance of the engine. |
| 128 | * Due to SpiderMonkey internals, floating point operations can differ slightly on different machines and different architectures (see [https://bugzilla.mozilla.org/show_bug.cgi?id=531915 Bugzilla #531915]). |
143 | | * `debug.after.a` / `debug.after.b` - A good place to begin. These are the debug output of the serializer '''after''' the current turn update and can be compared with a diff tool (like `oos_dump.txt` for OOS errors). |
144 | | * `state.after.a` / `state.after.b` - Binary dump of the simulation state '''after''' the current turn update occurs. If an error occurs here, it is probably because some data that affects the simulation state isn't being serialized. |
145 | | * `state.before.a` / `state.before.b` - Binary dump of the simulation state '''before''' the current turn update occurs. If an error occurs here, it's probably a bug in the (de)serializer or the way it is (de)serializing the data. |
146 | | * `hash.after.a` / `hash.after.b` / `hash.before.a` / `hash.before.b` - hash values used to compare the above states. |
| 135 | * `debug.after.a` / `debug.after.b` - A good place to begin. These are the debug output of the serializer '''after''' the current turn update and can be compared with a diff tool (like `oos_dump.txt` for OOS errors). |
| 136 | * `state.after.a` / `state.after.b` - Binary dump of the simulation state '''after''' the current turn update occurs. If an error occurs here, it is probably because some data that affects the simulation state isn't being serialized. |
| 137 | * `state.before.a` / `state.before.b` - Binary dump of the simulation state '''before''' the current turn update occurs. If an error occurs here, it's probably a bug in the (de)serializer or the way it is (de)serializing the data. |
| 138 | * `hash.after.a` / `hash.after.b` / `hash.before.a` / `hash.before.b` - hash values used to compare the above states. |
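
When the text dumps match but the binary states do not, it can help to locate the first differing byte before opening a hex editor. The following Python sketch shows one way to do that. The function names are hypothetical, and the sample byte strings stand in for the contents of a pair such as `state.before.a` and `state.before.b`, which you would read with `open(path, "rb").read()`.

```python
def first_difference(data_a, data_b):
    """Return the offset of the first differing byte, or None if the inputs are identical."""
    for offset, (byte_a, byte_b) in enumerate(zip(data_a, data_b)):
        if byte_a != byte_b:
            return offset
    if len(data_a) != len(data_b):
        # One input is a prefix of the other; they diverge where the shorter one ends.
        return min(len(data_a), len(data_b))
    return None


def hex_context(data, offset, radius=8):
    """Format the bytes around an offset as space-separated hex pairs."""
    start = max(0, offset - radius)
    return " ".join("%02x" % b for b in data[start:offset + radius])


# Hypothetical stand-ins for two binary state dumps.
state_a = bytes([0x10, 0x20, 0x30, 0x40, 0x50])
state_b = bytes([0x10, 0x20, 0x31, 0x40, 0x50])

offset = first_difference(state_a, state_b)
if offset is None:
    print("states are identical")
else:
    print("first difference at offset", offset)
    print("a:", hex_context(state_a, offset))
    print("b:", hex_context(state_b, offset))
```

The reported offset is a starting point for the hex editor; with debug annotations enabled, the surrounding bytes are more likely to contain readable text that identifies the component.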
163 | | |
164 | | * '''Debug annotations''' can prove very helpful when viewing the binary simulation state in a hex editor. The dump will be much larger but it will also contain more textual data. Set `DEBUG_SERIALIZER_ANNOTATE` to 1 in [source:/ps/source/simulation2/serialization/StdSerializer.h StdSerializer.h]. |
165 | | * '''Check hashes more frequently'''. By default, the game balances e.g. OOS checks with the performance impact of serializing and hashing the state. You might find it helpful to change e.g. `CReplayPlayer::Replay` in [source:/ps/source/ps/Replay.cpp Replay.cpp] to check hashes more frequently in replay mode. You can also generate hashes more frequently by changing `CNetTurnManager::TurnNeedsFullHash` (multiplayer games) or `CNetLocalTurnManager::NotifyFinishedUpdate` (single player games) in [source:/ps/source/network/NetTurnManager.cpp NetTurnManager.cpp]. |
| 153 | * '''Debug annotations''' can prove very helpful when viewing the binary simulation state in a hex editor. The dump will be much larger but it will also contain more textual data. Set `DEBUG_SERIALIZER_ANNOTATE` to 1 in [source:/ps/source/simulation2/serialization/StdSerializer.h StdSerializer.h]. |
| 154 | * '''Check hashes more frequently'''. By default, the game balances e.g. OOS checks with the performance impact of serializing and hashing the state. You might find it helpful to change e.g. `CReplayPlayer::Replay` in [source:/ps/source/ps/Replay.cpp Replay.cpp] to check hashes more frequently in replay mode. You can also generate hashes more frequently by changing `CNetTurnManager::TurnNeedsFullHash` (multiplayer games) or `CNetLocalTurnManager::NotifyFinishedUpdate` (single player games) in [source:/ps/source/network/NetTurnManager.cpp NetTurnManager.cpp]. |