Opened 13 months ago

Closed 8 months ago

#6320 closed defect (fixed)

Start game - freeze before main menu - black or distorted graphics

Reported by: dave_k
Owned by: wraitii
Priority: Should Have
Milestone: Alpha 26
Component: Core engine
Keywords: freeze, frozen, halt, hang, hung, stop, stuck, artifact, black, blank, corrupt, distort, garbage, garble
Cc: dave_k
Patch: Phab:D4275

Description (last modified by dave_k)

Summary
After upgrading to a new version, multiple users have experienced a problem where the game seems to be frozen and has black or distorted video output. User-installed mods seem to trigger this problem.

Removing all user-installed mod files seems to resolve the problem, but should only be considered a workaround. I think that this bug will still be experienced by users in the future due to design flaws in the game startup sequence.

Visual proof
Example 1 (dave_k's computer)
Symptoms on dave_k's computer - scaled down

Example 2 (dave_k's computer)
Symptoms on dave_k's computer - scaled down

Example 3 (Tirili's computer)
Example symptoms - Tirili's computer

Example 4 (allalongthetower's computer)
allalongthetower screenshot of example symptoms - scaled down

Expected vs. actual results
Expected results: The game starts up, displays the main menu and thematic artwork, and plays music. The main menu responds to mouse clicks and the game functions correctly.

Actual results: The game starts, and the game window is all black or else very garbled with strange colors and a collage of various parts of the windows on the desktop. Mouse input has no effect other than moving the 0ad-themed mouse cursor around. No audio plays, not even music, despite resetting the user configuration.

Tirili is one such user who has experienced this problem. Based on a debugging session with Tirili, the main game loop is actually running, and the game responds correctly to keyboard input such as F2 for taking a screenshot. But the game is not usable.

Steps to reproduce

  1. If you are using Debian or a Debian-derived distribution of Linux such as Ubuntu, ensure that 0ad-data was not built with dh_strip_nondeterminism enabled for .zip files. This ensures reliable reproduction of the symptoms.
  2. Backup and remove all files and directories in /home/<yourusername>/.local/share/0ad/mods, or the equivalent path for your operating system. See GameDataPaths for equivalent paths on different operating systems.
  3. Extract two files from the attached file, "Tirili 0ad dot local share 0ad files aka 0adRenamed3.tar.gz", to /home/<yourusername>/.local/share/0ad/, or the equivalent path for your operating system where user mods are installed. The two files that one should extract, mods/user/gui/page_pregame.xml.cached.xmb and mods/user/gui/page_splashscreen.xml.cached.xmb, will produce a minimal test case that should reproduce the symptoms. See "Note" at the end of this section.
  4. Clear the 0ad cache (remove all files and folders in ~/.cache/0ad or the equivalent path for your operating system).
  5. Start the game.
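
The backup and cache-clearing steps above can be scripted. The following is a minimal Python sketch, assuming the Linux paths named in this ticket; the function name `prepare_clean_state` and the backup location are illustrative and not part of 0ad. Extracting the two .xmb files from the tarball is left as a manual step.

```python
import shutil
from pathlib import Path


def prepare_clean_state(mods_dir: Path, cache_dir: Path, backup_dir: Path) -> None:
    """Back up and empty the mods directory, then empty the cache directory."""
    # Refuse to overwrite an earlier backup.
    if backup_dir.exists():
        raise FileExistsError(f"backup target already exists: {backup_dir}")

    # Move the existing mods aside so they can be restored later,
    # then recreate an empty mods directory.
    if mods_dir.exists():
        backup_dir.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(mods_dir), str(backup_dir))
    mods_dir.mkdir(parents=True, exist_ok=True)

    # Remove everything below the cache directory, keeping the directory itself.
    if cache_dir.exists():
        for entry in cache_dir.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
```

On Linux this would be called with `Path.home() / ".local/share/0ad/mods"` and `Path.home() / ".cache/0ad"`, plus a backup directory of your choice.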

Note: Other files in the attached tarball are unnecessary. In fact, there are security issues with unverified mods, and I don't know whether or not Tirili avoided installing unverified mods. So, I would suggest testing with only the minimal test case of the two simple xmb files.

I have attached the entire user mods folder that Tirili sent as proof that this problem is practically occurring in a realistic user environment. Specifically, user mods were installed in the past, 0ad was upgraded and previously-invisible bugs were triggered.

Environment
Three users have reported this problem so far, Tirili, Farid, and allalongthetower.

  • Tirili uses GNU/Linux. Slackware distribution. AMD APU (CPU combined with GPU) that is approximately 6 months old.
  • Farid uses GNU/Linux. Unknown distribution and hardware.
  • allalongthetower uses Mac OS X. Intel 3rd generation CPU. Intel HD Graphics 4000 GPU.
  • And, I am able to reproduce the symptoms with Debian GNU/Linux. Intel CPU and NVIDIA GPU.

Console logs
See attached file, "2021-08-21 bug report from Tirili - backtrace.txt". Note that the last console log message is "TIMER| ps_lang_hotkeys: 1.00483 ms". Also note that the current source line in the backtrace is just where it happened to be at the time that I asked Tirili to press Ctrl-C in order to obtain a backtrace. The game loop is able to continue normally and calls to Render() and LimitFPS() complete. Yet the game window video output remains black (or garbled, depending on the test environment).

Related tickets
I plan to post additional tickets for the following related issues.

  1. Add diagnostics: list the files accessed and the GUI error messages on the console and in the log file, instead of only in the graphical window. The latter can't be read if the GUI is not working correctly.
  2. Add ability to enable verbose startup, error and warning messages or logging.
  3. Add detection of broken mods and offer to reset mod configuration. Or, support a command line switch to reset the mod configuration. Or, support a command line switch to enable "safe mode" where mods are disabled, and advanced graphics features are also disabled.

Attachments (14)

screenshot0001.png (806.8 KB ) - added by dave_k 13 months ago.
Example symptoms on dave_k's computer
screenshot0002.png (1.5 MB ) - added by dave_k 13 months ago.
Example symptoms on dave_k's computer
Tirili 0ad dot local share 0ad files aka 0adRenamed3.tar.gz (10.2 KB ) - added by dave_k 13 months ago.
screenshot0001_scaled_down.png (338.5 KB ) - added by dave_k 13 months ago.
Symptoms on dave_k's computer - scaled down
screenshot0002_scaled_down.png (374.9 KB ) - added by dave_k 13 months ago.
Symptoms on dave_k's computer - scaled down
tirili screenshot.png (5.2 KB ) - added by dave_k 13 months ago.
Example symptoms - Tirili's computer
tirili screenshot_scaled_down.png (4.0 KB ) - added by dave_k 13 months ago.
Example symptoms - Tirili's computer
picus_xmb.jpg (31.2 KB ) - added by Langbart 13 months ago.
2021-08-21 bug report from Tirili - backtrace.txt (4.9 KB ) - added by dave_k 13 months ago.
Console logs and backtrace during problem reproduction by Tirili
allalongthetower_screenshot_of_error_messages.png (724.4 KB ) - added by dave_k 13 months ago.
allalongthetower screenshot of example symptoms - full resolution
allalongthetower_screenshot_of_error_messages_scaled_down.png (527.9 KB ) - added by dave_k 13 months ago.
allalongthetower screenshot of example symptoms - scaled down
allalongthetower_0ad_config_log_mod_etc_backup.tar.xz (779.7 KB ) - added by dave_k 13 months ago.
allalongthetower's user-writeable data related to 0ad including logs, mod files, mod/user, etc
allalongthetower_cache_files_with_sizes_and_timestamps.txt (212.4 KB ) - added by dave_k 13 months ago.
allalongthetower's 0ad cache files - a summary list with file size, date and name
allalongthetower_cache_files.sha512 (361.3 KB ) - added by dave_k 13 months ago.
allalongthetower's 0ad cache files - sha512sum SHA-512 hashes

Change History (33)

by dave_k, 13 months ago

Attachment: screenshot0001.png added

Example symptoms on dave_k's computer

by dave_k, 13 months ago

Attachment: screenshot0002.png added

Example symptoms on dave_k's computer

comment:1 by dave_k, 13 months ago

Description: modified (diff)

by dave_k, 13 months ago

Symptoms on dave_k's computer - scaled down

by dave_k, 13 months ago

Symptoms on dave_k's computer - scaled down

comment:2 by dave_k, 13 months ago

Description: modified (diff)

comment:3 by dave_k, 13 months ago

Description: modified (diff)

comment:4 by dave_k, 13 months ago

Description: modified (diff)

comment:5 by dave_k, 13 months ago

Description: modified (diff)

by dave_k, 13 months ago

Attachment: tirili screenshot.png added

Example symptoms - Tirili's computer

comment:6 by dave_k, 13 months ago

Description: modified (diff)

by dave_k, 13 months ago

Example symptoms - Tirili's computer

comment:7 by dave_k, 13 months ago

Description: modified (diff)

comment:8 by Langbart, 13 months ago

Related IRC 0ad-dev conversation 31/May/21:

10:03 < Langbart> wraitii when I distribute my mod I run the pyrogenesis command. After rP25375 it creates some xmb files in this folder: /Users/picus/Library/Application Support/0ad/mods/user. Changing the appearance of the game even if no mod is enabled.
10:03 < Langbart> Is the problem with my mod, do I need to make changes in the mod that this does not occur anymore?
10:04 < wraitii> what is that xmb file?
10:04 < Langbart> https://pasteall.org/media/6/7/6769447654b2bf964acce130fb462669.png
10:06 < wraitii> mh, well that's peculiar
10:06 < wraitii> Can you clear your cache, your user/ mod folder, and retry?
10:06 < Langbart> This xmb file is created when I enable the pyromod mod.
10:09 < Langbart> When I create the pyromod mod with rP25374 and enable the mod --> no file is created, but with 25375 there is. I deleted the cache and the contents of user/ mod folder as well.
10:10 < wraitii> and only that file?
10:10 < wraitii> can you send me your mod?
10:10 < Langbart> hmm

  • Summarize: When I created a mod with the SVN version after [25375] and enabled this mod with A24, some files were created in my 0ad/mod/user folder. Since I created my mods after [25375] and enabled them using A25, there were no problems. All good :)

by Langbart, 13 months ago

Attachment: picus_xmb.jpg added

by dave_k, 13 months ago

Console logs and backtrace during problem reproduction by Tirili

comment:9 by dave_k, 13 months ago

Description: modified (diff)
Keywords: corrupt added

comment:10 by dave_k, 13 months ago

Description: modified (diff)

comment:11 by dave_k, 13 months ago

Description: modified (diff)

comment:12 by wraitii, 13 months ago

The issue here is that those files should really not have been written to the user mod in the first place. They are 'archived' variants of XML files, which the game _will_ attempt to load but fail to (because they have the wrong XMB version), so the game fails to load entirely, hence the black screen.

My current understanding is that writing those files can only happen when enabling an A25-written mod with A24 (per the above), which seems like a rather rare behaviour.

If you are saying that A25 itself is writing those files, then we have a problem. Otherwise, I think we're going to have to backlog this until we have better ways to deal with version upgrades.
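
To check whether an installation is affected, one could list the XMB files that ended up in the user mod. This is a minimal Python sketch; `find_stray_xmb` is a hypothetical helper name, not an engine function, and the directory to pass in is assumed to be the user mod folder (for example ~/.local/share/0ad/mods/user on Linux). Whether every reported file is unwanted is a judgment left to the reader.

```python
from pathlib import Path
from typing import List


def find_stray_xmb(user_mod_dir: Path) -> List[Path]:
    """Return all *.xmb files below the user mod directory, sorted by path."""
    if not user_mod_dir.is_dir():
        return []
    return sorted(p for p in user_mod_dir.rglob("*.xmb") if p.is_file())
```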

comment:13 by dave_k, 13 months ago

Angen said this in the #0ad IRC channel yesterday:

< Angen> Summarize: When I created a mod with the SVN version after [25375] and enabled this mod with A24, some files were created in my 0ad/mod/user folder. Since I created my mods after [25375] and enabled them using A25, there were no problems.

I think that it is a useful clue, but there are too many unknowns to call this a solved problem.

0ad still needs to handle XMB files in user/ gracefully. We continue to get reports about these symptoms.

Last edited 13 months ago by dave_k (previous) (diff)

comment:14 by dave_k, 13 months ago

wraitii and I agreed on IRC today (#0ad-dev, 2021-09-16 18:22 UTC) that there are at least two bugs in 0ad related to this ticket.

  1. pyrogenesis needs to prevent XMBs from being written in the wrong place (the user/ directory)
  2. pyrogenesis needs to gracefully process, or skip, XMBs that have been written in the wrong place (the user/ directory), even when bug #1 is fixed

I would add another bug: a chain of events has made the GUI easy to break and difficult for users to fix. In order to support a change in the binary format of XMB files, all cases should be handled in the source code.

Another bug: mods are marking themselves as compatible with future and past versions of 0ad. wraitii agreed that this is not a good thing especially due to the binary format change of XMB files. He seemed to agree that 0ad should disallow mods from marking any dependency on 0ad other than 0ad=[version].
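
A check of this kind could look roughly like the Python sketch below. It assumes dependency entries are strings of the form name, operator, version (such as "0ad=0.0.25" or "0ad>=0.0.24"); the exact syntax accepted by the engine is not confirmed here, and `rejected_engine_dependencies` is a hypothetical name.

```python
import re
from typing import Iterable, List

# Assumed syntax: "<name><op><version>", e.g. "0ad=0.0.25" or "0ad>=0.0.24".
_DEP_RE = re.compile(r"^\s*([A-Za-z0-9_.-]+?)\s*(<=|>=|=|<|>)\s*(\S+)\s*$")


def rejected_engine_dependencies(dependencies: Iterable[str]) -> List[str]:
    """Return the entries that depend on 0ad with an operator other than '='."""
    rejected = []
    for dep in dependencies:
        match = _DEP_RE.match(dep)
        if match is None:
            continue  # bare names or unknown syntax are not judged here
        name, op, _version = match.groups()
        if name == "0ad" and op != "=":
            rejected.append(dep)
    return rejected
```

Dependencies on other community mods pass through untouched, so only the engine dependency is restricted.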

Here are ideas for solutions to the bugs. I think that it's likely that there is enough time for implementing at least one of these ideas. I am listing these ideas so that the 0ad team has a wide variety of options to choose among, in addition to ideas that others propose. Some of the ideas in this list are intended as alternate ideas instead of additive ideas.

  1. Rework user/ system so that Atlas doesn't write there, and instead writes to a folder dedicated to storing maps that are works in progress. The Atlas wip folder would not be auto-loaded by the game on every startup.
  2. Sandbox mods so that they can only read from mods/modname, and read/write to config/modname/*.json
  3. Output a warning in logs and the console that a file is being written to the user/ directory. Writes of .XMB format would be unintended under all circumstances. After implementing the first two ideas, all writes to user/ would be unintended.
  4. Prevent writes to user/ entirely, enforcing a security and reliability policy.
  5. Automatically delete all content in user/ on every startup of 0ad so that old files there do not break new versions of the game engine
  6. Ensure that .XMB files that have an incompatible binary format do not cause unexpected failure of the XML parsing system.
  7. Ensure that .XMB files are marked with version information so that incompatible binary format .XMB files are skipped entirely by 0ad's XML processing system.
  8. Ignore all reads and writes of files (or ignore only .XMB files) in user/, unless the user passes a command line switch to enable the use of .XMB files there, for purposes of rapid mod prototyping as a mod developer. mod.json is not required (or enforced?) for user/ so only very knowledgeable users should be using user/, anyway.
  9. Perform a code review with the insights that have been made about the causes of this problem. Ensure that the code handles all possible cases properly. Test corner cases in order to verify correct functionality.
  10. Restrict mods to marking dependency with 0ad= instead of 0ad < or >. Dependency on other community mods would still be allowed to be marked with =, >, or <.
  11. Add a feature to mod.json where mods can mark conflicts with certain versions of 0ad and/or certain (or all) versions of other community mods.
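
Ideas 6 and 7 amount to validating a header before trusting the rest of the file. The Python sketch below illustrates the idea with an invented layout (a 4-byte magic followed by a little-endian 32-bit version); this is not claimed to be the real XMB layout, and all names are hypothetical.

```python
import struct
from typing import Optional

MAGIC = b"XMB0"                 # hypothetical magic bytes for this sketch
HEADER = struct.Struct("<4sI")  # magic, then unsigned 32-bit version (assumed layout)


def read_version(data: bytes) -> Optional[int]:
    """Return the version stored in the header, or None if the header is invalid."""
    if len(data) < HEADER.size:
        return None
    magic, version = HEADER.unpack_from(data)
    if magic != MAGIC:
        return None
    return version


def should_load(data: bytes, expected_version: int) -> bool:
    """Load only files whose header is valid and matches the expected version."""
    return read_version(data) == expected_version
```

A file that is truncated, has the wrong magic, or carries a different version is skipped instead of being handed to the parser.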

by dave_k, 13 months ago

allalongthetower screenshot of example symptoms - full resolution

by dave_k, 13 months ago

allalongthetower screenshot of example symptoms - scaled down

by dave_k, 13 months ago

allalongthetower's user-writeable data related to 0ad including logs, mod files, mod/user, etc

by dave_k, 13 months ago

allalongthetower's 0ad cache files - a summary list with file size, date and name

by dave_k, 13 months ago

allalongthetower's 0ad cache files - sha512sum SHA-512 hashes

comment:15 by dave_k, 13 months ago

Description: modified (diff)

comment:16 by dave_k, 13 months ago

A pattern of user environment: Tirili and allalongthetower had stopped playing 0ad for extended periods of time and then attempted to upgrade. It was then that they observed the symptoms described in this ticket.

Another pattern of user environment: Tirili and allalongthetower had a delayed upgrade of 0ad. They upgraded several weeks after the release of alpha 25. Alpha 24 was working fine on their systems while they were trying to get alpha 25 working. There is a possibility that a mod was updated to a version designed for alpha 25 and enabled while alpha 24 was running.

Another pattern of user environment: Tirili and allalongthetower switched rapidly among alpha 23, alpha 24 and alpha 25 without clearing the user/ folder, config files, cache, etc. This was probably a user reaction to the fact that the 0ad game window was all black or garbled, not the cause of the symptoms. Anyway, wraitii said that it's bad for alpha 25 mods to be run by alpha 24 because the binary XMB format is not compatible.

Note that allalongthetower ran a development version and tested a pre-release version of a mod at one time. Here is an excerpt from the IRC chat log on #0ad on 2021-09-16.

20:30 < dave_k> allalongthetowr, have you ever created a mod before?
20:30 < allalongthetowr> No, I haven't
20:30 < dave_k> and have you used a development version of 0ad?
20:31 < dave_k> (ie. something between alpha 23 and alpha 24, or a version between alpha 24 and alpha 25)
20:31 < allalongthetowr> Yes, I opened a pre-release version of 25.
20:31 < allalongthetowr> To test a mod that had been created for me and my students.
20:31 < allalongthetowr> Didn't use it too much.
20:32 < allalongthetowr> but did open it.
20:32 < allalongthetowr> I have it saved if you'd like to know what it is.
20:32 < dave_k> do you know the approximate date that you used a pre-release version of 25?
20:32 < dave_k> yes, knowing which mod it was would be helpful as well
20:33 < allalongthetowr> I can look to see when I installed it maybe.
20:33 < allalongthetowr> The mod is "No Violence" it takes all the attacking out of the game. We've been tweaking it to make it more kid friendly.
20:33 < allalongthetowr> I can send you the thread in the chat.
20:35 < allalongthetowr> the early release file was modified on 6/18/2021 and added to my Applications on 6/24/2021 (I assume that's when I opened it)
20:36 < dave_k> https://wildfiregames.com/forum/topic/27716-single-player-map/
20:36 < allalongthetowr> Since then, I downloaded and played a tiny bit with Alpha 25a.

Details of allalongthetowr's 0ad configuration, mods, logs, etc. are attached in the following files. A tarball of the cache files is 8 MB, so it exceeds the attachment size limit of 2 MB. I can make it available on request.

allalongthetower_0ad_config_log_mod_etc_backup.tar.xz
allalongthetower_cache_files_with_sizes_and_timestamps.txt
allalongthetower_cache_files.sha512

Last edited 13 months ago by Stan‘ (previous) (diff)

in reply to:  14 comment:17 by wraitii, 13 months ago

Patch: Phab:D4275

Just to state my opinion directly:

  1. pyrogenesis needs to gracefully process, or skip, XMBs that have been written in the wrong place (the user/ directory), even when bug #1 is fixed

Specifically, failing to load a cached file should fail somewhat explicitly here, or not fail at all.

wraitii [...] seemed to agree that 0ad should disallow mods from marking any dependency on 0ad other than 0ad=[version].

I'm saying we should consider it, mostly the '>=' case is problematic when upgrading.


See diff for details on my proposed fix. Note that this won't fix A24, obviously.

comment:18 by Silier, 10 months ago

Owner: set to wraitii

comment:19 by wraitii, 8 months ago

Resolution: fixed
Status: new → closed

In 26272:

Fix bug where 'archive' XMB files could end up being written to the user mod

Users sometimes ended up with bad (wrong version) XMB files in the user mod. This resulted in A25 loading a black screen.
There is a combination of unfortunate code paths that lead to this. The core issue is that:

  • rP25375 changed the XMB loading code so that if there is an error in Init from a cached XMB, it reports an error. This error happens to be silent, because the GUI expects CXeromyces to do its own error reporting (a pretty poor decision, all in all, but whatever). This explained why the black screen showed no errors.
  • The code flow attempts to load an 'archive' XMB first, then only a loose cache. _But_ if the XMB that fails to load is an archive (which generally never happens except when using incompatible mods, which is generally less easy in A25 since we added code to stop that), then the game will try to recreate the XMB as an 'archived' path, not a 'loose cache' path as it would usually do.
  • Because the 'archived' path already exists in the VFS, the game will attempt to overwrite that. It so happens that in non-dev copies, this writes to the user mod.
  • Because the user-mod is always loaded, this was unexpected for users.

Fixing this is rather simple: the game should never attempt to write 'archive' XMBs in that function. Added explicit barrier, which shouldn't matter performance-wise but fixes the issue by writing in the proper place, and also properly recovering in case of read failure.
I will note that the game will still try to load the archived file, and recreate it every time, but I don't think that's a particularly big deal, in general having engine-incompatible mods in the future should be harder because of A25 changes there.
(NB: users that have used both A24 and A25 should perhaps still be advised to check their user mod folder, otherwise they'll end up recreating those files forever).

Reported by: dave_k

Fixes #6320

Differential Revision: https://code.wildfiregames.com/D4275
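
The control flow described in the commit message can be modelled roughly as follows. This is a Python sketch, not the engine's C++ code: the dictionary stands in for the VFS, and `load_xmb`, `is_valid`, and `convert_from_xml` are hypothetical names. The point it illustrates is the barrier: a regenerated XMB is only ever written to the loose cache path, never to the 'archive' path.

```python
from typing import Callable, Dict, Optional

Vfs = Dict[str, bytes]  # stand-in for the virtual file system: path -> content


def load_xmb(
    vfs: Vfs,
    archive_path: str,
    loose_cache_path: str,
    is_valid: Callable[[bytes], bool],
    convert_from_xml: Callable[[], bytes],
) -> bytes:
    """Return usable XMB data, regenerating it in the loose cache if needed."""
    # Try the 'archive' XMB first, then the loose cache.
    for path in (archive_path, loose_cache_path):
        data: Optional[bytes] = vfs.get(path)
        if data is not None and is_valid(data):
            return data

    # Both candidates are missing or unreadable: rebuild from the XML source.
    # The rebuilt file is written to the loose cache path only, so a bad
    # archive XMB is left alone instead of being rewritten in place.
    data = convert_from_xml()
    vfs[loose_cache_path] = data
    return data
```

As a simplification, this sketch reuses the regenerated loose cache on later calls, whereas the commit message notes that the game itself will still recreate the file every time.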
