#2760 closed defect (fixed)
[PATCH] Oos svn 15692
Reported by: | Oimat | Owned by: | Itms |
---|---|---|---|
Priority: | Release Blocker | Milestone: | Alpha 17 |
Component: | UI & Simulation | Keywords: | patch |
Cc: | Patch: |
Description
Quit a multiplayer game and join it again.
Played against alpha. Reproduced the bug twice.
Attachments (8)
Change History (31)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
confirm that svn always goes oos when rejoining. Also we both (humans) verified that it is a real oos, game was not in sync anymore. Tested with tau.
by , 10 years ago
Attachment: | oos_dump.txt added |
---|
comment:3 by , 10 years ago
Cc: | added |
---|
This is related to mirage entities. Luckily, there's a very simple way to reproduce it. Play a single player game with the -serializationtest
option (see this page for more OOS debugging hints) and note that the serialization test fails when player visibility changes.
The attached 1-player map with a handful of trees is probably one of the simplest possible ways to test. Just move the infantryman away from the trees and wait. Then the debug logs can be diff'd to see the differences. I can't tell why it's failing without going deep into the mirage code, but it's a start :)
follow-up: 5 comment:4 by , 10 years ago
Cc: | removed |
---|---|
Owner: | set to |
Okay, closing #2698 as duplicate, as these serialization errors are not false positives. I'll look into that but we already tried to understand where the problem came from and we failed.
What I already know is that the problem is not (hopefully) related to mirages themselves, it arises from the VisibilityChanged message introduced earlier. As a consequence, the miraging of entities is wrongly synchronized.
comment:5 by , 10 years ago
Replying to Itms:
these serialization errors are not false positives
There's no such thing, hopefully :D (unless you count AIs but they are broken so that's not really a false positive either)
Maybe it's something to do with the order of messages or init? Or could it be depending on unserialized state or cache that gets cleared when leaving the game? I see r15508 added a "Cache for visibility tracking (not serialized)". A cache is something that would probably stay sync'd during multiplayer games, but break on rejoin or loading saved games.
comment:6 by , 10 years ago
Oh, here is something you may not be aware of. On multiplayer rejoin (and loading saved games, and serialization test mode), the original map is loaded first, before any deserialization takes place. So the simulation state will be much like the game just began - for a moment. IIRC the simulation state is deserialized next, followed by any messages received while joining, so the joining player can catch up. If there is cache that isn't cleared or invalidated on deserialization, but which affects the synchronized simulation state, it will break.
(I think a common incorrect assumption is that in all these cases, the reconstructed game state comes only from the [de]serialized data, but that's not how it works)
comment:9 by , 10 years ago
Had a chance to do more in-depth debugging. I put in a bunch of debug print statements to find the order of events and LOS data updates in the range manager. Here's an example of when the serialization error occurs (see notes below):
Primary update 11 MT_Update UpdateVisibility entity 20, old vis = 2, new vis = 1 <----- becomes fogged MT_PositionChanged MT_PositionChanged UpdateVisibility entity 24, old vis = 0, new vis = 1 UpdateVisibility entity 23, old vis = 2, new vis = 2 UpdateVisibility entity 24, old vis = 1, new vis = 1 UpdateVisibility entity 24, old vis = 1, new vis = 1 MT_PositionChanged Deserialize Secondary update 11 MT_Update UpdateVisibility entity 23, old vis = 2, new vis = 2 UpdateVisibility entity 20, old vis = 2, new vis = 1 <----- becomes fogged MT_PositionChanged MT_PositionChanged UpdateVisibility entity 24, old vis = 0, new vis = 1 UpdateVisibility entity 23, old vis = 2, new vis = 2 UpdateVisibility entity 24, old vis = 1, new vis = 1 UpdateVisibility entity 24, old vis = 1, new vis = 1 MT_PositionChanged Primary update 12 MT_Update UpdateVisibility entity 23, old vis = 2, new vis = 2 MT_PositionChanged Deserialize Secondary update 12 MT_Update UpdateVisibility entity 23, old vis = 2, new vis = 2 UpdateVisibility entity 20, old vis = 1, new vis = 0 <----- becomes hidden :( UpdateVisibility entity 24, old vis = 1, new vis = 1 UpdateVisibility entity 23, old vis = 2, new vis = 2 MT_PositionChanged Simulation2.cpp(312): Assertion failed: "0 && (L"Serialization test failure")" Primary update 13 MT_Update UpdateVisibility entity 20, old vis = 1, new vis = 0 <----- becomes hidden UpdateVisibility entity 24, old vis = 1, new vis = 1 UpdateVisibility entity 23, old vis = 2, new vis = 2 MT_PositionChanged Deserialize Secondary update 13 MT_Update UpdateVisibility entity 23, old vis = 2, new vis = 2 UpdateVisibility entity 20, old vis = 1, new vis = 0 <----- becomes hidden UpdateVisibility entity 24, old vis = 1, new vis = 1 UpdateVisibility entity 23, old vis = 2, new vis = 2 MT_PositionChanged
Entities:
id: 2 template: "special/player" id: 3 template: "special/player" id: 20 template: "gaia/flora_tree_oak" id: 23 template: "units/athen_infantry_archer_e" id: 24 template: "mirage|gaia/flora_tree_oak"
The "primary update #" is the actual simulation updating once per turn; "secondary update #" is the deserialized copy of the simulation updating once per turn. UpdateVisibility is showing calculated visibility changes for player 1 only. So from the output sequence, we can see that entity 20 transitions from fogged to hidden on turn 12 in the secondary simulation, but not in the primary, causing the error.
Strangely, on the next turn 13, entity 20 transitions from fogged to hidden in both secondary and primary simulation states. From this point on, they agree and there are no more errors (unless LOS changes again).
comment:10 by , 10 years ago
Ben, many many thanks for the debugging, I wouldn't have succeeded in isolating the failure in a such clear way.
What happens here is that, after the mirage is created (turn 11), the actual entity must get hidden. In the secondary update, in an inefficient way, the' visibility data of every entity is updated, so the actual tree gets hidden. In the primary though, as nothing toggled the update, nothing happens.
In turn 13, there is something that triggers the update (a unit moving and updating the FoW in the region around the tree) so everything's back to normal.
The fix is simple, when the visibility of an entity changes, this entity must be marked for update for the next turn. I tested it and the serialization tests didn't fail! I'd like people to test rejoining before I commit though.
comment:11 by , 10 years ago
Keywords: | patch review added |
---|---|
Summary: | Oos svn 15692 → [PATCH] Oos svn 15692 |
by , 10 years ago
comment:12 by , 10 years ago
Okay, this is not perfect.
- I get serialization fails at the end of a game
- I got a fail with the visibility of an actor (thus not miraged)
- I tested a lobby game and at rejoin, we got an oos, the dumps only differ by the visibility of an actor.
So I think I corrected one failure but there is at least another one. Ben, can you check the problem that occurred on your test map is fixed? Also, I may have missed something about game ends.
comment:13 by , 10 years ago
The patch fixes the problem from my test.
With or without the patch, I get a serialization test failure when revealing the map through the developer overlay, that's basically what happens when a player is defeated/victorious:
--- C:/Users/Ben/AppData/Local/0ad/logs/oos_log/debug.after.b Sun Sep 14 22:35:57 2014 +++ C:/Users/Ben/AppData/Local/0ad/logs/oos_log/debug.after.a Sun Sep 14 22:35:53 2014 @@ -588,7 +588,7 @@ entities: x: 294.82261 z: 393.89921 vision: 0 - visibilities: 2863311530 + visibilities: 2 retain in fog: 1 owner: 0 in world: 1 @@ -597,7 +597,7 @@ entities: x: 296.54645 z: 323.0146 vision: 80 - visibilities: 2863311530 + visibilities: 2 retain in fog: 0 owner: 1 in world: 1
2863311530 is 0xAAAAAAAA. Unlike the last problem, this one doesn't go away after a turn, so long as LOS isn't changing. But if I move the unit away from the tree after revealing the map, eventually the serialization errors stop, maybe that forces the visibility data to refresh but otherwise it goes stale?
comment:14 by , 10 years ago
This is a new patch which corrects problems related to LosRevealAll toggling.
However, this should not correct the problem still occurring at rejoin, because no player had revealed the map. I'll investigate, please tell me if you have any clue.
by , 10 years ago
Attachment: | oos.2.patch added |
---|
comment:15 by , 10 years ago
Tested with trompetin today, without rejoining OOS!
Could more people test the patch? Thanks in advance!
comment:16 by , 10 years ago
Multiple tests with angelogenius (aka altamura) today: game goes not always oos, but it still does (always) when 1 player sends a scout into unexplored map area and then leaves and rejoins.
On an explored and revealed map this does not happen, all is fine. Hope that helps. (@ltms: remember to send me a forum pm if you want something specific to be tested on this issue. And yes, this time we applied the oos.2.patch correctly ;) )
comment:17 by , 10 years ago
Thanks a lot for testing this! :D
However I couldn't reproduce what you describe. Could you provide a commands.txt so that I can reproduce the situation? Thanks in advance!
comment:18 by , 10 years ago
Attaching two dumps for reference. oos_dump_z.txt is the one for the rejoining player (non host). At first sight the problem is that the miraging order is not respected. I also have a visibility inconsistency for an actor (see comment 12).
by , 10 years ago
comment:20 by , 10 years ago
Well, I propose this patch that should fix the remaining issue and that also addresses a possible hard edge case about border vertices. However the bug would happen only for square maps so that's possible no one encountered it.
by , 10 years ago
Attachment: | lostiles.patch added |
---|
comment:22 by , 10 years ago
Please reopen this if you encounter more problems on rejoining. Please note #2806 might also be the cause of the problem.
comment:23 by , 8 years ago
Keywords: | review removed |
---|
Were there any AIs or only humans? Can you provide the oos_dump logs for each player?