Opened 10 years ago

Closed 10 years ago

Last modified 8 years ago

#2760 closed defect (fixed)

[PATCH] Oos svn 15692

Reported by: Oimat Owned by: Itms
Priority: Release Blocker Milestone: Alpha 17
Component: UI & Simulation Keywords: patch
Cc: Patch:

Description

Quit a multiplayer game and join it again.

Played against alpha. Reproduced the bug twice.

Attachments (8)

oos_dump.txt (1.4 MB ) - added by z 10 years ago.
svn_oos.rar (114.5 KB ) - added by tau 10 years ago.
logs of the game with z
mirage_test.zip (2.4 KB ) - added by historic_bruno 10 years ago.
Simple test map
oos_dump.txt.diff (15.2 KB ) - added by historic_bruno 10 years ago.
Diff of OOS dumps provided
oos.patch (762 bytes ) - added by Itms 10 years ago.
oos.2.patch (1.1 KB ) - added by Itms 10 years ago.
Dumps.rar (326.8 KB ) - added by Itms 10 years ago.
lostiles.patch (2.7 KB ) - added by Itms 10 years ago.

Download all attachments as: .zip

Change History (31)

comment:1 by historic_bruno, 10 years ago

Were there any AIs or only humans? Can you provide the oos_dump logs for each player?

comment:2 by z, 10 years ago

confirm that svn always goes oos when rejoining. Also we both (humans) verified that it is a real oos, game was not in sync anymore. Tested with tau.

by z, 10 years ago

Attachment: oos_dump.txt added

by tau, 10 years ago

Attachment: svn_oos.rar added

logs of the game with z

comment:3 by historic_bruno, 10 years ago

Cc: Itms added

This is related to mirage entities. Luckily, there's a very simple way to reproduce it. Play a single player game with the -serializationtest option (see this page for more OOS debugging hints) and note that the serialization test fails when player visibility changes.

The attached 1-player map with a handful of trees is probably one of the simplest possible ways to test. Just move the infantryman away from the trees and wait. Then the debug logs can be diff'd to see the differences. I can't tell why it's failing without going deep into the mirage code, but it's a start :)

by historic_bruno, 10 years ago

Attachment: mirage_test.zip added

Simple test map

by historic_bruno, 10 years ago

Attachment: oos_dump.txt.diff added

Diff of OOS dumps provided

comment:4 by Itms, 10 years ago

Cc: Itms removed
Owner: set to Itms

Okay, closing #2698 as duplicate, as these serialization errors are not false positives. I'll look into that but we already tried to understand where the problem came from and we failed.

What I already know is that the problem is not (hopefully) related to mirages themselves, it arises from the VisibilityChanged message introduced earlier. As a consequence, the miraging of entities is wrongly synchronized.

in reply to:  4 comment:5 by historic_bruno, 10 years ago

Replying to Itms:

these serialization errors are not false positives

There's no such thing, hopefully :D (unless you count AIs but they are broken so that's not really a false positive either)

Maybe it's something to do with the order of messages or init? Or could it be depending on unserialized state or cache that gets cleared when leaving the game? I see r15508 added a "Cache for visibility tracking (not serialized)". A cache is something that would probably stay sync'd during multiplayer games, but break on rejoin or loading saved games.

comment:6 by historic_bruno, 10 years ago

Oh, here is something you may not be aware of. On multiplayer rejoin (and loading saved games, and serialization test mode), the original map is loaded first, before any deserialization takes place. So the simulation state will be much like the game just began - for a moment. IIRC the simulation state is deserialized next, followed by any messages received while joining, so the joining player can catch up. If there is cache that isn't cleared or invalidated on deserialization, but which affects the synchronized simulation state, it will break.

(I think a common incorrect assumption is that in all these cases, the reconstructed game state comes only from the [de]serialized data, but that's not how it works)

Last edited 10 years ago by historic_bruno (previous) (diff)

comment:7 by Itms, 10 years ago

That is interesting, I'll investigate in that way. Thanks! :D

comment:8 by historic_bruno, 10 years ago

We discussed this in IRC yesterday @ 21:11.

comment:9 by historic_bruno, 10 years ago

Had a chance to do more in-depth debugging. I put in a bunch of debug print statements to find the order of events and LOS data updates in the range manager. Here's an example of when the serialization error occurs (see notes below):

Primary update 11
MT_Update
UpdateVisibility entity 20, old vis = 2, new vis = 1   <----- becomes fogged
MT_PositionChanged
MT_PositionChanged
UpdateVisibility entity 24, old vis = 0, new vis = 1
UpdateVisibility entity 23, old vis = 2, new vis = 2
UpdateVisibility entity 24, old vis = 1, new vis = 1
UpdateVisibility entity 24, old vis = 1, new vis = 1
MT_PositionChanged
Deserialize
Secondary update 11
MT_Update
UpdateVisibility entity 23, old vis = 2, new vis = 2
UpdateVisibility entity 20, old vis = 2, new vis = 1  <----- becomes fogged
MT_PositionChanged
MT_PositionChanged
UpdateVisibility entity 24, old vis = 0, new vis = 1
UpdateVisibility entity 23, old vis = 2, new vis = 2
UpdateVisibility entity 24, old vis = 1, new vis = 1
UpdateVisibility entity 24, old vis = 1, new vis = 1
MT_PositionChanged
Primary update 12
MT_Update
UpdateVisibility entity 23, old vis = 2, new vis = 2
MT_PositionChanged
Deserialize
Secondary update 12
MT_Update
UpdateVisibility entity 23, old vis = 2, new vis = 2
UpdateVisibility entity 20, old vis = 1, new vis = 0   <----- becomes hidden :(
UpdateVisibility entity 24, old vis = 1, new vis = 1
UpdateVisibility entity 23, old vis = 2, new vis = 2
MT_PositionChanged
Simulation2.cpp(312): Assertion failed: "0 && (L"Serialization test failure")"
Primary update 13
MT_Update
UpdateVisibility entity 20, old vis = 1, new vis = 0   <----- becomes hidden
UpdateVisibility entity 24, old vis = 1, new vis = 1
UpdateVisibility entity 23, old vis = 2, new vis = 2
MT_PositionChanged
Deserialize
Secondary update 13
MT_Update
UpdateVisibility entity 23, old vis = 2, new vis = 2
UpdateVisibility entity 20, old vis = 1, new vis = 0   <----- becomes hidden
UpdateVisibility entity 24, old vis = 1, new vis = 1
UpdateVisibility entity 23, old vis = 2, new vis = 2
MT_PositionChanged

Entities:

id: 2 template: "special/player"
id: 3 template: "special/player"
id: 20 template: "gaia/flora_tree_oak"
id: 23 template: "units/athen_infantry_archer_e"
id: 24 template: "mirage|gaia/flora_tree_oak"

The "primary update #" is the actual simulation updating once per turn; "secondary update #" is the deserialized copy of the simulation updating once per turn. UpdateVisibility is showing calculated visibility changes for player 1 only. So from the output sequence, we can see that entity 20 transitions from fogged to hidden on turn 12 in the secondary simulation, but not in the primary, causing the error.

Strangely, on the next turn 13, entity 20 transitions from fogged to hidden in both secondary and primary simulation states. From this point on, they agree and there are no more errors (unless LOS changes again).

comment:10 by Itms, 10 years ago

Ben, many many thanks for the debugging, I wouldn't have succeeded in isolating the failure in a such clear way.

What happens here is that, after the mirage is created (turn 11), the actual entity must get hidden. In the secondary update, in an inefficient way, the' visibility data of every entity is updated, so the actual tree gets hidden. In the primary though, as nothing toggled the update, nothing happens.

In turn 13, there is something that triggers the update (a unit moving and updating the FoW in the region around the tree) so everything's back to normal.

The fix is simple, when the visibility of an entity changes, this entity must be marked for update for the next turn. I tested it and the serialization tests didn't fail! I'd like people to test rejoining before I commit though.

comment:11 by Itms, 10 years ago

Keywords: patch review added
Summary: Oos svn 15692[PATCH] Oos svn 15692

by Itms, 10 years ago

Attachment: oos.patch added

comment:12 by Itms, 10 years ago

Okay, this is not perfect.

  • I get serialization fails at the end of a game
  • I got a fail with the visibility of an actor (thus not miraged)
  • I tested a lobby game and at rejoin, we got an oos, the dumps only differ by the visibility of an actor.

So I think I corrected one failure but there is at least another one. Ben, can you check the problem that occurred on your test map is fixed? Also, I may have missed something about game ends.

comment:13 by historic_bruno, 10 years ago

The patch fixes the problem from my test.

With or without the patch, I get a serialization test failure when revealing the map through the developer overlay, that's basically what happens when a player is defeated/victorious:

--- C:/Users/Ben/AppData/Local/0ad/logs/oos_log/debug.after.b	Sun Sep 14 22:35:57 2014
+++ C:/Users/Ben/AppData/Local/0ad/logs/oos_log/debug.after.a	Sun Sep 14 22:35:53 2014
@@ -588,7 +588,7 @@ entities:
     x: 294.82261
     z: 393.89921
     vision: 0
-    visibilities: 2863311530
+    visibilities: 2
     retain in fog: 1
     owner: 0
     in world: 1
@@ -597,7 +597,7 @@ entities:
     x: 296.54645
     z: 323.0146
     vision: 80
-    visibilities: 2863311530
+    visibilities: 2
     retain in fog: 0
     owner: 1
     in world: 1

2863311530 is 0xAAAAAAAA. Unlike the last problem, this one doesn't go away after a turn, so long as LOS isn't changing. But if I move the unit away from the tree after revealing the map, eventually the serialization errors stop, maybe that forces the visibility data to refresh but otherwise it goes stale?

comment:14 by Itms, 10 years ago

This is a new patch which corrects problems related to LosRevealAll toggling.

However, this should not correct the problem still occurring at rejoin, because no player had revealed the map. I'll investigate, please tell me if you have any clue.

by Itms, 10 years ago

Attachment: oos.2.patch added

comment:15 by Itms, 10 years ago

Tested with trompetin today, without rejoining OOS!

Could more people test the patch? Thanks in advance!

comment:16 by z, 10 years ago

Multiple tests with angelogenius (aka altamura) today: game goes not always oos, but it still does (always) when 1 player sends a scout into unexplored map area and then leaves and rejoins.

On an explored and revealed map this does not happen, all is fine. Hope that helps. (@ltms: remember to send me a forum pm if you want something specific to be tested on this issue. And yes, this time we applied the oos.2.patch correctly ;) )

comment:17 by Itms, 10 years ago

Thanks a lot for testing this! :D

However I couldn't reproduce what you describe. Could you provide a commands.txt so that I can reproduce the situation? Thanks in advance!

comment:18 by Itms, 10 years ago

Attaching two dumps for reference. oos_dump_z.txt is the one for the rejoining player (non host). At first sight the problem is that the miraging order is not respected. I also have a visibility inconsistency for an actor (see comment 12).

by Itms, 10 years ago

Attachment: Dumps.rar added

comment:19 by Itms, 10 years ago

In 15765:

Address some possible causes of bad handling of visibility updates. Also includes a little code tweaking allowing the global update of all visibility data.

Refs #2760

comment:20 by Itms, 10 years ago

Well, I propose this patch that should fix the remaining issue and that also addresses a possible hard edge case about border vertices. However the bug would happen only for square maps so that's possible no one encountered it.

by Itms, 10 years ago

Attachment: lostiles.patch added

comment:21 by Itms, 10 years ago

Resolution: fixed
Status: newclosed

In 15779:

Extend visibility update to all tiles around LoS vertices. Incidentally prevent an issue from arising on square maps. Also improve a code comment.

This should be enough to fix #2760.

comment:22 by Itms, 10 years ago

Please reopen this if you encounter more problems on rejoining. Please note #2806 might also be the cause of the problem.

comment:23 by sanderd17, 8 years ago

Keywords: review removed
Note: See TracTickets for help on using tickets.