Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#3271 closed defect (fixed)

[PATCH] OOS on rejoin - different mirage order

Reported by: elexis Owned by: Itms
Priority: Must Have Milestone: Alpha 19
Component: Core engine Keywords: patch
Cc: Patch:

Description (last modified by elexis)

This out of sync error happened when I rejoined an a18 match. The only real difference is that two entities have swapped IDs. Their properties are identical as far as I have checked (the tree even has the same amount of wood in both versions).

--- oos_dump_ffm.txt	2015-05-27 01:12:35.000000000 +0200
+++ oos_dump_elexis.txt	2015-05-27 01:12:35.000000000 +0200
@@ -9318,15 +9318,15 @@
     id: 6743
     template: "mirage|gaia/flora_tree_oak_aut"
     id: 6744
     template: "foundation|structures/rome_storehouse"
     id: 6745
-    template: "mirage|gaia/flora_tree_oak_aut"
+    template: "mirage|foundation|structures/rome_storehouse"
     id: 6746
     template: "mirage|gaia/flora_tree_oak_aut"
     id: 6747
-    template: "mirage|foundation|structures/rome_storehouse"
+    template: "mirage|gaia/flora_tree_oak_aut"
     id: 6748
     template: "mirage|gaia/flora_tree_oak_aut"
     id: 6749
     template: "mirage|gaia/flora_tree_oak_aut"
     id: 6750

The other differences are follow ups (properties for the wrong IDs of those two entities) and irrelevant local entities (waypoint flags & dead units) at the end of the log.

Attachments (10)

t3271_oos_dump_and_commands.txt.7z (326.2 KB ) - added by elexis 9 years ago.
t3271_oos_dump_and_complete_commands_and_map.7z (666.7 KB ) - added by elexis 9 years ago.
Luckily ffm found the commands.txt file from that time, so I can now provide the full commands.txt file of the host, the skirmish map that is equivalent to that random map and the modified commands.txt file, so that you can actually replay that match in serializationtestmode! Remember it was an a18 match.
t3271_serializationtest_dumps.7z (606.7 KB ) - added by elexis 9 years ago.
When you apply this patch http://trac.wildfiregames.com/attachment/ticket/3460/t3460_preview_wip_v0.1.patch to alpha 18 and do a serializationtest on the modified commands.txt file using that custom map, then you will obstain those dumps. Notice they also contain the visibilities error of #3107, which is to be ignored. If you diff debug.after.a and debug.after.b, you will see that the exact same two entities mentioned above are different here too. So this counts as a full reproduce with a18!
t3271_serializationtest_dumps_cleaned.7z (604.7 KB ) - added by elexis 9 years ago.
If you also apply this patch http://trac.wildfiregames.com/attachment/ticket/3107/t3107_bugfix_a18.patch to alpha 18 on top of the other, then the visibilities bug #3107 will not appear in the oosdump. Thus you get a more minimal diff with only that bug.
t3271_oosdump_turn_2906.txt.7z (317.7 KB ) - added by elexis 9 years ago.
Dump of the simulation state one turn prior to the rejoin.
reproduce_r17211.7z (623.4 KB ) - added by elexis 8 years ago.
Here a commands.txt with a custom scenario map where the serializationtest fails on turn 135, r17211.
commands_minimized.txt (5.1 KB ) - added by elexis 8 years ago.
Minimized version of the replay above (only 9 commands)
Fix.patch (1.6 KB ) - added by wraitii 8 years ago.
Fix.2.patch (3.7 KB ) - added by wraitii 8 years ago.
better version
visibility_serialization.patch (5.4 KB ) - added by Itms 8 years ago.

Change History (27)

by elexis, 9 years ago

comment:1 by elexis, 9 years ago

Description: modified (diff)
Summary: OOS on rejoin - miragesOOS on rejoin - swapped IDs

comment:2 by elexis, 9 years ago

Maybe related to #3375 ?

by elexis, 9 years ago

Luckily ffm found the commands.txt file from that time, so I can now provide the full commands.txt file of the host, the skirmish map that is equivalent to that random map and the modified commands.txt file, so that you can actually replay that match in serializationtestmode! Remember it was an a18 match.

by elexis, 9 years ago

When you apply this patch http://trac.wildfiregames.com/attachment/ticket/3460/t3460_preview_wip_v0.1.patch to alpha 18 and do a serializationtest on the modified commands.txt file using that custom map, then you will obstain those dumps. Notice they also contain the visibilities error of #3107, which is to be ignored. If you diff debug.after.a and debug.after.b, you will see that the exact same two entities mentioned above are different here too. So this counts as a full reproduce with a18!

by elexis, 9 years ago

If you also apply this patch http://trac.wildfiregames.com/attachment/ticket/3107/t3107_bugfix_a18.patch to alpha 18 on top of the other, then the visibilities bug #3107 will not appear in the oosdump. Thus you get a more minimal diff with only that bug.

by elexis, 9 years ago

Dump of the simulation state one turn prior to the rejoin.

comment:3 by elexis, 9 years ago

Summary: OOS on rejoin - swapped IDsOOS on rejoin - different mirage order

Using attachment:visual_replay_a18_megapatch.patch:ticket:9 you can visually replay the game.

Notice you can open the javascript console by hitting F9 and type selectAndMoveTo(6745); or selectAndMoveTo(6747); to see those two mirages. In visual replay, 6745 is the tree and 6747 the dock. The mirage of the storehouse is available only for a very short time (after the foundation is placed and before the building was finished). This occurs minute 23/24.

The tree mirage is located near the dock of player 1 (blue) and exists after min 23/24 when the red player walks past that dock with roughly 20 units.


Observation 1: entity 6745 and 6747 are swapped for rejoined client and host Observation 2: If you compare the dump of turn 2906 and 2907, you will see that exactly those four entities were added in that turn:

>     id: 6744
>     template: "foundation|structures/rome_storehouse"
>     id: 6745
>     template: "mirage|foundation|structures/rome_storehouse"
>     id: 6746
>     template: "mirage|gaia/flora_tree_oak_aut"
>     id: 6747
>     template: "mirage|gaia/flora_tree_oak_aut"

Observation 3: It was player 2 (enemy of player 5) who brought the tree mirages into existance, and it is player 5 who placed the storehouse foundation.

For the rejoined client (observer), the mirage of the foundation is added immediately, while for the host and other clients it is added last.

Last edited 9 years ago by elexis (previous) (diff)

comment:4 by elexis, 9 years ago

You can print a stacktrace in Fogging.prototype.Init and Fogging.prototype.LoadMirage of Fogging.js using error("\nINIT " + this.entity + "\n" + new Error().stack);. It shows which part of the code initiated the mirage:

Turn 2907 (500)... ERROR: 
INIT 6744
ERROR: Fogging.prototype.Init@simulation/components/Fogging.js:14:2
TryConstructBuilding@simulation/helpers/Commands.js:830:6
commands.construct@simulation/helpers/Commands.js:342:3
ProcessCommand@simulation/helpers/Commands.js:37:3

ERROR: 
INIT 1354
ERROR: Fogging.prototype.LoadMirage@simulation/components/Fogging.js:59:2
Fogging.prototype.OnVisibilityChanged@simulation/components/Fogging.js:219:3

ERROR: 
INIT 1354
ERROR: Fogging.prototype.LoadMirage@simulation/components/Fogging.js:59:2
Fogging.prototype.OnVisibilityChanged@simulation/components/Fogging.js:219:3

ERROR: 
INIT 6744
ERROR: Fogging.prototype.LoadMirage@simulation/components/Fogging.js:59:2
Fogging.prototype.OnVisibilityChanged@simulation/components/Fogging.js:219:3

So clearly the OnVisibilityChanged message is processed earlier for the host.

comment:5 by elexis, 9 years ago

As mentioned above, the game went oos on turn 2907 and only those 4 entities were added in that turn (storehouse, its foundation and 2 tree mirages).

As can be seen in the original oos-dump-diff, the host (ffm) creates the storehouse-foundation mirage after the tree mirages are created, while the rejoined client (me) creates the mirage for the storehouse first.

So the easiest fix for this particular scenario would be to create the mirage of the foundation immediately after placing the foundation. There might be other scenarios where mirages are created in different orders though.

comment:6 by elexis, 9 years ago

Priority: Release BlockerMust Have

In my opinion not a release blocker in comparison to the other ones, as this one occured rarely on alpha 18, while the other a19 release blockers occur all the time.

But since the solution seems to lie at hand, lets try to solve this in a19.

comment:7 by Josh, 8 years ago

How is the list of mirages created being stored? It sounds like the serializer somehow loses the order of a list.

comment:8 by wraitii, 8 years ago

Might be related to #3375 ?

by elexis, 8 years ago

Attachment: reproduce_r17211.7z added

Here a commands.txt with a custom scenario map where the serializationtest fails on turn 135, r17211.

comment:9 by elexis, 8 years ago

The oos-dump diff of the new sample:

     id: 2617
-    template: "mirage|foundation|structures/cart_civil_centre"
-    id: 2618
     template: "mirage|foundation|structures/gaul_civil_centre"
+    id: 2618
+    template: "mirage|foundation|structures/cart_civil_centre"

by elexis, 8 years ago

Attachment: commands_minimized.txt added

Minimized version of the replay above (only 9 commands)

comment:10 by elexis, 8 years ago

What happens in that replay:

  • turn 0: everyone scouts
  • turn 50: red and blue place CC foundation
  • turn 51+: teal cav scouts the foundation of red's cc
  • turn 114: teal cav scouts the foundation of blue's cc
  • turn 135:
    • teal cav moved away from blue CC. The foundation enters it's FOW.
    • black sets CC foundation.
    • serializationtesterror.

Entities in the visual replay of the minimized commands.txt: 2599: foundation of blue's cc 2600: foundation of red's cc 2601: construction site of blue's cc 2602: construction site of red's cc 2603: teal's mirage of red's cc 2604: foundation of black's cc 2605: teal's mirage of blue's cc 2606: black's mirage of black's cc

Those should be equivalent to the host's version.


Acccording to the oosdumps of the serializationtesterror: host on turn 135:

    id: 2605
    template: "mirage|foundation|structures/cart_civil_centre" (blue) <- teal's mirage created when the blue's CC enters it's FOW
    id: 2606
    template: "mirage|foundation|structures/gaul_civil_centre" (black) <- black's foundation placed this turn

rejoined client on turn 135:

    id: 2605
    template: "mirage|foundation|structures/gaul_civil_centre" (black) <- black's foundation placed this turn
    id: 2606
    template: "mirage|foundation|structures/cart_civil_centre" (blue) <- teal's mirage created when the blue's CC enters it's FOW

Again the conclusion is, that the host first processes the FOW-mirage creation, while the rejoined client first processes the newly placed foundation.

comment:11 by wraitii, 8 years ago

The only way I see that happening is if commands are processed in a different order, or if messages are processed in a different order.

by wraitii, 8 years ago

Attachment: Fix.patch added

comment:12 by wraitii, 8 years ago

This fixes the issue, caused by the global flag being set only for the rejoining player.

I don't think there's a much simpler fix given the number of functions those can call, I guess we could "skip" the modified entities in the global reprocessing? Is it worth it?

by wraitii, 8 years ago

Attachment: Fix.2.patch added

better version

comment:13 by wraitii, 8 years ago

Okay scrap the above patch, it's completely buggy.

The only ways I see to fix this are to either:

-serialize everything needed so that a call to UpdateVisibilityData is exactly similar (m_dirtyVisibility...). This implies some changes to ResetDerivedData. I'm also not entirely sure it'd work. But it might still be a good, sane idea.

-make the code less reliant on orders by not sending messages immediately but storing them in a vector of some kind, then sort and make unique that vector and send messages. This should fix the OOS at a rather low cost if the number of messages at any time remains low.

comment:14 by Itms, 8 years ago

Keywords: patch review added
Summary: OOS on rejoin - different mirage order[PATCH] OOS on rejoin - different mirage order

Here is a patch fixing the issue.

As understood by wraitii, assuming everything is dirty on deserialization is not always the good thing to do and it's best to make sure everything is the same when calling UpdateVisibilityData.

The attached patch replaces m_GlobalVisibilityUpdate = true by a real serialization of the dirtiness values. To achieve that, I had to add a "deserialization in progress" flag to the range manager and I also used it to remove an old function parameter.

This patch should be checked for bugs (elexis?) and for style, especially that new flag (leper?) ;)

by Itms, 8 years ago

comment:15 by Itms, 8 years ago

Owner: set to Itms
Resolution: fixed
Status: newclosed

In 17223:

Serialize everything that is needed by UpdateVisibilityData instead of assuming everything as dirty.
Fixes #3271.

comment:16 by Itms, 8 years ago

Keywords: review removed

comment:17 by elexis, 8 years ago

The minimal replay doesn't crash with r17223, but r17222, so this particular issue is fixed. There were some svn 1v1s and people did arbitrary rejoins without OOS, so I guess it's fine.

Note: See TracTickets for help on using tickets.