#4277 closed defect (fixed)
[PATCH] OOS on rejoin - ptolemian lighthouse causing different dirty visibilities in the rangemanager
Reported by: | elexis | Owned by: | elexis |
---|---|---|---|
Priority: | Release Blocker | Milestone: | Alpha 21 |
Component: | UI & Simulation | Keywords: | patch |
Cc: | Patch: |
Description
Trying some rejoining with r18829 on a skirmish map yielded this OOS where two entities have different dirty visibilities. The error is reproducible using the rejointest tool #4242.
--- debug.after.a.txt 2016-10-11 15:30:53.738286090 +0200 +++ debug.after.b.txt 2016-10-11 15:30:53.746285963 +0200 @@ -32611,11 +32611,15 @@ length: 1024 #: 363 dirty visibility: 0 #: 1 dirty visibility: 1 - #: 151 + #: 119 + dirty visibility: 0 + #: 1 + dirty visibility: 1 + #: 31 dirty visibility: 0 #: 2 dirty visibility: 1 #: 31 dirty visibility: 0
Attachments (8)
Change History (22)
by , 8 years ago
Attachment: | OOS_dirty_visibilities.7z added |
---|
comment:1 by , 8 years ago
comment:2 by , 8 years ago
Summary: | OOS on rejoin - different dirty visibilities → OOS on rejoin - ptolemian lighthouse causing different dirty visibilities in the rangemanager |
---|
comment:3 by , 8 years ago
Not sure if it is related or not but you can very easily trigger an OOS by building the lighthouse and destroying it. It will OOS on the turn it gets destroyed. It seems capturing triggers the same OOS so I'm assuming any ownership change actually fails.
Might be related to this OOS, which is caused by the LOSPlayerCount for player 1 being completely fubar.
comment:4 by , 8 years ago
Correct, build lighthouse, rejoin, delete it or user "wololo" cheat to change ownership -> OOS
by , 8 years ago
Attachment: | lighthouse_rejointest_r18832.7z added |
---|
Minimal skirmish map reproducing wraitiis OOS with the rejointest.
comment:5 by , 8 years ago
Notice in this diff there are no dirty visibilities differing, but "visbilities" and "los state" of the range manager.
@@ -17656,7 +17656,7 @@ x: 611.55506 z: 249.45731 vision: 0 - visibilities: 9 + visibilities: 10 size: 2 retain in fog: 1 owner: 0 @@ -27552,7 +27552,7 @@ #: 39 los state: 0 #: 89 - los state: 13 + los state: 15 #: 97 los state: 0 #: 34
comment:6 by , 8 years ago
Attached patch fixes the lighthouse-specific OOS, but this is non-trivial.
The core issue is that the range manager does not serialize player LOS count (m_PlayerLosCount). This is because in theory it can be recomputed from the range manager state. However the lighthouse breaks this: the range manager does not know that it needs to reveal the shore (in practice, it does because m_losstate is serialized, but this will OOS once the lighthouse goes down).
The thing to note here is that any entity with special LOS behavior will trigger an OOS ultimately.
We have various ways of fixing this: -serialize m_PlayerLosCount. Would work for any new entities we'd add down the line, but may seriously bump the size of the serialization. -cache in the entity state of the range manager that this one needs to call reveal shore. Would work, but any new "special" entity would need another boolean and that just seems like ti would use too much memory to be worth the while. -Call a function in CmpVision when deserializing to call the reveal shore. This could work for different entities.
There is another problem: the reveal shore call relies on the pathfinder grid being initialized. But that gets initialized at the beginning of UpdateComponents, i.e. after deserialization and even after the deserialized message. To fix this, I make CmpVision force a grid update on MT_Deserialize, but this is rather inelegant. Furthermore, it breaks the serialization test because the "before" state becomes different. This is why the above patch comments that out. Note that the "After" state is similar.
comment:7 by , 8 years ago
Keywords: | patch rfc added |
---|---|
Summary: | OOS on rejoin - ptolemian lighthouse causing different dirty visibilities in the rangemanager → [PATCH] OOS on rejoin - ptolemian lighthouse causing different dirty visibilities in the rangemanager |
This is not guaranteed to fix all sources of OOS but it is the proper version of wraitii's hack. I should never have put the revealing shore instructions inside the vision component, that belongs to the range manager.
With respect to wraitii's comment about "any new "special" entity would need another boolean": indeed that is not very streamlined. However, now all the relevant code is where it should be, so it would be really easy to generalize this system into a "special vision" system that associates a vision type with a LOS update function (here RevealShore
) that is called upon ownership change and during the resetting of derived data.
by , 8 years ago
Attachment: | reveal-shore-fix.patch added |
---|
comment:8 by , 8 years ago
wraitii's patch works, Itms one crashes on rejoin if and only if a lighthouse is built:
CCmpRangeManager.cpp(1910): Assertion failed: "shoreGrid.m_W == m_TerrainVerticesPerSide-1 && shoreGrid.m_H == m_TerrainVerticesPerSide-1" Assertion failed: "shoreGrid.m_W == m_TerrainVerticesPerSide-1 && shoreGrid.m_H == m_TerrainVerticesPerSide-1" Location: CCmpRangeManager.cpp:1910 (RevealShore)
by , 8 years ago
Attachment: | fix_through_serialization.patch added |
---|
comment:9 by , 8 years ago
Itms's patches is my second solution above. Indeed it should work, but the pathfinder grid being unitinialized on deserialization makes it crash. To fix that we could serialize the pathfinder grid (no easy way to be sure we can recompute it before then).
Another solution to this OOS is to serialize player counts, which my patch above does. With the repetitive vectors, it shouldn't be a huge increase in memory footprint.
comment:10 by , 8 years ago
I do not think serializing the entire player LOS count is sane.
In order to fix my patch, we should make sure that the pathfinder data that allows one to compute the shore grid is properly serialized. At first sight one should serialize m_MapSize
in the pathfinder, and that should be enough.
by , 8 years ago
Attachment: | reveal-shore-notfix_v2.patch added |
---|
Prior patch by Itms with his new suggestion to also serialize m_MapSize
in the pathfinder. Doesn't work as there is still the OOS after deleting the lighthouse when someone rejoined after it was built.
by , 7 years ago
Attachment: | reveal-shore-fixv0.patch added |
---|
comment:11 by , 7 years ago
Should work. The problem with the above patches is that the water manager gets deserialized last, so when the pathfinder calls it it gets garbage. This fixes it by doing the "reset derived data" on the deserialized message which is guaranteed to come after those.
BTW, why is the shore grid computed in the pathfinder, and why is it not cached?
comment:12 by , 7 years ago
Keywords: | review added; rfc removed |
---|
After some research, it looks like it would be the correct solution indeed. I modified the comment so it is clear that the ResetDerivedData
needs to be done at that point, I removed your debug code and I changed the pathfinder component so it uses the SerializeCommon
pattern.
The shore grid is strongly coupled to the pathfinding code (except for that lighthouse) so it made sense for it to be part of the pathfinder. I agree it could in principle be part of the water manager, but this component should then ask for some data from the pathfinder. Wrt caching, it's not really useful as the shore grid is not computed often, and note that caching would not have prevented the OOS here.
The attached patch should be ready for commit. I checked that serialization tests passed, rejoining could be tested too.
by , 7 years ago
Attachment: | reveal-shore.patch added |
---|
comment:14 by , 7 years ago
Keywords: | review removed |
---|
Thanks for fixing this OOS!
The patch was rereviewed by wraitii and passed all my rejoin tests.
The original commands.txt
from the report still triggers a failure with r18878 but not r18879. Notice on that report there was no ownership change or destruction of that lighthouse. Something else must have caused a recomputation of that one particular tile.
fatherbushido has also mentioned concern with #4290, that players might own multiple lighthouses. The code looks like it would remove the shoreline visibility after capturing (even if the player still has another lighthouse), but according to some testing this isn't an issue.
m_DirtyVisibility
ofCCmpRangeManager.cpp