Opened 8 years ago
Closed 8 years ago
#4270 closed defect (fixed)
[PATCH] OOS on rejoin - Visual Actor
Reported by: | elexis | Owned by: | Itms |
---|---|---|---|
Priority: | Release Blocker | Milestone: | Alpha 21 |
Component: | UI & Simulation | Keywords: | patch |
Cc: | sanderd17 | Patch: |
Description (last modified by )
As #4239 was solved yesterday, we finally got to test for other OOS issues the first time since many months. Here, bb rejoined a game and triggered an OOS.
The textual simstate representations only differ in the visual actor component of 3 women and 1 spearman. They were all in the carry_wood
animation for the host but in the walk
animation for the rejoined client.
VisualActor: base actor: "units/persians/infantry_spearman_b.xml" r: 1 g: 1 b: 1 anim run threshold: 11.75 anim name: "build" anim once: false anim speed: 1 sound group: "resource/construction/con_wood.xml" anim desync: 0.05 anim sync repeat time: 0 length: 1 key: "animation" - value: "carry_wood" + value: "walk" seed: 423 actor: "units/persians/infantry_spearman_b.xml"
Attachments (13)
Change History (32)
by , 8 years ago
comment:1 by , 8 years ago
Reproduce: Gather a bit of wood with a unit, then rejoin while the unit is walking.
comment:2 by , 8 years ago
Description: | modified (diff) |
---|
comment:3 by , 8 years ago
Keywords: | rfc patch added |
---|---|
Summary: | OOS on rejoin - Visual Actor → [PATCH] OOS on rejoin - Visual Actor |
The attached patch should fix this issue, and it also fixes a number of problems. It is a bigger patch than I expected, and I'm going to try to break down the different changes.
- I removed a confusing comment (IMHO): the
SelectAnimation
inInit
is just a proper and fairly simple animation initialization, it is not some clever code to avoid OOS or anything. - The animation sync offset was not saved in the component's memory, thus it was lost when the actor was reloaded. I am not sure it is a big deal, but for the sake of it I added some memory, that I serialized.
- The function that sets the running threshold also loaded a new "walk" animation, which is stupid. The function should be renamed but I don't want to do that right now, it could cause a lot of breakage. I will propose a patch for the renaming.
- I fixed the deserialization by only reloading the animations in the visual
Unit
. Previously we used theSelectAnimation
methods that overwrote a number of things, whereas now everything is reloaded properly without touching the serialized data, so it cannot OOS. I created a new function from existing code, and avoided code duplication. - I fixed the
Update
function and improved the style a lot. Previously it updated only a part of the serialized data, so it could have caused problems.
The OOS of the ticket was caused by the bullet points 3 and 4: the deserialization code wrongly called (4) a function that was itself broken (3).
I am unsure whether it could cause regressions. I believe that all the code I deleted was not logical at all, but maybe some other parts of 0 A.D. assume that it behaves this way.
by , 8 years ago
Attachment: | fix-visualactor-save-reload.patch added |
---|
comment:4 by , 8 years ago
I misunderstood the actual purpose of SelectMovementAnimation
, so here is an updated patch.
The only differences with respect to the previous one are the SelectMovementAnimation
function and the Update
function that I had not rewritten properly. I added some detailed comments about how this second function, and a TODO explaining how it should ideally be done.
by , 8 years ago
Attachment: | fix-visualactor-save-reload.2.patch added |
---|
comment:5 by , 8 years ago
The patch doesn't seem to fix the issue, I can still reproduce it the same way (gather a bit and then rejoin while the carrying unit walks). The oos dump diff is the same.
There are two "walk" animations hardcoded, I would guess those would have to become "carry_wood" if the unit is carrying wood (same for other res).
comment:6 by , 8 years ago
Found the issue, it is also necessary to serialize the animation overrides (else the walk
animation won't be overrode by the carry_wood
one on update).
However I'm stumbling upon another serialization issue I'm trying to fix now.
comment:7 by , 8 years ago
The other issue was caused by the fact that the secondary simulation in the test mode is non-visual. This would not have caused an OOS in an actual game. So I should have fixed the issue now.
I propose to remove the early return in HandleMessage
so we won't have issues when running serialization or rejoin tests. I'm creating a git branch for eased reviewing, and I will post a squashed patch.
by , 8 years ago
Attachment: | fix-visualactor-save-reload.3.patch added |
---|
squashed version for testing
comment:8 by , 8 years ago
Here is a RC that also contains style fixes, splitted for easier reviewing: https://github.com/na-Itms/0ad/tree/visual-actor
And have a squashed RC version for SVN testing.
by , 8 years ago
Attachment: | visual-actor-squash.patch added |
---|
comment:13 by , 8 years ago
Keywords: | rfc removed |
---|
comment:14 by , 8 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
- This error is triggered shortly after rejoins sometimes:
Assertion failed: "! m_Anim->m_ObjectBounds.IsEmpty()" Location: Model.cpp:117 (CModel::CalcBounds)
- We got an OOS on rejoin, identical txt dumps, on a random map script unfortunately, so will try to reproduce it
by , 8 years ago
Attachment: | 0ad_rejoin_error.txt added |
---|
by , 8 years ago
Attachment: | OOS_on_rejoin_siole_r18825.7z added |
---|
comment:15 by , 8 years ago
Thanks for the report, I can take a look this afternoon.
Did this assertion fail in the same game as this OOS? Else I would need the commands.txt for the assertion failing.
by , 8 years ago
Attachment: | crash_on_rejoin_siole_vs_Dizaka_r18828.7z added |
---|
Can't reproduce the crash with the (non-visual) rejointest despite scenario equivalent map
by , 8 years ago
Attachment: | visualactor_crash_pyrennean_r18830.7z added |
---|
by , 8 years ago
Attachment: | reproduce_crash.7z added |
---|
Reproduce the crash by applying this patch (which deserializes the given simstate in loadme
) and doing a -visual-replay
on the replay file. The crash occurs here after zooming out.
by , 8 years ago
Attachment: | savegame-0100.0adsave added |
---|
Alternative reproduce of the crash instantaneously after loading this savegame from the menu (without patch).
by , 8 years ago
Attachment: | screenshot.jpg added |
---|
comment:16 by , 8 years ago
- turn 5842: rejointest starts
- turn 5969: serializationtest fails
- turn 5980: OOS detected in the actual match (as it is checked only every 20th turn in MP)
The savegame in attachment:reproduce_crash.7z is from turn 5980. The game doesn't crash unless one of the entities in the center of this screenshot would become visible. (The screenshot was made from the replay starting on turn 0.)
comment:17 by , 8 years ago
Okay so I've gone digging and based on my findings the above patch fixes the issue. Note that it won't save a broken save game, but adding the following to the deserialize call does:
if (m_AnimSyncRepeatTime == fixed::Zero()) { m_AnimSyncOffsetTime = fixed::Zero(); }
That being said, the issue is that when we select a new animation we reset m_AnimSyncRepeatTime but not m_AnimSyncOffsetTime, which on deserialization may trigger a recalculation of anim states with bogus information resulting in nan down the line (see irc logs of october 15).
To reproduce the original issue, do the following: -Set Engine.SetSimRate(0.1) -Get a citizen soldier (such as an archer). -Order it to attack something twice in a short interval of time (so we trigger SetAnimationSync in UnitAI) -Order it to build something. -Save when the unit is in building animation -Load that save.
It should crash. If you follow the same steps with the above patch, it doesn't.
r18808