Opened 4 years ago

Closed 3 years ago

Last modified 3 years ago

#2000 closed defect (fixed)

OOS in a multiplayer game against Aegis

Reported by: scythetwirler Owned by: Yves
Priority: Release Blocker Milestone: Alpha 15
Component: Core engine Keywords:
Cc:

Description

I got another OOS error while playing on Oasis 9 against Aegis with a friend.

It seems there was a discrepancy concerning whether one of the AI's units was walking or idle between the client and the host.


This may be a duplicate of #1881, but here are the logs just in case.

Attachments (8)

difference.txt.tar.7z (7.5 KB) - added by scythetwirler 4 years ago.
Em_oos_dump.txt.tar.7z (229.1 KB) - added by scythetwirler 4 years ago.
scythetwirler_oos_dump.txt.tar.7z (227.5 KB) - added by scythetwirler 4 years ago.
Em_scythetwirler.diff (86.7 KB) - added by scythetwirler 4 years ago.
oos_dump_a14.7z.001 (512.0 KB) - added by historic_bruno 4 years ago.
Alpha 14 OOS data - 09052013 (part 1 of 2)
oos_dump_a14.7z.002 (140.2 KB) - added by historic_bruno 4 years ago.
Alpha 14 OOS data - 09052013 (part 2 of 2)
disable_jit_as_OOS_workaround_v1.0.diff (722 bytes) - added by Yves 3 years ago.
work around the OOS problem by disabling JIT compiling for a problematic function/loop
commands.txt (294.1 KB) - added by Yves 3 years ago.
the commands.txt I used to reliably reproduce the problem

Change History (18)

comment:1 Changed 4 years ago by historic_bruno

  • Resolution set to fixed
  • Status changed from new to closed

Resolving this as fixed in A14 :) If the OOS can be reproduced with A14 or later, please reopen and attach relevant logs.

comment:2 Changed 4 years ago by historic_bruno

  • Milestone changed from Alpha 14 to Alpha 15
  • Resolution fixed deleted
  • Status changed from closed to reopened

Changed 4 years ago by historic_bruno

Alpha 14 OOS data - 09052013 (part 1 of 2)

Changed 4 years ago by historic_bruno

Alpha 14 OOS data - 09052013 (part 2 of 2)

comment:3 Changed 4 years ago by historic_bruno

Still occurs with the A14 release, and using our own Math.sqrt implementation doesn't solve it. It happens in basically every game between my friend and me.

comment:4 Changed 3 years ago by Yves

  • Owner set to Yves
  • Status changed from reopened to new

I've found a way to reproduce such an OOS problem with Aegis and I'm now trying to find the issue.
I should have more information or a solution this weekend.

comment:5 Changed 3 years ago by Yves

  • Status changed from new to assigned

comment:6 Changed 3 years ago by Yves

It's a Spidermonkey JIT problem.
I can reliably get different results on different machines and sometimes also on the same machine.

This patch works around the problem by adding "try {} finally {}", which is not supported by the v1.8.5 JIT compiler and therefore prevents it from JIT compiling the affected functions/loops (thanks to the guys on the #jsapi channel for this tip!).
With this patch I was able to run the replay twice on my development client and my VM without any differences. Before the patch I ran it countless times and only once got the same result by chance.

I'm not sure if the first "try {} finally {}" block is needed; I will have to test that a bit more tomorrow and will also add comments for the final patch.
The clean solution for this is upgrading Spidermonkey, but I'd say this workaround is reasonable for the moment.
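
For readers unfamiliar with the trick, here is a minimal sketch of what such a workaround can look like. It is not the code from the attached patch: the function name `nearestTargetIndex`, its parameters, and the placement of the statement are hypothetical and only illustrate the idea. The claim that the statement keeps the function out of the JIT applies to the v1.8.5 JIT compiler as described in the comment above; on other engines it is just a no-op.

```javascript
// Hypothetical helper, not taken from the actual patch or from attack_plan.js.
// It returns the index of the target closest to `position`, or -1 if there are none.
function nearestTargetIndex(position, targets) {
    // Empty try/finally statement. According to the comment above, the
    // v1.8.5 JIT compiler does not support this construct, so its presence
    // is assumed to keep this function (and its loop) interpreted.
    try {} finally {}

    var bestIndex = -1;
    var bestDistSq = Infinity;
    for (var i = 0; i < targets.length; ++i) {
        var dx = targets[i][0] - position[0];
        var dz = targets[i][1] - position[1];
        var distSq = dx * dx + dz * dz;
        if (distSq < bestDistSq) {
            bestDistSq = distSq;
            bestIndex = i;
        }
    }
    return bestIndex;
}
```

The statement has no effect on the result, which is why it can serve as a temporary workaround: the logic stays the same and only the execution path through the engine changes.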

Changed 3 years ago by Yves

work around the OOS problem by disabling JIT compiling for a problematic function/loop

Changed 3 years ago by Yves

the commands.txt I used to reliably reproduce the problem

comment:7 Changed 3 years ago by historic_bruno

Can you give some details on how you located the source of the OOS? AI state isn't serialized, so it would be interesting to know how it can be narrowed down. It should probably go on the Debugging wiki page.

comment:8 Changed 3 years ago by Yves

I've updated the Debugging wiki article.
Basically I printed all the commands from the AI players as described there. This revealed that different move commands were coming from the AI when it started an attack. Then I added additional debug output to all the places in attack_plan.js where the AI sends move commands and slowly got closer to the source of the problem by comparing a lot of output logs from replays with file diffs.
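
The log comparison step can be sketched roughly as follows. This is an assumption-laden illustration, not the tooling actually used: `firstDivergence` is a hypothetical helper, and it assumes each replay run's debug output has already been split into an array of lines.

```javascript
// Hypothetical helper: finds the first line at which two debug logs differ.
// Returns the zero-based line index of the first difference, or -1 if the
// logs are identical. If one log is a strict prefix of the other, the
// index of the first extra line is returned.
function firstDivergence(linesA, linesB) {
    var shared = Math.min(linesA.length, linesB.length);
    for (var i = 0; i < shared; ++i) {
        if (linesA[i] !== linesB[i])
            return i;
    }
    return linesA.length === linesB.length ? -1 : shared;
}
```

The earliest differing line is the interesting one, since everything after it may just be a consequence of the first mismatch. Adding more debug output before that point and repeating the comparison narrows the search further.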

comment:9 Changed 3 years ago by Yves

  • Resolution set to fixed
  • Status changed from assigned to closed

In 14303:

Disables JIT compiling of a loop to work around OOS errors in multiplayer games with AI players.
Fixes #2000

comment:10 Changed 3 years ago by historic_bruno

Nice work! My friend and I will try with A15 once it's released, since we had an OOS in every game before :)