Opened 10 years ago

Closed 9 years ago

#1828 closed defect (fixed)

Serialization test fails during construction

Reported by: historic_bruno Owned by: ben
Priority: Release Blocker Milestone: Alpha 13
Component: UI & Simulation Keywords:
Cc: Patch:

Description (last modified by historic_bruno)

This is consistently reproducible for me:

  1. Load a map in Atlas (include -serializationtest option), Acropolis 1 works
  2. Start simulation test
  3. Build a field
  4. When construction finishes, delete the field
  5. Build a mill

The serialization test fails while approaching the mill foundation. Sometimes there is no difference in the debug state logs, sometimes there is one local entity difference (which shouldn't affect the serialization).

Change History (12)

comment:1 by thomas, 9 years ago

I'm not able to reproduce this. Should I be seeing something in the pyrogenesis_dbg_stdout.txt?

in reply to:  1 comment:2 by historic_bruno, 9 years ago

Replying to thomas:

I'm not able to reproduce this. Should I be seeing something in the pyrogenesis_dbg_stdout.txt?

Did you run the game with the -serializationtest option?

comment:3 by Kieran P, 9 years ago

Priority: Should HaveRelease Blocker

comment:4 by historic_bruno, 9 years ago

Description: modified (diff)

comment:5 by historic_bruno, 9 years ago

Looked into this a bit more, doing a binary diff of state.after.a and state.after.b shows they differ by one bit in CmpFoundation's buildMultiplier field value (for future reference, enabling DEBUG_SERIALIZER_ANNOTATE in StdSerializer.h can be very helpful ;):

00 00 00 00 00 00 F8 FF
00 00 00 00 00 00 F8 7F

both appear as "null" in the debug state dumps, so there's no difference there. If I'm reading the file correctly, they are both supposed to be type SCRIPT_TYPE_DOUBLE.

comment:6 by Yves, 9 years ago

BinarySerializer.cpp checks if the double type fits into an integer for serialization (around line 170). I don't know if that matters but it came to my mind when I red this.

comment:7 by thomas, 9 years ago

I'm not sure if this is related, but I'm now getting a serialization test failure on the "combat demo" scenario after a few units die.

in reply to:  6 ; comment:8 by historic_bruno, 9 years ago

Replying to Yves:

BinarySerializer.cpp checks if the double type fits into an integer for serialization (around line 170). I don't know if that matters but it came to my mind when I red this.

That caught my eye as well, but in that case the type should be SCRIPT_TYPE_INT, not SCRIPT_TYPE_DOUBLE. And I don't understand why it ends up being converted to "null" by the debug serializer.

comment:9 by historic_bruno, 9 years ago

After discussing this with Philip, he found those values are -NaN and NaN. That's caused by 0/0 in

this.buildMultiplier = Math.pow(this.numRecentBuilders, 0.7) / this.numRecentBuilders;

which is painfully obvious now that I see it. As for why one is NaN and the other -NaN, we think it's a difference in JIT vs. non-JIT paths (primary simulation in one, secondary in the other), and indeed it doesn't seem reproducible with JIT disabled.

in reply to:  8 comment:10 by historic_bruno, 9 years ago

Replying to historic_bruno:

And I don't understand why it ends up being converted to "null" by the debug serializer.

And we should fix canonfloat() in DebugSerializer.cpp to output actual NaN with hex values so it's easier to detect these problems :)

in reply to:  7 comment:11 by historic_bruno, 9 years ago

Replying to thomas:

I'm not sure if this is related, but I'm now getting a serialization test failure on the "combat demo" scenario after a few units die.

You (or we) would need to look at the oos_logs from the game's log path right after you get the first serialization failure. You can try a diff between debug.after.* but in this case that didn't work, the difference isn't actually output by the debug serializer, if they appear identical, you would need to use a binary diff between state.after.*.

It doesn't sound related though (other than being another possible cause of an OOS), since this particular error occurs during building construction.

comment:12 by ben, 9 years ago

Owner: set to ben
Resolution: fixed
Status: newclosed

In 13322:

Fixes serialization test failure caused by CmpFoundation's buildMultiplier property sometimes being NaN (which can assume different internal values in Spidermonkey). For now, NaN cannot be reliably serialized. Fixes #1828, refs #1879

Note: See TracTickets for help on using tickets.