Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#3108 closed defect (fixed)

[PATCH] A18: Instant OOS between OS X 10.8 and other OSes

Reported by: elexis Owned by: Ben Brian
Priority: Release Blocker Milestone: Alpha 19
Component: Core engine Keywords: patch
Cc: Patch:

Description

Multiple people using OSX have reported that they receive an oos on every start of the game. ysey is one of them. He is on revision 16409, which is the currently released revision for OSX. I am on revision 16411. He sent me his oos dump after we started a game. The oos_dump is identical to mine! LjL reported the same issue in #0ad on march 17'th. trompetin17 and historicbruno do not seem to experience this issue with 16409.

Attachments (2)

hasher-output.zip (495.3 KB) - added by Ben Brian 4 years ago.
osx-libcpp-serialization-fixes.patch (6.8 KB) - added by Ben Brian 4 years ago.
Updated for latest SVN

Download all attachments as: .zip

Change History (23)

comment:1 Changed 4 years ago by elexis

Only background music changed between 16409 and 16411, so it can't be the version difference.

trompetin17 (OSX Yosemite + revision 16409) tested with me and we got no out of sync.

LJL reported in #0ad http://irclogs.wildfiregames.com/2015-03-17-QuakeNet-%230ad.log that he has OSX 10.8.5 (Mountain Lion).

comment:2 Changed 4 years ago by elexis

ysey has mac os version 10.8.5

comment:3 Changed 4 years ago by Ben Brian

I am able to reproduce this only on OS X 10.8. I used the -ooslog option to dump the binary state every turn, but I see no difference there. The only difference is in the calculated hash between the clients, my thought is there is a bug affecting the serializer or MD5 hash calculation on OS X 10.8. It may be possible to continue playing the game, since further hash mismatches will be ignored.

comment:4 in reply to:  3 Changed 4 years ago by elexis

Replying to historic_bruno:

It may be possible to continue playing the game, since further hash mismatches will be ignored.

True, ysey is playing happily ignoring the error. Only downside of this bug is the confusion other players have.

comment:5 Changed 4 years ago by Ben Brian

Summary: Alpha18 broken on some OSXA18: Instant OOS between OS X 10.8 and other OSes

Changed 4 years ago by Ben Brian

Attachment: hasher-output.zip added

comment:6 Changed 4 years ago by Ben Brian

The raw data that is hashed differs substantially from the dumped binary state in this case. I modified the HashSerializer to dump all processed bytes to a new file, those files are attached for analysis. They are identical up to the CCmpAIManager state, which is apparently missing from the OS X version. They are identical again after that missing data.

Philip's idea is that it might be a difference (or bug?) in the std::streambuf implementation. I set a breakpoint in CSerializerStreamBuf::xsputn which is supposed to do stream write redirection, but it's not called on OS X 10.8. CCmpAIManager has its own serializer and relies on this behavior.

Last edited 4 years ago by Ben Brian (previous) (diff)

comment:7 Changed 4 years ago by elexis

Milestone: BacklogAlpha 19

comment:8 Changed 4 years ago by Ben Brian

I've done more testing of this. It seems that streambuf::xsputn isn't required to be called, though in practice most C++ libs will do that as an optimization when writing strings. It was added to libc++ in r172542, which explains why our code worked with libstdc++ and works with libc++ on 10.9+.

Here's a minimal test case:

#include <iostream>
#include <streambuf>

class OurStreamBuf : public std::streambuf
{
public:
    OurStreamBuf()
    {
    }

    int overflow(int ch)
    {
        std::cout << "overflow: " << char(ch) << std::endl;
        return ch;
    }

    std::streamsize xsputn(const char* s, std::streamsize n)
    {
        std::cout << "xsputn: " << s << std::endl;
        return n;
    }
};

int main(int argc, char* argv[])
{
    OurStreamBuf ourbuf;
    std::streambuf& streambuf = ourbuf;
    std::ostream stream(&streambuf);

    stream << "This is a test\n";

    return 0;
}

In MSVC, GCC, clang w/ libstdc++, and clang with libc++ on 10.9+, one call to OurStreamBuf::xsputn is made with the complete test string. On clang with libc++ on 10.8, a call to OurStreamBuf::overflow is made for each character, but we don't implement that in the game, so it never makes it to the serializer implementation.

I think our code is technically incorrect, we need to override streambuf::overflow as well, even though it won't be called in most C++ libs - it's the technically correct thing to do and most examples I find of deriving streambuf are implemented that way. Single characters get passed in and if we want it buffered, we have to implement that ourselves.

comment:9 Changed 4 years ago by Ben Brian

Owner: set to Ben Brian
Status: newassigned

comment:10 Changed 4 years ago by Ben Brian

Reminder: add a test for this

comment:11 Changed 4 years ago by Ben Brian

Keywords: patch review added
Summary: A18: Instant OOS between OS X 10.8 and other OSes[PATCH] A18: Instant OOS between OS X 10.8 and other OSes

I've attached a patch for review, tested on OS X and Windows.

There are two points of concern:

  • I implemented a buffer to prevent a large number of calls from streambuf::overflow to the hash algorithm, but I'm not sure about the optimal size. In the patch it is 16 bytes, which seems to match nicely with the hash implementation, while not being too large and leading to memory fragmentation. I've opted for a fixed size array, not a vector.
  • The buffer must be flushed after the final serialization. A destructor doesn't seem like a good place for that, because then the hash result depends on the scope of the serializer object. I don't want to require users of the serializer to manually flush the stream either. The only solution I've found that works reliably is to set the unitbuf flag on the stream as early as possible, forcing every write to flush the buffer. Maybe this has implications on performance...

The buffer and associated streambuf functions should only be needed on old versions of libc++. To test them on other platforms, I think commenting out the overridden xsputn is enough. Simulation state hashes should be identical and there should be no OOSes.

I added a test to catch future problems like this in the streambuf implementation.

comment:12 Changed 4 years ago by Ben Brian

The patch also has a temporary timer added to the hash calculations. I have to say that with MSVC, sometimes the buffered streambuf had better performance than using xsputn, but depending on the buffer size it could also be worse. I haven't tested performance with other C++ libs.

comment:13 Changed 4 years ago by stanislas69

Could someone with a Mac try this ?

comment:14 Changed 4 years ago by Ben Brian

Note: m_Buffer for the StreamBuf? could be a pointer to a dynamically allocated array, that only gets created as needed by overflow(), if that's more efficient. Some implementations do that. It could also be something like a std::vector, but as with the current solution, performance would need to be evaluated.

Changed 4 years ago by Ben Brian

Updated for latest SVN

comment:15 Changed 4 years ago by stanislas69

Can someone test this ?

comment:16 Changed 4 years ago by stanislas69

I cannot reproduce the OOS with Maverics 10.9.5 so this is truly a Mountain Lion Problem, would be nice that it gets commited since we support up 10.7.

Last edited 4 years ago by stanislas69 (previous) (diff)

comment:17 Changed 4 years ago by Ben Brian

In latest SVN, there is some other OOS between OS X and Windows, so it's difficult for me to test this patch.

comment:18 Changed 4 years ago by elexis

While #3499 is open, you could try with a revision prior to r16751? If you revert to that revision, you could also see if that commit introduced #3499 or if its caused by something else.

comment:19 Changed 4 years ago by Ben Brian

That will take some time as reverting tons of files and recompiling isn't quick, but I don't have much choice it seems :(

comment:20 Changed 4 years ago by ben

Resolution: fixed
Status: assignedclosed

In 17133:

Fixes stream serialization bug on OS X 10.8 and older, which caused instant OOS in multiplayer games, fixes #3108.
Fixes test failures on OS X 10.7 and older, refs #3109

comment:21 Changed 4 years ago by Ben Brian

Keywords: review removed
Note: See TracTickets for help on using tickets.