Opened 8 years ago
Last modified 4 years ago
#3643 reopened defect
[PATCH] Server NetTurnManager crash when a client cancels a rejoin and rejoins again — at Version 10
Reported by: | elexis | Owned by: | |
---|---|---|---|
Priority: | Should Have | Milestone: | Backlog |
Component: | Network | Keywords: | |
Cc: | Patch: |
Description (last modified by )
Today a game on r17298 crashed after I late-joined it as a spectator, cancelled the rejoin by killing the process and then rejoined again, or something similar. I could reproduce the issue at least once.
NetTurnManager.cpp(586): Assertion failed: "turn == m_ClientsReady[client] + 1"
Assertion failed: "turn == m_ClientsReady[client] + 1"
Location: NetTurnManager.cpp:586 (NotifyFinishedClientCommands)
Call stack:
(0x96d68b) ./pyrogenesis() [0x96d68b]
(0x915861) ./pyrogenesis() [0x915861]
(0x9170ca) ./pyrogenesis() [0x9170ca]
(0x4552e9) ./pyrogenesis() [0x4552e9]
(0x46f927) ./pyrogenesis() [0x46f927]
(0x461362) ./pyrogenesis() [0x461362]
(0x46ca3f) ./pyrogenesis() [0x46ca3f]
(0x47334e) ./pyrogenesis() [0x47334e]
(0x47399d) ./pyrogenesis() [0x47399d]
(0x473a7d) ./pyrogenesis() [0x473a7d]
(0x7f978d0f96aa) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa) [0x7f978d0f96aa]
(0x7f978ce2eeed) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f978ce2eeed]
errno = 0 (Try again later)
OS error = ?
Also, this finite state machine error sometimes pops up on the server when cancelling the rejoin (but doesn't produce a crash):
ERROR: Net server: Error running FSM update (type=19 state=4)
MessageType 19 means NMT_SYNC_CHECK and server state 4 means SERVER_STATE_INGAME.
Thanks to fatherbushido for sending the crashlog, which made it possible to understand the error!
Change History (15)
comment:1 by , 8 years ago
by , 8 years ago
Attachment: | crashlog_reproduce.txt.7z added |
---|
This is the crashlog of my accidental reproduction.
comment:2 by , 8 years ago
Milestone: | Backlog → Alpha 20 |
---|
Putting this on the milestone as this crash occurred thrice in the last couple of days, and it kills the server thread, so the game is irrevocably gone (which it wouldn't be if it only crashed the client thread, like #3638).
Should be solvable by removing an ENSURE and resetting the expected turn number on rejoin.
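The second half of that idea, resetting the expected turn on rejoin, could look roughly like the sketch below. It is not the actual pyrogenesis code: `TurnTracker`, `OnClientRejoined`, `IsExpectedTurn` and `MarkFinished` are hypothetical names, and only `m_ClientsReady` and the `turn == m_ClientsReady[client] + 1` invariant are taken from the assertion in the description.

```cpp
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-in for the server-side turn bookkeeping.
// Only m_ClientsReady and the "+ 1" invariant come from the quoted assertion;
// everything else is assumed for illustration.
class TurnTracker
{
public:
    // Called when a client (re)joins while the simulation is at currentTurn.
    // Overwrites any stale entry left behind by a cancelled rejoin, so the
    // next turn expected from this client is currentTurn + 1.
    void OnClientRejoined(int client, uint32_t currentTurn)
    {
        m_ClientsReady[client] = currentTurn;
    }

    // Returns true if the turn is the one expected next from this client.
    bool IsExpectedTurn(int client, uint32_t turn) const
    {
        auto it = m_ClientsReady.find(client);
        return it != m_ClientsReady.end() && turn == it->second + 1;
    }

    // Records that the client finished its commands for the given turn.
    void MarkFinished(int client, uint32_t turn)
    {
        m_ClientsReady[client] = turn;
    }

private:
    std::map<int, uint32_t> m_ClientsReady;
};
```

With this shape, a second rejoin simply replaces whatever the cancelled first attempt left in the map, instead of relying on the old entry still being valid.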
by , 8 years ago
Attachment: | destroy_server.patch added |
---|
by , 8 years ago
Attachment: | defuse_v1.patch added |
---|
by , 8 years ago
Attachment: | defuse_v2.2.patch added |
---|
comment:3 by , 8 years ago
Keywords: | patch review added |
---|---|
Summary: | Server NetTurnManager crash when a client cancels a rejoin and rejoins again → [PATCH] Server NetTurnManager crash when a client cancels a rejoin and rejoins again |
Still not sure what causes the client to send that message with a turn number from the past.
One hypothesis is that the client adds a simulation command to m_QueuedCommands after finishing the loading screen but before finishing the rejoin (Synchronising gameplay with other players...) and then sends the CEndCommandBatchMessage with the turn number from the past (after reaching the official current turn).
Failed to reproduce it by opening the developer overlay in that timeframe.
Whatever is causing it, it shouldn't kill the whole game (even a bugged game might be better than experiencing a total shutdown). The patch removes the ENSURE breakpoint, shows an error message and disconnects the client.
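As a rough illustration of that approach (not the attached patch itself), the check from the failed assertion can be turned into a reported error with a disconnect request. `TurnValidator`, `TurnCheck`, `AddClient`, `OnFinishedClientCommands` and `HasClient` are hypothetical names, and dropping the client from the map on rejection is an assumption of this sketch.

```cpp
#include <cstdint>
#include <iostream>
#include <map>

// Outcome of validating a client's "finished commands" notification.
enum class TurnCheck { Accepted, Disconnect };

// Hypothetical stand-in for the server turn manager; only m_ClientsReady and
// the "turn == m_ClientsReady[client] + 1" invariant come from the crash log.
class TurnValidator
{
public:
    void AddClient(int client, uint32_t readyTurn)
    {
        m_ClientsReady[client] = readyTurn;
    }

    // Same condition as the failed assertion, but reported instead of
    // enforced: an unexpected turn logs an error and requests a disconnect.
    TurnCheck OnFinishedClientCommands(int client, uint32_t turn)
    {
        auto it = m_ClientsReady.find(client);
        if (it == m_ClientsReady.end() || turn != it->second + 1)
        {
            std::cerr << "Net server: client " << client
                      << " sent unexpected turn " << turn << std::endl;
            // Assumption: forget the client so it no longer takes part in
            // the turn bookkeeping once it is disconnected.
            m_ClientsReady.erase(client);
            return TurnCheck::Disconnect;
        }
        it->second = turn;
        return TurnCheck::Accepted;
    }

    bool HasClient(int client) const
    {
        return m_ClientsReady.count(client) != 0;
    }

private:
    std::map<int, uint32_t> m_ClientsReady;
};
```

The caller would act on `TurnCheck::Disconnect` by closing that client's session, so one misbehaving client costs a disconnect rather than the whole server thread.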
comment:4 by , 8 years ago
Milestone: | Alpha 20 → Alpha 21 |
---|
comment:5 by , 8 years ago
Keywords: | review removed |
---|
Works with regard to not crashing, but the NetTurnManager still expects a wrong turn number from the clients. The game will appear to be paused.
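One way such a stall could arise is sketched below, under the assumption that the server releases a turn only once every tracked client has reported it as finished. `ReadyTable` and `AllReadyFor` are hypothetical names, and the release rule itself is an assumption of this sketch rather than something confirmed in this ticket.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical model: turn N is released only once every tracked client has
// reported N as finished. This rule is assumed for the sake of the sketch.
struct ReadyTable
{
    std::map<int, uint32_t> clientsReady;

    // True if all tracked clients have finished at least the given turn.
    bool AllReadyFor(uint32_t turn) const
    {
        return std::all_of(clientsReady.begin(), clientsReady.end(),
            [turn](const std::pair<const int, uint32_t>& entry) {
                return entry.second >= turn;
            });
    }
};
```

Under that assumption, a single entry that stays at an old turn because its messages keep being rejected would block `AllReadyFor` indefinitely, which would look like a paused game to everyone else. Removing or resetting that entry would unblock it.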
comment:7 by , 8 years ago
Milestone: | Alpha 21 → Backlog |
---|
comment:9 by , 7 years ago
Milestone: | Backlog → Alpha 22 |
---|---|
Priority: | Must Have → Release Blocker |
Doesn't necessarily have to stay a release blocker, but this has killed too many MP games to be ignored.
comment:10 by , 7 years ago
Description: | modified (diff) |
---|---|
Milestone: | Alpha 22 → Alpha 23 |
Priority: | Release Blocker → Must Have |
In the last messages of fatherbushido's crashlog, notice that 7F03C2CA5EAD7D9D is my GUID and that the last CEndCommandBatchMessage sticks out, as it has turn 4136 instead of 4158 or above.
Not sure if relevant, but I noticed the server continues to send the serialized state even when the client has already disconnected.