Opened 8 years ago

Last modified 4 years ago

#3643 reopened defect

[PATCH] Server NetTurnManager crash when a client cancels a rejoin and rejoins again — at Version 10

Reported by: elexis Owned by:
Priority: Should Have Milestone: Backlog
Component: Network Keywords:
Cc: Patch:

Description (last modified by elexis)

Today a game on r17298 crashed after I late-joined a game as spectator, cancelled the rejoin by killing the process and then rejoining again or something similar. I could reproduce the issue at least once.

NetTurnManager.cpp(586): Assertion failed: "turn == m_ClientsReady[client] + 1"
Assertion failed: "turn == m_ClientsReady[client] + 1"
Location: NetTurnManager.cpp:586 (NotifyFinishedClientCommands)

Call stack:

(0x96d68b) ./pyrogenesis() [0x96d68b]
(0x915861) ./pyrogenesis() [0x915861]
(0x9170ca) ./pyrogenesis() [0x9170ca]
(0x4552e9) ./pyrogenesis() [0x4552e9]
(0x46f927) ./pyrogenesis() [0x46f927]
(0x461362) ./pyrogenesis() [0x461362]
(0x46ca3f) ./pyrogenesis() [0x46ca3f]
(0x47334e) ./pyrogenesis() [0x47334e]
(0x47399d) ./pyrogenesis() [0x47399d]
(0x473a7d) ./pyrogenesis() [0x473a7d]
(0x7f978d0f96aa) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa) [0x7f978d0f96aa]
(0x7f978ce2eeed) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f978ce2eeed]

errno = 0 (Try again later)
OS error = ?

Also this finite state machine error pops up on the server when cancelling the rejoin sometimes (but doesn't produce a crash):

ERROR: Net server: Error running FSM update (type=19 state=4)

MessageType 19 means NMT_SYNC_CHECK and server state 4 means means SERVER_STATE_INGAME.


Thanks to fatherbushido for sending the crashlog, allowing to understand the error!

Change History (15)

comment:1 by elexis, 8 years ago

Here the last messages of fatherbushidos crashlog:

<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4147, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4148, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4149, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4150, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4151, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4152, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4153, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4154, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CSimulationMessage { m_Client: 1, m_Player: 1, m_Turn: 4155, m_Data: ({type:"attack", entities:[12115, 12506, 12702, 12876, 12877, 13090, 13234], target:11715, allowCapture:true, queued:false}) } of size 224 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4155, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4156, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4157, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CSimulationMessage { m_Client: 7, m_Player: 3, m_Turn: 4158, m_Data: ({type:"construct", template:"structures/iber_civil_centre", x:270.0105285644531, z:945.39990234375, angle:2.356194490192345, actorSeed:15341, entities:[213, 214, 6907, 6908, 6909, 6910, 6911, 7161, 7164, 8006, 8008, 8009, 8178, 8179, 8180, 8181, 8182, 8695, 8696, 8697, 8698, 8699, 8871, 8872, 8873, 9047, 9048, 9049, 9050, 9051, 9160, 9161, 9162, 9163, 9164, 9328, 9329, 9330, 9331, 9577, 9578, 9579, 9580, 9581, 10133, 10231, 11031, 11032, 11033, 11034, 11035, 11077, 11078, 11079, 11080, 11215, 11216, 11217, 11218, 11219, 13617, 14730], autorepair:true, autocontinue:true, queued:true}) } of size 1105 to [7F03C2CA...]</p>
<p>Net: Sending message CEndCommandBatchMessage { m_Turn: 4158, m_TurnLength: 500 } of size 9 to [7F03C2CA...]</p>
<p>Net: Sending message CLoadedGameMessage { m_CurrentTurn: 4158 } of size 7 to [7F03C2CA...]</p>
<p>Net server: Received message CRejoinedMessage { m_GUID:  } of size 7 from [7F03C2CA...]</p>
<p>Net: Sending message CRejoinedMessage { m_GUID: 7F03C2CA5EAD7D9D } of size 23 to [7DCDFED8...]</p>
<p>Net: Sending message CRejoinedMessage { m_GUID: 7F03C2CA5EAD7D9D } of size 23 to [A1692034...]</p>
<p>Net: Sending message CRejoinedMessage { m_GUID: 7F03C2CA5EAD7D9D } of size 23 to [6094DF57...]</p>
<p>Net: Sending message CRejoinedMessage { m_GUID: 7F03C2CA5EAD7D9D } of size 23 to [7F03C2CA...]</p>
<p>Net server: Received message CEndCommandBatchMessage { m_Turn: 4136, m_TurnLength: 500 } of size 9 from [7F03C2CA...]</p>
<p>Net: Sending message CSimulationMessage { m_Client: 1, m_Player: 1, m_Turn: 4160, m_Data: ({type:"attack", entities:[14240, 14238, 13090, 12508, 12506, 12118, 12117, 12116, 12115, 12114, 12704, 12703, 12702, 12701, 12700, 12509, 12507, 12877, 12876, 13238, 13237, 13236, 12878, 13235, 13234, 12874, 13087, 14239, 14242, 13088, 14241, 14716, 13086, 14715], target:14554, allowCapture:true, queued:false}) } of size 569 to server</p>
<p>Net client: Disconnected</p>
<p>Net message: ({type:"netstatus", status:"disconnected", reason:0})</p>

Notice 7F03C2CA5EAD7D9D is my GUID and that the last CEndCommandBatchMessage sticks out as it has turn 4136 instead of 4158 or above:

<p>Net server: Received message CEndCommandBatchMessage { m_Turn: 4136, m_TurnLength: 500 } of size 9 from [7F03C2CA...]</p>

Not sure if relevant, but I noticed the server continues to send the serialized state even when the client has disconnected already.

by elexis, 8 years ago

Attachment: crashlog_reproduce.txt.7z added

This is the crashlog of my accidental reproduce.

comment:2 by elexis, 8 years ago

Milestone: BacklogAlpha 20

Putting this on the milestone as this crash occured thrice the last couple of days (and it kills the server thread, so the game is irrevocably gone (which it wouldnt if it only crashed the client thread like #3638).

Should be solvable by removing an ENSURE and reseting the expected turn number on rejoin.

by elexis, 8 years ago

Attachment: destroy_server.patch added

by elexis, 8 years ago

Attachment: defuse_v1.patch added

by elexis, 8 years ago

Attachment: defuse_v2.patch added

Not tested yet with the destroy-patch

by elexis, 8 years ago

Attachment: defuse_v2.2.patch added

comment:3 by elexis, 8 years ago

Keywords: patch review added
Summary: Server NetTurnManager crash when a client cancels a rejoin and rejoins again[PATCH] Server NetTurnManager crash when a client cancels a rejoin and rejoins again

Still not sure what rid the client to send that message with a turn number of the past.

One hypothesis is that the client adds a simulation command to m_QueuedCommands after finishing the loading screen but before finishing the rejoin (Synchronising gameplay with other players...) and then sends the CEndCommandBatchMessage with the turnnumber of the past (after raching the official current turn). Failed to reproduce it by opening the developer overlay in that timeframe.

Whatever is causing it, it shouldn't kill the whole game (even a bugged game might be better than expierencing a total shutdown). The patch removes the ENSURE breakpoint, shows an error message and disconnects the client.

comment:4 by elexis, 8 years ago

Milestone: Alpha 20Alpha 21

comment:5 by elexis, 8 years ago

Keywords: review removed

Works with regards to not crashing, but the NetTurnManager still expects a wrong turnnumber from the clients. The game will appear to be paused.

comment:6 by elexis, 8 years ago

Backlogging due to lack of progress.

comment:7 by elexis, 8 years ago

Milestone: Alpha 21Backlog

comment:8 by elexis, 8 years ago

Component: Core engineNetwork

(set component to network)

comment:9 by elexis, 7 years ago

Milestone: BacklogAlpha 22
Priority: Must HaveRelease Blocker

Doesn't have to stay a release blocker inevitably, but killed too many MP games to go unconsidered.

Last edited 7 years ago by elexis (previous) (diff)

comment:10 by elexis, 7 years ago

Description: modified (diff)
Milestone: Alpha 22Alpha 23
Priority: Release BlockerMust Have
Note: See TracTickets for help on using tickets.