Opened 8 years ago
Last modified 3 years ago
#3700 closed defect
Running the NetClient in a separate thread — at Version 6
Reported by: | elexis | Owned by: | |
---|---|---|---|
Priority: | Must Have | Milestone: | Alpha 24 |
Component: | Network | Keywords: | beta |
Cc: | andy011973@… | Patch: |
Description (last modified by )
Since the NetClient is running in the same thread as the map generation, the loading screen can block the client entirely for a couple dozen seconds, so the client appears to be offline ("losing the connection") while joining or rejoining the game. This can cause the client to be disconnected while loading the game (some players can't rejoin anymore).
Some of the map generator calls are threaded, but it would be more sustainable to execute the client in a separate thread, which avoids repetition of this problem.
Original report (only one symptom of the underlying problem):
Since a chat message is displayed when a client has finished rejoining (#1949), one can easily observe a recurring, distinct type of lag ("finish-rejoin lag").
It happens whenever a game has been played for a while (so there is a lot of serialized data).
The game appears to be frozen for 5 to 15 seconds.
Here is what happens:
1. The client is starting to rejoin.
2. The server sends the client a lot of serialized data.
3. The client needs a significant time to deserialize it.
4. Meanwhile the game has continued (i.e. new turns).
5. In order to finish the rejoin, the client simulates the new turns while all other clients wait for it. This is the finish-rejoin lag.
With #3242 rejoins happen more often, so the problem accumulates more in that situation.
In order to solve or reduce the impact of the issue, the playing clients could progress one turn for every five turns that the rejoined client computes. Hence the rejoined client still catches up with the other clients, while those are still able to play.
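The one-in-five proposal can be illustrated with a minimal sketch. Both function names and the `ratio` parameter are hypothetical and do not exist in the game's code base; the sketch only shows the arithmetic of the proposal, assuming turns are counted as plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the game's code): the turn up to which the
// already-playing clients may advance, given how many catch-up turns the
// rejoining client has simulated so far. With ratio = 5 the playing clients
// gain one turn for every five turns the rejoining client computes.
std::uint32_t AllowedLiveTurn(std::uint32_t liveTurnAtRejoin,
                              std::uint32_t catchUpTurnsSimulated,
                              std::uint32_t ratio = 5)
{
    assert(ratio > 0);
    return liveTurnAtRejoin + catchUpTurnsSimulated / ratio;
}

// Hypothetical helper: number of turns the rejoining client has to simulate
// before it meets the playing clients, when it starts `backlog` turns behind.
std::uint32_t TurnsUntilCaughtUp(std::uint32_t backlog, std::uint32_t ratio = 5)
{
    assert(ratio > 1); // with ratio 1 the gap would never close
    std::uint32_t simulated = 0;
    // After `simulated` turns the playing clients have advanced by
    // simulated / ratio turns, so the gap closed so far is the difference.
    while (simulated - simulated / ratio < backlog)
        ++simulated;
    return simulated;
}
```

Under these assumptions a client that is 100 turns behind would need to simulate 124 turns to catch up, during which the other clients would play 24 turns instead of waiting.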
Change History (6)
comment:1 by , 8 years ago
comment:3 by , 7 years ago
Priority: | If Time Permits → Must Have |
---|
The problem is more severe than lag. Some players can't rejoin a game anymore and become disconnected due to extensive lag after the loading screen!
It's not necessarily the syncing after the map load, but also parts of the map loading process itself. It can be reproduced by starting a huge map in multiplayer mode and adding a LOGERROR("NetClientSession Poll %d", std::time(nullptr)); to CNetClientSession::Poll():
    ERROR: NetClientSession Poll 1483494187
    ERROR: NetClientSession Poll 1483494187
    ERROR: NetClientSession Poll 1483494188
    TIMER| ParseTerrain: 37.8426 ms
    ERROR: NetClientSession Poll 1483494188
    ERROR: NetClientSession Poll 1483494188
    TIMER| ParseEntities: 17.3228 s
    ERROR: NetClientSession Poll 1483494206
    ERROR: NetClientSession Poll 1483494206
    ERROR: NetClientSession Poll 1483494206
    ERROR: NetClientSession Poll 1483494206
    TIMER| common/modern/setup.xml: 140.615 us
    TIMER| common/modern/styles.xml: 148.302 us
    TIMER| common/modern/sprites.xml: 1.91044 ms
    TIMER| common/setup.xml: 261.822 us
    TIMER| common/setup_resources.xml: 77.32 us
    TIMER| common/sprites.xml: 523.206 us
    TIMER| common/styles.xml: 115.968 us
    TIMER| session/sprites.xml: 4.15153 ms
    TIMER| session/styles.xml: 195.216 us
    TIMER| session/session.xml: 111.865 ms
    TIMER| common/global.xml: 6.56328 ms
    GAME STARTED, ALL INIT COMPLETE
    ERROR: NetClientSession Poll 1483494220
    ERROR: NetClientSession Poll 1483494222
    TIMER| LoadDLL: 588.092 us
    ERROR: NetClientSession Poll 1483494231
    ERROR: NetClientSession Poll 1483494231
As we can see, the ParseEntities part of the random map loading code already mutes the NetClient for 17 seconds, which can be sufficient to disconnect it.
Two more undocumented tasks blocked the client for 14 seconds and 9 seconds respectively.
These tasks are apparently not run in a separate thread, unlike GenerateMap (and the entire server).
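The blocked intervals can be read off the log mechanically. The following is a small sketch, not code from the game: `FindPollGaps` and `kLoggedPolls` are hypothetical names, the timestamps are copied from the poll lines of the log above, and the 5 second threshold used in the usage note is an arbitrary assumption.

```cpp
#include <cstddef>
#include <vector>

// Poll timestamps (seconds since the epoch) copied from the log above.
const std::vector<long long> kLoggedPolls = {
    1483494187, 1483494187, 1483494188, 1483494188, 1483494188,
    1483494206, 1483494206, 1483494206, 1483494206,
    1483494220, 1483494222, 1483494231, 1483494231};

// Hypothetical helper: returns every gap between two consecutive polls
// that lasted at least `minGapSeconds`.
std::vector<long long> FindPollGaps(const std::vector<long long>& polls,
                                    long long minGapSeconds)
{
    std::vector<long long> gaps;
    for (std::size_t i = 1; i < polls.size(); ++i)
    {
        const long long gap = polls[i] - polls[i - 1];
        if (gap >= minGapSeconds)
            gaps.push_back(gap);
    }
    return gaps;
}
```

Calling `FindPollGaps(kLoggedPolls, 5)` yields gaps of 18, 14 and 9 seconds. At the one-second resolution of the timestamps, the first gap corresponds to the 17.3 s ParseEntities timer and the other two to the undocumented tasks.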
comment:4 by , 7 years ago
Keywords: | beta added |
---|---|
Milestone: | Backlog → Alpha 22 |
comment:5 by , 7 years ago
TODO: We should likely thread the entire client instead of identifying the individual bottlenecks. Would be more sustainable as the design pattern avoids repetition of such bugs.
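As a rough illustration of the idea, the sketch below polls on a dedicated thread so that a blocking task on the main thread no longer stops the polling. `PollingThread` and `CountPollsDuringBlockingLoad` are hypothetical names, not the game's API, and the sketch ignores the harder part of the change: handing received messages over to the simulation thread safely.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical wrapper (not the game's API): calls `poll` on its own thread
// at a fixed interval until Stop() is called or the object is destroyed.
class PollingThread
{
public:
    PollingThread(std::function<void()> poll, std::chrono::milliseconds interval)
        : m_Running(true),
          m_Worker([this, poll = std::move(poll), interval] {
              while (m_Running.load())
              {
                  poll();
                  std::this_thread::sleep_for(interval);
              }
          })
    {
    }

    ~PollingThread() { Stop(); }

    void Stop()
    {
        m_Running.store(false);
        if (m_Worker.joinable())
            m_Worker.join();
    }

private:
    // Declared before m_Worker so that it is initialized before the thread starts.
    std::atomic<bool> m_Running;
    std::thread m_Worker;
};

// Simulates a blocking load on the calling thread and returns how many polls
// happened in the meantime on the polling thread.
int CountPollsDuringBlockingLoad(std::chrono::milliseconds loadTime)
{
    std::atomic<int> polls{0};
    PollingThread poller([&polls] { ++polls; }, std::chrono::milliseconds(5));
    std::this_thread::sleep_for(loadTime); // stand-in for a task like ParseEntities
    poller.Stop();
    return polls.load();
}
```

With this structure the poll callback keeps running while the calling thread is busy, which is the property the single-threaded client lacks during map loading.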
comment:6 by , 7 years ago
Description: | modified (diff) |
---|---|
Summary: | Finish-rejoin lag → Running the NetClient in a separate thread |
The same issue is stated in NetServer.cpp of r10437 (in particular the part about clients not being happy):