#3264 closed enhancement (fixed)
[PATCH] Better timeout handling
Reported by: | elexis | Owned by: | elexis |
---|---|---|---|
Priority: | Should Have | Milestone: | Alpha 20 |
Component: | Network | Keywords: | patch |
Cc: | Patch: |
Description (last modified by )
Refs #69 Enhance Multiplayer Experience, #2854 Extensive lag may cause disconnections
Problem: When a client drops, the game hangs for up to 30 seconds sometimes. The game appears to be paused and no one knows what's going on (although you can be sure that someone is dropping after like 10 seconds).
Fix: (1) Some seconds after the last turn was processed, a message should be displayed for all clients that we are waiting for. (2) The maximum timeout duration could be reduced from 30 to 10-20 seconds.
Reproduce timeout pauses: Without modifying 0ad, I could reproduce a 5 to 8 second timeout, where the game appears to be paused for that duration. You need to start a game with another client and then kill that client with
kill -9 <pid>
To get the correct process ID use:
pidof pyrogenesis
Or just use this command to kill the 0ad instance with the highest PID (usually the second instance):
kill -9 `pidof pyrogenesis | tr ' ' '\n' | sort -r | head -n 1`
Following the timeout model of enet (see below), the 30 second timeout can only be reproduced if you have actual lag.
Technical Background:
Currently the client network timeouts are not set in 0ad, i.e. the default values of the enet library are used. They are hardcoded in enet.h
:
ENET_PEER_TIMEOUT_LIMIT = 32, ENET_PEER_TIMEOUT_MINIMUM = 5000, ENET_PEER_TIMEOUT_MAXIMUM = 30000,
The documentation doesn't have too much information. The code can be obtained from http://enet.bespin.org/Downloads.html.
In peer.c
you can find the following method to change the timeout settings for each peer:
/** Sets the timeout parameters for a peer. The timeout parameter control how and when a peer will timeout from a failure to acknowledge reliable traffic. Timeout values use an exponential backoff mechanism, where if a reliable packet is not acknowledge within some multiple of the average RTT plus a variance tolerance, the timeout will be doubled until it reaches a set limit. If the timeout is thus at this limit and reliable packets have been sent but not acknowledged within a certain minimum time period, the peer will be disconnected. Alternatively, if reliable packets have been sent but not acknowledged for a certain maximum time period, the peer will be disconnected regardless of the current timeout limit value. @param peer the peer to adjust @param timeoutLimit the timeout limit; defaults to ENET_PEER_TIMEOUT_LIMIT if 0 @param timeoutMinimum the timeout minimum; defaults to ENET_PEER_TIMEOUT_MINIMUM if 0 @param timeoutMaximum the timeout maximum; defaults to ENET_PEER_TIMEOUT_MAXIMUM if 0 */
Therefore timeouts can be changed in NetServer.cpp
in the function CNetServerWorker::RunStep
, for the case ENET_EVENT_TYPE_CONNECT
by adding for example:
enet_peer_timeout(event.peer, 0, 0, 10000);
The timeout logic can be found in protocol.c
.
If I understand the matter correctly, the client will be disconnected after 5 seconds if he had good latency before disconnecting. If he already had bad latency then enet will wait up to 30 seconds.
Attachments (5)
Change History (16)
comment:1 by , 9 years ago
Description: | modified (diff) |
---|
comment:2 by , 9 years ago
comment:3 by , 9 years ago
The hosting player should never be able to timeout.
If for some reason you had a timeout (i.e. lag for more than 30 seconds) it would kill the whole server.
Some reasons for such a huge timeout include sunken ship lag (r16636) or having 0ad on an external harddrive that goes to standby mode after starting the game and then needing more than 30 seconds to load later.
Moved to #3423
by , 8 years ago
Attachment: | t3264_show_network_warnings_WIP_v1.patch added |
---|
Shows a warning in the top right corner of the screen in gamesetup and session if a client is losing the connection or having a bad ping (< turn length).
by , 8 years ago
Attachment: | screenshot_ping.jpg added |
---|
by , 8 years ago
Attachment: | screenshot_timeout.jpg added |
---|
by , 8 years ago
Attachment: | video_ping.gif added |
---|
comment:5 by , 8 years ago
Reproduce bad latency and timeouts
With this bash script which uses traffic control it is possible to simulate arbitrary lag.
function setlag() { for device in "eth0"; do sudo tc qdisc del dev $device root netem; if [ $1 -ne 0 ]; then sudo tc qdisc add dev $device root netem delay ${1}ms; fi done }
The attached patch shows a warning in the top right corner of the screen in case the latency is worse than 110% of a turnlength (500ms) since in that case the game will be lagging notably.
Another warning is shown in case a client loses the connection (after 4 seconds). While the client is timing out, the game will not progress (for up to 30 seconds) before the client is disconnected.
These message give the host the possibility to kick the lagger or start a new game without that client.
Notice:
- If the client detects bad latency / disconnect, it will trigger a local warning.
- If the server detects bad latency / disconnect, it will send a packet to inform all other clients.
The packet CClientPerformanceMessage
can be reused for a future feature to show the latencies of all clients.
comment:6 by , 8 years ago
Keywords: | patch review added |
---|---|
Milestone: | Backlog → Alpha 20 |
Summary: | Better timeout handling → [PATCH] Better timeout handling |
Updates in the patch:
Makes netwarnings optional, but enabled by default.
Correctly sizes the textbox by exposing CalculateStringSize
to the ScriptFunctions
.
Checks for bad ping in gamesetup by assuming the turnlength to be constant.
Uses u32
for the RTT in the client-performance packet, since the client of the host should be able to freeze for more than 65 seconds
without being disconnected (see #2854).
Don't throw errors when receiving the new packet types while loading or synchronizing the game.
std::time_t
instead of u32
for std::time
.
Early return instead of ternary operator.
Use extern
instead of defining constants in header files.
Don't report while the loading screen is visible.
Don't process warnings if that client has been disconnected already (case "Unknown Player").
by , 8 years ago
Attachment: | t3264_show_network_warnings_v1.patch added |
---|
comment:7 by , 8 years ago
After the netwarnings, three more message types can be added to explain all known distinct lagtypes:
The notification area can be reused to show other warnings that impact on the simulation rate. To explain to the user why the game is not advancing, those three message types should be added too:
- Finish-rejoin lag: #3700 A network warning should be shown if the client needs more than 4 seconds to switch from
NCS_JOIN_SYNCING
toNCS_INGAME
. - Pausing #1950: Pausing is currently indistinguishable from lag. This notification area would in my opinion be a good place to inform the user who paused.
- Performance warnings:
- The option should become a dropdown option with 3 possible values (no warnings, network warnings, all warnings).
- If all warnings are enabled, it should show a message if a client needs more than turn_length to compute the last turn.
- This will be quite spammy since the engine can't run 1000 units on a map flawlessly. The default setting should therefore only show network warnings.
(Therefore it should be renamed from netwarnings to gamestate-notifications)
comment:9 by , 8 years ago
Keywords: | review removed |
---|
Actually it would be bad if we reduced the timeout duration. It would only result in more disconnects, since the performance is not optimized currently. If people are informed if the game is paused (#1950) or waiting for someone to timeout, then the issue should be resolved.
That line should still be added as a comment, so that it is easier to adapt the timeout to the performance in future revisions:
Also the game should display a different notification if a player quits the game on purpose, so that you can distinguish it from a network disconnect.