Opened 8 years ago

Last modified 2 years ago

#3752 new enhancement

Improve multiplayer game responsiveness

Reported by: fcxSanya
Owned by: wraitii
Priority: Should Have
Milestone: Backlog
Component: Network
Keywords: beta
Cc: Krinkle
Patch: Phab:D3275

Description (last modified by elexis)

In r8400 the multiplayer turn length was increased to mask the effects of latency. Consequently, multiplayer has a longer delay between a player issuing an order and the order being performed (see, for example, the start of the match here: https://youtu.be/Dl85JBt6zGM?t=57s). According to Georg (aka leper):

the current plan is to make the turn length shorter and allow scheduling commands for not only the next turn but also N+2, N+3 and such (or some computations start on the next turn, but the result will only be used some turns later)

(The corresponding IRC log is here: http://irclogs.wildfiregames.com/2016-01-22-QuakeNet-%230ad-dev.log; there are also relevant discussions in the earlier logs.)

#69 also proposed to dynamically adjust the turn length, but according to Georg:

the main issue with dynamic turn lengths is lots of pointless complexity

Change History (18)

comment:1 by fcxSanya, 8 years ago

Type: defect → enhancement

comment:2 by elexis, 8 years ago

This seems to be a must-have:

Motivation:

  1. Eliminate network-induced simulation pauses causing lost lifetime and moonwalking units if there is a lagger.
  2. Improve the responsiveness if there is no lagger.

As observed in #69 and #3264, if the round trip time of a client exceeds the current turn length, the simulation is paused, waiting for the ack of that client.

While the simulation is paused, the units stay in their position while remaining in the walk animation (moonwalking).

Increasing the turn length to match the latency eliminates those simulation pauses.

If the simulation is never paused waiting for the lagger, in-game time and real time progress at the same speed (so no lifetime is lost to network lag anymore).

Since performance improvements in the engine can't fix network-induced lag, those optimizations should be done in other tickets.

Existing implementation (as of r18264): changing the turn length manually is implemented, but it is not done automatically.

Explanation: The host can call Engine.SetTurnLength(foo); from the JS command-line interface (F9 key). It will directly change m_TurnLength of the NetServerTurnManager. The new turn length is then broadcast to the clients when sending the EndCommandBatchMessage from CheckClientsReady() in the NetServerTurnManager. On receipt, the clients adopt that turn length in FinishedAllCommands() of the client's NetTurnManager.

To finish the implementation:

  • Adapt turn length: The turn length must be as short as possible while remaining just as long as needed, which is exactly the highest round-trip time of the connected clients. Changing the turn length to the highest ping when sending the EndCommandBatchMessage from the server is likely sufficient.
  • The replay menu needs to compute or save the duration of the game #3404
  • The lag warnings from #3264 could just keep the 500ms limit. A network window (#3787) could show the pings of all clients, in case the host wants to kick people with a ping below that limit.
  • Optional: Cleaning EndCommandBatchMessage: It is used for both directions (server->client, client->server) but with different connotations:
    • server sends it -> all clients are ready + dictate the next turn-length
    • client sends it -> that client is ready and doesn't use that turn-length field

The reuse of that message might be okay. But the currently ignored turn-length field of the client-sent EndCommandBatchMessage could be given an actual use: verifying that the previous turn length was correct. This would need some rewiring of m_SavedTurnLengths and potentially a new variable m_NextTurnLength to avoid off-by-one errors.
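
The "adapt turn length" step could look roughly like the sketch below. It is not engine code: ChooseTurnLength, minTurnMs and maxTurnMs are hypothetical names, and the clamping bounds are an assumption added so that a single outlier cannot make turns arbitrarily long or short. The ticket itself only asks for the highest round-trip time of the connected clients.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the engine: pick the next turn length
// (in milliseconds) from the measured round-trip times of all clients.
// minTurnMs and maxTurnMs are assumed bounds, not values from the ticket.
std::uint32_t ChooseTurnLength(const std::vector<std::uint32_t>& clientRttMs,
                               std::uint32_t minTurnMs,
                               std::uint32_t maxTurnMs)
{
    // With no remote clients, fall back to the shortest allowed turn.
    if (clientRttMs.empty())
        return minTurnMs;

    // The slowest client dictates how long a turn has to be.
    const std::uint32_t highestRtt =
        *std::max_element(clientRttMs.begin(), clientRttMs.end());

    // Clamp the result into [minTurnMs, maxTurnMs].
    return std::min(std::max(highestRtt, minTurnMs), maxTurnMs);
}
```

With pings of 80, 240 and 130 ms and bounds of 100 to 500 ms, this picks 240 ms. The server would call something like it just before sending the EndCommandBatchMessage.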


comment:3 by elexis, 8 years ago

In 18268:

Network cleanup, refs #3752.
Replace a TODO comment asking why something is set with a comment answering that.

comment:4 by elexis, 8 years ago

Component: Core engine → Network

Refs #4025.

comment:5 by elexis, 8 years ago

causative mentioned that some things in the simulation occur only once per turn (instead of being executed periodically every N simulation milliseconds).

For example if a unit is blocked by an obstruction, it will not try a new path until the next turn. Thus shorter turn lengths will yield much less annoying pathfinding behavior.

So either all these simulation components need to be readjusted to check for simulation time instead of turn number, or we could execute multiple turns inside the network latency timeframe (and execute these turns without waiting for the ack of a client to avoid simulation pauses).
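
The first option (checking simulation time instead of turn number) can be illustrated with a small accumulator. This is only a sketch: SimTimeTrigger and its members are hypothetical names, and real simulation components would need more than a counter. It shows that a periodic action, such as retrying a blocked path, can run at the same rate whether turns are 150 ms or 500 ms long.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timer: fires once per intervalMs of simulation time,
// regardless of how that time is divided into turns.
class SimTimeTrigger
{
public:
    explicit SimTimeTrigger(std::uint32_t intervalMs)
        : m_IntervalMs(intervalMs), m_AccumulatedMs(0)
    {
        assert(intervalMs > 0);
    }

    // Call once per turn with that turn's length in milliseconds.
    // Returns how many times the periodic action should run this turn.
    std::uint32_t OnTurn(std::uint32_t turnLengthMs)
    {
        m_AccumulatedMs += turnLengthMs;
        const std::uint32_t fires = m_AccumulatedMs / m_IntervalMs;
        // Keep the remainder so that no simulation time is lost.
        m_AccumulatedMs %= m_IntervalMs;
        return fires;
    }

private:
    std::uint32_t m_IntervalMs;
    std::uint32_t m_AccumulatedMs;
};
```

With a 200 ms interval, two 500 ms turns fire 2 and then 3 times, while four 150 ms turns fire 0, 1, 1 and 1 times. In both cases the action runs once per 200 ms of simulation time on average.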

comment:6 by elexis, 7 years ago

According to Philip (see irclogs yesterday), this behavior is already implemented with COMMAND_DELAY, so we should check whether, for example, COMMAND_DELAY = 3 and DEFAULT_TURN_LENGTH_MP = 150 works (and doesn't respond to lag before 450ms rtt).
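
As a rough illustration of what a command delay means, the sketch below schedules a command issued during turn N for turn N + delay. DelayedCommandQueue, Issue and TakeCommandsFor are hypothetical names, and commands are plain strings here; the engine's turn manager is organised differently.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical command queue, not the engine's implementation.
class DelayedCommandQueue
{
public:
    explicit DelayedCommandQueue(std::uint32_t commandDelay)
        : m_CommandDelay(commandDelay)
    {
    }

    // A command issued during currentTurn is scheduled m_CommandDelay
    // turns later. Returns the turn on which it will be executed.
    std::uint32_t Issue(std::uint32_t currentTurn, const std::string& command)
    {
        const std::uint32_t targetTurn = currentTurn + m_CommandDelay;
        m_Scheduled[targetTurn].push_back(command);
        return targetTurn;
    }

    // Returns and removes the commands due on the given turn,
    // in the order in which they were issued.
    std::vector<std::string> TakeCommandsFor(std::uint32_t turn)
    {
        std::vector<std::string> due;
        const auto it = m_Scheduled.find(turn);
        if (it != m_Scheduled.end())
        {
            due = std::move(it->second);
            m_Scheduled.erase(it);
        }
        return due;
    }

private:
    std::uint32_t m_CommandDelay;
    std::map<std::uint32_t, std::vector<std::string>> m_Scheduled;
};
```

With a delay of 3, commands issued on turn 10 come back from TakeCommandsFor(13), so the network has the intervening turns to deliver them to every client.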

comment:7 by scythetwirler, 7 years ago

Keywords: beta added

comment:8 by echotangoecho, 7 years ago

Owner: set to echotangoecho
Status: new → assigned

comment:9 by echotangoecho, 7 years ago

Currently, with increased COMMAND_DELAY an assertion fails: NetServerTurnManager.cpp(49): Assertion failed: "turn == m_ClientsReady[client] + 1"

comment:10 by elexis, 6 years ago

Description: modified (diff)
Priority: Should Have → Must Have

If I had written the ticket today:
Title: TurnManager should handle observer timeouts and lag more fault-tolerantly
Description:
Observers are often frowned upon because of the possibility of them delaying the game by
(1) network timeouts
(2) bad network latency
(3) rejoins
(4) insufficient simulation performance

The root cause of all four problems is the server turn manager waiting for the affected observer to compute the next turn before progressing with the simulation.

The severity of networking problems can be reduced by improving the UI (turning the network warnings into "kick" buttons) and by improving the simulation performance.

But the task of this ticket is to establish "graceful degradation", meaning to teach the NetServerTurnManager to progress the simulation in spite of one observer client being affected by one of these four conditions.

The NetServerTurnManager can handle the four cases equally.

In CNetServerWorker::StartGame we find

m_ServerTurnManager->InitialiseClient(session->GetHostID(), 0); // TODO: only for non-observers

But because rejoin syncs might never finish, and because we don't want observers commenting on different times of the game, the NetServerTurnManager must not disregard slow observers entirely. Instead, it should wait for observers that are too slow with the simulation or rejoin sync to catch up, but only for a limited amount of time, for instance 8 simulation turns or a dynamically adjusted amount of time.
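
A minimal sketch of that rule, under stated assumptions: ClientState, CanAdvanceTo and maxObserverLagTurns are hypothetical names, and "ready" is reduced to the last turn a client reported as finished. Players must have finished the previous turn, while observers may fall behind by a bounded number of turns before the server waits for them.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-client bookkeeping, not the engine's data structure.
struct ClientState
{
    std::uint32_t lastReadyTurn; // last turn this client reported as finished
    bool isObserver;
};

// Returns true if the server may start nextTurn.
// Players must have finished nextTurn - 1. Observers get an extra
// allowance of maxObserverLagTurns turns (8 in the example above).
bool CanAdvanceTo(std::uint32_t nextTurn,
                  const std::vector<ClientState>& clients,
                  std::uint32_t maxObserverLagTurns)
{
    for (const ClientState& client : clients)
    {
        const std::uint64_t allowedLag =
            client.isObserver ? maxObserverLagTurns : 0;
        // 64-bit arithmetic avoids overflow near the maximum turn number.
        const std::uint64_t furthestAllowed =
            static_cast<std::uint64_t>(client.lastReadyTurn) + 1 + allowedLag;
        if (furthestAllowed < nextTurn)
            return false;
    }
    return true;
}
```

All four conditions listed above (timeout, latency, rejoin, slow simulation) show up the same way here: the observer's lastReadyTurn stops advancing, and the simulation continues until the allowance is used up.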

comment:11 by Stan, 6 years ago

Might be stupid, but can't they just get the computed state from, say, the host?

comment:12 by Krinkle, 5 years ago

Cc: Krinkle added

comment:13 by wraitii, 3 years ago

Owner: changed from echotangoecho to wraitii
Patch: Phab:D3275
Status: assigned → new

The key here is indeed the command delay, as command_delay * turn_length = max lag before things freeze.

Decreasing turn lengths certainly could work, but it has diminishing returns, and the current value of 200ms in SP seems 'fine enough' in general.

I think it would be interesting to also reduce the COMMAND_DELAY dynamically, though possibly harder. However, if we were to go with Phab:D3275, it'd probably be necessary: with a delay of 4, we can't really have 'low lag' unless we use <125ms turns, which seems low. In SP, we effectively have a COMMAND_DELAY of 1, and 200ms turns.

(having turn lengths be too low means the ratio of "time spent computing turns" vs "time spent computing graphics" might become too unfavourable, resulting in really low FPS overall, not just 'spiky' fps).
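
The arithmetic in this comment can be written out as two small helpers. The function names are hypothetical. The 500 ms target used in the usage note is an assumption, inferred from the "<125ms turns" figure given for a delay of 4; the comment does not state it.

```cpp
#include <cassert>
#include <cstdint>

// command_delay * turn_length, as stated above: the lag (in ms) that can
// be absorbed before the simulation freezes.
std::uint32_t MaxLagBeforeFreezeMs(std::uint32_t commandDelay,
                                   std::uint32_t turnLengthMs)
{
    return commandDelay * turnLengthMs;
}

// Inverse question: the longest turn (in ms) that keeps the product at or
// below targetLagMs for a given command delay.
std::uint32_t LongestTurnForLagMs(std::uint32_t targetLagMs,
                                  std::uint32_t commandDelay)
{
    assert(commandDelay > 0);
    return targetLagMs / commandDelay;
}
```

MaxLagBeforeFreezeMs(1, 200) gives 200 for the single-player case, and MaxLagBeforeFreezeMs(4, 200) gives 800. LongestTurnForLagMs(500, 4) gives 125, which matches the turn length mentioned above if 500 ms is taken as the 'low lag' target.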

comment:14 by wraitii, 3 years ago

Milestone: Backlog → Alpha 24

Bumping to A24 though it's probably a bit late in the release cycle... This is at least an A25 RB imo.

comment:15 by Stan, 3 years ago

Milestone: Alpha 24 → Alpha 25
Priority: Must Have → Release Blocker

comment:16 by wraitii, 3 years ago

In 25001:

Increase MP Command delay to 4 turns, decrease MP turns to 200ms.

To hide network latency, MP turns send commands not for the next turn but N turns after that (introduced in rP).
Further, MP turn length was increased to 500ms compared to 200ms SP turns (introduced in rP7551).
Unfortunately, increasing MP turn length has negative consequences:

  • makes pathfinding/unit motion much worse, and unit behaviour worse in general.
  • makes the game more 'lag-spikey', since computations are done less often but each one can then take more time.

This diff essentially reverts rP8400, instead increasing COMMAND_DELAY from 2 to 4 in MP. This:

  • Reduces the 'inherent command lag' in MP from 1000ms to 800ms
  • Increases the lag range at which MP will run smoothly from 500ms to 600ms
  • makes SP and MP turns behave identically again, removing the hindrances described above.

Single-player was not actually using COMMAND_DELAY; as a side effect of this change it now does (this can be used to simulate MP command lag).

Refs #3752

Differential Revision: https://code.wildfiregames.com/D3275
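
The figures in this commit message can be reproduced with the sketch below. TurnConfig and both functions are hypothetical names. The formula for the smooth range, (COMMAND_DELAY - 1) * turn length, is an assumption: it is one reading that matches the 500 ms and 600 ms figures, not something the commit states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of multiplayer settings.
struct TurnConfig
{
    std::uint32_t commandDelay; // in turns
    std::uint32_t turnLengthMs; // in milliseconds
};

// Time between issuing a command and the turn on which it executes.
std::uint32_t InherentCommandLagMs(const TurnConfig& config)
{
    return config.commandDelay * config.turnLengthMs;
}

// ASSUMPTION: one of the delayed turns is not available to the network,
// leaving (commandDelay - 1) full turns to absorb latency.
std::uint32_t SmoothLagRangeMs(const TurnConfig& config)
{
    assert(config.commandDelay >= 1);
    return (config.commandDelay - 1) * config.turnLengthMs;
}
```

For the old settings {2, 500} this gives 1000 ms of command lag and a 500 ms smooth range. For the new settings {4, 200} it gives 800 ms and 600 ms, in line with the list above.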

comment:17 by wraitii, 3 years ago

Milestone: Alpha 25 → Alpha 26
Priority: Release Blocker → Should Have

comment:18 by Freagarach, 2 years ago

Milestone: Alpha 26 → Backlog