Opened 8 years ago

Last modified 2 years ago

#3752 new enhancement

Improve multiplayer game responsiveness — at Version 10

Reported by: fcxSanya
Owned by: echotangoecho
Priority: Should Have
Milestone: Backlog
Component: Network
Keywords: beta
Cc: Krinkle
Patch:

Description (last modified by elexis)

In r8400 the multiplayer turn length was increased to mask the effects of latency. As a result, multiplayer games have a longer delay between a player issuing an order and that order being performed (see for example the start of the match here: https://youtu.be/Dl85JBt6zGM?t=57s). According to Georg (aka leper):

the current plan is to make the turn length shorter and allow scheduling commands for not only the next turn but also N+2, N+3 and such (or some computations start on the next turn, but the result will only be used some turns later)

(the corresponding IRC log is here: http://irclogs.wildfiregames.com/2016-01-22-QuakeNet-%230ad-dev.log also there are relevant discussions in the earlier logs)

#69 also proposed to dynamically adjust the turn length, but according to Georg:

the main issue with dynamic turn lengths is lots of pointless complexity

Change History (10)

comment:1 by fcxSanya, 8 years ago

Type: defect → enhancement

comment:2 by elexis, 8 years ago

This seems to be a must-have:

Motivation:

  1. Eliminate network-induced simulation pauses causing lost lifetime and moonwalking units if there is a lagger.
  2. Improve the responsiveness if there is no lagger.

As observed in #69 and #3264, if the round trip time of a client exceeds the current turn length, the simulation is paused, waiting for the ack of that client.

While the simulation is paused, the units stay in their position while remaining in the walk animation (moonwalking).

Adapting the turn length eliminates those simulation pauses by increasing the turn length whenever a client's round trip time would otherwise exceed it.

If the simulation is never paused waiting for the lagger, in-game time and real time progress at the same speed (so no lifetime is lost to network lag anymore).

Since performance improvements in the engine can't fix network induced lag, these optimizations should be done in other tickets.

Existing implementation: In r7936 changing the rate manually (from JS) was implemented.

Explanation: The host can call Engine.SetTurnLength(foo); from the JS command-line interface (F9 key). It will directly change m_TurnLength of the NetServerTurnManager. The new turn length is then broadcast to the clients when sending the EndCommandBatchMessage from CheckClientsReady() in the NetServerTurnManager. On receipt, the clients adopt that turn length in FinishedAllCommands() of the client's NetTurnManager.

To finish the implementation:

  • Adapt turn length: The turn length must be as short as possible while remaining just as long as needed, which is exactly the highest round trip time of the connected clients. Changing the turn length to the highest ping when sending the EndCommandBatchMessage from the server is likely sufficient.
  • The replay menu needs to compute or save the duration of the game (#3404).
  • The lag warnings from #3264 could just keep the 500ms limit. A network window (#3787) could show the pings of all clients, in case the host wants to kick people with a ping below that limit.
  • Optional: Cleaning EndCommandBatchMessage: It is used for both directions (server->client, client->server) but with different connotations:
    • server sends it -> all clients are ready + dictate the next turn-length
    • client sends it -> that client is ready and doesn't use that turn-length field

The reuse of that message might be okay. But the turn-length field, which is currently ignored in the client-sent EndCommandBatchMessage, could be given an actual use: verifying that the previous turn length was correct. This would need some rewiring of m_SavedTurnLengths and potentially a new variable m_NextTurnLength to avoid off-by-one errors.
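A minimal sketch of the "adapt turn length" step from the list above, assuming the server already has a round trip time measurement per connected client. The helper ChooseTurnLength and its clamp bounds are hypothetical and not part of the existing NetServerTurnManager; the result is what the server would announce with its next EndCommandBatchMessage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not an existing engine function.
// Picks the turn length (in ms) as the highest client round trip time,
// clamped to [minTurnMs, maxTurnMs] so that a single bad measurement
// cannot make turns arbitrarily short or long.
inline uint32_t ChooseTurnLength(const std::vector<uint32_t>& clientRttMs,
                                 uint32_t minTurnMs, uint32_t maxTurnMs)
{
    assert(minTurnMs <= maxTurnMs);

    uint32_t highest = 0;
    for (uint32_t rtt : clientRttMs)
        highest = std::max(highest, rtt);

    // With no clients (or very low pings) this falls back to minTurnMs.
    return std::min(std::max(highest, minTurnMs), maxTurnMs);
}
```

The clamp is a design choice of this sketch, not something the ticket specifies; the upper bound could, for instance, match the 500ms lag-warning limit.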


comment:3 by elexis, 8 years ago

In 18268:

Network cleanup, refs #3752.
Replace a TODO comment asking why something is set with a comment answering that.

comment:4 by elexis, 8 years ago

Component: Core engine → Network

Refs #4025.

comment:5 by elexis, 8 years ago

causative mentioned that some things in the simulation occur only once per turn (instead of being executed periodically every N simulation milliseconds).

For example if a unit is blocked by an obstruction, it will not try a new path until the next turn. Thus shorter turn lengths will yield much less annoying pathfinding behavior.

So either all these simulation components need to be readjusted to check for simulation time instead of turn number, or we could execute multiple turns inside the network latency timeframe (and execute these turns without waiting for the ack of a client to avoid simulation pauses).
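The first option (checking simulation time instead of turn number) can be sketched as follows. This is only an illustration under assumed names: FiresInTurn is hypothetical, and it assumes a component knows the simulation time elapsed before the turn and the turn's length.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an existing engine function.
// Returns how many times a periodic action with the given interval
// becomes due during one turn, based on simulation time rather than
// on the turn counter. elapsedBeforeMs is the simulation time at the
// start of the turn.
inline uint32_t FiresInTurn(uint32_t elapsedBeforeMs, uint32_t turnLengthMs,
                            uint32_t intervalMs)
{
    assert(intervalMs > 0);
    return (elapsedBeforeMs + turnLengthMs) / intervalMs
         - elapsedBeforeMs / intervalMs;
}
```

With this approach the action runs at the same rate per simulated second whether turns are 150ms or 500ms long, which is the property the comment asks for.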

comment:6 by elexis, 8 years ago

According to Philip (see yesterday's IRC logs), this behavior is already implemented with COMMAND_DELAY, so we should check whether, for example, COMMAND_DELAY = 3 and DEFAULT_TURN_LENGTH_MP = 150 works (and doesn't react to lag below 450ms RTT).
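The arithmetic behind the 450ms figure, as a small sketch. It assumes that a command issued during turn N is scheduled for turn N + COMMAND_DELAY, so the tolerated round trip time is simply the product of the two values. The helper names are hypothetical, and the constants are the values proposed in this comment, not necessarily the engine's current ones.

```cpp
#include <cassert>
#include <cstdint>

// Values proposed in the comment above (assumption: not the current defaults).
const uint32_t COMMAND_DELAY = 3;            // turns
const uint32_t DEFAULT_TURN_LENGTH_MP = 150; // milliseconds

// Hypothetical helper: turn in which a command issued during
// currentTurn would be executed.
inline uint32_t ExecutionTurn(uint32_t currentTurn, uint32_t commandDelay)
{
    return currentTurn + commandDelay;
}

// Hypothetical helper: round trip time (ms) that can be absorbed
// before the simulation has to wait for a client.
inline uint32_t ToleratedRttMs(uint32_t commandDelay, uint32_t turnLengthMs)
{
    assert(turnLengthMs > 0);
    return commandDelay * turnLengthMs;
}
```

The trade-off in this model is that shorter turns give finer-grained simulation updates, while the total order delay stays at COMMAND_DELAY times the turn length.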

comment:7 by scythetwirler, 7 years ago

Keywords: beta added

comment:8 by echotangoecho, 7 years ago

Owner: set to echotangoecho
Status: new → assigned

comment:9 by echotangoecho, 7 years ago

Currently, with increased COMMAND_DELAY an assertion fails: NetServerTurnManager.cpp(49): Assertion failed: "turn == m_ClientsReady[client] + 1"

comment:10 by elexis, 6 years ago

Description: modified (diff)
Priority: Should Have → Must Have

If I had written the ticket today:
Title: TurnManager should handle observer timeouts and lag more fault-tolerantly
Description:
Observers are often frowned upon because of the possibility of them delaying the game by
(1) network timeouts
(2) bad network latency
(3) rejoins
(4) insufficient simulation performance

The root for all four problems is the server-turnmanager waiting for the affected observer to compute the next turn before progressing with the simulation.

The severity of networking problems can be reduced by improving the UI (making the network warnings "kick" buttons) and improving the simulation performance.

But the task of this ticket is to establish "graceful degradation", meaning to teach the NetServerTurnManager to progress the simulation in spite of one observer client being affected by one of these four conditions.

The NetServerTurnManager can handle the four cases equally.

In CNetServerWorker::StartGame we find

m_ServerTurnManager->InitialiseClient(session->GetHostID(), 0); // TODO: only for non-observers

However, rejoin syncs might never finish, and we don't want observers commenting on a different point in the game than the one the players are at. So the NetServerTurnManager must not disregard slow observers entirely: it should wait for observers that are too slow with the simulation or the rejoin sync to catch up, but only for a limited amount of time, for instance 8 simulation turns or a dynamically adjusted amount of time.
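A minimal sketch of that bounded wait, assuming the server tracks the last turn each client has reported as finished. ClientState and CanAdvance are hypothetical names, not the existing NetServerTurnManager interface; the limit of 8 turns is the example value from this comment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-client bookkeeping, not an existing engine type.
struct ClientState
{
    uint32_t readyTurn; // last turn this client reported as finished
    bool isObserver;
};

// Hypothetical helper: returns true if the server may finish `turn`.
// Players must always be ready. Observers may lag behind, but only by
// at most maxObserverLagTurns; beyond that the server waits for them.
inline bool CanAdvance(const std::vector<ClientState>& clients,
                       uint32_t turn, uint32_t maxObserverLagTurns)
{
    for (const ClientState& client : clients)
    {
        if (client.readyTurn >= turn)
            continue; // this client is up to date

        if (!client.isObserver)
            return false; // a player is behind: always wait

        if (turn - client.readyTurn > maxObserverLagTurns)
            return false; // observer is too far behind: wait
    }
    return true;
}
```

All four conditions listed above (timeouts, latency, rejoins, slow simulation) show up the same way in this model, as an observer whose readyTurn falls behind, which is why a single check can cover them.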
