Opened 9 years ago

Last modified 3 years ago

#3373 new enhancement

Reduce latency by using multiple ENet channels

Reported by: elexis
Owned by:
Priority: If Time Permits
Milestone: Backlog
Component: Network
Keywords:
Cc:
Patch:

Description (last modified by elexis)

0 A.D. uses the ENet protocol (http://enet.bespin.org/) in order to send reliable, sequenced messages that are automatically fragmented and reassembled without using the slow TCP protocol.

Issue

Currently 0 A.D. only uses one channel for all traffic, thus introducing unnecessary lag. From NetSession.cpp and NetServer.cpp:

static const int CHANNEL_COUNT = 1;

and from NetHost.h:

static const int DEFAULT_CHANNEL = 0;

The only call to enet_peer_send in 0 A.D. occurs in CNetHost::SendMessage and uses DEFAULT_CHANNEL.


Reason

Sequenced packets can cause lag: Because ENet sequences packets, it does not deliver a packet to 0 A.D. until all preceding packets have been received. Delaying delivery means latency / lag. See http://enet.bespin.org/Features.html (Sequencing):

ENet guarantees that no packet with a higher sequence number will be delivered before a packet with a lower sequence number, thus ensuring packets are delivered exactly in the order they are sent. For reliable packets, if a higher sequence number packet arrives, but the preceding packets in the sequence have not yet arrived, ENet will stall delivery of the higher sequence number packets until its predecessors have arrived.

Using multiple channels reduces that lag: Different channels can be used to allow concurrent traffic. See http://enet.bespin.org/Features.html (Channels):

Since ENet will stall delivery of reliable packets to ensure proper sequencing, and consequently any packets of higher sequence number whether reliable or unreliable, in the event the reliable packet's predecessors have not yet arrived, this can introduce latency into the delivery of other packets which may not need to be as strictly ordered with respect to the packet that stalled their delivery. To combat this latency and reduce the ordering restrictions on packets, ENet provides multiple channels of communication over a given connection. Each channel is independently sequenced, and so the delivery status of a packet in one channel will not stall the delivery of other packets in another channel.
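The stalling behaviour quoted above can be illustrated with a toy model. The following is only a sketch and not ENet's implementation: the class SequencedChannel and its methods are made up for this example. Each instance sequences its own packets, so a gap in one instance never holds back another.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model of one reliably sequenced channel (hypothetical, not ENet code).
class SequencedChannel
{
public:
	// Accept a packet that arrived from the network and return every packet
	// that can now be handed to the application, in sequence order.
	std::vector<std::string> Receive(std::uint32_t sequence, const std::string& payload)
	{
		std::vector<std::string> deliverable;
		if (sequence < m_NextExpected)
			return deliverable; // duplicate of an already delivered packet

		m_Pending[sequence] = payload;

		// Deliver for as long as there is no gap in the sequence numbers.
		auto it = m_Pending.find(m_NextExpected);
		while (it != m_Pending.end())
		{
			deliverable.push_back(it->second);
			m_Pending.erase(it);
			++m_NextExpected;
			it = m_Pending.find(m_NextExpected);
		}
		return deliverable;
	}

	// Number of packets that arrived but are held back by a missing predecessor.
	std::size_t StalledCount() const { return m_Pending.size(); }

private:
	std::uint32_t m_NextExpected = 0;
	std::map<std::uint32_t, std::string> m_Pending;
};
```

With one instance for simulation commands and one for chat, a chat packet is delivered immediately even while a simulation packet is waiting for its predecessor. With a single shared instance, the chat packet would wait as well.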


Implementation

How many channels to use: The following things should run on a separate channel:

  • Default: Simulation relevant and miscellaneous
  • Chat
  • File-Transfers (rejoined clients)
  • OOS-Checks

The channel numbers should be hardcoded in NetMessages.h where the other protocol constants reside.
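A possible shape for these constants, as a sketch only: every identifier below is hypothetical, and the ExampleMessageType enum is a stand-in for the real message type constants, which are not reproduced here. The value returned by GetChannelForMessage would be passed as the channel argument of enet_peer_send in place of DEFAULT_CHANNEL.

```cpp
#include <cassert>

// Hypothetical channel identifiers; names and values are illustrative only.
enum NetChannel
{
	CHANNEL_DEFAULT = 0,       // simulation-relevant and miscellaneous messages
	CHANNEL_CHAT = 1,          // chat messages
	CHANNEL_FILE_TRANSFER = 2, // downloads for rejoining clients
	CHANNEL_SYNC_CHECK = 3,    // OOS checks
	CHANNEL_COUNT              // number of channels; takes the place of the current value 1
};

// Stand-in message types for this example (assumption: not the real constants).
enum ExampleMessageType
{
	EXAMPLE_SIMULATION_COMMAND,
	EXAMPLE_CHAT,
	EXAMPLE_FILE_TRANSFER_DATA,
	EXAMPLE_SYNC_CHECK,
	EXAMPLE_SYNC_ERROR
};

// Map a message type to the channel it should be sent on.
// Anything not listed explicitly stays on the default channel.
inline NetChannel GetChannelForMessage(ExampleMessageType type)
{
	switch (type)
	{
	case EXAMPLE_CHAT:
		return CHANNEL_CHAT;
	case EXAMPLE_FILE_TRANSFER_DATA:
		return CHANNEL_FILE_TRANSFER;
	case EXAMPLE_SYNC_CHECK:
	case EXAMPLE_SYNC_ERROR:
		return CHANNEL_SYNC_CHECK;
	default:
		return CHANNEL_DEFAULT;
	}
}
```

Keeping the mapping in one function means the single send call site only has to look up the channel instead of every caller choosing one.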

Why chat messages can have their own channel: It is not a problem if the game continues before all chat messages have been received. It is also not a problem if chat messages are processed before earlier simulation commands of that client have been received. Cheats are not sent as chat but parsed locally and then sent as a simulation command (see comment:10:ticket:3545). So there can't be any ordering issues that might cause OOS.

Why downloads can have their own channel: If a client rejoins, it downloads the serialized simulation state of the host in many fragments. Meanwhile the game / simulation continues (which was implemented so that players don't have to wait for the download to finish). Since the sequencing is done independently for each client and since the rejoining client can't send any other messages, a separate channel shouldn't be required in theory. However, it is good practice to move concurrent downloads to another channel, and in future we might add other download types. (In that case we should probably use one channel per concurrent download.)

Why OOS-Check messages can have their own channel: The OOS-check messages NMT_SYNC_CHECK and NMT_SYNC_ERROR should also run on a different channel. The Sync-Check and Sync-Error messages carry the turn number and hash-value, see NetMessages.h:

START_NMT_CLASS_(SyncCheck, NMT_SYNC_CHECK)
	NMT_FIELD_INT(m_Turn, u32, 4)
	NMT_FIELD(CStr, m_Hash)
END_NMT_CLASS()

START_NMT_CLASS_(SyncError, NMT_SYNC_ERROR)
	NMT_FIELD_INT(m_Turn, u32, 4)
	NMT_FIELD(CStr, m_HashExpected)
END_NMT_CLASS()

The hash-matching is already done asynchronously in CNetServerTurnManager::NotifyFinishedClientUpdate (which is also why OOS dumps sometimes contain simulation states of different turns, see #3348):

// Find the newest turn which we know all clients have simulated
// For every set of state hashes that all clients have simulated, check for OOS
...
// Oh no, out of sync
// Tell everyone about it

Change History (6)

comment:1 by elexis, 9 years ago

Description: modified (diff)

comment:2 by elexis, 9 years ago

Description: modified (diff)

comment:3 by elexis, 8 years ago

Priority: Should Have → If Time Permits

Won't change much since most of the lag is "performance" lag. If actual network lag appears, the user will have a notification #3264.

comment:4 by elexis, 8 years ago

Owner: elexis removed

comment:5 by elexis, 8 years ago

Component: Core engine → Network

(set component to network)

comment:6 by wraitii, 3 years ago

In 24518:

Thread the NetClient session.

This threads the netclient session, which avoids timeouts when the main-thread is not polling (map creation, very long simulation frames).

Unlike the NetServer, which should be as independent as possible from the main thread, the NetClient is fundamentally tied to the game thread. Therefore, this only threads the session object.
To ensure good performance and ease-of-use, lock-free queues for in/out-going messages are used.

This fixes artificial timeouts, while also improving actual ping reports (since frame-time is no longer a factor).
It effectively reverts D1513/rP21842 and rP17772, all hacks around lag-timeouts (and bits of rP18140).

Based on a patch by: Stan

Comments by: Vladislavbelov

Fixes #3700, refs #3373

Differential Revision: https://code.wildfiregames.com/D2848
