Opened 7 years ago

Last modified 7 years ago

#4315 new defect

[PATCH] Disconnect detection is broken in lobby.

Reported by: scythetwirler Owned by: sbirmi
Priority: Should Have Milestone: Backlog
Component: Multiplayer lobby Keywords: patch
Cc: Patch:

Description

If you are kicked in the lobby, it shows "[your username] has left." but the entire lobby appears functional, though the client is actually disconnected.

Attachments (2)

screenshot01-lobby_disconnected.png (635.3 KB ) - added by sbirmi 7 years ago.
0ad handles tcp session termination gracefully
4315-lobby-kick-v1.patch (2.3 KB ) - added by sbirmi 7 years ago.
Return to main menu if we are kicked. See comment 5 in #4315 for details and questions.

Download all attachments as: .zip

Change History (14)

comment:1 by sbirmi, 7 years ago

Owner: set to sbirmi

comment:2 by elexis, 7 years ago

Reproduce:

  1. Join the multiplayer lobby
  2. Press F9 and execute Engine.StopXmppClient();

comment:3 by sbirmi, 7 years ago

I don't think that's the right way to reproduce *this* issue.

The original issue was specifically about being kicked. I am guessing we receive a leave message when kicked which is why the message says "[your username] has left" is visible.

                "leave": msg => {
                        addChatMessage({
                                "text": "/special " + sprintf(translate("%(nick)s has left."), {
                                        "nick": msg.text
                                }),
                                "isSpecial": true
                        });
                },

This seems to assume someone else is getting kicked? (we never check if the player itself got kicked and remains in the "lobby" view). I am trying to reproduce this specific case (of getting kicked).

I tried injecting a TCP reset packet in between and I see a "Disconnected. An I/O error occurred" and the player list on the left is cleared. This is again not the same as what the original ticket describes. To me, that was handled gracefully. (Host game is also grayed out in this case). I am attaching a screenshot of this. In this case, hitting "Main Menu" also returns us to the main menue gracefully.

By doing a "Engine.StopXmppClient()" from the console, we incorrectly free state so that hitting "Main Menu" then crashes likely because of a double free/double StopXmppClient() somewhere.

by sbirmi, 7 years ago

0ad handles tcp session termination gracefully

comment:4 by sbirmi, 7 years ago

Notes on reproducing the original issue

I injected packets (using "packit") in the TCP stream to simulate a not-so-graceful (TCP RESET) and a graceful (getting kicked) disconnection. Here are notes if someone else wants to try reproducing this/finds this useful.

  1. Run tcpdump and monitor port 5222 traffic (this is xmpp traffic. we will need TCP seq/ack numbers and the local port number)

# tcpdump -i any port 5222 -n -vvv --absolute-tcp-sequence-numbers

(in subsequent description, my local ip address is 192.168.1.6)

  1. The last packet in the exchange gives you the needed sequence and ack numbers. So if the last packet is:
06:48:16.723531 IP (tos 0x0, ttl 64, id 64609, offset 0, flags [DF], proto TCP (6), length 52) 
    192.168.1.6.38760 > 136.243.15.233.5222: Flags [.], cksum 0x288c (correct), seq 2159777185, ack 3606714551, win 396, options [nop,nop,TS val 13595552 ecr 631809573], length 0

local-tcp-port = 38760 send-tcp-seq = 3606714551 send-tcp-ack = 2159777185 (note that we are sending this packet, seq/ack numbers would otherwise swap).

  1. To inject a TCP RESET, use:

# packit -t TCP -i lo -s 136.243.15.233 -S 5222 -d 192.168.1.6 -D 38760 -F R -q 3606714551 -a 2159777185

You will see that the lobby already detects this correctly (see attachment screenshot01-lobby_disconnected.png)

  1. To emulate getting kicked, the TCP packet has this payload.

<presence from='arena22@conference.lobby.wildfiregames.com/sbirmi' to='sbirmi@lobby.wildfiregames.com/0ad' type='unavailable'><x xmlns='http://jabber.org/protocol/muc#user'><item affiliation='none' role='none'/><status code='307'/></x></presence>

This in hex is: "3c 70 72 65 73 65 6e 63 65 20 66 72 6f 6d 3d 27 61 72 65 6e 61 32 32 40 63 6f 6e 66 65 72 65 6e 63 65 2e 6c 6f 62 62 79 2e 77 69 6c 64 66 69 72 65 67 61 6d 65 73 2e 63 6f 6d 2f 73 62 69 72 6d 69 27 20 74 6f 3d 27 73 62 69 72 6d 69 40 6c 6f 62 62 79 2e 77 69 6c 64 66 69 72 65 67 61 6d 65 73 2e 63 6f 6d 2f 30 61 64 27 20 74 79 70 65 3d 27 75 6e 61 76 61 69 6c 61 62 6c 65 27 3e 3c 78 20 78 6d 6c 6e 73 3d 27 68 74 74 70 3a 2f 2f 6a 61 62 62 65 72 2e 6f 72 67 2f 70 72 6f 74 6f 63 6f 6c 2f 6d 75 63 23 75 73 65 72 27 3e 3c 69 74 65 6d 20 61 66 66 69 6c 69 61 74 69 6f 6e 3d 27 6e 6f 6e 65 27 20 72 6f 6c 65 3d 27 6e 6f 6e 65 27 2f 3e 3c 73 74 61 74 75 73 20 63 6f 64 65 3d 27 33 30 37 27 2f 3e 3c 2f 78 3e 3c 2f 70 72 65 73 65 6e 63 65 3e"

(note that sbirmi="73 62 69 72 6d 69" and appears twice)

The command to run is

# packit -t TCP -i lo -s 136.243.15.233 -S 5222 -d 192.168.1.6 -D 38760 -F PA -p "0x 3c 70 72 65 73 65 6e 63 65 20 66 72 6f 6d 3d 27 61 72 65 6e 61 32 32 40 63 6f 6e 66 65 72 65 6e 63 65 2e 6c 6f 62 62 79 2e 77 69 6c 64 66 69 72 65 67 61 6d 65 73 2e 63 6f 6d 2f 73 62 69 72 6d 69 27 20 74 6f 3d 27 73 62 69 72 6d 69 40 6c 6f 62 62 79 2e 77 69 6c 64 66 69 72 65 67 61 6d 65 73 2e 63 6f 6d 2f 30 61 64 27 20 74 79 70 65 3d 27 75 6e 61 76 61 69 6c 61 62 6c 65 27 3e 3c 78 20 78 6d 6c 6e 73 3d 27 68 74 74 70 3a 2f 2f 6a 61 62 62 65 72 2e 6f 72 67 2f 70 72 6f 74 6f 63 6f 6c 2f 6d 75 63 23 75 73 65 72 27 3e 3c 69 74 65 6d 20 61 66 66 69 6c 69 61 74 69 6f 6e 3d 27 6e 6f 6e 65 27 20 72 6f 6c 65 3d 27 6e 6f 6e 65 27 2f 3e 3c 73 74 61 74 75 73 20 63 6f 64 65 3d 27 33 30 37 27 2f 3e 3c 2f 78 3e 3c 2f 70 72 65 73 65 6e 63 65 3e" -q 3606714551 -a 2159777185

(note the addition of "0x" in the -p argument)

This reliably recreates both getting kicked and a TCP reset.

comment:5 by sbirmi, 7 years ago

There are three aspects to the fix:

  1. leave message handler detecting and notifying that we got kicked. I can think of 3 ways to do this:
    1. using a global variable (but that's just ugly)
    2. "leave" : msg => { ...; return <bool>; } but that's not generic enough to apply to other possible uses
    3. "leave" : msg => { ...; return {'kicked': true}; } which can be extended in the future. I have added a method to parse the response as well: netMessageResultAttr()
  1. When control returns to onTick, we call updatePlayerList() which was tripping if we already call returnToMainLobby() inside the leave handler. We can choose to move returnToMainLobby() here if that's preferred.
  1. Provide feedback to the player that s/he has been disconnected. I am using warn(). Can you recommend a better way?

The fix works (returns to main lobby with a warning if the player is kicked).

by sbirmi, 7 years ago

Attachment: 4315-lobby-kick-v1.patch added

Return to main menu if we are kicked. See comment 5 in #4315 for details and questions.

comment:6 by sbirmi, 7 years ago

Keywords: patch rfc added

comment:7 by sbirmi, 7 years ago

Summary: Disconnect detection is broken in lobby.[PATCH] Disconnect detection is broken in lobby.

comment:8 by elexis, 7 years ago

Keywords: rfc removed

First of all thanks for putting so much effort into reproducing the issue :)

With regards to the patch:

  • The netMessageResultAttr is bad as it is too generic but used only in a very specific (and only one) situation.
  • How can you conclude from the room notification that we have left the lobby, that we were kicked? How do we distinguish it from a ban and a usual network disconnect?
  • The network disconnect detection itself being broken must be relevant to this ticket. We should execute exactly the same code when kicking as on a regular disconnect (besides the different notification) without repeating the code, i.e. the "disconnected" notification must arrive. That one ought to be sent from XmppClient::onDisconnect in the c++ code.

comment:9 by elexis, 7 years ago

  • Engine.DisconnectXmppClient():
    • disconencts oneself from the lobby, that case is handled by the existing XmppClient::onDisconnect code.
    • is not used anywhere.
    • seems to be the only place calling the "disconnected" message in lobby.js, as a failed login or account registration is handled in prelobby.js
  • Engine.StopXmppClient() destroys the XmppClient and XmppClient::onDisconnect is not called, so we cannot receive the disconnect in case of calling this function (and is indeed not useful for reproducing the issue).
  • After a kick, Engine.LobbySetPlayerPresence("playing"); reconnects oneself to the lobby, which is for example called by clicking on the host button.
  • On timeout, nothing ever seems to happen.

So, you are right that the kick case is different. It is expected that the player loses the ability to communicate, but should still be able to read the chat messages that were sent just before the kick. As scythetwirler mentioned, the controls should become disabled.

comment:10 by elexis, 7 years ago

You have overlooked the most important three bytes in that packet: 307!

https://xmpp.org/registrar/mucstatus.html

301	presence	Removal from room	Inform user that he or she has been banned from the room
307	presence	Removal from room	Inform user that he or she has been kicked from the room
332	presence	Removal from room	Inform user that he or she is being removed from the room because of a system shutdown

comment:12 by elexis, 7 years ago

Milestone: Alpha 22Backlog

Kick/ban support added by r19250. Disconnect detection still broken in case of timeout (see fatherbushido's experience described in D116)! Seen some gloox threads about that somewhere.

Note: See TracTickets for help on using tickets.