#2305 closed enhancement (fixed)
[PATCH] UDP Hole Punching / NAT Traversal / STUN
Reported by: | Josh | Owned by: | fcxSanya |
---|---|---|---|
Priority: | Must Have | Milestone: | Alpha 22 |
Component: | Multiplayer lobby | Keywords: | patch, beta |
Cc: | fcxSanya@… | Patch: |
Description (last modified by )
Details
Right now in 0 A.D. we use a primitive UPnP system to implement NAT traversal, but this system is unreliable. We would like to see a better method using UDP Hole Punching instead. The solution would involve interacting with the lobby server to get traversal information. The recommended implementation would use STUN with ICE. It may be possible to (ab)use XEP-0176: Jingle ICE-UDP Transport Method for the desired result. Ejabberd has a built-in STUN server (Manual Entry). Both Ejabberd and Gloox support XEP-0176.
Expected Result
Enabling 0 A.D. clients to connect to a host behind any type of NAT.
Required Prerequisites
Fluency in C++ and understanding of NAT traversal. JavaScript knowledge would also be recommended. (Ability to learn/familiarity with erlang would also be required when dealing with the lobby server)
Helpful Links
Attachments (2)
Change History (49)
comment:1 by , 11 years ago
Component: | Core engine → Multiplayer lobby |
---|
comment:2 by , 11 years ago
Priority: | Release Blocker → Must Have |
---|
comment:3 by , 11 years ago
Milestone: | Alpha 15 → Alpha 16 |
---|
comment:4 by , 11 years ago
comment:5 by , 11 years ago
Description: | modified (diff) |
---|---|
Priority: | Must Have → Should Have |
Because we now have UPnP-based automatic port forwarding (which serves nearly the same purpose), I'm dropping this ticket's priority.
comment:7 by , 11 years ago
Description: | modified (diff) |
---|---|
Owner: | removed |
Priority: | Should Have → Must Have |
I'm not really working on this anymore as I have not been able to fully understand what implementing this would require.
comment:8 by , 10 years ago
Milestone: | Alpha 16 → Alpha 17 |
---|
comment:9 by , 10 years ago
Milestone: | Alpha 17 → Alpha 18 |
---|
comment:10 by , 10 years ago
Milestone: | Alpha 18 → Backlog |
---|
Backlogging this due to lack of activity.
comment:11 by , 9 years ago
Summary: | UDP Hole Punching / NAT Transversal → UDP Hole Punching / NAT Traversal |
---|
comment:12 by , 9 years ago
Description: | modified (diff) |
---|
Added a few bits of information to the description from my recent research.
comment:13 by , 9 years ago
Cc: | added |
---|
follow-up: 15 comment:14 by , 9 years ago
Description: | modified (diff) |
---|
It seems gloox has only data structures representing Jingle transport, but doesn't have any ICE logic (candidates gathering, exchange; connectivity checks; keeping connection alive etc.). libnice on the contrary is providing ICE logic, but as far as I understand it doesn't have signaling. I'm planning to write a proof-of-concept test application with gloox and libnice (essentially it requires only to combine sample apps for both libraries; also googling "gloox libnice" returns at least one application with available source which uses this combination of libraries (https://github.com/yokto/xicecat)).
comment:15 by , 9 years ago
Update for my previous comment:
I'm planning to write a proof-of-concept test application with gloox and libnice
I uploaded it here: https://github.com/AlexanderOlkhovskiy/gloox-libnice-test It basically works (I tested it only locally so far though), but requires some fixing, cleanup and refactoring.
comment:16 by , 8 years ago
As I reported on the staff forums at the end of April, recent few months I was even less active in 0 A.D. development than usually due to personal life events. Part of the message is related to this ticket:
At winter/start of spring (when I had a bit more free time) I've written a test gloox/libnice application (linked in my last comment in the ticket) and I stopped on libnice/enet integration: in 0 A.D. we are using one enet host on host and enet clients for each player (including the host one). libnice seems to be intended to one-to-one interaction, so for each client we need to have a separate agent (or at least a separate stream) on host (at least that's the best way I was able to find and the one I implemented in the aforementioned test application), so the only way to integrate libnice I saw was to rewrite the network code sending data via libnice itself (what's probably its intended way of usage, but isn't good in our case) or maybe I needed more time to research.
Also earlier Georg suggested on IRC to look at STK implementation of NAT traversal which is done via STUN (which initially (in RFC 3489) was intended as a complete solution, but was reconsidered to be used only as part of higher-level protocols (in particular ICE) later (see RFC 5389)) in which case as far as I understand we are doing a STUN request from the intended host endpoint (generating the inside/outside endpoint mapping on the router and learning the outside one's address/port) and then start a enet host on it (sharing the router's outside endpoint with clients). I looked at the STK code (the STUN request itself is here: https://github.com/supertuxkart/stk-code/blob/master/src/network/protocols/get_public_address.cpp) and had some things to clarify, but then I didn't have enough free time to proceed with it.
I hope to proceed with these things once I will have a bit more free time (unless someone will be too tired of waiting and would finish the NAT traversal himself).
Since then I've written another test application consisting of STUN-related code extracted from STK: https://github.com/AlexanderOlkhovskiy/stk-stun-test It's less sophisticated than ICE-based solution, but it should be much easier to integrate it into 0 A.D.
comment:17 by , 8 years ago
Description: | modified (diff) |
---|
follow-up: 19 comment:18 by , 8 years ago
Do you know how much time will pass until you have more time ? Thanks for your work so far =)
comment:19 by , 8 years ago
Replying to stanislas69:
Do you know how much time will pass until you have more time ?
Last few years I generally don't have much time due to work and family (that's why I asked to consider myself 'retired' from 0 A.D. dev team about 3 years ago, but I didn't left the project completely — I'm following the development and trying to contribute when possible).
Recent months were even busier/harder than usually, but it's a bit better now, so I hope to slowly proceed with 0 A.D. activities (in particular to attempt to integrate STUN-based NAT traversal solution into 0 A.D.).
As I've said before, I don't mind if someone else will work on this task too (either collaborating with me or doing an alternative solution in parallel), I started doing research on it in the first place because there wasn't much activity since Joshua's initial attempt, it's somewhat self-contained feature (rather than something related to frequently changed areas of code) and it's an interesting subject to me.
by , 8 years ago
Attachment: | stk_stun_wip_2016-08-28.diff added |
---|
comment:20 by , 8 years ago
Attached an early unfinished version of STK-derived STUN code integration: it doesn't work nor compiles yet, and STK-derived code has to be rewritten using 0 A.D. libraries/data structures/approaches/conventions/etc. The patch in the current state only outlines the planned integration (what code should go where).
In the current lobby implementation, host's port is configured in match setup and is sent in the register game stanza, IP is detected by an ejabberd plugin and inserted into the game record. The simplest way to integrate STUN (the one I used in the attached patch) is to include STUN endpoint into the same register game stanza / lobby game record besides the 'normal' one (aforementioned IP + port), this way STUN-based code is closer to the existing network implementation (for both lobby-based matchmaking and direct connection).
Another possible option is to publish only JID (rather than IP + port) in the game list, in this case before connecting to host, a client should exchange XMPP messages with it to discover available endpoint(s). This is the way ICE works.
License note (it was promptly discussed on IRC when Georg suggested to check out STK's implementation, but wasn't mentioned in this ticket yet): STK code is GPLv3+, so we have to: a) include it as is, which will make the resulting binary GPLv3+ too (what AFAIK is already the case when compiling with lobby, due to gloox) b) ask the relevant part of STK contributors (starting from Robin 'hilnius' Nicollet who has apparently wrote the initial STUN implementation in https://github.com/supertuxkart/stk-code/commit/613e97d0d6bb477d51412dbf09709df810a0e6a5) to relicense the part of code we need under GPLv2+ c) rewrite the code from scratch (it needs to be adapted to 0 A.D. in any case, but it would be convenient to have an option to leave some pieces as is)
comment:21 by , 8 years ago
Created a git branch to simplify publishing/tracking interim updates: https://github.com/AlexanderOlkhovskiy/0ad/commits/stk-stun (cd73fe3 corresponds to stk_stun_wip_2016-08-28.diff
attached above)
comment:22 by , 8 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:23 by , 8 years ago
hi fcxSanya,
I assigned this to me cause I would want to help you do this in my spare time. When you got time, please contact me via IRC to talk about the next step to do this.
Thx you
by , 8 years ago
Attachment: | fcxSanyaUdpPatchList.zip added |
---|
Summary of changes made by fcxSanya's branch
follow-up: 29 comment:24 by , 8 years ago
Hi,
I've downloaded and collected the commit list of the branch you created, so other people can apply the last patches of the UDP hole punch if they're using the svn version, like me, so it'ld be easier to test on existing local branches.
Of course, everytime you do a commit, I can update the .zip and overwrite the existing one.
Also I would want to know is if there's something left (apart from the testing and patching of course) I could help you with the coding, or it's completely fixed now. If that's right, I can unassign this ticket from me since you're the one who did it.
Thx for working on this, this patch will avoid lots of players quitting because they can't create matches for their friends or simply can't find a game to join.
comment:25 by , 8 years ago
Keywords: | patch added |
---|---|
Summary: | UDP Hole Punching / NAT Traversal → [PATCH] UDP Hole Punching / NAT Traversal |
Some code style things:
if (foo) {
->if (foo)\n{
gamesetup_mp.js
-> brackets not needed if there is only one command in the scope.cpp
and.h
file have the year and GPL license in the header. Since the STK code also uses GPL ("2 and 3+"), it's probably compatible to our license. The hint that our code is based on theirs should go toLICENSE.txt
in the root directory of 0AD- magic cookie, does that stand for something? (Like 20595 meaning PS for prometheus)
a--
our coding convention prefers--a
unless the othercase is explicitly needed// TODO: make STUN server configurable
-> Looking fordefault.cfg
andCFG_GET_VAL
?assert
-> we useENSURE
+ some string message usually, f.e.ENSURE(proto.isObject() && "A serializable prototype has to be an object!");
if (foo != NULL)
should be the same asif (foo)
(andif (foo == NULL)
->if (!foo)
, see also r18821)printf
probably not all but some of them could becomeLOGMESSAGE
,LOGMESSAGERENDER
orLOGERROR
m_foo_bar
->m_FooBar
StunClient::FindStunEndpoint
broken indentation- some commented out code, probably because it's all WIP
comment:26 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:27 by , 8 years ago
I unassigned it from me cause fcxSanya has done most of the work and I'm here to help, so owning the ticket doesn't feel legit.
comment:29 by , 8 years ago
Owner: | changed from | to
---|
Replying to javiergodas:
I've downloaded and collected the commit list of the branch you created
Thanks for gathering the archive, but as was discussed in the IRC chat (2016-10-09-QuakeNet-#0ad-dev.log 19:25 -- 19:49), you can always download the latest version of the git branch diff here: https://github.com/0ad/0ad/compare/master...AlexanderOlkhovskiy:stk-stun.diff and you can apply it to SVN copy with patch -p1
:
curl https://github.com/0ad/0ad/compare/master...AlexanderOlkhovskiy:stk-stun.diff | patch -p1
so there is no need to create/upload archives for each new commit
Also I would want to know is if there's something left (apart from the testing and patching of course) I could help you with the coding, or it's completely fixed now.
I've committed a fix for the assert issue I mentioned on IRC (2016-10-09-QuakeNet-#0ad-dev.log 18:49): 35da3c1. The remaining part of my message ("The StunClient code still needs to be adapted to 0ad standards and then the STUN endpoint has to be passed from host to clients via lobby (there is already a draft of these changes, but they are unfinished yet).") still applies. And again, as I've said earlier, you are welcome to participate in the coding if you want, but given that you don't yet have much experience with 0 A.D. codebase and sometimes I may be slow to respond, it may be easier/more convenient and productive for you to start from simpler/smaller/more self-containing features and return to this one later if I won't finish it until then.
Replying to elexis: thanks for the review! Some notes/answers for now:
StunClient
code style, printf/logging, commented out code: I started to adjust it to 0 A.D. conventions, but didn't finish yet- license: I just put the note there for now to have some attribution, will update it later, see also some license-related notes in http://trac.wildfiregames.com/ticket/2305#comment:20
gamesetup_mp.js
(and all other code in *.js files): that's just a draft to mark what code is expected to go where, so I didn't care about the conventions much, will fix it when will rewrite the code- "magic cookie" is a term/value from RFC 5389, the document doesn't state where the value comes from as far as I can see
default.cfg
andCFG_GET_VAL
: yeah, I just recently made a patch introducing a new config option for another ticket: #2845, but didn't reach this particular TODO item here yetassert
/ENSURE
: I replaced one ofassert
's with a 'normal if' in 35da3c1 and I think I will do the same with remaining ones, since they check the data based on incoming message from a third-party service (STUN server) rather than ensure that the code itself produces the proper data (what is their purpose as far as I understand)- broken indentation: did you look at the diff attached here (
stk_stun_wip_2016-08-28.diff
) or at the code in the github branch: https://github.com/0ad/0ad/compare/master...AlexanderOlkhovskiy:stk-stun ? I tried to fix the indentation in d5fef01 (which was committed into the git branch afterstk_stun_wip_2016-08-28.diff
, but before your review), I may have missed something or may have done something wrong though.
Replying to javiergodas:
I unassigned it from me
I started to do some research for this ticket at least in December 2015 (possible even earlier), but I didn't assign it to myself back then because I was far from making 0 A.D. patch itself at that time and I was progressing very slowly (due to business with work and family), so I didn't want to discourage more active contributors from working on it. Since I have a patch now, and regularly making some progress on it and plan to finish it in some foreseeable future, and there is no one else more actively working on this feature, I guess it makes sense to finally assign the ticket to myself to indicate my intention to finish it if noting else.
follow-up: 33 comment:30 by , 8 years ago
Good news[1]
, bad news[2]
and good news again[3]
:
[1]
with ecef7db I finished a 'draft' version (presumably working, but still requiring cleanup and adaptation to 0ad coding standards) of my patch, I tested it and NAT-mapped endpoint was discovered by STUN and published in the lobby game record, but... [2]
I wasn't able to connect to it :D (which is the whole point of this functionality). This simple STUN usage should work for 'Full Cone' NATs (see https://tools.ietf.org/html/rfc3489#section-5), but my device is apparently a 'Port Restricted Cone' one. [3]
Good news here is that we should be able to cover first 3 NAT types*
with the following somewhat-ICE-like (but still simplified) STUN usage:
- perform STUN request on host
- perform STUN request on client
- send client's STUN-discovered endpoint to host via XMPP
- send a 'hole-punching' message from host to the client's endpoint (we don't care whether client would be able to receive it)
- connect from client to host
I tested this approach with modified test application as a host, a simple UDP client as a client and with manually typing the endpoint instead of sending it via XMPP, and it seems working.
*
forth one, symmetric, requires either a relay (TURN) or complicated port-prediction technics: https://tools.ietf.org/id/draft-takeda-symmetric-nat-traversal-00.txt
comment:31 by , 8 years ago
Keywords: | beta added |
---|
comment:32 by , 8 years ago
Milestone: | Backlog → Alpha 22 |
---|
comment:33 by , 8 years ago
Replying to fcxSanya:
<...> we should be able to cover first 3 NAT types* with the following somewhat-ICE-like (but still simplified) STUN usage: <...>
I created an outline[1]
of this implementation in stk-stun-2 branch, it's based on the original stk-stun branch and for client/host XMPP interaction uses gloox ICEUDP in the same way as it was tested in gloox-libnice-test experimental application[2]
.
[1]
all steps from http://trac.wildfiregames.com/ticket/2305#comment:30 are added, but a hole-punching message itself is missing (should be easy to do since any UDP packet will work, probably most convenient solution is to re-use the STUN request as the code comment suggests), also glooxwrapper layer should be finished and some other small things here and there may be missing/unfinished
[2]
ICEUDP is used to pass a single ip:port pair from client to host, so ICEUDP transport may be a bit overkill and a standard message or a custom stanza may be used instead, but ICEUDP usage may be more clear/easier to understand and less intertwined with other messages processing.
comment:34 by , 8 years ago
Status update
The 'advanced' version (with a 'hole-punching' message from host to client) described in comment:30 is basically finished in 02c5b91 (the entire commit log/diff is here: https://github.com/0ad/0ad/compare/master...AlexanderOlkhovskiy:stk-stun-2).
We tested it with Sestroretsk1714 (lobby nickname) and it works for his 3G modem-based connection.
To my surprise, my own primary connection is routed through a provider's symmetric NAT now (at least 3 months ago my router received a public IP and stun test utility reported 'Port Restricted Cone' NAT (my own router); now router receives LAN IP and the utility says that there is a symmetric NAT on the route), so neither the current implementation of NAT traversal nor port forwarding does work with this connection anymore.
Currently the code doesn't build on Windows, but it should be easily fixable by including the proper headers. Also non-primary connection modes (direct connection, STUN enabled host with non-STUN client and vice versa etc.) may have some easily noticeable/fixable bugs. And the code still requires some style fixes (in particular part of those reported by elexis in comment:25)
follow-up: 36 comment:35 by , 8 years ago
(Can do the code style fixing too once the functionality is implemented if you want)
comment:36 by , 8 years ago
Status update
Windows build works starting from e2238f3 (see stk-stun-2 branch).
Turned out that Sestroretsk1714's connection (mentioned in my previous comment) had a full cone nat, which works even with the first simple nat traversal implementation (from stk-stun branch)
I have two port-restricted-nat mobile connections, which should be able to connect to each other using stk-stun-2 implementation. This doesn't work properly yet though. After some testing I came up with the two last commits in stk-stun-2-test branch (2ed4795 and 5ab4c93) and the following procedure:
- host on PC1
- attempt to join from PC2 (unsuccessufully)
- host on PC2
- join from PC1 (successfully)
(this apparently results in sending a punch-through message "manually", when it should be (and seemingly is) sent automatically (by CNetServer::SendHolePunchingMessage
))
In any case this means that nat traversal with this configuration is possible and actually works with the aforementioned extra changes / actions, I just need more experimenting / debugging to finish the properly working implementation.
Replying to elexis:
(Can do the code style fixing too once the functionality is implemented if you want)
Thanks for the offer! :) I will notify you when the logic will be done and work as intended :)
comment:37 by , 8 years ago
Added a couple more experimental changes into the stk-stun-2-test branch:
- 6e2b733 when joining, add a delay between sending endpoint to host and trying to connect
- 9b149f4 send multiple hole-punching messages between host and client
Now on some network configurations I'm able to connect from the first attempt. On other configurations I'm not able to connect at all (and the host / join switch procedure described in the previous comment doesn't help in such cases either). Under configurations I mean the same two mobile connections I mentioned before, I'm just reconnecting one of them receiving a different public ip (and presumable being routed via different hardware). Connections seemingly always remain of the same port-restricted nat type (I don't check the type each time, but when I did, I've never seen other types).
I plan to proceed with testing and in parallel I hope to finally start to clean-up and prepare the patch for review and testing by some theoretical volunteers :)
follow-up: 39 comment:38 by , 8 years ago
Also notice that it doesn't have to work perfectly in every case. Making it possible in many or most cases would already help tremendously. We can fix weird bugs in further releases.
comment:39 by , 8 years ago
Replying to elexis:
Also notice that it doesn't have to work perfectly in every case. Making it possible in many or most cases would already help tremendously. We can fix weird bugs in further releases.
That's my desire too. I'm finally somewhat satisfied with the current version (stk-stun-2-test), that's why I said I'm going to prepare it for review / testing with other people.
My current undestanding of its ability to connect (depending on the host / client nat types) is the following (host -- rows, client -- columns):
host \ client | full cone | a-restricted[1] | p-restricted[2] | symmetric |
---|---|---|---|---|
full cone | always | always | always | always |
a-restricted | always? | always? | always? | always? |
p-restricted | sometimes? | sometimes? | sometimes | never |
symmetric | never | never | never | never |
[1]
a-restricted = "(Address)-restricted cone"
[2]
p-restricted = "Port-restricted cone"
follow-up: 41 comment:40 by , 7 years ago
Description: | modified (diff) |
---|
Can you upload the patch to phabricator? I guess it's my call now to do some linting and testing. Would be quite awesome to have this in the upcoming release :-)
comment:41 by , 7 years ago
Replying to elexis:
Can you upload the patch to phabricator?
Done: https://code.wildfiregames.com/D364 (note that here are still TODO items (stated in the revision summary) which I plan to address in some near future).
comment:44 by , 6 years ago
Summary: | [PATCH] UDP Hole Punching / NAT Traversal → [PATCH] UDP Hole Punching / NAT Traversal / STUN |
---|
In 14332: