#2338 closed defect (fixed)
[PATCH] Crash in FreeUPnPUrls on OS X 64-bit
Reported by: | Petr | Owned by: | Echelon9 |
---|---|---|---|
Priority: | Release Blocker | Milestone: | Alpha 17 |
Component: | Core engine | Keywords: | patch |
Cc: | Patch: |
Description (last modified by )
When I go to Multiplayer / Host game, 0AD will crash back to my desktop. This happens always. I am using Alpha 15 64bit, latest MAC OS X, MacBook Pro.
The crashing thread in both reports is thread 10 (the UPnP thread). The error is in both cases "pointer being freed was not allocated" in FreeUPnPUrls(), so we seem to have an incorrect pointer in the urls struct. (Possibly unrelated, but GetUPNPUrls() doesn't check for success of malloc)
Thread 10 Crashed: 0 libsystem_kernel.dylib 0x00007fff8dc6b866 __pthread_kill + 10 1 libsystem_pthread.dylib 0x00007fff94c7b35c pthread_kill + 92 2 libsystem_c.dylib 0x00007fff9008dbba abort + 125 3 libsystem_malloc.dylib 0x00007fff91a06093 free + 411 4 com.wildfiregames.0ad 0x000000010067b328 FreeUPNPUrls + 152 5 com.wildfiregames.0ad 0x0000000100018aba CNetServerWorker::SetupUPnP(void*) + 1438 6 libsystem_pthread.dylib 0x00007fff94c7a899 _pthread_body + 138 7 libsystem_pthread.dylib 0x00007fff94c7a72a _pthread_start + 137 8 libsystem_pthread.dylib 0x00007fff94c7efc9 thread_start + 13
Thread 10 Crashed: 0 libsystem_kernel.dylib 0x00007fff8dc6b866 __pthread_kill + 10 1 libsystem_pthread.dylib 0x00007fff94c7b35c pthread_kill + 92 2 libsystem_c.dylib 0x00007fff9008dbba abort + 125 3 libsystem_malloc.dylib 0x00007fff91a06093 free + 411 4 com.wildfiregames.0ad 0x000000010067b328 FreeUPNPUrls + 152 5 com.wildfiregames.0ad 0x0000000100018aba CNetServerWorker::SetupUPnP(void*) + 1438 6 libsystem_pthread.dylib 0x00007fff94c7a899 _pthread_body + 138 7 libsystem_pthread.dylib 0x00007fff94c7a72a _pthread_start + 137 8 libsystem_pthread.dylib 0x00007fff94c7efc9 thread_start + 13
Attachments (14)
Change History (56)
by , 10 years ago
by , 10 years ago
Attachment: | errorHD4000 added |
---|
same situation, but on integrated graphic card, Intel HD 4000
comment:1 by , 10 years ago
If I read those crashlogs correctly, the crash is occurring in the system after the GUI does a draw call. This appears to be an bug on Apple's side.
comment:2 by , 10 years ago
This only happens when hosting a game from the lobby? Can you try other code paths, like playing single player games, etc. to see if there is a similar crash? It doesn't seem related to the lobby at all. It's too bad how many problems 10.9 is causing :(
comment:3 by , 10 years ago
I can host and play online with external ip, I can join and play via Lobby, but join only, when I try to host game, it crashes when I am selecting maps. Otherwise game is running very smoothly without any bugs.
comment:7 by , 10 years ago
Component: | Multiplayer lobby → UI & Simulation |
---|---|
Keywords: | Multiplayer Lobby removed |
That's nice to know, so this is probably a UI engine bug.
comment:8 by , 10 years ago
another update... I removed 64bit version and installed 32bit .... Lobby and Multiplayer are working good, no more crash, but there is no sound, only cracking noise :)
comment:9 by , 10 years ago
Description: | modified (diff) |
---|---|
Summary: | Lobby Host game crash on MAC → Crash when hosting game on 64 bit MAC |
Ah, great! Knowing that should help even more.
I've updated the title and description to be more relevant.
comment:10 by , 10 years ago
Component: | UI & Simulation → Core engine |
---|---|
Description: | modified (diff) |
Summary: | Crash when hosting game on 64 bit MAC → Crash in FreeUPnPUrls on OS X 64-bit |
Both crashes are because of a free of something that wasn't allocated in the UPnP thread. (FreeUPnPUrls())
Petrsharks: Could you try to host once more and attach the mainlog.html file from GameDataPaths?
by , 10 years ago
another error log from today, when I try to play multiplayer with AI opponent
comment:12 by , 10 years ago
Got two more reports of this issue today (http://pastebin.com/Bb8AK76f), both on 10.9.
A run of the game in a debugger would be nice.
follow-up: 17 comment:16 by , 10 years ago
after apply "setting CONFIG2_MINIUPNPC to 0 instead of 1 in source/lib/config2.h" game will not crash, but I am unable to see other online players on lobby
follow-up: 22 comment:17 by , 10 years ago
Replying to Petrsharks:
after apply "setting CONFIG2_MINIUPNPC to 0 instead of 1 in source/lib/config2.h" game will not crash, but I am unable to see other online players on lobby
Did you see WFGbot? The time you tried, there were probably no other players in the SVN lobby (the SVN lobby and the A15 lobby are separated).
As an easy workaround, we could disable CONFIG2_MINIUPNPC for all OS X users. We want to switch to UDP hole punching anyway, so I don't see the point in trying to fix the cause of the problem.
comment:18 by , 10 years ago
Yes I seen that and 3 more bots. Another man explained to me on irc how it works, so I'll try to compile Alpha 15 later today.
follow-up: 23 comment:21 by , 10 years ago
Open the file 0ad/source/network/NetServer.cpp and add two slashes at the beginning of the line containing "FreeUPNPUrls(&urls)". In svn it is on line 296.
comment:22 by , 10 years ago
Replying to sanderd17:
As an easy workaround, we could disable CONFIG2_MINIUPNPC for all OS X users. We want to switch to UDP hole punching anyway, so I don't see the point in trying to fix the cause of the problem.
Because leaving stuff broken (and who knows when we get NAT punching) is no solution.
Also this ticket is very likely caused by the same issue as #2418 since we try to free an invalid pointer (and commenting out the FreeUPNPUrls call just prevents that). #2418 hints at something (possibly miniupnpc or our way of calling it (which should be fine)) corrupting the stack near the urls variable, so fixing that one should also fix this ticket.
comment:23 by , 10 years ago
Replying to Josh:
Open the file 0ad/source/network/NetServer.cpp and add two slashes at the beginning of the line containing "FreeUPNPUrls(&urls)". In svn it is on line 296.
working good, I just compiled 0.a.d with rev. version 14386, now I can see all people in lobby and host games. Thank you guys
follow-up: 25 comment:24 by , 10 years ago
The issue is that memory is free'd which was never allocated, under the path through CNetServerWorker::SetupUPnP() which utilises a cached UPnP URL, rather than seeking a broadcast on the local network for any IGD capable devices.
The two paths can be seen within the if() statement at the following line.
http://trac.wildfiregames.com/browser/ps/trunk/source/network/NetServer.cpp#L233
Where:
- !rootDescURL.empty(), and
- UPNP_GetIGDFromUrl(rootDescURL.c_str(), &urls, &data, internalIPAddress, sizeof(internalIPAddress))
there is no subsequent call to UPNP_GetValidIGD(). UPNP_GetValidIGD() allocates struct UPNPUrls urls on the stack for all non-zero return value. See documentation:
/* UPNP_GetValidIGD() : * return values : * 0 = NO IGD found * 1 = A valid connected IGD has been found * 2 = A valid IGD has been found but it reported as * not connected * 3 = an UPnP device has been found but was not recognized as an IGD * * In any non zero return case, the urls and data structures * passed as parameters are set. Donc forget to call FreeUPNPUrls(urls) to * free allocated memory. */
However, irrespective of the path taken at the if() statement on line 233 and the return value from UPNP_GetValidIGD(), the structure urls on the stack is always attempted to be free'd by FreeUPNPUrls() on line 296.
I have been preparing a patch to address certain other non-crashing correctness fixes in the usage of the UPnP API within CNetServerWorker::SetupUPnP(), but wanted to provide details of the reason for the memory corruption here.
The related ticket #2418 also reflects that certain systems are picking up the free'ing of unallocated memory at that line.
comment:25 by , 10 years ago
Replying to Echelon9:
I have been preparing a patch to address certain other non-crashing correctness fixes in the usage of the UPnP API within CNetServerWorker::SetupUPnP(), but wanted to provide details of the reason for the memory corruption here.
Any updates? It'd be nice to get this into Alpha 16. :-)
comment:26 by , 10 years ago
Milestone: | Backlog → Alpha 16 |
---|---|
Priority: | Should Have → Must Have |
comment:27 by , 10 years ago
Keywords: | review patch added |
---|---|
Owner: | set to |
Summary: | Crash in FreeUPnPUrls on OS X 64-bit → [PATCH] Crash in FreeUPnPUrls on OS X 64-bit |
In Fix-UPnP.diff, I tried to patch the issue based on Echelon9's description. I'd appreciate if someone could test this actually fixes the issue before I commit my patch.
comment:28 by , 10 years ago
Keywords: | review patch removed |
---|---|
Summary: | [PATCH] Crash in FreeUPnPUrls on OS X 64-bit → Crash in FreeUPnPUrls on OS X 64-bit |
Nevermind, after discussion on IRC it was noted that the case is already handled in the previous if statement. So my patch does effectively nothing.
comment:29 by , 10 years ago
Following discussion on the forums, I'm attaching two alternative patches (Stylistic approach #1 and Stylistic approach #2). They should be broadly functionally equivalent. I have a slight preference for the structure of Stylistic approach #2.
It reflects the documentation efforts above and the understanding of the underlying cause of the memory corruption when calling FreeUPNPUrls(urls) after UPNP_GetValidIGD() returned zero or was never called at all.
NOTE: Given the current non-building status of Mac OS X Mavericks development environments, I've been unable to compile fully and test this patch thoroughly. It at least cleared a basic 'make', but it was necessarily rudimentary testing.
PATCH CHOSEN WILL NEED TO BE TESTED ON WINDOWS AND LINUX.
Suggested testing steps:
- Run game,
- From main menu select the hosting option,
- Repeat, without a prefilled cached IGD UPnP URL.
If either patch resolves this issue, it should also fix related ticket #2418.
Happy to discuss or work on any comments.
comment:30 by , 10 years ago
Keywords: | review patch added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Summary: | Crash in FreeUPnPUrls on OS X 64-bit → [PATCH] Crash in FreeUPnPUrls on OS X 64-bit |
comment:31 by , 10 years ago
Just to make sure I'm understanding this right, the only functional change of this patch (excluding changes to logging) is to not free the urls variable when we use a cached url. If that's true, we shouldn't try to save urls.controlURL to the config on lines 292 to 295 of the original file when bAllocatedDescriptorURL is false.
As a style note, your if statement on line 257 is superfluous. It will always evaluate to true as you already check that ret != 0 in the body of the parent if statement.
comment:32 by , 10 years ago
It's slightly more nuanced as the patch aims to not free the urls variable when (a) we use a cached url, or (b) UPNP_GetValidIGD() returns zero.
We need to ensure the free doesn't occur under both cases.
I didn't quite follow your comment about issues with line 257. Do you mean the use of the conditional ternary operator? Because ret should be set on the line immediately preceding it, and 0, 1, 2 or 3 could be returned from UPNP_GetValidIGD().
You're probably right about not saving urls.controlURL on lines 292 to 295.
comment:33 by , 10 years ago
Whoops, I meant line 242 in style1. On style2 the assignment could be shortened to:
// If ret != 0, we allocated urls. bAllocatedDescriptorURL = ret;
Also, (b) worked fine in the previous implementation.
comment:34 by , 10 years ago
I'm increasingly preferring style2, but good to clarify that line reference on style1.
However, I didn't follow how it would be possible to get rid of bAllocatedDescriptorURL. There appears to be a plausible scenario where upnpDiscover will return in the affirmative, whilst UPNP_GetValidIGD() will return zero and have not allocated the urls variable.
Without bAllocatedDescriptorURL guarding the call to FreeUPNPUrls(urls), the memory corruption could still occur.
comment:35 by , 10 years ago
With wraitii's assistance on IRC, I was able to build an 0ad.app for the first time on OS X Mavericks 10.9. Wraitii and I will separately write up and settle the building steps for this platform.
This pleasing development did allow me to test this patch on the Mac platform.
Happily to say, style2 permitted no crashes when running the UPnP setup routines affected by the double-free bug. My testing confirmed the UPnP Internet Gateway Device was successfully contacted and the port forward registered (run one).
A second run which cached the location of the UPnP IGD also worked as expected without any crashes.
So this is confirmation from at least one testing platform (OSX 10.9) that the latest patch attached here does fix the memory corruption bug.
N.B this memory corruption will in fact affect each platform (Win, Linux, Mac) but it appears to only cause a hard crash for now on Mac.
Subtler bugs may be being happening on other platforms and it would be good to get it fixed up promptly. I've seen reported on other platforms of memory issues in and around this function.
comment:36 by , 10 years ago
Thanks, a few style comments about the patch:
- Don't use the ternary operator, just make it an if-else block.
- Don't include the big comment block for UPNP_GetValidIGD, a single line would probably alleviate any confusion, and the reader can refer to the relevant headers for more docs.
comment:37 by , 10 years ago
(Some more comments)
The if block containing the ternary operator could be replaced by
if ((ret = UPNP_GetValidIGD(devlist, &urls, &data, internalIPAddress, sizeof(internalIPAddress))) bAllocatedDescriptorURL = true;
It should also be }\nelse\n{
.
comment:38 by , 10 years ago
Following code review feedback, a further revised version of the patch attached.
This should be style only changes. No functional difference expected.
by , 10 years ago
Attachment: | fix-trac-2338-style2_v2.patch added |
---|
Revised version of style2 patch (post code review comments)
comment:39 by , 10 years ago
Milestone: | Alpha 16 → Alpha 17 |
---|
comment:40 by , 10 years ago
Priority: | Must Have → Release Blocker |
---|
error log file