Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#2338 closed defect (fixed)

[PATCH] Crash in FreeUPnPUrls on OS X 64-bit

Reported by: Petr Owned by: Echelon9
Priority: Release Blocker Milestone: Alpha 17
Component: Core engine Keywords: patch
Cc: Patch:

Description (last modified by leper)

When I go to Multiplayer / Host game, 0AD will crash back to my desktop. This happens always. I am using Alpha 15 64bit, latest MAC OS X, MacBook Pro.

The crashing thread in both reports is thread 10 (the UPnP thread). The error is in both cases "pointer being freed was not allocated" in FreeUPnPUrls(), so we seem to have an incorrect pointer in the urls struct. (Possibly unrelated, but GetUPNPUrls() doesn't check for success of malloc)

Thread 10 Crashed:
0   libsystem_kernel.dylib          0x00007fff8dc6b866 __pthread_kill + 10
1   libsystem_pthread.dylib         0x00007fff94c7b35c pthread_kill + 92
2   libsystem_c.dylib               0x00007fff9008dbba abort + 125
3   libsystem_malloc.dylib          0x00007fff91a06093 free + 411
4   com.wildfiregames.0ad           0x000000010067b328 FreeUPNPUrls + 152
5   com.wildfiregames.0ad           0x0000000100018aba CNetServerWorker::SetupUPnP(void*) + 1438
6   libsystem_pthread.dylib         0x00007fff94c7a899 _pthread_body + 138
7   libsystem_pthread.dylib         0x00007fff94c7a72a _pthread_start + 137
8   libsystem_pthread.dylib         0x00007fff94c7efc9 thread_start + 13
Thread 10 Crashed:
0   libsystem_kernel.dylib          0x00007fff8dc6b866 __pthread_kill + 10
1   libsystem_pthread.dylib         0x00007fff94c7b35c pthread_kill + 92
2   libsystem_c.dylib               0x00007fff9008dbba abort + 125
3   libsystem_malloc.dylib          0x00007fff91a06093 free + 411
4   com.wildfiregames.0ad           0x000000010067b328 FreeUPNPUrls + 152
5   com.wildfiregames.0ad           0x0000000100018aba CNetServerWorker::SetupUPnP(void*) + 1438
6   libsystem_pthread.dylib         0x00007fff94c7a899 _pthread_body + 138
7   libsystem_pthread.dylib         0x00007fff94c7a72a _pthread_start + 137
8   libsystem_pthread.dylib         0x00007fff94c7efc9 thread_start + 13

Attachments (14)

error (49.0 KB ) - added by Petr 8 years ago.
error log file
errorHD4000 (47.8 KB ) - added by Petr 8 years ago.
same situation, but on integrated graphic card, Intel HD 4000
mainlog.html (32.8 KB ) - added by Petr 8 years ago.
mainlog.html
errorlog (48.8 KB ) - added by Petr 8 years ago.
another error log from today, when I try to play multiplayer with AI opponent
terminal (3.0 KB ) - added by Petr 8 years ago.
terminal debugg file
error+terminal (48.4 KB ) - added by Petr 8 years ago.
error log file which came with that debug
debugg (4.4 KB ) - added by Petr 8 years ago.
debugg 2nd
debugg2 (3.7 KB ) - added by Petr 8 years ago.
debugg3
bt (800 bytes ) - added by Petr 8 years ago.
bt file from terminal
btall (11.5 KB ) - added by Petr 8 years ago.
bt all from terminal
Fix-UPnP.diff (445 bytes ) - added by Josh 8 years ago.
Exit early when UPNP_GetValidIGD() returns 0.
fix-trac-2338-style1.patch (2.1 KB ) - added by Echelon9 8 years ago.
Stylistic approach #1
fix-trac-2338-style2.patch (2.8 KB ) - added by Echelon9 8 years ago.
Stylistic approach #2
fix-trac-2338-style2_v2.patch (2.3 KB ) - added by Echelon9 8 years ago.
Revised version of style2 patch (post code review comments)

Download all attachments as: .zip

Change History (56)

by Petr, 8 years ago

Attachment: error added

error log file

by Petr, 8 years ago

Attachment: errorHD4000 added

same situation, but on integrated graphic card, Intel HD 4000

comment:1 by Josh, 8 years ago

If I read those crashlogs correctly, the crash is occurring in the system after the GUI does a draw call. This appears to be an bug on Apple's side.

comment:2 by historic_bruno, 8 years ago

This only happens when hosting a game from the lobby? Can you try other code paths, like playing single player games, etc. to see if there is a similar crash? It doesn't seem related to the lobby at all. It's too bad how many problems 10.9 is causing :(

comment:3 by Petr, 8 years ago

I can host and play online with external ip, I can join and play via Lobby, but join only, when I try to host game, it crashes when I am selecting maps. Otherwise game is running very smoothly without any bugs.

comment:4 by Petr, 8 years ago

This problem is ONLY when I host game from Lobby

comment:5 by Petr, 8 years ago

any update on this?

comment:6 by Petr, 8 years ago

Just realized that I can not Host game even via my IP, not only Lobby

comment:7 by Josh, 8 years ago

Component: Multiplayer lobbyUI & Simulation
Keywords: Multiplayer Lobby removed

That's nice to know, so this is probably a UI engine bug.

comment:8 by Petr, 8 years ago

another update... I removed 64bit version and installed 32bit .... Lobby and Multiplayer are working good, no more crash, but there is no sound, only cracking noise :)

comment:9 by Josh, 8 years ago

Description: modified (diff)
Summary: Lobby Host game crash on MACCrash when hosting game on 64 bit MAC

Ah, great! Knowing that should help even more.

I've updated the title and description to be more relevant.

comment:10 by leper, 8 years ago

Component: UI & SimulationCore engine
Description: modified (diff)
Summary: Crash when hosting game on 64 bit MACCrash in FreeUPnPUrls on OS X 64-bit

Both crashes are because of a free of something that wasn't allocated in the UPnP thread. (FreeUPnPUrls())

Petrsharks: Could you try to host once more and attach the mainlog.html file from GameDataPaths?

by Petr, 8 years ago

Attachment: mainlog.html added

mainlog.html

by Petr, 8 years ago

Attachment: errorlog added

another error log from today, when I try to play multiplayer with AI opponent

comment:11 by Petr, 8 years ago

any updates?

comment:12 by leper, 8 years ago

Got two more reports of this issue today (http://pastebin.com/Bb8AK76f), both on 10.9.

A run of the game in a debugger would be nice.

comment:13 by Petr, 8 years ago

how to run debugging mode?

comment:14 by Petr, 8 years ago

misstake

Last edited 8 years ago by Petr (previous) (diff)

by Petr, 8 years ago

Attachment: terminal added

terminal debugg file

by Petr, 8 years ago

Attachment: error+terminal added

error log file which came with that debug

by Petr, 8 years ago

Attachment: debugg added

debugg 2nd

by Petr, 8 years ago

Attachment: debugg2 added

debugg3

by Petr, 8 years ago

Attachment: bt added

bt file from terminal

by Petr, 8 years ago

Attachment: btall added

bt all from terminal

comment:16 by Petr, 8 years ago

after apply "setting CONFIG2_MINIUPNPC to 0 instead of 1 in source/lib/config2.h" game will not crash, but I am unable to see other online players on lobby

in reply to:  16 ; comment:17 by sanderd17, 8 years ago

Replying to Petrsharks:

after apply "setting CONFIG2_MINIUPNPC to 0 instead of 1 in source/lib/config2.h" game will not crash, but I am unable to see other online players on lobby

Did you see WFGbot? The time you tried, there were probably no other players in the SVN lobby (the SVN lobby and the A15 lobby are separated).

As an easy workaround, we could disable CONFIG2_MINIUPNPC for all OS X users. We want to switch to UDP hole punching anyway, so I don't see the point in trying to fix the cause of the problem.

comment:18 by Petr, 8 years ago

Yes I seen that and 3 more bots. Another man explained to me on irc how it works, so I'll try to compile Alpha 15 later today.

comment:19 by Josh, 8 years ago

A better solution would be to just comment out the FreeUPnPUrls call.

Last edited 8 years ago by Josh (previous) (diff)

comment:20 by Petr, 8 years ago

How to do that?

comment:21 by Josh, 8 years ago

Open the file 0ad/source/network/NetServer.cpp and add two slashes at the beginning of the line containing "FreeUPNPUrls(&urls)". In svn it is on line 296.

in reply to:  17 comment:22 by leper, 8 years ago

Replying to sanderd17:

As an easy workaround, we could disable CONFIG2_MINIUPNPC for all OS X users. We want to switch to UDP hole punching anyway, so I don't see the point in trying to fix the cause of the problem.

Because leaving stuff broken (and who knows when we get NAT punching) is no solution.

Also this ticket is very likely caused by the same issue as #2418 since we try to free an invalid pointer (and commenting out the FreeUPNPUrls call just prevents that). #2418 hints at something (possibly miniupnpc or our way of calling it (which should be fine)) corrupting the stack near the urls variable, so fixing that one should also fix this ticket.

in reply to:  21 comment:23 by Petr, 8 years ago

Replying to Josh:

Open the file 0ad/source/network/NetServer.cpp and add two slashes at the beginning of the line containing "FreeUPNPUrls(&urls)". In svn it is on line 296.

working good, I just compiled 0.a.d with rev. version 14386, now I can see all people in lobby and host games. Thank you guys

comment:24 by Echelon9, 8 years ago

The issue is that memory is free'd which was never allocated, under the path through CNetServerWorker::SetupUPnP() which utilises a cached UPnP URL, rather than seeking a broadcast on the local network for any IGD capable devices.

The two paths can be seen within the if() statement at the following line.

http://trac.wildfiregames.com/browser/ps/trunk/source/network/NetServer.cpp#L233

Where:

  • !rootDescURL.empty(), and
  • UPNP_GetIGDFromUrl(rootDescURL.c_str(), &urls, &data, internalIPAddress, sizeof(internalIPAddress))

there is no subsequent call to UPNP_GetValidIGD(). UPNP_GetValidIGD() allocates struct UPNPUrls urls on the stack for all non-zero return value. See documentation:

/* UPNP_GetValidIGD() :
 * return values :
 *     0 = NO IGD found
 *     1 = A valid connected IGD has been found
 *     2 = A valid IGD has been found but it reported as
 *         not connected
 *     3 = an UPnP device has been found but was not recognized as an IGD
 *
 * In any non zero return case, the urls and data structures
 * passed as parameters are set. Donc forget to call FreeUPNPUrls(urls) to
 * free allocated memory.
 */

However, irrespective of the path taken at the if() statement on line 233 and the return value from UPNP_GetValidIGD(), the structure urls on the stack is always attempted to be free'd by FreeUPNPUrls() on line 296.

I have been preparing a patch to address certain other non-crashing correctness fixes in the usage of the UPnP API within CNetServerWorker::SetupUPnP(), but wanted to provide details of the reason for the memory corruption here.

The related ticket #2418 also reflects that certain systems are picking up the free'ing of unallocated memory at that line.

in reply to:  24 comment:25 by leper, 8 years ago

Replying to Echelon9:

I have been preparing a patch to address certain other non-crashing correctness fixes in the usage of the UPnP API within CNetServerWorker::SetupUPnP(), but wanted to provide details of the reason for the memory corruption here.

Any updates? It'd be nice to get this into Alpha 16. :-)

comment:26 by historic_bruno, 8 years ago

Milestone: BacklogAlpha 16
Priority: Should HaveMust Have

comment:27 by Josh, 8 years ago

Keywords: review patch added
Owner: set to Josh
Summary: Crash in FreeUPnPUrls on OS X 64-bit[PATCH] Crash in FreeUPnPUrls on OS X 64-bit

In Fix-UPnP.diff, I tried to patch the issue based on Echelon9's description. I'd appreciate if someone could test this actually fixes the issue before I commit my patch.

by Josh, 8 years ago

Attachment: Fix-UPnP.diff added

Exit early when UPNP_GetValidIGD() returns 0.

comment:28 by Josh, 8 years ago

Keywords: review patch removed
Summary: [PATCH] Crash in FreeUPnPUrls on OS X 64-bitCrash in FreeUPnPUrls on OS X 64-bit

Nevermind, after discussion on IRC it was noted that the case is already handled in the previous if statement. So my patch does effectively nothing.

by Echelon9, 8 years ago

Attachment: fix-trac-2338-style1.patch added

Stylistic approach #1

by Echelon9, 8 years ago

Attachment: fix-trac-2338-style2.patch added

Stylistic approach #2

comment:29 by Echelon9, 8 years ago

Following discussion on the forums, I'm attaching two alternative patches (Stylistic approach #1 and Stylistic approach #2). They should be broadly functionally equivalent. I have a slight preference for the structure of Stylistic approach #2.

It reflects the documentation efforts above and the understanding of the underlying cause of the memory corruption when calling FreeUPNPUrls(urls) after UPNP_GetValidIGD() returned zero or was never called at all.

NOTE: Given the current non-building status of Mac OS X Mavericks development environments, I've been unable to compile fully and test this patch thoroughly. It at least cleared a basic 'make', but it was necessarily rudimentary testing.

PATCH CHOSEN WILL NEED TO BE TESTED ON WINDOWS AND LINUX.

Suggested testing steps:

  1. Run game,
  2. From main menu select the hosting option,
  3. Repeat, without a prefilled cached IGD UPnP URL.

If either patch resolves this issue, it should also fix related ticket #2418.

Happy to discuss or work on any comments.

comment:30 by Echelon9, 8 years ago

Keywords: review patch added
Owner: changed from Josh to Echelon9
Status: newassigned
Summary: Crash in FreeUPnPUrls on OS X 64-bit[PATCH] Crash in FreeUPnPUrls on OS X 64-bit

comment:31 by Josh, 8 years ago

Just to make sure I'm understanding this right, the only functional change of this patch (excluding changes to logging) is to not free the urls variable when we use a cached url. If that's true, we shouldn't try to save urls.controlURL to the config on lines 292 to 295 of the original file when bAllocatedDescriptorURL is false.

As a style note, your if statement on line 257 is superfluous. It will always evaluate to true as you already check that ret != 0 in the body of the parent if statement.

comment:32 by Echelon9, 8 years ago

It's slightly more nuanced as the patch aims to not free the urls variable when (a) we use a cached url, or (b) UPNP_GetValidIGD() returns zero.

We need to ensure the free doesn't occur under both cases.

I didn't quite follow your comment about issues with line 257. Do you mean the use of the conditional ternary operator? Because ret should be set on the line immediately preceding it, and 0, 1, 2 or 3 could be returned from UPNP_GetValidIGD().

You're probably right about not saving urls.controlURL on lines 292 to 295.

comment:33 by Josh, 8 years ago

Whoops, I meant line 242 in style1. On style2 the assignment could be shortened to:

// If ret != 0, we allocated urls.
bAllocatedDescriptorURL = ret;

EDIT: Point (b) should already work. And you might actually be able to get rid of bAllocatedDescriptorURL.

Last edited 8 years ago by Josh (previous) (diff)

comment:34 by Echelon9, 8 years ago

I'm increasingly preferring style2, but good to clarify that line reference on style1.

However, I didn't follow how it would be possible to get rid of bAllocatedDescriptorURL. There appears to be a plausible scenario where upnpDiscover will return in the affirmative, whilst UPNP_GetValidIGD() will return zero and have not allocated the urls variable.

Without bAllocatedDescriptorURL guarding the call to FreeUPNPUrls(urls), the memory corruption could still occur.

comment:35 by Echelon9, 8 years ago

With wraitii's assistance on IRC, I was able to build an 0ad.app for the first time on OS X Mavericks 10.9. Wraitii and I will separately write up and settle the building steps for this platform.

This pleasing development did allow me to test this patch on the Mac platform.

Happily to say, style2 permitted no crashes when running the UPnP setup routines affected by the double-free bug. My testing confirmed the UPnP Internet Gateway Device was successfully contacted and the port forward registered (run one).

A second run which cached the location of the UPnP IGD also worked as expected without any crashes.

So this is confirmation from at least one testing platform (OSX 10.9) that the latest patch attached here does fix the memory corruption bug.

N.B this memory corruption will in fact affect each platform (Win, Linux, Mac) but it appears to only cause a hard crash for now on Mac.

Subtler bugs may be being happening on other platforms and it would be good to get it fixed up promptly. I've seen reported on other platforms of memory issues in and around this function.

Last edited 8 years ago by Echelon9 (previous) (diff)

comment:36 by historic_bruno, 8 years ago

Thanks, a few style comments about the patch:

  • Don't use the ternary operator, just make it an if-else block.
  • Don't include the big comment block for UPNP_GetValidIGD, a single line would probably alleviate any confusion, and the reader can refer to the relevant headers for more docs.

comment:37 by leper, 8 years ago

(Some more comments)

The if block containing the ternary operator could be replaced by

if ((ret = UPNP_GetValidIGD(devlist, &urls, &data, internalIPAddress, sizeof(internalIPAddress)))
    bAllocatedDescriptorURL = true;

It should also be }\nelse\n{.

comment:38 by Echelon9, 8 years ago

Following code review feedback, a further revised version of the patch attached.

This should be style only changes. No functional difference expected.

by Echelon9, 8 years ago

Revised version of style2 patch (post code review comments)

comment:39 by stanislas69, 8 years ago

Milestone: Alpha 16Alpha 17

comment:40 by wraitii, 7 years ago

Priority: Must HaveRelease Blocker

comment:41 by leper, 7 years ago

Resolution: fixed
Status: assignedclosed

In 15619:

Fix free() of not allocated struct in the UPnP code. Patch by Echelon9. Fixes #2338, #2418.

comment:42 by leper, 7 years ago

Keywords: review removed

Thanks for the patch.

Note: See TracTickets for help on using tickets.