Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#3466 closed defect (fixed)

[PATCH] Crash after playing a long time with Ais

Reported by: mimo Owned by: mimo
Priority: Release Blocker Milestone: Alpha 19
Component: Core engine Keywords: patch
Cc: Yves Patch:

Description

Replaying the attached commands.txt reproducibly crashes around turn 12580 while it had about 18000 turns.

Attachments (2)

commands.txt (329.4 KB ) - added by mimo 9 years ago.
ticket3466.diff (1.5 KB ) - added by mimo 9 years ago.
reverting r17086 and better fix for #3430

Download all attachments as: .zip

Change History (18)

by mimo, 9 years ago

Attachment: commands.txt added

comment:1 by mimo, 9 years ago

using gdb, i get

Program received signal SIGSEGV, Segmentation fault.
js::Invoke (cx=0xfe5850, args=<error reading variable: Cannot access memory at address 0x7fffff7fecd0>, construct=construct@entry=js::NO_CONSTRUCT)
    at /data/kubuntu/jeux/0ad/svnbak/libraries/source/spidermonkey/mozjs31/js/src/vm/Interpreter.cpp:440
440     {
(gdb) bt
#0  js::Invoke (cx=0xfe5850, args=<error reading variable: Cannot access memory at address 0x7fffff7fecd0>, construct=construct@entry=js::NO_CONSTRUCT)
    at /data/kubuntu/jeux/0ad/svnbak/libraries/source/spidermonkey/mozjs31/js/src/vm/Interpreter.cpp:440
#1  0x00007ffff70b26d3 in js_fun_apply (cx=0xfe5850, argc=<optimized out>, vp=0x7fffff7ffd68)
    at /data/kubuntu/jeux/0ad/svnbak/libraries/source/spidermonkey/mozjs31/js/src/jsfun.cpp:1020
#2  0x00007fffe2f2bfe2 in ?? ()
#3  0xfffbffffe1b3e900 in ?? ()
#4  0x00007fffff7ffd40 in ?? ()
#5  0x0000000000000000 in ?? ()

comment:2 by elexis, 9 years ago

I ran it twice in visual-replay mode from autostart and once in non-visual replay mode and got a segfault each time on turn 10204.

Last edited 9 years ago by elexis (previous) (diff)

comment:3 by mimo, 9 years ago

That's very strange, because I tested again and crashed at turn 12707 using latest svn r17088. Using this time the visual replay, the backtrace given by gdb is

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6f765d8 in js::jit::GetPcScript (cx=cx@entry=0x17f4e30, scriptRes=scriptRes@entry=0x7fffff7ff5b0, pcRes=pcRes@entry=0x7fffff7ff580)
    at /data/kubuntu/jeux/0ad/svnbak/libraries/source/spidermonkey/mozjs31/js/src/jit/IonFrames.cpp:1268
1268            ++it;
(gdb) bt
#0  0x00007ffff6f765d8 in js::jit::GetPcScript (cx=cx@entry=0x17f4e30, scriptRes=scriptRes@entry=0x7fffff7ff5b0, pcRes=pcRes@entry=0x7fffff7ff580)
    at /data/kubuntu/jeux/0ad/svnbak/libraries/source/spidermonkey/mozjs31/js/src/jit/IonFrames.cpp:1268
#1  0x00007ffff6da78de in currentScript (allowCrossCompartment=JSContext::DONT_ALLOW_CROSS_COMPARTMENT, ppc=0x7fffff7ff580, this=0x17f4e30)
    at /data/kubuntu/jeux/0ad/svnbak/libraries/source/spidermonkey/mozjs31/js/src/jscntxtinlines.h:479
#2  js::types::GetTypeCallerInitObject (cx=cx@entry=0x17f4e30, key=key@entry=JSProto_Array)
    at /data/kubuntu/jeux/0ad/svnbak/libraries/source/spidermonkey/mozjs31/js/src/jsinferinlines.h:338
#3  0x00007ffff6da0607 in array_filter (cx=0x17f4e30, argc=<optimized out>, vp=0x7fffff7ff868)
    at /data/kubuntu/jeux/0ad/svnbak/libraries/source/spidermonkey/mozjs31/js/src/jsarray.cpp:2816
#4  0x00007ffff7e20549 in ?? ()

comment:4 by elexis, 9 years ago

When running gdb, I get the same result as in mimo's first post.

Running valgrind ./pyrogenesis -replay="/path/to/commands.txt" -mod=public I get the following:

==7968== Memcheck, a memory error detector
==7968== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==7968== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==7968== Command: ./pyrogenesis -replay=/path/to/commands.txt -mod=public
==7968== 
==7968== Invalid read of size 4
==7968==    at 0x5992ECB: runtimeData (IonCode.h:327)
==7968==    by 0x5992ECB: getCache (IonCode.h:530)
==7968==    by 0x5992ECB: js::jit::SetPropertyIC::update(JSContext*, unsigned long, JS::Handle<JSObject*>, JS::Handle<JS::Value>) (IonCaches.cpp:2757)
==7968==    by 0x402FC0B: ???
==7968==    by 0x594F104: EnterIon (Ion.cpp:2376)
==7968==    by 0x594F104: js::jit::IonCannon(JSContext*, js::RunState&) (Ion.cpp:2456)
==7968==    by 0x5C09DA2: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:401)
==7968==    by 0x5BFAD38: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:494)
==7968==    by 0x5C0AA70: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) (Interpreter.cpp:531)
==7968==    by 0x5A9B0E7: JS_CallFunctionName(JSContext*, JS::Handle<JSObject*>, char const*, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) (jsapi.cpp:5051)
==7968==    by 0x5EFD83: CallFunction_ (ScriptInterface.cpp:597)
==7968==    by 0x5EFD83: ScriptInterface::CallFunctionVoid(JS::Handle<JS::Value>, char const*) (ScriptInterface.cpp:581)
==7968==    by 0x5DFCF3: CComponentTypeScript::Init(CParamNode const&, unsigned int) (ScriptComponent.cpp:48)
==7968==    by 0x4A7142: CComponentManager::AddComponent(CEntityHandle, int, CParamNode const&) (ComponentManager.cpp:706)
==7968==    by 0x4A7290: CComponentManager::AddEntity(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&, unsigned int) (ComponentManager.cpp:879)
==7968==    by 0x70D5C6: CXMLReader::ReadEntities(XMBElement, double) (MapReader.cpp:1048)
==7968==  Address 0x38 is not stack'd, malloc'd or (recently) free'd
==7968== 
==7968== 
==7968== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==7968==  Access not within mapped region at address 0x38
==7968==    at 0x5992ECB: runtimeData (IonCode.h:327)
==7968==    by 0x5992ECB: getCache (IonCode.h:530)
==7968==    by 0x5992ECB: js::jit::SetPropertyIC::update(JSContext*, unsigned long, JS::Handle<JSObject*>, JS::Handle<JS::Value>) (IonCaches.cpp:2757)
==7968==    by 0x402FC0B: ???
==7968==    by 0x594F104: EnterIon (Ion.cpp:2376)
==7968==    by 0x594F104: js::jit::IonCannon(JSContext*, js::RunState&) (Ion.cpp:2456)
==7968==    by 0x5C09DA2: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:401)
==7968==    by 0x5BFAD38: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:494)
==7968==    by 0x5C0AA70: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) (Interpreter.cpp:531)
==7968==    by 0x5A9B0E7: JS_CallFunctionName(JSContext*, JS::Handle<JSObject*>, char const*, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) (jsapi.cpp:5051)
==7968==    by 0x5EFD83: CallFunction_ (ScriptInterface.cpp:597)
==7968==    by 0x5EFD83: ScriptInterface::CallFunctionVoid(JS::Handle<JS::Value>, char const*) (ScriptInterface.cpp:581)
==7968==    by 0x5DFCF3: CComponentTypeScript::Init(CParamNode const&, unsigned int) (ScriptComponent.cpp:48)
==7968==    by 0x4A7142: CComponentManager::AddComponent(CEntityHandle, int, CParamNode const&) (ComponentManager.cpp:706)
==7968==    by 0x4A7290: CComponentManager::AddEntity(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&, unsigned int) (ComponentManager.cpp:879)
==7968==    by 0x70D5C6: CXMLReader::ReadEntities(XMBElement, double) (MapReader.cpp:1048)
==7968==  If you believe this happened as a result of a stack
==7968==  overflow in your program's main thread (unlikely but
==7968==  possible), you can try to increase the size of the
==7968==  main thread stack using the --main-stacksize= flag.
==7968==  The main thread stack size used in this run was 8388608.
==7968== 
==7968== HEAP SUMMARY:
==7968==     in use at exit: 52,007,811 bytes in 327,315 blocks
==7968==   total heap usage: 964,589 allocs, 637,274 frees, 261,254,884 bytes allocated
==7968== 
==7968== LEAK SUMMARY:
==7968==    definitely lost: 17,064 bytes in 155 blocks
==7968==    indirectly lost: 314,896 bytes in 300 blocks
==7968==      possibly lost: 9,719,231 bytes in 57,573 blocks
==7968==    still reachable: 41,956,620 bytes in 269,287 blocks
==7968==         suppressed: 0 bytes in 0 blocks
==7968== Rerun with --leak-check=full to see details of leaked memory
==7968== 
==7968== For counts of detected and suppressed errors, rerun with: -v
==7968== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
Killed

comment:5 by mimo, 9 years ago

elexis, it seems that your valgrind exited immediately. You should use "valgrind --smc-check=all-non-file ..." to allow valgrind to deal with js.

comment:6 by elexis, 9 years ago

After running that replay on valgrind with r17093 for 3-4 hours with the --smc-check=all-non-file argument, the application crashed on turn 12707:

==2593== Stack overflow in thread 1: can't grow stack to 0xffe801e08
==2593== Can't extend stack to 0xffe800e78 during signal delivery for thread 1:
==2593==   no stack segment
==2593== 
==2593== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2593==  Access not within mapped region at address 0xFFE800E78
==2593==    at 0x599E570: JitFrameIterator (JitFrameIterator.h:107)
==2593==    by 0x599E570: js::jit::GetPcScript(JSContext*, JSScript**, unsigned char**) (IonFrames.cpp:1253)
==2593==  If you believe this happened as a result of a stack
==2593==  overflow in your program's main thread (unlikely but
==2593==  possible), you can try to increase the size of the
==2593==  main thread stack using the --main-stacksize= flag.
==2593==  The main thread stack size used in this run was 8388608.
==2593== 
==2593== HEAP SUMMARY:
==2593==     in use at exit: 226,284,857 bytes in 1,011,524 blocks
==2593==   total heap usage: 190,817,959 allocs, 189,806,435 frees, 108,525,842,931 bytes allocated
==2593== 
==2593== LEAK SUMMARY:
==2593==    definitely lost: 15,656 bytes in 180 blocks
==2593==    indirectly lost: 517,028 bytes in 330 blocks
==2593==      possibly lost: 19,449,546 bytes in 160,161 blocks
==2593==    still reachable: 206,302,627 bytes in 850,853 blocks
==2593==         suppressed: 0 bytes in 0 blocks
==2593== Rerun with --leak-check=full to see details of leaked memory
==2593== 
==2593== For counts of detected and suppressed errors, rerun with: -v
==2593== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Killed

Should I repeat with --leak-check=full?

comment:7 by mimo, 9 years ago

Cc: Yves added
Milestone: BacklogAlpha 19
Priority: Should HaveRelease Blocker

Could be linked to http://wildfiregames.com/forum/index.php?showtopic=20057 so i put it as release blocker waiting for confirmation, and cc Yves if you have an idea about the moz31 crash.

comment:8 by elexis, 9 years ago

Summary: Replay a commands.txt with only Ais crashes reproduciblyCrash after playing a long time with Ais

Changing the title as it is assumingly the same crash. We should figure out the revision number where it was introduced. Notice there have been many forum posts with attached commands.txt files, so it seems to occur frequently.

Yves proposed to checkout #3349

(20:16:49) Yves`: mimo: my first suggestion would be testing the 31.5 release of SpiderMonkey

It seems the crash only occurs when playing with AIs, but the C++ code of the AI wasn't changed. However the AI uses a different interface to the pathfinder, so maybe some change in that determined the error. But maybe it was also a JS change that SpiderMonkey can't handle.

Most of that can be figured out by letting some games run with AIs for a long duration with different revisions.

comment:9 by elexis, 9 years ago

mimos replay: With r17085 and below, it doesn't crash. With r17086 and r17087 it crashes on turn 10204. With r17088 and above it crashes on turn 12707.


The first one on the forum [¹] uploaded by WhiteTreePaladin crashes with r17022 (which is the revision he linked [²]) due to #3425 (which was fixed by r17068).

[¹] http://wildfiregames.com/forum/index.php?showtopic=20057&p=309204 [²] https://github.com/0ad/0ad/commit/aa3abfb1f03dc910b14b3f5bb15bc7a9aa7e72ad

Last edited 9 years ago by elexis (previous) (diff)

comment:10 by elexis, 9 years ago

The second one on the forum [¹] uploaded by WhiteTreePaladin crashes with r17087 (which is the revision he linked [²]) on the last turn of the replay due to SpiderMonkey. gdb backtrace:

#0  0x00007ffff6ee7502 in js::GetLengthProperty (cx=cx@entry=0xe70650, obj=obj@entry=..., lengthp=lengthp@entry=0x7fffff7ff0b0)
    at /path/to/trunk/libraries/source/spidermonkey/mozjs31/js/src/jsarray.cpp:54
#1  0x00007ffff720238d in js_fun_apply (cx=0xe70650, argc=<optimized out>, vp=0x7fffff7ffc28)
    at /path/to/trunk/libraries/source/spidermonkey/mozjs31/js/src/jsfun.cpp:998
#2  0x00007fffcd345439 in ?? ()
#3  0xfffbffffe453e900 in ?? ()
#4  0x00007fffff7ffc00 in ?? ()
#5  0x0000000000000000 in ?? ()

[¹] http://wildfiregames.com/forum/index.php?showtopic=20057&p=309519 [²] https://github.com/0ad/0ad/commit/9bdb767577ebece38d60d66c70d9da772b81d04e

Last edited 9 years ago by elexis (previous) (diff)

comment:11 by Lionkanzen, 9 years ago

You attached my commands, I have the same issue.(2 times)

comment:12 by Stan, 9 years ago

Do not know if that helps but I got

<p class="error">ERROR: Script message handler OnGlobalConstructionFinished failed</p>

by adding

{
   error("" + this.GetCurrentState().slice(0,index+1));
   this.UnitFsm.SwitchToNextState(this, this.GetCurrentState().slice(0,index+1) + "IDLE");
   error("index :" + index);
}

Index is nearly always 10, except sometimes when it is 6. Curiously enough, this didn't crash that time. I'm on r17086

EDIT : I relaunched it and it crashed. So I think that's the fast forward cause it didn't if I went normal speed.

EDIT 2 : It seems that speed + line debug prevents the crash.

Last edited 9 years ago by Stan (previous) (diff)

comment:13 by Stan, 9 years ago

Ran the game with VS2013 attached, and it broke a 34:00 on line 55 of EngineScriptConversion.cpp still r17086. I will try newer versions.

55	if (!val->NewJSObject(*ScriptInterface::GetScriptInterfaceAndCBData(cx)->pScriptInterface, &obj))
56	{
57		// Report as an error, since scripts really shouldn't try to use unscriptable interfaces
58		LOGERROR("IComponent does not have a scriptable interface");
59		ret.setUndefined();
60		return;
61	}
Last edited 9 years ago by Stan (previous) (diff)

by mimo, 9 years ago

Attachment: ticket3466.diff added

reverting r17086 and better fix for #3430

comment:14 by mimo, 9 years ago

Keywords: patch review added
Summary: Crash after playing a long time with Ais[PATCH] Crash after playing a long time with Ais

comment:15 by mimo, 9 years ago

Owner: set to mimo
Resolution: fixed
Status: newclosed

In 17108:

revert r17086 and better fix of #3430, fixes #3466

comment:16 by mimo, 9 years ago

Keywords: review removed
Note: See TracTickets for help on using tickets.