#5460 closed defect (fixed)
UnitAI infinite loop / segfault on a24
Reported by: | elexis | Owned by: | wraitii |
---|---|---|---|
Priority: | Release Blocker | Milestone: | Alpha 24 |
Component: | UI & Simulation | Keywords: | regression |
Cc: | Patch: | Phab:D2008 |
Description
While testing on r22393 with gcc 9.1.0:
Turn 2599 (200)... Turn 2600 (200)... Turn 2601 (200)... Turn 2602 (200)... Turn 2603 (200)... Turn 2604 (200)... Turn 2605 (200)... Turn 2606 (200)... Turn 2607 (200)... Thread 1 "pyrogenesis" received signal SIGSEGV, Segmentation fault. ScriptInterface::ToJSVal<IComponent*> (cx=cx@entry=0x555555dd75b0, ret=ret@entry=..., val=@0x7fffff7ff0c8: 0x555557baa750) at ../../../source/simulation2/scripting/EngineScriptConversions.cpp:38 38 JSAutoRequest rq(cx); (gdb) info stack #0 ScriptInterface::ToJSVal<IComponent*> (cx=cx@entry=0x555555dd75b0, ret=ret@entry=..., val=@0x7fffff7ff0c8: 0x555557baa750) at ../../../source/simulation2/scripting/EngineScriptConversions.cpp:38 #1 0x000055555567e547 in ScriptInterface::AssignOrToJSVal<IComponent*> (a=@0x7fffff7ff0c8: 0x555557baa750, handle=..., cx=0x555555dd75b0) at ../../../source/scriptinterface/ScriptInterface.h:446 #2 ScriptInterface::AssignOrToJSValUnrooted<IComponent*> (a=@0x7fffff7ff0c8: 0x555557baa750, handle=..., cx=0x555555dd75b0) at ../../../source/scriptinterface/ScriptInterface.h:349 #3 ScriptInterface_NativeWrapper<IComponent*>::call<IComponent* (ScriptInterface::CxPrivate*, int, int), int, int>(JSContext*, JS::MutableHandle<JS::Value>, IComponent* (ScriptInterface::CxPrivate*, int, int), int, int) ( fptr=0x555555673ac0 <CComponentManager::Script_QueryInterface(ScriptInterface::CxPrivate*, int, int)>, rval=..., cx=0x555555dd75b0) at ../../../source/scriptinterface/NativeWrapperDefns.h:75 #4 ScriptInterface::call<IComponent*, int, int, &CComponentManager::Script_QueryInterface> (cx=0x555555dd75b0, argc=<optimized out>, vp=0x7fffff7ff168) at ../../../source/scriptinterface/NativeWrapperDefns.h:125 #5 0x00007fffd060833c in ?? () #6 0x0000000000000000 in ?? () (gdb)
Reproducible for me in non-visual replay mode, see attachment.
Stan could not reproduce it on Windows.
(02:10:30 PM) Stan-laptop: TIMER| profile2 get buffer: 436.645 ms TIMER| profile2 visitor: 1.24783 s TIMER| profile2 query: 1.68587 s # Final state: bfb47daf7b73556810ca68e39fb27b5f
Attachments (2)
Change History (16)
by , 5 years ago
Attachment: | commands.txt added |
---|
comment:1 by , 5 years ago
There is an infinity loop:
UnitAI.prototype.UnitFsmSpec.INDIVIDUAL.IDLE.enter@simulation/components/UnitAI.js FSM.prototype.SwitchToNextState@simulation/helpers/FSM.js FSM.prototype.ProcessMessage@simulation/helpers/FSM.js UnitAI.prototype.PushOrderFront@simulation/components/UnitAI.js UnitAI.prototype.AttackVisibleEntity@simulation/components/UnitAI.js UnitAI.prototype.RespondToTargetedEntities@simulation/components/UnitAI.js UnitAI.prototype.AttackEntitiesByPreference@simulation/components/UnitAI.js UnitAI.prototype.FindNewTargets@simulation/components/UnitAI.js UnitAI.prototype.UnitFsmSpec.INDIVIDUAL.IDLE.enter@simulation/components/UnitAI.js
It causes the out of memory error which causes the crash. Sadly the OOM error isn't processed very well by SM.
comment:2 by , 5 years ago
Summary: | Segfault on a24 → UnitAI infinite loop / segfault on a24 |
---|
comment:3 by , 5 years ago
as said on IRC:
[12:55] <wraitii> the cause is that Combat.Approaching says we can't move to attack an enemy, which causes it to refuse the Attack order, which means we go back to IDLE, which then right away finds a new target, rinse, repeat [12:56] <wraitii> so either MoveToTargetRange is wrong, and should return true and the order accepted, or it is right, and then I don't think we have a much better choice than to make "Find New Target" part of the stuff we do in Idle-Timer instead of Idle-enter
After some digging, this happens for formation members, since MoveToTargetAttackRange returns false. That formations members go through INDIVIDUAL.IDLE seems buggy.
comment:4 by , 5 years ago
I haven't found the exact stack trace, nor been able to reproduce this in a simple example, but this is happening because:
- FinishOrder puts units in the IDLE state if there are no more orders (start of the infinite loop))
- Idle units, in the enter state (aka right away) check for new targets
- This gives us an Order.Attack order, which moves us in INDIVIDUAL.ATTACKINg
- That calls MoveToTargetAttackRange which fails for formation members if the target is attacking (NB, unlike CheckTargetAttackRange, there is no check that said target is the same as the formation controller or not - that seems possibly a mistake, but it's un-related). If so, it returns false
- this rejects the order, putting us back in IDLE state.
I see two changes possible
- make INDIVIDUAL.IDLE not check for new targets if we are a formation member (probably makes sense).
- change MoveToTargetAttackRange to not return false if we are a formation member and the controller is attacking (probably makes sense).
comment:5 by , 5 years ago
Patch: | → Phab:D2008 |
---|
comment:6 by , 5 years ago
A way to reproduce at r22418 (and some revisions earlier): fly a plane above the gaia city in Jebel Barkal
by , 5 years ago
Attachment: | commands_jebel_plane_infinite_loop.txt added |
---|
r22426 (jebel map needs normal size, tiny didnt work for me)
comment:7 by , 5 years ago
In this jebel-plane replay, it's a Nubian archer garrisoned on the wall (entity 6348) that enters the following IDLE state re-enter infinite loop
Stacktrace for r22426 (with 6 lines inserted at SetNextState in L3420):
WARNING: SetNextState 6348 IDLE WARNING: "UnitAI.prototype.SetNextState@simulation/components/UnitAI.js:3423:15 UnitAI.prototype.UnitFsmSpec[\"Order.Garrison\"]@simulation/components/UnitAI.js:591:4 FSM.prototype.ProcessMessage@simulation/helpers/FSM.js:265:12 UnitAI.prototype.FinishOrder@simulation/components/UnitAI.js:3467:13 UnitAI.prototype.UnitFsmSpec[\"Order.Attack\"]@simulation/components/UnitAI.js:430:4 FSM.prototype.ProcessMessage@simulation/helpers/FSM.js:265:12 UnitAI.prototype.PushOrderFront@simulation/components/UnitAI.js:3558:13 UnitAI.prototype.AttackVisibleEntity@simulation/components/UnitAI.js:4489:1 UnitAI.prototype.RespondToTargetedEntities@simulation/components/UnitAI.js:4526:1 UnitAI.prototype.AttackEntitiesByPreference@simulation/components/UnitAI.js:5906:9 UnitAI.prototype.FindNewTargets@simulation/components/UnitAI.js:5367:9 UnitAI.prototype.UnitFsmSpec.INDIVIDUAL.IDLE.enter@simulation/components/UnitAI.js:1436:9 FSM.prototype.SwitchToNextState@simulation/helpers/FSM.js:366:8 FSM.prototype.ProcessMessage@simulation/helpers/FSM.js:274:3 UnitAI.prototype.FinishOrder@simulation/components/UnitAI.js:3467:13
comment:8 by , 5 years ago
That jebel-plane / wall-archer looping was introduced in r22023 (found using bisection), the according stack is:
"Order.Attack": function(msg) { // If we can't reach the target, but are standing ground, then abandon this attack order. // Unless we're hunting, that's a special case where we should continue attacking our target. if (this.GetStance().respondStandGround && !this.order.data.force && !this.order.data.hunting || this.IsTurret()) { this.FinishOrder(); return; } UnitAI.prototype.RespondToTargetedEntities = function(ents) if (this.GetStance().respondStandGround) return this.AttackVisibleEntity(ents); UnitAI.prototype.AttackVisibleEntity = function(ents) this.PushOrderFront("Attack", { "target": target, "force": false, "allowCapture": true }); UnitAI.prototype.AttackEntitiesByPreference = function(ents) return this.RespondToTargetedEntities(entsWithoutPref); UnitAI.prototype.FindNewTargets = function() return this.AttackEntitiesByPreference(cmpRangeManager.ResetActiveQuery(this.losRangeQuery)); "IDLE": { "enter": function() { if (this.FindNewTargets())
comment:9 by , 5 years ago
I think the issue is that the archer is garrisoned, thus cannot move, so we enter Idle -> Find New Target -> Order.ATtack -> ATTACK.APPROACHING -> Fail to move -> Finish Order -> Idle.
I've just had a loop where for some reason a unit entered the LOS query of an enemy, but wasn't actually visible to that enemy, and bam, endless loop.
Soooo... I'm getting really convinced that the FindNewTarget() call needs to be moved to the timer.
comment:10 by , 5 years ago
Incorrect: the Jebel crash is because the unit is IDLE. It finds a new target, so tries to attack, but can't reach it. Since it's a garrisoned unit (ie. turret), it calls "FinishOrder" in Order.Attack. That makes us go through the Order.Garrison order that we have as a turret, which calls SetNextState("IDLE"), which loops all over again since there is FindNewTargets in IDLE.enter.
So the jebel crash is caused by rP22023, the rest aren't, but FindNewTargets is a common thread in all of these.
comment:11 by , 5 years ago
comment:12 by , 5 years ago
Old unitAI infinite loop cases:
r21594/#5079 Fix infinite UnitAI loop when units are moved out of world (after promotion)
r19900 protect possible infinite loop when an order is issued on the same turn as garrisoning is effectively processed
r14665 Fix infinite loop when attacking a formation without valid targets
r14638 Use a filter to pick a new attack target in the same turn, while still avoiding an infinite loop
comment:14 by , 5 years ago
Keywords: | regression added |
---|
r22393, segfault on turn 2607