Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#5460 closed defect (fixed)

UnitAI infinite loop / segfault on a24

Reported by: elexis Owned by: wraitii
Priority: Release Blocker Milestone: Alpha 24
Component: UI & Simulation Keywords: regression
Cc: Patch: Phab:D2008

Description

While testing on r22393 with gcc 9.1.0:

Turn 2599 (200)...
Turn 2600 (200)...
Turn 2601 (200)...
Turn 2602 (200)...
Turn 2603 (200)...
Turn 2604 (200)...
Turn 2605 (200)...
Turn 2606 (200)...
Turn 2607 (200)...
 
Thread 1 "pyrogenesis" received signal SIGSEGV, Segmentation fault.
ScriptInterface::ToJSVal<IComponent*> (cx=cx@entry=0x555555dd75b0, ret=ret@entry=..., val=@0x7fffff7ff0c8: 0x555557baa750) at ../../../source/simulation2/scripting/EngineScriptConversions.cpp:38
38      JSAutoRequest rq(cx);
(gdb) info stack
#0  ScriptInterface::ToJSVal<IComponent*> (cx=cx@entry=0x555555dd75b0, ret=ret@entry=..., val=@0x7fffff7ff0c8: 0x555557baa750) at ../../../source/simulation2/scripting/EngineScriptConversions.cpp:38
#1  0x000055555567e547 in ScriptInterface::AssignOrToJSVal<IComponent*> (a=@0x7fffff7ff0c8: 0x555557baa750, handle=..., cx=0x555555dd75b0) at ../../../source/scriptinterface/ScriptInterface.h:446
#2  ScriptInterface::AssignOrToJSValUnrooted<IComponent*> (a=@0x7fffff7ff0c8: 0x555557baa750, handle=..., cx=0x555555dd75b0) at ../../../source/scriptinterface/ScriptInterface.h:349
#3  ScriptInterface_NativeWrapper<IComponent*>::call<IComponent* (ScriptInterface::CxPrivate*, int, int), int, int>(JSContext*, JS::MutableHandle<JS::Value>, IComponent* (ScriptInterface::CxPrivate*, int, int), int, int) (
    fptr=0x555555673ac0 <CComponentManager::Script_QueryInterface(ScriptInterface::CxPrivate*, int, int)>, rval=..., cx=0x555555dd75b0) at ../../../source/scriptinterface/NativeWrapperDefns.h:75
#4  ScriptInterface::call<IComponent*, int, int, &CComponentManager::Script_QueryInterface> (cx=0x555555dd75b0, argc=<optimized out>, vp=0x7fffff7ff168) at ../../../source/scriptinterface/NativeWrapperDefns.h:125
#5  0x00007fffd060833c in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb)

Reproducible for me in non-visual replay mode, see attachment.

Stan could not reproduce it on Windows.

(02:10:30 PM) Stan-laptop: TIMER| profile2 get buffer: 436.645 ms TIMER| profile2 visitor: 1.24783 s TIMER| profile2 query: 1.68587 s # Final state: bfb47daf7b73556810ca68e39fb27b5f

Attachments (2)

commands.txt (59.2 KB ) - added by elexis 5 years ago.
r22393, segfault on turn 2607
commands_jebel_plane_infinite_loop.txt (9.6 KB ) - added by elexis 5 years ago.
r22426 (jebel map needs normal size, tiny didnt work for me)

Download all attachments as: .zip

Change History (16)

by elexis, 5 years ago

Attachment: commands.txt added

r22393, segfault on turn 2607

comment:1 by Vladislav Belov, 5 years ago

There is an infinity loop:

UnitAI.prototype.UnitFsmSpec.INDIVIDUAL.IDLE.enter@simulation/components/UnitAI.js
FSM.prototype.SwitchToNextState@simulation/helpers/FSM.js
FSM.prototype.ProcessMessage@simulation/helpers/FSM.js
UnitAI.prototype.PushOrderFront@simulation/components/UnitAI.js
UnitAI.prototype.AttackVisibleEntity@simulation/components/UnitAI.js
UnitAI.prototype.RespondToTargetedEntities@simulation/components/UnitAI.js
UnitAI.prototype.AttackEntitiesByPreference@simulation/components/UnitAI.js
UnitAI.prototype.FindNewTargets@simulation/components/UnitAI.js
UnitAI.prototype.UnitFsmSpec.INDIVIDUAL.IDLE.enter@simulation/components/UnitAI.js

It causes the out of memory error which causes the crash. Sadly the OOM error isn't processed very well by SM.

comment:2 by elexis, 5 years ago

Summary: Segfault on a24UnitAI infinite loop / segfault on a24

comment:3 by wraitii, 5 years ago

as said on IRC:

[12:55] <wraitii> the cause is that Combat.Approaching says we can't move to attack an enemy, which causes it to refuse the Attack order, which means we go back to IDLE, which then right away finds a new target, rinse, repeat [12:56] <wraitii> so either MoveToTargetRange is wrong, and should return true and the order accepted, or it is right, and then I don't think we have a much better choice than to make "Find New Target" part of the stuff we do in Idle-Timer instead of Idle-enter

After some digging, this happens for formation members, since MoveToTargetAttackRange returns false. That formations members go through INDIVIDUAL.IDLE seems buggy.

Last edited 5 years ago by wraitii (previous) (diff)

comment:4 by wraitii, 5 years ago

I haven't found the exact stack trace, nor been able to reproduce this in a simple example, but this is happening because:

  • FinishOrder puts units in the IDLE state if there are no more orders (start of the infinite loop))
  • Idle units, in the enter state (aka right away) check for new targets
  • This gives us an Order.Attack order, which moves us in INDIVIDUAL.ATTACKINg
  • That calls MoveToTargetAttackRange which fails for formation members if the target is attacking (NB, unlike CheckTargetAttackRange, there is no check that said target is the same as the formation controller or not - that seems possibly a mistake, but it's un-related). If so, it returns false
  • this rejects the order, putting us back in IDLE state.

I see two changes possible

  • make INDIVIDUAL.IDLE not check for new targets if we are a formation member (probably makes sense).
  • change MoveToTargetAttackRange to not return false if we are a formation member and the controller is attacking (probably makes sense).

comment:5 by elexis, 5 years ago

Patch: Phab:D2008

comment:6 by bb, 5 years ago

A way to reproduce at r22418 (and some revisions earlier): fly a plane above the gaia city in Jebel Barkal

Last edited 5 years ago by bb (previous) (diff)

by elexis, 5 years ago

r22426 (jebel map needs normal size, tiny didnt work for me)

comment:7 by elexis, 5 years ago

In this jebel-plane replay, it's a Nubian archer garrisoned on the wall (entity 6348) that enters the following IDLE state re-enter infinite loop

Stacktrace for r22426 (with 6 lines inserted at SetNextState in L3420):

WARNING: SetNextState 6348 IDLE
WARNING: "UnitAI.prototype.SetNextState@simulation/components/UnitAI.js:3423:15
UnitAI.prototype.UnitFsmSpec[\"Order.Garrison\"]@simulation/components/UnitAI.js:591:4
FSM.prototype.ProcessMessage@simulation/helpers/FSM.js:265:12
UnitAI.prototype.FinishOrder@simulation/components/UnitAI.js:3467:13
UnitAI.prototype.UnitFsmSpec[\"Order.Attack\"]@simulation/components/UnitAI.js:430:4
FSM.prototype.ProcessMessage@simulation/helpers/FSM.js:265:12
UnitAI.prototype.PushOrderFront@simulation/components/UnitAI.js:3558:13
UnitAI.prototype.AttackVisibleEntity@simulation/components/UnitAI.js:4489:1
UnitAI.prototype.RespondToTargetedEntities@simulation/components/UnitAI.js:4526:1
UnitAI.prototype.AttackEntitiesByPreference@simulation/components/UnitAI.js:5906:9
UnitAI.prototype.FindNewTargets@simulation/components/UnitAI.js:5367:9
UnitAI.prototype.UnitFsmSpec.INDIVIDUAL.IDLE.enter@simulation/components/UnitAI.js:1436:9
FSM.prototype.SwitchToNextState@simulation/helpers/FSM.js:366:8
FSM.prototype.ProcessMessage@simulation/helpers/FSM.js:274:3
UnitAI.prototype.FinishOrder@simulation/components/UnitAI.js:3467:13
Last edited 5 years ago by elexis (previous) (diff)

comment:8 by elexis, 5 years ago

That jebel-plane / wall-archer looping was introduced in r22023 (found using bisection), the according stack is:

	"Order.Attack": function(msg) {
		// If we can't reach the target, but are standing ground, then abandon this attack order.
		// Unless we're hunting, that's a special case where we should continue attacking our target.
		if (this.GetStance().respondStandGround && !this.order.data.force && !this.order.data.hunting || this.IsTurret())
		{
			this.FinishOrder();
			return;
		}


	UnitAI.prototype.RespondToTargetedEntities = function(ents)
		if (this.GetStance().respondStandGround)
			return this.AttackVisibleEntity(ents);

UnitAI.prototype.AttackVisibleEntity = function(ents)
	this.PushOrderFront("Attack", { "target": target, "force": false, "allowCapture": true });

UnitAI.prototype.AttackEntitiesByPreference = function(ents)
	return this.RespondToTargetedEntities(entsWithoutPref);

UnitAI.prototype.FindNewTargets = function()
	return this.AttackEntitiesByPreference(cmpRangeManager.ResetActiveQuery(this.losRangeQuery));

		"IDLE": {
			"enter": function() {
				if (this.FindNewTargets())

comment:9 by wraitii, 5 years ago

I think the issue is that the archer is garrisoned, thus cannot move, so we enter Idle -> Find New Target -> Order.ATtack -> ATTACK.APPROACHING -> Fail to move -> Finish Order -> Idle.

I've just had a loop where for some reason a unit entered the LOS query of an enemy, but wasn't actually visible to that enemy, and bam, endless loop.

Soooo... I'm getting really convinced that the FindNewTarget() call needs to be moved to the timer.

comment:10 by wraitii, 5 years ago

Incorrect: the Jebel crash is because the unit is IDLE. It finds a new target, so tries to attack, but can't reach it. Since it's a garrisoned unit (ie. turret), it calls "FinishOrder" in Order.Attack. That makes us go through the Order.Garrison order that we have as a turret, which calls SetNextState("IDLE"), which loops all over again since there is FindNewTargets in IDLE.enter.

So the jebel crash is caused by r22023, the rest aren't, but FindNewTargets is a common thread in all of these.

Last edited 5 years ago by elexis (previous) (diff)

comment:11 by wraitii, 5 years ago

BTW I think the reason why we seem to have more infinite loops right now is that r22313 moved some checks from orders to states, so the check introduced in r22059 is not protecting us from as many problems.

Last edited 5 years ago by elexis (previous) (diff)

comment:12 by elexis, 5 years ago

Old unitAI infinite loop cases:
r21594/#5079 Fix infinite UnitAI loop when units are moved out of world (after promotion)
r19900 protect possible infinite loop when an order is issued on the same turn as garrisoning is effectively processed
r14665 Fix infinite loop when attacking a formation without valid targets
r14638 Use a filter to pick a new attack target in the same turn, while still avoiding an infinite loop

comment:13 by wraitii, 5 years ago

Owner: set to wraitii
Resolution: fixed
Status: newclosed

In 22475:

Fix IDLE-related infinite loops by moving stateful calls to IDLE.timer.

The game currently has several infinite loops, and the stack trace is always a variation on the same pattern that units go through IDLE as a default state, immediately try another order (possibly entering a new state), failing, and goes back to IDLE.

IDLE.Timer is a general workaround for this issue. It wastes a turn every time a unit goes idle, so should a better solution be found for the general problem of infinite loops in UnitAI, it should be removed.

This revert rP22059 which was intended as a safety net, but couldn't protect againt infinite loops between two different states.

Fixes #5460

Differential Revision: https://code.wildfiregames.com/D2041

comment:14 by elexis, 5 years ago

Keywords: regression added
Note: See TracTickets for help on using tickets.