Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#5521 closed defect (fixed)

Some ComputeShortPath 700ms simulation lagspike over 40 seconds

Reported by: elexis Owned by: wraitii
Priority: Release Blocker Milestone: Alpha 24
Component: Simulation Keywords: regression
Cc: Patch:

Description (last modified by elexis)

Between turn 1920 and turn 2000 in this r22504 replay, computing the turn took more than the average 80ms, even 5 times more than the otherwise maximum of 140ms - 700ms.

Notice that this is quite long - from 16:00 to 16:40.

The part that is slow is ComputeShortPath and Process Short Requests.

replay at https://code.wildfiregames.com/F1008256
profile1 json at https://code.wildfiregames.com/F1008316

What is particularly surprising is that there are not many units walking around, maybe < 200 that werent gathering?

Perhaps this isn't a regression, and perhaps it is a duplicate ticket.

In early 2018 temple looked a bit at lagspikes and mentioned that one type of lagspike occurs when every path is obstructed (all gates closed) and there was a gate-lock-command in this replay. (But does it have to be that bad for performance?)

https://trac.wildfiregames.com/raw-attachment/ticket/5521/Screenshot%20from%202019-07-20%2002-39-22.png

Attachments (4)

Screenshot from 2019-07-20 02-39-22.png (174.6 KB ) - added by elexis 5 years ago.
Capture d’écran 2019-07-20 à 09.59.17.png (743.8 KB ) - added by wraitii 5 years ago.
Non-visual replayy - Profiler2 of the same Turns
commands.txt (8.1 KB ) - added by wraitii 5 years ago.
Minimal replay of similar lag
shortpath.jpg (334.5 KB ) - added by wraitii 5 years ago.
Picture of the short path in the forest

Download all attachments as: .zip

Change History (10)

comment:1 by elexis, 5 years ago

Description: modified (diff)

comment:2 by elexis, 5 years ago

And without the patch in Phab:D2098 I wouldn't have found the part of the code that is triggering the lagspike (as either the Profiler2 UI capped it or I used it incorrectly).

by wraitii, 5 years ago

Non-visual replayy - Profiler2 of the same Turns

by wraitii, 5 years ago

Attachment: commands.txt added

Minimal replay of similar lag

by wraitii, 5 years ago

Attachment: shortpath.jpg added

Picture of the short path in the forest

comment:3 by wraitii, 5 years ago

Milestone: BacklogAlpha 24
Priority: Release BlockerMust Have

See attached screenshot for a profiler2 non-visual replay of the same.

As can be seen, several of these turns take over 800ms to compute. The cause is the few (~10) ComputeShortPath calls which are each taking 80ms.

Attached is a minimal replay of the same lag, and a screenshot of how the short-path must take many, many nodes into account.

This is a worst-case for the short pathfinder: none of the trees are AA squares, and the path is sometimes unreachable because of surrounding units. One fix in UM would be to fail faster when walking (aka having a higher max-range). But the fundamental issue is that this is a worst-case for the short pathfinder and we can't do much against it here. Probably unit-pushing would help in this case as we would not be computing as short path.

I'm downgrading this to "Must Have" - It might happen, but it's fairly rare that maps generate such big forests that aren't completely impassable. The UM issue of trying to go somewhere unreachable I plan to alleviate somewhat downstream by failing earlier when the order is walking or such.

--

A 'map' fix would be to clump such trees close so that units can no longer enter them, which would probably help.

comment:4 by wraitii, 5 years ago

Owner: set to wraitii
Priority: Must HaveRelease Blocker

Re-upgrading to Release Blocker, as this actually doesn't seem to happen on A23 (or at least much more rarely, from my testing).

The 'good' news is I tried reverting r22473 and it didn't fix the lag, so that bug fix commit can stay. Iit appears in general to be more tied to how UM no longer fails so early, thus repeatedly computing short paths when the goal is unreachable, whereas before with "ShouldConsiderOurselvesAtDestination" it didn't.

Gonna try a few things and see how I can mitigate this, possibly I'll just have to do the 'more messages' thing I had in mind.

comment:5 by wraitii, 5 years ago

Resolution: fixed
Status: newclosed

In 22526:

UnitMotion - Send messages to UnitAI when obstructed, to allow stopping early when walking and avoiding pathfinding lag.

As reported by #5521, Ordering units to walk to a point in a forest can lag terribly, as units will end up computing long short paths in the forest, which can result in ComputeShortPath calls that individually take upwards of 80ms.

This can be alleviated by allowing units to stop a bit earlier. A23 handled this in UnitMotion directly, but it's better to handle this in UnitAI to avoid surprises and to make it customisable on a per-state level.

This diff further speeds up using the short pathfinder by starting with a smaller search range, limiting the max-range more and moving the range slightly towards the goal.

This also refactors UM sending messages to UnitAI so that we may in the future push more information (in particular, the entity_id that a unit was obstructed by could be interesting for COMBAT).

This doesn't fix the possibility of lag, but it reduces its occurrence to levels that should be similar to A23 and thus acceptable enough.

Tested By: Freagarach

Fixes #5521

Differential Revision: https://code.wildfiregames.com/D2105

comment:6 by elexis, 5 years ago

Keywords: regression added
Note: See TracTickets for help on using tickets.