Opened 2 years ago

Last modified 12 months ago

#3588 new defect

General pathfinder optimizations

Reported by: wraitii Owned by:
Priority: Should Have Milestone: Backlog
Component: UI & Simulation Keywords: pathfinding
Cc: Itms Patch:

Description

For the short-range pathfinder, see #1942.

This ticket deals with various ways to optimize unit motion further. There are a few components at play here:

  • CmpUnitMotion?, which decides how units move, and makes the move.
  • Long-range pathfinder, which uses the hierarchical pathfinder and JPS
  • Short-range (vertex) pathfinder, to compute shorter distances that take units into account
  • ObstructionManager?, which keeps track of all obstructions in the game (buildings, tree, units, etc.)

Some profiling on A19 games (circa [17120]), on real MP 3V3/4V4 games have shown me the following.

On the C++ side, a turn is basically UpdateComponents?, which is:

So it's basically 70% pathfinder, 10% unitMotion (mostly sending messages), and the rest is the range manager running queries.

There are a few more interesting details:

  • UpdateGrid? is 50% HierarchicalPathf? update, which is slow as it uses std::maps, mostly.
    • 20% rasterization in the obstructionManager
  • The pathfinder itself is about 50% short-range pathfinder, 20% long.
    • of the Long-range pathfinder, it's 15% (!!!) MakeGoalReachable? in the hierarchical one, the JPS is basically only 3 to 4%
    • short-range is a bloody mess, it's all in small bits in ComputeShortPath?, see #1942

So what we can tell here is that the hierarchical pathfinder is slow. It should probably be moved to use another container than map, and we need to make MakeGoalReachable? MUCH faster.

MakeGoalReachable? tries to find the accessible goal navcell that's closest to us. This ends up basically being a pathfinding problem, and I think we should run a raw form of A* in some situations, as it would be much faster than iterating over all cells (96*96) which is what we currently do in most cases. This is basically the function "RegionNearestNavcellInGoal?", which should be optimized. Also needing optimization: PathGoal::NavcellContainsGoal?.

Also, we do not necessarily need exactly the closest cell accessible to us, so we should run with some approximations here.We might even get away with letting UnitMotion? handle the very end of the path.

For information, UnitMotion? is 30% CheckMovement? and 25% MoveAndTurnTo?, because the former uses "TestLine?" and the latter broadcasts a message toalmost all entities on Earth. The use of "GetUnitsOnObstruction?" is also bad, and should be remedied somewhat.

Attachments (13)

optimize-rasterize.patch (2.7 KB) - added by mimo 2 years ago.
short-rangememusage.patch (6.0 KB) - added by wraitii 2 years ago.
profileReference.png (236.9 KB) - added by mimo 2 years ago.
profiling on a non-visual replay 2vs2 with 4 Ais
geometry.patch (1.7 KB) - added by mimo 2 years ago.
inlining of Geometry::PointIsInSquare?
profileR17297.png (228.6 KB) - added by mimo 2 years ago.
profileR17310.png (173.2 KB) - added by mimo 2 years ago.
InitRegions.patch (2.2 KB) - added by fsincos 21 months ago.
Cleanup1.patch (11.5 KB) - added by fsincos 20 months ago.
geometry2.patch (9.8 KB) - added by fsincos 20 months ago.
Rough first version of the mentioned CheckVisibility? trick.
geometry1.patch (4.8 KB) - added by fsincos 19 months ago.
Small geometry cleanup to enable further optimizations.
geometry1.2.patch (11.1 KB) - added by fsincos 19 months ago.
Initial implementation of replacing CheckVisibility? with a TestRaySquare?-based version.
UnorderedSetHierPath.patch (3.8 KB) - added by wraitii 18 months ago.
profile_unordered.png (183.5 KB) - added by Itms 14 months ago.

Download all attachments as: .zip

Change History (43)

comment:1 Changed 2 years ago by wraitii

Some comments about memory.

Those numbers are mostly indicative, they are not over the course of a whole game (mostly the middle of a MP 3V3), since I was too lazy to wait for the end of the replay:

Looking at transient/temporary memory:

  • ProcessSameTurnMoves? is 60% in terms of size, Pathf::UpdateGrid? 10%. The short pathfinder is about 50% of those. See #1942
  • The most often allocated slots are of course rather small. 32, 48, 64 bytes. The highest in quantity is the few kb ones, such as the 1kb, 32KB. Those are most often allocated by the JS.
  • That being said, the short-range pathfinder seems to use a ton of temporary allocations when it builds the edges. Apparently we use "push_back" a lot, and not the right way.

Changed 2 years ago by mimo

Attachment: optimize-rasterize.patch added

comment:2 Changed 2 years ago by mimo

I had a look at the RasterizeRectWithClearance? function that used quite some time in your profiles, see http://irclogs.wildfiregames.com/2015-11-08-QuakeNet-%230ad-dev.log Here is a patch which should decrease its time by a factor 2 or more (I've checked that it gives exactly the same Hash code, but didn't profile it). Could go to a19 when confirmed.

comment:3 Changed 2 years ago by mimo

optimize-rasterize.patch commited in r17212

Changed 2 years ago by wraitii

Attachment: short-rangememusage.patch added

comment:4 Changed 2 years ago by wraitii

Good, thanks for your work!

Attache patch, short-rangememusage.patch, is an aggressive optimization of the short-range pathfinder's memory footprint (see #1942). I haven't reprofiled the memory usage but it should be way down.

This both optimizes the pathfinder itself (by around 5/10%, it seems) and the game, as it reduces fragmentation (and the short-range pathfinder was basically responsible for 50% of the C++'s allocations). Over the course of a complete game I think it could speed the pathfinder up by around 10%.

The price to pay is an increased ugliness to the code itself, because I haven't tried using memory pools yet. A proper fix would use those, but I wanted proof of concept.

I'm running some numbers to see if it would still be interesting for A19 or not.

comment:5 Changed 2 years ago by wraitii

I ran a comparison with vanilla SVN: the game does seem to run substantially smoother. I ran 2 instances with the optimization, one without, an the total time to process 3000 turns of an MP 3V3 nomad game were:

-Opt run 1: 192000 ms with 24% in the short-range pathfinder

-SVN run 1: 235000 ms with 27% in the short-range pathfinder

-Opt run 2: 217000 ms with 24% in the short-range pathfinder

Given that the other components, in proportion, took more time, I think it's safe to say this optimization speeds the game up by about 3 to 4%, and possibly more (up to about 10%).

Changed 2 years ago by mimo

Attachment: profileReference.png added

profiling on a non-visual replay 2vs2 with 4 Ais

Changed 2 years ago by mimo

Attachment: geometry.patch added

inlining of Geometry::PointIsInSquare?

comment:6 Changed 2 years ago by mimo

Looking at the profile given here, the function PointIsInSquare? is a good candidate to be inlined. Here is a patch doing that. The gain seems to be about 0.4% on nonvisual replays.

comment:7 Changed 2 years ago by wraitii

In 17236:

Optimize Geometry::PointIsInSquare?, which is used quite often in performance hotspots.
Original patch by mimo. Refs #3588

comment:8 Changed 2 years ago by stanislas69

Refs #3541

Changed 2 years ago by mimo

Attachment: profileR17297.png added

comment:9 Changed 2 years ago by mimo

The Hierarchical pathfinder was optimized in r17284, resulting in the new profile (profileR17297.png) attached. Comparing both, we see that the huge number (63 millions) of calls to NavcellContainsGoal? is now gone, and consequently, the regionNearestNavcellInGoal which represented 6.4% of the replay time is now reduced to 1.1% From this profile, we also see that the updateGrids (both for the Hierarchical and long pathfinder) account each for 15% of the replay time.

comment:10 Changed 2 years ago by mimo

In 17310:

improves performance of hierarchical pathFinder, refs #3588

Changed 2 years ago by mimo

Attachment: profileR17310.png added

comment:11 Changed 2 years ago by mimo

As can be seen comparing the profiles from r17297 and r17310, the HierarchicalPathfinder? function FindEdges? drops from 5.6% of replay time to 1.7%, most of the time being due to a large amount of insert trials (34 millions from the r17297 profile, now decreased to 2.6 millions).

comment:12 Changed 2 years ago by Itms

Cc: Itms added; itms removed

heh :)

comment:13 Changed 2 years ago by mimo

In 17350:

improve pathFinder by optimizing the DistanceToSquare? computation, refs #3588

comment:14 Changed 2 years ago by mimo

In the previous profile, we see that the function DistanceToSquareSquared? takes about 2% of the replay time. r17350 simplifies the code (taking advantage of the symmetries of the problem) and makes it faster, decreasing it to 1.3%.

comment:15 Changed 2 years ago by mimo

In 17413:

improve performance of hierarchical pathfinder, refs #3588

comment:16 Changed 2 years ago by mimo

As seen from the previous cachegrind profile, the grid update of the hierarchical pathfinder takes a large part of the non-visual replay: in r17412, it represents 11% with 7% internal to the Update function. The patch in r17413 changes the way the loops are done, reducing this to 7% for the grid update with 3% internal to the Update function.

comment:17 Changed 21 months ago by Itms

Milestone: Alpha 20Alpha 21

This is still a work under progress so I don't backlog it, but pushing it so it doesn't delay the release.

Changed 21 months ago by fsincos

Attachment: InitRegions.patch added

comment:18 Changed 21 months ago by fsincos

Keywords: review added

Some micro-optimizations in HierarchicalPathfinder::Chunk::InitRegions?:

  • The check for infinite loops is not needed since RootID(n,...) terminates after < n+1 steps (x gets strictly smaller with each step).
  • We can avoid some RootID calls by checking if we just connected the same IDs, i.e. if the previous DownID and the previous LeftID were non-zero.
  • Computing the RootIDs can be simplified: If we computed RootID(1, ...), ..., RootID(n, ...), the value of RootID(n+1, ...) is simply either n+1 (if it's a fixed point) or the RootID of whichever ID it gets mapped to (which we already computed).

comment:19 in reply to:  18 Changed 21 months ago by fabio

Replying to fsincos:

Some micro-optimizations in HierarchicalPathfinder::Chunk::InitRegions?:

I would suggest to open a new ticket for this, to avoid forgetting it in this one, already too long. If it's worth, you may want to add a graph showing performance improvement, as described here.

comment:20 Changed 21 months ago by Itms

That's ok if you want to keep it here, I won't forget it. Else if you create a new ticket please put me in Copy. :)

comment:21 Changed 20 months ago by Itms

Hi, I took a look at the patch and it looks nice :)

I have one comment and one question:

  • I am going to remove your last change, because it's not trivial to prove this induction, so the code is less clear, and in any case you don't save anything (your proof is right, so connect[connect[i]]; is either equivalent either worse than calling RootID).
  • Would you like to have your name in the credits? If yes, do you want both your nickname and real name or only one in them?

Also I read in the IRC logs that you were working on the vertex pathfinder the last time you were active. If you don't succeed at something, the vertex pathfinder will probably be replaced so it's not too much a problem if you can't manage to improve it :)

Thanks for your work!

Changed 20 months ago by fsincos

Attachment: Cleanup1.patch added

comment:22 Changed 20 months ago by fsincos

  • That last change (connect[connect[i]] instead of RootID) is indeed unneeded.
  • If I contribute something significant, I'll think about putting my name on the credits. However, at the moment I'm just doing some small stuff for fun.

To say I've worked on the short-range pathfinder would be an overstatement, but I've put together a couple of small patches. This one is the first of them (took me about 20 minutes, so it's not like it's much). The memory fragmentation gets reduced a bit and the code is a bit cleaner. Barring oversights, it should produce exactly the same results; if that isn't needed the number of terrain edges can be reduced by only ending edges if that side is passable (this combines some edges). The longer edges resulting from this would also be more likely to block rays.

As a bonus, the CheckVisibility? functions have been updated to eliminate shifts and overflows from the dot products. The CompareLength? functions could receive the same treatment (x*x + y*y > u*u + v*v <=> x*x - u*u > v*v - y*y).

comment:23 Changed 20 months ago by Itms

In 18011:

Slight improvement of the flood filling algorithm in the hierarchical pathfinder.
Remove some useless check and a useless reverse loop, and add a flag to prevent redundant checks.

Patch by fsincos, refs #3588

comment:24 Changed 20 months ago by fsincos

Here is a small cleanup / optimization of the TestRaySquare? functions and the SquareSAT function in source/simulation2/helpers/Geometry.cpp. The trick (all of k +- l +- m have the same sign) <=> |k| > |l| + |m| allows us to reduce the amount of multiplications done.

This trick could enable checking visibility against squares directly (as opposed to checking each edge) which would remove one of the bigger memory problems (no more splitting required). It might also be faster from a pure calculation standpoint, but I don't know enough about the frequencies of the code paths to state this with any certainty.

Changed 20 months ago by fsincos

Attachment: geometry2.patch added

Rough first version of the mentioned CheckVisibility? trick.

comment:25 Changed 19 months ago by Itms

Keywords: review removed

I really like both of the patches!

I have a general comment about the changes: it would be nice to better document them and add as much geometrical explanations as possible.

For instance, could you rename CrossGTZero to something that would be more meaningful? The comment already explains what it computes but it could use a description in terms of geometry. I can't figure out what it does, given two vectors on a piece of paper

Another example: you changed the TestRaySquare documentation: now it's impossible to determine where the separating axis theorem is used because it's hidden in the optimized computations. You should add details.

As a rule of thumb, each time you delete a comment, or when you add a non-trivial optimization, please write some documentation :)

Keep up the good work!

Changed 19 months ago by fsincos

Attachment: geometry1.patch added

Small geometry cleanup to enable further optimizations.

Changed 19 months ago by fsincos

Attachment: geometry1.2.patch added

Initial implementation of replacing CheckVisibility? with a TestRaySquare?-based version.

comment:26 Changed 18 months ago by wraitii

In 18349:

Reuse vectors in the short-range pathfinder, making SplitAAEdges much faster and reducing memory fragmentation substantially. Refs #3588

Changed 18 months ago by wraitii

Attachment: UnorderedSetHierPath.patch added

comment:27 Changed 18 months ago by wraitii

Attached patch uses a hash table to store reachable regions, which from testing should be around 2 to 3 times as fast. I had a checksum change but I need to check with a clean SVN.

comment:28 Changed 14 months ago by Itms

Keywords: patch rfc added
Milestone: Alpha 21Alpha 22

Grmbl, keywords everyone!

The latest patch from wraitii does change the simulation state, and it does not improve at all the performance as far as I could see (see attached profiling graph, for a large game). So IMO it should be forgotten.

I will review fsincos patches after the release of A21.

Changed 14 months ago by Itms

Attachment: profile_unordered.png added

comment:29 Changed 12 months ago by elexis

Milestone: Alpha 22Work In Progress

Moving to the new WIP milestone.

comment:30 Changed 12 months ago by wraitii

Keywords: patch rfc removed
Milestone: Work In ProgressBacklog

Backlogging, no real progress or timeline and I have specific patches in other tickets (namely #4327, #4340, #4324).

Note: See TracTickets for help on using tickets.