Opened 11 years ago

Last modified 5 years ago

#2179 new task

Optimization of "gui sim update" needed when large groups of units are selected.

Reported by: RefinedCode Owned by:
Priority: Should Have Milestone: Backlog
Component: UI – In-game Keywords:
Cc: Patch:

Description (last modified by RefinedCode)

If you open up the in game profiler, and then select a large number of units you can see a huge increase to "gui sim update". In my case selecting 100 Longbowmen causes the msec/turn to increase from ~1 to ~48 which is more than a third of what the rendering takes. This causes a very noticeable choppiness in game.

I measured execution time of various functions and determined the cause of this lag stems from the many calls to GetEntityState from EntitySelection.prototype.update in selection.js. Every turn this function is called for every one of the selected units.

There is a much larger performance hit when the jsdebugger is enabled but when I took my measurements when it was off, so it would be a pretty good performance increase if the cause of this lag is fixed.

Attachments (6)

unit_commands.js.patch (1.1 KB ) - added by RefinedCode 11 years ago.
Small optimization of one of the functions called during the selection panel GUI update.
moveFormation.diff (2.9 KB ) - added by mimo 11 years ago.
selectionCache.diff (5.0 KB ) - added by mimo 10 years ago.
Simplify the patch keeping only the small part which had nearly all the time saving
extendedEntityState.diff (13.1 KB ) - added by mimo 10 years ago.
resourceCarried.diff (1.8 KB ) - added by mimo 10 years ago.
js-opcodes-loops.txt (6.6 KB ) - added by agentx 10 years ago.

Download all attachments as: .zip

Change History (42)

comment:1 by RefinedCode, 11 years ago

Description: modified (diff)

comment:2 by mimo, 11 years ago

I suspect that the call to cmpRangeManager.GetElevationAdaptedRange is responsible for (at least part of) that as it does quite a lot of computations. Can you check by commenting these calls or testing with 100 spearmen.

If that's true, that's really wasted time because it seems that the variable elevationAdaptedRange is used by the gui only when one entity is selected (in selection_details.js). So a simple optimization would be to compute it only when the selection contains only one entity, and take the default range otherwise.

Futhermore, for buildings, this elevationAdaptedRange should not vary and would better be cached (maybe in BuildingAI).

comment:3 by RefinedCode, 11 years ago

I don't see any noticeable difference between 100 longbowmen with the elevation range calculation turned off. Thanks for the idea, I will look around more for what is causing this. It will help me get familiar with the code.

Last edited 11 years ago by RefinedCode (previous) (diff)

by RefinedCode, 11 years ago

Attachment: unit_commands.js.patch added

Small optimization of one of the functions called during the selection panel GUI update.

comment:4 by RefinedCode, 11 years ago

I attached a small improvement in performance for the GUI update. When it goes through each entity to find what they can build, it makes a huge array of 99% duplicates. Going through this array to remove duplicates every turn caused a tiny delay.

Still, this is just a small improvement. I don't know how it can be fixed, but the biggie remains to be GetEntityState.

comment:5 by historic_bruno, 11 years ago

This has been discussed before and I think what would help is limiting the number of calls to e.g. GetEntityState by passing a list of entity IDs to a new GetEntitiesState (instead of one call per entity, since there is some overhead copying data between contexts) and maybe also passing in a list of desired components to limit the amount of work that's done and the amount of data returned by GuiInterface.GetEntityState. The latter would complicate the GUI-side caching, so I think we should try the "batching" idea and see what difference it makes.

comment:6 by historic_bruno, 11 years ago

Hmm I tried the batching, but it didn't make much difference :( I guess we need a different approach.

comment:7 by mimo, 11 years ago

Looking a bit into it, CanMoveEntsIntoFormation takes quite some time. When testing with 40 units, the "gui sim update" is about 30 ms and can be reduced to 25 by either optimising CanMoveEntsIntoFormation into simulation/helpers/Commands.js, or caching it in the gui (as it only depends on the selection, there is no use to recompute it every turn). And the gain should be bigger when more units are selected. I attach a patch with the two approaches implemented.

by mimo, 11 years ago

Attachment: moveFormation.diff added

comment:8 by mimo, 10 years ago

I had some time to look a bit more on it, and we can easily same time by two ways

1) we can split GetEntityState in two parts, one containing the most used or the fast to compute variables (GetEntityState) and one containing the less used and slow to compute (GetExtendedEntityState). This is done in the attached patch extendedEntityState.diff

2) we may cache more quantities on the gui side, specially those quantities which depend only on the selected entities. I've already mentionned the possibility for the selection to be in a formation in comment #7, but there are others we can cache. This is done in the attached patch selectionCache.diff

Testing a bit using F11 and only selecting units (no actions), I get for the "gui sim update" timing (in ms)

# units 0 1 8 50
svn 2 8 15 32
extendedEntityState.diff 2 8 13 22
selectionCache.diff 2 8 13 28
both patches 2 7 12 18

comment:9 by mimo, 10 years ago

Keywords: patch review added
Summary: Optimization of "gui sim update" needed when large groups of units are selected.[PATCH] Optimization of "gui sim update" needed when large groups of units are selected.

comment:10 by historic_bruno, 10 years ago

Are those times in ms/frame? What are the timings with ~200 entities selected? Since that is the worst case.

comment:11 by mimo, 10 years ago

Timings are ms/turn as the original report

I've just tested with 100 units, svn = 48 ms/turn, both-patches-applied = 23 ms/turn

comment:12 by mimo, 10 years ago

I've now tested with 200 units (Huge combat demo map). The timings fluctuate a bit, but typical averages are:

  • svn 38 ms/frame or 90 ms/turn
  • selectioncache patch 34 ms/frame or 80 ms/turn
  • extendedEntityState patch 21 ms/frame or 54 ms/turn

The two patchs are completely independant, so that the gain should be additive.

Last edited 10 years ago by fabio (previous) (diff)

comment:13 by historic_bruno, 10 years ago

Milestone: BacklogAlpha 15

by mimo, 10 years ago

Attachment: selectionCache.diff added

Simplify the patch keeping only the small part which had nearly all the time saving

comment:14 by mimo, 10 years ago

Update the two patches (corresponding to the timings of comment 12) to a recent svn

comment:15 by sanderd17, 10 years ago

I like both patches, do you think it can be committed for A15?

comment:16 by mimo, 10 years ago

Yes depending on when is the feature freeze. The selectionCache patch is certainly fine without change, but the extendedEntityState may need to be adapted to latest svn if more variables have been added to EntityState in the last two weeks.

comment:17 by mimo, 10 years ago

First patch (selectioncache.diff) is commited in #14296.

In fact, the gain is bigger that advertized before as previous versions of the patch were still using non-cached quantities. In Huge combat demo, selecting 200 units, we go from about 90 ms/turn to 70 ms/turn with this patch.

by mimo, 10 years ago

Attachment: extendedEntityState.diff added

comment:18 by mimo, 10 years ago

I've just attached an updated version of the second patch for this ticket. With it, the "gui sim update" timing will go from 70 ms/turn to about 35 ms/turn. I'll do more tests and commit it in one day or two if no comments.

comment:19 by mimo, 10 years ago

Milestone: Alpha 15Backlog

The second patch has been commited in #14306. With these two patches, the gui sim update with 200 selected units goes from 90 ms/turn to 35 ms/turn. As it is still too much, I do not close this ticket, but put it to backlog.

comment:20 by mimo, 10 years ago

Keywords: review removed

by mimo, 10 years ago

Attachment: resourceCarried.diff added

comment:21 by mimo, 10 years ago

I've just noticed that these optimisations have been lost due to some commits which do not respect the separation between GetEntityState and GetExtendedEntityState (see comment 8 for the splitting). Going back to combat demo huge, and selecting 200 units without doing anything, the "gui sim update" timing is now about 110 ms/turn (same PC as 10 months ago). With the attached patch, it can be decreased to 50 ms/turn. If there are no objection, I propose to commit it for A17.

comment:22 by Stan, 10 years ago

Keywords: review added

comment:23 by Stan, 10 years ago

This guy reported it (In one of his posts) http://www.wildfiregames.com/forum/index.php?showtopic=19186#entry298908

in reply to:  21 comment:24 by leper, 10 years ago

Replying to mimo:

If there are no objection, I propose to commit it for A17.

Looks good, commit it.

comment:25 by mimo, 10 years ago

In 15836:

improve gui performance, refs #2179 comment 21

comment:26 by mimo, 10 years ago

Keywords: patch review removed

comment:27 by mimo, 10 years ago

Summary: [PATCH] Optimization of "gui sim update" needed when large groups of units are selected.Optimization of "gui sim update" needed when large groups of units are selected.

comment:28 by mimo, 10 years ago

All buildable entities for all units are transfered (through the EntityState) to the gui at each turn, but they are used only when the selection change in order to build the list of buildables entities. That's a waste of time, all the more than all units have nearly the same buildables, so that most of the transfered info is dupplicated n times if we have n units. We can avoid this waste by having a function in GuiInterface building the list of buildable entities and only transfering the non-dupplicated info, and called only when the selection change. Testing this on Combat_Demo_huge with 200 selected units doing nothing, the gui_sim_update timing goes from 50 ms/turn to 45 ms/turn.

comment:29 by mimo, 10 years ago

In 15889:

improve gui performance, refs #2179 comment 28

comment:30 by Radagast, 10 years ago

Keep it up. Do you think using Sets may improve performance further as Sets not allow duplicates (not sure about Javascript set internals but perhaps it's constant time if they simply overwrite a duplicate entry instead of checking for duplicates)?

This way we might save the cost of indexOf() === -1 which is expensive as it is checked within the double loop (=> n*m*m) in GetAllBuildableEntities() in the GuiInterface. Also === check is expensive in comparison to != check (as all bits and type have to be checked for equality while for != we can terminate once we have found one non-equal bit .. anyway, that's lowlevel, no idea if the javascript compiler optimizes this automatically .. but probably not).

Last edited 10 years ago by Radagast (previous) (diff)

comment:31 by mimo, 10 years ago

GetAllBuildableEntities is only called when the selection changes, so these optimisations won't have visible effect, but I agree that using Sets would give a cleaner code.

in reply to:  30 comment:32 by historic_bruno, 10 years ago

Replying to Radagast:

Also === check is expensive in comparison to != check (as all bits and type have to be checked for equality while for != we can terminate once we have found one non-equal bit .. anyway, that's lowlevel, no idea if the javascript compiler optimizes this automatically .. but probably not).

I can't imagine any code working that way, because the generated instructions will only be operating on bytes at the hardware level, not bits, and the biggest cost associated with that is the memory accesses, not the arithmetic. Any software doing a bit by bit comparison is potentially doing 7x more comparisons (memory accesses, arithmetic and jumps) as it would if the entire byte was compared with a single instruction. That's the very opposite of an optimization :)

Type+value equality isn't necessary there because indexOf should always return an integer, but I doubt it will have a noticeable performance impact, feel free to verify that.

by agentx, 10 years ago

Attachment: js-opcodes-loops.txt added

comment:33 by agentx, 10 years ago

If you really go for speed use an object instead of an array, so the test for existence doesn't involve indexOf(). Although, that really matters only above of a few hundred items. What is really slow is the use of for in and for of loops. In both cases JS goes over the iterator protocol which is factor ~70 slower than a simple while/for loop.

I've attached the output of the JIT Inspector for all three loop types. One can just look at the amount of opcodes SM produces to get the picture.

Last edited 10 years ago by agentx (previous) (diff)

comment:34 by Radagast, 10 years ago

@historic: I agree on your byte vs. bit. Though that's not really relevant in the context of what I said, and that was that an equality check does cost more than an unequality check. Or do you doubt this too?

I agree on the type check too, and what mimo said of course also makes these optimizations even less significant.

thanks for the discussion.

@agentx: That's a shocker as we use these special for-in loops a lot. (In addition to the performance problem it also makes our code less portable but I'm not sure that really matters for JavaScript, we are pretty much fixed to it anyway despite some recent efforts of comrades to check if we can replace it with python which surely would be no easy undertaking.)

comment:35 by agentx, 10 years ago

@Radagast

Well, Python is interpreted and firstly slow bytecode. And you open a whole can of worms if you start thinking of embedded pypy, numba, etc. Also, there is a strong competition between Mozilla and Google to make their JIT compilers faster with every iteration, (see arewefastyet.com). OTOH, these for in and for of loops can be easily replaced without understanding the code at all. If only one would know which loops are worth the effort...

PS: Probably the only advantage Python has is the availability of strong AI and ML libs. So, from my PoV I'm not strictly against it. But in terms of speed it would be a downgrade.

comment:36 by Imarok, 5 years ago

Component: UI & SimulationIn-game UI

Move tickets to In-game UI as UI & Simulation got some sub components.

Note: See TracTickets for help on using tickets.