Opened 11 years ago
Last modified 5 years ago
#2179 new task
Optimization of "gui sim update" needed when large groups of units are selected.
Reported by: | RefinedCode | Owned by: | |
---|---|---|---|
Priority: | Should Have | Milestone: | Backlog |
Component: | UI – In-game | Keywords: | |
Cc: | Patch: |
Description (last modified by )
If you open up the in game profiler, and then select a large number of units you can see a huge increase to "gui sim update". In my case selecting 100 Longbowmen causes the msec/turn to increase from ~1 to ~48 which is more than a third of what the rendering takes. This causes a very noticeable choppiness in game.
I measured execution time of various functions and determined the cause of this lag stems from the many calls to GetEntityState from EntitySelection.prototype.update in selection.js. Every turn this function is called for every one of the selected units.
There is a much larger performance hit when the jsdebugger is enabled but when I took my measurements when it was off, so it would be a pretty good performance increase if the cause of this lag is fixed.
Attachments (6)
Change History (42)
comment:1 by , 11 years ago
Description: | modified (diff) |
---|
comment:2 by , 11 years ago
comment:3 by , 11 years ago
I don't see any noticeable difference between 100 longbowmen with the elevation range calculation turned off. Thanks for the idea, I will look around more for what is causing this. It will help me get familiar with the code.
by , 11 years ago
Attachment: | unit_commands.js.patch added |
---|
Small optimization of one of the functions called during the selection panel GUI update.
comment:4 by , 11 years ago
I attached a small improvement in performance for the GUI update. When it goes through each entity to find what they can build, it makes a huge array of 99% duplicates. Going through this array to remove duplicates every turn caused a tiny delay.
Still, this is just a small improvement. I don't know how it can be fixed, but the biggie remains to be GetEntityState.
comment:5 by , 11 years ago
This has been discussed before and I think what would help is limiting the number of calls to e.g. GetEntityState
by passing a list of entity IDs to a new GetEntitiesState
(instead of one call per entity, since there is some overhead copying data between contexts) and maybe also passing in a list of desired components to limit the amount of work that's done and the amount of data returned by GuiInterface.GetEntityState
. The latter would complicate the GUI-side caching, so I think we should try the "batching" idea and see what difference it makes.
comment:6 by , 11 years ago
Hmm I tried the batching, but it didn't make much difference :( I guess we need a different approach.
comment:7 by , 11 years ago
Looking a bit into it, CanMoveEntsIntoFormation takes quite some time. When testing with 40 units, the "gui sim update" is about 30 ms and can be reduced to 25 by either optimising CanMoveEntsIntoFormation into simulation/helpers/Commands.js, or caching it in the gui (as it only depends on the selection, there is no use to recompute it every turn). And the gain should be bigger when more units are selected. I attach a patch with the two approaches implemented.
by , 11 years ago
Attachment: | moveFormation.diff added |
---|
comment:8 by , 11 years ago
I had some time to look a bit more on it, and we can easily same time by two ways
1) we can split GetEntityState in two parts, one containing the most used or the fast to compute variables (GetEntityState) and one containing the less used and slow to compute (GetExtendedEntityState). This is done in the attached patch extendedEntityState.diff
2) we may cache more quantities on the gui side, specially those quantities which depend only on the selected entities. I've already mentionned the possibility for the selection to be in a formation in comment #7, but there are others we can cache. This is done in the attached patch selectionCache.diff
Testing a bit using F11 and only selecting units (no actions), I get for the "gui sim update" timing (in ms)
# units | 0 | 1 | 8 | 50 |
---|---|---|---|---|
svn | 2 | 8 | 15 | 32 |
extendedEntityState.diff | 2 | 8 | 13 | 22 |
selectionCache.diff | 2 | 8 | 13 | 28 |
both patches | 2 | 7 | 12 | 18 |
comment:9 by , 11 years ago
Keywords: | patch review added |
---|---|
Summary: | Optimization of "gui sim update" needed when large groups of units are selected. → [PATCH] Optimization of "gui sim update" needed when large groups of units are selected. |
comment:10 by , 11 years ago
Are those times in ms/frame? What are the timings with ~200 entities selected? Since that is the worst case.
comment:11 by , 11 years ago
Timings are ms/turn as the original report
I've just tested with 100 units, svn = 48 ms/turn, both-patches-applied = 23 ms/turn
comment:12 by , 10 years ago
I've now tested with 200 units (Huge combat demo map). The timings fluctuate a bit, but typical averages are
svn 38 ms/frame or 90 ms/turn selectioncache patch 34 ms/frame or 80 ms/turn extendedEntityState patch 21 ms/frame or 54 ms/turn
The two patchs are completely independant, so that the gain should be additive
comment:13 by , 10 years ago
Milestone: | Backlog → Alpha 15 |
---|
by , 10 years ago
Attachment: | selectionCache.diff added |
---|
Simplify the patch keeping only the small part which had nearly all the time saving
comment:14 by , 10 years ago
Update the two patches (corresponding to the timings of comment 12) to a recent svn
comment:16 by , 10 years ago
Yes depending on when is the feature freeze. The selectionCache patch is certainly fine without change, but the extendedEntityState may need to be adapted to latest svn if more variables have been added to EntityState in the last two weeks.
comment:17 by , 10 years ago
First patch (selectioncache.diff) is commited in #14296.
In fact, the gain is bigger that advertized before as previous versions of the patch were still using non-cached quantities. In Huge combat demo, selecting 200 units, we go from about 90 ms/turn to 70 ms/turn with this patch.
by , 10 years ago
Attachment: | extendedEntityState.diff added |
---|
comment:18 by , 10 years ago
I've just attached an updated version of the second patch for this ticket. With it, the "gui sim update" timing will go from 70 ms/turn to about 35 ms/turn. I'll do more tests and commit it in one day or two if no comments.
comment:19 by , 10 years ago
Milestone: | Alpha 15 → Backlog |
---|
The second patch has been commited in #14306. With these two patches, the gui sim update with 200 selected units goes from 90 ms/turn to 35 ms/turn. As it is still too much, I do not close this ticket, but put it to backlog.
comment:20 by , 10 years ago
Keywords: | review removed |
---|
by , 10 years ago
Attachment: | resourceCarried.diff added |
---|
follow-up: 24 comment:21 by , 10 years ago
I've just noticed that these optimisations have been lost due to some commits which do not respect the separation between GetEntityState and GetExtendedEntityState (see comment 8 for the splitting). Going back to combat demo huge, and selecting 200 units without doing anything, the "gui sim update" timing is now about 110 ms/turn (same PC as 10 months ago). With the attached patch, it can be decreased to 50 ms/turn. If there are no objection, I propose to commit it for A17.
comment:22 by , 10 years ago
Keywords: | review added |
---|
comment:23 by , 10 years ago
This guy reported it (In one of his posts) http://www.wildfiregames.com/forum/index.php?showtopic=19186#entry298908
comment:24 by , 10 years ago
comment:26 by , 10 years ago
Keywords: | patch review removed |
---|
comment:27 by , 10 years ago
Summary: | [PATCH] Optimization of "gui sim update" needed when large groups of units are selected. → Optimization of "gui sim update" needed when large groups of units are selected. |
---|
comment:28 by , 10 years ago
All buildable entities for all units are transfered (through the EntityState) to the gui at each turn, but they are used only when the selection change in order to build the list of buildables entities. That's a waste of time, all the more than all units have nearly the same buildables, so that most of the transfered info is dupplicated n times if we have n units. We can avoid this waste by having a function in GuiInterface building the list of buildable entities and only transfering the non-dupplicated info, and called only when the selection change. Testing this on Combat_Demo_huge with 200 selected units doing nothing, the gui_sim_update timing goes from 50 ms/turn to 45 ms/turn.
follow-up: 32 comment:30 by , 10 years ago
Keep it up. Do you think using Sets may improve performance further as Sets not allow duplicates (not sure about Javascript set internals but perhaps it's constant time if they simply overwrite a duplicate entry instead of checking for duplicates)?
This way we might save the cost of indexOf() === -1 which is expensive as it is checked within the double loop (=> n*m*m) in GetAllBuildableEntities() in the GuiInterface. Also === check is expensive in comparison to != check (as all bits and type have to be checked for equality while for != we can terminate once we have found one non-equal bit .. anyway, that's lowlevel, no idea if the javascript compiler optimizes this automatically .. but probably not).
comment:31 by , 10 years ago
GetAllBuildableEntities is only called when the selection changes, so these optimisations won't have visible effect, but I agree that using Sets would give a cleaner code.
comment:32 by , 10 years ago
Replying to Radagast:
Also === check is expensive in comparison to != check (as all bits and type have to be checked for equality while for != we can terminate once we have found one non-equal bit .. anyway, that's lowlevel, no idea if the javascript compiler optimizes this automatically .. but probably not).
I can't imagine any code working that way, because the generated instructions will only be operating on bytes at the hardware level, not bits, and the biggest cost associated with that is the memory accesses, not the arithmetic. Any software doing a bit by bit comparison is potentially doing 7x more comparisons (memory accesses, arithmetic and jumps) as it would if the entire byte was compared with a single instruction. That's the very opposite of an optimization :)
Type+value equality isn't necessary there because indexOf
should always return an integer, but I doubt it will have a noticeable performance impact, feel free to verify that.
by , 10 years ago
Attachment: | js-opcodes-loops.txt added |
---|
comment:33 by , 10 years ago
If you really go for speed use an object instead of an array, so the test for existence doesn't involve indexOf(). Although, that really matters only above of a few hundred items. What is really slow is the use of for in and for of loops. In both cases JS goes over the iterator protocol which is factor ~70 slower than a simple while/for loop.
I've attached the output of the JIT Inspector for all three loop types. One can just look at the amount of opcodes SM produces to get the picture.
comment:34 by , 10 years ago
@historic: I agree on your byte vs. bit. Though that's not really relevant in the context of what I said, and that was that an equality check does cost more than an unequality check. Or do you doubt this too?
I agree on the type check too, and what mimo said of course also makes these optimizations even less significant.
thanks for the discussion.
@agentx: That's a shocker as we use these special for-in loops a lot. (In addition to the performance problem it also makes our code less portable but I'm not sure that really matters for JavaScript, we are pretty much fixed to it anyway despite some recent efforts of comrades to check if we can replace it with python which surely would be no easy undertaking.)
comment:35 by , 10 years ago
@Radagast
Well, Python is interpreted and firstly slow bytecode. And you open a whole can of worms if you start thinking of embedded pypy, numba, etc. Also, there is a strong competition between Mozilla and Google to make their JIT compilers faster with every iteration, (see arewefastyet.com). OTOH, these for in and for of loops can be easily replaced without understanding the code at all. If only one would know which loops are worth the effort...
PS: Probably the only advantage Python has is the availability of strong AI and ML libs. So, from my PoV I'm not strictly against it. But in terms of speed it would be a downgrade.
comment:36 by , 5 years ago
Component: | UI & Simulation → In-game UI |
---|
Move tickets to In-game UI
as UI & Simulation
got some sub components.
I suspect that the call to cmpRangeManager.GetElevationAdaptedRange is responsible for (at least part of) that as it does quite a lot of computations. Can you check by commenting these calls or testing with 100 spearmen.
If that's true, that's really wasted time because it seems that the variable elevationAdaptedRange is used by the gui only when one entity is selected (in selection_details.js). So a simple optimization would be to compute it only when the selection contains only one entity, and take the default range otherwise.
Futhermore, for buildings, this elevationAdaptedRange should not vary and would better be cached (maybe in BuildingAI).