Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#1707 closed enhancement (fixed)

[PATCH] CCmpRangeManager optimisation

Reported by: Jonathan Waller Owned by:
Priority: Should Have Milestone: Alpha 15
Component: UI & Simulation Keywords:
Cc: Patch:

Description (last modified by wraitii)

This ticket is for some optimisation to CCmpRangeManager, converting from a std::map to std::vector.

The many patches reflect the many approaches tried, none having worked perfectly.

Attachments (12)

range_manager_optimisation.diff (4.6 KB ) - added by Jonathan Waller 10 years ago.
WIP patch, currently works but is done in parallel and doesn't serialize so won't survive saving.
rangeManagerOptimizations.patch (18.3 KB ) - added by wraitii 10 years ago.
rangemanager_vectorization_binsearch.patch (16.1 KB ) - added by mrf 10 years ago.
spatial_optimization.patch (2.7 KB ) - added by mrf 10 years ago.
entitymap.patch (10.7 KB ) - added by mrf 9 years ago.
Now also with serialization.
entitymap.diff (10.8 KB ) - added by mrf 9 years ago.
RangeManagerOpt.patch (13.2 KB ) - added by wraitii 9 years ago.
EntityMap.2.h (3.9 KB ) - added by wraitii 9 years ago.
Updated EntityMap with custom iterator that skips invalid entities.
RangeManagerOpt.2.patch (13.0 KB ) - added by wraitii 9 years ago.
EntityMap.3.h (3.9 KB ) - added by wraitii 9 years ago.
RangeManagerEntityMap.patch (41.8 KB ) - added by Jorma Rebane 9 years ago.
Working application of the EntityMap. Debugged and ~5x performance improvement to ExecuteActiveQuery.
ParallelFor_rangemanager.patch (10.0 KB ) - added by Jorma Rebane 9 years ago.
Implemented Parallel For on pthreads (scary), improves rangemanager performance on multicore platforms.

Download all attachments as: .zip

Change History (69)

by Jonathan Waller, 10 years ago

WIP patch, currently works but is done in parallel and doesn't serialize so won't survive saving.

comment:1 by Jonathan Waller, 10 years ago

Keywords: simple added

Some comments on this.

The approach used in the patch is poor. Increasing entity id's will basically make this a memory leak. If people start using lots of temporary entities things could end up being pretty bad. Performance is critical however so I think a std::vector is still the way to go. A solution would be creating a map from entity ids to CCmpRangeManager ids, CCmpRangeManager ids would be reused. This would be fast enough because the majority of lookups are from internal queries where the internal id can be used.

An alternative would be to store all of the information in the spatial data structure rather than needing a separate lookup by id. I haven't looked at this in detail however.

The other main cost is deduplicating the spatial query results. My proposed solution for this is to set a "small unit" parameter, any units less than this size are not duplicated. Range queries increase the range by this amount, this will increase the units found but should still increase performance significantly. Then only large units need deduplicating and due to the far smaller number of large units this should be pretty cheap.

comment:2 by Kieran P, 10 years ago

Keywords: patch added
Milestone: BacklogAlpha 13
Priority: Should HaveMust Have

comment:3 by wraitii, 10 years ago

Tried this patch: on Combat Demo (Huge), this speeds up the ExecuteActiveQueries by a factor of at least 3 (10 ms against 30/40, basically). The approach doesn't seem that poor to me, it is inefficient memory-wise but I don't think it would be too bad (even with tons of entities, like 100,000, this would take less than 10Mb of RAM which I think is OK given that speed is crucial here. But then again I don't care too much about memory…). Furthermore I think (not too sure about those things) we could make the "EntityData" structure more memory-efficient by using one u8 and setting binary flags instead of using 3 u8/i8 as booleans (if I read it correctly this struct currently uses 64(perhaps more) bits of data, could be reduced to 48 (owner is 3bits + "-1", I think, at most, so that's 4, retainInFog 1, inWorld 1), if it's needed.
I'm not too sure how we could efficiently make it not use too much memory otherwise (temporary entities being a bad case indeed), your map solution could work. Not sure I understand what you mean in the 2nd paragraph.

Looking at Quantumstate's final paragraph, it appears to me that we could remove the de-duplicating altogether. I don't think having a few doubles is too bad (correct me if I'm wrong though. In that case your solution appears better). This makes it indeed significantly faster (on Combat Demo Huge it goes from ≈13 ms to "GetNear" to 2.7ms.) It might also be removed for the Pathfinder itself (the TestLine functions). I think basically this breaks nothing and makes things faster, though slightly less clean.

Finally still on the SpatialSubdivision, if I understand it correctly it's currently fixed size, so we could avoid using ::at() and use [] instead, which is ever so slightly faster (might be slightly unsafe if we ever change this though).

(note: all those numbers were with the JS profiler activated).

Last edited 10 years ago by wraitii (previous) (diff)

comment:4 by B. Guns, 10 years ago

Sbte said he'd also look at this.

comment:5 by wraitii, 10 years ago

My patch builds on Quantumstate's. Same approach but I replaced every instance of the map with the vector. Also probably made it able to serialize, haven't really tested. This is mostly for demonstration purposes, as indeed something probably should be done for the fact that this constitutes a bad memory leak (and also this patch is probably buggy in various ways.) Also updates the "GetNear" functions to not sort the entities, which is faster.

edit: haven't tested on something other than COmbat Demo, could be buggy if entities are created or things like that.

(A problem with using a vector is that we can't too easily check if an entity is supposed to have been destroyed. Since it will never be "deleted" as that would throw the all thing down.)

Last edited 10 years ago by wraitii (previous) (diff)

comment:6 by Kieran P, 10 years ago

Milestone: Alpha 13Alpha 14

by wraitii, 10 years ago

comment:7 by wraitii, 10 years ago

Uploaded a (slightly) more sane version that completely removes the map and uses only a vector. It stores pointers to entity data so I check for destruction by checking if the pointer is "null". This is probably really not safe, and probably annoying to serialize/deserialize, but at least starting a game with that patch should work properly, so the speed improvement can be tested.

comment:8 by sbte, 10 years ago

The only other thing that I could come up with is using a unordered_map with this as hash function:

class Hasher{
public:
    size_t operator()(entity_id_t id) const{
        return id;
    }
};

This is still slower than a vector on Combat Demo (Huge), but might be faster in case a lot of units are already removed from the game (killed, used up). But I'm not sure about that. And the change to an unordered_map might also be safer. wraitii, maybe you can try it out together with your other optimizations?

by mrf, 10 years ago

Attachment: spatial_optimization.patch added

comment:9 by mrf, 10 years ago

This patch avoids the memory-leak. It stores the entity data with ids in a vector and does binary search to retrieve the data of a given entity. I have included a custom binary search that tries to narrow down the search range by incorporating the knowledge that we are searching a sorted set of integers (i.e. no duplicates and values strictly increasing).

Additionally I have changed the SpatialSubdivision code to check whether the search range overlaps with a division before returning the contained entities as part of the result. This does not help a lot in most cases though, because the search range is typically smaller then the size of a division cell.

comment:10 by historic_bruno, 10 years ago

Keywords: review added
Summary: CCmpRangeManager optimisation[PATCH] CCmpRangeManager optimisation

comment:11 by tuan kuranes, 9 years ago

Tried the same kind of optimisation, but with Two std::vector<> and got even better perf results, a full std::vector (push_back) for iteration AND a sparse std::vector (with holes) indexed by entity_id.

Just wanted to point out it gives even better perf, solving the std::map::find perf and the iteration perf (which changes the log(n) std::map::find calls into a simple pointer adressing operation which are called by all kind of *Query methods.)

Not much memory consumption (hardly a single mipmap of a normal texture...) but great perf gains on both iteration and find.

(see #1860 for an idea of a template container for entity_id related data, allowing fast find and fast iteration)

Last edited 9 years ago by historic_bruno (previous) (diff)

comment:12 by scroogie, 9 years ago

The last two patches sound like good solutions. What would be the next steps?

Last edited 9 years ago by scroogie (previous) (diff)

comment:13 by wraitii, 9 years ago

I think Tuan Kuranes' optimizations with my changes to use GetInRangeUnsorted or something would be optimal, but we need Kuan to post his patch.

comment:14 by mrf, 9 years ago

Not so sure about this unsorted: the set_difference operations used in the active queries require a sorted set. One option would be to sort the entities at the end of PerformQuery, i.e. after filtering out the non-relevant entities. However, I have doubts that sorting a few items fewer will have a positive impact on performance.

About next steps:

  • creating an entity-map container sounds like a nice idea.

I prefer my approach over kuranes' because it avoids having a potential memory leak: if entities are deleted the vector indexed by entity_id cannot be reduced in size and it can become huge after some time. This seems risky to me. While it's true that the binary search in my approach is log(n) and not constant, the tweaked implementation is very quick and I think it won't be much slower.

  • the patch on SpatialSubdivision is independant from this discussion and I think it can be reviewed and hopefully then be used

comment:15 by tuan kuranes, 9 years ago

  • Don't stop on my account (works here, but problem with different hash replay from unpatched code ). If needed by perf, (find still showing up in profilers), I'll post another patch that will permits further perf test.
  • About memory, it's more bloat than leak, but it's there. (still even with up to a 65k entity up use of id, the "std::vector<EntityData* >" is still smaller than a small texture/mesh in memory). Anyway, if still raise concerns, some entity_id "reuse scheme" or "recycling" could be advised in order to avoid that particular memory bloat (Have yet to stumble on the unique id generation code).

comment:16 by wraitii, 9 years ago

I must say I tend to follow Kuranes here (and my patch did it too): a sparse vector seems to me like a smallish memory issue, particularly as speed is crucial. It it however slightly annoying to serialize, I believe.

mrf: are you actually sure that all of them require such sorting?

by mrf, 9 years ago

Attachment: entitymap.patch added

Now also with serialization.

comment:17 by mrf, 9 years ago

Ok guys, I have written such an entity-map-container using your lookup-vector approach. I tried to make it a (simplified) drop-in replacement for std::map<entity_id_t, T>. Let me know what you think!

Last edited 9 years ago by historic_bruno (previous) (diff)

comment:18 by tuan kuranes, 9 years ago

Nicely done, stl like, and reusable in #1860, perfect.

Remarks:

  • entity_id_t is unsigned so check like " >= 0" could be pruned. What I'm not sure about is difference_type checks. Are you're sure about difference_type beinh signed, doesn't the size() - 1 doesn't rather give you a std::numeric_limits<difference_type>::max() ? (disclaimer:I don't know about that one in stl spec. if not specified in spec, it might even be different between stl implementations and would need unit-tests on gcc/clang/vs just to be sure and future proof.)
  • would just add ::reserve, ::resize method just in case
  • Did you achieve to get no hash changes in replay when comparing this from a vanilla svn code run ? Or were I wrong trying to do that ?

by mrf, 9 years ago

Attachment: entitymap.diff added

comment:19 by mrf, 9 years ago

Thanks!

  • good catch, I've removed the redundant ">=0". Yes, according to the reference, difference_type is supposed to be signed.
  • the lookup map is much bigger and there ::reserve would not help much I guess. Also, ::resize does not seem to be meaningful in the context of a map.
  • Not sure what you mean, sorry. I would expect that the serialization is equivalent to the normal std::map version if that is what you were checking. It seemed to work for me but I don't know how to systematically test this.
Last edited 9 years ago by mrf (previous) (diff)

comment:20 by Jorma Rebane, 9 years ago

Nice work, mrf.

Some remarks:

This way you can completely avoid any over-resizing of the vectors or using the mLookup vector and updating its indices. Binary search would be the preferred choice here, since insert and erase performance would greatly increase. However, obviously enough the 'find' function would suffer a small performance penalty. If you use the iterative binary search, it should be minimal though.

  • You could add a few more comments to your functions. :) Try following the Doxygen style.

Either way, I like the direction of this patch! This will greatly improve our entity maps; but insert/erase should have less latency. If you can show however that using the mLookup vector has significantly better performance in insert/erase situations, be sure to post comparison data here. :)

comment:21 by mrf, 9 years ago

Thanks for your comment, RedFox.

  • inline and comments: will do.
  • return value of EntityMap::insert: this is just the interface of std::map, so I guess it makes sense to keep it like that
  • Binary search: this was discussed above. Actually I have expressed my preference for the binary search approach as well (see also my first patch with a binary search implementation tailored for unique integer-keys), but the majority seemed to prefer this approach with faster lookups.

comment:22 by tuan kuranes, 9 years ago

@RedFox: binary search and previously std::map [] accessor are fast, but still showing in profiler due to very, very heavy calls of those. (and even more if we use that template in #1860 too)

comment:23 by mrf, 9 years ago

@kuranes: if it is only about the speed of iteration (as in #1860), it does not matter whether you use a lookup map or search for find. It's just iteration is over a (sorted) vector.

comment:24 by Jorma Rebane, 9 years ago

I ran a quick performance test on the new EntityMap, in VC++2012 Release mode, no debugger attached, 100000 iterations:

	

    Starting perftest 100000 iterations...
     
    map< entity_id_t, int >:
        insert: 33.72ms
        find:   29.52ms
        erase:  28.49ms
        clear:  7.85ms
     
     
    EntityMap< int >:
        insert: 10.34ms
        find:   2.95ms
        erase:  13058.40ms
        clear:  0.00ms
     
     
    map< entity_id_t, string >:
        insert: 31.26ms
        find:   25.36ms
        erase:  24.87ms
        clear:  8.50ms
     
     
    EntityMap< string >:
        insert: 19.85ms
        find:   2.97ms
        erase:  71514.26ms
        clear:  0.32ms
     
     
    map< entity_id_t, vector<int> >:
        insert: 26.93ms
        find:   23.36ms
        erase:  18.91ms
        clear:  7.83ms
     
     
    EntityMap< vector<int> >:
        insert: 13.78ms
        find:   2.83ms
        erase:  35313.02ms
        clear:  0.19ms

Most cases are faster, but EntityMap::erase is ridiculously slow. The performance improvement would be huge, if we can just improve the erase() method. In its current state, we can't use it, because every erase operation would negate any performance gain we would have with find() and insert().

comment:25 by Jorma Rebane, 9 years ago

Added a true 'sparse' vector EntityMap based on mrf's EntityMap. I wouldn't be concerned with the entity map getting too big. A reasonable optimization might be to use a memory pool for the mapped value. This would remove any fear of the entity vector growing too big (because even 100,000 * 8 bytes is still pretty small).

At least it gave reasonable performance:

Starting perftest 100000 iterations...

map< entity_id_t, int >:
    insert: 35.66ms
    find:   31.19ms
    erase:  28.90ms
    clear:  8.87ms


map< entity_id_t, string >:
    insert: 34.77ms
    find:   33.48ms
    erase:  25.24ms
    clear:  8.21ms


map< entity_id_t, vector<int> >:
    insert: 33.39ms
    find:   27.72ms
    erase:  23.28ms
    clear:  7.66ms


EntityMap< int >:
    insert: 2.79ms
    find:   3.00ms
    erase:  0.35ms
    clear:  0.00ms


EntityMap< string >:
    insert: 15.78ms
    find:   2.40ms
    erase:  0.71ms
    clear:  0.37ms


EntityMap< vector<int> >:
    insert: 9.11ms
    find:   3.20ms
    erase:  0.51ms
    clear:  0.14ms
Last edited 9 years ago by Jorma Rebane (previous) (diff)

comment:26 by wraitii, 9 years ago

Uploaded a complete patch with

-RedFox's EntityMap.

-mrf's changes to CCmpRangeManager and SerializeTemplates

-my own changes to Spatial.h to make the range manager use the (new) unsorted version of GetNear, which is much faster and which works equivalently in the case of the range manager as far as I know.

I think this is ready for commit, and it should give a significant speed boost (the numbers I gave some time ago above probably still hold mostly true). On my computer the RangeManager "ExecuteActiveQueries" function goes from taking about 23% of game simulation time to about 3%, resulting in FPS going from 5 to 7. This is only my computer but I guess it's indicative.

edit: just need to check serialization does work.

Last edited 9 years ago by wraitii (previous) (diff)

comment:27 by mrf, 9 years ago

RedFox: nice evaluation! Unfortunately, your version breaks the iterators, because the data vector now contains invalid entries. We could add a custom iterator which skips the invalid entries, but of course that would significantly reduce the performance for iteration (which is important for #1860).

The way see it, there is no solution which can make all operations (insert, erase, find and iterate) fast, so the right compromise has to be found. As mentioned in the comment in the code, we assumed that the performance of modifications (in particular for erase), is less crucial then the performance for find and iterate. To check this assumption, I have run a short test run (playing Acropolis for some minutes) to get an idea of the magnitude of the number of execution for each operation in CCmpRangeManager:

  • find : 19036
  • insert: 4405
  • iteration (calls to begin): 236
  • erase: 53

I think these numbers suggest that kuranes idea (i.e. using the lookup map, where erase is the slowest operation) is on the right track. I suggest that, instead of sacrificing iteration performance, we could try to improve the performance of erase in this approach. A rather simple trick could be to use a "bulk" erase, i.e. to only mark erased entries as invalid and only to clean up the m_Data vector before iteration -- this would be kind of a compromise between RedFox's version and "my" version.

wraitii: did you check that the unsorted GetNear does not break the active queries? (cf. comment:14, set_difference expects a sorted range)

comment:28 by wraitii, 9 years ago

Ah, right, hadn't actually noticed your comment... It appeared to work on all maps I tried with this optimization, but I really can't be 100% sure.

I guess we could still remove the last lines (the calls to ret.erase(std::unique(…))) though, which would already be much faster, the sorting itself is probably not too slow. Would need to check.

Would it be nearly that slow to use a custom iterator? It seems to me like your test case is a little too peaceful: if you tried on Combat Demo(Huge) and had units fight, you'd probably have a lot more "erase" calls too. Not sure what calls "iteration" but I think testing on Acropolis only is insufficient (I'd recommend combat demo (huge), Gambia river, and something like Oasis).

comment:29 by scroogie, 9 years ago

wraiiti, did you perhaps upload the wrong patch? It does not contain the EntityMap itself, and mrf's changes seem to be missing as well. Also the comments to the new GetUnsorted methods is wrong ("Returns a sorted list of...").

by wraitii, 9 years ago

Attachment: RangeManagerOpt.patch added

comment:30 by wraitii, 9 years ago

I had forgotten to "svn add" the entity map, the comment is fixed, however I'm quite sure mrf's changes were in already, or am I mistaken here?

in reply to:  27 comment:31 by Jorma Rebane, 9 years ago

Replying to mrf:

RedFox: nice evaluation! Unfortunately, your version breaks the iterators, because the data vector now contains invalid entries. We could add a custom iterator which skips the invalid entries, but of course that would significantly reduce the performance for iteration (which is important for #1860).

It doesn't 'break' the iterators per se - leaving in 'invalid' entries is intentional in order to get the speed boost. Once we start reusing entity handles, this solution would be quite perfect :)

When we iterate over the entitymap, we can do 2 things to fix it: 1) Write a custom iterator (easy) that skips handles with '-1' (invalid). 2) Let the code itself handle it: if(it->first == -1) continue;

Another issue might be current simulation code that expects size = size - 1 after erase(), which doesn't happen right now. That is easy to fix. Just search for all references with your favorite IDE and all is well.

I don't see why we can't use this solution though - it's pretty much the best we've got so far. I don't really care about wasting a bit of memory either. This vector never grows too big :)

Last edited 9 years ago by Jorma Rebane (previous) (diff)

in reply to:  30 comment:32 by scroogie, 9 years ago

Replying to wraitii:

I had forgotten to "svn add" the entity map, the comment is fixed, however I'm quite sure mrf's changes were in already, or am I mistaken here?

Ah, sorry, I thought you meant that one: http://trac.wildfiregames.com/attachment/ticket/1707/spatial_optimization.patch

Don't know about the >=0 changes.

by wraitii, 9 years ago

Attachment: EntityMap.2.h added

Updated EntityMap with custom iterator that skips invalid entities.

comment:33 by scroogie, 9 years ago

Just as a question. Maybe I'm naive, but couldn't PerformQuery(), ExecuteQuery() and ExecuteActiveQueries() cope completely without the repetitive .find() calls if the spatial subdivision stored both id and pointer. The template for an Octree thats somewhere in Trac for example does that already. The RangeManager already saves all relevant data, but not in the spatial structure. The latter could even just use pointers to EntityData instead of the id, couldn't it?

I'm just saying that, because according to profiler data of a simple match, all other uses of the m_EntityData map together consume not even a tenth of the amount of resources that the ExecuteActiveQueries function takes (looking only at RangeManager). Maybe this is a special case, but in case this is true, it would perhaps make sense to eliminate the finds and concentrate on iteration speed instead.

comment:34 by mrf, 9 years ago

Ok, i did the test again, now playing Gambia River for some minutes:

  • find : 106086
  • insert: 12061
  • begin: 510
  • erase: 93

The iterations come from the LOS code in CCmpRangeManager.

From these numbers and also keeping in mind that the ComponentManager also requires fast iteration, I don't see a motivation for improving the erase performance on the cost of iteration performance. I would stick to the lookup-vector approach and consider bulk removal if erase proves to be too slow. One simple improvement to erase in my version would be to use the following code, which avoids touching the empty entries during index update:

inline void erase(iterator it)
{
	// update all lookup indicies
	m_Lookup[it->first] = -1;
	for(iterator jt = it + 1; jt != m_Data.end(); jt++)
	{
		m_Lookup[jt->first]--;
	}
	// erase from m_Data
	m_Data.erase(it); 
}

Anyway, I have some comments on wraitii's EntityMap.h:

  • begin should also skip invalid entries before the first valid one
  • size should return the number of valid entries (e.g. by manually keeping track of this). Also, empty should be equivalent to size()==0
  • I guess max_size does not make too much sense anymore -- ok, it has never made sense ;)

@scroogie: note that this is also suggested by quantumstate in comment:1. A problem with that might be that the spatial subdivision might store the entity data multiple times. Also, they data would have to be moved (physically, in the memory) every time the entity moves to a different subdivision cell.

comment:35 by wraitii, 9 years ago

mrf: Right, your comments about the entity map are spot on.

I'm not sure we can decide which of the approach is better unless we benchmark both of them on several maps. I think we ought to have two finished patches, one for each approach, so we can compare them. I'll update the entity map accordingly.

edit: about scroogie's proposition. I really haven't studied that possibility, but it sounds doable, so we probably should.

Last edited 9 years ago by wraitii (previous) (diff)

in reply to:  34 comment:36 by scroogie, 9 years ago

Replying to mrf:

@scroogie: note that this is also suggested by quantumstate in comment:1.

Okay, but there seems to be no further discussion, or what do you mean by that?

A problem with that might be that the spatial subdivision might store the entity data multiple times.

Only a pointer, just as now the ids are stored multiple times.

Also, they data would have to be moved (physically, in the memory) every time the entity moves to a different subdivision cell.

Just as the id now. I'm not proposing to drop the map, merely to concentrate on iteration performance for the map and replace the id mapping with pointers.

by wraitii, 9 years ago

Attachment: RangeManagerOpt.2.patch added

comment:37 by wraitii, 9 years ago

Alright, uploaded two things... I noticed an issue with the entity map: if the last element is made invalid, the iterator will crash because of bad memory access. This is bad. Thus:

-RangeManagerOpt.2 is an updated version that reverts iterator changes and makes the RangeManager check for validity in the loops. This is not too nice but works cleanly.

-EntityMap.3.h is an updated version of the entityMap that works with the original RangeMangerOpt.patch, which checks on "erase" if the reased element is the last, and if so removes it from m_Data so that the last element is always valid. It seems to work but it could be slow.

I've updated both cases to take into account mrf' observations.

Last edited 9 years ago by wraitii (previous) (diff)

by wraitii, 9 years ago

Attachment: EntityMap.3.h added

comment:38 by Kieran P, 9 years ago

Milestone: Alpha 14Alpha 15

comment:39 by Jorma Rebane, 9 years ago

Starting perftest 100000 iterations...

stdmap< int >:
    insert:   36.99ms
    find:     29.13ms
    iterate1: 1.94ms
    erase:    31.32ms
    iterate2: 2.11ms
    clear:    6.13ms


stdmap< string >:
    insert:   38.10ms
    find:     27.83ms
    iterate1: 2.70ms
    erase:    17.86ms
    iterate2: 2.42ms
    clear:    7.41ms


stdmap< vector<int> >:
    insert:   28.19ms
    find:     22.23ms
    iterate1: 1.88ms
    erase:    20.92ms
    iterate2: 2.67ms
    clear:    7.18ms


EntityMap< int >:
    insert:   4.91ms
    find:     2.38ms
    iterate1: 0.18ms
    erase:    2.49ms
    iterate2: 0.17ms
    clear:    0.53ms


EntityMap< string >:
    insert:   20.65ms
    find:     2.18ms
    iterate1: 0.32ms
    erase:    2.24ms
    iterate2: 0.26ms
    clear:    0.49ms


EntityMap< vector<int> >:
    insert:   11.43ms
    find:     2.32ms
    iterate1: 0.25ms
    erase:    2.64ms
    iterate2: 0.27ms
    clear:    0.27ms
Last edited 9 years ago by Jorma Rebane (previous) (diff)

comment:40 by historic_bruno, 9 years ago

Keywords: simple removed

comment:41 by Jorma Rebane, 9 years ago

Owner: set to Jorma Rebane
Status: newassigned

Alright, I optimized the SpatialSubdivision class and also included the EntityMap to drastically improve ExecuteActiveQueries performance. (that function is the main bottleneck in rangemanager).

The overall performance improvement can be measured in Combat Demo (Huge). On my laptop, the IDLE ExecuteActiveQueries time went down from 17-20ms per frame to a much better 2-3ms per frame.

I should note that once units start moving and getting closer, the ExecuteActiveQueries still shows up as a bottleneck. However, it is dwarfed by ComputeShortPath, so I suggest we concentrate on optimizing ComputeShortPath next.

comment:42 by Riemer, 9 years ago

Just posting here what I told on IRC as well, after applying "RangeManagerEntityMap.patch" on the latest GIT version, I get the following compile errors:

  • On CCmpRangeManager.cpp:24:42: #include "simulation2\system\EntityMap.h" does not exists (easy to fix, note the "\" instead of "/")
  • Big complaint at CCmpRangeManager.cpp:24:0 http://pastebin.com/JJc56kzt

For the second error I removed "typename" from "typedef typename K first_type" (and the second), as suggested on IRC.

This compiles without errors, but crashes on loading a map with: http://pastebin.com/zMtzciRB

I suppose the proposed fix for the second error is incorrect? (my c++ templating is a but rusty, so I really can't tell)

Last edited 9 years ago by Riemer (previous) (diff)

comment:43 by wraitii, 9 years ago

Redfox: this sounds good enough for me, combat demo huge is basically the case we'll never reach

in reply to:  42 comment:44 by Jorma Rebane, 9 years ago

Replying to Riemer:

For the second error I removed "typename" from "typedef typename K first_type" (and the second), as suggested on IRC.

This compiles without errors, but crashes on loading a map with: http://pastebin.com/zMtzciRB

I suppose the proposed fix for the second error is incorrect? (my c++ templating is a but rusty, so I really can't tell)

The fix is correct. The problem is with our own sub-optimal code. I have intentionally put a hardcoded limit to number of Entities that can exist in a Subdivision grid. Ideally it should always stay below 128. If it goes over that, it's because the subdivsion grid is too sparse - thus also suboptimal.

comment:45 by Yves, 9 years ago

This patch needs to be merged with the current trunk version. I've fixed the compile-error but then it crashed when running the tests and when trying to start a game. Maybe something is not compatible with Philip's latest change.

comment:46 by Jorma Rebane, 9 years ago

Merged with Philip's new ComponentCache patch.

comment:47 by historic_bruno, 9 years ago

With the latest patch built with VC++ 2010, I get a crash starting Atlas from main menu:

>	pyrogenesis.exe!CCmpRangeManager::ResetSubdivisions(CFixed<int,2147483647,32,15,16,65536> x1={...}, CFixed<int,2147483647,32,15,16,65536> z1={...})  Line 669 + 0x3 bytes	C++
 	pyrogenesis.exe!CCmpRangeManager::Init(const CParamNode & __formal={...})  Line 330	C++
 	pyrogenesis.exe!CComponentManager::AddComponent(CEntityHandle ent={...}, int cid=29, const CParamNode & paramNode={...})  Line 590	C++
 	pyrogenesis.exe!CSimulation2Impl::ResetComponentState(CComponentManager & componentManager={...}, bool skipScriptedComponents=false, bool skipAI=false)  Line 112	C++
 	pyrogenesis.exe!CSimulation2::ResetState(bool skipScriptedComponents=false, bool skipAI=false)  Line 760 + 0x2d bytes	C++
 	pyrogenesis.exe!CMapReader::LoadMap(const Path & pathname={...}, CTerrain * pTerrain_=0x017ea578, WaterManager * pWaterMan_=0x0189e6d0, SkyManager * pSkyMan_=0x0189ebf0, CLightEnv * pLightEnv_=0x003de180, CGameView * pGameView_=0x01871cb0, CCinemaManager * pCinema_=0x09c62c80, CTriggerManager * pTrigMan_=0x00000000, CPostprocManager * pPostproc_=0x0189f664, CSimulation2 * pSimulation2_=0x0186c1a8, const CSimContext * pSimContext_=0x09c6ad78, int playerID_=1, bool skipEntities=false)  Line 116	C++
 	pyrogenesis.exe!CWorld::RegisterInit(const CStrW & mapFile={...}, int playerID=1)  Line 95	C++
 	pyrogenesis.exe!CGame::RegisterInit(const CScriptValRooted & attribs={...}, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & savedState="")  Line 151 + 0x36 bytes	C++
 	pyrogenesis.exe!CGame::StartGame(const CScriptValRooted & attribs={...}, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & savedState="")  Line 259 + 0xc bytes	C++
 	pyrogenesis.exe!`anonymous namespace'::StartGame(const CScriptValRooted & attrs={...})  Line 67 + 0x30 bytes	C++
 	pyrogenesis.exe!AtlasMessage::fLoadMap(AtlasMessage::mLoadMap * msg=0x09c9f358)  Line 142	C++
 	pyrogenesis.exe!AtlasMessage::fLoadMap_wrapper(AtlasMessage::IMessage * msg=0x09c9f358)  Line 126 + 0x3c bytes	C++
 	pyrogenesis.exe!RunEngine(void * data=0x0071f810)  Line 175 + 0x6 bytes	C++
 	pyrogenesis.exe!thread_start(void * param=0x0aab5408)  Line 624 + 0x3 bytes	C++
 	msvcr100.dll!__endthreadex()  + 0x3a bytes	
 	msvcr100.dll!__endthreadex()  + 0xe4 bytes	
 	kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes	
 	ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes	
 	ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes

Deserialization crashes similarly - try quicksave(Shift+F5)/quickload(Shift+F8):

>	pyrogenesis.exe!CCmpRangeManager::ResetSubdivisions(CFixed<int,2147483647,32,15,16,65536> x1={...}, CFixed<int,2147483647,32,15,16,65536> z1={...})  Line 671	C++
 	pyrogenesis.exe!CCmpRangeManager::Init(const CParamNode & __formal={...})  Line 330	C++
 	pyrogenesis.exe!CCmpRangeManager::Deserialize(const CParamNode & paramNode={...}, IDeserializer & deserialize={...})  Line 380	C++
 	pyrogenesis.exe!CComponentManager::DeserializeState(std::basic_istream<char,std::char_traits<char> > & stream={...})  Line 313	C++
 	pyrogenesis.exe!CNetTurnManager::QuickLoad()  Line 356	C++
 	pyrogenesis.exe!ScriptInterface::call<void,&`anonymous namespace'::QuickLoad>(JSContext * cx=0x031efd38, unsigned int argc=0, unsigned __int64 * vp=0x04d80068)  Line 103 + 0x13a bytes	C++
 	mozjs185-ps-release-1.0.dll!69070d10() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for mozjs185-ps-release-1.0.dll]	
 	msvcr80.dll!59f64b6c() 	
 	kernel32.dll!76c314ad() 	
 	msvcr80.dll!59f64c39() 	
 	pyrogenesis.exe!GUI<bool>::GetSettingPointer(const IGUIObject * pObject=0x031efd38, const CStr8 & Setting={...}, bool * & Value=0x0bd9e330)  Line 341 + 0x18 bytes	C++
 	pyrogenesis.exe!IGUIObject::ScriptEvent(const CStr8 & Action=)  Line 496 + 0x70 bytes	C++
Last edited 9 years ago by historic_bruno (previous) (diff)

comment:48 by Jorma Rebane, 9 years ago

Fixed serialization error in the patch.

by Jorma Rebane, 9 years ago

Attachment: RangeManagerEntityMap.patch added

Working application of the EntityMap. Debugged and ~5x performance improvement to ExecuteActiveQuery.

comment:49 by historic_bruno, 9 years ago

We wasted a bunch of time looking in the wrong place... comparing all the profiling data from a replay of huge combat demo, with and without this patch applied, there is one thing that explains the 200ms difference. The A* in the short pathfinder is taking that much longer.

Looking at what changed in the patch that might cause this, it had to be CCmpObstructionManager::GetObstructionsInRange, range manager isn't even used by the pathdinder. I put some printfs in there and sure enough, it is now returning many more obstructions than before, something like 16% more, so the already slow pathfinder is doing that much more work. Since the simulation states are otherwise identical (both coming from a single replay file), this must be a bug introduced in GetObstructionsInRange - used only by the pathfinder.

comment:50 by Jorma Rebane, 9 years ago

Excellent find historicbruno. I'll direct my remaining time to figuring out why it's returning so many obstructions.

comment:52 by Jorma Rebane, 9 years ago

Keywords: review removed

Committed the memory optimizations part of the patch to r13854. The ticket should still remain open - considering replacing SpatialSubdivision for RangeManager completely with Sweep & Prune approach.

comment:53 by Jorma Rebane, 9 years ago

Right now the Parallel For still needs some work: 1) The implementation isn't very solid - we could probably do better. 2) There are probably some concurrency issues in RangeManager, I only placed 1 Mutex lock. 3) There is no native threadcount detection, which is very bad for systems that only have 1 thread.

by Jorma Rebane, 9 years ago

Implemented Parallel For on pthreads (scary), improves rangemanager performance on multicore platforms.

comment:54 by wraitii, 9 years ago

Description: modified (diff)
Keywords: patch removed
Milestone: Alpha 15Alpha 16
Owner: Jorma Rebane removed
Priority: Must HaveShould Have
Status: assignednew

Moving to A16 and resetting some info. Setting as should have since this is a lot harder than it looks.

comment:55 by scroogie, 9 years ago

The many patches reflect the many approaches tried, none having worked perfectly.

But didn't some of the patches already work and improve performance but were ditched because some even faster version seemed to be possible? Or did mrf's patches (e.g. #19) already include the problem with GetObstructionsInRange?

comment:56 by wraitii, 9 years ago

Resolution: fixed
Status: newclosed

Fixed behind our backs by Redfox and Philip (see notable [13854]).

comment:57 by fabio, 9 years ago

Milestone: Alpha 16Alpha 15
Note: See TracTickets for help on using tickets.