#1707 closed enhancement (fixed)
[PATCH] CCmpRangeManager optimisation
Reported by: | Jonathan Waller | Owned by: | |
---|---|---|---|
Priority: | Should Have | Milestone: | Alpha 15 |
Component: | UI & Simulation | Keywords: | |
Cc: | Patch: |
Description (last modified by )
This ticket is for some optimisation to CCmpRangeManager, converting from a std::map to std::vector.
The many patches reflect the many approaches tried, none having worked perfectly.
Attachments (12)
Change History (69)
by , 11 years ago
Attachment: | range_manager_optimisation.diff added |
---|
comment:1 by , 11 years ago
Keywords: | simple added |
---|
Some comments on this.
The approach used in the patch is poor. Increasing entity id's will basically make this a memory leak. If people start using lots of temporary entities things could end up being pretty bad. Performance is critical however so I think a std::vector is still the way to go. A solution would be creating a map from entity ids to CCmpRangeManager ids, CCmpRangeManager ids would be reused. This would be fast enough because the majority of lookups are from internal queries where the internal id can be used.
An alternative would be to store all of the information in the spatial data structure rather than needing a separate lookup by id. I haven't looked at this in detail however.
The other main cost is deduplicating the spatial query results. My proposed solution for this is to set a "small unit" parameter, any units less than this size are not duplicated. Range queries increase the range by this amount, this will increase the units found but should still increase performance significantly. Then only large units need deduplicating and due to the far smaller number of large units this should be pretty cheap.
comment:2 by , 11 years ago
Keywords: | patch added |
---|---|
Milestone: | Backlog → Alpha 13 |
Priority: | Should Have → Must Have |
comment:3 by , 11 years ago
Tried this patch: on Combat Demo (Huge), this speeds up the ExecuteActiveQueries by a factor of at least 3 (10 ms against 30/40, basically). The approach doesn't seem that poor to me, it is inefficient memory-wise but I don't think it would be too bad (even with tons of entities, like 100,000, this would take less than 10Mb of RAM which I think is OK given that speed is crucial here. But then again I don't care too much about memory…). Furthermore I think (not too sure about those things) we could make the "EntityData" structure more memory-efficient by using one u8 and setting binary flags instead of using 3 u8/i8 as booleans (if I read it correctly this struct currently uses 64(perhaps more) bits of data, could be reduced to 48 (owner is 3bits + "-1", I think, at most, so that's 4, retainInFog 1, inWorld 1), if it's needed.
I'm not too sure how we could efficiently make it not use too much memory otherwise (temporary entities being a bad case indeed), your map solution could work. Not sure I understand what you mean in the 2nd paragraph.
Looking at Quantumstate's final paragraph, it appears to me that we could remove the de-duplicating altogether. I don't think having a few doubles is too bad (correct me if I'm wrong though. In that case your solution appears better). This makes it indeed significantly faster (on Combat Demo Huge it goes from ≈13 ms to "GetNear" to 2.7ms.) It might also be removed for the Pathfinder itself (the TestLine functions). I think basically this breaks nothing and makes things faster, though slightly less clean.
Finally still on the SpatialSubdivision, if I understand it correctly it's currently fixed size, so we could avoid using ::at() and use [] instead, which is ever so slightly faster (might be slightly unsafe if we ever change this though).
(note: all those numbers were with the JS profiler activated).
comment:5 by , 11 years ago
My patch builds on Quantumstate's. Same approach but I replaced every instance of the map with the vector. Also probably made it able to serialize, haven't really tested. This is mostly for demonstration purposes, as indeed something probably should be done for the fact that this constitutes a bad memory leak (and also this patch is probably buggy in various ways.) Also updates the "GetNear" functions to not sort the entities, which is faster.
edit: haven't tested on something other than COmbat Demo, could be buggy if entities are created or things like that.
(A problem with using a vector is that we can't too easily check if an entity is supposed to have been destroyed. Since it will never be "deleted" as that would throw the all thing down.)
comment:6 by , 11 years ago
Milestone: | Alpha 13 → Alpha 14 |
---|
by , 11 years ago
Attachment: | rangeManagerOptimizations.patch added |
---|
comment:7 by , 11 years ago
Uploaded a (slightly) more sane version that completely removes the map and uses only a vector. It stores pointers to entity data so I check for destruction by checking if the pointer is "null". This is probably really not safe, and probably annoying to serialize/deserialize, but at least starting a game with that patch should work properly, so the speed improvement can be tested.
comment:8 by , 11 years ago
The only other thing that I could come up with is using a unordered_map with this as hash function:
class Hasher{ public: size_t operator()(entity_id_t id) const{ return id; } };
This is still slower than a vector on Combat Demo (Huge), but might be faster in case a lot of units are already removed from the game (killed, used up). But I'm not sure about that. And the change to an unordered_map might also be safer. wraitii, maybe you can try it out together with your other optimizations?
by , 11 years ago
Attachment: | rangemanager_vectorization_binsearch.patch added |
---|
by , 11 years ago
Attachment: | spatial_optimization.patch added |
---|
comment:9 by , 11 years ago
This patch avoids the memory-leak. It stores the entity data with ids in a vector and does binary search to retrieve the data of a given entity. I have included a custom binary search that tries to narrow down the search range by incorporating the knowledge that we are searching a sorted set of integers (i.e. no duplicates and values strictly increasing).
Additionally I have changed the SpatialSubdivision code to check whether the search range overlaps with a division before returning the contained entities as part of the result. This does not help a lot in most cases though, because the search range is typically smaller then the size of a division cell.
comment:10 by , 11 years ago
Keywords: | review added |
---|---|
Summary: | CCmpRangeManager optimisation → [PATCH] CCmpRangeManager optimisation |
comment:11 by , 11 years ago
Tried the same kind of optimisation, but with Two std::vector<> and got even better perf results, a full std::vector (push_back) for iteration AND a sparse std::vector (with holes) indexed by entity_id.
Just wanted to point out it gives even better perf, solving the std::map::find perf and the iteration perf (which changes the log(n) std::map::find calls into a simple pointer adressing operation which are called by all kind of *Query methods.)
Not much memory consumption (hardly a single mipmap of a normal texture...) but great perf gains on both iteration and find.
(see #1860 for an idea of a template container for entity_id related data, allowing fast find and fast iteration)
comment:12 by , 11 years ago
The last two patches sound like good solutions. What would be the next steps?
comment:13 by , 11 years ago
I think Tuan Kuranes' optimizations with my changes to use GetInRangeUnsorted or something would be optimal, but we need Kuan to post his patch.
comment:14 by , 11 years ago
Not so sure about this unsorted: the set_difference operations used in the active queries require a sorted set. One option would be to sort the entities at the end of PerformQuery, i.e. after filtering out the non-relevant entities. However, I have doubts that sorting a few items fewer will have a positive impact on performance.
About next steps:
- creating an entity-map container sounds like a nice idea.
I prefer my approach over kuranes' because it avoids having a potential memory leak: if entities are deleted the vector indexed by entity_id cannot be reduced in size and it can become huge after some time. This seems risky to me. While it's true that the binary search in my approach is log(n) and not constant, the tweaked implementation is very quick and I think it won't be much slower.
- the patch on SpatialSubdivision is independant from this discussion and I think it can be reviewed and hopefully then be used
comment:15 by , 11 years ago
- Don't stop on my account (works here, but problem with different hash replay from unpatched code ). If needed by perf, (find still showing up in profilers), I'll post another patch that will permits further perf test.
- About memory, it's more bloat than leak, but it's there. (still even with up to a 65k entity up use of id, the "std::vector<EntityData* >" is still smaller than a small texture/mesh in memory). Anyway, if still raise concerns, some entity_id "reuse scheme" or "recycling" could be advised in order to avoid that particular memory bloat (Have yet to stumble on the unique id generation code).
comment:16 by , 11 years ago
I must say I tend to follow Kuranes here (and my patch did it too): a sparse vector seems to me like a smallish memory issue, particularly as speed is crucial. It it however slightly annoying to serialize, I believe.
mrf: are you actually sure that all of them require such sorting?
comment:17 by , 11 years ago
Ok guys, I have written such an entity-map-container using your lookup-vector approach. I tried to make it a (simplified) drop-in replacement for std::map<entity_id_t, T>. Let me know what you think!
comment:18 by , 11 years ago
Nicely done, stl like, and reusable in #1860, perfect.
Remarks:
- entity_id_t is unsigned so check like " >= 0" could be pruned. What I'm not sure about is difference_type checks. Are you're sure about difference_type beinh signed, doesn't the size() - 1 doesn't rather give you a std::numeric_limits<difference_type>::max() ? (disclaimer:I don't know about that one in stl spec. if not specified in spec, it might even be different between stl implementations and would need unit-tests on gcc/clang/vs just to be sure and future proof.)
- would just add ::reserve, ::resize method just in case
- Did you achieve to get no hash changes in replay when comparing this from a vanilla svn code run ? Or were I wrong trying to do that ?
by , 11 years ago
Attachment: | entitymap.diff added |
---|
comment:19 by , 11 years ago
Thanks!
- good catch, I've removed the redundant ">=0". Yes, according to the reference, difference_type is supposed to be signed.
- the lookup map is much bigger and there ::reserve would not help much I guess. Also, ::resize does not seem to be meaningful in the context of a map.
- Not sure what you mean, sorry. I would expect that the serialization is equivalent to the normal std::map version if that is what you were checking. It seemed to work for me but I don't know how to systematically test this.
comment:20 by , 11 years ago
Nice work, mrf.
Some remarks:
- Small one/two line functions should receive the compiler hint 'inline'.
- Why does EntityMap::insert return an std::pair? Is that necessary?
- EntityMap::insert and EntityMap::erase have some very bad performance cases. You don't actually need the mLookup vector. If your data is sorted as it is in mData, you can just do a binary search: http://en.wikipedia.org/wiki/Binary_search_algorithm#Deferred_detection_of_equality;
This way you can completely avoid any over-resizing of the vectors or using the mLookup vector and updating its indices. Binary search would be the preferred choice here, since insert and erase performance would greatly increase. However, obviously enough the 'find' function would suffer a small performance penalty. If you use the iterative binary search, it should be minimal though.
- You could add a few more comments to your functions. :) Try following the Doxygen style.
Either way, I like the direction of this patch! This will greatly improve our entity maps; but insert/erase should have less latency. If you can show however that using the mLookup vector has significantly better performance in insert/erase situations, be sure to post comparison data here. :)
comment:21 by , 11 years ago
Thanks for your comment, RedFox.
- inline and comments: will do.
- return value of EntityMap::insert: this is just the interface of std::map, so I guess it makes sense to keep it like that
- Binary search: this was discussed above. Actually I have expressed my preference for the binary search approach as well (see also my first patch with a binary search implementation tailored for unique integer-keys), but the majority seemed to prefer this approach with faster lookups.
comment:22 by , 11 years ago
comment:23 by , 11 years ago
@kuranes: if it is only about the speed of iteration (as in #1860), it does not matter whether you use a lookup map or search for find
. It's just iteration is over a (sorted) vector.
comment:24 by , 11 years ago
I ran a quick performance test on the new EntityMap, in VC++2012 Release mode, no debugger attached, 100000 iterations:
Starting perftest 100000 iterations... map< entity_id_t, int >: insert: 33.72ms find: 29.52ms erase: 28.49ms clear: 7.85ms EntityMap< int >: insert: 10.34ms find: 2.95ms erase: 13058.40ms clear: 0.00ms map< entity_id_t, string >: insert: 31.26ms find: 25.36ms erase: 24.87ms clear: 8.50ms EntityMap< string >: insert: 19.85ms find: 2.97ms erase: 71514.26ms clear: 0.32ms map< entity_id_t, vector<int> >: insert: 26.93ms find: 23.36ms erase: 18.91ms clear: 7.83ms EntityMap< vector<int> >: insert: 13.78ms find: 2.83ms erase: 35313.02ms clear: 0.19ms
Most cases are faster, but EntityMap::erase is ridiculously slow. The performance improvement would be huge, if we can just improve the erase() method. In its current state, we can't use it, because every erase operation would negate any performance gain we would have with find() and insert().
comment:25 by , 11 years ago
Added a true 'sparse' vector EntityMap based on mrf's EntityMap. I wouldn't be concerned with the entity map getting too big. A reasonable optimization might be to use a memory pool for the mapped value. This would remove any fear of the entity vector growing too big (because even 100,000 * 8 bytes is still pretty small).
At least it gave reasonable performance:
Starting perftest 100000 iterations... map< entity_id_t, int >: insert: 35.66ms find: 31.19ms erase: 28.90ms clear: 8.87ms map< entity_id_t, string >: insert: 34.77ms find: 33.48ms erase: 25.24ms clear: 8.21ms map< entity_id_t, vector<int> >: insert: 33.39ms find: 27.72ms erase: 23.28ms clear: 7.66ms EntityMap< int >: insert: 2.79ms find: 3.00ms erase: 0.35ms clear: 0.00ms EntityMap< string >: insert: 15.78ms find: 2.40ms erase: 0.71ms clear: 0.37ms EntityMap< vector<int> >: insert: 9.11ms find: 3.20ms erase: 0.51ms clear: 0.14ms
comment:26 by , 11 years ago
Uploaded a complete patch with
-mrf's changes to CCmpRangeManager and SerializeTemplates
-my own changes to Spatial.h to make the range manager use the (new) unsorted version of GetNear, which is much faster and which works equivalently in the case of the range manager as far as I know.
I think this is ready for commit, and it should give a significant speed boost (the numbers I gave some time ago above probably still hold mostly true). On my computer the RangeManager "ExecuteActiveQueries" function goes from taking about 23% of game simulation time to about 3%, resulting in FPS going from 5 to 7. This is only my computer but I guess it's indicative.
edit: just need to check serialization does work.
follow-up: 31 comment:27 by , 11 years ago
RedFox: nice evaluation! Unfortunately, your version breaks the iterators, because the data vector now contains invalid entries. We could add a custom iterator which skips the invalid entries, but of course that would significantly reduce the performance for iteration (which is important for #1860).
The way see it, there is no solution which can make all operations (insert, erase, find
and iterate) fast, so the right compromise has to be found. As mentioned in the comment in the code, we assumed that the performance of modifications (in particular for erase), is less crucial then the performance for find and iterate.
To check this assumption, I have run a short test run (playing Acropolis for some minutes) to get an idea of the magnitude of the number of execution for each operation in CCmpRangeManager:
find
: 19036insert
: 4405- iteration (calls to
begin
): 236 erase
: 53
I think these numbers suggest that kuranes idea (i.e. using the lookup map, where erase is the slowest operation) is on the right track.
I suggest that, instead of sacrificing iteration performance, we could try to improve the performance of erase
in this approach. A rather simple trick could be to use a "bulk" erase, i.e. to only mark erased entries as invalid and only to clean up the m_Data
vector before iteration -- this would be kind of a compromise between RedFox's version and "my" version.
wraitii: did you check that the unsorted GetNear
does not break the active queries? (cf. comment:14, set_difference expects a sorted range)
comment:28 by , 11 years ago
Ah, right, hadn't actually noticed your comment... It appeared to work on all maps I tried with this optimization, but I really can't be 100% sure.
I guess we could still remove the last lines (the calls to ret.erase(std::unique(…))) though, which would already be much faster, the sorting itself is probably not too slow. Would need to check.
Would it be nearly that slow to use a custom iterator? It seems to me like your test case is a little too peaceful: if you tried on Combat Demo(Huge) and had units fight, you'd probably have a lot more "erase" calls too. Not sure what calls "iteration" but I think testing on Acropolis only is insufficient (I'd recommend combat demo (huge), Gambia river, and something like Oasis).
comment:29 by , 11 years ago
wraiiti, did you perhaps upload the wrong patch? It does not contain the EntityMap itself, and mrf's changes seem to be missing as well. Also the comments to the new GetUnsorted methods is wrong ("Returns a sorted list of...").
by , 11 years ago
Attachment: | RangeManagerOpt.patch added |
---|
follow-up: 32 comment:30 by , 11 years ago
I had forgotten to "svn add" the entity map, the comment is fixed, however I'm quite sure mrf's changes were in already, or am I mistaken here?
comment:31 by , 11 years ago
Replying to mrf:
RedFox: nice evaluation! Unfortunately, your version breaks the iterators, because the data vector now contains invalid entries. We could add a custom iterator which skips the invalid entries, but of course that would significantly reduce the performance for iteration (which is important for #1860).
It doesn't 'break' the iterators per se - leaving in 'invalid' entries is intentional in order to get the speed boost. Once we start reusing entity handles, this solution would be quite perfect :)
When we iterate over the entitymap, we can do 2 things to fix it: 1) Write a custom iterator (easy) that skips handles with '-1' (invalid). 2) Let the code itself handle it: if(it->first == -1) continue;
Another issue might be current simulation code that expects size = size - 1 after erase(), which doesn't happen right now. That is easy to fix. Just search for all references with your favorite IDE and all is well.
I don't see why we can't use this solution though - it's pretty much the best we've got so far. I don't really care about wasting a bit of memory either. This vector never grows too big :)
comment:32 by , 11 years ago
Replying to wraitii:
I had forgotten to "svn add" the entity map, the comment is fixed, however I'm quite sure mrf's changes were in already, or am I mistaken here?
Ah, sorry, I thought you meant that one: http://trac.wildfiregames.com/attachment/ticket/1707/spatial_optimization.patch
Don't know about the >=0 changes.
by , 11 years ago
Attachment: | EntityMap.2.h added |
---|
Updated EntityMap with custom iterator that skips invalid entities.
comment:33 by , 11 years ago
Just as a question. Maybe I'm naive, but couldn't PerformQuery(), ExecuteQuery() and ExecuteActiveQueries() cope completely without the repetitive .find() calls if the spatial subdivision stored both id and pointer. The template for an Octree thats somewhere in Trac for example does that already. The RangeManager already saves all relevant data, but not in the spatial structure. The latter could even just use pointers to EntityData instead of the id, couldn't it?
I'm just saying that, because according to profiler data of a simple match, all other uses of the m_EntityData map together consume not even a tenth of the amount of resources that the ExecuteActiveQueries function takes (looking only at RangeManager). Maybe this is a special case, but in case this is true, it would perhaps make sense to eliminate the finds and concentrate on iteration speed instead.
follow-up: 36 comment:34 by , 11 years ago
Ok, i did the test again, now playing Gambia River for some minutes:
find
: 106086insert
: 12061begin
: 510erase
: 93
The iterations come from the LOS code in CCmpRangeManager.
From these numbers and also keeping in mind that the ComponentManager also requires fast iteration, I don't see a motivation for improving the erase performance on the cost of iteration performance.
I would stick to the lookup-vector approach and consider bulk removal if erase
proves to be too slow. One simple improvement to erase
in my version would be to use the following code, which avoids touching the empty entries during index update:
inline void erase(iterator it) { // update all lookup indicies m_Lookup[it->first] = -1; for(iterator jt = it + 1; jt != m_Data.end(); jt++) { m_Lookup[jt->first]--; } // erase from m_Data m_Data.erase(it); }
Anyway, I have some comments on wraitii's EntityMap.h:
begin
should also skip invalid entries before the first valid onesize
should return the number of valid entries (e.g. by manually keeping track of this). Also,empty
should be equivalent tosize()==0
- I guess
max_size
does not make too much sense anymore -- ok, it has never made sense ;)
@scroogie: note that this is also suggested by quantumstate in comment:1. A problem with that might be that the spatial subdivision might store the entity data multiple times. Also, they data would have to be moved (physically, in the memory) every time the entity moves to a different subdivision cell.
comment:35 by , 11 years ago
mrf: Right, your comments about the entity map are spot on.
I'm not sure we can decide which of the approach is better unless we benchmark both of them on several maps. I think we ought to have two finished patches, one for each approach, so we can compare them. I'll update the entity map accordingly.
edit: about scroogie's proposition. I really haven't studied that possibility, but it sounds doable, so we probably should.
comment:36 by , 11 years ago
Replying to mrf:
@scroogie: note that this is also suggested by quantumstate in comment:1.
Okay, but there seems to be no further discussion, or what do you mean by that?
A problem with that might be that the spatial subdivision might store the entity data multiple times.
Only a pointer, just as now the ids are stored multiple times.
Also, they data would have to be moved (physically, in the memory) every time the entity moves to a different subdivision cell.
Just as the id now. I'm not proposing to drop the map, merely to concentrate on iteration performance for the map and replace the id mapping with pointers.
by , 11 years ago
Attachment: | RangeManagerOpt.2.patch added |
---|
comment:37 by , 11 years ago
Alright, uploaded two things... I noticed an issue with the entity map: if the last element is made invalid, the iterator will crash because of bad memory access. This is bad. Thus:
-RangeManagerOpt.2 is an updated version that reverts iterator changes and makes the RangeManager check for validity in the loops. This is not too nice but works cleanly.
-EntityMap.3.h is an updated version of the entityMap that works with the original RangeMangerOpt.patch, which checks on "erase" if the reased element is the last, and if so removes it from m_Data so that the last element is always valid. It seems to work but it could be slow.
I've updated both cases to take into account mrf' observations.
by , 11 years ago
Attachment: | EntityMap.3.h added |
---|
comment:38 by , 11 years ago
Milestone: | Alpha 14 → Alpha 15 |
---|
comment:39 by , 11 years ago
Starting perftest 100000 iterations... stdmap< int >: insert: 36.99ms find: 29.13ms iterate1: 1.94ms erase: 31.32ms iterate2: 2.11ms clear: 6.13ms stdmap< string >: insert: 38.10ms find: 27.83ms iterate1: 2.70ms erase: 17.86ms iterate2: 2.42ms clear: 7.41ms stdmap< vector<int> >: insert: 28.19ms find: 22.23ms iterate1: 1.88ms erase: 20.92ms iterate2: 2.67ms clear: 7.18ms EntityMap< int >: insert: 4.91ms find: 2.38ms iterate1: 0.18ms erase: 2.49ms iterate2: 0.17ms clear: 0.53ms EntityMap< string >: insert: 20.65ms find: 2.18ms iterate1: 0.32ms erase: 2.24ms iterate2: 0.26ms clear: 0.49ms EntityMap< vector<int> >: insert: 11.43ms find: 2.32ms iterate1: 0.25ms erase: 2.64ms iterate2: 0.27ms clear: 0.27ms
comment:40 by , 11 years ago
Keywords: | simple removed |
---|
comment:41 by , 11 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
Alright, I optimized the SpatialSubdivision class and also included the EntityMap to drastically improve ExecuteActiveQueries performance. (that function is the main bottleneck in rangemanager).
The overall performance improvement can be measured in Combat Demo (Huge). On my laptop, the IDLE ExecuteActiveQueries time went down from 17-20ms per frame to a much better 2-3ms per frame.
I should note that once units start moving and getting closer, the ExecuteActiveQueries still shows up as a bottleneck. However, it is dwarfed by ComputeShortPath, so I suggest we concentrate on optimizing ComputeShortPath next.
follow-up: 44 comment:42 by , 11 years ago
Just posting here what I told on IRC as well, after applying "RangeManagerEntityMap.patch" on the latest GIT version, I get the following compile errors:
- On CCmpRangeManager.cpp:24:42: #include "simulation2\system\EntityMap.h" does not exists (easy to fix)
- Big complaint at CCmpRangeManager.cpp:24:0 http://pastebin.com/JJc56kzt
For the second error I removed "typename" from "typedef typename K first_type" (and the second), as suggested on IRC.
This compiles without errors, but crashes on loading a map with: http://pastebin.com/zMtzciRB
I suppose the proposed fix for the second error is incorrect? (my c++ templating is a but rusty, so I really can't tell)
comment:43 by , 11 years ago
Redfox: this sounds good enough for me, combat demo huge is basically the case we'll never reach
comment:44 by , 11 years ago
Replying to Riemer:
For the second error I removed "typename" from "typedef typename K first_type" (and the second), as suggested on IRC.
This compiles without errors, but crashes on loading a map with: http://pastebin.com/zMtzciRB
I suppose the proposed fix for the second error is incorrect? (my c++ templating is a but rusty, so I really can't tell)
The fix is correct. The problem is with our own sub-optimal code. I have intentionally put a hardcoded limit to number of Entities that can exist in a Subdivision grid. Ideally it should always stay below 128. If it goes over that, it's because the subdivsion grid is too sparse - thus also suboptimal.
comment:45 by , 11 years ago
This patch needs to be merged with the current trunk version. I've fixed the compile-error but then it crashed when running the tests and when trying to start a game. Maybe something is not compatible with Philip's latest change.
comment:47 by , 11 years ago
With the latest patch built with VC++ 2010, I get a crash starting Atlas from main menu:
> pyrogenesis.exe!CCmpRangeManager::ResetSubdivisions(CFixed<int,2147483647,32,15,16,65536> x1={...}, CFixed<int,2147483647,32,15,16,65536> z1={...}) Line 669 + 0x3 bytes C++ pyrogenesis.exe!CCmpRangeManager::Init(const CParamNode & __formal={...}) Line 330 C++ pyrogenesis.exe!CComponentManager::AddComponent(CEntityHandle ent={...}, int cid=29, const CParamNode & paramNode={...}) Line 590 C++ pyrogenesis.exe!CSimulation2Impl::ResetComponentState(CComponentManager & componentManager={...}, bool skipScriptedComponents=false, bool skipAI=false) Line 112 C++ pyrogenesis.exe!CSimulation2::ResetState(bool skipScriptedComponents=false, bool skipAI=false) Line 760 + 0x2d bytes C++ pyrogenesis.exe!CMapReader::LoadMap(const Path & pathname={...}, CTerrain * pTerrain_=0x017ea578, WaterManager * pWaterMan_=0x0189e6d0, SkyManager * pSkyMan_=0x0189ebf0, CLightEnv * pLightEnv_=0x003de180, CGameView * pGameView_=0x01871cb0, CCinemaManager * pCinema_=0x09c62c80, CTriggerManager * pTrigMan_=0x00000000, CPostprocManager * pPostproc_=0x0189f664, CSimulation2 * pSimulation2_=0x0186c1a8, const CSimContext * pSimContext_=0x09c6ad78, int playerID_=1, bool skipEntities=false) Line 116 C++ pyrogenesis.exe!CWorld::RegisterInit(const CStrW & mapFile={...}, int playerID=1) Line 95 C++ pyrogenesis.exe!CGame::RegisterInit(const CScriptValRooted & attribs={...}, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & savedState="") Line 151 + 0x36 bytes C++ pyrogenesis.exe!CGame::StartGame(const CScriptValRooted & attribs={...}, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & savedState="") Line 259 + 0xc bytes C++ pyrogenesis.exe!`anonymous namespace'::StartGame(const CScriptValRooted & attrs={...}) Line 67 + 0x30 bytes C++ pyrogenesis.exe!AtlasMessage::fLoadMap(AtlasMessage::mLoadMap * msg=0x09c9f358) Line 142 C++ pyrogenesis.exe!AtlasMessage::fLoadMap_wrapper(AtlasMessage::IMessage * msg=0x09c9f358) Line 126 + 0x3c bytes C++ pyrogenesis.exe!RunEngine(void * data=0x0071f810) Line 175 + 0x6 bytes C++ pyrogenesis.exe!thread_start(void * param=0x0aab5408) Line 624 + 0x3 bytes C++ msvcr100.dll!__endthreadex() + 0x3a bytes msvcr100.dll!__endthreadex() + 0xe4 bytes kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes
Deserialization crashes similarly - try quicksave(Shift+F5)/quickload(Shift+F8):
> pyrogenesis.exe!CCmpRangeManager::ResetSubdivisions(CFixed<int,2147483647,32,15,16,65536> x1={...}, CFixed<int,2147483647,32,15,16,65536> z1={...}) Line 671 C++ pyrogenesis.exe!CCmpRangeManager::Init(const CParamNode & __formal={...}) Line 330 C++ pyrogenesis.exe!CCmpRangeManager::Deserialize(const CParamNode & paramNode={...}, IDeserializer & deserialize={...}) Line 380 C++ pyrogenesis.exe!CComponentManager::DeserializeState(std::basic_istream<char,std::char_traits<char> > & stream={...}) Line 313 C++ pyrogenesis.exe!CNetTurnManager::QuickLoad() Line 356 C++ pyrogenesis.exe!ScriptInterface::call<void,&`anonymous namespace'::QuickLoad>(JSContext * cx=0x031efd38, unsigned int argc=0, unsigned __int64 * vp=0x04d80068) Line 103 + 0x13a bytes C++ mozjs185-ps-release-1.0.dll!69070d10() [Frames below may be incorrect and/or missing, no symbols loaded for mozjs185-ps-release-1.0.dll] msvcr80.dll!59f64b6c() kernel32.dll!76c314ad() msvcr80.dll!59f64c39() pyrogenesis.exe!GUI<bool>::GetSettingPointer(const IGUIObject * pObject=0x031efd38, const CStr8 & Setting={...}, bool * & Value=0x0bd9e330) Line 341 + 0x18 bytes C++ pyrogenesis.exe!IGUIObject::ScriptEvent(const CStr8 & Action=) Line 496 + 0x70 bytes C++
by , 11 years ago
Attachment: | RangeManagerEntityMap.patch added |
---|
Working application of the EntityMap. Debugged and ~5x performance improvement to ExecuteActiveQuery.
comment:49 by , 11 years ago
We wasted a bunch of time looking in the wrong place... comparing all the profiling data from a replay of huge combat demo, with and without this patch applied, there is one thing that explains the 200ms difference. The A* in the short pathfinder is taking that much longer.
Looking at what changed in the patch that might cause this, it had to be CCmpObstructionManager::GetObstructionsInRange
, range manager isn't even used by the pathdinder. I put some printfs in there and sure enough, it is now returning many more obstructions than before, something like 16% more, so the already slow pathfinder is doing that much more work. Since the simulation states are otherwise identical (both coming from a single replay file), this must be a bug introduced in GetObstructionsInRange
- used only by the pathfinder.
comment:50 by , 11 years ago
Excellent find historicbruno. I'll direct my remaining time to figuring out why it's returning so many obstructions.
comment:51 by , 11 years ago
See also http://irclogs.wildfiregames.com/2013-09-14-QuakeNet-%230ad-dev.log at 10:16.
comment:52 by , 11 years ago
Keywords: | review removed |
---|
Committed the memory optimizations part of the patch to r13854. The ticket should still remain open - considering replacing SpatialSubdivision for RangeManager completely with Sweep & Prune approach.
comment:53 by , 11 years ago
Right now the Parallel For still needs some work: 1) The implementation isn't very solid - we could probably do better. 2) There are probably some concurrency issues in RangeManager, I only placed 1 Mutex lock. 3) There is no native threadcount detection, which is very bad for systems that only have 1 thread.
by , 11 years ago
Attachment: | ParallelFor_rangemanager.patch added |
---|
Implemented Parallel For on pthreads (scary), improves rangemanager performance on multicore platforms.
comment:54 by , 10 years ago
Description: | modified (diff) |
---|---|
Keywords: | patch removed |
Milestone: | Alpha 15 → Alpha 16 |
Owner: | removed |
Priority: | Must Have → Should Have |
Status: | assigned → new |
Moving to A16 and resetting some info. Setting as should have since this is a lot harder than it looks.
comment:55 by , 10 years ago
The many patches reflect the many approaches tried, none having worked perfectly.
But didn't some of the patches already work and improve performance but were ditched because some even faster version seemed to be possible? Or did mrf's patches (e.g. #19) already include the problem with GetObstructionsInRange?
comment:56 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed behind our backs by Redfox and Philip (see notable [13854]).
comment:57 by , 10 years ago
Milestone: | Alpha 16 → Alpha 15 |
---|
WIP patch, currently works but is done in parallel and doesn't serialize so won't survive saving.