Opened 11 years ago

Closed 10 years ago

Last modified 8 years ago

#1923 closed defect (wontfix)

[PATCH] very performance basic non behaviour changing optimisations (in hotspot only)

Reported by: tuan kuranes Owned by: tuan kuranes
Priority: Should Have Milestone:
Component: Core engine Keywords: patch
Cc: Patch:

Description

Done only on very hot CPU intensive code, using very sleepy profiler hotspot tracing:

what the patch does

  • Debranch function
  • const variables methods and parameters wherever possible.

The most and foremost one being getting rid of all new/free/malloc/delete during runtime as that's your main performance killer (all memory should be pre-allocated and stored in memory pools, period).

Attachments (3)

commands.txt (4.3 KB ) - added by fabio 11 years ago.
this replay is a lot slower with the patched game
Archive.zip (52.2 KB ) - added by tuan kuranes 11 years ago.
latest all-in-one patch against latest svn (removed size_empty, added Territory_memfrag and pathfind_bench useful for vanilla 0ad comparison)
minimap.2.patch (8.2 KB ) - added by tuan kuranes 11 years ago.
fixed fixed function renderpath

Download all attachments as: .zip

Change History (50)

comment:1 by Philip Taylor, 11 years ago

I believe const should not affect the code generated by the compiler at all, so it'll have no effect on performance. (In most cases the compiler can already tell whether a variable gets modified or not; and in the cases where it can't tell, it can't assume some other code doesn't modify the variable despite it being const). Are there any cases where you can see these const changes resulting in faster code?

comment:2 by tuan kuranes, 11 years ago

  • Const is foremost a compile-time safety and code safety, useful for code reading and code analysis (tools like cppcheck). Still, it provides compiler are hint, and nowadays compilers are moving faster, and it might still do good deeds.
  • Meanwhile const methods and const methods parameters are optimised by compiler, const references can prevent aliasing ( the load and store hit. Note that using "restrict" make it more even so, but people already have hard time with my const-correcness...) The most useful is const vector3D &myvector which allows compiler to pass the pointer only (size_t copy in register), not doing a copy of the whole vector3D object. (rvo doesn't really work afaik)

I get noticeable perf improvements with my patch here, but previous comment was misleading, I added const only in hotspot (top 10 in profiler) I reviewed, but most of its effect are cache wise, and performance also comes from changes to prevent stack allocation. (but all that only in hotspots.)

I added a new version of the patch, with new hotspot. There's lots of other optimisation opportunities, on the same vein, but now, it's more high-level and can be behaviour changing.

(more stack allocation to prevent, like the shaderpogram uniform stack allocation thing using string as ID that could be cached by shaderprogram by name getTransformUniform, getShadowSizeUniform() that would help ease the boosthaslap cpu impact, etc.).

comment:3 by tuan kuranes, 11 years ago

The latest patch is a *very huge perf* gain here, having done all the hotspots now. (pathfinding is the most impacted, but all other cpu intensive operations get noticable gains)

The idea of the patch is really "Optimisations as you go" withtout any code behaviour modification, so it's really easy to apply and test.

  • http://www.tantalon.com/pete/cppopt/main.htm (most gain with "Pass Class Parameters by Reference")
  • cpu cycle saving with computation cache (computing edge distance once and sort, instead of computing distance at each sort comparison in pathfinding)
  • Effective C++ scott meyers usual things.

There's still const whenever possible on code I crossed, because it's good practice for clean code and optimisiation opportunities, and in order to be excused, at least, I'm not alone there : http://clipboard.com/clip/LQP6eFU2JSft1ZBHlDQGutltTI1UqvgqxJTe

Now, that let JS code in hotspots for cpu cycle saving. ("search drop build site" is a frame killer here)

The next step are the "hiccups" still there.

I would guess they come from memory, with all those stack allocation, new and delete even show in hotspots. Not only it's slow, but memory fragmention grows over time execution (notice when you save, close and restarts, game is faster, at start). The way there is "memory pooling", which is preallocation "once and for all" from where we get new object and store unused objects. Not only it stops memory fragmentation and is faster than new/delete, but it gives data "localization" in RAM, giving cpu cache usage a boost. ( http://blog.kejser.org/2012/06/14/the-effect-of-cpu-caches-and-memory-access-patterns/ and all data-oriented talks/slides )

(each patch replace the precedent, it's not a cumulative patch)

Last edited 11 years ago by tuan kuranes (previous) (diff)

comment:4 by historic_bruno, 11 years ago

Keywords: patch review added; performance removed

As interesting as these patches may be, you've provided no numbers on what performance gains were seen, nor how you measured them, or what compiler you used and what system this was tested on.

Also it's extremely unlikely that we apply and review a 142 KB patch, so if you would like some of these changes added, I suggest breaking it into multiple patches, where each one addresses a particular issue and a few files. Then provide a description of the patch and what it gains.

comment:5 by tuan kuranes, 11 years ago

Sorry about huge patch.

A link where patch is discussed a bit : http://www.wildfiregames.com/forum/index.php?showtopic=17274&st=60#entry268582

How do you want patch performance gains posted ? is there a wiki page explaining a route to follow ?

It depends on game context, so I don't know how to get meaningful numbers, but note that the patch just removed unecessary cycles and branch mispredict (attested by codeXL) so it can only be gains.

As said in the forum thread, My method was to used real profiler (codeXL) to validate each changes on cpu cycles, branch prediction, and cache trashing, and only kept those which lowered those numbers (but didn't take note or screenshots of numbers).

(Note that It's only the latest file patch, the latest in the file section (didn't find how to remove file previously submitted.)

I intend to give some time to cut patch in three (probably not until I get time, that is , next week-end)

Here's perhaps how I could make three separate patches:

  • "optimisation as you go", that cover a lot of files, but not much code changes in the end mostly "method(paramClass)" => "method(const param &Class) const" and other const opportunities whenever possible. (that's the one that make the patch big, because of the huge number of files.)
  • Pathfinding (CFvector2D and all path files does cover a lot of files too ).
  • Basic branches optimisation (remove if from inside while and call stack, and not inline funcs), and whenever RVO doesn't kick (and cause temp object alloc).

I intend to give some time to cut patch in three (probably not until I get time, that is , next week-end)

comment:6 by tuan kuranes, 11 years ago

As pointed on forum, last file patch seems to look rather like a wrong patch, and doesn't apply on linux out of the box. (and seems a wrong file with commented code and fast_blur_test intrinsics, and seemingly the boost::unordered changes test that should not be in.)

in reply to:  5 comment:7 by historic_bruno, 11 years ago

Replying to kuranes:

How do you want patch performance gains posted ? is there a wiki page explaining a route to follow ?

We probably need to make this a more formal process in the future, but basically what I hinted at: testing method, test system (OS, compiler, hardware, etc.), results, explanation of changes.

Here's perhaps how I could make three separate patches:

  • "optimisation as you go", that cover a lot of files, but not much code changes in the end mostly "method(paramClass)" => "method(const param &Class) const" and other const opportunities whenever possible. (that's the one that make the patch big, because of the huge number of files.)
  • Pathfinding (CFvector2D and all path files does cover a lot of files too ).
  • Basic branches optimisation (remove if from inside while and call stack, and not inline funcs), and whenever RVO doesn't kick (and cause temp object alloc).

Sounds good to me, and if you can clean up any commented out code that was added, that would help too. Since it's a patch, we'll already be able to see and compare the old and new code without it being commented out :)

Last edited 11 years ago by historic_bruno (previous) (diff)

comment:8 by tuan kuranes, 11 years ago

Here's all the patches:

  • pathfind.patch: huge patch, but very huge perf boost, mostly cycles saving reusing results, and using more appropriate structures, minimizing allocation and object copies. (20% to 400% perf, depending on path requests)
  • size_empty.patch: using empty() instead of size(), because size does iterate and check content size. (0.1 perf...)
  • skinning.patch: specific RVO not kicking code for CPU model skinning, avoiding lots of temp objects,and loop unrolling as 4 bones is a static number. (3-7%perf)
  • const_only.2.patch: const param& , const param, const method, const simple variable (making compiler avoiding object copies whenever possible) some rvo no kicks in there as well.(5-10%perf)
  • minimap.patch: add VBO support for unit drawing when supported, huge perf boost per frame, as opengl driver doesn't have to sync anymore. (20%perf with a good gpu, invisible with slow gpu)
  • branch.patch: remove branches whenever possible in loops: here CLOSTexture::GenerateBitmap getting huge boost, removing lots of code and adding cache prefetching benefits with the single memset at start (5%perf)
  • perf.patch: all in one patch updated with all changes

Perf is in the main thread only, relative to 100%. Here it makes 0ad very playable on all non naval maps, including huge maps. The real perf stopper now here is a IA economy script "drop sites searching" that kicks in regularly and gives perfs stuttering. Huge pathfind request, (200 units moves at once), still goes up, but gives "reasonable stuttering".

Tested on linux (old cpu), and windows x64 with 32bits exe (good cpu).

Last edited 11 years ago by tuan kuranes (previous) (diff)

comment:9 by scroogie, 11 years ago

Last one applies cleanly.

comment:10 by wraitii, 11 years ago

Trac fails to parse correctly the perf.patch and both const patches, and I fail to apply them (the diff does work though).

It seems to me like you have done substantial changes to the pathfinder and the rangemanager, which will the most need investigating.

Edit: compiled: not sure about performance improvements, particularly as I don't really have any point of comparison, but it doesn't feel slower. Biggest slowdown for me always was the rendering anyway.

Last edited 11 years ago by wraitii (previous) (diff)

comment:11 by tuan kuranes, 11 years ago

Added a .tar.gz with patches inside to be trac-proof.

Path and range is the culprit of the patch, as perf is abissmal as soon as you're having "armies" fighting, or many player/bots (even a simple long "punjab1" scenario play againt "aegis" bots becomes unplayable as times advance (even after a close/reload to remove memory fragmentation). I guess perf is barely visible in simple scenario on a good cpu, but should becomes very apparent on big maps/numerous player/bots.

Last edited 11 years ago by tuan kuranes (previous) (diff)

comment:12 by fabio, 11 years ago

I tried applying the patches inside 0ad_perf.tar.gz​, they apply fine except for the const one:

~/sorgenti/0ad/trunk/source$ patch -p1 < const_only.patch 
patching file graphics/UnitAnimation.cpp
patching file main.cpp
patching file maths/BoundingBoxAligned.cpp
patching file maths/BoundingBoxOriented.cpp
patching file renderer/ModelRenderer.cpp
patching file renderer/OverlayRenderer.cpp
patching file renderer/TerrainOverlay.h
patching file renderer/TerrainRenderer.cpp
patching file renderer/VertexArray.cpp
patch: **** malformed patch at line 266: Index: source/simulation2/system/ComponentManager.cpp

Then I tried a short replay of combat demo with:

time ./0ad -replay=/home/fabio/.config/0ad/logs/sim_log/./3557/commands.txt

With standard 0ad from r13404 I get:

[...CUT...]
Turn 597 (200)... 
TIMER| profile2 get buffer: 7.17836 ms
TIMER| profile2 visitor: 67.0483 ms
TIMER| profile2 query: 74.7369 ms
# Final state: af6cec4fa14b5cfba8b22ddee4373686
TIMER TOTALS (9 clients)
-----------------------------------------------------
  tc_pool_alloc: 34.5203 Mc (1739x)
  tc_png_decode: 0 c (0x)
  tc_dds_transform: 0 c (0x)
  tc_transform: 0 c (0x)
  tc_plain_transform: 0 c (0x)
  tc_ShaderValidation: 0 c (0x)
  tc_ShaderGLSLLink: 0 c (0x)
  tc_ShaderGLSLCompile: 0 c (0x)
  xml_validation: 103.31 Mc (41x)
-----------------------------------------------------

real	0m6.664s
user	0m6.116s
sys	0m0.508s

with the patched one:

[...CUT...]
Turn 597 (200)... 
TIMER| profile2 get buffer: 7.66424 ms
TIMER| profile2 visitor: 74.6106 ms
TIMER| profile2 query: 82.8202 ms
# Final state: 56f8dceac7c11f8548376a9cf2e11899
TIMER TOTALS (9 clients)
-----------------------------------------------------
  tc_pool_alloc: 36.4033 Mc (1739x)
  tc_png_decode: 0 c (0x)
  tc_dds_transform: 0 c (0x)
  tc_transform: 0 c (0x)
  tc_plain_transform: 0 c (0x)
  tc_ShaderValidation: 0 c (0x)
  tc_ShaderGLSLLink: 0 c (0x)
  tc_ShaderGLSLCompile: 0 c (0x)
  xml_validation: 105.967 Mc (42x)
-----------------------------------------------------

real	0m5.437s
user	0m4.892s
sys	0m0.520s

So it is at least > 15% faster on this map with few units and running only a couple of minutes, which is really promising. I should try with a longer play when I'll find some time.

Note however that the final state hash is different with the patch.

comment:13 by fabio, 11 years ago

I also tried with Combat demo (huge). However with the patch it's a lot slower (6m52.802s vs. 3m23.077s) (I may be doing something wrong, however). I attach the commands.txt for this demo to let you test.

Last edited 11 years ago by fabio (previous) (diff)

by fabio, 11 years ago

Attachment: commands.txt added

this replay is a lot slower with the patched game

comment:14 by scroogie, 11 years ago

I can confirm that with your commands.txt.

Unpatched version:

TIMER| profile2 get buffer: 846.507 us
TIMER| profile2 visitor: 53.3953 ms
TIMER| profile2 query: 54.4477 ms
# Final state: 56e7b0c246f7bd2ea0fe92790b3016b6
TIMER TOTALS (9 clients)
-----------------------------------------------------
  tc_pool_alloc: 12.2239 Mc (1738x)
  tc_png_decode: 0 c (0x)
  tc_transform: 0 c (0x)
  tc_plain_transform: 0 c (0x)
  tc_dds_transform: 0 c (0x)
  tc_ShaderGLSLLink: 0 c (0x)
  tc_ShaderGLSLCompile: 0 c (0x)
  tc_ShaderValidation: 0 c (0x)
  xml_validation: 18.5788 Mc (7x)
-----------------------------------------------------

real    0m32.780s
user    0m32.635s
sys     0m0.076s

Patched version:

TIMER| profile2 get buffer: 924.714 us
TIMER| profile2 visitor: 53.2509 ms
TIMER| profile2 query: 54.4416 ms
# Final state: e669e0cbc9e4afcf060b66203d3a0676
TIMER TOTALS (9 clients)
-----------------------------------------------------
  tc_pool_alloc: 26.1131 Mc (1738x)
  tc_png_decode: 0 c (0x)
  tc_transform: 0 c (0x)
  tc_plain_transform: 0 c (0x)
  tc_dds_transform: 0 c (0x)
  tc_ShaderGLSLLink: 0 c (0x)
  tc_ShaderGLSLCompile: 0 c (0x)
  tc_ShaderValidation: 0 c (0x)
  xml_validation: 17.6386 Mc (6x)
-----------------------------------------------------

real    1m14.301s
user    1m13.198s
sys     0m0.255s

I also did a callgrind run on both versions. While the unpatched version did 13691 calls to CCmpPathFinder::ProcessShortRequest, the patched version did 22644 calls. Is that because of timing differences?

comment:15 by historic_bruno, 11 years ago

Notice the final state hashes differ with and without the patches, so it's altering the simulation behavior.

comment:16 by tuan kuranes, 11 years ago

All patches applies here with "svn patch myfile.patch" ? is using patch directly mandatory for the project ?

Reposted a const.patch/diff

Other patches applies and let see a 3% gain on your benchmark method.

Now for pathfinding, the hash thing is certainly a problem. I'll have a look.

On the "time" performance, they're not representative at all on pathfinding patch, as in order to keep "save" compatibility, serialisation does some vector to map/ptr_map translation and deserialisation each time (which happens a lot during replay apparently), which surely gives wrong timing there. (the entity map was used for find and loop, whereas a vector for loop and an entityid indexed vector gives ). Testable with a non-tested hack with pathfind_test, just for perf, which gives 10% on that replay

I'll look for a workaround for both problems (hash and serialisation overhead breaking benchmarks (perhas related)).

Version 1, edited 11 years ago by tuan kuranes (previous) (next) (diff)

comment:17 by scroogie, 11 years ago

Don't you think that at least some of the timing difference might be because of this:

I also did a callgrind run on both versions. While the unpatched version did 13691 calls to CCmpPathFinder::ProcessShortRequest?, the patched version did 22644 calls.

in reply to:  17 comment:18 by tuan kuranes, 11 years ago

Replying to scroogie:

Don't you think that at least some of the timing difference might be because of this:

I also did a callgrind run on both versions. While the unpatched version did 13691 calls to CCmpPathFinder::ProcessShortRequest?, the patched version did 22644 calls.

Surely, that too, but I suspect that the drastic perf difference in pathfinding comes from the serialization thing.

comment:19 by tuan kuranes, 11 years ago

Removed the problematic parts, fixing perf "timing" and hash. Posted the patch pathfind.2.patch still giving reasonable perfs on pathfinding. So now all patch should apply and gives correct hash and perf on both linux and win32 ( 0ad_perf.tar.gz​ (without pathfind and const ) + const.patch + pathfind.2.patch )

I'll make a new patch and investigate on the entity_id entity map perf/hash problem in another ticket/patch, as the perf impact of std::map::find and iterate grows with number of agents (log(n) for find, non contigous memory access for iteration) instead of not constant not growing find and prefetching iteration, and therefore is a serious candidate for slowing down huge numbers game.

comment:20 by fabio, 11 years ago

I tried the "3" package and the performance now looks similar to stock 0ad (no regression but no improvement). I tried again the two commands.txt I tested before.

comment:21 by tuan kuranes, 11 years ago

@fabio strange, on your commands.txt time bench I get a consistent 3-5% perf. Be sure to do the time at least twice each to avoid "cold cache" effect

Added a patch to apply on vanilla 0ad svn in order to test/benchmark directly the pathfinding using a simple command:

test -test TestCmpPathfinder

Using that benchmark, I get 10-15% on computePath and 15-20% on computeshortpath.

comment:22 by fabio, 11 years ago

Yes, I noticed a small gain with the patch, but the first command.txt test is still slower than when using the first patch (where I got a >15% improvement).

comment:23 by leper, 11 years ago

In 13413:

Change some size() > 0 to Based on patches by kuranes and Markus. Refs #1852, #1923.

by tuan kuranes, 11 years ago

Attachment: Archive.zip added

latest all-in-one patch against latest svn (removed size_empty, added Territory_memfrag and pathfind_bench useful for vanilla 0ad comparison)

comment:24 by historic_bruno, 11 years ago

I tested the latest minimap.patch and indeed measured a performance improvement on my system

Win7 SP 1 (6.1.7601); CPU: x86, Intel Core i7-2600K @ 3.40GHz (1x4x2), 3.39 GHz; Memory: 8192 MiB; 5352 MiB free; Graphics Card: AMD Radeon HD 6800 Series; OpenGL Drivers: 4.2.12217 Compatibility Profile Context 12.104.0.0, atioglxx.dll (6.14.10.12217)

Example profiling data from Gambia River map shortly after loading:

With patch: render minimap 0.873 ms/frame
Without patch: render minimap 1.516 ms/frame

But I notice it breaks minimap unit rendering on fixed function render path. This can be tested by setting renderpath=fixed in your local.cfg file (see Manual_Settings). It's not a big deal because we've been meaning to remove the fixed render path anyway, but progress on that has stalled and we still technically support it. Can you figure out what might have broken?

The other thing is that it looks very strange when toggling reveal map (Alt+D to open dev overlay), as if it's being rendered in a different order than before. I don't think this is a significant problem either but IMO it looks worse than it did.

What's the motivation for maxEntitiesDrawn = 8144? We could have maps with more entities than that, but we don't want some entities left unrendered in the minimap or debug warnings on those maps. Of course there is a limit to how many could be usefully rendered in the minimap but we should use some algorithm for that (along with other minimap rendering improvements - #763).

Otherwise, it needs more testing for bugs on different systems, then we can commit it.

Last edited 11 years ago by historic_bruno (previous) (diff)

by tuan kuranes, 11 years ago

Attachment: minimap.2.patch added

fixed fixed function renderpath

comment:25 by tuan kuranes, 11 years ago

  • added new Minimap.patch fixing the ff path.
  • tested on nvidia linux&win7
  • 8114 is arbitrary. Can goes up to 65535 (max indice if we stick to 16bits buffer,best openglES wise), didn't find a max entity number. A simple merge by distance algo is indeed possible, but should only kick if above the size. (a fixed size is mandatory perf wise, avoiding buffer allocation/deallocation is important perfwise)
  • "Reveal Map order" change is just showing that mixing deprecated opengl immediate calls with modern (async) opengl calls leads to driver random drawing order. Can change that in making all other deprecated opengl calls (glvertexf() and all) going modern opengl (async vbo). (is there a ticket for getting rid of ff path ? of gl deprecated calls (could be mixed with openglES ticket)? )

in reply to:  25 comment:26 by fabio, 11 years ago

is there a ticket for getting rid of ff path ? of gl deprecated calls (could be mixed with openglES ticket)?

There is a forum topic: http://www.wildfiregames.com/forum/index.php?showtopic=16734

in reply to:  25 comment:27 by historic_bruno, 11 years ago

Thanks for fixing the bug. There is no ticket for removing fixed function support yet, the person who was working on that (myconid) has disappeared. fabio's link explains the reasoning though: the renderer was going to be completely overhauled and fixed function support was an obstacle. We probably shouldn't remove it without reason, but only if there is an advantage for everyone else, as myconid claimed there would be.

Replying to kuranes:

  • 8114 is arbitrary. Can goes up to 65535 (max indice if we stick to 16bits buffer,best openglES wise), didn't find a max entity number. A simple merge by distance algo is indeed possible, but should only kick if above the size. (a fixed size is mandatory perf wise, avoiding buffer allocation/deallocation is important perfwise)

There can be many more entities than that in theory (see Entity.h), but 65535 is more than enough for visible entities on the minimap, with our current map sizes.

  • "Reveal Map order" change is just showing that mixing deprecated opengl immediate calls with modern (async) opengl calls leads to driver random drawing order. Can change that in making all other deprecated opengl calls (glvertexf() and all) going modern opengl (async vbo).

I don't think it's worth doing that now, but it's something to keep in mind for other reasons. The difficulty (I think) would be implementing the fixed function version and maintaining them both as we improve the minimap. It needs a lot of work, like using pixel art for resources, instead of drawing every entity as a point.

comment:28 by tuan kuranes, 11 years ago

  • You want me to do the 65535 change or the merge by distance thing ?
  • Note that removing glvertexf calls in favor of VBO call doesn't break ff. (it's removing "glmatrix" and adding shader calls that does)
  • About ff, I replied in the forum thread. In my "steps" in the forum, we can even keep ff as a possible "renderer". and make only the VBO changes (shader changes would be done in "renderers") (once you have VBO, you can do async drawing and draw them in any renderer you like).

in reply to:  28 comment:29 by historic_bruno, 11 years ago

Replying to kuranes:

  • You want me to do the 65535 change or the merge by distance thing ?

A large limit is fine for now, after it's committed, we can make a ticket that it might crash with a large number of entities. Until we decide how to handle this. I think far far less than 65535 should ever be rendered in practice. We need some minimap mockups to see how it should look, maybe Pureon will be up for that.

  • Note that removing glvertexf calls in favor of VBO call doesn't break ff. (it's removing "glmatrix" and adding shader calls that does)

We already have VBO support, so that doesn't sound too difficult. If you want to attach a new patch with it, we'll test :)

comment:30 by historic_bruno, 11 years ago

kuranes, did you want to continue working on the minimap rendering? If not, we can move forward on committing the latest patch and leave the remaining work as a TODO.

in reply to:  30 comment:31 by tuan kuranes, 11 years ago

Replying to historic_bruno:

kuranes, did you want to continue working on the minimap rendering? If not, we can move forward on committing the latest patch and leave the remaining work as a TODO.

Sry, I tought you were hinting at letting it run like that until next gen minimap show up. I can make a patch either with a 65k limit or just limit drawing a 8144 ? Which one it is ?

(got a bit sidetracked with making pathfinding a separate static lib with its own gui tools for test/code in wxwidget)

Last edited 11 years ago by tuan kuranes (previous) (diff)

in reply to:  25 comment:32 by historic_bruno, 11 years ago

Sorry, I was only referring to this suggestion:

Replying to kuranes:

  • "Reveal Map order" change is just showing that mixing deprecated opengl immediate calls with modern (async) opengl calls leads to driver random drawing order. Can change that in making all other deprecated opengl calls (glvertexf() and all) going modern opengl (async vbo). (is there a ticket for getting rid of ff path ? of gl deprecated calls (could be mixed with openglES ticket)? )

I thought you might be working on that, so I wasn't going to commit it. Sounds like you've been working on more important things though, don't worry about it :)

comment:33 by ben, 11 years ago

In 13454:

Adds VBO support to minimap entity rendering to improve performance, patch by kuranes. Refs #1923

comment:34 by tuan kuranes, 11 years ago

@ben: thanks

@historic_bruno: ok, misunderstood, sorry. On that ff removal, I've made some proposition on that FF removal topic: http://www.wildfiregames.com/forum/index.php?showtopic=16734&st=20#entry269321 but didn't get much discussion/approval. I'll add that specific step on my 0ad todo if it's ok.

comment:35 by Jorma Rebane, 11 years ago

Hey kuranes. Could you perhaps remove any old irrelevant patch/diff files? You can do that by clicking on the attachment, scrolling to the end of diff comparisons and clicking "Delete Attachment".

As for these patches in general, I think it's very hard to review it if you have all these patch files. It would make more sense to separate your minimap improvements and const patch into separate tickets - then we could give a more objective review on your code.

I know that there is a lot to improve in 0 A.D. source and it's hard not to fix everything :) But try to focus on a single task per ticket.

In general about the patches though:

  • I think "const size_t value;" and "int getStuff(const MyStuff& arg) const" is all right, but "const size_t* const pvalue;" is a bit pushing the limits of readability :). Furthermore it should give no performance boost, since it's just a compile time check to make the variable read-only. The assembly generated has no difference. So having "int getStuff(const size_t arg)" is totally unnecessary, however "void doStuff(const MyStuff& arg)" still makes sense - since you are guaranteeing the value passed by reference will not change.
  • Storing calculations in temporary variables is common sense and good practice in general, so that's the correct way to go.
  • Passing larger than 4-byte values by reference is also a good thing to do. In general 0 A.D. has too many copy-on-move and temporary objects, so it's important to reduce that.

If you have created new tickets to categorize the patches and have removed the old patches, I can give you a better review than this. :)

Last edited 11 years ago by Jorma Rebane (previous) (diff)

comment:36 by historic_bruno, 11 years ago

Owner: set to tuan kuranes

comment:37 by tuan kuranes, 11 years ago

Thanks RedFox for following up.

Here's list of created ticket for easier review: #2022 #2023 #2024 #2025 #2026

Sadly, No "delete attachment" for me, lacking trac permission I guess. All files can be deleted but archive.zip. I'll create fast ticket just now.

Non "optimizing const" are code documenation oriented, they can be removed if they don't suit team style.

  • const size_t* const is but a way to point for aliasing and therefore "restrict" optimisations and use. (themself leading to potential openmp "optimisations spot")
  • "int getStuff(const size_t arg)" is more developper safeguard and code documentation than real perf. (preventing people using args as local vars and easing code reading, by pointing possible optimisation when code evolves)

Btw, If you want just perf "const &" spots, and avoid my patches, get free version of PVS-studio, and run an analysis, it should list theme perfectly.

IRL hard times here due to heavy rainfall flooding in my location, impossible to allocate times on 0ad just now so pathfinding test env and the skinning patch followup is on hold.

in reply to:  37 ; comment:38 by historic_bruno, 11 years ago

Replying to kuranes:

IRL hard times here due to heavy rainfall flooding in my location, impossible to allocate times on 0ad just now so pathfinding test env and the skinning patch followup is on hold.

Sorry to heat that, thanks for your work so far, and thanks for creating separate tickets :) Hope you stay safe and dry!

in reply to:  38 comment:39 by tuan kuranes, 11 years ago

Thanks. Now Just a matter of shovelling lots of mud. llots.

comment:40 by Kieran P, 11 years ago

Milestone: Alpha 14Alpha 15

comment:41 by wraitii, 10 years ago

Milestone: Alpha 15Alpha 16
Priority: Must HaveShould Have

We should review if we want any of this at all. Setting as "Should have".

comment:42 by Niek, 10 years ago

When is this being commited/reviewed? It seems that we would get an enormous performance boost: just what 0 A.D. needs!

Edit: I'm referencing to the whole Kuranes patch.

Last edited 10 years ago by Niek (previous) (diff)

in reply to:  42 comment:43 by Josh, 10 years ago

Replying to niektb:

When is this being commited/reviewed? It seems that we would get an enormous performance boost: just what 0 A.D. needs!

Edit: I'm referencing to the whole Kuranes patch.

As the patches are 10+ months old they probably won't apply anymore without a lot of work.

comment:44 by wraitii, 10 years ago

Judging from my work on other Kuranes patches it's unlikely to be a huge difference.

comment:45 by sanderd17, 10 years ago

Resolution: wontfix
Status: newclosed

Too much work to fix + not enough improvement.

comment:46 by sanderd17, 10 years ago

Milestone: Alpha 16

comment:47 by sanderd17, 8 years ago

Keywords: review removed
Note: See TracTickets for help on using tickets.