Opened 13 years ago
Closed 13 years ago
#722 closed task (fixed)
[PATCH] Optimise terrain renderer
Reported by: | Philip Taylor | Owned by: | |
---|---|---|---|
Priority: | Should Have | Milestone: | Alpha 5 |
Component: | Core engine | Keywords: | simple |
Cc: | Patch: |
Description
Some quick experimentation suggests the terrain rendering code can be made significantly faster by batching more.
Currently, TerrainRenderer::RenderTerrain
calls CPatchRData::RenderBase
etc for every patch independently, and CPatchRData
renders each of its own splats. There are often lots of patches on screen, and they often share a lot of textures. It's better if TerrainRenderer
collects all the splats, sorts by texture and by vertex buffer, and then renders them all at once, to reduce the number of state changes.
This is fairly trivial for base textures. For blend textures we need to preserve the ordering within a patch; the same grouping algorithm as in CPatchRData::BuildBlends
should probably work.
Attachments (3)
Change History (16)
comment:1 by , 13 years ago
by , 13 years ago
Attachment: | proposal_ro_722_r2.patch added |
---|
Experimental patch with stats, from source/ directory.
comment:2 by , 13 years ago
BatchSplatPatch::operator<
looks wrong: if A = {layer:0, texture=100}
and B = {layer:1, texture=50}
then it'll say A < B
and B < A
. I get crashes with corrupted data, probably because this confuses std::sort
. Seems to work better like:
return (layer < b.layer) || (layer == b.layer && splat.m_Texture < b.splat.m_Texture);
I think doing a global comparison of layer
values between different patches will unnecessarily restrict the batching. If one patch has a list of layers with textures [R, G, B]
and another has [B]
then we should render in three batches, instead of doing all the layer 0s before all the layer 1s. This seems basically the same as what CPatchRData::BuildBlends
does, when it's taking the 16x16 single-tile stacks of textures and merging them into layers (preserving just the order within individual tiles) - we need to merge all those layers into batches (preserving just the order within individual patches).
The performance difference from this patch still seems noticeable to me (on Intel GMA 4500MHD on Linux, 2.16GHz Core 2 Duo), if I change local.cfg to say
view.zoom.max = 2560 view.zoom.default = 1920
so it renders more terrain and is easier to measure: the "render patches" part of the profiler varies around 23-26ms/frame without the patch, and 15-17ms/frame with it, and the overall performance has a similar improvement. (Players won't normally zoom out that much so the improvement will be smaller, but still non-zero, and we might zoom out or aim towards the horizon when playing cinematics if not during normal gameplay.)
Better profiling tools would definitely be good - I guess we should at least have a mode that disables simulation updates and animation to provide more stable measurements.
comment:3 by , 13 years ago
Thanks for spotting the bug with operator<! I'll try to improve on the ordering. Crashes are problematic on my machine too. I've narrowed them via gdb to "p
" in FlushRenderBlends
which was sometimes undefined (delete-d PatchRData
?). Better testing in needed anyway on this issue.
comment:4 by , 13 years ago
Milestone: | Backlog → Alpha 4 |
---|---|
Summary: | Optimise terrain renderer → [PATCH] Optimise terrain renderer |
comment:5 by , 13 years ago
New version of patch: different ordering function as suggested above. Seems stable on my computers. Saves may be suboptimal for blends but sound interesting anyway.
comment:6 by , 13 years ago
I see some rendering glitches with this patch - e.g. in Oasis at the starting camera position, in the grassy area to the right of the civ center, some of the blends stop sharply at a tile boundary, and it flips between incorrect and correct rendering when I rotate the camera.
I think this is because blend_batchsplat_cmp
still isn't valid for std::sort
- it's possible to have values a
, b
, c
such that cmp(a, b) && cmp(b, c) && !cmp(a, c)
, which violates the C++ standard's requirements for std::sort
(particularly that the comparison must be transitive) so there's no guarantee the sort will give correctly-ordered output.
I don't think it's possible to do this with std::sort
: mathematically the relationship between splats is a partial order, not a total preorder, and putting them into a total order needs a topological sort algorithm instead.
comment:7 by , 13 years ago
I tried modifying the patch to do correct blend sorting (with the same algorithm as in BuildBlends
), and this seems to fix the rendering errors I noticed. Also I made it use glMultiDrawArrays
/glMultiDrawElements
to draw each batch, which saves a little bit of CPU time compared to explicitly looping (maybe like 0.5msec per frame though I don't really trust the accuracy of my measurement). The overall performance improvement seems to be around the same as before (i.e. small but noticeable and worthwhile).
(I just hacked the changes into this patch to experiment with it. It'll need to be cleaned up a lot before actually committing.)
comment:8 by , 13 years ago
I had not noticed the rendering issue. Thanks for spotting.
I saw your patch and am currently checking out a fresh copy to try it. Sounds much more sophisticated than mine with respect to reordering splats.
However, don't you think it would be easier to stay with simpler code and suboptimal rendering rather than trying to achieve the ideal order? E.g. get rid of blend_batchsplat_cmp
and use BatchSplatPatch::operator<
directly. (A simple ordering on texture pointer seems already to offer more than 80% of all possible texture switches saves from what I had counted in some unpublished code spitting to std::cout.)
comment:9 by , 13 years ago
The current BatchSplatPatch::operator<
ignores the patchdata
and layer
values, so I think it won't even try to preserve the blend ordering. The splats within a patch are already grouped by texture, so sorting just within a patch won't improve anything. Do you mean one of those ways, or some different type of ordering?
It would be easiest to stay with the current renderer implementation, but the goal is to make it faster rather than to do what's easiest :-) . I don't think the algorithm in my patch is excessively complex for this (the implementation would probably be easier to follow if the sorting algorithm was split into a separate function and shared with BuildBlends
), and it seems like the right way to do near-optimal batching (as far as I can tell), so I'm not convinced of the benefit of intentionally doing a less-efficient approach.
comment:10 by , 13 years ago
I meant your initially ordering suggestion (with layer and texture).
In the static class methods, sorting can be done over all the patches at once (all the submitted splats more precisely), so some reordering opportunities (inter-patches) existed to save a significant number of textures setup. (Even, astonishingly, for blend splats.) I checked that empirically before starting to work on the ticket.
Anyway, your patch is an improvement over mine in terms of performance that's certain. I was more caring about the maintenability/performance balance. (And I also started to do some work on the water renderer which seems to offer opportunity for much more improvements.)
Do you want me to proceed based on your work, or wait on that issue? (Just tell! ;-)
comment:11 by , 13 years ago
I'm happy to clean it up myself and commit it, so probably it'll be quickest to do that.
comment:12 by , 13 years ago
Milestone: | Alpha 4 → Alpha 5 |
---|
comment:13 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
The following patch implements such a different approach for batching
CPatchRData::RenderBase
andCPatchRData::RenderBlends
via static methods in the class. (I selected that way to avoid breaking the separation betweenTerrainRenderer
andCPatchRData
, but without any specific reason.) The actual rendering phase (in "flush") sorts the target splats according to the texture used and saves texture setup when possible.The code was also initially instrumented (via a preprocessor switch) to count such savings and occasionally display the corresponding statistics. Such statistics seem to show that potential savings are interesting (several hundred texture switches should be saved), e.g.:
However, contrary to what these simple stats suggest, the actual performance observed by applying the patch in its current form is not especially improved. On a low-end computer, it was even a 10% degradation that was observed.
This is counter-intuitive: saving 400 texture setups per frame sounds attractive in the first place. But actual improvements are yet to be confirmed. In its current state, I would therefore recommend not applying the patch yet, though I welcome other experimentation of course.
Overall, maybe better profiling and timing measurements of this part of the engine are probably needed (and possibly higher level optimization like static terrain model simplifications).