Opened 13 years ago

Closed 13 years ago

#722 closed task (fixed)

[PATCH] Optimise terrain renderer

Reported by: Philip Taylor Owned by:
Priority: Should Have Milestone: Alpha 5
Component: Core engine Keywords: simple
Cc: Patch:

Description

Some quick experimentation suggests the terrain rendering code can be made significantly faster by batching more.

Currently, TerrainRenderer::RenderTerrain calls CPatchRData::RenderBase etc for every patch independently, and CPatchRData renders each of its own splats. There are often lots of patches on screen, and they often share a lot of textures. It's better if TerrainRenderer collects all the splats, sorts by texture and by vertex buffer, and then renders them all at once, to reduce the number of state changes.

This is fairly trivial for base textures. For blend textures we need to preserve the ordering within a patch; the same grouping algorithm as in CPatchRData::BuildBlends should probably work.

Attachments (3)

proposal_ro_722_r2.patch (10.7 KB ) - added by Rodolphe Ortalo 13 years ago.
Experimental patch with stats, from source/ directory.
proposal_ro_722_r3.patch (10.9 KB ) - added by Rodolphe Ortalo 13 years ago.
3rd revision, with respect to source/
722_r4.patch (11.8 KB ) - added by Philip Taylor 13 years ago.
a further revision

Download all attachments as: .zip

Change History (16)

comment:1 by Rodolphe Ortalo, 13 years ago

The following patch implements such a different approach for batching CPatchRData::RenderBase and CPatchRData::RenderBlends via static methods in the class. (I selected that way to avoid breaking the separation between TerrainRenderer and CPatchRData, but without any specific reason.) The actual rendering phase (in "flush") sorts the target splats according to the texture used and saves texture setup when possible.

The code was also initially instrumented (via a preprocessor switch) to count such savings and occasionally display the corresponding statistics. Such statistics seem to show that potential savings are interesting (several hundred texture switches should be saved), e.g.:

Total number of patch objects submitted for render base: 34
Total number of splats submitted for render base: 156
Texture reuse stats: 5, 2, 6, 9, 12, 10, 1, 7, 9, 4, 12, 1, 8, 6, 16, 9, 1, 2, 3, 13,  TOTAL SAVES = 136
Total number of BLEND splats submitted for render base: 148
Blends texture reuse stats: 6, 2, 6, 2, 17, 6, 0, 6, 7, 6, 5, 0, 3, 12, 11, 0, 0, 4, 2,  BLENDS TOTAL SAVES = 95

Total number of patch objects submitted for render base: 41
Total number of splats submitted for render base: 197
Texture reuse stats: 5, 2, 7, 10, 15, 0, 11, 2, 9, 11, 6, 14, 4, 11, 9, 20, 12, 4, 3, 4, 17,  TOTAL SAVES = 176
Total number of BLEND splats submitted for render base: 188
Blends texture reuse stats: 7, 5, 10, 2, 21, 8, 1, 9, 8, 7, 5, 6, 9, 21, 20, 7, 3, 6, 4,  BLENDS TOTAL SAVES = 159

Total number of patch objects submitted for render base: 47
Total number of splats submitted for render base: 317
Texture reuse stats: 8, 8, 13, 17, 24, 6, 9, 4, 1, 5, 16, 2, 0, 14, 19, 28, 40, 21, 2, 16, 7, 10, 8, 15,  TOTAL SAVES = 293
Total number of BLEND splats submitted for render base: 360
Blends texture reuse stats: 10, 16, 16, 8, 33, 4, 5, 2, 0, 3, 23, 2, 1, 17, 19, 42, 33, 2, 23, 8, 9, 6, 5,  BLENDS TOTAL SAVES = 287

However, contrary to what these simple stats suggest, the actual performance observed by applying the patch in its current form is not especially improved. On a low-end computer, it was even a 10% degradation that was observed.

This is counter-intuitive: saving 400 texture setups per frame sounds attractive in the first place. But actual improvements are yet to be confirmed. In its current state, I would therefore recommend not applying the patch yet, though I welcome other experimentation of course.

Overall, maybe better profiling and timing measurements of this part of the engine are probably needed (and possibly higher level optimization like static terrain model simplifications).

by Rodolphe Ortalo, 13 years ago

Attachment: proposal_ro_722_r2.patch added

Experimental patch with stats, from source/ directory.

comment:2 by Philip Taylor, 13 years ago

BatchSplatPatch::operator< looks wrong: if A = {layer:0, texture=100} and B = {layer:1, texture=50} then it'll say A < B and B < A. I get crashes with corrupted data, probably because this confuses std::sort. Seems to work better like:

    return (layer < b.layer) || (layer == b.layer && splat.m_Texture < b.splat.m_Texture);

I think doing a global comparison of layer values between different patches will unnecessarily restrict the batching. If one patch has a list of layers with textures [R, G, B] and another has [B] then we should render in three batches, instead of doing all the layer 0s before all the layer 1s. This seems basically the same as what CPatchRData::BuildBlends does, when it's taking the 16x16 single-tile stacks of textures and merging them into layers (preserving just the order within individual tiles) - we need to merge all those layers into batches (preserving just the order within individual patches).

The performance difference from this patch still seems noticeable to me (on Intel GMA 4500MHD on Linux, 2.16GHz Core 2 Duo), if I change local.cfg to say

view.zoom.max = 2560
view.zoom.default = 1920

so it renders more terrain and is easier to measure: the "render patches" part of the profiler varies around 23-26ms/frame without the patch, and 15-17ms/frame with it, and the overall performance has a similar improvement. (Players won't normally zoom out that much so the improvement will be smaller, but still non-zero, and we might zoom out or aim towards the horizon when playing cinematics if not during normal gameplay.)

Better profiling tools would definitely be good - I guess we should at least have a mode that disables simulation updates and animation to provide more stable measurements.

comment:3 by Rodolphe Ortalo, 13 years ago

Thanks for spotting the bug with operator<! I'll try to improve on the ordering. Crashes are problematic on my machine too. I've narrowed them via gdb to "p" in FlushRenderBlends which was sometimes undefined (delete-d PatchRData?). Better testing in needed anyway on this issue.

comment:4 by Kieran P, 13 years ago

Milestone: BacklogAlpha 4
Summary: Optimise terrain renderer[PATCH] Optimise terrain renderer

comment:5 by Rodolphe Ortalo, 13 years ago

New version of patch: different ordering function as suggested above. Seems stable on my computers. Saves may be suboptimal for blends but sound interesting anyway.

by Rodolphe Ortalo, 13 years ago

Attachment: proposal_ro_722_r3.patch added

3rd revision, with respect to source/

comment:6 by Philip Taylor, 13 years ago

I see some rendering glitches with this patch - e.g. in Oasis at the starting camera position, in the grassy area to the right of the civ center, some of the blends stop sharply at a tile boundary, and it flips between incorrect and correct rendering when I rotate the camera.

I think this is because blend_batchsplat_cmp still isn't valid for std::sort - it's possible to have values a, b, c such that cmp(a, b) && cmp(b, c) && !cmp(a, c), which violates the C++ standard's requirements for std::sort (particularly that the comparison must be transitive) so there's no guarantee the sort will give correctly-ordered output.

I don't think it's possible to do this with std::sort: mathematically the relationship between splats is a partial order, not a total preorder, and putting them into a total order needs a topological sort algorithm instead.

by Philip Taylor, 13 years ago

Attachment: 722_r4.patch added

a further revision

comment:7 by Philip Taylor, 13 years ago

I tried modifying the patch to do correct blend sorting (with the same algorithm as in BuildBlends), and this seems to fix the rendering errors I noticed. Also I made it use glMultiDrawArrays/glMultiDrawElements to draw each batch, which saves a little bit of CPU time compared to explicitly looping (maybe like 0.5msec per frame though I don't really trust the accuracy of my measurement). The overall performance improvement seems to be around the same as before (i.e. small but noticeable and worthwhile).

(I just hacked the changes into this patch to experiment with it. It'll need to be cleaned up a lot before actually committing.)

comment:8 by Rodolphe Ortalo, 13 years ago

I had not noticed the rendering issue. Thanks for spotting.

I saw your patch and am currently checking out a fresh copy to try it. Sounds much more sophisticated than mine with respect to reordering splats.

However, don't you think it would be easier to stay with simpler code and suboptimal rendering rather than trying to achieve the ideal order? E.g. get rid of blend_batchsplat_cmp and use BatchSplatPatch::operator< directly. (A simple ordering on texture pointer seems already to offer more than 80% of all possible texture switches saves from what I had counted in some unpublished code spitting to std::cout.)

comment:9 by Philip Taylor, 13 years ago

The current BatchSplatPatch::operator< ignores the patchdata and layer values, so I think it won't even try to preserve the blend ordering. The splats within a patch are already grouped by texture, so sorting just within a patch won't improve anything. Do you mean one of those ways, or some different type of ordering?

It would be easiest to stay with the current renderer implementation, but the goal is to make it faster rather than to do what's easiest :-) . I don't think the algorithm in my patch is excessively complex for this (the implementation would probably be easier to follow if the sorting algorithm was split into a separate function and shared with BuildBlends), and it seems like the right way to do near-optimal batching (as far as I can tell), so I'm not convinced of the benefit of intentionally doing a less-efficient approach.

comment:10 by Rodolphe Ortalo, 13 years ago

I meant your initially ordering suggestion (with layer and texture).

In the static class methods, sorting can be done over all the patches at once (all the submitted splats more precisely), so some reordering opportunities (inter-patches) existed to save a significant number of textures setup. (Even, astonishingly, for blend splats.) I checked that empirically before starting to work on the ticket.

Anyway, your patch is an improvement over mine in terms of performance that's certain. I was more caring about the maintenability/performance balance. (And I also started to do some work on the water renderer which seems to offer opportunity for much more improvements.)

Do you want me to proceed based on your work, or wait on that issue? (Just tell! ;-)

comment:11 by Philip Taylor, 13 years ago

I'm happy to clean it up myself and commit it, so probably it'll be quickest to do that.

comment:12 by Kieran P, 13 years ago

Milestone: Alpha 4Alpha 5

comment:13 by philip, 13 years ago

Resolution: fixed
Status: newclosed

(In [9053]) # Optimise terrain renderer. Batch patch splats by texture match. Use VBOs for patch indices. Fixes #722.

Note: See TracTickets for help on using tickets.