#3054 closed defect (needsinfo)
Rendering slowdown on Deep Forest with AMD cards
Reported by: | Yves | Owned by: | |
---|---|---|---|
Priority: | Should Have | Milestone: | |
Component: | Core engine | Keywords: | |
Cc: | Patch: |
Description (last modified by )
When starting a game with these parameters, then zooming out and moving the camera to the center of the map, the FPS drops from >60 to <20 on my Radeon R9 270(x).
./pyrogenesis -autostart="random/deep_forest" -autostart-players=4 -autostart-size=256 -autostart-ai=1:petra -autostart-ai=2:petra -autostart-ai=3:petra -autostart-ai=4:petra
The same problem was confirmed on IRC with an R9 280X. It occurs both on Linux with the open source drivers and on Windows with the proprietary drivers.
Philip reported 50-60 FPS in this worst case, and even around 175 FPS with the default camera perspective, on a GF 560Ti. The two AMD/ATI cards should actually perform better than that.
Attachments (5)
Change History (24)
by , 9 years ago
Attachment: | deep_forest_render_slowdown.sleepy added |
---|
by , 9 years ago
Attachment: | rendering_profiledeep_forest_render_slow_2015-02-15_00-24-22.png added |
---|
Screenshot of the profile taken with Very Sleepy
comment:1 by , 9 years ago
Description: | modified (diff) |
---|
comment:2 by , 9 years ago
I can confirm it on an HD8750M (max 30 FPS, min 5 FPS): the game is really laggy and movement over the map is really slow. On the Intel HD4000 chipset, movement stays fluid even though the game runs at around 20 FPS.
by , 9 years ago
Attachment: | pyrogenesis-gDEBuggerProfilingData.csv added |
---|
The data I collected with gDEBugger.
comment:3 by , 9 years ago
Milestone: | Backlog → Alpha 18 |
---|
comment:4 by , 9 years ago
I tried some profiling with PerfStudio:
http://developer.amd.com/tools-and-sdks/graphics-development/gpu-perfstudio/
I got these screenshots; maybe they will be useful for someone:
http://i.imgur.com/IlTeB7O.png http://i.imgur.com/FSuhDkW.png http://i.imgur.com/m4o6xxO.png
comment:5 by , 9 years ago
I'm pretty sure performance has always been like this on deep forest just due to sheer poly count. I have an HD 5450 on open source drivers that starts around 30fps and goes to 14fps in the middle of the map at min zoom.
comment:6 by , 9 years ago
Note: on Linux you can use this tool to monitor AMD GPU usage: https://github.com/clbr/radeontop http://www.phoronix.com/forums/showthread.php?72130 No idea if it can be useful here.
by , 9 years ago
Attachment: | time_radeonsi_dri_glDrawElements.so.png added |
---|
Profile showing the much slower glDrawElements
comment:7 by , 9 years ago
Milestone: | Alpha 18 → Alpha 19 |
---|
comment:8 by , 9 years ago
I've figured out why glDrawElements is much slower than glDrawRangeElements.
It's because vbo_get_minmax_indices needs to be called per object, which is quite expensive. The relevant code in Mesa is here.
if (!index_bounds_valid)
   if (!all_varyings_in_vbos(arrays))
      vbo_get_minmax_indices(ctx, prims, ib, &min_index, &max_index, nr_prims);
glDrawRangeElements passes true for index_bounds_valid, so the scan is skipped: the caller already supplies the bounds (the min/max indices) as arguments. The trickier question was why the driver performs the scan at all when the whole vertex buffer is already in graphics memory, so the driver should not need the bounds for uploading. To answer this, I had to look at the all_varyings_in_vbos function here.
static GLboolean
all_varyings_in_vbos(const struct gl_client_array *arrays[])
{
   GLuint i;

   for (i = 0; i < VERT_ATTRIB_MAX; i++)
      if (arrays[i]->StrideB &&
          !arrays[i]->InstanceDivisor &&
          !_mesa_is_bufferobj(arrays[i]->BufferObj))
         return GL_FALSE;

   return GL_TRUE;
}
Basically, if the data for any enabled attribute is not in a VBO in graphics memory, this check returns false. Debugging with GDB showed that it returned false for i==24. So attribute number 24 causes the check to fail and is the reason why glDrawElements is so much slower.
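To make the cost concrete, here is a minimal sketch of why the bounds scan hurts. It is not Mesa's implementation: `ScanIndexBounds` and `NeedsIndexScan` are hypothetical names that only model the behaviour described above, namely a linear pass over every index that runs once per draw call unless the caller supplied the bounds or all attribute data lives in VBOs.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the driver's bounds scan: it touches every
// index once, so the cost grows with the size of each object's index buffer.
std::pair<std::uint32_t, std::uint32_t>
ScanIndexBounds(const std::vector<std::uint16_t>& indices)
{
    assert(!indices.empty());
    std::uint32_t minIndex = indices[0];
    std::uint32_t maxIndex = indices[0];
    for (std::uint16_t idx : indices)
    {
        if (idx < minIndex) minIndex = idx;
        if (idx > maxIndex) maxIndex = idx;
    }
    return {minIndex, maxIndex};
}

// Simplified model of the quoted condition: the scan only runs when the
// caller gave no bounds AND at least one enabled attribute is not in a VBO.
bool NeedsIndexScan(bool indexBoundsValid, bool allVaryingsInVbos)
{
    return !indexBoundsValid && !allVaryingsInVbos;
}
```

With glDrawRangeElements the first flag is true, so the scan never runs; with glDrawElements it depends entirely on the second flag, which is where attribute 24 comes in.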
It was quite difficult to figure out where attribute 24 came from. This data gets passed along between different parts of the driver and in the OpenGL compatibility profile, it converts the attribute IDs because the first ones are reserved for the fixed function pipeline (or similar...). So instead of looking for ID 24, I had to look for ID 7 because the generic attributes start with ID 17 (17+7=24).
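The index conversion can be written down as a tiny sketch. The base value 17 is taken from this ticket and describes the Mesa build that was debugged; other Mesa versions may use a different base, and the names below are hypothetical, not Mesa identifiers.

```cpp
#include <cassert>

// Assumption from the ticket: generic vertex attributes start at internal
// slot 17 in the compatibility profile (the lower slots are reserved).
const int kGenericAttribBase = 17;

// Internal driver slot for a generic attribute index used by the application.
int InternalSlotFromGeneric(int genericIndex)
{
    assert(genericIndex >= 0);
    return kGenericAttribBase + genericIndex;
}

// Reverse mapping: which generic attribute a failing internal slot refers to.
int GenericFromInternalSlot(int slot)
{
    assert(slot >= kGenericAttribBase);
    return slot - kGenericAttribBase;
}
```

So a failure reported for internal slot 24 points at generic attribute 7 on the application side.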
Finally, I figured out that removing the following line from model_common.xml made the glDrawElements call in InstancingModelRenderer.cpp just as fast as glDrawRangeElements:
<attrib name="a_tangent" semantics="CustomAttribute2" if="USE_INSTANCING || USE_GPU_SKINNING"/>
Of course you have to disable gentangents in your config (and probably something else), or this change breaks things. Just disabling it doesn't help if you leave that line in the XML.
Unfortunately, I don't think this helps with the general performance problem of AMD cards compared to Nvidia cards. It's nice to know why glDrawElements was so much slower, though.
comment:9 by , 9 years ago
It seems likely that AMD OpenGL drivers are simply slower with these relatively old extensions and approaches we use for rendering. Instead of trying to figure out why the same code is much slower than on comparable Nvidia hardware, I think it's better to implement some newer rendering techniques.
Mainly the idea is to reduce CPU overhead in the driver which is caused by state changes and validation. I'm testing uniform buffer objects (UBOs) together with instanced rendering at the moment. UBOs per object are quite slow, but with instancing it should be possible to pack uniform data for multiple draw calls into one buffer and then draw multiple objects at once.
This and additional ideas are described here: https://www.youtube.com/watch?v=-bCeNzgiJ8I
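A rough sketch of the packing idea, under stated assumptions: `PerObjectData`, `Batch` and `BuildBatches` are hypothetical names, the struct layout is invented for illustration, and the 16384-byte limit is only the minimum GL_MAX_UNIFORM_BLOCK_SIZE that OpenGL guarantees (real code would query the actual value). It shows how many objects fit into one uniform buffer and how a list of objects splits into instanced batches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-object uniform data; the real layout may differ.
struct PerObjectData
{
    float transform[16]; // 4x4 model matrix
    float color[4];      // e.g. a per-object tint
};

// Assumed block size: the guaranteed minimum, not a queried value.
const std::size_t kAssumedMaxBlockSize = 16384;

// One batch corresponds to one buffer fill plus one instanced draw call.
struct Batch
{
    std::size_t firstObject;
    std::size_t instanceCount;
};

// Split objectCount objects into batches that each fit into one uniform block.
std::vector<Batch> BuildBatches(std::size_t objectCount,
                                std::size_t maxBlockSize = kAssumedMaxBlockSize)
{
    const std::size_t perBatch = maxBlockSize / sizeof(PerObjectData);
    assert(perBatch > 0);

    std::vector<Batch> batches;
    for (std::size_t first = 0; first < objectCount; first += perBatch)
    {
        const std::size_t remaining = objectCount - first;
        batches.push_back({first, remaining < perBatch ? remaining : perBatch});
    }
    return batches;
}
```

Each batch would then be drawn with a single instanced call, with the shader picking its entry from the uniform array via gl_InstanceID, so the driver validates state once per batch instead of once per object.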
comment:10 by , 9 years ago
I've created a branch on my GitHub repository for my work on the OpenGL 4 renderer: https://github.com/Yves-G/0ad/tree/OGL4
On that page, there's also a description of the current state, a guide on how to test it, and some information about the improvements it brings so far.
comment:12 by , 9 years ago
Milestone: | Alpha 19 → Alpha 20 |
---|
Hi Yves, if you don't mind I'll push that to A20. :)
comment:13 by , 8 years ago
Keywords: | patch added |
---|
comment:14 by , 8 years ago
Milestone: | Alpha 20 → Alpha 21 |
---|
comment:17 by , 2 years ago
Description: | modified (diff) |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
The related problem with vertex attributes was fixed in r26781. The other part of the ticket is more about general optimizations.
comment:18 by , 2 years ago
Keywords: | patch removed |
---|
comment:19 by , 2 years ago
Milestone: | Backlog |
---|
profiling with very sleepy