Opened 9 years ago

Closed 2 years ago

Last modified 2 years ago

#3054 closed defect (needsinfo)

Rendering slowdown on Deep Forest with AMD cards

Reported by: Yves
Owned by:
Priority: Should Have
Milestone:
Component: Core engine
Keywords:
Cc:
Patch:

Description (last modified by Vladislav Belov)

After starting a game with the parameters below, zooming out, and moving the camera to the center of the map, the frame rate drops from >60 FPS to <20 FPS on my Radeon R9 270(x).

./pyrogenesis -autostart="random/deep_forest" -autostart-players=4 -autostart-size=256 -autostart-ai=1:petra -autostart-ai=2:petra -autostart-ai=3:petra -autostart-ai=4:petra

The same problem was confirmed on IRC with an R9 280X. It occurs both on Linux with the open source drivers and on Windows with the proprietary drivers.

With a GF 560Ti, Philip reported 50-60 FPS in this worst case and around 175 FPS with the default camera perspective. The two AMD/ATI cards mentioned above should actually perform better than that.

Attachments (5)

deep_forest_render_slowdown.sleepy (96.6 KB ) - added by Yves 9 years ago.
profiling with very sleepy
rendering_profiledeep_forest_render_slow_2015-02-15_00-24-22.png (49.4 KB ) - added by Yves 9 years ago.
screenshot from the profile with very sleepy
pyrogenesis-gDEBuggerProfilingData.csv (28.2 KB ) - added by Stan 9 years ago.
The data I collected with gdebug.
time_radeonsi_dri.so.png (74.6 KB ) - added by Yves 9 years ago.
profiling of radeonsi_dri.so
time_radeonsi_dri_glDrawElements.so.png (73.6 KB ) - added by Yves 9 years ago.
Profile with the much slower glDrawElements


Change History (24)

by Yves, 9 years ago

profiling with very sleepy

by Yves, 9 years ago

screenshot from the profile with very sleepy

comment:1 by Yves, 9 years ago

Description: modified (diff)

comment:2 by Stan, 9 years ago

I can confirm this on an HD8750M (max 30 FPS, min 5 FPS): the game is really laggy and movement over the map is really slow. On the Intel HD4000 chipset, the game also runs at only around 20 FPS, but movement stays fluid.

by Stan, 9 years ago

The data I collected with gdebug.

comment:3 by Yves, 9 years ago

Milestone: Backlog → Alpha 18

comment:5 by Josh, 9 years ago

I'm pretty sure performance has always been like this on deep forest just due to sheer poly count. I have an HD 5450 on open source drivers that starts around 30fps and goes to 14fps in the middle of the map at min zoom.

comment:6 by fabio, 9 years ago

Note: on Linux you can use this tool to monitor AMD GPU usage: https://github.com/clbr/radeontop (forum thread: http://www.phoronix.com/forums/showthread.php?72130). No idea if it can be useful here.

by Yves, 9 years ago

Attachment: time_radeonsi_dri.so.png added

profiling of radeonsi_dri.so

by Yves, 9 years ago

Profile with the much slower glDrawElements

comment:7 by Yves, 9 years ago

Milestone: Alpha 18 → Alpha 19

comment:8 by Yves, 9 years ago

I've figured out why glDrawElements is much slower than glDrawRangeElements.

It's because vbo_get_minmax_indices needs to be called per object, which is quite expensive. The relevant code in Mesa is here.

      if (!index_bounds_valid)
         if (!all_varyings_in_vbos(arrays))
            vbo_get_minmax_indices(ctx, prims, ib, &min_index, &max_index,
                                   nr_prims);

glDrawRangeElements passes true for index_bounds_valid, so the vbo_get_minmax_indices call is skipped: the caller already supplies the bounds (the min/max indices) as arguments. The trickier question was why the driver needs the bounds at all when the whole vertex buffer is already in graphics memory and nothing has to be uploaded. To answer this, I had to look at the all_varyings_in_vbos function here.

static GLboolean
all_varyings_in_vbos(const struct gl_client_array *arrays[])
{
   GLuint i;

   for (i = 0; i < VERT_ATTRIB_MAX; i++)
      if (arrays[i]->StrideB &&
          !arrays[i]->InstanceDivisor &&
          !_mesa_is_bufferobj(arrays[i]->BufferObj))
         return GL_FALSE;

   return GL_TRUE;
}

Basically, if the data for any of the enabled attributes is not in a VBO in graphics memory, this check returns false. Some debugging with GDB has shown that this returned false for i==24. So the attribute number 24 causes the check to fail and is the reason why glDrawElements is so much slower.
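The decision described above can be modelled outside the driver. The following is a minimal sketch, not Mesa code: AttribArray, allVaryingsInVbos, needsMinMaxScan and minMaxIndices are hypothetical names, and the struct only keeps the three fields that the quoted check reads. It shows that a single enabled attribute sourced from client memory is enough to force a linear scan of the index data on every glDrawElements call, while a caller that already supplies the bounds (as glDrawRangeElements does) skips it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for Mesa's gl_client_array.
struct AttribArray {
    int strideBytes = 0;          // non-zero: attribute is read per vertex from an array
    unsigned instanceDivisor = 0; // non-zero: attribute advances per instance
    bool inBufferObject = true;   // false: data comes from client memory, not a VBO
};

// Same logic as the quoted all_varyings_in_vbos check.
bool allVaryingsInVbos(const std::vector<AttribArray>& arrays) {
    for (const AttribArray& a : arrays)
        if (a.strideBytes != 0 && a.instanceDivisor == 0 && !a.inBufferObject)
            return false;
    return true;
}

// The scan runs only if the caller gave no bounds AND some array is not in a VBO.
bool needsMinMaxScan(bool indexBoundsValid, const std::vector<AttribArray>& arrays) {
    return !indexBoundsValid && !allVaryingsInVbos(arrays);
}

// What such a scan has to do: one linear pass over the indices of the draw call.
std::pair<std::uint16_t, std::uint16_t>
minMaxIndices(const std::vector<std::uint16_t>& indices) {
    assert(!indices.empty());
    auto mm = std::minmax_element(indices.begin(), indices.end());
    return {*mm.first, *mm.second};
}
```

An application that knows its index range in advance could compute it once at load time with something like minMaxIndices and pass the result to glDrawRangeElements, instead of leaving the driver to rescan the indices per draw.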

It was quite difficult to figure out where attribute 24 came from. The attribute data gets passed along between different parts of the driver, and in the OpenGL compatibility profile the driver renumbers the attribute IDs because the first ones are reserved for the fixed-function pipeline (or something similar). So instead of looking for ID 24, I had to look for ID 7, because the generic attributes start at ID 17 (17+7=24).
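The renumbering can be written down as a small helper. This is only a sketch of the arithmetic in the comment above: the offset of 17 is taken from that comment and may differ between Mesa versions, and kFirstGenericSlot, genericToInternal and internalToGeneric are hypothetical names, not driver symbols.

```cpp
#include <cassert>

// Assumption from the comment above: in the compatibility profile the driver's
// internal numbering puts generic attribute 0 at slot 17.
const int kFirstGenericSlot = 17;

// Generic attribute index (as seen by the application) -> internal slot.
int genericToInternal(int genericIndex) {
    assert(genericIndex >= 0);
    return kFirstGenericSlot + genericIndex;
}

// Internal slot (as seen in the debugger) -> generic attribute index.
int internalToGeneric(int internalIndex) {
    assert(internalIndex >= kFirstGenericSlot);
    return internalIndex - kFirstGenericSlot;
}
```

With these assumptions, the slot 24 seen in GDB corresponds to generic attribute 7, which is the ID to search for on the application side.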

Finally I figured out that removing the following line from model_common.xml made glDrawElements just as fast as glDrawRangeElements (in InstancingModelRenderer.cpp):

<attrib name="a_tangent" semantics="CustomAttribute2" if="USE_INSTANCING || USE_GPU_SKINNING"/>

Of course, you have to disable gentangents in your config (and probably something else as well), or this change breaks things. Disabling it alone doesn't help if you leave that line in the XML.

Unfortunately, I don't think this helps with the general performance problem of AMD cards compared to Nvidia cards. It's nice to know why glDrawElements was so much slower, though.

comment:9 by Yves, 9 years ago

It seems likely that AMD OpenGL drivers are simply slower with these relatively old extensions and approaches we use for rendering. Instead of trying to figure out why the same code is much slower than on comparable Nvidia hardware, I think it's better to implement some newer rendering techniques.

The main idea is to reduce the CPU overhead in the driver that is caused by state changes and validation. I'm testing uniform buffer objects (UBOs) together with instanced rendering at the moment. Using one UBO per object is quite slow, but with instancing it should be possible to pack the uniform data for multiple draw calls into one buffer and then draw multiple objects at once.
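The packing idea can be illustrated on the CPU side without any GL calls. The following is a minimal sketch under stated assumptions, not the code from the OGL4 work: InstanceData, Batch and buildBatches are hypothetical names, and each object is assumed to need only one 4x4 float matrix (64 bytes, which matches the std140 size of a mat4). It splits a list of objects into batches that each fit into one uniform block, so that every batch needs one buffer update and one instanced draw instead of one draw per object.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-object uniform data: a single model matrix.
struct InstanceData {
    std::array<float, 16> modelMatrix;
};
static_assert(sizeof(InstanceData) == 64, "expected a tightly packed 4x4 float matrix");

// One batch corresponds to one uniform buffer range and one instanced draw call.
struct Batch {
    std::size_t firstInstance;
    std::size_t instanceCount;
};

// Split objectCount objects into batches whose data fits into maxBlockBytes.
std::vector<Batch> buildBatches(std::size_t objectCount, std::size_t maxBlockBytes) {
    const std::size_t perBatch = maxBlockBytes / sizeof(InstanceData);
    assert(perBatch > 0);
    std::vector<Batch> batches;
    for (std::size_t first = 0; first < objectCount; first += perBatch)
        batches.push_back({first, std::min(perBatch, objectCount - first)});
    return batches;
}
```

With a block size of 16384 bytes (the minimum value that OpenGL guarantees for GL_MAX_UNIFORM_BLOCK_SIZE), 256 such matrices fit into one block, so 1000 objects would need 4 instanced draws instead of 1000 individual ones. In a real renderer each batch would then be bound and drawn with calls such as glBindBufferRange and glDrawElementsInstanced, with the shader selecting its matrix via gl_InstanceID.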

This and additional ideas are described here: https://www.youtube.com/watch?v=-bCeNzgiJ8I

comment:10 by Yves, 9 years ago

I've created a branch in my GitHub repository for my work on the OpenGL 4 renderer: https://github.com/Yves-G/0ad/tree/OGL4

On that page, there's also a description of the current state, a guide on how to test it, and some information about the improvements it brings so far.

comment:11 by Stan, 9 years ago

Any hope of seeing something for the next release?

comment:12 by Itms, 9 years ago

Milestone: Alpha 19 → Alpha 20

Hi Yves, if you don't mind I'll push that to A20. :)

comment:13 by elexis, 8 years ago

Keywords: patch added

comment:14 by Itms, 8 years ago

Milestone: Alpha 20 → Alpha 21

comment:15 by elexis, 8 years ago

Milestone: Alpha 21 → Backlog

Backlogging due to lack of progress.

comment:16 by Vladislav Belov, 2 years ago

In 26781:

Uses sequential numbering of GL vertex attributes for modern hardware. Refs #3054

Differential Revision: https://code.wildfiregames.com/D4601

comment:17 by Vladislav Belov, 2 years ago

Description: modified (diff)
Resolution: needsinfo
Status: new → closed

The related problem with vertex attributes was fixed in r26781. The other part of the ticket is more about general optimizations.

comment:18 by Vladislav Belov, 2 years ago

Keywords: patch removed

comment:19 by Vladislav Belov, 2 years ago

Milestone: Backlog