OpenGL rendering, understanding fill rate vs poly throughput

Hi all,

This is entirely unrelated to HMH, but there's enough collective knowledge here that perhaps someone can help my understanding :)

I've been trying to understand the performance profile of 3D rendering in OpenGL. I seem to be encountering frame drops when rendering reasonably high-poly meshes, but only when those meshes occupy a large portion of the screen. If I shrink the viewport, or if the object is far enough away that it doesn't cover most of the screen, it's absolutely fine.

For some time I suspected this was a symptom of being fill-rate limited, especially since the issue goes away if I reduce the viewport size (it stands to reason that if we're rendering to a smaller area, we're not trying to fill as many pixels or perform as many texture lookups).

However...
I have discovered that the issue remains even after disabling textures and lighting, and drawing the mesh as a solid colour. I also get no benefit from drawing as a wireframe rather than filled. In fact, the frame drop issue remains even with the most braindead vertex/fragment shaders.
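For reference, by "braindead" I mean essentially a pass-through pair like this (a sketch, with the GLSL embedded as C strings; the names are purely illustrative):

static const char *kMinimalVS =
    "#version 150\n"
    "in vec3 position;\n"
    "uniform mat4 mvp;\n"
    "void main() { gl_Position = mvp * vec4(position, 1.0); }\n";

static const char *kMinimalFS =
    "#version 150\n"
    "out vec4 fragColour;\n"
    "// Flat colour: no texture lookups, no lighting.\n"
    "void main() { fragColour = vec4(1.0, 0.0, 1.0, 1.0); }\n";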


So it would seem the issue clearly isn't fill rate; I must be pushing too much geometry. But that leaves me wondering: why does the frame rate recover when the object occupies less physical screen space? If the problem is geometry, how could size be a factor? It's not like I'm pushing fewer vertices; it's just that the fragment shader has less to do, and with the above simplifications it shouldn't be doing much anyway.


If it helps, this is all running under OSX.
Incidentally, has anyone else found the OpenGL Driver Monitor on OSX to be pretty well broken on Sierra?





What are your frame timings like?
One of the causes of being fill-rate limited is pure memory bandwidth. In other words, you are trying to write to more pixels than the GPU's memory can handle.

You can fix that by ordering the polygons so you are rendering front to back, so the depth test can kick in and the GPU doesn't bother shading pixels that won't make it on screen.
If you want code examples for what ratchetfreak is referring to, this site has the basics covered: https://learnopengl.com/#!Advanced-OpenGL/Face-culling
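The relevant state setup boils down to something like this (a sketch; it assumes your context was created with a depth buffer):

glEnable(GL_DEPTH_TEST);    // reject fragments that are behind what's already drawn
glDepthFunc(GL_LESS);
glEnable(GL_CULL_FACE);     // discard triangles facing away from the camera
glCullFace(GL_BACK);
glFrontFace(GL_CCW);        // counter-clockwise winding counts as front-facing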
@Jesse:

Frame timings:
When I'm hitting full 60fps, with the object taking up a small area of the screen, I'm getting consistent frame times of 2ms (1.2ms draw + 1ms flushBuffer).

If I make that same object consume approximately half the screen, I'm still hitting 60fps. Frame times are still consistent, but closer to 11ms total (1.2ms draw + 10ms flushBuffer).

The tipping point is when the mesh is nearly full-screen: the timings are all over the place. I still get an average 2ms draw, but the flushBuffer timings can be anywhere from 0.26ms to 49ms on any given frame.

That's a huge variation for what should be an identical scene from one frame to the next.
I'm also not sure how flushBuffer could ever complete in 0.26ms unless it is literally doing nothing.


Here's something else I've noticed...
With a reasonably static scene running full-screen, when I switch desktops away and then return, my frame timings improve substantially for a short period. I wonder if that implies a command buffer limitation somewhere.


Lastly, and maybe this is telling:
When I render this mesh (at this size) using a polygon mode of GL_POINT or GL_LINE (instead of GL_FILL), performance is actually worse! I guess that with point or line rendering there is no face culling to be had, so it has to evaluate and render every vertex.
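For clarity, this is the mode switch I'm talking about (a sketch; note that in a core profile, GL_FRONT_AND_BACK is the only face selector glPolygonMode accepts):

glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);   // normal filled triangles
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);   // wireframe
glPolygonMode(GL_FRONT_AND_BACK, GL_POINT);  // vertices as points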
ratchetfreak said:

> One of the causes of being fill-rate limited is pure memory bandwidth. In other words, you are trying to write to more pixels than the GPU's memory can handle.
>
> You can fix that by ordering the polygons so you are rendering front to back, so the depth test can kick in and the GPU doesn't bother shading pixels that won't make it on screen.


Thanks @ratchetfreak.

I'm not sure if this is a draw order problem.
The test object I'm drawing is a torus, drawn as a single high-poly mesh with one draw call.

Even drawn upright and towards the camera, all of the faces on the near side of the torus are visible and facing the camera. Conversely, every single face on the far side of the torus faces away from the camera -- backface culling should eliminate these before they are filled.

Also, since I can draw much lower poly-count models that cover the same volume (and the same screen space) without issue, wouldn't this imply that I'm not fill-rate limited?

How are you measuring time? Measuring time on the CPU and reasoning about it is a very poor indication of how the GPU works.
And what is flushBuffer?
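If you want numbers from the GPU itself, timer queries are one option. A sketch, assuming your context exposes GL 3.3+ / ARB_timer_query:

GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
// ... the draw calls you want to measure ...
glEndQuery(GL_TIME_ELAPSED);

// Read the result later; this blocks if the GPU hasn't finished yet.
GLuint64 gpuTimeNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &gpuTimeNs);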
I use a CVDisplayLink that fires a callback to my render code at a vsync-locked 60fps.
That callback is part of an NSOpenGLView, which is (one of) OSX's mechanisms for providing an OpenGL window.
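Roughly how that's wired up, if it matters (a sketch with error checking omitted; MyOpenGLView is a stand-in for my NSOpenGLView subclass):

static CVReturn displayLinkCallback(CVDisplayLinkRef displayLink,
                                    const CVTimeStamp *now,
                                    const CVTimeStamp *outputTime,
                                    CVOptionFlags flagsIn,
                                    CVOptionFlags *flagsOut,
                                    void *context)
{
    // CoreVideo calls this on its own thread; forward to the view.
    return [(__bridge MyOpenGLView *)context getFrameForTime:outputTime];
}

// ... during view setup:
CVDisplayLinkRef displayLink;
CVDisplayLinkCreateWithActiveCGDisplays(&displayLink);
CVDisplayLinkSetOutputCallback(displayLink, &displayLinkCallback, (__bridge void *)self);
CVDisplayLinkStart(displayLink);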


flushBuffer (https://developer.apple.com/refer...openglcontext/1436211-flushbuffer) is called once you're done with your draw calls; it pretty much boils down to a glFlush followed by a buffer swap.

My understanding was that flushBuffer blocks until the GPU has done its thing. As you say, timing GPU operations isn't particularly precise. I'll say :)


I've basically got something like this going:

- (CVReturn)getFrameForTime:(const CVTimeStamp*)outputTime
{
    // Render here; the display link has fired for this vsync.

    static double frameStart;
    static double lastFrameStart;

    lastFrameStart = frameStart;
    frameStart = getCurrentTime();

    // Time since the previous callback; more than one vsync interval
    // means a frame was missed.
    double frameInterval = frameStart - lastFrameStart;

    render();  // render the scene, draw calls etc

    // CPU time spent issuing the draw calls.
    double commandBufferTime = getCurrentTime() - frameStart;

    [currentContext flushBuffer];   // rendering is done, flush and swap

    double frameEnd = getCurrentTime();
    double frameTime = frameEnd - frameStart;

    return kCVReturnSuccess;
}



Since this function only fires on the vsync, if frameInterval is larger than ~16.67ms then I know I've missed a frame.

frameTime tells me the elapsed time in which all draw calls and the flushBuffer completed, which I would expect to be a reasonably useful measurement.


That said, I'm looking a little closer at this advice:

> Even a 'swapbuffer' call can leave work for the graphics system to do that can hang around and slow down subsequent operations in a mysterious fashion. The cure for this is to put a 'glFinish' call before you start the clock as well as just before you stop the clock.


With the above in mind, it does indeed seem that a lot of the wait time is occurring after the swapbuffer call.
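Following that advice, the bracketed measurement would look something like this (a sketch using the same getCurrentTime() helper as in the snippet above):

glFinish();                      // drain any leftover GPU work before starting the clock
double start = getCurrentTime();

render();                        // issue this frame's draw calls

glFinish();                      // wait until the GPU has actually finished them
double measured = getCurrentTime() - start;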


I suppose the best approach is to just draw less, but I seem to be facing an LOD conundrum. The whole idea of LOD is to use less detailed meshes for further away objects and higher detailed meshes for those that are closer -- but it's exactly those high-detail meshes that give me performance issues at close range.
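For what it's worth, the usual distance-based selection is nothing fancier than this (a hypothetical sketch; Mesh, the level table, and the thresholds are all made up for illustration):

typedef struct Mesh Mesh;
typedef struct { float maxDistance; Mesh *mesh; } LODLevel;

// Pick the first level whose range covers the object's distance.
static Mesh *pickLOD(const LODLevel *levels, int count, float distanceToCamera)
{
    for (int i = 0; i < count; ++i)
        if (distanceToCamera <= levels[i].maxDistance)
            return levels[i].mesh;
    return levels[count - 1].mesh;  // beyond every range: use the coarsest mesh
}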

It's not really a problem; these are exaggerated meshes that I'm using to help my understanding of the potential bottlenecks and how they manifest. I just seem unable to draw many conclusions from it :)

Thanks everyone
If anyone is interested, this is a pretty great article that explains fill-bound vs transform-bound rendering, and how to reason about (and optimise for) each.