[OpenGL performance] Drawing many of the same object, but in different locations

So, I'm creating a map generator to learn OpenGL. Currently I'm using WebGL, but it is very similar to plain OpenGL from what I've seen so far. Anyway, I want to draw about 100,000 2D hexagons with different colors and different locations.

I've written the following code so far, and it works fine for a couple thousand objects, but it gets really slow when scaling up to 100k.

function draw(dt) {
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
  
  gl.useProgram(shader_program);
  
  var view = mat4.create();
  view = mat4.rotate(view, view, glMatrix.toRadian(-50), vec3.fromValues(1.0, 0.0, 0.0));
  view = mat4.translate(view, view, vec3.fromValues(camera.x, camera.y, camera.z));

  // TEST
  camera.y += dt * 0.00001;
  
  var projection = mat4.create();
  projection = mat4.perspective(projection, glMatrix.toRadian(45), gl.drawingBufferWidth / gl.drawingBufferHeight, 0.1, 100);

  var uniform_loc;
  uniform_loc = gl.getUniformLocation(shader_program, "view");
  gl.uniformMatrix4fv(uniform_loc, gl.FALSE, view);
  uniform_loc = gl.getUniformLocation(shader_program, "projection");
  gl.uniformMatrix4fv(uniform_loc, gl.FALSE, projection);
  
  var hex_indices = [0, 1, 2, 0, 2, 3, 0, 3, 5, 3, 4, 5];
  
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indices_buffer);
  gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, new Uint16Array(hex_indices), gl.STATIC_DRAW);

  // this could be elsewhere, but I'm just testing so far (and it is not the bottleneck, of course)
  var hex = create_hex(0, 0, 0.1);
  var hex_verts = [];
  for (var i = 0; i < 6; i++) {
    hex_verts = hex_verts.concat(hex_corner(hex, i));
  }

  gl.bindBuffer(gl.ARRAY_BUFFER, array_buffer);
  gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(hex_verts), gl.STATIC_DRAW);

  var vertexLocation = gl.getAttribLocation(shader_program,"position");
  gl.enableVertexAttribArray(vertexLocation);
  gl.vertexAttribPointer(vertexLocation,3,gl.FLOAT,false,6 * 4,0);
    
  // draw every hexagon
  for (let hex of hexes) {

    // translate to hexagon position
    var model = mat4.create();
    model = mat4.translate(model, model, vec3.fromValues(hex.x, hex.y, 0));
    uniform_loc = gl.getUniformLocation(shader_program, "model");
    gl.uniformMatrix4fv(uniform_loc, gl.FALSE, model);

    // set hexagon color
    var vertexColor = gl.getAttribLocation(shader_program,"color");
    gl.vertexAttrib3f(vertexColor, hex.r, hex.g, hex.b);
    
    // draw hexagon
    gl.drawElements(gl.TRIANGLES,hex_indices.length,gl.UNSIGNED_SHORT,0);
    
    // draw hexagon outline
    var vertexColor = gl.getAttribLocation(shader_program,"color");
    gl.vertexAttrib3f(vertexColor, 0, 0, 0);
    gl.drawArrays(gl.LINE_LOOP, 0, hex_verts.length/2/3);
  }
  
  requestAnimationFrame(draw)
}


Is it the sheer number of draw calls? Or is it the fact that I'm telling OpenGL to draw 100k+ objects, most of which are not even on screen? Or is it both?

I'm trying to think of a good way to fix the first one (maybe create a huge array with the vertices of all hexagons? Is this really the way to go?), but I don't even know if the second one is a problem I should care about.

I would like the opinion of someone who is more experienced on what I should aim for.

Thank you guys in advance, this community is awesome.

Edited by Italo on
You can call getAttribLocation and getUniformLocation just once outside the loop and save a little bit.
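
For example, the lookups could be done once at init and the results reused every frame. This is only a sketch: `cacheLocations` and the `loc` object are made-up names, while the uniform and attribute names are the ones from the question's draw() function.

```javascript
// Look up uniform and attribute locations once and return them in a plain object.
// `gl` is expected to be a WebGLRenderingContext and `program` a linked WebGLProgram.
// cacheLocations is a hypothetical helper, not part of WebGL.
function cacheLocations(gl, program, uniformNames, attribNames) {
  const uniforms = {};
  for (const name of uniformNames) {
    uniforms[name] = gl.getUniformLocation(program, name);
  }
  const attribs = {};
  for (const name of attribNames) {
    attribs[name] = gl.getAttribLocation(program, name);
  }
  return { uniforms, attribs };
}

// At init:
//   const loc = cacheLocations(gl, shader_program,
//     ["model", "view", "projection"], ["position", "color"]);
// Per hexagon, inside the draw loop:
//   gl.uniformMatrix4fv(loc.uniforms.model, false, model);
//   gl.vertexAttrib3f(loc.attribs.color, hex.r, hex.g, hex.b);
```

This removes the per-hexagon string lookups but still leaves two draw calls per hexagon, so on its own it likely won't be enough at 100k.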

However, what you really need here is instancing, using the ANGLE_instanced_arrays extension.

Replace the mat4 model uniform with a vec2 modelPosition attribute. In the shader you add position and modelPosition before applying the other matrices.
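
A minimal sketch of what such a vertex shader might look like, kept as a JavaScript string for WebGL 1 (GLSL ES 1.00). The original shader isn't shown in the thread, so the `v_color` varying and the way color is passed on are assumptions; only the attribute and uniform names come from the posts above.

```javascript
// Vertex shader source for the instanced path.
// `position` is per-vertex; `modelPosition` and `color` are per-instance.
// `v_color` is an assumed varying name, since the original fragment shader is unknown.
const instancedVertexShaderSource = `
attribute vec3 position;
attribute vec2 modelPosition;
attribute vec4 color;

uniform mat4 view;
uniform mat4 projection;

varying vec4 v_color;

void main() {
  // Offset the shared hexagon by this instance's position,
  // then apply the remaining matrices as before.
  vec3 worldPosition = position + vec3(modelPosition, 0.0);
  gl_Position = projection * view * vec4(worldPosition, 1.0);
  v_color = color;
}
`;
```

The per-hexagon `model` uniform is gone entirely, which is what allows all hexagons to be drawn in one call.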

Then you upload all the positions to a second VBO.

var ext = gl.getExtension('ANGLE_instanced_arrays');

// Upload the hexagon vertices; this only needs to be done once at init.
gl.bindBuffer(gl.ARRAY_BUFFER, array_buffer);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(hex_verts), gl.STATIC_DRAW);

if (!ext) {
  // not supported: fall back to the non-instanced path
} else {
  // everything from here up to the upload can be consolidated into a VAO

  // per-vertex attribute: hexagon corner position
  gl.bindBuffer(gl.ARRAY_BUFFER, array_buffer);
  var vertexLocation = gl.getAttribLocation(shader_program, "position");
  gl.enableVertexAttribArray(vertexLocation);
  gl.vertexAttribPointer(vertexLocation, 3, gl.FLOAT, false, 6 * 4, 0);

  // per-instance attributes: 2 position floats + 4 color floats, so a stride of 6 * 4 bytes
  gl.bindBuffer(gl.ARRAY_BUFFER, instance_buffer);

  var modelLocation = gl.getAttribLocation(shader_program, "modelPosition");
  gl.enableVertexAttribArray(modelLocation);
  gl.vertexAttribPointer(modelLocation, 2, gl.FLOAT, false, 6 * 4, 0);
  ext.vertexAttribDivisorANGLE(modelLocation, 1); // advance once per instance

  var colorLocation = gl.getAttribLocation(shader_program, "color");
  gl.enableVertexAttribArray(colorLocation);
  gl.vertexAttribPointer(colorLocation, 4, gl.FLOAT, false, 6 * 4, 2 * 4);
  ext.vertexAttribDivisorANGLE(colorLocation, 1); // advance once per instance

  // upload the per-instance data
  gl.bindBuffer(gl.ARRAY_BUFFER, instance_buffer);
  gl.bufferData(gl.ARRAY_BUFFER, createFloat32InstanceArray(hexes), gl.STATIC_DRAW);

  // draw all filled hexagons with a single call
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indices_buffer);
  ext.drawElementsInstancedANGLE(gl.TRIANGLES, hex_indices.length, gl.UNSIGNED_SHORT, 0, hexes.length);

  // draw all outlines in black: with the array disabled, the constant attribute value is used
  gl.disableVertexAttribArray(colorLocation);
  gl.vertexAttrib3f(colorLocation, 0, 0, 0);
  ext.drawArraysInstancedANGLE(gl.LINE_LOOP, 0, hex_verts.length / 2 / 3, hexes.length);
}
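
The snippet above calls createFloat32InstanceArray without defining it. One possible version, assuming each entry of `hexes` has the `x`, `y`, `r`, `g`, `b` fields used in the question and that alpha is always 1, packs six floats per hexagon so that it matches the 6 * 4 byte stride and the 2 * 4 byte color offset:

```javascript
// Hypothetical implementation of the createFloat32InstanceArray helper.
// Layout per instance: x, y, r, g, b, a (6 floats = 24 bytes).
const FLOATS_PER_INSTANCE = 6;

function createFloat32InstanceArray(hexes) {
  const data = new Float32Array(hexes.length * FLOATS_PER_INSTANCE);
  hexes.forEach((hex, i) => {
    const o = i * FLOATS_PER_INSTANCE;
    data[o + 0] = hex.x; // modelPosition.x
    data[o + 1] = hex.y; // modelPosition.y
    data[o + 2] = hex.r; // color.r
    data[o + 3] = hex.g; // color.g
    data[o + 4] = hex.b; // color.b
    data[o + 5] = 1.0;   // color.a, assumed fully opaque
  });
  return data;
}
```

If the hexagons rarely move, this array only needs to be rebuilt and re-uploaded when something actually changes, not every frame.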

In my experience, the bottleneck is the number of calls to the GPU, so they should be batched as much as possible. Calculate all positions first and build one large array of structs holding the object positions and colors, then run a loop that does nothing but issue the calls to the graphics card (render the textures). That normally improves my performance by 70% or more.

Edit: That being said, I have NO OpenGL experience. I always use SDL, but I assume the same thing applies to OpenGL.
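
In WebGL terms, that batching idea could look roughly like the sketch below: all hexagons are written into one big vertex array up front and drawn with a single call. Everything here is an assumption for illustration: `buildHexBatch`, `hexCorner`, and `batch_buffer` are made-up names, the hexagons are assumed to be flat-topped, and only the triangulation is taken from the question's `hex_indices`. The triangles are expanded without an index buffer because 100k hexagons with 6 corners each would exceed what 16-bit indices can address.

```javascript
const VERTS_PER_HEX = 12;  // 4 triangles x 3 vertices, no index buffer
const FLOATS_PER_VERT = 5; // x, y, r, g, b

// Corner order for the 4 triangles, same as hex_indices in the question.
const HEX_TRIANGLES = [0, 1, 2, 0, 2, 3, 0, 3, 5, 3, 4, 5];

// Corner i (0..5) of a flat-top hexagon centred at (cx, cy).
// Assumed layout; the question's own hex_corner() is not shown.
function hexCorner(cx, cy, size, i) {
  const angle = (Math.PI / 3) * i;
  return [cx + size * Math.cos(angle), cy + size * Math.sin(angle)];
}

// Pack every hexagon's triangles into one interleaved Float32Array.
function buildHexBatch(hexes, size) {
  const data = new Float32Array(hexes.length * VERTS_PER_HEX * FLOATS_PER_VERT);
  let o = 0;
  for (const hex of hexes) {
    for (const corner of HEX_TRIANGLES) {
      const [x, y] = hexCorner(hex.x, hex.y, size, corner);
      data[o++] = x;
      data[o++] = y;
      data[o++] = hex.r;
      data[o++] = hex.g;
      data[o++] = hex.b;
    }
  }
  return data;
}

// Usage, given a WebGL context and attribute pointers set up with a stride of 5 * 4 bytes:
//   gl.bindBuffer(gl.ARRAY_BUFFER, batch_buffer);
//   gl.bufferData(gl.ARRAY_BUFFER, buildHexBatch(hexes, 0.1), gl.STATIC_DRAW);
//   gl.drawArrays(gl.TRIANGLES, 0, hexes.length * VERTS_PER_HEX);
```

Compared with instancing this uses more memory, since every hexagon carries its own copy of the corners, but it needs no extension.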

Edited by Mikael Johansson on
More importantly, is the geometry static, or dynamic?
@vassvik

Depends on what you mean by dynamic. It can change location, but not shape.

@ratchetfreak

Thanks for the help, I'm trying to understand what you've written. Later I will post again with the performance results of instancing.
The instancing trick mentioned by @ratchetfreak is definitely the way to go. You can even drop the vertex attributes for the hex positions too, by using the vertex ID in the vertex shader to draw the hex "implicitly", which further decreases the amount of state change. I'll try to cook up an example.

The reason I asked about dynamic vs static is that you can do a lot of nice additional optimizations if your hex tiles don't move relative to some anchor.

Do you have an example screenshot of what it looks like? =)
@vassvik @ratchetfreak

The instancing method worked. It went from unplayable at 10k to really smooth at 100k+.

Here is a screenshot, as requested:

Nice! Well done.
Here's the trick I mentioned. One name for it is "bufferless rendering", I believe. (thanks to @d7samurai for teaching me this trick in general, and deriving the id-to-position functions)

You make something like the following without any vertex buffers or vertex attributes at all:


Basically, you just need an empty VAO and one call to glDrawArraysInstanced:

glBindVertexArray(vao);
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 6, N);

which will call the vertex shader 6*N times. Within each instance, gl_VertexID cycles through 0 to 5, and gl_InstanceID increments after every 6th call, running from 0 to N-1. gl_VertexID can be used to place the vertices correctly, as in the following snippet of a vertex shader:

vec2 hexagon(int vertex_id, vec2 scale_factor) {
    int x = (vertex_id + 1) >> 1;
    int y = (vertex_id & 1) << 1;
    
    return vec2(x + (x >> 1) - 2, y - ((vertex_id + 3) >> 2)) * scale_factor;
}

void main() {
    // scaled so that the height is unity, and equal sides
    vec2 p0 = hexagon(gl_VertexID, vec2(0.28867513459, 0.5)); 
    gl_Position = vec4(p0, 0.0, 1.0);
}
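
To see what the bit tricks produce, here is a JavaScript port of hexagon() together with a conceptual walk over the (instance, vertex) pairs the draw call generates. The port and the `enumerateInvocations` name are mine, not part of the linked example, and a real GPU runs these invocations in parallel rather than in this order. Also note that gl_VertexID and gl_InstanceID are GLSL ES 3.00 built-ins, so in a browser this approach would need WebGL 2 rather than WebGL 1.

```javascript
// JavaScript port of the GLSL hexagon() function above.
// `scale` is a two-element array standing in for the vec2 scale_factor.
function hexagon(vertexId, scale) {
  const x = (vertexId + 1) >> 1;
  const y = (vertexId & 1) << 1;
  return [
    (x + (x >> 1) - 2) * scale[0],
    (y - ((vertexId + 3) >> 2)) * scale[1],
  ];
}

// Conceptual list of vertex shader invocations for
// glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 6, n).
function enumerateInvocations(n) {
  const out = [];
  for (let instanceId = 0; instanceId < n; instanceId++) {
    for (let vertexId = 0; vertexId < 6; vertexId++) {
      out.push({ instanceId, vertexId, corner: hexagon(vertexId, [1, 1]) });
    }
  }
  return out;
}

for (const inv of enumerateInvocations(1)) {
  console.log(inv.vertexId, inv.corner);
}
// Unscaled corners in strip order:
// (-2, 0), (-1, 1), (-1, -1), (1, 1), (1, -1), (2, 0)
```

The strip zigzags between the top and bottom edges from the leftmost corner to the rightmost one. Scaling by (0.28867513459, 0.5) then gives corners at x = ±0.577 and ±0.289 with y = 0 or ±0.5, which is a regular hexagon of height 1.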

For a full (but short) example, take a look at these:
Main program (it's Odin, but the OpenGL calls are the same regardless of language)
Vertex shader
Fragment shader

Hopefully it's readable.