Particle systems - part II

09 May 2016

I’ve continued improving my particle system implementation, and added a few particle effects to the game as a way to test it, and of course as a small step towards making the game a little more visually enjoyable. I’m really not focusing on production visuals at this point; all work on rendering is still in service of developing and debugging the renderer. That said, a little extra moving stuff on-screen already livens things up a lot ;-)

Particle rendering

In the previous post I spent a short section on particle rendering, which at that point was still unnecessarily inefficient. I made some small improvements to eliminate the one thing that was bothering me the most: the fact that every attribute of every particle had to be duplicated four times (once for each vertex) before sending the particle data to the GPU on each frame. Effectively, this increased the size of the particle system data sent to the GPU 4-fold, which could become quite a drag on memory bandwidth as the number of particles or the number of attributes per particle increases.

First a quick recap on how particle rendering currently works. At this point, all particles are rendered as solid rectangles, and parameterized by at least a position (a 2D vector), a size (scalar), and a rotation (scalar). To relieve the CPU from the task of converting these 3 elementary particle attributes to screen space, a vertex shader is used that takes the four points of a unit square (a square of size 1, centered around the origin of its local coordinate system), and translates, rotates and scales it according to each particle’s position, rotation and size. The vertex shader currently looks like this:

attribute lowp vec4 a_position;

attribute lowp vec2 a_particle_position;
attribute lowp float a_particle_angle;
attribute lowp float a_particle_lifetime;
attribute lowp float a_particle_size;
attribute lowp vec4 a_particle_color;

uniform mat4 u_projection;
uniform mat4 u_model_view;

varying lowp float v_lifetime;
varying lowp vec4 v_color;

void main() {
  v_lifetime = a_particle_lifetime;
  v_color = a_particle_color;

  mat2 rotate;
  rotate[0][0] = cos(a_particle_angle);
  rotate[0][1] = -sin(a_particle_angle);
  rotate[1][0] = sin(a_particle_angle);
  rotate[1][1] = cos(a_particle_angle);

  vec2 vertex_position = rotate * vec2(a_position.x * a_particle_size, a_position.y * a_particle_size) + a_particle_position;

  gl_Position = u_projection * u_model_view * vec4(vertex_position.x, vertex_position.y, -1.0, 1.0);
}

Note that there is nothing fancy or smart going on here; my vertex shader skills are very basic, and this is currently all just made to work, not to be optimal ;-). This shader is invoked for each vertex of the unit rectangle, for each particle, where the a_position attribute contains the coordinates of the unit rectangle vertex, and a_particle_position the position of the particle itself.

The particle data is passed using 2 vertex buffer objects (VBO’s): one static buffer containing the unit rectangle vertices (which are always the same), and a dynamic VBO with the particle properties (updated every frame). The way OpenGL vertex shaders work, this means the data layout of the combined two buffers looks somewhat like this, where the top row is the static VBO containing the unit square coordinates, and the bottom three rows illustrate how some of the particle attributes (in this case: particle angle and position) are duplicated for every vertex:

As you can see, in terms of data size this is very inefficient. First of all we have to repeat the same 4 unit square vertices over and over again, and then we have to repeat every particle attribute 4 times for each particle, effectively increasing the size of the data transferred to the GPU 4-fold (the static buffer containing the repeated unit square vertices only needs to be transferred once). When the number of particles and attributes is small, the inefficient data layout will not be noticeable, but if we wanted to render 100K particles with 64 bytes of attributes each, we would be duplicating ~6.4 MB of data four times, uploading ~25 MB to the GPU for each frame. At 60 frames per second this starts to add up: 60 frames/second × 25 MB/frame = 1.5 GB/s.
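The arithmetic can be sketched in a couple of lines of C (the numbers are the same illustrative assumptions as above; `duplicated_upload_bytes` is a made-up helper, not engine code):

```c
#include <stddef.h>

// Per-frame upload size when every per-particle attribute is repeated
// for each of the quad's 4 vertices (the layout described above).
static size_t duplicated_upload_bytes(size_t particles, size_t bytes_per_particle)
{
    return particles * bytes_per_particle * 4;
}
```

For the 100K-particle example: duplicated_upload_bytes(100000, 64) is 25,600,000 bytes (~25.6 MB) per frame, which at 60 fps works out to roughly 1.5 GB/s of upload bandwidth.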

Modern graphics API’s have multiple solutions to avoid this data overhead. I’ll list the three most common ones here, with links that provide more background; a full treatise on each of them would be outside the scope of this post:

  • Point sprites

Drawing particles as large, possibly textured points (quads). This allows sending a single coordinate plus per-particle attributes, but it has some serious downsides. The most glaring ones are that point sprites cannot be rotated, and have a hardware-defined size limit. Point sprites also preclude varying attributes across the corners of the quad, e.g. to create gradients.

  • Geometry shaders

Without doubt the most powerful, flexible and sensible way to implement particles without restrictions or data duplication. A geometry shader basically takes the output of a vertex shader, and generates new geometry from it on the fly. For example, a geometry shader could take a single vertex, rotation and scale, and generate a quad representing a particle and its per-vertex attributes. Absolutely fabulous, but sadly not supported by OpenGL ES 2.0 :-(. Geometry shaders only appear in much later OpenGL ES versions (as an extension on ES 3.1, core in ES 3.2); even moving up to plain OpenGL ES 3.0 would already require an Apple A7 processor (iPhone 5S and up) as baseline, which would preclude too many perfectly capable devices for a simple game like this (I try to stick with an iPhone 5 as baseline device, as it’s the oldest iOS device I have lying around).

  • Instancing

    Instancing is a way to efficiently render the same geometry multiple times, using different attributes such as position, angle, size, etc, in a single draw call. Instancing is very much like drawing the same VBO multiple times from a loop, updating part of it on each iteration. Instead of actually looping and updating though, an instanced draw call takes both the static (non-updating) attributes and the variable (updating) attributes in a single call, along with the number of instances and what is called the ‘attribute divisor’ for each attribute. The attribute divisor indicates the interval at which the pointer into the attribute array needs to be updated. When drawing particles, we would be instancing 2 triangles forming a quad, the number of instances would be equal to the number of particles, the attribute divisor for the unit rectangle coordinates would be 0 (the attribute would not be instanced), but the attribute divisor for all per-particle attributes would be 1 (advance every instance, which means: every particle).

To improve the rendering efficiency of our particle system, instancing appears to be the only option we have. At least if we don’t want to limit ourselves to newer devices with OpenGL ES 3.0, or accept less flexibility in the kinds of effects we can render.

While OpenGL ES 2.0 does not natively support instancing, all iOS hardware with a PowerVR SGX 543 or later (iPhone 4S and up) supports instancing through the EXT_instanced_arrays OpenGL extension. For our particle system rendering, we’ll be instancing a single quad defined by its four vertices, passing in per-particle attributes without duplicating them. The figure below shows the data layout for the same 3 particles as above, but using instanced drawing instead. The space savings compared to non-instanced drawing should be obvious from just the size of the figure alone:

To instance a quad that uses the same per-particle attributes for each of its four vertices, we need to set up attribute divisors and draw the VBO’s holding the static data (now just 4 vertices total, no repetition) and the dynamic per-particle data using glDrawArraysInstancedEXT instead of glDrawArrays. A stripped-down version of the setup code and the render call that only uses the per-particle position and angle attributes looks like this:

// Enable static particle vertex attributes (unit square vertices)
glBindBuffer(GL_ARRAY_BUFFER, particle_system_state.particleBufferStatic);

glEnableVertexAttribArray(K14GLES2ShaderAttributePosition);
glVertexAttribPointer(
  K14GLES2ShaderAttributePosition, 3, GL_FLOAT, GL_FALSE,
  sizeof(K14GLES2VertexAttribute), (void *) offsetof(K14GLES2VertexAttribute, position));

// Enable dynamic particle vertex attributes (particle position & angle)
glBindBuffer(GL_ARRAY_BUFFER, particle_system_state.particleBufferDynamic);

glEnableVertexAttribArray(K14GLES2ShaderAttributeParticlePosition);
glVertexAttribPointer(
  K14GLES2ShaderAttributeParticlePosition, 2, GL_FLOAT, GL_FALSE,
  sizeof(K14GLES2ParticleAttribute), (void *) offsetof(K14GLES2ParticleAttribute, position));

glEnableVertexAttribArray(K14GLES2ShaderAttributeParticleAngle);
glVertexAttribPointer(
  K14GLES2ShaderAttributeParticleAngle, 1, GL_FLOAT, GL_FALSE,
  sizeof(K14GLES2ParticleAttribute), (void *) offsetof(K14GLES2ParticleAttribute, angle));

// Render particle system by instancing a single quad
glVertexAttribDivisorEXT(K14GLES2ShaderAttributePosition, 0);
glVertexAttribDivisorEXT(K14GLES2ShaderAttributeParticlePosition, 1);
glVertexAttribDivisorEXT(K14GLES2ShaderAttributeParticleAngle, 1);

int n = particle_system_state.currentParticles;

glDrawArraysInstancedEXT(GL_QUADS, 0, 4, n);

glBindBuffer(GL_ARRAY_BUFFER, 0);

The last section that sets up the instanced drawing can be read as follows: we will be rendering n instances of a quad, defined by 4 vertices, where n is the number of particles. The quad vertices have 3 attributes: a position, which has attribute divisor 0, meaning it will not be instanced and will advance every vertex; and a particle position and angle, which have attribute divisor 1, meaning they will advance every instance (in other words, every 4 vertices). I personally found it a little confusing that the attribute divisor specifies instances (quads) instead of vertices. Initially I used a divisor of 4, which of course completely screwed up rendering. A divisor in terms of vertices would in my opinion have made more sense, and would allow more advanced instancing techniques such as having attributes per half-particle, if that makes sense. I’m thinking of different colors for the two leftmost and rightmost vertices of the quad, for instance.
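To make the divisor semantics concrete, here is a tiny simulation of which element of an attribute array gets fetched for a given vertex of a given instance (a simplified, hypothetical model, not engine or driver code):

```c
// Which element of an attribute array is fetched for a given vertex
// of a given instance, based on the attribute divisor:
// divisor == 0: the attribute advances per vertex.
// divisor >= 1: the attribute advances once every 'divisor' instances.
static unsigned attribute_index(unsigned instance, unsigned vertex, unsigned divisor)
{
    return divisor == 0 ? vertex : instance / divisor;
}
```

With divisor 1, all 4 vertices of instance 2 (the third particle) fetch element 2 of the per-particle arrays, while with divisor 0 those same vertices fetch elements 0 through 3 of the unit-square array — which is exactly why a divisor of 4 was wrong: it would only advance the per-particle attributes once every 4 particles.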

If you’re familiar with OpenGL ES 2.0, you may have noticed one small cheat in the quoted code: I’m using GL_QUADS in the example, which is not actually supported by OpenGL ES 2.0 (it is on desktop OpenGL though). In ES 2.0 a quad is drawn using GL_TRIANGLES, as 2 triangles defined by 6 vertices total. This does not match the figure I made to illustrate the instanced attribute data layout though, and I don’t want to spend more time to correct it, so for demonstration purposes I just modified the code snippet instead ;-)

Respawn particle effect

In the original game a visual effect was displayed when the player respawned or left the planet atmosphere, which could be described as a ‘pulsating cross’ where the player would re-appear. I re-created this effect as a particle system, by modifying the ‘circular area’ position generator, adding a new emitter to emit bursts of particles, and a particle generator and updater specific to the respawn effect.

What the particle effect does is cover a cross-like shape with particles, which start with a zero size, grow to some preset maximum size, then shrink and disappear. The maximum size and the speed at which the particle sizes pulsate depend on their distance to the center of the particle effect: closer particles grow bigger and last longer, particles further out have a smaller maximum size and fade out quickly. Last but not least, the particles are not all emitted at once, but in bursts, where the burst size is chosen to be equal to the number of particles in one ‘ring’ of the effect.

The best way to explain the effect is by showing it, so check the video in the section below to get a better idea of how the effect works. From a code point of view, setting up the particle effect looks like this:

K14RadialPositionParticleGenerator *radial_position_generator =
  [K14RadialPositionParticleGenerator generatorWithOrigin:origin /* … */];
radial_position_generator.steps = 6;
radial_position_generator.rings = 4;

unsigned int num_particles = radial_position_generator.steps * radial_position_generator.rings;

NSArray *generators = @[
  radial_position_generator,
  [K14RespawnParticleGenerator generatorWithRespawnLocation:origin /* … */],
  [K14ColorParticleGenerator generatorWithColors:@[
    [K14Color colorWithRed:1.0f green:0.0f blue:0.0f alpha:1.0f]
  ]]
];

id<K14ParticleEmitter> emitter = [K14BurstParticleEmitter emitterWithGenerators:generators /* … */];

NSArray *updaters = @[ [K14RespawnParticleUpdater new] ];

particle_system = [[K14ParticleSystem alloc] initWithEmitter:emitter /* … */];

The K14RadialPositionParticleGenerator was previously called K14CircularAreaParticleGenerator, and can be used to initialize particle positions inside a circular area. The generator now has a distribution parameter, which can be ‘random’ or ‘windmill’. The former distribution is still used for the explosion effect; the windmill distribution is used for the respawn effect, and positions particles at regular angular intervals (steps) and distances from the origin (rings).

The K14RespawnParticleGenerator implements particle initialization specific to the respawn effect. It assigns maximum sizes and lifetimes to particles based on their distance to the origin of the effect. The K14RespawnParticleUpdater reads these and applies them to create the pulsating size effect. Storage of these new particle properties (max size and current time relative to lifetime) inside the particle data structure is implemented using a new ‘custom particle property’, which is currently a 3-vector. This is a bit of a hack; I’d much rather have a generic way to register and lay out per-particle properties without hard-coding them in the K14ParticleData structure or resorting to hacks like stuffing values in a ‘custom property’ vector.

Last but not least, there is now a second particle emitter, K14BurstParticleEmitter. Instead of continuously emitting particles at some rate in particles per second, the burst emitter emits a fixed number of particles (the ‘burst’) at a regular interval (the ‘pulse’). For the respawn particle effect the burst size is set to be the same as the number of angular steps in the respawn effect, which results in the rings of particles appearing one-by-one.
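In C-flavored pseudocode, the burst emitter’s update boils down to something like this (a sketch of the idea, not the actual K14BurstParticleEmitter implementation; all names are made up):

```c
typedef struct {
    float pulse_interval;   // seconds between bursts
    unsigned burst_size;    // particles emitted per burst
    float accumulated;      // time accumulated since the last burst
} burst_emitter;

// Advance the emitter by dt seconds and return how many particles to
// emit this frame (always a multiple of burst_size, so a long frame
// can trigger more than one burst).
static unsigned burst_emitter_update(burst_emitter *e, float dt)
{
    unsigned emitted = 0;
    e->accumulated += dt;
    while (e->accumulated >= e->pulse_interval) {
        e->accumulated -= e->pulse_interval;
        emitted += e->burst_size;
    }
    return emitted;
}
```

For the respawn effect, burst_size would be set to the number of angular steps, so each call that fires a pulse emits exactly one ring of particles.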


To better time the respawn particle effect, and to be able to also play it when the game starts, I changed a few things related to game loop control. Instead of automatically starting the game with the player ship at the explicit coordinates specified in the planet definition, the engine now disables the player entity at game start, and hands over control to the scripting layer. The scripting layer has to explicitly respawn the ship entity to enable it. By using a series of chained gameplay events with delays, the respawn particle effect can be timed such that the player ship entity appears approximately halfway through the effect. When the player entity respawns, the orb entity is automatically moved to sit on top of the pedestal entity.

In anticipation of having multiple respawn locations per planet (e.g. to have save points), the JSON planet definition now has to list one or more named respawn locations. When respawning the player, the planet script can refer to the respawn location by its name.

Camera control

I thought it would be nice to be able to control the camera independently of the player position, if only to be able to point the camera at the location where the respawn effect will appear after a crash. Previously, the renderer was hardcoded to always track the player ship with the camera, using a fixed zoom setting. For more flexible camera control, the Lua Planet class now has access to two new properties of the K14Planet class: cameraPosition and cameraZoom. On each frame, these properties are copied to the renderer via the K14RenderBuffer instance. This means the Lua Planet class is now free to move and zoom the camera however it wishes, for example to create a flyover of the planet or to dynamically zoom in/out when the player enters difficult/narrow sections. It also means player tracking now has to be done from the Lua planet wrapper class.

To test the new camera control feature and to make the respawn effect more visually attractive, I implemented a camera dolly/zoom effect when the player crashes. Using two linear servo’s for the camera x- and y-position, and a sinusoidal servo for the camera zoom setting, the camera now moves from the crash site to the respawn site with a nice zoom effect. I just quickly hacked this together, so it’s a little tacky, but it still beats warping to the respawn location instantaneously.


The obligatory video to illustrate the respawn particle effect and the camera dolly/zoom effect. Note that the default zoom setting has been slightly increased, making everything a little bigger.

Next steps

One thing I’d like to do before moving on is to further improve the low-level particle data structure, and the high-level abstraction around it. Right now, the K14ParticleData structure is just a struct of arrays with hard-coded fields for each particle attribute. Most particle effects only need a small subset of these fields, but right now all of them are always allocated, copied and uploaded to the GPU. It would be much better to have particle effects register a set of attributes they want to use, and pack them tightly into a single block of memory that has the minimum necessary size and can be allocated and copied using a single malloc and memcpy call.
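A sketch of what such attribute registration could look like (a hypothetical API, not existing engine code): each effect registers just the attributes it needs, and the layout computes per-attribute offsets and a total per-particle stride, ignoring alignment padding for simplicity:

```c
#include <stddef.h>

#define MAX_ATTRIBUTES 16

typedef struct {
    const char *name;
    size_t size;      // bytes per particle for this attribute
    size_t offset;    // computed offset within the packed block
} particle_attribute;

typedef struct {
    particle_attribute attributes[MAX_ATTRIBUTES];
    size_t count;
    size_t stride;    // total bytes per particle
} particle_layout;

// Register an attribute and compute its offset in the packed layout.
// Returns the attribute's index, or -1 if the layout is full.
static int layout_register(particle_layout *layout, const char *name, size_t size)
{
    if (layout->count == MAX_ATTRIBUTES)
        return -1;
    particle_attribute *attr = &layout->attributes[layout->count];
    attr->name = name;
    attr->size = size;
    attr->offset = layout->stride;
    layout->stride += size;
    return (int)layout->count++;
}
```

An effect using only a 2D position (8 bytes) and a size (4 bytes) would then get a 12-byte stride instead of the full hard-coded K14ParticleData fields, and the per-frame copy becomes a single memcpy of particles × stride bytes.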

Other things for the future are adding more and nicer particle effects, and making a start towards stabilizing the game rules and fixing gameplay related bugs. At some point the idea still is to make this into a real game, create some interesting levels, glue everything together using menu’s, etc. All I’ve been working on so far is engine code, adding many interesting features that are hardly used for anything. But I’ll get to that, eventually ;-)

Development scoreboard

I’ve decided to stop tracking development time and SLOC counts. Over time, I’ve mistakenly double-counted source files, included JSON files in the SLOC count, forgotten to include Objective-C++ code, etc., all of which makes the SLOC counts more or less unusable for tracking how the source code size progressed. Besides that, I also have mixed feelings about the SLOC metric, as it implies minimizing the SLOC count is a goal in itself. In general, fewer lines means fewer opportunities for bugs, so it’s definitely good to avoid repetition and to try to be concise. Focusing only on the SLOC count becomes a bad thing when you find yourself trying to come up with ‘clever’ ways to smash multiple things together into fewer lines of code though, which I sometimes caught myself doing. So it’s probably better to just stop paying attention to the SLOC count.

Similarly, I almost always lose track of the amount of time spent on the code, and have never included time spent thinking about solutions offline, which again makes the development time figures mostly useless. So I’ll skip this in future posts, and this post will be the last to have a ‘development scoreboard’ paragraph.