Voxel Engine – Performance

In the last two posts I have described a method of using run length encoding and interval trees to store and generate meshes from voxel data. In this post I intend to analyze its performance and compare it to that of the traditional approach.But before that I need to describe what that traditional approach is.

Traditional Voxel Engine

Traditional voxel engines store data in large 3D arrays with the index in the array corresponding to the coordinate in the chunk and the value corresponding to the type of voxel at that coordinate. Data is broken up in to chunks in the same way and for the same reasons as the interval tree method. The buffer region of 1 voxel is not present in the traditional method as it is unnecessary as random access is fast. This means the chunks are 16 voxels wide and deep, rather than 18. The chunks are stored in an unordered map and therefore there is a time cost associated with their retrieval. Therefore at the start of the meshing function references to all 4 adjacent chunks are obtained alongside the central chunk containing the MegaVoxel we are meshing.

Once these chunks have been found we iterate through the central chunk searching for air voxels. Then for each of these air voxels we search its 6 adjacent voxels, adding faces if they are solid. Whilst some engines implement a heightmap which keeps track of the highest occupied voxel for each (x-z) and can therefore stop iteration earlier, I didn’t implement this. This is largely because I am only generating meshes for occupied chunks and so the performance gain of this would be negligible.

Tests

The terrain is generated using a simple heightmap function. At startup Chunks are generated from (-16,-16) to (16,16) for a total of 1089 chunks or 35,684,352 voxels (not including the overlapping regions in the interval tree structure). Within these chunks all voxels below the heightmap value are set to 1 and the rest are set to 0 (air).

Next meshes are generated for the bottom 4 MegaVoxels within each of these chunks. However, as the meshing functions are too fast to accurately time, these meshes are generated 128 times.

Whilst the performance of the traditional approach is largely constant, the nontraditional method varies enormously based on how many runs there are. To deal with this I used 3 different heightmap functions.

The first is the best case for the run based system and is just flat:

heightMap[x][z] = 48;

The next generates a sin wave in the z direction. As each z strip has the same height, this structure fits very nicely into our run length encoding.

heightMap[x][z] = 32 + (int)(sinf((chunkZ + z - 1) / 32.0f*3.14f)*18.0f);

The next heightmap is another sin wave but this time in the x direction. This means the voxel data doesn’t fit into the run length encoding nicely.

heightMap[x][z] = 32 + (int)(sinf((chunkX + x - 1) / 32.0f*3.14f)*18.0f);

The worst case for both the data structures is a checkerboard pattern, however, I decided not to test this case as it isn’t a very meaningful test. This voxel engine is designed for rendering minecraft-like terrains, where such a situation doesn’t even come close to arising.

Memory Usage

Memory usage is one of the key areas where voxel engines often fall down and so is a key area to asses. For these tests I used a ChunkHeight of 128 and an unsigned 16-bit integer for the voxel type, making each voxel 2 bytes in size.

Each node within the interval tree contains 3 unsigned 16-bit integers (start position, end position and the maximum end position in its subtree) in addition to the voxel type. It also contains 3 64-bit pointers (to the left, right and parent intervals) and, as I underpinned the structure with a red-black tree, it also contains an 8-bit integer to store whether it is red or black node. The VS2013 compiler I am using adds 1 byte of padding to this, bring the total size of each run to 34 bytes. I used an integer not a boolean for the node’s colour as the size of a boolean is implementation defined.

The memory results for the 3 heightmaps are below:

HeightMap Traditional Size Interval Tree Size
Flat 35,684,352 Voxels = 71,368,704 Bytes 2,178 Runs = 74,052 Bytes
Sin(Z) 35,684,352 Voxels = 71,368,704 Bytes 39,270 Runs = 1,335,180 Bytes
Sin(X) 35,684,352 Voxels = 71,368,704 Bytes 687,786 Runs = 23,384,724 Bytes

From this we can see the run length encoding can give us very good in memory compression, however, it isn’t as amazing as one might have expected. The memory usage in the final case is not significantly different and it isn’t even close to the worst case situation: Introducing more random terrain and different voxel types is likely to push the memory usage close to, if not beyond, that of the traditional method. I therefore think that choosing this method for memory compression is a fairly hard sell.

When I was looking at ways to improve on this I considered allocating the intervals in an array and using 16-bit indexes instead of 64-bit pointers. However, I deemed the performance implications of this to be too sever to be viable: Changing a voxel type, which is already fairly slow, could potentially require reallocating the entire array.

Meshing Speed

Every time that a voxel changes we have to regenerate the mesh for the MegaVoxel that contains it and potentially for an adjacent MegaVoxel also. Meshing speed is therefore the next crucial area to analyse. Timings for generating the meshes for the 1089 chunks 128 times are shown below. They were taken from a 64-bit release build using GetTickCount() which has a typical resolution of 10-16 milliseconds.

HeightMap Traditional Speed Interval Tree Speed
Flat 6.656 Seconds 0.015 Seconds
Sin(Z) 9.688 Seconds 0.234 Seconds
Sin(X) 9.812 Seconds 4.172 Seconds

If memory usage was a hard sell, meshing speed is a completely different story: Even with the least favorable heightmap function, the interval tree was more than twice as fast as the traditional approach. Despite extensive use of the time profiler Very Sleepy, I was unable to reduce this disparity. I optimised the algorithm as much as I could, however, there was no way to remove the cost of iteration. In the version of the code used for the above timings, indexing the data array takes up around 81% of the function’s run time.

Meshing Quality

One of the other advantages of the interval tree method is it can produce lower polygon count meshes. The table below shows how significant this is:

HeightMap Traditional Polygon Count Interval Tree Polygon Count
Flat 557,568 Triangles 2,178 Triangles
Sin(Z) 1,151,040 Triangles 66,972 Triangles
Sin(X) 1,151,040 Triangles 1,051,776 Triangles

We can see here that the interval tree method can generate meshes with substantially lower polygon counts than the traditional method. This saves memory on the GPU, improves frame rate and potentially leads to faster physics (if the generated meshes are used as physics bodies). However, in the final case the difference is negligible and so lower polygon count is probably best thought of as an additional perk, rather than a game changer. This is particularly true if one wants to use vertex lighting or tilemaps, which require a great deal of shader trickery to get to work properly with this face merging.

Conclusion

Despite all of this data supporting using interval trees and my bias from sinking a week of my time into implementing them, I’m not really able to wholeheartedly recommend their usage for 3 main reasons. Firstly, they are a pain to work with: In the game I am currently working on I plan to implement cellular automata based simulation, for example flowing water or gasses. With a run based engine these are far more fiddly to implement, largely due to the poor random access performance. In fact I’ve still not really come up with a nice way to do this (perhaps a topic for another post). There are plenty of other areas that are also made much harder with interval trees, physics being one of them.

The second reason is more fundamental and that is the problem of uniqueness. Every single value you add to a voxel decreases the chance that it can be merged with those around it. For example, a lot of voxel engines use voxel based lighting, with each voxel storing the cumulative light value from the light sources around it. Such a system is simply not possible with runs.

The third and final reason is more from a gameplay perspective. Back in the early days of minecraft, ores were more or less plentiful depending on which quadrant of the map you were in, and even today it is plagued with orientation based bugs. Using a run based system is just asking for such issues, issues which I think a user shouldn’t have to contend with.

Despite all this I intend, time permitting, to continue to investigate using interval trees to store voxel data. I find working with them an incredibly interesting and rewarding challenge and perhaps I will find something they are capable of which traditional engines are not.

2 Responses to “Voxel Engine – Performance”

  1. With the tests, it seems as your just doing one block. Would it make a difference if within one chunk, there were different block types? Traditional Methods have to have some advantages right?

    Reply
    • Tustvold

      Within the data structure the block types are stored as 16-bit integers, meaning both engines support up to 65535 different block types, including air. Both engines make the assumption that other than air there are no transparent voxels, i.e you only ever need to render a face if it is adjacent to an air voxel. This makes air kind of special and so it is the only voxel type that needs to be specifically mentioned.

      Reply

Leave a Reply to Tustvold

Click here to cancel reply.

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>