
Sparse Voxel Octrees

Terrain implementation for games is a subject with a lot of depth. On the surface, it's easy to get rudimentary terrain working with a noise function and a fixed triangle grid. As you add more features, though, the complexity builds quickly.
  1. Overhangs, caves
  2. Multiple materials
  3. Destructive terrain
  4. Collision detection
  5. Persistence
  6. Dynamic loading
  7. Regions, biomes
  8. Very large scale
  9. Smooth features
  10. Sharp features
  11. Dynamic level of detail
  12. Extremely long view distances
In isolation, each of these features can make terrain difficult to implement. Taken together, it takes a lot of care and attention to make it all work.

Minecraft is probably the canonical example for terrain generation in the last generation of games. In fact, I'd say Minecraft's terrain is the killer feature that makes the game so great. Minecraft does an excellent job at 1-8, but for a recent planetary renderer project I was working on, I really wanted 9-12 too. In a series of articles, I'm planning to break down what's required to make all these things work.

Let's start with very large scale, long view distances, and dynamic level of detail. These are items Minecraft doesn't do well enough for planetary rendering. But first, let's talk about constraints.

For caves, voxel terrain is a requirement. It's very difficult to get caves working with a simple height map. Many planetary renderers out there that I've seen do not use a voxel representation, and therefore have difficulty with caves. It's still possible; I've seen systems that generate "portals" in the main height map mesh that open into tunnels/caves. However, if you go that route, I find it becomes more difficult to implement collision detection, destructive terrain, and very large scale. I think that voxel representation makes persistence and dynamic loading easier as well, due to how easy it is to "chunk" the terrain data.

However, voxels can make very large scale terrain difficult, because the data required to represent a world of size N scales as O(N^3). Minecraft deals with this problem by limiting the height of its world, so that the world scales roughly as O(N^2) rather than O(N^3). With planetary rendering, this height limit is not possible.

Fortunately, there's a solution for this problem: sparse voxel octrees. A sparse voxel octree is just a normal octree where each node is a voxel. Unlike in Minecraft, these voxels come in varying sizes. At the top level of the voxel octree for a planet (10,000 km diameter), you would have a single 10,000 km^3 voxel. The voxels are subdivided in powers of two from there; each parent voxel has 8 child voxels that subdivide the volume of the parent.
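To make the structure concrete, here's a minimal sketch of such a node in Python. All names are illustrative; a real engine would likely use a flatter, cache-friendly layout rather than pointer-chasing objects.

```python
# A minimal sparse voxel octree node (illustrative sketch, not engine code).
class OctreeNode:
    def __init__(self, origin, size):
        self.origin = origin      # (x, y, z) of the voxel's min corner, in meters
        self.size = size          # edge length of the cubic voxel, in meters
        self.children = None      # None while the node is a leaf -- this is the "sparse" part

    def subdivide(self):
        """Split this voxel into its 8 child voxels, each half the edge length."""
        half = self.size / 2
        x, y, z = self.origin
        self.children = [
            OctreeNode((x + dx * half, y + dy * half, z + dz * half), half)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
        ]

# A 10,000 km planet-sized root voxel, in meters:
root = OctreeNode((0, 0, 0), 10_000_000)
root.subdivide()   # each child is now 5,000 km on a side
```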

This representation creates a hierarchy of voxels at different scales, which is helpful for dealing with large-scale terrain. If you're using Perlin noise to generate the terrain (a common approach), each "level" of voxels is sampled at a different frequency. For example, a 256 m voxel would sample the noise every 256 m in 3D space. The next smaller size would sample at twice the frequency: one sample every 128 m.
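The spacing per level is just the root size halved once per depth, which a one-liner captures:

```python
# Noise sample spacing at each octree depth, assuming the root voxel
# spans the whole planet and each level halves the spacing.
def sample_spacing(root_size_m, depth):
    return root_size_m / (2 ** depth)

spacings = [sample_spacing(10_000_000, d) for d in range(4)]
print(spacings)   # [10000000.0, 5000000.0, 2500000.0, 1250000.0]
```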

With very, very large voxel octrees, you run into a problem here. If your terrain is detailed enough, you need to sample with enough resolution to capture all the details. However, naively sampling at this resolution is prohibitively expensive. A 10,000 km^3 voxel octree with 128 m leaf voxels has (10,000,000/128)^3 ≈ 480 trillion leaf voxels! It's clearly impractical to work with an octree of this size. Even if the octree is simplified on the fly to reduce memory usage, it takes far too long to visit hundreds of trillions of leaf voxels to decide which to simplify.
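As a quick sanity check on that arithmetic (the exact count works out to roughly 477 trillion):

```python
# Back-of-the-envelope leaf count for a fully subdivided octree:
# 10,000 km root voxel down to 128 m leaves.
leaves_per_axis = 10_000_000 / 128      # 78,125 leaves along each axis
total_leaves = leaves_per_axis ** 3
print(f"{total_leaves:.3g}")            # 4.77e+14, i.e. ~480 trillion
```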

To get around this, it's important to use knowledge of the terrain you're generating to aggressively prune internal nodes from the octree. For planetary terrain, for example, a very large volume in the top-level, 10,000 km^3 voxel is air. Another very large volume is internal to the planet and too deep for the player to explore. A much smaller volume of the octree (the "crust" of the planet) is actually worth exploring.

However, even if you aggressively prune voxels that are known to be all air or all solid, the volume to explore is still very large. In planetary rendering, if the planet's "crust" is 20 km thick (Mount Everest is almost 9 km tall; the Mariana Trench is about 11 km deep), then the total volume is 20,000 x 4 x pi x (5,000,000)^2 / (128^3) ≈ 3 trillion voxels.
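That estimate is just the sphere's surface area times the crust thickness, divided by the volume of one leaf voxel:

```python
import math

# Crust volume ~= (surface area of the sphere) x (crust thickness),
# divided by the volume of one 128 m leaf voxel.
radius_m = 5_000_000
crust_thickness_m = 20_000
leaf_m = 128

crust_volume = 4 * math.pi * radius_m**2 * crust_thickness_m
crust_voxels = crust_volume / leaf_m**3
print(f"{crust_voxels:.2g}")   # 3e+12 -- about 3 trillion leaf voxels
```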

This is where dynamic level-of-detail comes into play. The key is to explore those 3 trillion voxels incrementally, as needed. As the player explores the voxel world, you can generate areas the player has explored at the detail required. For example, at an extremely high altitude, you can stop exploring at 5 km^3 voxels, since the player can't see any more detail than that. Given the camera position, you can calculate the "minimum voxel level of detail" that the player can see and stop exploring the octree at that voxel size. As the player moves around, you can dynamically explore/expand octree nodes to generate more detail, and contract/simplify octree nodes that are very small and far away.
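One common way to express that "minimum voxel level of detail" is as an angular-size cutoff: subdivide a voxel only while it subtends more than some threshold angle from the camera. This is a hedged sketch, and `angular_threshold` is a hypothetical tuning constant, not a value from any real engine:

```python
# Distance-based LOD cutoff (sketch). A voxel is refined only while it
# subtends more than 'angular_threshold' radians (roughly) from the camera.
def needs_subdivision(voxel_size_m, distance_m, angular_threshold=0.01):
    if distance_m <= 0:
        return True          # camera is inside the voxel: always refine
    return voxel_size_m / distance_m > angular_threshold

# From orbit at 500 km, a 5 km voxel sits right at the cutoff:
print(needs_subdivision(5_000, 500_000))   # False
# Standing 1 km away, the same voxel clearly needs more detail:
print(needs_subdivision(5_000, 1_000))     # True
```

As the camera moves, re-running this test over the tree tells you which nodes to expand and which far-away ones to collapse.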

Here we run into another problem: undersampling. If you've worked in engineering, you might know about the Nyquist frequency rule. In the context of terrain rendering, the Nyquist rule implies that if your terrain has features of size N, then you need voxels no larger than N/2 to capture all the detail. For example, consider a conical mountain that's about 5 km in diameter and 5 km tall and fits inside a 5 km^3 volume. If you sample terrain at a resolution of 5 km, you may completely miss this mountain because it falls into the gap between samples. With voxels of size 2.5 km, you can reliably sample and render the mountain.
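The mountain example is easy to demonstrate in one dimension. Here the cone is a hypothetical height function, but the effect is exactly the undersampling described above: at 5 km spacing, every sample lands at the cone's base and the peak vanishes.

```python
def mountain_height(x_m):
    # A conical "mountain": 5 km wide, 5 km tall, peak at x = 2,500 m.
    return max(0.0, 5_000 - 2 * abs(x_m - 2_500))

def max_sampled_height(spacing_m, extent_m=5_000):
    xs = range(0, extent_m + 1, spacing_m)
    return max(mountain_height(x) for x in xs)

print(max_sampled_height(5_000))   # 0.0  -- samples at 0 and 5,000 m both miss the cone
print(max_sampled_height(2_500))   # 5000 -- half-size spacing lands a sample on the peak
```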

This is indeed a tricky problem to solve with sparse voxel octrees. On one hand, we can't explore the terrain too much, or we'll load too many leaf nodes and run out of memory to store the octree. On the other hand, if we don't explore enough, we'll miss important "high-frequency" features, like the mountain. Simplification doesn't help much here either; we need to explore at high frequency before we can definitively simplify out flat areas without missing features like the mountain.

In the general case this is a pretty tough problem to solve. In practice, though, we are generating and sampling terrain, not just sampling (i.e., reading terrain samples from a database). For starters, we can use knowledge of the terrain generation algorithm to control how much we explore the octree.

For example, let's say that the voxel world is divided into two biomes: plains, and mountains. If you know that a certain voxel only contains plains, then you can sample it at a much lower frequency. This applies to smaller features as well: if you know that a 512 m voxel is a "smooth granite mountain face" then you probably don't need to subdivide it into eight 256 m voxels to render that detail.
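In code, this amounts to letting a biome (or material) lookup veto subdivision. The biome names and size cutoffs below are illustrative assumptions, not from a real generator:

```python
# Hypothetical per-biome detail floors: stop subdividing once a voxel is
# at or below its biome's minimum interesting feature size.
MIN_VOXEL_SIZE_M = {
    "plains": 512,     # low-frequency terrain: stop exploring early
    "mountains": 128,  # high-frequency terrain: explore to full detail
}

def should_subdivide(biome, voxel_size_m):
    return voxel_size_m > MIN_VOXEL_SIZE_M[biome]

print(should_subdivide("plains", 512))     # False: plains are smooth enough already
print(should_subdivide("mountains", 512))  # True: mountains still need 8 children
```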

There's another trick that you can use here: modify the terrain generation algorithm to avoid sampling errors. One approach is to use what I call "offset sampling." The idea is that large parent voxels establish coarse detail with one noise function, and smaller child voxels build fine detail on top of it. Say the coarse detail is a 5 km voxel containing a flat 45-degree slope. Subvoxels can then apply random offsets to this slope to get a new, more detailed slope with smaller valleys and peaks. As long as the subvoxels don't go outside the bounds of the parent voxel, this works out quite well. That is, if you're rendering a plains voxel, don't let the child voxels introduce 10 km tall spikes. In other words: the maximum local feature size for a child voxel should be smaller than that of the parent voxel. This works much the same way the classic diamond-square terrain algorithm does, where each step mutates the terrain map by half the amplitude of the previous step.
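A minimal sketch of the idea, assuming a 1D height function for simplicity. The coarse detail is a 45-degree slope; each deeper level adds a random offset with half its parent's amplitude (a real generator would hash the position and level for deterministic noise rather than draw from a sequential RNG):

```python
import random

# "Offset sampling" sketch: the parent level fixes the coarse shape, and
# each child level adds an offset whose amplitude is half its parent's,
# mirroring the diamond-square halving. Names and constants are illustrative.
def height_at(x_m, depth, base_amplitude_m=2_500.0, seed=42):
    rng = random.Random(seed)
    height = x_m                      # coarse detail: a flat 45-degree slope
    amplitude = base_amplitude_m
    for _ in range(depth):
        height += rng.uniform(-amplitude, amplitude)
        amplitude /= 2                # each level halves the maximum offset
    return height

# The geometric series 2500 + 1250 + 625 + ... < 5000 bounds the total
# deviation, so fine detail can never escape the coarse slope by more
# than twice the base amplitude:
h = height_at(1_000, depth=10)
print(abs(h - 1_000) < 2 * 2_500)    # True
```

That bound is the whole point: because a child's contribution is capped below its parent's, the coarse samples can never be wildly wrong, so undersampling at coarse levels stops being catastrophic.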

To summarize, it's possible to combine very large scale, dynamic level of detail, and long view distances with voxel terrain. The key techniques are:
  1. Use a sparse voxel octree
  2. Aggressively prune octree nodes
  3. Use knowledge of the terrain generator to control octree exploration
  4. Avoid undersampling by using "offset sampling"
In a follow-up article, I'd like to explore how to generate a mesh from the voxel terrain that retains both smooth and sharp features in a stylized way.

