Procedural Generation

Noise and Hash Functions

Introduction

Procedural generation is used in many applications in computer graphics and games. The ideas of Procedural Generation have been around since some time in the late 70s and early 80s. At that time, ProcGen was used to get around size limitations, as there was a tight limit on the storage and RAM use of programs. It took less space to store a function that produced some output, rather than store all of the output of the function. The program could just crunch the function and calculate whatever data was needed. For example, in games, storing 2d 'map' data could take up lots of space. A function that could produce map data took much less space to store, and could still run very quickly. Some early games from the those days like "Beneath Apple Manor" and "Rouge" generated dungeons with rooms and corridors. Early predecessors of the modern 'Endless Runner' like River Raid (1982) used ProcGen ideas to place objects as the player moved.

The techniques used in these old games typically amounted to use of a pseudo-random number generator, with a controlled 'seed' value used in the process of generation. Using a different 'seed' would give a different pseudo-random sequence, and change the layout of the level. This pseudo-random function would be used for various things, like deciding the horizontal position of an object placed on the top of the screen, or the width/length of a room or hallway. At the same time, another group of people were using the same techniques, to promote their activities.

Beneath Apple Manor - 1978
Image from myabandonware.com
Rouge - 1980
Image from dosgamesarchive.com
Nethack - 1987
Image posted on echoingthesound.org (direct link)

Demos are tiny programs that were originally inserted alongside pirated software to advertise the skill of the group that 'hacked' that software. These programs were engineered to be as small as possible, and be as optimized as possible, capable of creating animated images set to some synthesized music. These little programs inside of pirated software eventually lead to the creation of the 'Demoscene' - A group of people who make demos, not for software pirates, but for fun.

These 'Demos' didn't really use random functions like games did, but instead used mathematical functions to represenat things. This allowed them to create programs that produced impressive visuals and sound/music, with a minimal program size and memory usage.

Swinth - 1985
Image from a video of the demo on Youtube
Compunet Christmas - 1985
Image from a video of the demo on Youtube

To this day, Procedural Generation is still used, but in many more complex ways. Many games use procedural generation to store as little data as possible, and create massive worlds. The game ".kkriger", released in 2004 as an entry into a competition, was only 96kb in size, but had graphical fidelity comparable to its industry peers.

Screenshots from '.kkriger' - 2004 by '.theprodukkt'
Screenshots taken by myself. .kkrieger is a 96kb game. These images are about 1.5-2 times as large, each!
A windows download can be found here.

More recently, 'Graphics Card' technology has gotten powerful enough to generate incredible graphics in real time. Through a number of 3d rendering techniques, using Raycasting, objects with mathematically defined topologies, and fractals, a shader can be created to render incredibly complex 3d scenes, in real time. No inputs are needed, and the shader calculates and renders the scene, every frame. One recent release, 'Optical Circuit' (Japanese Title '光回路'), released at Tokyo Demo Fest 2015.

Stills from 'Optical Circuit (光回路)' - 2015 by '0x4015'
Screenshots taken myself. Optical Circuit is a 4kb demo - These images are 40-50 times larger, each!
A windows download can be found here.
An embedded version of the shader can be found here.

In the early days of gaming, a little randomness went a long way, and could be used to define properties of things that the game creates, such as their position on a grid, their size on that grid, and other things that are directly mathematical. The seed value could be easily controlled so that the same world could be created every time, if desired. These days, there may be many more objects in the game world that need to be generated procedurally, or properties of the objects must be controlled with finer granularity. A simple random function with a controlled seed isn't good enough, and isn't able to produce everything. For example, terrain generated with just a random diceroll function would be completely incoherent, there would be no geological features, but just a sea of peaks and valleys crammed close together. We would want to get a better noise function that can be used to create more believable worlds.

We also want to be able to reproduce the same outcomes for the same inputs, and not have the process reliant on state. Controlling the seed of a random function to control the random sequence solves the reproducability problem, but leaves the process reliant on state, and also prevents us from parallelizing the generation processes.

Additionally, the output of our 'random' functions must not be truely random, but just statistically random. It's actually a good thing computers are highly deterministic, as this makes creating predictable, pseudorandom noise much easier, and this also has the benefit of making our pseudorandom numbers able to be generated in parallel.

Useful properties for Procedural Generation

  1. Same outcomes for same inputs
  2. No reliance on program state (Seed value shouldn't change when randomness is calculated)
  3. Easily Parallelized (Can run the procedural generation for any two 'things' at the same time)
  4. Statistically Random outputs for any given inputs

A functional solution will provide all of the properties that we want.

Note:
There will be a few different languages used, mostly Nvidia Cg (C for Graphics) and OpenGLSL (Open Graphics Library Shader Language, or GLSL). There are minor differences between these two languages, so each code snippit will have the language it is in noted. Some of the samples will be created in JavaScript so they can run inside the webpage. Other samples will include 'GLSLSandbox' shaders (GLSL code, runs in the browser), And executables with Cg shader code.
Hashing and Noise

The first, and most basic component of building coherent noise is a form of hash function. The hash functions in this context are a function that takes an input of a Scalar or Vector, and outputs a Scalar, in the range [0,1]. There are a number of different forms for these hash functions. Here is a formal definition of a Hash Function.

Hash Function: Definition

A function that maps data of an arbitrary size to data of a fixed size. There are a few properties that are necessary:

  1. Any input to the function must always produce the same output.
  2. Any change to the input data produces an unpredictable change in the output.

For our purposes of generating textures, input data to our hash functions are floating point Scalars, or floating point Vectors any dimension. Our output is a floating point number in the range (0, 1). The differences of exclusion or inclusion of 0 and 1 in the output range are insignificant for the overall behaviour. Typically, a hash function is completely discontinuous at every point.

Also, hash functions do not need to be overly complex, a simple function is preferable to a more complex one, so long as they have all of the necessary properties. Here is a pretty complex hash function:

(C++)
double IntegerNoise(int n) {
  n = (n >> 13) ^ n;
  int nn = (n * (n * n * 60493 + 19990303) + 1376312589) 
  & 0x7fffffff;
  return 1.0 - ((double)nn / 1073741824.0);
}
From libnoise's documentation

This is quite a complex function, and is much harder to compute per-pixel than it needs to be. When rendering noise in real-time, the speed of a 'hash' function must be a consideration. Another thing, is that this function won't run as expected when translated for use on a graphics card- There is no 'double' type in most languages that compile to Graphics Card assemblies. We would want to have a simpler, faster function, that gives us much the same kind of output, and will still work properly on a graphics card.

(NVIDIA CG)
float hash(float n) {
	return frac(sin(n)*SEED);
}
Common form of noise function found in various shaders on http://glslsandbox.com/

Now, this function isn't perfect either, in fact, there are drawbacks to this function that the other function doesn't have: Floating point rounding can create 'artifacts' at points that are very far away from zero, and, on different hardware (graphics cards), floating point numbers may have different sizes. This results in noise based off of this kind of hash function looking different based on the precision of the hardware. Typically, it is expected to be a 32-bit floating point number. Depending on the specifics, on older hardware, it may be a 24-bit or 16-bit.

Example of artifacts caused by low precision, compared to a 'normal' texture

Procedural texture
Artifacted
Normal
One layer of noise in texture
Artifacted
Normal

Lets look at some specific stuff about the smaller, quick hash function.
SEED is a value to scale the distribution of the random function.
frac is a function that returns only the fractional part of the number, as a positive value.

This hash function takes in an input (n), passes that input into a (sin) function, multiplies the value by a SEED value, and then drops the integer part of the number, leaving the fractional part.
Changing the SEED value changes the output of the hash function quite drastically.

Here is some sample output from the hash function:

Output from hash(n) for n=(0...500)
SEED = 31337.1337 1 0 0 500 SEED = 12345.6789 1 0 0 500

It should be relatively easy to see that the distribution between these two functions is different, just from having a different seed value. The most notable thing about the data here, is that there is no COHERENCE in the output. If you went through all of the points, one by one, left to right, the values vary wildly. Additionally, we can visualize the values output from hash not as points on a graph, but as colors.

For the following images, they are made by generating hash values for consecutive numbers, mapping those values directly to grayscale colors, where 0 is black and 1 is white, and numbers in-between are some shade of gray. Then, placing those colors into a grid. The appearance they have is like the harsh 'static' on an untuned television.

Output from 'raw' hash as grayscale textures
SEED = 31337.1337

SEED = 12345.6789

SEED = 9876.54321

Now, that's well and good and all, but, still has no 'coherence'. It's just raw noise, and looks very messy. To make noise that has coherence, we utilize the 'raw' noise, sample it at given points, and for points in between the samples, blend the values together. We look at this on the next page.