Why

As you may know Bullet Waste is a first-person bullet-hell game, which means spawning a lot of bullets at runtime. Furthermore, a game designer needs to be able to define complicated attacks with ease. This task was assigned to me and I think I’ve come up with a pretty interesting solution.

But how to tackle this

This problem (as many others) consists of many smaller ones. But for the sake of this article lets focus on the main two:

How to create large complicated attacks easily
How to make it efficient

Understanding the concept

When I was tasked with this I asked what is the relatively complicated attack that should be possible to create and I was given this image:

The image itself doesn’t really say that much about how would this attack behave over time but it was very helpful regardless. You can imagine this as a hierarchy tree where each node is responsible for one simple movement like this:

If we were to follow each possible path from the enemy to the bullet we would recreate the attack from above.

Okay, so how to represent this

The rotation element in the tree above (and later added scale element) exist as a single node in its layer. This allows for the configuration of such element to be a simple Scriptable Object. The tricky part starts with the linear movement elements as in a normal use case there would be a lot more than 6 directions that we’d want to define. Also, the sheer amount of Scriptable Objects were them to be stored in separate files would be unmanageable. So I made the decision to combine directions in a single layer into a list in a single Scriptable Object. This allowed me to write a simple script to parse and import a large number of directions stored in a mesh asset. Starting position and move direction are stored in vertex position and vertex normal respectively. Lastly, all of the element configurations are stored in a single attack configuration in a simple list that represents the layers shown in the picture above.

We have the instruction so let’s build it

An attack configuration tells us everything we need to know to create an attack and the algorithm for doing so is pretty simple.

Start with two collections:
- empty nextNodes
- currentNodes with single root element
For each element in attack configuration list
1. Add it to every node in currentNodes
2. Save generated nodes in nextNodes
3. Swap currentNodes and nextNodes
4. Clear nextNodes
For each node in currentNodes
1. Add a bullet to it

Maybe you’ve noticed that the tree created this way looks like this:

By doing this we can easily differentiate between different bullets and remove them when they are destroyed. But this is one of the things that can be improved in the future.

So what was the first iteration of this

To save time and make things easy to test, the first iteration was based on simple Game Object pool where all of the attack elements were prefabs with single script that updated its transform. The building process created an actual hierarchy that could be seen in the inspector and the bullets contained a kinematic rigidbody with a collider. As you can probably imagine it wasn’t very optimal and, with a large amount of bullets, pushed the physics simulation to it’s limits. The other thing was that, every attack element had it’s MonoBehaviour.Update method which also stressed the game engine. And last but not least, every bullet was drawn separately and needlessly stressed the rendering. All of this resulted in unplayable framerates (~20FPS).

How can this be improved

Firstly, when using Unity systems - don’t. In most situations it is much faster to do the thing ourselves. Of course this is an exaggeration but in this case that was the first step. Let’s take a closer look at the hierarchy of the attack. All the elements do is change the transform in local space. All of the changes in children are handled by Unity. This comes at a cost, because there is an overhead of e.g. managing Game Object or calling Update by the engine.

Let’s inspect what needs to be done exactly

As you may know, the position, scale and rotation of an object in 3D space can be stored in a 4x4 matrix. Further more to combine two transformation matrices we need to simply multiply them. With that in mind we can rewrite our system to, instead of creating an Unity transform hierarchy, create a purely C# hierarchy tree that would be updated manually by us. We’d travel the tree and in our Update method we’d pass in the time and the current transformation matrix. Then we would apply our own transformation to it and pass it down the hierarchy.

But what about the bullets

In the second iteration of the system the bullets remained as Game Objects with collider component and the computed matrix updating the transform component. This wasn’t good enough as the rendering issue hadn’t been addressed. Also a collider without rigidbody won’t detect colliders on objects without rigidbody. This is a logical PhysX optimization as an object without a rigidbody can be considered stationary. But this meant that the bullets wouldn’t be destroyed by the walls. Also while the Unity Game Object managing issue was mitigated it was not completely resolved.

Enter GPU Instancing

GPU instancing is a mechanism that allows for many objects with the same mesh and material to be rendered efficiently. With this the bullets could become purely C# objects that tell some static renderer class to add them to the current batch. With this we can’t rely on collider for collision detection so we make an assumption that all of the bullets have a spherical collider, then we can just call Physics.OverlapSphereNonAlloc and check for collisions that way. Moreover, the Graphics.RenderMeshInstanced expects an array of 4x4 matrices with the transformations of the objects to render… which we calculated when traveling the tree.

Dynamic tree deletion

Another neat thing that was implemented was that when a node’s child was removed the node would check whether it had any children. If not, then it would delete itself. Thanks to this the attack basically cleans itself as it runs and there any dead ends where the updates are called and do nothing. The tree can also be imploded by simply destroying all of the bullets so you don’t have to worry about some elements that could be left uncleaned. This caused some implementation headache because you can’t modify lists when iterating over them. To overcome this the children of each node are stored in a linked list which I can safely iterate using this simple extension method:

public static void RemoveSafeForEach<T>(this LinkedList<T> list, System.Action<T> action)
{
	LinkedListNode<T> current = list.First;
	LinkedListNode<T> next;
	while (current != null)
	{
		next = current.Next;

		action(current.Value);

		current = next;
	}
}

What about the Garbage Collector

The Garbage Collector is known for it’s impact on performance and when coding in an object oriented language you often forget about the memory usage. In spite of that, I’ve managed to make this system work without any allocations during runtime. This was done by creating an object pool for C# objects. Contrary to normal Unity Object pool, the GC must be taken into account as the collection of an object would break the system. The responsibility to call ReturnToPool on a given object is on the programmer. You could implement some resurrection mechanism (making the object return to pool in its finalizer and reregistering for finalization in GC) but it would still require a GC pass to work which would hinder the performance. Other than that, I’m using classic Physics.OverlapSphereNonAlloc with a prealocated buffer.

Don’t trust lambdas

While searching for things that were dynamically allocating memory I saw that for some reason I was generating around 8MB of garbage every tick. What was weird is that everywhere i could i made sure to use object pools or pass around structs that were created on the stack. As it turned out the culprit was this piece of code that every node had:

public void Update(float time, Matrix4x4 transformation, bool active)
{
	Matrix4x4 newTransformation = transformation * GetTransformationMatrix(time);
	bool newActive = active && GetShouldBeActive(time);

	children.RemoveSafeForEach(child => child.Update(time, newTransformation, newActive));
}

As you can see, the lambda passed to the ForEach function had to create a closure on the time, newTransform and newActive variables which meant that these values were dynamically stored on the heap. The simple solution was to store these values as the class fields and create a simple update function that would take in a child object and update it with stored values.

And speaking of closures, let’s wrap this up

The system in Core Mechanics Demo 2 utilizes all of the things mentioned above. The framerate now sits at ~60FPS which is a huge improvement over the starting solution. But this isn’t the end. While reading this you maybe wondered Why not use Jobs for that? And you’d be right. There are still things to improve on such as:

parallelize the update process (with Unity Job system)
reduce amount of used elements by using the same one for every node in the tree (the problem shown in third picture), this can be used to optimize computation further by caching own transformation matrix instead of computing it independently across multiple objects
other things that will surely come up as I continue to work on this

About

Projects

Blog

So you want to have 5k active bullets at once...

Why