Low Level Platform Optimisation

(All performance metrics and code are available on my GitHub, linked at the bottom of the page.)

In my final year at university I was given an unoptimised simulation of colliding and bouncing objects with no bounding volume hierarchy (BVH). With large numbers of objects the simulation was slow and inefficient, and I was tasked with improving its performance and its use of resources.

Memory Optimisation:

The first improvement was to override the global 'new' operator in C++ to allow tracking of where and how much memory is being used. The override uses 'malloc' to request exactly the memory needed for the object being constructed, plus space for a header and a footer. With some simple pointer arithmetic I write header and footer buffers containing data that is used later, then return the start of the object's memory block so the code that requested the allocation can use the object normally.
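Below is a minimal sketch of that idea rather than the exact project code; the Header/Footer layout, the sentinel values and the g_totalAllocated counter are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct Header {
    std::size_t size;   // size of the user allocation
    unsigned check;     // sentinel, verified later when walking the heap
};

struct Footer {
    unsigned check;     // sentinel written just past the user block
};

static std::size_t g_totalAllocated = 0;   // simple global usage counter

void* operator new(std::size_t size) {
    // Request the user's size plus room for the header and footer.
    std::size_t total = sizeof(Header) + size + sizeof(Footer);
    char* raw = static_cast<char*>(std::malloc(total));
    if (!raw) throw std::bad_alloc();

    Header* header = reinterpret_cast<Header*>(raw);
    header->size = size;
    header->check = 0xDEADBEEF;

    Footer* footer = reinterpret_cast<Footer*>(raw + sizeof(Header) + size);
    footer->check = 0xC0FFEE;

    g_totalAllocated += size;
    // Return the start of the user's block, just after the header.
    return raw + sizeof(Header);
}

void operator delete(void* ptr) noexcept {
    if (!ptr) return;
    // Step back to the header to recover the original allocation.
    char* raw = static_cast<char*>(ptr) - sizeof(Header);
    g_totalAllocated -= reinterpret_cast<Header*>(raw)->size;
    std::free(raw);
}
```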

Memory Tracking and Memory Pool:

A simple tracking system was added so that memory usage and location were easily identifiable at runtime. The next logical step was to add a memory pool, so that when memory is needed at runtime there is no delay from requesting it with 'malloc' and then writing the header and footer, which could slow the simulation down. Creating a memory pool at the start of the simulation's runtime is a trade-off: start-up may be slower, but memory does not have to be requested from the system while the simulation is running. The pool works as a doubly linked list in which each header points to the previous and the next header. Having pools of different sizes means an object with a larger memory footprint goes into the larger pool; when new memory is requested, each pool is checked for a free slot, and if there is room the object is allocated that address, otherwise more memory is requested. This structure also makes it possible to walk the heap and check for corrupted or overrun buffers, as sketched below.
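The heap walk falls out of the linked headers almost for free. This is a simplified illustration under the same assumed header/footer layout as above; BlockHeader, BlockFooter and g_firstBlock are names invented for the sketch.

```cpp
#include <cstddef>
#include <cstdio>

struct BlockHeader {
    std::size_t size;
    unsigned check;          // expected 0xDEADBEEF
    BlockHeader* prev;       // doubly linked list of live allocations
    BlockHeader* next;
};

struct BlockFooter {
    unsigned check;          // expected 0xC0FFEE
};

BlockHeader* g_firstBlock = nullptr;   // head of the allocation list

// Step through every tracked block and report corrupted or overrun buffers.
void WalkTheHeap() {
    for (BlockHeader* block = g_firstBlock; block; block = block->next) {
        char* userData = reinterpret_cast<char*>(block) + sizeof(BlockHeader);
        BlockFooter* footer =
            reinterpret_cast<BlockFooter*>(userData + block->size);

        if (block->check != 0xDEADBEEF || footer->check != 0xC0FFEE) {
            std::printf("Corrupted block of %zu bytes detected!\n", block->size);
        }
    }
}
```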

Bounding Volume Hierarchy – Octree:

Whilst all this memory work is nice to have, it doesn't by itself improve performance; if anything, at this point the simulation may be slightly slower. An octree structure made the most sense for further optimisation: subdividing the scene into octants means each object only has to be checked against the objects in its own region rather than against every other object, minimising the number of collision checks and making the application more performant. This had a large impact on the performance of the simulation without being a particularly complex solution to the problem.
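As a rough sketch of the structure, each node covers an axis-aligned cube and splits into eight children once it holds too many objects, so collision tests only happen between objects that share a node. The Vec3/Object types, the subdivision threshold and the minimum node size are assumptions for illustration, not the project's exact code.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct Vec3 { float x, y, z; };

struct Object {
    Vec3 position;
    float radius;
};

struct OctreeNode {
    Vec3 centre{};
    float halfSize = 0.0f;
    std::vector<Object*> objects;
    std::array<std::unique_ptr<OctreeNode>, 8> children;

    static constexpr std::size_t kMaxObjects = 8;   // split threshold (illustrative)

    void Insert(Object* obj) {
        // Subdivide once this node gets crowded and is still large enough.
        if (!children[0] && objects.size() >= kMaxObjects && halfSize > 1.0f)
            Subdivide();

        if (children[0])
            children[ChildIndex(obj->position)]->Insert(obj);
        else
            objects.push_back(obj);
    }

    void Subdivide() {
        float quarter = halfSize * 0.5f;
        for (int i = 0; i < 8; ++i) {
            Vec3 c = centre;
            c.x += (i & 1) ? quarter : -quarter;
            c.y += (i & 2) ? quarter : -quarter;
            c.z += (i & 4) ? quarter : -quarter;
            children[i] = std::make_unique<OctreeNode>();
            children[i]->centre = c;
            children[i]->halfSize = quarter;
        }
        // Push existing objects down into the new children.
        for (Object* obj : objects)
            children[ChildIndex(obj->position)]->Insert(obj);
        objects.clear();
    }

    // Pick the octant an object's position falls into.
    int ChildIndex(const Vec3& p) const {
        return (p.x > centre.x) | ((p.y > centre.y) << 1) | ((p.z > centre.z) << 2);
    }
};
```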

Thread Pool:

This is where performance increased the most. In conjunction with the now subdivided scene, a thread pool was perfect for increasing the speed of the simulation and allowing even more objects than before. Assigning octants to worker threads, whether individual leaf nodes or nodes a generation or two up the tree, allows collision checks to be executed simultaneously each frame, which drastically improved both the capacity of the simulation and its speed. The pool is built from lambda functions placed on a queue of jobs, worked through using mutexes and condition variables so that race conditions, livelocks, deadlocks and other concurrency problems are avoided.
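A condensed version of that pattern is sketched below: worker threads pull lambdas from a shared queue guarded by a mutex and are woken by a condition variable. The class and member names are my own for the sketch, not the project's exact interface.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t threadCount) {
        for (std::size_t i = 0; i < threadCount; ++i) {
            workers.emplace_back([this] {
                for (;;) {
                    std::function<void()> job;
                    {
                        std::unique_lock<std::mutex> lock(queueMutex);
                        // Sleep until a job arrives or the pool is shutting down.
                        condition.wait(lock, [this] { return stopping || !jobs.empty(); });
                        if (stopping && jobs.empty()) return;
                        job = std::move(jobs.front());
                        jobs.pop();
                    }
                    job();   // run outside the lock so other workers aren't blocked
                }
            });
        }
    }

    // Queue a unit of work, e.g. the collision checks for one octant.
    void Enqueue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            jobs.push(std::move(job));
        }
        condition.notify_one();
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            stopping = true;
        }
        condition.notify_all();
        for (std::thread& worker : workers) worker.join();
    }

private:
    std::vector<std::thread> workers;
    std::queue<std::function<void()>> jobs;
    std::mutex queueMutex;
    std::condition_variable condition;
    bool stopping = false;
};
```

In use, each frame the simulation would enqueue something like pool.Enqueue([&] { CheckCollisions(octant); }); for every octant being tested, then wait for the frame's jobs to finish before resolving the results.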

Porting to PS4:

Once this had been completed, as a team of three we ported the code to a PlayStation 4 using the university's development kits. This gave me practice with proprietary APIs and with navigating the kind of complex codebase used in industry.
