An Approach to Occlusion Culling
I've never paid occlusion culling the same attention that I had other areas of rendering. At least not in the form that the term refers to today on our current generation of graphics hardware. This is probably due to the fact that aside from basic frustum culling and spatial partitioning, modern graphics hardware is excellent at processing large amounts of raw polygon soup quickly to determine what shouldn't be drawn. That being said, there are still countless applications where a dedicated occlusion culling system would be beneficial, provided that it does not incur a performance hit greater than that if handled directly by the GPU.
After discussing the subject in more detail during a lecture last week and hearing about Umbra, I was driven to see if I could come up with a system of my own. While I did come up with an idea, I am almost sure that it is not what you could classify as highly efficient. Regular frustum culling is probably faster, given the ability of a GPU nowadays. It is also by no means a substitute for the visibility problem nor is it a completely accurate method of occlusion, at least not without without a considerable loss in performance. Rather its intention is to remove the bulk of occluded objects, depending on the level of resolution which is about to be discussed in more detail.
Take a game world in the form of a node tree structure, for this case assume the use of a bounding volume hierarchy or BVH. The first step is to ensure that every individual object not only has their own axis-aligned bounding box, but a secondary mesh representing the best possible recreation of their shape with as few polygons as possible. Somewhat similar to the low-polygon meshes used for collision detection in place of the high-resolution model itself. In the image below, the white line shows a regular axis-aligned bounding box or AABB from a side on orthogonal view. In yellow is the low-polygon representation, ideally taking up as much space inside the object as feasibly possible.
The low-polygon representation of the mesh should always have the same orientation of its parent object, no matter where it moves or how it rotates. From here the next step involves an initial cull, similar to frustum culling process albeit with a simpler oriented bounding box or OBB. Sufficient enough for our purposes while simultaneously yielding greater performance. For extra efficiency the frustum's OBB should be cached and orienting only when required, rather than having it generated on the fly for each rendering call. The same principal applies for each objects individual low-polygon mesh.
After this process we are left with a list of objects, still organized as a BVH that are known to be either completely or partly inside the frustum's OBB. To begin the actual culling process the frustum is divided into smaller chunks that approximate the frustum itself, each in the form of a seperate OBB that should also be cached. The amount of chunks is arbitrary, the higher the number the greater the accuracy with an evident performance trade-off. These values would need to be set depending upon the application but for now let's assume that there are only 4 evenly divided chunks for explanation purposes as illustrated in the image below.
The chunk closest to the near plane is checked first. All nodes that intersect the OBB of that chunk are checked one-by-one with their low-polygon counterpart in blue eventually being projected onto the surface closest to the viewer as illustrated in red. These projections are used to generate a stencil, resembling a silhouette of visible objects that occlude objects further away. As each object is checked, they are also added to the list of objects to be rendered by the GPU as we already know that they are potentially visible. These objects among the rest of the objects residing inside the chunk completely are simultaneously removed from the list of objects to check.
For each consecutive chunk, the same process continues with the addition of a check to determine whether the projected version of each objects shape is visible or not through the stencil. Invisible or occluded objects are discarded, while those found to be partly visible are added to the stencil and the final list of renderables. The stencil itself resembles a 2D screen buffer, each pixel (stexel?) must for now be checked individually. Stencils with larger resolutions would raise the level of accuracy as would a larger number of chunks, at the cost of performance as mentioned earlier. Again these values would need to be adjusted on a per application basis depending upon the type of scene and average object size.
After the last chunk has been processed, the list of objects initially found in the frustum's OBB should be empty, with all objects at least partly visible included in the final render list. There will be cases where objects that are entirely occluded are still considered visible due to the fact that their low-polygon representations will never perfectly reflect their original shape. The number of chunks and stencil resolution will also have a substantial impact, particularly with smaller objects. Use of logarithmically scaled chunks as opposed to being evenly divided may also produce more desirable results, certainly an area that will require thorough testing.
The projection and comparison of shapes onto the stencil will need to be as efficient as possible. I believe a good starting point for determining the appropriate number of chunks, is to take the square root of the number of pixels within the stencil by multiplying each axis of the resolution together and then square rooting the result again. I will continue toying around with ideas to improve this process as checking each pixel to me feels like a major bottleneck, unless a stencil of low resolution is used hindering the purpose of the system anyway due to poor results.
In the near future I plan to produce a prototype, most likely with Unity given its ability for rapid development. For the sake of comparison I will manually implement regular frustum culling along with the concept described herein, this should provide a decent approximation of how it would perform if implemented properly. I am almost sure that after an initial prototype it will be clear that major changes are in order, given the opportunity to see exactly what works well and what doesn't for whatever reason.
After discussing the subject in more detail during a lecture last week and hearing about Umbra, I was driven to see if I could come up with a system of my own. While I did come up with an idea, I am almost sure that it is not what you could classify as highly efficient. Regular frustum culling is probably faster, given the ability of a GPU nowadays. It is also by no means a substitute for the visibility problem nor is it a completely accurate method of occlusion, at least not without without a considerable loss in performance. Rather its intention is to remove the bulk of occluded objects, depending on the level of resolution which is about to be discussed in more detail.
Take a game world in the form of a node tree structure, for this case assume the use of a bounding volume hierarchy or BVH. The first step is to ensure that every individual object not only has their own axis-aligned bounding box, but a secondary mesh representing the best possible recreation of their shape with as few polygons as possible. Somewhat similar to the low-polygon meshes used for collision detection in place of the high-resolution model itself. In the image below, the white line shows a regular axis-aligned bounding box or AABB from a side on orthogonal view. In yellow is the low-polygon representation, ideally taking up as much space inside the object as feasibly possible.
The low-polygon representation of the mesh should always have the same orientation of its parent object, no matter where it moves or how it rotates. From here the next step involves an initial cull, similar to frustum culling process albeit with a simpler oriented bounding box or OBB. Sufficient enough for our purposes while simultaneously yielding greater performance. For extra efficiency the frustum's OBB should be cached and orienting only when required, rather than having it generated on the fly for each rendering call. The same principal applies for each objects individual low-polygon mesh.
After this process we are left with a list of objects, still organized as a BVH that are known to be either completely or partly inside the frustum's OBB. To begin the actual culling process the frustum is divided into smaller chunks that approximate the frustum itself, each in the form of a seperate OBB that should also be cached. The amount of chunks is arbitrary, the higher the number the greater the accuracy with an evident performance trade-off. These values would need to be set depending upon the application but for now let's assume that there are only 4 evenly divided chunks for explanation purposes as illustrated in the image below.
The chunk closest to the near plane is checked first. All nodes that intersect the OBB of that chunk are checked one-by-one with their low-polygon counterpart in blue eventually being projected onto the surface closest to the viewer as illustrated in red. These projections are used to generate a stencil, resembling a silhouette of visible objects that occlude objects further away. As each object is checked, they are also added to the list of objects to be rendered by the GPU as we already know that they are potentially visible. These objects among the rest of the objects residing inside the chunk completely are simultaneously removed from the list of objects to check.
After the last chunk has been processed, the list of objects initially found in the frustum's OBB should be empty, with all objects at least partly visible included in the final render list. There will be cases where objects that are entirely occluded are still considered visible due to the fact that their low-polygon representations will never perfectly reflect their original shape. The number of chunks and stencil resolution will also have a substantial impact, particularly with smaller objects. Use of logarithmically scaled chunks as opposed to being evenly divided may also produce more desirable results, certainly an area that will require thorough testing.
The projection and comparison of shapes onto the stencil will need to be as efficient as possible. I believe a good starting point for determining the appropriate number of chunks, is to take the square root of the number of pixels within the stencil by multiplying each axis of the resolution together and then square rooting the result again. I will continue toying around with ideas to improve this process as checking each pixel to me feels like a major bottleneck, unless a stencil of low resolution is used hindering the purpose of the system anyway due to poor results.
In the near future I plan to produce a prototype, most likely with Unity given its ability for rapid development. For the sake of comparison I will manually implement regular frustum culling along with the concept described herein, this should provide a decent approximation of how it would perform if implemented properly. I am almost sure that after an initial prototype it will be clear that major changes are in order, given the opportunity to see exactly what works well and what doesn't for whatever reason.
Comments
Post a Comment