I’m noticing a trend in graphics. It appears that the gap between what is possible in theory and what has been achieved in practice is growing with increased hardware power. Towards the end of the last console cycle we seemed to hit diminishing returns, and you could point at the crop of AAA titles and reasonably say that they had converged on best practice. Sure, a few went further with more exotic techniques such as adaptive tessellation LOD or even early deferred shading, but there didn’t seem to be a huge supply of new techniques that could be crammed into the available resources of an Xbox or PS2.
With the current generation, many developers started by porting over the best practice techniques from the last generation – even if they started ‘fresh’. But the scope of potential techniques that we can now fit into a game running at 30fps has increased greatly. I can only see this trend increasing with the next generation, whenever that may be.
It’s always fun and instructive to think about what the ideal state of the art for a given platform could be, given a potentially large amount of time to research and implement all the ideas. For the current console hardware generation, I think a truly state-of-the-art engine would be radically different from your typical forward-shaded polygon rasterizer, which is a continuation of last-gen technology. The ideal architecture for 360/PS3-level hardware (based on research available now) would be built on the following ideas/systems:
- novel surface rendering – moving beyond the hardware triangle setup limitation: point splatting and/or even limited ray tracing
- deferred shading, possibly combined with light buffers, and perhaps even a more exotic approach, such as deferred texturing
- G-buffer compositing primitives (simplified geometry which performs subsets of the full shading computation – lights are just one possibility, 3D texture decals another, etc.)
- spatial coherence optimizations – frequency adaptive shading: optimizations inspired from image compression: shading and shadowing performed on an adaptively reduced set of sample points which can then be upfiltered with little quality loss
- temporal coherence optimizations – inspired from motion compensation in video compression: full cached deferred shading – most samples are reused frame to frame
- real-time global illumination with area lighting through a combination of techniques
- adaptive quadtree geometry images for terrain and megasurfaces – adaptive view-dependent micro-tessellation and/or point splatting or ray tracing
- adaptive quadtree textures for streaming – virtualized texture memory resources (geometry streaming/management falls out of this as geometry images are textures)
- full unique texturing through some combination of techniques – explicit virtual textures and/or ‘procedural’ painting of custom texture layers and decals directly into the G-buffer – 3D painting
- sort order independent translucency and true volumetric rendering for clouds, fog & scattering through adaptive optic density histogram rendering
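To make the deferred shading idea on this list concrete, here is a minimal sketch of the two-pass structure, with my own (hypothetical) function names and a toy scalar-albedo G-buffer standing in for real render targets. The point is that geometry attributes are written once, and lighting cost then scales with lights × pixels rather than lights × overdrawn fragments:

```python
# Minimal deferred-shading sketch (illustrative only; names are my own).
# Pass 1 rasterizes geometry attributes into a G-buffer; pass 2 shades
# each G-buffer sample once per light.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = sum(x * x for x in v) ** 0.5
    return tuple(x / length for x in v)

# Pass 1: "geometry pass" -- store albedo, world normal, world position.
def write_gbuffer(fragments):
    # fragments: {pixel: (albedo, normal, position)}; the last write wins,
    # standing in for depth-tested rasterization.
    return dict(fragments)

# Pass 2: "lighting pass" -- accumulate diffuse light per G-buffer sample.
def shade(gbuffer, lights):
    out = {}
    for pixel, (albedo, normal, pos) in gbuffer.items():
        radiance = 0.0
        for light_pos, intensity in lights:
            to_light = normalize(tuple(l - p for l, p in zip(light_pos, pos)))
            radiance += intensity * max(0.0, dot(normal, to_light))
        out[pixel] = albedo * radiance
    return out
```

A real implementation would of course store packed normals and material IDs in MRT render targets, but the decoupling of geometry from lighting is the essential property.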
You can probably add a lot more to this list: I’m not even mentioning foliage, water, or a dozen other more minor systems, mainly because I haven’t spent as much time thinking about them. Of the above ideas, I’ve implemented about half. Each is a subject unto itself. Some are old news by now and becoming something of an accepted standard – such as deferred shading, which was unpopular and ‘risky’ with my peers at Pandemic when I first fought for it (back when only Stalker was known to be using it, before that game shipped), but is now increasingly standard as many high-profile titles use it.
For your main-path opaque scene, the obstacles to achieving near-film quality are legion. On the technical rendering side, you can reduce the problem to properly sampling the visually important components of the rendering equation at near pixel resolution across all views of your scene. You can further divide this problem into geometric accuracy, material/texture accuracy, and finally illumination accuracy. Character rendering is currently the area where we’ve achieved the highest accuracy in top-tier games (even if the animation often falls far short). This trend started with Doom 3 and the application of the ‘appearance preserving simplification‘ route to faking high geometric accuracy through a unique normal map which stores true geometric normals (instead of the typical, weaker approach of using the normal map to add procedural normal perturbation to a low-resolution model). For characters, the memory costs of storing several unique textures are reasonable enough that achieving high fidelity on the first two parts of the rendering challenge becomes mainly an art pipeline issue.
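The distinction between the two normal-map approaches can be sketched in a few lines. This is a hedged illustration with my own function names, not anyone's actual pipeline: the ‘appearance preserving’ route bakes the *true* normals of the high-res surface into a unique map, so shading the low-poly mesh reproduces the high-res lighting response directly:

```python
# Illustrative sketch: bake true geometric normals (the Doom 3-style
# route) rather than perturbing a low-res normal with a tiling detail map.
# Names and setup are my own assumptions.

def normalize(v):
    length = sum(x * x for x in v) ** 0.5
    return tuple(x / length for x in v)

def bake_normal_map(highres_normals):
    # one texel per low-poly surface sample, storing the true normal of
    # the corresponding high-res surface point
    return [normalize(n) for n in highres_normals]

def shade_lambert(normal_map, light_dir):
    # shading the low-poly mesh with baked normals reproduces the
    # high-res diffuse lighting response
    l = normalize(light_dir)
    return [max(0.0, sum(a * b for a, b in zip(n, l))) for n in normal_map]
```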
The evolution of Carmack’s tech at id from Doom 3 to id Tech 5 follows a pretty clear trajectory – if you could just store enough data to uniquely texture the entire world, you could almost completely solve the rendering problem: geometric accuracy faked through unique normal maps derived from higher-res geometry, full pixel-level material control across the entire world, and accurate secondary global illumination pre-baked at high resolution. Add dynamic direct lighting and you’ve taken a big shortcut toward film quality, at least for static worlds of reasonable size. You reduce the entire runtime problem to one of streaming and compression, where you can focus all of your attention.
From an art pipeline perspective, this tech has two disruptive, revolutionary effects: it allows artists to uniquely paint material properties across the entire world, effectively removing texture budget limitations, and it thus also allows triangle count limitations to be bypassed to a degree by baking high-res geometry down to unique normal maps across your entire world! In essence, the idea is to take the big idea from Doom 3 that worked so well for characters and apply it en masse to the entire world.
The massive data and streaming requirements of a full unique texturing approach through virtual textures limit the technique’s applicability, unless novel surface compression techniques are employed, beyond typical JPEG or wavelets. Much of my own work has focused on achieving fully unique surfaces through two separate techniques: unique geometry through quadtree geometry images and GPU surface generation/decompression, and unique texturing detail through deferred compositing of ‘procedural’ texture layers. From an artist’s perspective, the effect is that they can paint texture anywhere in the scene, but the runtime compositing avoids UV coordinate hassles and massive virtual textures, at some performance cost, which can however be well optimized.
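The core runtime mechanism of virtual texturing is simple enough to sketch. This is a hypothetical illustration (page size, table layout, and names are my own assumptions): virtual texel coordinates are mapped through a page table to a small resident cache, falling back to a coarser mip level when the requested page isn’t streamed in yet:

```python
# Hypothetical virtual-texture page lookup sketch. The coarsest mip is
# assumed always resident, so the fallback loop always terminates.

PAGE = 128  # texels per page side (assumed)

def lookup(page_table, x, y, mip=0):
    # page_table: {(mip, page_x, page_y): physical_page_id}
    while True:
        px, py = (x >> mip) // PAGE, (y >> mip) // PAGE
        if (mip, px, py) in page_table:
            return page_table[(mip, px, py)], mip
        mip += 1  # page not resident: fall back to a coarser level
```

On real hardware the page table is itself a small texture sampled in the pixel shader, and the misses recorded each frame drive the streaming system's page requests.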
Likewise, even though virtual textures make large-scale precomputed global illumination practical, ideally we would like to achieve (or fake) real-time global illumination, for both larger worlds and more dynamic, interactive worlds with destruction, diurnal cycles, and the associated dynamic lighting environments. Here I see a combination of techniques which can make this possible: approximate conic area lighting, conic ambient occlusion for low-frequency shadow approximation, GPU generation of second-bounce light sources, and aggressive optimization of the light-pixel interactions through exploitation of spatial and temporal coherence.
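The conic ambient occlusion piece can be sketched as follows – a hedged illustration only, with an abstract occlusion field standing in for whatever pre-filtered scene representation is actually queried: step along a cone, sample occlusion at a radius growing with distance, and composite front-to-back:

```python
# Sketch of cone-traced ambient occlusion (names and parameters assumed).
# sample_occlusion(distance, radius) -> fractional occlusion in [0, 1],
# standing in for a pre-filtered occlusion field query.

def cone_occlusion(sample_occlusion, steps=8, aperture=0.5, step_len=1.0):
    visibility = 1.0
    dist = step_len
    for _ in range(steps):
        radius = dist * aperture           # cone footprint grows with distance
        occ = sample_occlusion(dist, radius)
        visibility *= (1.0 - occ)          # front-to-back compositing
        dist += step_len
    return visibility
```

The key cost win is that one widening cone with a handful of pre-filtered samples replaces dozens of individual shadow rays.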
Finally, even though surfaces with medium geometric tessellation amplified by higher-frequency unique normals look pretty good for organic surfaces, we can do better. Really high quality terrains and organic surfaces require pixel- or sub-pixel geometric accuracy, which is difficult to achieve on current hardware due to the triangle setup and rasterization bottleneck. This is an area where more exotic surface techniques such as point splatting and even limited ray tracing could potentially be faster than rasterizing micro-triangles on current hardware, when combined with key spatial and temporal optimizations. Ultimately, this is the likely new direction for the state of the art in the next generation of GPU hardware; triangle rasterization no longer has specific advantages in the upcoming generation of renderers.
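For terrain specifically, the ‘limited ray tracing’ alternative to micro-triangle rasterization is essentially a heightfield ray march, sketched here on a 1D heightfield with assumed names. A production version would march a quadtree/mipmap of height bounds to skip empty space rather than taking fixed steps:

```python
# Illustrative heightfield ray march (a minimal form of limited terrain
# ray tracing). height(x) -> terrain height; ray origin (ox, oz) with
# direction (dx, dz); fixed-step marching stands in for hierarchical
# empty-space skipping.

def raymarch_height(height, ox, oz, dx, dz, steps=100, dt=0.1):
    t = 0.0
    for _ in range(steps):
        x = ox + dx * t
        z = oz + dz * t
        if z <= height(x):
            return t          # hit: ray dropped below the surface
        t += dt
    return None               # miss within the marched range
```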
But coming back down to the reality of the current generation, there is a final set of phenomena not well handled by any of the techniques discussed above: atmospheric effects. I’ve been thinking about what I believe is a new representation of optic density (the function of visibility along a ray) which could permit order-independent rendering of translucent effects suitable for clouds, smoke, fire, fog, and general atmosphere, and more importantly, being the equivalent of a z-buffer, would allow full shadowing and lighting effects. There’s already been some related work, such as deep shadow buffers and optic density accumulation of particles, but what I have in mind is a volumetric particle system combined with an optic density histogram structure, built in pyramid fashion, which achieves massive fillrate reduction, order independence, and a single structure that can answer an illumination/extinction query per pixel. It naturally exploits spatial coherence, in 2D across the screen through the pyramid, and also along depth. It would be even better to exploit temporal coherence, but I haven’t thought much about that yet.
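The order-independence property of a depth-binned density representation is easy to demonstrate in miniature. This is my own speculative reading of the idea, not its actual implementation: particles only add density into depth bins (so draw order is irrelevant), and a prefix sum over bins answers a Beer-Lambert transmittance query at any depth:

```python
import math

# Sketch of a per-pixel depth-binned optic density histogram (bin count
# and extinction model are my own assumptions).

def splat(bins, depth, density, z_near, z_far):
    # order-independent: only += into a bin, so draw order doesn't matter
    n = len(bins)
    i = min(n - 1, max(0, int((depth - z_near) / (z_far - z_near) * n)))
    bins[i] += density

def transmittance(bins, depth, z_near, z_far):
    # Beer-Lambert: T = exp(-total optical density in front of 'depth')
    n = len(bins)
    i = min(n, max(0, int((depth - z_near) / (z_far - z_near) * n)))
    return math.exp(-sum(bins[:i]))
```

Building such histograms in pyramid fashion over the screen would then give the coarse-to-fine fillrate reduction and per-pixel extinction queries described above.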
If I actually had the time to implement all of this before the next generation comes around, I think it would make a pretty impressive engine. But the next generation is already here on high-end PC, and my current bet is that all of these techniques can be replaced by a simpler rendering architecture based on a single data structure and algorithm: voxel cone tracing. When combined with spatial and temporal coherence optimizations, this technique can do everything, and do it fast and well. The real challenge will be in streaming and compression to handle all the data and fit into memory.
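The unifying mechanism behind voxel cone tracing can be sketched in miniature too – very much a speculative toy, on a 1D voxel pyramid with assumed parameters: march along a cone, and as its footprint widens, sample ever coarser pre-filtered mip levels, compositing occlusion front-to-back with steps that grow with the footprint:

```python
import math

# Toy 1D voxel cone trace (illustrative; real versions use a 3D mipmapped
# voxel grid or sparse voxel octree).

def build_pyramid(voxels):
    # mip 0 is raw occupancy in [0, 1]; each coarser level averages pairs
    levels = [list(voxels)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([(prev[i] + prev[i + 1]) / 2.0
                       for i in range(0, len(prev) - 1, 2)])
    return levels

def cone_trace(levels, aperture=0.5, step=1.0, max_dist=8.0):
    visibility = 1.0
    dist = step
    while dist < max_dist and visibility > 0.01:
        radius = max(1.0, dist * aperture)
        mip = min(len(levels) - 1, int(math.log2(radius)))
        cell = int(dist / (1 << mip))            # voxel index at this mip
        if cell < len(levels[mip]):
            visibility *= 1.0 - levels[mip][cell]  # front-to-back
        dist += step * radius                    # bigger steps as cone widens
    return visibility
```

One structure, one algorithm: the same trace with different apertures covers sharp occlusion queries, soft shadows, and glossy-to-diffuse indirect gathers, which is exactly why it can subsume so many of the separate systems above.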