I need to somehow enforce a mental pre-commitment to blog daily. It’s been almost half a year and I have a huge backlog of thoughts I would like to commit to permanent long-term storage.
Thus, a commitment plan for some upcoming posts:
- In October/November of last year (2010), I researched VR HMDs and explored the idea of a next-generation interface. I came up with a novel hardware idea that could potentially solve the enormous resolution demands of a full-FOV, optic-nerve-saturating near-eye display (an effective resolution of say 8k x 4k per eye or higher). After a little research I found that the type of approach I discovered already has a name: a foveal display, although current designs in the space are rather primitive. The particular approach I have in mind, if viable, could solve the display problem once and for all. If an optimized foveal display could be built into eyewear, you would never need any other display: it would replace monitors, TVs, smartphone screens, and so on. Combine a foveal HMD with a set of cameras spread out in your room like stereo speakers, plus some software for real-time vision and scene voxelization/analysis, and we could have a Snow Crash interface (and more).
- Earlier this year I started researching super-resolution techniques. Super-resolution is typically used to enhance old image/video data and has found a home in upconverting SD video. I have a novel application in mind: take a near-flawless super-res filter and use it as a general optimization for the entire rendering problem. This is especially useful for near-future high-end server-based rendering solutions. Instead of doing expensive ray tracing and video compression on full 1080p frames, you run the expensive code on a 540p frame and then do a fast super-res upconversion to 1080p (potentially a 4x savings across your entire pipeline!). It may come as a surprise that current state-of-the-art super-res algorithms can do a 2x upsample from 540p to 1080p at very low error rates: well below the threshold of visual perception. I have come up with what may be the fastest, simplest super-res technique that still achieves upsampling to 1080p with imperceptible visual error. A caveat is that your 540p image must be quite good, which has implications for rendering accuracy, anti-aliasing, and thus rendering strategy choices.
- I have big grandiose plans for next-generation cloud-based gaming engines. Toward that end, I’ve been chugging away at a voxel ray tracing engine. This year I more or less restarted my codebase, designing for Nvidia’s Fermi and beyond along with a somewhat new set of algorithms/structures. Over the summer I finished some of the principal initial pipeline tools, such as a triangle voxelizer and some new tracing loops, and made some initial progress toward a fully dynamic voxel scene database.
- Along the way to Voxeland Nirvanah I got completely fed up with Nvidia’s new debugging path for CUDA (they removed the CPU emulation path) and ended up writing my own CUDA emulation path via a complete metaparser in C++ templates that translates marked-up ‘pseudo-CUDA’ into either actual CUDA or a scalar CPU emulation path. I built most of this in a week, and it was an interesting crash course in template-based parsing. Now I can run any of my CUDA code on the CPU. I can also mix and match both paths, which is really useful for pixel-level debugging. In this respect the new path I’ve built is actually more powerful and useful than Nvidia’s old emulation path, as that required a full separate recompilation. Now I can run all my code on the GPU, but on encountering a problem I can copy the data back to the CPU and re-run functions on the CPU path with full debugging info. This ends up being better for me than using Nvidia’s Parallel Nsight for native GPU debugging, because Nsight’s debug path is radically different from the normal compilation/execution path, and you can’t switch between them dynamically.
- In the realm of AI, I foresee two major hitherto unexploited/unexplored application domains related to Voxeland Nirvanah. The first is what we could call an Artificial Visual Cortex (AVC). Computer vision is the inverse of computer graphics: the latter is concerned with transforming a 3+1D physical model M into a 2+1D viewpoint image sequence I, while the former is concerned with plausibly reconstructing the physical model M given a set of example viewpoint image sequences I. Imagine if we had a powerful AVC trained on a huge video database that could then extract plausible 3D scene models from video. Cortical models feature inversion and inference, so a powerful enough AVC could amplify rough 2D image sketches into complete 3D scenes. In some sense this would be an artificial 3D artist, but it could take advantage of more direct and efficient sensor and motor modalities. There are several aspects of this application domain that make it much simpler than full AGI: computational learning is easier if one side of the mapping transform is already known, and in this case we can prime the learning process by using ray tracing directly as the reverse transformation pathway (M->I). This is a multi-billion-dollar application area for AI in the field of computer graphics and visualization.
- If we can automate artists, why not programmers? I have no doubt that someday in the future we will have AGI systems that can conceive and execute entire technology businesses all on their own, but well before that I foresee a large market role for more specialized AI systems that can help automate more routine programming tasks. Imagine a programming AI that has some capacity for natural language understanding and an ontology that combines knowledge of common-sense English, programming, and several programming languages. Compilation is the task of translating between two precise machine languages expressed in context-free grammars, and there are deterministic algorithms for such translations. For the more complex unconstrained case of translation between two natural languages, we have AI systems that use probabilistic context-sensitive grammars and semantic language ontologies. Translating from a natural language to a programming language should have intermediate complexity. There are now a couple of research systems in natural language programming that can do exactly this (such as sEnglish). But imagine combining such a system with an automated ontology builder such as TextRunner, which crawls the web to expand its knowledge base. Take such a system, add an inference engine, and suddenly it starts getting much more interesting. Imagine building entire programs in pseudo-code, with your AI using its massive ontology of programming patterns and technical language to infer entire functions and subroutines. Before full translation, compilation, and testing, the AI could even perform approximate simulation to identify problems. Imagine writing short descriptions of data structures and algorithms and having the AI fill in the details, potentially even handling translation to multiple languages, common optimizations, automatic parallelization, and so on. Google itself could become an algorithm/code repository.
Reversing the problem, an AI could read a codebase and begin learning likely structures and their simplifications into high-level English concept categories, learning what the code is likely to do. Finally, there are many sub-problems in research where you really want to explore a design space and try N variations along certain dimensions. An AI system with access to a bank of machines, along with compilation and test procedures, could explore permutations at very high speed indeed. At first I expect these types of programming-assistant AIs to have wide but shallow knowledge, and thus to amplify and assist rather than replace human programmers. They will be able to do many simple programming tasks much faster than a human. Eventually such systems will grow in complexity; combine them with artificial visual cortices to expand their domain of applicability, and you eventually get a more complete replacement for a human engineer.