I am a technologist and a transhumanist. My medium-term mission is to create altruistic universal learning machines – i.e. friendly AGI. My longer-term mission is to save the world in the literal sense. Through the power of a benevolent superintelligence, we can develop the advanced uploading and simulation tech required to achieve universal resurrection and destroy death, the last enemy. How fun!
I am a child of the Moore’s Law generation – some of my earliest memories are of playing video games on an Atari. I first experimented with programming around age 16, a bit before the internet, deriving enough from high school geometry to make a simple 3D game. A few years later I started college at UCSB CCS and found the library, where I fell in love with computer graphics, culminating in my first serious project: a SOTA procedural planetary terrain renderer. Kind of like a procedural version of Google Earth (which used real world data), but with higher visual quality, and earlier. There is some alternate universe where I realized that tech had commercial use outside games and went that route, but I was strictly focused on games. Regardless, the demo circulated a bit and soon I was consulting Ken Musgrave about how to integrate the tech into his MojoWorld fractal terrain program – only about a year after I had first read about fractal terrain in Musgrave’s chapters of the popular procedural texturing and modeling book.
While at UCSB in the late 90’s I was introduced to, and molded by, the transhumanist writings of Vinge, Moravec, Kurzweil and kin. I was intrigued by AI/ML, but informed by the Moore’s Law trendline – which predicted AI wouldn’t become interesting for a while – I determined it was better to focus first on graphics/games for the next decade: partly because the compute for AI wasn’t available yet, but also because it was clear to me then that a core component of intelligence must involve approximate inner simulations of the outer world. It therefore seemed profitable to first deeply learn how to construct efficient simulations at all, before trying to build AI that could learn efficient simulations from observations. So I did.
I spent years trying to develop a space game (Eschaton: Chain of Command) with SOTA graphics and production values on a shoestring budget. It almost worked, but we probably should have focused on nailing the game design first (or moreover, the market/business case – this was in the wake of the dot-com crash). After that I got a corporate job at Pandemic Studios and did some mostly boring work on (the mundane but successful) Star Wars Battlefront 2. I was then granted a few months of research time, in which I produced a far more impressive version of the planetary terrain demo; by exploiting GPU advances and new features I was able to hit truly movie quality (albeit at 30 fps), with a zoom in from space down to the surface of Mars (realistic scale and appearance – now using real world data + amplification). The demo was legitimately jaw-dropping and made me internally famous, but it was also highly controversial – leading to fierce internal debates about the resources required to build out all the rest of the new game engine tech to a compatible quality level (as just one example, physics engines of that era certainly could not handle the pixel-level geometry detail of my terrain system). Sadly, that demo was never online AFAIK and is probably lost to history.
EA acquired Pandemic at the start of 2008, and by then I had given up on the battle over the game engine – I didn’t see a promising payoff curve for all the tremendous effort. I was looking ahead to moving from quadtrees to more general octree voxel based engines (I wrote a voxel cone tracing prototype), and was also becoming very interested in the idea of cloud gaming with remote rendering (which could combine well with voxel engines). I wrote a blog post about cloud gaming just as Onlive came out of stealth; it circulated and was picked up by Game Developer magazine. A few months later, in late 2009, EA closed Pandemic; so, naturally, I went to work for Onlive. Unfortunately, creating an advanced cloud native engine was neither in their roadmap nor their capability set. I stayed with them regardless, until the inevitable implosion in late 2012. Fortunately by that time I had saved a bit into Bitcoin. Now – for the first time in my life – I finally had (barely) enough savings to fund a few years of my own research.
There were some distractions involving crypto and high frequency trading, but mostly I began a self-study in ML, focusing on the research tracks that were just transitioning into early DL: deep belief nets, sparse coding, early ANNs, etc. I realized that sparsity was the key to brain-level efficiency, so I created a training framework and began experimenting with various sparse learning algorithms (mostly inspired by sparse coding). I also began iterating on the core problem of efficient sparse ANN GPU algorithms (matrix multiplication and related ops), eventually arriving at a design for a novel matrix multiplication algorithm I ended up calling adaptive sparsity, which – for many interesting, but not all, distributions – is asymptotically faster than standard sparse algorithms, which all run in time a·b·M·K·N (where a and b are the nonzero fractions of the two input matrices, and M, K, N are the matrix dimensions). This eventually led to an early demo of a large sparse CNN that was roughly 10x faster in actual practice than theoretical dense max throughput, which was about 100x or more faster than SOTA sparse algorithms. However, it was only the forward pass, using plausible ‘fake’ weight data from Gabor patterns. A full training pipeline with all required ops is vastly more work, not to mention that fully sparse training of fully sparse nets was still an open research problem.
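That cost model can be restated as a quick sketch (the function name and example sizes here are illustrative, not from the original codebase):

```python
def expected_sparse_matmul_flops(a: float, b: float, M: int, K: int, N: int) -> float:
    """Expected multiply-accumulates for C[M,N] = A[M,K] @ B[K,N] under a
    standard sparse algorithm, where a and b are the nonzero fractions of
    A and B. Each of the M*K*N candidate products is needed only when both
    operands are nonzero, which happens with probability a*b (assuming
    independently placed nonzeros)."""
    return a * b * M * K * N

# At 10% density in both operands, the standard sparse cost is already
# ~100x below the dense cost; adaptive sparsity aims to beat this bound
# for favorable distributions.
dense = expected_sparse_matmul_flops(1.0, 1.0, 4096, 4096, 4096)
sparse = expected_sparse_matmul_flops(0.1, 0.1, 4096, 4096, 4096)
print(f"{dense / sparse:.0f}x fewer flops")  # → "100x fewer flops"
```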
In 2015 I wrote a LessWrong blog post – The Brain as a Universal Learning Machine – summarizing the interesting new developments in computational neuroscience and the fledgling field of DL. I wrote it ostensibly as a critique of evolved modularity – which seemed to be the default viewpoint of Hanson/Yudkowsky/MIRI/LW at the time – but it was more broadly a critique of the larger formalist viewpoint in AI (i.e. agent foundations), also strongly correlated/associated with MIRI/LW/etc. It was obvious to me then that the connectionists had already decisively won against the formalists; it just wasn’t common knowledge yet. Now, six years later, it is.
By 2016 I had recruited a cofounder (thanks to that post) and incorporated Vast.ai to continue the adaptive sparsity research. Eventually I found an approach – named costreg – for training under adaptive sparsity which could find networks that used roughly 100x fewer flops at nearly the same classification performance. I also finally developed our sparse tensor library into a mostly complete replacement for the equivalent cuDNN+PyTorch stack. Then in 2017 Nvidia announced their new Volta architecture with tensorcores. My best fully sparse codes ran at about 1/10th the flop throughput of dense codes, which thus gave about a 10x net performance win from the 100x flop reduction of adaptive sparsity. Now, with tensorcores, Nvidia had given dense codes a 10x boost, closing the lead for all but very sparse matrices.
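The performance arithmetic here reduces to a few lines (a back-of-envelope sketch using the rough figures above, not measurements):

```python
# Back-of-envelope using the approximate figures from the text.
flop_reduction = 100   # costreg + adaptive sparsity: ~100x fewer flops
sparse_slowdown = 10   # sparse kernels reach ~1/10th of dense flop throughput

net_speedup_pre_volta = flop_reduction / sparse_slowdown
print(net_speedup_pre_volta)   # 10.0 – roughly a 10x net win over dense codes

tensorcore_boost = 10  # Volta tensorcores: ~10x boost to dense throughput
net_speedup_post_volta = net_speedup_pre_volta / tensorcore_boost
print(net_speedup_post_volta)  # 1.0 – the lead closes for all but very sparse nets
```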
In 2018 we pivoted to a different performance lever of a more economic nature – creating the current Vast.ai marketplace to exploit Nvidia’s GPU lead and market segmentation (consumer GPUs provide about 5x better perf/$ vs their very similar but heavily marked-up enterprise variants). Along the way I also got a bit more involved with crypto, and the design of Orchid in particular.
As of 2021, three years later, both Vast and crypto are doing quite well. I have figured out how to map adaptive sparsity onto tensorcores (or other small dense systolic array designs), and more generally I have an emerging vision of how to create benevolent universal learning machines. Interested in helping or connecting? I’m also immediately hiring CUDA/C++ programmers.
-Jake Cannell 11/29/2021