How efficient is the brain’s wiring architecture and learning algorithms?
The human brain is our only current working example of high intelligence, so understanding its efficiency is critical for forecasting the future of AI.
Efficiency here refers solely to algorithmic and circuit efficiency. As one learns and acquires knowledge, one's brain self-organizes into a more efficient circuit for the task at hand. There are thus two components of efficiency to measure: the task efficiency of a trained expert brain, and the learning algorithm itself, which optimizes the network towards that final attractor.
Consider that the brain runs at a clock rate that is somewhere less than a single, measly kilohertz.
Now combine that with the knowledge that we have hit something of a fundamental power- and heat-limited clock rate wall with current CMOS technology. Moore's law is continuing, but that just means transistor density alone is increasing exponentially, not clock rate. The human cortex has somewhere around 100 to 1,000 trillion transistor equivalents – a great deal, but not unreachable – equivalent to ten to a hundred thousand current CMOS chips (remember the cortex has the area of several entire CMOS wafers – from which hundreds of chips are cut – and it has several layers of thickness), or a few hundred modern hard drives.
If you could arrange those chips in a cortical design, you could in theory run them a million times accelerated. And if at some point in the future we reach terahertz speeds, that will just help the cortical architecture even more, because then it could run a billion times accelerated. The unintuitive, contrarian observation is that the tremendously slow speed of neurons implies a correspondingly tremendous efficiency of circuit-level organization.
This is interesting in itself, but more importantly it sets a bound on the performance of any other potential AGI architecture.
AGI Parallel Speed Scalability Observation: Consider an AGI algorithm that can match human intelligence in all measures, running on a computational system of N total circuits at clock rate R. If R and the substrate physics are fixed but N increases toward infinity (i.e. more and more parallel circuits put together), the potential speedup of the system will scale very weakly because of the constraints of physics. Increasing N increases the size of the system, and thus the physical path lengths for circuit communication, at any fixed R. For any such algorithm on a particular physical substrate, there is an eventual asymptotic wall where increasing N cannot improve the performance of the system any further (and in fact may degrade it). On the other hand, increasing R (up to the limits set by the speed of light) always yields a strict linear performance increase. Thus the maximum intrinsic speed scalability of an AGI algorithm (its maximum potential speedup) is determined by how small R is. The brain's network is an AGI algorithm running on hardware with N ~10^15 and R ~10^3. Current computer systems have N in the range of 10^10–10^12 for home PCs and up to 10^14–10^17 for supercomputers, but R in all cases is around 10^9. Thus the brain's particular circuit algorithm for AGI has a massive intrinsic scalability of 10^6 (the ratio 10^9/10^3 of achievable clock rate to native clock rate). Any other AGI algorithm with similar scalability (the potential to think a million times accelerated) will obviously have to first run in real time on a parallel machine with R around 1000 – a kilohertz machine.
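The arithmetic behind this observation can be sketched in a few lines (a toy calculation using the round figures quoted above, nothing more):

```python
# Intrinsic speed scalability: if an AGI algorithm runs in real time at
# clock rate R on some substrate, and the substrate family supports a
# maximum clock rate R_max, the potential serial speedup is R_max / R.
def intrinsic_scalability(r_realtime_hz, r_max_hz):
    return r_max_hz / r_realtime_hz

# The brain's cortical algorithm: real time at ~1 kHz; ported to ~1 GHz
# CMOS it has a million-fold speedup potential.
brain_speedup = intrinsic_scalability(1e3, 1e9)

# An algorithm that already needs ~1 GHz to run in real time has no headroom.
cpu_native_speedup = intrinsic_scalability(1e9, 1e9)

print(brain_speedup, cpu_native_speedup)
```

This is just the 10^9/10^3 = 10^6 ratio from the text made explicit; the function name is of course an illustrative stand-in, not established terminology.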
So even if there were another general AGI algorithm, quite different from the flexible cortical meta-algorithm the brain uses, the brain's meta-algorithm has a rather mind-boggling potential speedup advantage.
It turns out there is a pretty easy route to placing bounds on the brain's efficiency: we can compare the effectiveness of a trained human expert on a particular task or problem, such as a game, to the best known computer algorithms. Why is this useful? In computer science, the best known specialized algorithm is the best a general learning agent can ever hope to match, since the learner pays a high cost for its generalization capability and learning adaptation. The best specialized algorithms for a given problem will strictly beat a general learning agent while using considerably less resources – and this is guaranteed in all cases where the problem is solved or nearly solved. Checkers has been more or less solved, as it has a relatively small possibility space and low branching factor. Chess is more complex, but the general minimax pruning strategies used now are believed to be optimal, or reasonably close. The Deep Blue vs Kasparov match was a turning point for AI, but a decade later a 2010 commodity PC running the best chess algorithms can play at the grandmaster level. Thus in this problem domain, an optimally configured 2010 PC is equivalent to the best trained human mind. This gives us some form of a strict lower bound on the efficiency of the brain, for no general AI program running on the same PC can perform remotely close to the best known chess algorithms. So we can say with some confidence that there is no general AI algorithm that could ever reach human-level performance on a 2010 PC – it will require strictly more power to match the brain. But how much?
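The minimax pruning strategies mentioned above boil down to something like the following sketch – a generic alpha-beta search over a hypothetical game interface (`children` and `evaluate` are stand-ins for a real engine's move generator and static evaluation function), not any particular engine's code:

```python
# Minimal alpha-beta minimax over an abstract game tree.
def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    kids = children(state)
    if depth == 0 or not kids:          # leaf or depth limit: static eval
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1,
                                         alpha, beta, False,
                                         children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:           # beta cutoff: opponent avoids this line
                break
        return value
    else:
        value = float("inf")
        for child in kids:
            value = min(value, alphabeta(child, depth - 1,
                                         alpha, beta, True,
                                         children, evaluate))
            beta = min(beta, value)
            if beta <= alpha:           # alpha cutoff
                break
        return value
```

The cutoffs are what make deep search tractable on serial hardware: whole subtrees are discarded the moment they are provably worse than an already-explored line.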
The strength of the brain is in its large memory capacity, deep pattern recognition, and probabilistic simulation (aka reasoning with uncertainty). The brain's conscious imperative thinking mode (check this position, now this, etc.) is essentially single-threaded and can struggle to (consciously) evaluate even one position per second. But with a 'clock rate' of roughly only 100–1000 Hz, it's no surprise that the brain doesn't consciously evaluate board positions very quickly. Under the hood, the declarative thoughts you are consciously experiencing are the tip of a vast subconscious pyramid of competing ideas and possibilities, but only the winner at any moment gets to direct the brain and store itself into our limited short-term memory – a critical tool for chess. Every time the chess player consciously picks another branching move to evaluate, this filters down the vast iceberg of cortical memory, where a much larger space of next potential moves is evaluated and cross-checked in parallel against probabilistic memories which intuit their strength. All this is going on below your conscious level of awareness – you are only ever conscious of the winning move that filters up to the top and becomes the next action. The brain thus uses its massive parallel capability and deep memory hierarchy to offset its low speed.
In terms of raw space-time complexity, the desktop CPU has on the order of a billion transistors which cycle around a billion times a second, for around 10^18 total transistor*cycles per second. The brain has at most a quadrillion synapses that cycle at most 1,000 Hz, so it has around 10^18 synapse*cycles per second as an upper limit (and this frankly is generous). They use roughly comparable energy (the brain draws power closer to that of a laptop). Thus in terms of just raw computational throughput, they are vaguely similar. The brain's massive size and slow speed exact a high parallelization penalty – only a rather small number of dependent operations can possibly cycle through the slow network per second – and this severely limits the maximum possible board depth evaluations per second. The brain's strength is its large amount of potential memory accessible for computation. Compared to plausible parallel chess algorithms that could run on the same slow, massively parallel hardware, the brain seems to fare quite well indeed. If you re-read the previous paragraph about what goes on under the hood of a chess player's brain as describing a chess algorithm for a massively distributed computer, it sounds like a fairly good algorithm given the computational constraints.
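The throughput comparison in this paragraph can be checked directly (a trivial sketch using the text's own round figures):

```python
# Back-of-envelope space-time throughput comparison.
cpu_transistors = 1e9    # ~a billion transistors in a 2010 desktop CPU
cpu_clock_hz    = 1e9    # ~1 GHz-class clock
brain_synapses  = 1e15   # generous upper limit: a quadrillion synapses
brain_clock_hz  = 1e3    # ~1 kHz maximum update rate

cpu_throughput   = cpu_transistors * cpu_clock_hz   # transistor*cycles/sec
brain_throughput = brain_synapses * brain_clock_hz  # synapse*cycles/sec

print(cpu_throughput, brain_throughput)  # both land at ~1e18
```

Same raw budget, spent very differently: the CPU spends it on serial depth, the brain on parallel breadth and memory.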
Another interesting route of comparison is to point out that the desktop chess program could be vastly improved by replacing the CPU simulating the optimal chess algorithm with specialized hardware which directly emulates it. This would give a performance speedup of roughly 100 to 1000x. The CPU pays a large price for its generalization, and indeed the Deep Blue system which beat Garry Kasparov made use of such specialized chess logic. Encoding an algorithm directly into the hardware, instead of the software, makes the system useless for any other tasks, but it has orders-of-magnitude advantages in speed, power, and transistor real estate. We can make an analogy to the brain: simple lower animal brains have more space-efficient hardwired circuitry, but lack the flexibility to learn during life. Large mammal brains like ours use up all that extra space to pay for the cost of generalized learning – our equivalent of software flexibility.
To continue this line of reasoning, we could imagine equating the brain's synapses to transistors (which is reasonable) and then look at the possible circuit equivalents we could build, analyzing the problem from the angle of circuit complexity theory. We earlier used the figure of a billion transistors for a 2010 desktop CPU. Within the same circuit space as the brain with its 10^14–10^15 slow transistors, we could build a weird supercomputer composed of 100,000 to a million desktop CPUs, but with each CPU running at the incredibly slow speed of 200 to 1000 Hz. Surprisingly, this system would then have raw aggregate compute performance similar to our simplified desktop: on the order of a billion low-level operations per second (1 million CPUs * 1000 Hz). Curiously, to make this comparison more concrete, the brain's cortex is built up out of roughly a million cortical columns, each built out of less than a million neurons and well under a billion synapses – making a column a rough equivalent to a CPU in terms of circuit complexity (although far slower, of course).
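A quick sketch of the thought experiment's numbers (using the upper-end figures from the text):

```python
# The 'weird supercomputer': pack billion-transistor CPUs into the brain's
# transistor budget and run them at neural clock rates.
brain_transistor_equiv = 1e15   # upper-end figure used in the text
cpu_transistors        = 1e9    # 2010 desktop CPU
slow_clock_hz          = 1e3    # ~1 kHz, the neural regime

n_cpus        = brain_transistor_equiv / cpu_transistors  # ~a million CPUs
aggregate_ops = n_cpus * slow_clock_hz                    # ~1e9 ops/sec

print(n_cpus, aggregate_ops)
```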
The remarkable thing is that the brain – through learning alone – can automatically reconfigure this large network of very slow 'cortical CPUs' to achieve performance similar to the optimal (best known) algorithm running on a general CPU system of similar total circuit performance, even while solving the harder (inherently slower) parallel version of the problem. But remember, the optimal chess circuit would not be a general CPU simulating the algorithm in code – it would emulate the algorithm directly in the circuit. From the earlier analysis, a specialized chess processor built within the same billion-transistor limit could probably match the CPU's performance at 1/100th to 1/1000th of the clock rate – it would only need to run in the low MHz range.
Put another way, the brain's learning algorithm is extremely – almost magically – efficient: with enough training, it can configure its circuit space to match the performance of a circuit-equivalent number of 2010 CPUs running the best known (and probably near-optimal) chess algorithm. This allows us to put another bound on the brain's learning capability: in the chess problem domain, a trained master's brain has a circuit wiring efficiency within a factor of X of the optimal chess algorithm circuit, where X (around 1000 or so) is the efficiency difference between CPU simulation and hardware emulation. When you consider that the same chess master's brain can also converse in human languages, recognize a vast space of visual patterns, walk, drive moving vehicles, and do countless other advanced robotic tasks, the cortical learning meta-algorithm comes out very well indeed – clearly the chess knowledge represents only a portion of its total capability.
As we move up the complexity ladder to more complex games such as Go, the brain's particular strengths take over versus current CPUs. More than a decade after computers conquered chess, they are just reaching high-amateur or low-professional play in Go, and seem on track to surpass humans at this game in another 5–10 years. Go is complex enough that board positions escape any simple numerical evaluation, and it has a much higher branching factor than chess. For these reasons it is much better suited to the brain's deep memory and pattern recognition abilities. As we move from Go to the real world, the branching factor escalates to a near-uncountable degree, and the brain comes into its own element. It is a highly evolved general intelligence agent endowed with a learning algorithm tuned to a fairly general environment: the real world. However, unlike with chess, we can't yet draw the same conclusions about Go, because we probably haven't explored the algorithm space for Go as thoroughly. Once we narrow in on the optimal Go algorithms (or their practical equivalents), we can use this as another benchmark against the brain.
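The branching-factor gap can be made concrete with a toy calculation. The ~35 and ~250 average branching factors below are standard rough figures for chess and Go respectively, not from the text:

```python
# A game tree of branching factor b searched to depth d holds ~b^d positions.
def tree_size(branching_factor, depth):
    return branching_factor ** depth

chess_6ply = tree_size(35, 6)    # ~1.8e9 positions
go_6ply    = tree_size(250, 6)   # ~2.4e14 positions, >100,000x larger

print(chess_6ply, go_6ply)
```

At equal depth, Go's tree is already five orders of magnitude larger, which is why brute-force search stalls and pattern-driven evaluation dominates.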
So, in summary: current AI research in deep learning and probabilistic planning is making good progress along algorithmic directions inspired by the brain's cortical circuits. These cortical circuits appear to be evolutionarily optimized direct hardware implementations of those same network learning algorithms (which is rather trivially true – the circuits were the inspiration for the algorithms!), and thus we can conclude that any future general CPU system running software simulations of these algorithms is going to be at a tremendous disadvantage.
Thus it's pretty safe to conclude that the likely route to human-equivalent AI (a general intelligence that can train to human-level performance or better on a wide variety of real-world tasks) is to reverse engineer the human brain. Not only does this lead clearly to strong superintelligence (just by massively speeding it up), but it also sets a strict scalability benchmark for any other potential superintelligence system.