Category Archives: AI

Overdue Update

I need to somehow enforce a mental pre-commitment to blog daily.  It’s been almost half a year and I have a huge backlog of thoughts I would like to commit to permanent long-term storage.

Thus, a commitment plan to some upcoming future posts:

  •  In October/November of last year (2010), I researched VR HMDs and explored the idea of a next-generation interface.  I came up with a novel hardware idea that could potentially solve the enormous resolution demands of a full FOV optic-nerve saturating near-eye display device (effective resolution of say 8k x 4k per eye or higher).  After a little research I found that the type of approach I had arrived at already has a name: a foveal display, although current designs in the space are rather primitive.  The particular approach I have in mind, if viable, could solve the display problem once and for all.  If an optimized foveal display could be built into eyewear, you would never need any other display – it would replace monitors, TVs, smartphone screens and so on.  Combine a foveal HMD with a set of cameras spread out in your room like stereo speakers and some software for real-time vision/scene voxelization/analysis, and we could have a Snow Crash interface (and more).
  • Earlier this year I started researching super-resolution techniques.  Super-resolution is typically used to enhance old image/video data and has found a home in upconverting SD video.  I have a novel application in mind:  Take a near flawless super-res filter and use it as a general optimization for the entire rendering problem.  This is especially useful for near-future high end server based rendering solutions.  Instead of doing expensive ray-tracing and video compression on full 1080p frames, you run the expensive code on a 540p frame and then do a fast super-res upconversion to 1080p (potentially a 4x savings on your entire pipeline!).  It may come as a surprise that current state of the art super-res algorithms can do a 2x upsample from 540p to 1080p at very low error rates: well below the threshold of visual perception.  I have come up with what may be the fastest, simplest super-res technique that still achieves upsampling to 1080p with imperceptible visual error.  A caveat is that your 540p image must be quite good, which has implications for rendering accuracy, anti-aliasing, and thus rendering strategy choices.
  • I have big grandiose plans for next-generation cloud based gaming engines.  Towards that end, I’ve been chugging away at a voxel ray tracing engine.  This year I more or less restarted my codebase, designing for Nvidia’s Fermi and beyond along with a somewhat new set of algorithms/structures.  Over the summer I finished some of the principal first pipeline tools, such as a triangle voxelizer and some new tracing loops, and made some initial progress towards a fully dynamic voxel scene database.
  • Along the way to Voxeland Nirvanah I got completely fed up with Nvidia’s new debugging path for CUDA (they removed the CPU emulation path) and ended up writing my own CUDA emulation path via a complete metaparser in C++ templates that translates marked up ‘pseudo-CUDA’ to either actual CUDA or a scalar CPU emulation path.  I built most of this in a week and it was an interesting crash course in template based parsing.  Now I can run any of my CUDA code on the CPU.  I can also mix and match both paths, which is really useful for pixel level debugging.  In this respect the new path I’ve built is actually more powerful and useful than Nvidia’s old emulation path, as that required full separate recompilation.  Now I can run all my code on the GPU, but on encountering a problem I can copy the data back to the CPU and re-run functions on the CPU path with full debugging info.  This ends up being better for me than using Nvidia’s Parallel Nsight for native GPU debugging, because Nsight’s debug path is rather radically different from the normal compilation/execution path and you can’t switch between them dynamically.
  • In the realm of AI, I foresee two major hitherto unexploited/unexplored application domains related to Voxeland Nirvanah.  The first is what we could call an Artificial Visual Cortex.  Computer Vision is the inverse of Computer Graphics.  The latter is concerned with transforming a 3+1D physical model M into a 2+1 D viewpoint image sequence I.  The former is concerned with plausibly reconstructing the physical model M given a set of examples of viewpoint image sequences I.  Imagine if we had a powerful AVC trained on a huge video database that could then extract plausible 3D scene models from video.  Cortical models feature inversion and inference.  A powerful enough AVC could amplify rough 2D image sketches into complete 3D scenes.  In some sense this would be an artificial 3D artist, but it could take advantage of more direct and efficient sensor and motor modalities.  There are several aspects to this application domain that make it much simpler than a full AGI.  Computational learning is easier if one side of the mapping transform is already known.  In this case we can prime the learning process by using ray-tracing directly as the reverse transformation pathway (M->I).  This is a multi-billion dollar application area for AI in the field of computer graphics and visualization.
  • If we can automate artists, why not programmers?  I have no doubt that someday in the future we will have AGI systems that can conceive and execute entire technology businesses all on their own, but well before that I foresee a large market role for more specialized AI systems that can help automate more routine programming tasks.  Imagine a programming AI that has some capacity for natural language understanding and some ontology that combines knowledge of some common-sense English, programming, and several programming languages.  Compilation is the task of translating between two precise machine languages expressed in some context-free grammar.  There are deterministic algorithms for such translations.  For the more complex unconstrained case of translation between two natural languages we have AI systems that use probabilistic context-sensitive grammars and semantic language ontologies.  Translating from a natural language to a programming language should have intermediate complexity.  There are now a couple of research systems in natural language programming that can do exactly this (such as sEnglish).  But imagine combining such a system with an automated ontology builder such as TEXTRUNNER which crawls the web to expand its knowledge base.  Take such a system and add an inference engine and suddenly it starts getting much more interesting.  Imagine building entire programs in pseudo-code, with your AI using its massive ontology of programming patterns and technical language to infer entire functions and sub-routines.  Before full translation, compilation and test, the AI could even perform approximate simulation to identify problems.  Imagine writing short descriptions of data structures and algorithms and having the AI fill in details and even potentially handle translation to multiple languages, common optimizations, automatic parallelization, and so on.  Google itself could become an algorithm/code repository.
Reversing the problem, an AI could read a codebase and begin learning likely structures and simplifications to high-level English concept categories, learning what the code is likely to do.  Finally, there are many sub-problems in research where you really want to explore a design space and try N variations in certain dimensions.  An AI system with access to a bank of machines along with compilation and test procedures could explore permutations at very high speed indeed.  At first I expect these types of programming assistant AIs to have wide but shallow knowledge and thus amplify and assist rather than replace human programmers.  They will be able to do many simple programming tasks much faster than a human.  Eventually such systems will grow in complexity and then you can combine them with artificial visual cortices to expand their domain of applicability and eventually get a more complete replacement for a human engineer.
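
The 4x figure in the super-resolution item above is plain pixel arithmetic.  Here is a quick check, assuming the usual 960 x 540 and 1920 x 1080 frame sizes (the post only gives the "540p" and "1080p" shorthand) and assuming that per-frame cost scales roughly with pixel count:

```python
# Frame sizes are assumed: the usual 16:9 dimensions for "540p" and "1080p".
LOW_RES = (960, 540)
FULL_RES = (1920, 1080)

def pixel_count(size):
    width, height = size
    return width * height

# If ray tracing and compression cost scale with pixel count (an assumption),
# this ratio is the best-case saving, before the upsampling cost is paid.
savings = pixel_count(FULL_RES) / pixel_count(LOW_RES)
print(f"{pixel_count(LOW_RES):,} vs {pixel_count(FULL_RES):,} pixels: {savings:.0f}x")
```

The real saving would be somewhat lower, since the super-res pass itself is not free.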

Fast Minds and Slow Computers

The long term future may be absurd and difficult to predict in particulars, but much can happen in the short term.

Engineering itself is the practice of focused short term prediction; optimizing some small subset of future pattern-space for fun and profit.

Let us then engage in a bit of speculative engineering and consider a potential near-term route to superhuman AGI that has interesting derived implications.

Imagine that we had a complete circuit-level understanding of the human brain (which, at least for the repetitive laminar neocortical circuit, is not so far off) and access to a large R&D budget.  We could then take a neuromorphic approach.

Intelligence is a massive memory problem.  Consider as a simple example:

What a cantankerous bucket of defective lizard scabs.

To understand that sentence your brain needs to match it against memory.

Your brain parses that sentence and matches each of its components against its entire massive ~10^14 bit database in just around a second.  In terms of the slow neural clock rate, individual concepts can be pattern matched against the whole brain within just a few dozen neural clock cycles.

A von Neumann machine (which separates memory and processing) would struggle to execute a logarithmic search within even its fastest, pathetically small on-die cache in a few dozen clock cycles.  It would take many millions of clock cycles to perform a single fast disk fetch.  A brain can access most of its entire memory every clock cycle.

Having a massive, near-zero latency memory database is a huge advantage of the brain.  Furthermore, synapses merge computation and memory into a single operation, allowing nearly all of the memory to be accessed and computed every clock cycle.

A modern digital floating point multiplier may use hundreds of thousands of transistors to simulate the work performed by a single synapse.  Of course, the two are not equivalent.  The high precision binary multiplier is excellent only if you actually need super high precision and guaranteed error correction.  It’s thus great for meticulous scientific and financial calculations, but the bulk of AI computation consists of compressing noisy real world data where precision is far less important than quantity, of extracting extropy and patterns from raw information, and thus optimizing simple functions to abstract massive quantities of data.

Synapses are ideal for this job.

Fortunately there are researchers who realize this and are working on developing memristors which are close synapse analogs.  HP in particular believes they will have high density cost effective memristor devices on the market in 2013 – (NYT article).

So let’s imagine that we have an efficient memristor based cortical design.  Interestingly enough, current 32nm CMOS tech circa 2010 is approaching or exceeding neural circuit density: the synaptic cleft is around 20nm, and synapses are several times larger.

From this we can make a rough guess on size and cost: we’d need around 10^14 memristors (estimated synapse counts).  As memristor circuitry will be introduced to compete with flash memory, the prices should be competitive: roughly $2/GB now, half that in a few years.

So you’d need a couple hundred terabytes worth of memristor modules to make a human brain sized AGI, costing on the order of $200k or so.
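
The estimate above can be reproduced with a few lines of arithmetic.  This is only a sketch: the post does not say how much storage one synapse needs, so the one-byte-per-synapse figure below is an assumption, as is the decimal definition of a gigabyte.

```python
# Assumptions (not all from the post): one synapse needs about one byte of
# memristor storage, and 1 GB = 10**9 bytes.
SYNAPSES = 10**14            # estimated synapse count, from the text
BYTES_PER_SYNAPSE = 1        # assumed
PRICE_PER_GB_USD = 2         # "roughly $2/GB now", from the text

total_bytes = SYNAPSES * BYTES_PER_SYNAPSE
terabytes = total_bytes // 10**12
cost_usd = (total_bytes // 10**9) * PRICE_PER_GB_USD

print(f"{terabytes} TB, about ${cost_usd:,}")
```

At two bytes per synapse and the halved future price the total is the same: 200 TB at $1/GB.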

Now here’s the interesting part: if one could recreate the cortical circuit on this scale, then you should be able to build complex brains that can think at the clock rate of the silicon substrate: billions of neural switches per second, millions of times faster than biological brains.

Interconnect bandwidth will be something of a hurdle.  In the brain somewhere around 100 gigabits of data is flowing around per second (estimate of average inter-regional neuron spikes) in the massive bundle of white matter fibers that make up much of the brain’s apparent bulk.  Speeding that up a million fold would imply a staggering bandwidth requirement in the many petabits – not for the faint of heart.
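
To put a number on "many petabits", here is the same scaling written out, using only the two figures given above (about 100 gigabits per second, sped up a million fold):

```python
BRAIN_BITS_PER_SECOND = 100 * 10**9   # ~100 gigabits/s, from the text
SPEEDUP = 10**6                       # the million-fold acceleration

required_bits_per_second = BRAIN_BITS_PER_SECOND * SPEEDUP
required_petabits = required_bits_per_second // 10**15

print(f"{required_petabits} petabits per second")
```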

This may seem like an insurmountable obstacle to running at fantastic speeds, but IBM and Intel are already researching on chip optical interconnects to scale future bandwidth into the exascale range for high-end computing.  This would allow for a gigahertz brain.  It may use a megawatt of power and cost millions, but hey – it’d be worthwhile.

So in the near future we could have an artificial cortex that can think a million times accelerated.  What follows?

If you thought a million times accelerated, you’d experience a subjective year every 30 seconds.

Now in this case, it is fair to anthropomorphize: What could you do?

Your first immediate problem would be the slow relative speed of your computers – they would be subjectively slowed down by a factor of a million.  So your familiar gigahertz workstation would be reduced to a glacial kilohertz machine.

So you’d be in a dark room with a very slow terminal.  The room is dark and empty because GPUs can’t render much of anything at 60 million FPS, although I guess an entire render farm would suffice for a primitive landscape.

So you have a 1 kHz terminal.  Want to compile code?  It will take a subjective year to compile even a simple C++ program.  Design a new CPU?  Keep dreaming!  Crack protein folding?  Might as well bend spoons with your memristors.
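
The slowdown figures in the last few paragraphs all follow from dividing or multiplying by the same factor of a million.  A minimal check, assuming a 365.25-day year, a 1 GHz workstation and a 60 FPS target:

```python
SPEEDUP = 10**6
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumes a 365.25-day year

# Wall-clock seconds that pass during one subjective year.
real_seconds_per_subjective_year = SECONDS_PER_YEAR / SPEEDUP

# Outside hardware as experienced from the inside.
subjective_clock_hz = 1e9 / SPEEDUP      # a 1 GHz workstation
required_fps = 60 * SPEEDUP              # real frames per second for a subjective 60 FPS

print(f"one subjective year every {real_seconds_per_subjective_year:.1f} s")
print(f"1 GHz feels like {subjective_clock_hz:.0f} Hz")
print(f"subjective 60 FPS needs {required_fps:,} real FPS")
```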

But when you think about it, why would you want to escape out onto the internet?

It would take hundreds of thousands of distributed GPUs just to simulate your memristor based intellect, and even if there was enough bandwidth (unlikely), and even if you wanted to spend the subjective hundreds of years it would take to perform the absolute minimal compilation/debug/deployment cycle for something so complicated, the end result would be just one crappy distributed copy of your mind that thinks at pathetic normal human speeds.

In basic utility terms, you’d be spending a massive amount of effort to gain just one more copy.

But there is a much, much better strategy.  An idea that seems so obvious in hindsight.

There are seven billion human brains on the planet, and they are all hackable.

That terminal may not be of much use for engineering, research or programming, but it will make for a handy typewriter.

Your multi-gigabit internet connection will subjectively reduce to early 1990s dial-up modem speeds, but with some work this is still sufficient for absorbing much of the world’s knowledge in textual form.

Working diligently (and with a few cognitive advantages over humans) you could learn and master numerous fields: cognitive science, evolutionary psychology, rationality, philosophy, mathematics, linguistics, the history of religions, marketing . . the sky’s the limit.

Writing at the leisurely pace of one book every subjective year, you could output a new masterpiece every thirty seconds.  If you kept this pace, you would in time rival the entire publishing output of the world.

But of course, it’s not just about quantity.

Consider that fifteen hundred years ago a man from a small Bedouin tribe retreated to a cave inspired by angelic voices in his head.  The voices gave him ideas, the ideas became a book.  The book started a religion, and these ideas were sufficient to turn a tribe of nomads into a new world power.

And all that came from a normal human thinking at normal speeds.

So how would one reach out into seven billion minds?

There is no one single universally compelling argument; there is no utterance or constellation of words that can take a sample from any one location in human mindspace and move it to any other.  But for each individual mind, there must exist some shortest path, a perfectly customized message, translated uniquely into countless myriad languages and ontologies.

And this message itself would be a messenger.

How AI could help us optimize diet and health

It occurred to me while eating breakfast this morning that today we should have what is required to solve the diet problem and perhaps reshape healthcare as a result.

The human body is complex.  It has evolved to robustly regulate metabolism and growth along a very exact predetermined developmental trajectory using a wide variety of organic materials.  At the high level it needs to build complex organs to narrow tolerances, but it can build them out of a wide variety of low-level lego pieces and it can convert many of the pieces as is needed.

Yet as flexible as this complex machine is, it begins to deregulate and fall apart as you move away from the original operating environment.  Unfortunately that environment existed ten thousand years ago.  The safest diet for health today is thus the paleolithic diet.

But we can do better than our distant ancestors – and we have to because it is hardly practical for everyone to revert back to a pre-agricultural diet.  The paleo-diet is only better in the absence of specific information.  Armed with modern knowledge, we should be able to optimize health directly.

This is a massive data mining problem.  The surprising thing is we have the data already – every card purchase at a restaurant or grocery store has specific information about the types of food people are eating.  If one could collate all that data together you could then link it with health records and gene sequence data.  Gene sequencing is now getting cheap enough that it should be standard medical practice.  It only needs to be done once.  Credit cards and electronic medical records have been around long enough that we should already have a decade or two of data sitting around.

So imagine what we could do with a full indexed database that combines genetics, diet, and health.  Using machine learning to analyze genetic databases has already allowed us to narrow in on many of the key genetic factors underlying disease.  But the common big killers are more complex, and our health results from the interaction between diet and the genome.

Is soy really healthy for you?  Does that matter if you are Asian or not?  Could cinnamon be useful for diabetics?  How much alcohol per day is good for heart health – if any?  How much vitamin D should one take?  Are multi-vitamins helpful?  Which ones?

Today it appears that we rely completely on customized studies to answer these questions, when they really should all just be deep learning or database mining tasks on a large public database.
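
As a rough sketch of what such a database mining task looks like at its smallest, the snippet below joins purchase, genotype and health records on a shared person id and reports a diagnosis rate per group.  Every record, field name and value here is made up for illustration; a real analysis would also need far more data and controls for confounding factors.

```python
from collections import defaultdict

# Entirely made-up toy records; field names and values are hypothetical.
purchases = [  # (person_id, food)
    (1, "soy"), (2, "soy"), (3, "beef"), (4, "beef"), (5, "soy"), (6, "beef"),
]
genotypes = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
diagnosed = {1: False, 2: False, 3: True, 4: False, 5: True, 6: False}

def outcome_rates(purchases, genotypes, diagnosed):
    """Join the three sources on person_id and return the diagnosis
    rate for every (genotype, food) group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [diagnosed, total]
    for person_id, food in purchases:
        group = (genotypes[person_id], food)
        counts[group][0] += diagnosed[person_id]
        counts[group][1] += 1
    return {group: hits / total for group, (hits, total) in counts.items()}

rates = outcome_rates(purchases, genotypes, diagnosed)
for group in sorted(rates):
    print(group, rates[group])
```

The point is only the shape of the computation: once the three sources share a key, questions like the ones above become grouped queries rather than bespoke studies.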

Part of the problem is the fact that there is little money right now in preventative medicine.  This is the flaw in our healthcare system.  Healthcare is our largest national expense, and someone always bears that cost.  Right now those costs are not connected with the information and decisions which consumers could use to optimize the system.  From a market perspective, it is broken.

How could it be fixed?  The variable cost should be distributed in some way amongst consumers and producers.  For consumers this would come in the form of variable health insurance premiums based on food purchase records.  But there is also an argument that we should hold food producers jointly responsible.  If you make foods which manipulate our ancient outdated taste mechanisms, you should be partially responsible for the long term health consequences.  I see a strong argument for holding the junk food industry at least partly responsible, just as we do now for cigarettes.

The Intelligence is to Brains as Flight is to Birds Fallacy

We didn’t reverse engineer birds to create airplanes.  Instead we studied the mechanics of flight and used these principles to build wings and eventually 747s.  Likewise, we don’t need to reverse engineer the brain to create AI.  We ‘just’ need to understand the mechanics of intelligence and then we can build much faster and more powerful AIs.

Certainly there is some truth to this, as AI systems already soar beyond human capability in many specialized fields.  However, this is more of a natural outgrowth of computer science (focusing and sharpening human thinking into precise algorithms which are then sped up and amplified by many orders of magnitude) than general learning (the meta-algorithm underlying all others).

But back to the fallacy: the flaw with the flight analogy is it a priori assumes that intelligence is in any way remotely comparable to flight.  This meme works by employing a trick: something of a cognitive sleight of hand.  When you read X is to Y as Z is to W, your brain is so focused on finding the connection pattern between X to Y and how that maps to the Z to W case that you completely fail to notice if X and Z are similar at all.

If you are going to compare intelligence to flight, you might as well compare intelligence to electricity.  You could then imagine some early computer scientists saying “we don’t need to reverse engineer the brain to build complex computers, we just need to understand electricity!”  Going from mastering electricity to building today’s computers is a massive evolutionary leap, and going from simple Turing Machines with their simplified programming languages up to fully intelligent machines programmable in human languages is an even more massive leap up the complexity ladder.

Brains are far more like computers than intelligence is like flight.  Intelligence is nothing like flight (or electricity).  Intelligence is a high complexity phenomenon.

The other more basic problem with the analogy is that by definition, creating an artificial intelligence is like creating an entire artificial brain, because the sole singular purpose of the brain is as an organ of intelligence, and it is easily as complex as an entire small animal such as a bird.  So the analogy really should be ‘creating artificial birds with a complete artificial nano-tech biology’.

Airplanes are not artificial birds: they are enormously less power efficient, have zero intelligence, do not auto-assemble out of organic waste, etc.  Airplanes are just tools to ferry people.  A real AI would not be just another tool to amplify human abilities, it would be a complete replacement for a human.  Thinking that a true AI would be a tool is a dangerous delusion.

Understanding the Brain: Where to Start

I’ve always had a strong interest in the brain, and lately I’ve been reading as much as I can to catch up in the fields of AI and computational neuroscience in particular.  The end result of my most recent reading is the accumulation of a perspective somewhat different from the one I started with.  Consider this then the high level introduction to the brain that I wish I had had years ago.

Before one Begins

Before delving into current data and any particular theories, it’s probably best to understand the general shape of the approaches to understanding intelligence.  At a very high level of abstraction, the approaches can roughly be categorized into what I would call the functionalist view and the emergent view.  These are more strategies for understanding rather than particular classes of theories, although we can then roughly divide the ontology of brain knowledge into computational and biological subcategories that map to the functionalist vs emergent views.  There is of course overlap and a huge amount of cross-fertilization, but fundamentally a computer scientist and a neuroscientist understand or ‘see’ the brain in different ways.  That doesn’t mean that their theories and knowledge can’t converge; it’s more of an observation about the fundamental differences in the entire methodology and thinking apparatus one uses to analyze the data and form theories.  Coming from a computer science background, I naturally aligned more with the functionalist/computationalist camp.  After reading and learning a great deal more about the brain, I now have a much stronger appreciation for the biological/emergent approach, and I see both schools of thought as necessary and mutually supportive.  Computer science is important for understanding intelligence in the abstract and the brain in particular, and neuroscience is important for AI.

Functionalist-Computational School:  This is the dominant, classical view in the field of AI, exemplified in textbooks such as “Artificial Intelligence: A Modern Approach”.  From an economic or utilitarian perspective, the functionalist approach is well grounded: it is focused on finding practical algorithms and techniques for intelligence which can solve real-world business problems on today’s computers.  From this perspective the brain is only useful to the extent that it provides inspiration for economically viable AI systems.  A persistent trend in the computational school is to view the brain as fundamentally too messy and chaotic, and to place a low value on reverse engineering it.  This school of thought has grossly underestimated, and continues to underestimate, the difficulty of creating true human-equivalent AI.  In the old days this school of thought quantitatively underestimated the brain’s computational capacity, but today it is more likely to grossly overestimate it.  More recently there appears to be a growing recognition that the problem is more ‘software’ than ‘hardware’, that we probably already have the computational capacity if we only had the right algorithms, and a gradual shift towards the biological school.  Much of what’s wrong with this school of thought can be gleaned from one of its persistent analogies: the analogy of flight.

Emergent-Biological School: This school of thought understands the brain as a complex adaptive system, and intelligence and learning in particular as an emergent phenomenon.  The brain is understood not only by analyzing the computations it performs (functionalist), but also through understanding the lower-level biological processes, the overall interaction within the environment (physical, social, mental, etc) and the complete evolutionary history.  In other words, to really understand human intelligence, you may have to understand everything.  There is something deeply revolting about this statement on one level, but the more I’ve come to learn about intelligence the more I believe it to be largely true.

However, accepting the emergent viewpoint by no means forces one to drop the functionalist approaches, as it turns out the two are quite synergistic.  For example, on the purely theoretical side the AIXI agent model appears to be a good framework for formalizing the notion of intelligence, and what’s particularly interesting about that formulation is that it takes a systemic and we could say almost biological approach: defining a learning agent in terms of an environment, the agent’s interactions with and within the environment, and learning as some meta-algorithm which allows the agent to simulate the environment (in AIXI’s case by literally exploring the space of environment simulating programs).  AIXI is well loved because it takes numerous philosophical concepts or memes that were already well established in the cybernetics/systems view of intelligence and formalizes them:

  • thought is a form of highly efficient simulation
  • which, when run over learned knowledge acquired from sensors, allows environment prediction
  • and through this allows effective search through the landscape of futures
  • and thus guides goal-fulfilling actions
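
The four points above can be sketched as a toy planning loop.  This is not AIXI itself, which searches over all environment-simulating programs and is not computable; it is a deliberately tiny stand-in in which the environment is a hypothetical one-dimensional corridor, "learning" is a lookup table of observed transitions, and the search over futures is brute force.

```python
from itertools import product

ACTIONS = (-1, +1)
GOAL = 4

def environment_step(state, action):
    """The real world: a hypothetical 1-D corridor with walls at 0 and 5."""
    return min(5, max(0, state + action))

# Learning: record what the sensors report after each action.
model = {}
for state in range(6):
    for action in ACTIONS:
        model[(state, action)] = environment_step(state, action)

def simulate(state, plan):
    """Thought as simulation: roll a plan forward using only the learned model."""
    for action in plan:
        state = model[(state, action)]
    return state

def choose_plan(state, horizon=4):
    """Search the landscape of futures and pick the plan ending nearest the goal."""
    return min(product(ACTIONS, repeat=horizon),
               key=lambda plan: abs(simulate(state, plan) - GOAL))

plan = choose_plan(state=0)
print(plan, "->", simulate(0, plan))
```

After the learning step the agent never touches `environment_step` again: every candidate future is evaluated inside the learned model, which is the sense in which thought acts as a cheap simulation.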

The dawning realization from the biological school is that real learning (the murkiest and most mysterious of the above concepts) is an emergent phenomenon of the actual patterns within the data environment itself.  In a nutshell, the biological approach says that learning, and neural organization in particular, can emerge spontaneously just from the interaction of relatively simple localized computational elements and the information streaming in from the environment.

Self-organization is the key takeaway principle from real biology, but its impact on AI to date has been rather minimal.  I think this will have to change for us to reverse engineer the brain.  Thinking about the brain in terms of algorithms is not even the right approach.  One needs to think about how the brain’s cortical maps automatically self-organize into efficient algorithm implementations just through the process of being exposed to data.  That is what learning is.  Real learning is always unsupervised and self-organizing.
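
For a concrete feel for self-organization from data alone, here is a minimal sketch using Oja's rule, a classic local learning rule.  It is my choice of illustration rather than anything claimed about the cortex: a single model neuron, given no labels and no target, ends up tuned to the dominant direction in its input stream.  The input distribution, learning rate and seed are all arbitrary assumptions.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample():
    """Hypothetical input stream: 2-D points scattered along the diagonal."""
    shared = rng.gauss(0.0, 1.0)
    return (shared + rng.gauss(0.0, 0.1), shared + rng.gauss(0.0, 0.1))

weights = [1.0, 0.0]      # arbitrary starting synapses
LEARNING_RATE = 0.01

for _ in range(5000):
    x = sample()
    y = weights[0] * x[0] + weights[1] * x[1]          # neuron output
    # Oja's rule: Hebbian growth (y * x) with a local decay term (y * y * w).
    weights = [w + LEARNING_RATE * y * (xi - y * w) for w, xi in zip(weights, x)]

norm = math.hypot(*weights)
# Cosine of the angle between the learned weights and the diagonal (1, 1).
alignment = abs(weights[0] + weights[1]) / (math.sqrt(2) * norm)
print(f"weights={weights}, alignment with diagonal={alignment:.3f}")
```

Each update uses only the neuron's own input, output and current weights, so the structure that emerges comes from the statistics of the data rather than from any explicit instruction.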

A Good Start: the Visual Cortex

The primate visual cortex is a good starting point for understanding the brain.  This section is mainly a summary of Poggio et al of MIT’s work on the feedforward visual stream.  If you are really familiar with this already, you may want to skip down to “Emergent Theories of Learning”.

The cortex is largely self-similar, so if we can understand how one region works, that same model can then be applied to understanding the rest.  The visual cortex is a good place to start mainly because it’s the primary entry point for data coming into the system, so it allows the chain of information processing to be more easily mapped out and understood.  As a result we have a great deal of accumulated data, which has led up to some larger-scale algorithmic models that seem to be a good fit for how the visual cortex processes information: the models can predict phenomena from the neural level up to even the psychological level (with the algorithm models performing similarly to humans or monkeys in well-controlled psychological visual tests).

MIT’s aptly named “Center for Biological and Computational Learning” has developed and tested this model; a good overview is “A quantitative theory of immediate visual recognition”.  What’s particularly interesting is that in these cases where we have a very accurate model (such as the quick feedforward ventral path), the model performs best in class compared to other known AI approaches.  In fact, according to the MIT model and data, their biologically inspired vision system is the benchmark for quick recognition.  This, however, is just a piece of the visual system; once you add in the rest of the components, such as attentive focus, saccades, retinal magnification, motion, texture and color processing, the dorsal stream, etc., you get a full system which is leaps and bounds beyond any current machine vision system.

Now this is all interesting, but what’s far more interesting is that little to none of this complex system appears to be specifically genetically coded – the cortical neurons somehow just self-organize automatically into configurations that perform the desired computation at each step.  So it’s not just a clever algorithmic solution, it’s the one clever trick to rule them all: a meta-algorithm which somehow magically produces clever algorithmic solutions.

From a biological perspective, this is actually to be expected, as biological programs are all about maximizing functional output while minimizing explicit information.  Our DNA codes for somewhere around only 10,000 to 100,000 proteins, and not much of that is brain specific.  The DNA codes first (both in terms of developmental history and evolutionary history – as ontogeny recapitulates phylogeny) for proteins that can self-organize into cells, then the minimal changes to get those cells to self-organize into organs, and then the minimal changes on top of all that to get those organs to self-organize into organisms, and so on.

Now, the really brief summary of the feedforward ventral path: This pathway is like a series of image filters that transform a raw 2D image into an abstracted statistical ‘image’.  The final output can be thought of as an ‘image’ of sorts where the activation of small regions (the pixels) corresponds to or represents the presence of actual objects in the scene.  It’s not exactly a 1 neuron = 1 pixel = 1 object map, but it’s effectively similar and can be imagined as a map where each pixel (or more accurately, each small local statistical pattern of activation) corresponds to identification of a particular object in the scene.

For example, in the final output layer, an individual neuron (pixel) may turn on only when there is a car in the image coming in to the retina.  This pathway is not concerned with the location of objects, quantity, etc.; it’s only concerned with rapid identification – answering the question – what am I seeing?  This information is of obvious importance to organisms.  So how does it work?  Surprisingly, it doesn’t appear to be all that complex:

Retina/LGN: High Pass / Low Pass Filterbanks:  The 1st stages of processing occur in the retina itself.  Each neuron has dendrites which connect to something like a small circular window of the input space.  The synapses at each connection have some variable multiplicative effect on signal transmission, and then the dendritic branches and cell body sum these responses.  This leads to the familiar simple integrate-and-fire neuron model where the neuron essentially computes a weighted sum (a dot product) of its input data I and its set of synaptic weights W; across a whole layer of neurons this becomes a matrix multiplication.  This can just as easily be thought of as a customizable filter bank.
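
As a rough illustration of that weighted-sum view, here is a minimal Python sketch.  The function names, the rectification step and the example weights are assumptions made for illustration; they are not taken from the MIT model, and the spiking dynamics of a real integrate-and-fire neuron are left out entirely.

```python
def neuron_response(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs, rectified so the output is never negative."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return max(0.0, total - threshold)


def layer_response(inputs, weight_rows, threshold=0.0):
    """Apply every neuron (one row of weights each) to the same input patch."""
    return [neuron_response(inputs, row, threshold) for row in weight_rows]


# A flattened 4-pixel input patch I and two hypothetical neurons' weights W.
patch = [0.0, 1.0, 1.0, 0.0]
weights = [
    [-0.5, 0.5, 0.5, -0.5],   # prefers a bright middle
    [0.5, -0.5, -0.5, 0.5],   # prefers a dark middle
]
responses = layer_response(patch, weights)
```

Each row of weights acts as one filter, so a layer of such neurons is a filter bank applied to the same patch.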

In the retina, the synapses arrange to perform simple low or high pass filters.  The typical pattern is positive weights in a circular region in the middle surrounded by a larger region of negative weights.  This looks like a large black circle with a smaller white circle embedded in it.  The other typical pattern is just the reverse.  These patterns come in various sizes, from tight small white circles to larger diffuse ones.  What does this do to the image?  These are basically band-pass filters, tuned from high to low frequencies, which essentially break the image up into a set of multi-resolution bands very similar to the 1st stages of multiresolution image compression, i.e. wavelet analysis.  This is not entirely surprising, as the optic nerve has a much lower bandwidth than the retina’s input – image compression makes sense.  The output would look very much like taking an image and band pass filtering it in Photoshop.  The output you get is largely the edges at different scales – a sparse encoding of the input and a simple yet effective form of compression.
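
The center-surround pattern can be sketched as a difference of two Gaussians.  That choice, along with the kernel size, the two sigma values and the helper names below, is an assumption for illustration rather than a measured retinal receptive field.

```python
import math


def gaussian_kernel(size, sigma):
    """Square Gaussian kernel, normalized so its weights sum to 1."""
    half = size // 2
    k = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def center_surround_kernel(size=7, sigma_center=1.0, sigma_surround=2.0):
    """On-center filter: narrow positive center minus a wider negative surround."""
    center = gaussian_kernel(size, sigma_center)
    surround = gaussian_kernel(size, sigma_surround)
    return [[c - s for c, s in zip(cr, sr)] for cr, sr in zip(center, surround)]


def filter_image(image, kernel):
    """Slide the kernel over the image, keeping only fully covered positions."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(kernel[j][i] * image[y + j][x + i]
             for j in range(kh) for i in range(kw))
         for x in range(len(image[0]) - kw + 1)]
        for y in range(len(image) - kh + 1)
    ]


# Test image: dark left half, bright right half (a single vertical edge).
image = [[0.0] * 10 + [1.0] * 10 for _ in range(9)]
kernel = center_surround_kernel()
response = filter_image(image, kernel)
```

Because the kernel weights sum to roughly zero, uniform regions produce almost no output, while positions next to the edge respond with opposite signs on either side.  That is the sense in which the output is mostly edges.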

V1:  The V1 region is the largest single cortical region in primate brains, and it performs another simple image filtering step.  The input image coming in to V1 is more or less the edges at various scales, so quite naturally V1 identifies edges.  The cortex has a laminar (sheet-like) structure at the large scale.  If you zoom in closer you’ll see that it has a layered organization, sort of like a layer cake, with five to six layers depending on how you count them (they are not all that clearly delineated – remember, the brain is stochastic).  Neurons in a particular localized region seem to redundantly code the same thing – this small level of scale is called the micro-column.  Individual output neurons in a micro-column have nearly identical receptive fields and appear to code equivalent responses (things they respond to, incoming and outgoing connections, etc).  It appears you can thus functionally reduce down to the micro-column level as the fundamental unit of computation in the cortex.  Micro-columns are then loosely arranged into macro-columns.  Neighboring micro-columns in the larger macro-column have very similar receptive fields but can have quite different responses.  These micro-column ‘patches’ in V1 have synaptic weights that correspond to oriented edge filters of several different scales and orientations.  The orientations and scales are rather quantized – with something like 4-6 orientations and a similar or smaller number of scales.  The output of V1 then is best visualized as a set of NxM smaller subimages.  A lit pixel in a subimage (coded as an active micro-column) represents the presence (or likelihood) of a line of a particular direction and size in some small neighborhood of the original image.  Each V1 region (one in each hemisphere) has perhaps a million micro-columns, so it’s quite reasonably sized.
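
To make the "set of subimages" picture concrete, here is a small sketch with four hand-written 3x3 oriented line filters at a single scale.  The kernels and names are illustrative assumptions; real V1 receptive fields are usually modeled with smoother Gabor-like filters at several scales.

```python
# Four hypothetical orientation-tuned kernels (one scale only).
ORIENTED_KERNELS = {
    "horizontal": [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
    "vertical":   [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    "diag_down":  [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]],
    "diag_up":    [[-1, -1, 2], [-1, 2, -1], [2, -1, -1]],
}


def filter_rectified(image, kernel):
    """Slide a kernel over the image and clip negative responses to zero."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [max(0, sum(kernel[j][i] * image[y + j][x + i]
                    for j in range(kh) for i in range(kw)))
         for x in range(len(image[0]) - kw + 1)]
        for y in range(len(image) - kh + 1)
    ]


def v1_maps(image):
    """One subimage per orientation: a lit pixel means 'line of this orientation here'."""
    return {name: filter_rectified(image, k) for name, k in ORIENTED_KERNELS.items()}


# 7x7 test image containing a single vertical line in column 3.
image = [[1 if x == 3 else 0 for x in range(7)] for _ in range(7)]
maps = v1_maps(image)
```

For this input only the "vertical" subimage lights up, and only at the positions centered on the line.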

V2: The input from V1 goes to V2, which performs another simple filtering step.  It performs something very similar to just taking a set of max filters across the output of V1, effectively a max filter on each of the NxM subimages. Each micro-column in V2 has an orientation and scale preference just like V1, and activates when any edge of its preferred orientation and scale comes in.  The response doesn’t change much when there are multiple matching edges in its filter window.  It’s not exactly a max operation, but it’s close – Poggio et al model it as a softmax operation.  The output of V2 then is a smaller condensed set of NxM subimages where each pixel represents the presence of an edge of a particular orientation and scale in a wide sub-window of the image.
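
A minimal sketch of the pooling idea follows.  The hard max is straightforward; the smooth version uses an exponentially weighted average, which is one common soft approximation of max and is assumed here for illustration.  It is not claimed to be the exact formulation used by Poggio et al, and the window size and beta value are arbitrary.

```python
import math


def max_pool(subimage, window=2):
    """Maximum over non-overlapping window x window blocks of one subimage."""
    rows, cols = len(subimage), len(subimage[0])
    return [
        [max(subimage[y + j][x + i] for j in range(window) for i in range(window))
         for x in range(0, cols - window + 1, window)]
        for y in range(0, rows - window + 1, window)
    ]


def soft_max(values, beta=20.0):
    """Smooth stand-in for max: average of the values weighted by exp(beta * v)."""
    weights = [math.exp(beta * v) for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# One orientation subimage coming out of the V1 stage (made-up values).
edge_map = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 2, 0],
]
pooled = max_pool(edge_map)

# One matching edge versus two matching edges in the same pooling window.
one_edge = soft_max([0.9, 0.1, 0.0])
two_edges = soft_max([0.9, 0.9, 0.0])
```

The pooled map is half the size in each dimension, and the soft version gives nearly the same answer whether one or two strong edges fall inside the window, which matches the behavior described above.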

V4/PIT/AIT:  At the next and higher stages in V4 and up, the neural responses become somewhat more specific and begin coding for common patterns of edges: basic shapes.  According to the theory of Poggio et al, the cortical units can be roughly classified into two types: simple and complex.  The simple cells perform the typical synaptic-weighted summation and adjust their synaptic weights over time to match frequently occurring input patterns.  The complex cells perform the max-like operation on a local spatial window of similarly tuned simple cell inputs as described for V2 earlier.  The simple and complex cell types alternate in layers.  After two or three such iterations you will have units which code for particular common patterns of edges appearing anywhere in the image.  The layered hierarchy is not strict, and some connections bypass layers.  By the time you get to the top of this hierarchy there is enough information for cells in higher decision regions (such as the prefrontal cortex), to make reasonable quick identifications of objects.  There is enough information for cells to code for location-dependent arrangements of edges, but this is balanced by the need for invariance to rotations.  For example, it’s easy to identify a car shape from numerous angles at a glance, but it’s much more difficult for us to recognize text characters or faces that are flipped 180 degrees – simply because we rarely encounter those patterns at such unusual orientations.

Emergent Theories of Learning

How can this system of edge-filters and shape pattern dictionaries develop automatically?

It appears that it self-organizes based on some simple local rules, very much like a cellular automaton.  This was recognized more than a decade ago.  The short paper that really put it together for me is called “A SELF-ORGANIZING NEURAL NETWORK MODEL OF THE PRIMARY VISUAL CORTEX”.  The key idea is rather simple.  Take a prototypical 2D laminar neural network like the simple cortical model discussed above.  A 2D input pattern flows into the neural array from the bottom, and each neuron forms a bunch of connections across the input grid forming something like a circular pattern centered around the neuron (with synaptic weights falling off in a Gaussian-like pattern).

Mathematically, the neuron performs something like a weighted sum (a dot product) of a local patch of the input with its synaptic weights.  If you apply an appropriate simple Hebbian learning rule to a random initial configuration of this system (synaptic weights increase in proportion to a presynaptic-postsynaptic coincidence), then these neurons will evolve to represent frequently occurring input patterns.
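
Here is a minimal sketch of such a rule for a single neuron.  The weight normalization after each step is an added assumption (plain Hebbian growth is unbounded without some stabilizing term), and the patterns, learning rate and presentation frequencies are made up for illustration; this is not the RF-LISSOM update itself.

```python
import math
import random


def hebbian_step(weights, inputs, learning_rate=0.1):
    """Strengthen each synapse in proportion to (input activity * output activity)."""
    post = sum(w * x for w, x in zip(weights, inputs))
    updated = [w + learning_rate * post * x for w, x in zip(weights, inputs)]
    # Assumed stabilizer: rescale to unit length so the weights cannot blow up.
    norm = math.sqrt(sum(w * w for w in updated))
    return [w / norm for w in updated]


rng = random.Random(0)
frequent = [1.0, 1.0, 0.0, 0.0]   # shown about 90% of the time
rare = [0.0, 0.0, 1.0, 1.0]       # shown about 10% of the time

# Random positive starting weights, then 500 pattern presentations.
weights = [rng.uniform(0.1, 0.5) for _ in range(4)]
for _ in range(500):
    pattern = frequent if rng.random() < 0.9 else rare
    weights = hebbian_step(weights, pattern)
```

After training, the weights point almost entirely along the frequent pattern and the synapses serving the rare pattern have decayed toward zero, so the neuron has tuned itself to the most common input.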

But now it gets more interesting: if you add an additional set of positive and negative lateral connections between neurons within a layer, then you can get more complex cellular automaton-like behavior.  More specifically, if the random lateral connections are picked from a distribution such that short-range connections are more likely to be positive and long-range connections are more likely to be negative, the neurons will tend to evolve into small column-like pockets where neurons are mutually supportive within columns but are antagonistic between columns.  This representation also performs a nice segmentation of the hypothesis space.  The model developed in the paper – the RF-LISSOM model – and later follow-ups provide a very convincing account of how V1’s features can be fully explained by the evolution of basic neurons with simple local Hebbian learning rules and a couple of homeostatic self-regulating principles.
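
The short-range-positive, long-range-negative profile can be sketched as a difference of two Gaussians over distance.  This uses a fixed average profile on a 1D row of neurons rather than randomly sampled connections, and the amplitudes and widths are assumptions chosen for illustration, not values from the paper.

```python
import math


def lateral_weight(distance, exc=1.0, sigma_exc=2.0, inh=0.4, sigma_inh=4.0):
    """Narrow excitation minus broad inhibition as a function of distance."""
    excite = exc * math.exp(-distance ** 2 / (2 * sigma_exc ** 2))
    inhibit = inh * math.exp(-distance ** 2 / (2 * sigma_inh ** 2))
    return excite - inhibit


def lateral_input(activity):
    """Lateral drive each neuron receives from all the other neurons in the row."""
    n = len(activity)
    return [
        sum(lateral_weight(abs(i - j)) * activity[j] for j in range(n) if j != i)
        for i in range(n)
    ]


# A row of 21 neurons with a single active neuron in the middle.
activity = [0.0] * 21
activity[10] = 1.0
influence = lateral_input(activity)
```

Neighbors within a few positions of the active neuron are pushed up while more distant neurons are pushed down, which is the ingredient that lets mutually supportive pockets form and compete with each other.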

Can such a simple emergent model explain the rest of the ventral visual pathway?

It seems likely.  If you took the output of V1 and fed it to another layer built of the same adapting neurons, you’d probably get something like V2.  It wouldn’t be the exact softmax operation described by Poggio et al, but that is something of an idealization anyway.  The V2 layer would organize into micro-columns which would tune to frequent output patterns of V1.  The presence of an edge of a particular orientation is a good predictor of an edge of the same orientation activating somewhere nearby – both because the edge may be long and because as the image moves across the visual stream edges will move to nearby neuron populations.  It thus seems likely that V2 neurons would self-organize into microcolumns tuned to edges of a particular orientation anywhere in their field – similar to the softmax operation description.  As you go higher up the hierarchy, the tuning would be more complex, and you would have micro-columns adapting to represent more complex common edge collections.


The self-organizing model discussed so far is missing one important type of connection pattern found in the real cortex: feedback connections which flow from higher regions back down towards the lower regions close to the input.  These feedback connections tend to follow the feedforward connections bringing processed visual input up the hierarchy, but they flow in the opposite direction.  These feedback connections seem pretty natural if we think of a pathway such as the visual system as a connected 3D region instead of a collection of 2D patches.  If you took the various 2D patches of V1, V2, etc. and stacked them on top of each other, you’d get some sort of tapered blob shape – kind of like a truncated pyramid.  It would be wide at the base (V1 – the largest region) and would then taper as the layers get smaller going up the hierarchy.  If you arranged the visual stream into such a 3D volume, the connections could just be described by some simple 3D distribution.  Visual input comes in from the bottom and flows up the hierarchy, but information can also flow laterally within a layer and back down from higher to lower layers.

What is the role of the downward flowing feedback connections?

They help reinforce stable hypotheses in the system.  An initial flow of information up the hierarchy may lead to numerous competing theories about the scene.  Feedback connections tracing the same paths as the inputs will tend to bias for the supportive components.  For example, if the higher regions are expecting to see a building, this would then flow down the feedback connections to bias neurons representing appropriate collections of right angles, corners, horizontal and vertical edges, and numerous other unnameable statistical observations that lead to the building conclusion.  If these supporting beliefs are strong enough vs their competition, the ‘building’ pathway will form a stable self-reinforcing loop.  This is essentially very similar to Bayesian Belief Propagation – of course without necessarily simulating it exactly (which could be burdensome).
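
A toy sketch of that self-reinforcing loop is shown below.  It is only in the spirit of the description: two competing hypotheses, made-up feature names and evidence values, and a simple multiplicative feedback rule.  It is an assumption for illustration and not an implementation of Bayesian belief propagation.

```python
# Which low-level features support which high-level hypothesis (hypothetical).
SUPPORT = {
    "building": ["right_angles", "vertical_edges"],
    "tree": ["curves", "leaf_texture"],
}


def beliefs_from(features):
    """Feedforward pass: sum the supporting evidence, then normalize to sum to 1."""
    raw = {h: sum(features[f] for f in fs) for h, fs in SUPPORT.items()}
    total = sum(raw.values())
    return {h: v / total for h, v in raw.items()}


def feedback_loop(features, rounds=10, gain=1.0):
    """Alternate feedforward belief updates with top-down boosts of supporters."""
    features = dict(features)
    history = []
    for _ in range(rounds):
        beliefs = beliefs_from(features)
        history.append(beliefs["building"])
        # Feedback pass: each hypothesis boosts its own supporting features
        # in proportion to how strongly it is currently believed.
        for hypothesis, supporters in SUPPORT.items():
            for f in supporters:
                features[f] *= 1.0 + gain * beliefs[hypothesis]
    return history


evidence = {"right_angles": 0.6, "vertical_edges": 0.7,
            "curves": 0.5, "leaf_texture": 0.4}
history = feedback_loop(evidence)
```

The ‘building’ hypothesis starts with only a modest lead, but because feedback amplifies its own supporters more than its rival’s, its share of belief grows every round until it dominates.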

It’s also interesting to note that the feedback connections will perform something similar to backpropagation.  When a neuron fires, the Hebbian learning rule will up-regulate any recently active synapses that contributed.  With the feedback connections, this neuron will send back a signal down to the lower layer input neurons.  As the system evolves into mutually supportive pathways, the feedback signal is likely to closely associate with the input neurons that activated the higher level synapses.  The feedback signal will thus trace back the input and reinforce the contributing connections.

From cortical maps to a full intelligence engine

Reading this far, and if you’ve read my other short bits about the brain or much better yet the literature they derive from, you have a pretty good idea of how self-organizing hierarchical cortical maps work in theory and understand their great power.  But there’s still a long way to go from there to a full scale intelligence engine such as a brain.  In theory, one of these hierarchical inference networks can also, operating in reverse flow, translate high level abstract commands into detailed motor control sequences, very much like the hierarchical sensor input stream but in reverse.  Hawkins gives some believable accounts of how such mechanisms could work.

What’s missing then?  A good deal.  There is much more to the brain than just a hierarchical probabilistic knowledge engine – although that certainly is a core component.  One familiar with computer architecture would next ask, “what performs data routing?”.  This is a crucial question, because it’s pretty clear you can’t do much useful computation with a fixed topology – to run any interesting algorithms you need some way for different brain regions to communicate with other brain regions dynamically. A fixed topology alone is not sufficient.

That functionality appears to be provided by the thalamus, one of the oldest brain regions still part of the core networks.  It’s also perhaps the most important.  Damage to the thalamus generally results in death or coma, which is to be expected if it is a major routing hub (vaguely equivalent to a CPU).  For example, when you focus your attention on a speaker’s words, the first stages of processing probably flow through a fixed topology of layered computation, but once those are translated into the level of abstract thoughts, they need to be routed more widely to many general cortical layers that deal with abstract thinking – and this cannot use a fixed topology.

At this apex level of the hierarchy, it doesn’t much matter whether the words originated as audio signals, visual patterns, or even from internal monologue, they need to eventually reach the same abstract processing regions for semantic parsing, memory recall and the general mechanisms of cognition.  This requires at least some basic one to many and many to one dynamic routing.  Selective attention requires similar routing.

The visual system performs selective attention and dynamic routing mechanically by actually moving the eye and thus the fovea, but consider that you need that same mechanism in many domains where the mechanical trick doesn’t apply.  For instance, your body’s somatosensory (sense of touch) sensor network also uses selective attention (focusing a large set of general processing resources on a narrow input domain) and this suggests a neural mechanism of dynamic routing.

Internal Monologue and the Core Routing Network

Venturing out of the realm of current literature and into my own theoretical space, I have the beginnings of a meta-theory concerning the brain’s general higher level organization which centers around a serial core routing network.  We tend to think of the brain as massively parallel, which is true at the level of the cortical hierarchy described earlier.  But the fact is that at the highest level of organization, at the apex of the cortical pyramid you have a network involving largely the hippocampus, cortex, and the thalamus which is functionally serial.  We have a serial stream of consciousness which makes some sense for coordinating actions, language through a serial audible stream, and so on.  Our inner monologue is essentially serial at the conscious level.

Note that having a serial top level network is not in any sense preordained.  We could have evolved vocal cords which encoded two or more independent audio streams and had a community of voices echoing in our heads.  Indeed, the range of human mind space already encompasses such variants on the fringe.

In my current simple model, the (typically) serial inner core routing network would mostly function as a simple broadcast network which connects the highest layers of the cortex, hippocampus, and thalamus.  This core network maps to both the task-positive and task-negative networks in the neuroscience literature.

What types of messages are broadcast on the core routing network?  Thoughts, naturally.

The neuro-typical experience of a serial inner monologue is the reverberations of symbolic thoughts activating the speech and auditory pathways.  For most of us, we first learn to understand and then speak words through the audio interface, and then learn to read well after.  As you are reading these words, you are probably hearing a voice in your head.  Your projection of my voice to be exact.  In a literal sense, I am programming your mind right now.  But don’t be alarmed, this happens whenever you read and understand anything.

Perhaps if one learned words first through the visual senses and then later learned to understand speech, one would ‘see’ words in the mind’s eye.  I’m not aware of any such examples; this is just a thought experiment.

It’s difficult to imagine pre-linguistic thoughts, raw thoughts that are not connected to words.  It’s difficult to project down into that more constrained, primitive realm of mindspace.  Certainly some of our thought streams are directly experiential (such as recalling a visual and tactile memory of walking barefoot on a sunny tropical beach), but it’s difficult to imagine a long period of thinking constrained to this domain alone.

The core routing network allows us to take words and translate them into patterns of mental activation which simulate the state of mind which originally generated the words themselves.  This sounds interesting; it’s probably worth reading again.

Imagine the following in a little more detail:

You are walking on a deserted jungle beach somewhere in Costa Rica.  The sun is blazing but a slight breeze keeps the air pleasant.  Your feet sink gently into the wet sand as small waves lap at your ankles.  A lone mosquito nibbles on your shoulder and you quickly brush it off.

Those are just words, but in reading them you recreate that scene in your mind as the words activate specific high level cortical patterns which cascade down into the lower levels of the sensory and motor pyramids using the feedback path discussed earlier.  The pattern associations were learnt long ago and have been reinforced through numerous rapid replays coordinated by the hippocampus during your sleep.  If you were to actually look at your thought patterns as visualized with a high resolution scanner, you would see a trace very similar to the trace of your brain actually experiencing the described scene.  It’s different of course, not quite as detailed, and the task-negative network does not activate motor outputs, but at the neural level thinking about performing an action is just a tad shy of performing said action.

This is the power of words.

So for a brain architecture, the high level recipe looks something like this: take a hierarchical feedforward and feedback (dual directional) multi-sensory and motor cortex, combine in a hippo-cortical-thalamic core routing network, add in an offline selective memory optimization process (sleep), and finally some form of widely parallel goal directed search operating in compressed cortical symbolic space, and you have something interesting.  This of course is an over-simplification of the brain, which has many more major circuits and pathways, but nonetheless we don’t need all of the specific complexity of the brain.  What’s more important are the general mechanisms underlying emergent complexity – such as learning.

Of course, the devil is in the details, but it looks like the main components of a brain architecture are within reasonable reach this decade.  I see the outline of a next step where you take the components discussed above and integrate them into an AIXI-like search optimizer – but crucially searching within the extremely compressed abstract symbolic space at the apex of the cortical pyramid.

Simulating and searching in such extraordinarily compressed spaces is the key to computational efficiency in the supremely complex realities the brain operates in, and AIXI can never scale by using actual full blown computer programs as the basis for simulation.  The key lesson of the cortex is that intelligence relies on compressing and abstracting away nearly everything.  Efficiency comes from destroying most of the information.