One: Perspective

Brain Theory as a Basis for AI

This book is about a new theory of how the brain works, and software which uses this theory to solve real-world problems intelligently in the same way that the brain does. In order to understand both the theory and the software, a little context is useful. That’s the purpose of this chapter.

Before we start, it’s important to scotch a couple of myths which surround both Artificial Intelligence (AI) and Neuroscience.

The first myth is that AI scientists are gradually working towards a future human-style intelligence. Despite what they tell us, and what they themselves believe, they are really building computer programs which merely appear to behave in a way we might consider smart or intelligent - as long as we ignore how they work. Don’t get me wrong: these programs are very important in our understanding of what constitutes intelligence, and they also provide us with huge improvements in understanding the nature and structure of the problems brains solve. The difficulty is that brains simply don’t work the way computer programs do, and there is no reason to believe that human-style intelligence can be approached just by adding more and more complex computer programs.

The other myth is that Neuroscience has figured out how our brains work. Neuroscience has collected an enormous amount of data about the brain, and there is good understanding of some detailed mechanisms here and there. We know (largely) how individual cells in the brain work. We know that certain regions of the brain are responsible for certain functions, for example, because people with damage in those regions show impaired performance on particular tasks. And we know, to some extent, how the various pieces of the brain are connected together, either by observing damaged brains or by using modern brain-mapping technologies. But there is no systematic understanding which could be called a Theory of Neuroscience, one which explains the working of the brain in detail.

Traditional Artificial Intelligence

Traditional AI does not provide a basis for human-like intelligence. In order to understand the reasons for this, let’s take a look inside a digital computer.

A computer chip contains a few billion very simple components called transistors. In digital circuits, transistors act as a kind of switch (or relay): they allow a signal through or not, based on a control signal. Computer chip (hardware) designers produce detailed plans for how to combine all these switches to produce the computer you’re reading this on. Some of these transistors are used to produce the logic in the computer, making decisions and performing calculations according to a program written by others: software engineers. The program, along with the data it uses, is stored in yet more chips – the memory – using transistors which are either on or off. The on-or-off states of these memory bits form a code which stands for data – whether numbers, text, image pixels, or instructions which tell the computer what operation to perform at a particular moment.
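To make that last point concrete, here is a minimal sketch in Python (the particular values are chosen purely for illustration): the same pattern of eight on-or-off bits stands for the number 77 or for the letter “M”, depending entirely on how the program chooses to interpret it.

```python
# A minimal sketch: one eight-bit pattern, two interpretations.
value = 77
print(format(value, '08b'))          # 01001101 - the bits read as a number

letter = 'M'
print(format(ord(letter), '08b'))    # 01001101 - the same bits read as text
```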

If you open up a computer, you can clearly see the different parts. There’s a big chip, usually with a fan on top to cool it, called the Central Processing Unit or CPU, which is where the hardware logic is housed. Separate from this, a bank of smaller chips houses the Random Access Memory (RAM), which is the fastest kind of memory storage. There will also be either a hard disk or a solid-state drive, which is where all your bulk data - programs, documents, photos, music and video - are stored for use by the computer. When your computer is running, the CPU is constantly fetching data from the memory and disks, doing some work on it, and writing the results back out to storage.

Computers have clearly changed the world. With these magical devices, we can calculate in one second with a spreadsheet program what would have taken months or years to do by hand. We can fly unflyable aircraft. We can predict the weather ten days ahead. We can create 3D movies in high definition. We can, using other electronic “senses”, observe the oxygen and sugar consumption inside our own brains, and create a map of what’s happening when we think.

We write programs for these computers which are so well thought out that they appear to be “smart” in some way. They look like they’re able to out-think us; they look like they can be faster on the draw. But it turns out that they’re only good at certain things, and they can only really beat us at those things. Sure, they can calculate how to fly through the air and get through anti-aircraft artillery defences, or they can react to other computer programs on the stock exchange. They seem to be superhuman in some way, yet the truth is that there is no skill involved, no knowledge or understanding of what they’re doing. Computer programs don’t learn to do these amazing things, and we don’t teach them. We must provide exhaustive lists of absolutely precise instructions, detailing exactly what to do at any moment. The programs may appear to behave intelligently, but internally they are blindly following the scripts we have written for them.

The brain, on the other hand, cannot be programmed, and yet we learn a million things and acquire thousands of skills during our lives. We must be doing it some other way. The key to figuring this out is to look in some detail at how the brain is put together and how this structure creates intelligence. And just as we’ve done with the computer, we will examine how information is represented and processed by the structures in the brain. This examination is the subject of Chapter Two. Meanwhile, let’s have a quick look at some of the efforts people have made to create an “artificial brain” over the past few decades.

Artificial Intelligence is a term which was coined in the mid-1950s, but people have been thinking about building intelligent machines for over two thousand years. The idea remained in the realm of fantasy and science fiction until the dawn of the computer age, when machines suddenly became available which could provide the computational power needed to build a truly intelligent machine. It is fitting that some of the main ideas about AI came from the same legendary intellects behind the invention of digital computers themselves: Alan Turing and John von Neumann.

Turing, who famously helped to break the Nazi Enigma codes during WWII, theorised about how a machine could be considered intelligent. As a thought experiment, he suggested a test involving a human investigator who communicates by text with an unknown entity – either another human or a computer running an AI program. If the investigator is unable to tell whether he is talking to a human or not, then the computer has passed the test and, by this definition, must be regarded as “intelligent”. This became known as the Turing Test, and it has unfortunately served as a kind of Holy Grail for AI researchers for more than sixty years.

Meanwhile, the burgeoning field of AI attracted some very smart people, all of whom dreamed they would soon design a machine one could talk to, and which could help one solve real-world problems. All sorts of possibilities seemed within easy reach, and so the researchers often made grand claims about what was “just around the corner” for their projects. For instance, one milestone was to be a computer which could beat the World Chess Champion, a goal promised as five years away every year since the mid-50s, and achieved only in 1997, using a huge computer and a mixture of “intelligent” and “brute-force” techniques, none of which resembled how Garry Kasparov’s brain worked.

Everyone recognised early on that intelligence at the level of the Turing Test would have to wait, so they began by trying to break things down into simpler, more achievable tasks. Having no clue how our brains and minds worked as machines, they decided instead to theorise about how to perform some of the tasks which we can perform. Some of the early products included programs which could play Noughts and Crosses (tic-tac-toe) and Draughts (checkers), programs which could “reason” about placing blocks on top of other blocks in a so-called micro-world, and a program called Eliza which used clever and entertaining tricks to mimic a psychiatrist interviewing a patient.

Working on these problems, developing all these programs, and thinking about intelligence in general has had profound effects beyond Computer Science over the last sixty years. Our understanding of the mind as a kind of computer or information processor is directly based on the knowledge and understanding gained from AI research. We have AI to thank for Noam Chomsky’s foundational Universal Grammar, and the field of Computational Linguistics is now essential for anyone wishing to understand linguistics and human language in general. Brain surgeons use the computational model of the brain to identify and assess birth defects and the effects of disease and brain injury, all in terms of the functional modules which might be affected. Cognitive psychology is now one of the basic ways to understand how our perceptions and internal processes operate. And the list goes on. Many, many fields have benefited indirectly from the intense work of AI researchers since 1950.

However, traditional AI has failed to live up even to its own expectations. At every turn, it seems that the “last 10%” of the problem is bigger than the first 90%. Many AI systems require vast amounts of programmer intelligence and do not genuinely embody any real intelligence themselves. Many such systems are incapable of responding flexibly to new contexts or situations, and they do not learn of their own accord. When they fail, they do not do so gracefully as we do; they are brittle, capable of working only while kept “on the rails” in some way. In short, they are nothing like us.

Yet AI researchers kept going, hoping that some new program or some new technique would crack the code of intelligent machine design. They have built ever-more-complex systems, accumulated enormous databases of information, and employed some of the most powerful hardware available. The celebrated triumphs of Deep Blue (beating Kasparov at chess in 1997) and Watson (winning the Jeopardy! quiz game in 2011) were the result of combining huge, ultra-fast computers with enormous databases and vast, intricate programs costing tens of millions of dollars. While impressive, neither of these systems can do anything else which could be considered intelligent without reinvesting similar resources in the development of new programs.

It seems to many that this is leading us away from true machine intelligence, not towards it. Human brains are not running huge, brittle programs, nor consulting vast databases of tabulated information. Our brains are just like those of a mouse, and it seems that we differ from mice only in the size and number of pieces (or regions) of brain tissue, and not in any fundamental way.

It appears very likely that intelligence is produced in the brain by the clever arrangement of brain regions, which organise themselves and learn how to operate intelligently. This has been demonstrated repeatedly in animal research labs, where experimenters cut connections, shut down some regions, breed mutants, and so on. There is very little argument in neuroscience that this is how things work. The question then is: how do these regions work in detail? What are they doing with the information they are processing? How do they work together? If we can answer these questions, it is possible that we can learn how our brains work and how to build truly intelligent machines.

I believe we can now answer these questions. That’s what this book claims to be about, after all!

Machine Intelligence versus Machine Learning

There is a branch of Computer Science and AI which has recently gained, or regained, prominence, partly as a result of the current age of Big Data: Machine Learning. Machine Learning involves having computer systems learn to carry out a task or solve a problem automatically, without our having to instruct them explicitly about every step.

Two main branches appear to dominate the field of Machine Learning. The first bears some similarity to traditional AI in that it uses mathematical, statistical, and symbolic programming to compute its results. Essentially, the software searches a space of possibilities to identify the best function for solving the given problem; this kind of approach is used, for example, to project trends in economic data into the future, to guess who’s going to win an election, or to estimate the probability that a Large Hadron Collider experiment has found the Higgs Boson.
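To make the idea of “searching a space of possibilities” concrete, here is a toy sketch in Python. It is not taken from any real forecasting system, and the “trend” data is invented for the example; the program simply tries polynomial models of increasing degree and keeps the one which best predicts points it was not fitted to.

```python
# A toy sketch of the first branch of Machine Learning: search a small space
# of candidate functions (polynomials of degree 1 to 5) and keep the one
# which best predicts held-out data points.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 60)
y = 2.0 * x + 5.0 + rng.normal(0, 2.0, size=x.shape)    # invented noisy trend

x_train, y_train = x[:40], y[:40]      # data used to fit each candidate
x_test, y_test = x[40:], y[40:]        # held-out data used to judge it

best_degree, best_error = None, float("inf")
for degree in range(1, 6):             # the "space of possibilities"
    coeffs = np.polyfit(x_train, y_train, degree)
    prediction = np.polyval(coeffs, x_test)
    error = np.mean((prediction - y_test) ** 2)   # mean squared error
    if error < best_error:
        best_degree, best_error = degree, error

print(f"Best candidate: degree {best_degree}, test error {best_error:.2f}")
```

Real systems search far richer spaces of models and use more careful statistics, but the shape of the computation - propose, fit, measure, keep the best - is the same.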

The other branch is based on some kind of “neural net”, a network of simplified artificial processing elements which combine to learn or model some aspects of the data in order to make predictions or identify patterns. Invented at the very beginning of the computer age, these networks have undergone many phases of mania and depression, as fundamental limitations were first ignored, then admitted, and finally overcome as the techniques improved. Recently, nets known as Deep Learning nets and Convolutional Neural Networks (ConvNets) have become very popular, being used as the core of search engines, speech recognisers, and other applications by giants like Google and Facebook, and their inventors now sit at the top tables of global corporations as “VP of AI Research” and “Chief Scientist”.

Neural nets attempt to model how the brain processes information. They work by connecting up pairs of neurons with adjustable “weights”, which indicate the strength of the influence of one neuron on another. The neurons are arranged in layers (as in the brain): input data is fed in at the bottom and passed up through the layers, transformed by all the weights and summations, until it emerges at the top, where it can be used to make predictions or to identify what the network is looking at. Classic tasks for neural networks include identifying whether a picture is of a cat or a dog, deciphering handwritten characters, interpreting speech from sound, and so on. At tasks like these, the current world champions are all some kind of Deep Learning system or ConvNet.
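The sketch below shows that feed-forward arrangement in miniature, in Python. The weights here are random and the input is made up, so it computes nothing useful; a real network would adjust the weights by training (typically with backpropagation) until the outputs match the desired answers.

```python
# A bare-bones feed-forward pass: each layer forms weighted sums of its
# inputs and passes the result, through a non-linearity, up to the next layer.
import numpy as np

rng = np.random.default_rng(42)

def layer(inputs, weights, biases):
    """One layer of neurons: weighted sums followed by a non-linearity."""
    return np.tanh(inputs @ weights + biases)

# A tiny network: 4 inputs -> 3 hidden neurons -> 2 outputs.
w1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
w2, b2 = rng.normal(size=(3, 2)), np.zeros(2)

x = np.array([0.5, -1.2, 0.3, 0.9])   # input data fed in "at the bottom"
hidden = layer(x, w1, b1)             # transformed by weights and summations
output = layer(hidden, w2, b2)        # emerges "at the top" as a prediction
print(output)
```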

While these systems are very interesting and quite well understood theoretically, we believe that such simplified modelling of the details of brain function is yet another case of a fundamentally limited technology which will eventually fail to realise high expectations. One of the reasons for this is that neural net researchers seem to insist that their systems (and the way those systems settle on a solution) be describable using very simple mathematics, and that certain properties of the systems be provable mathematically. They dare not make their models any more faithful to the true structure of the brain (as we do), because the mathematics quickly becomes too difficult to support such proof-based reasoning.

I would, however, encourage the reader to learn all about these systems. In certain ways they’ve gone much further than HTM researchers in addressing many cognitive tasks - they’ve been at this for decades, with support from governments and, more recently, the likes of Google and Facebook, so there is already a large community of extraordinarily smart people hard at work testing the limits of what can be achieved. Many discoveries have emerged from their research, including important insights into perception, pattern recognition, hierarchy, and representation. Links to books, articles, courses and talks on neural nets can be found in the Further Reading section, including a page explaining HTM to experts in Machine Learning.