Neural Networks: The Promise and the Reality
Ben Goertzel
The first time I saw a human brain in a medical laboratory, I was overwhelmed by a
very strange feeling. How, I wondered
mutely, can this three-pound gray mass of nerve cells possibly lead to so
much? How can all the social and
psychological and cultural complexity we see around us and feel within us, come
out of this little lump of meat? This
lump of meat, not so different from the similar lumps of meat found in the
heads of apes and monkeys – or, for that matter, sheep and cows.
The brain is a puzzle, and obviously, one with significance far beyond the ivory
tower.
First of all, understanding the brain is important as part of the human race’s quest
for self-understanding. The
contradictions and dilemmas and great moral questions that follow us from one
culture to the next – aggression, sexuality, gender differences, freedom and
bondage, death and suffering – all have their roots in the patterns by which
electricity courses through these three-pound masses in our heads.
But that’s not all -- the significance of the brain extends beyond even the human
condition. For the human brain is, at
the moment, our unique example of a highly intelligent system. Some people think whales and dolphins are
highly intelligent – but this remains speculative. Stanislaw Lem, in his famous book Solaris, hypothesized an
intelligent ocean on another planet, whose intelligence was revealed to humans
in the complexity of its wave patterns and its strange effects on people’s
minds. But even if Solaris’s real-world
equivalent is out there somewhere, we haven’t voyaged there yet. For the moment, the human brain, flawed as
it is, is our only incontrovertible example of an intelligent system, so if we
want to understand the general nature of this mysterious thing called
intelligence – or, more practically, create intelligent devices to improve our
lives -- the brain is the only really concrete source of inspiration we have.
Neuroscientists try to understand the workings of this amazing three-pound meat hunk, one cell
at a time, one specialized region at a time.
The basic principles are simple, but the complexities are
astounding. As of now, they know a lot
about how the brain is structured – which parts do which sorts of things. And they know a lot about cells and
chemicals in the brain. But they have
remarkably few hard facts about the dynamics of the brain – how the brain
changes its state over time, a process that’s more colloquially referred to
with terms like “thinking” and “feeling.”
The main cause of this situation is that there’s no way, right now, to monitor the
details of what goes on in the brain all day, how it does its business. Brain scanning equipment -- PET and fMRI
scanners -- has come a long way, but is still far too crude to do much beyond
telling us which general regions of the brain a person uses to do which general
kinds of thinking. These scanners don’t yet let
us monitor the course of thinking itself.
A few bold neuroscientists have made their guesses as to the big picture
of how the brain works, but the more timid 99.9% remain focused on their own
small corner of the puzzle, all too aware of how much more there is to be
understood before general theories about brain function can be systematically
verified or falsified.
And the difficulty of understanding brain and mind is not restricted to
neuroscience. Other disciplines
concerned with other aspects of intelligence – psychology, artificial
intelligence, philosophy of mind, linguistics – have run up against dilemmas
every bit as tough as those facing neuroscience. Psychologists bend over backwards to create complex experiments
that will test even their simplest theories, let alone their subtler ones. AI programmers find current computer
hardware inadequate to implement their more ambitious designs. Linguists enumerate rule after rule
describing human language use, but fail to encompass its full variety. This cross-disciplinary difficulty has led
to the creation of a combined interdisciplinary discipline called cognitive
science, which wraps all the difficulties up into one unmanageable but
fascinating package. Many universities
now offer degrees in cognitive science – the study of the baffling phenomenon
of brain-mind from every aspect, the piecing together of diverse fragments of
evidence to try to arrive at a holistic understanding of this most baffling
three-pound meat hunk that lies at the very center of our selves, and holds the
key to untold future technologies.
Not surprisingly, cognitive science itself is a rather diverse endeavor,
encompassing a number of different approaches to understanding mind-brain. Each approach has its own merits. Among the most fascinating conceptual
frameworks under the cognitive science umbrella, however, is the field of artificial
neural networks. Scientists and
engineers working in this area are creating computer programs that, in some
sense, work like the brain does – not just on the overall conceptual level of
being intelligent, but by simulating the dynamics of interaction between brain
cells. So far none of these neural net
programs is anywhere near as intelligent as the brain. But some of them have shed light on various
aspects of brain function (memory, learning, disorders like dyslexia), and
others are serving very useful functions in the world right now, embedded in
other software programs doing everything from translating handwriting into
typescript to filtering out porn on the Internet to helping diagnose problems
in auto engines.
There is a lot to be learned from these neural network programs’ successes – and from
their limitations. We can see just how
much, and how little, of what makes the human brain so powerful and wonderful
can be captured in simplified, small-scale models of its underlying low-level
mechanism.
Neural networks, today, are useful tools for understanding aspects of the dynamics of
the brain. And they’ve proved valuable
at solving some kinds of computer science and engineering problems. But they don’t form a viable approach to
understanding how the brain works in a holistic sense, nor for creating real
computational intelligence. For these
more ambitious goals, we’ll have to wait for a different kind of science.
Neurons
The brain is made of cells called neurons.
There are many other cells there too – glia, for example, that fill up
the space between the neurons. But all
the evidence – dating back to Ramón y Cajal at the end of the 19th century --
shows that neurons are the most important ones. Neurons in groups do amazing things, but neurons individually seem
to be relatively simple. A neuron is a
nerve cell. Nerve cells in your skin send information to the brain about what
you're touching; nerve cells in your eye send information to the brain about
what you're seeing. Nerve cells in the brain send information to the brain
about what the brain is doing. The
brain monitors itself -- that's what makes it such a complex and useful organ!
How the neuron works is by storing and transmitting electricity. We're all running on electricity! To get a
concrete sense of this, look at how electroshock therapy affects the brain – or
look at people who have been struck by lightning. One man who was struck by lightning was never again able to
feel in the slightest bit cold. He'd go
outside in his underwear on a freezing, snowy winter day, and it wouldn't
bother him one bit. The incredible jolt of electricity had done something weird
to the part of his nervous system that experienced cold....
The neuron can be envisioned as an odd sort of electrical machine, which takes
charge in through certain "input connections" and puts charge out
through its "output wire." Some of the wires give positive charge --
these are "excitatory" connections. Some give negative charge --
these are "inhibitory." But the trick is that, until enough charge
has built up in the neuron, it doesn't fire at all. When the magic
"threshold" value of charge is reached, all of a sudden it shoots its
load. From a low-level, mechanistic
view, this "threshold dynamic" is the basis of the incredible
complexity of what happens in the brain.
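The threshold behavior described above can be sketched in a few lines of code. This is a minimal illustrative model (the weights, inputs and threshold value are all invented for the example), not a biological simulation:

```python
# A minimal sketch of a threshold neuron. Positive weights model excitatory
# connections, negative weights model inhibitory ones; nothing happens until
# the accumulated charge reaches the firing threshold.

def neuron_fires(inputs, weights, threshold):
    """Return True if accumulated charge reaches the firing threshold."""
    charge = sum(x * w for x, w in zip(inputs, weights))
    return charge >= threshold

# Two excitatory inputs (+1.0) and one inhibitory input (-1.0), threshold 1.5:
# both excitatory inputs must be active, and the inhibitory one silent, to fire.
print(neuron_fires([1, 1, 0], [1.0, 1.0, -1.0], 1.5))  # True
print(neuron_fires([1, 1, 1], [1.0, 1.0, -1.0], 1.5))  # False: inhibition blocks firing
```

Note the all-or-nothing character of the output: the neuron's response is not proportional to its input, which is exactly the property the text goes on to exploit.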
Of course, the connections between neurons aren’t really as simple as
electrical wires – that’s just a metaphor.
In reality, each inter-neuron connection is mediated by a certain
chemical called a neurotransmitter.
There are hundreds of types of neurotransmitters. When we take drugs, the neurotransmitters in
our brain are affected, and neurons may fire when they otherwise wouldn’t, or
be suppressed from firing when they otherwise would.
And the near-consensus among neuroscientists is that most learning takes
place via modification of the connections between neurons. The idea is simple: not all connections between
neurons conduct charge with equal facility. Some let more charge through than
others; they have a "higher conductance." If these conductances can
be modified even a little, then the behavior of the overall neural network can
be modified drastically. This leads to a picture of the brain as being full of
"self-supporting circuits" -- circuits that reverberate and reverberate,
keeping themselves going. This concept goes back to Donald Hebb in the
1940's, and it's held up since then through all the advances in neuroscience.
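Hebb's principle, that neurons which fire together strengthen the connection between them, can be sketched as a one-line update rule. The learning rate and starting weight below are arbitrary illustrative values, not biological measurements:

```python
# A sketch of Hebbian learning: when the pre-synaptic and post-synaptic
# neurons fire together, the conductance (weight) of the connection grows.

def hebbian_update(weight, pre_active, post_active, rate=0.1):
    """Strengthen the connection when both neurons co-fire; otherwise leave it."""
    if pre_active and post_active:
        weight += rate
    return weight

w = 0.5
for _ in range(5):          # five co-firing events
    w = hebbian_update(w, True, True)
print(round(w, 2))          # 1.0: repeated co-firing has doubled the conductance
```

Small, repeated adjustments like this are what let a reverberating circuit entrench itself: each cycle of co-firing makes the next cycle easier.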
Christof Koch, a well-known neuroscientist, believes there is some
fairly subtle electrochemical information processing going on inside each
neuron, and in the extracellular space between neurons. If this is so, no one knows exactly what role it plays in
intelligence. A few mavericks go even
further, and argue that the neuron itself is a complex molecular computer. Anesthesiologist Stuart Hameroff has
suggested that the essence of intelligence lies in the dynamics of the neural cytoskeleton
-- in the molecular biology of the neuron's internal scaffolding. Roger Penrose, the famous British
mathematician, has taken this one step further, arguing that one needs a fancy theory of quantum
gravity to explain what's going on in the cytoskeletons of neurons, and tying
this in with the idea that quantum theory and consciousness are somehow
interrelated. There’s no real evidence for these
theories, but it’s an indicator of how little we understand about the brain
that these outlier theories can’t be 100% convincingly empirically refuted at
this stage.
High-Level Brain Structure
Neurons are the stuff of the brain; but they live at a much lower level than thoughts,
feelings, experiences, knowledge. Above
the level of interesting mind-stuff, on the other hand, we have the various
regions of the brain, with their various specialized purposes. In between is the interesting part – and the
least known.
One hears a lot about left brain versus right brain, but in fact the really
critical distinction is forebrain versus hindbrain and midbrain. The hindbrain is situated right around the
top of the neck. What it does is mostly to regulate the heart and lungs -- and
control the sense of taste. The
midbrain, resting on top of the hindbrain, integrates information from the
hindbrain and from the ears and eyes, and it appears to play a crucial role in
consciousness. Collectively, the
hindbrain and midbrain are referred to as the brainstem.
The hindbrain is old – it’s basically the same in humans as in reptiles,
amphibians, fish, and so forth. The
forebrain, on the other hand, is much
more complex in mammals than in other animals. It’s where our smarts live, and it has its own highly complex and
variegated structure. The mammalian
forebrain is subdivided into three parts: the hypothalamus, the thalamus and
the cerebrum, whose deeply folded outer layer is the cerebral cortex. Behind
the cerebrum, atop the brainstem, sits the cerebellum, a three-layered
structure that serves mainly to coordinate motor function. The cerebrum is the seat of intelligence -- the integrative center
in which complex thinking, perceiving and planning occur. Of how it works we have only a very vague idea.
And all this is just the coarsest level of high-level brain structure. With PET and fMRI brain scan equipment, scientists can go one level
deeper. By getting a rough picture of
which parts of the brain are becoming more active at what times, they can see
what parts of the brain are involved in what activities. So, for example, attention seems to involve three
different systems: one for vigilance or alertness, one for sort of executive
attention control, and one for disengaging attention from whatever it was
focused on before. Three totally different systems, all coming together to let
us be attentive to something in front of us.... Depending on which of the three
systems is damaged, you get different kinds of awareness deficits, different
neurological disorders.
All these different parts of the brain are made of neurons, passing
electricity around via neurotransmitters, in reverberating cycles and complex
patterns. But the different kinds of
neurons and neurotransmitters in the different parts of the brain, and the
different patterns of arrangement and connectivity of the neurons in the
different parts of the brain, obviously make a huge practical difference.
Atoms and molecules are a reasonable metaphor here. All brain regions are made of neurons, just
as all molecules are made of atoms.
Different atoms have very different properties, as do different types of
neurons. And different molecules,
formed from different types of atoms, can do very different things – just as
different brain regions are specialized for different functions. Only the very simplest molecules can, in
practice, be analyzed in terms of atomic physics – otherwise we have to use the
crude heuristics of theoretical chemistry.
Complex molecules like proteins can barely even be studied with computer
simulations; the physical chemistry is just too tricky. Similarly, so far, only the very simplest
brain regions and functions can be understood in terms of neurons and their
interactions. And complex brain
functions can’t even be understood via computer simulations, yet. There are too many neurons, and the
interconnectivity patterns and ensuing dynamics are just too tricky.
Artificial Neural Networks for Biology and Engineering
The model of the brain as a network of neurons, passing charge amongst each other,
is a crude approximation to the teeming complexity of the brain as it actually
is. But it has a number of
advantages. Simplicity and
comprehensibility, for instance. And,
just as critically, amenability to mathematical analysis and computer
simulation. Modeling all the molecules
in the brain as a set of equations is an impossible task, in practice. It might be necessary for a real
understanding of brain function, but I certainly hope not. Even modeling the chemical dynamics inside a
single neuron is a huge exercise, to which some excellent scientists devote
their entire lives, and which tells you extremely little about the functioning
of the brain as a whole and the emergence of intelligence from matter. On the other hand, if one is willing to view
the brain as a neural network, then one can construct mathematical and
computational models of the brain quite easily. It’s true that the brain contains roughly a hundred billion
neurons, and no mathematician can write down that many equations, and very few
computers can simulate a system that large with reasonable efficiency. But at least one can simulate small neural
networks, to get some kind of feel for how brain dynamics works.
This is the raison d’etre of the “neural networks” field, which was launched by the pioneering work of cyberneticists
Warren McCulloch and Walter Pitts in the early 1940's. What McCulloch and Pitts did in their first
research paper on neural networks was to prove that a simple neural network
model of the brain could do anything – could serve as a "universal computer." At this stage computers were still basically
an idea, but Alan Turing and others had worked out the theory of how computers
could work, and McCulloch and Pitts' work, right there at the beginning,
linked these ideas with brain science.
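The flavor of the universality result can be seen by building Boolean logic gates out of threshold neurons. Since NAND alone suffices to build any Boolean circuit, a network of threshold neurons can in principle compute anything a digital computer can. A sketch, using standard textbook weight choices rather than anything from the original paper:

```python
# Threshold logic units (TLUs) implementing Boolean gates, in the spirit of
# McCulloch and Pitts: each gate is a neuron with fixed weights and threshold.

def tlu(inputs, weights, threshold):
    """Fire (output 1) if the weighted input sum reaches the threshold."""
    return int(sum(x * w for x, w in zip(inputs, weights)) >= threshold)

AND = lambda a, b: tlu([a, b], [1, 1], 2)   # fires only if both inputs fire
OR  = lambda a, b: tlu([a, b], [1, 1], 1)   # fires if either input fires
NOT = lambda a:    tlu([a],    [-1],  0)    # an inhibitory connection

# NAND built from threshold neurons; NAND alone suffices for any Boolean circuit.
NAND = lambda a, b: NOT(AND(a, b))
print([NAND(a, b) for a in (0, 1) for b in (0, 1)])  # [1, 1, 1, 0]
```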
Artificial neural nets have come a long way since McCulloch and Pitts. But even after all these years, the main
thing this kind of work does is to expose our ignorance.
For instance, to decide which artificial neurons should connect to which, in an
artificial brain-simulator program, requires detailed biological knowledge that
is hard to come by. Deciding what types
of neurons to have in one’s toy neural net requires similarly elusive
knowledge, if simulating the brain accurately is really one’s goal. A few neural net specialists focus on
acquiring and utilizing this data – Stephen Grossberg of Boston University is
the chief example.
Gerald Edelman, who won a Nobel for his work in immunology, has taken a different
approach, trying to abstract beyond these difficult issues while remaining
within the computational brain modeling domain, by modeling networks of neuronal
groups rather than networks of individual neurons. He has proposed some detailed and
fascinating theories about the connectivity patterns of neuronal groups, and
the dynamics of neural group networks, and embodied these theories in computer
simulations of vision processing.
Edelman doesn’t consider himself a neural net theorist per se, but the
basic line of thinking and mathematical modeling is not far off.
Next, beyond the issue of the lack of sufficient detailed biological data to guide
detailed computer modeling, there’s the small question of what an artificial
neural network is supposed to do. As
even the most overintellectual of us have surely noticed at some time during
our lives, the brain is connected to a body.
On the other hand, most artificial neural networks are not attached to
any real sensors or actuators. There
are exceptions, such as neural nets used in vision processing or robotics. But other than these cases, neural network
programs must be carefully fed data, and must be carefully studied to decode
their dynamical patterns into meaningful results in the given application
context. It often turns out that
figuring out how to represent data to feed to a neural network program requires
more intelligence than the neural network program itself embodies.
In practice, the biological-modeling motivated research carried out by neural net
researchers like Grossberg and Edelman is fairly separate from more pragmatic,
engineering-oriented neural network research.
Most engineering applications of neural nets aren’t based on serious
brain modeling at all; rather, the brain’s neural network is taken as general
conceptual inspiration for the creation of an artificial neural network in
software, and then computationally efficient learning algorithms are applied to
this artificial neural network. It
seems likely that the standard learning algorithms for artificial neural nets
are actually more efficient at learning than the brain’s learning
algorithms – in the context of very small neural networks, say hundreds to
thousands of neurons. On the other
hand, they don’t scale up well to billions of neurons. The brain’s learning algorithms, by
contrast, work badly at the small scale but remarkably well at the large.
For instance, Rulespace (www.rulespace.com) uses a neural
network program to recognize Internet pornography. America Online uses it to filter out porn for customers who
request this feature. But the neural
net inside their program can’t actually read text. Rather, there’s other code that recognizes key words in Web
pages, and then uses the presence or absence of each key word in a Web page to
control whether a certain neuron in a neural network gets activated (given
simulated electricity) or not. Whether
or not the Web page counts as pornographic is then determined by whether a
special neuron (the “output neuron”) is active, after activation has
been given a while to spread through the network.
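The scheme just described can be sketched as follows. To be clear, this is not Rulespace's actual code; the keywords, weights and threshold here are invented stand-ins for whatever the real system learned:

```python
# A sketch of keyword-driven classification: key words found in a page
# activate input neurons, and a single output neuron sums the weighted
# activations. All keywords and weights below are hypothetical.

KEYWORDS  = ["keyword_a", "keyword_b", "keyword_c"]  # hypothetical trigger words
WEIGHTS   = [0.6, 0.5, 0.3]                          # hypothetical learned weights
THRESHOLD = 0.8

def classify_page(text):
    """Activate one input neuron per keyword present; the output neuron
    fires if the total weighted activation crosses the threshold."""
    activations = [1 if kw in text.lower() else 0 for kw in KEYWORDS]
    output = sum(a * w for a, w in zip(activations, WEIGHTS))
    return output >= THRESHOLD

print(classify_page("page mentioning keyword_a and keyword_b"))  # True (0.6 + 0.5)
print(classify_page("an innocuous page"))                        # False
```

Notice that all the "reading" happens outside the network, in the keyword-matching code; the net itself only weighs the evidence, which is the point made in the next paragraph.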
In applications like Rulespace, a neural network is being used as a simple
mathematical widget. Simulation of the
brain in any serious sense is not even being attempted. The connectivity pattern of the neurons, and
the inputs and outputs of the neurons, are engineered to yield intelligent
performance on the particular problem the neural net was designed to solve. This is pretty typical in the neural net
engineering world.
Real-world neural net engineering gets quite complex.
For instance, to get optimal performance in optical character recognition (OCR), instead of one neural
net, researchers have constructed modular nets, with numerous subnetworks. Each subnetwork learns something very
specific, then the subnetworks are linked together into an overall meta-network. One can train a single network to recognize
a given feature of a character – say a descender, or an ascender coupled
with a great deal of whitespace, or a collection of letters with little
whitespace and no ascenders or descenders.
But it is hard to train a single network to do several different things
– say, to recognize
letters with ascenders only, letters with descenders only, letters with both
ascenders and descenders, and letters with neither. Thus, instead of one large network, it pays to break things up
into a collection of smaller networks in a hierarchical architecture. If the network learned how to break itself
up into smaller pieces, one would have a very impressive system; but currently
this is not the case: the subnets are carefully engineered by humans.
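A minimal sketch of this hierarchical idea, with tiny hand-written feature detectors standing in for trained subnetworks (the letter lists and category labels are invented for illustration):

```python
# A sketch of a modular meta-network: each "subnetwork" detects one character
# feature, and the meta-network combines their outputs into a category.
# The detectors below are hand-written stand-ins, not trained networks.

def has_ascender(glyph):   # stand-in for a subnetwork trained on ascenders
    return any(c in glyph for c in "bdfhklt")

def has_descender(glyph):  # stand-in for a subnetwork trained on descenders
    return any(c in glyph for c in "gjpqy")

def meta_network(glyph):
    """Combine subnetwork outputs into a crude character category."""
    features = (has_ascender(glyph), has_descender(glyph))
    categories = {
        (True,  False): "ascender only",
        (False, True):  "descender only",
        (True,  True):  "both",
        (False, False): "neither",
    }
    return categories[features]

print(meta_network("b"))  # ascender only
print(meta_network("g"))  # descender only
```

The division of labor is fixed by the programmer, which is exactly the limitation the text notes: the network does not discover its own decomposition.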
Some experiments with fairly simple neural nets, such as the ones used in these
practical applications, have had fascinating parallels in human
psychology. In the mid-80’s, for
instance, Rumelhart and McClelland created an uproar with simple neural networks
that learn to conjugate verbs. They
showed that the networks, in the early stages of learning, made the same kinds
of errors as small children. Other
people, a few years later, trained neural nets to translate written words into sounds. If they destroyed some of the neurons and
connections in the network, they showed, they obtained a dyslexic neural
net -- which is the same thing that
happens if you lesion someone's brain in the right area. You can do the same sort of thing for epilepsy:
by twiddling the parameters of a neural network you can get it to have an
epileptic seizure. Of course, none of
this proves that these neural nets are good brain models in any detailed sense,
but it shows that they belong to a general class of “brain-like systems.”
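The lesioning experiments can be mimicked in miniature: destroy a random fraction of a toy network's connection weights and observe how its behavior degrades. The weight matrix and damage fraction below are arbitrary; this illustrates the method, not a model of reading:

```python
# A sketch of network "lesioning": zero out a random fraction of the
# connections, in rough analogy to the dyslexia experiments described.

import random

def lesion(weights, fraction, rng):
    """Return a copy of the weight matrix with a random fraction of
    connections destroyed (set to zero)."""
    damaged = [row[:] for row in weights]  # copy so the original survives
    for row in damaged:
        for j in range(len(row)):
            if rng.random() < fraction:
                row[j] = 0.0
    return damaged

weights = [[0.5, -0.3], [0.8, 0.1]]       # a toy 2x2 connection matrix
rng = random.Random(0)                    # fixed seed, so damage is reproducible
damaged = lesion(weights, 0.5, rng)
print(damaged)                            # roughly half the connections zeroed
```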
Perhaps the most fascinating neural net project around is Hugo de
Garis’s Artificial Brain project, originally conducted at ATR in Japan
and currently being pursued at Starlab, in Brussels. This is an attempt to create a hardware platform for advanced
artificial intelligence using special chips called Field-Programmable Gate
Arrays to implement neural networks. As
de Garis says, his CBM platform “will allow the assemblage of 10,000’s of
evolved neural net modules. Humanity can then truly start building artificial
brains. I hope to put 10,000 modules into a kitten robot brain….” This research remains unfinished, but it
will be interesting to see where it leads.
The Mystery of Neurodynamics
Most of the neural networks used in practical applications are pretty simple
dynamically. In order to get reliable
functionality in their particular domains, the neural networks involved are
restricted to very limited behavior regimes.
It’s easy to predict what they’ll do overall, although the details of
the activation spreading inside them may be wildly fluctuating. Neural nets constructed for biological
modeling are usually allowed a freer rein to evolve and grow, but this is
rarely the focus of research. In
general, the potential for really complex and subtle dynamics in neural
networks has hardly been explored at all.
And this is a shame, because the “threshold” behavior of a neuron conceals the
potential for immense dynamical complexity, of a very psychologically relevant
nature. Think about it: let's say a
neuron holds almost enough charge to meet the threshold requirement, but not
quite. Then a chance fluctuation increases its charge just a little bit. Its
total store of charge will be pushed over the threshold, and it will shoot its load.
A tiny change in input leads to a tremendous change in output -- the hallmark
of chaos. But this is just the
beginning. This neuron, which a tiny fluctuation has caused to fire, is going
to send its input to other neurons. Maybe some of these are also near the
threshold, in which case extra input will likely influence their behavior. And
these may set yet other neurons off -- et cetera. Eventually some of these
indirectly triggered neurons may feed back to the original neuron, setting it
off yet again, and starting the whole cycle from the beginning. The whole
network, in this way, can be set alive by the smallest fluke of chance! In this way you get chaos and complexity out
of these simple formal networks.
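This cascade is easy to demonstrate. In the sketch below, three threshold neurons are wired in a ring with invented weights; a starting charge of 0.99 dies out, while a tiny nudge to 1.00 sets the ring reverberating indefinitely:

```python
# A tiny recurrent threshold network showing sensitivity to small
# fluctuations. Weights and thresholds are arbitrary illustrative choices.

def step(charges, weights, threshold=1.0):
    """One update: neurons at or above threshold fire, sending charge onward."""
    fired = [c >= threshold for c in charges]
    new = [0.0] * len(charges)
    for i, f in enumerate(fired):
        if f:
            for j, w in enumerate(weights[i]):
                new[j] += w
    return new, fired

# A ring of three neurons, each feeding the next.
weights = [[0.0, 1.2, 0.0],
           [0.0, 0.0, 1.2],
           [1.2, 0.0, 0.0]]

quiet  = [0.99, 0.0, 0.0]   # just below threshold: nothing ever fires
nudged = [1.00, 0.0, 0.0]   # a tiny fluctuation pushes neuron 0 over

for _ in range(6):
    quiet,  f_q = step(quiet,  weights)
    nudged, f_n = step(nudged, weights)

print(f_q)  # [False, False, False]: the quiet network stays dead
print(f_n)  # [False, False, True]: the nudged ring is still reverberating
```

A 0.01 difference in initial charge produces qualitatively different long-run behavior, which is the hallmark-of-chaos point made above; the firing cycle that feeds back to the original neuron is exactly the "self-supporting circuit" of Hebb.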
The one biological researcher who has really grabbed hold of this aspect of neural
nets is Walter Freeman, who has shown, in his work with the olfactory part of
the brain, that real neural networks display the same complex and crazy
dynamics as artificial neural nets, and that this complex dynamics is critical
to how brains solve the problem of identifying smells. Now, if we use complex, near-chaotic
dynamics to identify smells, it’s hardly likely that the neural nets in our
frontal lobes use simple dynamics like those embodied in current neural network
based software products. We have a long
way to go before our toy neural net models catch up with the neural nets in our
heads!
Neural Nets and Real AI
Biologists and pragmatic computer scientists each have their own uses for neural network models,
their own neural network learning schemes, and so forth. My own personal interest in neural nets, on
the other hand, has been mainly
oriented neither toward brain modeling, nor towards immediate practical
engineering applications. Rather, it’s
been oriented toward real AI – toward the creation of truly intelligent
computer programs, programs that, like humans, know who they are and behave
autonomously in the world. Viewed in
this light, current neural network research comes up rather lacking. Now, this isn’t a tremendously shocking
conclusion, since with the exception of de Garis’s brain machine, researchers
are playing around with vastly smaller neural networks than the one the brain
contains. But it’s interesting to delve
into the precise reasons why current neural net work isn’t terribly relevant to
the task of artificial mind creation.
To understand the role of neural nets in the history of AI, one also has to
understand their opposition. When I
first started studying AI in the mid-1980’s, it seemed that AI researchers were
fairly clearly divided into two camps, the neural net camp and the logic-based
or rule-based camp. This isn’t quite so
true anymore, but it’s still a decent first order approximation.
Whereas neural nets try to achieve intelligence by simulating the brain, rule-based
models take a totally different approach.
They try to simulate the mind's ability to make logical, rational
decisions, without asking how the brain does this biologically. They trace back to a century of
revolutionary developments in mathematical logic, culminating in the
realization that Leibniz’s dream of a complete logical formalization of all
knowledge is actually achievable in principle, although very difficult in
practice.
Rule-based AI programs aren’t based on self-organizing networks of autonomous elements
like neurons, but rather on systems of simple logical rules. Intelligence is reduced to following
orders. In spite of some notable
successes in areas like medical diagnosis and chess playing and financial
analysis, the biggest thing this approach has taught us is that it’s really
hard to boil down intelligent behaviors into sets of rules – the sets of rules
are huge and variegated, and the crux of intelligence becomes the dynamic
learning of rules rather than the particular rules themselves.
Now, to most any observer not hopelessly caught up on one or another side of the
debate, it’s obvious that both of these ways of looking at the mind – rules or
neural nets -- are extremely limited.
True intelligence requires more than following carefully defined rules,
and it also requires more than random or rigidly laid-out links between a few
thousand artificial neurons.
My own attempt at a solution to this problem, in my “Webmind AI Engine” software
system, has been somewhat like Gerald Edelman’s. My AI program is based on entities called nodes, which are roughly
of the same granularity as Edelman’s “neuronal groups.” Nodes are a bit like neurons – they have a
threshold rule in them, and they’re connected by “links” that are a bit like
synapses – connections between neurons – in the brain. But the Webmind AI Engine’s nodes have a
lot more information in them than neurons.
And the links between nodes have more to them than the links in neural
net models – they’re not just conduits for simulated electricity; they have
specific meanings, sometimes similar to the meanings of logical rules in a
rule-based AI system.
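As a purely hypothetical illustration of this node-and-link idea (this is not actual Webmind code; the class, the link type and the numbers are invented), one might write:

```python
# A hypothetical sketch of nodes with typed, weighted links. Unlike a bare
# neuron, each node carries structured content, and each link carries a
# semantic type as well as a weight.

class Node:
    def __init__(self, content, threshold=1.0):
        self.content = content        # structured information, not just charge
        self.threshold = threshold
        self.activation = 0.0
        self.links = []               # outgoing typed links

    def link_to(self, target, link_type, weight):
        self.links.append((target, link_type, weight))

    def stimulate(self, amount):
        """Accumulate activation; spread along links when threshold is met."""
        self.activation += amount
        if self.activation >= self.threshold:
            for target, _type, weight in self.links:
                target.stimulate(amount * weight)
            self.activation = 0.0     # reset after firing

cat = Node("cat")
animal = Node("animal")
cat.link_to(animal, "inheritance", 0.8)   # a typed, meaningful link
cat.stimulate(1.0)
print(round(animal.activation, 2))        # 0.8: activation spread along the link
```

The point of the sketch is only the hybrid character of the design: threshold-and-spreading dynamics like a neural net, but links that mean something, like rules.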
The key intuition underlying Edelman’s and my approaches is to focus on the
intermediate level of brain/mind organization: larger than the neuron, smaller
than the abstract concept. The idea is
to view the brain as a collection of clusters of tens or hundreds of thousands
of neurons, each performing individual functions in an integrated way. One module might detect edges of forms in
the visual field, another might contribute to the conjugation of verbs. The network of neural modules is a network
of primitive mental processes rather than a network of non-psychological,
low-level cells (neurons). The key to brain-mind, in this view, lies in the way
the modules are connected to each other, and the way they process information
collectively. The brain is more than a
network of neurons connected according to simple patterns, and the mind is more
than an assemblage of clever algorithms or logical transformation rules. Intelligence is not following prescribed
deductive or heuristic rules, like IBM’s
super-rule-based-chess-player Deep Blue; but nor is intelligence the
adaptation of synapses in response to environmental feedback, as in current
neural net systems.
Intelligence involves these things, but at bottom intelligence is
something different: the self-organization and mutual intercreation of a network
of processes, embodying perception, action, memory and reasoning in a unified
way, and guiding an autonomous system in its interactions with a rich, flexible
environment.
Onward!
It’s far too early to write a conclusion for the field of neural network
research. Right now, things are in all
ways extremely primitive. The dynamics
of the brain has not been tremendously elucidated by experimentation with
neural net models – not yet. But maybe it will be, as these models become
larger (due to faster computers and bigger computer networks) and richer in
structure (due to greater understanding of the brain, or further development in
theoretical AI). So far, there aren’t
any major engineering problems that neural nets solve vastly better than other
non-brain-inspired algorithms. And so
far, nothing close to a fully functioning artificial bug, let alone an
artificial human brain, has been produced using neural net software – or any
other approach to AI. We’re at the
beginning of this fascinating area of cognitive science research, not the end. If neural net researchers are willing to
grow their field to embrace more and more of the multileveled complexity of the
three-pound enigma in our skulls, then we can expect more and more wonders to
emerge from their work as the years roll by.