Structure of Intelligence -- Copyright Springer-Verlag © 1993
The ideas of the previous chapters fit together into a coherent, symbiotic unit: the master network. The master network is neither a network of physical entities nor a simple, clever algorithm. It is rather a vast, self-organizing network of self-organizing programs, continually updating and restructuring each other. In previous chapters we have discussed particular components of this network; but the whole is much more than the sum of the parts. None of the components can be fully understood in isolation.
A self-organizing network of programs does not lend itself well to description in a linear medium such as prose. Figure 11 is an attempt to give a schematic diagram of the synergetic structure of the whole. But, unfortunately, there seems to be no way to summarize the all-important details in a picture. In Appendix 1, lacking a more elegant approach, I have given a systematic inventory of the structures and processes involved in the master network: optimization, parameter adaptation, induction, analogy, deduction, the structurally associative memory, the perceptual hierarchy, the motor hierarchy, consciousness, and emotion.
These component structures and processes cannot be arranged in a linear or treelike structure; they are fundamentally interdependent, fundamentally a network. At first glance it might appear that the master network is impossible, since it contains so many circular dependencies: process A depends on process B, which depends on process C, which depends on process A. But, as indicated in the previous chapters, each process can be executed independently -- just not with maximal effectiveness. Each process must do some proportion of its work according to crude, isolated methods -- but this proportion may be reduced to a small amount.
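The way a circular dependency can bootstrap itself from crude beginnings may be illustrated with a toy numerical model. The sketch below is not taken from the preceding chapters: the function `bootstrap`, the linear update rule, and the `coupling` parameter are all assumptions chosen only to make the idea concrete. Each process starts at its isolated ("crude") effectiveness and is then repeatedly improved by whatever the process it depends on can currently supply.

```python
def bootstrap(crude, depends_on, coupling, rounds):
    """Iterate a toy model of mutually dependent processes (illustrative only).

    crude[p]      -- effectiveness of process p when run in isolation
    depends_on[p] -- the process whose output p draws on
    coupling      -- assumed benefit gained per unit of the helper's effectiveness
    """
    eff = dict(crude)  # round 0: every process works by crude, isolated methods
    for _ in range(rounds):
        # Each process keeps its crude baseline and adds what its helper supplies.
        eff = {p: crude[p] + coupling * eff[depends_on[p]] for p in crude}
    return eff


crude = {"A": 0.1, "B": 0.1, "C": 0.1}
cycle = {"A": "B", "B": "C", "C": "A"}  # A depends on B, B on C, C on A
eff = bootstrap(crude, cycle, coupling=0.9, rounds=200)

# Proportion of each process's final effectiveness owed to crude methods.
crude_share = {p: crude[p] / eff[p] for p in crude}
```

In this toy setting the loop settles to a fixed point rather than a paradox, and with the assumed coupling of 0.9 the crude, isolated contribution ends up being about one tenth of each process's effectiveness.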
Figure 11 and Appendix 1 provide the background necessary for my central hypothesis: that the master network is both necessary and sufficient for intelligence.
As in Chapter 4, let us define general intelligence as the average of L-intelligence, S-intelligence, and R.S.-intelligence, without specifying exactly what sort of average is involved. Then, using the ideas of Chapters 4 and 5, one may easily prove the following:
Theorem 12.1: For any computable set of patterns C, and any degree D of general intelligence, there is some master network which has general intelligence D relative to C.
However, it is also clear from Chapter 5 that most of the master network is not essential for this result. In particular, we have:
Theorem 12.2: Theorem 12.1 holds even if the perceptual and motor control hierarchies only have one level each, and even if the global optimizer works by the Monte Carlo method.
In fact, even this assumes too much. The essential core of the master network consists of the induction processor, the global optimizer and the parameter adaptor. One may show:
Theorem 12.3: Theorem 12.1 holds even if all perception, induction and parameter adaptation are executed by Monte Carlo optimization, and the analogy and deduction processors do not exist.
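For readers unfamiliar with the term, Monte Carlo optimization in its simplest form is pure random search: sample candidates at random and keep the best one seen. The following is a minimal sketch under that reading; the names `monte_carlo_maximize`, `count_ones` and `random_bits` are hypothetical, and the bit-counting objective is only a stand-in for whatever function a real processor would optimize.

```python
import random


def monte_carlo_maximize(f, sample, trials, seed=0):
    """Pure random search: draw `trials` candidates and keep the best one."""
    rng = random.Random(seed)
    best_x, best_val = None, float("-inf")
    for _ in range(trials):
        x = sample(rng)   # draw one candidate at random
        val = f(x)        # evaluate the objective on it
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def count_ones(bits):
    """Toy objective: the number of 1s in a bit tuple."""
    return sum(bits)


def random_bits(rng, n=12):
    """Sample a uniformly random bit tuple of length n."""
    return tuple(rng.randint(0, 1) for _ in range(n))


best, value = monte_carlo_maximize(count_ones, random_bits, trials=2000)
```

The method uses no structure of the problem at all, which is why it suffices for an existence theorem while saying nothing about how many trials, and hence how much machinery, a given level of performance demands.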
FEASIBLE INTELLIGENCE
The problem is that Theorem 12.1 and its offshoots do not say how large a master network needs to be in order to attain a given degree of intelligence. This is absolutely crucial. As discussed in Chapter 11, it is not physically possible to build a Turing machine containing arbitrarily many components and also working reasonably fast. But it is also not possible to build a quantum computer containing arbitrarily many components and also working reasonably fast. Quantum computers can be made smaller and more compact than classical Turing machines, but Planck's constant would appear to give a minimum limit to the size of any useful quantum computer component. With this in mind, I make the following hypothesis:
Hypothesis 12.1: Intelligent computers satisfying the restrictions imposed by Theorem 12.3, or even Theorem 12.2, are physically impossible if C is, say, the set of all N'th order Boolean functions (N is a very large number, say a billion or a trillion).
This is not a psychological hypothesis, but it has far-reaching psychological consequences, especially when coupled with the hypotheses made at the end of Chapter 4, which may be roughly paraphrased as
Hypothesis 12.2: Every generally intelligent system, relative to the C mentioned in Hypothesis 12.1, contains a master network as a significant part of its structure.
Taken together, these two hypotheses imply that every intelligent system contains every component of the master network.
In conclusion, I think it is worth getting even more specific:
Hypothesis 12.3: Intelligent computers (relative to the C mentioned in Hypothesis 12.1) in which a high proportion of the work of each component of the master network is done independently of the other components are physically impossible.
All this does not imply that every intelligent system -- or any intelligent system -- contains physically distinct modules corresponding to "induction processor", "structurally associative memory," and so on. The theorems imply that the master network is a sufficient structure for intelligence. And the hypotheses imply that the master network is a necessary part of the structure of intelligence. But we must not forget the definition of structure. All that is being claimed is that the master network is a significant pattern in every intelligent system. According to the definition given in Chapter 4, this means that the master network is a part of every mind. And, referring back to the definition of pattern, this means nothing more or less than the following: representing (looking at) an intelligent system in terms of the master network always yields a significant amount of simplification; and one obtains more simplification by using the entire master network than by using only part.
PHILOSOPHY OR SCIENCE?
To demonstrate or refute these hypotheses will require not only new mathematics but also new science. It is clear that, according to the criterion of falsification, the hypotheses are indeed scientific. For instance, Hypothesis 12.2 could be tested as follows:
1) prove that system X is intelligent by testing its ability to optimize a variety of complex functions in a variety of structurally sensitive environments
2) write the physical equations governing X, mathematically determine the set of all patterns in X, and determine whether the master network is a significant part of this set
We cannot do this experiment now. We must wait until someone constructs an apparently intelligent machine, or until neuroscientists are able to derive the overall structure of the brain from the microscopic equations. But, similarly, no one is going to make a direct test of the Big Bang theory of cosmology or the theory of evolution by natural selection, at least not any time soon. Sometimes, in science, we must rely on indirect evidence.
The theory of natural selection is much simpler than the theory of the master network, so indirect evidence is relatively easy to find. The Big Bang theory is a slightly better analogy: it is not at all simple or direct. But the theories of differential geometry, functional analysis, differential equations and so forth permit us to deduce a wide variety of indirect consequences of the original hypothesis. In principle, it should be possible to do something similar for the theory of the master network. However, the master network involves a very different sort of mathematics -- theoretical computer science, algorithmic information theory, the theory of multiextremal optimization, etc. These are very young fields and it will undoubtedly be difficult to use them to derive nontrivial consequences of the theory of the master network.
A theory of mind and a theory of brain are two very different things. I have sketched an abstract Platonic structure, the master network, and claimed that the structure of every intelligent entity must contain a component approximating this structure. But it would be folly to deny that different entities may approximate this structure in very different ways.
A general study of the emergence of minds from physical systems would require a general theory of networks of programs. But of course no such theory presently exists (see Appendix 2 for a summary of certain preliminary results in the field). Thus we cannot give a comprehensive answer to the question: what sorts of machines, if constructed, would be able to think? In talking about thinking machines, we will have to be contented with very specialized considerations, improvising on the themes of computer science and neuroscience.
Most of the workings of the brain are still rather obscure. We have an excellent understanding of the workings of individual brain cells (Hille, 1984); and we have long known which regions of the brain concentrate on which functions. What is lacking, however, is a plausible theory of the intermediate scale. The study of the visual cortex, reviewed above, has brought us a great deal closer to this goal. But even here there is no plausible theory relating thoughts, feelings and "mind's-eye" pictures to the microscopic details of the brain.
In Chapters 6 and 10, lacking a truly effective theory of intermediate-level brain structure, we have made use of what I consider to be the next best thing: Edelman's "Neural Darwinism," a slightly speculative but impressively detailed model of low-to-intermediate-scale brain structure. I suspect that Neural Darwinism is incapable of explaining the higher levels of cognition and memory; but, be that as it may, the theory is nonetheless essential. As suggested in Chapters 6 and 10, it indicates how one might go about establishing a nontrivial connection between brain and mind. And furthermore, it leads to several interesting ideas as to how, given sufficient technology, one might go about constructing an intelligent machine. In closing, let us sketch one of these ideas.
OMPs, AMPs and nAMPs
In what follows I will speculate as to what global neural structure might conceivably look like. This should not be considered a theory of the brain but a design for a brain, or rather a sketch of such a design -- an indication of how one might draw blueprints for a thinking machine, based loosely on both the idea of the master network and the theory of Neural Darwinism. The "zero level" of this design consists of relatively sophisticated "optimization/memory processors" or OMPs, each of which stores one function or a fairly small set of related functions, and each of which has the capacity to solve optimization problems over the space of discrete functions -- e.g. to search for patterns in an input -- using the functions which it stores as initial guesses or "models". For instance, the multi-leveled "Neural Darwinist" network of maps described at the end of Chapter 10 could serve as an OMP. It is biologically plausible that the brain is composed of a network of such networks, interconnected in a highly structured manner.
Next, define an "analogy-memory processor," an AMP, as a processor which searches for patterns in its input by selecting the most appropriate -- by induction/analogy/deduction -- from among an assigned pool of OMPs and setting them to work on it. Each AMP is associated with a certain specific subset of OMPs; and each AMP must contain within it procedures for general deductive, inductive and analogical reasoning, or reasonable approximations thereof. Also, each AMP must be given the power to reorganize its assigned pool of OMPs, so as to form a structurally associative memory. There should be a large amount of duplication among the OMP pools of various AMPs.
And similarly, define a "second-level analogy-memory processor," a 2AMP, as a processor which assigns to a given input the AMP which it determines -- by induction/analogy/deduction -- will be most effective at recognizing patterns in it. Define a 3AMP, 4AMP, etc., analogously. Assume that each nAMP (n>1) refers to and has the power to reorganize into rough structural associativity a certain pool of (n-1)AMPs.
Assume also that each nAMP, n=2,..., can cause the (n-1)AMPs which it uses frequently to be "replicated" somehow, so that it can use them as often as necessary. And assume that each AMP can do the same with OMPs. Physically speaking, perhaps the required (n-1)AMPs or OMPs could be put in the place of other (n-1)AMPs or OMPs which are almost never used.
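The recursive shape of this design can be made concrete with a small data-structure sketch. Everything in it is an assumption made for illustration: the class names `OMP` and `NAMP`, the position-matching score that stands in for genuine optimization, the `preview` heuristic that stands in for selection by induction/analogy/deduction, and the usage-count ordering that stands in for structurally associative memory. It shows only how routing, reorganization and replication might fit together, not how any of them should actually be carried out.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class OMP:
    """Zero-level processor holding one stored model (here just a tuple)."""
    model: tuple
    uses: int = 0

    def preview(self, data):
        # Crude relevance estimate: fraction of positions matching the model.
        hits = sum(a == b for a, b in zip(self.model, data))
        return hits / max(len(self.model), len(data), 1)

    def process(self, data):
        # Stand-in for pattern search seeded by the stored model.
        self.uses += 1
        return self.preview(data)


@dataclass
class NAMP:
    """Level-n processor (n >= 1) routing inputs to level-(n-1) processors."""
    level: int
    pool: list = field(default_factory=list)
    uses: int = 0

    def preview(self, data):
        return max(child.preview(data) for child in self.pool)

    def process(self, data):
        self.uses += 1
        # Placeholder for selection by induction/analogy/deduction.
        chosen = max(self.pool, key=lambda child: child.preview(data))
        return chosen.process(data)

    def reorganize(self):
        # Placeholder for structural associativity: most-used children first.
        self.pool.sort(key=lambda child: child.uses, reverse=True)

    def replicate_frequent(self):
        # Put a copy of the most-used child in the place of the least-used one.
        self.reorganize()
        if len(self.pool) > 1 and self.pool[0].uses > self.pool[-1].uses:
            self.pool[-1] = copy.deepcopy(self.pool[0])


amp_a = NAMP(level=1, pool=[OMP((0, 0, 1, 1)), OMP((1, 1, 1, 1))])
amp_b = NAMP(level=1, pool=[OMP((1, 0, 1, 0)), OMP((0, 0, 0, 0))])
top = NAMP(level=2, pool=[amp_a, amp_b])

score = top.process((1, 0, 1, 0))  # routed to amp_b, then to its matching OMP
top.replicate_frequent()           # the unused amp_a slot now holds a copy of amp_b
```

A 3AMP or higher is obtained simply by placing `NAMP` objects of the next lower level in the pool, which is what gives the structure its "network of networks" character.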
A high-level nAMP, then, is a sort of fractal network of networks of networks of networks... of networks. It is, essentially, an additional control structure imposed upon the Neural Darwinist network of maps. I suspect that the Neural Darwinist network of maps, though basically an accurate model, is inadequately structured -- and that, in order to be truly effective, it needs to be "steered" by external processes.
I will venture the hypothesis that, if one built an nAMP with, say, 15 levels and roughly the size and connectivity of the human brain -- and equipped it with programs embodying a small subset of those special techniques that are already standard in AI -- it would be able to learn in roughly the same way as a human infant. All the most important aspects of the master network are implicitly or explicitly present in the nAMP: induction, pattern recognition, analogy, deduction, structurally associative memory, and the perception and motor control hierarchies.
In conclusion: the nAMP, whatever its shortcomings, is an example of a design for an intelligent machine which is neither AI-style nor neural-network-style. It is neither an ordinary program nor an unstructured assemblage of programs; nor a self-organizing network of neurons or neural clusters without coherent global structure. It is a program, and a network of physical entities -- but more importantly it is a network of networks of networks ... of networks of programs; a network of networks of networks... of networks of neural clusters. In this context it seems appropriate to repeat Hebb's words, quoted above: "it is... on a class of theory that I recommend you to put your money, rather than any specific formulation that now exists." The details of the nAMP are not essential. The point is that, somehow, the dynamics of neurons and synapses must intersect with the abstract logic of pattern. And the place to look for this intersection is in the behavior of extremely complex networks of interacting programs.