Though there is a vast psychological literature on intelligence, it contains surprisingly few insights into the foundational questions which interest us here: what is intelligence, and how can it, practically or theoretically, be quantified? The problem is that, as Robert Sternberg has observed, theories of intelligence are not all theories of the same thing. Rather, they tend to be theories of different aspects of intelligence. To make matters worse, the theorists who propose these theories rarely make it clear just what aspects of intelligence their theories embrace (1987, p.141).
The psychology of intelligence has dwelled on the context-specific and the easily measurable. But transcending the bounds of particular contexts is what intelligence is all about; and there is no reason to expect this ability to be easy to gauge.
The confusion may be traced back to the turn of the century. First, Galton (1883) analyzed intelligence as a combination of various psychophysical abilities, everything from strength of grip to reaction time. And then, not too much later, Binet and Simon (1916) proposed that intelligence is a matter of problem solving, logical reasoning and spatial judgement. Binet's approach was of more immediate practical use -- it led to the I.Q. test, which is fairly good at predicting certain aspects of behavior; e.g. at predicting which children are capable of benefiting from schooling. But aspects of Galton's theory have recently been revived (Carroll, 1976; Jensen, 1982). It is now clear that mental speed is closely connected with intelligence; and some modern psychologists (Hunt, 1978; Jensen, 1979) have advocated studying intelligence in terms of quantities such as speed of lexical access. Now it is recognized that the ideas of Galton and Binet, though at first glance contradictory, are on most important points complementary: they refer to different aspects of intelligence.
Just as modern psychology has integrated the ideas of Galton and Binet, Sternberg's "triarchic theory" proposes to synthesize several apparently contradictory currents in the contemporary psychology of intelligence. It seeks to understand the interconnections between: 1) the structures and processes underlying intelligent behavior, 2) the application of these structures to the problem of attaining goals in the external world, and 3) the role of experience in molding intelligence and its application. Sternberg's triarchic theory is useful here, not because its details are particularly similar to those of the mathematical theory to be presented below, but rather because it provides a convenient context for relating this abstract mathematics to contemporary psychological research. The triarchic theory begins with mainstream psychology and arrives at the somewhat radical hypothesis that, although intelligence can be defined only relative to a certain context, there are certain universal structures underlying all intelligent behavior.
STRUCTURES AND PROCESSES
In the triarchic theory, the structures and processes underlying intelligence are divided into three different categories: metacomponents, performance components, and knowledge-acquisition components. From the point of view of internal structure, intelligence is understood as a problem-solving activity which is allocated specific problems from some external source.
Metacomponents have to do with the high-level management of problem-solving: deciding on the nature of the problem with which one is confronted, selecting a problem-solving strategy, selecting a mental representation of the problem, allocating mental resources to the solution of the problem, monitoring problem-solving progress, and so on. Studies show that all of these factors are essential to intelligent performance at practical tasks (MacLeod, Hunt and Mathews, 1978; Kosslyn, 1980; Hunt and Lansman, 1982).
Metacomponents direct the search for solutions; but they do not actually provide answers to problems. The mental structures which do this are called performance components. These are of less philosophical interest than metacomponents, because the human mind probably contains thousands of different special-case problem-solving algorithms, and there is no reason to suppose that every intelligent entity must employ the same ones. Most likely, the essential thing is to have a very wide array of performance components with varying degrees of specialization.
For example, consider a standard analogy problem: "lawyer is to client as doctor is to a) patient b) medicine". Solving this problem is a routine exercise in induction. Given three entities W, X and Y:
1) the memory is searched for two entities W and X,
2) a relation R(W,X) between the two entities is inferred from the memory,
3) the memory is searched for some Z so that R(Y,Z) holds.
This process is a performance component, to be considered in much more detail in the following chapter. It is not "low-level" in the physiological sense; it requires the coordination of three difficult tasks: locating entities in memory based on names, inferring relations between entities, and locating entities in memory based on abstract properties. But it is clearly on a lower level than the metacomponents mentioned above.
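To make the three-step procedure above concrete, here is a minimal sketch in Python. The toy relation store and the names used (MEMORY, solve_analogy) are hypothetical illustrations introduced only for this example, not part of the formal apparatus developed in this book.

```python
# A toy memory: a set of (relation, entity1, entity2) triples.
MEMORY = {
    ("serves", "lawyer", "client"),
    ("serves", "doctor", "patient"),
    ("prescribes", "doctor", "medicine"),
}

def solve_analogy(w, x, y, memory=MEMORY):
    """Solve 'W is to X as Y is to ?' by the three steps listed above."""
    # Steps 1 and 2: locate W and X in memory and infer a relation R(W,X).
    relations = {r for (r, a, b) in memory if a == w and b == x}
    # Step 3: search memory for some Z such that R(Y,Z) holds.
    return [b for (r, a, b) in memory if a == y and r in relations]

print(solve_analogy("lawyer", "client", "doctor"))  # -> ['patient']
```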
Neisser (1983), among others, believes that the number of performance components is essentially unlimited, with new performance components being generated for every new context. In this point of view, it is futile to attempt to list the five or ten or one hundred most important problem solving algorithms; the important thing is to understand how the mind generates new algorithms. There is certainly some truth to this view. However, it may be argued that there are some relatively high-level performance components which are of universal significance -- for instance, the three forms of analogy to be discussed in the following chapter. These general algorithms may be used on their own, or in connection with the more specific procedures in which Neisser, Hunt (1980), Jensen (1980) and others are interested.
This brings us to the knowledge acquisition components of intelligence: those structures and processes by which performance components and metacomponents are learned. For example, three essential knowledge acquisition components are: sifting out relevant from irrelevant information, detecting significant coincidences (Barlow, 1985), and fusing various bits of information into a coherent model of a situation. These three abilities will be considered in detail in later chapters.
The importance of effective knowledge acquisition for intelligence is obvious. The ability to speed-read will help one perform "intelligently" on an I.Q. test; and the ability to immediately detect anomalous features of the physical environment will help one perform intelligently as a detective. One might argue that factors such as this do not really affect intelligence, but only the ability to put intelligence to practical use. However, intelligence which is not used at all cannot be measured; it is hard to see how it could even be studied theoretically. The mathematical theory of intelligence to be given below provides a partial way around this dilemma by admitting that one part of a mind can be intelligent with respect to another part of the mind even if it displays no intelligent behavior with respect to the external environment.
INTELLIGENCE AND EXPERIENCE
The experiential approach to intelligence begins with the idea that most behavior is "scripted" (Schank and Abelson, 1977). Most actions are executed according to unconscious routine; and strict adherence to routine, though certainly the intelligent thing to do in many circumstances, can hardly be called the essence of intelligence. It would rather seem that the core of intelligence is to be found in the learning of new scripts or routines.
For instance, one might focus on the rate at which newly learned scripts are "automatized". The faster a behavior is made automatic, the faster the mind will be free to focus on learning other things. Or one could study the ability to deal with novel situations, for which no script yet exists. Insight, the ability to synthesize appropriate new metacomponents, performance components and even knowledge acquisition components, is essential to intelligence. It has been extensively studied under the label "fluid intelligence" (Snow and Lohman, 1984).
The relevance of insight to tests such as the I.Q. test is a controversial matter (Sternberg, 1985). It would seem that most I.Q. test problems involve a fixed set of high-level metacomponents, as well as a fixed set of performance components: analogical, spatial and logical reasoning procedures. In other words, in order to do well on an I.Q. test, one must know how to manage one's mind in such a way as to solve puzzles fast, and one must also have a mastery of a certain array of specialized problem-solving skills. However, in this example one sees that the dichotomy between metacomponents and performance components is rather coarse. It would seem that, to do well on an I.Q. test, one has to have a great deal of insight on an intermediate plane: on a level between that of specific problem-solving methods and that of overall management strategies. One must have a mastery of appropriate high-level and low-level scripts, and an ability to improvise intermediate-level behavior.
INTELLIGENCE AND CONTEXT
One may look at intelligence as an array of structures and processes directed toward the solution of specific, externally given problems. One may understand intelligence as the learning of new structures and processes. Or -- third in Sternberg's triarchy -- one may hypothesize that
intelligent thought is directed toward one or more of three behavioral goals: adaptation to an environment, shaping of an environment, or selection of an environment. These three goals may be viewed as the functions toward which intelligence is directed: Intelligence is not aimless or random mental activity that happens to involve certain components of information processing at certain levels of experience. Rather, it is purposefully directed toward the pursuit of these three global goals, all of which have more specific and concrete instantiations in people's lives. (1987, p.158)
This contextual approach to intelligence has the advantage that it is not biased toward any particular culture or species.
For instance, Cole, Gay and Sharp (1971) asked adult Kpelle tribesmen to sort twenty familiar objects, putting each object in a group with those objects that "belonged" with it. Western adults tend to sort by commonality of attributes: e.g. knives, forks and spoons together. But Western children tend to sort by function: e.g. a knife together with an orange. The Kpelle sorted like Western children -- but the punchline is, when asked to sort the way a stupid person would, they sorted like Western adults. According to their culture, what we consider intelligent is stupid; and vice versa. By asking how well a person has adapted to their environment, rather than how well a person does a certain task, one can to some extent overcome such cultural biases.
Sternberg distinguishes adaptation to an environment from shaping an environment and selecting an environment. In the general framework to be presented below, these three abilities will be synthesized under one definition. These technicalities aside, however, there is a serious problem with defining intelligence as adaptation. The problem is that the cockroach is very well adapted to its environment -- probably better adapted than we are. Therefore, the fact that an entity is well adapted to its environment does not imply that it is intelligent. It is true that different cultures may value different qualities, but the fact that a certain culture values physical strength over the ability to reason logically does not imply that physical strength is a valid measure of intelligence.
Sternberg dismisses this objection by postulating that
the components of intelligence are manifested at different levels of experience with tasks and in situations of varying degrees of contextual relevance to a person's life. The components of intelligence are... universal to intelligence: thus, the components that contribute to intelligence in one culture do so in all other cultures as well. Moreover, the importance of dealing with novelty and automatization of information processing to intelligence are... universal. But the manifestations of these components in experience are... relative to cultural contexts (1987, p. 168).
This is a powerful statement, very similar to one of the hypotheses of this book: that there is a universal structure of intelligence. However, psychology brings us only this far. Its conceptual tools are not adequate for the problem of characterizing this structure in a general, rigorous way.
Having just reviewed certain aspects of the psychological perspective on intelligence, it is worth observing how different the engineering perspective is. As one might expect, engineers have a much simpler and much more practical definition of intelligence.
Control theory deals with ways to cause complex machines to yield desired behaviors. Adaptive control theory deals with the design of machines which respond to external and internal stimuli and, on this basis, modify their behavior appropriately. And the theory of intelligent control simply takes this one step further. To quote a textbook of automata theory (Aleksander and Hanna, 1976):
[An] automaton is said to behave "intelligently" if, on the basis of its "training" data which is provided within some context together with information regarding the desired action, it takes the correct action on other data within the same context not seen during training.
This is the sense in which contemporary "artificial intelligence" programs are intelligent. They can generalize within their limited context: they can follow the one script which they are programmed to follow.
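The automata-theoretic criterion quoted above can be illustrated with a very small sketch: train on labeled data from one context, then take the "correct" action on data from the same context not seen during training. The nearest-neighbour rule below is merely one simple way of generalizing, chosen for this example; it is not suggested by Aleksander and Hanna.

```python
# Sketch of "intelligent" automaton behavior in the sense of the quote above:
# generalize from training data to unseen data within the same context.

def nearest_neighbour_action(training_data, x):
    """training_data: list of (input_value, desired_action) pairs."""
    closest_input, action = min(training_data, key=lambda pair: abs(pair[0] - x))
    return action

# The "context": classifying numbers as small or large.
training = [(1, "small"), (2, "small"), (9, "large"), (10, "large")]
print(nearest_neighbour_action(training, 3))   # unseen datum -> 'small'
print(nearest_neighbour_action(training, 8))   # unseen datum -> 'large'
```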
Of course, this is not really intelligence, not in the psychological sense. It is true that modern "intelligent" machines can play championship chess and diagnose diseases from symptoms -- things which the common person would classify as intelligent behavior. On the other hand, virtually no one would say that walking through the streets of New York requires much intelligence; and yet not only human beings but also rats do it with little difficulty, while no machine yet can. Existing intelligent machines can "think" within their one context -- chess, medical diagnosis, circuit design -- but they cannot deal with situations in which the context continually shifts, not even as well as a rodent can.
The above quote defines an intelligent machine as one which displays "correct" behavior in any situation within one context. This is not psychologically adequate, but it is on the right track. To obtain an accurate characterization of intelligence in the psychological sense, one need merely modify this wording. In their intriguing book Robots on Your Doorstep, Winkless and Browning (1975) have done so in a very elegant way:
Intelligence is the ability to behave appropriately under unpredictable conditions.
Despite its vagueness, this criterion does serve to point out the problem with ascribing intelligence to chess programs and the like: compared to our environment, at least, the environment within which they are capable of behaving appropriately is very predictable indeed, in that it consists only of certain (simple or complex) patterns of arrangement of a very small number of specifically structured entities.
Of course, the concept of appropriateness is intrinsically subjective. And unpredictability is relative as well -- to a creature accustomed to living in interstellar space and inside stars and planets as well as on the surfaces of planets, or to a creature capable of living in 77 dimensions, our environment might seem just as predictable as the universe of chess seems to us. In order to make this folklore definition precise, we must first of all confront the vagueness inherent in the terms "appropriate" and "unpredictable."
Toward this end, let us construct a simple mathematical model. Consider two computers: S (the system) and E (the environment), interconnected in an adaptive manner. That is, let S_t denote the state of the system at time t, and let E_t denote the state of the environment at time t. Assume that S_t = f(S_{t-1}, E_{t-1}) and E_t = g(S_{t-1}, E_{t-1}), where f and g are (possibly nondeterministic) functions characterizing S and E. What we have, then, is a discrete dynamical system on the set of all possible states SxE: an apparatus which, given a (system state, environment state) pair, yields the (system state, environment state) pair which is its natural successor. We need to say what it means for S to behave "appropriately", and what it means for E to be "unpredictable".
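As a concrete illustration of this setup, the following sketch simulates such a coupled pair of "computers". The particular update rules f and g are arbitrary choices made only for the example; nothing in the argument depends on them.

```python
import random

# Coupled system/environment dynamics: S_t = f(S_{t-1}, E_{t-1}),
# E_t = g(S_{t-1}, E_{t-1}), on a toy state space {0, ..., 99}.

def f(s, e):
    """System update: a (here mildly nondeterministic) function of both prior states."""
    return (s + e + random.choice([0, 1])) % 100

def g(s, e):
    """Environment update, likewise a function of both prior states."""
    return (2 * e + s) % 100

def trajectory(s0, e0, steps):
    """Iterate the discrete dynamical system on SxE."""
    states = [(s0, e0)]
    for _ in range(steps):
        s, e = states[-1]
        states.append((f(s, e), g(s, e)))
    return states

print(trajectory(3, 7, 5))
```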
Intuitively, a system is unpredictable if a lot of information about its past state tends to yield only a little information about its future state. There are many different ways to make this precise. Here we shall consider four different definitions of unpredictability, three of them original.
Let us consider a discrete dynamical system (f,X), which consists of a "state space" X and a function f mapping X into X. For the details to follow, it should be assumed that X is a finite space, so that concepts of algorithmic complexity may be easily applied. But in fact, the ideas are much more general; they apply to any metric space X. A trajectory of the system (f,X) is a sequence (x, f(x), f^2(x), ...), where f^n(x) = f(f(...f(x)...)), the n'th iterate of f applied to x.
In this notation, we may define the Liapunov sensitivity, or L.-sensitivity, of a dynamical system as follows:
Definition 4.1: The L.-sensitivity K(a,n) of a dynamical system (f,X) at a point x in X is defined as the average over all y so that d(x,y) < a of d(f^n(x), f^n(y)).
The function K tells you, if you know x to within accuracy a, how well you can estimate f^n(x).
Different choices of "averaging" function yield different definitions. The most common way of averaging two entities A and B is the arithmetic mean (A+B)/2, but there are other common formulas. For positive numbers such as we have here, there is the geometric mean (AB)^{1/2} and the power mean ((A^p + B^p)/2)^{1/p}. In general, a function A which takes in n real numbers and puts out another is said to be an average if min(x_1,...,x_n) ≤ A(x_1,...,x_n) ≤ max(x_1,...,x_n) for all n-tuples of numbers x_1,...,x_n.
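For concreteness, the three formulas just named can be written out as follows; the only point being illustrated is that each of them stays between the minimum and the maximum of its arguments.

```python
# The arithmetic, geometric and power means of two positive numbers A and B.

def arithmetic_mean(a, b):
    return (a + b) / 2

def geometric_mean(a, b):
    return (a * b) ** 0.5

def power_mean(a, b, p):
    return ((a ** p + b ** p) / 2) ** (1 / p)

# Every average lies between the min and the max of its arguments:
for mean in (arithmetic_mean, geometric_mean, lambda a, b: power_mean(a, b, 3)):
    assert 2 <= mean(2, 8) <= 8
```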
If the average of a set of n numbers is defined as the maximum element of the set, and X is not a discrete space but a space of real numbers or vectors, then in many cases it is known that K(a,n) is equal to a exp(L(x)n), where L(x) is called the "Liapunov exponent" of the dynamical system (Collet and Eckmann, 1980). Often the Liapunov exponent is independent of x, i.e. L(x)=L. This exponent has the advantage of being easily computable. But the maximum function is not always a reasonable choice of average: if one is interested in making a guess as to what fn(x) is probably going to be, then one wants an average which (like, say, the arithmetic mean) does not give undue emphasis to unlikely situations.
To measure the sensitivity of a system, one merely averages the sensitivities at all points x in X. Here again, there is a choice to be made: what sort of average? But since we are speaking conceptually and not making explicit calculations, this need not bother us.
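As a rough illustration of how such a sensitivity might be estimated in practice, the sketch below samples points y with d(x,y) < a, iterates n times, and takes the arithmetic mean of d(f^n(x), f^n(y)). The logistic map is used only as a convenient example of a chaotic dynamic; it plays no special role in the theory.

```python
import random

def iterate(f, x, n):
    """Return f^n(x), the n'th iterate of f applied to x."""
    for _ in range(n):
        x = f(x)
    return x

def l_sensitivity(f, x, a, n, samples=1000):
    """Monte Carlo estimate of K(a,n) at x, using the arithmetic mean as the average."""
    fx = iterate(f, x, n)
    total = 0.0
    for _ in range(samples):
        y = x + random.uniform(-a, a)
        total += abs(iterate(f, y, n) - fx)
    return total / samples

logistic = lambda x: 4.0 * x * (1.0 - x)        # a standard chaotic map on [0, 1]
print(l_sensitivity(logistic, x=0.3, a=1e-6, n=20))
```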
Next, let us consider a form of unpredictability which has not previously been identified: structural sensitivity, or S-sensitivity.
Definition 4.2: The S.-sensitivity K(a,n) of a dynamical system (f,X) at a point x in X is defined as the average over all y so that d(x,y) < a of d#(x f(x)...f^n(x), y f(y)...f^n(y)).
This measures how sensitively the structure of a trajectory depends on its initial point.
Conversely, one may also consider reverse structural sensitivity, or R.S.-sensitivity -- roughly, how sensitively the point a trajectory passes through at time n depends on the structure of the trajectory up to that point. To be precise:
Definition 4.3: The R.S.-sensitivity K of a dynamical system (f,X) at a point x in X is defined as the average over all y so that d#(x f(x)...f^n(x), y f(y)...f^n(y)) < a of d(f^n(x), f^n(y)).
This is not so similar to L.-sensitivity, but it has a simple intuitive interpretation: it measures how well, from observing patterns in the behavior of a system, one can determine its immediate future state.
Finally, let us define what might be called structural-structural sensitivity, or S.S.-sensitivity.
Definition 4.4: The S.S.-sensitivity K(a,n,m) of a dynamical system (f,X) at a point x in X is defined as the average, over all y so that d#(x f(x)...f^n(x), y f(y)...f^n(y)) < a, of d#(f^n(x) f^{n+1}(x)...f^{n+m}(x), f^n(y) f^{n+1}(y)...f^{n+m}(y)).
This measures how difficult it is to ascertain the future structure of the system from its past structure.
What is essential here is that we are talking about the unpredictability of structure rather than the unpredictability of specific values. It doesn't matter how different two states are if they lead to similar structures, since (or so I will hypothesize) what the mind perceives is structure.
Theoretically, to measure the L.-, S.-, R.S.- or S.S.-sensitivity of a system, one merely averages the respective sensitivities at all points x in X. But of course, the word "measure" must be taken with a grain of salt.
The metric d# is, in general, an uncomputable quantity. For practical purposes, we must work instead with dC, the distance which considers only patterns in the computable set C. For example, C could be the set of all n'th order Boolean patterns, as discussed at the end of Chapter 3. If one replaces d# with dC in the above definitions, one obtains L.-, S.-, R.S.- and S.S.-sensitivities relative to C.
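Since d# itself cannot be computed, any practical estimate must substitute some computable stand-in for structural distance. In the sketch below a compression-based distance plays that role; this particular proxy is my own illustrative choice, not the dC of the text, and the dynamic being measured is an arbitrary map on a small symbolic state space.

```python
import random
import zlib

def compressed_len(data: bytes) -> int:
    return len(zlib.compress(data))

def d_struct(traj_a, traj_b) -> float:
    """Crude structural distance between two symbol sequences (a compression-based proxy)."""
    a, b = bytes(traj_a), bytes(traj_b)
    ca, cb, cab = compressed_len(a), compressed_len(b), compressed_len(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def trajectory(f, x0, n):
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

def s_sensitivity(f, x, a, n, samples=200):
    """Estimate Definition 4.2 at x: average structural distance between the trajectory
    from x and trajectories from nearby points y with |x - y| < a."""
    base = trajectory(f, x, n)
    total = 0.0
    for _ in range(samples):
        y = (x + random.randint(-a, a)) % 256
        total += d_struct(base, trajectory(f, y, n))
    return total / samples

f = lambda x: (5 * x + 1) % 256      # a hypothetical dynamic on {0, ..., 255}
print(s_sensitivity(f, x=17, a=3, n=50))
```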
Estimation of the sensitivity of a system in these various senses could potentially be quite valuable. For instance, if a system were highly sensitive to initial conditions, but not highly structurally sensitive, then although one could not reasonably predict the exact future condition of the system, one would be able to predict the general structure of the future of the system. If a system were highly structurally sensitive but not highly S.S.-sensitive, then, although knowledge of the present state would tell little about the future structure, knowledge of the past structure would tell a lot. If a system were highly R.S.-sensitive but not highly S.S.-sensitive, then by studying the structure of a system one could reasonably predict the future structure but not the exact future state. The precise relation between the various forms of unpredictability has yet to be explored, but it seems likely that all these combinations are possible.
It seems to me that the new sensitivity measures defined here possess a very direct relation to unpredictability as it occurs in real social, psychological and biological situations -- they speak of what studying a system, recognizing patterns in it, can tell about its future. L.-sensitivity, on the other hand, has no such connection. L.-sensitivity -- in particular, the Liapunov exponent -- is profoundly incisive in the analysis of intricate feedback systems such as turbulent flow. However, speaking philosophically, it seems that when studying a system containing feedback on the level of structure as well as the level of physical parameters, one should consider unpredictability on the level of structure as well as the level of numerical parameters.
In conclusion, I would like to make the following conjecture: that when the logic relating self-organization with unpredictability is untangled, it will turn out that real highly self-organizing systems (society, the brain, the ecosystem, etc.) are highly Liapunov sensitive, structurally sensitive and R.S.-sensitive, but are not nearly so highly S.S.-sensitive. That is: roughly speaking, it should turn out that by studying the structure of the past, one can tell something about the structure of the future, but by tracking or attempting to predict specific events one will get nowhere.
As above, let us consider dynamical systems on spaces SxE, where S is the state space of a system and E is the set of states of its environment. Such dynamical systems represent coevolving systems and environments.
We shall say that such a dynamical system contains an S.-sensitive environment to extent e if it is S.-sensitive to degree at least e for every system S; and so forth for L., R.S. and S.S.-sensitivity. One could modify this approach in several ways, for instance to read "for almost any system S," but at this stage such embellishments seem unnecessary. This concept addresses the "unpredictable conditions" part of our definition of intelligence: it says what it means for a system/environment dynamic to present a system with unpredictable conditions.
Next we must deal with "appropriateness". Denote the appropriateness of a state S_t in a situation E_{t-1} by A(S_t, E_{t-1}). I see no reason not to assume that the range of A is a subset of the real number line. Some would say that A should measure the "survival value" of the system state in the environment; or, say, the amount of power that S obtains from the execution of a given action. In any case, what is trivially clear is that the determination of appropriate actions may be understood as an optimization problem.
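On the simplest reading, then, the system's task at each moment is just to select whatever next state (or action) maximizes A. The toy sketch below makes this explicit; the particular A and the candidate states are, of course, hypothetical.

```python
def A(s, e_prev):
    """A toy appropriateness measure: reward states close to the prior environment state."""
    return -abs(s - e_prev)

def choose_state(candidate_states, e_prev):
    """Determining the appropriate action, understood as an optimization problem."""
    return max(candidate_states, key=lambda s: A(s, e_prev))

print(choose_state(range(10), e_prev=6))  # -> 6
```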
One might argue that it is unfair to assume that A is given; that each system may evolve its own A over the course of its existence. But then one is faced with the question: what does it mean for the system to act intelligently in the evolution of a measure A? In the end, on some level, one inevitably arrives at a value judgement.
Now we are ready to formulate the concept of intelligence in abstract terms, as "the ability to maximize A under unpredictable conditions". To be more precise, one might define a system to possess S.-intelligence with respect to A to degree ||h|| if it has "the ability to maximize A with accuracy g in proportion b of all environments with S.-sensitivity h(a,b,c)=abc", where || || is some measure of size, some norm. And, of course, one might define L.-, R.S.- and S.S.-intelligence with respect to A similarly.
But there is a problem here. Some functions A may be trivially simple to optimize. If A were constant then all actions would be equally appropriate in all situations, and intelligence would be a moot point. One may avoid this problem as follows:
Definition 4.5: Relative to some computable set of patterns C, a system S possesses S.-intelligence to a degree equal to the maximum over all A of the product [S.-intelligence of S with respect to A, relative to C]*[computational complexity of optimizing A]. L., R.S., and S.S.-intelligence may be defined similarly.
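Stated symbolically -- the symbols I and kappa below are my own shorthand, not notation used elsewhere in this book -- Definition 4.5 reads:

I_{S.}(S \mid C) \;=\; \max_{A} \big[\, I_{S.}(S, A \mid C) \cdot \kappa(A) \,\big],

where I_{S.}(S, A | C) is the S.-intelligence of S with respect to A relative to C, and kappa(A) is the computational complexity of optimizing A.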
This, finally, is our working definition of intelligence. In terms of Sternberg's triarchic theory, it is essentially a contextual definition. It characterizes the intelligence of a given entity in terms of its interactions with its particular environment; and what is intelligent in one environment may be unintelligent in another. Unfortunately, at present there is no apparent means of estimating the intelligence of any given entity according to this definition.
For simplicity's sake, in the following discussion I will often omit explicit reference to the computable set C. However, it is essential in order that intelligence be possible, and we will return to it in the final chapter. Anything that is done with d# can also be done with dC.
I believe that high S.S.-intelligence is, in general, impossible. The reason for this is that, as will become clear in Chapter 9, perception works by recognizing patterns; so that if patterns in the past are no use in predicting patterns in the future, mind has no chance of predicting anything. I suggest that intelligence works by exploiting the fact that, while the environment is highly L.-, S.- and R.S.-sensitive, it is not highly S.S.-sensitive, so that pattern recognition does have predictive value.
The master network, described in Chapter 12, is a system S which is intended to produce a decent approximation to appropriate behavior only in environments E for which the relevant dynamical system on SxE is not extremely S.S.-sensitive -- and not even in all such environments. It is hypothesized to be a universal structure among a certain subset of L., S.- and R.S.-intelligent systems, to be specified below. Thus, a more accurate title for this book would be The Structure of Certain Liapunov, Structural and Reverse Structural Intelligent Systems.
In other words: roughly speaking, the main goal of the following chapters is to explore the consequences of the contextual definition of intelligence just given -- to see what it implies about the structure and experiential dynamics of intelligence. To be more precise about this, we shall require a bit more formalism.
Let S be any system, as above. Let i_t and o_t denote the input to and output of S at time t, respectively. That is, o_t is that part of S_t which, if it were changed, could in certain circumstances cause an immediate change in E_{t+1}; and i_t is that part of E_t which, if it were changed, could in certain circumstances cause an immediate change in S_{t+1}.
Then we may define the behavioral structure of an entity S over the interval (r,s) as the fuzzy set B[S;(i_r,...,i_s)] = {Em(i_r,o_{r+1}), Em(i_{r+1},o_{r+2}), ..., Em(i_s,o_{s+1}), St[Em(i_r,o_{r+1}), Em(i_{r+1},o_{r+2}), ..., Em(i_s,o_{s+1})]}. This is a complete record of all the patterns in the behavior of S over the interval (r,s).
Then what is a model of S, on the interval (r,s)? It is a function M_S so that B[M_S;(i_r,...,i_s)] is as close to B[S;(i_r,...,i_s)] as possible. In other words, a good model is a simple function of which one can say "If S worked like this, it would have behaved very much the same as it actually did."
In order to specify what is meant by "close", one might define the magnitude of a fuzzy set Z, ||Z||, as the sum over all z of the degree to which z is an element of Z. Then ||Y-Z|| will be a measure of the size of the total difference between two fuzzy sets Y and Z.
For instance, assume M_S is a Turing machine program; then the best model of S might be defined as the function M_S which minimized |M_S| * ||B[M_S;(i_r,...,i_s)] - B[S;(i_r,...,i_s)]||, where |M_S| denotes the size of M_S (perhaps |M_S| = ||L(M_S)||T).
In general, one good way to go about finding models is to look for functions Y so that |Y| * ||[Y(S(i_p)),...,Y(S(i_q))] - [S(o_{p+1}),...,S(o_{q+1})]|| is small on some interval (p,q). Such functions -- simple models of the structures of particular behaviors -- are the building blocks out of which models are made. Combining various such functions can be a serious problem, so that it may not be easy to find the best model, but it is a well-defined problem.
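The following sketch illustrates this criterion in miniature: among a handful of candidate models, choose the one minimizing the product of model size and behavioral discrepancy. Both the "size" assigned to each candidate and the behavioral distance used here are crude stand-ins, chosen for illustration, for the pattern-theoretic quantities defined above.

```python
# Observed behavior of a system S on a short interval.
inputs  = [1, 2, 3, 4, 5]
outputs = [3, 5, 7, 9, 11]

# Candidate models: name -> (size, function). Sizes are assigned by hand here.
candidates = {
    "2x+1": (5, lambda i: 2 * i + 1),
    "3x":   (2, lambda i: 3 * i),
    "x+2":  (3, lambda i: i + 2),
}

def behavioral_distance(model, inputs, outputs):
    return sum(abs(model(i) - o) for i, o in zip(inputs, outputs))

def best_model(candidates, inputs, outputs):
    """Minimize size * || predicted behavior - observed behavior ||."""
    return min(
        candidates.items(),
        key=lambda item: item[1][0] * behavioral_distance(item[1][1], inputs, outputs),
    )[0]

print(best_model(candidates, inputs, outputs))  # -> '2x+1'
```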
That takes care of behavior. Now, what about mind? Let us define the structure St[S;(r,s)] of a system S on the interval (r,s) as the set of patterns in the ordered set [S_r,...,S_s], where S_t, as above, denotes the state of S at time t. This is the actual structure of the system, as opposed to B[S;(r,s)], which is the structure of the system's behavior. In the case where S is a human or some other organism, through psychology we only have access to B[S;(r,s)], but through biology we can also study St[S;(r,s)].
We may define a mind as the structure of an intelligent system. This means that a mind is not a physical entity but rather a Platonic, mathematical form: a system of functions. Mind is made of patterns rather than particles.
The central claim of this book is that a certain structure, the master network, is part of the mind of every intelligent entity. One might make this more precise in many ways. For instance, define the general intelligence of a system to be the average of its R.S.-intelligence, its S.-intelligence, and its L.-intelligence. Then I propose that:
Hypothesis 4.1: There is a high correlation coefficient between 1) the degree to which the master network is an element of St[S;(r,s)], and 2) general intelligence.
If this is too much to believe, the reader may prefer a weaker statement:
Hypothesis 4.2: If A is more L.-, S.- and R.S.-intelligent than B, the master network is almost never less prominent in A than in B.
These hypotheses will be considered again in Chapter 12, once the master network has been described in detail.