
Part III. Mathematical Structures in the Mind

CHAPTER ELEVEN

ARTIFICIAL SELFHOOD

11.1 INTRODUCTION

    The psynet model portrays the mind as a seething "soup" of intertransforming mental processes, self-organized into a spatial distribution. These "magician" processes lock into various attractors, which adapt themselves to each other, and to external circumstances. Low-level autopoietic process systems bind together into a meta-attractor, the dual network.

    The dual network is a far more sophisticated emergent structure than any current AI program has been able to manifest -- but it is not the end of the line by any means. The dual network itself interacts with and supports a variety of other dynamics and structures. One such dynamic, mindspace curvature, was introduced in Chapter Four. Here, thinking along similar lines, we will introduce an abstract structure that is hypothesized to emerge from the dual network: the self.

    By "self" what I mean is "psychosocial self" -- that is, a mind's image of itself as a part of the external world. There are other notions of self, e.g. the "higher self" of various spiritual traditions, but these are not at issue here. A self, in the sense I mean it here, is formed through observing one's interactions with the world, and it is also a tool for interacting with the world. The self is just one among many autopoietic systems within the dual network, but it assumes particular importance because of its sheer size. It extends from the lowest levels, physical sensations, to very high levels of abstract feeling and intuition. And it is also widely extended through the various levels of the heterarchical network, pertaining to thoughts, perceptions and actions in all sorts of different domains. The self is very often engaged with consciousness, which means that it exercises a strong effect on the way things are "bundled" up in memory.

    Self is rarely discussed in the context of AI, but I believe that it should be. I will argue that a self is necessary in order for a system to adequately represent knowledge in terms of autopoietic systems. Thus, until we have "artificial selfhood," we will never have true artificial intelligence. And in order to induce artificial selfhood, it may be necessary to give our AI agents a social context -- to pursue artificially intelligent artificial life and, beyond that, A-IS, artificial intersubjectivity.

    Later chapters will pick up on these themes by considering the notion of multiple subselves, and explaining human personality phenomena in terms of subself dynamics. Then, in the final chapter, AI will be drawn back into the picture, and the precise relation of human and machine creativity to selves and subselves will be explored.

11.2 AUTOPOIESIS AND KNOWLEDGE REPRESENTATION

    "Knowledge representation" is a key word in AI. It usually refers to the construction of explicit formal data structures for encapsulating knowledge. The psynet model represents a fundamentally different approach to knowledge representation. It contends that knowledge is stored in wide-basined autopoietic magician systems, rather than frames, objects, schemas, or other such formal constructions. These formal constructions may describe the contents of self-producing mental process systems, but they do not emulate the inherent creativity and flexibility of the systems that they describe.

    The single quality most lacking in current AI programs is the ability to go into a new situation and "get oriented." This is what is sometimes called the brittleness problem. Our AI programs, however intelligent in their specialized domains, do not know how to construct the representations that would allow them to apply their acumen to new situations. This general knack for "getting oriented" is something which humans acquire at a very early age. It is something that current AI programs lack, due to their brittle, "dead," non-autopoietic systems for knowledge representation.

    As a "straw man" example of the inflexibility of AI programs, consider Herbert Simon's famous "computer-scientist" program, BACON. This program was inspired by Sir Francis Bacon, who viewed science as a matter of recognizing patterns in tables of numerical data. But, Sir Francis Bacon never appreciated the amount of imagination involved in gleaning patterns from scientific data; and the program BACON falls into the same trap, albeit far more embarrassingly.

    For instance, consider the "ideal gas law" from thermodynamics, which states that

    pV/nT = 8.32

where p is the pressure of the gas, V is the volume of the gas, T is the temperature in degrees Kelvin, and n is the quantity of the gas in moles. In practice, this relation cannot be expected to hold exactly, but for most real gases it is a very good approximation.
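
    As a quick numerical check (my own arithmetic, not drawn from the text), one mole of gas at roughly standard conditions gives a ratio close to the 8.32 above, which is just the gas constant R:

    # one mole at about one atmosphere and zero Celsius (SI units: Pa, m^3, mol, K)
    p, V, n, T = 101325.0, 0.0224, 1.0, 273.15
    print(p * V / (n * T))    # prints roughly 8.31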

    Given an appropriate table of numbers, BACON was able to induce this law, using rules such as:

    If two columns of data increase together, or decrease together, then consider their quotient.

    If one column of data increases, while another decreases, then consider their product.

    Given a column of data, check whether it has a constant value.

As pressure goes up, volume goes down, so BACON forms the product pV. Next, as the combined quantity pV goes up, so does the temperature -- thus BACON constructs the quotient pV/T. And as pV/T goes up, so does the number of moles -- hence the quotient (pV/T)/n = pV/nT is constructed. This quotient has a constant value of 8.32 -- so the ideal gas law is "discovered."
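
    To make the mechanical character of this "discovery" concrete, here is a minimal Python sketch in the spirit of the three heuristics quoted above. It is my own toy reconstruction, not the original BACON program, and the data table is synthetic, generated from the gas law itself:

    # Toy BACON-style search: apply the three heuristics to a small table of gas data.
    rows = [  # pressure (kPa), volume (L), moles, temperature (K); synthetic measurements
        (100.0, 24.942, 1.0, 300.0),
        (200.0, 12.471, 1.0, 300.0),
        (100.0, 33.256, 1.0, 400.0),
        (100.0, 49.884, 2.0, 300.0),
        (200.0, 33.256, 2.0, 400.0),
    ]
    p, V, n, T = (list(col) for col in zip(*rows))

    def trend(xs, ys):
        """+1 if the two columns rise and fall together, -1 if they move oppositely."""
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        return 1 if cov > 0 else -1

    def combine(xs, ys):
        """Heuristics 1 and 2: quotient if the columns co-vary, product if they oppose."""
        if trend(xs, ys) > 0:
            return [x / y for x, y in zip(xs, ys)]
        return [x * y for x, y in zip(xs, ys)]

    def nearly_constant(xs, tol=1e-3):
        """Heuristic 3: is the derived column (approximately) constant?"""
        mean = sum(xs) / len(xs)
        return max(abs(x - mean) for x in xs) / abs(mean) < tol

    pV = combine(p, V)       # p falls as V rises -> form the product pV
    pV_T = combine(pV, T)    # pV rises with T    -> form the quotient pV/T
    law = combine(pV_T, n)   # pV/T rises with n  -> form the quotient pV/nT
    print(nearly_constant(law), sum(law) / len(law))    # True, about 8.31

    Note that all of the scientific imagination lies in producing the table of rows in the first place; the pattern-finding itself is a few lines of arithmetic.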

    Very interesting, indeed. But how terribly far this is from what real scientists do! Most of the work of science is in determining what kind of data to collect, and figuring out creative experiments to obtain the data. Once a reliable set of data is there, finding the patterns is usually the easiest part. Often the pattern is guessed on the basis of terribly incomplete data -- and this intuitive guess is then used to guide the search for more complete data. But BACON is absolutely incapable of making an intuitive guess from sketchy data -- let alone figuring out what kind of data to collect, or designing a clever experiment.

    Simon once claimed that a four-to-five hour run of BACON corresponds to "not more than one human scientific lifetime." Douglas Hofstadter, in Metamagical Themas, has sarcastically expressed his agreement with this: one run of BACON, he suggests, corresponds to about one second of a human scientist's life work. We suggest that Hofstadter's estimate, though perhaps a little skimpy, is much closer to the mark. Only a very small percentage of scientific work is composed of BACON-style data crunching.

    A few AI researchers have attempted to circumvent this pervasive brittleness. Perhaps most impressively, Doug Lenat has developed a theory of general heuristics -- problem-solving rules that are abstract enough to apply to any context whatsoever. His programs AM and EURISKO applied these general heuristics to mathematics and science respectively; and both of these programs were moderately successful. For example, EURISKO won a naval fleet design contest two years in a row, until the rules were changed to prohibit computer programs from entering. And it also received a patent for designing a three-dimensional semiconductor junction.

    But still, when looked at carefully, even EURISKO's triumphs appear simplistic and mechanical. Consider EURISKO's most impressive achievement, its invention of a 3-D semiconductor junction. The novelty here is that the two logic functions

    "Not both A and B"

and

    "A or B"

are both done by the same junction, the same device. One could build a 3-D computer by appropriately arranging a bunch of these junctions in a cube. But how did EURISKO make this invention? The crucial step was to apply the following general-purpose heuristic: "When you have a structure which depends on two different things, X and Y, try making X and Y the same thing." The discovery, albeit an interesting one, came right out of the heuristic. This is a far cry from the systematic intuition of a talented human inventor, which synthesizes dozens of different heuristics in a complex, situation-appropriate way.
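
    To see how directly the discovery "comes right out of the heuristic," here is a minimal Python sketch of that single rule. The design representation and the component names are placeholders of my own; the point is only that the transformation itself is a mechanical substitution:

    from itertools import combinations

    def make_same(design):
        """The heuristic: for a structure that depends on two different things X and Y,
        propose the variant in which X and Y are the same thing."""
        for x, y in combinations(sorted(set(design.values())), 2):
            yield {role: (x if part == y else part) for role, part in design.items()}

    # hypothetical starting design: the two logic functions realized by two separate devices
    design = {"not-both-A-and-B": "junction_1", "A-or-B": "junction_2"}
    for candidate in make_same(design):
        print(candidate)   # {'not-both-A-and-B': 'junction_1', 'A-or-B': 'junction_1'}

    EURISKO would then evaluate whether a single junction can in fact perform both jobs; that evaluation, not the substitution, is where the domain knowledge lives.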

    For instance, the Serbian-American inventor Nikola Tesla, probably the greatest inventor in recent history, developed a collection of highly idiosyncratic thought processes for analyzing electricity. These led him to a steady stream of brilliant inventions, from alternating current to radio to robotic control. But not one of his inventions can be traced to a single "rule" or "heuristic." Each stemmed from far more subtle intuitive processes, such as the visualization of magnetic field lines, and the physical metaphor of electricity as a fluid. And each involved the simultaneous conception of many interdependent components.

    EURISKO may have good general-purpose heuristics, but what it lacks is the ability to create its own specific-context heuristics based on everyday life experience. And this is precisely because it has no everyday life experience: no experience of human life, and no autonomously-discovered, body-centered digital life either. It has no experience with fluids, so it will never decide that electricity is like a fluid. It has never played with Lincoln Logs or repaired a bicycle or prepared an elaborate meal, nor has it experienced anything analogous in its digital realm ... so it has no experience with building complex structures out of multiple interlocking parts, and it will never understand what is involved in this.

    EURISKO pushes the envelope of rule-based AI; it is just about as flexible as a rule-based program can ever get. But it is not flexible enough. In order to get programs capable of context-dependent learning, I believe, it is necessary to write programs which self-organize -- if not exactly as the brain does, then at least as drastically as the brain does. However, Lenat is of a different opinion: he believes that this greater flexibility can come from a richer knowledge base, rather than a richer self-organizing cognitive dynamics. With this in mind, he moved from EURISKO to the CYC project, an ambitious attempt to encode in a computer everything that a typical 8-year-old child knows. Given this information, it is believed, simple EURISKO-type heuristics will be able to make complicated intuitive inferences about the real world.

    However, the first phase of the CYC project finished in 1995, and despite heavy funding over a ten-year period, it cannot be labeled a success. A CD-ROM is available containing a large amount of common-sense knowledge, stored in formal-logical language. But so what? This knowledge is not sufficient for dealing with everyday situations. It is not sufficiently flexible. The CYC knowledge base is like a department store model -- it looks alive from a distance, but approach closer and you see that it's really just a mock-up. Where are the practical applications making use of the CYC CD-ROM? They do not exist.

    The point is that the human mind does not embody its commonsense knowledge as a list of propositions. What we know as common sense is a self-reproducing system, a structural conspiracy, an attractor for the cognitive equation. Abstractions aside, the intuitive sense of this position is not hard to see. Consider a simple example. Joe has three beliefs regarding his girlfriend:

A: She is beautiful

B: I love her

C: She loves me

Each of these beliefs helps to produce the others. He loves her, in part, because she is beautiful. He believes in her love, in part, because he loves her. He believes her beautiful, in part, because of their mutual love relationship.

    Joe's three thoughts reinforce each other. According to the psynet model, this is not the exception but the rule. When Joe looks at a chair obscured by shadows, he believes that the legs are there because he believes that the seat is there, and he believes that the seat is there because he believes that the legs are there. Thus sometimes he may perceive a chair where there is no chair. And thus, other times, he can perceive a chair far more effectively than a computer with a high-precision camera eye. The computer understands a chair as a list of properties, related according to Boolean logic. But he understands a chair as a collection of processes, mutually activating one another.

    The legs of a chair are defined partly by their relation with the seat of a chair. The seat of a chair is defined largely by its relation with the back and the legs. The back is defined partly by its relation to the legs and the seat. Each part of the chair is defined by a fuzzy set of patterns, some of which are patterns involving the other parts of the chair. The recognition of the chair involves the recognition of low-level patterns, then middle-level patterns among these low-level patterns, then higher-level patterns among these. And all these patterns are organized associatively, so that when one sees a certain pattern corresponding to a folding chair, other folding-chair-associated patterns become activated; or when one sees a certain pattern corresponding to an armchair, other armchair-associated patterns become activated. But, on top of these dual network dynamics, some patterns inspire one another, boosting one another beyond their "natural" state of activation. This circular action is the work of the cognitive equation -- and, I suggest, it is necessary for all aspects of intelligent perception, action and thought. The failure of AI programs to construct useful internal models of the world should be understood in this light.
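
    The circular boosting described here is easy to sketch computationally. The following toy model is my own illustration, not code from the psynet model: three "part" processes lend activation to one another, so that a part with weak direct evidence -- the chair back hidden in shadow -- is nonetheless pulled up to high activation by the parts that support it.

    # mutually reinforcing pattern processes for the parts of a chair
    parts = ["legs", "seat", "back"]
    support = {                      # how strongly each recognized part boosts the others
        "legs": {"seat": 0.5, "back": 0.3},
        "seat": {"legs": 0.5, "back": 0.5},
        "back": {"legs": 0.3, "seat": 0.5},
    }
    evidence = {"legs": 0.6, "seat": 0.7, "back": 0.1}   # bottom-up input; the back is in shadow

    activation = dict(evidence)
    for _ in range(30):              # iterate the mutual reinforcement toward an attractor
        activation = {
            part: min(1.0, evidence[part]
                      + sum(w * activation[other] for other, w in support[part].items()))
            for part in parts
        }

    print({p: round(a, 2) for p, a in activation.items()})
    # {'legs': 1.0, 'seat': 1.0, 'back': 0.9} -- the weakly evidenced "back" ends up strongly active

    A list of Boolean properties has no analogue of this dynamic: remove one piece of evidence and the corresponding proposition is simply false, rather than being regenerated by its neighbors.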

    But how do humans come to build their numerous knowledge-representing autopoietic systems? My claim is that the knowledge-representation capacities of the human mind are centered around a single dynamic data structure, called the self. Computer programs, as they currently exist, do not have selves -- and this is why they are not intelligent. CYC tried to get all the knowledge possessed by an eight-year-old self -- without the self at the center. But the individual bits of knowledge only have meaning as part of the autopoietic self-system, and other, smaller, autopoietic mental systems. In order to write an intelligent program, we will have to write a program that is able to evolve a variety of robust autopoietic mental process systems, including a self.

11.3 WHAT IS THE SELF?

    Psychology provides many different theories of the self. One of the clearest and simplest is the "synthetic personality theory" proposed by Seymour Epstein (1984). Epstein argues that the self is a theory. This is a particularly useful perspective for AI because theorization is something with which AI researchers have often been concerned.

    Epstein's personality theory paints a refreshingly simple picture of the mind:

    [T]he human mind is so constituted that it tends to organize experience into conceptual systems. Human brains make connections between events, and, having made connections, they connect the connections, and so on, until they have developed an organized system of higher- and lower-order constructs that is both differentiated and integrated. ...

    In addition to making connections between events, human brains have centers of pleasure and pain. The entire history of research on learning indicates that human and other higher-order animals are motivated to behave in a manner that brings pleasure and avoids pain. The human being thus has an interesting task cut out simply because of his or her biological structure: it is to construct a conceptual system in such a manner as to account for reality in a way that will produce the most favorable pleasure/pain ratio over the foreseeable future. This is obviously no simple matter, for the pursuit of pleasure and the acceptance of reality not infrequently appear to be at cross-purposes to each other.

He divides the human conceptual system into three categories: a self-theory, reality-theory, and connections between self-theory and reality-theory. And he notes that these theories may be judged by the same standards as theories in any other domain:

    [Since] all individuals require theories in order to structure their experiences and to direct their lives, it follows that the adequacy of their adjustment can be determined by the adequacy of their theories. Like a theory in science, a personal theory of reality can be evaluated by the following attributes: extensivity [breadth or range], parsimony, empirical validity, internal consistency, testability and usefulness.

    A person's self-theory consists of their best guesses about what kind of entity they are. In large part it consists of ideas about the relationship between oneself and other things, or oneself and other people. Some of these ideas may be wrong; but this is not the point. The point is that the theory as a whole must have the same qualities required of scientific theories. It must be able to explain familiar situations. It must be able to generate new explanations for unfamiliar situations. Its explanations must be detailed, sufficiently detailed to provide practical guidance for action. Insofar as possible, it should be concise and self-consistent.

    The acquisition of a self-theory, in the development of the human mind, is intimately tied up with the body and the social network. The infant must learn to distinguish her body from the remainder of the world. By systematically using the sense of touch -- a sense which has never been reliably simulated in an AI program -- she grows to understand the relation between herself and other things. Next, by watching other people she learns about people; inferring that she herself is a person, she learns about herself. She learns to guess what others are thinking about her, and then incorporates these opinions into her self-theory. Most crucially, a large part of a person's self-theory is also a meta-self-theory: a theory about how to acquire information for one's self-theory. For instance, an insecure person learns to adjust her self-theory by incorporating only negative information. A person continually thrust into novel situations learns to revise her self-theory rapidly and extensively based on the changing opinions of others -- or else, perhaps, learns not to revise her self-theory based on the fickle evaluations of society.

Self and Cognition

    The interpenetration between self-theories and meta-self-theories is absolutely crucial. The fact that a self-theory contains heuristics for exploring the world, for learning and gathering information, suggests that a person's self- and reality-theories are directly related to their cognitive style, to their mode of thinking.

    And indeed, we find evidence for this. For instance, as mentioned above in the context of consciousness and hallucinations, Ernest Hartmann (1988) has studied the differences between "thick-boundaried" and "thin-boundaried" people. The prototypical thick-boundaried person is an engineer, an accountant, a businessperson, a strict and well-organized housewife. Perceiving a rigid separation between herself and the outside world, the thick-boundaried person is pragmatic and rational in her approach to life. On the other hand, the prototypical thin-boundaried person is an artist, a musician, a writer.... The thin-boundaried person is prone to spirituality and flights of fancy, and tends to be relatively sensitive, perceiving only a tenuous separation between her interior world and the world around her. The intriguing thing is that "thin-boundaried" and "thick-boundaried" are self-theoretic concepts; they have to do with the way a person conceives herself and the relation between herself and the world. But, according to Hartmann's studies, these concepts tie in with the way a person thinks about concrete problems. Thick-boundaried people are better at sustained and orderly logical thinking; thin-boundaried people are better at coming up with original, intuitive, "wild" ideas. This connection is evidence for a deep relation between self-theory and creative intelligence.

    What Hartmann's results indicate is that the way we think cannot be separated from the way our selves operate. This is so for at least two reasons: one reason to do with the hierarchical network, another to do with the heterarchical network. First of all, every time we encapsulate a new bit of knowledge, we do so by analogy to other, related bits of knowledge. The self is a big structure, which relates to nearly everything in the mind; and for this reason alone, it has a broad and deep effect on our knowledge representation. This is the importance of the self in the heterarchical network.

    But, because of the hierarchical nature of knowledge representation, the importance of self goes beyond mere analogy. Self does not have to do with arbitrary bits of information: it has to do, in large part, with the simplest bits of information, bits of information pertaining to the immediate perceptual and active world. The self sprawls out broadly at the lower levels of the dual network, and thus its influence propagates upward even more widely than it otherwise would.

    This self/knowledge connection is important in our daily lives, and it is even more important developmentally. For, obviously, people do not learn to get oriented all at once. They start out, as small children, by learning to orient themselves in relatively simple situations. By the time they build up to complicated social situations and abstract intellectual problems they have a good amount of experience behind them. Coming into a new situation, they are able to reason associatively: "What similar situations have I seen before?" And they are able to reason hierarchically: "What simpler situations is this one built out of?" By thus using the information gained from orienting themselves to previous situations, they are able to make reasonable guesses regarding the appropriate conceptual representations for the new situation. In other words, they build up a dynamic data structure consisting of new situations and the appropriate conceptual representations. This data structure is continually revised as new information comes in, and it is used as a basis for acquiring new information. This data structure contains information about specific situations and also, more abstractly, about how to get oriented to new situations.

    My claim is that it is not possible to learn how to get oriented to complex situations, without first having learned how to get oriented to simpler situations. This regress only bottoms out with the very simplest situations, the ones confronted by every human being by virtue of having a body and interacting with other humans. And it is these very simple structures which are dealt with, most centrally, by the self-theory. There is a natural order of learning here, which is, due to various psychological and social factors, automatically followed by the normal human child. This natural order of learning is reflected, in the mind, by a hierarchical data structure in which more and more complex situations are comprehended in terms of simpler ones. But we who write AI programs have made little or no attempt to respect this natural order.

    We provide our programs with concepts which "make no sense" to them, which they are intended to consider as given, a priori entities. On the other hand, to a human being, there are no given, a priori entities; everything bottoms out with the phenomenological and perceptual, with those very factors that play a central role in the initial formation of self- and reality-theories. To us, complex concepts and situations are made of simpler, related concepts and situations to which we already know how to orient ourselves; and this reduction continues down to the lowest level of sensations and feelings. To our AI programs, the hierarchy bottoms out prematurely, and thus there can be no functioning dynamic data structure for getting oriented, no creative adaptability, no true intelligence.

    This view of self and intelligence may seem overly vague and "hand-waving," in comparison to the rigorous theories proposed by logic-oriented AI researchers. However, there is nothing inherently non-rigorous about the build-up of simpler theories and experiences into complex self- and reality-theories. It is perfectly possible to model this process mathematically; the mathematics involved is simply of a different sort from what one is used to seeing in AI. Instead of formal logic, one must make use of ideas from dynamical systems theory (Devaney, 1988) and, more generally, the emerging science of complexity. The psynet model gives one natural method for doing this.

    Self- and reality-theories, in the psynet model, arise as autopoietic attractors within the context of the dual network. This means that they cannot become sophisticated until the dual network itself has self-organized to an acceptable degree. The dual network provides routines for building complex structures from simple structures, and for relating structures to similar structures. It provides a body of knowledge, stored in this way, for use in the understanding of practical situations that occur. Without these routines and this knowledge, complex self- and reality-theories cannot come to be. But on the other hand, the dual network itself cannot become fully fleshed out without the assistance of self- and reality-theories. Self- and reality-theories are necessary components of creative intelligence, and hence are indispensable in gaining information about the world. Thus one may envision the dual network and self- and reality-theories evolving together, symbiotically leading each other toward maturity.

    A speculation? Certainly. And until we understand the workings of the human brain, or build massively parallel "brain machines," the psynet model will remain in large part an unproven hypothesis. However, the intricate mathematical constructions of the logic-oriented AI theorists are also speculations. The idea underlying the psynet model is to make mathematical speculations which are psychologically plausible. Complex systems science, as it turns out, is a useful tool in this regard. Accepting the essential role of the self means accepting the importance of self-organization and complexity for the achievement of flexible, creative intelligence.

11.4 ARTIFICIAL INTERSUBJECTIVITY

    So, suppose one accepts the argument that an autopoietic self-system is necessary for intelligence, for knowledge representation. The next question is: How is the self-structure to be made to emerge from the dual network?

    The first possibility is that it will emerge spontaneously, whenever there is a dual network. This is plausible enough. But it seems at least as likely that some kind of social interaction is a prerequisite for the emergence of the self-structure. This leads to the concept of A-IS, or "artificial intersubjectivity" (as first introduced in CL). The basis of A-IS is the proposition that self- and reality-theories can most easily evolve in an appropriate social context. Today, computer science has progressed to the point where we can begin to understand what it might mean to provide artificial intelligences with a meaningful social context.

    In AI, one seeks programs that will respond "intelligently" to our world. In artificial life, or Alife, one seeks programs that will evolve interestingly within the context of their simulated worlds (Langton, 1992). The combination of these two research programmes yields the almost completely unexplored discipline of AILife, or "artificially intelligent artificial life" -- the study of synthetically evolved life forms which display intelligence with respect to their simulated worlds. A-IS, artificial intersubjectivity, may be seen as a special case of artificially intelligent artificial life. Conceptually, however, A-IS is a fairly large step beyond the very general idea of AILife. The idea of A-IS is to simulate a system of intelligences collectively creating their own subjective (simulated) reality.

    In principle, any AILife system one constructed could become an A-IS system, under appropriate conditions. That is, any collection of artificially intelligent agents, acting in a simulated world, could come to collude in the modification of that world, so as to produce a mutually more useful simulated reality. In this way they would evolve interrelated self- and reality-theories, and thus artificial intersubjectivity. But speaking practically, this sort of "automatic intersubjectivity" cannot be counted on. Unless the different AI agents are in some sense "wired for cooperativity," they may well never see the value of collaborative subjective-world-creation. We humans became intelligent in the context of collaborative world-creation, of intersubjectivity (even apes are intensely intersubjective). Unless one is dealing with AI agents that evolved their intelligence in a social context -- a theoretically possible but pragmatically tricky solution -- there is no reason to expect significant intersubjectivity to spontaneously emerge through interaction.

    Fortunately, it seems that there may be an alternative. I will describe a design strategy called "explicit socialization" which involves explicitly programming each AI agent, from the start, with:

1) an a priori knowledge of the existence and autonomy of the other programs in its environment, and

2) an a priori inclination to model the behavior of these other programs.

In other words, in this strategy, one enforces A-IS from the outside, rather than, as in natural "implicit socialization," letting it evolve by itself. This approach is, to a certain extent, philosophically disappointing; but this may be the kind of sacrifice one must make in order to bridge the gap between theory and practice. Explicit socialization has not yet been implemented and may be beyond the reach of current computer resources. But the rapid rate of improvement of computer hardware makes it likely that this will not be the case for long.

    To make the idea of explicit socialization a little clearer, one must introduce some formal notation. Suppose one has a simulated environment E(t), and a collection of autonomous agents A1(t), A2(t),..., AN(t), each of which takes on a different state at each discrete time t. And, for sake of simplicity, assume that each agent Ai seeks to achieve a certain particular goal, which is represented as the maximization of the real-valued function fi(E), over the space of possible environments E. This latter assumption is psychologically debatable, but here it is mainly a matter of convenience; e.g. the substitution of a shifting collection of interrelated goals would not affect the discussion much.

    Each agent, at each time, modifies E by executing a certain action Aci(t). It chooses the action which it suspects will cause fi(E(t+1)) to be as large as possible. But each agent has only a limited power to modify E, and all the agents are acting on E in parallel; thus each agent, whenever it makes a prediction, must always take the others into account. A-IS occurs when the population of agents self-organizes itself into a condition where E(t) is reasonably beneficial for all the agents, or at least most of them. This does not necessarily mean that E reaches some "ideal" constant value, but merely that the vector (A1,...,AN,E) enters an attractor in state space, which is characterized by a large value of the society-wide average satisfaction (f1 + ... + fN)/N.
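
    This setup is simple enough to sketch directly. The following Python fragment is a toy rendering under placeholder assumptions of my own (a vector-valued environment, quadratic satisfaction functions, and actions that nudge E toward each agent's preferred state); it shows the parallel action loop and the society-wide average satisfaction whose persistence on an attractor is what A-IS asks for.

    import random

    N, D = 5, 3                                 # five agents, a three-dimensional toy environment
    random.seed(0)
    targets = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(N)]   # preferred states
    E = [0.0] * D                               # the shared environment E(t)

    def f(i, env):
        """Agent i's satisfaction fi(E): how close E is to the state it would like."""
        return -sum((e - g) ** 2 for e, g in zip(env, targets[i]))

    def action(i, env, step=0.05):
        """Agent i's action Aci(t): a small change to E chosen to raise the predicted fi(E(t+1))."""
        return [step * (g - e) for e, g in zip(env, targets[i])]

    for t in range(200):
        acts = [action(i, E) for i in range(N)]                  # all agents act on E in parallel
        E = [e + sum(a[d] for a in acts) for d, e in enumerate(E)]
        avg = sum(f(i, E) for i in range(N)) / N                 # (f1 + ... + fN)/N

    print([round(e, 2) for e in E], round(avg, 3))
    # E settles at a fixed point near the centroid of the preferred states: a degenerate
    # example of an attractor on which average satisfaction stays as high as the
    # conflicting goals allow.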

    The strategy of explicit socialization has two parts: input and modeling. Let us first consider input. For Ai to construct a model of its society, it must recognize patterns among the Acj and E; but before it can recognize these patterns, it must solve the more basic task of distinguishing the Acj themselves. In principle, the Aci can be determined, at least approximately, from E; a straightforward AILife approach would provide each agent with E alone as input. Explicit socialization, on the other hand, dictates that one should supply the Aci as input directly, in this way saving the agents' limited resources for other tasks. More formally, the input to Ai at time t is given by the vector

    E(t), Acv(i,1,t)(t),..., Acv(i,n(t),t)(t)    (1)

for some n(t) < N, where the range of the index function v(i,j,t), taken over j, defines the "neighbors" of agent Ai, those agents with whom Ai immediately interacts at time t. In the simplest case the range of v(i,j,t) is always {1,...,N}, with v(i,j,t) = j; but if one wishes to simulate agents moving through a spatially extended environment, then this is illogical, and a variable-range v is required.

    Next, coinciding with this specialized input process, explicit socialization requires a contrived internal modeling process within each agent Ai. In straightforward AILife, Ai is merely an "intelligent agent," whatever that might mean. In explicit socialization, on the other hand, the internal processes of each agent are given a certain a priori structure. Each Ai, at each time, is assumed to contain n(t) + 1 different modules called "models":

a) a model M(E|Ai) of the environment, and

b) a model M(Aj|Ai) of each of its neighbors.

The model M(X|Ai) is intended to predict the behavior of the entity X at the following time step, time t+1.

    At this point the concept of explicit socialization becomes a little more involved. The simplest possibility, which I call first order e.s., is that the inner workings of the models M(X|Ai) are not specified at all. They are just predictive subprograms, which may be implemented by any AI algorithm whatever.

    The next most elementary case, second order e.s., states that each model M(Aj|Ai) itself contains a number of internal models. For instance, suppose for simplicity that n(t) = n is the same for all i. Then second order e.s. would dictate that each model M(Aj|Ai) contained n+1 internal models: a model M(E|Aj|Ai), predicting Aj's internal model of E, and n models M(Ak|Aj|Ai), predicting Aj's internal models of its neighbors Ak.

    The definition of n'th order e.s. for n > 2 follows the same pattern: it dictates that each Ai models its neighbors Aj as if they used (n-1)'th order e.s. Clearly there is a combinatorial explosion here; two or three orders is probably the most one would want to practically implement at this stage. But in theory, no matter how large n becomes, there are still no serious restrictions being placed on the nature of the intelligent agents Ai. Explicit socialization merely guarantees that the results of their intelligence will be organized in a manner amenable to socialization.
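
    The nesting of models is perhaps clearer as a data structure. The sketch below is my own rendering of the scheme just described, with illustrative names and types not taken from the text: each agent holds a model of E plus a model of each neighbor, and under n'th order e.s. each neighbor model recursively contains that neighbor's own (n-1)'th order models.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Model:
        """A predictive module M(X|...); its internals are left unspecified, as in first order e.s."""
        of: str                                   # what it predicts: "E" or an agent's name
        submodels: Dict[str, "Model"] = field(default_factory=dict)

    def build_models(agent: str, neighbors: Dict[str, List[str]], order: int) -> Dict[str, Model]:
        """Build M(E|agent) plus M(Aj|agent) for each neighbor Aj, nested to the given order."""
        models = {"E": Model(of="E")}
        for j in neighbors.get(agent, []):
            sub = build_models(j, neighbors, order - 1) if order > 1 else {}
            models[j] = Model(of=j, submodels=sub)
        return models

    # three mutually neighboring agents under second order e.s.
    neighbors = {"A1": ["A2", "A3"], "A2": ["A1", "A3"], "A3": ["A1", "A2"]}
    m = build_models("A1", neighbors, order=2)
    # m["A2"] is M(A2|A1); m["A2"].submodels["E"] is M(E|A2|A1);
    # m["A2"].submodels["A3"] is M(A3|A2|A1), A1's model of A2's model of A3.

    Each additional order multiplies the number of modules by roughly the neighborhood size, which is the combinatorial explosion noted above.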

    As a practical matter, the most natural first step toward implementing A-IS is to ignore higher-order e.s. and deal only with first-order modeling. But in the long run, this strategy is not viable: we humans routinely model one another on at least the third or fourth order, and artificial intelligences will also have to do so. The question then arises: how, in a context of evolving agents, does a "consensus order" of e.s. emerge? At what point does the multiplication of orders become superfluous? At what depth should the modeling process stop?

    Let us begin with a simpler question. Suppose one is dealing with agents that have the capacity to construct models of any order. What order model should a given agent choose to deal with? The only really satisfactory response to this question is the obvious one: "Seek to use a depth one greater than that which the agent you're modeling uses. To see if you have gone to the correct depth, try to go one level deeper. If this yields no extra predictive value, then you have gone too deep." For instance, if one is modeling the behavior of a cat, then there is no need to use a fifth-order model or even a third-order model: pretty clearly, a cat can model you, but it cannot conceive of your model of it, much less your model of another cat or another person. The cat is dealing with first-order models, so the most you need to deal with is the second order (i.e. a model of the cat's "first-order" models of you).
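
    The stopping rule in this answer can be written down directly. The sketch below is a hypothetical illustration; the scoring function is a stand-in for measured predictive accuracy on the other agent's behavior, and only the logic of "deepen until one more level stops paying" is taken from the text.

    def choose_model_depth(predictive_value, max_depth=7, tol=1e-6):
        """Return the smallest depth d such that a depth d+1 model predicts no better."""
        for d in range(1, max_depth):
            if predictive_value(d + 1) <= predictive_value(d) + tol:
                return d
        return max_depth

    # hypothetical scores for modeling a cat: a second-order model (its model of you) helps,
    # while a third-order model (its model of your model of it) adds nothing
    cat_score = {1: 0.6, 2: 0.8, 3: 0.8}
    print(choose_model_depth(lambda d: cat_score.get(d, 0.8)))   # 2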

    In fact, though there is no way to be certain of this, it would seem that the second order of modeling is probably out of reach not only for cats but for all animals besides humans and apes. And this statement may be made even more surely with respect to the next order up: who could seriously maintain that a cat or a pig can base its behavior on an understanding of someone else's model of someone else's model of itself or someone else? If Uta Frith's (1989) psychology of autism is to be believed, then even autistic humans are not capable of sophisticated second-order social modeling, let alone third-order modeling. They can model what other people do, but have trouble thinking about other people's images of them, or about the network of social relationships that is defined by each person's images of other people.

    This train of thought suggests that, while one can simulate some kinds of social behavior without going beyond first order e.s., in order to get true social complexity a higher order of e.s. will be necessary. As a first estimate one might place the maximum order of human social interaction at or a little below the "magic number seven plus or minus two" which describes human short term memory capacity. We can form a concrete mental image of "Joe's opinion of Jane's opinion of Jack's opinion of Jill's opinion on the water bond issue," a fourth-order construct, so we can carry out fifth-order reasoning about Joe ... but just barely!

    According to this reasoning, if intelligence requires self, and self requires intersubjectivity, then there may be no alternative but to embrace A-IS. Just because strong AI is possible does not mean that the straightforward approach of current AI research will ever be effective. Even with arbitrarily much processing power, one still needs to respect the delicate and spontaneous self-organization of psychological structures such as the self.