Chaotic Logic -- Copyright Plenum Press 1994


Chapter Eight


    To anyone trained in physical science, the overall impression made by psychology and neuroscience is one of incredible messiness. So many different chemical compounds, so many different neural subsystems, so many different psychic dysfunctions, so many different components of intelligence, perception, control.... And no overarching conceptual framework in which all aspects come together to form a unified whole. No underlying equation except those of physics and chemistry, which refer to a level incomprehensibly lower than that of thoughts, emotions and beliefs. No cognitive law of motion.

    Of course, there is no a priori reason to expect such a thing as a "cognitive law of motion" to be possible at all. It is amazing that one can find far-reaching yet precise generalizations such as Newton's laws in any field of study. To expect to find such conceptual jewels in every single discipline may be asking more than the world has to offer.

    But on the other hand, consider: Newton's laws would have been impossible without calculus, general relativity would have been impossible without differential geometry, and quantum physics would have been impossible without functional analysis. It is quite conceivable that, once we have developed the appropriate mathematical concepts, the goal of a "cognitive law of motion" will cease to appear so unrealistic.

    In fact, my contention is that this time has already come. As of 1993, I suggest, we have collectively developed precisely the mathematical and conceptual tools required to piece together the rudiments of a "fundamental equation of mind." The most important of these tools, I suggest, are four in number:

    1) component systems

    2) pattern theory

    3) algorithmic information

    4) strange attractors

In this chapter I show how these ideas may be used to formulate a new type of equation, which I call a "self-generating pattern dynamic." This is the type of equation, I suggest, which makes one thought, one emotion, drift into the next. It is the general form which a cognitive law of motion must take.

    In The Evolving Mind the term "self-structuring system" is used to describe a system which, more than just organizing itself, structures and patterns itself; a system which studies the patterns in its past, thus determining the patterns in its future. Here I will delineate a class of systems which is a subset of the self-structuring systems -- namely, the class of systems that evolve by self-generating pattern dynamics. My hypothesis is that minds, as well as being self-structuring, also fall within this narrower category.

    This is at the same time a brand new approach to mind, and a re-interpretation of the dual network model given in Chapter Three. The cognitive equation presents a dynamical view of mind, whereas the dual network presents a static view; but the two are ultimately getting at the same thing. In the dual network perspective, one begins with a structure, asks what the dynamics must be to retain that structure, and obtains the answer: something like the cognitive equation. In the cognitive equation perspective, on the other hand, one begins with a dynamical iteration, asks what sorts of structures will tend to persist under this iteration, and obtains the answer: something like the dual network. Dynamics lead to statics, statics leads to dynamics, and the simultaneous analysis of the two provides the beginning of an understanding of that mysterious process called mind.


    The systems theory of Chapter Seven gives us a new way of looking at the dual network. The mind, filtered through the component-systems/self-generating-systems view, emerges as a structured network of components.

    Note that this conclusion refers primarily to the mind -- the patterns in the brain -- and not to the brain itself. One could model the brain as a component-system, insofar as each neuron is not a fixed "component" but a space of potential components -- one component for each condition of its synaptic potentials. When neuron A feeds its output to neuron B, thus altering B's synaptic potential, it is in effect "creating" a new element of the space corresponding to neuron B. This may be a fruitful way to think about the brain. However, it is much more direct and elegant to view the collection of patterns in the brain as a self-generating component-system -- recalling that a pattern is first of all a process. In the context of general systems theory, the pattern-theoretic model of mind is not merely useful but conceptually essential.

    The mind is vastly different from a soup of molecules -- unlike the immune system, it is not, even in rough approximation, well-mixed. (Putting brain tissue in a blender to make "synaptosome soup" is a nifty method for determining the levels of different neurotransmitters in the brain, but it has a definite negative effect on brain function.) But the relatively rigid structure of the brain does not prevent it from being a genuine self-generating system, and a genuine component-system.

    There is an overall global structure of mind; and this structure self-organizes itself by a dynamic of typeless interaction, in which some mental processes act on others to produce yet others, without respect for any kind of "function/argument" distinction. One can model this sort of activity in terms of stochastic computation alone, without mentioning hypersets or component-systems -- this is the contemporary trend, which I have followed in my previous research. However, in many situations this point of view becomes awkward, and the only way to express the reality clearly is to adopt a typeless formalism such as the one developed in Sections 8.2 and 8.4.

    Let us take a simple heuristic example -- purely for expository purposes, without any pretense of detailed biological realism. Let us think, in an abstract way, about the relation between a mental process that recognizes simple patterns (say lines), and a mental process that recognizes patterns among these simple patterns (say shapes). These shape recognizers may be understood as subservient to yet higher-level processes, say object recognizers.

    If the shape recognizer has some idea what sort of shape to expect, then it must partially reprogram the line recognizer, to tell it what sort of lines to look for. But if the line recognizer perpetually receives instructions to look for lines which are not there, then it must partially reprogram the shape recognizer, to cause it to give more appropriate instructions. Assuming there is a certain amount of error inherent in this process, one has an obvious circularity. The collection of two processes may be naturally modeled as a self-generating system.

    It seems likely that the specific programs involved in these perceptual processes involve linear array operations. But still, one does not yet have an array component system. To see where component-systems come in, one needs to take a slightly more realistic view of the perceptual process. One must consider that the mapping between line-recognizing processes and shape-recognizing processes is many-to-many. Each shape-recognizing process makes use of many line-recognizing processes, and the typical line-recognizing process is connected to a few different shape-recognizing processes. A shape-recognizing process is involved in creating new line-recognizing processes; and a group of line-recognizing processes, by continually registering complaints, can cause the object-recognizing parents of the shape-recognizing processes to create new shape-recognizing processes.

    What this means is that the reprogramming of processes by one another can be the causative agent behind the creation of new processes. So the collection of processes, as a whole, is not only a self-generating system but a component-system as well. By acting on one another, the mental processes cause new mental processes to be created. And, due to the stochastic influence of errors as well as to the inherent chaos of complex dynamics, this process of creation is unpredictable. Certain processes are more likely to arise than others, but almost anything is possible, within the parameters imposed by the remainder of the network of processes that is the mind.

    This example, as already emphasized, is merely a theoretical toy. The actual processes underlying shape and line recognition are still a matter of debate. But the basic concept should be clear. Whenever one has sophisticated multilevel control, combined with heterarchical relationship, one has a situation in which self-referential models are appropriate. The whole network of processes can be modeled otherwise, using only stochastic computer programs. But the vocabulary of self-generating and component-systems leads to a novel understanding of the basic phenomena involved.
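The circular reprogramming in this toy example can even be run as code. The following sketch is purely illustrative, with no biological pretensions: the shape vocabulary, the `run` loop and the scoring rule are all my own inventions. A shape recognizer programs the line recognizer with the lines its expected shape requires, and persistent misses flow back upward to reprogram the expectation.

```python
# Hypothetical data: each shape is "recognized" via the line types it needs.
SHAPES = {"triangle": {"/", "\\", "_"}, "square": {"|", "_"}}

def run(scene, expected, max_steps=5):
    """Mutual reprogramming of a shape recognizer and a line recognizer."""
    for _ in range(max_steps):
        wanted = SHAPES[expected]   # shape process programs the line process
        found = wanted & scene      # line process looks for those lines
        if found == wanted:
            return expected         # expectation confirmed
        # Complaints flow upward: reprogram the shape recognizer toward
        # whichever shape best matches the lines actually present.
        expected = max(SHAPES, key=lambda s: len(SHAPES[s] & scene))
    return None

print(run(scene={"|", "_"}, expected="triangle"))   # revises itself to "square"
```

The point of the sketch is the circularity: neither process has the final word; each partially rewrites the other until the pair settles.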


    Now let us return to the formal "process iterations" of Chapter Seven. Equation (**), in itself, is much too general to be of any use as a "cognitive law of motion." If System1 and T are chosen appropriately, then (**) can describe anything whatsoever. That is, after all, the meaning of universal computation! However, this simple iteration is nevertheless the first stop along the path to the desired equation. What is needed is merely to specialize the operator T.

    Instead of taking the compounds formed from Systemt, I suggest, one must take the patterns in these compounds. This completes the picture of the mind as a system which recognizes patterns in itself, which forms its own patterns from its own patterns. There might seem to be some kind of contradiction lurking here: after all, how can patterns in hyperrelations themselves be hyperrelations? But of course, this is precisely the distinctive quality of hyperrelations: they subvert the hierarchy of logical types by potentially belonging to their own domain and range. And this unusual property does not violate the laws of physical reality, because the hyperrelations required for practical modeling can themselves be perfectly well modeled in terms of ordinary Boolean functions.

    To make this more precise, define the relative structure St^ of a set A = {a, b, c, ...} as the set of all x which are patterns in some subset of A relative to some other subset of A.

    For instance, in the algorithmic information model, "x is an exact pattern in b relative to a" means

    1) b produces x from a

    2) I(x|b,a) < I(x|a)

More generally, statement (2) must be replaced with a less specific formal notion such as

    2') |x\{b,a}| < |x\a|

The generalization of this notion to encompass patterns that are approximate rather than exact is quite straightforward.
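For illustration only, condition (2) can be approximated on a computer by substituting compressed length for algorithmic information -- a crude but standard proxy. In the sketch below (my own construction, not part of the formalism), I(x|y) is estimated as the extra compressed bytes that x costs once y is already in hand:

```python
import zlib

def info(s: bytes) -> int:
    # Crude stand-in for algorithmic information: compressed length.
    return len(zlib.compress(s))

def cond_info(x: bytes, given: bytes) -> int:
    # I(x | given), approximated as the extra compressed bytes that x
    # costs once the conditioning data is already in hand.
    return info(given + x) - info(given)

def is_exact_pattern(x: bytes, b: bytes, a: bytes) -> bool:
    # Condition (2): I(x | b, a) < I(x | a).
    return cond_info(x, a + b) < cond_info(x, a)

# When x merely repeats b, knowing b makes x nearly free.
x = bytes(range(256)) * 4
print(is_exact_pattern(x, x, b""))
```

Relaxing the strict inequality to a tolerance gives one simple version of the approximate-pattern generalization mentioned above.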

    In this notation, the simplest self-generating pattern dynamic says that, where Systemt is the system at time t,

    Systemt+1 = St^( R[Systemt] )    (***)

I call this iteration the basic deterministic dynamic. It will serve as a "demonstration equation" for talking about the properties of more complicated cognitive dynamics.

    The idea underlying this equation is encapsulated in the following simple maxim: in a cognitive system, time is the process of structure becoming substance. In other words, the entities which make up the system now all act on one another, and thus produce a new collection of entities which includes all the products of the interactions of entities currently existent. For lack of a better term, I call this exhaustive collection of products the "Raw Potentiality" of the system. Then, the system one moment later consists of the patterns in this collection, this Raw Potentiality.
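The basic deterministic dynamic can be run on a toy universe of string "processes." All details below are my own illustrative choices, anticipating the repetition processes f and g used later in this chapter: an entity "Rk" repeats its argument k times, everything else is inert data, and "cheaper to store" stands in for patternhood.

```python
import re

def act(f, x):
    # "Rk" is a process repeating its argument k times; other strings
    # are inert data and produce nothing.
    m = re.fullmatch(r"R(\d+)", f)
    return x * int(m.group(1)) if m else None

def raw_potentiality(system):
    # R[Systemt]: the entities together with every compound they form.
    return set(system) | ({act(f, x) for f in system for x in system} - {None})

def relative_structure(pool):
    # St^: keep each process f that is a pattern in some compound --
    # f produces y from some x, and storing (f, x) is cheaper than storing y.
    return {f for f in pool for x in pool
            if act(f, x) in pool and len(f) + len(x) < len(act(f, x))}

def step(system):
    # The basic deterministic dynamic (***): Systemt+1 = St^(R[Systemt]).
    return relative_structure(raw_potentiality(system))

s = {"R3", "R4", "ab"}
s = step(s)        # the inert datum "ab" is a pattern in nothing: it dies
s = step(s)        # {"R3", "R4"} regenerates itself -- a fixed point
print(sorted(s))   # ['R3', 'R4']
```

Each surviving member is retained because it is a pattern in compounds present in the Raw Potentiality; the inert string, like the entity h discussed in Section 8.3, is melted down and never re-emerges.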

8.2.1. A General Self-Generating Pattern Dynamic (*)

    For every type of self-generating system, there is a corresponding type of self-generating pattern dynamic. The basic deterministic dynamic is founded on the type of self-generating system that is so totally "well-mixed" that everything interacts with everything else at each time step. But in general, this is only the simplest kind of self-generating system: a self-generating system may use any stochastically computable rule to transform the Raw Potentiality of time t into the reality of time t+1.

    Furthermore, the basic deterministic dynamic assumes infinite pattern recognition skill; it is anti-Gödelian. In general, a self-generating system may use its Raw Potentiality in an incomplete fashion. It need not select all possible patterns in the Raw Potentiality; it may pick and choose which ones to retain, in a state-dependent way.

    Formally, this means that one must consider iterations of the following form:

    Systemt+1 = F [ Zt [St^( G[ R[Systemt] ])] ]        (****)

where F and G are any stochastically computable functions, and Zt = Z[Systemt] is a "filtering operator" which selects certain elements of St^( G[ R[Systemt] ]), based on the elements of Systemt.

    Note that the function F cannot make any reference to Systemt; it must act on the level of structure alone. This is why the function Zt is necessary. The particular system state Systemt can affect the selection of which patterns to retain, but not the way these patterns are transformed. If this distinction were destroyed, if F and Zt were allowed to blur together into a more general Ft = F[Systemt], then the fundamental structure-dependence of the iteration would be significantly weakened. One could even define Ft as a constant function on all values of St^( G[ R[Systemt] ]), mapping into a future state depending only on Systemt. Thus, in essence, one would have (**) back again.

    Equation (****), like the basic deterministic dynamic (***), is merely (**) with a special form of the transition operator T. T is now assumed to be some sequence of operations, one of which is a possibly filtered application of the relative structure operator St^. This is indeed a bizarre type of dynamic -- instead of acting on real numbers or vectors, it acts on collections of hyperrelations. However, it may still be studied using the basic concepts of dynamical systems theory -- fixed points, limit cycles, attractors and so forth.

    To see the profound utility of the filtering operator Zt, note that it may be defined specifically to ensure that only those elements of St^(G[R[Systemt]]) which are actually computed by sub-systems of Systemt are passed through to F and Systemt+1. In other words, one may set

    Zt(X) = Z[Systemt](X) = X intersect R[Systemt]

Under this definition, (****) says loosely that Systemt+1 consists of the patterns which Systemt has recognized in itself (and in the "compounds" formed by the interaction of its subsystems). It may be rewritten as

    Systemt+1 = F [ R[Systemt] intersect St^( G[ R[Systemt] ])]      (*****)

This specialization brings abstract self-generating pattern dynamics down into the realm of physical reality. For reasons that will become clear a little later, it is this equation that I will refer to as the "cognitive equation" or "cognitive law of motion."

8.2.2. Summary

    Self-generating pattern dynamics are dynamical iterations on collections of processes, and are thus rather different from the numerical iterations of classical dynamical systems theory and modern "chaos theory." However, it would be silly to think that one could understand mental systems by the exact same methods used to analyze physical systems.

    The basic modeling ideas of graph-theoretic structure and iterative dynamics are applicable to both the mental and the physical worlds. But whereas in the physical domain one is concerned mainly with numerical vectors, in the mental realm one is concerned more centrally with processes. The two views are not logically contradictory: vectors may be modeled as processes, and processes may be modeled as vectors. However, there is a huge conceptual difference between the two approaches.

    In non-technical language, what a "self-generating pattern dynamic" boils down to is the following sequence of steps:

    1) Take a collection of processes, and let each process act on all the other processes, in whatever combinations it likes. Some of these "interactions" may result in nothing; others may result in the creation of new processes. The totality of processes created in this way is called the Raw Potentiality generated by the original collection of processes.

    2) Transform these processes in some standard way. For instance, perhaps one wants to model a situation in which each element of the Raw Potentiality has only a certain percentage chance of being formed. Then the "transformation" of the Raw Potentiality takes the form of a selection process: a small part of the Raw Potentiality is selected to be retained, and the rest is discarded.

    3) Next, determine all the patterns in the collection of processes generated by Step 2. Recall that patterns are themselves processes, so that what one has after this step is simply another collection of processes.

    4) "Filter out" some of the processes in the collection produced by Step 3. This filtering may be system-dependent -- i.e., the original processes present in Step 1 may have a say in which Step 3-generated pattern-processes are retained here. For instance, as will be suggested below, it may often be desirable to retain only those patterns that are actually recognized by processes in Step 1.

    5) Transform the collection of processes produced by Step 4 in some standard way, analogously to Step 2.

    6) Take the set of processes produced by Step 5, and feed it back into Step 1, thus beginning the whole process all over again.

    This is a very general sequence of steps, and its actual behavior will depend quite sensitively on the nature of the processes introduced in Step 1 on the first go-around, as well as on the nature of the transformation and filtering operations. Modern science and mathematics have rather little to say about this type of complex process dynamics. The general ideas of dynamical systems theory are applicable, but the more specific and powerful tools are not. If one wishes to understand the mind, however, this is the type of iteration which one must master.

    More specifically, in order to model cognitive systems, a specific instance of the filtering operation is particularly useful: one filters out all but those patterns that are actually recognized by the components of the system. In other words, one takes the intersection of the products of the system and the patterns in the system. The self-generating pattern dynamic induced by this particular filtering operation is what I call the "cognitive equation."

    Informally and in brief, one may describe the cognitive equation as follows:

    1) Let all processes that are "connected" to one another act on one another.

    2) Take all patterns that were recognized in other processes during Step (1), let these patterns be the new set of processes, and return to Step (1).

    An attractor for this dynamic is then a set of processes with the property that each element of the set is a) produced by the set of processes, b) a pattern in the set of entities produced by the set of processes. In the following sections I will argue that complex mental systems are attractors for the cognitive equation.
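The attractor condition can be checked directly on a toy universe of string processes in which "Rk" repeats its argument k times (an illustrative construction of mine, not a claim about real mental processes). Condition (a), production, is folded into the pool, which keeps the set alongside its compounds; condition (b) is checked explicitly.

```python
import re

def act(f, x):
    # "Rk" repeats its argument k times; all other entities are inert.
    m = re.fullmatch(r"R(\d+)", f)
    return x * int(m.group(1)) if m else None

def products(S):
    # The compounds formed when the set's processes act on one another.
    return {y for f in S for x in S for y in (act(f, x),) if y is not None}

def is_pattern_in(f, pool):
    # f is a pattern in some y in the pool: it produces y from an x,
    # and storing (f, x) is cheaper than storing y outright.
    return any(act(f, x) in pool and len(f) + len(x) < len(act(f, x))
               for x in pool)

def is_cognitive_attractor(S):
    pool = set(S) | products(S)
    return all(is_pattern_in(f, pool) for f in S)

print(is_cognitive_attractor({"R3", "R4"}))   # mutually perpetuating pair
print(is_cognitive_attractor({"ab"}))         # inert data is no attractor
```

The repeater pair persists because each member is a pattern in compounds the pair itself produces; the inert string produces nothing and is a pattern in nothing.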


    According to chaos theory, the way to study a dynamical iteration is to look for its attractors. What type of collection of processes would be an attractor for a self-generating pattern dynamic?

    To begin with, let us restrict attention to the basic deterministic dynamic (***). According to this iteration, come time t+1, the entities existent at time t are replaced by the patterns in the Raw Potentiality generated by these entities. But this does not imply that all the entities from time t completely vanish. That would be absurd -- the system would be a totally unpredictable chaos. It is quite possible for some of the current entities to survive into the next moment.     

    If a certain entity survives, this means that, as well as being an element of the current system Systemt, it is also a regularity in the Raw Potentiality of Systemt, i.e. an element of R[Systemt]. While at first glance this might seem like a difficult sort of thing to contrive, slightly more careful consideration reveals that this is not the case at all.

    As a simple example, consider two entities f and g, defined informally by

f(x) = the result of executing the command "Repeat x two times"

g(x) = the result of executing the command "Repeat x three times"

Then, when f acts on g, one obtains the "compound"

f(g) = the result of executing the command "Repeat x three times" the result of executing the command "Repeat x three times"

And when g acts on f, one obtains the "compound"

g(f) = the result of executing the command "Repeat x two times" the result of executing the command "Repeat x two times" the result of executing the command "Repeat x two times"

Now, obviously the pair (f,g) is a pattern in f(g), since it is easier to store f and g, and then apply f to g, than it is to store f(g). And, in the same way, the pair (g,f) is a pattern in g(f). So f and g, in a sense, perpetuate one another. According to the basic deterministic dynamic, if f and g are both present in Systemt, then they will both be present in Systemt+1.

    One may rephrase this example a little more formally by defining f(x) = x x, g(x) = x x x. In set-theoretic terms, if one makes the default assumption that all variables are universally quantified, this means that f has the form {x,{x,x x}} while g has the form {x,{x,x x x}}. So, when f acts on g, we have the ugly-looking construction { {x,{x,x x x}}, { {x,{x,x x x}}, {x,{x,x x x}} {x,{x,x x x}} } }; and when g acts on f, we have the equally unsightly { {x,{x,x x}}, { {x,{x,x x}}, {x,{x,x x}} {x,{x,x x}} {x,{x,x x}} } }. It is easy to see that, given this formalization, the conclusions given in the text hold.
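This formalization can be executed directly: represent each process by its defining string over the variable x, and let application substitute the argument for every x. A small sketch, in which the storage comparison stands in for "easier to store":

```python
f_def = "xx"    # f(x) = x x   -- "repeat x two times"
g_def = "xxx"   # g(x) = x x x -- "repeat x three times"

def apply_def(process, arg):
    # Application by substitution: replace each x in the definition.
    return process.replace("x", arg)

fg = apply_def(f_def, g_def)    # g's definition repeated twice
gf = apply_def(g_def, f_def)    # f's definition repeated three times

# (f,g) is a pattern in f(g), and (g,f) in g(f): storing the two
# definitions and applying one to the other beats storing the compound.
assert len(f_def) + len(g_def) < len(fg)
assert len(g_def) + len(f_def) < len(gf)

# The inert word h contains no x, so in this model it acts on nothing --
# yet (f,h) is still a pattern in f(h), so f survives while h does not.
h = "cosmogonicallousockhamsteakomodopefiendoplasmicreticulumpenproletariatti"
assert apply_def(h, f_def) == h
assert len(f_def) + len(h) < len(apply_def(f_def, h))
```

In this model h's inertness (it leaves its argument unchanged) plays the role of h(f) being empty in the text's version; the asymmetry between f and h comes out the same way.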

    Note that this indefinite survival is fundamentally a synergetic effect between f and g. Suppose that, at time t, one had a system consisting of only two entities, f and h, where

h = "cosmogonicallousockhamsteakomodopefiendoplasmicreticulumpenproletariatti"

Then the effect of h acting on f would, by default, be

h(f) = empty set

And the effect of f acting on h would be

f(h) = "cosmogonicallousockhamsteakomodopefiendoplasmicreticulumpenproletariatticosmogonicallousockhamsteakomodopefiendoplasmicreticulumpenproletariatti"

Now, (f,h) is certainly a pattern in f(h), so that, according to the basic deterministic dynamic, f will be a member of Systemt+1. But h will not be a member of Systemt+1 -- it is not a pattern in anything in R[Systemt]. So there is no guarantee that f will survive into Systemt+2.

    What is special about f and g is that they assist one another in producing entities in which they are patterns. But, clearly, the set {f,g} is not unique in possessing this property. In general, one may define a structural conspiracy as any collection of entities G so that every element of G is a pattern in the Raw Potentiality of G. It is obvious from the basic deterministic dynamic that one successful strategy for survival over time is to be part of a structural conspiracy.

    Extending this idea to general deterministic equations of the form (****), a structural conspiracy may be redefined as any collection P which is preserved by the dynamic involved, i.e. by the mathematical operations R, G, St^ and F applied in sequence.

    And finally, extending the concept to stochastic equations of form (****), a structural conspiracy may be defined as a collection P which has a nonzero probability of being preserved by the dynamic. The value of this probability might be called the "solidity" of the conspiracy. Stochastic dynamics are interesting in that they have the potential to break down even solid structural conspiracies.
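The solidity of a conspiracy can be estimated by simulation. In the sketch below (all names and modeling choices are mine, using toy string processes in which "Rk" repeats its argument k times), each compound in the Raw Potentiality is formed only with probability p, and solidity is the observed frequency with which the conspiracy passes through one stochastic step intact:

```python
import random
import re

def act(f, x):
    # "Rk" repeats its argument k times; all other entities are inert.
    m = re.fullmatch(r"R(\d+)", f)
    return x * int(m.group(1)) if m else None

def stochastic_step(system, p, rng):
    # Each compound of the Raw Potentiality is formed with probability p.
    pool = set(system) | {y for f in system for x in system
                          for y in (act(f, x),)
                          if y is not None and rng.random() < p}
    # Retain each process that is a pattern in some surviving compound.
    return {f for f in pool for x in pool
            if act(f, x) in pool and len(f) + len(x) < len(act(f, x))}

def solidity(conspiracy, p, trials=2000, seed=1):
    # Estimated probability that the conspiracy is preserved by one step.
    rng = random.Random(seed)
    hits = sum(conspiracy <= stochastic_step(conspiracy, p, rng)
               for _ in range(trials))
    return hits / trials

print(solidity({"R3", "R4"}, p=0.5))
```

For this pair at p = 0.5, each member survives when at least one of its two compounds is formed, so the estimate settles near (3/4)^2, about 0.56 -- a solid but breakable conspiracy, as the text anticipates.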

    One phrase which I use in my own thinking about self-generating pattern dynamics is "passing through." For an entity, a pattern, to survive the iteration of the fundamental equation, it must remain intact as a pattern after the process of universal interdefinition, universal interaction has taken place. The formation of the Raw Potentiality is a sort of holistic melding of all entities with all other entities. But all that survives from this cosmic muddle, at each instant, is the relative structure. If an entity survives this process of melding and separation, then it has passed through the whole and come out intact. Its integral relationship with the rest of the system is confirmed.

8.3.1. Conspiracy and Dynamics

    What I have called a structural conspiracy is, in essence, a fixed point. It is therefore the simplest kind of attractor which a self-generating pattern dynamic can have. One may also conceive of self-generating-pattern-dynamic limit cycles -- collections P so that the presence of P in Systemt implies the presence of P in Systemt+k, for some specific integer k>1.

    Nietzsche's fanciful theory of the "eternal recurrence" may be interpreted as the postulation of a universe-wide limit-cycle. His idea was that the system, with all its variation over time, is inevitably repetitive, so that every moment which one experiences is guaranteed to occur again at some point in the future.

    And, pursuing the same line of thought a little farther, one may also consider the concept of a self-generating-pattern-dynamical strange attractor. In this context, one may define a "strange attractor" as a group P of entities which are "collectively fixed" under a certain dynamic iteration, even though the iteration does not cycle through the elements of P in any periodic way. Strange attractors may be approximated by limit cycles with very long and complicated periodic paths.

    In ordinary dynamical systems theory, strange attractors often possess the property of unpredictability. That is, neither in theory nor in practice is there any way to tell which attractor elements will pop up at which future times. Unpredictable strange attractors are called chaotic attractors. But on the other hand, some strange attractors are statistically predictable, as in Freeman's "strange attractor with wings" model of the sense of smell. Here chaos coexists with a modicum of overlying order.

    It is to be expected that self-generating pattern dynamical systems possess chaotic attractors, as well as more orderly strange attractors. Furthermore, in ordinary dynamics, strange attractors often contain fixed points; and so, in self-generating pattern dynamics, it seems likely that strange structural conspiracies will contain ordinary structural conspiracies (although these ordinary structural conspiracies may well be so unstable as to be irrelevant in practice). However, there is at the present time no mathematical theory of direct use in exploring the properties of self-generating pattern dynamical systems or any other kind of nontrivial self-generating system. The tools for exploring these models simply do not exist; we must make them up as we go along.

    Fixed points are simple enough that one can locate them by simple calculation, or trained intuition. But in classical dynamical systems theory, most strange attractors have been found numerically, by computer simulation or data analysis. Only rarely has it been possible to verify the presence of a strange attractor by formal mathematical means; and even in these cases, the existence of the attractor was determined by computational means first. So it is to be expected that the procedure for self-generating dynamics will be the same. By running simulations of various self-generating systems, such as self-generating pattern dynamics, we will happen upon significant strange attractors ... and follow them where they may lead.

8.3.2. Immunological Pattern Dynamics

    The immune system, as argued at the end of Chapter Seven, is a self-generating component-system. The cognitive equation leads us to the very intuitive notion that, even so, it is not quite a cognitive system.

    Insofar as the immune system is a self-maintaining network, the survival of an antibody type is keyed to the ability of the type to recognize some other antibody type. If A recognizes B, then this is to be viewed as B creating instances of A (indirectly, via the whole molecular system of communication and reproduction). So the antibody types that survive are those which are produced by other antibody types: the immune network is a self-generating component-system.

    The next crucial observation is that the recognition involved here is a pattern-based operation. From the fact that one specific antibody type recognizes another, it follows only that there is a significant amount of pattern emergent between the two antibody types; it does not follow that the one antibody type is a pattern in the other. But the ensuing reproduction allows us to draw a somewhat stronger conclusion.

    Consider: if type A attacks type B, thus stimulating the production of more type A -- then what has happened? The original amounts of A and B, taken together, have served as a process for generating a greater amount of A. Is this process a pattern in the new A population? Only if one accepts that the type B destroyed was of "less complexity" than the type A generated. For instance, if two A's were generated for each one B destroyed, then this would seem clear. Thus, the conclusion: in at least some instances, antibody types can be patterns in other antibody types. But this cannot be considered the rule. Therefore, the immune system is not quite a fully cognitive system; it is a borderline case.

    Or, to put it another way: the cognitive equation is an idealization, which may not be completely accurate for any biologically-based system. But it models some systems better than others. It models the immune system far better than the human heart or a piece of tree bark -- because the immune system has many "thought-like" properties. But, or so I will argue, it models the brain even more adeptly.


    I have said that mind is a self-generating system, and I have introduced a particular form of self-generating system called a "self-generating pattern dynamic." Obviously these two ideas are not unrelated. In this section I will make their connection explicit, by arguing that mind is a structural conspiracy -- an attractor for a self-generating pattern dynamic.

    More specifically, I will argue that a dual network is a kind of structural conspiracy. The key to relating self-generating pattern dynamics with the dual network is the filtering operator Zt.

8.4.1. The Dual Network as a Structural Conspiracy

    It is not hard to see that, with this filtering operation, an associative memory is almost a structural conspiracy. For nearly everything in an associative memory is a pattern emergent among other things in that associative memory. As in the case of multilevel control, there may be a few odd men out -- "basic facts" being stored which are not patterns in anything. What is required in order to make the whole memory network a structural conspiracy is that these "basic facts" be generatable as a result of some element in memory acting on some other element. These elements must exist by virtue of being patterns in other things -- but, as a side-effect, they must be able to generate "basic facts" as well.

    Next, is the perceptual-motor hierarchy a structural conspiracy? Again, not necessarily. A process on level L may be generally expected to be a pattern in the products obtained by letting processes on level L-1 act on processes from level L-2. After all, this is their purpose: to recognize patterns in these products, and to create a pattern of success among these products. But what about the bottom levels, which deal with immediate sense-data? If these are present in Systemt, what is to guarantee that they will continue into Systemt+1? And if they do not continue, then under the force of self-generating pattern dynamics, the whole network will come crashing down....

    The only solution is that the lower level processes must not only be patterns in sense data, they must also be patterns in products formed by higher-level processes. In other words, we can only see what we can make. This is not a novel idea; it is merely a reformulation of the central insight of the Gestalt psychologists.

    Technically, one way to achieve this would be for there to exist processes (say on level 3) which invert the actions taken by their subordinates (say on level 2), thus giving back the contents of level 1. This inversion, though, has to be part of a process which is itself a pattern in level 2 (relative to some other mental process). None of this is inconceivable, but none of it is obvious either. It is, ultimately, a testable prediction regarding the nature of the mind, produced by equation (*****).

    The bottom line is, it is quite possible to conceive of dual networks which are not structural conspiracies. But on the other hand, it is not much more difficult, on a purely abstract level, to envision dual networks which are. Equation (*****) goes beyond the dual network theory of mind, but in a harmonious way. The prediction to which it leads is sufficiently dramatic to deserve a name: the "producibility hypothesis." To within a high degree of approximation, every mental process X which is not a pattern in some other mental process can be produced by applying some mental process Y to some mental process Z, where Y and Z are patterns in some other mental process.

    This is a remarkable kind of "closure," a very strong sense in which the mind is a world all its own. It is actually very similar to what Varela (1978) called "autopoiesis" -- the only substantive difference is that Varela believes autopoietic systems to be inherently non-computational in nature. So far, psychology has had very little to say about this sort of self-organization and self-production. However, the advent of modern complex systems science promises to change this situation.
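The producibility hypothesis lends itself to a direct computational restatement. The sketch below is only a toy illustration, under assumptions of my own (a finite process set, a given pattern relation, a given action operator): it checks whether every "basic fact" in a collection of processes -- every element that is not a pattern in some other element -- can be produced by letting one pattern-element act on another.

```python
# Toy check of the "producibility hypothesis." The two relations passed
# in (is_pattern_in, apply_proc) are illustrative assumptions, not
# anything specified in the text.

def satisfies_producibility(processes, is_pattern_in, apply_proc):
    """processes: a finite set of process labels.
    is_pattern_in(x, y): True if x is a pattern in y.
    apply_proc(y, z): the process produced when y acts on z, or None."""
    procs = set(processes)
    # elements that exist by virtue of being patterns in something else
    patterns = {x for x in procs
                if any(is_pattern_in(x, y) for y in procs if y != x)}
    basic_facts = procs - patterns      # the "odd men out"
    # each basic fact must be generatable by one pattern acting on another
    return all(
        any(apply_proc(y, z) == x for y in patterns for z in patterns)
        for x in basic_facts)
```

On this reading, a structural conspiracy is exactly a process set for which this check succeeds under its own pattern and action relations.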

8.4.2. Physical Attractors and Process Attractors

    All this is quite unorthodox and ambitious. Let me therefore pause to put it into a more physicalistic perspective. The brain, like other extremely complex systems, is unpredictable on the level of detail but roughly predictable on the level of structure. This means that the dynamics of its physical variables display a strange attractor with a complex structure of "wings" or "compartments." Each compartment represents a certain collection of states which give rise to the same, or similar, patterns. Structural predictability means that each compartment has wider doorways to some compartments than to others.

    The complex compartment-structure of the strange attractor of the physical dynamics of the brain determines the macroscopic dynamics of the brain. There would seem to be no way of determining this compartment-structure based on numerical dynamical systems theory. Therefore one must "leap up a level" and look at the dynamics of mental processes, perhaps represented by interacting, inter-creating neural maps. The dynamics of these processes, it is suggested, possess their own strange attractors called "structural conspiracies," representing collections of processes which are closed under the operations of pattern recognition and interaction. Process-level dynamics results in a compartmentalized attractor of states of the network of mental processes.

    Each state of the network of mental processes represents a large number of possible underlying physical states. Therefore process-level attractors take the form of coarser structures, superimposed on physical-level attractors. If physical-level attractors are drawn in ball-point pen, process-level attractors are drawn in magic marker. On the physical level, a structural conspiracy represents a whole complex of compartments. But only the most densely connected regions of the compartment-network of the physical-level attractor can correspond to structural conspiracies.

    Admittedly, this perspective on the mind is somewhat speculative, in the sense that it is not closely tied to the current body of empirical data. However, it is in all branches of science essential to look ahead of the data, in order to understand what sort of data is really worth collecting. The ideas given here suggest that, if we wish to understand mind and brain, the most important task ahead is to collect information regarding the compartment-structure of the strange attractor of the brain, both on the physical level and the process level; and above all to understand the complex relation between the strange attractors on these two different levels.


8.5. Language Acquisition

    I have proposed that the mind is an attractor for the cognitive equation. But this does not rule out the possibility that some particular subsets of the mind may also be attractors for the cognitive equation, in themselves. In particular, I suggest that linguistic systems tend to be structural conspiracies.

    This idea sheds new light on the very difficult psychological problem of language acquisition. For in the context of the cognitive equation, language acquisition may be understood as a process of iterative convergence toward an attractor. This perspective does not solve all the micro-level puzzles of language acquisition theory -- no general, abstract theory can do that. But it does give a new overarching framework for approaching the question of "how language could possibly be learned."

8.5.1. The Bootstrapping Problem

    The crucial puzzle of language acquisition theory is the "bootstrapping problem." What this catch phrase means is: if all parts of language are defined in terms of other parts of language, then where is the mind to start the learning process?

    Consider the tremendous gap between the input and the output of the language learning process. What a child is presented with are sentences heard in context. Gradually, the child's mind learns to detect components and properties of these sentences: such things as individual words, word order, individual word meanings, intonation, stress, syllabic structure of words, general meanings of sentences, pragmatic cues to interpretation, etc. All this is just a matter of correlating things that occur together, and dividing things into natural groupings: difficult but straightforward pattern recognition.

    But what the child's mind eventually arrives at is so much more than this. It arrives at an implicit understanding of grammatical categories and the rules for their syntactic interrelation. So the problem is, how can a child determine the relative order of noun and verb without first knowing what "nouns" and "verbs" are? But on the other hand, how can she learn to distinguish nouns and verbs except by using cues from word order? Nouns do not have a unique position, a unique intonation contour, a unique modifier or affix -- there is no way to distinguish them from verbs based on non-syntactic pattern recognition.

    The formal model of language given in Chapter Five makes the bootstrapping problem appear even more severe. First of all, in the definition of "syntactic system," each word is defined as a fuzzy set of functions acting on other words. How then are words to be learned, if each word involves functions acting on other words? With what word could learning possibly start? Yes, some very simple words can be partially represented as functions with null argument; but most words need other words as arguments if they are to make any sense at all.

    And, on a higher level of complexity, I have argued that syntax makes no sense without semantics to guide it. No mind can use syntax to communicate unless it has a good understanding of semantics; otherwise, among other problems, the paradoxes of Boolean logic will emerge to louse things up. But on the other hand, semantics, in the pattern-theoretic view, involves determining the set of all patterns associated with a given word or sentence. And the bulk of these patterns involve words and more complex syntactic structures like phrases and clauses: this is the systematicity of language.

    No syntax without semantics, no semantics without syntax. One cannot recognize correlations among syntactic patterns until one knows syntax to a fair degree. But until one has recognized these correlations, one does not know semantics, and one consequently cannot use syntax for any purpose. But how can one learn syntax at all, if one cannot use it for any purpose?

    Chomsky-inspired parameter-setting theories circumvent this chicken-and-egg problem in a way which is either clever, obvious or absurd, depending on your point of view. They assume that the brain has a genetically-programmed "language center," which contains an abstract version of grammar called Universal Grammar or UG.

    UG is understood to contain certain "switches" -- such as a switch which determines whether nouns come before or after verbs, a switch which determines whether plurals are formed by prefixes or by suffixes, and so on. The class of possible human syntaxes is the class of possible switch settings for UG; and language learning is a process of determining how to set the switches for the particular linguistic environment into which one has been born.
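The switch metaphor can be made concrete in a few lines. The following sketch is a deliberate caricature, with hypothetical switch names of my own choosing: a grammar is nothing but an assignment of values to a fixed list of parameters, and "learning" merely reads each switch off the evidence; no rule is ever induced from scratch.

```python
# Caricature of parameter-setting. The switch names are hypothetical;
# each switch is set by majority vote over observed constructions.

UG_SWITCHES = ["noun_before_verb", "plural_by_suffix"]

def set_switches(evidence):
    """evidence: dict mapping a switch name to a list of booleans, one
    per observed construction bearing on that switch. Returns a
    "grammar": one setting per switch, selected from the pre-arranged
    array of possibilities rather than learned."""
    grammar = {}
    for switch in UG_SWITCHES:
        votes = evidence.get(switch, [])
        grammar[switch] = sum(votes) > len(votes) / 2   # majority vote
    return grammar
```

The point of the caricature is that the search space is finite and fixed in advance; all the hard work lies in extracting the votes from raw sentences.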

    The parameter-setting approach simplifies the bootstrapping problem by maintaining that syntaxes are not actually learned; they are merely selected from a pre-arranged array of possibilities. It leaves only the much more manageable problem of semantic bootstrapping -- of explaining how semantic knowledge is acquired by induction, and then combined with UG to derive an appropriate syntax.

    Some theorists, however, consider the whole parameter-setting approach to be a monumental cop-out. They stubbornly maintain that all linguistic knowledge must be induced from experience. In other words, to use my earlier example, first the child gets a vague idea of the concepts of "noun" and "verb"; then, based on this vague idea, she arrives at a vague idea of the relative positioning of nouns and verbs. This inkling about positioning leads to a slightly strengthened idea of "noun" and "verb" -- and so forth.

    In general, according to this view, the child begins with very simple grammatical rules, specific "substitution frames" with slots that are labeled with abstract object types; say "NOUN VERB" or "NOUN go to NOUN" or "NOUN is very ADJECTIVE". Then, once these simple frames are mastered, the child induces patterns among these substitution frames. "NOUN eats NOUN," "NOUN kills NOUN," "NOUN tickles NOUN," etc., are generalized into NOUN VERB NOUN. Next, more complex sentence structures are built up from simple substitution frames, by induced transformational rules.
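The generalization step in this inductivist story is easy to sketch. The code below is an illustrative toy, not a model drawn from the acquisition literature: it merges specific substitution frames that agree in all but one slot into a more abstract frame -- exactly the move from "NOUN eats NOUN," "NOUN kills NOUN," and so on to NOUN VERB NOUN.

```python
# Toy induction over substitution frames: a slot filled by many
# different words across the given frames is generalized to a category
# label. The merging criterion is an illustrative assumption.

def generalize(frames, category="VERB"):
    """frames: specific frames of equal length, e.g. "NOUN eats NOUN".
    Returns the abstract frame induced from them."""
    split = [f.split() for f in frames]
    out = []
    for i in range(len(split[0])):
        column = {f[i] for f in split}   # all fillers of slot i
        # a constant slot is kept; a variable slot becomes a category
        out.append(column.pop() if len(column) == 1 else category)
    return " ".join(out)
```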

    In the inductivist perspective, bootstrapping is understood as a difficult but not insurmountable problem. It is assumed that the 10^10 - 10^12 neurons of the human brain are up to the task. Parameter-setting theorists have a more pessimistic opinion of human intelligence. But the trouble with the whole debate is that neither side has a good overall concept of what kind of learning is taking place.

    In other words: if it's inductive learning, what kind of structure does the induction process have? Or if it's parameter setting, what is the logic of the process by which these "parameters" are learned -- how can this mechanistic model be squared with the messiness of human biology and psychology? In short, what is the structure of linguistic intelligence? My goal in this section is to suggest that the cognitive equation may provide some hints toward the resolution of this conceptual difficulty.

8.5.2. Process-Network Theories of Language Learning

    The dual network model suggests that language learning must be explicable on the level of self-organizing, self-generating process dynamics. This is something of a radical idea, but on the other hand, it can also be related to some of the "mainstream" research in language acquisition theory. And, I will argue, it provides an elegant way of getting around the bootstrapping problem.

Constraint Satisfaction Models

    Perhaps the most impressive among all parameter-setting theories is Pinker's (1987) constraint satisfaction model. Initially Pinker wanted to model language learning using a connectionist architecture a la Rumelhart and McClelland (1986). But this proved impossible; and indeed, all subsequent attempts to apply simple "neural networks" to symbolic learning problems have been equally fruitless.

    So instead, Pinker borrowed from artificial intelligence the idea of a self-adjusting constraint satisfaction network. The idea is that language acquisition results from the joint action of a group of constraint satisfaction networks: one for assigning words to categories, one for determining grammatical structures, one for understanding and forming intonations, etc.

    Consider, for instance, the network concerned with grammatical structures. Each node of this network consists of a rule prototype, a potential grammatical rule, which has its own opinion regarding the role of each word in the sentence. The dynamics of the network is competitive. If the sentence is "The dog bit the man," then one rule might categorize "The dog" as subject and "bit the man" as verb phrase; another might categorize "The dog bit" as subject and "the man" as verb phrase. But if a certain rule prototype disagrees with the majority of its competitors regarding the categorization of a word, then its "weight" is decreased, and its opinion is counted less in the future.

    The behavior of the network gets interesting when rules agree regarding some categorizations and disagree regarding others. The weights of rules may fluctuate up and down wildly before settling on an "equilibrium" level. But eventually, if the rule network is sufficiently coherent, an "attractor" state will be reached.
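The competitive dynamics just described can be mimicked in miniature. This sketch is my own simplification, with an arbitrary decay rate and step count: each rule prototype is reduced to its opinions about word roles, on every pass the rules disagreeing with the weighted majority have their weights cut, and the surviving weights play the role of the "equilibrium."

```python
# Miniature of a competitive constraint satisfaction network, in the
# spirit of the text: dissenting rule prototypes lose weight. The decay
# rate and the fixed number of steps are arbitrary assumptions.

def compete(rules, words, steps=50, decay=0.5):
    """rules: dict mapping a rule name to its opinion, a dict word -> role.
    Returns the final weight of each rule."""
    weights = {name: 1.0 for name in rules}
    for _ in range(steps):
        for word in words:
            # weighted vote over the possible roles of this word
            tally = {}
            for name, opinion in rules.items():
                role = opinion[word]
                tally[role] = tally.get(role, 0.0) + weights[name]
            majority = max(tally, key=tally.get)
            # rules disagreeing with the majority count less in future
            for name, opinion in rules.items():
                if opinion[word] != majority:
                    weights[name] *= decay
    return weights
```

Run on a coherent rule set, the minority opinion decays toward zero and the majority categorization stabilizes; with no initial agreement, the weights merely chase each other, which is the point of the next paragraph.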

    If there were no initial knowledge, then this competitive process would be worthless. No stable equilibrium would ever arise. But Pinker's idea is that the abstract rules supplied by UG, combined with rudimentary rules learned by induction, are enough to ensure the convergence of the network. This is a fancy and exciting version of the "parameter-setting" idea: parameters are not being directly set, but rather UG abstractions are being used to guide the convergence of a self-organizing process.

Competition Models

    An interesting counterpoint to Pinker's network model is provided by the evolutionary approach of Bates and MacWhinney (1987). They present cross-linguistic data suggesting that language learning is not a simple process of parameter-setting. Children learning different languages will often differ in their early assumptions about grammar, as well as their ultimate syntactic rule structures. Furthermore, the passage from early grammar to mature grammar may be an oscillatory one, involving the apparent competition of conflicting tendencies. And different children may, depending on their particular abilities, learn different aspects of the same language at different times: one child may produce long sentences full of grammatical errors at an early stage, while another child may first produce flawless short sentences, only then moving on to long ones.

    These observations disprove only the crudest of parameter-setting theories; they do not contradict complex parameter-setting theories such as Pinker's constraint satisfaction network, which integrates UG with inductive rule learning in a self-organizational setting. But they do suggest that even this kind of sophisticated parameter-setting is not quite sophisticated enough. The single-level iteration of a constraint satisfaction network is a far cry from the flexible multilevel iterations of the brain.

    What Bates and MacWhinney propose is a sort of "two-level network" -- one level for forms and another for functions. Form nodes may be connected to function nodes; for example, the form of preverbal positioning in English is correlated with the function of expressing the actor role. But there may also be intra-level connections: form nodes may be connected to other form nodes, and function nodes to other function nodes.

    In their view, mappings of a single form onto a single function are quite rare; much more common is widely branching interconnection. For instance, they argue that

"subject" is neither a single symbol nor a unitary category. Rather, it is a coalition of many-to-many mappings between the level of form (e.g. nominative case marking, preverbal position, agreement with the verb in person and number) and the level of function (e.g. agent of a transitive action, topic of an ongoing discourse, perspective of the speaker)....

    Notice that the entries at the level of form include both "obligatory" or "defining" devices such as subject-verb agreement, and "optional" correlates like the tendency for subjects to be marked with definite articles. This is precisely what we mean when we argue that there is no sharp line between obligatory rules and probabilistic tendencies.

Learning is then a process of modifying the weights of connections. Connections that lead to unsatisfactory results have their weights decreased, and when there is a conflict between two different nodes, the one whose connection is weighted highest will tend to prevail.

Summary

    Bates and MacWhinney, like Pinker, view language learning as largely a process of adjusting the connections between various "processes" or "nodes." While this is not currently known to be the correct approach to language acquisition, I submit that it is by far the most plausible framework yet proposed. For Neural Darwinism teaches us that the brain is a network of interconnected processes, and that learning consists largely of the adjustment of the connections between these processes. The process-network view of language acquisition fits quite neatly into what we know about the brain and mind.

    And the question "UG or not UG," when seen in this light, becomes rather less essential. What is most important is the process dynamics of language learning. Only once this dynamics is understood can we understand just how much initial information is required to yield the construction of effective linguistic neural maps.     Perhaps the inductivists are right, and abstract cognitive abilities are sufficient; or perhaps Chomsky was correct about the necessity of pre-arranged grammatical forms. But one's opinion on this issue cannot serve as the basis for a theory of language acquisition. The process-network view relegates the innate-vs.-acquired debate to the status of a side issue.

8.5.3. The Cognitive Equation and Language Learning    

    So, language learning is largely a process of adjusting the weights between different processes. But how are these processes arrived at in the first place? Some of them, perhaps, are supplied genetically. But many, probably most, are learned inductively, by pattern recognition. This gives rise to the question of whether a language is perhaps a structural conspiracy.

    The above discussion of "bootstrapping" suggests that this may indeed be the case. Parts of speech like "nouns" and "verbs" are patterns among sentences; but they are only producible by processes involving word order. On the other hand, rules of word ordering are patterns among sentences, but they are only producible by processes involving parts of speech.

    The bootstrapping argument states precisely that, once one knows most of the rules of syntax, it is not hard to induce the rest. Suppose one assumes that the processes bearing the rules of language all

    1) possess modest pattern-recognition capacities, and

    2) are programmed to recognize patterns in sentences.

Given this, it follows from the bootstrapping problem that any portion of a mind's linguistic system is capable of producing the rest, according to the dynamics of the cognitive equation. In other words, it follows that language is an attractor, a structural conspiracy.

    And if one accepts this conclusion, then the next natural step is to view language learning as a process of convergence to this attractor. This is merely a new way of conceptualizing the point of view implicit in the work of Pinker, Bates, MacWhinney, and other process-network-oriented acquisition theorists. These theorists have focused on the dynamics of already-existing networks of linguistic rules; but as Pinker explicitly states, this focus is for the sake of simplicity only (after all, rule-bearing processes must come from somewhere). The cognitive equation shifts the focus from connection adjustment to process creation, but it does not alter the underlying process-network philosophy.

    The learning process starts with an initial collection of syntactic rules -- either simple substitution rules picked up from experience, or randomly chosen specific cases of abstract UG rules, or a combination of the two. Then each rule-bearing process recognizes patterns -- among incoming and outgoing sentences and its companion processes.

    This recognition process results in the production and comprehension of sentences, via its interaction with outside perceptual and motor processes, and the associative memory network (recall the intimate connection between syntax and semantics, discussed in Chapter Five). But internally, it also leads to the creation of new processes ... which aid in the production and comprehension of sentences, and in the creation of processes.

    And this process is repeated until eventually nothing new is generated any more -- then an attractor has been reached. Language, a self-sustaining mental system, has been learned.
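The whole iteration can be caricatured as a fixed-point computation. In the sketch below -- a toy under my own assumptions, with the generator standing in for pattern recognition and process interaction, and assumed to produce nothing new eventually -- one starts from a seed set of rule-bearing processes and keeps adding whatever the current set produces, stopping exactly when nothing new is generated: the attractor.

```python
# Toy fixed-point iteration toward a "structural conspiracy": grow the
# process set until it is closed under its own generating operation.
# generate() is a stand-in for pattern recognition among processes, and
# is assumed to yield only finitely many new processes.

def converge(seed, generate):
    """seed: initial set of processes (e.g. simple substitution rules).
    generate(y, z): the new process produced when y acts on z, or None.
    Returns the attractor set."""
    system = set(seed)
    while True:
        produced = {generate(y, z) for y in system for z in system}
        produced.discard(None)
        if produced <= system:
            return system        # nothing new: the attractor is reached
        system |= produced
```

For instance, seeding the iteration with atomic categories and a generator that pairs atoms into two-slot frames yields, in one pass, the closed set of atoms plus all their pairings, after which nothing new appears.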