From Complexity to Creativity -- Copyright Plenum Press, © 1997 |
Part II. Formal Tools for Exploring Complexity
CHAPTER 6. EVOLUTION AND DYNAMICS
CHAPTER SIX
EVOLUTION AND DYNAMICS
In this chapter we will return to the genetic algorithm, which was introduced in Chapter One. The relevance of the genetic algorithm to the psynet model has already been established -- GA's, it seems, are an abstract, archetypal model of a certain type of psychological creativity. Here we will be concerned with genetic algorithms as dynamical systems, and with the use of genetic algorithms to evolve other dynamical systems. Rather than merely cranking out genetic-algorithm applications, the focus is on understanding what the genetic algorithm is, what it can do, and why it is relevant to human and computer creativity.
We will begin by studying the mathematical dynamics of the simple GA. New mathematical tools will be introduced for studying the nature of evolutionary dynamics. Using these new methods, two important conclusions will be reached. First, it will be shown that, while GA's cannot be expected to converge to a globally optimal population, nevertheless, as population size tends to infinity, this kind of convergence gets closer and closer to happening. Finite population size is a generator of diversity, but a distractor from ultimate convergence.
Next, the convergence rate of the GA will be studied. An equation will be derived which suggests a new way of looking at the GA -- as a kind of progressively focussing stochastic search. The GA with crossover, it seems, displays a pattern of learning similar to that observed in humans and animals: a phase of rapid initial learning followed by slower phase of knowledge refinement.
A GA with mutation only gives the "wrong" shaped learning curve; a GA with crossover gives the "right" shape." This mathematical result, which is borne out by GA practice, is extremely intriguing from a psychological point of view. It suggests very strongly that, if we are to model minds as evolutionary learning systems, we must include crossover operations in the mix.
Finally, it is noted that evolution itself does not provide a complete explanation of any real-world complex system. There is always another side to the story -- one which represents a force of order-maintenance, as opposed to evolution's force of change and adaptation. In evolutionary biology this other side is called ecology. Ecology expressed the interrelations of things: it both constrains evolution and gives it power. In a psychological context, I will usually call this "other side" autopoiesis. The idea is the same. Evolution provides innovation, creativity. Autopoiesis keeps things alive, and generates emergent pattern binding different parts of systemstogether. The two processes are two different views of a unified life-process.
In order to get at the full picture, we will describe the Simple Evolving Ecology (SEE) model, a genetic algorithm generalization which is flexible and robust enough to be considered as a serious model of mind. SEE is a spatially distributed GA with ecological effects and a dynamically evolving spatial structure. Experimentation with SEE has just begun. Initial experiments, however, indicate that the behavior of SEE has a great deal to teach us regarding the behavior of complex psychological systems, and complex systems in general.
The accomplishments of the GA as an optimization and machine learning algorithm have been described in many other places, and I feel no need to review them in detail here. Instead I will report several experiments using the GA to evolve complex structures -- in particular, fractal structures. The GA will be used to evolve strange attractors for plane quadratic maps; and to evolve invariant measures for iterated function systems, which are interpreted as pictures or music.
The details of these applications have various connections to ideas described in other chapters. For instance, to understand the dynamics of the GA as it evolves strange attrctors, we will find it convenient to introduce the notion of autopoiesis; and to invoke the idea of "second-order evolution," as discussed in the previous chapter. In order to use the GA to produce musical melodies, we will need to arrive at a method for computationally assessing the "fitness" or "quality" of different melodies. As it turns out, a strategy very similar to the Chaos Language Algorithm will come in handy here.
These experiments may be considered in purely computational terms; or, speaking quite loosely, they may be interpreted in terms of cognitive science. One psychological application of the GA is to learning theory -- one wants to hypothesize that when we solve optimization problems of various sorts, we are using crossover and mutation operations on a pool of possible answers. Another, perhaps more interesting, application is to creativity. Here one is not trying to use the GA to simulate the process of answering a question, but rather to simulate the process of coming up with new forms. This is where the present experiments come in. One is, quite simply, trying to get the GA to come up with interesting things.
In traditional applications, where one uses the GA to solve problems, one is not exploiting the full information available in the GA population at a given time. One is extracting the "best" answer from the population and ignoring the many other interesting structures contained therein. In these creativity-oriented applications, on the other hand, the full population is used. As in real biological and psychological systems, the full efficiency of the evolutionary process is exploited.
6.2 THE DYNAMICS OF THE GENETIC ALGORITHM
Dynamical systems models are often categorized based on thecriterion of discreteness versus continuity. There are continuous systems, such as differential equations. There are discrete-time, continuous-space systems, i.e. dynamical iterations such as the Baker map. And there are wholly discrete systems -- such as the evolution game described above, and the genetic algorithm.
Each of the two varieties of dynamical system has its advantages and disadvantages. Most continuous dynamical systems of practical interest are too complex to study analytically; and so, in the study of continuous dynamical systems, one quickly resorts to computer simulations. One replaces one's continuous system with a nicely space- and time- discrete dynamical system. But even where they are intractable, continuous models are still often useful as guides for the intuition. For instance, differential equations models are naturally appropriate to physics, because the basic conceptual models of physics involve continuously varying quantities like position, momentum and energy.
In areas like population biology and cognitive science, on the other hand, the basic conceptual models tend to involve discrete entities: organisms, species, thoughts, neural assemblies,.... So one is naturally working with discrete models, such as the genetic algorithm. The disadvantage of working with discrete dynamical systems, however, is that, if one does wish to do mathematical analysis, the tools of calculus are not directly available. Computation is simple and immediate, but the extra understanding that comes with mathematical theory is difficult to come by. There are three options: to give up on analytical understanding, to struggle with discrete mathematics despite its limitations, or to look for ways to approximate discrete models by continuous systems, which are more easily analyzable.
In this section I will take the third option. I will describe some recent research in which the genetic algorithm, a discrete dynamical system, is approximated by a continuous dynamical system called the IMGA, and studied in this way. First I will present a convergence theorem for the simple GA with very large population size. Then I will turn to the question of how fast the GA converges. I will describe mathematical results which suggest that the GA follows a learning curve similar to that demonstrated by humans and animals: a power law, with rapid improvement at first followed by slower learning afterwards. This is a suggestive (though far from conclusive) piece of evidence that the crossover operation has something to do with biological learning.
Convergence, Convergence Rate, and the IMGA
Davis and Principe (1993) have shown, using methods from the theory of Markov chains, that the simple GA cannot, for any finite population size, be guaranteed to converge to a globally optimal population. However, they also present numerical results suggesting that, as population size tends to infinity, convergence to a globally optimal population becomes more andmore likely. I have shown, in a recent paper, that this is in factthe case: that convergence to a globally optimal population holds in the limit as population size tends to infinity. This proof was obtained in the manner described above: by approximating the GA with a continuous iteration.
This continuous approximation to the GA is called the IMGA or "iterated mean genetic algorithm." It is a real nonlinear iteration which approximates the behavior of the GA arbitrarily closely as population size increases infinity. Because it is a differentiable function, one may treat it by the methods of calculus: one may calculate the Jacobian derivative of the IMGA, which turns out to be an exceedingly interesting quantity.
The first study of this determinant, given in an appendix to The Evolving Mind (Goertzel and Ananda, 1993) and in the Proceedings of the COMPLEX94 Australian Complex Systems Conference (Goertzel, Ananda and Ikle', 1995), was marred by some persistent calculational and typographical errors. A corrected and completed treatment was presented at the 1995 IEEE International Conference on Evolutionary Computing, and is given in detail in (Bachman, Goertzel and Ikle', 1996).
To study convergence, the eigenvalues of the Jacobian are evaluated for the case of a globally optimal population. In the case of no mutation and a fitness function with a unique global maximum, these eigenvalues are bounded by 1 in magnitude, thus showing that a globally optimal population is an attractor for the IMGA. Furthermore, the eigenvalue formulas yield a bound for the asymptotic rate of convergence of the GA for large population size.
To study rate of convergence, on the other hand, we will deal with the Jacobian determinant. By a remarkable combinatorial circumstance, it is possible to give a simple formula for this determinant that holds at all points on the trajectory of the IMGA, not just in the vicinity of the answer. This formula gives valuable hints as to the behavior of the GA. In words, what it tells us is that: The fitter the population gets, the slower the GA converges.
Mathematical Apparatus
Throughout this section, we will consider a simple genetic algorithm, with a population of size N, consisting of M-bit binary sequences, evolving with non-overlapping generations, and selection probability proportional to unscaled fitness. For the purpose of the convergence proof we will set the mutation rate equal to zero. Finally, where f is the nonnegative-valued fitness function, we will use the notation Ai = f(i).
To facilitate the expression of formulas involving crossover, we will represent a crossover operator as a collection ofprobabilities: Pkli will denote the probability of crossing bit string k with bit string l to obtain bit string i. If the equations
Pjij + Pijj <= 1
Pjji = 0 (1)
hold whenever i and j are unequal, then we will say that we aredealing with a true crossover operator. It is easy to see that the case of crossover at a single randomly-selected cut-point falls into this category.
Suppose that pi gives the proportion of bit strings of type i in generation t-1. Then it is not hard to see that the expected proportion of type i in generation t is given by
Phi = ( pkpl AkAl Pkli ) / ( prAr)2 (3)
Where p = p1,...,pK), where K = 2M. The iterated mean genetic algorithm (IMGA) is defined by the formula:
pt = Phi (pt-1) (3)
where the initial vector p0 represents the vector of proportions derived from the fixed initial population.
In order to show the validity of this approximation, one may represent the GA as a mapping from vectors of probabilities into probability measures on the space of probability vectors, sigma: [0,1]K --> P([0,1]K). This representation is exploited extensively by Davis and Principe in their Markov chain analysis of the simple GA. Where sigmar denotes the mapping of this sort resulting from r generations of the GA, the following lemma is easily obtained.
Lemma. Fix an initial vector of proportions p, a number of iterations t, an epsilon > 0 and a theta < 1. Then there exists a number N' so that, whenever the population size N exceeds N',
Pr{ |pt - sigmat(p)| < epsilon } > theta (4)
Though the function Phi is a perfectly accurate representation of the iterated mean path, it is sometimes useful to introduce a different representation, one which takes into account the implicit contraint that all the arguments of Phi must sum to one. By replacing Phi with the function Psi defined by
Psi(p1,...,pK) = Phi(1-p1- ... - pk-1,pk+1,...,pK) (5)
one obtains a function which acts on an entire hyperoctant of (2M-1)-dimensional space. (The zeroth coordinate has been singled out for definiteness; obviously any coordinate could have been chosen.)
The results to be described here regard the Jacobian derivative of the functions Phi and Psi, as given by the following formulas:
Phi'ij = Aj ( pkpl AkAl (Pkji + Pjki - Pkli) ) / ( prAr )3
Psi'ij = Phi'ij - Phi'i0
(6)
Convergence Result for the IMGA
To establish the convergence of the IMGA from a sufficiently good initial population, it suffices to compute the eigenvalues of the matrix Phi'[e_m], where em is apopulation vector corresponding to an optimal population. These eigenvalues are all less than one in magnitude, thus yielding the desired result.
Theorem. Assume the IMGA with zero mutation rate. Suppose that m is the unique global maximum of f over {0,...,N}. Let em denote the i'th unit vector of length N+1 (i.e., the vector indexed from 0 to N with a 1 in the i'th position and 0's in all other positions).
Then the eigenvalues of Phi'[em] are given by the formula
lambda0 = 0
lambdai = ( Aj/Am ) (Pmji + Pjmi), i=1,...,N (7)
As a consequence, the vector em is an attracting fixed point for the iteration Phi. The asymptotic convergence rate is bounded by the magnitude of the largest eigenvalue.
The magnitude of the largest eigenvalue gives a bound on asymptotic convergence rate; and the form of this bound has avery clear intuitive meaning. One has (Aj/Am )d(i,m) where d(i,m) = Pmii + Pimi is a measure of the proximity between i and m (d(i,m) is largest when i=m, smallest when i and m are Boolean complements, and grades fairly smoothly from the one extreme to the other). Thus, convergence is slowest when there is a genotype with fitness close to the fitness of m, which is also close to m. A fit genotype which is not so close to m matters less, as does a nonfit genotype which is close to m.
This result substantiates the observation of Davis and Principe that, as population sizes get larger and larger, the GA gets closer to demonstrating typical convergence behavior. More philosophically, it illustrates a fundamental difference between discrete and continuous dynamical systems. The discrete system here, the GA, illustrates a fundamental diversity that is absent from its continuous approximant, the IMGA. By averaging out the GA, the finite-size variations of particular populations are lost, and only the overall trend toward the optimum remains. This is good from an optimization perspective, but it may not be good from all perspectives. In real-world systems, there may be some value in the obstinate refusal to converge displayed by finite-size evolutionary dynamics. Convergence is not necessarily always a more important value than diversity.
On the Rate of Learning
At present no method for computing the eigenvalues of Phi' or Psi' away from the unit vectors ei has been discovered. However, if one assumes crossover at a single randomly chosen cut-point, then it is possible to compute the determinant of Psi' for the general case (the determinant of Phi' is of course zero). This determinant is given by
Det Psi' = cM (Ai) / ( pi Ai ) K (8)
where
cM = rV / (M+1)K-M-1 , V=(r-1)2M-r , M>1
c1 = 1
This is a remarkably elegant and intuitively meaningful formula; unfortunately, the only known proof is quite lengthy and complicated.
The assumption of zero mutation rate is convenient but not crucial to this formula. To incorporate a mutation rate it suffices to consider the GA with mutation as a two-stage process: one stage of true crossover and another stage of mutation. The mutation stage may be represented as a constant linear operator on
probability vectors, namely
M(p)j = P1jp1 + ... + Pnjpn (9)
where Pij denotes the probability of i mutating into j.
The dimension of this matrix may be reduced just as with Phi.
According to the chain rule, the Jacobian of the two-stage GA is then given by the product of the determinant of this mutation matrix with the determinant of the crossover Jacobian. Thus mutation merely serves to add an extra constant outside the formula of the theorem. For most mutation schemes, the mutation matrix is only singular for finitely many values of the mutation rate, so that the constant will rarely be zero (numerical calculations verify that this is true for independent point mutation with M<6).
As noted above, this determinant formula has interesting philosophical implications. The numerator simply scales the determinant by a power of the geometric mean of the fitness function. And the denominator decreases the determinant whenever the fitness increases; and vice versa. The quantity |Det Phi'| measures the proportion by which Psi stretches areas in probability vector space; thus it is a rough measure of the scope of the search carried out by the genetic algorithm. The theorem suggests that genetic algorithms accomplish their remarkably effective optimization by the simple strategy of adaptively narrowing the scope of their crossover-based statistical search, based mainly on the average fitness of the population, and not on the specific makeup of the population. This is a new and powerful idea.
The fact that crossover gives a fitness-dependent Jacobian rate, while mutation gives a constant Jacobian, is also highly significant. What it says is that mutation is an inherently inferior form of adaptation. Mutation does not close in closer and closer on the answer, it just keeps moving toward the answer at the same rate. This less sophisticated methodology is reflected in practice in the poorer performance of mutation-based GA systems. Crossover is needed to give the GA its initial burst of creativity, which brings it into the neighborhood of efficient problem solutions.
Discussion
The concept of an "infinite population size GA" may seem somewhat vexatious. After all, an infinite population is guaranteed to contain infinitely many copies of the optimal genotype! However, as the numerical results of Davis and Principe show, the behavior of the simple GAapproaches the infinite population size behavior rather quickly as population size increases. Thus, while the IMGA is indeed an infinite-population-size approximation to the GA, it may just as correctly be thought of as a deterministic nonlinear iteration which intuitively reflects the behavior of the GA.
Essentially, in biological terms, the IMGA is a "no genetic drift" approximation. The convergence result given here shows that, in the absence of mutation and random genetic drift, evolution would indeed converge. At least in the case of the simple GA, it is not only mutation but also genetic drift that plays a central role in preserving the diversity of evolving populations.
On the other hand, the determinant theorem shows how convergence proceeds. Fast at first, when fitness is low. Slow later on, when fitness is high. This is what the IMGA formulas say; it is what practical GA experience shows. And it is also, tantalizingly enough, what is seen in human and animal learning. The famous "power law" of learning shows that, when approached with a new situation, we learn very quickly at first, and our learning then slows down once we reach a certain level. This is exactly the pattern shown by the GA with crossover -- but not the GA with mutation. The GA with mutation shows a very un-biological linear learning curve. This observation does not prove anything conclusively, but it is extremely suggestive. It says that, if we believe learning is evolutionary, then one should believe learning involves crossover, rather than just mutation.
6.3 THE SIMPLE EVOLVING ECOLOGY (SEE) MODEL
The simple genetic algorithm is a beautiful and, as the name suggests, simple model. It captures the essence of crossover-based natural selection in a way that is highly amenable to computer simulation and is, to a lesser extent, amenable to mathematical analysis as well. However, as pointed out in the chapter introduction, it is an incomplete model in that it leaves out ecology. This omission will become glaringly apparent in later chapters as we attempt to use the genetic algorithm as a metaphor for more complex psychological processes. In a psychological context, a metaphor for evolution which lacks ecology will come up short far too often.
It is therefore interesting to seek a minimal formal model of evolution plus ecology -- a model which does for evolving ecology what the simple GA does for evolution. The goal of this section is to describe such a model, called the Simple Evolving Ecology model. Computer implementation of the model is currently underway, in collaboration with Dr. Matthew Ikle'. Here we willdescribe the basic structure of the model and mention the qualitative nature of some preliminary numerical results.
In essence, SEE is a genetic algorithm living on a graph. Each node of the graph contains a certain population, which evolves within the graph according to the GA. Unfit individuals at a node of the graph can move to neighboring nodes in search of more favorable conditions. The fitness of an individual is determined, not only by an "objective" fitness function native to each node, but also by how the individual "fits in" with its neighbors. This combination of spatial structure, GA evolution and ecology gives the model the possibility to give rise to complex emergent structures quite different in character from the emergent structures observable in a simple dynamical system like the GA.
I will argue that SEE, unlike the GA, has the potential to serve as a minimal computational model of mind. Evolution alone is not enough to give rise to the full array of emergent psychological structures. Evolution plus ecology is.
Specification of the Model
I will now give a formal specification of the SEE model.
Geometry . Consider a weighted graph G, called a "world-graph." As a first approximation G may be taken to be a two-dimensional lattice. Each node of the world-graph G is called a "cell." The set of all cells, the node set of G, is called W. Each cell C has a neighborhood N(C), which is a certain connected subgraph of W, containing C. It may be useful to distinguish different neighborhoods for different purposes, e.g. N1(C), N2(C)....
Each cell C contains a certain population P[C] of entities called "agents." These agents are drawn from some space S, e.g. the space {0,1}n of binary sequences of length n. One may also speak about the population of a collection of cells, e.g. the population P[N(C)] of a neighborhood of C. Where time-dependence needs to be emphasized, we may speak of P[C;t] to indicate the population of cell C at time t. Also, N = NC = NC;t will denote the number of elements in P[C;t]; i.e., the population size.
Fitness, Environmental and Ecological. This gives the basic geometry of the model. Next the evolutionary aspect of the model must be specified. For this purpose, each cell C is endowed with two real-valued functions: an environmental fitness function, and an ecological fitness function. These two fitness functions embody the opposing principles of evolution and ecology, or adaptation and autopoiesis.
The environmental fitness function fenv;C: SN -> R is defined separately for each cell, and represents the physical surroundings in that cell. If agent x lives in cell C, fC(x) is called its "environmental fitness."
The function fec;C: P[Nec(C)] x S -> R, called the "ecological fitness," is defined so that the quantity fec;C(x) determines the degree of "fit" between x and the other entities in Nec(C). The neighborhood Nec may be taken equal to C, or it may be larger.
item The "total fitness" of an agent x living at cell C is then the weighted sum of its environmental and ecological fitnesses,i.e. f(x) = c fenv;C(x) + (1-c)fec;C(x). The weight c is a global system parameter.
Interaction. The fitness functions specify how agents in the model are to be judged. It remains to specify how they are to interact. This is intentionally left quite general. An "action operator" is defined as a function r: Sm -> S for some integer m. The system as a whole is endowed with a collection of action operators, R = {ri,i=1,...,J}. The ri may be reproductive operators, as in mutation, with m=1, or crossover, with m=2. Or they may be operators which allow the agents to rewrite each other, as in classifier systems. Each operator has a certain real-number "importance" I[ri], which determines how often it will be utilized.
The dynamics within a single cell, at a given time, are as follows. First, a stage of selection according to fitness. Then, a stage of mutual interaction, leading to the creation of new agents within the cell.
In the selection stage, the population of the cell at time t, P[C;t], is replaced by an interim population Q consisting of elements of P[C;t] with different frequencies. item In Q, the frequency of an agent x is approximately proportional to its total fitness.
In the interaction stage, the elements of Q are chosen at random to interact with each other. The process is as follows. First an action operator is chosen. The probability of selecting operator ri is proportional to its importance I[ri]. Then, where m is the arity of ri, m agents are chosen at random from Q. The product of ri, applied to these m agents, is placed in the new population P[C;t+1]. This process is repeated N times, at which point the new population P[C;t+1] is full.
In the most general case, each edge (Ci,Cj) of the world-graph G may be understood to have a weight with two components, one component (wij) determining the base migration rate from Ci to Cj, the other (wji) determining the base migration rate from Cj to Ci. The sume of the wij, for fixed i, cannot exceed 1.
Migration. Finally, each agent x has a certain "propensity to migrate," mu(x), which is a monotone function of the rate of change of its fitness over the last g generations. If the total fitness of x has increased over this time period, then mu(x) is nonnegative. If the total fitness of x has decreased over this period, then mu(x) is nonpositive. E.g., in the simplest instance, one might set mu(x)=1 if the fitness of x has gone up since the last generation, mu(x)=-1 if the fitness of x has just gone down, and mu(x)=0 otherwise. Or, mu(x)=0 uniformly is also permissible. Newly created agents are assumed to have mu(x)=0.
The sum of mu(x) over all elements in a cell Ci is called Etai. The total migration rate from Ci to Cj is then given by Muij = wij + a Etai, where a is a constant. The constant a must be chosen so that the sum of Muij for fixed i does not exceed 1.
The actual migration from Ci to Cj is carried out as follows. An agent is chosen from P[Ci], where the probability of x being selected is increasing with respect to mu(x). This agent is then removed from P[Ci] and placed in P[Cj]. This process is repeated until N[Ci] Muij agents have been moved.
The weights wij may change over time. This represents a kind of "ecological learning" -- a selection of the optimalenvironment. For instance, one might decrease or increase wij based on whether the agents that pass through it tend to decrease or increase their fitness.
Initially, the weights may be set at random; or they may be set hierarchically, in such a way as to decompose G into regions within regions within regions. For instance, in a 32 x 32 lattice, one might have four 16 x 16 lattices separated by weights of .1, each consisting of four 8 x 8 lattices separated by weights of .2, each consisting of four 4 x 4 lattices separated by weights of .3.
Emergent Structures
At the time this book is being written, computational experimentation with SEE is just beginning. A major problem has been the design of an appropriate user interface for observing the dynamics of such a complex system. Long tables of numbers are not particularly intuitive. To study the dynamics of the genetic algorithm, one generally looks at episodic "dumps" of the whole population; but when there are a number of cells each containing their own population, these dumps become prohibitively complex. The Tcl/Tk scripting language has been used to build an interactive interface, in which the world-graph G is represented graphically on the screen, and the user can track one or more selected cells at a given time by clicking on their icons.
Up to this point, experimentation with SEE has taken place primarily with the ecological fitness function set to zero. Also, we have assumed agents modeled by bit strings, reproducing by one-cut-point crossover with mutation as in the simple GA. In this case, what one has is a spatially distributed GA, with the possibility for migration. In this circumstance, SEE leads to fixed point attractors, with the population distribution determined by the fitness functions at the different cells. In many instances one notes extreme sensitivity to initial conditions, whereby a very slight change in the initial population can cause much of the population to migrate into one cell rather than another. In general, there seem to be a variety of different network-wide attractors, spaced fairly close together in the state space.
In order to get more interesting dynamics out of SEE, one has to introduce ecological fitness. One can still get fixed-point dynamics here -- but one can also get total, unbridled chaos. As usual in such cases, the interesting behaviors are found by setting the parameters between those values that lead to boring behavior (fixed point) and those that lead to excessively chaotic behavior. The key goal, initially, is to get interesting large-scale emergent structures to arise in the network. Having gotten these structures to arise, one can then ask how often they will arise -- how large their basins are in the overall state space of the system.
In the preliminary experiments done to date, with the simple GA-like bit string model, the most striking result is the emergence of continuous maps across the network, whereby the agents in each cell tend to be similar to the agents in neighboring cells (in terms of the Hamming metric). The reasonfor the emergence of this associative structure is obvious. If two neighboring cells have very different populations, then whenever an individual migrates from one to the other, it will be at severe risk of dying. It will bring down the fitness of the population of the new cell. On the other hand, if two neighboring cells have similar populations, then migration will be more likely to yield individuals who are successful in their new homes. Co-evolution of groups of neighboring cells means that the fitness of both cells will be higher if the populations "cooperate" by breeding similar populations.
It was hoped to also find an hierarchical attractor structure, whereby distinctive, large-basined attractors emerge on an hierarchy of levels (e.g., in a lattice world-graph, 2 x 2 squares, 4 x 4 squares, etc.). This has not yet been seen to emerge on its own. If it is elicited by means of an hierarchical migration structure, however, such an attractor structure does emerge and will preserve itself. The regions delineated by the migration coefficients will become dynamically separate as well as geographically separate.
These results need to be validated by further experimentation. Different kinds of emergent structures will also doubtless be observed. Looking further toward the future, one can see a wide range of possibilities for the SEE model.
For example, it is quite possible to envision SEE as a learning system. Certain cells may be distinguished as "input" cells, others as "output" cells. The other cells are "processing" cells. In a machine learning context, the goal of the system is to map inputs into appropriate outputs. I.e., at time t the fitness of elements in the output cells are judged based on whether they are an appropriate response to the input cells at time t-s, for some fixed delay s. This may be viewed as a special choice of the function fenv;C for the output cells C. The effects of high or low fitness in the output cells will propagate through the system causing reorganization.
Finally, in an artificial life context, the whole graph G might be considered as the brain of a single organism. The behavior of the organism is then evaluated in terms of the dynamics on the meta-graph in which it lives. The meta-graph might also be a Simple Evolving Ecology as outlined here.
Doubtless the SEE model, like the simple GA, has its shortcomings and will need to be modified or replaced. In its integration of evolution and ecology, however, it represents a large step forward. The dynamics of very complex systems may be viewed a kind of harmonious struggle between forces of change and expansion and forces of preservation. SEE incorporates both of these "forces" in a simple and intuitive way.
The SEE Model as a Psynet Implementation
One of the main motivations for the development of the SEE model is the hope that SEE will turn out to be a good way of implementing the psynet model of mind.
From an AI perspective, the SEE model is vaguely similar to the Darwin Machine model discussed in Chapter One, the difference being that, instead of neural modules, one has evolvingpopulations of abstract entities. A single-cell evolving population in the SEE model may be taken as an analogue of a neuronal group. Instead of feeding charge to each other, the populations exchange individuals with each other.
Indeed, if one wishes to map the SEE model onto the brain, it is not difficult to do so. The genotypes in the populations at each cell of the SEE lattice world could be taken as small neural modules, or "first-order networks." A single-cell population then becomes a second-order network, defined as a competitive pool of first-order networks. The first-order networks are all competing, first to fit in well with one another, and second to solve some problem represented by the cell's objective function. The whole SEE network is then a network of competitive pools, interacting with each other.
In the end, this somewhat peculiar "SEE neural network architecture" is really no less realistic than feedforward networks, Hopfield networks, recurrent networks, or any of the other standard models. Even so, however, it seems clear that not much is gained by talking about the SEE model in the language of formal neurons. It is enough to observe that, like the standard formal neural network models, the SEE model is a complex cognitive system with intuitive similarities to the brain.
In preliminary experiments, primitive versions of the dual network have already been observed to come out of the SEE model. The associative memory network comes out of the above-noted propensity of neighboring cells to have similar populations. This makes the whole network a kind of "self-organizing feature map," similar in some ways to Kohonen-style neural networks.
Hierarchical structure is a little subtler. But it is very instructive to think about the ways in which a perceptual-motor hierarchy might be gotten to emerge from SEE. There are no explicit "governor processes," which will explicitly rule over regions of the SEE world. So if there is to be a hierarchy of first-level processes, second-order processes controlling first-order processes, third-order processes controlling second-order processes, and so on -- it will have to come out implicitly, in the dynamics of the world population. The higher order "controling processes" will in fact have to be attractors of regions of the network. For simplicity, consider the case of a lattice of side 2n. In this case, one might find the attractor of a 2x2 square to be the "controlling process" for that square -- not a particular process that lives anywhere, but rather an emergent process, part of the coe-evolution of the 4 individual cell population. Similarly, the attractor of a 4x4 square might be the "controlling process" for that square, regulating the 2x2 square controlling processes within it.
Experimentation with SEE has not proceeded to the point of sophisticated study of multilevel attractors. However, it has been observed that the most interesting dynamics occur when the migration rates are defined in an "hierarchical" way -- i.e., defined so that the network consists of clusters, within clusters, within clusters, etc. The precise nature of these "interesting" dynamics has not yet been studied -- but one suspects that it may consist precisely of hierarchical attractor structures, as suggested in the previous paragraph.
In sum, the SEE model presents a very concrete and workableframework for studying associative memory, hierarchical control, and the interactions between the two. It is a complex model, as compared to the simple GA or the toy iterations of dynamical systems theory. However, it is the minimal model that I have found which has any reasonable potential to demonstrate the main structures of mind (the CAM-Brain and Darwin Machine project being, in my view, a close second).
What is essential to both of these models is the combination of adaptive spatial structure with local, self-organizing learning. This is left out in current complex systems models, which have either localized learning (neural networks, GA's), or spatial structure regulating rigidly defined dynamics (CA's, lattice maps in dynamical systems). Mind requires intelligent adaptation in both space and time. Flexible complex systems models which provide this are, it seems, excellent candidates for intuitive, computational models of mind.
6.4 THE SEARCH FOR STRANGE ATTRACTORS
Now let us return to the ordinary genetic algorithm. In this section and those that follow, instead of generalizing the genetic algorithm into a more psychologically realistic model, we will look at the possibility of using the genetic algorithm in ways that are, very loosely speaking, more psychologically interesting. We will use the genetic algorithm, not as a tool for problem-solving, but as a method for generating a diverse population of interesting forms. First of all, we will look at the use of the GA for generating interesting populations of strange attractors.
In particular, the strange attractors in question are Julia sets of plane quadratic maps. A Julia set represents the region of parameter space for which a plane iteration converges to an attractor rather than diverging to infinity. In the original sense of the phrase, only analytic mappings can have Julia sets (Devaney, 1988), but in practice the concept may be applied to general vector-valued iterations. These "generalized Julia sets" need not have all the elegant mathematical properties of conventional Julia sets, but they exist nonetheless.
Here I will not be directly concerned with generating Julia sets, but rather with generating the attractors corresponding to the parameter vectors within the generalized Julia set of an iteration. While these attractors do not display the same fractal intricacy as Julia sets, they do present a striking variety of visual forms. Furthermore, they are fairly quick to compute -- unlike Julia sets, which are notorious for their gluttonous consumption of computer time.
In a recent article, Sprott (1993) has presented a method for the automatic generation of strange attractors. Essentially, his technique is a Monte Carlo search based on the Liapunov exponent as a practical criterion for chaos. But this technique, while novel, is computationally very crude. The question addressed here is the acceleration of Sprott's Monte Carlo technique: Is there any way to adaptively guide the search process, causing it to "zoom in" more rapidly on the chaotic domain?
My method for accelerating the attractor generation process is to replace Sprott's Monte Carlo method with a more sophisticated optimization scheme, the genetic algorithm (GA) (Goldberg, 1988). As it turns out, this exercise is not only successful with respect to its original goal -- depending on the specific implementation, the GA can locate strange attractors around 5 to 50 times more efficiently than the Monte Carlo method -- it is also instructive in regard to the dynamics of the evolutionary optimization. It gives rise to a new variation of the GA: the eugenic genetic algorithm, in which "unacceptable" elements are weeded out before they ever get a chance to enter the population. And it provides a simple, interesting illustration of two important evolutionary phenomena: 1) genetic drift (Kimura, 1984), and 2) second-order evolution, or the evolution of evolvability (as will be discussed in Chapter 6).
On some graphics systems, e.g. an X-Terminal, the time required for point plotting greatly exceeds the time required for search, so that little is to be gained from improving Sprott's search technique. But on other systems, e.g. on a slower PC clone, the search time dominates. This point is vividly illustrated by programming a screen saver which presents an endless succession of different strange attractors, not by summoning them from memory, but by conducting a continual search through the Julia set of an iteration. When one runs such a screen saver on a 25 Mhz 386-sx, the time required to locate a new attractor becomes quite apparent: one wishes there were some way to produce the new picture a little faster!
In fact, this trade-off between search time and plotting time is not only machine-dependent, but also equation-dependent. For the plane quadratic map considered here, plotting time dominates search time on an X-terminal, but for more complex iterations this would not necessarily be the case. In fact, it is not hard to see that, for every machine, there is some class of potentially chaotic iterations for which search time will greatly exceed plotting time.
Sprott's standard "test equation" is the plane quadratic, as described in Chapter 1. His Monte Carlo algorithm selects random values for the parameters ai and, for each parameter value selected, it computes a certain number of iterates, continually updating its estimate of the Liapunov exponent. Most parameter values lead to unstable orbits (in practice, |xn| + |yn| > 100000), often within the first few iterates, and these can be rapidly discarded. Others lead to a very small Liapunov exponent (in practice, <.005), which indicates that a fixed point or a limit cycle is being approached, rather than a strange attractor. Roughly speaking, around 85% of parameter values result in instability, while of the remainder all but around 1-2% lead to limit cycles. The chaotic domain, while clearly of positive measure, is relatively scanty.
There is no mathematical or artistic reason for choosing the quadratic instead of some other iteration; however, for sake of comparison with Sprott's results, I will use this same iteration as my "test equation" here. In particular, I will intentionally use the same parameter settings as Sprott does, for the various internal parameters of the generation process. This is important because a minor adjustment in these internal parameters cansometimes lead to a drastic change in the effectiveness of the search.
For instance, Sprott chooses his guesses for each ai from 25 values equally spaced over the interval (-1.2,1.2). If the interval (12,12) were used instead (which, admittedly, contradicts common sense), the advantage of the GA approach would be immensely increased, because the Monte Carlo method would lose 90% of its successes, whereas the GA would be hampered only initially, and would eventually hone on the correct region. Also, Sprott's estimation of the Liapunov exponent involves, at each step, doing one iteration from a point separated from (xn,yn) by 10-6, and then computing the difference between the result of this iteration and (xn+1,yn+1). In some cases the specific value (10-6) of this "standardized separation" makes a significant difference in the computed value of the Liapunov exponent; adjusting this value can therefore affect the proportion of attractors to be labeled "chaotic" rather than "limit cycle." Finally, Sprott uses x = y = .5 as an universal initial point; for obvious practical reasons he makes no attempt to distinguish a chaotic attractor whose basin of attraction is the whole plane from a similar chaotic pattern which is obtained only from certain initial values (in the worst case, perhaps only from x = y = .5).
The most natural way to evolve parameter vectors for quadratic iterations is to use the real vector GA . In the real vector representation which I will use here, crossover of two "parent" vectors v and w of length n is naturally interpreted to mean swapping of entries: one picks a random cut-point k between 1 and n-1 inclusive, and forms a "child" vector by prepending the first k entries of v to the last n-k entries of w. And mutation is naturally taken as the operation of replacing each entry vi of a vector with some value chosen from a normal distribution about vi (the variance of this distribution may be fixed or, most usefully, it may be taken to gradually decrease over the course of evolution).
Strange attractor generation is an atypical application of the genetic algorithm, because one is not seeking to find the single parameter vector that maximizes or minimizes a certain function, but rather to produce a large collection of parameter vectors that satisfy a certain broad criterion. In ordinary GA algorithm applications, one wants the population to converge to a certain "answer," or perhaps to a small collection of possible answers. Here, convergence to a single answer can be produced under certain conditions, but it is considered a failure. Thus, for instance, "fitness scaling," which is often used to force convergence to a single answer, would be counterproductive here. The objective is to produce a lot of different attractors, ranging widely across the chaotic domain, but without running down so many dead ends as the Monte Carlo method does.
Genetic Drift
How to use the GA for strange attractor generation? One approach would be to use the Liapunov exponent as a fitness function. Other numerical parameters, such as the correlationdimension, also suggest themselves. As one might expect, however, this approach is not effective; it leads to a population dominated by a single parameter vector with a slightly higher Liapunov exponent (correlation dimension, etc.) than the others.
A better approach is to define each twelve-dimensional parameter vector a = (a1,...,a12) as either acceptable or unacceptable, and contrive a fitness function as follows:
f(a) = 0, if a is unacceptable
(10)
f(a) = 1, if a is acceptable
Under this arrangement the algorithm has no reason to prefer one chaotic attractor to another; the population should, one would suspect, evolve to contain a fairly random assortment of acceptable parameter vectors.
Unfortunately, things are not quite so simple! The culprit is a peculiar twist on the biological phenomenon of genetic drift (Kimura, 1984). A mathematical understanding of genetic drift in GA's is difficult to come by, but on an intuitive level the phenomenon is not so difficult to appreciate. The crucial idea is that of a schema of vectors: a collection U of vectors with the property that, once a certain proportion of members of U have invaded the population, the continued presence of members of U in the population is, if not guaranteed, at least quite likely. In the bit string GA (Goldberg, 1988), schema take the form of hyperplanes; in the real vector GA with mutation there is no analogous elegant formulation, but the phenomena associated with schema in bit string GA's are still observable.
For, consider: if one begins with a random initial population, nearly the entire population will be unacceptable. If something acceptable emerges by chance, the odds are that it will form unacceptable children, and its lineage will die out quickly. But eventually at least one vector will begin to flourish, i.e., it and its minor variations will take over a significant percentage of the population. The source of "genetic drift" in attractor generation is (empirically speaking) the fact that, once a single vector begins to flourish, it becomes virtually unassailable.
Call this first successful vector Vector 1. Vector 1 and its mutants form a schema: when a vector crosses over with another very similar vector, the triangle inequality dictates that any "child" vector obtained will be very similar to its two parents. And in a population where almost everything else has fitness 0, any schema that arises will expand very fast: virtually ever pair selected for reproduction will be drawn from the system. Suppose another acceptable vector, Vector 2, appears; its fitness will be 1 just like the members of the Vector 1 system. But when Vector 2 is selected to cross over, its mate will most likely be a member of the Vector 1 system, so that its children are fairly likely to be unacceptable and at best its lineage will expand slowly.
To indicate the severity of this phenomenon, consider a few numerical results. In one run of tests, the population size was 2000 (fairly large), and the standard deviation of the mutation operator was .01. Over the course of 50 different experiments,it took an average of 320 generations for 90% of the population to converge to a single answer, within two decimal places of accuracy. Over all 50 experiments, this answer was never the same, making quite clear that the cause of the phenomenon lies in the general dynamics of the GA.
One way to get around this difficulty is to introduce speciation -- only let each vector mate with its neighbors. But empirically, it seems that this only results in genetic drift to several different vectors at once. The final population contains 2, or 4, or 10 different classes of approximately-equal vectors; there is no sustained generation of diversity. A more radical solution is required.
The Eugenic Genetic Algorithm
How can this pesky genetic drift be circumvented? The key, it turns out, is that one must never let the population get into a condition where genetic drift can dominate. So, first of all, instead of using a random initial population, it is far better to stock the initial population with acceptable vectors. And after this, whenever unacceptable vectors occur, one must throw them out immediately -- without even giving them a place in the new population. This latter step may seem like overkill, since an unacceptable vector would never be chosen to reproduce anyway. But in practice, if one permits unacceptable vectors to enter the population, they will soon come to dominate, and genetic drift will kick in.
These population-management policies are reminiscent of the draconic social policy called "eugenics," and therefore I have christened the GA thus modified the eugenic genetic algorithm. Although eugenics may be ethically unacceptable in a human context, in the context of strange attractor generation it would appear to be just the ticket! Having booted out unacceptables once and for all, one can cross and mutate to one's heart's content, and no vector will ever have the opportunity to dominate. Milder forms of genetic drift may still sometimes be observed: now and then there will arise a population consisting solely of, say, cigar-shaped attractors pointing in one direction or the other. But this is the exception, not the rule.
Practical experience with the ordinary GA teaches that mutation is of relatively little importance; and the same principle holds for the eugenic GA. Even without any mutation at all, the eugenic GA is a powerful tool for attractor generation. First one uses Sprott's Monte Carlo technique to generate, say, 10 - 50 strange attractors, saving the parameter vector underlying each one. Then one takes this collection of parameter vectors as an initial population, and repeatedly executes the following eugenic genetic algorithm:
1) select two different vectors at random (from a uniform distribution)
2) select a random cut-point (between 0 and 10, since the vectors have 12 entries), and produce a child vector
3) if the child vector is acceptable, choose a population
member at random and replace it with the child vector
The results are surprisingly good. For instance, over 10 different 300-generation experiments with population size 25, an average of about 30% acceptable children was obtained. 10 similar experiments with population size 10 yielded an average of 35%. The variances involved seem to be quite large; the highest proportion obtained in the course of these experiments was 60%, the lowest 11%. But these values are all much higher than the 1-2% obtained with the Monte Carlo method, so the intuitive moral would seem to be clear.
If one adds mutation, the success percentage unsurprisingly seems to go down somewhat, e.g. by 4-5% with a mutation standard deviation of .06. The advantage is that, in the limit anyway, one has the possibility to explore an infinite number of parameter vectors from a fixed initial population. But in practical terms, this advantage does not seem to amount to much.
These results do depend on the specific nature of the crossover operator. One might think to replace one-cut-point crossover with two-cut-point crossover, or uniform crossover in which each entry of the child vector is chosen at random from either of the two parent vectors. But this significantly degrades the results, in the latter case by approximately a factor of two. If one looks at the form of the quadratic iteration, however, this is not surprising; it probably stems from the fact that in some cases ai is multiplied with ai+1 or ai+2 before it produces an effect (xn,yn). This interdependence makes it a benefit for pairs of nearby coefficients to stick together through the process of crossover.
A more curious feature of the results of these experiments is that they depend fairly sensitively on Step 3 of the eugenicized GA. This is a highly nonobvious result: why should the child vectors produce any better children than the parents? Nonetheless, if Step 3 is removed, one obtains much worse figures -- for instance, an average of around 7% acceptable offspring for population size 25, down from 15-16%. This suggests that some form of second-order evolution (as defined in the previous chapter, in the context of the evolution game) is occuring -- an evolution in the direction of forms with greater "evolvability." For some reason offspring obtained by crossover are better parents than typical acceptable vectors. The GA is arriving at a region of population space which is an "adaptive attractor": it is adaptive in the sense of being generally fit, and it is an attractor because the GA happens upon it as if by accident, without explicity coaxing.
As with genetic drift, the most sensible intuitive explanation involves subsets U of the generalized Julia set that are schema under crossover, in the sense that if one takes two vectors a and b within U, there is an relatively high probability that the child of a and b will also be within U. Over the course of its evolution, the GA would seem to be "converging" to these schema, in the same manner that repeated iteration of a function from an arbitrary initial value can lead to convergence to a fixed point of that function.
In the more general, system-theoretic language that will be introduced later, these "schema" subsets U are autopoietic subsystem of the GA dynamical system. They are self-producing systems of vectors or pictures, which produce each other over andover again, and in this way protect themselves against outside interference. What is happening with the GA here is a very simple example of a process that is crucial in psychological systems -- a process that, for example, underlies the maintenance of human belief systems.
The Question of Repetitiveness
The eugenic GA partially stifles genetic drift. But does it avoid it entirely? In other words, are the attractors generated by the eugenic GA somehow more visually repetitive than the ones generated by the Monte Carlo method? By re-using old coefficients, instead of generating entirely new ones, are we sacrificing some kind of originality?
It is clear from specific cases that the offspring of two attractors can look very different from its parents; thus the mechanism for generating diversity is present in the crossover operator. An example is given in Figure 4. Here, the two parent parameter vectors (a1,...,a12) are as follows:
Parent 1:
-1.10350, -0.00570, 0.23650, -0.82920, 0.72670, 0.35840, -0.75380, -0.45080, -0.37600, 0.53850, -0.99390, -0.80260
Parent 2:
1.00650, -0.03940, -1.17760, -0.49160, 0.24740, 0.10900, 0.30930, 0.16200, 0.73160, -0.00960, -0.73040, -0.48770
Both of these parameter vectors, as shown in the Figure, lead to fairly uninteresting attractors. The child is obtained from taking the first four entries of Parent 1, and prepending them to the last 8 entries of Parent 2:
Child:
-1.10350, -0.00570, 0.23650, -0.82920, 0.24740, 0.10900, 0.30930, 0.16200, 0.73160, -0.00960, -0.73040, -0.48770
Despite its derivation from the parameter vectors of its parents, the shape of the child bears little relation to the shapes of its parents. And this example is not unusual in this respect. What the prevalence of cases like this means is that the crossover operator does a fairly broad search of the Julia set of the quadratic operation. The Julia set is shaped in such a way that, when two points within it are crossed over, the result is reasonably likely to lie within the Julia set as well. And the "evolution of evolvability" is simply the tendency of a population to drift into regions of the Julia set which have a greater than average tendency to cross into the Julia set.
But the ability of crossover to produce dramatic new forms does not disprove the possibility that these new forms will tend to be repetitive. In an attempt to approach the issue of repetitiveness quantitatively, I have kept records of the distribution of the positive Liapunov exponents generated by theeugenic GA, as opposed to Sprott's Monte Carlo method. The results are not surprising. Sprott's method leads to positive Liapunov exponents which are distributed on a slowly decaying "hump" shaped curve, centered around .2 or .3. The eugenic GA, as tested on a population size of 20, leads to the same kind of distribution at first, but then as evolution progresses, the distribution grows a sharper and sharper peak around some particular point, or sometimes two or three particular points. During the initial period of more slowly-decaying positive Liapunov exponent distributions, the GA still generates chaotic attractors 5-20 times more often than the Monte Carlo method; but as the curve becomes peaked, the success rate can become even higher. What this represents is the indefatigability of genetic drift. Even the eugenic GA does not escape it completely.
Given the pervasiveness of genetic drift, if one wishes to search for strange attractors, the best course seems to be to alternate Monte Carlo search with low-mutation-rate eugenic GA search: e.g. to use Monte Carlo search to generate an initial population of 25 acceptable vectors, then use an eugenic GA to breed perhaps 100 attractors from these, then revert to Monte Carlo search again after a certain period to obtain a new initial population, etc. In order to determine when to switch back to the Monte Carlo search, one can check the entropy of the distribution of the last, say, 50 positive Liapunov exponents generated. When this entropy becomes too low, one has reason to believe that genetic drift is predominating.
This algorithm is only a beginning; there is clearly much more "engineering" to be done. But even in its present form, the method provides a very rapid search of the interior of the Julia set. And, when applied to iterations less well-studied than the plane quadratic, it may provide a highly effective method of discovering strange attractors, where their existence is not previously known.
6.5 THE FRACTAL INVERSE PROBLEM
In this section, we will explore the possibility of evolving specified target fractals using the iterated function system algorithm for fractal generation. This exploration is similar in spirit to that of the previous setion, but quite different in detail. It provides further confirmation of the remarkable ability of the GA to generate diverse, intricate structures.
The theory of iterated function systems is a method for getting fractal sets from dynamical equations. It is both simple and elegant. Most of it is implicit in the half-century-old work of mathematicians like Hausdorff and Banach. Much less simple and much less classical, however, is the IFS inverse problem. Given, say, a picture or musical composition, which is in principle capable of being effectively produced by the IFS method, it is no easy task to actually determine the coefficients of the iterated function system which generates the picture or composition.
In this section I will describe some partially successful efforts to solve this difficult problem, using the genetic algorithm. As it turns out, a straightforward GA approach isworkable but inefficient. It is reasonably useful for simple one dimensional examples, but runs into difficulty with even the easiest two dimensional problems. The same computational obstacle was located by Mantica and Sloan (1988), using a completely different optimization method, so-called "chaotic optimization," leading one to the conclusion that the difficulty may lie not in the incapacity of the optimization methods but in the inherent complexity of the fractal inverse problem.
Barnsley's fractal image compression method involves the use of local IFS's instead of genuine iterated function systems. Local IFS's are mathematically less straightforward than IFS's; they are not contraction maps, so that convergence from an arbitrary starting point is not guaranteed. Also, unlike IFS's, they do not capture an image's global structure, but only the local regularities in its structure. Computational practicality is purchased, in this case, at the cost of philosophical and scientific interest.
In the following section, I will turn to a related problem which poses a different sort of difficulty. As in the application of the GA to strange attractor generation, instead of using a GA to solve an optimization problem, we will attempt to use a GA to evolve a population of entities satisfying a certain criterion. But this time, instead of evolving strange attractors, we will be trying to evolve one-dimensional IFS's whose attractors form aesthetically satisfying melodies. The eugenic genetic algorithm fits the bill nicely here; the trouble is in defining a fitness function which quantifies the notion of "aesthetically satisfying." Drawing on some ideas from musical psychology, we will outline one way of approaching this difficult problem, and present some musical results.
Iterated Function Systems
An iterated function system, or IFS ,consists of N affine transformations wi, each one associated with a certain probability pi. In the one-dimensional case, the wi take the particularly simple form
wi(x) = ai x + bi (11)
To specify a 1-D IFS with N transformations, then, requires 3N - 1 floating-point numbers (it is 3N-1 and not 3N because the N probabilities pi are known to sum to one, which reduces by one the number of degrees of freedom).
In the two-dimensional case, the wi take the form:
(12)
One may show that any such transformation can be expressed as the composition of one translation, three scalings, and three rotations. Two "classic" examples are given in Table 1.
The deterministic IFS algorithm consists of the following iteration:
An+1 = w1(An) union ... union wN(An) (13)
The "initial set" A0 must be given. This is an iteration on set space, meaning that it takes sets into sets.
If one restricts attention to compact, nonempty subsets of some metric space (e.g. the line or the plane), then one can show that this algorithm necessarily converges, given only the assumption that the determinants of the wi are all less than one. And even this can be strengthened; it suffices that the geometric mean of the determinants not exceed one. The limit set, the "fixed point attractor" of the iteration, is of course given by the equation
A = w1(A) union ... union wN(A) (14)
At first this seems a remarkable result but it turns out to be quite "shallow," in the mathematical sense; it follows almost immediately from Banach's contraction mapping principle. The trickiest part, in fact, is the definition of an appropriate metric on set space (without some way of defining the distance between two sets, one cannot prove that the iterates converge to a limit). But this problem was solved half a century ago, by none other than Felix Hausdorff, of Hausdorff dimension fame (and also known, in point set topology, for his introduction of the "Hausdorff space"). The distance between a set A and a point y is given by d(A,y), the minimum over all points x in A of the pointwise distance d(x,y). The asymmetric distance between two sets A and B, d(A,B), is given by the maximum over all points y in B of d(A,y). And, finally, the Hausdorff distance from A to B, h(A,B), is the maximum of the two asymmetric distances d(A,B) and d(B,A). It is in this sense that the deterministic IFS algorithm converges to its attractor: h(An,A) tends to zero as n tends to infinity.
But the deterministic IFS algorithm is useful primarily for theoretical purposes. For practical computation one uses the random iteration algorithm. Given an IFS, one obtains an "attractor" by executing the following decoding algorithm:
1) Select an initial point x
2) Choose a transformation wk (where the chance of choosing
wi is proportional to pi)
3) Replace x with wk(x), and plot the new x
4) Return to step 2, unless sufficiently many iterations
have been done already
In principle one should do infinitely many iterations; in practice one does perhaps 1000 - 50,000 iterations, and discards as "transients" the initial 100 - 2000 points.
The result of this 2-D algorithm is a density or measure m on a certain region A of the plane. Similarly, the 1-D algorithm gives rise to a measure on a certain subset of the line. The necessity to deal with measures means that, although the random iteration algorithm is very simple from the perspective of implementation, it cannot be fully understood without a formidable mathematical apparatus. The Hausdorff metric on a space of sets must be replaced by the Hutchinson metric on aspace of probability measures. However, the upshot is the same: Elton's "IFS ergodic theorem" says that the random iteration algorithm will eventually, with probability one, converge to a certain attractor or "invariant measure" determined by the wi and the pi.
The "support" of this invariant measure m is the same set A given in Equation (14) above. By construction, the measure assigns A an area of 1; m(A) = 1. The value of m(B), for a given subset B in A, is the "percentage" of points in A which are also in B, on a typical run of the decoding algorithm from an initial point within A (or, more precisely, it is the limit of this percentage as the number of iterations tends to infinity). So the wi determine the region of support of the measure m, and the probabilities pi determine the precise contour of the measure.
In graphics terms, this means that the pi serve only to determine the "grey scale values"; whereas the wi determine which points get some shade of grey (those points inside A), and which points get pure white (those points outside of A). In practice, however, if some of the pi are set sufficiently low, then certain regions of A may not be likely to show up on the computer screen within a reasonable number of iterations; a fact which cannot be neglected when one is evolving IFS's by the genetic algorithm, as will be done in Chapter Five.
It is noteworthy that the IFS method deals only with fixed point attractors. One can define chaotic dynamics on the attractor set A -- by defining a map on A by s(x) = wi(x)-1(x), where i(x) is one of the integers j for which wj-1(x) lies in A -- but this is a different matter. There is no good reason why algorithms that demonstrate periodic and chaotic dynamics on set space and measure space should not be equally useful, or even more useful. But, for the present, such algorithms exist only in the realm of speculation; we are left with fixed points only.
Some Simple Experiments
Can one use the genetic algorithm to go from the fractal to the IFS that generates it? This section describes experiments aimed at answering this question, conducted on a 486 PC by Hiroo Miyamoto and Yoshimasa Awata at the University of Nevada, Las Vegas, under the direction of the author.
As a simple one-dimensional test case, we used a nonuniformly shaded Cantor set in [0,1] as a target picture. The IFS generating this target is given by the two affine transformations
w1(x) = a1x + b1 = .333x
w2(x) = a2x + b2 = .333x + .666 (15)
with probabilities
p1 = 0.3
p2 = 0.7
To represent this as a parameter vector, we used the standardordering (a1,b1,p1,a2,b2), where p2 is omitted due to the relation p2 = 1 - p1, obtaining the vector (.333,0,0.3,.333,.666).
Note that this strategy for encoding probabilities only works for the case N = 2. In the more general case where there are N probabilities, one must encode at least N-1 numbers representing probabilities, but these numbers will not in general sum to one, and they must be normalized to sum to one before the vector is used to generate an image.
In our Cantor set example, preserving three digits of accuracy yields an 80-bit binary string. The task given to the genetic algorithm, then, is to search the space of 80-bit binary strings for the one which generates the shaded Cantor set.
The Fitness Function
Ideally, one would like to use a fitness function involving the Hutchinson metric on measure space (Barnsley, 1988). However, each evaluation of this metric involves a minimization on function space, and so this is not an effective computational strategy. The longer each function evaluation takes, the longer the algorithm will take to run. In practical applications, one is not directly concerned with measure-theoretic convergence, but rather with convergence to the degree of approximation represented by a computer screen. Therefore, it seems sensible to use a lattice-based "distance measure," in which the distance between two finite sets A and B is determined by dividing the interval or square involved into n rectangular subdivisions. For each subdivision, one measures the magnitude of the difference between the number of points of A which lie in the subdivision, and the number of points of B which lie in the subdivision. Then one sums these values, to obtain the distance between A and B.
In the Cantor set example, the n subdivisions become n subintervals of [0,1], and one obtains a fitness function f defined so that the fitness of the i'th population element is given by
f(i) = 1 - (ob + dp) / (2 - T) (16)
where:
T is the total number of iteration points for each IFS, which is taken equal to the number of pixels in the target image;
ob is the number of points generated by the i'th IFS which lie outside the interval [0,1];
dp = dp(1) + ... dp(n),
dp(i) = |ap(j) - tp(i,j)|,
ap(j) is the number of points in the target image which lie in the j'th subinterval
tp(i,j) is the number of points in the attractor of the i'th IFS which lie in the j'th subinterval.
According to this scheme, the maximum fitness is 1, and the minimum fitness is 0. The probability of a given IFS in the population being selected is then given by the standard formula f(i)/[f(1) + ... + f(n)].
Experiments
For our first run of experiments, we set the population size at 300, and permitted a maximum of 1500 generations. In the first run, the maximum fitness value of 0.802 was reached in generation 926. In the second run, 0.766 was reached after 850 generations, and there was no improvement after that. In the third run, a near perfect fitness of .972 was achieved after only 400 generations.
The successes were due, not to particularly fit initial populations, but solely to crossover effects. Mutation rate was kept very low (0.00015 per bit), so it is unlikely that mutations played a major role in the convergence. Over twenty runs, the maximum fitness achieved after 1500 generations averaged 0.858; and in all but three cases, this value was achieved before 1000 generations.
One might wonder what would happen if the number of transformations required to generate a given picture were overestimated. As an initial approach to this question, we set the genetic algorithm the problem of finding three affine transformations to generatethe Cantor set. The results were quite encouraging: over twenty runs, the maximum fitness achieved after 1500 generations averaged.721. In all but two cases, one of the three transformations ended up with coefficients very close to zero.
The behavior observed in these experiments is typical of the genetic algorithm: the desired answer is approached fairly quickly, and then never precisely obtained. However, the attractors with fitness over 0.85 are visually almost identical to the Cantor set, suggesting that the genetic algorithm may in some cases be able to provide IFS coefficients that are adequate for "lossy" solutions of the fractal inverse problem, i.e. for applications in which a decent approximation of the target image is enough.
The number of generations seemed to matter less than the populaton size. Continuing with the Cantor set example, for sake of variety we shrunk the population size to 50, and expanded the number of generations to 4000. In a typical run, after generation 84, the fitness value became 0.406 and the optimal parameter vector was (0.599, 0.169, 0.598, 0.377, 0.001). After this stage, the fitness value simply remained in the range between 0.335 and 0.415. The smaller population size kept the algorithm stuck in a far-from-optimal region of IFS space.
Though poor for the Cantor set, this result was typical of our experience with two-dimensional target pictures. For instance, a very simple two-dimensional fractal, the Sierpinski triangle, proved completely unapproximable. Only three affine transformations are needed, for a total of twenty real variables. But time after time, a population of 500 or 1000 ran for several thousand generations (96 hours of CPU time on our 486) without achieving fitness above 0.4. Clearly the longer bit strings required a much larger population size; but populations greater than 1000 are not feasible for PC implementation due to the extremely long running times involved.
Frustrated with these results, I decided to port the algorithm from the PC to a more powerful Unix system. Theresults, however, were unimpressive. A population of 10000 vectors, running for one hundred trials of a thousand iterations each, managed to break the .5 fitness barrier for the Sierpinski triangle only eight times. Only twice did the fitness exceed .8, producing a reasonable Sierpinski triangle. Further experiments were performed on fractals produced by 4 or 5 affine transformations, on random images, and very simple picture files not derived from IFS transformations; but these proved entirely unsuccessful.
Conclusion
Our one-dimensional experiments demonstrate the potential effectiveness of genetic algorithms for solving the IFS inverse problem. But our results for the two-dimensional case indicate that genetic optimization may not be an adequate tool. Further experimentation is required in order to determine the rapidity with which the population size must increase, in order to accomodate the longer bit strings required by higher dimensions and more transformations. However, it seems most likely that, in order to be useful for the fractal inverse problem, the genetic algorithm must be hybridized with some other optimization algorithm. This is a natural avenue for future research.
Now let us turn from pictures to music. In order to use the GA to generate fractal music, we will stick with IFS's, but will return to the "target-less" form generation of Section 6.4. However, instead of merely trying to generate chaotic melodies, we will describe a programme aimed at the evolution of psychologically and aesthetically interesting melodies.
The key problem of algorithmic music composition is structure. Human-composed music combines mathematical and emotional structure in a complex way. But a computer composition algorithm, lacking access to the human unconscious, must draw structure from elsewhere. IFS music seeks to solve this problem by deriving musical structure from geometrical structure; more specifically, from the structure of the geometrical entities called measures. Goguen (1990) has used two-dimensional IFS's to generate fractal music (Goguen, 1990). However, it seems more natural to use one dimensional IFS's to represent melodies; so this is the approach I have taken here. Music is indeed multi-dimensional, but its core organization is one-dimensional, defined by the axis of time.
The fractal melodies generated by 1-D IFS's are not in any sense random; they possess structure on all different levels, from the most local to the most global. The worst of these fractal melodies are completely chaotic or excessively repetitive. The best, however, are at least moderately "catchy." Because they are generated without attention to the rules of Western tonal music, they tend to contain strange delays in timing and occasional "odd notes." Most of these aberrations could be filtered out by a "melody postprocessor" programmed with the elements of tonal music theory. Here, however, I have takenan alternate approach: I have chosen to play the melodies in a genre of Western music which is relatively insensitive to the niceties of music theory -- what is known as "industrial music". As it turns out, fractal melodies sound quite natural when played as feedback-infused guitar samples against the background of a loud drum machine. This is not because industrial music is indifferent to melodic quality, but simply because industrial music is forgiving of strange delays in timing and occasional odd notes.
Fractal melodies are structured on many different scales. Admittedly, however, there are differences between this fractal structure and what we ordinarily think of as musical structure. The most important difference is the sense of progressive development over time -- most human music has this, but very little fractal music does. The most interesting fractal melodies are those which, coincidentally, do happen to display progressive development. In order to isolate these particular fractal melodies from the larger mass of uninteresting ones, I have implemented a genetic algorithm (GA), acting on the space of "code strings" for 1-D IFS's.
The genetic algorithm requires a "fitness function" to tell it which qualities make one melody superior to another. To supply such a fitness function is a very difficult task -- obviously, no one knows exactly what it is that makes a melody sound good. However, if one's goal is merely to separate the passable from the terrible, and to do so statistically rather than infallibly, then some headway can be made. I have implemented a fitness function based on Meyer's (1956) theory of musical semantics and the mathematical psychology of The Structure of Intelligence, the basic idea of which is that good melodies demonstrate the surprising fulfillment of expectations (SFE). The SFE-based fitness function is certainly not a litmus test for melody quality, but it does serve as a rough "filter." In practical terms, it decreases the number of poor fractal melodies one must listen to in order to find one good one.
At the present stage of development, 1-D IFS's are primarily useful as a compositional tool rather than as a stand-alone composition method. Particularly when augmented by the GA, they are a reliable source of novel melodic ideas. It seems possible, however, that in the future the role for IFS's in music composition could become even greater. Combined with a tonality-based postprocessor and an improved SFE-based fitness function, IFS's might well be able to produce "finished compositions." The initial experiments described here provide strong encouragement for research in this direction.
From Mathematics to Melodies
How does one translate measures into melodies? The simplest strategy is to begin with a fixed interval I containing all or most of the attractor region A. One then divides the interval into M equal subintervals Ir, where M is the number of notes in the melody being constructed, and runs the IFS decoding algorithm keeping track of the number Nr of points which fall into each interval Ir. If Ir does not intersect the attractor A, then after perhaps a few stray transients, no points should fall inIr at all; Nr should be zero or very close to zero. On the other hand, in theory Nr may become very high for certain intervals; it is even possible that every single point plotted will occur within the same interval. What is needed to produce music is a formula for converting each number Nr into a note or rest.
There are many ways to carry out this conversion. For an initial experiment, I have used the following scheme:
1) First, compute the minimum and maximum Nr over all the
subintervals Ir; call these valus N(m) and N(M).
2) Divide the interval (N(m),N(M)) into 12 equal-sized subintervals Rl, l = 1,...,12.
3) Assign the interval Ir the l'th note in the octave if Nr
lies inside Rl.
This is very simplistic and may be varied in several obvious ways. However, although variation of the conversion scheme does affect the character of the melodies obtained, it does not sensitively affect the presence of complex multilevel structure in these melodies. Under any reasonable conversion scheme, one finds melodies with repetitions, repetitions within repetitions, and so forth, and continual minor variations on these repetitions. The complexity of the melody increases as one increases the length of the "code vector" which gives the coefficients of the IFS. Too few transformations, and generally repetitive melodies result. As a rule of thumb I have chosen the number of transformations to be half the number of intervals (notes); but this is largely an arbitrary decision.
1-D IFS's are a surprisingly interesting compositional tool. Fractal melodies obtained from 1-D IFS's in the manner described above sound far different from, and better than, random melodies. Some are overly repetitive, but many are complexly structured on multiple scales. Not all of this structure is humanly appealing -- much IFS music sounds markedly alien, neither unpleasant nore particularly "listenable." Occasionally, however, one happens upon a fractal melody which displays the sense of patterns unfolding over time that we expect from human-composed music.
Surprising Fulfillment of Expectations
My approach to assessing the fitness of a melody involves Meyer's (1956) theory of musical emotion, which states that the key to melodic structure is the artful combination of predictability and surprise. A successful melody, Meyer proposes, works by surprising fulfillment of expectation (hereafter abbreviated SFE). SFE means, quite simply, first arousing an expectation, then fulfilling the expectation in a surprising way. This approach to musical aesthetics has the merit of being easily programmable: one may recognize patterns in a melody, and compute the degree to which their fulfillment is "surprising" in terms of the other expectations implicit in the melody.
I will now describe in detail how I have implemented SFE to evaluate the "quality" of a melody. The basic method here isvery reminiscent of the Chaos Language Algorithm described above, although the details of the implementation are somewhat different. The idea, here as in the CLA, is to recognize repeated subsequences in a time series, and use these repeated subsequences to get at the structure of the series. One is getting at patterns in dynamics, and, now, using the genetic algorithm to evolve dynamics manifesting appropriate patterns.
Following the approach taken in The Structure of Intelligence, the intensity of a pattern P is defined as one minus the compression ratio which the recognition of P provides. So, suppose one is dealing with a pattern of the form "This note sequence of length k is repeated r times in a melody of length t." If one substitutes a single "marker" symbol for each occurrence of the sequence, then one has reduced the string of t notes to a string of t - r(k-1) symbols. On the other hand, it takes k symbols to store the string itself, thus giving a total of t - r(k-1) + k symbols for the compressed melody, as opposed to t symbols for the uncompressed melody. The intensity of the pattern is then given by
1 - [t - r(k-1) + k]/t = [r(k-1) - k]/t (17)
For very long melodies this formula is not correct, since one will eventually run out of symbols (assuming as usual a finite alphabet), and will have to start inserting two or three symbols to denote a single repeated string. One must then become involved with the tedious details of Huffman coding or arithmetic coding (Bell, Cleary and Witten, 1990). For the short melodies involved in the current experiment, however, equation (3) is a perfectly adequate approximation. In fact, as a practical matter, for short melodies it seems to work better to replace the "-k" term with a "-1". This means that once a pair of notes occurs once it is a pattern; it is a falsehood in terms of information theory but may possibly be more psychologically accurate, and is in any event useful for the evolution of short melodies. Thus one obtains the simpler formula
[r(k-1) - 1]/t (17')
If one is dealing with a contour pattern or an interval pattern instead of a note pattern, formula (8) must be modified in a somewhat ad hoc fashion. For clearly, a contour sequence of length k, occuring r times, should be counted as less intense than an interval sequence of the same length which has occurred the same number of times; and the same should be true of interval sequences compared with note sequences. The question is really one of musical psychology: from the point of view of a human listener, how prominent is a repeated contour sequence as opposed to a repeated interval sequence or note sequence? Studies show that contour patterns are very important to human melody identification, but that interval patterns also play a significant role. The easiest way to model this balance is to introduce weights 0 < wC < wI < 1. The intensity of repeated contour and interval sequences are then defined by the formulas
wC [r(k-1) - k]/t (18)
wI [r(k-1) - k]/t (19)
respectively, where for short melodies one may in practice wish to replace "-k" with "-1" as in (8).
The values of the weights are actually quite important. In most fractal melodies, contour patterns are the most common patterns, so if wC is set below .5 most fractal melodies will have very few significant patterns. As a rule I have set wI = .8 and wC = .6. In fractal melodies there are very few interval patterns which are not also pitch patterns; this is clearly one significant difference between fractal music and Western tonal music. In this respect fractal music is more similar to, e.g., Balinese gamelan music, which is almost entirely based on contour with no concept of absolute pitch at all.
This way of measuring intensity is time-independent, in the sense that it counts patterns from the distant past (the beginning of the melody) just as much as more recent ones. Even for very short melodies, this is not psychologically realistic; there is always some degree of damping. This can be modeled by
computing the intensity over some interval shorter than the one from the beginning of the melody (time 0) to the current point in the melody (time t). Having measured the intensity of a pattern P over several intervals [t-si,t], one then averages the intensities obtained in this way to obtain the total intensity of P.
Ideally one would wish to let si run from 1 through t. Writing the intensity of pattern P over [t-si,t] as IN[P;si], one could then define the damped intensity as the sum
v(1)IN[P;1] + ... + v(t)IN[P;t] (20)
where v(x) is a "decay function" monotone decreasing in x, satisfying v(1) + ... + v(t) = 1/t. In the current experiment I have implemented a simplified version of this scheme in which only two values of si are used, s1 = t giving the whole melody and s2 = 7 +/- 2, giving only a brief window of past history. Specifically, I have chosen s2 = 5 note patterns, s2 = 6 for interval patterns, and s2 = 8 for contour patterns.
This idea of a "short scale" versus a "long scale" is obviously a simplification, motivated primarily by practical rather than conceptual considerations. But it does have some psychological relevance: recall the "magic number 7 +/- 2" of short term memory capacity. The short scale intensity of a pattern may be reasonably interpreted as the intensity of the pattern in the part of the melody "currently psychologically present," while the long scale intensity is the intensity of the pattern in the melody as a whole, as stored in long-term memory. Perhaps not coincidentally, 7 is also an approximate cut-off for the size of frequently observed repeated patterns in melodies. For note or interval patterns k>5 occurs only rarely; and for contour patterns k>7 is infrequent (it is well nigh impossible to find two human-composed melodies with the same contour in the first 15 notes).
So, given these two estimates of intensity -- short scale and long scale -- how does one estimate the degree of SFEdelivered by a given note in a melody? The key, I suggest, is to compare the amount of short scale intensity delivered to the amount of long scale intensity delivered. If patterns with long scale intensity are selected at the expense of patterns with short scale intensity, then this means that immediate goals are being frustrated in the service of long-term satisfaction. This is certainly not the only kind of SFE, but it is certainly one important species of SFE, which has the advantage of being easily quantified.
At each note t, each pattern is assigned a long scale percentage, obtained by dividing its intensity over [0,t] by the intensities of all other patterns over [0,t]; and a short scale percentage, obtained similarly from intensities computed over the interval [t-7,t]. The SFE value of the note is defined as the positive part of bA-B, where
A = the sum of the long scale percentages of all patterns
completing themselves at note t
B = the sum of the short scale percentages of all patterns
completing themselves at note t
and b>1 is a balance factor designed to compensate for the natural tendency of B to exceed A in all music. In the experiments I have done, a balance factor 1.5 < b < 2.5 has yielded the most sensible results, but this may be different for non-fractal melodies.
The basic idea is that, if B exceeds A by a sufficient ratio, the SFE value is zero; if A exceeds B the SFE value is the amount of the excess. Thus the SFE value of a note is the extent to which the note affirms long term structure at the expense of short term structure.
Now, a good melody need not -- and in general will not -- have universally high SFE values. Sometimes short scale patterns need to be fulfilled; this the best way to build up the long scale intensity that gives future notes high SFE values. Therefore, in this context, the quality of a melody is most sensibly estimated as the average of the SFE values of its notes. This is a long way from being a "litmus test" for distinguishing good from bad melodies -- aside from the question of the general validity of the SFE theory, many serious simplifications have been made along the way. But it is a start.
Fractal Industrial Music
IFS's, as used here, generate only note strings. To get a real melody from a note string, one must deal with other features such as duration, velocity and volume. I have found that, while
velocity and volume can be set constant without great loss, the assumption of a constant duration results in unnecessarily "wooden-sounding" melodies. Therefore, in order to produce "listenable" melodies, I have introduced an ad hoc rule for producing durations. Namely: durations are chosen at random from the set {WHOLE, HALF, QUARTER, EIGHTH, SIXTEENTH}, but if a string of notes is repeated, they must all get the same duration. This rule ignores the manifold difficulties of identifying phrasestructure, and is obviously not intended a general solution to the problem of matching note strings with durations. However, the rule generates much better-sounding melodies than a constant- durations, IFS-generated-durations, or random-durations approach.
In addition to duration, timbre cannot reasonably be ignored. Melodies can sound quite different when played on different instruments -- for instance, few melodies sound their best played on the "beep" of an IBM PC speaker. In order to play the fractal melodies I have chosen sounds from the genre of industrial music. Generation of industrial music is a particularly appropriate application of fractal music because industrial music tends to be forgiving of strange pauses or "off" notes.
The genetic algorithm is of limited but definite use in the generation of industrial music with IFS's. If one runs a GA using the SFE-based fitness function, and plays each melody which the GA evolves, one will hear a better sequence of melodies than one would hear by trying IFS's at random. The improvement is not so dramatic as had been hoped, but is significant nonetheless. A substantial part of this improvement is due to the elimination of extremely repetitive melodies, which consist primarily of long strings of one note repeated over and over. This might seem to be a trivial kind of improvement, but it is not, because there is no obvious way to predict which IFS "code sequences" will give rise to repetitive attractors. The GA shifts the melody population until it resides primarily in that part of code sequence space containing few repetitive attractors. It tends not to converge quickly to a single melody, which is just as well, because the point is to generate a diversity of melodies. Eventually, however, it does converge.
One way to avoid this outcome is to introduce the Eugenic Genetic Algorithms developed above -- to replace the SFE fitness function f(x) with a fitness function g(x) defined by
g(x) = 0, f(x) < c
1, otherwise (21)
Implementing this modified fitness function leads to an endless stream of novel, more-interesting-than-average melodies. Numerically, for long melodies (100-1000 notes), with b = 1.5 and c = .3, one obtains melodies with SFE value averaging around .35 or .4 (out of a possible 1.5); whereas the average melody has SFE value in the range .26 - .27. For short melodies the results are erratic and the variances are so high that I was not able to obtain meaningful averages.
Even without the GA, however, interesting-sounding fractal melodies are not that hard to come by. Long fractal melodies, the ones which the GA does the most consistent job of locating, tend to grow dull. IFS's seem to be best at generating short "riffs" or melodic ideas. My personal favorite is:
Example 1:
A WHOLE A# QUARTER A WHOLE B SIXTEENTH A# WHOLE
G# QUARTER F QUARTER E WHOLE C# EIGHTH D# EIGHTH
C# EIGHTH F QUARTER C# WHOLE F QUARTER C# EIGHTH
C EIGHTH B HALF B# WHOLE C SIXTEENTH A# EIGHTH
G QUARTER A EIGHTH A EIGHTH A EIGHTH G HALF
G HALF G HALF G HALF G# EIGHTH F # WHOLE
Played on a sampling keyboard through a feedback-infused guitar sample, against a background of crashing drums and cymbals, this is quite a natural-sounding riff.
Example 1 was obtained without the GA, by random search. The following melodies of length 30 were found by the GA, and are fairly typical of those produced by the genetic algorithm with a population size of twenty, after a dozen generations of search:
Example 2:
C# EIGHTH C# EIGHTH C# EIGHTH B EIGHTH G# HALF
D QUARTER E EIGHTH D SIXTEENTH A HALF E QUARTER
E QUARTER E QUARTER E QUARTER E QUARTER G HALF
E SIXTEENTH E SIXTEENTH F WHOLE F WHOLE D# QUARTER
E WHOLE D# QUARTER E EIGHTH D SIXTEENTH D SIXTEENTH
D SIXTEENTH D SIXTEENTH C QUARTER C QUARTER D WHOLE
Example 3:
E HALF D# QUARTER G SIXTEENTH F# EIGHTH C HALF
G EIGHTH F# EIGHTH F EIGHTH F EIGHTH F# EIGHTH
G# HALF G EIGHTH G EIGHTH G EIGHTH G EIGHTH
A# HALF A# QUARTER A# QUARTER D# SIXTEENTH
C# WHOLE B# WHOLE G# EIGHTH F# QUARTER G WHOLE
G WHOLE G WHOLE G WHOLE E QUARTER F# WHOLE
Example 3 is a fine argument in favor of the SFE approach to musical aesthetics. The existence of two separate strings of four repeated G notes, in itself, leads to an above average SFE value for the melody. For, the second time this string is reached, it does not fulfill any short-scale patterns, but it does fulfill the long-scale pattern set up by the repeated G's the last time they came around. The musical effect of this SFE would be ruined by random selection of durations, but the duration algorithm used works quite nicely; the four eighth-note G's appear as an "intentional foreshadowing" of the four whole-note G's, giving the essential impression of development over time. In fact, the repetition of the same note sequence with different durations is itself an example of "surprising fulfillment of expectations," though this fact was not picked up by the currently implemented SFE fitness function due to its exclusive focus on pitch values.
Finally, for sake of completeness, the following is a fairly typical example of a melody rejected by the GA:
Example 4:
C WHOLE C WHOLE C WHOLE C WHOLE C WHOLE
C WHOLE C WHOLE C WHOLE D QUARTER C# HALF
C# HALF C# HALF C# HALF C# HALF D WHOLE
D WHOLE D WHOLE D WHOLE D WHOLE C HALF
C HALF C HALF C HALF C HALF C HALF
C HALF C HALF C HALF C HALF C HALF
The 60 IFS coefficients which generated this melody look no different to the human eye than those which generated Examples 2 and 3; the genetic algorithm, however, is able to learn to predict, with a certain degree of accuracy, which code vectors will lead to such monotonous and obviously terrible melodies. In the SFE framework, the long repeated strings are strong short-scale patterns, which are not outweighed by any long-scale patterns.
6.6 On the Aesthetics of Computer Music
The use of the computer to generate pictures and music naturally raises the question of the aesthetics of these digitally-conceived art forms. In this section I will sketch out a few ideas on this subject, specifically focused on the fractal music generated in the previous section.
The question of musical aesthetics is a broad one. All sorts of factors affect the subjective judgement of musical quality: the timbre of the sounds involved, the comprehensibility of the melody, the nature of the harmonies, the kinesthetic "feel" of the rhythm, the listener's familiarity with the musical style.... Beneath all these different factors, however, there would seem to be some kind of core. The essence of musical quality, one feels, has to do with the patterns of the rising and falling of notes. Without attractive note patterns, none of the other factors will come into play in the first place. The essence of musical aesthetics resides in the single melody line. A single melody line -- just a series of notes with varying pitch and duration, perhaps twenty to fifty notes long. This is a very discrete thing, just a selection from a finite number of permutations of pitch/duration combinations. The million dollar question is, why are some permutations so much more attractive than others? This question becomes particularly acute when one uses computer programs to generating melody lines. I have used perhaps a dozen different mathematical structures for generating melody lines, with widely varying results. Most, such as the chaotic attractors of quadratic maps), produce melodies that sound essentially random, despite the obvious presence of mathematical structure. Others, such as the invariant measures of large random one-dimensional iterated function systems, produce predominantly over-repetitive melodies. On the other hand, there is the odd mathematical structure which produces truly interesting and "catchy" tunes; an example, as described in the previous section, is the invariant measures of a small one-dimensional IFS (Goertzel, 1995). But the frustrating thing is that, even when one happens upon a structure that produces good melodies, one has no idea of why. What is it about small IFS's that "rings a bell" with the human mind?
The first step toward a resolution of the problem of melody lines, I propose, is the recognition that musical appreciation is a kind of emotion. In particular, it is a kind of pleasure. Even a favorite sad piece of music brings intense pleasure as it moves one to tears. In order to understand why certain melodies are enjoyable, we must understand the structure of the emotions to which they give rise.
Mandler (1985) proposes that each emotion may be resolved into a "hot" part and a "cold" part. The hot part of the emotion is the "raw feel" of it, the conscious experience of the emotion. The cold part, on the other hand, is the underlying logic of the emotion, the interplay of other factors which gives rise to the emotion. When I speak of the structure of emotion, I am speaking of this "cold" part. The topic in question is the structure of the emotion of musical appreciation.
There are, basically, two approaches to the psychological analysis of the structure of emotion. The first is the combinatory approach, which begins with a few basic emotions and tries to build other emotions out of these. The second is what might be called the "frustrated expectations" approach, according to which an emotion occurs only when some mental process doesn't get what it expected. The two approaches are not necessarily incompatible, but they represent different perspectives.
The frustrated expectations approach originated with the French psychologist Paulhan in 1887. Paulhan was long on philosophy and short on specifics, but he did give one very precise definition. Happiness, he said, is the feeling of increasing order. And unhappiness is the feeling of decreasing order. Happiness occurs when processes receive an overabundance of order, more order than they expected; when chaos is surprisingly replaced with structure. On the other hand, unhappiness occurs when processes are let down by the absence of of order, when they get chaos but expected structure.
The combinatory approach, on the other hand, would seem to have a firmer biological foundation. There are certain emotions that seem to be grounded in human physiology. Rage and lust are the two most obvious ones: even reptiles would seem to experience these. Also, there is a feeling of warmth and tenderness that develops in mammals between the infant and its mother. There is a feeling of curiosity and exploration, which is easily detected on EEG recordings in humans, rats and monkeys.
It seems clear, however, that these basic emotions may be interpreted in a Paulhanian fashion. Lust, tenderness and exploration are all feelings of increasing order -- they are special kinds of happiness. On the other hand, rage is a feeling of decreasing order; it occurs when something threatens the order of one's world. So, on a very abstract level, we may arrive at a unified view of emotion. The question is, what does this general picture tell us about the specific emotion of musical appreciation?
As mentioned above, this path has been trodden before, by the excellent musical psychologist Meyer, in his book "Emotion and Meaning in Music." Meyer, drawing on the frustrated expectations theory of emotion, argues that high quality melodies work by frustrating expectations, and then fulfilling them. In this way they cause a temporary unhappiness, followed by a much greater happiness -- a feeling of a lack of order, followed by a more intense feeling of increasing order. It is worth noting that the temporary unhappiness is necessary for the ensuing happiness: there is a practical "ceiling" to the amount of order a mind can experience, so that, after a certain point, in order to have the feeling of increasing order, one must first have the feeling of decreasing order. The key is to emphasize theincrease, the happiness, over its counterpart. I like to summarize Meyer's theory with the phrase, "surprising fulfillment of expectations," or SFE. A good melodic line is one that makes the mind feel: "Oh, wow! Yes -- that's what those notes were doing back there. I see now: it all fits. Beautiful!"
The trouble with Meyer's theory, however, is the difficulty of assessing what kinds of expectations are involved. Meyer, in his book, makes use of standard ideas from the theory of Western classical music. But these ideas are of little use in analyzing, say, Balinese gamelan or be-bop or punk rock, let alone experimental computer-composed music.
In the previous section I sought to use the theory of SFE as a guide for the computer composition of melodies. I wrote an algorithm which assessed the degree to which a given melody manifested the surprising fulfillment of expectations, and used a technique called the genetic algorithm to cause the computer to "evolve" melodies with a high SFE value. But the results, though exciting and in many ways successful, were not entirely positive. My implementation of SFE, in terms of repeated note, interval and contour patterns, did serve the role of filtering out pseudo-random melodies. But it did not filter out all of these and, what is worse, it did not reliably distinguish interesting melodies from boring, repetitive ones. Simple repeated patterns, balanced by a fixed algorithm, do not provide enough insight.
And this brings me to my key contention. I believe that the SFE theory is essentially correct. But this does not imply that the quality of a melody can be assessed by mathematical (as in my program) or music-theoretic (as in Meyer's book) criteria alone. it may be that certain note patterns have an intrinsic emotional value, due to their associations with other experiences. These note patterns, when used appropriately in a musical composition, will cause a much greater experience of pleasure than will other note patterns. A surprising fulfillment of expectations is all right, but not as good as a surprising fulfillment of expectations by the right kind of pattern!
But what is, precisely, the "right" kind of pattern? Here I will turn back to the 1800's again, to another philosophy of music: that of Arthur Schopenhauer. Schopenhauer, as is well known, argued that music had a special place among the arts. Music, he said, was closer than anything else to tracking the movements of the Will. The patterns by which notes move are very similar to the patterns by which our own free will moves. In Schopenhauer's view, the act of willing is only means by which we regularly come into contact with true reality; thus his philosophy of music gives music a very important role indeed!
Despite its excesses and peculiarities, I believe Schopenhauer's idea is a powerful one. Recall the two approaches to the study of emotion, discussed above: the frustrated expectations approach, and the combinatory approach. Meyer's theory of musical aesthetics corresponds to the frustrated expectations approach; and Schopenhauer's theory corresponds, a bit more loosely, to the combinatory approach. Schophenhauer argued that, in Nietzsche's phrase, "in music the passions enjoy themselves." The patterns of music, according to Schopenhauer, are precisely the same patterns according to which we act andfeel. The combination of musical elements into musical compositions is like the combination of emotional and behavioral elements into complex actions and feelings.
So what does this all add up to? I suggest that, just as both theories of emotion are partly correct, so both theories of musical psychology are partly correct. One needs the structure of SFE, but one needs this structure to be supported by musical patterns that reflect our basic emotional and behavioral patterns, our psychic/bodily rhythms.
And, finally, what does this have to do with computer-composed music? At first glance, it might seem that the prognosis is not so good. Unless we can program a computer to determine those patterns which reflect our human actions and emotions, one might conclude, then computer music will continue to sound "sterile," as so much of it does today. But yet there are counterexamples; there are programs such as my own IFS algorithm, which generate interesting, listenable melodies that are not at all "sterile." Somehow, it would seem, this algorithm is hitting on some humanly meaningful structures. The IFS method itself is better at producing catchy melodies than my jury-rigged SFE algorithm is at distinguishing them from poor ones.
The IFS algorithm, as I have implemented it, produces structures that deviate from the "tonic" note, and then return to it, and then deviate again, and return, et cetera. And while it deviates and returns, it tends to repeat the same contour patterns. For short melodies, these repetitions do not occur quite often enough to become dull. This sort of overall structure has an obvious psychological meaning to it: in many different contexts, we journey to and from the same mental state, repeating on our different journeys the same patterns with minor variations. This is a crude but plain example of how mathematical structures may reflect psychological structures.