Supercompiling Java Programs

 

Ben Goertzel, Andrei Klimov, Arkady Klimov

 

April 2002

 

 

 

Supercompilation is an innovative and general approach to global program optimization, first developed by Valentin Turchin in the 1970’s (see Turchin, 1986 for references into the older, Russian literature).  Qualitatively different from other, simpler global optimization approaches, supercompilation is based on a systems-theoretic methodology, in which the program being optimized is run in a special way (“driven”) by a metaprogram.  The metaprogram mathematically models the execution of the program as it runs, and uses this model to work out a more efficient program for doing the same computation.

Until recently, the only existing supercompiler was a prototype software application aimed at optimizing programs written in the language Refal (Turchin, 1989).   Since 1998, however, a project has been underway to create a supercompiler for the Java programming language.  This is a very significant effort, because it is the first attempt to apply such advanced global program optimization technology to programs written in a real-world, practical programming language, rather than a programming language like Refal or LISP that is used primarily for academic research.

In this article, we will describe the general concept of supercompilation, illustrating various points that arise along the way with concrete examples obtained using the current version of the Java supercompiler.  Particular stress will be laid on the relationship between supercompilation and partial evaluation, a more popular and simpler but much less effective global optimization technique.  We will elucidate the relation between supercompilation and AI as well, showing that the “driving” methodology at the heart of supercompilation can be used to solve some problems traditionally considered in the domain of artificial intelligence (an indication of the depth with which the supercompilation process “thinks” about program efficiency).  At the end of the article we will speculate a little about the future impact that a widely available, generally effective Java supercompiler may have on programming practices in general.

 

 

Approaches to Program Optimization

 

The problem of fully optimizing an arbitrary computer program, when mathematically formalized, turns out to be uncomputable (this is related to algorithmic information theory; see Chaitin, 1987 and Bennett, 1986).  There is no possibility of ever creating an algorithm that can fully optimize a general computer program.   Fortunately, though, the practical problem of program optimization -- though still very difficult -- is a lot easier than the general mathematical problem to which it corresponds.  This is because, in reality, we do not need to optimize arbitrary mathematically given computer programs; we only need to optimize programs written by humans.  And humanly-written programs tend to have a lot of special structure to them, including both obvious and subtle redundancies that lead to inefficiencies.

Supercompilation does not try to solve the impossibly hard problem of fully optimizing an arbitrary computer program.  Rather, it is an extremely valuable heuristic technique, which seeks to find ways of significantly optimizing a wide variety of computer programs.  Most real-world computer programs are chock full of inefficiencies, which can be removed via the supercompilation process.  These inefficiencies are often hard to see in the source code, but they are highly transparent when the code is translated into a more mathematical form, as is done inside the supercompiler.  Some of these inefficiencies are introduced by sloppy programming; others are placed there intentionally, by programmers interested in goals besides efficiency, such as maintainability and comprehensibility of code, or speed of software development.  In many cases the most efficient way to implement a given piece of code is a truly humanly infeasible way, involving literally hundreds of times more lines of code than the ways that occur naturally to humans.  

Most previous projects touching the program optimization domain have stayed near the edges, basically just tweaking existing code rather than studying what the code does overall and trying to figure out the best way to do it.  Such approaches are called local program optimization, meaning that they look at small pieces of a program and try to figure out how to make them work more efficiently.  A good example of this is HotSpot, which accelerates Java software performance by locally optimizing bytecode during the interpretation process (Sun Microsystems, 1999).  This is a valuable thing to do, but it cannot fix inefficiencies that have to do with the overall structure of a program, or a program segment. 

The radical difference between this approach and supercompilation is indicated by our experimental observation that HotSpot optimizations and supercompiler optimizations are essentially orthogonal.  In other words, if HotSpot speeds up a program 5 times, and the current supercompiler speeds it up 20 times, then applying both HotSpot and the current supercompiler will probably speed it up around 100 times.   Eventually, when supercompilers are sophisticated enough, they will encompass localized optimizations of the kind HotSpot performs, rendering specialized local optimization tools like HotSpot unnecessary; for the time being, however, our existing supercompiler has a different focus, which is complementary to that of local optimization methods.

Other researchers have voyaged further into the global program optimization domain than the HotSpot team, creating methods that can rewrite the overall structure of a program in certain special cases.  The primary technology here is partial evaluation, which is very old news in the LISP and functional programming communities.  There is an academic research group engaged in developing a partial evaluator for Java -- the COMPOSE project at INRIA, France (www.irisa.fr/compose/) -- though their work seems to be at a relatively early stage (see Lawall et al, 1999 for some examples).  The basic insight underlying partial evaluation is that sometimes you know, before running a program, what the inputs to one or more functions are going to be.  These functions can then be automatically rewritten to give better performance on the given specific inputs.  This is already getting into a dangerous and exciting domain, for there is no general way to figure out the optimal specialization of a function to a set of inputs; there is merely a fairly effective set of heuristics.  There are currently no commercial products providing partial evaluation for any real-world programming language, but we can probably expect such products to emerge in the not too distant future.

Like partial evaluation, supercompilation is a global program optimization method: it takes in sourcecode, and outputs new sourcecode, which does the same thing but may be structured entirely differently.  However, it is as far beyond partial evaluation as partial evaluation is beyond simple localized techniques.  Partial evaluation and related approaches all involve some kind of direct program transformation.  A program is transformed, by the step-by-step application of a series of equivalences, into a different program, hopefully a more efficient one.  Supercompilation is a different kind of algorithm.  A supercompiler contains a “metaprogram” that runs the program under optimization in a special way (a process called “driving”) and constructs a model of the program’s behavior.  This model is in a special mathematical form, and in most practical cases, it can be used to create an efficient program doing the same thing as the original one. 

The techniques used to create the mathematical model of a program and then optimize it are complex and various, so that “supercompilation” is actually a grab-bag of optimization methods, of varying complexity and subtlety.  All of these methods, however, act on the same mathematical model of a program, and fit into the same overall algorithmic framework, which we will describe below, giving illustrative examples from recent work supercompiling simple Java programs.

 

Why Java?

 

The choice of Java as a target for current supercompilation research is not coincidental.  The goal was to choose a real-world programming language in which there are numerous examples of large-scale programs in need of substantial optimization.  C and C++ would seem to be the first choice; however, these languages are particularly unsuitable for global program optimization techniques, because of their use of pointers, which directly manipulate the memory heap.  In order to do global program optimization on a C/C++ program, one would have to include in one’s mathematical model of a program a complete mathematical model of the memory heap and its interactions with the program.  This is not impossible, but it would entail a huge increase in the complexity of the task involved. 

Java is not a “toy” language: it is full of complex features like object hierarchies and multithreading, which help a great deal with code efficiency and maintainability, but present significant obstacles to the translation of Java programs into the simpler, purely mathematical format required by the supercompiler’s optimization algorithms.  But, based on our experience, the complexities of Java would seem to be reasonably manageable within the supercompilation framework, given the computational capabilities of modern hardware.  Other contemporary commercial languages that might be suitable for supercompilation include C# and Visual Basic, neither of which involves the kind of heap memory manipulation that characterizes C and C++.

Of course, direct heap memory manipulation buys C and C++ programs a lot of efficiency, which is why programmers put up with it.  However, the advent of supercompilation technology for practical languages like Java is likely to change this situation.  The efficiency that direct heap manipulation buys may soon pale before the efficiency that it loses by making supercompilation more difficult.  In this way, supercompilation technology may well cause a fundamental alteration in the way programming is done.  The most efficient languages will be the ones that are most amenable to supercompilation; and the best programming style will be the one that ensures human maintainability and maximizes supercompilability.

 

 

How Supercompilation Works

 

The internal behavior of the supercompiler is, not surprisingly, quite complex; what we will give here is merely a high-level overview and summary.    There are three separate levels to the supercompilation ideas: first, a general philosophy; second, a translation of this philosophy into a concrete algorithmic framework; and third, the manifold details involved in making this algorithmic framework practicable in the Java context.  The discussion here focuses on the first two levels, and is fairly closely modeled on the presentation given in Valentin Turchin’s paper The Concept of a Supercompiler (Turchin, 1986).  Some more recent results on supercompilation in the Refal language are given in (Turchin, 1996).

The key philosophical concept underlying the supercompiler is that of a metasystem transition.  In general, this term refers to a transition in which a system that previously had relatively autonomous control becomes part of a larger system that exhibits significant controlling influence over it.  For example, in the evolution of life, when cells first became part of multicellular organisms, there was a metasystem transition, in that the primary nexus of control passed from the cellular level to the organism level.

The metasystem transition in supercompilation consists of the transition from considering a program in itself, to considering a metaprogram which executes another program, treating its free variables and their interdependencies as a subject for its mathematical analysis.   In other words, a metaprogram is a program that accepts a program as input, and then runs this program, keeping the inputs in the form of free variables, doing analysis along the way based on the way the program depends on these variables, and doing optimization based on this analysis.

The metaprogram executes a Java program without assuming specific values for its input variables, creating a tree as it goes along.   Each time it reaches a statement that can have different results depending on the values of one or more variables, it creates a new node for the tree.  This part of the supercompilation algorithm is called driving – a process which, on its own, would create a very large tree, corresponding to a rapidly-executable but unacceptably humongous version of the original program.  The other part of supercompilation, configuration analysis, is focused on dynamically reducing the size of the tree created by driving, by recognizing patterns among the nodes of the tree and taking steps like merging nodes together, or deleting redundant subtrees.

Finally, the tree that the metaprogram creates is translated back into a Java program, embodying the constraints implicit in the nodes of the tree.  This Java program is not likely to look anything like the original Java program that the metaprogram started with, but it is guaranteed to carry out the same function.

 

 

 

The Supercompilation Algorithm

 

The basic algorithm of supercompilation is simple but massively recursive. It involves two main processes: driving and configuration analysis.

During the driving process, the metaprogram proceeds line by line through the program, acting just like an ordinary interpreter.   As it proceeds, it constructs a “state graph,” each node of which, somewhat roughly speaking, corresponds to a source program point and a possible set of value assignments for local variables inside the program.  The children of a node N in the state graph correspond to all other nodes that may be obtained from N by one step of computation.

The problem with this state graph is that it may be infinite; and even where it’s not infinite, it may be unacceptably large.  This is where “configuration analysis” comes in: its goal is to adaptively transform this large, potentially infinite state graph into a state graph of manageable size.  The finite state graph found for the program will then be used to generate a new Java program.  Roughly speaking, the bigger the state graph, the bigger the Java program output by the supercompiler -- sometimes supercompiled programs involve a huge number of lines of source code even for doing relatively simple things.

Suppose, during the driving process, a leaf node V of the state graph is found which has the following property:  V has an ancestor Va, and it seems likely that continued driving will generate an infinite sequence Va, ..., V, ...  Then V should not be driven any further; instead, an operation called “generalization” should be performed.   Making this guess as to whether an infinite sequence is likely to be generated is a matter of some subtlety.  The criterion telling when to stop driving is called the "whistle" condition.

A simple condition sometimes used is: we stop driving if we reach a node that is identical to one of its ancestors, or is (in a certain precise sense) a superset of one of its ancestors.  This condition, however, is not perfect: it will sometimes stop driving when it would in fact be beneficial for driving to continue!  The formulation of a heuristically near-optimal whistling criterion is not obvious and may vary according to the type of program being supercompiled.
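
To make the driving machinery and the simple whistle a little more concrete, here is a minimal sketch in Java.  It is our illustration only, not the supercompiler’s actual internals: the class Configuration, its fields, and the stubbed isSupersetOf test are all assumptions.

import java.util.List;
import java.util.Map;

// A driving node ("configuration") pairs a program point with what is
// currently known about the local variables, and keeps a link to its
// parent in the state graph.
class Configuration
{
    final int programPoint;               // index of the source statement being driven
    final Map<String, String> variables;  // variable name -> known value, or a free symbol
    final Configuration parent;           // null at the root of the state graph
    final List<Configuration> children = new java.util.ArrayList<Configuration>();

    Configuration (int programPoint, Map<String, String> variables, Configuration parent)
    {
        this.programPoint = programPoint;
        this.variables = variables;
        this.parent = parent;
    }

    // The simple whistle described above: stop driving this node if some
    // ancestor is identical to it, or if the node is a "superset" of an
    // ancestor in the precise sense used by the supercompiler (stubbed here).
    boolean whistle ()
    {
        for (Configuration anc = parent; anc != null; anc = anc.parent) {
            boolean identical = anc.programPoint == programPoint
                && anc.variables.equals(variables);
            if (identical || isSupersetOf(anc))
                return true;
        }
        return false;
    }

    boolean isSupersetOf (Configuration ancestor)
    {
        return false; // placeholder for the real comparison of configurations
    }
}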

There are techniques called "transient reductions" and "characteristic trees" that make the supercompilation process more efficient by reducing the number of nodes on which one must check the whistling condition.  These speed up the process in the average case but can wreak considerable havoc in the worst case -- in the case of transient reductions, sometimes leading to an infinite state graph.

The basic rule followed, when encountering a node and considering possible generalization, is as follows.  Here the node V is understood to be associated with a set of “configuration variables” (roughly, local variables inside the supercompiled version of the program), and its ancestor node V' is similarly associated with a set of configuration variables.  For each ancestor node V', one of the following steps is carried out:

 

1.      If V is a subset of V', reduce V to V' by a substitution.

 

2.      Generalize V and V', i.e., find Vg so that both V and V' are subsets of Vg; then erase the driving subgraph which starts at V, reduce V to Vg, and go on driving Vg.

 

3.      Do nothing on V' (the nodes V and V' are "far from each other"), and examine the next ancestor.

 

If all ancestors have been considered and neither step 1 nor step 2 was activated for any of them, then the algorithm simply goes on driving V.
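
For concreteness, here is a hedged sketch of this ancestor scan, reusing the Configuration class from the sketch above.  The predicates isSubsetOf and shouldGeneralize are placeholders for the precise configuration comparisons the supercompiler actually performs, and the names are ours.

class AncestorScan
{
    enum Outcome { REDUCE_TO_ANCESTOR, GENERALIZE, KEEP_DRIVING }

    // Walk the ancestors of V and decide, per steps 1-3 above, what to do next.
    static Outcome scan (Configuration v)
    {
        for (Configuration anc = v.parent; anc != null; anc = anc.parent) {
            if (isSubsetOf(v, anc))
                return Outcome.REDUCE_TO_ANCESTOR; // step 1: reduce V to V' by a substitution
            if (shouldGeneralize(v, anc))
                return Outcome.GENERALIZE;         // step 2: build Vg, erase the subgraph driven from V, drive Vg
            // step 3: V and this ancestor are "far from each other"; examine the next ancestor
        }
        return Outcome.KEEP_DRIVING;               // neither step 1 nor 2 fired for any ancestor
    }

    // Placeholder predicates; the real tests compare configuration variables.
    static boolean isSubsetOf (Configuration v, Configuration ancestor) { return false; }
    static boolean shouldGeneralize (Configuration v, Configuration ancestor) { return false; }
}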

The state graph created by pure driving is a tree, but the generalization process leads to the creation of a dynamically growing augmented state graph that is no longer a tree.  It contains special nodes called “generalization nodes,” which can contain return edges representing generalization relations between nodes and their ancestors, discovered in Steps 1 and 2 above.

Like the whistling condition, the order of search among the many possible ancestors of a given node V is another place where heuristics may be introduced, and where the choice of heuristic can significantly affect supercompilation performance.

The supercompilation process ends when the graph becomes self-sufficient, i.e. when it shows, for every node in its model of the program P, how to make a transition to the next node which corresponds to at least one step of the program P. This graph is a program for computing the method invocation represented by the initial node.

Finally, this basic scheme of supercompilation is usually extended by keeping a list of basic configurations, which are those nodes for which the transition graph is already known. Then before examining the ancestors of the node V the supercompiler compares it with basic configurations, and if one of them, Vb, is such that V is a subset of Vb, then V is reduced to Vb. As an option, the supercompiler may make this reduction only if V=Vb -- this is another illustration of the existence of various heuristic strategies of supercompilation.
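
Continuing the same illustrative sketch, the check against basic configurations might look roughly as follows; again the names are ours, and the comparison is the same stubbed isSubsetOf used above.

class BasicConfigurations
{
    // Before examining the ancestors of V, see whether V is a subset of some
    // basic configuration Vb whose transition graph is already known; if so,
    // V is reduced to Vb instead of being driven further.
    static Configuration findMatch (Configuration v, java.util.List<Configuration> basics)
    {
        for (Configuration vb : basics) {
            if (AncestorScan.isSubsetOf(v, vb))
                return vb;
        }
        return null; // no basic configuration applies; proceed to the ancestor scan
    }
}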

 

 

 

 

Partial Evaluation vs. Supercompilation

 

We discussed above the global program optimization technique known as partial evaluation, or program specialization.  It is best to consider this simpler approach not as an alternative to supercompilation, but rather as a special case of supercompilation, important yet severely limited in scope.

In essence, the difference between partial evaluation and supercompilation is that the former requires a purely static analysis to decide which parts of code will be optimized out at specialization time, whereas the latter, via the process of metacomputation, makes decisions using a much larger information set, including information perceivable only during the course of execution.   Partial evaluation is dependent upon a type of program analysis called Binding Time Analysis (BTA), which has no access to the concrete values taken on by local variables in a program as it executes, and hence does not use the maximum information available at a given point in the execution of a program.

Specifically, in practice, BTA performs separation of program code into two types of fragments:

 

·        Those which can be executed based on variable values known at specialization time (these are annotated S, for static)

·        Those which cannot be thus executed, and which are therefore left in the residual program, the part of the program that is not modified by the partial evaluation process (these are annotated D, for dynamic)

 

In the case of complex real-world programs, only a very small percentage of code can be specialized in this manner, which means that partial evaluation in and of itself is often of relatively limited utility.
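
As a toy illustration of this separation (our own example, not one taken from the COMPOSE project), consider a power function in which the exponent is known at specialization time but the base is not:

class BindingTimeExample
{
    // The exponent n is static (S): its value is known at specialization time.
    // The base x is dynamic (D): its value is known only at run time.
    static double power (double x, int n)
    {
        double result = 1.0;
        for (int i = 0; i < n; i++) {   // S: the loop control depends only on n and can be unrolled
            result = result * x;        // D: the multiplication stays in the residual program
        }
        return result;
    }

    // The residual program a partial evaluator would produce for n = 3:
    // only the dynamic multiplications remain.
    static double power3 (double x)
    {
        return x * x * x;
    }
}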

On the other hand, the essence of supercompilation is driving performed without knowing any information in advance (besides the program code itself).  The only information required by supercompilation is that which is created dynamically via configuration analysis (though some a priori information on method properties may be useful in certain cases).   Driving works perfectly well with program fragments that cannot be specialized with statically separated binding times.  

            The step from partial evaluation to supercompilation is a large one: on the one hand, a purely static analysis of variable-value relationships; on the other, a full-on metacomputation framework in which a program is executed by another program, keeping records and making analyses as it goes along.  So far, however, nobody has found any significant intermediate forms of global program optimization.  A few small simplifications of supercompilation may be identified, but by and large, if one wants to go beyond the simple analyses obtainable through partial evaluation, one has to pass through the metasystem transition to supercompilation.

 

 

Supercompilation-Based Java Program Specialization

 

            In this section we will give some concrete examples of Java program specializations carried out by the supercompiler.  Essentially, these examples show the supercompiler acting like an extremely effective partial evaluator, although its methods are a bit different even in this special case.  The following few sections will present concrete examples of the supercompiler carrying out more advanced operations.

A good illustrative example of program specialization is binary search.   Suppose one has a software system that is very frequently presented with the task of searching sorted lists of length n for specified objects.  What program specialization does is to automatically create a specialized search routine that applies only to lists of length n.  Such specialized search routines can be very complicated; here we’ll give a trivial example for the case n=4.  Of course, significant speedup will not be obtained for n=4; and in general, there are many other cases where more significant speedup can be obtained than searching lists, since binary search is actually a highly efficient algorithm.  This example is a good conceptual demonstration of the ability of program specialization (as implemented within the Java supercompiler) to find a perfect decision tree, however.

The class BinarySearch, below, embodies a general binary search routine.  The method test invokes the routine for a particular input array of length 4.

 

 

public class BinarySearch
{
    private final float[] array;

    protected BinarySearch() { array = null; }

    protected BinarySearch (float[] array)
    {
        this.array = array;
    }

    public static BinarySearch create (float[] array)
    {
        return new BinarySearch(array);
    }

    public int getIndex(float x)
    {
        return getIndexB(x, 0, this.array.length);
    }

    private int getIndexB(float x, int lower, int upper)
    {
        int mid;
        while (upper - lower > 1) {
            mid = (upper + lower) / 2;
            if (x < this.array[mid])
                upper = mid;
            else
                lower = mid;
        }
        return lower;
    }

    static final BinarySearch bs = new BinarySearch (new float[4]);

    /************************************************************/
    public static int test (float x)
    {
        return bs.getIndex(x);
    }
    /************************************************************/

    public static void main (String args[]) {
        bs.array[0] = 0.5f;
        bs.array[1] = 1.5f;
        bs.array[2] = 2.5f;
        bs.array[3] = 3.5f;
        System.out.println( test(0f) );
        System.out.println( test(1f) );
        System.out.println( test(2f) );
        System.out.println( test(3f) );
        System.out.println( test(4f) );
    }
}

 

The supercompiled version of the test method is as follows:

 

public static int test (final float x_6)
{
  final float[] array_1 = BinarySearch.bs.array;
  if (x_6 < array_1[2]) {
    if (x_6 < array_1[1]) {
      return 0;}
    return 1;}
  else {
    if (x_6 < array_1[3]) {
      return 2;}
    return 3;}
}

 

 

A more realistic and useful, though still relatively simple, instance of program specialization involves the Fast Fourier Transform.   A standard implementation of the FFT was created, using objects of class Complex to represent complex numbers.  The FFT code itself is a bit long to include in this article, but it can be viewed online at www.supercompilers.com/fftjava.htm.   A method test was then written, which takes in an array of doubles of length 2*n that represent a sequence of n complex numbers in pairs, and returns its Fourier Transform represented similarly.  (One thing that happens during the supercompilation process is that all objects of class Complex are completely eliminated.) 

Now, suppose one has to frequently evaluate the Fast Fourier Transform for some fixed value n.   As in the binary search example, one may perform program specialization, a special case of supercompilation.   For example, here is the supercompiled version of method test for n = 4:

 

public static double[] test (final double[] arg_1)
{
  if (arg_1.length != 8) {
    throw new java.lang.IllegalArgumentException();}
  final double arg_0_12 = arg_1[0];
  final double arg_1_13 = arg_1[1];
  final double arg_2_16 = arg_1[2];
  final double arg_3_17 = arg_1[3];
  final double arg_4_20 = arg_1[4];
  final double arg_5_21 = arg_1[5];
  final double arg_6_24 = arg_1[6];
  final double arg_7_25 = arg_1[7];
  final double re_111 = arg_0_12 + arg_4_20;
  final double im_112 = arg_1_13 + arg_5_21;
  final double re_117 = arg_0_12 - arg_4_20;
  final double im_118 = arg_1_13 - arg_5_21;
  final double re_141 = arg_2_16 + arg_6_24;
  final double im_142 = arg_3_17 + arg_7_25;
  final double re_147 = arg_2_16 - arg_6_24;
  final double im_148 = arg_3_17 - arg_7_25;
  final double re_178 = re_147 * 6.123031769111886e-17D - im_148;
  final double im_180 = re_147 + im_148 * 6.123031769111886e-17D;
  final double[] res_242 = new double[8];
  res_242[0] = re_111 + re_141;
  res_242[1] = im_112 + im_142;
  res_242[2] = re_117 + re_178;
  res_242[3] = im_118 + im_180;
  res_242[4] = re_111 - re_141;
  res_242[5] = im_112 - im_142;
  res_242[6] = re_117 - re_178;
  res_242[7] = im_118 - im_180;
  return res_242;
}

 

Empirically, in Java, this gives a roughly 20 times speedup over the generic FFT.  Similar results are obtained for larger n.   For instance, the supercompiled FFT for an input series of length 16 may be obtained at the URL www.supercompilers.com/fft16java.htm.

            Of course, in reality we won’t always know the array size in advance.  In this sort of case, there are two potential solutions.  One is what is known in the partial evaluation community as “The Trick” – the replacement of a function body with a list of conditionals directing inputs, whenever possible, to the most appropriate member of a set of specialized versions of the function.  In the FFT case, one may use program specialization to create a set of methods specialized to sizes arg1.length = 2^k, k = 1, ..., m.    The body of the method test may then be rewritten to take an input array of arbitrary size n, and

 

  1. If n > 2^m, simply carry out the standard FFT
  2. Otherwise, pad it with zeros until it reaches the size 2^k for some k <= m, and feed it to the specialized version of the FFT created for inputs of size 2^k (see the sketch below)
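
A rough sketch of such a dispatcher, under assumptions: m = 3, so specialized methods exist for input arrays of length 2, 4 and 8, and the names specializedFFT2, specializedFFT4, specializedFFT8 and genericFFT stand for methods assumed to be generated or written elsewhere in the class.

public static double[] test (final double[] arg)
{
    if (arg.length > 8) {
        return genericFFT(arg);                  // larger than any specialized size: fall back
    }
    int padded = 2;
    while (padded < arg.length) padded *= 2;     // smallest power of two >= arg.length
    final double[] input = java.util.Arrays.copyOf(arg, padded);   // zero-pad to length 2^k
    switch (padded) {
        case 2:  return specializedFFT2(input);
        case 4:  return specializedFFT4(input);
        default: return specializedFFT8(input);
    }
}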

 

The other alternative, where n is not known in advance, is to do what we call dynamic supercompilation.  Essentially, this means specializing the FFT to the particular size n “on the fly”, by supercompiling the FFT algorithm for n during runtime, and then dynamically loading the supercompiled version.  This will not always be worthwhile, because the supercompilation process itself takes some time.  But if n is very large it may well be worthwhile; and there are other examples besides the FFT where this kind of dynamic supercompilation strategy can give an even greater payoff.
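
The following is a sketch of how dynamic supercompilation might be wrapped around the FFT.  The FFTKernel interface, the kernel cache, and the specializeFor hook are our illustration only; in a real system specializeFor would invoke the supercompiler at run time on the generic FFT with the length fixed, compile and load the result, whereas here it simply falls back to the generic code so that the sketch remains self-contained.

import java.util.HashMap;
import java.util.Map;

class DynamicFFT
{
    interface FFTKernel { double[] transform (double[] data); }

    // Kernels already specialized for a given number of complex points.
    private static final Map<Integer, FFTKernel> cache = new HashMap<Integer, FFTKernel>();

    static double[] fft (double[] data)
    {
        int n = data.length / 2;
        FFTKernel kernel = cache.computeIfAbsent(n, DynamicFFT::specializeFor);
        return kernel.transform(data);
    }

    // Hypothetical hook: supercompile the generic FFT for the fixed size n,
    // compile and load the result.  Stubbed here to fall back to the generic code.
    private static FFTKernel specializeFor (int n)
    {
        return DynamicFFT::genericFFT;
    }

    // Placeholder standing in for the general FFT shown at the URL above.
    static double[] genericFFT (double[] data)
    {
        return data.clone();
    }
}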

 

 

Handling Dynamic Variables

 

As stressed above, program specialization is far from the only operation performed by the supercompiler; it is merely the simplest.  Other examples of relatively elementary things the supercompiler does are: unwrapping hierarchies of nested functions into single functions with immense bodies, and automated decomposition of function bodies into conditionals joining automatically-generated specialized functions.   The most complex manifestations of supercompiler behavior are hard to express in simple programming-language terms and can only be communicated using a specialized mathematical formalism.  The relationship between program specialization and one particular aspect of supercompilation (program composition) was explored in detail in (Klimov, 1998).

We will now review a concrete example of the sort of thing supercompilation can do but straightforward program specialization can’t.  The example is very simple, almost trivial, but it involves variables whose values are set dynamically during program execution.  Supercompilation, unlike program specialization, is still able to offer significant optimization in this case, because it can reason abstractly about the manipulations done with the dynamic variables, recognize redundancies in these manipulations and make optimizations accordingly.  

Consider the method:

 

 

        float m (float x) {

            if (x<=0) return 0;

            else

            if (x>=1) return 1;

            else return 2-3*x;

        }

 

 

where the variable x is annotated D (its exact value is not known in advance).

Let us suppose that the usage of this method is as follows:

 

        y = x0;

        for (int i=0; i<5; i++)

            y = m(y);

 

Suppose that x0 and hence y are also annotated D, meaning that their specific values are not known until the program is actually executed.

Under straightforward program specialization, we'll obtain simply:

 

        y = x0;

        y = m(y);

        y = m(y);

        y = m(y);

        y = m(y);

        y = m(y);

 

 

which really doesn’t accomplish much.

An implementation of program specialization that includes function inlining will yield a very slightly more sophisticated result:

 

            y = x0;

            if (y<=0) y=0;

            else

            if (y>=1) y=1;

            else y = 2-3*y;

            if (y<=0) y=0;

            else

            if (y>=1) y=1;

            else y = 2-3*y;

            if (y<=0) y=0;

            else

            if (y>=1) y=1;

            else y = 2-3*y;

            if (y<=0) y=0;

            else

            if (y>=1) y=1;

            else y = 2-3*y;

            if (y<=0) y=0;

            else

            if (y>=1) y=1;

            else y = 2-3*y;

 

 

 

But, although this result is more complex, the practical impact of inlining here is essentially negligible.  For all x there will still be between 5 and 10 comparisons.  Without being able to dynamically study the values taken on by x, x0 and y, not much optimization can be done.

Now, what happens under supercompilation?  The inlining is done automatically, and the large function body obtained through inlining is then substantially optimized.  We obtain a much more interesting result:

 

            if (x0<=0) y=0;

            else

            if (x0>=1) y=1;

            else {

                y = 2-3*x0;

                if (y<=0) y=0;

                else

                if (y>=1) y=1;

                else {

                    y = 2-3*y;

                    if (y<=0) y=0;

                    else

                    if (y>=1) y=1;

                    else {

                        y = 2-3*y;

                        if (y<=0) y=0;

                        else

                        if (y>=1) y=1;

                        else {

                            y = 2-3*y;

                            if (y<=0) y=0;

                            else

                            if (y>=1) y=1;

                            else {

                                y = 2-3*y;

            }   }   }   }   }

 

 

As is evident from this code, the supercompiler has actually rethought the series of conditionals.  There is a tree of possibilities implicit in the series of conditionals in the inlined version of the program.  Program specialization cannot optimize this tree, but supercompilation can.  Driving follows the various branches of the tree, and when several branches lead to the same point, it is able to optimize the code accordingly. 

The result is a significant degree of code optimization.   With the supercompiled code, the average number of comparisons for x uniformly distributed in [-1,2] will be less than 2.5 (the number of subtractions and multiplications is always the same).

            Of course, this is a very simple example, and the speedup here is significantly less than has been obtained in more complex and realistic examples.  But the example illustrates well the superior mathematical power of supercompilation.

            On the other hand, what if the variable n in the program

 

        y = x0;

        for (int i=0; i<n; i++)

            y = m(y);

 

 

is itself dynamic, or is so large that loop-unfolding is impractical?   In this case the significant optimization described above cannot be done, but the Java supercompiler can at least reduce overhead somewhat, by producing a program such as:

 

        y = x0;

        for (int i=0; i<n; i++) {

            if (y<=0) { y=0; break; }

            else

            if (y>=1) { y=1; break; }

            else y = 2-3*y;

            }

 

This is simple and elegant, and it is what a human programmer, set the task of optimizing the program fragment, would likely do under the circumstances.

 

 

 

Driving as Problem-Solving

 

An amusing and fascinating illustration of the power of driving is the application of the Java supercompiler to solve logic puzzles.  This is actually almost entirely an application of driving; no significant configuration analysis is required for the example to be presented.

We will describe a simple logic puzzle, attributed to Albert Einstein, and frequently used as an illustration of Prolog’s resolution-based theorem-proving capability.   The point is not that the supercompiler solves the problem faster than Prolog; this has not even been benchmarked.  The conceptual point we wish to illustrate is, rather, that what driving does is a kind of logical problem-solving.   It is this problem-solving ability of driving that underlies the ability of the supercompiler to optimize simple series of conditionals as reviewed above, and far more complex programs as well.  

Driving is more powerful than the backtracking heuristic underlying Prolog, but it shares with backtracking the property of being too computationally inefficient to be used on its own except in relatively simple cases.  Thus the supercompiler couples it with configuration analysis, which provides heuristic guidance to the driving process, allowing it to deal adequately with extremely complex situations.  But still, as the following example shows, driving possesses no small intelligence on its own.

The logic puzzle in question is the “fish problem”:

 

There are 5 houses in 5 different colors.  In each house lives a man with a different nationality.  The 5 owners each drink a certain type of beverage, smoke a certain brand of cigar, and keep a certain pet.  No two owners have the same pet, smoke the same brand of cigar, or drink the same beverage.

Who owns the fish?

Hints:

The Brit lives in the red house.

The Swede keeps dogs as pets.

The Dane drinks tea.

The green house is on the left of the white house.

The green house's owner drinks coffee.

The person who smokes Pall Mall rears birds.

The owner of the yellow house smokes Dunhill.

The man living in the center house drinks milk.

The Norwegian lives in the first house.

The man who smokes Blends lives next to the one who keeps cats.

The man who keeps the horse lives next to the man who smokes Dunhill.

The owner who smokes Bluemasters drinks beer.

The German smokes Prince.

The Norwegian lives next to the blue house.

The man who smokes Blends has a neighbor who drinks water.

 

 

For a LISP solution to the problem, and a list of links to other programs solving the problem, see the URL http://www.weitz.de/einstein.html.

The approach to this problem via supercompilation was as follows.  A program Fish.java (viewable at the URL www.supercompilers.com/fishjava.htm) was written, which takes as input the array

 

 String[][] houses = new String[nofHouses][nofProperties];

 

There is a method

 

checkAndPrint(String[][] houses)

 

which takes as its input a two-dimensional array representing a candidate solution to the problem, and evaluates whether it represents a correct solution or not.  If the input does not represent a correct solution, the method throws an exception.  If it does represent a correct solution, it returns the answer (the String representing the nationality of the fish-owner).
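
A rough sketch of the shape of such a check is given below.  This is not the actual Fish.java (viewable at the URL above); the column order {nationality, color, cigar, pet, drink} is assumed to match the example array shown next, and only one of the fifteen hints is written out.

static String checkAndPrint (String[][] houses)
{
    for (String[] h : houses) {
        // Hint: the Brit lives in the red house.
        if (h[0].equals("Brit") && !h[1].equals("Red"))
            throw new IllegalArgumentException("the Brit must live in the red house");
        // ... the remaining hints are checked in the same style ...
    }
    for (String[] h : houses) {
        if (h[3].equals("Fish"))
            return h[0];   // the nationality of the fish owner
    }
    throw new IllegalArgumentException("no house keeps the fish");
}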

Since the problem has only one answer, if the following array is fed into the checkAndPrint method, the result is the answer: the German owns the fish.

 

    String[][] houses = {

        {"Norwegian", "Yellow", "Dunhill",    "Cat",   "Water" },

        {"Dane",      "Blue",   "Blend",      "Horse", "Tea"   },

        {"Brit",      "Red",    "PallMall",   "Bird",  "Milk"  },

        {"German",    "Green",  "Prince",     "Fish",  "Coffee"},

        {"Swede",     "White",  "BlueMaster", "Dog",   "Beer"  }

    };

 

Otherwise, the result is a thrown exception. 

            In the unsupercompiled program, the checkAndPrint method is evaluated at all 5^25 = 298023223876953125 possible property assignments.   When the supercompiler is applied to Fish.java, it is this method which is significantly optimized. 

The Java method obtained by supercompiling this method may be seen at www.supercompilers.com/fishjs.htm.  The first part of the method throws an exception in one situation after another.  The second part of the method explicitly contains the solution to the problem:

 

java.lang.System.out.println("the owner of a fish = German") /*virtual*/;

  java.lang.System.out.println("") /*virtual*/;
java.lang.System.out.println("house 1: Norwegian Yellow Dunhill Cat Water ") /*virtual*/;

  java.lang.System.out.println("house 2: Dane Blue Blend Horse Tea ") /*virtual*/;

  java.lang.System.out.println("house 3: Brit Red PallMall Bird Milk ") /*virtual*/;

  java.lang.System.out.println("house 4: German Green Prince Fish Coffee ") /*virtual*/;

  java.lang.System.out.println("house 5: Swede White BlueMaster Dog Beer ") /*virtual*/;

  return;

:

 

What has happened here is that the supercompilation process has actually solved the problem.  The metaprogram ran the program, watching closely what was happening as it went along, and it carefully controlled the execution of the program.  It controlled the execution of the program so well that it produced an execution graph containing no wrong moves, leading straight to the answer.  Turning this optimized execution graph back into a Java program resulted in a program that directly embodies the solution, carrying out no search at all.  The supercompiled version of checkAndPrint does not check its input array against the given constraints; it simply checks the input against the answer, and responds to all other inputs with an immediate exception.  It’s not hard to see the continuity between this example and the previous one, involving the function float m (float x).  In both cases, we see driving effectively finding an efficient path through a tree of possibilities.

In general conceptual terms, what this example indicates is that the line between sophisticated global program optimization and AI is a thin one indeed.  Prolog is considered an AI language, and the heuristic solution of logic puzzles is considered an AI problem.  But the driving methodology – which is merely the simplest half of supercompilation – solves this sort of logic puzzle with efficiency basically comparable to that of an AI algorithm written with such puzzles in mind.  With sophisticated configuration analysis heuristics, supercompilation is capable of solving AI-ish problems of much greater complexity than Einstein’s fish problem.  Unlike simpler optimization methods such as program specialization, supercompilation uses a metaprogram to run (drive) the program under optimization -- and it “thinks,” during the driving process, about how to make the running of the program more efficient, based on all the information collected during the process of driving.

 

Supercompilation and AI

 

It is interesting to speculate about where the AI aspects of supercompilation, highlighted in the Einstein fish problem, lead in the future.  In the case of the fish problem, supercompilation allows the programmer to implement a manifestly stupid algorithm -- one that is so inefficient it couldn’t pragmatically be used to solve the problem at hand -- and nevertheless get an efficient answer.  This suggests that, eventually, it might be possible to do something like implement an algorithmically crude approach to Fourier transformation, and have an advanced supercompiler not only specialize the algorithm to perform effectively for particular input sizes, but improve the algorithm itself in fundamental ways (similarly to how Cooley and Tukey originally improved on crude Fourier transformation approaches to derive the FFT algorithm in the first place). 

While this sort of grand achievement will be considerably beyond the reach of the first complete Java supercompiler, it is worth keeping such visions in mind as technology development proceeds.  Like chess playing and algebraic calculation, global program optimization is a task at the borderline of AI and mathematics; and an area where computational methods may achieve great things via their profoundly non-human ability to attend to a large number of details at once, and synthesize these details in a goal-directed way.  

The first Java supercompiler will be, loosely speaking, a cousin of programs like Deep Blue and Mathematica, which succeed by applying relatively simple algorithms on a scale of detail that the human mind cannot handle.  Later supercompiler versions will go far beyond this approach, incorporating more and more sophisticated configuration analysis heuristics, including some which learn from experience supercompiling different programs, and some which adaptively perform dynamic supercompilation based on patterns observed in the runtime behavior of a program.  Of course, optimization of speed may be coupled with optimization of memory usage – a strategy that will have particular power once a supercompiler is given the ability to model the behavior of the built-in Java garbage collector, or plug its own more specialized garbage collection routines into the JVM (for recent work with superior pluggable Java garbage collectors, see Blackburn et al, 2001).   And ultimately, a supercompiler should even be able to perform optimizations aimed at approximating rather than exactly simulating the behavior of a program, which may be very valuable in cases where speed of performance is more important than precision. 

 

Supercompilation and Programming Practice

 

Returning to more immediate and pragmatic considerations, what impact can we realistically expect supercompilation technology to have on practical programming, during the next 5-10 years?   At present we do not have a complete, generally effective supercompiler for Java or for any commercially practical programming language.  However, substantial progress has been made in this direction already, and it seems plain that with at most a few more years of effort, a generally effective Java supercompiler will be completed.  As development proceeds, the category of examples to which the Java supercompiler can be successfully applied is continually expanding.

So, although at the moment Java supercompilation is an R&D project, development has progressed far enough that we have gained a reasonable intuitive understanding of the impact that the technology, when completed and widely distributed, may have upon the practice of software engineering. 

Optimizing compilers have had a huge impact on the efficiency of real-world C++ code, and HotSpot and other similar tools have had a huge impact on the efficiency of real-world Java code.  However, these are all relatively simplistic from an algorithmic perspective; they do not liberate the programmer from the need to create efficient sourcecode.  The advent of supercompilation technology for Java (or for other practical languages like C#) can be expected to have a significantly more profound effect. 

To take a somewhat oversimplified case, let us return to the FFT example given above.  Until a truly superintelligent supercompiler is available, it will definitely be valuable to write a program embodying the FFT algorithm, as opposed to some less algorithmically efficient approach to computing the Fourier transform.  However, with even a primitive supercompiler available, there is no point in trying to optimize one’s general FFT code for speed in any way.  One may as well write the code as simply, transparently and maintainably as possible, because the supercompiler is going to do the code optimization automatically, using techniques far too tedious for humans to realistically emulate.

Similarly, if one is writing a language interpreter, there is no need to hack one’s interpreter to make it maximally efficient for the specific grammar one needs interpreted.  Rather, one can write the language interpreter in a simple, general and maintainable way, and let the supercompiler do the specialization of the interpreter for a given grammar.  This is an area in which significant experimentation with the Java supercompiler is going on right now, focusing on popular scripting languages such as XSLT, Perl and Tcl.  On cursory consideration, this looks like a pure program specialization application, but when one studies the actual code of language interpreters, one realizes that program specialization will not remove the majority of the inefficiencies, so that the fuller supercompilation approach is required.

The list of examples can be multiplied indefinitely.  For transaction management, one can use a general framework and let the supercompiler specialize it to the particular set of transactions involved in a given application.  In a graphics application, one can write general ray tracing and radiosity algorithms without worrying much about performance, and let the supercompiler optimize them for performance on rendering particular scenes of interest.  And so forth.

In fact, the changes in programming methodology induced by supercompilation technology may run a little deeper than just allowing programmers to focus on comprehensibility, maintainability and other merits besides efficiency.   There is also an art to writing maximally supercompilable code.  The code that is most easily sped up by the supercompiler is not necessarily the most efficient code, nor is it always the most comprehensible and maintainable code (though usually, comprehensible and maintainable code will be highly supercompilable as well).  This is an issue that will become less and less significant as supercompilers become more and more sophisticated.

Like most revolutions in science and engineering, the supercompilation revolution can be expected to be gradual.  Initially the supercompiler will be one handy optimization tool among many, to be used alongside HotSpot and other simpler techniques, extremely valuable for some types of programs and only moderately useful for others.  As the technology advances further, the changes it induces will be more and more profound, eventually, we believe, leading to entirely new paradigms of software engineering practice.  The art and science of computer programming is very young as yet, and the advent of technologies such as supercompilation is part of its gradual and inevitable maturation.

 

References