OK, We’ve Mapped the Human Genome, Now What?

Ben Goertzel

April 16, 2000

My DNA, a fabulously long chain of nucleotides, a copy of which is contained in nearly every one of my cells, contains a large percentage of the information required to produce me, Ben Goertzel.

This is an amazing thing, really.

Extract my DNA from any one of my cells, and feed it into a “human-producing machine,” and out comes a clone of Ben Goertzel, lacking my knowledge and experience, but possessing all my physical and mental characteristics. Of course, we don’t have a human-producing machine of this nature just yet, but the potential is there: DNA seems to encode most of the information required to produce a human being.

This is the glory and the romance underlying the Human Genome Project, a huge initiative launched in 1990, which aims to chart the whole human genome, to map every single nucleotide base in the DNA of some sample of human beings. No one could doubt the excitement of this quest: It has a simplicity and grandeur similar to that of putting a man on the moon.

Once you get past the excitement and mystique and into the details, however, the Human Genome Project slowly begins to seem a little less tremendous. One realizes that the actual mapping of the genome is only a very small part of the task of understanding how people are made, and that, in fact, the design of the “human-producing machine” is a much bigger and more interesting job than the complete mapping of examples of the code that goes into the machine. In other words, embryology is probably a lot subtler than genetics, and in the end, much like putting a man on the moon, the Human Genome Project is a task whose scientific value is not quite equal to its cultural and psychological appeal.

The project was originally planned to last 15 years, but rapid technological advances have accelerated the expected completion date to 2003. The project goals are multifold: to identify all of the estimated 100,000 or more genes in human DNA, determine the sequences of the 3 billion chemical base pairs that make up human DNA, store this information in databases, and develop tools for the analysis of this huge amount of data. Some resources have also been devoted to exploring the ethical, legal, and social issues that may arise from the project.

There have been a couple of recent milestones on the path to completion of the Human Genome Project. Just a couple of weeks ago, scientists completed the mapping of chromosomes 16 and 19 of the human genome. Human chromosome 19 contains about 2% of the human genome, including some 60 genes in a gene family involved in detoxifying and excreting chemicals foreign to the body. Chromosome 16 contains about 98 million bases, or some 3% of the human genome, including genes involved in several diseases, such as polycystic kidney disease (PKD), which is suffered by about 5 million people worldwide and is the most common potentially fatal disease caused by a defect in a single gene.

Clearly, these are major advances in gene mapping, with potential implications for helping remedy diseases. But -- what do they really mean?

An analogy may be instructive. Suppose a team of scientists goes to another planet, and discovers a lot of really long strips of paper lying around on the ground, each one with strange markings on it. Suppose they then notice some big steel machines, with slots that seem to be made to accept the strips of paper. After some experimentation, they figure out how the machines work: You feed the strip of paper in one end, and then after a few hours, the machine spits out a completely functional living organism. Amazing!

So, the scientists embark on a project to figure out what’s going on here. Of course, they have no idea what’s going on inside the machines, and all their efforts to bust the machines open meet with failure. So, instead, they devote themselves to completely recording all the markings on the strips of paper in their notebooks, hoping that eventually the patterns will come to mean something to them. When they achieve 10%, then 20%, then 50% completion of their task of recording these meaningless patterns in their notebooks, they declare themselves to have made significant scientific progress.

And occasionally, along the way, they make some small discoveries about the impact that the markings have on the organisms the machine produces. If you snip off the first 10% of the strip, the organism produced is more likely to be defective than if you snip off the last 10%. The region of the strip that’s 2000 to 3000 markings from the end seems to have something to do with the organism’s head: it seems to be very different for organisms with very different heads, and so forth. But these kinds of general observations don’t really get them very far toward an understanding of what the amazing steel machines are actually doing.

If you’re somewhat familiar with computers, a variation on this analogy may be instructive. Consider a large computer program such as Microsoft Windows. This program is produced via a long series of steps. First, a team of programmers produces some program code, in a programming language (in the case of Microsoft Windows, the programming language is C++, with a small amount of assembly language added in). Then, a compiler acts on this program code, producing an executable file – the actual program that we run, and think of as Microsoft Windows. Just as with human beings, we have some code, and we have a complex entity created by the code, and the two are very different things. Mediating between the code and the product is a complex process – in the case of Windows, the C++ compiler; in the case of human beings, the whole embryological and epigenetic biochemical process, by which DNA grows into a human infant.

Now, imagine a “Windows Genome Project,” aimed at identifying every last bit and byte in the C++ source code of Microsoft Windows. Suppose the researchers involved in the Windows Genome Project managed to identify the entire source code, to within 99% accuracy. What would this mean for the science of Microsoft Windows?

Well, it could mean two different things.

Option 1: If they knew how the C++ compiler worked, then they’d be home free! They’d know how to build Microsoft Windows!

Option 2: On the other hand, what if they not only had no idea how to build a C++ compiler, but also had no idea what the utterances in the C++ programming language meant? In other words, they had mapped out the bits and bytes in the Windows Genome, the C++ source code of Windows, but it was all a bunch of gobbledygook to them. All they had was a large number of files of C++ source code, each of which was a nonsense series of characters. Perhaps they recognized some patterns: older versions of Windows tend to be different in lines 1000-1500 of this particular file. When file X is different between one Windows version and another, this other file tends to also be different between the two versions. This line of code seems to have some effect on how the system outputs information to the screen. Et cetera.

Our situation with the Human Genome Project is much more like Option 2 than it is like Option 1.

The scientists carrying out the Human Genome Project are much like the scientists in my first parable above, who are busily recording the information on the strips of paper they’ve found, but have no idea whatsoever what’s going on inside the magical steel machines that actually take in the strips of paper and produce the alien animals.

Moving beyond analogies, let’s talk briefly about a real project related to the Human Genome Project: the Fly Genome Project. In the 24 March 2000 issue of Science magazine, in a series of articles jointly authored by hundreds of scientists, technicians, and students from 20 public and private institutions in five countries, the almost-complete mapping of the genome of the fruit fly Drosophila melanogaster was announced. Hurray! Some other species of fly have also been similarly mapped.

The fruit fly Drosophila has a big history in genetics; its study has yielded a long series of fundamental discoveries, beginning with the proof, in 1916, that the genes are located on the chromosomes. Now all of its 13,601 individual genes have been enumerated.

This achievement may have some practical value. In a set of 289 human genes implicated in diseases, 177 are closely similar to fruit fly genes, including genes that play roles in cancers, in kidney, blood, and neurological diseases, and in metabolic and immune-system disorders.

But, my point is: OK, we have the fruit fly genome mapped, to within a reasonable degree of accuracy. Now what? Wouldn’t it be nice to understand the process by which this genome is turned into an actual fly?

The Human Genome Project includes under its umbrella a focus on data analysis. This refers mainly to designing and implementing computer programs that study the huge sequences of nucleotide bases that biologists have recorded, and look for patterns in these sequences. This is fascinating work, but it is a long way from a principled understanding of how DNA is turned into organisms.

For example, Luis Rocha and his colleagues at Los Alamos National Laboratory are working on identifying regions of the genome that are similar to each other, based on statistical tests. This kind of similarity mining gives biologists a hint that two parts of the genome may work together at some stage during the process of forming an organism. Similar statistical methods may be useful for recognizing where genes begin and end in a collection of DNA sequences – a problem that’s surprisingly tricky, and may require comparison of human sequences with sequences from related species such as the mouse or the fruit fly.
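To give a rough feel for what “similarity mining” can mean in practice, here is a minimal sketch in Python. It is not the method used by the Los Alamos group; it simply swaps in one of the simplest possible statistics, the overlap between the sets of short substrings (often called k-mers) found in two stretches of DNA. The function names and the toy sequences are made up for illustration.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a, b, k=3):
    """Share of distinct k-mers that two sequences have in common (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        # Both sequences are shorter than k, so there is nothing to compare.
        return 0.0
    return len(ka & kb) / len(ka | kb)


# Hypothetical toy regions, invented for this example (not real genome data).
region_a = "ATGCGTACGTTAGC"
region_b = "ATGCGTACGATAGC"  # region_a with a single letter changed
region_c = "TTTTTTTTCCCCCC"  # an unrelated-looking stretch

print(jaccard_similarity(region_a, region_b))
print(jaccard_similarity(region_a, region_c))
```

A score near 1 says two regions are built from mostly the same short fragments, and a score near 0 says they share almost none. Note what the number does not say: it gives no hint of what either region actually does in the making of an organism, which is exactly the limitation at issue here.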

The relation between 1-D sequences of amino acids and the 3-D structures formed from these sequences is hard for scientists to understand even on the simplest level. The big problem here is what’s known as “protein folding.” Many stretches of DNA encode instructions for the formation of proteins. But no one knows how to predict, from the series of amino acids making up a protein, what that protein is going to look like once it folds up in three-dimensional space. This is important because many proteins that look very different on the one-dimensional, molecular-sequence level may look almost identical once they’ve folded up in 3 dimensions. Thus, by focusing on sequence-level analysis, researchers may be scrutinizing differences that make no difference. Currently, only very few 3-D protein motifs can be recognized at the sequence level.

Basically, we barely understand the simplest stages of the production of 3-dimensional structures out of DNA, let alone the complex self-organizing processes by which DNA gives rise to organisms. This is OK – mapping DNA is still of some value even in this situation – but it must be clearly understood. In practical terms, our lack of knowledge of embryological processes greatly restricts the use we can make of observed correlations between genes and human characteristics such as diseases. There are diseases whose genetic correlates have been known for decades, without any serious progress being made toward treatment. When DNA researchers announce that they’ve mapped the portion of the human genome that is correlated with a certain disease, it doesn’t mean very much in medical terms.

Does all this mean that the Human Genome Project is bad – wasted money, useless science? Of course not. However, it does suggest that perhaps the government is allocating its research money in an imbalanced way. By pushing so hard and so fast for a map of the human genome, while not giving a proportionate amount of research money to studies in embryology and the general study of self-organizing pattern formation, the US government is guaranteeing that we are going to arrive at a map of the human genome that we cannot use in any effective way.

And this brings us to some very deep and fascinating questions in the philosophy of science. As the biological theorist Henri Atlan pointed out in an essay written right around the start of the Human Genome Project, the mapping of the human genome is a very reductionist pursuit. In fact it is almost the definition of reductionism -- the construction of a finite list of features characterizing human beings. All of humanity, reduced to a list of nucleotide bases in order – imagine that! Wow!

On the other hand, the formation of organisms out of DNA is a very non-reductionist process, which biologists of the nineteenth century attributed to a “vital force” underlying all living beings. Modern scientists have still not come to grips with the scientific basis for this apparent vital force, which builds life out of matter. There are disciplines of science – cybernetics, systems theory, complexity science – which attempt to solve this problem, but these have not been funded nearly as generously as gene mapping, and they have not been linked in any serious way with the work on data analysis of genetic sequences. I believe that the study of embryology has the potential to overthrow many of our established ways of doing science, by shifting the focus of attention to complex, self-organizing processes and the emergence of structure. But this “complexity revolution” is something that the scientific establishment seems determined to put off as long as it possibly can.

In this sense, one can see the Human Genome Project as an outgrowth of modern cultural trends extending beyond the domain of science. It’s an expression of the quest for understanding, and also of the illusion that reductionism is the path to understanding. It’s an expression of our inability as a culture to come to grips with the wholeness of life and being, and focus on the seemingly magical processes by which life is formed from the nonliving, and structure emerges from its absence.

But the wonderful thing about science is that it’s self-correcting. Ultimately science is all about the data and the conclusions that can be drawn from it. We’ll go ahead collecting data on the human genome, and postponing any serious focus on how the genome interacts with its chemical environment to self-organize into the organism. But once the data is collected, and scientists need to do something with it, then attention will gradually shift to these subtle self-organization processes. Eventually we really will understand not just what nucleotide bases make up a human being’s genetic material, but how a human being is made. But, barring a real scientific revolution in the area of embryology, this is a long way away, much, much further off than the few years until the Human Genome Project reaches its completion.