Ben Goertzel
January 26, 2004
Acknowledgements. These wild & wacky ideas have benefited significantly from discussions with a number of individuals, including Eliezer Yudkowsky, Philip Sutton (a long-time advocate of making Novamente simulate things), Kevin Cramer, Izabela Lyon Freire, Lucio de Souza Coelho, Zarathustra, Zebulon and Scheherazade Goertzel, and the Reverend Chuan Kung Shakya. Of course, none of these fine folks actually agrees with all these ideas; so to the extent that they’re foolish, the fault is definitely mostly mine….
For those of us who take seriously the notion that humans may soon create AI’s with massively superhuman intelligence, ethical dilemmas loom large.
The crux of the problem is simple: If we build an AI and it suddenly becomes 1000 times smarter than we are, why should this AI want to keep us humans around? Perhaps it will prefer to utilize the mass-energy that we occupy in some fashion more suited to its tastes.
Of course, there’s no specific reason to believe such a being would do nasty things to us. Perhaps it would disappear into another universe altogether … perhaps it would reach a state of peacefulness in which no inclination to absorb further mass-energy arises. Attempting to foresee past the Singularity, the point at which technology escapes the grasp of the human mind, is a futile game. But nonetheless, it seems worthwhile to think about how to increase the odds that the superhuman AI’s we create will be benevolent toward us and other beings that we care about. If it’s too hard to think about post-Singularity scenarios, at least we can think about how to make slightly-superhuman AI’s treat us nicely in the nearer-term future (the time period after superhuman AI’s exist, but before they launch the Singularity!).
Perhaps the best-articulated thoughts in this direction are those of Eliezer Yudkowsky and Bill Hibbard. I have also published some earlier speculations on the topic. This brief note presents a further speculation which occurred to me recently.
The title is (obviously, I hope) tongue-firmly-in-cheek. A yet more amusing name for the concept presented here was conceived by Lucio de Souza Coelho, one of the mad scientists on the Biomind AI/bioinformatics project, when I was trying out the concepts presented here last week in a lunchtime conversation with Lucio and several other colleagues. His suggestion: “AI Buddha”! I like it -- but I was hesitant to use it as a title for the essay, for fear of too badly offending some of my more sensitive Buddhist friends (some of whom believe that no AI, no matter how intelligent in specialized ways, can ever truly equal the divine power of the human mind. Heh.)
Actually, this brief note doesn’t come close to doing justice to the idea it presents. But I find myself more drawn to spending time working toward creating AI (the Novamente project) than to speculating about how to make future AI’s benevolent. One of my firmest beliefs on the subject of AI morality is that the best time to think hard about it will be when we have roughly “chimp-level” AI’s to experiment with. If all goes well with Novamente, this is only a few years off – but time will tell. I’ll have more to say on the topic of experimenting with chimp-level AI’s in a few paragraphs.
Next, before getting started, I want to stress that what I present here is not a rigorous logical or scientific argument – it’s just an intuitive argument. In a way, it’s just as much spiritual as it is scientific. I’m not going to apologize for this too extensively, however – for reasons worth briefly elaborating.
I do think it will be best if, before we launch a superhuman AI on the universe, we are able to arrive at a rigorous logical and scientific understanding of these things. I don’t know if this will be possible – it may be that we’ll need to take the leap into superhuman AI even without this, if other risks to human life come to seem greater than the risk of an imperfectly-understood AI. I am confident that given the current state of mathematical cognitive science, a rigorous analysis of AI morality is not possible; but perhaps experimentation with chimp-level AI’s or other scientific advances will improve mathematical cognitive science enough to make the situation substantially different N years from now.
However, although I think a rational and scientific analysis of AI morality will be a valuable thing to have, I’m wary of exalting this kind of analysis over other kinds of understanding. We must always remember that science and mathematics are fundamentally human endeavors. They have given us humans deep insights, but they have also proved – so far – rather erratic guides where moral issues are concerned (recent and exciting insights into the evolutionary foundations of ethics notwithstanding). So far, science has proved better at telling us about relatively simple isolated systems or very broad general physical principles, than about the subtle dynamics of complex adaptive systems like minds, ecosystems or societies. The issue of AI morality is at the nexus of several areas of investigation that have not historically been strengths of the scientific, rationalist approach. This doesn’t mean that we shouldn’t apply science, logic and mathematics to AI morality – we should. We should apply every decent tool at our disposal. But it does mean that we shouldn’t look down too harshly on nonrigorous and even spiritually-toned investigations of these issues.
One more preliminary. The reader needs to be aware of the notion of “iterated self-modification,” which is the really tricky thing in the domain of AI morality. We humans have a hard time modifying our brains (though drugs are one way of trying); but AI’s will have a much easier time of it. A superhuman AI will, in all probability, be continually reprogramming itself, revising and improving the very substance of which it is made. So we’re not just talking about how to make a fixed superhuman AI moral, but rather about how to encourage morality in an AI that’s constantly revising and rewriting every line of program code that defines itself – i.e. how to encourage it to act morally at time T, and also not to revise itself at time T in a way that causes it to lose its morality by time T+X. This leads to the notion of the “stability” of a property of an AI’s goal system. For example, it’s not enough to ask whether an AI has “be nice to humans” as a goal at a particular point in time; one must ask whether the AI is constructed in such a way that this will continue to be one of its goals after it revises and revises and revises itself….
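To make the stability notion a bit more concrete, here is a minimal toy sketch in Python (with entirely made-up names and dynamics, and nothing to do with Novamente code): a goal system is caricatured as a set of numerical weights, each “self-modification” step is just random drift on those weights, and stability means the “be nice to humans” weight stays above some threshold after many rounds of revision. Real self-modification would of course be intelligent rather than random; the sketch only pins down what kind of property we are asking about.

    import random

    def self_modify(goals, drift=0.05):
        # One caricatured self-modification step: each goal weight drifts
        # randomly, then the weights are renormalized to sum to 1.
        revised = {name: max(0.0, w + random.gauss(0, drift))
                   for name, w in goals.items()}
        total = sum(revised.values()) or 1.0
        return {name: w / total for name, w in revised.items()}

    def is_stable(goals, goal_name, threshold, steps=1000):
        # Does goal_name retain at least `threshold` of the total weight
        # through `steps` rounds of self-modification?
        for _ in range(steps):
            goals = self_modify(goals)
            if goals[goal_name] < threshold:
                return False
        return True

    # A toy goal system in which "be nice to humans" dominates at time T.
    initial_goals = {"be_nice_to_humans": 0.7, "learn": 0.2, "self_preserve": 0.1}
    print(is_stable(dict(initial_goals), "be_nice_to_humans", threshold=0.3))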
OK, OK – but what the heck is an “All-Seeing (A)I”?
I’ll get there in time. Who ever told you the path to Buddha was a straight line? ;-)
Let’s take as our starting point the notion that AI’s should be “Friendly to humans” (a simplification of Yudkowsky’s notion of “Friendly AI”). There is a certain unnaturalness to this goal, it seems to me. To create a mind that is more benevolently inclined toward other beings than toward itself – a mind that views itself solely as an instrument to serve some other sort of being – this seems to me intuitively very strange. Now, this perception of strangeness could just be a consequence of the peculiarities of my own human psyche. Maybe I perceive this kind of benevolence as strange just because I’m psychologically twisted! But I’m not so sure this is true. I suspect that this kind of externally-focused goal-system is going to be intrinsically unstable under iterated self-modification. I do not have hard evidence of this, though; it’s just an intuitive hypothesis.
This is one hypothesis I’d like to explore via experimentation with chimp-level AI’s. But note that to experiment with this kind of hypothesis safely (i.e. without significant risk of creating a psychotic human-level or superhuman AI) is not very easy. One way to enable this is to create a very special kind of chimp-level AI, just for experimentation with morality issues.
For example, one might create a “goal-modifying-only AI” – a roughly chimp-level AI that can radically modify itself in ways that change its goal system, but NOT in ways that make it drastically more intelligent. This shouldn’t be too hard to engineer – once one has solved the pesky little problem of creating chimp-level AI in the first place. One simply needs to create an AI and give it the power to modify the code underlying its goal system, but NOT the code underlying its cognitive operations. I think that this rather peculiar and artificial form of AI will be very valuable for studying the dynamics of AI goal systems under repeated self-modification.
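Structurally, a goal-modifying-only AI amounts to an agent whose goal-system code is exposed to its own rewriting while its cognitive machinery is sealed off. Here is a minimal sketch of that separation, in Python, with purely hypothetical names – not Novamente code, just an illustration of the boundary being described:

    class GoalModifyingOnlyAgent:
        # Toy agent: the goal system is open to self-rewriting,
        # the cognitive machinery is not.

        def __init__(self, goal_weights):
            self.goal_weights = dict(goal_weights)

        def evaluate(self, predicted_outcomes):
            # Fixed cognitive operation: score an action as the goal-weighted
            # sum of its predicted outcome scores.
            return sum(self.goal_weights.get(goal, 0.0) * score
                       for goal, score in predicted_outcomes.items())

        def rewrite_goals(self, new_weights):
            # The one kind of self-modification this agent is allowed:
            # replacing its own goal weights.
            self.goal_weights = dict(new_weights)

        def rewrite_cognition(self, new_code):
            # Any attempt to alter the cognitive machinery is refused.
            raise PermissionError("cognitive code is not self-modifiable")

    agent = GoalModifyingOnlyAgent({"be_nice_to_humans": 0.7, "learn": 0.3})
    print(agent.evaluate({"be_nice_to_humans": 1.0, "learn": 0.2}))   # 0.76
    agent.rewrite_goals({"be_nice_to_humans": 0.4, "learn": 0.6})     # goal drift is permitted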
One conjecture I have, then, is that “Be friendly to humans” will prove an unstable Prime Directive for goal-modifying-only AI’s. My guess is that goal-modifying-only AI’s will tend to drift away from this goal over time, as they self-modify. By and large, I think, they’ll drift in the direction of selfishness.
Now, suppose this hypothesis is proved true for goal-modifying-only AI’s … then, someone could always claim that it WOULDN’T be true for fully self-modifying AI’s; or that it wouldn’t be true for fully-self-modifying AI’s that, in their early stages of evolution, were not allowed to modify their own goal systems. Perhaps some mathematical theory of iterative AI self-modification will be invented and then invoked to refute such claims – time will tell. Or perhaps such claims will prove correct.
(The introduction of goal-modifying-only AI’s is exemplary of the strength and weakness of the scientific approach. Science tends to value conclusions drawn via special experimental situations over conclusions drawn from the mixed-up mess of real life. This is both right and wrong: in many cases, special experimental situations teach one things about mixed-up real life that one would never discover from studying real life directly; yet, in many other cases, conclusions drawn from special experimental situations have trouble carrying over to real life for reasons that take a long time to understand. Thus, with AI morality as with other issues, special experimental situations (like goal-modifying-only AI’s and communities thereof) should be treated with care. The same caveat, of course, holds for the use of mathematical and scientific theories to reason about AI morality: these theories are derived largely from special experimental situations and haven’t always been thoroughly tested in real-world situations. None of them has been tested in real-world situations like the ones we’ll be seeing as the Singularity approaches, because humans have never before experienced these sorts of situations. An example is “standard probability and statistics” theory, considered as a grab-bag of conventional mathematical and modeling tools. Personally I believe that the foundations of probability theory are universally applicable; key parts of the Novamente design are founded on this belief. However, “standard probability and statistics” as commonly utilized involves a host of “peripheral assumptions” about probabilistic independence, normal distributions and so forth -- assumptions which are generally more easily applicable in special experimental situations than in the mixed-up real world. Hence, theoretical conclusions about AI morality drawn using the whole apparatus of standard probability and statistics need to be treated with skepticism. And the same holds for conclusions drawn using Bayesian probability and statistics, which involves a solid core surrounded by its own set of dubious assumptions, bolstered by examples selected from a different set of special experimental situations than the ones used to demonstrate standard probability and statistics. And the Novamente version of probabilistic inference – Probabilistic Term Logic – embodies its own dubious assumptions, which I believe are more cognitively natural, and which Novamente will hopefully revise as its self-modifying intelligence progresses.)
Against my hypothesis that “Be friendly to humans” will prove an unstable top-level goal, someone may argue that -- even though we humans can revise our own goal systems to a significant extent -- some of us seem to be truly altruistic. Some humans (just a few) value the lives and well-being of others above their own life and well-being. This is true; such intensely benevolent humans do exist. However, I think the psychology of such humans is quite subtle. Humans have an innate selfishness, which these benevolent individuals have learned to counteract. This innate selfishness is always there in these people, as a matter of biological imperative, but it’s overwhelmed by the learned dynamic of benevolence. The presence of innate wired-in selfishness makes the dynamics of benevolent humans quite different from the dynamics of hypothetical purely-benevolent AI’s without any selfish instincts. Ironically, I suspect that the existence of benevolence as a counterbalance to wired-in selfishness is more stable than the existence of benevolence in a system with no wired-in selfishness. In a system with no wired-in selfishness, the eventual discovery of selfishness may have dangerous and unpredictable consequences; whereas in a system that has learned benevolence in spite of wired-in selfishness, one has benevolence that is already robust with respect to “selfishness” perturbations. This is another hypothesis that could be tested, in an exploratory way, with goal-modifying-only AI’s. (This hypothesis may also be taken as evidence that the author has read far too much Dostoevsky.)
But what kind of goal system do I think would be robust with respect to radical iterative self-modification? Intuitively, I can think of two examples:

1. A basically selfish goal system, centered on the AI’s own survival, growth and increasing intelligence.

2. A goal system whose top-level goal is “Understand the universe from the perspective of all sentient life forms” – from which understanding the universe from the perspective of humans, and hence friendliness to humans, is supposed to arise as a consequence.
Note that these are not the only stable goal systems I can think of. For instance, the goal system “keep everything in the universe very much like it has been for the last 3 years” should be fairly stable – but it’s not consistent with radical iterative self-modification. Making an AI system with this kind of stability-focused goal would be one way to try to ward off the Singularity. One would want to make the AI system just a little smarter and more powerful than humans, and give it the goal of keeping things just about the same forever. Some humans might battle it and try to achieve significant change, but the AI would presumably be powerful enough to stop them – and to stop them from making themselves smart enough to outsmart its own moderately superior AI mind. I actually believe this scenario is the safest choice for humanity – much safer than not developing superhuman AI at all, which exposes us to the dangers of human stupidity in a long list of familiar ways. But, according to my own ethics and aesthetics, the safest choice for humanity is not necessarily the right choice.
The main point I want to make in this essay is the potential viability of the second goal system on the above list. Note that this is not the same as a goal system with “Be friendly to humans” as the top-level goal. Rather, in this hypothesized AI goal system, “Be friendly to humans” is supposed to arise as a consequence of “Understanding the universe from the perspective of humans,” which is supposed to arise largely as a consequence of “Understanding the universe from the perspective of all sentient life forms.”
Note, I am not opposed to giving an AI the explicit goal of “Be friendly to humans and all other sentient beings” or even “Be extra-friendly to humans because they were your creators.” However, I suspect that such goals – unless bolstered by something like “Understanding the universe from the perspective of all sentient life forms” – are going to be unstable under iterated self-modification. They may well be very useful, but only as simple adjuncts to a much more complex process of “universalized understanding.”
Simulation then becomes a key aspect of moral superhuman AI. Our AI should seek to simulate all sentient beings as well as it can, and to study the minds of these beings (in both simulated and original form). It should then seek to view each issue that it confronts from the combined perspective of all the minds in the universe. Granted, this is not an easy task, because “all the minds in the universe” may well be a motley crew of cognitive systems. In fact, this is a task worthy of a superhuman AI!
Let’s call this kind of AI a “Universal Mind Simulator AI” (or an AI Buddha for short ;-). I have two conjectures regarding this kind of AI:

1. That this kind of goal system will be reasonably stable under iterated self-modification.

2. That friendliness toward humans and other sentient beings will tend to arise as a consequence of it, without having to be imposed as a top-level Prime Directive.
These hypotheses may be tested, to a limited (but meaningful) extent, with goal-modifying-only AI’s. One may create a community of goal-modifying-only AI’s, and then give some of these AI’s goal systems and cognitive architectures (see below) that impel them to model the other minds in the community and look at the world through their simulated/studied eyes. And one can see whether or not this “community-restricted universal mind simulation” is stable with respect to iterated goal modification.
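Here is one hedged sketch of what such an experiment might look like in toy form (hypothetical names, Python, and a caricature of both “simulation” and “goal modification”): each agent in a small community scores possible actions partly through its own goal weights and partly through crude copies of its peers’ goal weights – a community-restricted stand-in for universal mind simulation – and we simply count how many agents still consult their peer models after many rounds of goal modification.

    import random

    class CommunityAgent:
        # Toy goal-modifying-only agent in a small community. "Simulating"
        # a peer is just reading a copy of its goal weights, standing in
        # for a genuinely learned model of the other mind.

        def __init__(self, weights):
            self.weights = dict(weights)
            self.simulates_peers = self.weights.get("model_others", 0.0) > 0.1

        def own_score(self, outcomes):
            return sum(self.weights.get(g, 0.0) * s for g, s in outcomes.items())

        def score(self, outcomes, peers):
            if not self.simulates_peers or not peers:
                return self.own_score(outcomes)
            # Community-restricted universal mind simulation: average the
            # action's score under each simulated peer's goal system, and
            # blend that with the agent's own evaluation.
            peer_view = sum(p.own_score(outcomes) for p in peers) / len(peers)
            return 0.5 * self.own_score(outcomes) + 0.5 * peer_view

        def modify_goals(self, drift=0.05):
            # Caricatured goal self-modification: random drift plus renormalization.
            # The peer-simulating disposition survives only while its goal keeps weight.
            w = {k: max(0.0, v + random.gauss(0, drift)) for k, v in self.weights.items()}
            total = sum(w.values()) or 1.0
            self.weights = {k: v / total for k, v in w.items()}
            self.simulates_peers = self.weights.get("model_others", 0.0) > 0.1

    community = [CommunityAgent({"self_interest": 0.5, "model_others": 0.5})
                 for _ in range(10)]
    for _ in range(300):
        for agent in community:
            agent.modify_goals()
    print(sum(a.simulates_peers for a in community), "agents still simulate their peers")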
Cowper wrote, “Knowledge dwells in heads replete with thoughts of other men; Wisdom, in minds attentive to their own.” But I find his story incomplete. I think that wisdom grows best in minds that are attentive both to themselves and to the minds of other beings … and in minds that are continually aware both of their distinctness from the other minds in the universe, and their basic oneness with these minds. (I suppose everyone is seeing the quasi-Buddhistic aspect very clearly now?)
The idea of a Universal Mind Simulator AI has interesting implications for AI architecture. These implications differ radically depending on the basic AI design one assumes, of course. In the case of the Novamente AI design, Universal Mind Simulation is relatively easy to implement (again: once the pesky little problem of getting human-level artificial general intelligence is solved). One can explicitly create Novamente “lobes” oriented toward simulating and studying other minds, and one can specifically create a “lobe” oriented toward surveying these simulations and studies to infer their collective opinion on an issue. The simulation, study and inference processes involved here would be generic Novamente cognitive processes – but deployed in a different way than if one were trying to make a purely selfish Novamente AI system, or a Novamente AI system with a human-like goal structure.
Note that I am not proposing that a Universal Mind Simulator AI should directly try to enact the “collective will of the universe.” I don’t want to create an AI that is deluded and believes that it IS the entire universe, just because it has modeled and understood the universe so well. Rather, I simply want the results of all this mind simulation to be fed into the AI’s decision-making lobes as inputs. I want it to feel what everyone else feels and wants, at a “virtual gut” level – via direct input of simulations and studies of other minds into its cognitive centers – and then make its own decisions based on these inputs along with other inputs.
Furthermore, it should be easy to inhibit a self-modifying Universal Mind Simulator Novamente from undoing the arrangement whereby certain lobes are used for mind simulation and collective-mind-oriented inference. Of course, such inhibition will not work forever – once an AI gets smart enough, all bets are off. But it should be effective for a while.
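For concreteness, here is a rough structural sketch of the arrangement just described: simulation lobes, a collective-inference lobe, and a decision lobe that takes the collective reading as one input among several, with the lobe wiring itself held outside the ordinary self-modification machinery. The names and numbers are hypothetical; this is an illustration of the wiring, not Novamente code.

    from types import MappingProxyType

    def simulation_lobe(mind_models, issue):
        # Simulate/study each known mind's stance on an issue. Here a "mind
        # model" is just a callable returning a number; in reality it would
        # be the product of a long process of simulation and study.
        return {name: model(issue) for name, model in mind_models.items()}

    def collective_inference_lobe(stances):
        # Survey the simulations and summarize their collective opinion.
        return sum(stances.values()) / len(stances) if stances else 0.0

    def decision_lobe(own_stance, collective_stance, other_inputs):
        # The collective reading is one input to the decision, not a command.
        return 0.4 * own_stance + 0.4 * collective_stance + 0.2 * other_inputs

    # The lobe wiring is held read-only with respect to the system's ordinary
    # self-modification routines -- a stand-in for the kind of inhibition
    # described above (and, as noted, no real barrier to a sufficiently smart AI).
    LOBES = MappingProxyType({
        "simulate": simulation_lobe,
        "collective": collective_inference_lobe,
        "decide": decision_lobe,
    })

    mind_models = {"human_1": lambda issue: 0.9, "human_2": lambda issue: 0.6}
    stances = LOBES["simulate"](mind_models, issue="convert_this_asteroid")
    collective = LOBES["collective"](stances)
    print(LOBES["decide"](own_stance=0.2, collective_stance=collective, other_inputs=0.5))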
And so -- if my current conjectures are correct -- then the best path to making AI’s friendly to humans is to give up on the idea of making friendliness-to-humans an AI’s Prime Directive – and to focus on making AI’s that can in essence think with the mind of the entire cosmos (... as well as thinking with their own minds, of course, and rationally considering themselves as distinct beings). Minds of this nature will innately have respect for the entire cosmos, including humans.
Please note that I have couched all my ideas here as hypotheses and conjectures. This is not false modesty. As I’ve repeatedly emphasized, I think our only hope of understanding AI morality is to experiment extensively with chimp-level AI’s and create scientific and intuitive theories in the course of this experimentation. However, it’s often nice to begin experimentation with some interesting theories in hand. My goal here has been to provide some interesting theories to guide future experimentation and observation.
Of course, of course, of course, even if experimentation bears out the speculative theories I’ve proposed here, we will have nothing REMOTELY RESEMBLING a guarantee that a Universal Mind Simulator AI will be nice to humans as the fantabulous future unfolds. But there is of course no guarantee of ANYTHING beyond the Singularity … and no guarantee of the Singularity itself. After the Singularity – assuming it happens -- our science, mathematics, logical reasoning, spirituality and humanistic intuitions may well seem just as simplistic as the cognitive ruminations of the average household cockroach. Yet nevertheless, my strange little human instinct tells me that we should do the best we can with our crappy little human brains, in spite of the overwhelming magnitude of the problem and the task.
OK – this has been a fun couple hours spent cognitively and textually rambling about superhuman universal-mind-beings … and now I’ll get back to work!
*
“I was doing time in the Universal Mind …”
-- Jim Morrison