We have seen that the Internet contains ample resources to support the learning of language, in its syntactic and many of its semantic aspects. But language is not a closed system. Discussion of the automatic understanding of language leads naturally to the question: how can an emergent Internet intelligence understand what concepts "mean"? A Net-embedded AI system will have access to vast amounts of text, for use in language understanding, but recognizing patterns of words' appearances in text can only get you so far. To achieve true intelligence, an AI system needs to go beyond this and relate concepts to its own experience. This is where the vast amounts of nonlinguistic data accessible to an Internet AI system become essential. Grounding concepts in experience requires a vast sensory universe, coupling redundancy with endless variety; and the Internet provides precisely this.
To take a very simple example, if an intelligent information retrieval system is asked by a user to "find me some big novels about love and politics," it should not merely look for occurrences of the phrase "big novels" in the texts in its memory; it should look for texts containing novels that are actually big. I.e., it should "ground" the symbol "big" by associating it with the "physical" parameter of file size.
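As a minimal sketch of what this kind of grounding amounts to computationally (the corpus, the made-up file sizes and the percentile cutoff below are illustrative assumptions, not part of any real retrieval system):

```python
# Grounding the symbol "big" (for texts) in the "physical" parameter of file
# size, relative to the rest of the collection. Sizes are invented for the sketch.

corpus = {
    "war_and_peace.txt":   3_200_000,   # file sizes in bytes (made-up values)
    "anna_karenina.txt":   2_000_000,
    "the_great_gatsby.txt":  300_000,
    "animal_farm.txt":       180_000,
}

def is_big(filename, collection, quantile=0.75):
    """Ground 'big' as: larger than the given quantile of file sizes in the collection."""
    sizes = sorted(collection.values())
    cutoff = sizes[int(quantile * (len(sizes) - 1))]
    return collection[filename] > cutoff

big_novels = [name for name in corpus if is_big(name, corpus)]
print(big_novels)   # ['war_and_peace.txt']
```

The point of the sketch is only that "big" is answered by inspecting a nonlinguistic parameter of the texts themselves, not by matching the word "big" in their contents.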
What I call "symbol grounding" -- expression of linguistic entities as combinations of nonlinguistic elements -- is typically classified under the label "semiotics." And semiotics ties in naturally with the concept of autonomy, a relationship that was drawn to my attention by Cliff Joslyn (currently of Los Alamos National Laboratories). A system that grounds its own symbols makes its own meanings, and is hence autonomous in linguistic space. This kind of autonomy in linguistic space seems to be a prerequisite for intelligent autonomy in physical space.
In order to rigorously explore the question of semiotics in regard to AI, one requires a precise definition of "meaning." The definition to be used here is the one supplied in Chapter 5 of Chaotic Logic (Goertzel, 1994), where it is suggested that the meaning of an entity is simply the set of all patterns related to its occurrence. For instance, the meaning of the concept "cat" is the set of all patterns, the occurrence of which is somehow related to the occurrence of a cat. Examples would be: the appearance of a dead bird, a litter box, a kitten, a barking dog, a strip dancer in a pussycat outfit, a cartoon cat on TV, a tiger, a tail, ...
It is clear that some things are related to "cat" more strongly than others. Thus the meaning of "cat" is not an ordinary set but a fuzzy set. A meaning is a fuzzy set of patterns. In this view, the meaning of even a simple entity is a very complex construct. In fact, as is shown in (Goertzel, 1994), meaning is in general incomputable in the sense of Gödel's Theorem. But this does not mean that we cannot approximate meanings, and work with these approximations just as we do with other collections of patterns.
This approach to meaning is very easily situated, in the sense of situation semantics (Barwise and Perry, 1981). The meaning of an entity in a given situation is the set of all patterns in that situation which are related to that entity. The meaning of W in situation S will be called the S-meaning of W. The degree to which a certain pattern belongs to the S-meaning of W depends on two things: how intensely the pattern is present in S, and how related the pattern is to W.
These ideas are not difficult to formalize. Consider a pattern in S as a process p whose result r_p is similar to S. Let s be a "simplicity function" mapping the union of the space of mental processes and the space of situations into the nonnegative real numbers; and let d be a metric on the space of mental processes, scaled so that d(r_p, S)/s(S) = 1/c represents an unacceptably large degree of dissimilarity. Then one reasonable definition of the intensity with which p is a pattern in S is given by a formula along the following lines.
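For instance, one might take

IN[p | S] = max(0, 1 - c d(r_p, S) / s(S)),

which equals 1 when the result r_p reproduces S exactly and falls to 0 as the scaled distance d(r_p, S)/s(S) approaches the unacceptability threshold 1/c. (This linear form is just one simple choice among many; it may also be weighted by a factor such as (s(S) - s(p))/s(S), so that only processes simpler than S register as intense patterns in it.)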
Next, let M_{W,S}(p) denote the degree to which p is an element of the S-meaning of W (where S is a situation and W is an object). Then one might, for instance, set
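M_{W,S}(p) = IN[p | S] * rel(p, W),

where rel(p, W) is a [0,1]-valued measure of how strongly the pattern p is related to W. (The product form is just one simple possibility; any combination that increases with both the intensity of p in S and the relatedness of p to W would serve the same purpose.)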
Finally, the relationship between meaning and intelligence is worthy of brief comment. Intelligence, as I have argued in (Goertzel, 1993) and reviewed above, should be considered as the capability to achieve complex goals in complex environments. A mathematical definition of "complexity" may be founded on the definition of "pattern" given above, the gist of which is that an entity is complex if it has a large number of different patterns in it. An environment is complex if it has many patterns in it; and a goal is complex if, when considered as a function mapping situations to degrees of goal achievement, its graph has a large number of different patterns in it. The collection of patterns in an environment or goal, which makes that environment or goal complex, is a fuzzy set of patterns. Since intelligence has to do with processing and enacting complexity, it requires manipulating fuzzy sets of patterns. Meanings of specific elements in the world may thus be viewed as subsets of the overall field of complexity in which intelligence operates. Isolating meanings is a way of breaking down complexity into parts -- i.e., it is an example of problem solving by division and reunification, the most basic problem-solving strategy of all.
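In symbols (a rough gloss rather than a precise definition): if IN[p | X] denotes the intensity with which p is a pattern in X, one may take the complexity of X to be roughly

c(X) = the sum, over all patterns p, of IN[p | X],

so that an environment or goal is complex to the extent that many distinct patterns are intensely present in it.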
A consequence of this approach to meaning is that, in order for an intelligence to understand the meaning of a word, concept or other entity, this entity must be "grounded" in the system's own experience. Consider, for example, the position of Webmind or another similar intelligent text processing system on encountering the word "two." In order to deal with "two" in an intelligent way, the system must somehow understand that "two" is not only a pattern in texts, it also refers to the two computer users who are currently logged onto it, the two kinds of data file that it reads in, etc.
Through processing of text alone, part of the fuzzy set that is the meaning of "two" can be acquired: that part which is purely linguistic, purely concerned with the appearance of "two" in linguistic combinations with other words. But the other part of the meaning of "two" can only be obtained by seeing "two" used, and using "two," in various situations, and observing the patterns in these situations that are correlated with correct use of "two." In general, the fuzzy set of patterns that is the meaning of a symbol X involves not only patterns emergent among X and other symbols, but also patterns emergent among X and entities involved in physical situations. Recognition of patterns of this type is called symbol grounding (Harnad, 1990), and is a crucial aspect of artificial intelligence. Symbol grounding is, in essence, the difference between semiotics and formal language.
One approach to the task of symbol grounding would be to encode thousands or millions of specific rules, explaining which items in which situations correlate with which symbols. However, this kind of approach is obviously doomed to fail, because even if one could accomplish such a tremendous feat of rule encoding for all known situations, one would still face the problem of enabling one's system to deal with unknown situations. Fortunately, in a self-organizing intelligent system, it is possible to take a different approach: to supply a small number of symbol-reality mapping rules and allow the remainder to emerge spontaneously from the network's own dynamics. The general principle here is that the subjective reality of an intelligent system is necessarily self-organizing: only a subjective reality which "understands its own structure" will be able to adapt and grow, and adaptation and growth are prerequisites of intelligence.
In order to make this concept more concrete, consider an AI system composed of multiple agents, each of which has the ability to look inside itself and report various aspects of its current state. Textual statements and other data, once read into the system, are decoded into collections of agents. A "symbol grounding" is represented by a special kind of mobile agent, a symbol grounding agent. A symbol grounding agent is associated with another agent, which it "tracks." Its goal is to mimic the behavior of the agent which it tracks, but without looking into that agent itself, only by looking into other agents. In this way it represents the agent that it tracks as a relationship among other agents. Given the collection of information-bearing agents as a fixed information base, the symbol grounding agent attempts to become a pattern in the agent that it tracks. A given agent may be tracked by a number of symbol grounding agents, each of which is operative in certain situations, and some of which are more effective than others. The idea of meaning as a fuzzy set of patterns is thus embodied concretely: the meaning of an agent is enhanced by a fuzzy set of meaning-gathering agents. Situation specificity of meaning-contributing patterns is reflected as situation specificity of agents. These are intelligent agents, as recognizing patterns in the occurrence of linguistic symbols or other mental entities is not an easy task; but the machine learning problems involved here are not insurmountable, and are in fact similar to those encountered in other areas of AI, e.g. financial prediction.
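To make the scheme concrete, here is a minimal sketch in Python (the class names, the numerical state reports and the nearest-neighbour mimicry strategy are illustrative simplifications, not a description of Webmind or any other particular system):

```python
import math
from typing import Callable, Dict, List, Sequence

class Agent:
    """Minimal stand-in for an information-bearing agent that can
    report one numerical aspect of its current state."""
    def __init__(self, name: str, state_fn: Callable[[], float]):
        self.name = name
        self._state_fn = state_fn

    def report_state(self) -> float:
        return self._state_fn()

class SymbolGroundingAgent:
    """Tracks a target agent and tries to mimic its reported state using
    only the states of *other* agents -- i.e. it tries to become a pattern
    in the agent it tracks, representing it as a relationship among the rest."""
    def __init__(self, target: Agent, context: Sequence[Agent]):
        self.target = target
        self.context = [a for a in context if a is not target]
        self.memory: List[Dict] = []   # past (context snapshot, target state) pairs

    def observe(self) -> None:
        """Record one joint observation of the context agents and the target."""
        snapshot = [a.report_state() for a in self.context]
        self.memory.append({"context": snapshot,
                            "target": self.target.report_state()})

    def predict_target(self) -> float:
        """Guess the target's state from the current context alone, by
        recalling the most similar remembered context (1-nearest-neighbour)."""
        if not self.memory:
            raise RuntimeError("call observe() at least once before predicting")
        current = [a.report_state() for a in self.context]
        best = min(self.memory, key=lambda m: math.dist(m["context"], current))
        return best["target"]

    def mimicry_error(self) -> float:
        """How badly the agent currently fails to mimic its target; a small
        error means it has found a genuine pattern in the tracked agent."""
        return abs(self.predict_target() - self.target.report_state())
```

A population of such agents, each trained in different situations and each more or less successful, would then play the role of the fuzzy set of meaning-gathering agents described above.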
The implementation of symbol grounding agents involves many detailed and difficult issues not touched on here, but the conceptual aspects of the design should be clear. The point is that the population of symbol grounding agents associated with a piece of information expresses that piece of information, explicitly, as a fuzzy set of patterns. The usefulness of the symbol grounding agents for generating meaning depends on two factors: the intelligence of the symbol grounding agents, and the coherence and informativeness of the agent system used as an information base.
The examples considered above -- "big", "two", etc. -- involve texts and numbers, things that Net-based AI systems have fairly direct knowledge of. They don't involve, for example, horses, or rooms, things that such a system could know of only via indirect reference. This is no coincidence, of course: an AI system will be able to ground only those symbols that refer to something it has direct experience with. This is a fundamental fact of mind and there is no getting around it!
Does this mean that Internet AI systems won't be able to reason about horses? By no means. If such a system is told "the horse is big," it will be able to interpret this statement by using its intuitive understanding of the word "big" -- which is gained via its grounding of "big" in the context of texts, data, and so forth. But it won't be able to ground the concept of a horse for itself, unless it is given some perceptual basis for this grounding. If it were trained to recognize horses in .gif files, this would allow it to develop a visual grounding for the concept of "horse", which would still be very partial compared to the grounding that most humans have.
Humans experience the difference between grounded and ungrounded symbols very vividly when visiting, for the first time, a city that they had previously only read about. I understood a lot about Prague, before I had ever visited there, based on reading novels set in Prague, tourist books about Prague, etc. I reasoned about Prague by reference to my large battery of relevant already-grounded symbols: city, building, street, etc. etc. I constructed a fuzzy, shifting default image of the ungrounded symbol "Prague" as a fuzzy combination of already grounded symbols -- bringing other ungrounded symbols such as "Moscow" to bear on it too, no doubt. But when I visited there, the symbol "Prague" was rapidly grounded, which brought my thinking and feeling about Prague to a level it could never have achieved without the symbol grounding. One can make great discoveries about a city without ever visiting it -- pick up historical and cultural trends, etc. -- but there are some things one will necessarily miss from not really "knowing" it. (A more potent example than cities would be sex. Compare the understanding of sex possessed by someone who has never had sex, but has looked at a lot of pictures and read a lot of books about it, with someone who has had the experience.) Similarly, there are some things Webmind will always miss about horses until it is given multisensory organs to interact with horses and ground the symbol "horse" for itself. But it will likely pick up a lot more than us humans about computer networks, numerical data, Internet traffic, etc., because in these areas, it has more grounding than us, rather than less.
From a philosophical perspective, the basis of the notion of grounding is as follows. Symbols emerged to represent recognized patterns. But a symbol can never capture all the nuances of a pattern. It can guide the mind toward the recognition of the same patterns that others have recognized before; or it can serve as a shallow, limited "proxy" for the pattern, when the mind has not yet recognized the real thing. But there is no substitute for recognizing the pattern oneself. And no mind can operate entirely on the basis of proxies. The percentage of ungrounded proxies must be below a certain "critical level" or reason will necessarily fail. This is perhaps the main reason why all attempts at AI so far have failed.
What we do when we don't have a real-life grounding is to use a formal definition. For instance, what is a horse? Well, a horse, of course, is:

"A hoofed quadruped of the genus Equus; especially, the domestic horse (E. caballus), which was domesticated in Egypt and Asia at a very early period. It has six broad molars, on each side of each jaw, with six incisors, and two canine teeth, both above and below. The mares usually have the canine teeth rudimentary or wanting. The horse differs from the true asses, in having a long, flowing mane, and the tail bushy to the base. Unlike the asses it has callosities, or chestnuts, on all its legs. The horse excels in strength, speed, docility, courage, and nobleness of character, and is used for drawing, carrying, bearing a rider, and like purposes."
A high-quality NL system will be able to parse this definition and use it to construct a number of different things. It will build is-A links from "horse" to "quadruped" and "Equus", and part-of links to "hoof", "molar", "incisor", "tooth", etc. It will construct trilinks to (4, horse), (long, mane) and (flowing, mane). It will construct "quality" links from "horse" to "strength", "speed", "docility", etc. This is useful information, but not the same as grounding. These definition-embodying links simply "outsource" questions about horses to other concepts: to figure out if X is a horse, you check if it has teeth, hooves, four legs, etc. Then, to check if X has "teeth", "legs", etc., the definitions of these other terms must be appealed to. This is a reasonable way of proceeding, and yet it will never give true understanding. At some point, the system must be able to move outside the network of formal, relational links into a deeper kind of understanding: an understanding grounded in experience. This understanding grounded in experience is implicit, not explicit. It may be represented as a neural network, as a high-dimensional polynomial or logical combination of various simple terms, or in any sufficiently flexible format, but it is unlikely to be transparently comprehensible in the way that an explicit dictionary-like definition or associated collection of relational links is. Implicit symbol groundings cannot be derived from formal definitions, and cannot be communicated in language in any reasonably compact way, but can only be induced or evolved based on experience.
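To make the contrast vivid, here is a minimal sketch (the relation names and the dictionary-based network are illustrative only) of the kind of purely relational structure such a parse yields; note that every query merely defers to further ungrounded symbols:

```python
from collections import defaultdict

# A tiny semantic network: relation -> list of (source, target) pairs.
# The link types mirror those mentioned in the text (is-A, part-of, quality);
# the container itself is just an illustrative dictionary, not any real system's API.
links = defaultdict(list)

def add_link(relation, source, target):
    links[relation].append((source, target))

# is-A links extracted from the definition
add_link("is-a", "horse", "quadruped")
add_link("is-a", "horse", "Equus")

# part-of links
for part in ("hoof", "molar", "incisor", "canine tooth", "mane", "tail", "chestnut"):
    add_link("part-of", part, "horse")

# attribute/"quality" links
for quality in ("strength", "speed", "docility", "courage", "nobleness of character"):
    add_link("quality", "horse", quality)

# trilink-style information flattened here to attribute links on parts, e.g. (long, mane)
add_link("attribute", "mane", "long")
add_link("attribute", "mane", "flowing")

def is_a(x, concept):
    """Answer 'is X a <concept>?' purely by chasing links -- which only
    passes the question along to other, equally ungrounded symbols."""
    return (x, concept) in links["is-a"]

print(is_a("horse", "quadruped"))   # True, yet nothing here is grounded in experience
```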
Whether a grounding is constructed in terms of rules, neural networks or something else is not important; the important thing is the complexity of the framework used to represent the grounding. Groundings must be constructed from a framework which is flexible and adaptive -- if they are to be built from rules, they must be built from large numbers of rules according to a non-restrictive rule-combining algebra. Not every single grounding need be complex -- e.g. the grounding of "big" given above is not complex -- but the framework for generating groundings must be capable of fluidly and compactly representing complexity.
One of the important qualities of symbol groundings is this: they are compact processes which enable a mind to make category membership decisions regarding categories that are of complex shape in "idea space." The most common statistical clustering algorithms are optimized to recognize categories of spherical shape. Simple collections of logical rules, on the other hand, often recognize categories of polygonal shape. Any generalized system of categorization is going to have a bias toward some particular shape of category. Generalized categorization systems have an important purpose: they help a mind to form a rough map of the conceptual landscape, to get an initial picture of what things belong with what other things. But in order to arrive at an effective working understanding of the world, it is necessary to go beyond any generalized system and create specific processes for determining category membership, thereby refining the mind's categorization system. This process is too computationally expensive to use as the sole means of categorization, but it is important as a method of deepening the understanding initially arrived at by simpler, statistical means.
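As a toy illustration of the point about category shapes (the three membership tests below are generic textbook-style examples, not the categorization machinery of any particular system), compare a spherical test of the kind a centroid-based clusterer yields, a box-shaped test of the kind a small rule set yields, and an arbitrary learned predicate that can trace out a far more complex region of idea space:

```python
import math

def spherical_member(point, centroid, radius):
    """Clustering-style test: inside a sphere around the category centroid."""
    return math.dist(point, centroid) <= radius

def rule_member(point, bounds):
    """Rule-style test: inside an axis-aligned box, one interval per feature."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, bounds))

def grounded_member(point, predicate):
    """A learned, grounding-style test: any compact process at all,
    so the category can have an arbitrarily complex shape."""
    return predicate(point)

point = (0.7, 0.2)
print(spherical_member(point, centroid=(0.5, 0.5), radius=0.4))
print(rule_member(point, bounds=[(0.0, 1.0), (0.0, 0.3)]))
print(grounded_member(point, predicate=lambda p: math.sin(10 * p[0]) * p[1] > 0.05))
```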
But, important as it is, flexibly-shaped categorization is not the defining characteristic of a symbol grounding. The word "grounding" is significant here -- it suggests attaining firmness by appealing to a lower level. In a human context, this lower level is the body and physical reality. For an Internet AI system, this lower level is its own body and physical reality: the lengths of files and strings and tables, etc., stored in its various component agents, which correspond directly to the sizes of chunks of RAM absorbed by different parts of its internal network. The lower level is not more absolutely meaningful in a philosophical sense, but it is more broadly meaningful in a social sense: it is a realm that is common to a host of different minds. Symbol grounding, in essence, may be defined as the identification of concepts with compact processes constructed as potentially complex combinations of socially-real phenomena, and demarcating complex shapes in idea space.
Crystallizing these philosophical observations into a framework more suitable to guide engineering, one may think about three types of grounding:
Inductive: observing that a symbol is correlated with certain phenomena, and inducing a machine that predicts which phenomena the symbol is correlated with
Deductive: observing that A is a subset of B, and hence that the grounding of B is an initial guess at the grounding of A (e.g. A = bird, B = animal)
Abductive: creating a virtual grounding for something never directly perceived (e.g. a quasar). Here we look for transformations from things we have perceived, e.g. quasar = f(star), quasar = g(comet), etc.
Abductive grounding is the most complex of the three, and the one that gives the perceived world most of its texture. To carry out abductive grounding, a system must evolve transformations from known to unknown entities by a GA or similar method. Then, it must compose these transformations with the groundings of their arguments, e.g.
possible-grounding-for-quasar = f(grounding-for-star)
This is how we create "mental movies" grounding things we have never had experience with.
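A minimal sketch of this composition (the feature names, the hand-written transformation f and the toy groundings below are purely illustrative; in practice f would be evolved rather than written by hand):

```python
# Abductive grounding: an existing (inductive) grounding for "star" is composed
# with a transformation f to yield a virtual grounding for "quasar".

def star_grounding(observation):
    """Toy inductive grounding: 'star-like' if it is a bright point source.
    (observation is a dict of perceptual features; purely illustrative.)"""
    return observation.get("point_source", False) and observation.get("brightness", 0) > 0.5

def very_distant_and_luminous(grounding):
    """Toy transformation f: shifts a grounding toward 'far away and extremely energetic'."""
    def transformed(observation):
        return grounding(observation) and observation.get("redshift", 0) > 1.0
    return transformed

# possible-grounding-for-quasar = f(grounding-for-star)
quasar_grounding = very_distant_and_luminous(star_grounding)

print(quasar_grounding({"point_source": True, "brightness": 0.9, "redshift": 2.3}))  # True
print(quasar_grounding({"point_source": True, "brightness": 0.9, "redshift": 0.0}))  # False
```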
The real subtlety of symbol grounding lies in the fact, pointed out to me with particular force by Julian Troyanov, that even groundings that seem to be inductive also have an abductive, imaginative component. This abductive component of even familiar concepts is what gives them their flexibility and living nature. An Internet-based AI system will have to deal abductively with many common elements of human physical reality, just as we humans have to deal abductively with things like packet flow over the Net -- which such a system will be able to deal with inductively: it will be able to "feel" the packets flowing, and hence ground the concept of packet flow inductively.
With these concepts in hand, we may revisit the issue of grounding numbers. It seems that the straightforward evolution of inductive symbol groundings will suffice to ground numbers from specific examples: e.g. to cause the "2" node to recognize entities that are sets of 2, if it is trained by being given examples of things that are sets of 2 and things that are not sets of 2. This is not difficult if it is assumed that the agents comprising an Internet AI system will be introspective enough to numerically measure their own sizes.
But what about a number like "three thousand, four hundred and six"? Clearly, the system can't be given explicit training on every number it will ever see. Of course, a specific formula for grounding numbers could be hard-coded into the system, but that isn't the point: the point is, how could it learn the correct way to ground numbers, as human children do when learning mathematics? This is abductive grounding, and it must be done in such a way as to provide not only precision but flexibility. For instance, what does "three zinkum, four hundred and six" ground to -- merely a vague impression of a big number? Yes, but everyone knows that it's one more than "three zinkum, four hundred and five." Unless of course one discovers that a zinkum is a kind of infinity, in which case "three zinkum, four hundred and six" and "three zinkum, four hundred and five" are equivalent! This is all a matter of adjusting the transformation used in constructing the abductive symbol grounding.
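As a sketch of what such a learned grounding might amount to once acquired (the little parser below is hand-coded purely for illustration, whereas the point in the text is that the system should learn the rule abductively for itself):

```python
# Compositional grounding for number phrases: once the general rule is in place,
# any phrase of this form grounds to a size test, without per-number training.

WORD_VALUES = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
               "seven": 7, "eight": 8, "nine": 9}
MULTIPLIERS = {"hundred": 100, "thousand": 1000}

def phrase_to_value(phrase):
    """Very rough parser for phrases like 'three thousand, four hundred and six'."""
    total, current = 0, 0
    for word in phrase.replace(",", "").split():
        if word in WORD_VALUES:
            current += WORD_VALUES[word]
        elif word in MULTIPLIERS:
            current *= MULTIPLIERS[word]
            if MULTIPLIERS[word] >= 1000:   # close out a major group
                total += current
                current = 0
        # 'and' and unknown words are ignored
    return total + current

def number_grounding(phrase):
    """Ground a number phrase as a predicate over collections: does this
    collection contain exactly that many elements?"""
    n = phrase_to_value(phrase)
    return lambda collection: len(collection) == n

is_3406 = number_grounding("three thousand, four hundred and six")
print(is_3406(range(3406)))   # True
print(is_3406(range(3405)))   # False
```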
In closing, let us return to the relation between semiotics and autonomy. This relationship is, in an intelligent system, necessarily bidirectional: on the one hand, a system uses its shared and individual semiotics to maintain autonomy; and on the other hand, a system requires its autonomy in order to gather and maintain the information that is the basis for the formation of its semiotics. The feedback dynamics between semiotics and autonomy is very subtle and can be expected to manifest phenomena of chaos, complexity and emergence.
As autonomous intelligences, we experience this feedback from the inside; and as observers of other humans, we experience it from a combined internal/external perspective, due to our empathy for other humans and our embeddedness in the shared social semiotic dynamic of human culture and society. In engineering artificial intelligences, however, we get a unique opportunity to observe the semiotics/autonomy feedback from the outside, a perspective that yields many new insights.
In particular, based on the above discussion, we may enunciate a few general points about meaning and mind:
An intelligent system's strategies for meaning extraction are necessarily inseparable from the overall intelligent dynamics with which its dynamic knowledge base is endowed.
The meaning of a word, phrase, sentence, text or concept, to an intelligent system, is encapsulated in the relationships between this entity and other entities which it contains in its dynamic knowledge base.
In order to truly understand the meaning of text or speech, an intelligent system must be able to map the meanings recognized in linguistic items onto aspects of its own data structures and of non-textual data presented to it.
In an autonomous intelligent system, such as Webmind or the human brain, each entity is a symbol for a tremendously large fuzzy set of other entities in the system. Meaning thus emerges as a self-organizing web of fuzzy patterns. The task of building a subjective reality is carried out by this web; and the subjective reality constructed by this web is a key contributor to the web's growth.
Because the evolution of symbol groundings is derived by usage of the nexus of relationships in a self-organizing memory, semiotics depends on autonomy, inasmuch as autonomy is what allows an integral, useful nexus of relationships to evolve. On the other hand, a self-organizing AI system's autonomy relies just as much on its semiotics, in the sense that a system's understanding of itself hinges on its understanding of its relationship with humans and other computers, and its understanding of this relationship would be feeble indeed without symbol groundings of basic relational concepts. Essentially, we conclude that, in a self-organizing Internet-embedded AI system,
Semiotics supports autonomy via the social definition of self
Autonomy supports semiotics by supplying a coherent body of raw material for the evolution of symbol groundings
It is tempting to conclude that this characterization of the semiotics/autonomy feedback relation holds generally, beyond the context of the Net; but this is not the place to enter into such issues.