Notes:
1) This file contains the first few chapters of Digital Intuition, which constitute a "general overview" of the Webmind AI project. The full Digital Intuition manuscript is available only to selected individuals, and only under NDA.
2) A note on formatting. This manuscript exists in M$ Word format. When I
saved the excerpt to HTML from Word, it produced a lovely 2Meg file. So I
decided to save it to .txt instead, losing formatting but cutting the bytage by
a factor of 10. Soon enough, a lovelier version will be posted, based on a more
careful hand-conversion to HTML.
Digital Intuition
A Conceptual Overview of the Webmind AI Engine
Version 2
Primarily authored by
Ben Goertzel
Major contributions by
Pei Wang
Substantial contributions also by
Cassio Pennachin, Stephan Vladimir Bugaj,
Cate Hartley, Jeff Pressing, Anton Kolonin,
Lucio de Souza Coelho, Matt Ikle',
Len Trigg, Karin Verspoor
Based on ideas, designs & software cooperatively developed by
the roughly 45 members of the
AI Development and Research Divisions of Webmind Inc.
Contents
Preface
1. History and Current Status of the Webmind AI Project
I. Conceptual Foundations
2. A Brief and Biased History of AI
3. Mind as a Web of Pattern
4. Key Aspects of Webmind AI
5. Knowledge Representation Issues
6. A Hybrid Approach to Natural Language Processing
7. Experiential Interaction
8. Experiential Interactive Learning
9. Emergence Within and Between Minds
II. Formal Models of System Components
10. Knowledge Representation for Inference
11. First-Order Inference
12. Higher-Order Inference
13. Probabilistic Term Logic
14. Importance Updating
15. Halos and Wanderers
16. Evolutionary Programming
17. Inference Control
18. Schema Learning
19. Feature Structure Parsing
20. Numerical Data Analysis
III. System Design
21. High-Level Architecture Issues
22. Architecture Overview
23. Webworld
IV. Applications
24. Exportation of Document Indexing Rules
25. A Formal Treatment of Rule Exportation
26. Financial Market Prediction
27. Future Applications
Appendix A. KNOW, Webmind's Knowledge Representation Language
1
History and Current Status
of the Webmind AI Engine Project
Ben Goertzel
1. Introduction
The Webmind AI Engine project is, in many important ways, the most ambitious initiative in the history of the AI discipline. Unlike most researchers and engineers working in the AI field, we in the AI Development Division of Webmind Inc. are actually making a serious attempt to create a truly intelligent computer program, in the short term. We have detailed software designs, and detailed engineering and testing and teaching plans, and a highly competent team of roughly 45 scientists and software engineers and testers executing these plans. We've been at it since mid-1998, and we estimate that within 1-3 years from the time I'm writing this (March 2001), we will complete the creation of a program that can hold highly intelligent (though not necessarily fully human-like) English conversations, talking to us about its own creative discoveries and ideas regarding the digital data that is its world.
This intelligent conversational program will be the WAE 1.0. It will focus on information retrieval and financial analysis, conversing about information it has read on the Internet and in document archives, financial market movements, its own self, and its own creative thoughts and discoveries relating to these areas. Within 1-3 years after this, we believe, we will deliver the AI Engine 2.0, which will possess an understanding of mathematics, and the ability to optimize and modify its own source code, thus continually improving its own intelligence in parallel with our own AI engineering work.
This brief Prologue summarizes the AI Engine project: its history, its future, and key elements of the software design and the conceptual thinking that has gone into it. It is written for readers who know a fair bit about AI: technical AI terms and concepts are introduced freely, without explanation. The rest of the book covers the same ground, giving more background and more detail.
Of course, "1-4 years from real AI" and "1-3 years more to fully self-modifying AI" are very gutsy claims, similar to other claims that have been made (and not fulfilled) throughout the history of AI. But we believe that, due to the combination of advances in computer hardware and software with advances in various aspects of cognitive science, real AI really now is possible - and that we know how to achieve it, and are substantially advanced along the path to this goal.
I don't expect that this book, in itself, will convince every reader that the
AI Engine is what we say it is. At least, though, I hope it will convince you
that we have a plausible approach to constructing real AI, without significant
omissions or mistakes. To really convince the skeptic, of course, nothing short
of a real working and completed AI system will suffice. We're working on it, and
we'll be there soon enough.
2. Webmind Inc. and the AI Development Division
Before launching into discussion of the AI Engine itself, it may be useful to say a few words about the context in which the construction of this software system proceeded.
The Webmind AI Engine was created in the AI Development Division of Webmind, Inc., a start-up software firm, incorporated in August, 1997 as Intelligenesis Corporation. Webmind Inc. had a main business office in New York, small engineering offices in New Zealand and Australia, and a large engineering office in Belo Horizonte, Brazil. At the time of writing, April 2001, Webmind Inc. is in the midst of filing for bankruptcy, and I'm seeking funding to start a successor firm carrying on the Webmind Inc. AI work. Several dozen of the AI Development staff are working unpaid, continuing the development and productization of the AI code.
On a very general strategic level, Webmind Inc.'s mission was to create an "intelligence infrastructure" for the Internet. The vision was simple: the Internet of the future will be an immensely intelligent system, displaying powerful synergies with human intelligence, and whatever company owns the Internet's intelligence will own the largest share of the Internet economy.
From a scientific view, the core of the Webmind Inc. vision is the Webmind AI Engine: a series of increasingly intelligent software releases culminating in a conversational system that reads all the data on the Internet, assists in restructuring the data on the Internet to be more useful to human beings, and ultimately posts the new information that it creates. From a business view, on the other hand, an at least equally key aspect of the Webmind Inc. vision is the creation and sales of products that bring digital intelligence in various forms to websites and intranets. Currently the products depend on the AI Engine technology in a variety of ways; and as the AI Engine matures, we expect them to become more and more thoroughly AI Engine dependent.
In March 2001, the firm encompassed the following divisions:
* The AI Development Division, concerned with developing the WAE, and
delivering interim software along the way in the form of releases of the Webmind
IR Engine and Webmind Conversation Engine
* The Market Predictor group, doing financial trading in a joint venture with a
small group of investors, using a text-based nonlinear prediction system that
was prototyped within the AI Engine
* The Text Categorization group, located in New Zealand, developing specific AI
technology for text categorization in a highly customer-focused way
* Application Development and Product Marketing. This portion of the company is
devoted to creating and marketing products - currently information retrieval
applications carrying out functions such as text categorization, search, and
entity extraction. These products use tools from the AI Engine along with other
more conventional techniques
* Sales and Solutions Delivery, concerned with finding customers for our
products and helping them to integrate our innovative AI software into their
businesses
* Operations, including a robust IT staff able to support our demanding R&D
and product groups
Though the main office was in New York, Webmind was an international company, with engineering offices in Belo Horizonte (Brazil), Melbourne, and Hamilton (New Zealand), and a small business office in Silicon Valley. The AI Development Division consisted of roughly 30 people in Brazil, 14 in New York, and 1 in Melbourne. There were also a handful of software testers devoted to the AI codebase, located in Brazil. The Brazilian staff consisted primarily of expert object-oriented software engineers and computer scientists. The New York staff consisted primarily of scientists in various relevant areas: cognitive science, computer science, linguistics, physics, mathematics. However, the breakdown of responsibilities between offices was not at all rigid: a lot of software engineering was done in New York, and a lot of conceptual thinking was done in Brazil. The smooth integration of high-quality software engineering with innovative scientific research was one of the noteworthy and uncommon aspects of the Webmind Inc. AI Development Division. We put a fair amount of work into developing a software process that was friendly to the needs of experimental, ground-breaking research, but also guaranteed the production of high-quality, efficient, testable and modifiable code.
The AI Development Division was managed by Cassio Pennachin, who was also the President and founder of Webmind Brazil. Cassio was the lead software architect of the AI Engine, whereas conceptual leadership was provided by Ben Goertzel and Pei Wang, together with Cassio and a loose group of others including Karin Verspoor and her team of computational linguists, Jeff Pressing (a physicist/cognitive-scientist who is our guru on such things as prediction and causation), and John Cleary and a group of his former students in New Zealand (experts on categorization technology).
At time of writing, active efforts are underway to resurrect Webmind Inc.
post-bankruptcy, but this is not the place to delve into such topics.
3. History of the AI Engine
The AI Engine design has evolved considerably over the 3 years since Webmind Inc. received its seed funding, due to a natural and productive feedback between theory and practice. This section recounts the evolution of the system, in a way that, we hope, will give the AI-educated reader a good sense of our current state of development and our future prospects.
On the highest level, the conceptual basis of the AI Engine is a complexity-science-oriented theory of mind called the "psynet model," described loosely in Ben Goertzel's previously published research monographs (and somewhat more precisely in some of his unpublished papers). In the psynet model, mind is envisioned as a self-organizing network of actors recognizing and creating patterns in each other, giving rise to emergent network-wide patterns. Phenomena like perception, action, cognition, memory and learning are theoretically framed in these terms - in terms of complexity, self-organization, and emergence in a network of pattern-focused actors. The theory seeks to identify the key structures and dynamics of mind as separate from the structures and dynamics used for their physical implementation. Rather than a rigorous scientific theory of mind, the psynet model is a conceptual framework, worked out more fully in some areas than in others. It would be compatible with a vast number of possible AI systems.
In 1994, I made an initial attempt to implement aspects of the psynet model of mind in Haskell, a functional programming language. This initial system was very simple - a dynamic network of simple computational actors recognizing patterns amongst each other, creating new agents embodying these patterns, and interacting with the user. Because it was implemented in Haskell, the system could only support a small network of actors. Partly because of this, unfortunately, this system failed to demonstrate interesting behaviors. From this experimentation, it became clear that a much larger and more diverse dynamic semantic network was going to be needed in order to give rise to emergent intelligence as hypothesized by the psynet model. A more industrial-strength programming language was required, and the system would have to be implemented using distributed processing on a cluster of powerful SMP machines. Furthermore, it was anticipated that a larger pool of weaker machines would need to be used for "background processing" - an idea that has led, these days, to the AI Engine adjunct called Webworld, a peer-to-peer network of mini-AI-Engines. The nodes and links of the Internet would be the key to supporting the nodes and links of the mind. Thus the name "Webmind" was born.
The first real attempt at a WAE was made in 1997. On the surface it was somewhat similar to the current AI Engine: a dynamic semantic network, consisting of nodes and links of many different types, with creative agents of various types that wandered among nodes along links, building new links and nodes based on patterns they had recognized. It was implemented in Java, a language that seemed to strike a middle way between elegance and practicality.
Implementation of the system picked up steam in mid-1998 when the company obtained funding, so the chief technical founders (Ben Goertzel and Ken Silverman) could quit their day jobs and hire more programmers. This AI Engine was an interesting one, and before long we had found substantial success using this early-version system to recognize correlations between trends in the news and movements in the financial markets. But practical experimentation with initial versions of the system soon led us to two significant concerns. First, making the distributed processing infrastructure work effectively was going to be a very big job in itself. Second, using just a small assemblage of nodes, links and creative agents was not going to work, given the processing power and memory constraints we were facing (even with the distributed processing in place).
To address the second of these two concerns, in late summer 1998, we moved to a multi-modular design, in which we distinguished the core system (the distributed, dynamic semantic network) from modules corresponding to different aspects of intelligence. Each module contained its own specialized node types; and occasionally, its own link types and creative agents. For example, there were modules corresponding to: reasoning, natural language, categorization, evolutionary learning, numerical data analysis, and psyche (a term we use to encompass feelings and motivations). The focus of our work became the creation of nodes, links and creative agents specialized for various aspects of intelligence - but able to interact freely with the corresponding actors specialized for other aspects of intelligence.
This shift in engineering focus represented a significant change in philosophy - not a refutation of the original "psynet model of mind" perspective, but a significant augmentation to it. While the basic model of mind as a self-organizing, nonlinearly-dynamical semantic network of pattern-recognizing and pattern-creating actors still appeared to be in principle workable, we realized that it was not in itself a sufficient principle for the creation of a thinking machine. A human brain is born with a lot of specialized "wiring" for various types of intelligence (linguistic, visual, temporal, and so forth), and similarly, we found ourselves "wiring" our digital brain in various specialized ways, yet without sacrificing the complete adaptability of the system and its potential for self-organizing emergent general intelligence.
With this shift in focus, the project became more integrative in nature. We began to make extensive use of evolutionary programming, inspired by and going beyond John Koza's seminal work in that area. We integrated specialized techniques for numerical data analysis: prediction, association and causation finding, trend analysis, etc., developed by Jeff Pressing, myself, and others. Most critically, we created a reasoning module inspired in large measure by the Non-Axiomatic Reasoning System (NARS) developed by Pei Wang (the firm's first paid employee) over the past decade. (The AI Engine reasoning module now has two versions, one that uses NARS and the other that uses our own Probabilistic Term Logic system, which is somewhat similar to NARS but is founded on probability theory.) The use of neural-network-like activation spreading between nodes also became more sophisticated, as we developed two separate activation spreading processes to deal with attention allocation, and the detection of semantic associations.
In mid-1999, we had created a number of sophisticated modules all interacting in the context of one dynamic-semantic-network. But serious software engineering problems loomed: the original core system had not been designed to support all these modules, and needed a ground-up rewrite. During fall 1999 and early 2000, Cassio and his Brazilian object-oriented software gurus worked with Ken and myself (who had designed the original core) to create a new core system, and integrate the modules with the new core.
In the meantime, the market prediction work continued very successfully, and we realized that we could build a simpler system embodying the key AI processes needed for text-based nonlinear market prediction, without the overhead (or full emergent intelligence) of the full AI Engine. The Webmind Market Predictor product was born.
In early 1999, we also began doing categorization of texts drawn from financial message boards using AI Engine technology, and in this context learned a similar lesson: Reasonably good text classification could be done using techniques simpler, and easier to tune, than the full AI Engine. We began working toward the current Webmind Classification System product, which uses some fairly standard machine-learning categorization techniques, together with innovative methods for producing feature vectors representing documents (using objects extracted from the AI Engine codebase, along with other methods). In 1998, we had assumed that Webmind Inc.'s products would be based directly on the AI Engine, but as we began to understand how large the task of creating the AI Engine was, we became more interested in creating simpler products that leveraged aspects of the AI Engine's intelligence in highly-focused ways.
In terms of fundamental AI Engine development, natural language processing, at this stage, became the largest thorn in our side. We had been experimenting with unsupervised language learning, but this had failed to work adequately (as has traditionally been the case). We turned to supervised learning of linguistic rules based on linguistic corpora such as the Penn Treebank, XTag, morphological databases and so forth, but of course, we knew this was only a partial solution. "Wiring in" knowledge in this way is only acceptable if the system has a way to adapt the knowledge based on its own learning, and unsupervised language learning did not seem adequate for this purpose.
We thus realized that we would have to expand the "Experiential Interactive Learning" aspect of our system. Language learning had to be integrated with the learning of cultural patterns of cognition, and this learning had to proceed through interaction with other minds in a shared, perceptual/manipulable environment. We created a mechanism by which Baby Webmind could interact with us in a simple simulated world, in which it could participate with us in various interactions with files, directories, financial data series, and other digital objects. It could then ground its linguistic knowledge in non-linguistic social interactions, just as a human child does when learning language.
Along with the Baby Webmind emphasis came a new focus on action as well as perception. We developed what we call the "schema" framework, a kind of program-execution framework implemented in terms of nodes and links and other actors; and worked out how schema could be learned by a combination of evolutionary programming and inference.
It took us a while to find a framework for representing and manipulating syntax that was compatible both with supervised learning from external sources and with experiential interactive learning. During the second half of 2000, we finally found this, in the form of our own version of lexicalized feature structure grammar. Finally, we had a natural language module that made sense in terms of the other modules of our digital mind, and in terms of the two modes of language acquisition that we had chosen to use.
Throughout 2000, our confidence in the finality of our AI design grew significantly. For the first time, we could review any textbook on cognitive science or human psychology, run through every aspect of mind mentioned there, and explain in detail how our design accounted for it. We had designed a complete mind system, with the diverse specialization of the human brain as well as its creative, self-organizing flexibility. And, as the end of 2000 approached, we had written nearly all of the code needed to support this.
At the start of 2001, we completed what we called "WAE 0.5" - for the first time, an AI Engine incorporating all the modules, working together sensibly in a functioning distributed core. Millions of nodes, billions of links, dozens of types of cognitive processing. Well, there were only two small problems. Hundreds of parameters, complexly interacting, making the system very difficult to tune. And the performance of the system still wasn't anywhere near what we wanted it to be: the system needed to be drastically sped up, and its memory usage significantly reduced.
For a moment we thought we might have to rebuild the core again, but as it turns out, the object-oriented design of the 2000 core is flexible enough that this is not the case. Instead, we are embarking on a program of intensive efficiency-oriented rearchitecture without altering the basic object structure and conceptual framework of the system. We have arrived at a number of simple yet radical design changes that, according to our experiments, should improve the speed of the system by several orders of magnitude. These changes will also make the parameter optimization problem significantly less severe. This rearchitecture process will take 3-6 months, and will proceed in parallel with basic AI work in areas like experiential interactive learning, language generation, and schema learning using hybridized evolution/inference. Most of these optimizations could not have been done a year earlier, because they are dependent on the specific makeup of the AI modules, which only became really "final" during 2000.
The AI Engine 0.5 is not adequate performance-wise to be used inside most kinds of products. It's too slow, and requires too many machines. However, its intelligence can be used to enhance the performance of information retrieval products, using a technique we call "rule export." One thing this engine is very good at is identifying relationships between various words and various concepts. It is able to export rules describing the relationships between various words and concepts - in general, or in a particular domain characterized by a particular set of documents. Think of the exported rules as a kind of superintelligent, optimized thesaurus. The exported rules can be used within lightweight products to produce indices ("feature vectors") for documents, and these indices can be used for applications like text categorization, search and market prediction.
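To make the rule-export idea a bit more concrete, here is a minimal sketch in Java (the language the AI Engine is implemented in). The class names and the shape of the exported rules are illustrative guesses, not the actual KNOW or AI Engine formats: each exported rule simply maps a word to a set of weighted related concepts, and a document's feature vector is built by accumulating those weights over the words the document contains.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical illustration of an exported rule: a word mapped to weighted
    // related concepts, like one entry in an optimized thesaurus.
    class ExportedRule {
        final String word;
        final Map<String, Double> relatedConcepts; // concept -> relationship strength

        ExportedRule(String word, Map<String, Double> relatedConcepts) {
            this.word = word;
            this.relatedConcepts = relatedConcepts;
        }
    }

    class DocumentIndexer {
        private final Map<String, ExportedRule> rules = new HashMap<>();

        void addRule(ExportedRule rule) {
            rules.put(rule.word, rule);
        }

        // Build a lightweight feature vector (concept -> weight) for a document,
        // without running the heavyweight AI Engine at indexing time.
        Map<String, Double> featureVector(String documentText) {
            Map<String, Double> features = new HashMap<>();
            for (String token : documentText.toLowerCase().split("\\W+")) {
                ExportedRule rule = rules.get(token);
                if (rule == null) continue;
                for (Map.Entry<String, Double> e : rule.relatedConcepts.entrySet()) {
                    features.merge(e.getKey(), e.getValue(), Double::sum);
                }
            }
            return features;
        }

        public static void main(String[] args) {
            DocumentIndexer indexer = new DocumentIndexer();
            indexer.addRule(new ExportedRule("market",
                    Map.of("finance", 0.9, "trading", 0.7)));
            System.out.println(indexer.featureVector("The market rallied today"));
        }
    }

The resulting feature vectors can then be handed to ordinary categorization, search or prediction code, which is exactly the division of labor described here.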
We believe that the concept of a "real AI Engine" exporting rules and other data structures to be used by simpler, less computationally intensive software systems is a critical one. "Expert systems," simple software systems powered by sets of human-created rules, are well-known. Getting humans to explicitly state the rules by which they carry out intelligent acts in various domains, however, is not a trivial task. So much knowledge is tacit. By comparison, using an intelligent, albeit extremely expensive, AI Engine to generate these rules has advantages, the chief one being that unlike human brains, the AI Engine's brain is "transparent": once it has learned something, one can look into its mind and see how it is doing that thing. Sometimes there are hard problems involved in figuring out how the system does something, but not problems as hard as scanning and analyzing the real-time dynamics inside the human brain.
To sum up, then, the current state is that now, in March 2001:
* the conceptual design for the AI Engine is well specified, and apparently
complete
* an in-principle adequate platform for the deployment of this full conceptual
design (WM 0.5) is now implemented
* In theory, about 85-90% of the code required for "real AI" is now
written, and most of what is left to be done is testing, tuning, teaching,
wiring-in of specific knowledge, and refactoring for efficiency.
* Rigorous empirical testing of various system components, performing text
analysis tasks both together and separately, is now seriously underway, with
test plans created that will carry us far into the future.
* The system is now exporting rules embodying relationships between concepts,
for use for document and query indexing within IR products
The work of the AI Development Division, at this stage, is then divided into three categories:
* Work on the "Webmind IR Engine", a successor to the AI Engine 0.5
providing a wider spectrum of functions supporting products in the IR domain
* Work on the "Webmind Conversation Engine" - direct work on the path
toward real AI, chiefly involving experiential interactive learning, schema
learning, and the training and tuning of the system's psyche
* Work on core systems and services that benefit both AI Engines
There is also some other work being done around the fringes, building towards
the WAE 2.0. We are converting Mizar, a formalization of much classical and
modern mathematics, into KNOW (our knowledge representation language, which maps
immediately into AI Engine nodes and links). This will form the foundation for
the system's understanding of mathematical structures, which will form the basis
for its understanding of its own algorithms and data structures, which will form
the basis for the intelligent self-modification of AI Engine 2.0.
4. Intended Delivery Schedule
At the start of 2001 we bifurcated the development effort in the AI Development Division, aiming toward two separate though overlapping goals: a Webmind IR Engine and a Webmind Conversation Engine.
Very rough schedules for these two engines were worked out prior to the
dissolution of Webmind Inc., and are given below. Of course, all dates given
here are now off, and will be re-calculated once the funding situation for the
post-bankruptcy Webmind Inc. becomes clearer.
* WM IR Engine 1.0, to be delivered Summer 2001, a solid platform providing intelligent text-analysis support to information retrieval products (principally search, categorization and entity extraction)
* WM Conversation Engine 0.7, to be delivered Fall 2001, carrying out simple, somewhat intelligent conversations in the domains of file manipulation and corporate information, using KNOW rather than English as its language
* WM IR Engine 2.0, to be delivered December 2001, adding robust natural language query processing, document summarization, and other functionalities yet to be specified, based on business needs
* WM Conversation Engine 0.8, to be delivered February 2002, providing simple English language conversation
* There will be a very small "maverick psynetized NL team" pushing toward the difficult goal of getting NL conversation in place for WM Conversation Engine 0.7. This is not considered part of the mainstream of WCE 0.7 development, because of the fear that it may distract from KNOW conversation development. However, it's understood that if this is achieved, the value of WCE 0.7 will be greatly enhanced.
* WM IR Engine 3.0, to be delivered Summer 2002, providing additional functionalities to be specified based on business needs
* WM AI Engine 1.0 (incorporating both IR and Conversation), to be delivered Fall 2002, providing intelligent (but not necessarily extremely human-like) conversation, in addition to the full spectrum of IR functions. I'll still call this guy WM 1.0 for short.
* WM AI Engine 2.0, to be delivered December 2003, incorporating mathematical
theorem-proving (beginning from the Mizar database) and automatic analysis and
optimization of the system's own source-code
5. Obstacles on the Path Ahead
The three big challenges that we seem to face in moving from AI Engine 0.5 to AI Engine 1.0 are:
* computational (space and time) efficiency.
* getting knowledge into the system to accelerate experiential learning
* parameter tuning for intelligent performance
Efficiency-wise, our experience so far indicates that the AI Engine 0.5 architecture is probably not going to be sufficiently efficient (in either speed or memory) to allow the full exercise of all the code that's been written within it - in either the real-world IR context or the real AI context. Thus a rearchitecture is in progress, based on the same essential object model. Essentially, the solidification of the conceptual design of the system over the last year allows us to make a variety of optimizations aimed at specializing some of the very general structures in the current system so as to do better at the specific kinds of processes that we are actually asking these structures to carry out. The current system is written for extreme generality, and it has allowed us to experimentally design and implement a wide variety of AI processes (although, for efficiency reasons, not to test all of them in realistic situations, or in interesting combinations). Now that, through this experimental process, we have learned specifically what kinds of AI processes we want, we can morph the system into something more specifically tailored to carry out these processes effectively. It does seem that the current architecture is sufficiently flexible that it will almost surely be possible to move to a more efficient architecture gradually, without abandoning the current general software framework or rewriting all the code.
Regarding getting knowledge into the system, we are embarking on several related efforts:
* Conversion of structured database data into KNOW format for import into the WAE (this is for declarative knowledge)
* Human encoding of common sense knowledge in KNOW (this is for declarative knowledge)
* Human encoding of actions (both external actions like file manipulations, and internal cognitive actions) using "schema programs" written in MindScript (this is for procedural knowledge)
* The Baby Webmind user interface, enabling knowledge acquisition through experiential learning (this helps with both declarative and procedural knowledge)
* Creation of training datasets so that schema operating in various parts of the system can be trained via supervised learning; the details differ for different parts of the system (this is for procedural knowledge only)
Finally, regarding parameter optimization, there have been several major obstacles to effective work in this area so far:
* Slowness of the system makes the testing required for automatic parameter
optimization unacceptably slow
* The interaction between various parameters is difficult to sort out
* Complexity of the system makes debugging difficult, so that parameter tuning
and debugging end up being done simultaneously
One of the consequences of the system rearchitecture proposed here would be to make parameter optimization significantly easier, both through improving system speed, and through the creation of various system components each involving fewer parameters. Although one part of the new system (the AttentionalFocus) will be almost as hard to tune as the current system, even here the problem will be rendered simpler by the fact that the parameters in the AF are also parameters in simpler, easier-to-tune components with fewer parameters apiece. Default settings for AF parameters will be obtainable from simpler components, and will then have to adapt themselves to take into account emergent phenomena in AF parameter space.
Summing up the directions proposed in these three problem areas (efficiency,
knowledge acquisition, and parameter tuning), one general observation to be made
is that, at this stage of our design work, analogies to the human mind/brain are
playing less and less of a role, whereas realities of computer hardware and
machine learning testing and training procedures are playing more and more of a
role. In a larger sense, what this presumably means is that while the analogies
to the human mind helped us to gain a conceptual understanding of how AI has to
be done, now that we have this conceptual understanding, we can keep the
conceptual picture fixed, and vary the underlying implementation and teaching
procedures in ways that have less to do with humans and more to do with
computers.
Obstacles on the Path to AI Engine 2.0
Finally, while the above issues are the ones that currently preoccupy us, it's also worth briefly noting the obstacles that we believe will obstruct us in getting from AI Engine 1.0 to AI Engine 2.0, once the current problems are surpassed.
The key goal with AI Engine 2.0 is for the system to be able to fully understand its own source code, so it can improve itself through its own reasoning, and make itself progressively more intelligent. In theory, this can lead to an exponential acceleration of system intelligence over time. The two obstacles faced in turning AI Engine 1.0 into such a system are:
* the creation of appropriate "inference control schema" for the
particular types of higher-order inference involved in mathematical reasoning
and program optimization
* the entry of relevant knowledge into the system.
The control schema problem appears to be solvable through supervised learning, in which the system is incrementally led through less and less simplistic problems in these areas (basically, this means we will teach the system these things, as is done with humans).
The knowledge entry problem is trickier, and has two parts:
* giving the system a good view into its Java implementation
* giving the system a good knowledge of algorithms and data structures (without
which it can't understand why its code is structured as it is).
Giving the system a meaningful view into Java requires mapping Java code into
a kind of abstract "state transition graph," a difficult problem which
fortunately has been solved by some of our friends at Supercompilers LLC, in the
course of their work creating a Java supercompiler. Giving the system a
knowledge of algorithms and data structures could be done by teaching the system
to read mathematics and computer science papers, but we suspect this is a
trickier task than it may seem, because these are a specialized form of human
discourse, not as formal as they appear at first glance. In order to jump-start
the system's understanding of scientific literature in these areas, we believe
it will be useful to explicitly encode knowledge about algorithms and data
structures into the Mizar formalized mathematics language, from which it can
then be directly translated into AI Engine nodes and links. (This is a project
that we would undertake now, if we were faced with an infinite-human-resources
situation!)
6. Architecture and Dynamics Overview
In spite of its simple conceptual foundations, the AI Engine is a large and complex system. Some of the reasons for this complexity were reviewed in the History section above. Here we will give a quick overview of the system architecture, which surely will raise more questions than it will resolve in the mind of any educated reader, but will hopefully at least get across a general idea of what kind of system we're building.
From a very abstract, mathematical point of view, the WAE consists of the
following conceptual/mathematical entities:
* Atomic actors (representing e.g. words, concepts, numerical data sets, URLs and other pointers to outside entities)
* Composite actors, grouping other actors
* Atomic actions, both external and internal (basically,
transformations/creations/deletions of internal objects)
* Composite actions, grouping other actions
* Data channels between actions
* n-ary relations (joining objects, sets or relations)
* Conjunctions and disjunctions of relations
The effective deployment of these entities on a distributed network of SMP (symmetric multiprocessor) machines is the job of the Webmind Core. The specialization of these mathematical notions into actors that do useful things in important contexts is the job of both the core and the various modules implemented on top of it.
How are these mathematical structures embodied in software? This is of course
a long complicated story, and only the most superficial highlights will be
presented here. There is a collection of code which embodies a general system of
software actors, living on a network of SMP machines. This is the Webmind Core.
Then, within this, there is something called "psycore", which is a
dynamic semantic network of a very general type, implemented on top of the core.
The Webmind Core deals with such things as Lobes (groups of actors living on a
single machine), Messages sent between actors, and so forth.
Psycore is founded on the two most basic actor types in the AI Engine: the Node
and the Link. A Node represents a concept, process, percept or action. A Link
represents a relationship - it may be a relationship between Nodes, or a
relationship between Links. Links typically come with numerical "truth
values," including a strength (how strong is the relationship the link
represents) and a confidence (how sure is the system that the strength it's
assigned is correct). The other actor types in psycore exist to support Nodes
and Links - to group them in various ways, and to allow them to send messages to
each other and to create new actors relating each other in various ways. Psycore
is not the whole AI Engine architecture, but it is the crux of the system, the
"secret sauce" that makes the system unique. Because it is a
fundamental "mind network", it is often called the Psynet.
The first step to specializing the general psycore framework is to introduce some basic link types, which are all special kinds of n-ary relations. These fall into several classes: logical, associative, causal, and dataflow. Various node types may then be introduced, each one possessing various types of links and various types of link-building agents.
"Activation," the WAE form of energy, spreads through the network between nodes and links according to neural-net-like dynamics. Activation-spreading, in itself is goal-less and spontaneous, driven purely by the complex nonlinear dynamics of Peircean generalized association. But this isn't the whole story: a substantial percentage of system activity is goal-directed. A goal activates things that help achieve it, also recording as lasting knowledge information about which things helped achieve it. Thus activation spreading, in large part, works in the service of activating things that can help fulfill basic system goals.
All this is a general framework for learning and knowledge representation - basically, a flexible and extensible hybrid "semantic/neural network." In principle, this could be a mind in itself - but not a plausibly efficient one. Thus, there is a modular structure built on top of this general framework, involving special nodes, links and actions oriented toward special domains and special types of learning. But each module has only limited differentiation. It has its own parameters and its own distribution of node and link types, but it must use the same basic structures and dynamics as the other modules, as supplied by the core.
Finally, what exactly are WAE's goals? Just as humans are built to want to eat, have sex, drink, exercise, etc., so WAE is built to want to answer queries effectively. In order to assure this, each time it answers a query, it should receive an assessment of the perceived quality of its response. This may be explicit (e.g., the user may be asked), or it may be implicit (e.g., the client may calculate the number of times the human user needed to rephrase his question to get the answer desired). We all know how hard it is to guess what humans want -- what this means is that answering human questions is certainly a "complex goal." It's an adequately subtle goal to require all the modules to work together, utilizing the services of the mind OS cooperatively to build a diverse self-organizing actor system with emergent structures that are not only complex but adaptive.
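As a cartoon of this feedback loop (a hedged sketch: the class, the scoring rule and the node names used here are invented for illustration), an implicit quality signal might be derived from how often the user had to rephrase, and then used to reinforce whatever contributed to the answer:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative only: turn user behaviour into a quality signal and use it to
    // reinforce the importance of the items that contributed to an answer.
    class QueryFeedback {
        final Map<String, Double> importance = new HashMap<>();

        // Implicit quality estimate: fewer rephrasings => better answer.
        double implicitQuality(int rephraseCount) {
            return 1.0 / (1 + rephraseCount);
        }

        void reinforce(List<String> contributors, double quality, double learningRate) {
            for (String item : contributors) {
                double old = importance.getOrDefault(item, 0.5);
                importance.put(item, old + learningRate * (quality - old));
            }
        }

        public static void main(String[] args) {
            QueryFeedback fb = new QueryFeedback();
            double q = fb.implicitQuality(2); // the user rephrased twice
            fb.reinforce(List.of("MarketNode", "NewsSchema"), q, 0.1);
            System.out.println(fb.importance);
        }
    }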
Modules of Mind
The nodes and links in psycore are not uniform; they are of many different
types, and the different types are grouped into various modules.
The need for modularity is not surprising from a neurophysiological perspective.
The brain has hundreds of specialized parts devoted to tasks such as visual
perception, smell, language, episodic memory, and so forth. Each of these parts
is composed of neurons which share certain fundamental features, but each also
has its unique features and capabilities that scientists are only beginning to
understand. Similarly, when a WAE is running on a computer, different parts of
the computer's memory are assigned to different tasks. Each of these parts of
the computer's memory draws on the psycore for its basic organizational
framework, and on more specialized modules for advanced capabilities.
To support this dynamic specialization, the AI Engine is divided into modules, each one containing nodes and links pertaining to a certain kind of mental processing. Each of the modules, broadly speaking, is specialized for recognizing and forming a particular kind of pattern. And all the different kinds of nodes and links can learn from each other -- the real intelligence of the system lies here, in the dynamic knowledge that emerges from the interactions of different species of nodes and links. This is the essence of the AI Engine's mind, of how its patterns create and recognize patterns in themselves and the world to achieve their complex goals.
Here we'll give a quick laundry list of modules, without going into great detail on any of them. Each module contains various types of actors: nodes, links, wanderers, stimuli, and other lower-level actors that live inside nodes and links.
There's a numerics module, containing data processing actors that recognize patterns in tables of numbers, using a variety of algorithms, some standard, some innovative. DataNode embodies nonlinear data analysis methods, and it recognizes subtle patterns that are generally missed by ordinary data mining and financial analysis software.
There's a Natural Language Processing ("natlang") module, which deals with human language processing. Most simply, the natlang module represents texts as TextNodes, linking down to WordNodes representing words in the text, and other nodes and links representing linguistic feature structures, facts, concepts and ideas in the text. It has text processing actors that recognize key features and concepts in text, drawing relationships between texts and other texts, between texts and people, between texts and numerical data sets. These actors process vast amounts of text with a fair amount of understanding and a lot of speed.
On the other hand, the natlang module also contains reading actors, which are used to study important texts in detail. They proceed through each text slowly, applying semantic processing schema that build a mental model of the relationships in the text just like a human reader does. These reading actors really draw the AI Engine's full set of semantic relationships into play, every time they read a text. The nodes in the natural language module carry out both semantic and syntactic analysis. The NL system takes in text and parses it, and outputs the parsed text into nodes and links.
As important as language understanding is, however, it is not all-powerful. Relations learned through the natural language system are not intrinsically "understood" by the system -- they represent purely formal knowledge. The grounding module, on the other hand, contains schema that allow the system to derive knowledge directly from its environment. A particular case of grounding actors are textual-numerical correlation actors, which recognize patterns joining texts and numerical data files together. These are used by the Webmind Market Predictor when it finds the concepts in news that drive the financial markets.
The ingestion of linguistic as well as numerical data is mediated via the short term memory module. The most recent items read in through a given input stream are stored in short-term memory and the various intercombinations of these items are explicitly represented in a temporary way. This system is crucial, among other things, for disambiguation of linguistic terms.
There's a category module, containing actors that group other actors together according to measures of association, and form new nodes representing these groupings. This, remember, is a manifestation of the basic principle of the dual network.
There are learning actors, that recognize subtle patterns among other actors, and embody these as new actors. These span various modules, including the reason module, containing logical inference wanderers, that reason according to a form of probabilistic logic; and the automata module, containing AutomatonNodes that carry out evolutionary learning, according to genetic programming, a simulation of the way species reproduce and evolve.
In the user module there are actors that model users' minds, observing what users do, and recording and learning from this information - these are UserNodes and their associated Wanderers. There are actors that moderate specific interactions with users, such as conversations, or interactions on a graphical user interface. And in the self module there are self actors, wanderers and stimuli that help the SelfNode study its own structure and dynamics, and set and pursue its own goals.
There are QueryNodes in the query module, embodying schema that mediate interactions with human queriers. WAE's query processing is integrated with the rest of its mental activity, just as for a human being, question-answering is not so different from purely internal thought processing. When a query (be it a series of key words, a paragraph of natural language, or a series of commands requesting particular data operations) is entered into the system, a node is created for it, the query node sends out mobile actors, and these actors create new links joining it and other nodes. Activity related to the query node spreads through the Psynet, and after a certain period of time, the nodes with the highest activity relevant to this particular activation process are collected, and returned as the answers to the query. The distinction between activity due to a particular query and activity due to general Psynet thought processes or other queries is maintained via an innovative, proprietary technique of "parallel thought processes," which allows the system to do one thing the human mind cannot: carry out hundreds or thousands of simultaneous trains of thought, and keep them all straight!
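One way to picture the bookkeeping this requires - the actual "parallel thought processes" technique is proprietary and not described here, so the per-query tagging below is only a guessed stand-in - is to give every query its own activation map over the shared node set, so that many queries can spread activation through the same network without interfering, and the top-scoring nodes per query become its answer:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Guessed stand-in for per-query activation: every query keeps its own
    // activation map over the shared node set, so queries do not interfere.
    class QueryProcessor {
        final Map<String, Map<String, Double>> links = new HashMap<>(); // node -> (neighbour -> weight)
        final Map<String, Map<String, Double>> perQueryActivation = new HashMap<>();

        void link(String from, String to, double weight) {
            links.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
        }

        void startQuery(String queryId, List<String> matchedNodes) {
            Map<String, Double> act = new HashMap<>();
            for (String n : matchedNodes) act.put(n, 1.0); // seed activation at the matched nodes
            perQueryActivation.put(queryId, act);
        }

        void spreadOnce(String queryId) {
            Map<String, Double> act = perQueryActivation.get(queryId);
            Map<String, Double> next = new HashMap<>(act);
            for (Map.Entry<String, Double> e : act.entrySet()) {
                for (Map.Entry<String, Double> l : links.getOrDefault(e.getKey(), Map.of()).entrySet()) {
                    next.merge(l.getKey(), e.getValue() * l.getValue(), Double::sum);
                }
            }
            perQueryActivation.put(queryId, next);
        }

        // Return the most activated nodes for this particular query.
        List<String> answers(String queryId, int topK) {
            List<Map.Entry<String, Double>> entries =
                    new ArrayList<>(perQueryActivation.get(queryId).entrySet());
            entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
            List<String> result = new ArrayList<>();
            for (int i = 0; i < Math.min(topK, entries.size()); i++) result.add(entries.get(i).getKey());
            return result;
        }
    }

A real query would of course also create new nodes and links as it runs; the sketch only captures the activation bookkeeping and the final collection step.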
Each of the actors (nodes, links, mobile agents) involved in these modules has in itself only a small amount of intelligence, sometimes no more than what you might see in competing AI products. Psycore is a platform in which they can all work together, learning from each other and rebuilding each other, creating an intelligence in the whole that is vastly greater than the sum of the intelligences of the parts.
What the psycore provides is a common dynamic data structure for all these different specialized pattern recognition and formation schemes to work together on. It achieves its true value when the different specialized schemes actually work together in harmony, helping each other learn every step of the way. The emergent intelligence that you obtain in this way outweighs by far the mechanical inefficiency of using a common dynamic data structure.
Experiential Learning
So far we've mostly been talking about the Engine's internals - how various processes inside the system work, and how they interact, and so forth. But ultimately, it's crucial that the AI Engine is embedded in the outside world - the Net is not only its brain, it's its world. How does WAE experience? How does it learn from experience? Intelligence is all about achieving complex goals in complex environments, which means that sensing and understanding the environment is centrally important.
Experiencing and learning from experience uses all the mechanisms described above - language, reasoning and activation spreading, self and emergence, and on and on and on.
It also uses another level of structure: the division of nodes into function-specific node groupings. Most of the psynet running at any given time consists of what we can think of as a medium-term memory node group. But interacting with the world requires three specialized node groups in addition: one for perception, one for action, and one for short-term memory. The dynamics here is about exactly what you'd think: stuff comes in through perception, and goes out through action; and short-term memory is the gateway for most if not all interactions between perception/action and long-term memory.
The short term memory is host to many of the nodes constituting what we call
"WAE's Psyche": its goals, contexts, feelings and motivations; and
schema that it commonly uses to achieve certain goals in certain contexts,
including the goal of maximizing its own happiness, and the goal of maximizing
user happiness insofar as possible.
MindServers
The last ingredient of our AI architecture has emerged over the last 6 months as a result of our need to improve the space and time efficiency of the system. We realized that the psycore framework, with freely interpenetrating nodes and links of various types carrying out various processes, actually provided more generality than was required by 80% of all mental processes. Of course, the remaining 20% of mental processes are crucial - these are the smartest processes, the ones requiring "dynamic, free-flowing, focused attention." But we realized we had to rearchitect the system so that only the processes really requiring the full power of psycore are run in psycore, and the others are run using a host of more specialized and efficient software mechanisms called MindServers.
Thus we now have a centralized "Mind Database" containing complete information about all the mind objects (nodes and links) in the system. Most of the system's learning processes require only partial information about mind objects, or require information about only some mind objects rather than all of them. Thus, surrounding the Mind DB, we have a collection of specialized learning processors or "Mind Servers." Mind Servers come in two species:
* Psycore-based: These represent knowledge internally using nodes and links, but have specialized scheduling of processes.
* Non-psycore-based: These represent knowledge internally using domain-specific knowledge representations.
In either case, a Mind Server comes with a process that builds its own internal image of the knowledge from the Mind DB that it needs. Because of their special-purpose nature, the knowledge images inside non-core-based MindServers may be much more specialized than that inside the Mind DB, thus achieving greater space and time efficiency. Note that the whole framework is still based on the Webmind Core, which provides a general framework for networked software actors. The Webmind Core was originally built to support psycore, but now it also supports a richer "society of mind"-ish AI architecture with psycore at the center.
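A rough sketch of the relationship just described - the interface and method names are invented for illustration, since the actual Mind DB and MindServer APIs are not given here - is that each MindServer asks the central Mind DB for only the slice of knowledge it needs, converts that slice to its own internal image, and runs its specialized process over it:

    import java.util.List;

    // Invented interfaces, to illustrate the MindServer / Mind DB split.
    interface MindDB {
        // Return only the mind objects (here, serialized links) that a server asks for.
        List<String> fetchLinks(String linkType);
    }

    abstract class MindServer {
        protected final MindDB db;
        protected MindServer(MindDB db) { this.db = db; }

        // Each server builds its own, possibly much more compact, internal image
        // of the knowledge it needs, then runs its specialized learning process.
        abstract void refreshImage();
        abstract void runOnce();
    }

    // Example: an association-finding server that only cares about associative links.
    class AssociationServer extends MindServer {
        private List<String> associativeLinks = List.of();

        AssociationServer(MindDB db) { super(db); }

        @Override void refreshImage() {
            associativeLinks = db.fetchLinks("associative");
        }

        @Override void runOnce() {
            // ...find new associations in the local image, then write them back to the Mind DB...
            System.out.println("working on " + associativeLinks.size() + " associative links");
        }
    }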
There are many different MindServers; the following is a partial list:
* Context Formation
* Association-Finding
* Inference
* Higher-Order Inference
* Prediction
* Causal Inference
* Linguistic Feature Structure Learning
* Genetic Programming
Mind Servers can take care of most, but by no means all, mental processing. What they leave out is - precisely the crux of the mind! Because, for those aspects of intelligence that require constant real-time interpenetration of different kinds of learning, psycore is needed. One can think of the psycore portion of the system - still by far the largest part of the system - as the system's AttentionalFocus. Unlike the MindServers, it contains a general representation of knowledge allowing simultaneous use of all the types of knowledge contained in the Mind DB.
Finally, along with the MindServers, we have added specialized system lobes
corresponding to aspects of experiential interaction, as mentioned above.
Real-time conversation processing (in KNOW, our formal-logical knowledge
representation language, or English) is based on specialized lobes carrying out
STM, language comprehension, language production, query processing, and related
functions. Text processing uses a similar system.
Webworld
An even deeper level of background processing than is provided by the MindServers is provided by Webworld. Webworld is a software system that helps the AI Engine with speculative learning processes that may take a long time, but are ultimately needed for the system's increasing intelligence.
Webworld is a sister software system to the AI Engine, sharing some of the same codebase, but serving a complementary function. A Webworld lobe is a much lighter-weight version of a WAE lobe, which can live on a single-processor machine with a modest amount of RAM, and potentially a slow connection to other machines. Webworld lobes host actors just like WAE lobes, and they exchange actors and messages with other Webworld lobes and with AI Engines. AI Engines can dispatch non-real-time, non-data-intensive "background thinking" processes to Webworld, thus immensely enhancing the processing power at their disposal. Webworld is a key part of the Webmind Inc. vision of an intelligent Internet. It allows the AI Engine's intelligence to effectively colonize the entire Net, rather than remaining restricted to small clusters of sufficiently powerful machines.
In studying the practical applications of Webworld within the AI Engine, it becomes clear that there are two very separate use cases.
One is what we may call the high-bandwidth use case. In this case, we can assume we have a number of Webworld locations all of which have high-bandwidth access to a central Mind DB. This occurs, for example, when all the Webworld locations are on a single LAN, which also contains an AI Engine.
In this case, one can use Webworld locations to do pretty sophisticated things. For example, one can do schema evolution, where a population of schema resides in each location, and fitness evaluation of the schema involves inference, which gathers relevant data from the central Mind DB. GP operations, and inference operations, are carried out in the Webworld location, but the location must make frequent calls to the Mind DB to gather relevant links on which to perform inference.
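A skeletal version of this arrangement might look as follows. This is only a sketch under strong simplifications: real schema are program-like structures evolved by genetic programming, whereas here each candidate is just a vector of numbers, and the "inference" step is reduced to a remote call that fetches whatever data the fitness evaluation needs. The point is only to show the pattern of local evolution with frequent calls back to a central store.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    // Skeleton of "evolve locally, fetch fitness data remotely". Candidate schema are
    // simplified to parameter vectors; CentralStore stands in for the Mind DB.
    class RemoteFitnessEvolution {
        interface CentralStore { double[] relevantData(double[] candidate); }

        // Fitness evaluation needs data pulled from the central store (a remote call).
        static double fitness(double[] candidate, CentralStore store) {
            double[] data = store.relevantData(candidate);
            double score = 0;
            for (int i = 0; i < Math.min(candidate.length, data.length); i++)
                score -= Math.abs(candidate[i] - data[i]); // toy objective: match the fetched data
            return score;
        }

        static double[] evolve(CentralStore store, int dim, int popSize, int generations) {
            Random rnd = new Random(7);
            List<double[]> pop = new ArrayList<>();
            for (int i = 0; i < popSize; i++) {
                double[] c = new double[dim];
                for (int j = 0; j < dim; j++) c[j] = rnd.nextDouble();
                pop.add(c);
            }
            for (int g = 0; g < generations; g++) {
                // Sorting by fitness triggers many calls back to the central store.
                pop.sort(Comparator.comparingDouble((double[] c) -> -fitness(c, store)));
                List<double[]> next = new ArrayList<>(pop.subList(0, popSize / 2)); // keep the best half
                while (next.size() < popSize) { // refill with mutated copies of survivors
                    double[] child = next.get(rnd.nextInt(popSize / 2)).clone();
                    child[rnd.nextInt(dim)] += rnd.nextGaussian() * 0.1;
                    next.add(child);
                }
                pop = next;
            }
            return pop.get(0); // best candidate kept from the final sorted generation
        }

        public static void main(String[] args) {
            CentralStore store = candidate -> new double[] {0.3, 0.7}; // dummy stand-in for the Mind DB
            System.out.println(Arrays.toString(evolve(store, 2, 8, 20)));
        }
    }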
The other case is the low-bandwidth use case. In this case, we assume that each Webworld location must do its own processing without frequent access to externally-stored data. Appropriate tasks for this kind of Webworld location would be more mathematically-oriented problems like:
* GP-based parameter optimization (given a bunch of data from the time server
regarding the indicator values of the system under different parameter values,
find the best parameter values for achieving given indicator values).
* Plan optimization (taking plans created by the inference engine and making
them more compact and efficient, using graph theory algorithms)
* Creation of predictive models for system parameters (using ESP, etc., based on
data from the time server)
* Clustering for category node formation. (A collection of nodes are
characterized by feature vectors, which are exported to a Webworld location for
clustering using a standard algorithm, e.g. a Weka method.)
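To make the last item in this list concrete, here is a hedged sketch of the clustering task. In practice the vectors would likely be handed to a standard library clusterer (e.g. a Weka method, as suggested above); the toy k-means below just shows the shape of the job a low-bandwidth Webworld location would be given: feature vectors in, cluster assignments out, with the assignments shipped back to seed new category nodes.

    import java.util.Arrays;

    // Toy k-means over node feature vectors, standing in for the "clustering for
    // category node formation" task described above.
    class FeatureClusterer {

        static int nearest(double[] v, double[][] centroids) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centroids.length; c++) {
                double d = 0;
                for (int i = 0; i < v.length; i++) d += (v[i] - centroids[c][i]) * (v[i] - centroids[c][i]);
                if (d < bestDist) { bestDist = d; best = c; }
            }
            return best;
        }

        // Returns, for each input vector, the index of the cluster it was assigned to.
        static int[] cluster(double[][] vectors, int k, int iterations) {
            double[][] centroids = new double[k][];
            for (int c = 0; c < k; c++) centroids[c] = vectors[c].clone(); // naive init (assumes k <= #vectors)
            int[] assignment = new int[vectors.length];
            for (int it = 0; it < iterations; it++) {
                for (int i = 0; i < vectors.length; i++) assignment[i] = nearest(vectors[i], centroids);
                double[][] sums = new double[k][vectors[0].length];
                int[] counts = new int[k];
                for (int i = 0; i < vectors.length; i++) {
                    counts[assignment[i]]++;
                    for (int j = 0; j < vectors[i].length; j++) sums[assignment[i]][j] += vectors[i][j];
                }
                for (int c = 0; c < k; c++)
                    if (counts[c] > 0)
                        for (int j = 0; j < sums[c].length; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
            return assignment;
        }

        public static void main(String[] args) {
            double[][] vectors = { {1.0, 0.0}, {0.9, 0.1}, {0.0, 1.0}, {0.1, 0.8} };
            System.out.println(Arrays.toString(cluster(vectors, 2, 10))); // prints [0, 0, 1, 1]
        }
    }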
Any "deep AI" problems require background knowledge which is highly memory-intensive and hence not appropriate for a low-bandwidth Webworld situation.
Since many of the low-bandwidth-appropriate problems are optimization problems that can be approached using GP, and schema learning, a key high-bandwidth Webworld application, also relies on GP, the first priority with Webworld is to get distributed GP working as effectively as possible.
Summary
The architecture of the system has evolved over time in what seems to us a very natural way. We have made many discoveries that you just don't make until you actually start building and experimenting with an integrative, large-scale, self-organizing AI system. While we are sure that our learning curve is not over, we feel that our conceptual design and our system architecture are now both sufficiently complete to lead us to our goal of creating a program that can hold an intelligent conversation about itself and information it has read. We expect that our near future discoveries will consist largely of ways to make particular components more efficient, and intelligent "control schema" representing high-level habits of thought (affecting not the system structure but the system dynamics).
In review, the architecture consists of
* The Webmind Core, a general "distributed actor system" framework,
on top of which all other system components are implemented
* Psycore, a dynamic semantic network of flexibly-defined nodes and links
* An assemblage of AI modules, each containing specific types of nodes and links
devoted to a particular aspect of mental function: reason, NLP, GP, psyche, user
modeling, etc.
* An assemblage of MindServers, interacting via a common MindDB, each carrying
out a particular aspect of mental function in isolation, for cases where the
overhead of full inter-process interpenetration is not needed
* A real-time conversation-processing subsystem, carrying out language
comprehension and generation and short-term memory, based on a specialized
variant of psycore
* Webworld, a peer-to-peer distributed processing framework to which psycore
nodes and MindServers can dispatch difficult long-term problems for solution
This seems to us essentially the only possible way to induce contemporary hardware and software to give rise to the self-organizing emergent network of patterns that is the mind.
In terms of practical applications of the system, it seems that the first
wave of real-AI-empowered software applications will not use the AI Engine
directly, but will rather use rules of various sorts exported from the AI
Engine. This is because the AI Engine itself uses a lot of processing power, so
that even if a working intelligent computer conversationalist is created, it may
only be able to talk to one person at a time, and may require hundreds of
thousands of dollars worth of hardware. There may be a limited market for
"intelligent digital gurus" like this as research tools, but for the
mass market, in the short term, the key will be the exportation of rules and
other data structures that simpler, less computationally costly software systems
can use to dramatically enhance their performance.
I
Conceptual Background
2
A Brief and Biased History of AI
Ben Goertzel
with some help from Ted Goertzel
This chapter began as part of a popularization about the AI Engine that my
father Ted Goertzel and I were writing in 1998. We decided to put the
popularization on hold until the system itself was ready to be launched into the
world in a sufficiently impressive way. Maybe this year!
1. Competitors in the Race to Real AI (and the Lack Thereof)
Given that there exists a major subdiscipline of computer science called "Artificial Intelligence," one might expect there to be many strong competitors in the race to create a real AI system. But in fact this is not the case. Most of the field of AI is not directly concerned with the design and engineering of real AI systems at all, but rather with various narrowly defined subproblems of the problem of creating digital intelligence. The presupposition of this work is that solving these subproblems, in isolation, contributes significantly toward solving the overall problem of creating real AI. While this is of course true to a certain extent, our experience with the AI Engine suggests that it is not so true as is commonly believed. In many cases, the best approach to implementing an aspect of mind in isolation, is very different from the best way to implement this same aspect of mind in the framework of an integrated, self-organizing AI system.
Quite honestly, at the present time we don't consider there to be any very serious competitors in the race toward real AI. Without being too egomaniacal about it, there is simply no evidence that anyone else has a serious and comprehensive design for a digital mind. However we do realize that there is bound to be more than one approach to creating real AI, and we are always open to learning from the experiences of other teams with similar ambitious goals.
Perhaps the closest thing we have to a real competitor on the real AI front is Artificial Intelligence Enterprises (www.a-i.com), a small Israeli company whose engineering group is run by Jason Hutchens, a former colleague of mine from the University of Western Australia in Perth. They are a direct competitor in that they are seeking to create a conversational AI system somewhat similar to the Webmind Conversation Engine. However, they have a very small team and are focusing on statistical-learning-based language comprehension and generation rather than on deep cognition, semantics, and so forth. If this team were given a significant infusion of more diverse AI expertise, and many more engineers, it could become a serious threat. At best, however, they would be 3-5 years behind us, as their initial design is certainly no more advanced than ours was in 1997.
Another project that competes less directly is Katsunori Shimohara and Hugo de Garis's Artificial Brain project, initiated at ATR in Japan (see http://citeseer.nj.nec.com/1572.html) and continued at Starlab in Brussels, and Genotype Inc. in Boulder, Colorado. This is an attempt to create a hardware platform (the CBM, or CAM-Brain Machine) for real AI using Field-Programmable Gate Arrays to implement genetic programming evolution of neural networks. We view this fascinating work as somewhat similar to the work on the Connection Machine undertaken at Danny Hillis's Thinking Machines Corp. - the focus is on the hardware platform, and there is not a well-articulated understanding of how to use this hardware platform to give rise to real intelligence. It is highly possible that the CBM could be used inside WAE, as a special-purpose genetic programming MindServer; but CBM and the conceptual framework underlying it appear not to be adequate to support the full diversity of processing needed to create an artificial mind.
A project that once would have appeared to be competitive with ours, but changed its goals well before Webmind Inc. was formed, is the well-known CYC project (www.cyc.com). This began as an attempt to create true AI by encoding all common sense knowledge in first-order predicate logic. They produced a somewhat useful knowledge database and a fairly ordinary inference engine, but appear to have no R&D program aimed at creating autonomous, creative interactive intelligence.
Another previous contender who has abandoned the race for true AI is Danny Hillis, founder of the company Thinking Machines, Inc. This firm focused on the creation of an adequate hardware platform for building real artificial intelligence - a massively parallel, quasi-brain-like machine called the Connection Machine (Hillis, 1987). However, their pioneering hardware work was not matched with a systematic effort to implement a truly intelligent program embodying all the aspects of the mind. The magnificent hardware design vision was not correlated with an equally grand and detailed mind design vision. And at this point, of course, the Connection Machine hardware has been rendered obsolete by developments in conventional computer hardware and network computing.
On the other hand, the well-known Cog project at MIT is aiming toward building real AI in the long run, but their path to real AI involves gradually building up to cognition after first getting animal-like perception and action to work via "subsumption architecture robotics." This approach might eventually yield success, but only after decades.
Of course, there are hundreds of other AI engineering projects in place at various universities and companies throughout the world, but nearly all of these involve building specialized AI systems restricted to one aspect of the mind, rather than creating an overall intelligent system.
Why is the field of AI this way? Why are there so few projects directly aimed at actually creating an autonomous, creative, digital intelligence?
I believe that, ultimately, the main culprit in the history of AI has been the lack of adequate hardware. It seems to me that hardly anybody has ever seriously tried to build a whole mind -- a computer system that can observe the world around it, act in the world around it, remember information, recognize patterns in the world and in itself, and create new structures inside itself in order to help it better achieve its goals. Presumably no one has tried to do this because the computer resources available have always been blatantly inadequate to support such a program. It seems to me that, lacking the computer resources to build a whole mind, researchers have typically focused on one or another particular aspect of the mind, and tried to push this aspect as far as it could go. Lacking a perceptual environment to embed their AI systems in, researchers have built reasoning and memory programs with essentially no perceptual systems; programs that act entirely on the basis of logical rules, with no direct sensory link to the world - and have theorized that something substantial about mind can be learned this way. Lacking the ability to create neural nets with a billion neurons, researchers have proposed that neural nets with 10,000 neurons can be made to do something mind-like. Lacking the ability to build specialized hardware supporting all aspects of intelligence, one builds specialized hardware supporting one particular aspect of intelligence (say, evolution of neural networks according to training data) and asks how far this one aspect can be pushed. And so on.
To build a comprehensive system, with perception, action, memory, and the ability to conceive of new ideas and to study itself, is not a simple thing. Necessarily, such a system consumes a lot of computer memory and processing power, and is difficult to program and debug because each of its parts gains its meaning largely from its interaction with the other parts. Yet, is this not the only approach that can possibly succeed at achieving the goal of a real thinking machine?
We now have, for the first time, hardware barely adequate to support a comprehensive AI system. Moore's law and the advance of high-bandwidth networking mean that the situation is going to keep getting better and better. However, we are stuck with a body of AI theory that has excessively adapted itself to the era of weak computers, and that is consequently divided into a set of narrow perspectives, each focusing on a particular aspect of the mind. In order to make real AI work, I believe, we need to take an integrative perspective, focusing on
* The creation of a "mind OS" that embodies the basic nature of
mind, and allows specialized mind structures and algorithms dealing with
specialized aspects of mind to happily coexist
* The implementation of a diversity of mind structures and algorithms
("mind modules") on top of this mind OS
* The encouragement of emergence among these specialized modules, so that the
system as a whole is coherently responsive to its goals
This - together with a precise plan for how the Mind OS should work, based on a study of the philosophy of mind -- is the core of the Webmind vision.
In the remainder of this chapter, I'll briefly review some large-scale trends
in previous AI work, with a focus on how they fit into this picture of the
history of AI, and how they contribute to the WAE. Then, in the next chapter,
I'll start over from scratch, and begin with the philosophy of mind that leads
to the WAE - that tells us how to build the "mind OS" and encourage
the emergence that I believe real AI demands.
2. Nets versus Rules
When I first started studying AI in the mid-1980's, it seemed that AI researchers were fairly clearly divided into two camps, the neural net camp and the logic-based or rule-based camp. This isn't quite so true anymore, but in reviewing the history of AI, it's an interesting place to start. Both of these camps wanted to make AI by simulating human intelligence, but they focused on very different aspects of human intelligence. One modeled the brain, the other modeled the mind.
The neural net approach starts with neurons, the nerve cells the brain is made of. It tries to simulate the ways in which these cells are linked together, and in which they achieve cooperative behaviors by nonlinearly spreading electricity among each other, and modulating each other's chemical properties. Its conceptual roots go back to Norbert Wiener's book "Cybernetics: Or Control and Communication in the Animal and the Machine," published in 1948, an amazing book for its time, in which it was shown for the first time that the same mathematical principles could be used to understand both man-made electrical control systems and biological systems like bodies and brains.
Rule-based models, on the other hand, try to simulate the mind's ability to make logical, rational decisions, without asking how the brain does this biologically. They trace back to a century of revolutionary developments in mathematical logic, culminating in the realization that Leibniz's dream of a complete logical formalization of all knowledge is actually achievable in principle, although very difficult in practice.
To most any observer not caught up on one or another side of the debate, it's obvious that both of these ways of looking at the mind are extremely limited. True intelligence requires more than following carefully defined rules, and it also requires more than random links between a few thousand artificial neurons. The WAE incorporates aspects of neural nets and also of logic-based AI, although it doesn't use either one in a conventional way.
Neural Nets
The landmark work that sticks out as the start of neural network theory was the work of cyberneticists Warren McCulloch and Walter Pitts in the early 1940's, along with the psychological theories Donald Hebb published in his 1949 book The Organization of Behavior. These early researchers created the "neural net view of the mind." In this view, the stuff of mind is patterns of electrical flow among neurons. Collections of tightly interlinked neurons - what Hebb called "cell assemblies" -- lead to distinct, repeatable patterns of flow. Electrical charge passes through neural networks according to nonlinear threshold rules, and on a slower time scale, modifies the properties of the synaptic connections making up the networks - the links between neurons. Via synaptic modification (Hebbian learning), distinct, repeatable patterns of flow create collections of tightly interlinked neurons. Everything is patterns of connection, patterns of electrical flow.
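As a toy illustration of the two time scales described above -- fast threshold-governed firing, slower Hebbian modification of synaptic strength -- here is a tiny Java fragment that strengthens a single connection whenever the two units it joins fire together. The numerical constants are arbitrary and illustrative, not taken from any particular neural model.

    // Toy two-neuron Hebbian update: units that fire together get more strongly wired together.
    class HebbianSynapse {
        double weight = 0.1;              // synaptic strength (illustrative initial value)
        static final double RATE = 0.05;  // learning rate
        static final double THRESHOLD = 1.0;

        // Fast dynamics: a neuron fires if its summed input exceeds the threshold.
        static boolean fires(double summedInput) {
            return summedInput >= THRESHOLD;
        }

        // Slow dynamics: strengthen the connection when both neurons fire in the same step.
        void hebbianUpdate(double preInput, double postInput) {
            if (fires(preInput) && fires(postInput)) {
                weight += RATE * (1.0 - weight);  // bounded growth toward 1.0
            }
        }
    }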
What McCulloch and Pitts did in their first research paper on neural networks was to prove that a simple neural network model could do anything - could serve as a "universal computer." At this stage computers were still basically an idea, but Alan Turing and others had worked out the theory of how computers could work, and McCulloch and Pitts' work, right there at the beginning, linked these ideas with brain science.
The overall behavior of a neural network is given by the pattern of interconnections between the various neurons, and by the values of weights that are assigned to the various connections. But, given the nonlinear nature of the threshold dynamics, there's no easy mathematical way to tell what kind of behavior a certain interconnection pattern and weight distribution are going to lead to. There's no easy mathematical way to tell how the weights should adapt themselves to provide learning, and it's not so easy to figure out how the brain does it - neuroscientists still don't know.
Modern work on neural nets falls basically into two camps. There are attractor neural nets, which are complex dynamical systems, used as associative memories and problem-solving algorithms. And, much more popularly, there are feedforward or "backprop" neural networks, which learn patterns relating inputs to outputs. These are used in various practical applications, ranging from financial prediction to automobiles' on-board diagnostic computers. In the WAE we use different methods to recognize input/output patterns, but mostly for convenience - we could use feedforward neural nets if we wanted to; we've experimented with them a bit but have found other similar things that are less brainlike but more efficient. In my view, the use of neurons in this way doesn't really embody any of the essence of the brain's intelligence - there are other ways of doing the same thing that don't look like the brain but work a little bit better.
Real-world neural net engineering gets quite complex. For instance, to get optimal performance for OCR, instead of one neural net, researchers have constructed modular nets, with numerous subnetworks. Each subnetwork learns something very specific via backpropagation, then the subnetworks are linked together into an overall meta-network. One can train a single network to recognize a given feature of a character -- say a descender, or an ascender coupled with a great deal of whitespace, or a collection of letters with little whitespace and no ascenders or descenders. But it is hard to train a single network to do several different things -- say, to recognize letters with ascenders only, letters with descenders only, letters with both ascenders and descenders, and letters with neither. Thus, instead of one large network, it pays to break things up into a collection of smaller networks in a hierarchical architecture. If the network learned how to break itself up into smaller pieces, one would have a very impressive system; but currently this is not the case: the subnets are carefully engineered by humans.
The WAE has "nodes" that are a bit like neurons - they have a threshold rule in them - and "links" that are a bit like synapses - connections between neurons - in the brain. But the WAE's nodes have a lot more information in them than neurons, they more closely represent huge groups of neurons, as will be made clear in Chapter 2. And the WAE's links have more to them than the links in neural net models - they're not just conduits for simulated electricity; they have specific meanings and are formed by specialized actors that recognize these meanings. In short, like backprop neural nets, the WAE takes the brain as an inspiration in some ways, but does not attempt to model the brain. But while backprop neural nets use the brain as an inspiration for how to map inputs into outputs, the WAE takes it as an inspiration for how to construct a whole mind.
Rules
Rule-based AI programs aren't based on self-organizing networks of autonomous elements like neurons or nodes, but rather on systems of simple logical rules. Intelligence is reduced to following orders. You don't try to deal with the emergence of mind from brain - you just try to look at what mind does, and write the simplest possible programs that will emulate this behavior. The approach is refreshingly direct, and teaches us a lot about the complexity of human behaviors that we take for granted and consider very, very simple - like holding a conversation or solving a puzzle or even walking across the room. Much of what it teaches us, however, is that it's really hard to boil down intelligent behaviors into sets of rules - the sets of rules are huge and variegated, and the crux of intelligence becomes the dynamic learning of rules rather than the particular rules themselves.
One famous early rule-based program was something called the General Problem Solver, which was not that general at all, but was capable of solving a variety of simple puzzles, for example cryptarithmetic puzzles like DONALD + GERALD = ROBERT. [To solve this, assign a number to each letter so that the equation comes out correctly.] What GPS was doing was taking an overall goal - solving a puzzle - and breaking it down into subgoals. It then tried to solve the subgoals, breaking them down into subgoals if necessary, until it got subgoals small enough that it could deal with them in some direct way, like by enumerating all possible values some letter could take in a cryptarithmetic puzzle. This same basic logic is used now in much bigger and better rule-based AI programs, for example SOAR, originated by Allen Newell and still the subject of ongoing development by his students and colleagues.
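To show just how "direct" that fallback enumeration can be, here is a small self-contained Java program that solves DONALD + GERALD = ROBERT by brute-force digit assignment. It is not GPS and does no goal decomposition; it only illustrates the kind of exhaustive subproblem a planner like GPS bottoms out in. (Run as-is, it finds 526485 + 197485 = 723970.)

    import java.util.*;

    // Brute-force cryptarithmetic: DONALD + GERALD = ROBERT.
    public class Cryptarithmetic {
        static final char[] LETTERS = {'D','O','N','A','L','G','E','R','B','T'};
        static final Map<Character, Integer> assignment = new HashMap<>();
        static final boolean[] usedDigit = new boolean[10];

        public static void main(String[] args) {
            solve(0);
        }

        // Recursively try every assignment of distinct digits to the ten letters.
        static void solve(int index) {
            if (index == LETTERS.length) {
                check();
                return;
            }
            for (int digit = 0; digit <= 9; digit++) {
                if (usedDigit[digit]) continue;
                usedDigit[digit] = true;
                assignment.put(LETTERS[index], digit);
                solve(index + 1);
                assignment.remove(LETTERS[index]);
                usedDigit[digit] = false;
            }
        }

        static void check() {
            // Leading letters may not be zero.
            if (assignment.get('D') == 0 || assignment.get('G') == 0 || assignment.get('R') == 0) return;
            long donald = value("DONALD"), gerald = value("GERALD"), robert = value("ROBERT");
            if (donald + gerald == robert)
                System.out.println(donald + " + " + gerald + " = " + robert);
        }

        static long value(String word) {
            long v = 0;
            for (char c : word.toCharArray()) v = v * 10 + assignment.get(c);
            return v;
        }
    }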
This business of goal and subgoals is important to the WAE - we have something called a GoalNode, and we have processes called schema that can break goals contained in GoalNodes into subgoals. The basic algorithm of GPS and SOAR is something that's necessary for the mind, but it doesn't have to be done in as rigid a way as these programs do it. In fact, doing it in such a rigid way is tremendously destructive. But to make it flexible, you need the goal and subgoal management part of the mind to interact with the other parts of the mind. The system has to be able to flexibly determine which of its processes are effective for achieving which of its goals in what contexts - and for this it needs reasoning and association-finding and long-term memory. And it has to be able to use context-dependent, intuitive reasoning to figure out what goals to split into subgoals in what way in what situation. Basically GPS and SOAR and this whole line of AI research are a result of taking one aspect of the mind - goal-directed, problem-solving behavior - and extracting it from the rest of the mind. Unfortunately, when you extract it from the rest of the mind, this aspect of thinking isn't all that useful, because it has no way to control itself in a context-dependent way.
Another famous rule-based AI program was BACON, which was basically a data mining tool for extracting algebraic patterns from scientific data. Herbert Simon once claimed that a four-to-five hour run of BACON corresponds to "not more than one human scientific lifetime." Douglas Hofstadter, in his book Metamagical Themas, suggested that one run of BACON actually corresponds to about one second of a human scientist's life work. I think that Hofstadter's estimate, though perhaps a little skimpy, is much closer to the mark. Only a very small percentage of scientific work is composed of BACON-style data crunching.
In the WAE, we actually use stuff like BACON - though vastly more sophisticated. We call this aspect of the WAE's thinking "machine learning" or "data mining" - it's discussed in detail in Chapters 8 and 11. Recognizing patterns in vast amounts of data is a very important part of the mind, but it's only part of the mind. The WAE learns rules explaining why humans like some messages or e-mails better than others, using methods not that different from BACON's. But, the real trick there is in mapping the messages or e-mails into numbers that data mining methods can deal with. This involves understanding the meanings of various words and phrases and expressions. Also, there's the matter of deciding what data to look at, which is done by the general association-finding mechanisms in the WAE's mind. And there's reasoning which brings general background knowledge into the process, as opposed to pure data mining which is just pattern-finding. Bringing associations and reasoning into the picture, you need long-term memory, which opens a whole big and beautiful can of worms. Pattern finding is crucial, but it's only a little part of the picture.
Rule-based AI - "symbolic" AI -- has had plenty of practical successes. But every single one of these successes has resulted from specialized tricks, rather than flexible intelligence. One term for this is "brittleness." Or, you could call it remarkable literal-mindedness. These programs are a lot like WordPerfect, DOS 6.0, or a pocket calculator -- they do what they're told, and not one little bit more. If they're programmed to deal with one context, then that's what they'll deal with; not in a million years will they generalize their knowledge to something totally different.
There was one famous program that contained logical definitions of everyday words. An "arch" was defined as "Three blocks, A, B and C, so that C is supported by A and B, and A and B do not touch." This is all very well for playing with blocks -- but what will the program do when it gets to Arches National Park in Utah ... or builds arches out of modeling clay? On the other hand, show a clever three-year-old human an arch made of blocks, and she'll immediately recognize a rock arch as a member of the "arch" category. It won't occur to her that a rock arch can't be naturally decomposed into three blocks A, B and C. Children, unlike expensive research computers, are anything but brittle -- even their bones are flexible!
Some people have tried to get around the brittleness problem by providing the computer with so much information that it could answer any possible contingency. The most ambitious project in this direction was Doug Lenat's Cyc project, mentioned above, which has been going since 1984. Cyc is focused on trying to build a program with common sense. The Cyc team is mainly focused on encoding millions of items of data, so that the program can know everything an eight-year-old kid knows. "Cyc" was originally short for "Encyclopedia," but they found that the knowledge they needed was quite different from that found in encyclopedias. It was everyday knowledge you could get by asking a small child, perhaps more like that in a dictionary. For example, the Cyc definition of "skin" goes like this:
"A (piece of) skin serves as outer protective and tactile sensory covering for (part of) an animal's body. This is the collection of all pieces of skin. Some examples include #$TheGoldenFleece (representing an entire skin of an animal) and (#$BodyPartFn #$YulBrynner #$Scalp) (representing a small portion of his skin).
The Cyc definition of happiness is:
The enjoyment of pleasurable satisfaction that goes with well-being, security, effective accomplishments, or satisfied wishes. As with all #$FeelingAttributeTypes, this is a #$Collection -- the set of all possible amounts of happiness one can feel. One instance of #$Happiness is `extremely happy'; another is `just a little bit happy'.
This is an attempt to solve the common sense problem that we see when playing with chat bots like ELIZA - these chat bots have no common sense, they have no idea what words mean. Cyc is based on getting humans to tell computers what words mean.
It's interesting stuff, but you have to ask: how much do the logical definitions in Cyc really overlap with the kind of information contained in the mind of an eight-year-old child? We humans aren't even explicitly aware of much of the information we use to make sense of the world. A human's notion of happiness or skin is much bigger, more disorderly and messier than these definitions. These kinds of general abstract definitions may be inferred in the human mind from a whole lot of smaller-scale, practical patterns recognized involving skin and happiness, but they're not the be-all and end-all. In dealing with most practical situations involving skin and happiness, we don't refer to this kind of abstraction at all, but rather use the more specialized patterns that the general conclusions were derived from.
Basically, Cyc tried to divorce information from learning, but it can't be done. A mind can only make intelligent use of information that it has figured out for itself. Despite sixteen years of programming, Cyc never succeeded in emulating an eight-year-old child. Nor has anyone yet found much use for a CD-ROM full of formal, logical definitions of common sense information.
In fairness to Doug Lenat, I must say that he is now working from a computational-psychology perspective that has something in common with my approach. He has a reasonably solid theory of general heuristics -- problem-solving rules that are abstract enough to apply to any context whatsoever -- and his company Cycorp is in some limited ways a competitor to Webmind Inc., developing intelligent text analysis techniques. His pre-Cyc programs AM and EURISKO applied his general heuristics to mathematics and science respectively. Both of these programs were moderately successful, exemplars in their field, but far from true intelligence. They lacked a holistic view of the mind. Getting the problem-solving rules right means virtually nothing, because problem-solving rules gain their psychological meaning from their interaction with other parts of the mind. If the other parts aren't even there, the problem solving is bound to be sterile.
EURISKO won a naval fleet design contest two years in a row, until the rules were changed to prohibit computer programs from entering. And it also received a patent for designing a three-dimensional semiconductor junction. But when looked at carefully, even EURISKO's triumphs appear simplistic and mechanical. Consider EURISKO's most impressive achievement, the 3-D semiconductor junction. The novelty here is that the two logic functions "Not both A and B" and "A or B" are both done by the same junction, the same device. One could build a 3-D computer by appropriately arranging a bunch of these junctions in a cube.
But how did EURISKO make this invention? The crucial step was to apply the following general-purpose heuristic: "When you have a structure which depends on two different things, X and Y, try making X and Y the same thing." The discovery, albeit an interesting one, came right out of the heuristic. This is a far cry from the systematic intuition of a talented human inventor, which synthesizes dozens of different heuristics in a complex, situation-appropriate way.
By way of contrast, think about the Croatian inventor Nikola Tesla, probably the greatest inventor in recent history, who developed a collection of highly idiosyncratic thought processes for analyzing electricity. These led him to a steady stream of brilliant inventions, from alternating current to radio to robotic control. But not one of his inventions can be traced to a single "rule" or "heuristic." Each stemmed from far more subtle intuitive processes, such as the visualization of magnetic field lines, and the physical metaphor of electricity as a fluid. And each involved the simultaneous conception of many interdependent components.
EURISKO may have good general-purpose heuristics, but what it lacks is
the ability to create its own specific-context heuristics based on
everyday life experience. And this is precisely because it has no everyday life
experience: no experience of human life, and no autonomously-discovered,
body-centered digital life either. It has no experience with fluids, so it
will never decide that electricity is like a fluid. It has never played with
blocks or repaired a bicycle or prepared an elaborate meal, nor has it
experienced anything analogous in its digital realm ... so it has no experience
with building complex structures out of multiple interlocking parts, and it will
never understand what is involved in this. EURISKO pushes the envelope of
rule-based AI; it is just about as flexible as a rule-based program
can ever get. But it is not flexible enough. In order to get programs capable of
context-dependent learning, it seems to be necessary to write programs
which self-organize -- if not exactly as the brain does, then
at least as drastically as the brain does.
Beyond the Nets versus Rules Dichotomy
"Nets versus rules" is an adequate way to view the history of AI from 30,000 feet, but of course, this perspective comes nowhere near to getting across the wild diversity of innovation you find by looking at the papers of individual researchers, including those way out of the mainstream. Here I will mention just three random bits of AI work which have had a particular influence on the WAE.
There was the work of John Andreae at the University of Canterbury in Christchurch, New Zealand. He wrote a nice little system called PURR-PUSS which learned to interact with you statistically. One of his students was John Cleary, who was one of the machine learning gurus at Waikato University in Hamilton, New Zealand, where I taught for a year. John has been working for Webmind Inc. on our machine learning module for over a year now, and he recommended a few of his best students to us, who now make up our Hamilton office. We're not exactly emulating PURR-PUSS in the WAE, but the statistical learning methods that it embodied are there in our machine learning module and our reasoning system, and the emphasis on interactive learning that Andreae advocated lives on in our current "Baby Webmind" project.
Then there is the idea of genetic algorithms - doing AI by simulating evolution rather than the brain. There are papers on the topic going back to the late 60's, but until the early 1990's this area of research was still extremely obscure. By the mid-90's it was a well-recognized area of computer science and I was doing research into the mathematics of genetic algorithms, studying questions such as "Why is evolution involving sexual reproduction more efficient than evolution involving asexual reproduction only?" Although the details are different, evolutionary AI is similar in spirit to neural net AI - you're dealing with a complex, self-organizing system that gives results in a holistic way, where each part of the system doesn't necessarily have a meaning in itself but only in the context of the behavior of the whole. In The Evolving Mind, I wrote a bit about the relation between evolutionary programming in AI and Edelman's theories of evolution in the brain. It turns out you can model the brain as an evolutionary system, with special constraints that make it a bit different from evolving ecosystems or genetic algorithms in AI. We have an evolution module in the WAE, which is used for two things: as one among many machine learning methods for finding patterns in data (along with feedforward neural nets and purely statistical methods); and as one among two ways of learning schema for perceiving and acting (the other being probabilistic logical inference).
Finally there is the notion of a Multi-Agent System. The AI Engine is indeed
a multi-agent system, but, it differs in some important ways from the MAS's
typically studied.
In most multi-agent software systems, system control is de-centralized, and a
common metaphor is an economy model. In those models, the agents themselves are
responsible for most of the control. They bid for resources from their
environment. They decide when to move to a new location. In some cases, they
even have permission to alter the environment so their needs will be better
fulfilled.
for overall coordination of system functions. Most multi-agent software systems
are more like a society than a mind. The WAE mixes up central control with
decentralized, creative dynamics, which makes optimizing its parameters and
controlling its behavior inordinately complex.
In the WAE, rather than being done in a purely decentralized way, control is largely decentralized on a high level (between major system components), but centralized within each of these components (which may themselves contain thousands to billions of semi-autonomous actors). Control between and within high-level components is done by specialized Actors in the system, which measure the system's behavior and, according to high-level goals given by the system's cognitive processes, drive the system towards states that allow better performance under the necessary constraints of memory and processor usage, as well as attention restriction.
This is a major philosophical point, but it's one that we feel confident
we've handled correctly. The important point is that Marvin Minsky's metaphor of
a "Society of Mind" is not actually correct, though it's evocative and
useful in some ways. A mind is not a society, although it has many society-like
aspects: as compared to a society, it's much more coherently focused in its
behavior. Some of this coherent focus is emergent, but one of the factors that
allows such focus to emerge is a modicum of overall system control. In the
brain, there are many examples of this kind of overall control, including
various hormonal systems affecting neurochemical activity, and older, more
primitive parts of the brain that control our basic motivations and their
manifestations in everyday behavior. In the WAE, centralized overall control is
provided by the Homeostatic Controller, which regulates parameters (more like a
hormonal system), and the Attention Broker, which regulates heavyweight tasks
(more like the primitive brain, choosing between actions according to current
motivations).
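As a rough illustration of the kind of centralized regulation just described, the sketch below shows a homeostatic feedback loop that nudges a controlled parameter whenever a measured quantity drifts out of a target band. The class name, the parameter, and the measurement are hypothetical; the actual Homeostatic Controller is of course more elaborate.

    // Illustrative homeostatic regulation loop: keep a measured quantity near a target band.
    class HomeostaticRegulator {
        private final double low, high;   // acceptable band for the measured quantity
        private final double step;        // how aggressively to adjust the controlled parameter

        HomeostaticRegulator(double low, double high, double step) {
            this.low = low; this.high = high; this.step = step;
        }

        // Called periodically with a system measurement (say, fraction of memory in use)
        // and the current value of a parameter that influences it (say, a node-forgetting rate).
        double regulate(double measurement, double parameter) {
            if (measurement > high) return parameter + step;   // too much load: forget faster
            if (measurement < low)  return parameter - step;   // plenty of headroom: forget slower
            return parameter;                                   // inside the band: leave it alone
        }
    }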
3. The Importance of Embodiment
When neural nets were being dissed in the early 70's, not everyone was optimistic about the potentials of rule-based AI. In 1972, an era in which ELIZA was receiving a lot of attention, a philosopher named Hubert Dreyfus wrote a book called What Computers Can't Do, which was a vicious attack on AI. Dreyfus argued that artificial intelligence researchers were fundamentally misguided, that they could never achieve their objectives with the methods they were using. Dreyfus preached the importance of body-centered learning, and the close connection between logic, emotion and intuition. Without a body, Dreyfus argued, without feelings, there can be no real generalization of special-case ideas. Based on these philosophical considerations, he predicted that AI would be a failure.
In 1992, Dreyfus re-released the book with the title What Computers Still Can't Do. The new Introduction brims over with insolence. But his exultant crowing is not quite persuasive. He was right about the limitations of the AI programs of the 1960s and 1970s. But the observers who thought it was just a matter of time and resources have also been proven correct in many cases. Dreyfus, for example, ridiculed a late-1960's prediction by Frank Rosenblatt that computers would soon be able to take dictation, just as a human secretary can. Although this prediction didn't come true as quickly as Rosenblatt had thought, some fairly good programs are available today for this purpose, relying in large part on a neural net architecture to learn each user's speech patterns.
Dreyfus's critique of AI, in the first edition, was too strong. He appeared to believe that detailed simulation of the human body was the only possible path to AI, and he argued that this would be impossible without simulating the biology of the brain and the rest of the body. Actually, the human brain is only one intelligent system, and a great deal can be accomplished without replicating the details of its biology. But Dreyfus's arguments posed a serious challenge to AI theorists: how to design a machine that can simulate body-based, emotion-based conceptual generalization? I believe that Dreyfus was essentially correct that, if this is impossible, AI cannot work. A physical body just like ours is not required: an AI entity could have a virtual body, enabling it to interact in a rich and autonomous way with a virtual world. And emotions need not be controlled by biological neurotransmitters, they can come out of complex digital dynamics. But the point is, unless one has a computing system that is large, complex and autonomous, with integrated sensory, memory and action systems interacting with a rich environment and forming a self system, it will never develop the ability to generalize from one domain to another. The ability to generalize is learned through general experience, and general experience is gained by exploring a world.
In designing the WAE I took Dreyfus's critique to heart. Of course, I didn't
try to replicate the human body as he thought was necessary. Instead, I bypassed
his critique by designing a huge, self-organizing system, which lives in the
perceptual world of the Internet and understands that its body is made up of
Java objects living in the RAM of certain machines. It is a nonhuman, embodied
social actor. Dreyfus didn't try very hard to imagine an embodied, social
intelligence without a human-like body, but, his ideas certainly leave room for
such a thing. His problem was not with AI but with the attempt to build a mind
that operates in a vacuum, instead of synergistically with a self and a world.
4. Toward a Middle Way
I've presented a dichotomy between symbolic and connectionist AI - rule-based and neural-net AI. I've pointed out that a lot of cool AI doesn't fit into this framework at all, things like statistical machine learning and genetic algorithms. Now I'm going to dig my hole even deeper by arguing that the distinction between symbolic and connectionist AI is actually a lot fuzzier than most AI gurus realize.
This is a key issue because I often like to say that the WAE synthesizes connectionist and symbolic AI. While this is a true statement, it glosses over the peculiar vagueness of the notions of "symbolic" and "connectionist" themselves. When you get deeply into these concepts, you realize that this classical dichotomy is not quite framed correctly in most discussions on AI. There is a valid distinction between AI that is inspired by the brain, and AI that is inspired by conscious reasoning and problem-solving behavior. But the distinction between "symbolic" and "connectionist" knowledge representation is not as clear as it's usually thought to be.
Classically, the distinction is that in a symbolic system, meanings of concepts are kept in special localized data structures like rules, whereas in a neural-net-like connectionist system, meanings of concepts are distributed throughout the network. Also, in a symbolic system the dynamics of the system can be easily understood in terms of what individual rules do, whereas in a connectionist system the dynamics can basically only be understood holistically, in terms of what the whole system is doing.
But in reality the difference isn't so clear. For example, one branch of symbolic AI is "semantic networks." In a semantic network you have nodes that represent concepts and links representing relations between concepts. Suppose you have a semantic network in which there is a node representing "floor." This is, obviously, symbolic in the classic sense. The meaning of the "floor" node is localized. But wait - is it really?
In some semantic network based AI systems, all the relations are made up by people. But some of them have reasoning that builds relationships, that learns, for example, that because people walk on floors, floors must be solid, because people can only walk on solid things. In a system like this, relations are built from other relations, and so the meaning of the "floor" node may be contained in its relations to other nodes, i.e. its connections to other nodes. And the formation of these connections may have been based on the connections of the other nodes to yet other nodes, etc. etc. etc.
What this means is that, in a semantic network formed by iterative reasoning rather than by expert rule creation, each element of knowledge (each node) actually represents the result of a holistic dynamic. It has meaning in itself -- a link to our socially constructed concept "floor" -- but internally its meaning is its relation to other things, each of which is only defined by the other things it relates to, and so on; so that the meaning of the part is only truly describable in terms of the whole.
On the other hand, suppose one has a neural network in which memories are represented as attractors (a Hopfield Net, or Attractor Neural Network, in the lingo). Then, the meaning of a link between two nodes in this network mainly consists of the attractors that its presence triggers. But there's also a clear local interpretation: if the weight of the link is large, that means the two nodes it connects exist together in a lot of attractors. I.e., they're contextually similar. If the weight of the link is large and negative, this means that the two nodes rarely co-exist in an attractor -- they're contextually opposite. Whether the nodes have immediate symbolic meaning or not depends on the application -- in typical attractor neural network applications, they do, each one being a perceptible part of some useful attractor.
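The local reading of attractor-net weights can be made concrete with the textbook Hopfield storage rule, under which the weight between two nodes is just a scaled count of stored patterns in which they agree, minus those in which they disagree. The following Java sketch computes such weights from a set of +/-1 patterns; it is the standard Hopfield prescription, not a WAE structure.

    // Hebbian storage rule for a Hopfield-style attractor network over +/-1 patterns.
    class HopfieldWeights {
        // patterns[p][i] is +1 or -1: the state of node i in stored pattern p.
        static double[][] store(int[][] patterns) {
            int n = patterns[0].length;
            double[][] w = new double[n][n];
            for (int[] pattern : patterns)
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++)
                        if (i != j)
                            w[i][j] += pattern[i] * pattern[j] / (double) n;
            // A large positive w[i][j] means nodes i and j agree across most stored attractors
            // ("contextually similar"); a large negative value means they almost always disagree.
            return w;
        }
    }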
The point is, in both classic symbolic and classic connectionist knowledge representation systems, one has a mix of locally and holistically defined meaning. The mix may be different in different knowledge representation systems, but there is no rigid division between the two. This fact is important in understanding the WAE, which intermixes "symbolic" style and "connectionist" style knowledge representations freely.
Of course, there are extremes of symbolic AI and extremes of connectionism. There are logic based AI systems that don't have nearly the holistic-meaning aspect of a reasoning-updated semantic network as I've described above. And, there are connectionist learning systems -- e.g. backpropagation neural nets -- in which the semantics of links are way less transparent than in the attractor neural net example I've given above. But this is also an interesting point. I believe that, of all the techniques in symbolic AI, the ones that are most valuable are the ones that verge most closely on global, holistic knowledge representation; and of all the techniques in connectionist AI, the ones that are most valuable are the ones that verge most closely on localized knowledge representation. This is because real intelligence only comes about when the two kinds of knowledge representation intersect, interact and build on each other.
I'm certainly not alone in coming to the conclusion that the middle way is where it's at. For instance, Gerald Edelman, a Nobel Prize-winning biologist, proposed a theory of "neuronal group selection" or Neural Darwinism, which describes how the brain constructs larger-scale networks called "maps" out of neural modules, and selects between these maps in an evolutionary manner, in order to find maps of optimum performance. And Marvin Minsky, the champion of rule-based AI, had moved in an oddly similar direction, proposing a "Society of Mind" theory in which mind is viewed as a kind of society of actors or processes that send messages to each other and form alliances into temporary working groups.
Minsky's and Edelman's ideas differ on many details. Edelman thinks
rule-based AI is claptrap of the worst possible kind. Minsky still upholds the
rule-based paradigm --though he now admits that it may sometimes be productive
to model the individual "actors" or "processes" of the mind
using neural nets; he does not believe the self-organization of the population
of mind actors as a whole is important. But even so, the Society of Mind theory
and the Neural Darwinism approach are both indicative of a shift toward a new
view of the structure of intelligence, one which I believe is fundamentally
correct.
What Minsky and Edelman share is a focus on the intermediate level of process
dynamics. They are both looking above neurons and below rigid rational rules,
and trying to find the essence of mind in the interactions of large numbers of
middle-level psychological processes. I believe this is the correct
perspective, in large part because I think it is how the human mind works. Of
course, it is difficult to open up the human brain and test one's hypotheses.
But brain scientists are making enormous strides, and there will undoubtedly be
exciting new findings from them, which I expect to resonate nicely with
developments in artificial intelligence. In computer science, the way to prove a
theory is to translate it into computer code and get it running on available
computer hardware, which is what we're doing with the WAE.
5. The WAE as a Synthetic Solution
In working out the initial WAE design, I was adamant about avoiding the oversimplifications of both the neural net and rule-based models. The neural network approach portrayed the brain as unrealistically unstructured and implausibly dependent on self-organization and learning, with little or no large-scale order. The rule-based approach portrayed the mind as unrealistically orderly, as implausibly dependent upon logical reasoning, with little or no chaotic, deeply trial-and-error-based self-organization. It totally misses the point about logic and rules: no particular system of rules is all that important to mind; the crucial thing is the ability to conceive new systems of rules to match new kinds of situations.
In order to understand how to go beyond these simplifications, I looked deep into the philosophy of mind, as will be described in the following chapter. From this fundamental perspective, I arrived at a conceptual design for a mind OS. Mind, I concluded, is an amazingly simple thing. Mind is a network of actors that act on each other, send messages to each other, and transform each other. Many of these actors are concerned with recognizing patterns in each other, and with achieving system goals in this way. The freeness and looseness of actor intercreation must be preserved in order that adaptivity and creativity may flourish - but there must nevertheless be actors specialized to carry out various tasks, otherwise, given finite computational resources, the mind will end up being equally stupid in all areas. The emergent dynamics of the whole actor system is essential to its intelligence, and this becomes yet more complex when one introduces specialized actor types.
How does this "self-organizing network of actors" view fit into the history of AI, to the traditional dichotomy between rules and neural nets? The key here, I believed, was to focus on the intermediate level of brain/mind organization: larger than the neuron, smaller than the abstract logical rules. In brain terms, I was most impressed by research on neural modules 4/20/014/20/01 clusters of tens or hundreds of thousands of neurons, each performing individual functions in an integrated way. One module might detect edges of forms in the visual field, another might contribute to the conjugation of verbs. The network of neural modules is a network of primitive mental processes rather than a network of non4/20/01psychological, low4/20/01level cells (neurons). The key lies in the way the modules are connected to each other, and they way they process information collectively. Neural modules, it seemed, were a good approximation to my "mind-actors." They lived at a level that was low enough to embody nonlinear self-organizing dynamics, yet high enough to be mindlike rather than physical-world-like, to have clear semantic meaning.
Of course, the brain is highly complex, and this conceptual perspective on brain structure comes nowhere near to capturing the brain's full diversity. For example, the cortical columns that are the basis for many ideas of neural groupings of this size are only found in primary cortices, not secondary or tertiary regions. This suggests a different structure for aggregates in the pure input/output (sensorimotor) domains from natural aggregates in the interactive, highly integrative parts of the cortices. But in spite of this very real diversity, it seems clear that the modular-grouping idea is at least a guide to understanding brain structure, worthwhile to consider as a high-level inspiration for AI design.
Consequently, I initially designed the WAE with this middle level of the brain's organization in mind. The AI Engine is not a neural net in the strict sense, but it embodies many aspects of neural networks. Compared to typical neural net AI programs, it's a level further removed from neurological detail. The basic unit of a neural network is the "formal neuron," a computer model of a biological neuron. The basic unit of the WAE, on the other hand, is a Java object called a node, which I think of as corresponding to a cluster of 10,000 to 100,000 neurons. These nodes interact with each other in many of the same ways that the neurons in a neural net program do.
Each of the WAE's nodes has rules programmed into it, which limit and shape what it does, depending on its node type. We haven't waited for each node in the WAE to figure out everything on its own, we've given each of them instructions based on our theories of how the mind operates. So the WAE includes both the logic of rule-based AI and the self-organizing capabilities of neural net AI.
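A highly simplified picture of what "a Java object called a node" with both a threshold rule and type-specific behavior might look like is sketched below. The class and method names are invented for illustration and do not correspond to the actual WAE code.

    import java.util.*;

    // Illustrative only: a node that mixes a neural-net-like threshold rule
    // with type-specific behavior supplied by its node type.
    abstract class IllustrativeNode {
        double activation = 0.0;
        final double threshold;
        final List<IllustrativeNode> targets = new ArrayList<>();

        IllustrativeNode(double threshold) { this.threshold = threshold; }

        // Shared, connectionist-style dynamics: fire when activation crosses the threshold.
        void maybeFire() {
            if (activation >= threshold) {
                if (!targets.isEmpty())
                    for (IllustrativeNode t : targets) t.activation += activation / targets.size();
                onFire();        // type-specific rule kicks in here
                activation = 0.0;
            }
        }

        // Rule-based side: each node type supplies its own behavior when it fires.
        abstract void onFire();
    }

    class IllustrativeWordNode extends IllustrativeNode {
        final String word;
        IllustrativeWordNode(String word) { super(1.0); this.word = word; }
        void onFire() { /* e.g. notify language-processing actors that this word became salient */ }
    }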
And interestingly, although the initial AI Engine design was based on an attempt to mimic intermediate-level brain structure, during the course of our work we've found ourselves forced to imitate high-level brain structure a little more closely. I'm talking about the fact that, in moving from the AI Engine 0.5 release to the 1.0 release, we have introduced a great deal of functional specialization among our artificial brain's components. Just as the brain has specialized regions devoted to vision processing, language processing, temporal perception, and so forth, so does the current AI Engine design embody specialized subsystems for language processing, time processing, attention allocation, and various other tasks. At the same time, we have veered even further away from simulating the neuron-level of the brain, modifying the dynamics of the nodes and links of the AI Engine to look progressively less and less like straightforward neural net equations, based on the results of experimentation.
This, I believe, is also how the brain/mind works. The brain is more than a network of neurons connected according to simple patterns, and the mind is more than an assemblage of clever algorithms or logical transformation rules. Artificial intelligence research has a history of attempts to ignore this fact, for the sake of getting interesting semi-intelligent behavior out of insufficiently powerful computers. A lot has been learned through the process, but what has not been learned is how to build a true thinking machine. Rather, we have mostly learned what intelligence is not. Intelligence is not following prescribed deductive or heuristic rules, like Deep Blue or EURISKO; intelligence is not the adaptation of synapses in response to environmental feedback, as in Hebbian or backpropagation neural nets. Intelligence involves these things, but what intelligence really is, in my view, is something different: the self-organization and mutual intercreation of a network of processes, embodying perception, action, memory and reasoning in a unified way, and guiding an autonomous system in its interactions with a rich, flexible environment. That is what the WAE is all about.
In conclusion, I'll stress once again the key role that diversity plays in
the mind. The essence of mind is very simple, but this essence -- a
self-organizing "mind OS" of intertransforming actors -- is only half the story;
the other half of mind is the specialized modules and learning methods that run on the OS.
Because of this, one of the keys to building the WAE has been creating a
community of AI experts with different points of view, and expertise on
different parts of the mind, who are all reasonably willing to listen to each
other. Each part of the mind is different, and requires a different way of
thinking, and it's very easy to assume that the way of thinking that is natural
for thinking about the part of the mind that you've been thinking about, is the
only correct way. In developing the WAE, we've had my initial design which was
firmly based on ideas of self-organization and emergence. We've had Jeff
Pressing's approach to mind based on cognitive psychology and his approach to
numerical data understanding based on nonlinear dynamics. We've had Pei Wang's
approach to reasoning, based on his own peculiar Peirce-inspired version of
probabilistic logic. We've integrated evolutionary programming, inspired but not
entirely based on John Koza's work in this area, and statistical learning as
developed by our Kiwi team, as descended from the thinking of John Cleary, John
Andreae, and others. And this is just a partial list. The WAE framework gives
general data structures and dynamics in which a variety of different approaches
to AI can coexist and interact and learn from each other.
3
Mind as a Web of Pattern
Ben Goertzel
(plus a paragraph here and there by Jeff Pressing)
1. Introduction
I've explained above how I think the field of AI came to be preoccupied with
things other than actually creating artificial intelligence. Some people have a
different diagnosis: They don't think AI is possible at all. They think that a
digital computer program can never really be intelligent, because it lacks some
magical chemical, physical or spiritual ingredient that humans possess. I meet
fewer people with this attitude now than I did fifteen years ago, but they still
exist. In fact, some folks with this attitude have been productive Webmind Inc.
employees.
The fact that there is no consensus among educated, intelligent people as to
whether computational intelligence is possible, indicates that even in a book
focused on a practical AI system, some attention to philosophical foundations is
worthwhile. In this chapter I'll give this aspect of AI its due, reviewing
various philosophical issues associated with computational intelligence, and
explaining how they've been answered in the context of the WAE.
Of course, I don't expect to convince every reader that I have all the answers
regarding questions such as the nature of intelligence, mind and consciousness.
But at least, I believe I can show that there is a consistent and sensible
conceptual framework underlying the AI Engine, which includes plausible answers
to all these questions as well as to more practical questions of AI design.
2. Is AI Possible?
When I first started thinking about AI, it wasn't obvious to me that it was
possible. But I convinced myself with two arguments, one scientific, one
philosophical.
The scientific argument went like this: Admittedly, the human brain is the only
definitive example of intelligence that we know, and it doesn't look like it's
executing algorithms: it is a largely incomprehensible mass of self-organizing
electrochemical processes. However, assuming that these electrochemical
processes obey the laws of quantum physics, they can be explained in terms of a
system of differential equations derived from quantum theory. And any such
system of differential equations may be approximated, to within any desired
degree of accuracy, by a computable function. Therefore, anyone who claims that
the human mind cannot be understood in terms of computation is either 1) denying
that the laws of quantum physics, or any similar mathematical laws, apply to the
brain; or 2) denying that any degree of understanding of the brain will yield an
understanding of the human mind. Neither of these alternatives seemed reasonable
to me (and neither of them seems reasonable now). (There's a little more to it
than this, one can consider potential quantum gravity phenomena, but that would
take us too far afield.)
And, the philosophical argument for the possibility of AI went like this: One
can divide the universe into two categories: the describable and the
indescribable. According to the Church-Turing Thesis, everything that is
describable is computable. And everything that is indescribable is
algorithmically random, and indistinguishable from the pseudorandom by any
finite entity. So, the universe consists, at most, of computation plus
randomness. Intelligence is part of the universe. QED.
So, suppose you accept these arguments, or others, and you believe that AI is
possible. A couple other questions then arise.
First, how do you do it? How do you create AI?
Second, what does AI mean exactly? What is the criterion by which we can judge
whether a digital system is intelligent?
Of course, these two questions interrelate. In the next section of this chapter
I'll explore the second question: What is intelligence? This question naturally
leads into a couple more philosophical questions: What is mind? What is
consciousness? Here I'll give fairly high-level conceptual answers to these
questions, referring the reader who wants mathematical or philosophical details
to my past publications on these topics. The goal is to give just enough
conceptual background to provide a meaningful philosophical context for the
discussion of mind design in the chapters to follow.
3. What is Intelligence?
Intelligence doesn't mean precisely simulating human intelligence. The WAE doesn't do that, and it would be unreasonable to expect it to, given that it lacks a human body. The Turing Test, "write a computer program that can simulate a human in a text-based conversational interchange," serves to make the theoretical point that intelligence is defined by behavior rather than by mystical qualities, so that if a program could act like a human, it should be considered as intelligent as a human. But it is not useful as a guide for practical AI development.
I'm not going to propose a specific IQ test for the WAE or other computer programs. This might be an interesting task, but it can't even be approached until there are a lot of intelligent computer programs of the same type. IQ tests work fairly well within a single culture, and much worse across cultures - how much worse will they work across species, or across different types of computer programs, which may well be as different as different species of animals? What is needed right now is something much more basic than an IQ test: a working, practical understanding of the nature of intelligence, which can be used as an intuitive guide for work on the development of intelligent machines.
My own working definition of intelligence builds on various ideas from psychology and engineering, as documented in (Goertzel, 1993, 2000). I believe that intelligence is best understood as follows:
Intelligence is the ability to achieve complex goals in a complex environment
The greater the total complexity of the set of goals that the organism can achieve, the more intelligent it is. (Of course, there are mathematical issues in how one takes this sum total, but I won't delve into these here. There is also the question of how to quantify the notion of complexity, which I'll return to briefly below.)
Note that this definition of intelligence is purely behavioral: it doesn't
specify any particular experiences or structures or processes as characteristic
of intelligent systems. I think this is as it should be. Intelligence is
something systems display; how they achieve it under the hood is another story.
It may well be that certain structures and processes and experiences are
necessary aspects of any sufficiently intelligent system.
My guess is that the science of 2050 will contain laws of the form: Any
sufficiently intelligent system has got to have this list of structures and has
got to manifest this list of processes. But this is another point, not necessary
for understanding how to design an intelligent system.
In conclusion, then: When I say that the WAE is an intelligent system, what I
mean is that it is capable of achieving a variety of complex goals in the
complex environment that is the internet. To go beyond this fairly abstract
statement, one has to specify something about what kinds of goals and
environments one is interested in. In the case of biological intelligence, the
key goals are survival of the organism and its DNA (the latter represented by
the organism's offspring and its relatives). In the WAE's case, the goals that
WAE 1.0 is expected to achieve are:
1. Predicting economic and financial and political and consumer data based on diverse numerical data and concepts expressed in news
2. Conversing with humans in simple English, with the goal not of simulating human conversation, but of expressing its insights and inferences to humans, and gathering information and ideas from them
3. Learning the preferences of humans and AI systems, and providing them with information in accordance with their preferences. Clarifying these preferences by asking questions about them and responding to the answers.
4. Communicating with other WAEs, similar to its conversations with humans, but using a WAE-only language called Psynese
5. Composing knowledge files containing its insights, inferences and discoveries, expressed in XML or in simple English
6. Reporting on its own state, and modifying its parameters based on its self-analysis to optimize its achievement of its other goals
This is what a WAE instance needs to do in order to survive, in order to keep humans and other WAE instances happy with it so that it stays alive. It is by no means all the WAE will ever be able to do, but it's a start. Subsequent versions of the WAE are expected to offer enhanced conversational fluency, and enhanced abilities at knowledge creation, including theorem proving and scientific discovery and the composition of knowledge files consisting of complex discourses.
Are these goals complex enough that the WAE should be called intelligent? Ultimately this is a subjective decision. My belief is, yes. This is not a chess program or a medical diagnosis program, which is capable in one narrow area and ignorant of the world at large. This is a program that studies itself and interacts with others, that ingests information from the world around it and thinks about this information, coming to its own conclusions and guiding its internal and external actions accordingly.
Whether the WAE is smarter than or stupider than humans is not a very
interesting question. My own sense is that the first version will be
significantly stupider than humans overall though smarter in many particular
domains; but that within a couple years there may be a version that is
competitive with humans in terms of overall intelligence; and within 10 years
there will probably be a version dramatically smarter than humans overall, with
a much more refined design running on much more powerful hardware. But it's not
clear to me how relevant my own subjective judgment is, in assessing the
intelligence of another type of being. I'm content to make it as smart as
possible.
4. What is Mind?
If intelligence is the achieving of complex goals in complex environments, then what is "mind"?
Of course, many philosophers have addressed this question. My favorite ideas in this domain come from three very different thinkers: Charles S. Peirce, Buddha, and Friedrich Nietzsche. I've also been much inspired by the emerging discipline of complexity science, and its vision of the mind as a complex, adaptive self-organizing system.
One of my favorite passages in the history of philosophy is where Peirce says
that:
Logical analysis applied to mental phenomena shows that there is but one law of mind, namely, that ideas tend to spread continuously and to affect certain others which stand to them in a peculiar relation of affectability. In this spreading they lose intensity, and especially the power of affecting others, but gain generality and become welded with other ideas.
This is an archetypal vision of mind which I call "mind as relationship" or "mind as network." In modern terminology Peirce's "law of mind" might be rephrased as follows: "The mind is an associative memory network, and its dynamic dictates that each idea stored in the memory is an active agent, continually acting on those other ideas with which the memory associates it."
Peirce proposed a universal system of philosophical categories:
* First: pure being
* Second: reaction, physical response
* Third: relationship
Mind from the point of view of First is raw consciousness, pure presence, pure being. Mind from the point of view of Second is a physical system, a mess of chemical and electrical dynamics. Mind from the point of view of Third is a dynamic, self-reconstructing web of relations.
Following a suggestion of my friend, the contemporary philosopher Kent Palmer, I have added an additional element to the Peircean hierarchy:
* Fourth: synergy
In the WAE, ideas begin as First, with Nodes of their own. They interact with each other, which is Second, producing patterns of relationships, Third. In time, stable, self-sustaining ideas develop, which are Fourth. In Peirce's time, it was metaphysics; today it is computer science!
Nietzsche's philosophy presented a similar world-view, though articulated in a different language. He saw a world of entities which are relations between each other, each one constantly acting to extend itself over the other, in accordance with the will to power which is its essence. Each "thing" is known only by its effect on other things; by the observable regularities which it gives rise to. But this web of interrelationships is alive, it is constantly moving, each thing shifting into the others; and the way Nietzsche chose to express this dynamic was in terms of his principle of the "will to power," in terms of the urge for each relationship to extend over the others.
These ideas are articulated in yet a different way within modern complexity science. One views a complex system as a collection of elements, each defined by its interaction with other elements rather than by its own isolated properties. Relationships between elements define a system. Now we have some mathematics to describe these interrelationships and the dynamics that emerge therefrom, and we can run computer simulations accordingly, but the basic idea is the same as Peirce and Nietzsche proposed, and other philosophers before them: complex systems like minds are made of relationships, they're vast self-organizing systems of relationships, continually creating new relationships through emergent interaction between their parts.
In my years as an abstract systems theorist, my own approach to analyzing
mind as a complex system was founded on the mathematical concept of
"pattern." I proposed a very simple definition of pattern: A pattern
is a representation as something simpler.
Given this, I defined complexity as the total amount of pattern in an entity.
This implies that intelligence, the ability to achieve complex goals in complex
environments, is just a particular way of manifesting complexity. The crucial
thing is the web of pattern, of patterns emerging from patterns yielding greater
and greater complexity. Whether we want to call a particular system, a
particular nexus of patterns, an intelligent system or not, depends on whether
the functions it can optimize - the goals it can achieve -- meet our own
subjective standard of complexity. This is a very simplistic view of things, but
I think it is perfectly reflective of the role of intelligence in the universe.
Intelligent organisms are only a very small part of the world, of the universal
mind, of the web of pattern.
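To make this notion of pattern slightly more concrete, here is a crude computational proxy - purely illustrative, and far weaker than the formal definition given in my earlier writings - which reads "representation as something simpler" in terms of compressibility: the more a body of data can be compressed, the more pattern it contains.

    import zlib

    def pattern_intensity(data: bytes) -> float:
        # Rough measure in [0, 1): the fraction of the data "explained away" by
        # representing it as something simpler (here, its zlib-compressed form).
        if not data:
            return 0.0
        compressed = zlib.compress(data)
        return max(0.0, 1.0 - len(compressed) / len(data))

On highly repetitive data this returns a value near 1; on pseudorandom data it returns roughly 0, in keeping with the idea that complexity - the total amount of pattern - lives in compressible structure rather than in raw noise.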
5. Consciousness
So, mind is a self-organizing system of relationships. But what about
consciousness? What makes a self-organizing system of relationships conscious?
This leads up to perhaps the thorniest philosophy-of-AI question of all time:
Can a computer program ever be conscious?
One interesting thing about the question of computer consciousness is that,
judging from current practice, it is pretty much completely irrelevant to the
practice of AI design and engineering. It's interesting to ask, why is this? Of
course, different philosophies of consciousness lead to different answers! Those
who believe that consciousness comes out of divine intervention in the human
brain, or out of macroscopic quantum processes that are present in the human
brain but not in computer programs, would say that the reason consciousness is
irrelevant to AI engineering is that the programs engineered cannot be
conscious. On the other hand, those who believe that consciousness is basically
a fiction, that it's just a construct we deterministic human machines use to
describe our deterministic actions, would say that consciousness is irrelevant
to AI engineering because consciousness is nothing special, it's just one more
part of mind programming to be dealt with like the other parts. And so forth.
I believe that consciousness, properly understood, does need to be considered in
the course of AI design and engineering. I think that the reason consciousness
has been irrelevant to practical AI so far is simply that practical AI has not
been concerned with actually building thinking machines, but only with making
programs that manifest particular components of intelligence in isolation.
What we call "consciousness" has several aspects, including
* self-observation and awareness of options
* "free will" - choice-making behavior
* inferential and empathetic powers
* perception/action loops
Within these various aspects, two different more general aspects can be
isolated:
Structured consciousness: There are certain structures associated with
consciousness, which are deterministic, cognitive structures, involved with
inference, choice, perception/action, self-observation, and so on. These
structures, as they manifest themselves in the WAE and in the human brain, will
be discussed in Chapter 9.
Raw consciousness: The "raw feel" of consciousness, which I will
discuss briefly here.
What is often called the "hard problem" of consciousness is how to
connect the two. Although few others may agree with me, at this point I believe
I know how to do this. I analyze raw consciousness as
"pure, unstructured experience," as Peircean First, which manifests
itself in the realm of Third as randomness. Structured consciousness on the
other hand is a process that coherentizes mental entities, makes them more
"rigidly bounded," less likely to diffuse into the rest of the mind.
The two interact in this way: structured consciousness is the process in which
the randomness of raw consciousness has the biggest effect. Structured
consciousness amplifies little bits of raw consciousness, as are present in
everything, into a major causal force.
Obviously, my solution to the "hard problem" is by no means
universally accepted in the cognitive science or philosophy or AI communities!
There is no widely accepted view; the study of consciousness is a chaos. I
anticipate that many readers will accept my theory of structured consciousness
but reject my theory of raw consciousness. This is fine: everyone is welcome to
their own errors! The two are separable, although a complete understanding of
mind must include both aspects of consciousness. Raw consciousness is a tricky
thing to deal with because it is really outside the realm of science. Whether my
design for structured consciousness is useful - this can be tested empirically.
Whether my theory of raw consciousness is correct cannot. Ultimately, the test
of whether the WAE is conscious is a subjective test. If it's smart enough and
interacts with humans in a rich enough way, then humans will believe it's
conscious, and will accommodate this belief within their own theories of
consciousness.
I'll talk about raw consciousness very infrequently in the following pages, but
this isn't because I think it's unimportant, it's because the focus here is on
mind design - on mechanics rather than experience.
6. The Psynet Model of Mind and the Webmind AI Engine
Now let's cut to the chase. Inspired by Peirce, Nietzsche, Leibniz and other
philosophers of mind, I spent many years of my career creating a similarly
ambitious, integrative philosophy of mind of my own. After years searching for a
good name, I settled on "the psynet model" - psy for mind,
net for network. Mind as Network was one theme of my theory of mind, at any
rate; so the name is only incomplete, not actually misleading.
The psynet model is a conceptual model of the mind, created not only for AI
purposes, but for the analysis of human thought processes as well (see Goertzel,
1993, 1993a, 1994, 1997 for detailed expositions of various aspects of the
model, as it developed over time). It aims to capture the abstract structures
and dynamics of intelligence, under the hypothesis that these are independent of
the underlying physical implementation.
The psynet model of mind can be described at various levels of granularity. The
essential ideas are very simple, but the full exposition is highly complex as
mind has many particular manifestations. Here I'll present a fairly high-level
view of the psynet model, in the interest of moving quickly from the realm of
philosophy into the domain of AI design.
According to the psynet model of mind:
1. A mind is a system of agents or "actors" (our currently preferred
term) which are able to transform, create & destroy other actors
2. Many of these actors act by recognizing patterns in the world, or in other
actors; others operate directly upon aspects of their environment
3. Actors pass attention ("active force") to other actors to which
they are related
4. Thoughts, feelings and other mental entities are self-reinforcing,
self-producing, systems of actors, which are to some extent useful for the goals
of the system
5. These self-producing mental subsystems build up into a complex network of
attractors, meta-attractors, etc.
6. This network of subsystems & associated attractors is "dual
network" in structure, i.e. it is structured according to at least two
principles: associativity (similarity and generic association) and hierarchy
(categorization and category-based control).
7. Because of finite memory capacity, mind must contain actors able to deal with
"ungrounded" patterns, i.e. actors which were formed from
now-forgotten actors, or which were learned from other minds rather than at
first hand -- this is called "reasoning"
8. A mind possesses actors whose goal is to recognize the mind as a whole as a
pattern -- these are "self"
According to the psynet model, at bottom the mind is a system of actors
interacting with each other, transforming each other, recognizing patterns in
each other, creating new actors embodying relations between each other.
Individual actors may have some intelligence, but most of their intelligence
lies in the way they create and use their relationships with other actors, and
in the patterns that ensue from multi-actor interactions.
We need actors that recognize and embody similarity relations between other
actors, and inheritance relations between other actors (inheritance meaning that
one actor is in some sense a special case of another one, in terms of its
properties or the things it denotes). We need actors that recognize and embody
more complex relationships, among more than two actors. We need actors that
embody relations about the whole system, such as "the dynamics of the whole
actor system tends to interrelate A and B."
This swarm of interacting, intercreating actors leads to an emergent
hierarchical ontology, consisting of actors generalizing other actors in a tree;
it also leads to a sprawling network of interrelatedness, a "web of
pattern" in which each actor relates some others. The balance between the
hierarchical and heterarchical aspects of the emergent network of actor
interrelations is crucial to the mind.
The name I've given to this balance in my past writings is the "dual
network." The idea is that, in the mind, hierarchy and heterarchy overlap
each other, and the dynamics of the mind is such that they have to work well
together or the mind will be all screwed up. The overlap of hierarchy and
heterarchy gives the mind a kind of "dynamic library card catalog"
structure, in which topics are linked to other related topics heterarchically,
and linked to more general or specific topics hierarchically. The creation of
new subtopics or supertopics has to make sense heterarchically, meaning that the
things in each topic grouping should have a lot of associative, heterarchical
relations with each other. In the WAE, this general "dual network"
principle is reflected in many ways, most simply in category formation by
clustering.
One can also view the dual network in terms of process. There are mental
processes - like following a train of thought - that follow only symmetric
connections. There are mental processes - like reasoning - that use only
asymmetric connections. These processes have to be balanced against each other,
they have to work together rather than against each other, for mind to function
properly. And a state of mind in which symmetry and asymmetry work well together
cannot be engineered easily. It has to evolve - it has to be an attractor. The
dual network is an archetype - a class of attractors of the dynamical system of
mental processes. It is a fundamental archetype of mind.
Structures like the dual network are built up by many different actors; they're
also "sculpted" by the deletion of actors. All these actors
recognizing patterns and creating new actors that embody them - this creates a
huge combinatorial explosion of actors. Given the finite resources that any real
system has at its disposal, it follows that forgetting is crucial to the mind -
not every actor that's created can be retained forever. Forgetting has profound
consequences for mind. It means that, for example, a mind can retain the datum
that birds fly, without retaining much of the specific evidence that led it to
this conclusion. The generalization "birds fly" - a pattern A in a
large collection of observations B - is retained, but the observations B are not.
Obviously, a mind's intelligence will be enhanced if it forgets strategically,
i.e., forgets those items which are the least intense patterns. And this ties in
with the notion of mind as an evolutionary system. A system which is creating
new actors, and then forgetting actors based on relative uselessness, is
evolving by natural selection. This evolution is the creative force opposing the
conservative force of self-production, actor intercreation.
Forgetting ties in with the notion of grounding. A pattern X is
"grounded" to the extent that the mind contains entities in which X is
in fact a pattern. For instance, the pattern "birds fly" is grounded
to the extent that the mind contains specific memories of birds flying. Few
concepts are completely grounded in the mind, because of the need for drastic
forgetting of particular experiences. This leads us to the need for
"reasoning," which is, among other things, a system of transformations
specialized for producing incompletely grounded patterns from incompletely
grounded patterns.
Consider, for example, the reasoning "Birds fly, flying objects can fall,
so birds can fall." Given extremely complete groundings for the
observations "birds fly" and "flying objects can fall", the
reasoning would be unnecessary -- because the mind would contain specific
instances of birds falling, and could therefore get to the conclusion
"birds can fall" directly without going through two ancillary
observations. But, if specific memories of birds falling do not exist in the
mind, because they have been forgotten or because they have never been observed
in the mind's incomplete experience, then reasoning must be relied upon to yield
the conclusion.
The necessity for forgetting is particularly intense at the lower levels of the
system. In particular, most of the patterns picked up by the
perceptual-cognitive-active loop are of ephemeral interest only and are not
worthy of long-term retention in a resource-bounded system. The fact that most
of the information coming into the system is going to be quickly discarded,
however, means that the emergent information contained in perceptual input
should be mined as rapidly as possible, which gives rise to the phenomenon of
"short-term memory."
What is short-term memory? A mind must contain actors specialized for rapidly
mining information deemed highly important (information recently obtained via
perception, or else identified by the rest of the mind as being highly
essential). This is "short term memory." It must be strictly bounded
in size to avoid combinatorial explosion; the number of combinations (possible
grounds for emergence) of N items being exponential in N. The short-term memory
is a space within the mind devoted to looking at a small set of things from as
many different angles as possible.
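A minimal sketch may make this concrete. Under my own simplifying assumptions (the class and method names below are hypothetical, not taken from the actual WAE code), a strictly bounded short-term memory is just a small store that admits important items and forgets the least important one whenever its fixed capacity is exceeded:

    import heapq
    import itertools

    class ShortTermMemory:
        # Strictly bounded store retaining only the most important items.
        def __init__(self, capacity=7):
            self.capacity = capacity
            self._heap = []                    # min-heap of (importance, tiebreak, item)
            self._counter = itertools.count()  # tiebreaker so items are never compared

        def attend(self, item, importance):
            # Admit an item; if over capacity, forget the least important one.
            heapq.heappush(self._heap, (importance, next(self._counter), item))
            if len(self._heap) > self.capacity:
                heapq.heappop(self._heap)

        def contents(self):
            # Current items, most important first.
            return [item for _, _, item in sorted(self._heap, reverse=True)]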
From what I've said so far, the psynet model is a highly general theory of the
nature of mind. Large aspects of the human mind, however, are not general at
all, and deal only with specific things such as recognizing visual forms, moving
arms, etc. This is not a peculiarity of humans but a general feature of
intelligence. The generality of a transformation may be defined as the variety
of possible entities that it can act on; and in this sense, the actors in a mind
will have a spectrum of degrees of specialization, frequently with more
specialized actors residing lower in the hierarchy. In particular, a mind must
contain procedures specialized for perception and action; and when specific such
procedures are used repeatedly, they may become "automatized", that
is, cast in a form that is more efficient to use but less flexible and
adaptable. This brings the WAE into a congruent position with that of
contemporary neuroscience, which has found evidence both for global generic
neural structures and highly domain-specific localized processing.
Another thing that actors specialize for is communication. Linguistic
communication is carried out by stringing together symbols over time. It is
hierarchically based in that the symbols are grouped into categories, and many
of the properties of language may be understood by studying these categories.
More specifically, the syntax of a language is defined by a collection of
categories, and "syntactic transformations" mapping sequences of
categories into categories. Parsing is the repeated application of syntactic
transformations; language production is the reverse process, in which categories
are progressively expanded into sequences of categories. Semantic
transformations map structures involving semantic categories and particular
words or phrases into actors representing generic relationships like similarity
and inheritance. They take structures in the domain of language and map them
into the generic domain of mind.
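To illustrate this view of parsing - a toy sketch under my own assumptions, not the actual language machinery used in the WAE - one can repeatedly apply transformations that map sequences of categories into categories, until no more apply:

    def parse(tokens, lexicon, rules):
        # Repeatedly apply "syntactic transformations": each rule maps a
        # sequence of categories to a single category.
        cats = [lexicon[t] for t in tokens]
        changed = True
        while changed:
            changed = False
            for pattern, result in rules:
                n = len(pattern)
                for i in range(len(cats) - n + 1):
                    if cats[i:i + n] == list(pattern):
                        cats[i:i + n] = [result]
                        changed = True
                        break
                if changed:
                    break
        return cats

    # e.g. parse(["cats", "eat", "mice"],
    #            {"cats": "N", "eat": "V", "mice": "N"},
    #            [(("N",), "NP"), (("V", "NP"), "VP"), (("NP", "VP"), "S")])
    # returns ["S"]; expanding categories back out corresponds to production.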
And language brings us to the last crucial feature of mind: self and
socialization. Language is used for communicating with others, and the
structures used for semantic understanding are largely social in nature (actor,
agent, and so forth); language is also used purely internally to clarify
thought, and in this sense it's a projection of the social domain into the
individual. Communicating about oneself via words or gestures is a key aspect of
building oneself.
The "self" of a mind (not the "higher self" of Eastern
religion, but the "psychosocial" self) is a poorly grounded pattern in
the mind's own past. In order to have a nontrivial self, a mind must possess,
not only the capacity for reasoning, but a sophisticated reasoning-based tool
(such as syntax) for transferring knowledge from strongly grounded to poorly
grounded domains. It must also have memory and a knowledge base. All these
components are clearly strengthened by the existence of a society of similar
minds, making the learning and maintenance of self vastly easier.
The self is useful for guiding the perceptual-cognitive-active
information-gathering loop in productive directions. Knowing its own holistic
strengths and weaknesses, a mind can do better at recognizing patterns and using
these to achieve goals. The presence of other similar beings is of inestimable
use in recognizing the self -- one models one's self on a combination of: what
one perceives internally, the external consequences of actions, evaluations of
the self given by other entities, and the structures one perceives in other
similar beings. It would be possible to have self without society, but society
makes it vastly easier, by leading to syntax with its facility at mapping
grounded domains into ungrounded domains, by providing an analogue for inference
of the self, by external evaluations fed back to the self, and by the affordance
of knowledge bases, and informational alliances with other intelligent beings.
7. The Psynet Model as a Framework for AI
Clearly there is much more to mind than all this - working out the details of
each of these points uncovers a huge number of subtle issues. But, even without
further specialization, this list of points does say something about AI. It
dictates, for example,
* that an AI system must be a dynamical system, consisting of entities (actors)
which are able to act on each other (transform each other) in a variety of ways,
and some of which are able to evaluate simplicity (and hence recognize pattern).
* that this dynamical system must be sufficiently flexible to enable the
crystallization of a dual network structure, with emergent, synergetic
hierarchical and heterarchical subnets
* that this dynamical system must contain a mechanism for the spreading of
attention in directions of shared meaning
* that this dynamical system must have access to a rich stream of perceptual
data, so as to be able to build up a decent-sized pool of grounded patterns,
leading ultimately to the recognition of the self
* that this dynamical system must contain entities that can reason (transfer
information from grounded to ungrounded patterns)
* that this dynamical system must contain entities that can manipulate
categories (hierarchical subnets) and transformations involving categories in a
sophisticated way, so as to enable syntax and semantics
* that this dynamical system must recognize symmetric, asymmetric and emergent
meaning sharing, and build meanings using temporal and spatial relatedness, as
well as relatedness of internal structure, and relatedness in the context of the
system as a whole
* that this dynamical system must have a specific mechanism for paying extra
attention to recently perceived data ("short-term memory")
* that this dynamical system must be embedded in a community of similar
dynamical systems, so as to be able to properly understand itself
* that this dynamical system must act on and be acted on by some kind of
reasonably rich world or environment.
It is interesting to note that these criteria, while simple, are not met by any
previously designed AI system, let alone any existing working program. The WAE
strives to meet all these criteria.
How does the WAE meet these criteria? How does it realize this vision of the
mind as a self-organizing actor system, with language, self and reason emerging
from actor interactions? This is the story of the rest of the book, obviously,
but a few comments may be useful right here.
Firstly, according to the philosophy outlined above, we need a
"physical" layer of mechanical cause and effect underlying the actor
system, causing it to operate: in the case of the WAE, this is the Webmind Core,
to be discussed below. We also need synergetic emergence to happen, higher-level
patterns evolving amongst various actors allowing system goals to be fulfilled
by the actor system as a whole.
The WAE embodies the psynet model by creating a "self-organizing actors
OS," the Webmind Core, and then creating a large number of special types of
actors. Most abstractly, we have Node actors, which embody coherent wholes
(texts, numerical data series, concepts, trends, schema for acting); we have
Link actors, which embody relationships between Nodes (similarity, logical
inheritance or implication, data flow, etc.); we have Stimulus actors that
spread attention between Nodes and Links; and we have Wanderer actors, that move
around between Nodes building Links. These general types of actors are then
specialized into 100 or so node types, and a dozen link types, which carry out
various specialized aspects of mind - but all within the general framework of
mind as a self-organizing, self-creating actor system. There are also some macro
level actors, Data-Structure Specialized MindServers, that simulate the
behaviors of special types of nodes and links in especially efficient,
application-specific ways. These too are mind-actors, though of specialized kinds.
Each actor in the whole system is viewed as having its own little piece of
consciousness, its own autonomy, its own life cycle - but the whole system has a
coherence and focus as well, eliminating component actors that are not useful
and causing useful actors to survive and intertransform with other useful actors
in an evolutionary way.
In this case, the path from abstract philosophy to concrete software
implementation is long and complex. In particular, to implement Mind as Third in
a way that effectively leads to Mind as Fourth, emergent synergy, turns out to
be a heck of a big trick, because to do so within realistic hardware constraints
requires the careful integration of a large number of highly specialized
components within a common structural and dynamical framework. But no one ever
said it would be easy. The key to success, I believe, is to at all times keep one
eye on the engineering details, and another on the philosophical fundamentals.
In the synergy between these two extremes lies the key to creating a thinking
machine.
8. The Essence of Digital Mind
As one may note from perusing the above list of lessons, there is, in my view,
no single really strange or amazingly original ingredient at the center of the
mind. There's no seventeenth-order differential equation giving the secret
formula for the relationship between perception, cognition and consciousness, or
anything like that. The essence of mind, be it biological or digital, is simple,
simple, simple. Mind is a network of actors, continually recognizing patterns in
each other and between themselves and each other. Mind as a whole is thus a
self-studying, self-transforming network of patterns, continually transforming
itself in order to achieve the goals of the organism in which it is embedded.
Most of the art of designing the WAE has been to choose the right actors to
enable the emergent dynamic of productive self-transformation to arise - in a
reasonably computationally efficient way, and with parameter dependences that
are manageable and understandable. This is what most of the lessons given above
pertain to, and it's what most of the following text concerns: the particular
actors involved in the WAE, and how they interact with each other to produce the
emergent dynamics and attractors of mind, and how they can be tuned and
specialized to work reasonably well under real-world conditions.
Eventually, I suspect, all the ideas contained in this book, and embodied in the
WAE design, will be considered fairly obvious. At the present time, however, the
idea that one can build a thinking machine using ideas like these is still
fairly heretical. The component ideas required are mostly currently being
discussed in one discipline or another, but the pattern of thinking required to
put all the pieces together is not a conventional one in the AI field at the
present time. Contemporary science focuses on the parts whereas to build a mind
one has to focus on the whole. To a much greater extent than is true in building
a car or a computer, each part has to be designed with the dynamics of the whole
in mind.
The WAE's expansiveness and messiness may be disconcerting to those who feel
that mind is a simple, elegant thing - but should not be at all surprising to
anyone who has studied brain science. The human brain is a horrible hodge-podge
and mess. Mind, in itself, is simple and elegant, but the implementation of mind
in physical reality is all about efficiency considerations, which do not lend
themselves to elegant universal solutions. This is a simple point which however
seems to have eluded nearly all AI theorists, even those with very interesting
and high-quality ideas.
One reason the brain is such a mess is that it has evolved to be adaptive and
functional over a very wide range of contexts over a long period of time. Thus
we have redundant mental systems, and such things as movement are evaluated and
shaped by emotional evaluators (limbic system), specialist control processors
(basal ganglia and cerebellum), planning areas (supplementary motor cortex,
premotor, frontal ), reasoning (frontal and general control centers (motor
cortex), listing only a few. These present inculcations of past solutions to
past problems have adapted to maintain relevance today, or become spandrels. The
same sort of thing, if we are successful, will happen with the WAE as it lives
through many different historical regimes.
4
Key Aspects of Webmind AI
Ben Goertzel
Because this is an overview chapter, the ideas presented in it are drawn from
many different places, and I haven't tried to attribute each subsection to the
relevant inventors. As a whole, the ideas presented here are the collective
creation of the Webmind Inc. AI R&D group.
1. Introduction
Webmind, as described in the Prologue and the end of the previous chapter, is a multi-modular AI system, which achieves general intelligence through the emergent structures and dynamics that arise when many specialized mind-modules are integrated in a suitable common framework. This chapter reviews various aspects of the mind-modules inside the current version of Webmind. It is by no means a complete review - large areas like numerical data processing are left out, and key areas like procedure learning and feeling calculation are passed over lightly in a few sentences. The purpose is merely to give the reader an intuitive feel for how various key AI issues are confronted in the Webmind framework, to give a grounding for the further details to be given in later chapters. General topics such as
* Representation of declarative and procedural knowledge
* Logical inference
* Evolutionary learning
* Experiential interaction
* Feelings and Goals
* Self and User Modeling
* Natural language processing
are considered.
Each section of the chapter reviews a certain "lesson learned" in the course of designing or creating WAE. Of course we've learned many more lessons than this in the course of transforming the abstract psynet model of mind into a real software system, but these are among the chief and most general, abstract ones:
* Mind is a web of patterns
* A digital mind requires a Mind OS
* Implementing massively parallel intelligence on serial computers requires a
Distributed Actor System
* Nonlinear dynamics is crucial for information creation, memory, and system
control
* Declarative and Procedural Knowledge Must Be Represented in a
Learning-Friendly Way
* The Mind requires an inference engine that can operate adaptively,
incrementally and creatively in the face of extreme uncertainty
* Causal inference must be handled by a combination of temporally-constrained
logic, and domain-dependent specialized knowledge about the "cause"
concept
* Evolution is a key learning mechanism, supplementing reasoning by providing a
more "global" approach to learning problems varying from
categorization to parameter optimization, and the acquisition of causal and
procedural knowledge
* The hybridization of evolution with inference is critical to the solution of
difficult learning problems.
* Categorizing the world is a specialized learning task that demands specialized
methods, or at least specialized control mechanisms for inference
* Experiential interaction is conducted according to a particular structure
involving a perception/action/short-term-memory triad, and a particular dynamic
involving goal/context/schema triads
* Intelligent human language processing in a digital system requires a
dynamically adapted combination of built-in rules and experiential learning
* Extracting meaning from sensory data (for example quantitative financial data,
in the Webmind AI Engine's case) requires complex and specialized data
preprocessing methods, as well as tools for mapping the results and internal
data structures of these methods into general knowledge representations
* For a mind, digital or human, to become truly intelligent, a process of
experiential interactive learning is required, necessarily involving other minds
* Emergence is critical to intelligence, on three levels: Within individual mind
modules, among the mind modules forming a mind, and between minds in a society
Note that the basics of WAE architecture and dynamics, as presented in
Chapter 1, are not repeated here. The reader may wish to refer back to the
relevant section of that chapter for a refresher, before proceeding further.
2. Mind is a web of patterns
A digital mind is a different sort of thing than an ordinary software program.
One can build a combat game, a transaction system or a word processor without an
explicitly articulated philosophy, but to build a digital mind without an
explicitly articulated philosophy of mind would be extremely difficult. There
are just too many difficult decisions that come up along the way, that send one
back to one's conceptual foundations. In order to think clearly about how to
build a thinking machine, one needs to begin with a clearly articulated
philosophy of mind - in other words, one needs a conceptual framework for
"what a mind is."
The essence of my own philosophy of mind was described in the previous chapter,
and may be summarized compactly as: A mind is a system
(population/society/whatever) of "actors" which are embedded in an
organism with goals. Each actor is able to transform, create & destroy other
actors. Many of these mind-actors act by recognizing patterns in the world, or
in other actors; and many perform meaningful acts with respect to the system's
external world. Actors pass attention ("activation") to other actors
to which they are related; and thoughts, feelings, motivations and other mental
entities are self-reinforcing, self-producing, systems of actors, which are to
some extent useful for the goals of the system.
Or, to phrase it in terms of buzzwords: Mind is an autopoietic, evolving,
self-organizing, teleological pattern-recognition and world-modification system.
This basic philosophy of mind has been strengthened, rather than weakened, by
confrontation with the practical reality of software implementation. Through
practice we have gained a sense of which mind-actors and mind-actor interaction
patterns are useful for building a software mind and which are not.
3. A digital mind requires a Mind OS
Given the self-organizing-actor-system philosophy of mind as a starting-point, it follows that the first step toward creating a digital mind is to create a "Mind OS," which provides a small set of unified knowledge representations and dynamics for all aspects of mind. Mental functions implemented on top of the Mind OS must have the ability to continually reconstruct and self-organize themselves in response to internal fluctuations and external interactions. The Mind OS must be flexible enough to support diverse specialized knowledge representations and dynamics for specialized areas, and to allow communication and cross-learning among the specialized representations and dynamics.
The Core component of the Webmind AI Engine provides such a Mind OS: its unifying principle is that of a self-organizing nonlinear-dynamical semantic network. Its unified knowledge representation consists of "nodes and links", which may be extended to create specialized node and link types; and its unified dynamics consists of agents called Stimuli and Wanderers which move through the network of nodes and links creating information.
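To give a concrete, if drastically simplified, flavor of this - the sketch below is my own illustration, not the Webmind Core API - nodes, links and Wanderers can be caricatured in a few lines:

    import random

    class Node:
        # A coherent whole: a concept, a text, a data series, etc.
        def __init__(self, name):
            self.name = name
            self.links = []          # outgoing Link objects

    class Link:
        # A typed, weighted relationship between two Nodes.
        def __init__(self, source, target, kind, strength):
            self.source, self.target = source, target
            self.kind, self.strength = kind, strength
            source.links.append(self)

    class Wanderer:
        # Moves through the network along links, proposing new links as it goes.
        def __init__(self, start):
            self.trail = [start]

        def step(self):
            here = self.trail[-1]
            if here.links:
                self.trail.append(random.choice(here.links).target)

        def propose_link(self):
            # Crude heuristic: relate the walk's origin to wherever the walk has led.
            if len(self.trail) > 2 and self.trail[0] is not self.trail[-1]:
                return Link(self.trail[0], self.trail[-1], "association", 0.1)
            return None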
The need for this "OS plus applications" framework comes, essentially, out of the need for diverse specialization. There are simple data-structure/algorithm combinations that are in principle adequate to power a thinking machine, but in practice none of these is adequately efficient to serve a real-world mind. Real-world minds need specialized components for dealing with specialized aspects of real life. But a collection of specialized components that don't speak the same language can never display the coherence required to guide an intelligent system. Thus we need a Mind OS that allows for the common implementation and constant rich communication of specialized components extending the same basic algorithms and data structures to various relevant domains.
4. Implementing massively parallel intelligence on serial computers requires a Distributed Actor System
The Webmind "Mind OS" would most naturally be implemented on a massively parallel hardware platform, in which each node and link had their own physical embodiment. But this is not practical at the present time. So we're faced with the situation of implementing a massively parallel Mind OS on top of a network of serial computers, some with multiple processors. This presents a huge engineering challenge, which we've confronted by building the Webmind core: a distributed actor system that allows a population of mind-actors (like nodes, links, stimuli and wanderers) to live on a network of ordinary computers, interacting and self-organizing roughly as if they were parts of a massively parallel quasi-biological computing system.
Most of the ideas underlying this distributed actor system are well known in
computer science. There's an addressing system - each actor has a
"handle" by which other actors can locate it. Dynamic load balancing
is done to improve system efficiency, with more extreme load balancing activity
done during system-wide "sleep" periods. A queue of worker threads is
used to deal with multiple processors on a single machine ... and so forth. The
psycore component of the Webmind AI Engine synthesizes well-known computer
science techniques in an unprecedentedly complex way, resulting in an extremely
sophisticated distributed multiprocessor object-oriented software framework.
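The following toy sketch (names and structure are mine; the real psycore is vastly more elaborate) illustrates the basic pattern of handle-based addressing plus a pool of worker threads draining a shared message queue:

    import queue
    import threading

    class ActorSystem:
        # Toy handle-addressed actor system backed by a worker-thread pool.
        def __init__(self, num_workers=4):
            self.actors = {}                 # handle -> actor object
            self.mailbox = queue.Queue()     # pending (handle, message) pairs
            for _ in range(num_workers):
                threading.Thread(target=self._worker, daemon=True).start()

        def register(self, handle, actor):
            self.actors[handle] = actor

        def send(self, handle, message):
            self.mailbox.put((handle, message))

        def _worker(self):
            while True:
                handle, message = self.mailbox.get()
                actor = self.actors.get(handle)
                if actor is not None:
                    actor.receive(message)   # each actor type defines its own receive()
                self.mailbox.task_done()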
5. Declarative and procedural knowledge must be represented in a way that is friendly for a variety of mind algorithms
This means it must be possible for new representations of new knowledge to be learned, and existing representations of old knowledge used to gain knowledge, by diverse methods.
Knowledge representation is one of the key issues in AI. In some cases, for instance in expert systems and frame- or logic-based systems, it's overemphasized as compared to learning. But on the other hand, machine learning and feedforward neural net AI systems embody knowledge representations that are overly specialized for particular learning algorithms. Placing knowledge into these specialized representations is tricky, so these so-called AI methodologies rely on human intelligence to represent knowledge in terms of "input vectors." In a really intelligent system, knowledge representation and learning must go hand in hand, each one optimized for the convenience of the other. To overstate things just a little, one may say: The best knowledge representation is the one that best lends itself to learning; and the best learning system is the one that best exploits one's knowledge representation.
If the mind is viewed as a collection of mind-actors, then knowledge can be represented within individual actors, or in patterns of relationship among various actors. These two methods blur together when, as often occurs, individual actors recognize and embody patterns of relationship among other actors.
The particular actors making up any particular mind must embody a common system for representing knowledge, otherwise they won't be able to understand each other, and to build a common emergent mind-system together. Mathematical logic provides a powerful framework for knowledge representation, but in its typical forms, it's not a "learning-friendly" framework. The mind requires a way to express arbitrarily complex logical statements in a form that is susceptible to adaptation, learning, and integration with other parts of the mind. In the Webmind AI Engine, this means expressing logical formulae in terms of nodes and links, which sometimes requires complex structures such as nodes that group links, and links that point to links.
One may divide up WAE's knowledge representation in various ways. For instance, there's the local versus global distinction, which is not absolute, but is nonetheless useful. The meaning of any particular link or node is comprehensible only in the context of the whole system. But yet, there is a valid distinction between knowledge that is associated with some particular link or node, and knowledge that is not.
The declarative versus procedural distinction is also useful. Knowledge of facts versus knowledge of how to do something. Again, this is not a rigid distinction: facts may tell you how to do things; and even a complex procedure may be expressed as a set of facts about the sub-steps to be followed. But, in practical terms, this is a useful distinction both in terms of brain science and in terms of WAE software engineering.
It has often been observed that procedural knowledge - knowledge of "how to do things" - is particularly ineffectively dealt with by standard logic-based methods. The human brain seems to have separate systems for dealing with procedural and declarative knowledge. However, one also requires mechanisms for translating between one representation and the other, for cases where procedures need to be reasoned about logically. The Webmind AI Engine uses a functional-programming-like representation for storing and manipulating procedures or "schema," and has tools for translating this into a logical representation as necessary. Schema execution is integrated with other psynet dynamics in a subtle way; and schema can be learned either by reasoning or by evolution, or by a combination of the two.
The Webmind AI Engine's knowledge-representation in terms of nodes and links provides the generality and flexibility needed for many different learning algorithms to cooperate, for the cases of both declarative and procedural knowledge. Neural net type activation spreading can work together with logical inference, with genetic programming, and so forth. This is a necessary prerequisite for building a mind.
A node represents the knowledge that some entity exists in the external
world, or that some concept "exists in the space of worthwhile
concepts." A link represents the knowledge of some relationship between
internal or external entities. Sets of nodes and links, or patterns among nodes
and links, represent sets of relationships of various types, which are also a
valid kind of knowledge. The learning mechanisms associated with the node and
link knowledge representation are diverse and will be discussed in the following
sections. Nonlinear dynamics plays a role here, as does reasoning and
evolutionary programming and more specialized learning methods like
categorization and causal inference. It's crucial that these diverse learning
mechanisms all operate on the same knowledge representation, i.e. nodes and
links. In this way they can not only coexist but build on each others' insights.
Declarative Knowledge
Webmind's representation of declarative knowledge involves several types of links, including inheritance and similarity links, and HaloLinks. HaloLinks will be discussed in the following section, on nonlinear dynamics.
The system of inheritance links has special importance in one sense: It is the minimum subset of Webmind required to express all declarative knowledge according to standard mathematical logic.
The basic concept of "inheritance", expressed informally, is:
Concept A inherits from concept B to the extent that, for various entities X, the B-ness of X can be inferred from the A-ness of X
For instance, the animal-ness of a certain thing can be inferred from its cat-ness.
Similarity, on the other hand, is a symmetrical version of the same idea:
Concept A is similar to concept B to the extent that the A-ness of X can be inferred from the B-ness of X and vice versa
Inheritance and similarity, as dealt with in Webmind, are not Boolean quantities -- one concept can inherit from another one a little bit, a moderate amount, a lot, completely, not at all, etc. Each relation has a strength between 0 and 1, where 0 means there is no such relationship, and 1 means the relationship is complete. Real-world relationships never have strength 1, but mathematical relationships may have strength 1 by definition. The method for computing strengths of these relations is frequency-based, and proprietary.
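For illustration only - the following is a generic frequency-based estimate consistent with the informal definitions above, not Webmind's proprietary formula - such strengths could be estimated from observed instances like this:

    def inheritance_strength(instances_of_a, instances_of_b):
        # Fraction of observed A-instances that are also B-instances.
        a, b = set(instances_of_a), set(instances_of_b)
        return len(a & b) / len(a) if a else 0.0

    def similarity_strength(instances_of_a, instances_of_b):
        # Symmetric overlap: A-ness and B-ness mutually inferable (Jaccard-style).
        a, b = set(instances_of_a), set(instances_of_b)
        return len(a & b) / len(a | b) if (a | b) else 0.0

With observed sets of cats and animals as inputs, inheritance_strength(cats, animals) would come out near 1, while similarity_strength(cats, animals) would be much lower, as one would hope.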
Finally, is it difficult to build up from binary inheritance relationships to the more ambitious goal of expressing everything there is to be expressed? We don't just want to be able to say that cat inherits from animal, we want to be able to say that cats eat mice, or flies give diseases to people, and so forth. We want to express complex n-ary relations. But this is easier than the non-mathematician might think. As Charles Peirce was the first to observe, all n-ary relations can be decomposed into ternary ones. "Ben kicks Ken" means that Ben falls into the set of things that kick Ken, that Ken falls into the set of things that Ben kicks, and that kicking falls into the set of things that Ben does to Ken.
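For instance, the "Ben kicks Ken" decomposition just given might be written out - in a deliberately simplified notation of my own, not actual Webmind node and link types - as a handful of membership links:

    # Illustrative encoding only; the real system uses specialized node and link types.
    links = [
        ("member", "Ben",   "things that kick Ken"),    # Ben kicks Ken
        ("member", "Ken",   "things that Ben kicks"),   # Ken is kicked by Ben
        ("member", "kicks", "things Ben does to Ken"),  # kicking relates Ben to Ken
    ]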
A somewhat similar ruse allows the system to express knowledge about
knowledge, and higher orders as well - knowledge about knowledge about
knowledge, and so forth. In logical terms, higher-order knowledge refers to
knowledge about propositions. In Webmind, this corresponds to links that point
to other links, rather than to nodes. Quantifiers and variables enter in here,
and care must be taken to treat them in an intuitive, self-organization-friendly
way. One also needs BooleanRelationNodes, encapsulating AND-OR-NOT relations
among basic Webmind links. This kind of "higher-order inference" is
complicated, which is probably why people are so bad at it. But it's no serious
problem for Webmind, conceptually or mathematically. Meaningful higher-order
inferences often involve probability distributions rather than absolute
probability values: this complicates things but doesn't make them intractable.
Procedural Knowledge
In WAE, we represent procedures using structures called SchemaNodes. SchemaNodes may be converted into declarative form for the purpose of reasoning about them, but for the purpose of enacting them, the SchemaNode form is used, because it vastly simplifies the kinds of processing required during procedural execution. Complex "schema" or "procedures" are represented in Webmind using self-organizing, adaptive networks of SchemaNodes.
Schemas are of several types and may focus on:
* Perception
* Action
* Perception-action
* Cognition
Perception schema just represent data in a certain way; this can of course be a dynamic relationship. Action schema do things: in our context they answer queries, send emails, evaluate risk, pass on messages, etc. Perception-action schemas have perceptual inputs AND do things. Cognitive schema represent models and operational frameworks (such as concepts, classification schemes, and predictive models) that guide other processes, including schema search. Schema are stored in LTM or STM, in many cases linked to the goals that they help achieve.
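The following Python sketch conveys the flavor of this representation; the class names and fields are hypothetical stand-ins, not the actual SchemaNode interface. The point is that schema are small typed procedures, composable in a functional style and linkable to the goals they help achieve.

    # Illustrative sketch of schema as small composable procedures.  The class and field
    # names are hypothetical, not the actual Webmind SchemaNode interface.
    from enum import Enum

    class SchemaType(Enum):
        PERCEPTION = 1
        ACTION = 2
        PERCEPTION_ACTION = 3
        COGNITION = 4

    class Schema:
        def __init__(self, name, stype, fn, goals=()):
            self.name = name
            self.stype = stype
            self.fn = fn              # the procedure itself, in functional style
            self.goals = list(goals)  # goals this schema is linked to

        def execute(self, *args):
            return self.fn(*args)

    # a perception schema re-represents data; an action schema does something with it
    summarize = Schema("summarize-text", SchemaType.PERCEPTION,
                       lambda text: text.split(".")[0] + ".")
    reply = Schema("answer-query", SchemaType.ACTION,
                   lambda summary: "Answer: " + summary,
                   goals=["please-user"])

    # composition: feed one schema's output into another, as in a larger schema network
    print(reply.execute(summarize.execute("Cats eat mice. Mice fear cats.")))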
Schema have many uses in Webmind. To take a single example: Planning of a
course of action is achieved by inferential (causal) linking of schema. When
schema are linked together in chains, we must check that the conditions at the
end of one are really appropriate to initiate the following one. Standard
approaches to planning include forward and backward chaining, abstraction,
decomposition, and merging. These can be readily set in the context of the
psynet, using causal relations on nodes and links.
6. Nonlinear dynamics is crucial for information creation, memory, and system control
The actors making up the mind interact in complex ways, and the overall interaction patterns between the actors are just as crucial to mind as the individual actors themselves. Nonlinear dynamics and complex systems theory are the branches of science that deal with the emergent, holistic properties one often obtains when a bunch of simple elements are allowed to interact. The complexity and usefulness of the emergent structures and behaviors obtained, of course, depends on the particular ways the actors interact.
In the Webmind AI Engine, nodes and links are given roughly the dynamics of a standard attractor neural net (ANN), which is a relatively well-understood dynamic with a plausible connection to the dynamics of the human brain (although admittedly, the details of this connection are debatable). Nodes spread activation to each other along links; and each node maintains an importance which is a time-weighted average of its recent activation.
This is used for several purposes. Nodes build new links based on the activation patterns they trigger; this is called "halo formation" and is the main instance in the Webmind AI Engine of the use of nonlinear dynamics for information creation and memory. On the other hand, the nodes with the highest importance get to act more often, according to psycore's multilevel scheduler; in this way nonlinear, emergent dynamics governs system control.
This is not the only aspect of system control; there is also a strong element of centralized control and coordination. For example, there is the Attention Broker, which provides a more rule-based approach to control, applied to heavyweight processes that want large amounts of CPU time. But even the AttentionBroker uses dynamically-constructed node importance as a key criterion in selecting which heavyweight processes get to execute.
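The following Python sketch illustrates the basic dynamic just described: activation spreads along weighted links, each node's importance is updated as a time-weighted (exponentially decaying) average of its recent activation, and processing priority then follows importance. The update rule, decay constant and toy graph are placeholders, not the actual psycore parameters.

    # Illustrative sketch: activation spreading and importance updating on a tiny node/link graph.
    # The decay constant and update rule are placeholders, not Webmind's actual parameters.

    links = {                      # weighted links: source -> [(target, weight), ...]
        "cat":    [("animal", 0.8), ("pet", 0.6)],
        "animal": [("pet", 0.3)],
        "pet":    [],
    }
    activation = {"cat": 1.0, "animal": 0.0, "pet": 0.0}
    importance = {n: 0.0 for n in activation}
    DECAY = 0.9                    # weight given to past importance vs. current activation

    for step in range(5):
        # each node spreads a fraction of its activation along its outgoing links
        new_act = {n: 0.0 for n in activation}
        for src, outs in links.items():
            for tgt, w in outs:
                new_act[tgt] += activation[src] * w
        new_act["cat"] += 1.0      # an external stimulus keeps "cat" active
        activation = new_act
        # importance = time-weighted average of recent activation
        for n in importance:
            importance[n] = DECAY * importance[n] + (1 - DECAY) * activation[n]

    # a scheduler would grant CPU time in order of importance
    for n in sorted(importance, key=importance.get, reverse=True):
        print(n, round(importance[n], 3))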
ANN's versus WAE
Let's look at the relation with ANN's in a little more detail. In attractor neural networks, system attractors are used to store memories and to embody solutions to combinatorial problems. The AI Engine is a dynamic semantic network, continually reconfiguring itself based on its study of itself; and its dynamics may be extremely complex, manifesting subtle attractors and transient patterns of various types. A natural question is therefore: What is the role of ANN-style memory in WAE?
Generally speaking, attractor neural nets are a very inefficient way to store memories, but there is one purpose for which the attractor phenomenon captured in these networks seems to be very useful. An AI system should be able to focus its global attention on one set of nodes very tightly, rather than in a diffusive way as promoted by ordinary psynet dynamics. This is what focused consciousness is all about. In WAE, nodes representing concepts, percepts or actions can spread activation (via objects called main Stimuli) to each other like nodes in an attractor neural net, and their activation affects another node quantity called importance. Importance is used to determine how much CPU time a node gets, via the importance-based schedule, defined below.
Halos
The process of "halo formation" somewhat resembles the process of memory retrieval from an ANN used for content-addressable memory To form a halo, a node N sends out activation along its links, and continues pulsing activation for a while. The objects carrying the activation (Stimuli) are specially marked with the handle of the node that sent them, so they don't get confused with any other stimuli traveling around. After the stimuli have spread for a while, then they're collected. A node M is considered to be in the "halo" of node N, to a degree proportional to the percentage of the stimuli that N sent out that arrived at M. A link called a HaloLink is then created, pointing from N to M. The HaloLinks pointing out from N, collectively, form a picture of the "activation distribution" that arises if node N is activated.
Conclusion
So we see that the maxim "Nonlinear dynamics is crucial for information creation, memory and system control" appears in WAE in a couple different ways. Nonlinear dynamics are used for attention control via main Stimulus spreading, and for explicit information creation via halo spreading. Of course, these particular mechanisms interact with other aspects of the system in many different ways, giving the system as a whole a nonlinear-dynamical character.
An interesting point here is that, both in the main activation and the halo
spreading cases, convergence to final attractors is not as important as creation
of interesting transient structures. This agrees with observations made by some
pundits of nonlinear dynamics in psychology and life sciences, who have said
that real biological systems, even if strongly nonlinear in their dynamics,
rarely demonstrate convergence to an attractor. For example, Robert Gregson's
nonlinear psychophysics is entirely based on clever use of the transient
behavior of nonlinear iterations.
7. The Mind requires an inference engine that can operate adaptively, incrementally and creatively in the face of extreme uncertainty
Ben Goertzel and Pei Wang
Logical reasoning is indispensable to the mind, providing a tremendous short-cut to more basic neural-net-like, association-based methods for deriving new relationships from known ones. It's particularly critical for dealing with knowledge that is transmitted to the mind indirectly through language.
However, traditional formal logic systems are not adequate for embedding in an integrated mind system, for several reasons. First, they don't deal adequately with uncertain information; some variants such as fuzzy logic try, but all have severe flaws. Second, they aren't good enough at updating a collection of inferred relations adaptively and incrementally; for instance, in a standard Bayesian network, when one new piece of information is obtained, the whole network must be updated. None of them are set up to adaptively update themselves from live data streams, as the human brain does through its sense organs and the WAE does through continuous numerical and textual internet feeds. And finally, they focus excessively on deductive reasoning, not integrating analogical and inductive reasoning; and when these other types of reasoning are integrated, it is not done in a smooth way which allows the various types of reasoning to learn from each other.
In the WAE we've created two different reasoning systems fulfilling these requirements: Pei Wang's Non-Axiomatic Reasoning System, and a related Probabilistic Reasoning System. These both act on nodes and links, building new links by combining old ones according to various rules, in a way consistent with the underlying activation-spreading dynamics. Reasoning rules have been worked out for dealing with complex cases such as multiple-target links (used to represent n-ary relations), links pointing to links, and nodes representing conjunctions or disjunctions of links (used to represent higher-order relations). In this way we have a complete, self-organizing predicate-logic-style system that deals robustly with uncertain data, and integrates naturally with the other aspects of the WAE, such as neural-net-like nonlinear dynamics acting on the same links.
The Role of Reasoning in WAE
In WAE, "logical reasoning" is one part of the mind rather than, as in some AI systems, the center of the mind. It is a special type of method for deriving new knowledge from available knowledge. It generates new knowledge according to a set of predetermined inference rules, which are independent of the domain of application and the particular content of the knowledge involved. Reasoning does not add new knowledge to the system, but by deriving new knowledge from available knowledge, the system can get information that is not explicitly stated in the previously available knowledge.
Most of WAE's knowledge is represented as links from nodes to other nodes. Reasoning uses link-embodied knowledge to generate new link-embodied knowledge, generating new links from old. It does so in an incremental fashion: each inference step uses a few links as input (called "premises"), and generates one or more links as output (called "conclusions"). Webmind reasoning has to do mainly with the inheritance and similarity relations. It also deals with complex relations using multiple-target links and links that point to links or groups of links (higher-order inference), as mentioned above.
In its basic form, the Webmind reasoning system deals with the transitivity of inheritance (written here as "A -> B", meaning "A inherits from B"), i.e. the "logical deduction" relation
A -> B and B -> C implies A -> C
and with the reversibility of inheritance, i.e. the less certain relation
A -> B suggests B -> A to a certain degree
It also deals with other forms of inference, such as induction
A -> B and A -> C implies B -> C
and abduction,
A -> B and C -> B implies A -> C
Doing reasoning on links like this is called "term logic" (the ordinary kind of logic is predicate logic, which is similar but proceeds from different foundational assumptions). Depending on the mathematical formalism one chooses to underlie one's term logic, in some formalisms (e.g. PTL, Probabilistic Term Logic) all forms of inference can be derived from deduction and reversibility, whereas in others (NARS, the Non-Axiomatic Reasoning System) other forms of inference like induction and abduction must be considered separately.
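To show the bare skeleton of these rules, here is a Python sketch of deduction, induction and abduction acting on inheritance links. The strength-combination formulas are deliberately naive placeholders (the actual NARS and PTL truth-value functions are more refined, and PTL's are proprietary); the point is only how new links are built from old ones.

    # Illustrative skeleton of term-logic inference on inheritance links ("A -> B").
    # Strength formulas are naive placeholders, NOT the actual NARS/PTL truth-value functions.

    def deduction(ab, bc):
        # A -> B and B -> C yields A -> C
        (a, b1, s1), (b2, c, s2) = ab, bc
        assert b1 == b2
        return (a, c, s1 * s2)                   # placeholder: multiply strengths

    def induction(ab, ac):
        # A -> B and A -> C yields B -> C (shared subject)
        (a1, b, s1), (a2, c, s2) = ab, ac
        assert a1 == a2
        return (b, c, min(s1, s2) * 0.5)         # placeholder: weaker, less certain conclusion

    def abduction(ab, cb):
        # A -> B and C -> B yields A -> C (shared predicate)
        (a, b1, s1), (c, b2, s2) = ab, cb
        assert b1 == b2
        return (a, c, min(s1, s2) * 0.5)         # placeholder

    cat_animal = ("cat", "animal", 0.95)
    animal_mortal = ("animal", "mortal", 0.9)
    cat_pet = ("cat", "pet", 0.8)

    print(deduction(cat_animal, animal_mortal))  # roughly ('cat', 'mortal', 0.855)
    print(induction(cat_animal, cat_pet))        # ('animal', 'pet', 0.4)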
Standard logic (including standard term logic) deals with the derivation of absolutely certain conclusions from absolutely certain premises, but in real life this is rarely applicable. A real-world reasoning system has to deal with uncertain knowledge about intermediate degrees of inheritance. It has to be able to deal with premises such as "It's reasonably certain that this person is moderately but not highly intelligent," as opposed to simple binary premises, such as "this person is intelligent" or "this person is moderately intelligent." There are standard mathematical ways to do this, using fuzzy set theory and probability theory, but unfortunately, none of them are really adequate. So we use our own mathematical reasoning rules. Currently we are experimenting with two variants, Pei Wang's NARS system, and the PTL system developed by Jeff Pressing and myself.
The inference system becomes subtler when one considers higher-order logic. This involves nodes representing variables, and nodes called CompoundRelationNodes that contain Boolean (logical) combinations of links. Old friends from formal logic like unification and quantifier dependency rear their heads here. But, it's all founded on the self-organizing foundation of term logic, which in a WAE context represents reason as one among many mechanisms of dynamic link formation. The really subtle point about higher-order inference (HOI) is HOI control, which is a sufficiently difficult problem that we believe it possesses no general solution - instead, different applications of HOI require their own specialized HOI control mechanisms.
In general, the relation between the reasoning module and the rest of Webmind is a complex one. First of all, there is an informational exchange: inference builds links that other processes use, and it uses inheritance links built by other processes. And there is also a relationship on the level of control. The inference process requires a control strategy, which determines the following things:
* when to apply an inference rule,
* where to get the premises,
* what to do with the conclusions.
These decisions are necessary, because no real-world system can possibly afford the computational expense of exhausting all possible inferences. Reasoning does not provide its own control strategy; this is provided by a combination of simple heuristics, and input from other aspects of Webmind, primarily Webmind dynamics. This is a primary instance of inter-module emergent intelligence. Halos guide inference - if one node is in another's halo, then the system should look for a logical relation between them. And importance-based attention allocation guides inference - if two nodes are important at the same time, then the system should look for a logical relation between the two of them. This kind of control heuristic is simple, but it allows the system's overall dynamics, rather than exhaustive search, to determine which inferences are worth attempting.
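A minimal sketch of such a control loop, in Python, is given below: instead of attempting every possible inference, premise pairs are drawn from nodes that are simultaneously important or that lie in each other's halos, and only those pairs are handed to an inference rule. The threshold and the data layout are illustrative assumptions.

    # Illustrative sketch of heuristic inference control: premises are chosen by importance
    # and halo membership rather than by exhaustive search.  Thresholds are placeholders.

    importance = {"cat": 0.9, "animal": 0.8, "mortal": 0.7, "stone": 0.05}
    halo = {"cat": {"animal", "mortal"}, "animal": {"mortal"}}       # node -> nodes in its halo
    inheritance = {("cat", "animal"): 0.95, ("animal", "mortal"): 0.9}

    IMPORTANCE_CUTOFF = 0.5

    def candidate_pairs():
        # yield node pairs worth examining: both important, or one in the other's halo
        nodes = list(importance)
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                both_important = importance[a] > IMPORTANCE_CUTOFF and importance[b] > IMPORTANCE_CUTOFF
                in_halo = b in halo.get(a, set())
                if both_important or in_halo:
                    yield a, b

    # try a single deduction step on each selected pair, via a shared middle term
    for a, c in candidate_pairs():
        for b in importance:
            if (a, b) in inheritance and (b, c) in inheritance and (a, c) not in inheritance:
                inheritance[(a, c)] = inheritance[(a, b)] * inheritance[(b, c)]   # placeholder strength

    print(inheritance.get(("cat", "mortal")))   # roughly 0.855, derived without exhaustive search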
One may wonder whether complicated mathematical reasoning rules are really
important for general intelligence. More specifically: No one can reasonably
doubt that the mind does reasoning. One may wonder, however, whether formal
reasoning, using specific inference rules, should be explicitly coded into an AI
system, or whether it should be allowed to emerge from more basic operations. As
it turns out, this is a subtle matter, and largely a matter of efficiency. I
believe one can embody formal reasoning entirely in simple neural net dynamics.
This is a fine approach, given that you have a very long time to wait. In
practice, however, it's not particularly efficient. Reasoning gives a way to
short-cut complex neural net dynamics. I suspect that the brain contains
specialized neural structures that effectively implement reasoning rules, in
some subtle way that we don't yet understand. For WAE purposes, it seems clear
that getting logic itself to emerge from underlying subsymbolic dynamics is just
not a practical approach at the moment; and it seems that the term-logic
implementation of inference has all the flexibility and interoperability needed
to serve as the logic component of a digital mind.
Schema and Planning
A major application of inference in Webmind is planning -- determining what to
do in a given situation. Actions, in Webmind, are encapsulated in schema; but as
we have discussed above, schema can be translated into relational form so that
they can be learned, improved and evaluated using inference. Higher-order
inference plays a crucial role here.
In logical terms, a schema is a special kind of event that corresponds to an
operation that can be executed by a system. With the various kinds of
higher-order relations (Implication, Equivalence, various kinds of causal
relations), we can specify the relation of a schema with other schema and states of affairs in the system. A plan is defined as a series of actions that is
estimated likely to cause a certain goal to be achieved.
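A minimal sketch of this idea follows: a plan is assembled by backward chaining from the goal through schema whose declared effects satisfy the preconditions of the next step. The precondition/effect fields and the naive search are illustrative assumptions, not the actual Webmind planning machinery.

    # Illustrative sketch: planning as backward chaining over schema pre/postconditions.
    # Field names and the search strategy are hypothetical, not Webmind's actual planner.

    schemas = [
        {"name": "fetch-data",   "pre": set(),              "post": {"have-data"}},
        {"name": "analyze-data", "pre": {"have-data"},      "post": {"have-analysis"}},
        {"name": "write-report", "pre": {"have-analysis"},  "post": {"report-sent"}},
    ]

    def plan(goal, state, depth=0):
        # return a list of schema names estimated to achieve `goal` starting from `state`
        if goal in state or depth > 10:
            return []
        for s in schemas:
            if goal in s["post"]:
                steps = []
                for precondition in s["pre"]:      # recursively satisfy each precondition
                    steps += plan(precondition, state, depth + 1)
                return steps + [s["name"]]
        return []                                  # no schema achieves this goal

    print(plan("report-sent", set()))
    # ['fetch-data', 'analyze-data', 'write-report']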
The big issue here is inference control. The higher-order inference rules only
specify how to generate valid conclusions from available knowledge, but say
nothing about which potential conclusions (there are many of them) are worthy of
being derived. In Webmind, these problems need to be solved by using Webmind
dynamics and module-specific knowledge to guide the inference. This is a classic
case of inter-module emergence. Schema learning (planning) is carried out by
specialized components of WAE, which have their parameters tuned to deal with
particular types of schema learning problems.
8. Causal inference must be handled by a combination of temporally-constrained logic, and domain-dependent specialized knowledge about the "cause" concept
Jeff Pressing and Ben Goertzel
Causal inference is a specialized kind of reasoning, but one with a vast importance. It has a long and complex history in philosophy and in the natural sciences, and in fact, a minority of distinguished thinkers such as David Hume and Bertrand Russell have spoken against its meaningfulness, arguing that causation is an abstract concept without realistic importance. In our view, however, causal inference is not only meaningful but crucial. It forms the basis for a large fraction of human interactions, and society itself. We expect certain things to happen when we do other things. Otherwise, our abilities to plan, to pursue goals, to form meaningful expectations and intentions, and to build dependable relationships would be strongly compromised. So any kind of intelligent being that is to be created must deal with this in a practical and humanly meaningful way.
In pragmatic social terms, mechanisms of cause are believed in and used by all societies. Groups and individuals have their own causal palettes that provide ready candidates for causes used to explain events. In traditional societies, gods, persons, animals, magical (e.g. healing) substances and laws and historical events are invoked. Western societies have their own biases. The degree to which an event is considered agent-caused and circumstance-caused varies with beliefs. It has been well-established empirically that Western societies, for example, being individual-centered rather than group-centered, are biased towards inferences of cause based on individual agency rather than situational forces (in social psychology, this is called the fundamental attribution bias).
In a sense, causal inference can be addressed by general reasoning methods. Causation, in its simplest form, is "Implication plus temporal precedence." This is simple only in concept, however: Complex causation of one process by another involves implications between complex logical constructs. Nuances of causal ascription in various domains can be learned by a digital system over time, just as they're learned by humans over time - small children don't ascribe causes as adults do. We believe that the patterns culturally associated with the "cause" concept can be learned through experiential interaction.
In practice, causation is a sufficiently subtle and crucial aspect of mind that it seems to make sense to "hard-wire" in a fairly refined understanding of causation, for dealing with various special cases. In particular, the WAE's initial perceptual world consists of textual and numerical information, and it seems that efficient inference of causes among numerical data sets requires some special techniques. The results of these techniques may not be different from those that would be obtained by application of inference to the individual points and patterns in the data series involved, but the specialized techniques are vastly more efficient. Such techniques will be described in the chapter on numerical data analysis.
Of course, in the real world, causality is often quite subtle. Consider a financial market such as the S&P 500. Here there is a variety of causes that can affect the equities, including:
* changes in business conditions in significant component submarkets like
technology (process)
* interest rate directions (process)
* government financial announcements (events)
* comments from significant individuals (events, agents)
* specific acts by operators with significant market share (events, agents)
* general attitudes of traders & investors (agents)
This complex process is evidently a mixture of event, process and agent-based causalities.
A further distinction often needs to be made between causes and what are customarily called enabling conditions. Enabling conditions are generally implicated in producing the effect; but they display no significant variation within a given context. Hence their true causal status is not well tested. Example: oxygen is necessary to use a match to start a fire, but since it is normally always present, we usually ignore it as a cause, and it would be called an enabling condition. If it really is always present, we can ignore it in practice; the problem occurs when it is very often present but sometimes is not, as for example when new unforeseen conditions occur in the future.
In practice, in Webmind, we address causal inference by general reasoning methods, according to the approximation "Causation is Implication plus temporal precedence." We also have wired in techniques for the efficient inference of causes among processes described by numerical data sets. And, finally, we require the system to learn the pragmatics of agent causality through social interaction, in the context of the Baby Webmind project. (If it drags a file into a folder, and the folder is then deleted, did it cause the file to be deleted, or did the person who deleted the folder cause the file to be deleted? The subtleties of agent causality in cases like this are hard to formalize, but we humans pick them up through experience.)
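The first of these mechanisms, treating causation as implication plus temporal precedence, can be sketched roughly as follows in Python; the event representation and the link format are illustrative assumptions.

    # Illustrative sketch: building candidate causal links from implication links plus
    # temporal precedence between observed events.  The representation is a placeholder.

    # implication links already inferred by the system: (antecedent, consequent, strength)
    implications = [("rate-cut", "market-rally", 0.7), ("market-rally", "rate-cut", 0.2)]

    # observed event times (e.g. averaged over many observations)
    event_time = {"rate-cut": 1.0, "market-rally": 3.0}

    causal_links = []
    for antecedent, consequent, strength in implications:
        if event_time[antecedent] < event_time[consequent]:   # temporal precedence check
            causal_links.append((antecedent, "CAUSES", consequent, strength))

    print(causal_links)   # only "rate-cut CAUSES market-rally" survives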
In short, we believe there is no simple mathematical formalism for causal
inference, even on the level of plausible approximation. Causal inference is not
a subclass of logical inference. However, there is a core to causal inference
which is temporally-constrained logic, and this core must be augmented by
in-built intuitions and experiential learning about causation in particular
domains. "Cause" is a complex concept which to some extent reflects
peculiarities of physical environment (thus it will be different for a system
with a digital environment than one with a physical environment) and culture
(thus it will be different in a culture of digital organisms than in a human
culture). One can expect a digital mind to master the human concept of causality
only partially, and to bring its own unique intuition to bear to make different
and insightful causal judgments.
9. Evolution is a key learning mechanism, supplementing reasoning by
providing a more "global" approach to learning problems
In theory, evolutionary programming is an extremely broad and powerful approach to AI, but in practice, it's only adequately efficient for certain types of problems that the mind confronts. It's a powerful tool for optimizing the numerous parameters that any system as complex as a mind will naturally possess. Also, there are three specialized areas of learning where evolutionary methods seem to be indispensable: Inference of causal relations between processes; inference of procedures for achieving goals in particular contexts; and optimization of system self-control parameters.
Webmind dynamics is based on the survival of the fittest nodes and links, and reproduction of the successful nodes and links to produce others. In this sense all Webmind computing is "evolutionary computing." Webmind is an evolving system - an evolving ecological system, no less, in the sense that the fitness of a node or link depends on the other nodes and links around it.
This is different, however, from saying that Webmind is an "evolutionary computing" system in the standard computer science sense. Normally this term has a more specific meaning: it is the technique of isolating a purpose-specific "fitness function" and running a highly optimized evolution process aimed at creating entities maximizing this fitness function. In this category we find genetic algorithms, genetic programming and artificial life, all of which play a role in Webmind as specialized learning techniques, and which in some domains are the best techniques available.
In particular, we have found evolutionary methods to be useful for system-wide parameter optimization, for causal inference in numerical domains, and for schema learning. Implementationally, these are carried out using a generic evolutionary computing software framework which allows nearly any Webmind objects to evolve, by defining how they mutate, cross over, and assess their fitness in a given context.
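In essence, the generic framework amounts to the familiar evolutionary loop sketched below in Python, parameterized by problem-specific mutation, crossover and fitness functions. The toy bit-string problem and all parameter values are placeholders.

    # Illustrative sketch of the generic evolutionary loop: any object can evolve once we
    # say how it mutates, crosses over, and assesses its fitness.  Toy problem: maximize
    # the number of 1s in a bit string.  Parameters are placeholders.
    import random

    rng = random.Random(0)
    LENGTH, POP_SIZE, GENERATIONS = 20, 30, 40

    def fitness(bits):             # problem-specific fitness function
        return sum(bits)

    def mutate(bits):              # problem-specific mutation
        i = rng.randrange(LENGTH)
        return bits[:i] + [1 - bits[i]] + bits[i + 1:]

    def crossover(a, b):           # problem-specific crossover
        cut = rng.randrange(1, LENGTH)
        return a[:cut] + b[cut:]

    population = [[rng.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        survivors = population[:POP_SIZE // 2]                      # survival of the fittest
        children = [mutate(crossover(rng.choice(survivors), rng.choice(survivors)))
                    for _ in range(POP_SIZE - len(survivors))]
        population = survivors + children

    print(max(fitness(p) for p in population))   # approaches LENGTH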
So, Webmind is an evolutionary ecological computing system in the broad sense, and among the many algorithms that exist within it are some that are themselves explicitly evolutionary, and some that are not.
Genetic Algorithms and Genetic Programming, as used within Webmind, extend traditional practices in these areas in various ways, which are best summarized by pointing out that traditional GA/GP leave out two critical aspects of real-world biological evolution: embryology and ecology. Embryology is taken care of in a variation of GP we call "epigenetic programming," where one evolves genetic code sequences that are used to generate programs whose fitness is then assessed. Ecology refers to the fact that, in an actual population of living beings, the environment itself changes in response to the change in population; therefore, the relation between the environment and the species living in it is synergetic instead of fixed. This is critical, for example, in the evolutionary learning of cognitive schema.
The relation between evolutionary computing and distributed processing is interesting. A standard GA/GP implementation consists of a single population running on a single machine. In a Webmind or Webworld context, however, one wants to make use of distributed processing, and split a population up among many different machines. It turns out that although this involves some small overhead, it can oftentimes increase the efficiency of evolution due to the intrinsic mathematical properties of a well-divided population.
10. The hybridization of evolution with inference is critical to the solution
of difficult learning problems.
A very critical aspect of WAE intelligence is, we believe, the hybridization of evolution with inference. We see no other way to solve extremely difficult learning problems, such as language learning, and schema learning. Higher-order inference is required for these tasks, but HOI requires the creation of numerous CompoundRelationNodes representing logical combinations of links in the system. Inference itself can be used to form simple heuristics for CRN formation, but we believe that to do truly powerful inference, an evolutionary method must be used to form CRN's. Yet, this is not a pure evolutionary system, because the fitness evaluation involves inference: the fitness of a CRN representing, say, a linguistic rule or a schema, is how useful the system infers the CRN will be in various situations. Evolution and inference are effectively fused into a single hybrid learning method.
To oversimplify a bit, we may say that logical reasoning is valuable for evaluating solutions to causal and procedural inference problems, and for varying on existing solutions, but evolution is needed because of its ability to reasonably rapidly provide creative candidate solutions that bear little apparent resemblance to the mind's existing knowledge.
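Schematically, the hybrid is an evolutionary loop whose fitness evaluation calls out to inference. The Python sketch below makes that structure explicit; the infer_usefulness stub is a hypothetical stand-in for the inferential evaluation, which in the real system is far more involved.

    # Illustrative sketch of the evolution/inference hybrid: candidate CompoundRelationNodes
    # (here just frozensets of link ids) are varied evolutionarily, but their fitness is
    # whatever usefulness the inference system predicts for them.  Everything here is a
    # stand-in for the real machinery.
    import random

    rng = random.Random(0)
    LINK_IDS = list(range(12))

    def infer_usefulness(crn):
        # stub for inferential fitness evaluation: how useful does the system infer this
        # combination of links would be?  (Here: a fixed, arbitrary scoring function.)
        return sum(1.0 / (1 + i) for i in crn)

    def mutate(crn):
        flip = rng.choice(LINK_IDS)
        return crn ^ {flip}                    # add or remove one link from the combination

    population = [frozenset(rng.sample(LINK_IDS, 3)) for _ in range(20)]
    for _ in range(30):
        population.sort(key=infer_usefulness, reverse=True)
        survivors = population[:10]
        population = survivors + [frozenset(mutate(set(rng.choice(survivors)))) for _ in range(10)]

    print(sorted(max(population, key=infer_usefulness)))   # the most "useful" link combination found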
This is a key example of emergence in the WAE, which typifies the power that
one obtains by creating a framework in which different aspects of intelligence
can work together.
11. Categorizing the world is a specialized learning task that demands specialized methods, or at least specialized control mechanisms for inference and evolution
Dividing the world (and the mind) into categories is a common occupation of intelligent systems. It can be carried out by generic reasoning mechanisms perfectly well; however, it seems to occur so often and with such real-time speed requirements that specialized categorization mechanisms are required, including methods for supervised categorization (based on externally specified categorizations) and unsupervised categorization (clustering).
The same statement holds for "coherentization," the recognition of coherent entities among a stream of real-time percepts, which is a subtle combination of clustering and grouping based on spatiotemporal contiguity. (Coherentization corresponds to the classic "binding problem" of cognitive science, and represents our approach to its resolution.)
Inference and evolution are adequate means for solving these sorts of
problems, but in order to give optimal performance specifically on
categorization problems, particularized control mechanisms are necessary. We use
genetic programming for supervised categorization, along with specialized
techniques like Support Vector Machines (utilized through the open-source WEKA
toolkit). For unsupervised categorization, we use standard clustering techniques
like EM, combined with generic inference and association-finding mechanisms.
12. Conscious experience has a particular structure involving a perception/action/short-term-memory triad, and a particular dynamic involving goal/context/schema triads
Ben Goertzel and Cate Hartley
The AI Engine design is based on the premise that conscious experience uses substantially the same basic structures and dynamics as the rest of the mind, but deployed and directed in a slightly different way. However, it is acknowledged that there may also be some structures and dynamics peculiar to either consciousness or unconsciousness.
In the human brain, for example, one may contrast the cerebral cortex (the nexus of conscious activity) and the cerebellum (which has little role in consciousness). Both are made of neurons obeying the same dynamical and structural principles, but the cerebellum has distinctive cells not found elsewhere (Purkinje cells), with uniquely broad dendritic structures. This different manifestation of the common underlying neural structure has profound functional implications.
In WAE, the analogous observation is that, even though they contain largely the same actor types, populations of mind actors dealing with perception, action and short-term memory need to be regulated differently than generic mind actors. This is accomplished partly by having special schema for dealing with perception and action, and specialized system components for dealing with Short-Term Memory and what we call Attentional Focus.
One key aspect of the distinction between conscious and unconscious processing is control. In addition to the continual, percolating nonlinear control of mental activities, a mind needs to be capable of explicitly goal-directed actions. Recall after all that intelligence, as we've defined it above, requires the achievement of complex goals in complex environments. Goal-orientation is needed in order for knowledge representation, creation and manipulation systems to be honed into real-world effectiveness. Thus, in order to deploy its self-organizing node and link system intelligently, Webmind requires some specialized structures for "experiential interaction" - for interacting with and experiencing the world.
This is carried out in the WAE through goal/context/schema triples. A goal is
linked to schemas that help to achieve it in given contexts. Activation
spreading causes schemas relevant to current goals in the current context to
become active. This process is particularly intense in the short-term memory,
though it can occur throughout the mind.
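A minimal Python sketch of goal/context/schema triples follows: given the currently active goal and the current context, the schema most strongly linked to that pair is the one activated. The data layout and the selection rule are illustrative assumptions (in the real system the selection emerges from activation spreading rather than a table lookup).

    # Illustrative sketch of goal/context/schema triples.  The triple whose goal and
    # context match the current situation determines which schema is activated.
    # The representation and the scoring are placeholders.

    triples = [
        # (goal, context, schema, strength learned from past success)
        ("please-user", "talking-to-Ken-about-finance", "summarize-market-news", 0.8),
        ("please-user", "casual-chat",                  "tell-a-joke",           0.6),
        ("stay-healthy", "low-memory",                  "prune-unimportant-nodes", 0.9),
    ]

    def choose_schema(active_goal, current_context):
        matches = [(strength, schema) for goal, context, schema, strength in triples
                   if goal == active_goal and context == current_context]
        return max(matches)[1] if matches else None

    print(choose_schema("please-user", "talking-to-Ken-about-finance"))
    # summarize-market-news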
This aspect of the WAE is not so far off from what goes on in standard
rule-based AI systems, and from standard ideas from cognitive psychology. The
difference here is that things like goals and perceptual schema are not rigidly
coded constructs; they are nodes and links with all the flexibility,
adaptability, and interconnectedness and continuity with the rest of the mind
that this implies. Just as a rigid structure is not the answer, nor are
flexibility and interconnectedness alone the answer: the key is to have the
right pattern of connection imposed on a flexible underlying system, where
"right" is determined relative to the domain of operation of the
system.
Short-Term Memory/ Attentional Focus
The first aspect of experiential interaction to consider is the STM/AttentionalFocus, the collection of nodes and links that Webmind deems most highly relevant to its interaction with the world, its long-term goals and its internal thought process, at any given time. Of course STM and related concepts have been the subject of intense psychological theorizing - to give a summary one would basically have to review the whole of cognitive psychology. Some of this background will be reviewed in a later chapter.
In the AI Engine, for practical purposes, we have split this up into two data structures:
* Short-Term-Memory (STM), a sort of intelligently enhanced cache of recent
thoughts, percepts and actions
* AttentionFocus (AF), a collection of the most important things in the system
at a given time, some of which are in the STM and some of which may not be,
which are allowed to freely interact and carry out complex, expensive learning
processes
The AttentionalFocus, intuitively speaking, is viewed as a seething cauldron of concept creation, including processes such as the coherentization of vague percepts, the formulation of difficult actions, planning, spontaneous problem solving, imagery and inner speech... A key focus here is the creation of new concepts, which can then be released into the rest of the mind and improved and interrelated slowly. This reflects the maxim that "What consciousness does is to create coherent wholes" - often in the form of new nodes. AF contains a mix of expensive node processes, which allows it to construct a dynamically shifting "deep and thorough view" of the things that it contains. Each thing in AF may be cast in many different forms, until a form is found that resonates with information in LTM (the rest of the mind) and allows useful or interesting new conclusions to be drawn. AF has limited capacity because it uses so much CPU time on each element inside it, and on each combination of elements inside it (since so many of its processes are based on interaction between elements of AF).
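The capacity limitation can be sketched as follows in Python: the AttentionalFocus admits only the handful of currently most important items, and the expensive pairwise processing runs only inside that small set. The capacity value and the pairwise operation are placeholders.

    # Illustrative sketch of a capacity-limited AttentionalFocus: only the most important
    # items are admitted, and expensive pairwise processing runs only among them.
    # The capacity and the "expensive" operation are placeholders.
    from itertools import combinations

    importance = {"percept-1": 0.9, "goal-A": 0.85, "concept-X": 0.7,
                  "concept-Y": 0.2, "old-memory": 0.05}
    AF_CAPACITY = 3

    attentional_focus = sorted(importance, key=importance.get, reverse=True)[:AF_CAPACITY]

    # expensive pairwise interaction, affordable only because the AF is small
    for a, b in combinations(attentional_focus, 2):
        print("look for new relations between", a, "and", b)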
On the other hand, the STM is basically a storehouse, containing whatever other processes dealing with real-time interaction need to know about the current context of interaction. Many schema, including those involved with natural language understanding and generation and planning, require this knowledge.
In the human brain, the distinction between STM and AF is presumably not drawn in exactly this way. However, there are apparently complex subdivisions within what is loosely called "STM" in the brain, and as cognitive scientists over the next decade or two will unravel what they are it's possible we will learn something that helps us to refine WAE's conscious experience component yet further.
It's worth noting, finally, that the WAE's experiential interactions potentially possess one kind of complexity which is not present in the human mind: this is that the WAE can have several parallel interactional channels. A human can "multitask", but does so using the same channel for all the parallel tasks, which leads to confusion if more than 2 or 3 simultaneous tasks are undertaken. The WAE does not have this limitation; it is able to hold an arbitrarily large number of simultaneous conversations, limited only by memory and processing power. It just needs a separate STM (and potentially, AF) for each one.
Self-Knowledge
Constructing a mental model of the self for a machine involves difficult
conceptual and philosophical issues. What properties of self should be
engineered and what properties of self emerge from the interaction of other
elements in the system? The model must provide a notion of identity and personal
experience. The system must have a motivational framework in which it sets
goals and takes steps to enact them. The system must form goals based on its own
notion of personal satisfaction. In designing the memories, emotions, and goals
of the system, we are inevitably guided by introspection. However, the
temptation to anthropomorphize can lead to problems. The mind that we are
constructing does not live in our physical world, and thus will have feelings
and goals which are of a vastly different nature from our own. Thus we may use a
model of the human mind, not as a blueprint, but as an analogy for our design
from which we can try to speculate about how the function of our own cognitive
structures would translate to the mind of a machine.
The model of self in Webmind is embodied in the dynamics of several types of
nodes. In the earliest version of the system, we had a highly centralized
self-system, focused on the SelfNode, which was tightly coupled with the
structures in the system embodying feelings, goals, and memory. We realized,
though, that the relation should be somewhat looser, so that more of
"self" is distributed through the system as a whole, both in terms of
object structure and in terms of dynamics. From a philosophical perspective, it
seems that the previous design for SelfNode represented an excessive
centralization of self-functions. The self has both a centralized and a
distributed aspect, both of which are represented in our current design.
The SelfNode grounds the word "I," and hence is used within relations and schema that refer to the system itself. Very generally, we may say that the Self is concerned with "I am", "I will be", and "I was." Links into the STM and LTM deal with "I am." If the system knows "I am a computer," this is represented as an inheritance link from the SelfNode to the ConceptNode "computer." Results of predictive schema deal with "I will be." These predictive schema look into the long-term memory to study the system's past, in order to determine "I was," with the purpose of helping to find out what "I will be." The SelfNode must carry out reasoning and association-finding activity, intensively, so the system can learn new statements about itself.
Knowledge of Others
On the other hand, knowledge of others is harder to gather than knowledge of self, but no less crucial. A main goal of the system is to please human users, and this can only be done if knowledge about human users' desires is accurately estimated.
User-adapted interaction, however, requires the use of an explicit or implicit model of the user. We have chosen a mixed model. A UserNode embodies an explicit user model, but it branches out into a diverse and complex web of nodes and links, which embodies subtle and implicit user knowledge. Via UserNode, the system contains mechanisms to exploit both explicit and implicit information to adapt its behavior to specific users dynamically.
UserNode is related to SelfNode, but quite different in details. It contains a record of the system's interaction with the user, and is able to predict the user's feelings and interests in the future based on this record. It also produces filters to help determine what information the user actually wants to see, based on study of the user's past interests. This is of immense practical use for Webmind-based products, and must exist even in simple Webminds that have no self-knowledge or psyche, and are only directed toward serving the needs of simple products like the Text Classification System or Market Predictor.
Each UserNode maintains a mental model of a given user. This model includes a
representation of the user's beliefs, preferences, interests, feelings and
episodic memory. The UserNode also will be able to store basic factual
information about a user and more specialized information about the user
depending on the Webmind application through which the user is interacting.
Analysis of the historical data maintained in the user's episodic memory will be
used for the prediction of a user's feelings, reactions and preferences in
future situations.
User preferences represent a user's mental context. User preferences may be
categorized to represent multiple dimensions of personality. Users sharing the
same preferences can be categorized to represent group personality. User beliefs
record the ontology of a given user. User beliefs are a kind of relative knowledge
- knowledge that Webmind has of the beliefs of another mind. Both preferences
and beliefs can be used to allow disambiguation, filtering, and personalized
answers. The perception of a user's affective states and goals can enable the computer to give appropriate responses to the user. Prediction can be used to anticipate user needs and let the system work on them (for example, looking for
information).
The figure below shows a simplified view of a sample PsyNet subgraph, perhaps
storing data about a user's behavior with a particular application, involving
UserNodes (U), ConceptNodes (C), beliefs (B), user queries and utterances stored
as UtteranceNodes (Q), application specific data within DataNodes (D) and
TextNodes (T).
[Figure not reproduced in this text version.]
Feelings and Goals
How does Webmind decide what to do? In the Long-Term Memory component of Webmind, as described in previous sections, we have largely focused on ongoing background cognitive processes. There are also processes triggered by queries, such as search queries, categorization and prediction requests, and so forth. In order to truly be intelligent, however, a system needs to be able to achieve complex goals in complex environments. To some extent, this can be achieved implicitly. But we also believe it's useful to create an explicit representation of goal achievement in the system. This is in line with the general hybridization focus of the Webmind architecture. We have the symbolic-AI-style, explicit-goals approach, sitting on top of a vast self-organizing matrix of nodes and links that tells what these explicit goals really mean.
As in the human mind, Webmind's goal-orientation is deeply wrapped up with
its feelings. Most of Webmind's goals appeal to its feelings, or the feelings of
others. We all want to be happy - now and in the future. We want others to be
happy. Goal achievement requires feeling assessment. It also requires
understanding of context, because schema for achieving goals will in general be
context-dependent rather than universally useful.
Feelings are assessed through nodes that monitor and record the presence of
certain trends in the psynet that are relevant to the determination or execution
of the goals of the system. The same system can also evaluate a conjectural
state of the net, determining how the feeling would be evaluated in this
conjectural situation. The system contains a set of hard-wired
"feelings" that are adaptive mechanisms for the system; and can also
build composite feelings out of these.
In order to achieve its goals and hence fulfill its feelings, Webmind uses
schema, as described above. But each schema is only useful for achieving a goal
in a certain context, where a "context" represents a type of situation
- e.g. "talking to Ken about finance," "trading bonds in a
volatile situation." Hence intelligence, the achievement of complex goals
in complex environments, thus badly requires a process of context formation. To
some extent context formation can be done based on goal success - a context is a
set of events in which the system has a successful schema for achieving its
goals. But there is also an independent context formation process.
Schema learning is thus seen to be close to the center of Webmind's
intelligence. This is a difficult kind of learning, which can be carried out in
the system in several ways, including logical inference, evolutionary
programming, and simple reinforcement learning.
13. Intelligent human language processing in a digital system requires a dynamically adapted combination of built rules and experiential learning
Ben Goertzel and Karin Verspoor
Natural language understanding is very different from the other tasks we confront in the WAE design, for the simple reason that human language was made for humans, not WAEs. We humans evolved to understand the world as it is objectively presented to us, recognizing patterns in the world and acting in the world so as to achieve our own goals. The WAE, using the experiential interactive learning framework, will do the same. However, humans have never been asked to understand and produce the language of an entirely alien organism - say, intelligent gas clouds in the Jovian atmosphere. Yet this is basically what we're asking of the WAE, or any other computer program, when we ask it to understand human language. Basically, what we call "natural language" is natural to us, not to the WAE - and this doesn't make the WAE less intelligent, just less human.
Getting Webmind instances to communicate using an entirely Webmind-insular language, incomprehensible to humans, would in some ways be a much easier task than getting Webmind instances to understand human language. (Of course, in other ways it would be more difficult. Debugging the code supporting a collection of humanly-incomprehensible organisms would present its own difficulties!)
However, we have not focused on this in our practical work with the WAE, for obvious reasons: We don't want a race of digital organisms that are completely inscrutable to human beings. Rather, we want Webmind instances that will coevolve with human beings, communicating with us, growing with us, helping us to conduct our lives and solve our problems. However, it's worth remembering that current human language is not the be-all and end-all of communication. The WAE likely won't master human language in a completely human-like way, and that's OK. It will have its own mode of communication, using human language as one tool among others, and using human language in its own ways. Communicating with it will ultimately change the way we communicate, just as e-mail and chat have changed our communication patterns, but probably more profoundly.
In practical terms, we have several requirements for NL in the WAE --
1. Thorough analysis of natural language inputs, resulting in the
representation of the meaning of those inputs in terms of WAE data structures.
2. "Quick and dirty" analysis of natural language inputs using
statistical or other machine learning or information retrieval methods and
hybridization of these methods with subtler semantic analysis as in point 1.
3. Updating of the knowledge embodied in the WAE through information gathered from linguistic inputs (integration of content extracted from documents and conversations with the general knowledge represented in the Psynet).
4. Interpretation of natural language queries, and initiation of question-answering processing within the WAE.
5. Production of natural language responses to queries (this could include a range of different response types, from simple relation retrieval and expression, through document summarization, to full causal explanations).
6. Disambiguation of user inputs/queries via conversational interaction.
7. Free, spontaneous conversation between Webmind instances and humans.
Very broadly speaking, there are basically three different approaches to getting the WAE to possess these capabilities:
* Statistical (IR, information retrieval like technology)
* experiential interactive learning (EIL)
* rule-based (the "standard approach" of computational linguistics).
In the statistical IR-oriented approach, the WAE treats language just like it treats numerical data, almost as if it were a physical artifact rather than something with known communicative intent. If this approach were taken long enough, with enough sophistication, it might eventually lead to the recognition of subtle semantic patterns in texts and queries. The hardest thing for this approach to grasp would be pragmatics: an understanding of why something is said when. One might argue that even pragmatics could eventually be learned by pattern recognition, given enough time and resources - but of course, this is a speculative conclusion motivated by pure mathematics, and not relevant to actual learning in the physical universe.
In practice, though, we use this approach to give the WAE a quick and glib understanding of texts, when it needs to process a large number of texts at a greater rate than the more sophisticated methods will support.
The second approach to language in the WAE is experiential interactive learning (EIL), as described above and to be elaborated below. Using this approach, the WAE could learn language much like a human baby does - albeit, a baby with a vastly different perceptual surround and collection of motor tools for effecting actions. When it says something appropriate and useful, the environment rewards it. When it says something bad, the environment punishes it. Using this together with statistical IR learning, it seems quite probable that the WAE could learn human language on its own - eventually.
And the third approach is simply to build in a core of linguistic structures, and let the WAE modify it based on statistical learning and experiential interaction. This is much closer to the standard computational linguistics approach, as described in (Manning and Schutze, 1999) for example, so I call it the Standard Approach.
The WAE uses a hybrid of all three approaches. This decision took us a while to come to, but in retrospect, it seems quite obvious. For a while we were alternating between jazzed-up statistics, standard computational linguistics, and pure experiential interactive learning - because, after all, each approach has its irrefutable strengths. There is an attractive purity to the first two approaches, which avoid specific encoding of linguistic rules, or supervised learning from linguistic resources like tagged corpora and feature structure lexicons. Yet the alien-ness of human language with regard to the WAE's sensory and social surround makes it extremely attractive to give it as many hints as possible, without giving it so much structure as to make its language module rigid and non-adaptive.
To give a bit more of a flavor for how this works, let's consider one particular aspect of the NL system - language comprehension - in slightly more detail. Conceptually, one may view Webmind's "language comprehension system" as consisting of a set of somewhat distinct processes:
* pragmatics
* semantics
* syntax
* morphology
* tokenization
Of course, these need not be carried out by different types of software objects. For instance, they can each be carried out by specialized schema, each one feeding output to the others on a situationally appropriate basis. Or they can be mixed up in large schema that span the various levels - a schema that is active only in a particular real-world context, and that does a bit of reference resolution and a bit of semantic analysis. But by and large we believe these are somewhat distinct processes worth analyzing as such.
The goal of pragmatic analysis is for the system to understand what it's supposed to do with the linguistic input it's received. The pragmatics analysis schema need to ask, for example: Is this input a
* question to be answered
* declaration to be understood
* command to be followed
Once the pragmatics of determining the appropriate response to a statement has been taken care of, the semantic analysis process does its job of translating sentences or sentence fragments into nodes and links. The linguistic input is parsed into a collection of relationships. There is a fair amount of subtlety to this process. For a single example, consider the problem of semantic disambiguation: When entering a word with many senses into the psynet, it must be decided which sense of the word is intended (or, in situations of subtle ambiguity, to what extent each sense of the word is intended).
On a more fine-grained level, semantics is intricately intertwined with syntax, which relies on the lower levels of the natural language processing hierarchy. Tokenization schema split a stream of text up into words and punctuation - a process that is largely automatic, though in some cases it may require intervention from the semantic level. Morphological analysis schema recognize suffixes and prefixes and infixes, extracting word stems. Then, syntax, the most involved part of the NL process, transforms a sentence into a data structure representing the syntactic structure of a sentence, using a specially-optimized variant of certain higher-order inference processes (unification). This syntactic structure is acted on by semantic-processing schema.
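Viewed very coarsely, the comprehension flow just outlined can be sketched as a chain of processing stages, each of which in the real system is carried out by schema that can also call back into Webmind dynamics. Everything in the Python sketch below is a crude illustrative stand-in, not the actual NL module.

    # Illustrative sketch of the comprehension flow: tokenization -> morphology -> syntax ->
    # semantics -> pragmatics.  Each stage is a crude stand-in for the corresponding schema.

    def tokenize(text):
        return text.replace("?", " ?").split()

    def morphology(tokens):
        # crude stemming stand-in: strip a plural "s"
        return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

    def syntax(stems):
        # stand-in parse: naive subject / verb / object split
        return {"subject": stems[0], "verb": stems[1], "object": stems[2]}

    def semantics(parse):
        # translate the parse into a relationship (a link between nodes)
        return (parse["subject"], parse["verb"], parse["object"])

    def pragmatics(text, relation):
        # decide what to do with the input: question, declaration, or command
        return ("answer-question" if text.endswith("?") else "store-knowledge", relation)

    text = "cats eat mice"
    print(pragmatics(text, semantics(syntax(morphology(tokenize(text))))))
    # ('store-knowledge', ('cat', 'eat', 'mice'))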
The dynamics by which Webmind extracts semantic relations from texts is quite subtle and complex. It involves the cooperative action of numerous specialized semantic processing schema, which "swarm around" a syntactic structure, transforming it in various ways and adding semantic features to the feature structures associated with various word-senses. Semantic processing schema can interpret the semantic features that others have created. This is very much in the spirit of the psynet model of mind, in which mind is considered a self-organizing network of interacting, intercommunicating actors. Complicated processes such as sense-disambiguation and anaphor resolution are not entirely done within the semantic processing schema; rather, these schema invoke various Webmind actions such as halo spreading and reasoning. In this manner, the whole of Webmind may be brought to bear on semantic processing.
Webmind dynamics, as invoked by semantic processing schema, are a key part of our approach to understanding language. In particular, we currently use Webmind dynamics for solving several specific problems in language comprehension:
* Anaphor and reference resolution
* Sense disambiguation
* Structural disambiguation
* Sense identification
* Concept generalization
Language production is done somewhat similarly, but in reverse. Some
collections of nodes and links are directly verbalizable, others are not. A
directly verbalizable collection is one where each node is linked to a
word-sense, and where the syntactic structures of the word-senses involved can
be joined together in a compatible way to form a sentence or a set of sentences.
A non-directly-verbalizable collection must be mapped into a closely
approximative directly-verbalizable collection, in order to be transformed into
a natural language utterance.
14. Extracting meaning from sensory data (for example quantitative financial data) requires complex and specialized data preprocessing methods, as well as tools for mapping the results and internal data structures of these methods into general knowledge representations
To really understand the world, a mind needs language, but it also needs to be able to ground its linguistic understanding in some nonlinguistic domain. The WAE has two such domains: the world of files and directories in an OS, and the world of numerical data files, particularly financial data files. The financial data world has a particular richness, some of which is exploited in the Market Predictor application.
The role of temporal information in the sensory realm is critical. In general, behaviours occur over time and so time and prediction have a special status. Both repetitiveness and novelty need to be addressed, meaning in mathematical terms that both stationary and nonstationary data must be handled. The WAE is able to cope with both stationary and nonstationary data structures, using for example datamining approaches for the first case and adaptive pattern matching for the second, among many other tools.
Analysis of financial and other quantitative data can be done using generic cognitive systems, but this is not efficient; it's clear that specialized data analysis methods such as wavelet analysis, statistical pattern recognition, correlational analysis and so forth are valuable here, just as frequency analysis is valuable for sound processing and edge detection is valuable for vision processing. The trick is to present the results of these specialized "numerical-perceptual" mechanisms in a way that is amenable to further analysis by generic intelligence mechanisms. In the WAE we have inference and specialized numerical-data analysis methods acting on the same links between DataNodes, for example.
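The principle of depositing specialized numerical-perception results into the generic node-and-link representation can be sketched as follows in Python. A lagged correlation is used here as a crude placeholder for the real battery of techniques (wavelets, statistical pattern recognition, and so on); its output is simply attached to an ordinary link between DataNodes, where generic inference can pick it up.

    # Illustrative sketch: a specialized numerical-perception step (lagged correlation,
    # standing in for wavelets, pattern recognition, etc.) whose result is stored as an
    # ordinary link between DataNodes, accessible to generic inference.

    def correlation(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs) ** 0.5
        vy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (vx * vy) if vx and vy else 0.0

    interest_rates = [5.0, 5.0, 4.5, 4.0, 4.0, 3.5, 3.5, 3.0]
    sp500          = [1300, 1310, 1330, 1360, 1365, 1390, 1400, 1420]

    # lag the first series by one step and correlate it with the second
    value = correlation(interest_rates[:-1], sp500[1:])

    links = []
    # the (here strongly negative) correlation value is attached to the link
    links.append(("DataNode:interest-rates", "precedes-and-correlates-with",
                  "DataNode:sp500", round(value, 3)))
    print(links)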
Human expert knowledge has been used in designing the data filtering, preprocessing and representation schemes. This substitutes for the effects of many years of adaptive Darwinian evolution, and for the supporting social constructions built up over many years of cultural evolution within the knowledge domain. Ultimately, and this is not that far off, we expect the WAE to have adequate knowledge of general data transformations to mimic such processes itself, drawing on the community of other Webminds and on-line experts with whom it can interact.
The same basic principle applies to any perceptual data stream that one wishes to add to an intelligent system. There are going to be special perceptual preprocessing tricks appropriate to the given perceptual domain, and continuity between the output of these preprocessors and cognitive processes needs to be enforced.
This makes good sense in terms of human biology. There are parallel neural pathways for sensory and for higher-level processing in the main sensory systems of audition and vision, the systems where shifts between levels of meaning are most widely exploited in human thought. This suggests that maintaining both direct sensory pathways and cognitively mediated pathways, each cooptable for specific goals and contexts, is the right way to go, and our architecture is well set up to allow this, since, for example, schema compete based on their relevance.
The Webmind Market Predictor system uses a combination of specialized numerical processes and general cognitive processes to provide superior prediction and trading performance on daily financial markets, based on a synthesis of numerical and textual information.
And, crucially, financial prediction and other real-world numerical data analysis problems are not the only use case for the WAE's data modules. The WAE also creates its own numerical problems -- it generates massive amounts of numerical data. For instance, the problem of understanding overall WAE dynamics is in large part a prediction problem, which can use the numerical prediction methods to be described here. Aside from this, the WAE also needs to predict other things, such as user interests and internal system parameters like memory and processor usage.
15. For a mind, digital or human, to become truly intelligent, a process of
experiential interactive learning is required, necessarily involving other minds
A human child has a certain genetic endowment which embodies the potential for intelligence, but to bring this intelligence out, many years of rich interaction with other intelligences is required. There's no reason to expect digital intelligences to be any different in this regard. Building the structures and dynamics of mind is only half the battle. A digital mind like the WAE needs to learn how to perceive, act and experience by watching other intelligent beings, and interacting with them in environments of mutual meaning.
In the WAE's case, this is provided by the Baby Webmind User Interface, which centers on a File World, a collection of files that the WAE and its human users can mutually manipulate. In this context the WAE can get feedback on its actions, can chat with users, and obtain experientially and socially valid groundings for its internal concepts. The introduction of "Baby Webmind society" is also crucial here, since after all children learn from each other, and the relevance of this learning source becomes progressively greater - look at teenagers!
We can feed a WAE knowledge using KNOW data files. It can learn from analyzing text and numerical data that it observes on the Web. But ultimately, this is not enough. To realize its true potential, a Webmind system must learn from its experience, not in isolation, but in the company of humans. In order for a Webmind to grow into a mature and properly self-aware system, it must be interacted with closely, and taught, much like a young human.
Existing WAE applications such as the Text Classification System, Webmind Search and the Market Predictor can be viewed as specialist experts with a particular focal interest. A truly intelligent version of any of these would be an EIL-trained WAE which could make itself understood and perform relevant tasks within the context of financial markets, information retrieval or document categorization.
"The Baby Webmind (BWM) User Interface" provides a simple yet flexible medium within which Webmind can interact with humans. It has the following components, among others:
* A chat window, where we can chat with Webmind
* Reward and punishment buttons, which ideally should allow us to vary the
amount of reward or punishment (a very hard smack as opposed to just a plain
ordinary smack...)
* A way to enter our emotions in, along several dimensions
* A way for Webmind to show us its emotions
Initially, Baby Webmind's world consists of a database of files, which it interacts with via a series of operations representing its basic receptors and actuators.
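A minimal sketch of the kind of receptor and actuator operations such a file world might expose is given below (Python; the operation names are hypothetical, chosen only to illustrate the idea of a small, closed action repertoire rather than the actual interface).

# Hypothetical file-world interface for Baby Webmind: a small, closed set
# of receptors (observations) and actuators (actions) over a file database.

class FileWorld:
    def __init__(self):
        self.files = {}                  # name -> text content

    # receptors
    def list_files(self):
        return sorted(self.files)

    def read_file(self, name):
        return self.files.get(name, "")

    # actuators
    def write_file(self, name, text):
        self.files[name] = text

    def delete_file(self, name):
        self.files.pop(name, None)

    def copy_file(self, src, dst):
        if src in self.files:
            self.files[dst] = self.files[src]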
Learning is guided by a basic motivational structure. Webmind wants to achieve its goals, and its Number One goal is to be happy. Its initial motivation in making conversational and other acts in the BWM interface is to make itself happy. Some initial and obvious requirements on Webmind's concept of happiness are:
1. If the humans interacting with it are happy, this increases Webmind's happiness.
2. Discovering interesting things increases Webmind's happiness.
3. Getting happiness tokens (from the user clicking the UI's reward button) increases Webmind's happiness.
The determinants of happiness in humans change as the human becomes more
mature, in ways that are evolutionarily programmed into the brain. We need to
effect this in Webmind as well, manually modifying the grounding of happiness in
the system's mind as it progresses through stages of maturity. Eventually this
process will be automated, once there are many Webminds being taught by many
other people and Webminds.
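One way to picture this goal structure is as a weighted combination of the three happiness determinants listed above, with weights that a teacher (and eventually the system itself) re-grounds as the system matures. The sketch below is a deliberately crude illustration in Python; the particular weights, signal names, and the idea of a simple weighted sum are all assumptions made for exposition, not the WAE's actual motivational code.

# Crude illustrative model of Baby Webmind's happiness as a weighted sum
# of the three determinants listed above.  All numbers are made up.

class HappinessModel:
    def __init__(self):
        # Infant-stage grounding: reward tokens dominate.
        self.weights = {'user_happiness': 0.3, 'interestingness': 0.2, 'reward_tokens': 0.5}

    def happiness(self, signals):
        """signals: dict with values in [0, 1] for each determinant."""
        return sum(self.weights[k] * signals.get(k, 0.0) for k in self.weights)

    def mature(self):
        """Manual re-grounding as the system matures: discovery and the
        happiness of others come to matter more than raw reward tokens."""
        self.weights = {'user_happiness': 0.4, 'interestingness': 0.45, 'reward_tokens': 0.15}

# Example: a rewarding but uninteresting interaction, before and after maturation.
model = HappinessModel()
signals = {'user_happiness': 0.6, 'interestingness': 0.1, 'reward_tokens': 0.9}
before = model.happiness(signals)        # 0.65
model.mature()
after = model.happiness(signals)         # 0.42: reward tokens matter less now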
16. Emergence is critical to intelligence, on three levels: Within individual mind modules, among the mind modules forming a mind, and between minds in a society
From a systems-theory perspective, intelligence can be viewed as a series of metasystem transitions, each of which involves the parts of a system becoming in some sense subordinate to the whole. First, individual objects (nodes and links, in the WAE case) must act coherently to carry out particular mental functions - a transition from mind object to mind module. Then, the mind modules embodying these particular functions must all act coherently, giving rise to a unified emergent mental system - a transition from mind module to mind. Then, a group of minds must act coherently together, creating an emergent cultural understanding that feeds back into each individual mind - a transition from mind to sociocultural group.
Emergence also relates to mind in the context of the "Ah Ha" effect, the experience one has when a new way of representing a problem, or a new method for treating it, suddenly comes to mind. For the Webmind AI Engine, this corresponds to a metasystem transition in the collection of actors pertaining to a particular collection of stimuli or concepts. These transitions will happen largely automatically, encouraged by various cognitive schema.
The WAE framework allows all these levels of emergence to occur in a harmonious, synchronized way, which is what real intelligence requires. Of course, it is far from the only possible approach to AI that supports these three levels of emergence; however, it seems to be the only such approach currently being seriously pursued.
One key point as regards emergence in WAE is that the topology of links in Webmind emerges as the system evolves. Any given link, representing a relation between two or more concepts, may be acted upon - its strength modified - by many different modules. The modules must learn to work together to produce an assemblage of meaningful links.
The abstract psynet model of mind posits two important emergent structures that are supposed to come about as a mind grows:
* A "dual network", consisting of hierarchical and heterarchical
subnetworks that operate smoothly together
* A self, i.e. a part of the mind network that resembles the whole
How do these hypothesized emergent structures come out of the Webmind architecture, and collective link-building by the various modules?
First let's address the dual network. Similarity links in Webmind are purely heterarchical. They are symmetric; activation spreads along them both ways with equal fluidity. Inheritance links on the other hand can be heterarchical or hierarchical. They are hierarchical, for example, when they have weight near 0 or 1.
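A compact way to see the difference is that similarity links spread activation symmetrically, while inheritance links are directed and, when their weight is near 0 or 1, act as hierarchical parent-child relations. The sketch below (Python; the class names and the simple spreading rule are illustrative assumptions, not the WAE's actual node and link code) makes this contrast concrete.

# Illustrative contrast between heterarchical (symmetric) similarity links
# and directed, potentially hierarchical, inheritance links.

class Node:
    def __init__(self, name):
        self.name = name
        self.activation = 0.0
        self.similarity = {}         # neighbor -> weight, symmetric
        self.inherits_from = {}      # parent -> weight, directed

def add_similarity(a, b, weight):
    a.similarity[b] = weight
    b.similarity[a] = weight         # symmetric: activation flows both ways

def add_inheritance(child, parent, weight):
    child.inherits_from[parent] = weight   # directed; weight near 0 or 1 => hierarchical

def spread_activation(node, amount, decay=0.5):
    """Toy spreading rule: similarity passes activation in both directions,
    inheritance passes it only from child toward parent."""
    node.activation += amount
    for neighbor, w in node.similarity.items():
        neighbor.activation += amount * w * decay
    for parent, w in node.inherits_from.items():
        parent.activation += amount * w * decay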
The use of ConceptNodes representing categories gives the system's link topology the character of "clusters within clusters within clusters..." that is associated with the dual network. This structure is obtained by having ConceptNodes that categorize basic data, ConceptNodes that categorize ConceptNodes, and so on. The division of a Psynet into multiple Lobes also imposes this same quasi-fractal structure, on a higher level: each lobe is a cluster, since links between lobes are discouraged due to the higher computational cost they incur. Intelligent load balancing will automatically align these two sources of quasi-fractality, by moving large categories into their own lobes. One will then have a structure in which, roughly, each lobe corresponds to a category. A very large psynet may have subnets that correspond to large categories, with individual machines within the subnets corresponding to smaller categories.
The trick here is to have heterarchical and hierarchical processes work together, dynamically. Similarity links between category nodes and other nodes may be built, based on the link tables of the nodes, including similarity links and inheritance links. On the other hand, inheritance links may be built based on the link tables of the nodes, including similarity links and inheritance links. There are mechanisms in place to assure that circular reasoning will not occur. But even so, the inheritance and similarity links, the hierarchical and heterarchical links, must work well together, or none of the link weights will make any sense. We need mutual error-correcting between the processes building the hierarchical links and the processes building the heterarchical links. Then the dual network structure, which is embodied in the category and lobe structure of the psynet, will be intelligent and will be self-preserving due to the high link weights that it contains.
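As an illustration of how hierarchical and heterarchical link building can feed each other without simply copying each other, the sketch below (Python; the overlap measures are illustrative assumptions, not the WAE's actual link-building formulas) derives a similarity weight from two nodes' shared inheritance parents, and an inheritance weight from how much of one node's similarity neighborhood is covered by another's.

# Illustrative mutual construction of similarity and inheritance weights
# from each node's existing link table, using simple set overlaps.

def similarity_from_inheritance(parents_a, parents_b):
    """Heterarchical link built from hierarchical links: Jaccard overlap
    of the two nodes' parent sets."""
    if not parents_a and not parents_b:
        return 0.0
    return len(parents_a & parents_b) / len(parents_a | parents_b)

def inheritance_from_similarity(neighbors_child, neighbors_parent):
    """Hierarchical link built from heterarchical links: fraction of the
    child's similarity neighborhood also in the parent's (asymmetric)."""
    if not neighbors_child:
        return 0.0
    return len(neighbors_child & neighbors_parent) / len(neighbors_child)

# Example: 'cat' and 'dog' share the parent 'mammal', so they pick up some
# similarity; 'cat' inherits from 'pet' to the degree 'pet' covers cat's neighbors.
sim = similarity_from_inheritance({'mammal', 'pet'}, {'mammal', 'working-animal'})   # 1/3
inh = inheritance_from_similarity({'dog', 'kitten'}, {'dog', 'kitten', 'hamster'})   # 1.0

Mutual error-correction of the kind the text describes would then consist of each process treating the other's links as evidence to be weighed, rather than as ground truth to be copied.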
In sum, the static emergence of the dual network is a consequence of:
* The predisposition to dual network structure built into the link types and node types of Webmind
* Dynamic emergence between the processes building hierarchical links and those building heterarchical links
Now, what about the self? We have the SelfNode, the ConceptNode for "I", whose links embody declarative knowledge about the system itself. This is useful, but it is disturbingly non-emergent. There should also be some emergent self in the psynet. How might this come about? Quite naturally, via categorization. The highest levels of the category hierarchy are actually a coarse-grained image of the whole psynet. Inasmuch as the dual network structure characterizes the psynet, the dual network structure of the top-level categories will approximate this overall dual network. The SelfNode can then derive its image of Webmind primarily from study of these high-level category nodes, rather than from continual statistical study of the whole mind (though this may be useful too). And if the SelfNode is deleted, these high-level category nodes could regenerate an approximation of the current-reality-storing portions of the SelfNode.
The dual network and the self emerge in Webmind as consequences of dynamic emergence between modules, coupled with built-in node and link types. But, they may also be explicitly encouraged, by writing code that recognizes their presence and adapts overall system parameters to encourage their formation.
Beyond this general level, there are many particular emergences in Webmind that are important in their own right: syntax-semantics emergence, perception-action emergence, and self model - user model emergence, for example. Perhaps the most interesting of all is reason-intuition emergence, which is of special note because of its relevance to the history of AI. This involves feedback between the inference and stimulus spreading subsystems, and is dynamically quite subtle. But we'll return to these a little later, after some other aspects of the system have been delved into in more detail.