Sunday, June 13, 1999

Wumpus World Agents
Scott Parent - Scott@DoctorDatabase.com

Overview

The game known as Wumpus World involves a hunter competing with a creature called a Wumpus. I have developed two agents and a test environment based on this game. Both the hunter and the Wumpus are considered participants in the game, and both are implemented as autonomous agents. The hunter is a knowledge-based agent that--looking one move ahead--considers his degrees of belief about the consequences of each action at his disposal. He then chooses the action he believes most likely to bring favorable results.

The Wumpus is implemented as a reflex agent with limited memory. The Wumpus generally likes to sleep.

Environment

The agents are designed for a variant of Wumpus World I have constructed in Visual Basic. This environment includes a labyrinth of caves (AKA rooms). Both the hunter and the Wumpus operate in these caves. In addition to his penchant for sleeping, the Wumpus has a strong odor and eats its hunters given the opportunity. This opportunity arises when the Wumpus and the hunter find themselves in the same room.

There is also a game clock which begins at 1 and increments after each participant takes his turn. These turns are described in the Actions section, below.

Goals

The hunter's goal is to shoot the Wumpus--which necessitates not be eaten by the Wumpus. The goal is stated in the hunter's knowledge base as, "Wumpus State Dead." So, if the Wumpus were to die of natural causes, that would theoretically do just as well. In general, though, this goal is accomplished by shooting an arrow in the Wumpus' direction. Which direction that is, however, is not directly accessible by the hunter. Rather, it must be inferred from clues in the environment.

The Wumpus' goal--ostensibly--is to get more sleep and perhaps snack on a hunter.

Actions

At each cycle of the game clock, the hunter may either Move into- or Shoot an arrow towards an adjoining cave. The hunter begins each game with an allotment of five arrows. Once the arrows are exhausted, however, shooting is no longer an option.

The Wumpus, too, may move to an adjoining room if he is awake. Like the hunter, he is constrained to one such move per game clock cycle. The Wumpus may also elect to Wait rather than move to a different room. Generally, though he prefers to make Sleep his choice for action.

Both participants may elect to Observe repeatedly without advancing the game clock. A Participant's turn must end, however, when he has run out of percepts. Also, either participant may choose to Leave the game. In practice, though, only the hunter will choose this action--when he has killed the Wumpus or when he has run out of arrows.

Percepts

At the beginning of a game, each participant is placed in a randomly chosen room. He then senses the name, as it were, of the room he is in--expressed as, "At <room-name>".

Whenever entering a room, the participant can sense the rooms to which his current location is connected, as in, "See Room <room-name>".

The hunter can smell the Wumpus from two rooms (Manhattan distance) away: "Smell Wumpus".

When the hunter shoots an arrow anywhere in the labyrinth, the Wumpus can hear it: "Hear Shot".

Both agents can also sense when no additional percepts are available from the environment. This meta-percept is expressed as, "Find nothing else".

Theory Of Operation

Hunter Agent

The hunter tries to infer which rooms are adjacent to each other based on the connections he observes between them. For each of the rooms he is aware of, then, he assesses his belief or doubt that the Wumpus may be in that room. Beliefs are measured on a linear and continuous scale from -1 to +1 where -1 represents absolute disbelief, 0 represents no knowledge and +1 represents absolute certainty.

Once he believes he has gathered all immediately available percepts, he considers a set of hypothetical situations. For each room connected to his current location, there will be a possible situation resulting from moving to that room. If the hunter still has arrows, then each connected room will also give rise to situations resulting from shooting at those room. In short, for each action at his disposal, he forecasts a situation expressed as Result( action, current-situation).

For each such result situation then, the hunter assesses a utility score:

g - d + [(i + s + a))*(1 - l)]

Where these terms are defined as:

g

Belief the goal will be accomplished (i.e. Wumpus will be dead)

d

Belief that Self (the hunter) will be killed

i

New information ratio: for rooms not yet visited, i = (1/t) where t is the current game time. For rooms already visited i = (-1/(t - v)) where v is the time of the most recent visit.

a

Utility of arrows remaining: if the hunter has 1 or more, then a = log10(|arrows|)+1; otherwise a = 0.

l

Belief that the situation will end the game: CombinedCertainty( g, d). Also, running out of arrows (a = 0) results in 100% belief that a situation is terminal.

s

Belief that the Wumpus will be sleeping

The rationale behind this combination begins with the idea that killing the Wumpus and getting killed are equal and opposite in value. The remaining factors are only important to the extent that the situation in question is not terminal (i.e. the game will continue after this situation). So, the sum of these scores is multiplied by (1 - l).

Once the utility of each predicted outcome has been assessed, the agent can choose an action. When so choosing, he discards any possibilities with a score less than the maximum for the set of current predictions. He then selects one of the remaining situations--those sharing the maximum score--at random. The chosen action, then, is that from which the selected situation results.

When the hunter hears the Wumpus scream, or he runs out of arrows, the only action he will choose is "Leave."

Wumpus Agent

The Wumpus agent is somewhat simpler than the hunter. If the Wumpus is sleeping, he will not wake up unless he hears a shot fired. Once awake, he may choose to go back to sleep, stay where he is or move to an adjoining room. Random selection between these activities is design to fit the following profile:

Game Environment

The game space is a square matrix of rooms. The size of the matrix is variable so long as it is as many rooms across as it is down. Again, VB's random number generator is employed to create a maze by knocking down random walls until all rooms have a path to all other rooms. The participants are each placed in randomly chosen rooms in the matrix. The game is then ready to run.

Once play has started it continues until one of the participants is dead--or play is stopped by the operator.

Game Clock and Situations

In order to economize on the number of actual situations that the knowledge base must process, a one-to-one relationship is maintained between clock cycles and actual situations. Predicted (possible) situations, then, are defined to occur at the clock cycle immediately following the situation on which the prediction is based. In other words, if situation s occurs at game clock time t then Result(action, s) occurs at time t + 1.

Architecture

The program, "Wumpus.exe," is coded in Visual Basic. The program loads a set of CLIPS source modules that define the knowledge base. The conceptual structure of the program is illustrated in the diagram below. The IAgent interface has only method: ActionOnPercept, which, as the name indicates, takes a percept and returns an action. Here, percepts and actions are all implemented as strings. Wumpus, then, is a direct implementation of IAgent.

Another implementation of IAgent is KBAgent: the knowledge based agent. KBAgent provides a version of ActionOnPercept that translates the problem to LISP-style function calls. It then delegates these function calls to a dynamically attached knowledge base. This knowledge base must, in turn, implement the IKnowledgeBase interface.

CLIPSKB is one such implementation of IKnowledgeBase. Since CLIPS function calls conform--largely--to a LISP-style, sentences can be passed through directly to CLIPS. And, the hunter's knowledge base is constructed in CLIPS.

The knowledge base is expressed in terms of Sentences and rules for combining those Sentences. Each Sentence has a subject and predicate. It is also attributed with a measure of belief (i.e. certainty) and a situation to which it belongs. For some Sentences--particularly those regarding relationships between rooms--situation information is largely ignored and the sentences are assumed to be true for an entire game.

The combination of CLIPSKB, KBAgent and the hunter's knowledge base make the hunter agent counterpart to Wumpus. Each of these agents, then are attached to a Participant object: the agent architecture for interacting with the GameEnvironment. Each participant object conveys the percepts from the environment to its respective agent. The participant object then effects the actions thus chosen back in the environment.

Project Definition Module

The file WumpusWorld.vbp contains the project and workspace definitions for the Visual Basic IDE.

Knowledge Base Definition Modules

The following modules contain CLIPS code.

File

Description

Situatn.clp

Foundational constructs for situations

Sentence.clp

Sentence, Inference and Conjunction constructs.

Possible.clp

Query support for possible situations and sentences associated therewith.

KB.clp

Knowledge Base constructs: support for adding sentences and making queries. Currently, only action-recommendation queries are tested and supported.

WW.clp

Wumpus World domain specific constructs

WWQuery.clp

Wumpus World prune and search constructs to support queries

Test.clp

Batches of statements for testing the knowledge base in the CLIPS environment.

 

Agent, Architecture and Environment Modules

Files

Description

CLIPSInputForm.frm
RouteForm.frm

Forms used for direct interaction with the CLIPS DLLs. See "User Interface Description," below.

WumpusWorldForm.frm
GameEnvironment.frm

User interface modules for configuring and running the simulation.

CLIPSEnvironment.cls
Route.cls
Routes.cls

Wrapper classes for various CLIPS features.

CLIPSKB.cls

Implementation of the IKnowledgeBase interface using the CLIPS Wrapper classes.

IAgent.cls
IKnowledgebase.cls

Abstract Interface Definitions for the Agent model.

KBAgent.cls

Somewhat generic implementation of a knowledge based agent. Given some hard-coded predicates, though, it is probably only useful in the WumpusWorld Domain at present.

Participant.cls
Room.cls
Wumpus.cls

Wumpus World domain-specific classes.

WWGlobals.bas

Global definitions and globally accessible data for the Wumpus World program.

WinAPI.bas

A collection of utility functions for interacting with the Windows API.

CLIPS DLLs and Associated Interface Code

File

Description

CLIPS.DLL
CLIPSHLL.DLL

DLL Implementation of CLIPS by Mark Tomlinson (http://ourworld.compuserve.com/homepages/marktoml/clipstuf.htm)

HLLDecl.bas

Interface definitions for CLIPSHLL.DLL, also supplied by Mark Tomlinson.

Development Tools

The Visual Basic code was developed using Microsoft's Visual Basic 5.0 IDE and NuMega's SmartCheck.

The CLIPS code was largely developed and tested using CLIPS 6.10 (http://www.ghg.net/clips/CLIPS.html) and CLIPS MDI Editor 1.1 by Ronald M. Shapiro.

Other Sources For Ideas

My initial exposure to Wumpus World came from our textbook, "Artificial Intelligence, A Modern Approach" by Stuart Russell and Peter Norvig (http://www.cs.berkeley.edu/~russell/aima.html). In particular, Russell and Norvig recommend Wumpus World as "an excellent testbed environment for intelligent agents."

In the Russel/Norvig Model, the hunter's goal is to seek out and obtain a hidden treasure. The idea to have the agent pursue killing the Wumpus, instead, came from, "Multiplayer Hunt the Wumpus," (http://scv.bu.edu/cgi-bin/wclTF) at Boston University--apparently authored by Glenn Bresnahan (http://scv.bu.edu/SCV/gjb.html).

User Interface Description

The Wumpus.exe program initializes CLIPS and loads the hunter's knowledge base into the CLIPS environment. The program has evolved to allow three kinds of interaction with this knowledge base:

Main Window

After loading the hunter's knowledge base, the program presents two windows: one titled, "CLIPS Input," and the other, "Wumpus World." The CLIPS Input window is described below. The "Wumpus World" window facilitates both Agent-Level interaction and configuration for Simulation-Level interaction.

The top four controls address the Agent-Level communication. The "Game Time" text box displays the game-time fed along with the percepts the user supplies to the agent. It is determined by the form based on the agent's actions. Essentially, when a percept delivered to the agent from this form inspires the agent to take an action other than, "Observe," the clock advances.

Percepts are entered in the "Percept" text box. Percepts the agent will recognize are listed in the "Percepts" section at the beginning of this document. Once a percept has been typed in this box, the user can send it to the agent by clicking on the "Submit" button. And, then, the action chosen by the agent in response will be displayed in the "Action" text box.

Any kind of interaction with the knowledge base can be "dribbled" to a text file. Essentially, this logs a copy of CLIPS' "stdout" route to the selected file. To commence dribbling, enter a filename in the "Dribble To File" textbox. Once this box has some text in it, the checkbox to its left becomes enabled. Clicking on this begins the dribbling and places a check in the box. Clicking on it again stops the dribbling and removes the check.

Note that any existing files of the same name will be overwritten without warning whenever this box is checked.

The bottom three controls pertain to Simulation-Level interaction. The "Matrix Size" textbox allows the user to specify how big the labyrinth will be. A value greater than zero is required in this field--and dictates both the number of rows and columns of rooms in the labyrinth's matrix.

To begin a new game, the user clicks the "New Game" button. Doing so presents--or reconfigures--the "Game" window described in the next section.

The user may also elect to specify a random number seed. If specified, this value is applied to Visual Basic's random number generator which is in turn used to guide labyrinth construction and Wumpus behavior.

Game Environment Window

The "Game" window has two sections. On top is a graphical representation of the labyrinth. Black lines represent walls. A black square represents the Wumpus and a white square represents the hunter.

The bottom section--the "action list"--displays events that occur during the current game.

The "Game" menu at the top offers three choices: "Start", "Stop" and "Save." The first of these option starts the simulation running.

Once play is started, the action list gets filled with lines: one line per combination of percept fed to the agent action resulting therefrom. Also displayed in this window is the game time at the beginning of each game clock cycle.

The "Stop" option on the "Game" menu signals the environment to stop. This is the same signal either agent may send the environment via a "Leave" action--which the hunter will choose if he runs out of arrows. Apart from these events, however, the simulation will run until one of the participants is killed.

The last menu item, "Save," prompts the user for a file name and saves the contents of the action list to the file thus named. Note that it will overwrite any preexisting file with the same name.

CLIPS Route Windows

During interaction with CLIPS--in particular, any interaction with the knowledge base--CLIPS output is spooled out through its routes. Each route has a unique name. The main output route is named, "stdout." The "stdout" window will appear with the first such interaction, and it will display both the names of new sentences added to the knowledge base along with verbose explanations for inferences, utility scores and choices. The user may elect, however, to close this or any other route window should the user lack interest in such.

One important fact about these windows is that they begin to eat up memory and slow down overall program performance in accordance with the number of messages displayed. In long-running games this can even cause the program to crash. So, closing this window is a good idea if the user does not wish to view its contents--or if there is an active dribble file and the user can wait to look at the "stdout" output once the dribble file is closed.

Another route window of potential interest is the "wtrace" window. This window lists the knowledge base's responses to queries. In practice, this amounts to listing the actions chosen by the knowledge base. It alternately lists "past tense" variants of these actions--corresponding to invocations of MakeActionSentence given the chosen action.

CLIPS Input Window

In order to facilitate CLIPS-Level interaction, the program supplies a "CLIPS Input Window." CLIPS commands can be entered into this window. Any output resulting from these commands will appear in the route windows. Again, most output is, by default, routed to the "stdout" window. However, errors and warnings will appear in the "werror" and "wwarning" route windows, respectively.

Discoveries, Enhancements and Ideas

CLIPS Strengths and Weaknesses

I chose CLIPS because I find the language very useful for expressing combinations of fact-patterns. And the object-oriented extensions to the language (called, COOL for "CLIPS Object-Oriented Language") offer a nice paradigm for inheritance and message passing between objects. COOL also allows class-driven queries on the CLIPS environment--returning instances of a given class (and its subclasses) meeting specified criteria.

The drawback to COOL, though, is that instance specifications are not as easily read by humans as the non-COOL fact specifications. More often than not, however, I find the benefits of COOL to outweigh this drawback.

Another difficulty with CLIPS is that the official CLIPS implementations have some bugs--mostly surrounding COOL--that result in unsound behavior. Fortunately, the author of the DLL version seems to have implemented fixes for all those that I encountered. None the less, I needed to code some particularly unattractive workarounds in my low level code so that I could run it without the DLL.

Finally, the combination of forward-chaining and situation dependent sentences (along with unique measures of belief per combination of sentence and situation) can make for a very big pile of Sentences one must keep lying around. With longer running games, I found the resulting performance deterioration to be significant--sometimes requiring more than 60 seconds to arrive at a decision. Undoubtedly, though, my strategy for representation of these items could be optimized.

Improving the Hunter's Success

From my test runs I must conclude that, while the hunter is reasonably proficient at not being eaten, his Wumpus-hunting skills could use some improvement. In a majority of tests, the hunter used up his arrows before finding the Wumpus. I suspect that the heuristic function could be tuned to help this. In particular I would look at:

Other structural improvements to the knowledge base might include.