WordNet®--a product of the Cognitive Science Laboratory at Princeton University--is an online thesaurus featuring more than 170,000 combinations of terms and senses. WordNet defines a sense as, "A meaning of a word in WordNet."
ActiveWordNet.dll is a 32-bit, in-process automation server (a.k.a., "component"). It features a set of COM Interfaces that allow access to WordNet by ActiveX clients. Such clients include the applications in the Microsoft Office suite, development tools such as Visual Basic, and server-side scripting facilities like Active Server Pages.
The focus of this project is to create a solid foundation for COM access to WordNet. The key features of this first increment are:
Identification of noun, verb, adjective and adverb stems
Enumeration of senses for those stems
Enumeration of synonymous terms within a given sense
Identification of antonymous senses for a given sense
Identification of immediate hypernyms for a given sense
Measurement of specificity for a given sense based on these hypernyms
Access to WordNet's lexicographical subclassifications within part-of-speech
The component will feature a central Lexicon object which will provide access--via the ILexicon interface--to collections of terms, parts of speech and WordNet's lexicographer file categories. The properties that yield these collections behave as follows:
Terms property returns the Lexicon's Terms collection. The Item method of this collection yields Term objects. These objects will implement the ITerm interface--described below. In short, the client obtains information about a text token by supplying it to the Item() method as follows:
'w$ is a word or phrase in text form
The Lexicon's Terms collection also supports random access to, and iteration through, every term in WordNet's Sense Index. The Count property yields the total number of terms, and the Item method accepts an integer argument between 1 and the value returned by Count, inclusive. In other words, given an integer i, Item(i) returns the ith unique term in WordNet. Visual Basic style "For Each" iteration is also supported through the IEnumVARIANT interface returned by the hidden NewEnum property.
Note: the first call to Count, Item( <integer>) or NewEnum suffers from a noticeable delay. The Lexicon object dynamically indexes the "Sense.idx" file in order to improve the response of all subsequent such calls.
LexCategories property exposes a collection of LexCategory objects. As indicated in the diagram below, these objects implement the ILexCategory interface. The Item method of this collection will accept either the category number or the category name as an index parameter.
PartsOfSpeech property--in similar form to the other two properties--returns a collection of PartOfSpeech objects implementing the IPartOfSpeech interface. As with LexCategories, the Item method of this collection will accept either a number or a name to yield a member object.
Senses property takes an integer argument indicating a part of speech and returns a collection of objects implementing the ISense interface. The values for the part of speech argument are 1, 2, 3 or 4 representing Noun, Verb, Adjective and Adverb, respectively. These values are the same as those for the Index property for the "IPartOfSpeech" Interface.
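The Terms collection behavior described above (1-based integer access, text lookup, and an index built on first use) can be modeled in a few lines. The following is a minimal Python sketch, not the component's code: the class name, the `load_terms` callable, and the choice to keep terms in order of first appearance are all assumptions made for illustration.

```python
class Terms:
    """Toy model of a 1-based, lazily indexed collection (hypothetical)."""

    def __init__(self, load_terms):
        # load_terms: assumed callable returning an iterable of term strings.
        self._load_terms = load_terms
        self._index = None  # built on the first count, item(int) or iteration

    def _ensure_index(self):
        if self._index is None:
            # The slow step runs once; dict.fromkeys drops duplicates
            # while keeping the order of first appearance.
            self._index = list(dict.fromkeys(self._load_terms()))
        return self._index

    @property
    def count(self):
        return len(self._ensure_index())

    def item(self, key):
        if isinstance(key, int):
            index = self._ensure_index()
            if not 1 <= key <= len(index):
                raise IndexError("index must be between 1 and count, inclusive")
            return index[key - 1]  # translate the 1-based index
        # Text lookup does not need the index in this model.
        return key

    def __iter__(self):
        return iter(self._ensure_index())
```

In this model only the first call that needs the index pays the loading cost, which mirrors the delay noted above for Count, Item(<integer>) and NewEnum.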
Features of the ITerm interface are as follows:
Text gets or sets the full text of the term's word or phrase. This text may not contain underscores--or, more specifically, underscores are converted to spaces.
Root property yields the term's stem for a given part of speech. Only integer arguments are currently supported. See the Index property of the "IPartOfSpeech" Interface for more information on these values. The string returned by this property does not contain spaces. If the term contains multiple words, they are connected with underscores.
FirstRoot looks through each part of speech in WordNet's order: (Noun, Verb, Adjective, Adverb) and returns the first stem, if any, found for the term.
Nouns, Verbs, Adjectives and Adverbs properties each yield a Senses collection. The collection thus yielded contains zero or more senses of the term belonging to the part of speech thus named. Members of a Senses collection implement the ISense interface.
IsNoun, IsVerb, IsAdjective and IsAdverb methods return Boolean values: True if one or more senses exist in the named part of speech; otherwise, False.
Senses property is an alternative way to access the Senses collections returned by the Nouns, Verbs, Adjectives and Adverbs properties. This property takes the same kind of integer argument as the Root property.
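The Text, Root and FirstRoot rules above can be illustrated with a small model. This is a Python sketch under stated assumptions, not the component's implementation: the `STEMS` table is made-up sample data, the function names are hypothetical, and returning `None` when no stem exists is a guess, since the text does not say what the component returns in that case.

```python
# Part-of-speech indexes as given in the text: 1 to 4.
PARTS_OF_SPEECH = {1: "Noun", 2: "Verb", 3: "Adjective", 4: "Adverb"}

# Hypothetical stem table keyed by (part-of-speech index, text).
# Sample data for illustration only, not WordNet output.
STEMS = {
    (1, "hot dogs"): "hot dog",
    (1, "running"): "running",
    (2, "running"): "run",
}


def normalize_text(text):
    """Model of Text: underscores are converted to spaces."""
    return text.replace("_", " ")


def root(text, pos):
    """Model of Root: the stem for one part of speech, words joined by underscores."""
    stem = STEMS.get((pos, normalize_text(text)))
    return None if stem is None else stem.replace(" ", "_")


def first_root(text):
    """Model of FirstRoot: try Noun, Verb, Adjective, Adverb in that order."""
    for pos in sorted(PARTS_OF_SPEECH):
        stem = root(text, pos)
        if stem is not None:
            return stem
    return None
```

With this sample table, `first_root("running")` finds the noun stem before the verb stem, because nouns are searched first.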
Sense objects implement the ISense interface. This interface features six operations:
Terms property returns a Terms collection. This collection contains one or more expressions of the sense; in other words, its synonyms. The collection's interface, ITerms, provides Count and Item operations for access to the individual terms. The Count property returns the number of terms in the collection. The Item method yields Term objects identified either by a 1-based integer index or by an exact text match. Enumerators iterate through each of these items.
Definition property yields the text of WordNet's definition for the subject sense.
Antonyms property yields a Senses collection containing zero or more antonymous Sense objects.
Hypernyms also yields a Senses collection. This one contains only the immediate generalizations, if any, of the subject sense. In turn, the Hypernyms property of the Sense objects thus yielded may return more general hypernyms: the next level up, as it were.
LexCategory method returns an object representing the category to which--according to WordNet--this sense belongs.
Specificity property counts the levels of hypernym ancestors for the sense. Senses with no hypernyms have a specificity level of zero.
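The Specificity rule, together with the cycle detection mentioned in the revision notes, can be sketched as a recursive walk over hypernyms. The Python below is an illustration under assumptions, not the component's algorithm: it takes the longest hypernym chain when a sense has several hypernyms, and it reads "-1 at the point of cycle detection" as meaning that a revisited sense contributes -1 while each level above it still adds one. The `hypernyms` mapping and the function name are hypothetical.

```python
def specificity(sense, hypernyms, _path=()):
    """Count hypernym levels above `sense`; 0 when it has no hypernyms.

    hypernyms: assumed mapping from a sense to its immediate hypernyms.
    _path: senses already visited on the current chain (internal use).
    """
    if sense in _path:
        # Cycle: this sense is its own ancestor on the current chain.
        return -1
    parents = hypernyms.get(sense, ())
    if not parents:
        return 0
    path = _path + (sense,)
    # Assumption: with several hypernyms, use the longest chain.
    return 1 + max(specificity(parent, hypernyms, path) for parent in parents)
```

For a simple chain such as `{"dog": ["canine"], "canine": ["animal"], "animal": []}`, the function gives 2 for "dog" and 0 for "animal". For two senses that name each other as hypernyms, the walk stops at the revisit instead of recursing without end.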
LexCategory objects represent WordNet's logical groupings called, "Lexicographer File Categories." This class implements four operations:
Index property returns an integer key to this category. In WordNet, this is referred to as the Lexicographer File Number.
Name property returns the category name in the format of "<part-of-speech-name>.<qualifier>".
PartOfSpeech property yields an object implementing the IPartOfSpeech interface.
Qualifier method returns only the qualifier portion of the category name--without the part of speech name.
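The relationship between Index, Name and Qualifier can be shown with a short model. This is a Python sketch, not the component's code: the two sample entries use WordNet's own lexicographer file names and numbers, and the exact spelling or capitalization the component returns for Name is an assumption. The function names are hypothetical.

```python
# Two sample lexicographer file categories (number -> name).
SAMPLE_CATEGORIES = {4: "noun.act", 5: "noun.animal"}


def find_category(key):
    """Look up a category by number or by name, as the Item method is described."""
    if isinstance(key, int):
        return key, SAMPLE_CATEGORIES[key]
    for number, name in SAMPLE_CATEGORIES.items():
        if name == key:
            return number, name
    raise KeyError(key)


def split_category_name(name):
    """Split '<part-of-speech-name>.<qualifier>' at the first period."""
    part_of_speech, _, qualifier = name.partition(".")
    return part_of_speech, qualifier
```

Here `split_category_name("noun.animal")` yields the part-of-speech name and the qualifier as separate strings, which corresponds to what the PartOfSpeech and Qualifier members expose.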
PartOfSpeech objects have only two operations:
Index property returns the integer that WordNet associates with the part of speech. See "Syntactic Category" in the WordNet documentation.
Name property returns the part of speech name: "Noun", "Verb", etc.
The file, "AWN1.1.zip," contains the component as well as two example VB5 projects in source and executable form. To install the component,
The Senses property was added to ILexicon.
The "dispatch id" of the ILexicon.Terms was changed to 0. In short, Terms is now the default property of ILexicon.
The LexCategories and PartsOfSpeech properties were added to ILexicon as well.
Cycle detection was added to the ISense.Specificity property. This was done to handle senses that WordNet defines as hypernyms of themselves, either directly or indirectly. For example, one sense of "load" is defined as a hypernym for a sense of "charge" which is, in turn, defined as a hypernym for "load." The specificity for such a situation is defined as -1 at the point of cycle detection.
A bug affecting the Item method of the LexCategories collection was fixed. The bug prevented access to the last element. It did not affect collections with a lower bound of 1 and therefore did not impact Terms, Senses, or PartsOfSpeech.
The Lexicon object was added.
A bug in the sense search code was fixed. This bug caused the component to become unresponsive and eat all available memory while searching for some senses.
Please report any software defects to Scott@DoctorDatabase.com.
The software at this site is available "AS-IS." The author makes no warranties, express or implied, of merchantability or fitness for a particular purpose or that the use of this software will not infringe on the intellectual property rights of any third party.
WordNet is a registered trademark of Princeton University.