WordNet® Landscape: Senses and Terms

Scott Parent (scott@doctordatabase.com)
December 1999

WordNet defines a sense as, "A meaning of a word in WordNet." A given term will have one or more senses. A term with one sense is called monosemous, and a term with many is polysemous. The different senses associated with a polysemous term effectively represent homographs. A quick tally of the records in WordNet's database reveals that WordNet contains 99,642 distinct senses. Each sense has a prosaic definition and belongs to a single Lexicographer File Category. In turn, each of these categories is associated with a single part of speech. The four parts of speech supported by WordNet are:

Part of Speech    Number of Senses
Noun                        66,025
Verb                        12,127
Adjective                   17,915
Adverb                       3,575

The empirical data given in the table above, as well as in the remaining tables in this document, were gathered using my "ActiveWordNet" component in conjunction with Microsoft Excel.

Lexicographer File Categories

As mentioned above, WordNet partitions senses into categories. The following table shows the distribution of senses across these categories. The definitions in the center column are taken directly from the WordNet documentation.

Name                Definition                                                  Sense Count   Ratio
adj.all             all adjective clusters                                           14,734  14.79%
adj.pert            relational adjectives (pertainyms)                                3,099   3.11%
adv.all             all adverbs                                                       3,575   3.59%
noun.Tops           unique beginners for nouns                                           35   0.04%
noun.act            nouns denoting acts or actions                                    5,372   5.39%
noun.animal         nouns denoting animals                                            7,294   7.32%
noun.artifact       nouns denoting man-made objects                                   9,810   9.85%
noun.attribute      nouns denoting attributes of people and objects                   2,633   2.64%
noun.body           nouns denoting body parts                                         1,592   1.60%
noun.cognition      nouns denoting cognitive processes and contents                   2,260   2.27%
noun.communication  nouns denoting communicative processes and contents               4,547   4.56%
noun.event          nouns denoting natural events                                       850   0.85%
noun.feeling        nouns denoting feelings and emotions                                393   0.39%
noun.food           nouns denoting foods and drinks                                   2,377   2.39%
noun.group          nouns denoting groupings of people or objects                     1,831   1.84%
noun.location       nouns denoting spatial position                                   2,123   2.13%
noun.motive         nouns denoting goals                                                 40   0.04%
noun.object         nouns denoting natural objects (not man-made)                     1,050   1.05%
noun.person         nouns denoting people                                             6,409   6.43%
noun.phenomenon     nouns denoting natural phenomena                                    523   0.52%
noun.plant          nouns denoting plants                                             7,872   7.90%
noun.possession     nouns denoting possession and transfer of possession                907   0.91%
noun.process        nouns denoting natural processes                                    521   0.52%
noun.quantity       nouns denoting quantities and units of measure                    1,104   1.11%
noun.relation       nouns denoting relations between people or things or ideas          369   0.37%
noun.shape          nouns denoting two and three dimensional shapes                     299   0.30%
noun.state          nouns denoting stable states of affairs                           2,549   2.56%
noun.substance      nouns denoting substances                                         2,391   2.40%
noun.time           nouns denoting time and temporal relations                          874   0.88%
verb.body           verbs of grooming, dressing and bodily care                         495   0.50%
verb.change         verbs of size, temperature change, intensifying, etc.             2,006   2.01%
verb.cognition      verbs of thinking, judging, analyzing, doubting                     635   0.64%
verb.communication  verbs of telling, asking, ordering, singing                       1,388   1.39%
verb.competition    verbs of fighting, athletic activities                              411   0.41%
verb.consumption    verbs of eating and drinking                                        229   0.23%
verb.contact        verbs of touching, hitting, tying, digging                        1,953   1.96%
verb.creation       verbs of sewing, baking, painting, performing                       606   0.61%
verb.emotion        verbs of feeling                                                    303   0.30%
verb.motion         verbs of walking, flying, swimming                                1,247   1.25%
verb.perception     verbs of seeing, hearing, feeling                                   410   0.41%
verb.possession     verbs of buying, selling, owning                                    688   0.69%
verb.social         verbs of political and social activities and events               1,007   1.01%
verb.stative        verbs of being, having, spatial relations                           671   0.67%
verb.weather        verbs of raining, snowing, thawing, thundering                       78   0.08%
adj.ppl             participial adjectives                                               82   0.08%

The names of these categories have two components: the part of speech and a qualifier within that part of speech. Four of these qualifiers (body, cognition, communication and possession) are shared between a noun category and a verb category. The following table gives the combined counts for senses in categories with these qualifiers:

Qualifier        Sense Count   Ratio
Body                   2,087   2.09%
Cognition              2,895   2.91%
Communication          5,935   5.96%
Possession             1,595   1.60%
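
The combined figures can be reproduced from the category table with a few lines of Python. This is only a sketch: the counts are copied by hand from the table above, and the function name `combine_by_qualifier` is an assumption for illustration, not part of ActiveWordNet.

```python
# Total number of senses reported in this document.
TOTAL_SENSES = 99_642

# Sense counts copied from the category table (shared qualifiers only).
sense_counts = {
    "noun.body": 1_592, "verb.body": 495,
    "noun.cognition": 2_260, "verb.cognition": 635,
    "noun.communication": 4_547, "verb.communication": 1_388,
    "noun.possession": 907, "verb.possession": 688,
}

def combine_by_qualifier(counts):
    """Sum sense counts over categories that share a qualifier."""
    combined = {}
    for category, count in counts.items():
        # A category name is "<part of speech>.<qualifier>".
        _part_of_speech, qualifier = category.split(".", 1)
        combined[qualifier] = combined.get(qualifier, 0) + count
    return combined

combined = combine_by_qualifier(sense_counts)
for qualifier, count in combined.items():
    # Ratio is the share of all 99,642 senses.
    print(f"{qualifier:<14}{count:>6,}{count / TOTAL_SENSES:>8.2%}")
```

Splitting on the first period is enough here because every category name in the table has exactly one.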


Term and Sense Relationships

WordNet provides a number of relationships between objects: lexical connections between terms and semantic associations between senses (Miller). Two particular relations are examined here: synonyms and hypernyms. In WordNet, terms are synonymous if they share a sense. In other words, if terms t1 and t2 are both associated with a sense s, then t1 and t2 are synonyms of one another. Just over half (54%) of all senses are associated with a single term. The mean number of terms per sense is 1.75 (174,008 term-sense associations across 99,642 senses). The maximum is 27, but only two senses have this many terms. The following chart shows the frequency distribution of mutual synonym counts. The frequency scale is logarithmic, as apparently is the slope.
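
The definition of synonymy given here can be sketched in Python. The terms and sense identifiers below are made up for illustration and are not taken from WordNet's database. The helper `synonyms` is likewise a hypothetical name.

```python
from collections import defaultdict

# Hypothetical (term, sense id) associations; the sense ids are invented.
associations = [
    ("car", "s1"), ("auto", "s1"), ("automobile", "s1"),
    ("car", "s2"), ("railcar", "s2"),
    ("apple", "s3"),
]

# Index the associations by sense so shared senses are easy to find.
terms_by_sense = defaultdict(set)
for term, sense in associations:
    terms_by_sense[sense].add(term)

def synonyms(term):
    """Return every other term that shares at least one sense with `term`."""
    result = set()
    for terms in terms_by_sense.values():
        if term in terms:
            result |= terms
    result.discard(term)
    return result

# Mean terms per sense: association pairs divided by distinct senses.
mean_terms_per_sense = len(associations) / len(terms_by_sense)

# Share of senses that are associated with exactly one term.
single_term_share = (
    sum(len(terms) == 1 for terms in terms_by_sense.values()) / len(terms_by_sense)
)
```

In this toy data, "car" is polysemous, so its synonyms come from two different senses.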

The following table shows mean synonym counts by category:

Group               Term-Sense Associations   Terms Per Sense
Noun                                116,364   1.762423
Verb                                 22,073   1.820153
Adjective                            29,892   1.668546
Adverb                                5,679   1.588531
adj.all                              25,781   1.749762
adj.pert                              4,015   1.295579
adv.all                               5,679   1.588531
noun.Tops                                68   1.942857
noun.act                              9,165   1.706069
noun.animal                          14,315   1.962572
noun.artifact                        15,493   1.579307
noun.attribute                        4,913   1.865932
noun.body                             2,834   1.780151
noun.cognition                        3,747   1.657965
noun.communication                    7,459   1.640422
noun.event                            1,469   1.728235
noun.feeling                            763   1.941476
noun.food                             3,425   1.440892
noun.group                            2,825   1.542873
noun.location                         3,469   1.634008
noun.motive                              72   1.8
noun.object                           1,607   1.530476
noun.person                          10,840   1.691372
noun.phenomenon                         834   1.594646
noun.plant                           18,373   2.333968
noun.possession                       1,384   1.52591
noun.process                            834   1.600768
noun.quantity                         1,932   1.75
noun.relation                           596   1.615176
noun.shape                              510   1.705686
noun.state                            4,186   1.642213
noun.substance                        3,711   1.55207
noun.time                             1,540   1.762014
verb.body                             1,020   2.060606
verb.change                           3,330   1.66002
verb.cognition                        1,250   1.968504
verb.communication                    2,772   1.997118
verb.competition                        664   1.615572
verb.consumption                        457   1.995633
verb.contact                          3,289   1.684076
verb.creation                         1,011   1.668317
verb.emotion                            696   2.29703
verb.motion                           2,251   1.805132
verb.perception                         737   1.797561
verb.possession                       1,216   1.767442
verb.social                           1,961   1.947368
verb.stative                          1,286   1.916542
verb.weather                            133   1.705128
adj.ppl                                  96   1.170732

The groups that tend to have the highest numbers of mutual synonyms are noun.plant, verb.emotion and verb.body. Those with the lowest are adj.ppl, adj.pert and noun.food.
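
Each "Terms Per Sense" value is the number of term-sense associations divided by the sense count of the same group. A minimal sketch follows, using a handful of rows copied from the tables in this document. The variable names are assumptions for illustration.

```python
# (term-sense associations, sense count) for a few categories,
# copied from the tables in this document.
rows = {
    "noun.plant": (18_373, 7_872),
    "verb.emotion": (696, 303),
    "verb.body": (1_020, 495),
    "noun.food": (3_425, 2_377),
    "adj.pert": (4_015, 3_099),
    "adj.ppl": (96, 82),
}

# Terms per sense is associations divided by senses.
terms_per_sense = {
    name: associations / senses for name, (associations, senses) in rows.items()
}

# Rank the categories from most to fewest terms per sense.
ranked = sorted(terms_per_sense, key=terms_per_sense.get, reverse=True)
for name in ranked:
    print(f"{name:<14}{terms_per_sense[name]:.6f}")
```

Extending `rows` to all categories would reproduce the full ranking.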

Sense Specificity

One of the properties calculated by the ActiveWordNet component is a sense's specificity. A hypernym is a kind of generalization. For example, "edible fruit" is a hypernym of "apple." Senses for which no hypernym is defined have a specificity value of zero. A sense with hypernyms has a specificity value one greater than the minimum specificity value of its hypernyms. In short, specificity is the minimum number of hypernym levels between a sense and a topmost, level-0 sense.
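
The recursive definition of specificity can be sketched as follows. This is not the ActiveWordNet implementation. The hypernym graph below is a small invented example, not WordNet's actual hierarchy, and it is assumed to contain no cycles.

```python
from functools import lru_cache

# Hypothetical hypernym graph: each sense maps to its direct hypernyms.
# The structure is invented for illustration and assumed to be acyclic.
HYPERNYMS = {
    "entity": [],
    "object": ["entity"],
    "food": ["object"],
    "edible fruit": ["food"],
    "pome": ["object"],
    "apple": ["edible fruit", "pome"],
    "quickly": [],  # adverbs have no hypernyms
}

@lru_cache(maxsize=None)
def specificity(sense):
    """Minimum number of hypernym levels from `sense` to a level-0 sense."""
    parents = HYPERNYMS.get(sense, [])
    if not parents:
        # No hypernym defined: specificity is zero.
        return 0
    # One greater than the minimum specificity among the hypernyms.
    return 1 + min(specificity(parent) for parent in parents)

for sense in HYPERNYMS:
    print(sense, specificity(sense))
```

Because "apple" has two hypernyms in this toy graph, its specificity follows the shorter path through "pome" rather than the longer one through "edible fruit". Caching with `lru_cache` avoids recomputing shared ancestors.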

Specificity   Senses   Ratio
0             22,117     22%
1              3,085      3%
2              4,613      5%
3              6,162      6%
4              8,992      9%
5             15,717     16%
6             13,693     14%
7             10,474     11%
8              7,288      7%
9              3,973      4%
10             1,983      2%
11               775      1%
12               499      1%
13               221      0%
14                49      0%
15                 1      0%

As the graph shows, adjectives and adverbs have no hypernyms. The most common specificity among verbs is two levels of hypernyms (3,566 senses), and among nouns it is five (15,135 senses).

Terms

WordNet describes its string features as "words" and "collocations," referring to word forms and phrases, respectively. Together they comprise 121,962 terms: approximately two-thirds the number of non-obsolete "main entries" in the Oxford English Dictionary. While every term in WordNet is associated with at least one sense, one term is associated with as many as 78. Terms can also be associated with different numbers of categories, category qualifiers and parts of speech. The following table shows the distribution of terms across each of these kinds of association counts. So, for instance, 110,247 terms are associated with only one part of speech, 10,648 are associated with two, and so on.

Count of...   Parts of Speech  Qualifiers  Categories
1                     110,247     102,947     102,438
2                      10,648      11,124      11,371
3                         997       4,022       4,092
4                          70       1,766       1,825
5                                     919         969
6                                     456         491
7                                     295         307
8                                     159         163
9                                      73          83
10                                     69          67
11                                     50          58
12                                     37          44
13                                     20          18
14                                     10          15
15                                      2           5
16                                      7           7
17                                      4           4
18                                                  3
19                                      2           2

The following graph charts the above data with an additional row, in back, showing the distribution of terms over sense count.
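
The association counts tabulated above can be derived by collecting, for each term, the distinct parts of speech, qualifiers and categories of its senses. The sketch below uses a few invented term-category pairs rather than real WordNet data. The helper name `distribution` is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical (term, category) pairs, one per sense of the term.
term_categories = [
    ("run", "verb.motion"), ("run", "noun.act"), ("run", "verb.competition"),
    ("bank", "noun.possession"), ("bank", "verb.possession"),
    ("apple", "noun.food"), ("apple", "noun.plant"),
    ("quickly", "adv.all"),
]

parts = defaultdict(set)
qualifiers = defaultdict(set)
categories = defaultdict(set)
for term, category in term_categories:
    # A category name is "<part of speech>.<qualifier>".
    part_of_speech, qualifier = category.split(".", 1)
    parts[term].add(part_of_speech)
    qualifiers[term].add(qualifier)
    categories[term].add(category)

def distribution(groups):
    """Map an association count to the number of terms having that count."""
    return Counter(len(values) for values in groups.values())

pos_dist = distribution(parts)
qual_dist = distribution(qualifiers)
cat_dist = distribution(categories)
```

A term's qualifier count can be smaller than its category count when the same qualifier appears under two parts of speech, as with "bank" here.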

Words Versus Phrases

The final metric examined here is phrase size, measured in number of words. The following table shows the distribution of terms across phrase sizes. Most terms (70,803) are a single word. A smaller but comparable number (45,396) have two words, and the counts drop sharply thereafter.

Phrase size (words)        1        2       3     4     5    6   7   8
Terms                 70,803   45,396   4,881   713   129   29   7   4
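
Phrase size can be measured by counting the words in each term. The sketch below assumes that the words of a collocation are joined by underscores, as in WordNet's database files, or by spaces. How the original tally treated hyphenated forms is not stated, so hyphens are left alone here. The sample terms and the name `phrase_size` are illustrative only.

```python
from collections import Counter

def phrase_size(term):
    """Number of words in a term; underscores and spaces separate words."""
    return len(term.replace("_", " ").split())

# A few sample terms; collocations use underscores between words.
terms = ["apple", "edible_fruit", "natural_object", "unit_of_measurement"]

# Map a phrase size to the number of terms of that size.
size_distribution = Counter(phrase_size(term) for term in terms)
for size in sorted(size_distribution):
    print(size, size_distribution[size])
```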

WordNet is a registered trademark of Princeton University.