Qualia structure is an indispensable part of the theory of the Generative Lexicon (GL). GL itself is a complex system that supports the semantic compositionality of lexical items. Put another way, by giving lexical items a semantic structure (rather than treating each one as an atomic semantic unit), GL in general, and qualia structure in particular, allows the semantics of a lexical item to change according to the context it appears in. An example demonstrates this:
Mary enjoys a cup of coffee
John bought a cup of coffee
On the surface there is no difference between the two instances of "cup of coffee", but semantically the first sentence implies an act of "drinking". Mary doesn't just enjoy the shape or the color of the cup; she enjoys drinking the coffee. This implicit sense of "drinking" is activated by the telic quale of "cup", an object of artifactual type. Here is the full list of qualia roles for a lexical item:
Formal: the basic category that distinguishes the meaning of a word within a larger domain. It is the quale activated by default. For example, in "John bought a cup of coffee", the formal quale activates a reading of the cup as a piece of merchandise.
Constitutive: the relation between an object and its constituent parts. "I need to fill my pen", for example, activates this quale, as it refers to the pen's ink container.
Telic: the purpose or function of the object, if there is one. This quale is typically associated with artificial objects, though natural objects can activate it as well.
Agentive: the factors involved in the object's origin or "coming into being". For example, this quale is activated in "John bakes a cake".
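As a rough illustration of the four roles listed above, a lexical item can be modelled as a small record with one slot per quale. This is only a sketch: the class name, the helper function, and the hand-written entries for "cup" and "rock" are assumptions made for the example, not part of GL or GLML.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualiaStructure:
    """One lexical item with its four qualia roles (illustrative values only)."""
    lemma: str
    formal: str                          # basic category
    constitutive: Optional[str] = None   # constituent parts
    telic: Optional[str] = None          # purpose or function, if any
    agentive: Optional[str] = None       # how the object comes into being


# Hypothetical, hand-written entries; not taken from any lexicon.
CUP = QualiaStructure(
    lemma="cup",
    formal="container",
    constitutive="bowl, handle",
    telic="drink",
    agentive="manufacture",
)
ROCK = QualiaStructure(lemma="rock", formal="natural object")


def implicit_event(noun: QualiaStructure) -> Optional[str]:
    """Return the activity that a verb like 'enjoy' can pick up from the noun.

    This mimics the 'Mary enjoys a cup of coffee' reading: the telic quale
    supplies the missing activity when the noun has one.
    """
    return noun.telic


print(implicit_event(CUP))   # drink
print(implicit_event(ROCK))  # None
```

Keeping every role except the formal one optional reflects the point made in the list: formal is the default, while the other qualia may be absent for a given object.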
Qualia structure for chair described in first-order predicate logic
Qualia structure is important in common-sense reasoning. One important application is reasoning about an object's affordances, which amounts to answering "what can you do to an object?" (What can you do to a chair? Sit on it, lean on its back, break its legs, etc.). However, there is little literature on how to find the qualia structure of an object. Our lab is pioneering the development of the theory of qualia structure, starting with an annotation framework by Pustejovsky et al. (2008) called the Generative Lexicon Markup Language (GLML).
Task description
In this project, I carried out a number of practical studies on learning qualia structure. The first part studied the practicality of using GLML for a large-scale annotation task. The second part involved automatically extracting qualia relations between nouns and verbs from WordNet definitions.
Annotation
In this subtask, I annotated a small set of documents from the TimeBank corpus, following the GLML guidelines. I annotated the qualia relation between verb and object in about 20 documents. The task turned out to be very difficult, even for experts, for two reasons. First, "formal" is the dominant qualia relation, which leads to a very unbalanced dataset in which occurrences of the other qualia types are very scarce. Second, even in a simple case it is sometimes unclear which quale has been activated. In the following example, both the formal and the agentive qualia are activated, because a gift can be considered an object transferable from person to person (formal), and the act of giving actually creates the gift itself (agentive).
Use the enclosed card and give a generous gift to Goodwill today
Qualia discovery from gloss
Qualia discovery for lexical items is an interesting task, as it is a common subtask in common-sense reasoning. The most interesting type of qualia relation is telic, so I used syntactic matching rules to extract "what an object is used for" from WordNet synset definitions (glosses). The glosses are parsed with Stanford CoreNLP, and matching is carried out on the syntactic trees. Here are examples of WordNet glosses:
paint.n.01: a substance used as a coating to protect or decorate a surface
bed.n.01: a piece of furniture that provides a place to sleep
The following is the list of syntactic rules:
for V-ing or S to V-inf
: camera is equipment for taking photographs
that S V / that V
: abattoir is a building where animals are butchered
'used', 'intended', 'made', 'built', 'designed', 'in order' to V
: anti-inflammatory is a medicine intended to reduce inflammation
used in/on V-ing
: weapon is any instrument or instrumentality used in fighting or hunting
NP is a NP to V-inf
: crown is a wreath or garland worn on the head to signify victory
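To make the rules above more concrete, here is a minimal sketch that approximates three of them with regular expressions over the raw gloss text. This is a deliberate simplification and not the procedure described above, which matches on syntactic trees produced by Stanford CoreNLP. The rule names and the function `telic_candidates` are made up for the example.

```python
import re
from typing import List, Tuple

# Surface approximations of three rules. Each pattern captures one candidate
# verb form in the group named "verb". Plain string matching cannot tell a
# verb from a noun, so these are noisier than tree-based rules.
RULES = [
    ("used in/on V-ing",
     re.compile(r"\bused (?:in|on) (?P<verb>\w+ing)\b")),
    ("participle to V",
     re.compile(r"\b(?:used|intended|made|built|designed|in order) to (?P<verb>\w+)\b")),
    ("for V-ing",
     re.compile(r"\bfor (?P<verb>\w+ing)\b")),
]


def telic_candidates(gloss: str) -> List[Tuple[str, str]]:
    """Return (rule name, matched verb form) pairs found in one gloss."""
    found = []
    for name, pattern in RULES:
        for match in pattern.finditer(gloss):
            found.append((name, match.group("verb")))
    return found


glosses = {
    "camera": "equipment for taking photographs",
    "anti-inflammatory": "a medicine intended to reduce inflammation",
    "weapon": "any instrument or instrumentality used in fighting or hunting",
    "paint": "a substance used as a coating to protect or decorate a surface",
}
for noun, gloss in glosses.items():
    print(noun, telic_candidates(gloss))
```

In this sketch the "weapon" gloss yields only "fighting" and misses "hunting", and the "paint" gloss yields nothing because "used as" is not covered. Both gaps are the kind of case that needs a parse tree rather than a surface pattern.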
Moreover, a telic relation can be inherited through the WordNet hierarchy. For example, "clothes" -> "to be worn" implies "hat" -> "to be worn".
Telic verbs are found for 4779 of 11000 nouns (roughly 43%). A further 3928 nouns are added to the result if telic verbs are resolved from hypernyms.
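The hypernym resolution described above can be sketched as a walk up the hierarchy until some ancestor has a telic verb of its own. The two dictionaries below are toy data invented for the example; a real run would read hypernym links from WordNet (where a synset can have more than one hypernym, which this single-parent sketch ignores) and the direct telic verbs from the gloss rules.

```python
from typing import Dict, List

# Toy data, hypothetical: one hypernym per noun, and the telic verbs that
# were extracted directly from a noun's own gloss.
HYPERNYM: Dict[str, str] = {
    "beret": "hat",
    "hat": "clothes",
    "clothes": "artifact",
}
DIRECT_TELIC: Dict[str, List[str]] = {
    "clothes": ["wear"],
}


def resolve_telic(noun: str, max_depth: int = 10) -> List[str]:
    """Return the noun's own telic verbs, or those of its nearest ancestor."""
    current = noun
    for _ in range(max_depth):
        if current is None:
            return []                      # ran off the top of the hierarchy
        if current in DIRECT_TELIC:
            return DIRECT_TELIC[current]   # nearest noun with a telic verb
        current = HYPERNYM.get(current)    # climb one level (None at the root)
    return []


print(resolve_telic("hat"))       # ['wear']
print(resolve_telic("beret"))     # ['wear']
print(resolve_telic("artifact"))  # []
```

Stopping at the nearest ancestor is a design choice of this sketch: it keeps the inherited role as specific as possible, but it also means that one wrong entry near the top is copied to every noun below it.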
Existing problems
This process, however, turned out to be quite noisy. High-quality patterns, such as "used to", are quite rare in WordNet synset definitions. More frequent patterns, such as "NP is a NP to V-inf", do not have high precision.
Parsing problems:
Verbs and their objects in a conjunction need to be found. Example: "A is something to B or C". C has a conj_or dependency on B, but no link to A.
Noun phrases of the form "the probability of A", "the level of B", etc. We want the object of the telic verb to be the head noun of A or B, rather than "probability" or "level".
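One possible way to handle both parsing problems is to post-process the dependency edges: follow conj_* edges from a matched verb to pick up its conjuncts, and follow a prep_of edge from a "shell" noun to the noun it introduces. The sketch below works on hand-written edge triples rather than real parser output. Apart from conj_or, which is mentioned above, the relation labels, the `SHELL_NOUNS` set, and both function names are assumptions for illustration.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (relation, head word, dependent word)

# Nouns whose "of" complement is the real object; taken from the two
# examples in the text, not an exhaustive list.
SHELL_NOUNS = {"probability", "level"}


def expand_conjuncts(verb: str, edges: List[Edge]) -> List[str]:
    """Return the verb plus every word reachable from it through conj_* edges."""
    result = [verb]
    queue = [verb]
    while queue:
        head = queue.pop(0)
        for relation, edge_head, dependent in edges:
            if (relation.startswith("conj") and edge_head == head
                    and dependent not in result):
                result.append(dependent)
                queue.append(dependent)
    return result


def content_head(noun: str, edges: List[Edge]) -> str:
    """Replace a shell noun ('probability of A') by the noun it introduces."""
    if noun in SHELL_NOUNS:
        for relation, edge_head, dependent in edges:
            if relation == "prep_of" and edge_head == noun:
                return dependent
    return noun


# Hand-written edges for "... to protect or decorate a surface".
paint_edges = [
    ("conj_or", "protect", "decorate"),
    ("dobj", "protect", "surface"),
]
# Hand-written edges for a hypothetical "... to reduce the probability of failure".
shell_edges = [
    ("dobj", "reduce", "probability"),
    ("prep_of", "probability", "failure"),
]

print(expand_conjuncts("protect", paint_edges))   # ['protect', 'decorate']
print(content_head("probability", shell_edges))   # failure
```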
Inheritance problems:
Parsing mistakes higher up in the hypernym tree spoil the telic roles of the hyponyms below them.
Additional links need to be found when inheriting the telic role. For example, take "A is a drug for B". If drug has the telic role "treat illness", A inherits that role, but "treat B" would be better.
The telic verbs of the hypernym might not make sense for the hyponyms. Example: tourist_class.n.01, gloss = "tourist_class is inexpensive accommodations on a ship or train". The telic role inherited from "accommodations" is "to live"!
Recent development
A recent paper (Fulda et al., 2017) shows that distributed representations (word vectors), combined with a set of seed (object, verb) pairs, can be used to generate a prototype vector for telic verbs. In short, averaging the difference vectors of the seed pairs (verb vector minus noun vector) yields a good translation vector from a noun to the verbs that could be its telic verbs.
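The idea as summarised above can be sketched in a few lines. This follows the one-sentence summary here, not necessarily the exact procedure of the paper. The three-dimensional vectors are toy values invented so that the arithmetic is easy to follow, and all function names are assumptions; real experiments would use pretrained word vectors.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def subtract(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def mean(vectors: List[Vector]) -> Vector:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def telic_offset(seeds: List[Tuple[str, str]], emb: Dict[str, Vector]) -> Vector:
    """Average of (verb vector - noun vector) over the seed pairs."""
    return mean([subtract(emb[verb], emb[noun]) for noun, verb in seeds])


def rank_telic_verbs(noun: str, verbs: List[str], offset: Vector,
                     emb: Dict[str, Vector]) -> List[str]:
    """Rank candidate verbs by cosine similarity to noun vector + offset."""
    target = add(emb[noun], offset)
    return sorted(verbs, key=lambda v: cosine(emb[v], target), reverse=True)


# Toy embeddings, hypothetical values chosen for readability.
emb = {
    "cup": [1.0, 0.0, 0.0],   "drink": [1.0, 1.0, 0.0],
    "bed": [0.0, 0.0, 1.0],   "sleep": [0.0, 1.0, 1.0],
    "chair": [0.5, 0.0, 0.5], "sit": [0.5, 1.0, 0.5],
    "fly": [-1.0, 1.0, -1.0],
}
seeds = [("cup", "drink"), ("bed", "sleep")]

offset = telic_offset(seeds, emb)
ranked = rank_telic_verbs("chair", ["drink", "sleep", "sit", "fly"], offset, emb)
print(offset)     # [0.0, 1.0, 0.0]
print(ranked[0])  # sit
```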
References
Pustejovsky, J. (1991). The generative lexicon. Computational Linguistics, 17(4), 409-441.
Pustejovsky, J., Rumshisky, A., Moszkowicz, J. L., & Batiukova, O. (2008, September). GLML: A generative lexicon markup language. In Proceedings of the Generative Lexicon Workshop, Instituto di Linguistica Computazionale (CNR), Pisa, Italy.