Language Resources and Evaluation, Volume 50, Issue 2, pp 263–289

Predicate Matrix: automatically extending the semantic interoperability between predicate resources

  • Maddalen Lopez de Lacalle
  • Egoitz Laparra
  • Itziar Aldabe
  • German Rigau
Original Paper

DOI: 10.1007/s10579-016-9348-5

Cite this article as:
Lopez de Lacalle, M., Laparra, E., Aldabe, I. et al. Lang Resources & Evaluation (2016) 50: 263. doi:10.1007/s10579-016-9348-5

Abstract

This paper presents a novel approach to improve the interoperability between four semantic resources that incorporate predicate information. Our proposal defines a set of automatic methods for mapping the semantic knowledge included in WordNet, VerbNet, PropBank and FrameNet. We use advanced graph-based word sense disambiguation algorithms and corpus alignment methods to automatically establish the appropriate mappings among their lexical entries and roles. We study different settings for each method using SemLink as a gold-standard for evaluation. The results show that the new approach provides productive and reliable mappings. In fact, the mappings obtained automatically outnumber the set of original mappings in SemLink. Finally, we also present a new version of the Predicate Matrix, a lexical-semantic resource resulting from the integration of the mappings obtained by our automatic methods and SemLink.

Keywords

Verbal lexicon, WordNet, VerbNet, FrameNet, PropBank, SemLink

1 Introduction

Predicate models such as FrameNet (Baker et al. 1997), VerbNet (Kipper 2005) or PropBank (Palmer et al. 2005) are core resources in most advanced Natural Language Processing (NLP) tasks, such as Question Answering (Shen and Lapata 2007), Textual Entailment (Burchardt and Pennacchiotti 2008) or Information Extraction (Presutti et al. 2012). Most systems with Natural Language Understanding capabilities require very rich and precise semantic knowledge at the predicate-argument level. This type of knowledge makes it possible to identify the typical underlying participants of a particular event independently of its realization in the text. Thus, using these models, different linguistic phenomena expressing the same event, such as active/passive transformations, verb alternations, nominalizations and implicit realizations, can be harmonized into a common semantic representation. Lately, several systems have been developed for shallow semantic parsing and for explicit and implicit semantic role labeling (SRL) exploiting these resources (Erk and Pado 2004; Shi and Mihalcea 2005; Giuglea and Moschitti 2006; Das et al. 2010; Laparra and Rigau 2013). However, all these resources use different background predicate models guided by different design criteria and linguistic principles. At the same time, each resource offers interesting characteristics not provided by the alternatives. Unfortunately, since these semantic resources have been developed independently and are not integrated in a common platform, it is very difficult to exploit them together. A common semantic framework would allow interoperability among all these tools and resources. However, building predicate models that are large and rich enough for broad-coverage semantic processing takes a great deal of expensive manual effort. Furthermore, the same effort must be invested for each different language (Subirats and Petruck 2003).

Much previous research has focused on the integration of resources targeted at knowledge about nouns and named entities. Well-known examples are YAGO (Suchanek et al. 2007), Freebase (Bollacker et al. 2008), DBpedia (Bizer et al. 2009), BabelNet (Navigli and Ponzetto 2010) or UBY (Gurevych et al. 2012). However, less attention has been paid to the integration of existing models for verbs and predicates (Burchardt et al. 2005; Fellbaum and Baker 2013).

One of the few projects working on the integration of predicate information is SemLink (Palmer 2009). SemLink has focused on mapping complementary lexical resources that associate semantic information with the propositions in a sentence. The resources integrated in SemLink vary in the representation level and detail of the predicate semantic information. SemLink aims at unifying these lexical resources at several different levels. First, it provides type-to-type mappings between the lexical units of each framework. Then, for each lexical unit, SemLink also supplies a mapping between the semantic roles of PropBank and VerbNet, as well as between the roles of VerbNet and FrameNet. However, SemLink has some limitations. First, its coverage is still far from complete. Second, the mappings between resources have been developed manually, a very costly process which is also not systematic. Our proposal is to define automatic methods for mapping different semantic resources containing predicate information in order to allow semantic interoperability between them. Additionally, these methods increase the number of mappings available for the resources that have already been manually aligned. Furthermore, the automatic methodology makes it easier to keep the set of mappings up to date when improved versions of the knowledge resources (each one developed independently) are released.

In this work, we focus on the integration of predicate information at lexical and role levels. The lexical mappings are centralized in WordNet in order to offer a wider coverage. For that, we apply graph-based algorithms in three different scenarios: (a) mappings between WordNet and VerbNet lexicons; (b) mappings between WordNet and FrameNet lexicons; and (c) mappings between WordNet and PropBank lexicons. Regarding the roles, we propose two methods to infer new role mappings between the different predicate models. First, we have defined a three-step method to increase the alignments between VerbNet thematic-roles and FrameNet frame elements. Second, we present a corpus-based method to extend the mappings between FrameNet and PropBank.

As a result of this work, we have developed a new version of the Predicate Matrix (López de Lacalle et al. 2014b, a),1 a lexical-semantic resource resulting from the automatic integration of multiple sources of predicate information including FrameNet, VerbNet, PropBank and WordNet. The Predicate Matrix, currently in its version 1.2, arises from the union of SemLink and the set of mappings obtained by our automatic methods.

Tables 1 and 2 show the differences between SemLink and the Predicate Matrix in terms of mappings between lexicons (Table 1) and roles (Table 2). Thanks to the methodology we propose for creating automatic mappings between lexical entries and roles, we have obtained a much larger resource than SemLink. For example, SemLink provides 6934 mappings between FrameNet and VerbNet roles while using our methods it is possible to obtain 14,258 mappings.
Table 1

Differences between SemLink and the Predicate Matrix: mappings between lexicons

| | WN–VN | WN–PB | WN–FN | VN–PB | VN–FN | PB–FN |
|---|---|---|---|---|---|---|
| SemLink | 7665 | 5489 | 4851 | 4503 | 3709 | 2562 |
| Predicate Matrix | 10,832 | 9516 | 8583 | 4947 | 5462 | 4163 |

WN WordNet, FN FrameNet, VN VerbNet, PB PropBank

Table 2

Differences between SemLink and the Predicate Matrix: mappings between roles

| | PB–VN | FN–VN | FN–PB |
|---|---|---|---|
| SemLink | 9950 | 6934 | 4384 |
| Predicate Matrix | 11,749 | 14,258 | 14,195 |

WN WordNet, FN FrameNet, VN VerbNet, PB PropBank

We also expect to provide a more robust and complete interoperable lexicon between all these resources. For example, consider the verb struggle that belongs to the VerbNet class battle-36-4-1. Given the sentence “John struggled with Mary for the last piece of cake.”, an automatic semantic parser based on PropBank should annotate the sentence with the struggle.01 PropBank predicate and the roles \(\textit{arg}_0\), \(\textit{arg}_1\) and \(\textit{arg}_2\), as shown in Fig. 1.
Fig. 1

PropBank information obtained from an automatic semantic role labeling and the corresponding VerbNet mappings obtained from SemLink

By means of SemLink, we know that the struggle.01 PropBank predicate belongs to the VerbNet class battle-36-4-1 and the \(\textit{arg}_0\), \(\textit{arg}_1\) and \(\textit{arg}_2\) PropBank arguments are aligned to the VerbNet Agent, Co-Agent and Topic thematic-roles respectively. SemLink also offers the mapping to WordNet but for this particular predicate, it lacks information from FrameNet.

Thanks to the methodology followed in this work, the Predicate Matrix contains mappings to VerbNet and PropBank for the Hostile_encounter frame and the struggle.v lexical unit. It also establishes that Agent and \(\textit{arg}_0\) are equivalent to the Side1 frame element in FrameNet, Co-Agent and \(\textit{arg}_1\) to Side2, and that Topic and \(\textit{arg}_2\) correspond to the Issue and Goal frame elements. In that way, the annotations of one particular semantic resource can be projected to any other of the resources integrated into the Predicate Matrix, as shown in Fig. 2. Moreover, the rich predicate information encoded in FrameNet is now also available for further semantic processing.
Fig. 2

FrameNet information obtained from an automatic semantic role labeling and the corresponding PropBank and VerbNet mappings obtained from the Predicate Matrix
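The projection described for struggle can be sketched in a few lines of Python. This is only an illustration: the dictionary below is a hand-written excerpt built from the mappings quoted in the text, not the distribution format of the Predicate Matrix, and the names `STRUGGLE_ENTRY` and `project_roles` as well as the assignment of text spans to \(\textit{arg}_0\), \(\textit{arg}_1\) and \(\textit{arg}_2\) are assumptions made for the example.

```python
# Hand-written excerpt of one entry, built from the mappings quoted in the text.
# The layout of this dictionary is hypothetical, not the resource's file format.
STRUGGLE_ENTRY = {
    "propbank": "struggle.01",
    "verbnet": "battle-36-4-1",
    "framenet": "Hostile_encounter",
    "roles": [
        {"propbank": "arg0", "verbnet": "Agent", "framenet": ["Side1"]},
        {"propbank": "arg1", "verbnet": "Co-Agent", "framenet": ["Side2"]},
        {"propbank": "arg2", "verbnet": "Topic", "framenet": ["Issue", "Goal"]},
    ],
}


def project_roles(annotation, entry, source, target):
    """Relabel role fillers from one scheme into another.

    `annotation` maps a role label of the `source` scheme to its filler.
    `source` must be "propbank" or "verbnet" (their labels are single strings).
    """
    lookup = {row[source]: row[target] for row in entry["roles"]}
    return {filler: lookup[label] for label, filler in annotation.items()}


# Assumed PropBank-style analysis of the example sentence.
propbank_annotation = {
    "arg0": "John",
    "arg1": "Mary",
    "arg2": "the last piece of cake",
}

framenet_view = project_roles(propbank_annotation, STRUGGLE_ENTRY, "propbank", "framenet")
verbnet_view = project_roles(propbank_annotation, STRUGGLE_ENTRY, "propbank", "verbnet")
```

Because Topic and \(\textit{arg}_2\) correspond to two frame elements, the FrameNet side is kept as a list rather than a single label.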

The contributions of this work are twofold:
  1. First, a methodology to automatically integrate predicate information from VerbNet, PropBank, FrameNet and WordNet at lexical and role levels;

  2. Second, the Predicate Matrix, a new lexical-semantic resource resulting from the integration of the new automatic mappings between resources and the mappings already offered by SemLink.

This paper is organized as follows. Section 2 presents the sources of predicate information used for developing the Predicate Matrix. It also offers an analysis of SemLink. Section 3 details our methodology for building the Predicate Matrix. Section 3.1 presents the methods to create the lexical mappings and Sect. 3.2 the ones to expand the role mappings. Section 4 provides the details of the current version of the Predicate Matrix. Finally, Sect. 5 presents some concluding remarks and our current plans for future work.

2 Sources of predicate information

The Predicate Matrix (as well as SemLink) integrates predicate information from WordNet, VerbNet, FrameNet and PropBank. In this section we briefly describe each resource and the semantic information they offer.

WordNet2 (Fellbaum 1998) is a large lexical knowledge base for English. It contains information about English nouns, verbs, adjectives and adverbs and is organized around the notion of a synset. A synset is a set of words with the same part-of-speech that can be interchanged in a certain context. A synset is often further described by a gloss and by explicit semantic relations to other synsets.

VerbNet3 (Kipper 2005) is a hierarchical domain-independent verb lexicon for English. VerbNet is organized into verb classes following the work by Levin (1993). Each verbal class in VerbNet is described by thematic-roles, selectional restrictions on the arguments, and frames consisting of a syntactic description and semantic predicates.

FrameNet4 (Baker et al. 1997) is a very rich semantic resource that contains descriptions and corpus annotations of English words following the paradigm of Frame Semantics (Fillmore 1976). In Frame Semantics, a Frame corresponds to a scenario that involves the interaction of a set of participants, playing a particular role in the scenario. FrameNet groups words or lexical units (LUs hereinafter) into coherent semantic classes or frames, and each frame is further characterized by a list of participants or frame elements (FEs hereinafter). Different word senses for a word are represented in FrameNet by assigning different frames.

PropBank5 (Palmer et al. 2005) aims to provide a wide corpus annotated with information about semantic propositions, including relations between the predicates and their arguments. PropBank also contains a description of the frame structures, called framesets, of each sense of every verb that belongs to its lexicon. Unlike other similar resources, PropBank defines the arguments, or roles, of each verb individually.

We use all these resources to define the automatic methods (cf. Sect. 3) and to create the new Predicate Matrix 1.2 (cf. Sect. 4). More specifically, we use FrameNet 1.3, WordNet 3.0, VerbNet 3.2, PropBank 1.7 as well as SemLink 1.2.2.

2.1 SemLink

Like the proposal presented in this work, SemLink is one of the few projects working on the integration of the predicate information contained in these resources. This section briefly presents SemLink. For a more complete analysis see López de Lacalle et al. (2014b).

SemLink6 (Palmer 2009) is a project whose aim is to link together different predicate resources via a set of manual mappings. Currently, SemLink provides partial mappings between FrameNet, VerbNet, PropBank and WordNet. These mappings make it possible to combine their information for tasks such as inferencing, consistency checking, interoperable semantic role labeling, or automatically extending their current overlapping coverage.

However, SemLink has some limitations. First, its coverage is still far from being complete. Second, a manual process has been followed to establish the mappings. Obviously, this is a very costly process which is also not systematic. Next, we present a brief analysis of the coverage of the mappings included in SemLink with respect to VerbNet as it uses VerbNet as its central resource.

2.1.1 WordNet and VerbNet alignment

Although VerbNet is one of the largest available verb lexicons, it does not reach the coverage of the verbal part of WordNet. Thus, 74 % of the WordNet senses are not aligned to VerbNet. The reasons for this unassigned information can be grouped into: (a) the distinct granularities of both resources; (b) cases where the lemma does not exist in the VerbNet lexicon; and (c) lemmas that exist in both resources but for which there is no sense mapping. In addition, SemLink does not provide mappings to WordNet senses for 1077 VerbNet predicates.

2.1.2 PropBank and VerbNet alignment

The mapping between PropBank and VerbNet introduces additional complexity to the comparison of both resources. In this case, aligning the lexicons means that the arguments of the PropBank predicates must be aligned to the VerbNet thematic-roles. Regarding the lexicon mapping, half of PropBank predicate senses have their corresponding VerbNet predicate in SemLink and all the lemmas of PropBank are contained within the VerbNet lexicon. The number of VerbNet predicates that are not aligned to PropBank is smaller. Some of these VerbNet predicates do not exist in the PropBank lexicon and others are actually part of the PropBank lexicon but there is no alignment for them. Finally, as regards the PropBank arguments and the VerbNet thematic-roles, around half of the arguments and roles are not mapped in both directions.

2.1.3 FrameNet and VerbNet alignment

The alignment between FrameNet and VerbNet is very incomplete. Only 16 % of the FrameNet LUs are aligned to at least one VerbNet predicate. In addition, 88 % of the FEs from FrameNet are not aligned to any VerbNet thematic-role, and the existing mappings involve only a few frames. However, at class level, most of the VerbNet thematic-roles appear to be aligned to at least one FE.

In contrast to SemLink, our proposal is to define methods to automatically map different semantic resources containing predicate information. Moreover, the union of both, the new automatic mappings and the ones provided by SemLink, offers a more complete and robust interoperable lexicon. Thus, the Predicate Matrix not only contains the automatically obtained mappings but also the mappings offered by SemLink. In the following sections, we first present the methodology (cf. Sect. 3) and then, the steps to create the current Predicate Matrix (cf. Sect. 4).

3 Methodology

We have already mentioned that the integration of predicate information is performed at two levels: lexical and role levels. Table 3 summarizes the type of mappings we present per section. Each section describes a method to obtain the mappings between resources as well as the evaluation results of the proposed method.
Table 3

Summary of lexical and role mappings

| Sections | Mappings | Method |
|---|---|---|
| Lexical mappings | | |
| 3.1.1 | WN–FN and VN | Disambiguation of lexicon |
| 3.1.2 | WN–PB | Crossing SRL (predicates) and WSD corpus annotations |
| Role mappings | | |
| 3.2.1 | FN–VN | Learning role patterns and frequencies |
| 3.2.2 | FN–PB | Crossing SRL corpus annotations |

WN WordNet, FN FrameNet, VN VerbNet, PB PropBank

All the mappings obtained at the lexical level are based on graph-based word sense disambiguation (WSD) algorithms. The lexical mappings from WordNet to FrameNet and VerbNet are obtained by applying WSD algorithms to semantically coherent groupings of verbal entries (see Sect. 3.1.1). The lexical mappings from WordNet to PropBank are obtained by applying WSD to a corpus annotated with PropBank predicates (see Sect. 3.1.2). We have not created new mappings between PropBank and VerbNet because PropBank already offers this information and its coverage is nearly complete.

As with the lexical mappings, PropBank also offers quite complete role mappings between PropBank and VerbNet. Thus, we concentrate our efforts on finding new role mappings between FrameNet and VerbNet and between FrameNet and PropBank. The mappings between FrameNet FEs and VerbNet thematic-roles are obtained following a three-step methodology (see Sect. 3.2.1). A corpus-based method is used to automatically create new role mappings between FrameNet and PropBank (see Sect. 3.2.2). This method obtains mappings between predicates and roles at the same time.

3.1 Lexical mappings

The methods for extending the mappings between lexical entries are based on a graph-based WSD approach which uses WordNet as a background knowledge base. Following (Laparra and Rigau 2009; Laparra et al. 2010; López de Lacalle et al. 2014a), we apply knowledge-based WSD algorithms that use a large-scale graph of concepts derived from WordNet to disambiguate the entries from the lexicons.

In the case of FrameNet and VerbNet, the graph-based WSD algorithms are applied to coherent groupings of words belonging to the same FrameNet frame or VerbNet class. For PropBank, the WSD approach is applied to a corpus annotated with predicates. In all cases, the disambiguation provides new links between those verbal entries and the WordNet senses. Thus, we can connect verbs from different resources that are connected to the same WordNet sense.
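The idea of using WordNet as a pivot can be sketched as a simple grouping operation. The snippet below is a minimal illustration, not the authors' implementation: the function name `link_through_wordnet`, the tuple layout of the entries and the sense identifier `"cram-sense-A"` are all hypothetical placeholders.

```python
from collections import defaultdict


def link_through_wordnet(*lexicon_mappings):
    """Group entries of several lexicons that were assigned the same WordNet sense."""
    by_sense = defaultdict(set)
    for mapping in lexicon_mappings:
        for entry, sense in mapping.items():
            by_sense[sense].add(entry)
    # Keep only the senses that actually connect two or more entries.
    return {sense: entries for sense, entries in by_sense.items() if len(entries) > 1}


# Toy disambiguation output; the sense identifiers are placeholders, not real sense keys.
verbnet_to_wn = {("VN", "learn-14", "cram"): "cram-sense-A"}
framenet_to_wn = {("FN", "Education_teaching", "cram.v"): "cram-sense-A",
                  ("FN", "Education_teaching", "study.v"): "study-sense-A"}

links = link_through_wordnet(verbnet_to_wn, framenet_to_wn)
```

Here the VerbNet and FrameNet entries for cram end up in the same group because both were disambiguated to the same (placeholder) sense, while study.v stays unlinked.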

We tested two different graph-based WSD algorithms: an advanced version of the structural semantic interconnections algorithm (SSI) (Navigli and Velardi 2005) called SSI-Dijkstra+ (SSID+) (Cuadros and Rigau 2008; Laparra and Rigau 2009; Laparra et al. 2010), and UKB (Agirre and Soroa 2009). SSI-Dijkstra+ is a greedy graph algorithm that disambiguates a set of words by calculating the shortest path distances between word senses. UKB applies Personalized PageRank on a graph to rank the possible senses and perform disambiguation. Both algorithms use the graph formed by the senses and the semantic relations of WordNet.
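To make the Personalized PageRank idea concrete, the following sketch runs a plain power iteration on a tiny hand-made graph. It is not UKB and does not reproduce its options; the graph, the node names (such as `cram#study`) and the choice of seeding the random walk directly on context senses are simplifying assumptions for illustration only.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank over an adjacency-list graph."""
    # The walk restarts only at the seed nodes, which biases the ranking towards them.
    teleport = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in graph}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) * teleport[node] for node in graph}
        for node, neighbours in graph.items():
            for neighbour in neighbours:
                new_rank[neighbour] += damping * rank[node] / len(neighbours)
        rank = new_rank
    return rank


def disambiguate(graph, context_senses, candidate_senses):
    """Pick the candidate sense that receives the highest rank from the context."""
    rank = personalized_pagerank(graph, set(context_senses))
    return max(candidate_senses, key=rank.get)


# Toy undirected sense graph (every edge is listed in both directions).
TOY_GRAPH = {
    "learn#1": ["study#1", "cram#study"],
    "study#1": ["learn#1", "cram#study", "fill#1"],
    "cram#study": ["learn#1", "study#1"],
    "fill#1": ["study#1", "cram#stuff"],
    "cram#stuff": ["fill#1"],
}

best = disambiguate(TOY_GRAPH, ["learn#1", "study#1"], ["cram#study", "cram#stuff"])
```

In this toy graph the sense of cram that is directly connected to both context senses collects more rank than the one reachable only through an unrelated node.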

3.1.1 WordNet–FrameNet and VerbNet

We extend the lexical mappings from VerbNet and FrameNet to WordNet taking advantage of the fact that both resources group semantically related lemmas in coherent semantic classes or frames. Our strategy is to apply a WSD algorithm using those groupings as contexts. For that, we have used UKB (Agirre and Soroa 2009) and SSID+ (Laparra et al. 2010).

Although FrameNet covers 10,195 LUs and 795 frames, only 721 frames have at least one associated LU. WordNet recognizes the lemmas of 10,086 LUs, that is, 98 % of the word-frame pairs, which corresponds to 708 frames and 2867 verbs. In FrameNet, the LUs of a frame can be nouns, verbs, adjectives and adverbs representing a coherent and closely related set of meanings that can be viewed as a small semantic field. For example, the frame Education_teaching contains LUs referring to the educational activity and its participants. It is evoked by LUs like \({\textit{cram}}_v\), \({\textit{instruction}}_n\), \({\textit{instruct}}_v\), \({\textit{learn}}_v\), \({\textit{lecturer}}_n\), \({\textit{study}}_v\), etc. The frame also defines core FEs such as STUDENT or SUBJECT that are semantic participants of the frame and their corresponding LUs.

VerbNet also groups semantically related verbs. It groups 4403 verbs in 386 classes and sub-classes. WordNet recognizes the lemmas of 6078 verbal senses, that is, 97 % verb-class pairs. For instance, the VerbNet class learn-14 groups together verbs like assimilate, cram, glean, learn, memorize or read. This VerbNet class also defines a set of thematic-roles: Agent, Source and Topic.

Evaluation As SemLink includes some manual assignments of WordNet senses to VerbNet and FrameNet, we can use them to evaluate the accuracy of the automatic mappings. For the evaluation, we used as gold-standard 272 VerbNet classes and their associated verbs and 214 FrameNet frames having at least one WordNet sense manually assigned to a verb. The average length of the contexts is 23.30 verbs for VerbNet and 19.38 LUs for FrameNet. We have built a baseline system which assigns to each verb the most frequent sense according to WordNet.

Table 4 presents the precision (P), recall (R) and F1 measure (harmonic mean of recall and precision) of the different methods and knowledge resources when mapping WordNet to VerbNet and FrameNet. WN stands for the Lexical Knowledge Base (LKB) built using only the relations from WordNet while WN + G refers to the LKB also integrating the relations from the semantically tagged glosses.7 Table 4 also presents the baseline system results. We observe very high results and robust behavior independently of the WSD algorithm and LKB, and in every case the baseline is widely outperformed. We could expect even higher results when also including the gold-standard cases from SemLink in the WSD process.
Table 4

Results of the disambiguation process when mapping WordNet to VerbNet and FrameNet

| Method | LKB | P | R | F1 |
|---|---|---|---|---|
| VerbNet | | | | |
| Baseline | | 18.7 | 15.4 | 16.9 |
| UKB | WN | 84.2 | 84.2 | 84.2 |
| UKB | WN + G | 85.3 | 85.3 | 85.3 |
| SSID+ | WN | 83.8 | 83.5 | 83.7 |
| SSID+ | WN + G | 83.8 | 83.5 | 83.7 |
| FrameNet | | | | |
| Baseline | | 72.5 | 70.4 | 71.4 |
| UKB | WN | 79.0 | 79.0 | 79.0 |
| UKB | WN + G | 79.4 | 79.4 | 79.4 |
| SSID+ | WN | 82.5 | 81.3 | 81.9 |
| SSID+ | WN + G | 82.9 | 81.8 | 82.4 |
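Since F1 is defined as the harmonic mean of precision and recall, the reported scores can be recomputed with a short helper. The function name `f1_score` is ours, and the check below uses the baseline rows of the results tables as printed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale, e.g. percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Baseline rows as printed in the tables: (P, R) -> reported F1.
verbnet_baseline = round(f1_score(18.7, 15.4), 1)   # reported as 16.9
framenet_baseline = round(f1_score(72.5, 70.4), 1)  # reported as 71.4
```

Recomputing F1 from the rounded P and R values can differ from a published F1 by a tenth of a point, presumably because the published figures were derived from unrounded precision and recall.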

3.1.2 WordNet–PropBank

In PropBank, each predicate, which has no relation with any other predicate, has its own unique role structure. For this reason, we propose a slightly different method to extend the lexical mappings between PropBank and WordNet. We use the WordNet based WSD algorithms to disambiguate a corpus annotated with PropBank predicates. Then, the method obtains the most common matches between the annotations of both resources over the same words, as in Fig. 3.
Fig. 3

Example of matching annotations of WordNet (WN) and PropBank (PB)

We use two different sources of contexts.8 The first is the annotated subset of the PropBank corpus distributed by the CoNLL shared task, which ensures a fully reliable SRL annotation for 500 documents. The second is the FrameNet corpus, which includes 99 documents with continuous text and 168,519 sample sentences for 64 % of the LUs. This corpus does not contain PropBank annotations, so they must be obtained through automatic SRL processing. To disambiguate the corpora with WordNet senses we use UKB and SSID+. To tag PropBank predicates on the FrameNet documents we apply the mate-tools9 (Bohnet 2010) pipeline. The pipeline includes a highly accurate SRL module that obtains an F1 performance10 of 95.59 % when identifying the appropriate PropBank predicates (Björkelund and Hafdell 2009). In this way, we obtain a full set of documents containing both PropBank annotations (some manually annotated and others predicted) and WordNet annotations. By crossing both annotations we obtain PropBank predicates and WordNet senses for some words. Then, for each predicate we select its most frequent corresponding sense, obtaining a set of mappings between the lexicons of both resources.
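The crossing of annotations amounts to counting co-occurrences and keeping the most frequent sense per predicate. The sketch below assumes a hypothetical token representation (dictionaries with optional `"pb"` and `"wn"` keys) and uses placeholder sense names; it is an illustration of the counting step, not the authors' code.

```python
from collections import Counter, defaultdict


def most_frequent_sense_per_predicate(tokens):
    """Map each PropBank predicate to the WordNet sense it co-occurs with most often."""
    counts = defaultdict(Counter)
    for token in tokens:
        predicate, sense = token.get("pb"), token.get("wn")
        # Only tokens carrying both annotations contribute to the mapping.
        if predicate and sense:
            counts[predicate][sense] += 1
    return {pred: senses.most_common(1)[0][0] for pred, senses in counts.items()}


# Toy corpus; the sense labels are placeholders, not real WordNet sense keys.
corpus = [
    {"word": "sold", "pb": "sell.01", "wn": "sell-sense-A"},
    {"word": "sells", "pb": "sell.01", "wn": "sell-sense-A"},
    {"word": "sell", "pb": "sell.01", "wn": "sell-sense-B"},
    {"word": "house", "wn": "house-sense-A"},  # no PropBank predicate: ignored
]

mapping = most_frequent_sense_per_predicate(corpus)
```

Each predicate receives exactly one sense, which matches the observation that the automatic method links every predicate to a single synset.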

Evaluation In the case of PropBank, we build a gold-standard by recovering from SemLink the set of predicates manually connected to WordNet senses. We also built a baseline system which matches the most frequent predicate in the PropBank corpus with the most frequent sense according to WordNet. For instance, in the case of the verb sell, the baseline system matches sell.01 and sell%2:40:00::.

Table 5 presents the precision (P), recall (R) and F1 measure of the different methods and knowledge resources when mapping WordNet to PropBank as in Table 4. It also presents the baseline system results. All the strategies outperform the baseline in terms of F1 measure and, in general, the precision shows that our method generates quite reliable mappings.
Table 5

Results of the disambiguation process when mapping WordNet to PropBank

| Method | LKB | P | R | F1 |
|---|---|---|---|---|
| PropBank | | | | |
| Baseline | | 74.9 | 24.0 | 36.4 |
| UKB | WN | 71.3 | 58.0 | 64.0 |
| UKB | WN + G | 70.7 | 57.2 | 63.2 |
| SSID+ | WN | 67.2 | 54.7 | 60.3 |
| SSID+ | WN + G | 68.3 | 55.3 | 61.1 |

Table 6 shows the number of mappings between WordNet and VerbNet, FrameNet and PropBank in SemLink and the number of mappings obtained by the best configuration of our automatic methods. It also compares the number of new and common mappings obtained automatically with respect to SemLink. In all cases, the number of predicates mapped automatically is higher than the number of predicates mapped in SemLink. Note that our methods only connect each predicate to a single synset of WordNet while SemLink includes several possible links. For example, SemLink takes into account 3137 predicates of PropBank but they add up to 5489 mappings to WordNet. On the other hand, we automatically obtain 4848 links corresponding to exactly the same number of predicates. From these, 2924 automatic mappings are completely new.
Table 6

Links between VerbNet, FrameNet, PropBank predicates and WordNet synsets

| | SemLink | Automatic | Intersection | New |
|---|---|---|---|---|
| VN–WN | 7665 (5255) | 6081 | 4131 | 1950 |
| FN–WN | 4851 (2419) | 3877 | 1842 | 2035 |
| PB–WN | 5489 (3137) | 4848 | 1924 | 2924 |

In parentheses the number of predicates covered by the corresponding set of mappings in SemLink

Both results show the appropriateness of our methodology to obtain mappings between WordNet and FrameNet, VerbNet and PropBank. The automatic methods obtain good results in general and the number of mappings is higher than the number of mappings offered by SemLink.

3.2 Role mappings

In order to infer new role mappings among different predicate schemas, we have defined two methods. Section 3.2.1 presents a three-step process to increase the alignments between VerbNet thematic-roles and FrameNet FEs. Section 3.2.2 explains the corpus-based method used to extend the mappings between FrameNet and PropBank.

3.2.1 FrameNet–VerbNet

This method focuses on obtaining the missing correspondences between the semantic roles from VerbNet and FrameNet. The missing links can belong to verbs already included in SemLink or to the verb senses obtained applying the methods presented in Sect. 3.1. The method comprises three different steps that should be applied consecutively. We have set two alternative configurations:
  • Configuration 1-2-3: it runs Steps 1 to 3 and uses information contained in SemLink;

  • Configuration 2-3: it runs Steps 2 and 3 and is completely independent of SemLink.

Step 1: The first step learns from SemLink which alignments between VerbNet thematic-roles and FrameNet FE names are more frequent independently of the FrameNet frame. For example, Table 7 shows the frequencies of the alignments for the thematic-role Location.
Table 7

Frequencies of the FE names mapped to the thematic-role Location in SemLink

| Thematic-role | FrameElement | Freq. |
|---|---|---|
| Location | Area | 383 |
| Location | Goal | 322 |
| Location | Path | 177 |
| Location | Ground | 78 |
| Location | Sound_source | 76 |
| Location | Fixed_location | 50 |
| Location | Source | 49 |
| Location | Place | 41 |
| Location | Location | 25 |
| Location | Body_part | 21 |

For every verb of VerbNet aligned to a frame of FrameNet, we obtain the thematic-roles that have not been assigned to any FE. Then, we link each of these roles with the FE of that frame that is most frequently aligned to the role in the whole set of frames. For example, the verb paddle of the VerbNet class spank-18.3 is mapped to the frame Corporal_punishment of FrameNet. However, the role Location of this verb is not linked to any FE. The frame Corporal_punishment contains FEs like Agent, Evaluee, Reason, Instrument, Degree and Body_part. According to the data shown in Table 7, Body_part is the FE of the frame Corporal_punishment that is most often mapped to the thematic-role Location. Thus, we map Location to Body_part.

In Table 8 we present this new mapping and some other Location cases obtained by this method.
Table 8

Examples of new FEs mapped to the thematic-role Location

| Lemma | VN-class | Thematic-role | FN-frame | FrameElement |
|---|---|---|---|---|
| Sit | Spatial_configuration-47.6 | Location | Placing | Area |
| Spew | Substance_emission-43.4 | Location | Excreting | Goal |
| Move | Roll-51.3.1 | Location | Change_position_on_a_scale | Path |
| Paddle | Spank-18.3 | Location | Corporal_punishment | Body_part |
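Step 1 can be read as a lookup that restricts the learned frequencies to the FEs of the target frame. The sketch below encodes the Location frequencies reported for SemLink and the FEs listed in the text for Corporal_punishment; the function name `most_frequent_fe` and the behaviour of returning `None` when no FE of the frame has been seen with the role are assumptions of this example.

```python
# Frequencies of FE names aligned to the thematic-role Location, as reported for SemLink.
LOCATION_FREQS = {
    "Area": 383, "Goal": 322, "Path": 177, "Ground": 78, "Sound_source": 76,
    "Fixed_location": 50, "Source": 49, "Place": 41, "Location": 25, "Body_part": 21,
}


def most_frequent_fe(role_freqs, frame_fes):
    """Among the FEs of a frame, return the one most often aligned to the role."""
    candidates = [fe for fe in frame_fes if fe in role_freqs]
    if not candidates:
        return None  # assumption: leave the role unmapped when nothing is known
    return max(candidates, key=role_freqs.get)


# FEs of Corporal_punishment as listed in the text (not necessarily exhaustive).
corporal_punishment_fes = ["Agent", "Evaluee", "Reason", "Instrument", "Degree", "Body_part"]

chosen = most_frequent_fe(LOCATION_FREQS, corporal_punishment_fes)
```

Body_part has the lowest global frequency in the list, yet it is selected because it is the only FE of this frame that has been observed aligned to Location.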

Step 2: For those verbs from VerbNet that are mapped to one particular frame of FrameNet, but none of their thematic-roles are linked to any FE, this step aligns the thematic-roles and the FEs based on pattern frequencies. This step looks into the examples of use contained in VerbNet to acquire patterns of thematic-roles for each class. Given the following sentence:
  • \(\hbox {I}_{Experiencer}\) saw the \(\hbox {play}_{Stimulus}\)

This step obtains the pattern Experiencer-verb-Stimulus for the VerbNet class see-30.1.
The same process is performed looking into the lexicographic annotations of FrameNet to obtain patterns of core FEs for each frame, like in the following example:
  • ... \(\hbox {she}_{Cognizer\_agent}\) felt for \(\hbox {it}_{Sought\_entity}\) with her right hand ...

In this case, the pattern Cognizer_agent-verb-Sought_entity for the frame Seeking is acquired.

Then, when a verb from VerbNet is mapped to a frame of FrameNet, the most frequent thematic-role pattern for the class of the verb is aligned to the most frequent FE pattern for the frame. In this way, the thematic-roles and the FEs that share the same positions are mapped.

For instance, the verb feel of the class see-30.1 is mapped to the frame Seeking, but none of its thematic-roles (Experiencer and Stimulus) are linked to any of the FEs of the frame Seeking. Table 9 presents the pattern frequencies obtained for the class see-30.1 and the frame Seeking. In this particular case, the step just finds examples that follow the pattern Experiencer-verb-Stimulus for the class see-30.1 and two patterns for the frame Seeking.
Table 9

Frequencies of the role patterns in VerbNet class see-30.1 and frame Seeking

| Source | Class/frame | Pattern | Freq. (%) |
|---|---|---|---|
| VerbNet | See-30.1 | Experiencer v Stimulus | 100 |
| FrameNet | Seeking | Cognizer_agent v Sought_entity | 68.6 |
| | | Sought_entity v Cognizer_agent | 31.4 |

After comparing the most frequent ones, the method aligns the thematic-roles and the FEs that share the same positions. According to Table 9, the most frequent pattern for the frame Seeking is Cognizer_agent-verb-Sought_entity. Thus, as Table 10 shows, the method links the thematic-role Experiencer with the FE Cognizer_agent and Stimulus with Sought_entity because they appear in the same relative position with respect to the verb.

Table 10: Examples of new mappings between thematic-roles and FEs of the frame Seeking

| Lemma | VN-class | Thematic-role | FN-frame | FrameElement |
|---|---|---|---|---|
| feel | see-30.1 | Experiencer | Seeking | Cognizer_agent |
| feel | see-30.1 | Stimulus | Seeking | Sought_entity |
| listen | peer-30.3 | Experiencer | Seeking | Cognizer_agent |
| listen | peer-30.3 | Stimulus | Seeking | Sought_entity |
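
The positional alignment of Step 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the pattern counts are toy values chosen to resemble the proportions in Table 9, and the decision to skip patterns of different lengths is an assumption.

```python
from collections import Counter

def most_frequent_pattern(patterns):
    """Return the most frequent role pattern (a tuple of slot labels)."""
    return Counter(patterns).most_common(1)[0][0]

def align_by_position(vn_patterns, fn_patterns):
    """Pair the roles that occupy the same slot in the two top patterns."""
    vn_top = most_frequent_pattern(vn_patterns)
    fn_top = most_frequent_pattern(fn_patterns)
    if len(vn_top) != len(fn_top):
        return []  # assumption: patterns of different length are not aligned
    # The "verb" slot only anchors the positions, so it is skipped.
    return [(vn, fn) for vn, fn in zip(vn_top, fn_top) if vn != "verb"]

# Toy pattern occurrences (invented counts, not the corpus figures).
vn_see = [("Experiencer", "verb", "Stimulus")] * 5
fn_seeking = ([("Cognizer_agent", "verb", "Sought_entity")] * 7
              + [("Sought_entity", "verb", "Cognizer_agent")] * 3)

role_links = align_by_position(vn_see, fn_seeking)
print(role_links)
```

With these toy counts the sketch links Experiencer with Cognizer_agent and Stimulus with Sought_entity, which are the links listed in Table 10.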

Step 3: This last step follows the same strategy as Step 1, but it also includes the role mappings obtained automatically. As Table 11 shows, when we include the automatic links from Steps 1 and 2 (see footnote 11), the frequencies of the mappings between FEs and thematic-roles differ from those obtained in Step 1 (see Table 7).

Table 11: Frequencies of FEs mapped to the thematic-role Location including the automatic links obtained in Step 1 and Step 2

| Thematic-role | FrameElement | Freq. |
|---|---|---|
| Location | Area | 341 |
| Location | Goal | 213 |
| Location | Place | 148 |
| Location | Path | 145 |
| Location | Ground | 111 |
| Location | Source | 83 |
| Location | Sound_source | 78 |
| Location | Location | 71 |
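
As a rough illustration of how such frequencies might be used, the following Python sketch picks, for an unmapped thematic-role, the FE of the target frame that is most frequently linked to that role. The selection rule is an assumption made for this example (Step 1 is defined earlier in the paper and may differ in its details), the function name is hypothetical, and the list of candidate FEs is invented. Only the frequencies are taken from Table 11.

```python
from collections import Counter

# Frequencies of <thematic-role, FE> links, copied from Table 11.
pair_freq = Counter({
    ("Location", "Area"): 341,
    ("Location", "Goal"): 213,
    ("Location", "Place"): 148,
    ("Location", "Path"): 145,
    ("Location", "Ground"): 111,
    ("Location", "Source"): 83,
    ("Location", "Sound_source"): 78,
    ("Location", "Location"): 71,
})

def pick_frame_element(role, frame_elements, freq):
    """Return the FE of the frame most often linked to `role`, or None.

    Assumption: an FE that was never seen with the role is not a candidate.
    """
    seen = [(freq[(role, fe)], fe) for fe in frame_elements if freq[(role, fe)] > 0]
    if not seen:
        return None
    return max(seen)[1]

# Hypothetical frame whose FEs are Theme, Goal and Source.
choice = pick_frame_element("Location", ["Theme", "Goal", "Source"], pair_freq)
print(choice)
```

Because the counts are recomputed after every step, adding automatic links changes which FE wins, which is why Step 3 can behave differently from Step 1 on the same input.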

Evaluation For this evaluation, we use the existing 6934 SemLink role alignments between FrameNet and VerbNet as the testing set. The evaluation process is the same as the one used for the lexical mappings (cf. Sect. 3.1). For each role mapping we apply a leave-one-out process: we learn the frequencies from the whole of SemLink except the mapping being evaluated. This allows us to use the full set of role mappings from SemLink as a gold-standard. With this process, we evaluate the method in two different configurations: Configuration 1-2-3 and Configuration 2-3. We also compare both configurations with a baseline system. For each verb, the baseline matches the most frequent thematic-role in the VerbNet usage examples with the most frequent FE in the FrameNet lexicographic annotations.

Table 12 contains the number of alignments obtained when executing Configuration 1-2-3. The table shows how each step increases the number of cases covered by the previous steps. It also includes the individual evaluation of each step and the results of the baseline system. As can be seen, the three steps outperform the baseline by more than 30 points of F1.

Table 12: Number of new role alignments and performance when executing Configuration 1-2-3

| Method | New | Total | P | R | F1 |
|---|---|---|---|---|---|
| SemLink | | 6934 | | | |
| Baseline | | 10,189 | 39.9 | 21.6 | 28.0 |
| Step 1 | 4611 | 11,545 | 89.0 | 88.3 | 88.6 |
| Step 2 | 407 | 11,952 | 72.3 | 49.0 | 58.4 |
| Step 3 | 523 | 12,475 | 81.7 | 81.1 | 81.4 |
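
The F1 column is the harmonic mean of precision and recall. The short Python check below recomputes it from the P and R values of Table 12. The function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (P, R) pairs copied from Table 12.
table_12 = {
    "Baseline": (39.9, 21.6),
    "Step 1": (89.0, 88.3),
    "Step 2": (72.3, 49.0),
    "Step 3": (81.7, 81.1),
}

f1_by_method = {m: round(f1_score(p, r), 1) for m, (p, r) in table_12.items()}
for method, f1 in f1_by_method.items():
    print(f"{method}: F1 = {f1}")

# Gap between the weakest step and the baseline, in F1 points.
gap = round(min(f1_by_method[m] for m in ("Step 1", "Step 2", "Step 3"))
            - f1_by_method["Baseline"], 1)
print("smallest gap over baseline:", gap)
```

The recomputed values agree with the F1 column, and the smallest gap (Step 2 against the baseline) is 30.4 points.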

The results show that the majority of the new mappings are obtained by Step 1. Steps 2 and 3 are less productive when exploiting SemLink frequencies.

Table 13 presents the results when executing Configuration 2-3. This configuration does not require any manual mapping, so it provides a fully automatic set of new mappings. The final number of mappings is similar to that obtained with Configuration 1-2-3 (see Table 12). In this case, the baseline is also clearly outperformed.

Table 13: Number of new role alignments and performance when executing Configuration 2-3

| Method | New | Total | P | R | F1 |
|---|---|---|---|---|---|
| SemLink | | 6934 | | | |
| Baseline | | 10,189 | 39.9 | 21.6 | 28.0 |
| Step 2 | 7132 | 7132 | 72.3 | 49.0 | 58.4 |
| Step 3 | 4137 | 11,269 | 63.9 | 62.0 | 62.9 |

As expected, according to the evaluations shown in Tables 12 and 13, the most reliable set of mappings is obtained when using previous manual information, that is, with Configuration 1-2-3. The influence of this knowledge is most evident when comparing the results of Step 1 and Step 3. Note that these two steps are fundamentally the same, but they work with different sets of role-mapping frequencies. The frequencies used in Step 1 are learned directly from SemLink, while in Step 3 they are calculated after adding the new mappings discovered by the previous steps, which introduces some noise into the process. In our second configuration, the frequencies for Step 3 are obtained without taking SemLink into account. For that reason, the results in this case are lower.

3.2.2 FrameNet–PropBank

To obtain the role mappings between PropBank and FrameNet, our method acquires the most common correspondences between the annotations of both resources over the same sentences (cf. Fig. 4). The idea is to first obtain a corpus with gold FrameNet annotations and automatic PropBank annotations, and a corpus with gold PropBank annotations and automatic FrameNet annotations. Then, we cross the annotations on both corpora to collect the coincidences. In this way, we obtain pairs <PB-predicate-argument, FN-frame-frame-element> when the filler of a PropBank argument matches the filler of a FrameNet FE or vice versa.

Fig. 4: Example of matching annotations of FrameNet (FN) and PropBank (PB)
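
The crossing of the two annotation layers can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code: the data layout, the function name and the toy sentence are invented for illustration, and requiring an exact span match on the same predicate token is an assumption about what counts as a coincidence.

```python
from collections import Counter

def cross_annotations(sentences):
    """Count <PB predicate-argument, FN frame-FE> pairs whose fillers coincide.

    Each sentence is a dict with two lists of annotations:
      "pb": (predicate_token, roleset, argument, filler_span)
      "fn": (target_token, frame, frame_element, filler_span)
    """
    pair_counts = Counter()
    for sentence in sentences:
        for pb_token, roleset, argument, pb_span in sentence["pb"]:
            for fn_token, frame, element, fn_span in sentence["fn"]:
                # Assumption: both layers annotate the same predicate token
                # and the two fillers cover exactly the same tokens.
                if pb_token == fn_token and pb_span == fn_span:
                    pair_counts[((roleset, argument), (frame, element))] += 1
    return pair_counts

# Toy sentence (invented): "The shop retails the shoes for 50 dollars"
toy_corpus = [{
    "pb": [(2, "retail.01", "arg1", (3, 4)),
           (2, "retail.01", "arg3", (5, 7))],
    "fn": [(2, "Commerce_sell", "Goods", (3, 4)),
           (2, "Commerce_sell", "Money", (6, 7))],  # different span: no match
}]

counts = cross_annotations(toy_corpus)
print(counts)
```

In this toy sentence only the arg1 and Goods fillers coincide, so a single pair is counted. A looser matching criterion (for example, matching on the syntactic head of the filler) would also count the second pair.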

To ensure a fully reliable annotation, we exploit existing manually annotated FrameNet and PropBank corpora. The FrameNet corpus can be divided into two sets. On the one hand, FrameNet version 1.3 includes 168,519 sample sentences for 64 % of the LUs. On the other hand, it contains continuous text annotations for 99 documents from different sources such as WikiNews or the American National Corpus. In the PropBank corpus, the syntactic trees of the Penn Treebank Wall Street Journal data are enriched with PropBank predicate-argument relations. In this work, we use a subset of 500 documents distributed by the CoNLL shared-task.

To automatically obtain the corresponding counterparts of the data presented above, we use two available tools that offer state-of-the-art SRL results for FrameNet and PropBank. For the FrameNet-based annotations we use SEMAFOR (Chen et al. 2010). The parser provides both frame and FE identification with an overall performance of 62.76 % precision and 41.89 % recall. The SEMAFOR package includes a modified version of the MST Parser (McDonald et al. 2005) to obtain the required syntactic dependencies. The PropBank-based annotation is produced with the mate-tools (Bohnet 2010), a complete multilingual NLP pipeline that includes a highly accurate SRL module, which obtains 79.29 % F1 when labeling arguments (Björkelund and Hafdell 2009; see footnote 14).

In this way, we obtain one corpus with manual FrameNet annotations and predicted PropBank annotations using mate-tools. Similarly, we also generate another corpus with manual PropBank annotations and predicted FrameNet annotations using SEMAFOR. We cross both annotations and then we follow two different strategies to obtain different sets of mappings.

The first strategy filters out the cases we consider too infrequent by requiring more than T cases per pair <PB-predicate-argument, FN-frame-frame-element>. We apply different values of T, obtaining different sets of mappings. Finally, we select the most common ones for each predicate. For example, for the predicate retail.01 we find that \(\hbox {arg}_1\) and \(\hbox {arg}_3\) most frequently match the FEs Goods and Money of the frame Commerce_sell, respectively. However, following this strategy the arguments \(\hbox {arg}_1\) and \(\hbox {arg}_3\) of retail.01 could also be assigned to other FEs of other frames, as long as they exceed the threshold T.
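
The threshold filter of the first strategy can be illustrated with a small Python sketch. The counts below are invented toy values (only the retail.01 to Commerce_sell pairs are mentioned in the text), the function name is hypothetical, and the subsequent per-predicate selection step is not modeled.

```python
def filter_by_threshold(pair_counts, threshold):
    """Keep the pairs that were observed more than `threshold` times."""
    return {pair: n for pair, n in pair_counts.items() if n > threshold}

# Toy counts of <PB predicate-argument, FN frame-FE> coincidences (invented).
toy_counts = {
    (("retail.01", "arg1"), ("Commerce_sell", "Goods")): 9,
    (("retail.01", "arg3"), ("Commerce_sell", "Money")): 6,
    (("retail.01", "arg1"), ("Commerce_buy", "Goods")): 2,
}

kept_per_threshold = {t: len(filter_by_threshold(toy_counts, t)) for t in (0, 1, 4, 7)}
print(kept_per_threshold)
```

Note that with a low threshold the same argument (here arg1) keeps links to FEs of more than one frame, which is the behaviour described above.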

The second strategy selects for each PropBank argument only its most frequent mapping to a FrameNet FE. We first calculate the most common coincidences between PropBank predicates and FrameNet frames. Then, for each predicate we establish a mapping with only one frame. After that, we obtain the most frequent <PB-predicate-argument, FN-frame-frame-element> pair that fits that mapping. As a result, for each argument of each predicate we gather a single mapping with an FE. For example, with this strategy we also map the arguments \(\hbox {arg}_1\) and \(\hbox {arg}_3\) of the predicate retail.01 to the FEs Goods and Money of the frame Commerce_sell, respectively, but unlike the previous strategy, no more mappings can be produced for these arguments.
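
A possible reading of the Only-one strategy is sketched below in Python. It is a sketch under assumptions rather than the authors' implementation: the function name and the toy counts are invented, and scoring a predicate-frame pair by summing the counts of its argument-FE pairs is an assumption about how the "most common coincidences" are computed.

```python
from collections import Counter, defaultdict

def only_one(pair_counts):
    """Return one (frame, FE) per PropBank argument.

    1. Choose a single frame per predicate (the one with most coincidences).
    2. Within that frame, keep the most frequent FE for each argument.
    """
    frame_counts = defaultdict(Counter)
    for ((predicate, _arg), (frame, _fe)), n in pair_counts.items():
        frame_counts[predicate][frame] += n  # assumption: sum over arguments
    best_frame = {p: c.most_common(1)[0][0] for p, c in frame_counts.items()}

    best = {}
    for ((predicate, arg), (frame, fe)), n in pair_counts.items():
        if frame != best_frame[predicate]:
            continue  # pairs outside the chosen frame are discarded
        key = (predicate, arg)
        if key not in best or n > best[key][1]:
            best[key] = (fe, n)
    return {key: (best_frame[key[0]], fe) for key, (fe, _n) in best.items()}

# Toy counts (invented for illustration).
toy_counts = {
    (("retail.01", "arg1"), ("Commerce_sell", "Goods")): 9,
    (("retail.01", "arg3"), ("Commerce_sell", "Money")): 6,
    (("retail.01", "arg1"), ("Commerce_buy", "Goods")): 2,
    (("retail.01", "arg3"), ("Commerce_sell", "Goods")): 1,
}

mapping = only_one(toy_counts)
print(mapping)
```

Each argument ends up with exactly one FE of exactly one frame, so no additional mappings can be produced for it regardless of how often the discarded pairs occur.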

Note that following these cross-annotation strategies, we generate mappings between predicates and roles at the same time because the pairs obtained crossing the annotations contain both types of information. The previous example, <retail.01-\(\textit{arg}_1\), Commerce_sell-Goods>, contains a relation between the predicate retail.01 and the frame Commerce_sell and also a relation between the \(\hbox {arg}_1\) of that predicate and the FE Goods of that frame.

Evaluation We perform two different evaluations. On the one hand, we evaluate the mappings between PropBank predicates and FrameNet frames. For this, we use the set of 2562 manual mappings of SemLink. On the other hand, we evaluate the mappings between arguments and FEs. Similarly, we use as the testing set the 4394 mappings existing in SemLink. We have implemented a baseline that matches the most frequent <predicate-argument> pair in the manual PropBank annotation with the most frequent <frame-frame-element> pair in the manual FrameNet annotations.

Table 14 contains the performance of both strategies and the baseline. For the first strategy we provide the evaluation with different values of the threshold T. The performance of our second strategy is shown in the Only-one row. The results show that the baseline is outperformed in terms of F1 except when we map predicates using our first strategy with a threshold equal to 7. According to Table 14, our second strategy provides the automatic mappings with the highest precision, both for predicates and roles. As expected, the best recall is obtained by our first strategy with the lowest threshold values, especially \(\mathbf T =0\), because they are the least restrictive settings.

Table 14: Results of the evaluation of the cross-annotation process

| Method | Predicates P | Predicates R | Predicates F1 | Roles P | Roles R | Roles F1 |
|---|---|---|---|---|---|---|
| Baseline | 78.2 | 47.8 | 59.3 | 13.3 | 22.5 | 16.7 |
| \(\mathbf T =0\) | 76.4 | 71.3 | 73.8 | 60.7 | 51.5 | 55.7 |
| \(\mathbf T =1\) | 81.3 | 64.4 | 71.9 | 65.5 | 47.1 | 54.8 |
| \(\mathbf T =4\) | 85.4 | 52.8 | 65.3 | 70.7 | 38.6 | 49.9 |
| \(\mathbf T =7\) | 86.9 | 44.7 | 59.0 | 73.2 | 31.8 | 44.4 |
| Only-one | 89.8 | 52.4 | 66.2 | 75.0 | 41.2 | 53.2 |

Table 15 shows, for different values of T, the number of mappings obtained from the first cross-annotation strategy and the number of mappings given by our second strategy (Only-one). The table presents, in the New columns, how many of these automatic mappings are new. Note that our method obtains mappings for the core arguments of PropBank (\(\textit{arg}_0\), \(\textit{arg}_1\),...) and also for non-core arguments like \(\textit{arg}_{loc}\) or \(\textit{arg}_{tmp}\). The latter are not considered by SemLink. The last column (Core) presents the number of new mappings involving core arguments.

Table 15: Number of mappings obtained with different values of T compared to SemLink

| Method | Predicates Total | Predicates New | Roles Total | Roles New | Roles Core |
|---|---|---|---|---|---|
| SemLink | 2562 | | 4394 | | 4394 |
| \(\mathbf T =0\) | 3865 | 2038 | 13,582 | 11,321 | 6095 |
| \(\mathbf T =1\) | 3061 | 1411 | 8892 | 6820 | 4282 |
| \(\mathbf T =4\) | 2255 | 901 | 5156 | 3462 | 2679 |
| \(\mathbf T =7\) | 1845 | 701 | 3667 | 2268 | 1941 |
| Only-one | 2584 | 1242 | 9820 | 8011 | 4117 |

As can be seen, both strategies obtain a substantial number of new accurate mappings for predicates and roles. As expected, our first strategy with \(\mathbf T =0\) is the configuration that provides the highest number of mappings. However, the Only-one method, the configuration with the highest precision, also obtains a remarkably high number of mappings.

4 Predicate Matrix

In Sect. 3 we have presented a set of methods and techniques to automatically integrate different knowledge bases that contain predicate and role information. As shown, it is possible to build a new full resource, similar to SemLink, starting from scratch. Obviously, another strategy is to complete and extend the existing manual mappings provided by SemLink. Following the latter approach, we present the Predicate Matrix, a new lexical-semantic resource that integrates SemLink and the new set of mappings obtained by the automatic methods presented in this paper.

As the methods presented in Sect. 3 can be applied in different ways, we select the settings we consider the most appropriate to generate the Predicate Matrix. In most cases, we prioritize precision over recall. That is, we give preference to more reliable sets even if they are smaller. Table 16 presents the settings used to obtain the automatic mappings (see footnote 15).
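
Table 16: Settings used to obtain the automatic mappings to build the Predicate Matrix 1.2

| Level | Resources | Setting |
|---|---|---|
| Lexical entries | VN–WN | UKB, WN + G |
| Lexical entries | FN–WN | SSID+, WN + G |
| Lexical entries | PB–WN | UKB, WN |
| Roles | VN–FN | Steps 1–3 |
| Roles | PB–FN | Only-one strategy |
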
Once the mappings are automatically obtained, we integrate them with those existing in SemLink to build the Predicate Matrix 1.2. Each row of the Predicate Matrix represents the mapping of a role over the different resources and includes all the aligned knowledge about its corresponding WordNet verb sense. Tables 17 and 18 present the number of mappings contained in SemLink and in the Predicate Matrix 1.2. Table 17 presents the differences in terms of mappings between lexicons and Table 18 the differences between roles. Although both sets of mappings overlap in many cases, the Predicate Matrix clearly outnumbers the set of original mappings in SemLink. First, it provides more verb alignments between VerbNet and FrameNet (from 3709 to 5462 in Table 17). Second, it also enlarges the WordNet verb sense alignments (from 7665 to 10,832 VerbNet verb senses and from 4851 to 8583 FrameNet verb senses in Table 17). Third, the new version of the Predicate Matrix doubles the role alignments between VerbNet and FrameNet (from 6934 to 14,258 in Table 18) and more than triples the number of role alignments between PropBank and FrameNet (from 4384 to 14,195 in Table 18).

Table 17: Differences between SemLink and the Predicate Matrix: mappings between lexicons

| Resource | WN–VN | WN–PB | WN–FN | VN–PB | VN–FN | PB–FN |
|---|---|---|---|---|---|---|
| SemLink | 7665 | 5489 | 4851 | 4503 | 3709 | 2562 |
| Predicate Matrix | 10,832 | 9516 | 8583 | 4947 | 5462 | 4163 |

Table 18: Differences between SemLink and the Predicate Matrix: mappings between roles

| Resource | PB–VN | FN–VN | FN–PB |
|---|---|---|---|
| SemLink | 9950 | 6934 | 4384 |
| Predicate Matrix | 11,749 | 14,258 | 14,195 |
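
The growth of the role mappings can be checked directly from the figures in Table 18. The following Python lines are a simple arithmetic check, with illustrative variable names.

```python
# Number of role mappings per resource pair, copied from Table 18.
semlink = {"PB-VN": 9950, "FN-VN": 6934, "FN-PB": 4384}
predicate_matrix = {"PB-VN": 11749, "FN-VN": 14258, "FN-PB": 14195}

def growth_factor(before, after):
    """Ratio between the new and the original number of mappings."""
    return after / before

growth = {pair: round(growth_factor(semlink[pair], predicate_matrix[pair]), 2)
          for pair in semlink}
print(growth)
```

The FN–VN mappings grow by a factor of about 2.06 and the FN–PB mappings by a factor of about 3.24, while the PB–VN mappings, for which no direct method is proposed, grow by about 1.18.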

As we explain in Sect. 3, we do not propose any method to map PropBank and VerbNet directly. However, we obtain some new mappings between both resources indirectly. For example, PropBank predicates and VerbNet verbs that are not linked to each other may obtain mappings to the same LU of FrameNet. Note that the sets of mappings to FrameNet are greatly enlarged by our methods, both for predicates and roles. Recall that FrameNet is the resource with the poorest coverage in SemLink. Especially remarkable is the new set of mappings between PropBank and FrameNet. Unlike SemLink, we provide direct links between those resources. The Predicate Matrix also includes mappings for modifiers (non-core arguments) of PropBank, resulting in connections between roles that describe time, location, manner, etc. Moreover, the mappings comprising just core arguments are greatly extended. For example, of the 14,195 mappings between PropBank and FrameNet roles in the Predicate Matrix, 10,320 correspond to core arguments. This figure is more than twice the 4384 mappings existing in SemLink.

As an example of the current coverage of the Predicate Matrix v1.2, Table 19 shows the distribution of the verbal synsets covered by FrameNet frames, PropBank predicates and VerbNet classes according to the lexicographic files of WordNet. From left to right, the table shows the lexicographic file number, the number of synsets pertaining to the lexicographic file, the number (and percentage) of synsets aligned to at least one FrameNet frame, the number (and percentage) of synsets aligned to at least one PropBank predicate, the number (and percentage) of synsets aligned to at least one VerbNet class and the lexicographic file name. Although the current coverage of the Predicate Matrix is larger than the one provided by SemLink, its coverage with respect to WordNet is still quite modest. For instance, less than 29 % of the WordNet verbal synsets are aligned to at least one FrameNet frame, around 39 % of the synsets have one or more PropBank predicates assigned and around 42 % of them are mapped to at least one VerbNet class. Interestingly, emotion is the lexicographic file with the highest proportion of synsets covered by all three resources, while competition is the one with the lowest.

Table 19: WordNet verbal synsets covered by FrameNet frames, PropBank predicates and VerbNet classes in the Predicate Matrix v1.2. LF: Lexicographic file

| LF | WN synsets | In FN (%) | In PB (%) | In VN (%) | LF name |
|---|---|---|---|---|---|
| 29 | 547 | 189 (34.55) | 225 (41.13) | 264 (48.26) | Body |
| 30 | 2383 | 520 (21.82) | 790 (33.15) | 910 (38.19) | Change |
| 31 | 695 | 196 (28.20) | 272 (39.14) | 266 (38.27) | Cognition |
| 32 | 1548 | 535 (34.56) | 647 (41.80) | 657 (42.44) | Communication |
| 33 | 459 | 93 (20.26) | 133 (28.98) | 131 (28.54) | Competition |
| 34 | 243 | 79 (32.51) | 98 (40.33) | 107 (44.03) | Consumption |
| 35 | 2196 | 639 (29.10) | 879 (40.03) | 1035 (47.13) | Contact |
| 36 | 694 | 157 (22.62) | 248 (35.73) | 255 (36.74) | Creation |
| 37 | 343 | 158 (46.06) | 207 (60.35) | 219 (63.85) | Emotion |
| 38 | 1408 | 474 (33.66) | 551 (39.13) | 626 (44.46) | Motion |
| 39 | 461 | 167 (36.23) | 202 (43.82) | 227 (49.24) | Perception |
| 40 | 847 | 192 (22.67) | 332 (39.20) | 305 (36.01) | Possession |
| 41 | 1106 | 291 (26.31) | 425 (38.43) | 382 (34.54) | Social |
| 42 | 756 | 230 (30.42) | 312 (41.27) | 301 (39.81) | Stative |
| 43 | 81 | 36 (44.44) | 34 (41.98) | 43 (53.09) | Weather |
| Total | 13,767 | 3956 (28.74) | 5355 (38.90) | 5728 (41.61) | |

4.1 Use of the Predicate Matrix in NewsReader

The Predicate Matrix is currently part of a multilingual event detection system implemented within the NewsReader project (Vossen et al. 2014). The event detection system is a pipeline that contains a set of modules performing various NLP tasks. Among others, the system has an SRL module that automatically annotates semantic information based on PropBank. Thanks to the Predicate Matrix, it also obtains the equivalent annotations in FrameNet and VerbNet.
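
The projection of a PropBank annotation onto the other resources amounts to a lookup in the Predicate Matrix. The Python sketch below illustrates the idea at the lexical level only. It is not the NewsReader implementation: the in-memory row format and the function names are assumptions made for this example, and the two rows reuse the values shown in Table 21.

```python
from collections import defaultdict

# Two Predicate Matrix rows, restricted to the lexical columns of Table 21.
ROWS = [
    {"VN_LEMMA": "stutter", "VN_CLASS": "37.3", "WN_SENSE": "stutter%2:32:00",
     "FN_FRAME": "Communication_manner", "PB_ROLESET": "stutter.01"},
    {"VN_LEMMA": "squeak", "VN_CLASS": "37.3", "WN_SENSE": "squeak%2:39:00",
     "FN_FRAME": "Communication_noise", "PB_ROLESET": "squeak.01"},
]

def build_index(rows):
    """Group the FrameNet frames and VerbNet classes by PropBank roleset."""
    index = defaultdict(lambda: {"FN_FRAME": set(), "VN_CLASS": set()})
    for row in rows:
        entry = index[row["PB_ROLESET"]]
        entry["FN_FRAME"].add(row["FN_FRAME"])
        entry["VN_CLASS"].add(row["VN_CLASS"])
    return dict(index)  # plain dict: unknown rolesets are not auto-created

def project(roleset, index):
    """Return the equivalent FrameNet/VerbNet labels, or None if unmapped."""
    return index.get(roleset)

INDEX = build_index(ROWS)
print(project("stutter.01", INDEX))
```

An SRL output labeled with stutter.01 would thus also receive the frame Communication_manner and the VerbNet class 37.3, while predicates without a row in the matrix remain unprojected.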

Furthermore, in NewsReader the events and their participants are also aligned to the Event and Situation Ontology (ESO) (Segers et al. 2015). ESO is a formalization of the pre-conditions and post-conditions of events and roles, and it reuses existing resources such as WordNet, SUMO and FrameNet. The Predicate Matrix makes it possible to obtain this representation even when the event extraction is based on other resources such as PropBank.

Table 20: Results of evaluating the best method to map WordNet and FrameNet (SSID+ & WN + G) using as gold-standard SemLink and the new mappings from NewsReader

| Testing-set | P | R | F1 |
|---|---|---|---|
| SemLink | 82.9 | 81.8 | 82.4 |
| NewsReader | 87.7 | 86.4 | 87.0 |

The development of ESO provides an additional benefit for the Predicate Matrix. As mentioned above, ESO arises from merging different resources, specifically WordNet and FrameNet. As a result, new manual mappings between these two resources have been created that are not included in SemLink. We use these new mappings for an additional evaluation of some of the mappings obtained by our methods. We have evaluated the best method (SSID+ & WN + G) to map the lexicons of WordNet and FrameNet using the new mappings from NewsReader. Table 20 shows the results of this evaluation and of the previous evaluation using SemLink (Table 4). Although the new testing set is small (it contains just 66 mappings), the results show that our strategy is consistent even for cases that are not included in SemLink.

4.2 Using WordNet to extend the Predicate Matrix

The mappings offered in the Predicate Matrix are also aligned to WordNet. Thus, it is possible to use WordNet to enlarge the knowledge included in the Predicate Matrix by applying at least two different strategies. The first is to include semantic knowledge from the Multilingual Central Repository (MCR) (Gonzalez-Agirre et al. 2012). The MCR relates senses of WordNets in different languages through the InterLingual Index (ILI). The MCR also connects the ILIs with several ontologies and external references such as Adimen-SUMO (Álvez et al. 2012), the new WordNet domains (González-Agirre et al. 2012) and the Base Level Concepts (BLC) (Izquierdo et al. 2007). Once a predicate is mapped to a WordNet synset, we also include these features in the Predicate Matrix. Table 21 presents two examples of this new information. The verbs stutter and squeak of the VerbNet class 37.3 are mapped to the WordNet senses stutter%2:32:00 and squeak%2:39:00. In the MCR, the sense stutter%2:32:00 is aligned to the ILI ili-30-00981544-v, which belongs to the SUMO class Communication and to the domain factotum, and whose BLC is the sense speak%2:32:00. Likewise, the sense squeak%2:39:00 is aligned to the ILI ili-30-02171664-v, which belongs to the SUMO class SoundAttribute and to the domain factotum, and whose BLC is the sense sound%2:39:00.

Table 21: MCR knowledge projected to the PM

| Field | Example 1 | Example 2 |
|---|---|---|
| VN_LEMMA | stutter | squeak |
| VN_CLASS | 37.3 | 37.3 |
| WN_SENSE | stutter%2:32:00 | squeak%2:39:00 |
| FN_FRAME | Communication_manner | Communication_noise |
| PB_ROLESET | stutter.01 | squeak.01 |
| MCR_iliOffset | ili-30-00981544-v | ili-30-02171664-v |
| MCR_SUMO | Communication | SoundAttribute |
| MCR_Domain | factotum | factotum |
| WN_BLC | speak%2:32:00 | sound%2:39:00 |

Second, we can also easily exploit WordNet to extend the coverage of the lexicons of the resources included in the Predicate Matrix. A straightforward method to extend the predicate information to additional lexical entries is to project the Predicate Matrix information associated with a particular WordNet sense to its synonyms. Note that this method assumes that WordNet synonyms share the same predicate information. For instance, the predicate \(\textit{desert}_v\), member of the VerbNet class leave-51.2-1, is assigned to the WordNet verbal sense desert%2:31:00. In WordNet, this word sense has three synonyms: \(\textit{abandon}_v\), \(\textit{forsake}_v\) and \(\textit{desolate}_v\). According to the previous assumption, these three verbs can also be aligned to the same VerbNet class. Table 22 presents two examples.

Table 22: New WordNet senses aligned to VerbNet

| VerbNet | WordNet | New |
|---|---|---|
| leave-51.2-1 | desert%2:31:00 | abandon%2:31:00::, forsake%2:31:00::, desolate%2:31:00:: |
| remove-10.1 | retract%2:32:00 | abjure%2:32:00::, recant%2:32:00::, forswear%2:32:00::, resile%2:32:00:: |

As shown, this method assumes that synonyms belong to the same VerbNet class and FrameNet frame. However, as shown in López de Lacalle et al. (2014b), this is not always the case. Consider the WordNet synset <understand, read, interpret, translate> with the gloss “make sense of a language” and the example sentences “She understands French; Can you read Greek?”. As synonyms, these verbs denote the same concept and are interchangeable in many contexts. However, in SemLink read%2:31:04 is aligned with the VerbNet class learn-14-1, while one of its synonyms, understand%2:31:03, is aligned with the VerbNet class comprehend-87.2. Moreover, the thematic-roles of both classes are different. The class learn-14-1 has the thematic-roles Agent (with semantic type [+animate]), Topic and Source, while comprehend-87.2 has Experiencer (with semantic type [+animate or +organization]), Attribute and Stimulus. Are both sets of thematic-roles compatible? Complementary? Is one of them incorrect? Should we join them? Or is the alignment, or the synset definition in WordNet, incorrect?

In order to study the effect of these phenomena, we conducted a simple experiment using as gold-standard those synsets in SemLink that have more than one lemma aligned to VerbNet or FrameNet. For each lemma in the gold-standard, we remove all its mappings and apply the synonymy strategy. Then, we compare the new assignments with the original ones in the gold-standard.

Table 23 presents the results of extending the Predicate Matrix by exploiting the synonymy information from WordNet. The threshold T establishes that a lemma of a synset is included as a new lexical member of a class (or a frame) only if more than T lemmas of that synset are assigned to that class (or frame) in the Predicate Matrix. For example, desert%2:31:00 belongs to the class leave-51.2-1 in the Predicate Matrix, but none of its synonyms appears in that class. If we set \(\mathbf T =0\), the verbs \(\textit{abandon}_v\), \(\textit{forsake}_v\) and \(\textit{desolate}_v\) would be included as new members of the class leave-51.2-1. For this synset, using \(\mathbf T =1\) we would not project any information to the rest of the synonyms. As expected, increasing the threshold reduces the number of synonym projections while increasing the precision.

Table 23: Results of extending the lexicon of VerbNet and FrameNet with different thresholds T

| Threshold | VerbNet P | VerbNet new members | FrameNet P | FrameNet new LUs |
|---|---|---|---|---|
| \(\mathbf T =0\) | 32.6 | 12,186 | 32.2 | 11,254 |
| \(\mathbf T =1\) | 59.6 | 4158 | 62.4 | 3680 |
| \(\mathbf T =2\) | 74.8 | 1988 | 72.4 | 1834 |
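
The synonym projection with a threshold T can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the function name and the data layout (a list of lemmas per synset and a dictionary from lemma to assigned classes) are assumptions, and the example reuses the desert synset discussed above.

```python
from collections import Counter

def project_to_synonyms(synset_lemmas, assignments, threshold):
    """Propose new class members among the lemmas of one synset.

    A class is projected to the remaining lemmas only if more than
    `threshold` lemmas of the synset are already assigned to it.
    Returns {lemma: set of newly proposed classes}.
    """
    support = Counter()
    for lemma in synset_lemmas:
        for cls in assignments.get(lemma, ()):
            support[cls] += 1

    proposals = {}
    for cls, n_lemmas in support.items():
        if n_lemmas > threshold:
            for lemma in synset_lemmas:
                if cls not in assignments.get(lemma, ()):
                    proposals.setdefault(lemma, set()).add(cls)
    return proposals

synset = ["desert", "abandon", "forsake", "desolate"]
known = {"desert": {"leave-51.2-1"}}

with_t0 = project_to_synonyms(synset, known, threshold=0)
with_t1 = project_to_synonyms(synset, known, threshold=1)
print(with_t0)
print(with_t1)
```

With T = 0 the three synonyms are proposed as new members of leave-51.2-1, whereas with T = 1 nothing is projected because only one lemma of the synset supports the class.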

The results in Table 23 show that this method can obtain quite reliable new lexical members depending on the threshold. Surprisingly, the predicate information assigned to different WordNet synonyms seems to be quite inconsistent. We expected verbal synonyms to share their predicate information, but this is not so in the majority of cases: with \(\mathbf T =0\), only about a third of the projected assignments agree with the gold-standard. Thus, contrary to previous versions of the Predicate Matrix, for version 1.2 we decided not to include this extension. As future work, we will investigate possible ways to obtain more consistent projections exploiting the synonymy information as well as other semantic relations included in WordNet.

5 Conclusions and future work

Building large and rich predicate models takes a great deal of expensive manual effort. Furthermore, the same effort should be invested for each different language. Predicate resources such as VerbNet, FrameNet, PropBank and WordNet offer individually interesting characteristics not provided by their alternatives. Unfortunately, these semantic resources are developed independently and they are not fully integrated in a common platform. Thus, a common semantic framework would allow the interoperability among all these resources.

One of the few projects working on the integration of predicate information is SemLink (Palmer 2009). It is an interesting approach, but it has some limitations. First, the mappings have been developed manually, a very costly process that is also not systematic. Second, its coverage is still far from complete.

Our work focuses on defining automatic methods for mapping in a systematic way different semantic resources containing predicate information. The aim is to offer a more complete semantic interoperability between these resources. For that, we have worked on the integration of predicate information at lexical and role levels.

Our lexical mappings are centralized through WordNet in order to offer a wider coverage. For that, we apply graph-based algorithms in three different scenarios: (a) mappings between WordNet and VerbNet lexicons; (b) mappings between WordNet and FrameNet lexicons; and (c) mappings between WordNet and PropBank lexicons. Regarding the role mappings, we propose two methods to infer new role mappings among the different predicate schemas. First, we have defined a three-step method to increase the alignments between VerbNet thematic-roles and FrameNet FEs. Second, we also present a corpus-based method to extend the mappings between FrameNet and PropBank.

We have also developed the Predicate Matrix, a new lexical-semantic resource resulting from the union of the mappings obtained by our automatic methods and SemLink. Although the sets of mappings obtained by the Predicate Matrix and SemLink overlap in many cases, our approach obtains a wider coverage compared to the original set of mappings in SemLink.

The Predicate Matrix is currently part of the NewsReader pipelines, providing interoperable semantic annotations between SRL outputs based on PropBank and FrameNet.

With the new Predicate Matrix, we expect to provide a more robust interoperable verbal lexicon. As future work, we plan to discover and solve inherent inconsistencies among the integrated resources. For this, we will consider all the sources of mappings available besides SemLink, such as the mappings to VerbNet included in PropBank. Moreover, we plan to extend the coverage of the current predicate resources, for example by using the nominal WordNet morphosemantic links to verbal concepts, by exploiting FrameNet relations between FEs, or by applying the current methods to new resources such as FrameNet+ (Pavlick et al. 2015). We also plan to enrich WordNet with predicate information, and possibly to extend predicate information to languages other than English (by exploiting the local wordnets aligned to the English WordNet). In addition, we are studying new ways to integrate predicate information from other language resources, for instance from the Ancora Spanish corpus and lexicon (Taulé et al. 2008). Finally, we want to perform some experiments to evaluate the Predicate Matrix indirectly. Our idea is to apply the new mappings across resources in order to obtain improvements in real NLP tasks such as SRL. This kind of evaluation will allow us to compare the automatic mappings against the manual ones of SemLink.

Footnotes
8. We obtain better results combining all sources of contexts than exploiting them separately.

10. The overall performance of the SRL module is 85.50 % F1.

11. As explained before, to discover new alignments, it is possible to start from Step 1 or Step 2.

14. The overall performance of the SRL module is 85.50 % F1.

15. Note that the Only-one strategy applied to obtain mappings between PropBank and FrameNet deals with lexical and role mappings.

Acknowledgments

This work has been partially funded by TUNER (TIN2015-65308-C5-1-R), NewsReader (FP7-ICT-2011-8-316404), as well as the READERS project with the financial support of MINECO, ANR (convention ANR-12-CHRI-0004-03) and EPSRC (EP/K017845/1) in the framework of ERA-NET CHIST-ERA (UE FP7/2007-2013).

Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Maddalen Lopez de Lacalle (1)
  • Egoitz Laparra (1)
  • Itziar Aldabe (1)
  • German Rigau (1)
  1. University of the Basque Country UPV/EHU, Donostia, Spain
