Abstract
In the following, we present the experimental setup for evaluating the performance of our proposed discourse-aware TS approach with regard to the two subtasks of
- (1) splitting and rephrasing syntactically complex input sentences into a set of minimal propositions (Hypotheses 1.1, 1.2 and 1.3), and
- (2) setting up a semantic hierarchy between the split components, based on the tasks of constituency type classification and rhetorical relation identification (Hypotheses 1.4 and 1.5).
Moreover, we describe the setting for assessing the use of discourse-aware sentence splitting as a support for the task of Open IE (Hypotheses 2.1 and 2.2).
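To make the two subtasks concrete, the following minimal sketch illustrates the kind of output structure the evaluation targets. The data structure, the example sentence, and the relation label are illustrative assumptions for this sketch, not the approach's actual implementation or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical data structure illustrating the two subtasks:
# (1) splitting a complex sentence into minimal propositions, and
# (2) linking them in a semantic hierarchy via a constituency type
#     ("core" vs. "context") and a rhetorical relation.
@dataclass
class Proposition:
    text: str                       # a minimal, self-contained proposition
    constituency: str               # "core" or "context"
    relation: Optional[str] = None  # rhetorical relation to the parent
    children: List["Proposition"] = field(default_factory=list)

# Illustrative input:
#   "The study, which analyzed 500 sentences, was published in 2020."
# Subtask (1): split into two minimal propositions.
# Subtask (2): subordinate the contextual proposition to the core one
# under a rhetorical relation (here "Elaboration", chosen for illustration).
core = Proposition(text="The study was published in 2020.",
                   constituency="core")
core.children.append(
    Proposition(text="The study analyzed 500 sentences.",
                constituency="context",
                relation="Elaboration"))
```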
Notes
- 1.
For a detailed analysis, see Section 21 in the online supplemental material.
- 2.
- 3.
- 4.
- 5.
In our implementation, we use the “” class from the Python module “”.
- 6.
The 19 classes of rhetorical relations can be found in Table 15.17. Most importantly, Taboada and Das (2013) map the List and Disjunction relations to a common Joint class and integrate the Result relation into the semantic category of Cause relations. Moreover, both Temporal-After and Temporal-Before become part of the Temporal class of rhetorical relations (this mapping is sketched as a lookup table after these notes).
- 7.
For the sake of simplicity, we do not judge the output for minimality here.
- 8.
In a pre-study with one annotator, we further examined whether the content of a simplified sentence that is flagged as a contextual proposition by our TS approach is indeed limited to background information. This analysis revealed that only 0.5% of them were misclassified, i.e. they convey some piece of key information from the input, while 90.9% were correctly classified as context sentences. We found instead that malformed output is a bigger issue (9.6%). Therefore, we decided not to investigate the limitation to background information any further, focusing instead on the soundness of the contextual propositions in our human evaluation.
- 9.
We use the default configuration of each system.
- 10.
We used the latest version of the OIE2016 benchmark (commit on their GitHub repository).
- 11.
Though the authors of ClausIE specify a method for calculating the confidence scores of their system’s extractions, it is not implemented in the published source code. Therefore, we make use of Graphene’s confidence score function for ClausIE’s extractions, too.
- 12.
The original scoring function presented in Stanovsky and Dagan (2016) was more restrictive. Under that scheme, a prediction was classified as correct if the predicted and the reference tuple agree on the grammatical heads of all their elements, i.e. the predicate and the arguments. However, this scheme was later relaxed in their GitHub repository to the lexical matching function described above, which has become the OIE2016 benchmark's default scorer in the current literature (both criteria are sketched after these notes).
- 13.
The results of the comparative analysis on the original CaRB dataset can be found in Section 22 in the online supplemental material.
- 14.
CaRB is built over a subset of OIE2016’s original sentences.
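The relation mapping described in footnote 6 can be summarized as a simple lookup table. The sketch below encodes only the three mappings stated there, passing all remaining relations through unchanged; the full 19-class inventory is in Table 15.17, and the function name is an illustrative assumption.

```python
# Mapping of rhetorical relations following Taboada and Das (2013),
# as described in footnote 6. Only the mappings stated there are
# encoded; any relation not listed is kept as-is.
RELATION_MAPPING = {
    "List": "Joint",
    "Disjunction": "Joint",
    "Result": "Cause",
    "Temporal-After": "Temporal",
    "Temporal-Before": "Temporal",
}

def map_relation(relation: str) -> str:
    """Collapse a fine-grained rhetorical relation into its mapped class."""
    return RELATION_MAPPING.get(relation, relation)

assert map_relation("Disjunction") == "Joint"
assert map_relation("Elaboration") == "Elaboration"  # passed through unchanged
```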
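Footnote 12 contrasts two matching criteria. The sketch below illustrates that contrast under stated assumptions: the lexical criterion is modeled as token overlap between the predicted and the reference tuple with an assumed threshold, and the head-based criterion is only indicated in a comment. The benchmark's actual scorer lives in the OIE2016 GitHub repository and may differ in its details.

```python
def lexical_match(predicted: list[str], reference: list[str],
                  threshold: float = 0.5) -> bool:
    """Lenient criterion (sketch): count a prediction as correct if the
    bag of words of the predicted tuple sufficiently overlaps with that
    of the reference tuple. The overlap measure and threshold are
    assumptions for illustration only."""
    pred_tokens = {tok.lower() for elem in predicted for tok in elem.split()}
    ref_tokens = {tok.lower() for elem in reference for tok in elem.split()}
    if not ref_tokens:
        return False
    return len(pred_tokens & ref_tokens) / len(ref_tokens) >= threshold

# The stricter original criterion of Stanovsky and Dagan (2016) would
# instead require the grammatical head of every element (predicate and
# arguments) to agree, conceptually:
#   all(head(p) == head(r) for p, r in zip(predicted, reference))
# where head() extracts the syntactic head of a phrase.

# Usage example with hypothetical tuples (predicate, arg1, arg2):
print(lexical_match(["published", "the study", "in 2020"],
                    ["was published", "the study", "in 2020"]))  # True
```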