1 Introduction

Data augmentation refers to techniques used to enlarge human-authored datasets by automatically generating additional instances that are similar to the original data. In natural language processing (NLP), the augmentation of text is a challenging task because of the discrete, symbolic nature of text data. Despite the challenges, however, it provides a way to improve machine learning models in situations where human-annotated data is scarce (Şahin, 2022). In this work, we demonstrate how text augmentation by means of lexical substitution can be used to enrich representations of semantic frames.

Fig. 1 An example sentence with its color-encoded frame annotations taken from FrameNet. Red indicates the lexical unit, and blue indicates the semantic roles. (Color figure online)

A semantic frame is a linguistic structure used to formally describe the meaning of a situation, action or event (Fillmore, 1982). A frame annotation for a sentence provides (i) a set of target words that evoke frames in this sentence, (ii) the respective frame for each of the targets, and (iii) a set of arguments for each of the frames in the sentence. An example sentence is given in Fig. 1 along with two frame annotations taken from FrameNet (Baker et al., 1998), a widely used, publicly available resource of frame annotations. The example sentence contains two targets: help, which evokes the frame ‘Assistance’, and hope, which evokes the frame ‘Desiring’. The corresponding entries help.v and hope.v, consisting of the target word’s lemma and a part-of-speech tag, are called lexical units (LUs) or frame-evoking elements (FEEs) in FrameNet. The arguments represent semantic roles or frame elements (FEs) that act as participants of the situation described by the frame.

Semantic frames have been used in a wide range of applications, such as question answering (Shen & Lapata, 2007; Berant & Liang, 2014; Khashabi et al., 2018), machine translation (Gao & Vogel, 2011; Zhai et al., 2013), and semantic role labeling (Do et al., 2017; Swayamdipta et al., 2018). However, their impact is restricted by the limited availability of annotated resources. Although there are some publicly available resources like FrameNet (Baker et al., 1998) and PropBank (Palmer et al., 2005), for many languages and domains no specialized resources exist. Besides, due to the inherent vagueness of frame definitions, the annotation task is challenging and requires well-trained annotators or very complex crowd-sourcing setups (Fossati et al., 2013).

In this work, we suggest a different approach to the problem: augmenting the FrameNet resource automatically by generating more synthetic examples of existing frame annotations in context via lexical substitution. In this way, we obtain additional lexical representations of semantic frames (i.e. synonyms of words describing semantic frames). The goal of lexical substitution (McCarthy & Navigli, 2009) is to replace a given word in a particular context with other words that are semantically similar or related to the original word. The concept is similar in nature to set expansion, which refers to expanding a small set of seed entities into a larger set by acquiring new entities that belong to the same semantic class (Wang & Cohen, 2007). We posit that, given a small set of seed sentences with their frame annotations, we can expand these annotations by substituting the targets and arguments of those sentences and aggregating possible substitutions into an induced semantic-frame resource. Table 1 shows one such induced example. To generate these substitutes, we experimented with non-contextualized word embeddings, i.e. fastText (Bojanowski et al., 2017), GloVe (Pennington et al., 2014), and word2vec (Mikolov et al., 2013), and with distributional thesauri from JoBimText (Biemann & Riedl, 2013), and compared their results to pre-trained Transformer-based contextualized models such as BERT (Devlin et al., 2019) and XLNet (Yang et al., 2019). To complete the comparison, we also include the lexical substitution model of Melamud et al. (2015), which uses dependency-based word and context embeddings and produces context-sensitive lexical substitutes.

Table 1 Lexical representations of the Assistance FrameNet frame retrieved using lexical substitutes from a single seed sentence with the BERT model

To generate substitutes, we solve two sub-tasks:

  • Lexical unit expansion: Given a sentence and its target word, the task is to generate frame-meaning-preserving substitutes for this word. The target word can be a verb or a noun. The gold substitutes are the lexical items specified by FrameNet. We aim at mining synonyms that fit the semantics of the original FrameNet frame definition.

  • Semantic role expansion: Given a sentence and an argument, the task is to generate frame-meaning-preserving substitutes for this argument. The gold substitutes are concrete realizations of the frame's roles in text. We aim at mining their synonyms and other realizations that fit the role semantics given in the original FrameNet role definition.

Table 1 presents the top substitutes produced by BERT for each highlighted word. These substitutes can replace the highlighted words of the seed sentence to generate new sentences, which augments the original set of sentences without manual annotation. To assess the quality of these substitutes and their effectiveness for semantic frame expansion and dataset augmentation, we performed three types of evaluation:

  1. Intrinsic evaluation: we evaluate the quality of the substitutes by comparing them to the gold standard FrameNet lexicon, while the performance is reported in terms of precision.

  2. Manual evaluation: for a small dataset, we evaluate the quality of the substitutes using human intuition, as the gold standard dataset can be incomplete.

  3. Extrinsic evaluation: we conduct an extensive empirical study using the semantic parsers of Swayamdipta et al. (2017) and Shi and Lin (2019). We compare the performance of these parsers on a number of small seed datasets and their augmented versions.

The main contributions of our work are:

  • A one-shot method for inducing frame-semantic structures using lexical substitution on frame-annotated sentences.

  • A comparative evaluation for various models including simple non-contextualized word embeddings and Transformer-based models for lexical substitution on the ground truth from FrameNet.

  • We show that combining the outputs of individual models can substantially improve the quality of the final substitutes compared to their individual performance.

  • A manual evaluation of substitutes, compared against the automatic evaluation with the FrameNet gold dataset.

  • We empirically demonstrate that the dataset augmentation procedure based on lexical substitution improves the performance of frame-semantic parsers. For both parsers, Swayamdipta et al. (2017) and Shi and Lin (2019), we see statistically significant improvements in argument identification performance.

The code and datasets are made available online for better reproducibility of our results.

The remainder of this article is organized as follows: Sect. 2 provides an overview of related work on the semantic frame induction task. Section 3 describes the models used for lexical substitution. Section 4 describes the lexical unit expansion and semantic role expansion experiments. Section 5 describes the frame-semantic parsing experiments. Finally, Sects. 6 and 7 summarize the overall findings of this work and discuss possible future directions.

2 Related work

Many data-driven approaches to frame-semantic parsing that take advantage of annotated resources, such as FrameNet, have been proposed in the literature (Das et al., 2010; Oepen et al., 2016; Yang & Mitchell, 2017; Peng et al., 2018), with SEMAFOR (Das et al., 2014) being the most widely known system for extracting complete frame structures, including target identification, frame identification, argument identification, and argument labeling. Some works focus only on a single parsing step, e.g. frame identification (Hermann et al., 2014; Hartmann et al., 2017; Sikos & Padó, 2019), argument labeling with frame identification (Swayamdipta et al., 2017; Yang & Mitchell, 2017), or just argument labeling (Kshirsagar et al., 2015; Roth & Lapata, 2015; Swayamdipta et al., 2018), which can be considered very similar to PropBank-style (Palmer et al., 2005) semantic role labeling, albeit more challenging because of the high granularity of semantic roles for frames. FrameNet-like resources are available only for very few languages and cover only a few domains. In this article, we venture into the more challenging problem of training a model for frame parsing on merely a very small amount of annotated data. This is similar to the idea of Pennacchiotti et al. (2008), who investigate the utility of semantic spaces and WordNet-based methods to automatically induce new lexical units, evaluating on FrameNet. Resource scarceness is the typical case here, as some NLP applications might require frames not covered by FrameNet, the granularity of available frames might not match the task, or the parser may need to be constructed for a low-resource language.

Several unsupervised semantic frame induction methods have been proposed in the literature. They extract clusters of words from text, which are then dubbed semantic frames. These methods are based on hard or probabilistic (soft) clustering of the input, commonly represented in the form of dependency trees. Lang and Lapata (2010) perform clustering of verb arguments based on syntactic dependencies. A latent variable probabilistic model is used in Modi et al. (2012) and Titov and Klementiev (2012). Materna (2012, 2013) also clusters subject-verb-object (SVO) triples with a similar model based on LDA (Blei et al., 2003). Kawahara et al. (2014) apply Chinese Restaurant Process clustering to a collection of verbal predicates and their argument instances. Ustalov et al. (2018) use tri-clustering on SVO triples to jointly induce both lexical units and their arguments. The downside of unsupervised frame induction is the lack of control over the semantics of the obtained word clusters and the frame granularity; due to this, such methods are not widely applied.

Our approach conceptually differs from these frame induction methods. We consider the effort of labeling one or a few sentences with frames as tolerable. This enables us to guide the construction of the FrameNet resource with the desired properties. Our experiments show that this minimal supervision can be used to produce the majority of LUs of semantic frames defined in FrameNet and generate meaningful semantic roles. However, since our method uses some training data, it is not directly comparable to these completely unsupervised approaches.

A few recent works use pre-trained language models for lexical substitution. Our method is directly motivated by the works of Amrami and Goldberg (2018) and Arefyev et al. (2019a). Amrami and Goldberg (2018) suggest predicting substitute vectors for target words using pre-trained ELMo (Peters et al., 2018) and dynamic symmetric patterns. Arefyev et al. (2019a) use the same idea of substitute vectors for the SemEval 2019 (QasemiZadeh et al., 2019) frame induction task, but replace ELMo with BERT (Devlin et al., 2019) for improved performance. Zhou et al. (2019) propose a method for lexical substitution with BERT. A more recent work by Arefyev et al. (2020) shows that injecting information about the target word into state-of-the-art language models can significantly improve their performance on lexical substitution. The resurgence of lexical substitution arises from the fact that it has a wide range of applications in NLP tasks such as word sense induction (Amrami & Goldberg, 2018; Arefyev et al., 2019b, 2020) and paraphrasing or text simplification (Kriz et al., 2018; Lee & Yeung, 2019). It is also used for quality assessment of semantic distributional models (Buljan et al., 2018). We are, to our knowledge, the first to employ lexical substitution for the expansion of semantic-frame resources and the first to show that it improves the performance of frame parsers. This work is a direct extension of our previous preliminary work (Anwar et al., 2020) with more advanced lexical substitution methods from Arefyev et al. (2020) and with experiments on the frame-semantic parsing task for extrinsic evaluation of the proposed approach.

3 Inducing lexical representations of frames

We experiment with two groups of lexical substitution models: non-contextualized and contextualized models. Regarding non-contextualized models, we report experiments with static embeddings from word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014) and fastText (Bojanowski et al., 2017); further, we utilize distributional thesauri constructed with JoBimText (Biemann & Riedl, 2013).

For contextualized models, we use two pre-trained Transformer-based models BERT (Devlin et al., 2019) and XLNet (Yang et al., 2019), as well as the lexical substitution model of  Melamud et al. (2015).

3.1 Non-contextualized models

In this section, we describe common approaches to representing the meaning of individual words independently of their context.

3.1.1 Non-contextualized word embeddings

Non-contextualized word embeddings are vector representations of words constructed in such a way that words occurring in similar contexts are expected to have similar vectors. To produce substitutes for a target word, we take the 200 nearest neighbors of the target word according to the cosine similarity between non-contextualized embeddings. We use the following pre-trained embeddings: fastText trained on the Common Crawl corpus, GloVe trained on the Common Crawl corpus, and word2vec trained on Google News. The embeddings from all of these models have a dimensionality of 300.
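For illustration, a minimal sketch of this nearest-neighbor lookup with the gensim library is shown below. The vector file name is a placeholder for the pre-trained embeddings listed above, and the function names are illustrative.

```python
# Sketch: retrieve substitute candidates from static embeddings (e.g. fastText)
# via cosine similarity; the model file name is a placeholder.
from gensim.models import KeyedVectors

# Load 300-dimensional pre-trained vectors in word2vec text format.
vectors = KeyedVectors.load_word2vec_format("crawl-300d-2M.vec", binary=False)

def substitutes(target_word, topn=200):
    """Return the top-n nearest neighbors of the target word as substitute candidates."""
    if target_word not in vectors:
        return []
    return [word for word, _ in vectors.most_similar(target_word, topn=topn)]

print(substitutes("help")[:10])
```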

3.1.2 Distributional thesauri

In contrast to the standard word embeddings, distributional thesauri (DT) can capture word similarities using simple n-gram context features and more complex linguistic context features (Lin, 1998), e.g. dependency relations. Grammatical features provide a more refined set of similar terms as compared to bag-of-words-based word embeddings, but their representations are sparser. JoBimText (Biemann & Riedl, 2013) is a framework that offers many DTs constructed using various corpora. Context features for each word are ranked using the lexicographer’s mutual information (LMI) score (Kilgarriff et al., 2004) and used to compute word similarity by feature overlap. We extract 200 nearest neighbors for the target word. In the experiments, we use two JoBimText DTs: (i) DT built on Wikipedia with n-grams as contexts and (ii) DT built on the combination of Wikipedia, Gigaword (Parker et al., 2009), ukWaC (Ferraresi et al., 2008), and LCC (Goldhahn et al., 2012) (59 GB in total) using dependency relations as context.

3.2 Contextualized models

While non-contextualized models are computationally efficient, they cannot handle polysemous words. This drawback is addressed by context-aware models, which can produce different word representations depending on the context. Therefore, they can also be used to generate different substitutes for a target word depending on its context.

3.2.1 Melamud’s lexical substitution model

The method proposed by Melamud et al. (2015) uses syntax-based skip-gram embeddings of Levy and Goldberg (2014) for a word and its context to produce context-sensitive lexical substitutes, where the context of a word is represented using its dependency relations. We use the embeddings from Melamud et al. (2015), which were trained on the ukWaC (Ferraresi et al., 2008) corpus. To find dependency relations, we use the Stanford Parser (Chen & Manning, 2014) (version 4.0.0) and collapse dependencies that include prepositions. Top k substitutes are produced only when both the target word and some of its context words are present in the vocabulary of the model.

The following cosine-similarity-based measures are proposed in Melamud et al. (2015) to compute suitability of a substitute s for a given target word t in a given context C:

$$add = \frac{cos(s,t) + \sum _{c \in C} cos(s,c)}{\left| C \right| + 1},$$
(1)
$$balAdd = \frac{\left| C \right| \cdot cos(s,t) + \sum _{c \in C} cos(s,c)}{2\cdot \left| C \right| },$$
(2)
$$mult = \left( \frac{cos(s,t)}{2}\cdot \prod _{c\in C}cos(s,c)\right) ^{\frac{1}{\left| C \right| +1}},$$
(3)
$$balMult = \left( \left( \frac{cos(s,t)+1}{2}\right) ^{\left| C \right| }\cdot \prod _{c\in C}\frac{cos(s,c)+1}{2}\right) ^{\frac{1}{2\left| C \right| }}.$$
(4)

Two of these measures (mult and balMult) use the geometric mean to produce high scores when the target word and the context words are all similar to a substitute word, whereas the other two (add and balAdd) use the arithmetic mean and can achieve high scores even if some of them are not similar. The balAdd and balMult measures place more emphasis on the similarity of substitutes to the target word.
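As an illustration, a minimal NumPy sketch of Eqs. (1)-(4) is given below, assuming the substitute, target, and context vectors have already been looked up; variable names are illustrative.

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def melamud_scores(s, t, C):
    """s, t: substitute and target word vectors; C: list of context vectors.
    Returns the add, balAdd, mult and balMult scores of Eqs. (1)-(4)."""
    n = len(C)
    cos_st = cos(s, t)
    cos_sc = [cos(s, c) for c in C]
    add = (cos_st + sum(cos_sc)) / (n + 1)
    bal_add = (n * cos_st + sum(cos_sc)) / (2 * n)
    # As written in Eq. (3), mult assumes the product of cosines is non-negative.
    mult = (cos_st / 2 * np.prod(cos_sc)) ** (1 / (n + 1))
    pcos = lambda x: (x + 1) / 2  # shift cosines to [0, 1] for the balanced variant
    bal_mult = (pcos(cos_st) ** n * np.prod([pcos(x) for x in cos_sc])) ** (1 / (2 * n))
    return add, bal_add, mult, bal_mult
```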

3.2.2 Pre-trained transformer-based models

Transformer-based models pre-trained on various language modelling objectives can predict the distribution of substitutes for a target word in a given context. In this work, we use two Transformer-based models: BERT (Devlin et al., 2019) and XLNet (Yang et al., 2019). The BERT model is the encoder part of the Transformer model (Vaswani et al., 2017) that was pre-trained with the masked language modeling (MLM) objective. In a nutshell, some randomly selected tokens in the training corpus are replaced by a special [MASK] token, and the objective is to restore those tokens based on the remaining left and right context. The XLNet model is an autoregressive language model pre-trained with the permutation language modelling (PLM) objective. In this objective, a new random permutation of tokens is generated for each example in each epoch. The model learns to predict each word given all preceding words in this permutation. Simple autoregressive models learn to predict words one by one either from left to right, or vice versa. In contrast, XLNet learns to predict words in any order and to take advantage of both left and right context of each word. Pre-trained Transformer-based models have effectively outperformed the previous state of the art in many downstream tasks. For our experiments, we use BERT large-cased and XLNet large-cased as implemented in the HuggingFace library (Wolf et al., 2019).

There are several ways to provide information about the target word to the Transformer-based substitution models. We use three options: keeping the original target word in place, using dynamic patterns, and combining conditional probabilities of substitutes given the context and the target word.

Original input: Although BERT and XLNet were both trained to guess a word they do not observe from its context, a substitution model can produce better substitutes if it not only sees the context but also has some information about the target word (Arefyev et al., 2020). The simplest way to introduce information about the target word is to feed the original example without any masking and take predictions for the position of the first subword of the target word. In the case of XLNet, we generate an attention mask such that all tokens attend to all other tokens in the content stream. Even though the contextualized embedding for the target position comes from the query stream, it still depends on the target word indirectly through the contextualized embeddings of all other tokens.

Dynamic patterns: Amrami and Goldberg (2018, 2019) use dynamic patterns to inject information about the target word (T) when generating substitutes with ELMo and BERT for the word sense induction task. These patterns are similar to Hearst patterns (Roller et al., 2018) and replace the target word (T) with a coordinate structure (e.g. “T and -”) to extract better substitutes. For example, after applying the pattern “T and -” to the target word “sold”, the sentence “Rob sold his car to Miller” is transformed into “Rob sold and - his car to Miller”. Substitutes are then generated for the token “-” instead of the original target word “sold”. Arefyev et al. (2019a, b, 2020) also use these patterns to generate substitutes for solving the lexical frame induction and word sense induction tasks. For our experiments with BERT, we try the patterns “T and -” and “T and T” (where the target word is duplicated), as illustrated in the sketch below.
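The following sketch applies the “T and -” pattern and queries a masked language model through the HuggingFace transformers library; it is a simplified illustration, and the actual generation and filtering pipeline used in our experiments is more involved.

```python
# Sketch: generate substitutes for a target word with BERT and the "T and -" pattern.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-large-cased")
model.eval()

def pattern_substitutes(sentence, target, topn=10):
    # Apply the "T and -" pattern: the placeholder "-" becomes a [MASK] token.
    patched = sentence.replace(target, f"{target} and {tokenizer.mask_token}", 1)
    inputs = tokenizer(patched, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    top_ids = logits[0, mask_pos].topk(topn).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)

# "Rob sold his car to Miller" -> substitutes predicted at the masked position
print(pattern_substitutes("Rob sold his car to Miller", "sold"))
```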

+embs: Arefyev et al. (2020) proposed a method that combines the probability of a potential substitute occurring in a given context P(s|C) with the probability reflecting distributional similarity of this substitute to the target word P(s|T):

$$\begin{aligned} P(s|C, T) \propto \frac{P(s|C)P(s|T)}{P(s)^\beta }. \end{aligned}$$
(5)

The probability P(s|C) is directly estimated by a language model, while P(s|T) is calculated by applying the temperature softmax over the inner product of their non-contextualized embeddings:

$$P(s|T) \propto \exp \Big (\frac{\langle \textbf{v}_{s}, \textbf{v}_{T}\rangle }{\tau }\Big ),$$
(6)

where \(\textbf{v}_{s}\) and \(\textbf{v}_{T}\) are the embeddings of the corresponding words, and \(\tau \) is a temperature hyperparameter used to balance the closeness of substitutes to the target word against their fitness to the given context. The hyperparameter \(\beta \) can be tuned to promote or penalize frequent words as substitutes. Prior word probabilities P(s) are obtained from the wordfreq library. The optimal values of \(\tau \) and \(\beta \) are selected using the development dataset. In the experiments with XLNet, we selected these values separately for lexical unit expansion and for semantic role expansion.
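A simplified sketch of this combination is shown below: the language model's distribution over the vocabulary is re-weighted by the target-similarity distribution of Eq. (6) and the prior. The variable names and the way the inputs are obtained are illustrative assumptions.

```python
import numpy as np

def combine_embs(log_p_s_given_C, emb_matrix, target_vec, prior_p, tau=0.1, beta=1.0):
    """Implements Eqs. (5)-(6): P(s|C,T) is proportional to P(s|C) * P(s|T) / P(s)^beta.
    log_p_s_given_C: log-probabilities from the language model, shape (V,)
    emb_matrix: non-contextualized embeddings of all vocabulary words, shape (V, d)
    target_vec: non-contextualized embedding of the target word, shape (d,)
    prior_p: prior word probabilities, shape (V,)"""
    # P(s|T): temperature softmax over inner products (Eq. 6)
    logits_t = emb_matrix @ target_vec / tau
    logits_t -= logits_t.max()                       # numerical stability
    p_s_given_T = np.exp(logits_t) / np.exp(logits_t).sum()
    # Eq. (5), computed in log space
    log_score = log_p_s_given_C + np.log(p_s_given_T) - beta * np.log(prior_p)
    return np.argsort(-log_score)                    # vocabulary indices ranked as substitutes
```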

3.3 Combination of models

To combine the advantages of several models, we also ensemble the predictions of the best-performing models. Since different models produce scores that are not directly comparable, we consider only substitute ranks, i.e. their positions after ordering substitutes according to their scores obtained from each model. We compute the combined rank as:

$$\begin{aligned} Combined\,Rank(w) = \frac{1}{L} \sum _{i=1}^{L} rank_i(w). \end{aligned}$$
(7)

where \(rank_i(w)\) is the rank of w among the substitutes predicted by the i-th model if it is predicted by the i-th model, and 1000 otherwise. Each model predicts at most 200 substitutes and the value 1000 is used to penalize the combined rank of a substitute that is not predicted by all models in the ensemble. The goal of this penalization is to rank words that are predicted by N models higher than words that are predicted by \(N-1\) models.
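A minimal sketch of this rank aggregation (Eq. 7) might look as follows, assuming each model's output is an ordered list of substitutes.

```python
def combined_rank(substitute_lists, penalty=1000):
    """substitute_lists: one ranked list of substitutes per model.
    Returns substitutes sorted by the average rank of Eq. (7); a word missing
    from a model's list receives the penalty rank of 1000."""
    words = {w for lst in substitute_lists for w in lst}
    def avg_rank(w):
        ranks = [lst.index(w) + 1 if w in lst else penalty for lst in substitute_lists]
        return sum(ranks) / len(substitute_lists)
    return sorted(words, key=avg_rank)

# Example: three models, two of which predict "aid" near the top.
print(combined_rank([["aid", "support"], ["assist", "aid"], ["support"]]))
```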

4 Intrinsic evaluation: augmenting lexical descriptions in FrameNet

In this section, we show how lexical substitution can be used to fill gaps in a lexical resource. Namely, given partially completed descriptions of lexical-semantic frames from the FrameNet resource, one can reconstruct the missing semantic roles and lexical units using our approach.

4.1 Experimental setup

4.1.1 Datasets

We use FrameNet (Baker et al., 1998) version 1.7 to generate our evaluation datasets. The combined data from fulltext and exemplars annotations of FrameNet contains around 170k sentences with 1014 frames, 7828 types of semantic roles, and 10,340 unique lexical units. Table 2 describes more characteristics of these datasets. The datasets for evaluation were derived automatically. Semantic roles and lexical units can consist of single or multiple tokens. For this work, we have only considered single-token substitution.

Table 2 Statistics of evaluation datasets for verb lexical units, noun lexical units, and semantic role expansion tasks derived from FrameNet-1.7

Single-token lexical unit and semantic role expansion: In order to create evaluation data for the LU expansion tasks, for each sentence containing an annotated LU we consider other LUs of the corresponding semantic frame as gold substitutes. We keep only LUs marked as verbs and nouns in FrameNet. FrameNet annotations contain 10 different types of lexical units based on their part-of-speech tags, but verbs and nouns cover about 79% of annotations. We created two separate datasets for the verb lexical unit expansion task and the noun lexical unit expansion task. To construct the evaluation dataset for the semantic role expansion task, for each of the sentences that contain an annotation of a given semantic role we consider all the single-word annotations from the rest of the corpus marked with the same role and related to the same frame as the gold substitutes.

An example frame with its full set of lexical units and two semantic roles is shown in Fig. 2. It contains some example sentences annotated with lexical units and semantic roles. To illustrate how we generated the evaluation datasets, Table 3 shows evaluation data that could have been generated based on the annotations in Fig. 2. However, the final datasets for the experiments were generated using all data from the fulltext and exemplars annotations of the FrameNet resource, but not the example sentences from the frame description files. The resulting datasets contain 79,584 records for verb LUs, 76,229 for noun LUs, and 191,252 records for role expansion. We use 10% of the examples as development sets for tuning the hyperparameters \(\tau \) and \(\beta \) of BERT+embs and XLNet+embs.

Fig. 2 The frame Arrest from FrameNet, simplified for illustrative purposes. It contains a frame definition and an example sentence, as well as names, descriptions and examples for a few semantic roles (FEs), and finally, a set of lexical units associated with this frame

Table 3 Evaluation data generated from the FrameNet descriptions shown in Fig. 2

Hyperparameters for +embs: Following Arefyev et al. (2020), we set the default values of the hyperparameters \(\beta = 1\) and \(\tau =0.1\). For XLNet+embs we additionally selected the optimal values of these hyperparameters based on the development subsets of all three datasets. The following values were selected: \(\beta = 0.5\) and \(\tau =0.05\) for the verb lexical unit expansion task, \(\beta = -\,1\) and \(\tau =0.07\) for the noun lexical unit expansion task, and \(\beta = -\,0.5\) and \(\tau =0.15\) for the semantic role expansion task. Negative values of \(\beta \) mean that it is beneficial to promote more frequent words as substitutes for the latter two tasks.

4.1.2 Evaluation measures

To evaluate the quality of generated substitutes, we use the standard ranking metric precision at k (p@k), where k represents the number of the highest ranked substitutes to be considered. While p@k measures the correctness of the first k substitutes, to evaluate the quality of the entire list of generated substitutes, we use mean average precision at level k (MAP@k):

$$\begin{aligned} MAP@k = \frac{1}{N} \sum _{i=1}^{N} AP^i@k, \end{aligned}$$
(8)

where

$$\begin{aligned} AP^i@k = \frac{1}{min(k, R^i)} \sum _{l=1}^{k} r_l^i \cdot p^i@l. \end{aligned}$$

Here, N is the total number of examples in the dataset; \(R^i\) is the number of possible correct answers for the i-th example; \(r_l^i\) equals 1 if the l-th predicted substitute for the i-th example is correct and 0 otherwise. We report p@k at k = 1 and k = 5, as well as MAP@50. Sometimes the post-processing procedure leaves fewer than k substitutes; we count the absence of a substitute at a position as a wrong answer of the model.
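For clarity, the sketch below computes p@k and AP@k exactly as defined above for a single ranked list of substitutes and a gold set; MAP@k is then the mean of AP@k over all examples.

```python
def precision_at_k(predicted, gold, k):
    """Fraction of the first k predicted substitutes that are in the gold set;
    missing positions count as wrong answers."""
    hits = sum(1 for w in predicted[:k] if w in gold)
    return hits / k

def average_precision_at_k(predicted, gold, k):
    """AP@k as defined above, normalized by min(k, |gold|)."""
    score = 0.0
    for l in range(1, k + 1):
        if l <= len(predicted) and predicted[l - 1] in gold:
            score += precision_at_k(predicted, gold, l)
    return score / min(k, len(gold))

# MAP@k is the mean of average_precision_at_k over all N examples in the dataset.
```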

4.1.3 Text pre-processing

For non-contextualized embeddings, we tried generating substitutes for the target word both with and without lemmatization and found that lemmatizing the target word has no positive effect on model performance. We assume that the grammatical form of the target word carries some information about its context, which can help models that do not have direct access to the context generate better substitutes. Thus, we do not use lemmatization for this kind of model. For DTs, lemmatization produced better results, mainly because the corpora were lemmatized before building the DTs; therefore, we employ lemmatization for the DT models. For all contextualized models, the context is tokenized on whitespace and we do not apply any pre-processing to the original sentences extracted from the FrameNet-annotated corpus.

For BERT, we remove all subwords of the target word except the first subword during pre-processing, which results in better substitutes; see Table 4 for an illustration of this effect. For XLNet, we follow Arefyev et al. (2020) and prepend a fixed prompt, i.e. a ‘warming up’ text fragment ending with the end-of-document token, as the initial context for each example.

Table 4 Pre-processing target words with multiple subwords

4.1.4 Post-processing of substitutes

Lexical substitutes can contain noisy tokens, such as numbers, individual symbols, model specific special tokens, e.g. [UNK], or sub-words marked with the ## prefix. In post-processing, we remove all such non-words from the list of generated substitutes. Substitutes often contain different forms of the same word, especially when static word embeddings are employed. Therefore, we lemmatize the generated substitutes using the Pattern library (Smedt & Daelemans, 2012) and remove duplicated lemmas. For the verb lexical unit expansion task, we drop all substitutes that are not verbs. For this purpose, we use a dictionary of verbs composed of verb lexicons taken from Pattern, WordNet (Miller, 1995), and FreeLing (Padró & Stanilovsky, 2012). For the noun lexical unit expansion task, we remove stopwords, apply POS tagging, and retain only nouns in the final output.
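The sketch below illustrates this post-processing for the verb LU task. It uses NLTK's WordNet lemmatizer and verb lexicon as a stand-in for the combination of Pattern, WordNet, and FreeLing described above, so it is an approximation of the actual pipeline.

```python
# Sketch: clean, lemmatize, deduplicate, and filter substitutes for verb LUs.
# Requires: nltk.download('wordnet')
import re
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def postprocess_verb_substitutes(substitutes):
    """Drop non-words (numbers, [UNK], ##subwords), lemmatize, remove duplicates,
    and keep only substitutes known as verbs."""
    seen, result = set(), []
    for s in substitutes:
        if not re.fullmatch(r"[A-Za-z]+", s):
            continue
        lemma = lemmatizer.lemmatize(s.lower(), pos="v")
        if lemma in seen:
            continue
        seen.add(lemma)
        if wn.synsets(lemma, pos=wn.VERB):   # keep only words attested as verbs
            result.append(lemma)
    return result
```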

4.1.5 Combination of models

For model combinations, we consider the best-performing individual models according to the mean average precision from the following categories: non-contextualized embeddings (nc-emb), distributional thesaurus based models (DT), the contextualized models of Melamud et al. (2015), contextualized models based on pre-trained Transformer-based models, and the Transformer models with the embeddings of target words (+embs).

4.2 Results

4.2.1 Lexical unit expansion task

The results for the lexical unit expansion tasks are presented in Table 5.

Table 5 Evaluation of lexical substitutes for lexical unit and semantic role expansion

Verb lexical unit: In the verb lexical unit expansion task, the best performance among non-contextualized models was achieved by fastText (\(p@1=0.388\) and \(MAP@50=0.156\)), closely followed by word2vec (\(p@1=0.388\) and \(MAP@50=0.151\)). The DTs considered in our experiments perform worse than the embedding-based models word2vec and fastText. Among Melamud’s models, the best performance was achieved by balAdd (Melamud et al., 2015) with \(p@1=0.393\) and \(MAP@50=0.156\), whereas balMult performed slightly worse: balAdd can produce substitutes even if the context has no similarity to the substitute, which is only useful for monosemous words. Even though the simple BERT and XLNet models (without masking) performed comparably, they could not outperform fastText and word2vec. However, a closer examination of some examples shows that contextualized models do make a difference when the target word is polysemous, see Table 6.

Applying the dynamic patterns helped to improve the performance of BERT. While BERT with the pattern “T and -” is substantially worse than just using the vanilla BERT model without masking, the second pattern “T and T” yields the best results for the BERT models. These experiments confirm that such dynamic patterns can help to better capture the semantics of a target word and produce better substitutes. BERT with the pattern “T and T ” outperforms all other models in terms of precision at \(k=1\), but distinctively falls behind fastText, balAdd, and balMult (Melamud et al., 2015) on the higher levels of precision and in terms of the MAP score.

Using the +embs method proved to be a better approach to target word injection compared to the dynamic patterns. With this approach both BERT and XLNet have outperformed all other models. For XLNet, using +embs with the optimal hyperparameters has achieved the best performance overall with \(p@1=0.504\) and \(MAP@50=0.199\). Even the model with the default hyperparameters has obtained better performance than all other models (\(p@1=0.487\) and \(MAP@50=0.189\)).

For the combined models we considered: (1) fastText as nc-emb, (2) DT 59g as DT, (3) balAdd for Melamud et al. (2015), (4) XLNet, and (5) XLNet+embs with optimal hyperparameters. Combining substitutes predicted by individual models has a mixed effect, and the combined scores are sensitive to the individual performance of the participating models. Overall, the highest MAP score is achieved by combining XLNet+embs with balAdd (Melamud et al., 2015) and DT (\(MAP=0.201\)). For the combinations based on XLNet+embs, the precision scores slightly decrease in comparison to its individual performance, but all other combinations obtain higher precision scores than their individual counterparts. In particular, the tri-model combinations based on the simple XLNet model closely matched the performance of the best XLNet+embs model (\(MAP = 0.199\) and \(MAP = 0.198\)).

Table 6 contains example sentences with highlighted target words and the top 10 substitutes generated by all models (along with the ground truth FrameNet annotations). The first example presents a LU that is associated with only one frame and is unambiguous; all models produced many matching substitutes. The other two examples illustrate that LUs can have several senses, each associated with a different frame. The non-contextualized models, except GloVe and fastText, predicted at least one valid substitute for the first frame Departing, but most of them failed to produce any substitutes in the top 10 for the Causation frame. In contrast, BERT and XLNet successfully generated several matching substitutes for both cases; in particular, XLNet predicted more matching substitutes than BERT.

Noun lexical unit: Among non-contextualized models, the best performance was achieved by DT 59g (\(p@1=0.398\) and \(MAP@50=0.145\)). Unlike for verbs, where the embedding-based models outperformed DTs, for nouns the DTs perform better than embeddings. The contextualized models of Melamud et al. (2015) also have significantly lower performance compared to DTs. BERT and XLNet perform comparably to DTs. Dynamic patterns do not improve BERT’s performance, but the +embs method improves the results significantly for both BERT and XLNet. The XLNet+embs model with the optimal hyperparameters achieved the best performance with \(p@1=0.499\) and \(MAP@50=0.171\).

For the combination, the following models were taken: (1) GloVe as nc-emb, (2) DT 59g as DT, (3) balAdd for Melamud et al. (2015), (4) XLNet, and (5) XLNet+embs with the optimal hyperparameters. The highest MAP score was again achieved by combining XLNet+embs with balAdd (Melamud et al., 2015) and DT (\(MAP=0.189\)). Unlike for verbs, for nouns suitable model combinations improve the results compared to the best individual model. Since the embedding-based models perform worst for nouns, the combinations with nc-emb have the lowest MAP scores among all combinations.

Table 6 Examples for the verb lexical unit expansion task

As mentioned in Sect. 4.1.4, we use the Pattern library for lemmatization and POS tagging of nouns. To investigate the effect of POS tagging on model performance, we compare the results produced with the Pattern library and with the lemminflect library. The former returns only the most suitable POS tag, while the latter returns all possible POS tags, which may work better for words that can have multiple tags. Table 24 shows the results with lemminflect used for lemmatization and POS tagging. All embedding-based models improve significantly, which shows that correct POS tagging is crucial for nouns; otherwise, good candidates may be dropped from the final output.

4.2.2 Semantic role expansion task

The evaluation results for the semantic role expansion task are presented in Table 5. For the role expansion experiments, the non-contextualized models and the models of Melamud et al. (2015) are outperformed by BERT and XLNet by a significant margin, with \(p@1=0.471\) and \(MAP@50=0.118\) for BERT and \(p@1=0.513\) and \(MAP@50=0.144\) for XLNet. The DTs performed substantially better than the word embedding models and comparably to the models of Melamud et al. (2015); the better score is achieved by the DT trained on Wikipedia. The performance of static word embeddings dropped: fastText, which was the best model in the previous experiment, is now the worst among all models. In contrast to the previous experiments, the performance of the Melamud et al. (2015) models also dropped significantly in comparison to BERT and XLNet.

XLNet performed better than BERT in all settings, with \(p@1=0.513\) and \(MAP@50=0.144\) for the simple model without masking and \(p@1=0.522\) and \(MAP@50=0.159\) with the +embs method and default hyperparameters. Selecting optimal hyperparameters further improved its performance to \(p@1=0.542\) and \(MAP@50=0.161\), making it the best model overall. The dynamic patterns did not help to improve the performance of the BERT model for this particular task, most probably because these patterns are not suitable for the semantic role expansion task. Although without the +embs method BERT and XLNet were outperformed by several non-contextualized models in the LU expansion task, in this experiment they obtained superior performance compared to all these models. This reflects the importance of the context for making reasonable substitutions of words that bear semantic roles. Another reason is that their fixed-size vocabulary covers frequent words such as verbs better than the nouns that realize role arguments.

Combining substitutes predicted by multiple models substantially improves the scores of the models that performed worst individually, but shows a mixed effect for combinations where one model was significantly better than the others. For the semantic role expansion task, the models we considered are (1) GloVe as nc-emb, (2) DT wiki as DT, (3) balAdd for Melamud et al. (2015), (4) XLNet, and (5) XLNet+embs with optimal hyperparameters. The highest MAP score was achieved by combining XLNet+embs with balAdd (Melamud et al., 2015) and DT, but the highest precision was achieved by the combination of balAdd with XLNet (\(p@1 = 0.574\)) and with XLNet+embs (\(p@1 = 0.563\)). Both of these combinations obtained better precision scores for smaller values of k than the single best model, XLNet+embs with optimal hyperparameters, which obtained the highest MAP score (\(MAP=0.161\)). Overall, with p@1 approaching 55% and p@5 approaching 47%, and given that our gold standard is necessarily incomplete, this paves the way to fully automatic expansion of semantic role resources.

Table 7 contains three example sentences with highlighted arguments for semantic roles and the top 10 substitutes generated by all models (along with the ground truth FrameNet annotations). The first example yields several valid matching substitutes, because vehicle is the most common sense of “car”. The other two examples present the argument “bank”, which has multiple associated semantic roles. Again, BERT and XLNet were able to distinguish both senses of “bank” and produced several valid substitutes.

For roles, we also produce results with stopword removal to see how it affects the performance. The results are reported in Table 25. Comparing these scores to Table 5, we can see that for the non-contextualized models and the models of Melamud et al. (2015) there is no meaningful difference, which suggests that these models rarely produce such words in their output. For the Transformer-based models, the results improve substantially: since these models predict a word given its context, there is a high likelihood that, based on the position of words and their context, some bad candidate words are produced. As these individual models improve further, the combinations are slightly negatively affected, and the difference between their scores and those of the individual models increases. Overall, the XLNet+embs model yields the highest scores (\(p@1=0.581\) and \(MAP@50=0.176\)).

4.2.3 Effect of gold set size

The results reported in Table 5 are generated using the whole datasets, without any filtering on the size of the gold sets. We report MAP at \(k=50\), but there are many instances in these datasets where the size of the gold set is very small: the average size is 22 for verbs, 27 for nouns, and 73 for roles, whereas the minimum size is 1 for all three. For smaller sets, it is much harder to predict the candidates, especially if the gold members are rare words. In a number of situations, even if these members are produced by the model, they may not be ranked high in the list of potential substitutes. Figure 3 shows the performance of the XLNet+embs model for all three datasets, filtered by a minimum gold set size of 5, 10, and 15. Precision at all values of k increases if we filter out smaller sets, but not by a large margin. This suggests that the ranking of candidates needs to be further investigated.

Fig. 3 Precision@k curve for the XLNet+embs (optimal) model for all three datasets of verbs, nouns, and roles. Here, mgs means minimum gold set size

4.3 Examples of induced lexical semantic frame representations

This section contains a qualitative analysis of lexical expansion examples for a few semantic frames across all lexical substitution models, along with the ground truth from FrameNet. Each example sentence represents a specific frame and a single target word labeled either as a lexical unit or as a semantic role. For each model, the top 10 final substitutes are given. Examples of semantic role expansion are presented in Table 7, and examples of lexical unit expansion in Table 6. Each table contains examples of ambiguous and unambiguous words to compare the substitutes in each use case.

In summary, it is evident that for unambiguous words most models produce several valid substitutes, whereas for polysemous words most of the non-contextualized models either were unable to produce any valid substitutes or produced a few good substitutes for one sense only. In contrast, contextualized models produce valid substitutes in most situations. A deeper analysis of these examples provides some key insights into the intrinsic evaluation framework. For instance, some substitutes may be semantically valid but absent from the FrameNet lexicon and hence not counted as correct. Conversely, a substitute that is present in the FrameNet lexicon can actually be a wrong fit, because it may change the meaning of the sentence or make it grammatically incorrect. We dive deeper into this issue in the following section.

4.4 Manual evaluation of lexical substitutes

4.4.1 Problems with automatic evaluation of lexical substitutes

As discussed in Sect. 4.3, automatic evaluation of lexical substitutes using the current gold datasets faces two problems, related to FrameNet coverage and to the semantics of a substitute in its context.

Table 7 Examples for the semantic role expansion task

Scenario A—substitute fits the context, but is not present in the gold dataset: A substitute may be a good candidate to replace the target word within the context, but is considered wrong because it is not present in the gold dataset. FrameNet provides a gold set of all lexical units for each frame. Normally, we can expect this set to be more complete for verb predicates than for noun predicates: noun predicates usually do not have a strictly closed set of suitable variants, so it is highly likely that a given substitute is not covered in the gold dataset (this also happens with verbs, albeit to a lesser extent, since the number of verbs is generally lower than the number of nouns). The problem is aggravated for semantic roles, as the roles can in theory have countless valid arguments; this works for semantic parsing but becomes an issue for augmentation tasks if a gold dataset is used for evaluation. In our experiments, the gold dataset for roles is extracted from the available sentence annotations, so it cannot be considered a proper gold set for evaluation. See Table 8 for illustration: for the given sentences and lists of substitutes, there are multiple correct answers that are not present in the gold dataset.

Table 8 Examples of substitutes for scenario A; the substitutes that are marked as correct by the annotator but are not present in the gold dataset are highlighted

Scenario B—substitute is present in the gold dataset, but does not fit the context: A given substitute is present in the gold dataset, but may not fit the given context or may change the meaning of the sentence altogether. See Table 9 for examples of such substitutes. The second example describes a situation where a body part is involved, but not all body parts can be folded as per the context. In the last example, the target word is a semantic role of type Speaker, which can be a pronoun or a person’s name, but not all pronouns fit this context.

Table 9 Examples of substitutes for scenario B; the substitutes that are present in the gold dataset but do not fit the context are highlighted
Table 10 Examples with their target words highlighted and the list of top 10 final substitutes

4.4.2 Evaluation framework

To manually analyse the appropriateness of a given substitute in the scenarios discussed in Sect. 4.4.1, we define the following evaluation rules:

  • It does not fit the context. The objective is to maintain the sentence’s meaning; this category also covers substitutes that would make the sentence grammatically incorrect.

  • It fits the context, but not the frame. The sentence is still meaningful but does not preserve the frame meaning. We use the formal descriptions of frames to decide whether a sentence represents the frame. Additionally, for semantic roles, we also consider the semantic role definition, because the frame description alone is not sufficient to evaluate semantic roles.

  • It fits both the context and the frame. These substitutes are the ideal replacements for the target word, as the main motivation of this work is to preserve the original frame.

Table 10 contains examples for each use case. It also includes the list of substitutes matched with the gold dataset.

4.4.3 Datasets and substitution model

We randomly sampled 50 annotations for each type of target word (verb, noun, and semantic role). Table 11 shows the statistics of these datasets. For each annotated instance, the top 10 substitutes are evaluated; in summary, the annotator has to evaluate 500 substitutes per dataset. As the substitution model, we choose the best-performing single model, i.e. XLNet+embs.

Table 11 Statistics of three datasets for manual evaluation sampled randomly from datasets used in automatic evaluation

4.4.4 Results

Table 12 shows the results for all datasets against each evaluation use case. For all datasets, there is a significant improvement in the use case where the substitute fits both the context and the frame, and precision improves consistently for all values of k. The precision values for the does-not-fit-the-context case support our first concern about automatic evaluation: even though a substitute may be present in the gold dataset, it may not fit the context and hence should be ignored. For example, in Table 10, the substitute body does not make sense as a replacement of the original word skin, but it is present in the gold dataset. Among the substitutes that fit the context, there are still cases where they do not preserve the frame: in Table 10, the substitutes blood, fur, and cloth fit the context very well, but since they are not body parts, they do not maintain the frame and cannot be accepted as correct. Not surprisingly, the numerical scores are higher than in the automatic evaluation, as the manual judgements are not prone to the incompleteness of lexical-semantic resources.

Table 12 Manual evaluation of lexical substitutes for sampled datasets of 500 annotations (50 contexts, 10 substitutions)

5 Extrinsic evaluation: frame-semantic parsing with lexically expanded FrameNet

To evaluate the quality of automatically constructed frame structures, we conducted extensive experiments using two frame-semantic parsers. Our goal was to determine whether these induced frame structures can improve parsing performance in situations where annotated data is scarce. We select a small sample from the FrameNet dataset with original annotations as a seed dataset. Then we augment it by incorporating new sentences constructed using our lexical substitution approach, while keeping the annotations the same, which results in a larger training dataset. We compare the performance of the parsers trained on the augmented dataset and on the seed dataset; the test and development (dev) sets remain unchanged.

5.1 Experimental setup

In this section, we describe the choice of models for lexical substitution, the details of the semantic parsers used in our experiments, and the procedure for the construction of training datasets, including the pre- and post-processing steps.

5.1.1 Lexical substitution models

For extrinsic evaluation, we select the two substitution models with the best performance in the intrinsic evaluation (Table 5). The first model is XLNet+embs with optimal hyperparameters, which demonstrates the best results for both the lexical unit expansion and the semantic role expansion tasks. The second model is BERT without dynamic patterns and without the +embs extension. We choose the standard BERT model without any extensions in order to determine whether the performance differences observed between these two models in the intrinsic evaluation are also reflected in a presumably less sensitive extrinsic evaluation.

5.1.2 Frame-semantic parsers

We conduct experiments with: (1) open-SESAME (SEmi-markov Softmax-margin ArguMEnt)—a neural network-based frame-semantic parser by Swayamdipta et al. (2017), and (2) a BERT-based parser for relation extraction and semantic role labeling inspired by Shi and Lin (2019).

Open-SESAME parser (Swayamdipta et al., 2017): The Open-SESAME parser decomposes the task of frame-semantic parsing into three sub-tasks and implements an independently trained model for each sub-task: (1) ArgId: to identify and label semantic arguments, (2) FrameId: to identify frames using gold targets, and (3) TargetId: to identify target predicates using lexical units of FrameNet (this model is not discussed in the original publication). The objective of the argument identification model is to identify argument spans and their labels. It uses a softmax-margin segmental recurrent neural network as a baseline syntax-free model and adds several modifications to further improve the performance, in particular syntactic information and syntactic scaffolding. For our experiments, we only used the baseline syntax-free model. The model accepts as input a sentence in the form of a token sequence, token part-of-speech tags, a target span, and an associated lexical unit with its frame, and outputs a list of possible labeled segments with their start and end positions in the input sentence. The labels are either semantic roles or “null”. The ArgId model only handles non-overlapping segments, and segmentation is only produced for the input frame and its target. The maximum length of a span to be considered can be specified as a hyperparameter. The frame identification model is a syntax-free bidirectional LSTM that takes the same input as the argument identification model, except the frame, and identifies the frame evoked by the target. It cannot predict frames for targets that are not present in the FrameNet lexicon. The target identification model is also based on a bidirectional LSTM. It takes as input a sequence of tokens from a given sentence, their part-of-speech tags, and lemmas, and for each token it outputs a binary label indicating whether it is a target or not. The list of possible targets is available through the FrameNet lexicon of lexical units. In our experiments, we use the official, publicly available implementation of the Open-SESAME parser.

BERT parser (Shi & Lin, 2019): The BERT-based semantic parser was originally designed for PropBank-style arguments and, unlike open-SESAME, it does not perform target and sense (frame) identification as separate independent tasks. For argument identification and labeling, it can perform sense disambiguation for targets before argument identification (end-to-end); therefore, it can work with only a sentence and the target predicate, with the target frame as optional input. For the target sense disambiguation task, it takes a sentence as input and formulates the task as a sequence labeling problem, where each token is assigned a label: the target token is assigned the sense (frame) label and all remaining tokens are assigned either the label ‘X’ (non-target tokens) or ‘O’ (sub-tokens of any non-target token). This sequence of tokens is passed through the BERT encoder to obtain contextualized embeddings. The predicate tokens are distinguished by concatenating these contextualized embeddings with ‘predicate indicator’ embeddings before making a final prediction using a one-hidden-layer multi-layer perceptron (MLP). For the argument identification and labeling task, it takes as input a pair of a sentence and its target predicate; argument spans are predicted as BIO (Beginning, Inside, Outside) labels for all tokens. The target predicate is paired with the sentence and passed through the BERT encoder to make the sentence embeddings target-aware. These contextualized sentence embeddings are concatenated with ‘predicate indicator’ embeddings and passed to a one-layer BiLSTM to obtain hidden states for each token. The hidden state of the predicate token is concatenated to the hidden state of each token and passed to the MLP to obtain the probability distribution over the label set. We use the implementation provided by the AllenNLP library and conduct experiments both with and without gold frames.

Note that the open-SESAME parser does not leverage pre-training, but it uses syntactic information (part-of-speech tags) for parsing. In contrast, the BERT-based parser (Shi & Lin, 2019) takes advantage of pre-training while avoiding any syntactic information. We consider it interesting to investigate the effect of lexical expansion on two such conceptually different semantic parsers.

5.1.3 Seed datasets

We use scripts from the open-SESAME parser (Swayamdipta et al., 2017) to split the full-text annotations of FrameNet-1.7 into train, test, and dev splits. The test set is similar to that of previous studies (Das et al., 2014); it contains 16 documents, while 8 documents are used for the dev set. The statistics for all three splits are given in Table 13. For our experiments, we generate two sets of splits: (a) with verbs as lexical units; (b) with nouns as lexical units. To enable comparative experiments after lexical expansion, all other train datasets were sampled from the train sets of these two datasets while keeping their respective test and dev sets the same. Seed training datasets were constructed by randomly sampling one frame annotation per sentence. This strategy yields a train dataset for verbs with 2746 annotations in total (7 annotations per frame on average) and a train dataset for nouns with 9293 annotations in total (8 annotations per frame on average).

Table 13 Statistics for data splits for FrameNet-1.7 fulltext annotations and the seed datasets

5.1.4 Dataset expansions

Each annotation of the seed dataset was augmented using three types of words simultaneously. The first two types are based on FrameNet annotations of the sentence tokens:

  • lexical unit: a single-token lexical unit, which can be either a verb or a noun

  • role: all single-token roles.

The third word type is based on the POS tags of the sentence tokens:

  • noun: any word that is a noun or part of a noun phrase but is neither a lexical unit nor a single-token role. The reason for selecting such nouns for expansion comes from the semantics of roles: a major portion of a sentence is usually covered by semantic roles, which are mostly multi-token, so very few words remain to be substituted as single-token roles. This configuration substitutes all noun tokens except those that have already been substituted as roles. To determine whether a word is a noun, we used the part-of-speech tags predicted during the pre-processing phase of the parser. We augmented only a fraction of the sentence tokens as nouns, experimenting with 10%, 30%, and 50% of sentence tokens.

For all train datasets, each annotation of the seed dataset was augmented with two more annotations (\(k=2\)) unless mentioned otherwise, yielding an augmented training dataset approximately three times larger. See Tables 14 and 15 for statistics of the augmented train datasets under various configurations using BERT as the substitution model. We constructed datasets with expansion of a single word type (lexical unit, role, or noun) and then with all types combined. When all three word types are expanded, the order of expansion is always the lexical unit, followed by the role and then the noun; this ensures that a given word is augmented only once even if it belongs to multiple word types (a code sketch of this procedure is given below).
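The sketch below illustrates this expansion step under stated assumptions: token indices for the lexical unit, single-token roles, and candidate nouns are assumed to be precomputed, and `substitute()` stands in for any of the lexical substitution models compared in this work.

```python
# Hedged sketch: collect substitution targets in the fixed order
# lexical unit -> role -> noun (each token at most once), then emit k augmented copies.
def collect_targets(tokens, lu_idx, role_idxs, noun_idxs, noun_fraction=0.5, rng=None):
    targets, seen = [], set()
    for kind, idxs in (("lexical_unit", [lu_idx]),
                       ("role", role_idxs),
                       ("noun", noun_idxs)):
        if kind == "noun":
            # only a fraction of the sentence tokens are expanded as nouns
            k = int(noun_fraction * len(tokens))
            idxs = rng.sample(idxs, min(k, len(idxs))) if rng else idxs[:k]
        for i in idxs:
            if i not in seen:               # a token is substituted only once
                seen.add(i)
                targets.append((i, kind))
    return targets

def expand(tokens, targets, substitute, k=2):
    copies = [list(tokens) for _ in range(k)]
    for i, kind in targets:
        candidates = substitute(tokens, i, kind)   # post-processed substitute list
        # the j-th best substitute goes into the j-th copy; copies keep the
        # original word if fewer than k candidates survive post-processing
        for cand, copy_ in zip(candidates[:k], copies):
            copy_[i] = cand
    return copies
```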

Table 14 Statistics of the train datasets augmented from seed dataset AnnotationPerSentence-Verbs, for different configurations
Table 15 Statistics of the train datasets augmented from the seed dataset AnnotationPerSentence-Nouns, for different configurations

5.1.5 Post-processing

The list of substitutes produced by the lexical substitution model was post-processed before the final augmentation. Some of these post-processing steps are common to all word types, such as the removal of noisy words, duplicates, and seed words, while the word-type-specific steps are as follows (a code sketch of these filters is given after the list):

  • lexical unit: substitutes for lexical units were filtered according to their gold annotations (the frame parser cannot predict a frame for a target not present in the FrameNet lexicon). The final substitutes were lemmatized and then inflected to match the tense of the substituted lexical unit. We use the lemminflectFootnote 9 library as the inflection engine.

  • role: substitutes for roles were also filtered according to their gold annotations and against a basic list of stop-words.

  • noun: substitutes for nouns were filtered against a basic list of stop-words, with digits removed and a minimum length of two characters enforced. A final filtering based on part-of-speech tags retained only nouns. The resulting list was lemmatized and inflected to match the singular or plural form of the substituted noun. For lemmatization and part-of-speech tagging, we used the NLTK library.
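A minimal sketch of these per-word-type filters is shown below, assuming hypothetical helper names; `substitutes` is the raw candidate list, `gold_lus` the set of lemmas allowed for the annotated frame, and the relevant NLTK data (stopwords, POS tagger) is assumed to be downloaded.

```python
# Illustrative post-processing for lexical-unit and noun substitutes.
import nltk
from nltk.stem import WordNetLemmatizer
from lemminflect import getLemma, getInflection

STOPWORDS = set(nltk.corpus.stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def postprocess_lu(substitutes, original_tag, gold_lus):
    out = []
    for w in substitutes:
        lemma = getLemma(w, upos="VERB")[0]
        if lemma not in gold_lus:                 # keep only LUs known to FrameNet
            continue
        inflected = getInflection(lemma, tag=original_tag)   # match the original tense
        if inflected:
            out.append(inflected[0])
    return out

def postprocess_noun(substitutes, plural=False):
    out = []
    for w in substitutes:
        if w.lower() in STOPWORDS or w.isdigit() or len(w) < 2:
            continue
        if not nltk.pos_tag([w])[0][1].startswith("NN"):     # keep nouns only
            continue
        lemma = lemmatizer.lemmatize(w, pos="n")
        forms = getInflection(lemma, tag="NNS" if plural else "NN")
        out.append(forms[0] if forms else lemma)
    return out
```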

After substituting all target words, the augmented sentence was parsed again for part-of-speech tags. Tables 14 and 15 provide the total number of annotations for the different configurations for both seed datasets of verbs and nouns. As expected, the datasets augmented with just lexical units and roles are the smallest, because for lexical units the list of final substitutes can be empty if no substitute matches the gold set of the annotated frame, and for roles a sentence may contain no single-token roles. The datasets where all nouns were augmented are larger than all other configurations. A few examples of sentences taken from AnnotationPerSentence-Verbs and augmented using one of these configurations are given in Table 16.

Table 16 Examples of expansions using the configuration of lexical unit-roles-nouns-50pc for XLNet+embs and BERT as lexical substitution models

5.2 Examples of augmented sentences

Table 16 shows a few examples of augmentation results along with the original seed sentences; the seed dataset here is AnnotationPerSentence-Verbs. In each sentence, the substituted words and phrases are highlighted according to their word type: lexical unit, role, or noun. As mentioned previously, only single-token roles are augmented. For each seed sentence, we produce augmentations using the top two substitutes from the final post-processed list; these examples were produced with the configuration lexical unit-roles-nouns-50pc. In some cases, the quality of substitutes for roles and nouns is less reliable with respect to the overall semantics of the sentence. This is especially true for roles, since their gold dataset is limited to FrameNet annotations and, unlike lexical units, the elements of these gold sets can be semantically very different from each other; one symptom is pronouns being predicted as substitutes for nouns. Nevertheless, substitutes for lexical units and nouns are plausible in most cases and preserve the meaning of the sentence.

5.3 Results with the open-SESAME parser

Hyperparameters: Optimal hyperparameters for the argument identification model are presented in Swayamdipta et al. (2017). However, the hyperparameters for the frame and target identification models are omitted there. In our experiments, we used the default values defined in the source code of the parser for everything except the maximum number of epochs. For target and frame identification, we use 100 epochs with an early-stopping patience of 25 epochs; for argument identification, we use 10 epochs with an early-stopping patience of 3 epochs. We use these default values to determine the total number of training steps for the seed datasets. Since the augmented datasets are three times larger than the seed datasets, we train on them for the same number of steps as on the corresponding seed dataset and model, keeping the training time comparable across all of them. For the seed datasets of AnnotationPerSentence-Verbs, this gives 274,600 steps for the target and frame identification models and 27,460 steps for the argument identification models; for the seed datasets of AnnotationPerSentence-Nouns, it gives 299,600 and 29,960 steps, respectively. This reduces any bias in model performance that would stem from the larger dataset size and additional training iterations. The final model was selected according to the best \(F_{1}\) score on the dev dataset during training. To compensate for variance in model performance due to random weight initialization, all experiments were run 10 times, and the mean and standard deviation of the \(F_{1}\) score are reported for both BERT and XLNet+embs. In addition, we calculate p values for a paired Student's t-test to determine the statistical significance of performance differences between models trained on augmented datasets and models trained on the seed datasets. The null hypothesis assumes that both models perform similarly and that any difference in their mean performance is not supported statistically. P values are compared against a 99% confidence level: a p value below 0.01 supports the alternative hypothesis that the models perform differently and that this difference is statistically significant.
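The sketch below illustrates this significance test on the 10 per-run \(F_{1}\) scores of a seed-trained and an augmentation-trained model; the score values are made-up placeholders.

```python
# Paired Student's t-test over per-run F1 scores, rejecting the null at p < 0.01.
from scipy.stats import ttest_rel

seed_f1      = [48.1, 47.5, 49.0, 48.3, 47.9, 48.8, 48.0, 47.7, 48.5, 48.2]
augmented_f1 = [50.2, 49.6, 50.8, 49.9, 50.1, 50.5, 49.8, 50.0, 50.3, 49.7]

t_stat, p_value = ttest_rel(augmented_f1, seed_f1)
# Null hypothesis: both models perform the same on average.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, significant = {p_value < 0.01}")
```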

Table 17 The performance of the frame-semantic parser by Swayamdipta et al. (2017) for the target identification model in terms of the \(F_{1}\) score: TargetId – Verbs
Table 18 The performance of the frame-semantic parser by Swayamdipta et al. (2017) for the target identification model in terms of the \(F_{1}\) score: TargetId – Nouns
Table 19 The performance of the frame-semantic parser by Swayamdipta et al. (2017) for the frame identification model in terms of the \(F_{1}\) score: FrameId – Verbs
Table 20 The performance of the frame-semantic parser by Swayamdipta et al. (2017) for the frame identification model in terms of the \(F_{1}\) score: FrameId – Nouns

Tables 17 and 18 summarize the results of the target identification models, and Tables 19 and 20 summarize the performance of the frame identification models for the training datasets reported in Tables 14 and 15, respectively. For the AnnotationPerSentence-Nouns dataset, the TargetId model improved in multiple settings for BERT, with the highest gain where \(30\%\) of nouns were expanded (\(F_{1}\) \(=41.96\)). For the AnnotationPerSentence-Verbs dataset, it also scored better in multiple settings, with the highest gain where \(50\%\) of nouns were expanded (\(F_{1}\) \(=61.67\)). However, these differences in mean scores are not statistically significant (p > 0.01). We assume that the base dataset already contains sufficient examples per target on average, so further expansion does not help and even decreases performance in some cases. The high standard deviation also suggests that the original hyperparameters, such as the learning rate and dropout rate, are less than optimal for these datasets and should be tuned before drawing final conclusions. For the FrameId model, performance did not improve for any dataset augmented from AnnotationPerSentence-Verbs. For the datasets augmented from AnnotationPerSentence-Nouns, it is better in multiple cases for both BERT and XLNet+embs, but not statistically significantly so. The datasets where the lexical unit is not augmented performed better than those where it was augmented; in the latter case, \(F_{1}\) decreased. This is most probably because the augmented datasets only add new targets with the same frame: since no new frame is added to the training data, each new target receives just one example of its frame, which affects performance negatively. This drop in performance is statistically significant. Contrary to target identification, the standard deviation remained low (less than 1.5) for all models, which suggests that the hyperparameters are good enough to yield robust results.

The results for the ArgId model are reported in Tables 21 and 22. The \(F_{1}\) of the model on the augmented datasets improves for many of the configurations. For the verbs dataset, the highest \(F_{1}\) score is achieved with the expansion configurations lexical unit-roles-nouns-30pc for BERT (\(F_{1}\) = 50.00) and nouns-30pc for XLNet+embs (\(F_{1}\) = 49.47). For the nouns dataset, the highest \(F_{1}\) score is achieved with lexical unit-roles-nouns-50pc for both BERT (\(F_{1}\) = 65.10) and XLNet+embs (\(F_{1}\) = 65.31). The difference in performance on the augmented datasets is also statistically supported, with p values below 0.01, particularly for the datasets where all three types of words were augmented. Overall, expansion configurations comprising nouns performed better, as they produce more diverse sentences for training than the other configurations.

Table 21 The performance of the frame-semantic parser by Swayamdipta et al. (2017) for argument identification and labeling in terms of the \(F_{1}\) score: ArgId – Verbs
Table 22 The performance of the frame-semantic parser by Swayamdipta et al. (2017) for argument identification and labeling in terms of the \(F_{1}\) score: ArgId – Nouns

The negative results for target and frame identification indicate that using data augmentation to generate more training data is not always useful; it depends on the nature of the data and the task itself. Since we sample data per sentence, which is better suited to argument identification because each sentence occurs once in the seed dataset, this strategy does not seem useful for frame and target identification, where the seed data already contains a sufficient average number of annotations per instance (see Tables 14, 15). Had the data instead been sampled per frame and target, augmentation would have helped these tasks, which is what we observed in our initial set of experiments: with that sampling strategy, frame and target identification does benefit from data augmentation during frame parsing. Dementieva et al. (2020) reported similar findings for the task of propaganda detection. Similar to our choice of word types, Dementieva et al. (2020) augmented nouns, adjectives, adverbs, and verbs using GloVe, fastText, and BERT as substitution models to generate more training sentences. Their experiments with many different settings showed a slight shift in precision and recall, while the \(F_{1}\) score did not improve except very slightly in two cases. Another work, by Fenogenova (2021), used a fine-tuned mT5 (Xue et al., 2021) model for paraphrasing to generate augmented data for sentiment analysis, textual entailment, and question answering in Russian. They likewise reported that for all three tasks the model performance remained nearly the same with the original and the augmented training datasets.

5.3.1 Effect of train dataset size over model performance

To further validate the performance of all models against any bias in the seed dataset construction, and to see the effect of seed dataset size on model performance, we trained the two best models on multiple seed datasets. All seed datasets were constructed by randomly sampling N% of the training examples from the verbs and nouns datasets, with N set to 10, 20, 30, 40, 50, and 100. Each seed dataset was further augmented into two datasets using BERT and XLNet+embs as lexical substitution models. The two best expansion configurations were selected: lexical unit-roles-nouns-50pc and nouns-50pc. Models trained on the seed datasets use the same number of epochs as discussed in Sect. 5.3. For each model trained on an augmented dataset, the number of training steps was determined by the size of the corresponding seed dataset and model. As in the previous experiments, each experiment was run 10 times with different random seeds to obtain the mean and standard deviation for the curve.
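A hedged sketch of this learning-curve protocol is given below; `augment` and `train_and_eval` are hypothetical placeholders for the expansion step and the parser training/evaluation routine.

```python
# Sketch: for each sample size, draw N% of the train set, build its augmented
# counterpart, and train both under the same step budget derived from the seed size.
import random

SAMPLE_SIZES = [0.10, 0.20, 0.30, 0.40, 0.50, 1.00]
RUNS = 10

def run_learning_curve(train_annotations, epochs, augment, train_and_eval):
    results = {}
    for frac in SAMPLE_SIZES:
        scores_seed, scores_aug = [], []
        for run in range(RUNS):
            rng = random.Random(run)
            seed_subset = rng.sample(train_annotations,
                                     int(frac * len(train_annotations)))
            steps = epochs * len(seed_subset)      # same budget for both conditions
            augmented = seed_subset + augment(seed_subset, k=2)
            scores_seed.append(train_and_eval(seed_subset, max_steps=steps, seed=run))
            scores_aug.append(train_and_eval(augmented, max_steps=steps, seed=run))
        results[frac] = (scores_seed, scores_aug)
    return results
```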

Fig. 4

Evaluation of lexical expansion for the ArgId model over increasing size of the seed training dataset. The shaded region represents the standard deviation based on 10 runs of the model. The x-axis is in log scale. Source dataset: Verbs

Fig. 5

Evaluation of lexical expansion for the ArgId model over increasing size of the seed training dataset. The shaded region represents standard deviation based on 10 runs of the model. The x-axis is in log scale. Source dataset: Nouns

The learning curves are shown in Figs. 4 and 5. Both augmented datasets consistently improve model performance over their seed datasets, with an average gain of 2–3% in \(F_{1}\). For the datasets sampled from verbs, the difference in model performance between the seed and augmented datasets remains consistent and statistically significant for sample sizes larger than 10%, and this holds for both models regardless of the expansion configuration. For the datasets sampled from nouns, only the expansion configuration lexical unit-roles-nouns-50pc shows consistent gains across all sample sizes. This behavior can also be observed in Table 22, where the configurations with lexical unit-roles-nouns consistently outperform the ones where only nouns were expanded, whereas for verbs both types of configuration perform well overall. This provides an interesting insight into the behavior of verb and noun predicates when choosing the optimal expansion configuration for each. We can conclude that, as opposed to targets and frames, semantic roles form a more diverse set of words and hence are an ideal candidate for augmentation when data is insufficient.

5.4 Results with the BERT-based parser

Hyperparameters: As this model was originally designed for verb-type predicates, we only report results for the verbs-based datasets here. For the seed datasets, the model was trained for 50 epochs, while for the augmented datasets it was trained for 17 epochs, so that the number of training steps is the same for both. We used BERT-large-cased with a batch size of 8 and a learning rate of 2e−5. All models were run 10 times to obtain mean and standard deviation values.

For the BERT-based parser, we present learning curves and use both BERT and XLNet+embs as lexical substitution models for the augmented datasets. The learning curves for both models are shown in Fig. 6: the first row shows the performance with gold frames and the second row the performance without gold frames. The curves confirm that lexical expansion indeed yields performance gains when the number of annotations is insufficient. However, the gain starts to diminish as the seed dataset size increases towards the right of the x-axis; this is also supported by p values that are consistently below 0.01 for sample sizes of 10 to 30%. The gain in performance shows similar patterns with and without gold frame information, although using gold frames yields significantly higher scores, with \(F_{1}\) going above 70.0 for all models and datasets, compared to predicted frames, where it remains close to 65.0. These scores are considerably higher than those of the open-SESAME parser on the same datasets and show the advantage of pre-trained Transformer models, which learn the syntax and semantics of a sentence, over syntactic features such as part-of-speech tags. There is no clear winner in the comparison of lexical substitution models: BERT and XLNet+embs perform similarly.

Fig. 6

Evaluation of lexical expansion for the BERT-based semantic role parser for the ArgId model over increasing size of the seed training dataset. The first row shows the performance using gold frames and the second row shows the combined performance where the first step is to predict the frames and then do the argument identification. The shaded region represents the standard deviation based on 10 runs of the model. The x-axis is in log scale. Source dataset: Verbs

For nouns, an extensive set of experiments could not reproduce the results obtained for verbs. In general, the augmented datasets brought no improvement, and where improvements did appear, they were not consistent and the variance between multiple runs of the model was excessive.

6 Conclusion

In this work, we performed a study of text augmentation methods for semantic frame processing based on (i) non-contextualized distributional models such as word2vec and syntax-based distributional thesauri, and (ii) contextualized lexical substitution methods based on neural language models, such as BERT and XLNet. We tested these methods in two extensive experimental setups.

In the first set of experiments, we generated lexical representations of semantic frames. We demonstrated that a single frame-annotated example can be used to bootstrap a fully-fledged lexical representation of FrameNet-style linguistic structures. Non-contextualized models proved to be strong baselines but failed to produce good substitutes for polysemous words (the same word evoking different semantic frames), whereas the contextualized models BERT and XLNet produced competitive substitutes, especially when information about the target word is injected effectively. Additionally, our experiments show that combining individual models to generate lexical substitutes can sometimes significantly improve over their individual performance.

Since automatic evaluation of lexical substitution is sensitive to the completeness of the lexical resource itself, we also manually evaluated these substitutes on small datasets to further analyse the effectiveness of our method. This evaluation showed that, on the one hand, suitable lexical substitutes are sometimes absent from the gold datasets, while on the other hand, the substitutes that are present are not always good candidates for lexical substitution, since they can alter the sentence semantics.

In our second set of experiments, we deal with two neural FrameNet parsers, by Swayamdipta et al. (2017) and Shi and Lin (2019). We demonstrate that text augmentation can be used to build more training samples from a few seed sentences, and that these new frame representations help to improve the performance of semantic parsers on the semantic role identification and labeling tasks. These experiments suggest that expansion of roles (usually represented by nouns and noun phrases) and of other nouns occurring in the text significantly improves the performance of semantic parsing, while expansion of verbs, which is arguably harder because verbs do not have as many close co-hyponyms and synonyms, does not improve parsing results.

Overall, our results suggest that augmentation of lexical units can be of great use (i) for expanding lexical representations of semantic frames and (ii) for building semantic parsers that perform role identification in text, especially in situations where the number of training texts is small.

7 Future work

Going forward, we can expect further improvements from large foundation models like T5 (Raffel et al., 2020), BART (Lewis et al., 2020), FlanT5 (Longpre et al., 2023) and other pre-trained seq2seq Transformers, especially those pre-trained with multi-word masking objectives, which could help restore multiword expressions accurately. Experimenting with further contextualized lexical substitution methods, such as nPIC/PIC (Roller & Erk, 2016), may also yield improvements in the combined methods.

While large pre-trained language models are increasingly able to perform tasks in an end-to-end fashion, seemingly removing the need to explicitly expand lexical-semantic resources for natural language understanding and generation tasks, there are still fields where lexical resources, with examples and their sources, are key to answering research questions or to productive work, e.g. the study of the structure of semantics, the creation of dictionaries, and controlled experiments in psycholinguistics and other fields. With our automatic expansion approach, we provide a method to aid the quicker development of such lexical resources in these situations, especially for under-resourced languages and domains.