A multilingual FrameNet-based grammar and lexicon for controlled natural language


Berkeley FrameNet is a lexico-semantic resource for English based on the theory of frame semantics. It has been exploited in a range of natural language processing applications and has inspired the development of framenets for many languages. We present a methodological approach to the extraction and generation of a computational multilingual FrameNet-based grammar and lexicon. The approach leverages FrameNet-annotated corpora to automatically extract a set of cross-lingual semantico-syntactic valence patterns. Based on data from Berkeley FrameNet and Swedish FrameNet, the proposed approach has been implemented in Grammatical Framework (GF), a categorial grammar formalism specialized for multilingual grammars. The implementation of the grammar and lexicon is supported by the design of FrameNet, which provides a frame semantic abstraction layer, an interlingual semantic application programming interface (API), over the interlingual syntactic API already provided by the GF Resource Grammar Library. The evaluation of the acquired grammar and lexicon shows the feasibility of the approach. Additionally, we illustrate how the FrameNet-based grammar and lexicon are exploited in two distinct multilingual controlled natural language applications. The produced resources are available under an open source license.

    https://github.com/GrammaticalFramework/gf-contrib/tree/master/framenet (the acquired grammar and lexicon; version 0.9.7 at the time of writing).

    Where \({\mathsf {Adv}}\) is a VP-modifying adverb, \({\mathsf {S}}\) is an embedded declarative sentence, and \({\mathsf {QS}}\) is an embedded question.

    Where \({\mathsf {V}}\) is a one-place verb, \({\mathsf {V2}}\) is a two-place verb, \({\mathsf {V3}}\) is a three-place verb, \({\mathsf {VV}}\) is a \({\mathsf {VP}}\)-complement verb, \({\mathsf {VS}}\) is an \({\mathsf {S}}\)-complement verb, \({\mathsf {VQ}}\) is a \({\mathsf {QS}}\)-complement verb, \({\mathsf {V2V}}\) is a verb with \({\mathsf {NP}}\) and \({\mathsf {VP}}\) complements, \({\mathsf {V2S}}\) is a verb with \({\mathsf {NP}}\) and \({\mathsf {S}}\) complements, and \({\mathsf {V2Q}}\) is a verb with \({\mathsf {NP}}\) and \({\mathsf {QS}}\) complements.
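
    The verb categories listed in this note can be read as a lookup from complement signature to category. The following is a minimal sketch in Python, not part of the published resource: the table name, the function name, and the assumption that the non-subject places of V2 and V3 are NP-typed are illustrative choices.

```python
# Illustrative table (an assumption, not the published grammar):
# verb category -> complement categories taken in addition to the subject.
VERB_COMPLEMENTS = {
    "V": (),
    "V2": ("NP",),
    "V3": ("NP", "NP"),
    "VV": ("VP",),
    "VS": ("S",),
    "VQ": ("QS",),
    "V2V": ("NP", "VP"),
    "V2S": ("NP", "S"),
    "V2Q": ("NP", "QS"),
}


def verb_category(complements):
    """Return the category whose complement signature matches, or None."""
    wanted = tuple(complements)
    for category, signature in VERB_COMPLEMENTS.items():
        if signature == wanted:
            return category
    return None
```

    For instance, a verb observed with an NP complement followed by an embedded declarative sentence would be looked up as `verb_category(["NP", "S"])`.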

    SweFN tags are described at http://stp.lingfil.uu.se/~nivre/swedish_treebank/.

    Additionally, more than 100 examples are skipped in both corpora due to inconsistent semantico-syntactic annotations that were not fixed by the current heuristics.

    If repeated FEs are of different RGL types, the whole example is currently skipped.
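
    The skipping rule in this note can be sketched as a simple consistency check. This is a hedged illustration in Python, not the authors' extraction code: the function name and the representation of an example as (FE name, RGL type) pairs are assumptions.

```python
def has_conflicting_repeats(frame_elements):
    """Return True if some FE name occurs with more than one RGL type.

    frame_elements: iterable of (fe_name, rgl_type) pairs taken from a
    single annotated example (hypothetical representation).
    """
    first_type = {}
    for name, rgl_type in frame_elements:
        # Remember the first type seen for each FE; any later mismatch
        # means the whole example would be skipped.
        if first_type.setdefault(name, rgl_type) != rgl_type:
            return True
    return False
```

    Under this sketch, an example in which the same FE is annotated once as \({\mathsf {Adv}}\) and once as \({\mathsf {NP}}\) would be skipped, while repeated FEs of the same type would be kept.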

    Taking into account the grammatical types and relations.

    It is often practically impossible, or at least uncommon, for all core FEs to be used in the same sentence. For instance, Area is mutually exclusive with five other core FEs in the frame Motion, and these five other \({\mathsf {Adv}}\)-typed FEs are normally not all used together.

    E.g. in a highly inflected language.

    \({\mathsf {Desiring}}\_{\mathsf {V2}}\_{\mathsf {Pass}}\) is included for illustration but is not directly acquired from a shared pattern. Missing passive or active voice patterns could be acquired implicitly by deriving them from the corresponding active or passive voice patterns. However, for now we strictly follow the corpus evidence.

    The order of \({{{\mathsf {Adv}}}}\) complements is based on the most frequent sentence pattern.

    A missing subject FE, however, could often be inferred and added automatically.

    The RGL modules Dict L, Dictionary L, Lexicon L, Irreg L and Structural L are subject to change independently. We have used an RGL snapshot from December 2014.

This work has been supported by the Swedish Research Council under Grant No. 2012-5746 (Reliable Multilingual Digital Communication: Methods and Applications) and by the Centre for Language Technology in Gothenburg. The research leading to these results has also received funding from the Latvian State Research Programme NexIT (Project No. 1).

  • FrameNet
  • Grammatical Framework
  • Multilinguality
  • Natural language generation
  • Controlled natural language