A multilingual FrameNet-based grammar and lexicon for controlled natural language

Abstract

Berkeley FrameNet is a lexico-semantic resource for English based on the theory of frame semantics. It has been exploited in a range of natural language processing applications and has inspired the development of framenets for many languages. We present a methodological approach to the extraction and generation of a computational multilingual FrameNet-based grammar and lexicon. The approach leverages FrameNet-annotated corpora to automatically extract a set of cross-lingual semantico-syntactic valence patterns. Based on data from Berkeley FrameNet and Swedish FrameNet, the proposed approach has been implemented in Grammatical Framework (GF), a categorial grammar formalism specialized for multilingual grammars. The implementation of the grammar and lexicon is supported by the design of FrameNet, providing a frame semantic abstraction layer, an interlingual semantic application programming interface (API), over the interlingual syntactic API already provided by the GF Resource Grammar Library. The evaluation of the acquired grammar and lexicon shows the feasibility of the approach. Additionally, we illustrate how the FrameNet-based grammar and lexicon are exploited in two distinct multilingual controlled natural language applications. The produced resources are available under an open-source license.
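
As a rough illustration of what extracting cross-lingual valence patterns could look like, the following Python sketch keeps only those patterns that are attested in both of two annotated corpora. The pattern representation (a frame name plus a tuple of frame element and syntactic type pairs), the toy data, and the names `count_patterns` and `shared_patterns` are hypothetical; they are not taken from the authors' implementation.

```python
from collections import Counter


def count_patterns(examples):
    """Count how often each (frame, valence) pattern occurs in a corpus."""
    return Counter((frame, tuple(fes)) for frame, fes in examples)


def shared_patterns(corpus_a, corpus_b):
    """Return patterns attested in both corpora with their combined
    frequency, most frequent first (ties broken alphabetically)."""
    a, b = count_patterns(corpus_a), count_patterns(corpus_b)
    common = {p: a[p] + b[p] for p in a.keys() & b.keys()}
    return sorted(common.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for illustration only.
english = [
    ("Desiring", [("Experiencer", "NP"), ("Event", "VP")]),
    ("Desiring", [("Experiencer", "NP"), ("Focal_participant", "NP")]),
    ("Desiring", [("Experiencer", "NP"), ("Focal_participant", "NP")]),
]
swedish = [
    ("Desiring", [("Experiencer", "NP"), ("Focal_participant", "NP")]),
    ("Desiring", [("Experiencer", "NP"), ("Focal_participant", "Adv")]),
]

result = shared_patterns(english, swedish)
```

Here `result` contains only the pattern that occurs in both toy corpora; patterns seen in a single language are discarded, which mirrors the idea of a shared, language-independent pattern set.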

Notes

  1. https://framenet.icsi.berkeley.edu/fndrupal/framenet_data.

  2. http://remu.grammaticalframework.org/framenet/SweFN_2014-12-03.zip.

  3. https://framenet.icsi.berkeley.edu/.

  4. http://spraakbanken.gu.se/swefn/.

  5. http://spraakbanken.gu.se/saldo.

  6. http://spraakbanken.gu.se/swe/resurs/wordnet-saldo.

  7. http://spraakbanken.gu.se/eng/korp-info.

  8. http://www.grammaticalframework.org/lib/doc/synopsis.html.

  9. https://github.com/GrammaticalFramework/gf-contrib/tree/master/framenet (the acquired grammar and lexicon; version 0.9.7 at the time of writing).

  10. http://www.grammaticalframework.org/framenet/.

  11. Where \(\mathsf{Adv}\) is a VP-modifying adverb, \(\mathsf{S}\) is an embedded declarative sentence, and \(\mathsf{QS}\) is an embedded question.

  12. Where \(\mathsf{V}\) is a one-place verb, \(\mathsf{V2}\) is a two-place verb, \(\mathsf{V3}\) is a three-place verb, \(\mathsf{VV}\) is a \(\mathsf{VP}\)-complement verb, \(\mathsf{VS}\) is an \(\mathsf{S}\)-complement verb, \(\mathsf{VQ}\) is a \(\mathsf{QS}\)-complement verb, \(\mathsf{V2V}\) is a verb with \(\mathsf{NP}\) and \(\mathsf{VP}\) complements, \(\mathsf{V2S}\) is a verb with \(\mathsf{NP}\) and \(\mathsf{S}\) complements, and \(\mathsf{V2Q}\) is a verb with \(\mathsf{NP}\) and \(\mathsf{QS}\) complements.

  13. http://universaldependencies.github.io/docs/u/feat/Voice.html.

  14. SweFN tags are described at http://stp.lingfil.uu.se/~nivre/swedish_treebank/.

  15. Additionally, more than 100 examples are skipped in both corpora due to inconsistent semantico-syntactic annotations that were not fixed by the current heuristics.

  16. If repeated FEs are of different RGL types, the whole example is currently skipped.

  17. Taking into account the grammatical types and relations.

  18. It is often practically impossible or uncommon for all core FEs to be used in the same sentence. For instance, Area is mutually exclusive with five other core FEs in the frame Motion, and these five other \(\mathsf{Adv}\)-typed FEs are normally not all used together.

  19. E.g., in a highly inflected language.

  20. \(\mathsf{Desiring\_V2\_Pass}\) is included for illustration but is not directly acquired from a shared pattern. Missing passive or active voice patterns could be acquired implicitly by deriving them from the corresponding active or passive voice patterns. However, for now we are strictly following the corpus evidence.

  21. http://www.grammaticalframework.org/lib/doc/synopsis.html.

  22. The order of \(\mathsf{Adv}\) complements is based on the most frequent sentence pattern.

  23. A missing subject FE, however, could often be inferred and added automatically.

  24. The RGL modules Dict L, Dictionary L, Lexicon L, Irreg L and Structural L are subject to change independently. We have used an RGL snapshot of December 2014.

  25. http://universaldependencies.github.io/docs/u/pos/.

  26. http://universaldependencies.github.io/docs/u/feat/.

  27. http://www.molto-project.eu/.

  28. http://www.grammaticalframework.org/demos/phrasebook/.

  29. http://museum.ontotext.com/.
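
The verb categories listed in Note 12 can be read as a lookup from a verb's complement types to a category name. The Python sketch below illustrates that reading only. The table and the function `verb_category` are hypothetical, and the sketch assumes that two-place and three-place verbs take one and two \(\mathsf{NP}\) complements respectively, which the note does not state explicitly.

```python
# Complement signatures per verb category, following Note 12.
# Assumption: V2 takes one NP complement and V3 takes two.
VERB_CATEGORIES = {
    (): "V",
    ("NP",): "V2",
    ("NP", "NP"): "V3",
    ("VP",): "VV",
    ("S",): "VS",
    ("QS",): "VQ",
    ("NP", "VP"): "V2V",
    ("NP", "S"): "V2S",
    ("NP", "QS"): "V2Q",
}


def verb_category(complements):
    """Return the verb category for a sequence of complement types,
    or None if the combination is not covered by the table."""
    return VERB_CATEGORIES.get(tuple(complements))
```

For example, `verb_category(["NP", "S"])` yields `"V2S"`, while a combination outside the table, such as two \(\mathsf{S}\) complements, yields `None`.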


Acknowledgments

This work has been supported by the Swedish Research Council under Grant No. 2012-5746 (Reliable Multilingual Digital Communication: Methods and Applications) and by the Centre for Language Technology in Gothenburg. The research leading to these results has also received funding from the Latvian State Research Programme NexIT (Project No. 1).

Author information

Corresponding author

Correspondence to Normunds Gruzitis.

About this article

Cite this article

Gruzitis, N., Dannélls, D. A multilingual FrameNet-based grammar and lexicon for controlled natural language. Lang Resources & Evaluation 51, 37–66 (2017). https://doi.org/10.1007/s10579-015-9321-8

Keywords

  • FrameNet
  • Grammatical Framework
  • Multilinguality
  • Natural language generation
  • Controlled natural language