Syntactic-Semantic Classes of Context-Sensitive Synonyms Based on a Bilingual Corpus

  • Conference paper
Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2017)

Abstract

This paper summarizes the findings of a three-year study of verb synonymy in translation, based on both syntactic and semantic criteria, and reports on recent results extending this work. The primary language resources used are existing Czech and English lexical and corpus resources, namely the Prague Dependency Treebank-style valency lexicons, FrameNet, VerbNet, PropBank, WordNet, and the parallel Prague Czech-English Dependency Treebank, which contains deep syntactic and partially semantic annotation of running texts. The resulting lexicon (formerly called CzEngClass, now SynSemClass) and all associated resources from this project, linked to the existing lexicons and corpora, are publicly and freely available. While the project proper relies on manual annotation, we expect the resulting resource (together with the existing ones) to be a necessary basis for developing automatic methods for extending such a lexicon or for creating similar lexicons for other languages.


Notes

  1. https://lindat.mff.cuni.cz/services/PDT-Vallex/.

  2. http://www.ldc.upenn.edu/LDC2006T01.

  3. http://hdl.handle.net/11858/00-097C-0000-0023-4337-2.

  4. PCEDT 2.0 is available from the LINDAT/CLARIAH-CZ repository at http://hdl.handle.net/11858/00-097C-0000-0015-8DAF-4.

  5. https://framenet.icsi.berkeley.edu.

  6. OntoNotes Sense Groupings are also viewable at the Unified Verb Index Reference Page at http://clear.colorado.edu/compsem/index.php?page=lexicalresources&sub=ontonotes.

  7. http://verbs.colorado.edu/~mpalmer/projects/verbnet.html.

  8. http://propbank.github.io/.

  9. PropBank's semantic roles are not the same as the SynSemClass lexicon roles.

  10. https://verbs.colorado.edu/semlink/.

  11. http://verbs.colorado.edu/verb-index/.

  12. https://wordnet.princeton.edu/.

  13. http://hdl.handle.net/11858/00-097C-0000-0001-4880-3.

  14. http://corpus.byu.edu/coca/.

  15. https://www.sketchengine.co.uk/.

  16. http://lindat.mff.cuni.cz/services/kontext/.

  17. For the time being, bilingual: in Czech and English.

  18. The term “sense” is used here for the differentiation of a single verb lexeme (“word”) into one or more senses, represented technically by its valency frame ID, as it is done in the original valency lexicons (PDT-Vallex and EngVallex).

  19. https://verbs.colorado.edu/verb-index.

  20. Using the Unified Verb Index, http://verbs.colorado.edu/verb-index.

  21. With possible modifications in the subsequent two steps.

  22. Using FrameNet v1.7, there are 1,168 different FE labels available across all frames. Later, VerbNet’s thematic roles will be compared with the selected FEs and a common set used, provided a suitable common theoretical framework can be found.

  23. We can only speculate why the translator has used znamenat here; possibly because literal translations of hold are awkward in Czech (in this context), and the translator also determined that in fact the semantics of hold is already contained in the phrase previous year’s level, and thus a translation of a hyperonym of hold can be used instead.

  24. https://framenet2.icsi.berkeley.edu/fnReports/data/frameIndex.xml?frame=Statement.

  25. In fact, the “Topic” part should be annotated within the information structure “layer” (topic/focus), not using semantic roles.

  26. Later investigation, as well as testing the addition of new verbs into this class, however revealed that most of the verbs should have had only three valency slots in EngVallex: ACT, PAT and ADDR, where PAT corresponds to either “payment” or “obligation”, but not both. In addition, the EFF does not seem to be core for the “reimbursement” concept. Therefore the Roleset has been reduced (or, generalized) to three SRs only, namely Payee (mapped to ADDR), Obligation_Payment (mapped to PAT or EXT) and Payer (mapped to ACT). In any case, this is still an example of a prevailing 1:1 valency slot:SR mapping.

  27. http://hdl.handle.net/11234/1-3215; previous version was available as SynSemClass 1.0, http://hdl.handle.net/11234/1-3125.

  28. http://hdl.handle.net/11234/1-3215.

Acknowledgments

This work has been supported by the grants No. GA17-07313S and GX20-16819X of the Grant Agency of the Czech Republic, and it uses resources hosted by the LINDAT/CLARIAH-CZ Research Infrastructure, project No. LM2018101, supported by the Ministry of Education, Youth and Sports of the Czech Republic.

Author information

Corresponding author

Correspondence to Jan Hajič.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Urešová, Z., Fučíková, E., Hajičová, E., Hajič, J. (2020). Syntactic-Semantic Classes of Context-Sensitive Synonyms Based on a Bilingual Corpus. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2017. Lecture Notes in Computer Science, vol 12598. Springer, Cham. https://doi.org/10.1007/978-3-030-66527-2_18

  • DOI: https://doi.org/10.1007/978-3-030-66527-2_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66526-5

  • Online ISBN: 978-3-030-66527-2

  • eBook Packages: Computer Science, Computer Science (R0)
