
Verb Sense Annotation for Turkish PropBank via Crowdsourcing

  • Gözde Gül Şahin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9623)

Abstract

To extract meaning representations from sentences, a corpus annotated with semantic roles is required. Unfortunately, building such a corpus requires a tremendous amount of manual work, both for creating semantic frames and for annotating the corpus. We therefore divide the annotation task into two microtasks, verb sense annotation and argument annotation, and employ crowd intelligence to perform them. In this paper, we present our approach to crowdsourcing the verb sense disambiguation task, discuss the challenges involved, and introduce the resulting resource of 5855 annotated verb senses with 83.15% annotator agreement.
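As a rough illustration of how crowd judgments on a verb sense microtask might be combined and scored, the sketch below aggregates labels by majority vote and reports the mean share of annotators who agree with the majority. This is an assumption made for illustration only: the abstract does not state how the 83.15% agreement figure was computed, and the instance ids, sense labels, and function names here are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label and the share of annotators who chose it.

    Ties are resolved in favour of the label seen first; a real pipeline
    would likely send tied instances to an expert instead.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def aggregate(judgments):
    """Map each instance id to (majority sense, agreement share).

    judgments: dict of instance id -> list of sense labels, one per annotator.
    """
    return {
        instance_id: majority_label(labels)
        for instance_id, labels in judgments.items()
    }


def mean_agreement(results):
    """Average, over instances, of the share of annotators matching the majority."""
    return sum(share for _, share in results.values()) / len(results)


# Hypothetical crowd judgments for two verb instances (not data from the paper).
judgments = {
    "sent1:al": ["al.01", "al.01", "al.02"],
    "sent2:ver": ["ver.01", "ver.01", "ver.01"],
}

results = aggregate(judgments)
print(results)
print(f"mean agreement: {mean_agreement(results):.2%}")
```

Majority voting is only one possible aggregation scheme; chance-corrected measures such as Cohen's or Fleiss' kappa are common alternatives when annotators differ in reliability.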

Keywords

Turkish PropBank; Verb sense disambiguation; Crowdsourcing; Semantic annotation


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
