Semantic Annotation of MASC

Chapter in: Ide, N., Pustejovsky, J. (eds.) Handbook of Linguistic Annotation. Springer, Dordrecht (2017)

Abstract

Word Sense Disambiguation (WSD) continues to present a formidable challenge for Natural Language Processing. To better perform automatic WSD, manually annotated corpora are created that serve as training and testing data. When the annotation labels are drawn from an independently created lexical resource, there is the added benefit of checking the resource's lexical inventory and sense representations against the corpus data; the resulting corrections can in turn benefit future manual and automatic annotation. We report on the annotation of a number of selected word forms of different parts of speech in the MASC corpus with WordNet senses. Analyses of the annotations reveal good annotator agreement for half of the lemmas but low agreement for the other half, with no obvious indications of the reasons. Through crowdsourcing, however, we had many annotators assign labels to each word instead of collecting a single label, creating a corpus in which a single ground-truth label per sentence, along with a confidence, can be inferred from the many labels. Even for words with low agreement, many of the instances have confident labels. In a complementary effort, 100 of the MASC sentences with WordNet-annotated lemmas were fully annotated with FrameNet lexical units and Frame Elements. This allowed for the comparison between, and alignment of, the WordNet and FrameNet senses for the chosen lemmas. We reflect on the fundamental design differences between these two complementary resources and their respective contributions to WSD. The MASC word sense annotation effort has demonstrated that it is possible to collect reliable manual annotations of moderately polysemous words, even though we do not yet know what makes this possible for some words and not others. The corpus can therefore serve as a valuable resource for investigating this question.
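
The abstract's point about inferring a single ground-truth sense per sentence, with a confidence, from many crowdsourced labels is in the spirit of the observer-error model of Dawid and Skene [6] and the annotation model discussed in [25]. The following is a minimal Python sketch of that EM procedure, not the chapter's actual pipeline; the function name, data layout, and smoothing constant are our own illustrative assumptions.

    import numpy as np

    def dawid_skene(triples, n_items, n_annotators, n_classes, n_iter=100):
        """EM for a Dawid & Skene (1979)-style annotation model.

        triples: list of (item, annotator, label) integer triples.
        Returns per-item posteriors over labels, label priors, and
        per-annotator confusion matrices.
        """
        # Initialise posteriors from raw vote proportions
        # (assumes every item has at least one label).
        post = np.zeros((n_items, n_classes))
        for i, a, l in triples:
            post[i, l] += 1.0
        post /= post.sum(axis=1, keepdims=True)

        for _ in range(n_iter):
            # M-step: label priors and per-annotator confusion matrices,
            # lightly smoothed to avoid log(0).
            priors = post.mean(axis=0) + 1e-6
            priors /= priors.sum()
            conf = np.full((n_annotators, n_classes, n_classes), 1e-6)
            for i, a, l in triples:
                conf[a, :, l] += post[i]
            conf /= conf.sum(axis=2, keepdims=True)

            # E-step: posterior over each item's true label.
            log_post = np.tile(np.log(priors), (n_items, 1))
            for i, a, l in triples:
                log_post[i] += np.log(conf[a, :, l])
            log_post -= log_post.max(axis=1, keepdims=True)
            post = np.exp(log_post)
            post /= post.sum(axis=1, keepdims=True)

        return post, priors, conf

    # Hypothetical usage: inferred sense = post.argmax(axis=1),
    # per-sentence confidence = post.max(axis=1).

Under this sketch, the inferred sense for each sentence is the argmax of its posterior row and the reported confidence is the corresponding maximum probability; the published analyses rely on the authors' own models and scripts rather than this code.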


Notes

  1. FrameNet does not include the mathematical sense, equivalent to orthogonal.

  2. The annotation process has been described in detail in several publications. The text for this section is drawn from [27].

  3. A preliminary version of the same table appeared in [27] prior to completion of the corpus.

  4. The α scores and confidence intervals are produced with Ron Artstein's script, calculate-alpha.perl, which is distributed with the word sense sentence corpus.
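
Note 4 points to Ron Artstein's calculate-alpha.perl for the α scores. For readers who want to reproduce the statistic without that script, the following is a minimal Python sketch of Krippendorff's α for nominal labels, the agreement coefficient discussed in [1]; the function name and input layout are illustrative assumptions, not part of the distributed corpus tooling.

    from collections import Counter
    from itertools import permutations

    def krippendorff_alpha_nominal(units):
        """Krippendorff's alpha for nominal labels.

        units: list of lists; each inner list holds the labels assigned
        to one instance (one per annotator, missing labels simply omitted).
        Assumes at least two distinct labels occur across the data.
        """
        # Coincidence counts: each ordered label pair within an instance
        # contributes 1 / (m - 1), where m is that instance's label count.
        coincidences = Counter()
        for labels in units:
            m = len(labels)
            if m < 2:          # singly-labelled instances carry no pairings
                continue
            for a, b in permutations(labels, 2):
                coincidences[(a, b)] += 1.0 / (m - 1)

        marginals = Counter()
        for (a, _b), weight in coincidences.items():
            marginals[a] += weight
        n = sum(marginals.values())

        observed = sum(w for (a, b), w in coincidences.items() if a != b)
        expected = sum(marginals[a] * marginals[b]
                       for a in marginals for b in marginals if a != b) / (n - 1)
        return 1.0 - observed / expected

    # Hypothetical usage:
    # krippendorff_alpha_nominal([["sense1", "sense1", "sense2"], ["sense1", "sense1"]])

This reproduces the standard coincidence-matrix formulation of nominal α; the published scores and confidence intervals, however, come from the distributed Perl script, not from this sketch.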

References

  1. Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Ling. 34(4), 555–596 (2008)

  2. Baker, C.F., Fellbaum, C.: WordNet and FrameNet as complementary resources for annotation. In: Proceedings of the Third Linguistic Annotation Workshop, pp. 125–129. Association for Computational Linguistics, Suntec, Singapore (2009). http://www.aclweb.org/anthology/W/W09/W09-3021

  3. Chow, I.C., Webster, J.J.: Integration of linguistic resources for verb classification: FrameNet, WordNet, VerbNet, and Suggested Upper Merged Ontology. In: Proceedings of CICLing, pp. 1–11 (2007)

  4. Clark, P., Fellbaum, C., Hobbs, J.R., Harrison, P., Murray, W.R., Thompson, J.: Augmenting WordNet for deep understanding of text. In: Proceedings of the 2008 Conference on Semantics in Text Processing, STEP ’08, pp. 45–57. Association for Computational Linguistics, Stroudsburg (2008). http://dl.acm.org/citation.cfm?id=1626481.1626486

  5. Coppola, B., Moschitti, A., Tonelli, S., Riccardi, G.: Automatic FrameNet-based annotation of conversational speech. In: Proceedings of IEEE-SLT 2008, Goa, pp. 73–76 (2008)

  6. Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28(1), 20–28 (1979)

  7. De Cao, D., Croce, D., Basili, R.: Extensive evaluation of a FrameNet-WordNet mapping resource. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10). European Language Resources Association (ELRA), Valletta, Malta (2010)

  8. de Melo, G., Baker, C.F., Ide, N., Passonneau, R.J., Fellbaum, C.: Empirical comparisons of MASC word sense annotations. In: Calzolari, N., Choukri, K., Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey (2012). http://www.icsi.berkeley.edu/pubs/ai/empiricalcomparisons12.pdf

  9. Erk, K., Padó, S.: Analysing models for semantic role assignment using confusability. In: Proceedings of HLT/EMNLP-05, Vancouver, Canada (2005)

  10. Fellbaum, C. (ed.): WordNet. An Electronic Lexical Database. MIT Press, Cambridge (1998)

  11. Fellbaum, C., Grabowski, J., Landes, S., et al.: Analysis of a hand-tagging task. In: Proceedings of ANLP-97 Workshop on Tagging Text with Lexical Semantics: Why, What, and How (1997)

  12. Fellbaum, C., Grabowski, J., Landes, S.: Performance and confidence in a semantic annotation task. WordNet: An Electronic Lexical Database, pp. 217–239. MIT Press, Cambridge (1998)

  13. Ferrández, O., Ellsworth, M., Muñoz, R., Baker, C.F.: Aligning FrameNet and WordNet based on semantic neighborhoods. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10), pp. 310–314. European Language Resources Association (ELRA), Valletta, Malta (2010)

  14. Fillmore, C.J.: Scenes-and-frames semantics. In: Zampolli, A. (ed.) Linguistic Structures Processing in Fundamental Studies in Computer Science, vol. 59. North Holland Publishing, Netherlands (1977)

  15. Fillmore, C.J.: Frame semantics. Linguistics in the Morning Calm, pp. 111–137. Hanshin Publishing Co., South Korea (1982)

  16. Fillmore, C.J., Baker, C.F.: A frames approach to semantic analysis. In: Heine, B., Narrog, H. (eds.) Oxford Handbook of Linguistic Analysis, pp. 313–341. Oxford University Press, Oxford (2010)

  17. Ide, N., Reppen, R., Suderman, K.: The American National Corpus: more than the web can provide. In: Proceedings of the Third Language Resources and Evaluation Conference (LREC), pp. 839–844, Las Palmas, Canary Islands, Spain (2002). http://americannationalcorpus.org/pubs.html

  18. Ide, N., Baker, C., Fellbaum, C., Fillmore, C., Passonneau, R.: MASC: the Manually Annotated Sub-Corpus of American English. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC), Morocco (2008)

  19. Johansson, R., Nugues, P.: LTH: Semantic structure extraction using nonprojective dependency trees. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pp. 227–230. Association for Computational Linguistics, Prague, Czech Republic (2007). http://www.aclweb.org/anthology/W/W07/W07-2048

  20. Kratzer, A.: Stage level and individual level predicates. In: Carlson, G., Pelletier, F.J. (eds.) The Generic Book. The University of Chicago Press, Chicago (1995)

  21. Kučera, H., Francis, W.N.: Computational Analysis of Present-day American English. Brown University Press, Providence (1967)

  22. Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago (1993). http://www-personal.umich.edu/~jlawler/levin.html

  23. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995). doi:10.1145/219717.219748

  24. Miller, G.A., Chodorow, M., Landes, S., Leacock, C., Thomas, R.G.: Using a semantic concordance for sense identification. In: Proceedings of the Workshop on Human Language Technology, HLT ’94, pp. 240–243. Association for Computational Linguistics, Stroudsburg (1994). doi:10.3115/1075812.1075866

  25. Passonneau, R.J., Carpenter, B.: The benefits of a model of annotation. In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 187–195. Association for Computational Linguistics, Sofia, Bulgaria (2013). http://www.aclweb.org/anthology/W13-2323

  26. Passonneau, R.J., Habash, N., Rambow, O.: Inter-annotator agreement on a multilingual semantic annotation task. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), Genoa, Italy, pp. 1951–1956 (2006)

  27. Passonneau, R.J., Baker, C., Fellbaum, C., Ide, N.: The MASC word sense sentence corpus. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC) (2012)

  28. Passonneau, R.J., Bhardwaj, V., Salleb-Aouissi, A., Ide, N.: Multiplicity and word sense: evaluating and learning from multiply labeled word sense annotations. Lang. Resour. Eval. 46(2), 219–252 (2012). doi:10.1007/s10579-012-9188-x

  29. Poesio, M., Artstein, R.: The reliability of anaphoric annotation, reconsidered: Taking ambiguity into account. In: Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pp. 76–83 (2005)

  30. Pradhan, S., Loper, E., Dligach, D., Palmer, M.: SemEval-2007 Task 17: English lexical sample, SRL and all words. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pp. 87–92. Association for Computational Linguistics, Prague, Czech Republic (2007). http://www.aclweb.org/anthology/W/W07/W07-2016

Author information

Correspondence to Collin Baker.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Baker, C., Fellbaum, C., Passonneau, R.J. (2017). Semantic Annotation of MASC. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_25

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_25

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences, Social Sciences (R0)
