Semantic Annotation of MASC

Chapter in: Ide, N., Pustejovsky, J. (eds.) Handbook of Linguistic Annotation. Springer, Dordrecht (2017)

Abstract

Word Sense Disambiguation (WSD) continues to present a formidable challenge for Natural Language Processing. To better perform automatic WSD, manually annotated corpora are created that serve as training and testing data. When the annotation labels are drawn from an independently created lexical resource, there is the added benefit of checking the resource's lexical inventory and sense representations against the corpus data; the resulting corrections can in turn benefit future manual and automatic annotation. We report on the annotation of a number of selected word forms of different parts of speech in the MASC corpus with WordNet senses. Analyses of the annotations reveal good annotator agreement for half of the lemmas but low agreement for the other half, with no obvious indications of the reasons. Through crowdsourcing, however, we had many annotators assign labels to each word instead of collecting a single label, creating a corpus in which a single ground-truth label per sentence, along with a confidence, can be inferred from the many labels. Even for words with low agreement, many of the instances have confident labels. In a complementary effort, 100 of the MASC sentences with WordNet-annotated lemmas were fully annotated with FrameNet lexical units and Frame Elements. This allowed for the comparison between, and alignment of, the WordNet and FrameNet senses for the chosen lemmas. We reflect on the fundamental design differences between these two complementary resources and their respective contributions to WSD. The MASC word sense annotation effort has demonstrated that it is possible to collect reliable manual annotations of moderately polysemous words, even though we do not yet know what makes this possible for some words and not others. The corpus can therefore serve as a valuable resource for investigating this question.
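
The abstract's point about inferring a single ground-truth sense per sentence, with a confidence, from many crowdsourced labels is in the spirit of the observer-error model of Dawid and Skene [6] and the annotation model discussed in [25]. The following is a minimal Python sketch of that EM procedure, not the chapter's actual pipeline; the function name, data layout, and smoothing constant are our own illustrative assumptions.

    import numpy as np

    def dawid_skene(triples, n_items, n_annotators, n_classes, n_iter=100):
        """EM for a Dawid & Skene (1979)-style annotation model.

        triples: list of (item, annotator, label) integer triples.
        Returns per-item posteriors over labels, label priors, and
        per-annotator confusion matrices.
        """
        # Initialise posteriors from raw vote proportions
        # (assumes every item has at least one label).
        post = np.zeros((n_items, n_classes))
        for i, a, l in triples:
            post[i, l] += 1.0
        post /= post.sum(axis=1, keepdims=True)

        for _ in range(n_iter):
            # M-step: label priors and per-annotator confusion matrices,
            # lightly smoothed to avoid log(0).
            priors = post.mean(axis=0) + 1e-6
            priors /= priors.sum()
            conf = np.full((n_annotators, n_classes, n_classes), 1e-6)
            for i, a, l in triples:
                conf[a, :, l] += post[i]
            conf /= conf.sum(axis=2, keepdims=True)

            # E-step: posterior over each item's true label.
            log_post = np.tile(np.log(priors), (n_items, 1))
            for i, a, l in triples:
                log_post[i] += np.log(conf[a, :, l])
            log_post -= log_post.max(axis=1, keepdims=True)
            post = np.exp(log_post)
            post /= post.sum(axis=1, keepdims=True)

        return post, priors, conf

    # Hypothetical usage: inferred sense = post.argmax(axis=1),
    # per-sentence confidence = post.max(axis=1).

Under this sketch, the inferred sense for each sentence is the argmax of its posterior row and the reported confidence is the corresponding maximum probability; the published analyses rely on the authors' own models and scripts rather than this code.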


Notes

  1. FrameNet does not include the mathematical sense, equivalent to orthogonal.

  2. The annotation process has been described in detail in several publications. The text for this section is drawn from [27].

  3. A preliminary version of the same table appeared in [27] prior to completion of the corpus.

  4. The α scores and confidence intervals are produced with Ron Artstein's script, calculate-alpha.perl, which is distributed with the word sense sentence corpus.
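
Note 4 points to Ron Artstein's calculate-alpha.perl for the α scores. For readers who want to reproduce the statistic without that script, the following is a minimal Python sketch of Krippendorff's α for nominal labels, the agreement coefficient discussed in [1]; the function name and input layout are illustrative assumptions, not part of the distributed corpus tooling.

    from collections import Counter
    from itertools import permutations

    def krippendorff_alpha_nominal(units):
        """Krippendorff's alpha for nominal labels.

        units: list of lists; each inner list holds the labels assigned
        to one instance (one per annotator, missing labels simply omitted).
        Assumes at least two distinct labels occur across the data.
        """
        # Coincidence counts: each ordered label pair within an instance
        # contributes 1 / (m - 1), where m is that instance's label count.
        coincidences = Counter()
        for labels in units:
            m = len(labels)
            if m < 2:          # singly-labelled instances carry no pairings
                continue
            for a, b in permutations(labels, 2):
                coincidences[(a, b)] += 1.0 / (m - 1)

        marginals = Counter()
        for (a, _b), weight in coincidences.items():
            marginals[a] += weight
        n = sum(marginals.values())

        observed = sum(w for (a, b), w in coincidences.items() if a != b)
        expected = sum(marginals[a] * marginals[b]
                       for a in marginals for b in marginals if a != b) / (n - 1)
        return 1.0 - observed / expected

    # Hypothetical usage:
    # krippendorff_alpha_nominal([["sense1", "sense1", "sense2"], ["sense1", "sense1"]])

This reproduces the standard coincidence-matrix formulation of nominal α; the published scores and confidence intervals, however, come from the distributed Perl script, not from this sketch.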

References

  1. Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Ling. 34(4), 555–596 (2008)

  2. Baker, C.F., Fellbaum, C.: WordNet and FrameNet as complementary resources for annotation. In: Proceedings of the Third Linguistic Annotation Workshop, pp. 125–129. Association for Computational Linguistics, Suntec, Singapore (2009). http://www.aclweb.org/anthology/W/W09/W09-3021

  3. Chow, I.C., Webster, J.J.: Integration of linguistic resources for verb classification: FrameNet, WordNet, VerbNet, and Suggested Upper Merged Ontology. In: Proceedings of CICLing, pp. 1–11 (2007)

  4. Clark, P., Fellbaum, C., Hobbs, J.R., Harrison, P., Murray, W.R., Thompson, J.: Augmenting WordNet for deep understanding of text. In: Proceedings of the 2008 Conference on Semantics in Text Processing, STEP ’08, pp. 45–57. Association for Computational Linguistics, Stroudsburg (2008). http://dl.acm.org/citation.cfm?id=1626481.1626486

  5. Coppola, B., Moschitti, A., Tonelli, S., Riccardi, G.: Automatic FrameNet-based annotation of conversational speech. In: Proceedings of IEEE-SLT 2008, Goa, pp. 73–76 (2008)

  6. Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28(1), 20–28 (1979)

  7. De Cao, D., Croce, D., Basili, R.: Extensive evaluation of a FrameNet-WordNet mapping resource. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10). European Language Resources Association (ELRA), Valletta, Malta (2010)

  8. de Melo, G., Baker, C.F., Ide, N., Passonneau, R.J., Fellbaum, C.: Empirical comparisons of MASC word sense annotations. In: Calzolari, N., Choukri, K., Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey (2012). http://www.icsi.berkeley.edu/pubs/ai/empiricalcomparisons12.pdf

  9. Erk, K., Padó, S.: Analysing models for semantic role assignment using confusability. In: Proceedings of HLT/EMNLP-05, Vancouver, Canada (2005)

  10. Fellbaum, C. (ed.): WordNet. An Electronic Lexical Database. MIT Press, Cambridge (1998)

  11. Fellbaum, C., Grabowski, J., Landes, S., et al.: Analysis of a hand-tagging task. In: Proceedings of ANLP-97 Workshop on Tagging Text with Lexical Semantics: Why, What, and How (1997)

  12. Fellbaum, C., Grabowski, J., Landes, S.: Performance and confidence in a semantic annotation task. WordNet: An Electronic Lexical Database, pp. 217–239. MIT Press, Cambridge (1998)

  13. Ferrández, O., Ellsworth, M., Muñoz, R., Baker, C.F.: Aligning FrameNet and WordNet based on semantic neighborhoods. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10), pp. 310–314. European Language Resources Association (ELRA), Valletta, Malta (2010)

  14. Fillmore, C.J.: Scenes-and-frames semantics. In: Zampolli, A. (ed.) Linguistic Structures Processing in Fundamental Studies in Computer Science, vol. 59. North Holland Publishing, Netherlands (1977)

  15. Fillmore, C.J.: Frame semantics. Linguistics in the Morning Calm, pp. 111–137. Hanshin Publishing Co., South Korea (1982)

  16. Fillmore, C.J., Baker, C.F.: A frames approach to semantic analysis. In: Heine, B., Narrog, H. (eds.) Oxford Handbook of Linguistic Analysis, pp. 313–341. Oxford University Press, Oxford (2010)

  17. Ide, N., Reppen, R., Suderman, K.: The American National Corpus: more than the web can provide. In: Proceedings of the Third Language Resources and Evaluation Conference (LREC), pp. 839–844, Las Palmas, Canary Islands, Spain (2002). http://americannationalcorpus.org/pubs.html

  18. Ide, N., Baker, C., Fellbaum, C., Fillmore, C., Passonneau, R.: MASC: the Manually Annotated Sub-Corpus of American English. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC), Morocco (2008)

  19. Johansson, R., Nugues, P.: LTH: Semantic structure extraction using nonprojective dependency trees. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pp. 227–230. Association for Computational Linguistics, Prague, Czech Republic (2007). http://www.aclweb.org/anthology/W/W07/W07-2048

  20. Kratzer, A.: Stage level and individual level predicates. In: Carlson, G., Pelletier, F.J. (eds.) The Generic Book. The University of Chicago Press, Chicago (1995)

  21. Kučera, H., Francis, W.N.: Computational Analysis of Present-day American English. Brown University Press, Providence (1967)

  22. Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago (1993). http://www-personal.umich.edu/~jlawler/levin.html

  23. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995). doi:10.1145/219717.219748

  24. Miller, G.A., Chodorow, M., Landes, S., Leacock, C., Thomas, R.G.: Using a semantic concordance for sense identification. In: Proceedings of the Workshop on Human Language Technology, HLT ’94, pp. 240–243. Association for Computational Linguistics, Stroudsburg (1994). doi:10.3115/1075812.1075866

  25. Passonneau, R.J., Carpenter, B.: The benefits of a model of annotation. In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 187–195. Association for Computational Linguistics, Sofia, Bulgaria (2013). http://www.aclweb.org/anthology/W13-2323

  26. Passonneau, R.J., Habash, N., Rambow, O.: Inter-annotator agreement on a multilingual semantic annotation task. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), Genoa, Italy, pp. 1951–1956 (2006)

  27. Passonneau, R.J., Baker, C., Fellbaum, C., Ide, N.: The MASC word sense sentence corpus. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC) (2012)

  28. Passonneau, R.J., Bhardwaj, V., Salleb-Aouissi, A., Ide, N.: Multiplicity and word sense: evaluating and learning from multiply labeled word sense annotations. Lang. Resour. Eval. 46(2), 219–252 (2012). doi:10.1007/s10579-012-9188-x

  29. Poesio, M., Artstein, R.: The reliability of anaphoric annotation, reconsidered: Taking ambiguity into account. In: Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pp. 76–83 (2005)

  30. Pradhan, S., Loper, E., Dligach, D., Palmer, M.: SemEval-2007 Task 17: English lexical sample, SRL and all words. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pp. 87–92. Association for Computational Linguistics, Prague, Czech Republic (2007). http://www.aclweb.org/anthology/W/W07/W07-2016

Author information

Correspondence to Collin Baker.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Baker, C., Fellbaum, C., Passonneau, R.J. (2017). Semantic Annotation of MASC. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_25

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_25

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences, Social Sciences (R0)
