Markup Infrastructure for the Anaphoric Bank: Supporting Web Collaboration

  • Chapter

Part of the Studies in Computational Intelligence book series (SCI, volume 370)

Abstract

Modern NLP systems rely either on unsupervised methods or on data created as part of governmental initiatives such as MUC, ACE, or GALE. The data created in these efforts tend to be annotated according to task-specific schemes. The Anaphoric Bank is an attempt to create large quantities of data annotated with anaphoric information according to a general-purpose, linguistically motivated scheme. We do this by pooling smaller amounts of data annotated according to rich schemes that are by and large compatible, and by taking advantage of Web collaboration. In this chapter we discuss the markup infrastructure that underpins the two modalities of Web collaboration in the project: expert annotation and game-based annotation.

Keywords

  • Computational Linguistics
  • Annotate Corpus
  • Annotation Level
  • Anaphora Resolution
  • Segment Element

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • Chapter length: 21 pages
Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Poesio, M. et al. (2011). Markup Infrastructure for the Anaphoric Bank: Supporting Web Collaboration. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_10

  • DOI: https://doi.org/10.1007/978-3-642-22613-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22612-0

  • Online ISBN: 978-3-642-22613-7

  • eBook Packages: Engineering, Engineering (R0)