Document Mark-Up for Different Users and Purposes

  • David King
  • David R. Morse
Part of the Communications in Computer and Information Science book series (CCIS, volume 390)

Abstract

Semantic enhancement of texts aids their use by researchers. However, marking up large bodies of text is slow and requires scarce expert effort. The task could be automated if marked-up texts were available to train and test mark-up tools. This paper examines how texts originally marked up to support taxonomists can be re-purposed to provide computer scientists with training and test data for their mark-up tools. The re-purposing highlighted key differences between taxonomists and computer scientists, both in their requirements and in their approaches to mark-up.
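Re-purposing inline mark-up for tool builders often involves converting it to stand-off annotation, in which the plain text is kept separate from a list of labelled character spans. The Python sketch below illustrates that conversion under stated assumptions. It uses only the standard-library `xml.etree.ElementTree`. The tag names `taxon` and `locality` and the sample sentence are hypothetical and not drawn from any particular taxonomic schema. It is not the authors' conversion procedure. The outermost element is treated as a wrapper and gets no span.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Span:
    """A stand-off annotation: a label plus character offsets into the plain text."""
    label: str
    start: int  # inclusive
    end: int    # exclusive


def inline_to_standoff(xml_string: str) -> tuple[str, list[Span]]:
    """Strip inline tags from an XML fragment.

    Returns the plain text and one Span per tagged element below the root,
    recording where that element's text sits in the plain text.
    """
    root = ET.fromstring(xml_string)
    parts: list[str] = []
    spans: list[Span] = []
    offset = 0

    def walk(elem: ET.Element, is_root: bool) -> None:
        nonlocal offset
        start = offset
        # Text before the element's first child.
        if elem.text:
            parts.append(elem.text)
            offset += len(elem.text)
        for child in elem:
            walk(child, False)
            # A child's tail (text after its closing tag) belongs to this element.
            if child.tail:
                parts.append(child.tail)
                offset += len(child.tail)
        # The root is treated as a wrapper and gets no span of its own.
        if not is_root:
            spans.append(Span(elem.tag, start, offset))

    walk(root, True)
    # Order spans by start offset; for equal starts, outer (longer) spans first.
    spans.sort(key=lambda s: (s.start, -s.end))
    return "".join(parts), spans


if __name__ == "__main__":
    # Hypothetical fragment; tag names are illustrative only.
    fragment = (
        "<p>Specimens of <taxon>Carabus nemoralis</taxon> were collected "
        "at <locality>Milton Keynes</locality>.</p>"
    )
    text, spans = inline_to_standoff(fragment)
    print(text)
    for s in spans:
        print(s.label, s.start, s.end, repr(text[s.start:s.end]))
```

The offsets count code points in the Python string. A tool that expects byte offsets, or that normalises whitespace differently, would need them adjusted.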

Keywords

mark-up, XML, annotation, stand-off annotation, biodiversity

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • David King (1)
  • David R. Morse (1)
  1. Department of Computing and Communications, The Open University, Milton Keynes, UK