From Handwritten Manuscripts to Linked Data

  • Lise Stork
  • Andreas Weber
  • Jaap van den Herik
  • Aske Plaat
  • Fons Verbeek
  • Katherine Wolstencroft
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11057)

Abstract

Museums, archives and digital libraries make increasing use of Semantic Web technologies to enrich and publish their collection items. The contents of those items, however, are not often enriched in the same way. Extracting named entities within historical manuscripts and disclosing the relationships between them would facilitate cultural heritage research, but it is a labour-intensive and time-consuming process, particularly for handwritten documents.

This process requires either automated handwriting recognition techniques or manual annotation by domain experts before the content can be semantically structured. Different workflows have been proposed to address this problem, involving full-text transcription and named entity extraction, with results ranging from unstructured files to semantically annotated knowledge bases. Here, we detail these workflows and describe the approach we have taken to disclose historical biodiversity data, which enables the direct labelling and semantic annotation of document images in handwritten archives.

Keywords

Linked data, Cultural heritage, Handwriting recognition, Semantic annotation, Named entity recognition

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Lise Stork (1, 2; corresponding author)
  • Andreas Weber (3)
  • Jaap van den Herik (1, 2)
  • Aske Plaat (1, 2)
  • Fons Verbeek (1, 2)
  • Katherine Wolstencroft (1, 2)

  1. Leiden Institute of Advanced Computer Science, Leiden, The Netherlands
  2. The Leiden Centre of Data Science, Leiden, The Netherlands
  3. University of Twente, Enschede, The Netherlands