Towards a Digital Infrastructure for Illustrated Handwritten Archives

Digital Cultural Heritage

Abstract

Large and important parts of cultural heritage are stored in archives that remain difficult to access, even after digitization. Documents and notes are written in hard-to-read historical handwriting and are often interspersed with illustrations. Such collections are weakly structured and largely inaccessible to scholars and the wider public. Traditionally, humanities researchers treat text and images separately. This separation extends to traditional handwriting recognition systems. Many of them use a segmentation-free OCR approach that can only resolve manuscripts that are homogeneous in layout, style and linguistic content. Our infrastructure, by contrast, aims to resolve heterogeneous handwritten manuscript pages in which different scripts and images are closely intertwined. Our use case is a 17,000-page account of the exploration of the Indonesian Archipelago between 1820 and 1850 (“Natuurkundige Commissie voor Nederlands-Indië”). Its authors tried to record their knowledge and observations in a semantically structured way; however, this discipline does not carry over to the handwritten script. The use of different languages, such as German, Latin, Dutch, Malay, Greek, and French, makes interpretation more challenging. Our infrastructure takes the state-of-the-art word retrieval system MONK as its starting point. Owing to its visual approach, MONK can handle the diversity of material we encounter in our use case and in many other historical collections: text, drawings and images. By combining text and image recognition, we go beyond the state of the art and provide meaningful additions to integrated manuscript recognition. This paper describes the infrastructure and presents early results.

Mahya Ameryan and Andreas Weber share the first authorship of this paper.

Notes

  1. The Metamorfoze programme funds the preservation of paper heritage that is deemed to be of national importance for the Netherlands. The FCD programme (FES Collection Digitization, 2010–2015) digitized a significant part of the specimens preserved by Naturalis.

  2. An exception is: https://kogs-www.informatik.uni-hamburg.de/projekte/IMPACT.html, last accessed 2017/09/02.

  3. http://biodivlib.wikispaces.com/The+Field+Book+Project, last accessed 2017/09/01.

  4. https://github.com/lisestork/NHC-Ontology, last accessed 2017/09/01.

Corresponding author

Correspondence to Andreas Weber.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Weber, A., Ameryan, M., Wolstencroft, K., Stork, L., Heerlien, M., Schomaker, L. (2018). Towards a Digital Infrastructure for Illustrated Handwritten Archives. In: Ioannides, M. (ed.) Digital Cultural Heritage. Lecture Notes in Computer Science, vol 10605. Springer, Cham. https://doi.org/10.1007/978-3-319-75826-8_13

  • DOI: https://doi.org/10.1007/978-3-319-75826-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75825-1

  • Online ISBN: 978-3-319-75826-8

  • eBook Packages: Computer Science (R0)
