Towards a Digital Infrastructure for Illustrated Handwritten Archives

  • Andreas Weber
  • Mahya Ameryan
  • Katherine Wolstencroft
  • Lise Stork
  • Maarten Heerlien
  • Lambert Schomaker
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10605)

Abstract

Large and important parts of cultural heritage are stored in archives that remain difficult to access, even after digitization. Documents and notes are written in hard-to-read historical handwriting and are often interspersed with illustrations. Such collections are weakly structured and largely inaccessible to scholars and the wider public. Traditionally, humanities researchers treat text and images separately. This separation extends to traditional handwriting recognition systems. Many of them use a segmentation-free OCR approach that can only resolve manuscripts that are homogeneous in layout, style and linguistic content. Our infrastructure, in contrast, aims to resolve heterogeneous handwritten manuscript pages in which different scripts and images are closely intertwined. The authors in our use case, a 17,000-page account of the exploration of the Indonesian Archipelago between 1820 and 1850 (“Natuurkundige Commissie voor Nederlands-Indië”), tried to record their knowledge and observations in a semantically structured way; this discipline, however, does not extend to the handwriting itself. The use of different languages, such as German, Latin, Dutch, Malay, Greek and French, makes interpretation even more challenging. Our infrastructure takes the state-of-the-art word retrieval system MONK as its starting point. Owing to its visual approach, MONK can handle the diversity of material we encounter in our use case and in many other historical collections: text, drawings and images. By combining text and image recognition, we go significantly beyond the state of the art and provide meaningful additions to integrated manuscript recognition. This paper describes the infrastructure and presents early results.

Keywords

  • Deep learning
  • Digital heritage
  • Natural history
  • Biodiversity heritage

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. BMS-STePS, University of Twente, Enschede, The Netherlands
  2. ALICE, University of Groningen, Groningen, The Netherlands
  3. LIACS, Leiden University, Leiden, The Netherlands
  4. Naturalis Biodiversity Center, Leiden, The Netherlands
