Towards a Digital Infrastructure for Illustrated Handwritten Archives

Weber, Andreas; Ameryan, Mahya; Wolstencroft, Katherine; Stork, Lise; Heerlien, Maarten; Schomaker, Lambert

doi:10.1007/978-3-319-75826-8_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10605)

Abstract

Large and important parts of cultural heritage are stored in archives that are difficult to access, even after digitization. Documents and notes are written in hard-to-read historical handwriting and are often interspersed with illustrations. Such collections are weakly structured and largely inaccessible to scholars and the wider public. Traditionally, humanities researchers treat text and images separately. This separation extends to traditional handwriting recognition systems. Many of them use a segmentation-free OCR approach, which can only resolve manuscripts that are homogeneous in layout, style and linguistic content. Our infrastructure, in contrast, aims to resolve heterogeneous handwritten manuscript pages in which different scripts and images are closely intertwined. The authors in our use case, a 17,000-page account of the exploration of the Indonesian Archipelago between 1820 and 1850 (“Natuurkundige Commissie voor Nederlands-Indië”), tried to record their knowledge and observations in a semantically structured way; the handwriting itself, however, shows no such discipline. The use of different languages, such as German, Latin, Dutch, Malay, Greek, and French, makes interpretation more challenging. Our infrastructure takes the state-of-the-art word retrieval system MONK as its starting point. Owing to its visual approach, MONK can handle the diversity of material we encounter in our use case and in many other historical collections: text, drawings and images. By combining text and image recognition, we go significantly beyond the state of the art and provide meaningful additions to integrated manuscript recognition. This paper describes the infrastructure and presents early results.

Mahya Ameryan and Andreas Weber share the first authorship of this paper.

Notes

1. The Metamorfoze programme funds the preservation of paper heritage that is deemed to be of national importance for the Netherlands. The FCD programme (FES Collection Digitization, 2010–2015) digitized a significant part of the specimens preserved by Naturalis.
2. An exception is: https://kogs-www.informatik.uni-hamburg.de/projekte/IMPACT.html, last accessed 2017/09/02.
3. http://biodivlib.wikispaces.com/The+Field+Book+Project, last accessed 2017/09/01.
4. https://github.com/lisestork/NHC-Ontology, last accessed 2017/09/01.

Author information

Authors and Affiliations

BMS-STePS, University of Twente, 7500 AE, Enschede, The Netherlands
Andreas Weber
ALICE, University of Groningen, Nijenborgh 9, 9747 AG, Groningen, The Netherlands
Mahya Ameryan & Lambert Schomaker
LIACS, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
Katherine Wolstencroft & Lise Stork
Naturalis Biodiversity Center, PO Box 9517, 2300 RA, Leiden, The Netherlands
Maarten Heerlien

Corresponding author

Correspondence to Andreas Weber.

Editor information

Editors and Affiliations

Cyprus University of Technology, Limassol, Cyprus
Marinos Ioannides

About this chapter

Cite this chapter

Weber, A., Ameryan, M., Wolstencroft, K., Stork, L., Heerlien, M., Schomaker, L. (2018). Towards a Digital Infrastructure for Illustrated Handwritten Archives. In: Ioannides, M. (ed.) Digital Cultural Heritage. Lecture Notes in Computer Science, vol 10605. Springer, Cham. https://doi.org/10.1007/978-3-319-75826-8_13

DOI: https://doi.org/10.1007/978-3-319-75826-8_13
Published: 23 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75825-1
Online ISBN: 978-3-319-75826-8
eBook Packages: Computer Science, Computer Science (R0)
