In Codice Ratio: Machine Transcription of Medieval Manuscripts

Ammirati, Serena; Firmani, Donatella; Maiorino, Marco; Merialdo, Paolo; Nieddu, Elena

doi:10.1007/978-3-030-11226-4_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 988)

Included in the following conference series:

Italian Research Conference on Digital Libraries

Abstract

Our project, In Codice Ratio, is an interdisciplinary research initiative for analyzing the content of historical documents conserved in the Vatican Secret Archives (VSA). As most of these documents are digitized as images, Machine Transcription is both an enabler for applying Knowledge Discovery techniques and a useful tool that helps paleographers speed up the transcription process. Our approach involves a convolutional neural network to recognize characters, statistical language models to compose and rank word transcriptions, and crowdsourcing for scalable training data collection. We have conducted experiments on pages from the medieval manuscript collection known as the Vatican Registers. Our results show that almost all the considered words can be transcribed without significant spelling errors.

This work is an extended abstract of [4].
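The abstract says that word transcriptions are composed and ranked by combining a character classifier with statistical language models. The sketch below illustrates that general idea under assumptions of our own, not the authors' implementation: each character segment arrives with candidate labels and probabilities (a stand-in for the CNN output), the language model is a hypothetical add-one smoothed character bigram model, and the names `train_char_bigram` and `rank_transcriptions`, the toy corpus, and the probabilities are all illustrative. Candidates are scored by the sum of classifier and language-model log-probabilities.

```python
import itertools
import math
from collections import defaultdict


def train_char_bigram(words, alphabet, alpha=1.0):
    """Hypothetical add-alpha smoothed character bigram model.

    '^' marks the start of a word and '$' its end.
    Returns a function log_prob(prev_char, next_char).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        chars = ["^"] + list(w) + ["$"]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    vocab_size = len(set(alphabet) | {"$"})

    def log_prob(a, b):
        row = counts.get(a, {})  # .get avoids inserting unseen contexts
        total = sum(row.values())
        return math.log((row.get(b, 0) + alpha) / (total + alpha * vocab_size))

    return log_prob


def rank_transcriptions(segments, log_prob, k=3):
    """Score every combination of per-segment candidates; return the top k words.

    segments: list of dicts, one per character segment, mapping a candidate
    character to its (assumed) classifier probability.
    """
    scored = []
    for combo in itertools.product(*(seg.items() for seg in segments)):
        word = "".join(ch for ch, _ in combo)
        # Evidence from the character classifier (stand-in for the CNN output).
        score = sum(math.log(p) for _, p in combo)
        # Prior from the character language model, including word boundaries.
        chars = ["^"] + list(word) + ["$"]
        score += sum(log_prob(a, b) for a, b in zip(chars, chars[1:]))
        scored.append((score, word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


# Toy usage with an illustrative corpus and made-up classifier outputs.
corpus = ["dei", "deus", "domini"]
lm = train_char_bigram(corpus, alphabet=set("".join(corpus)))
segments = [{"d": 0.9, "a": 0.1}, {"e": 0.5, "c": 0.5}, {"i": 0.7, "l": 0.3}]
print(rank_transcriptions(segments, lm))  # "dei" ranks first
```

Here the classifier cannot decide between "e" and "c" in the second segment, and the language model breaks the tie in favor of "dei". Exhaustive enumeration grows exponentially with the number of segments, so this version is only meant to show the scoring, not to scale.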

Notes

1.
www.inf.uniroma3.it/db/icr/.
2.
For such word images, most first-ranked transcriptions have \(\le 3\) spelling errors.
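Note 2 measures quality in spelling errors. One common way to count them, assumed here rather than taken from the paper, is the Levenshtein edit distance between a machine transcription and the paleographer's reference transcription: the minimum number of single-character insertions, deletions, and substitutions. The helper names `levenshtein` and `within_error_budget` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def within_error_budget(transcription: str, reference: str, max_errors: int = 3) -> bool:
    """True if the transcription has at most max_errors spelling errors."""
    return levenshtein(transcription, reference) <= max_errors


print(levenshtein("dominus", "domimis"))  # 2 substitutions: n->m, u->i
```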

References

1. Ammirati, S., Firmani, D., Maiorino, M., Merialdo, P., Nieddu, E., Rossi, A.: In codice ratio: scalable transcription of historical handwritten documents. In: Proceedings of the 25th Italian Symposium on Advanced Database Systems, Squillace Lido (Catanzaro), Italy, 25–29 June 2017, p. 65 (2017)
2. Causer, T., Terras, M.: ‘Many hands make light work. Many hands together make merry work’: Transcribe Bentham and crowdsourcing manuscript collections. In: Crowdsourcing Our Cultural Heritage, pp. 57–88 (2014)
3. Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. In: Proceedings of the 2003 International Conference on Information Integration on the Web, IIWEB 2003, pp. 73–78 (2003)
4. Firmani, D., Maiorino, M., Merialdo, P., Nieddu, E.: Towards knowledge discovery from the Vatican secret archives. In codice ratio - episode 1: machine transcription of the manuscripts. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, 19–23 August 2018, pp. 263–272 (2018)
5. Firmani, D., Merialdo, P., Nieddu, E., Scardapane, S.: In codice ratio: OCR of handwritten Latin documents using deep convolutional networks. In: Proceedings of the 11th International Workshop on Artificial Intelligence for Cultural Heritage (2017)
6. Fischer, A., et al.: Automatic transcription of handwritten medieval documents. In: 15th International Conference on Virtual Systems and Multimedia. IEEE (2009)
7. Flaounas, I., et al.: Research methods in the age of digital journalism: massive-scale automated analysis of news-content-topics, style and gender. Digit. J. 1(1), 102–116 (2013)
8. Keysers, D., Deselaers, T., Rowley, H.A., Wang, L.-L., Carbune, V.: Multi-language online handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1180–1194 (2017)
9. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI 2016, pp. 2741–2749. AAAI Press (2016)
10. Likforman-Sulem, L., Zahour, A., Taconet, B.: Text line segmentation of historical documents: a survey. Int. J. Doc. Anal. Recogn. 9(2), 123–138 (2007)
11. Puigcerver, J., Toselli, A.H., Vidal, E.: ICDAR 2015 competition on keyword spotting for handwritten documents. In: 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1176–1180. IEEE (2015)
12. Sánchez, J.A., et al.: tranScriptorium: a European project on handwritten text recognition. In: Proceedings of the 2013 ACM Symposium on Document Engineering, pp. 227–228. ACM (2013)
13. Sánchez, J.A., Romero, V., Toselli, A.H., Vidal, E.: ICFHR 2014 competition on handwritten text recognition on tranScriptorium datasets (HTRtS). In: 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 785–790. IEEE (2014)
14. Sudholt, S., Fink, G.A.: PHOCNet: a deep convolutional neural network for word spotting in handwritten documents. In: 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 277–282. IEEE (2016)

Acknowledgments

We thank NVIDIA Corporation for the donation of a Quadro M5000 GPU, and Regione Lazio (Progetti di Gruppi di Ricerca) and Roma Tre University (Piano Straordinario di Sviluppo della Ricerca di Ateneo) for supporting our project “In Codice Ratio”.

Author information

Authors and Affiliations

Department of Humanities, Roma Tre University, Rome, Italy
Serena Ammirati
Department of Computer Science, Roma Tre University, Rome, Italy
Donatella Firmani, Paolo Merialdo & Elena Nieddu
Vatican Secret Archives, Vatican City, Italy
Marco Maiorino

Corresponding author

Correspondence to Donatella Firmani.

Editor information

Editors and Affiliations

Italian National Research Council, Pisa, Italy
Paolo Manghi
Italian National Research Council, Pisa, Italy
Leonardo Candela
University of Padua, Padua, Italy
Gianmaria Silvello

About this paper

Cite this paper

Ammirati, S., Firmani, D., Maiorino, M., Merialdo, P., Nieddu, E. (2019). In Codice Ratio: Machine Transcription of Medieval Manuscripts. In: Manghi, P., Candela, L., Silvello, G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, Cham. https://doi.org/10.1007/978-3-030-11226-4_15

DOI: https://doi.org/10.1007/978-3-030-11226-4_15
Published: 15 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11225-7
Online ISBN: 978-3-030-11226-4
eBook Packages: Computer Science, Computer Science (R0)
