Abstract
Our project, In Codice Ratio, is an interdisciplinary research initiative for analyzing the content of historical documents preserved in the Vatican Secret Archives (VSA). Because most of these documents are digitized as images, machine transcription both enables the application of knowledge discovery techniques and gives paleographers a tool for speeding up the transcription process. Our approach combines a convolutional neural network that recognizes characters, statistical language models that compose and rank word transcriptions, and crowdsourcing for scalable collection of training data. We have conducted experiments on pages from the medieval manuscript collection known as the Vatican Registers. Our results show that almost all the considered words can be transcribed without significant spelling errors.
This work is an extended abstract of [4].
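To make the "compose and rank" step concrete, the following is a minimal sketch and not the method used in the paper. It assumes a character classifier that returns a few candidate characters with probabilities for each segment of a word image, and it ranks the resulting candidate words with a character-bigram language model using add-one smoothing. The toy corpus, the probabilities, and all function names are hypothetical. An edit-distance helper is included because it is one common way to count spelling errors against a reference transcription.

```python
import math
from collections import Counter
from itertools import product


def train_bigram_model(corpus_words, alphabet):
    """Return a function giving add-one-smoothed log P(next_char | prev_char)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for word in corpus_words:
        padded = "^" + word + "$"  # "^" and "$" mark word start and end
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    vocab_size = len(alphabet) + 1  # every letter plus the end marker

    def log_prob(prev, nxt):
        return math.log((pair_counts[(prev, nxt)] + 1) / (prev_counts[prev] + vocab_size))

    return log_prob


def word_log_prob(word, log_prob):
    """Sum the bigram log-probabilities over the padded word."""
    padded = "^" + word + "$"
    return sum(log_prob(p, n) for p, n in zip(padded, padded[1:]))


def rank_transcriptions(char_options, log_prob, lm_weight=1.0):
    """Compose every candidate word and sort by classifier score plus LM score.

    char_options holds one dict per character segment, mapping a candidate
    character to its (assumed, strictly positive) classifier probability.
    """
    scored = []
    for chars in product(*char_options):
        word = "".join(chars)
        classifier_score = sum(math.log(opts[c]) for opts, c in zip(char_options, chars))
        total = classifier_score + lm_weight * word_log_prob(word, log_prob)
        scored.append((total, word))
    scored.sort(reverse=True)
    return [word for _, word in scored]


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions cost 1."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1, row[j - 1] + 1, prev_row[j - 1] + (ca != cb)))
        prev_row = row
    return prev_row[-1]


# Hypothetical toy data: a tiny Latin word list and ambiguous classifier output.
corpus = ["ratio", "codice", "in", "rationem", "oratio"]
alphabet = set("".join(corpus))
log_prob = train_bigram_model(corpus, alphabet)

char_options = [
    {"r": 0.6, "n": 0.4},
    {"a": 0.9, "o": 0.1},
    {"t": 0.5, "c": 0.5},  # the classifier cannot separate "t" from "c"
    {"i": 1.0},
    {"o": 0.8, "a": 0.2},
]
ranked = rank_transcriptions(char_options, log_prob)
print(ranked[:3])
print(edit_distance(ranked[0], "ratio"))
```

In this sketch the language model breaks the tie between "t" and "c" that the classifier alone cannot resolve. Enumerating every combination is only practical for short words with few alternatives per segment; a real system would need some form of pruning or a beam search.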
Notes
- 1.
- 2.
For such word images, most of the first-ranked transcriptions have \(\le 3\) spelling errors.
References
Ammirati, S., Firmani, D., Maiorino, M., Merialdo, P., Nieddu, E., Rossi, A.: In codice ratio: scalable transcription of historical handwritten documents. In: Proceedings of the 25th Italian Symposium on Advanced Database Systems, Squillace Lido (Catanzaro), Italy, 25–29 June 2017, p. 65 (2017)
Causer, T., Terras, M.: ‘Many hands make light work. Many hands together make merry work’: transcribe Bentham and crowdsourcing manuscript collections. Crowdsourcing Our Cultural Heritage, pp. 57–88 (2014)
Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. In: Proceedings of the 2003 International Conference on Information Integration on the Web, IIWEB 2003, pp. 73–78 (2003)
Firmani, D., Maiorino, M., Merialdo, P., Nieddu, E.: Towards knowledge discovery from the Vatican secret archives. In codice ratio - episode 1: machine transcription of the manuscripts. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, 19–23 August 2018, pp. 263–272 (2018)
Firmani, D., Merialdo, P., Nieddu, E., Scardapane, S.: In codice ratio: OCR of handwritten Latin documents using deep convolutional networks. In: Proceedings of the 11th International Workshop on Artificial Intelligence for Cultural Heritage (2017)
Fischer, A., et al.: Automatic transcription of handwritten medieval documents. In: 15th International Conference on Virtual Systems and Multimedia. IEEE (2009)
Flaounas, I., et al.: Research methods in the age of digital journalism: massive-scale automated analysis of news-content-topics, style and gender. Digit. J. 1(1), 102–116 (2013)
Keysers, D., Deselaers, T., Rowley, H.A., Wang, L.-L., Carbune, V.: Multi-language online handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1180–1194 (2017)
Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI 2016, pp. 2741–2749. AAAI Press (2016)
Likforman-Sulem, L., Zahour, A., Taconet, B.: Text line segmentation of historical documents: a survey. Int. J. Doc. Anal. Recogn. 9(2), 123–138 (2007)
Puigcerver, J., Toselli, A.H., Vidal, E.: ICDAR 2015 competition on keyword spotting for handwritten documents. In: 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1176–1180. IEEE (2015)
Sánchez, J.A., et al.: tranScriptorium: a European project on handwritten text recognition. In: Proceedings of the 2013 ACM Symposium on Document Engineering, pp. 227–228. ACM (2013)
Sánchez, J.A., Romero, V., Toselli, A.H., Vidal, E.: ICFHR 2014 competition on handwritten text recognition on tranScriptorium datasets (HTRtS). In: 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 785–790. IEEE (2014)
Sudholt, S., Fink, G.A.: PHOCNet: a deep convolutional neural network for word spotting in handwritten documents. In: 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 277–282. IEEE (2016)
Acknowledgments
We thank NVIDIA Corporation for the donation of a Quadro M5000 GPU, and Regione Lazio (Progetti di Gruppi di Ricerca) and Roma Tre University (Piano Straordinario di Sviluppo della Ricerca di Ateneo) for supporting our project “In Codice Ratio”.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ammirati, S., Firmani, D., Maiorino, M., Merialdo, P., Nieddu, E. (2019). In Codice Ratio: Machine Transcription of Medieval Manuscripts. In: Manghi, P., Candela, L., Silvello, G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, Cham. https://doi.org/10.1007/978-3-030-11226-4_15
DOI: https://doi.org/10.1007/978-3-030-11226-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11225-7
Online ISBN: 978-3-030-11226-4
eBook Packages: Computer Science, Computer Science (R0)