In Codice Ratio: Machine Transcription of Medieval Manuscripts

  • Conference paper
Part of the book series: Communications in Computer and Information Science ((CCIS,volume 988))

Abstract

Our project, In Codice Ratio, is an interdisciplinary research initiative for analyzing the content of historical documents conserved in the Vatican Secret Archives (VSA). Because most of these documents are digitized as images, machine transcription both enables the application of knowledge discovery techniques and gives paleographers a useful tool for speeding up the transcription process. Our approach combines a convolutional neural network that recognizes characters, statistical language models that compose and rank word transcriptions, and crowdsourcing for scalable collection of training data. We have conducted experiments on pages from the medieval manuscript collection known as the Vatican Registers. Our results show that almost all the considered words can be transcribed without significant spelling errors.

This work is an extended abstract of [4].
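
To make the ranking step in the abstract concrete, the following is a minimal sketch of how a statistical language model could re-rank candidate word transcriptions. It is not the system described in [4]: the character bigram model with add-one smoothing, the tiny Latin word list, the classifier scores, and the names `train_bigram_model`, `log_prob`, and `rank_candidates` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def train_bigram_model(corpus_words):
    """Count character bigrams; '^' and '$' mark word start and end."""
    bigrams, contexts, alphabet = Counter(), Counter(), set()
    for word in corpus_words:
        padded = "^" + word + "$"
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, len(alphabet)


def log_prob(word, model):
    """Log-probability of a word under the add-one smoothed bigram model."""
    bigrams, contexts, vocab_size = model
    padded = "^" + word + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        total += math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
    return total


def rank_candidates(candidates, model, lm_weight=1.0):
    """Sort (transcription, classifier_log_score) pairs, best first.

    The combined score adds the hypothetical classifier log-score to the
    weighted language-model log-probability.
    """
    scored = [
        (text, clf_score + lm_weight * log_prob(text, model))
        for text, clf_score in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data only: a toy corpus and two equally scored candidates.
toy_corpus = ["dominus", "domini", "dominum", "deus", "dei"]
model = train_bigram_model(toy_corpus)
ranked = rank_candidates([("dqminus", -1.0), ("dominus", -1.0)], model)
print(ranked[0][0])
```

When the character classifier cannot separate two readings, the language model favors the one whose character sequences are more frequent in the reference corpus. The weight `lm_weight` is a hypothetical knob for balancing the two sources of evidence.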

Notes

  1. www.inf.uniroma3.it/db/icr/.

  2. For such word images, most first-ranked transcriptions have \(\le 3\) spelling errors.
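
The page does not state how spelling errors are counted. One common choice, assumed here, is the Levenshtein edit distance between a transcription and its reference (string distance metrics are surveyed in [3]). The sketch below uses the standard dynamic-programming formulation; the function name `edit_distance` and the example words are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


# Under this assumption, a transcription is within the threshold of
# note 2 when its distance to the reference is at most 3.
errors = edit_distance("dqminus", "dominus")
print(errors, errors <= 3)
```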

References

  1. Ammirati, S., Firmani, D., Maiorino, M., Merialdo, P., Nieddu, E., Rossi, A.: In codice ratio: scalable transcription of historical handwritten documents. In: Proceedings of the 25th Italian Symposium on Advanced Database Systems, Squillace Lido (Catanzaro), Italy, 25–29 June 2017, p. 65 (2017)

  2. Causer, T., Terras, M.: ‘Many hands make light work. Many hands together make merry work’: transcribe Bentham and crowdsourcing manuscript collections. Crowdsourcing Our Cultural Heritage, pp. 57–88 (2014)

  3. Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. In: Proceedings of the 2003 International Conference on Information Integration on the Web, IIWEB 2003, pp. 73–78 (2003)

  4. Firmani, D., Maiorino, M., Merialdo, P., Nieddu, E.: Towards knowledge discovery from the Vatican secret archives. In codice ratio - episode 1: machine transcription of the manuscripts. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, 19–23 August 2018, pp. 263–272 (2018)

  5. Firmani, D., Merialdo, P., Nieddu, E., Scardapane, S.: In codice ratio: OCR of handwritten Latin documents using deep convolutional networks. In: Proceedings of the 11th International Workshop on Artificial Intelligence for Cultural Heritage (2017)

  6. Fischer, A., et al.: Automatic transcription of handwritten medieval documents. In: 15th International Conference on Virtual Systems and Multimedia. IEEE (2009)

  7. Flaounas, I., et al.: Research methods in the age of digital journalism: massive-scale automated analysis of news-content-topics, style and gender. Digit. J. 1(1), 102–116 (2013)

  8. Keysers, D., Deselaers, T., Rowley, H.A., Wang, L.-L., Carbune, V.: Multi-language online handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1180–1194 (2017)

  9. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI 2016, pp. 2741–2749. AAAI Press (2016)

  10. Likforman-Sulem, L., Zahour, A., Taconet, B.: Text line segmentation of historical documents: a survey. Int. J. Doc. Anal. Recogn. 9(2), 123–138 (2007)

  11. Puigcerver, J., Toselli, A.H., Vidal, E.: ICDAR 2015 competition on keyword spotting for handwritten documents. In: 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1176–1180. IEEE (2015)

  12. Sánchez, J.A., et al.: tranScriptorium: a European project on handwritten text recognition. In: Proceedings of the 2013 ACM Symposium on Document Engineering, pp. 227–228. ACM (2013)

  13. Sánchez, J.A., Romero, V., Toselli, A.H., Vidal, E.: ICFHR 2014 competition on handwritten text recognition on tranScriptorium datasets (HTRtS). In: 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 785–790. IEEE (2014)

  14. Sudholt, S., Fink, G.A.: PHOCNet: a deep convolutional neural network for word spotting in handwritten documents. In: 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 277–282. IEEE (2016)


Acknowledgments

We thank NVIDIA Corporation for the donation of a Quadro M5000 GPU, and Regione Lazio (Progetti di Gruppi di Ricerca) and Roma Tre University (Piano Straordinario di Sviluppo della Ricerca di Ateneo) for supporting our project “In Codice Ratio”.

Author information

Corresponding author

Correspondence to Donatella Firmani.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ammirati, S., Firmani, D., Maiorino, M., Merialdo, P., Nieddu, E. (2019). In Codice Ratio: Machine Transcription of Medieval Manuscripts. In: Manghi, P., Candela, L., Silvello, G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, Cham. https://doi.org/10.1007/978-3-030-11226-4_15

  • DOI: https://doi.org/10.1007/978-3-030-11226-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11225-7

  • Online ISBN: 978-3-030-11226-4

  • eBook Packages: Computer Science (R0)
