Making Large Collections of Handwritten Material Easily Accessible and Searchable

  • Anders Hast
  • Per Cullhed
  • Ekta Vats
  • Matteo Abrate
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 988)

Abstract

Libraries and cultural organisations hold a wealth of digitised historical handwritten material in the form of scanned images. The vast majority of this material has not yet been transcribed, owing to technological challenges and a lack of expertise. This makes it difficult to open these historical collections to the public, and in particular to support a simple text search across a collection. Machine learning based methods for handwritten text recognition are gaining importance, but they require large amounts of pre-transcribed text to train the system. Obtaining several thousand pre-transcribed documents is impractical because of the difficulties transcribers face. This paper therefore presents a training-free word spotting algorithm as an alternative to handwritten text transcription, with case studies on Alvin (a Swedish repository) and Clavius on the Web. The main focus of the work is to discuss the prospects of making materials in the Alvin platform and Clavius on the Web easily searchable using a word spotting based handwritten text recognition system.

Keywords

Transcription, Handwritten text recognition, Word spotting, Alvin, Clavius on the Web

Notes

Acknowledgment

This work was supported by the Swedish strategic research programme eSSENCE and the Riksbankens Jubileumsfond (Dnr NHS14-2068:1).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Anders Hast (1)
  • Per Cullhed (2)
  • Ekta Vats (1)
  • Matteo Abrate (3)

  1. Department of Information Technology, Uppsala University, Uppsala, Sweden
  2. University Library, Uppsala University, Uppsala, Sweden
  3. Institute of Informatics and Telematics, CNR, Pisa, Italy
