Abstract
Dematerialization and digitization of historical documents are key to their availability, preservation and dissemination. Unfortunately, converting handwritten documents into digital form presents several technical challenges.
The XDOCS project was created with the main goal of making historical documents available, and extending their usability, to a wide variety of audiences, such as scholars, institutions and libraries. This paper describes the core elements of XDOCS, i.e. the page dewarping and word spotting techniques, and presents two new applications: an annotation/indexing tool and a search tool.
Acknowledgement
The XDOCS project is currently underway at SATA s.r.l. in collaboration with the University of Modena and Reggio-Emilia, and co-funded by the Emilia-Romagna regional administration.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Bolelli, F., Borghi, G., Grana, C. (2018). XDOCS: An Application to Index Historical Documents. In: Serra, G., Tasso, C. (eds) Digital Libraries and Multimedia Archives. IRCDL 2018. Communications in Computer and Information Science, vol 806. Springer, Cham. https://doi.org/10.1007/978-3-319-73165-0_15
DOI: https://doi.org/10.1007/978-3-319-73165-0_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73164-3
Online ISBN: 978-3-319-73165-0
eBook Packages: Computer Science, Computer Science (R0)