Abstract
In the context of sustainable document management technologies, this paper presents a new layout-based document retrieval system designed specifically for commercial forms. The system first applies a technique based on mathematical morphology to extract grid-based structural components from the document image. The Radon transform is then used to describe the document layout. Finally, documents are matched using dynamic time warping. Experiments carried out on real and simulated data sets demonstrate the effectiveness of the approach across different classes of commercial forms.
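The three stages named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no parameters, so the structuring-element length `k`, the restriction of the Radon transform to the 0 and 90 degree projections (plain column and row sums), the unconstrained DTW with absolute-difference cost, and all function names and toy images below are assumptions made for illustration only.

```python
def open_1d(seq, k):
    """Binary opening of a 0/1 sequence with a flat segment of length k:
    only runs of 1s that are at least k pixels long survive."""
    out = [0] * len(seq)
    start = None
    for i, v in enumerate(list(seq) + [0]):  # trailing 0 closes a final run
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start >= k:
                out[start:i] = [1] * (i - start)
            start = None
    return out

def extract_grid(img, k):
    """Union of a horizontal and a vertical opening: long ruling lines are
    kept, short strokes such as text are discarded."""
    horiz = [open_1d(row, k) for row in img]
    cols = [open_1d(col, k) for col in zip(*img)]
    vert = [list(r) for r in zip(*cols)]
    return [[h | v for h, v in zip(hr, vr)] for hr, vr in zip(horiz, vert)]

def projections(img):
    """Radon projections at two angles only (assumed simplification):
    column sums and row sums of the binary image."""
    return [sum(c) for c in zip(*img)], [sum(r) for r in img]

def dtw(a, b):
    """Classic dynamic time warping distance, cost |x - y|, no window."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, 1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]

def layout_distance(img1, img2, k):
    """Sum of DTW distances between corresponding projection profiles."""
    p1 = projections(extract_grid(img1, k))
    p2 = projections(extract_grid(img2, k))
    return sum(dtw(u, v) for u, v in zip(p1, p2))

def make_form(h, w, h_lines, v_lines, text=()):
    """Hypothetical toy form: full-length ruling lines plus short 'text' strokes."""
    img = [[0] * w for _ in range(h)]
    for r in h_lines:
        img[r] = [1] * w
    for c in v_lines:
        for row in img:
            row[c] = 1
    for r, c0, c1 in text:
        for c in range(c0, c1):
            img[r][c] = 1
    return img

# Two fillings of the same layout (slightly shifted) and a different layout.
form_a = make_form(20, 30, [2, 10, 18], [3, 15, 27], text=[(5, 5, 9), (13, 17, 21)])
form_a_shifted = make_form(20, 30, [3, 11, 18], [4, 16, 27], text=[(6, 20, 23)])
form_b = make_form(20, 30, [2, 5, 8, 11, 14], [10])

d_same = layout_distance(form_a, form_a_shifted, k=10)
d_diff = layout_distance(form_a, form_b, k=10)
print(d_same < d_diff)
```

In this toy setting the opening removes the short text strokes before projection, and DTW absorbs the small shift of the ruling lines, so the two fillings of the same layout end up closer to each other than to the differently ruled form. A full Radon transform over many angles would replace `projections` in a more faithful version.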
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pirlo, G., Chimienti, M., Dassisti, M., Impedovo, D., Galiano, A. (2013). Layout-Based Document-Retrieval System by Radon Transform Using Dynamic Time Warping. In: Petrosino, A. (eds) Image Analysis and Processing – ICIAP 2013. ICIAP 2013. Lecture Notes in Computer Science, vol 8156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41181-6_7
DOI: https://doi.org/10.1007/978-3-642-41181-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41180-9
Online ISBN: 978-3-642-41181-6
eBook Packages: Computer Science (R0)