Abstract
Textual information extraction is a challenging problem in Information Retrieval. Two main approaches are commonly distinguished: texture-based and region-based. In this paper, we propose a method guided by quadtree decomposition. The method recursively decomposes regions of a document image into four equal regions, starting from the image of the whole document. At each step of the decomposition, an OCR engine is used to retrieve a given piece of textual information from the resulting regions. Experiments on real invoice data give promising results.
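The recursive decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `split_in_four`, `localize`, `find_text`, and `grid_ocr`, the `min_size` stopping rule, and the choice to descend into the first quadrant whose OCR output contains the target are all assumptions made for illustration. The OCR engine is passed in as a callable, and `grid_ocr` is a toy stand-in that reads characters from a text grid rather than a real image.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Ocr = Callable[[Sequence[str], Box], str]


def split_in_four(box: Box) -> List[Box]:
    """Split a box into four quadrants (equal when width and height are even)."""
    x, y, w, h = box
    hw, hh = w // 2, h // 2
    return [
        (x, y, hw, hh),                    # top-left
        (x + hw, y, w - hw, hh),           # top-right
        (x, y + hh, hw, h - hh),           # bottom-left
        (x + hw, y + hh, w - hw, h - hh),  # bottom-right
    ]


def localize(image: Sequence[str], box: Box, target: str,
             ocr: Ocr, min_size: int = 1) -> Box:
    """Return the smallest quadtree region whose OCR output contains target.

    Assumes the caller has already checked that `box` contains the target.
    """
    _, _, w, h = box
    # Assumed stopping rule: do not create quadrants smaller than min_size.
    if w // 2 < min_size or h // 2 < min_size:
        return box
    for child in split_in_four(box):
        if target in ocr(image, child):
            return localize(image, child, target, ocr, min_size)
    # No quadrant contains the whole target (it straddles a split line),
    # so the current region is the tightest one available.
    return box


def find_text(image: Sequence[str], width: int, height: int, target: str,
              ocr: Ocr, min_size: int = 1) -> Optional[Box]:
    """Start from the whole document; return None if the target is absent."""
    full = (0, 0, width, height)
    if target not in ocr(image, full):
        return None
    return localize(image, full, target, ocr, min_size)


def grid_ocr(image: Sequence[str], box: Box) -> str:
    """Toy stand-in for an OCR engine: reads characters from a text grid."""
    x, y, w, h = box
    return "\n".join(row[x:x + w] for row in image[y:y + h])


# Toy 8x8 "document" with the text "AB" on row 5, columns 5 and 6.
page = ["........"] * 5 + [".....AB."] + ["........"] * 2
print(find_text(page, 8, 8, "AB", grid_ocr))
```

In this toy run the search stops at the bottom-right quadrant of the page, because "AB" straddles the next split line. With a real OCR engine, `grid_ocr` would be replaced by a function that crops the image to the box and runs recognition on the crop.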
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Pitou, C., Diatta, J. (2016). Textual Information Localization and Retrieval in Document Images Based on Quadtree Decomposition. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_6
DOI: https://doi.org/10.1007/978-3-319-25226-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25224-7
Online ISBN: 978-3-319-25226-1
eBook Packages: Mathematics and Statistics (R0)