Abstract
Developing effective content recognition methods for diverse imagery continues to challenge computer vision researchers. We present a new approach to document image content categorization using a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant shape feature that is generic enough to be detected repeatably and without segmentation. We learn a concise, structurally indexed shape lexicon from training data by clustering and partitioning feature types through graph cuts. We demonstrate our approach on two challenging document image content recognition problems: 1) classification of 4,500 Web images crawled from Google Image Search into three content categories (pure image, image with text, and document image), and 2) language identification across 8 languages (Arabic, Chinese, English, Hindi, Japanese, Korean, Russian, and Thai) on a database of 1,512 complex document images composed of mixed machine-printed text and handwriting. Our approach handles high intra-class variability and achieves results that exceed other state-of-the-art approaches, allowing it to be used as a content recognizer in image indexing and retrieval systems.
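The idea of describing an image by the lexicon words found in it can be illustrated with a minimal sketch. It assumes a lexicon has already been learned, and it substitutes plain nearest-neighbor word assignment and nearest-profile classification for whatever matching and classification the paper actually uses; the graph-cut learning step is not shown. All names, the 2-D toy descriptors, and the class profiles are hypothetical.

```python
import math
from collections import Counter

def nearest_word(feature, lexicon):
    """Index of the lexicon word closest to `feature` (Euclidean distance)."""
    return min(range(len(lexicon)), key=lambda i: math.dist(feature, lexicon[i]))

def word_histogram(features, lexicon):
    """Normalized frequency of each lexicon word among an image's features."""
    counts = Counter(nearest_word(f, lexicon) for f in features)
    total = len(features)
    return [counts[i] / total if total else 0.0 for i in range(len(lexicon))]

def classify(features, lexicon, class_profiles):
    """Label whose reference histogram is closest to the image's histogram."""
    hist = word_histogram(features, lexicon)
    return min(class_profiles, key=lambda c: math.dist(hist, class_profiles[c]))

# Toy data (assumed, not from the paper): two lexicon words in a 2-D
# descriptor space, four features detected in one image, and one invented
# reference histogram per content category.
lexicon = [(0.0, 0.0), (1.0, 1.0)]
features = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.8, 1.0)]
class_profiles = {
    "document image": [0.2, 0.8],
    "pure image": [0.9, 0.1],
}

histogram = word_histogram(features, lexicon)
label = classify(features, lexicon, class_profiles)
```

Here one of the four features falls nearest the first word and three nearest the second, so the image is described by the histogram `[0.25, 0.75]` and assigned to the category with the closest reference histogram.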
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, G., Yu, X., Li, Y., Doermann, D. (2008). Learning Visual Shape Lexicon for Document Image Content Recognition. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88688-4_55
DOI: https://doi.org/10.1007/978-3-540-88688-4_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88685-3
Online ISBN: 978-3-540-88688-4
eBook Packages: Computer Science, Computer Science (R0)