Learning Visual Shape Lexicon for Document Image Content Recognition

Zhu, Guangyu; Yu, Xiaodong; Li, Yi; Doermann, David

doi:10.1007/978-3-540-88688-4_55

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5303)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Developing effective content recognition methods for diverse imagery continues to challenge computer vision researchers. We present a new approach for document image content categorization using a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant shape feature that is generic enough to be detected repeatably and without segmentation. We learn a concise, structurally indexed shape lexicon from training data by clustering and partitioning feature types through graph cuts. We demonstrate our approach on two challenging document image content recognition problems: (1) classification of 4,500 Web images crawled from Google Image Search into three content categories (pure image, image with text, and document image), and (2) language identification of 8 languages (Arabic, Chinese, English, Hindi, Japanese, Korean, Russian, and Thai) on a database of 1,512 complex document images composed of mixed machine-printed text and handwriting. Our approach is capable of handling high intra-class variability and achieves results that exceed those of other state-of-the-art approaches, allowing it to be used as a content recognizer in image indexing and retrieval systems.
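The abstract compresses the lexicon-learning pipeline into a single sentence. As a rough illustration only, the sketch below shows two generic ingredients such a pipeline could use: a graph-cut bipartition of feature descriptors and a bag-of-words histogram over a learned lexicon. It is not the authors' implementation. The Gaussian affinity, the bandwidth `sigma`, the zero threshold, the function names, the toy data, and the use of the spectral relaxation of normalized cuts (Shi and Malik) as the graph-cut step are all assumptions. Plain 2-D points stand in for the paper's scale- and rotation-invariant shape descriptors.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Pairwise affinity W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    Assumption: a Gaussian kernel on Euclidean distance; the paper's actual
    similarity between shape feature types may differ.
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def normalized_cut_bipartition(W):
    """Split nodes into two groups with the spectral relaxation of the
    normalized cut (Shi and Malik), used here as a stand-in graph-cut step.

    Takes the eigenvector z of the symmetric normalized Laplacian with the
    second-smallest eigenvalue, maps it back to the generalized eigenvector
    y = D^{-1/2} z, and thresholds y at zero (an assumed splitting point).
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]
    return y >= 0


def lexicon_histogram(descriptors, lexicon):
    """Bag-of-words representation: assign each descriptor to its nearest
    lexicon word (squared Euclidean distance) and return normalized counts."""
    sq_dists = ((descriptors[:, None, :] - lexicon[None, :, :]) ** 2).sum(axis=2)
    words = sq_dists.argmin(axis=1)
    counts = np.bincount(words, minlength=len(lexicon)).astype(float)
    return counts / counts.sum()


# Toy 2-D points standing in for shape feature descriptors (hypothetical data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
              [3.0, 3.0], [3.1, 3.0], [3.0, 2.9]])
groups = normalized_cut_bipartition(gaussian_affinity(X, sigma=1.0))
print(groups)  # two groups of three; which side is True depends on eigenvector sign

lexicon = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical two-word lexicon
print(lexicon_histogram(X, lexicon))  # [0.5 0.5]
```

Thresholding at zero is only one choice; Shi and Malik also search over other splitting points, and obtaining more than two feature-type groups would require recursive bipartition or a multiclass spectral clustering step.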

Author information

Authors and Affiliations

University of Maryland, College Park, MD 20742, USA
Guangyu Zhu, Xiaodong Yu, Yi Li & David Doermann

Editor information

Editors and Affiliations

Computer Science Department, University of Illinois at Urbana-Champaign, 3310 Siebel Hall, Urbana, IL 61801, USA
David Forsyth
Department of Computing, Oxford Brookes University, OX33 1HX, Wheatley, Oxford, UK
Philip Torr
Department of Engineering Science, University of Oxford, Parks Road, OX1 3PJ, Oxford, UK
Andrew Zisserman

About this paper

Cite this paper

Zhu, G., Yu, X., Li, Y., Doermann, D. (2008). Learning Visual Shape Lexicon for Document Image Content Recognition. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88688-4_55

DOI: https://doi.org/10.1007/978-3-540-88688-4_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88685-3
Online ISBN: 978-3-540-88688-4
eBook Packages: Computer Science, Computer Science (R0)
