Text/Graphics Separation Revisited

  • Karl Tombre
  • Salvatore Tabbone
  • Loïc Pélissier
  • Bart Lamiroy
  • Philippe Dosch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)

Abstract

Text/graphics separation aims at segmenting the document into two layers: a layer assumed to contain text and a layer containing graphical objects. In this paper, we present a consolidation of a method proposed by Fletcher and Kasturi, with a number of improvements to make it more suitable for graphics-rich documents. We discuss the right choice of thresholds for this method, and their stability. We also propose a post-processing step for retrieving text components touching the graphics, through local segmentation of the distance skeleton.

Keywords

Text Component; Document Image; Search Area; Hough Transform; Graphical Object
These keywords were added by machine and not by the authors.

References

  1. E. Appiani, F. Cesarini, A. M. Colla, M. Diligenti, M. Gori, S. Marinai, and G. Soda. Automatic document classification and indexing in high-volume applications. International Journal on Document Analysis and Recognition, 4(2):69–83, December 2001.
  2. R. Cao and C. L. Tan. Separation of Overlapping Text from Graphics. In Proceedings of 6th International Conference on Document Analysis and Recognition, Seattle (USA), pages 44–48, September 2001.
  3. R. Cao and C. L. Tan. Text/Graphics Separation in Maps. In Proceedings of 4th IAPR International Workshop on Graphics Recognition, Kingston, Ontario (Canada), pages 245–254, September 2001.
  4. A. K. Chhabra, V. Misra, and J. Arias. Detection of Horizontal Lines in Noisy Run Length Encoded Images: The FAST Method. In R. Kasturi and K. Tombre, editors, Graphics Recognition-Methods and Applications, volume 1072 of Lecture Notes in Computer Science, pages 35–48. Springer-Verlag, May 1996.
  5. J. E. den Hartog, T. K. ten Kate, and J. J. Gerbrands. An Alternative to Vectorization: Decomposition of Graphics into Primitives. In Proceedings of Third Symposium on Document Analysis and Information Retrieval, Las Vegas, April 1994.
  6. G. Sanniti di Baja. Well-Shaped, Stable, and Reversible Skeletons from the (3,4)-Distance Transform. Journal of Visual Communication and Image Representation, 5(1):107–115, 1994.
  7. D. Dori and L. Wenyin. Vector-Based Segmentation of Text Connected to Graphics in Engineering Drawings. In P. Perner, P. Wang, and A. Rosenfeld, editors, Advances in Structural and Syntactical Pattern Recognition (Proceedings of 6th International SSPR Workshop, Leipzig, Germany), volume 1121 of Lecture Notes in Computer Science, pages 322–331. Springer-Verlag, August 1996.
  8. Ph. Dosch, K. Tombre, C. Ah-Soon, and G. Masini. A complete system for analysis of architectural drawings. International Journal on Document Analysis and Recognition, 3(2):102–116, December 2000.
  9. L. A. Fletcher and R. Kasturi. A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images. IEEE Transactions on PAMI, 10(6):910–918, 1988.
  10. J. M. Gloger. Use of Hough Transform to Separate Merged Text/Graphics in Forms. In Proceedings of 11th International Conference on Pattern Recognition, Den Haag (The Netherlands), volume 2, pages 268–271, 1992.
  11. T. Kaneko. Line Structure Extraction from Line-Drawing Images. Pattern Recognition, 25(9):963–973, 1992.
  12. D. X. Le, G. R. Thoma, and H. Wechsler. Classification of binary document images into textual or nontextual data blocks using neural network models. Machine Vision and Applications, 8:289–304, 1995.
  13. Z. Lu. Detection of Text Regions From Digital Engineering Drawings. IEEE Transactions on PAMI, 20(4):431–439, April 1998.
  14. H. Luo and I. Dinstein. Using Directional Mathematical Morphology for Separation of Character Strings from Text/Graphics Image. In Shape, Structure and Pattern Recognition (Post-proceedings of IAPR Workshop on Syntactic and Structural Pattern Recognition, Nahariya, Israel), pages 372–381. World Scientific, 1994.
  15. Huizhu Luo and Rangachar Kasturi. Improved Directional Morphological Operations for Separation of Characters from Maps/Graphics. In K. Tombre and A. K. Chhabra, editors, Graphics Recognition-Algorithms and Systems, volume 1389 of Lecture Notes in Computer Science, pages 35–47. Springer-Verlag, April 1998.
  16. G. Nagy and S. Seth. Hierarchical Representation of Optically Scanned Documents. In Proceedings of 7th International Conference on Pattern Recognition, Montréal (Canada), pages 347–349, 1984.
  17. H.-C. Park, S.-Y. Ok, Y.-J. Yu, and H.-G. Cho. A word extraction algorithm for machine-printed documents using a 3D neighborhood graph model. International Journal on Document Analysis and Recognition, 4(2):115–130, December 2001.
  18. T. Pavlidis and J. Zhou. Page Segmentation and Classification. CVGIP: Graphical Models and Image Processing, 54(6):484–496, November 1992.
  19. Y. Wang, I. T. Phillips, and R. Haralick. Using Area Voronoi Tessellation to Segment Characters Connected to Graphics. In Proceedings of 4th IAPR International Workshop on Graphics Recognition, Kingston, Ontario (Canada), pages 147–153, September 2001.
  20. K. Y. Wong, R. G. Casey, and F. M. Wahl. Document Analysis System. IBM Journal of Research and Development, 26(6):647–656, 1982.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Karl Tombre (1)
  • Salvatore Tabbone (1)
  • Loïc Pélissier (1)
  • Bart Lamiroy (1)
  • Philippe Dosch (1)

  1. LORIA, Vandœuvre-lès-Nancy, France
