Extraction of Index Components Based on Contents Analysis of Journal’s Scanned Cover Page

  • Young-Bin Kwon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3926)


This paper presents a method for automatically indexing journal contents pages, reducing the manual effort previously required to enter paper information and construct an index. Various contents-page formats for journals, whose features differ from those of general documents, are described. The principal elements to be represented are the title, authors, and page numbers of each paper. These three elements are modeled according to the order in which they are arranged, and their features are then generalized. A content analysis system is implemented on the basis of the suggested modeling method in order to verify it. The system takes as input a grayscale image scanned at more than 300 dpi and analyzes the structural features of the contents page. It classifies titles, authors, and page numbers using an efficient projection method. Each item is assigned to a region and then extracted automatically as index information, which also allows characters to be recognized region by region. Experimental results were obtained by applying the system to some of the six suggested models, and the system shows a 97.3% success rate for various journals.
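The abstract names a projection method but gives no details, so the following is only a generic projection-profile sketch, not the author's algorithm. It assumes a binarized image stored as a list of rows (1 for ink, 0 for background); the helper names `horizontal_projection`, `vertical_projection`, `runs`, and `segment_entries`, as well as the `column_gap` threshold, are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: 1 = ink pixel, 0 = background
Span = Tuple[int, int]   # half-open interval (start, end)


def horizontal_projection(img: Image) -> List[int]:
    """Count ink pixels in every row of the image."""
    return [sum(row) for row in img]


def vertical_projection(img: Image, top: int, bottom: int) -> List[int]:
    """Count ink pixels in every column, restricted to rows top..bottom-1."""
    if not img:
        return []
    width = len(img[0])
    return [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Return spans where the profile is non-zero.

    Spans separated by fewer than min_gap empty entries are merged, so that
    small gaps (for example between characters) do not split a block.
    """
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Span] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment_entries(img: Image, column_gap: int = 3) -> List[Tuple[Span, List[Span]]]:
    """Split the page into text lines, then each line into horizontal blocks."""
    result = []
    for top, bottom in runs(horizontal_projection(img)):
        columns = vertical_projection(img, top, bottom)
        result.append(((top, bottom), runs(columns, min_gap=column_gap)))
    return result
```

In a sketch like this, a line that splits into a wide left block and a narrow right block could be a candidate for a title followed by a page number. Deciding which block is a title, an author list, or a page number would depend on the arrangement models the paper describes, which are not reproduced here.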


Character Recognition, Document Image, Text Line, Page Number, Vertical Projection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Young-Bin Kwon (1)
  1. Department of Computer Engineering, Chung-Ang University, Seoul, Korea
