Extraction of Index Components Based on Contents Analysis of Journal’s Scanned Cover Page
This paper presents a method for automatically indexing journal contents, reducing the manual effort previously required to enter paper information and construct an index. We describe the various contents formats used by journals, whose features differ from those of general documents. The principal elements to be represented for each paper are the title, the authors, and the pages. These three elements are modeled according to the order in which they are arranged, and their features are then generalized. A content analysis system was implemented on the basis of this modeling method in order to verify it. The system takes as input a gray-scale image scanned at more than 300 dpi and analyzes the structural features of the contents. It classifies titles, authors, and pages using an efficient projection method. Each item is classified by region and then extracted automatically as index information, which also allows characters to be recognized region by region. In experiments applying the system to some of the 6 suggested models, it achieved a 97.3% success rate across various journals.
Keywords: Character Recognition, Document Image, Text Line, Page Number, Vertical Projection
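The abstract does not spell out the projection method, so the following is only a generic sketch of projection-profile segmentation, not the authors' implementation. It assumes the gray-scale scan has already been thresholded into a binary grid (1 for ink, 0 for background). The names `projection`, `runs`, `segment`, and the `min_ink` threshold are hypothetical, introduced for illustration. Row sums locate text lines, and column sums within each line locate horizontally separated blocks, such as a title on the left and a page number on the right.

```python
from typing import List, Tuple

# Assumption: the page is already binarized; 1 = ink, 0 = background.
Image = List[List[int]]
Span = Tuple[int, int]  # half-open index range (start, end)


def projection(image: Image, axis: str) -> List[int]:
    """Count ink pixels per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in image]
    if axis == "cols":
        return [sum(col) for col in zip(*image)]
    raise ValueError("axis must be 'rows' or 'cols'")


def runs(profile: List[int], min_ink: int = 1) -> List[Span]:
    """Return the spans where the profile stays at or above min_ink."""
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of ink begins
        elif value < min_ink and start is not None:
            spans.append((start, i))       # the run ended at a gap
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(image: Image) -> List[Tuple[Span, List[Span]]]:
    """Find text lines by row projection, then blocks by column projection."""
    layout = []
    for top, bottom in runs(projection(image, "rows")):
        band = image[top:bottom]
        layout.append(((top, bottom), runs(projection(band, "cols"))))
    return layout


# Toy 6x8 page: two text lines, each with a left block and a right block.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
layout = segment(page)
```

On a real scan, gaps between characters inside one block would also split it, so a practical version would likely merge spans separated by fewer than some minimum gap width. That refinement is left out here to keep the sketch short.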