Information Retrieval, Volume 2, Issue 2–3, pp 227–243

Comparison and Classification of Documents Based on Layout Similarity

  • Jianying Hu
  • Ramanujan Kashi
  • Gordon Wilfong


This paper describes features and methods for document image comparison and classification at the spatial layout level. The methods are useful for visual similarity based document retrieval as well as for fast initial document type classification without OCR. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. These fixed-length vectors are then compared with one another using the Manhattan distance, which gives a fast page layout comparison. The paper describes experiments and results on rank-ordering a set of document pages by their layout similarity to a test document. We also demonstrate the usefulness of the features derived from interval encoding in a hidden Markov model based page layout classification system that is trainable and extensible. The methods described in the paper can be used in various document retrieval tasks, including visual similarity based retrieval, categorization and information extraction.
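The comparison step can be illustrated with a minimal sketch. It assumes each page has already been reduced to a fixed-length layout vector; it does not implement interval encoding itself, which the abstract does not specify. The function names, page identifiers and vector values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_layout_similarity(
    query: Sequence[float],
    pages: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (page_id, distance) pairs ordered from most to least similar."""
    scored = [
        (page_id, manhattan_distance(query, vector))
        for page_id, vector in pages.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 4-dimensional layout vectors, invented for illustration only.
pages = {
    "letter": [0, 2, 5, 1],
    "memo": [1, 2, 3, 1],
    "article": [6, 0, 1, 7],
}
query = [1, 2, 5, 1]

ranking = rank_by_layout_similarity(query, pages)
for page_id, distance in ranking:
    print(page_id, distance)
```

Because every vector has the same length, each comparison costs time linear in the vector dimension, which is what makes this kind of ranking fast enough for a first-pass retrieval step.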

hidden Markov models, edit distance, dynamic warping, document classification, document retrieval, clustering, Manhattan distance





Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Jianying Hu (1)
  • Ramanujan Kashi (1)
  • Gordon Wilfong (1)
  1. Lucent Technologies Bell Labs, Murray Hill, USA
