Document Reverse Engineering: From Paper to XML

  • Kyong-Ho Lee
  • Yoon-Chul Choy
  • Sung-Bae Cho
  • Xiao Tang
  • Victor McCrary
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)

Abstract

Since XML has the advantage of embedding logical structure information into documents, it is widely used as the universal format for structured documents on the Web. This makes it attractive to automatically convert paper-based documents that have a logical hierarchy into XML representations. Document image analysis and understanding [1] consists of two phases: geometric structure analysis and logical structure analysis. Because the two phases take different kinds of data as input, it may not be desirable to apply the same method to both. Targeting multi-page technical journal documents, we present a hybrid of knowledge-based and syntactic methods for geometric and logical structure analysis of document images.
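As a rough illustration of the final step described above (turning a recovered logical hierarchy into XML), the following minimal sketch assumes that logical structure analysis has already produced a flat, reading-order list of labeled text blocks. The labels, the element names, and the `to_xml` helper are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of a logical structure analysis phase:
# (logical label, text) pairs in reading order. Labels are illustrative only.
blocks = [
    ("title", "Document Reverse Engineering: From Paper to XML"),
    ("author", "Kyong-Ho Lee"),
    ("section", "Introduction"),
    ("paragraph", "XML embeds logical structure information into documents."),
]

def to_xml(blocks):
    """Build an XML tree: paragraphs nest under the most recent section,
    every other block becomes a direct child of the root element."""
    root = ET.Element("article")
    current_section = None
    for label, text in blocks:
        if label == "section":
            # A section heading opens a new section container.
            current_section = ET.SubElement(root, "section")
            ET.SubElement(current_section, "heading").text = text
        elif label == "paragraph" and current_section is not None:
            ET.SubElement(current_section, "paragraph").text = text
        else:
            ET.SubElement(root, label).text = text
    return root

xml_text = ET.tostring(to_xml(blocks), encoding="unicode")
print(xml_text)
```

The nesting rule here is deliberately simple. A real system would derive the hierarchy from the geometric and logical analysis results rather than from a fixed label list.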

References

  1. Nagy, G.: Twenty Years of Document Image Analysis in PAMI. IEEE Trans. Pattern Analysis and Machine Intelligence. 1 (2000) 38–62
  2. Summers, K.M.: Toward a Taxonomy of Logical Document Structures. Proc. Dartmouth Institute for Advanced Graduate Studies (DAGS’95), Boston (1995) 124–133
  3. Koffka, K.: Principles of Gestalt Psychology. Harcourt, Brace and World, New York (1935)

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Kyong-Ho Lee (1)
  • Yoon-Chul Choy (2)
  • Sung-Bae Cho (2)
  • Xiao Tang (1)
  • Victor McCrary (1)
  1. National Institute of Standards and Technology, Gaithersburg, USA
  2. Dept. Computer Science, Yonsei Univ., Seodaemun-ku, Korea