Document Reverse Engineering: From Paper to XML
Since XML has the advantage of embedding logical structure information into documents, it is widely used as the universal format for structured documents on the Web. This makes it attractive to convert paper-based documents with logical hierarchy into XML representations automatically. Document image analysis and understanding  consists of two phases: geometric and logical structure analysis. Because the two phases take different kinds of data as input, it may not be desirable to apply the same method to them. Targeting technical journal document with multiple pages, we present a hybridization of knowledge-based and syntactic methods for geometric and logical structure analysis of document images.