Document Categorization

  • Qianhong Liu
  • Peter A. Ng

Abstract

The document model of TEXPROS discussed in Chapter 2 takes a dual approach to describing and classifying office documents: it defines both a document type hierarchy and a folder organization (or logical filing structure). The document type hierarchy depicts the structural organization of the document types used in the problem domain. It identifies and organizes the structural commonalities among documents and thereby facilitates their classification. The folder organization represents the user’s view of the document filing organization. In this chapter, we present two different architectures for implementing the document filing organization [143, 168, 169, 189]. We start in Section 3.1 by giving a formal definition of the document model, including frame templates, a document type hierarchy, folders, and folder organizations. A frame template (document type) specifies the structure and components common to different documents, or frame instances (document instances), of the same kind. The folder organization specifying the document filing view is defined using predicates and directed graphs. Then, in Section 3.2, we show how these concepts can be used to solve the Reconstruction Problem: we investigate under what circumstances it is possible to reconstruct a folder organization from its folder-level predicates. The results are expressed in terms of graph-theoretic concepts such as an associated digraph, transitive closure, and redundant/nonredundant filing paths.
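The graph-theoretic vocabulary above can be made concrete with a small sketch. It assumes that a folder organization is modeled as a digraph whose edges are direct filing paths between folders, and that a direct edge (u, v) counts as "redundant" when v is still reachable from u along some other path. This is a generic illustration of transitive closure and redundant edges, not the authors’ reconstruction algorithm. It ignores folder predicates entirely, and the folder names and helper functions (`reachable`, `transitive_closure`, `redundant_edges`) are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Assumed model: each (hypothetical) folder maps to the set of folders that
# directly receive documents from it, i.e. its outgoing filing paths.
Digraph = Dict[str, Set[str]]
Edge = Tuple[str, str]


def reachable(graph: Digraph, start: str, skip: Optional[Edge] = None) -> Set[str]:
    """Nodes reachable from `start` by a non-empty path, optionally ignoring one edge."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, set()):
            if (node, nxt) == skip or nxt in seen:
                continue
            seen.add(nxt)
            stack.append(nxt)
    return seen


def transitive_closure(graph: Digraph) -> Set[Edge]:
    """All pairs (u, v) such that v can be reached from u along some filing path."""
    return {(u, v) for u in graph for v in reachable(graph, u)}


def redundant_edges(graph: Digraph) -> Set[Edge]:
    """Direct edges (u, v) whose target stays reachable when that edge is removed."""
    return {
        (u, v)
        for u, children in graph.items()
        for v in children
        if v in reachable(graph, u, skip=(u, v))
    }


if __name__ == "__main__":
    # Hypothetical folder organization with one shortcut edge Root -> CS_Faculty.
    folders: Digraph = {
        "Root": {"Faculty", "Staff", "CS_Faculty"},
        "Faculty": {"CS_Faculty"},
        "Staff": set(),
        "CS_Faculty": set(),
    }
    print(sorted(transitive_closure(folders)))
    print(redundant_edges(folders))  # {('Root', 'CS_Faculty')}
```

Here the direct edge Root → CS_Faculty is redundant because Root → Faculty → CS_Faculty already files documents along a longer path. Dropping every redundant edge leaves the nonredundant filing paths, and the transitive closure is unchanged by that removal when the digraph is acyclic.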

Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Qianhong Liu (1)
  • Peter A. Ng (1)
  1. New Jersey Institute of Technology, Newark, USA
