CatRelate: A New Hierarchical Document Category Integration Algorithm by Learning Category Relationships

  • Shanfeng Zhu
  • Christopher C. Yang
  • Wai Lam
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3334)

Abstract

We address the problem of integrating documents from a source catalog into a master catalog. Current approaches either treat it as a flat category integration problem, ignoring the useful hierarchy information in the catalog, or handle it hierarchically but without a rigorous model. In contrast, our method is based on correctly identifying relationships among categories, such as Match, Disjoint, SubConcept, SuperConcept, and Overlap, which come from the relations between sets in set theory. Compared with the traditional Match/NotMatch relationship in the literature, our approach is more expressive in defining the relationship. The relationships among categories are first learned in a probabilistic way and then refined by considering the hierarchy context. Our preliminary experiments show that the method helps to correctly identify category relationships and thus increases the accuracy of document integration.
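
The five relationships named in the abstract correspond to the standard relations between two sets. As a minimal illustration only, the sketch below classifies two categories by comparing their sets of document identifiers exactly. It is not the probabilistic learning or hierarchy-based refinement the paper describes; the function name `set_relation` and the sample document IDs are hypothetical.

```python
def set_relation(a, b):
    """Return the set-theoretic relation of category a to category b.

    Assumption: each category is given as a collection of document IDs,
    and membership is known exactly (the paper instead estimates the
    relationship probabilistically).
    """
    a, b = set(a), set(b)
    if a == b:
        return "Match"          # identical document sets
    if not (a & b):
        return "Disjoint"       # no shared documents
    if a < b:
        return "SubConcept"     # a is a proper subset of b
    if a > b:
        return "SuperConcept"   # a is a proper superset of b
    return "Overlap"            # partial intersection only


# Hypothetical document IDs for illustration.
print(set_relation({1, 2, 3}, {1, 2, 3}))  # Match
print(set_relation({1, 2}, {3, 4}))        # Disjoint
print(set_relation({1, 2}, {1, 2, 3}))     # SubConcept
print(set_relation({1, 2, 3}, {1, 2}))     # SuperConcept
print(set_relation({1, 2}, {2, 3}))        # Overlap
```

A binary Match/NotMatch scheme would collapse the last four cases into one label, which is the loss of expressiveness the abstract refers to.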

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Shanfeng Zhu (1)
  • Christopher C. Yang (2)
  • Wai Lam (2)

  1. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
  2. Department of System Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
