Learning to Integrate Web Catalogs with Conceptual Relationships in Hierarchical Thesaurus

  • Jui-Chi Ho
  • Ing-Xiang Chen
  • Cheng-Zen Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4182)


Web catalog integration is an important issue in digital content management. Past studies have shown that exploiting a flattened structure together with auxiliary information extracted from the source catalog can improve integration results. Although earlier studies have also shown that exploiting a hierarchical structure can benefit classification, its effectiveness has not been verified in catalog integration. In this paper, we propose an enhanced catalog integration (ECI) approach that extracts conceptual relationships from a hierarchical Web thesaurus to further improve the accuracy of Web catalog integration. We conducted experiments on real-world catalog integration with both a flattened structure and a hierarchical structure in the destination catalog. The results show that the ECI scheme improves the integration accuracy of both the flattened scheme and the hierarchical scheme when used with Support Vector Machine (SVM) classifiers.
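To make the distinction between the flattened and the hierarchical scheme concrete, the following is a minimal sketch and not the method of the paper. It assumes a hypothetical two-level destination catalog (`CATALOG`) with invented toy documents, and it swaps in a simple cosine-similarity centroid classifier as a stand-in for the SVM classifiers. It does not implement the thesaurus-based extraction of conceptual relationships that ECI proposes. The flattened scheme chooses among all leaf categories at once, while the hierarchical scheme first chooses a top-level category and then a leaf beneath it.

```python
import math
from collections import Counter

# Hypothetical toy destination catalog (an assumption for illustration):
# top-level category -> leaf category -> training documents.
CATALOG = {
    "Sports": {
        "Soccer": ["goal striker league match", "league cup goal keeper"],
        "Tennis": ["racket serve court match", "serve ace court set"],
    },
    "Computers": {
        "Hardware": ["cpu memory board chip", "chip disk memory cpu"],
        "Software": ["compiler code bug release", "code library release api"],
    },
}


def vec(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.split())


def centroid(docs):
    """Sum of the term-frequency vectors of a list of documents."""
    total = Counter()
    for doc in docs:
        total.update(vec(doc))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best(doc, centroids):
    """Name of the category whose centroid is most similar to the document."""
    v = vec(doc)
    return max(centroids, key=lambda name: cosine(v, centroids[name]))


def classify_flat(doc):
    """Flattened scheme: pick directly among all leaf categories."""
    leaves = {
        leaf: centroid(docs)
        for children in CATALOG.values()
        for leaf, docs in children.items()
    }
    return best(doc, leaves)


def classify_hierarchical(doc):
    """Hierarchical scheme: pick a top-level category, then a leaf under it."""
    tops = {
        top: centroid([d for docs in children.values() for d in docs])
        for top, children in CATALOG.items()
    }
    top = best(doc, tops)
    leaves = {leaf: centroid(docs) for leaf, docs in CATALOG[top].items()}
    return top, best(doc, leaves)


if __name__ == "__main__":
    print(classify_flat("cpu chip memory"))
    print(classify_hierarchical("serve court match"))
```

In the hierarchical variant an error at the top level cannot be recovered at the leaf level, which is one reason the two schemes can behave differently in practice.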


Keywords: Support Vector Machine; Test Document; Accuracy Improvement; Source Category; Hierarchical Scheme





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jui-Chi Ho, Ing-Xiang Chen, Cheng-Zen Yang: Department of Computer Science and Engineering, Yuan Ze University, Taiwan, R.O.C.
