Abstract
Web catalog integration has been recognized as an important issue in current digital content management. Past studies have shown that exploiting a flattened structure with auxiliary information extracted from the source catalog can improve integration results. Although earlier studies have also shown that exploiting a hierarchical structure in classification may bring further benefits, its effectiveness has not been verified in catalog integration. In this paper, we propose an enhanced catalog integration (ECI) approach that extracts conceptual relationships from a hierarchical Web thesaurus to further improve the accuracy of Web catalog integration. We have conducted real-world catalog integration experiments using both a flattened and a hierarchical structure for the destination catalog. The results show that our ECI scheme effectively boosts the integration accuracy of both the flattened scheme and the hierarchical scheme with advanced Support Vector Machine (SVM) classifiers.
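The abstract does not spell out how auxiliary information from a source catalog enters the classifier. A common pattern is to append pseudo-features for a document's source category to its term vector before training an SVM. The sketch below is a hypothetical illustration of that idea extended to a hierarchy: it adds the source category and its ancestors as pseudo-terms with weights that decay per level. It is not the authors' ECI method; the `__cat__` prefix, the `decay` parameter, and the parent-map representation of the hierarchy are all assumptions made for illustration.

```python
from collections import Counter


def ancestors(category, parent):
    """Return the ancestors of `category`, nearest first.

    `parent` maps each category to its parent; root categories are absent
    from the map. A `seen` set guards against cycles in malformed input.
    """
    chain = []
    seen = {category}
    node = parent.get(category)
    while node is not None and node not in seen:
        chain.append(node)
        seen.add(node)
        node = parent.get(node)
    return chain


def augment_features(tokens, source_category, parent, decay=0.5):
    """Term-frequency features plus hypothetical source-catalog features.

    The document's own terms are counted as usual. Its source category adds
    a pseudo-term with weight 1.0, and each ancestor adds a pseudo-term whose
    weight is multiplied by `decay` for every level further up the hierarchy.
    """
    features = Counter(tokens)
    weight = 1.0
    for cat in [source_category] + ancestors(source_category, parent):
        features["__cat__" + cat] += weight
        weight *= decay
    return dict(features)
```

For example, with `parent = {"Java": "Programming", "Programming": "Computers"}`, calling `augment_features(["java", "jvm", "java"], "Java", parent)` returns `{'java': 2, 'jvm': 1, '__cat__Java': 1.0, '__cat__Programming': 0.5, '__cat__Computers': 0.25}`. Vectors of this kind could then be fed to any linear SVM; the decaying weights are one simple way to let nearer categories count more than distant ones.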
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ho, JC., Chen, IX., Yang, CZ. (2006). Learning to Integrate Web Catalogs with Conceptual Relationships in Hierarchical Thesaurus. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_17
DOI: https://doi.org/10.1007/11880592_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science, Computer Science (R0)