Heterogeneous Information Integration in Hierarchical Text Classification

Yang, Huai-Yuan; Liu, Tie-Yan; Gao, Li; Ma, Wei-Ying

doi:10.1007/11731139_29

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

Previous work has shown that considering the category distance in the taxonomy tree can improve the performance of text classifiers. In this paper, we propose a new approach that integrates additional categorical information from the text corpus using the principle of multi-objective programming (MOP). That is, we consider not only the distance between categories defined by the branching of the taxonomy tree, but also the similarity between categories defined by the document/term distributions in the feature space. Using MOP to combine these two kinds of information, we obtain a refined category distance. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm in hierarchical text classification.
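
To make the idea of a refined category distance concrete, the following is a minimal sketch, not the formulation used in the paper (which is not available in this preview). It assumes a toy taxonomy, made-up category centroids, and a weighted-sum scalarization, one common way to reduce a two-objective problem to a single objective. The names `PARENT`, `CENTROID`, `refined_distance`, the weight `alpha`, and the normaliser `max_tree` are all hypothetical.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "sci": "root", "rec": "root",
    "sci.med": "sci", "sci.space": "sci", "rec.autos": "rec",
}

# Hypothetical category centroids in a 3-term feature space.
CENTROID = {
    "sci.med": [0.9, 0.1, 0.0],
    "sci.space": [0.7, 0.3, 0.0],
    "rec.autos": [0.0, 0.2, 0.8],
}


def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a, b):
    """Number of edges on the taxonomy path between categories a and b."""
    path_a, path_b = ancestors(a), ancestors(b)
    index_in_a = {n: i for i, n in enumerate(path_a)}
    for j, n in enumerate(path_b):
        if n in index_in_a:  # first common ancestor
            return index_in_a[n] + j
    raise ValueError("categories are not in the same tree")


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def refined_distance(a, b, alpha=0.5, max_tree=4):
    """Weighted sum of normalised tree distance and feature-space distance.

    alpha and max_tree are assumptions of this sketch, not values from the paper.
    """
    t = tree_distance(a, b) / max_tree
    f = cosine_distance(CENTROID[a], CENTROID[b])
    return alpha * t + (1.0 - alpha) * f
```

Under these assumptions, sibling categories such as `sci.med` and `sci.space` end up closer than categories in different branches, and `alpha` controls how much the taxonomy dominates over the feature-space evidence.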

Author information

Authors and Affiliations

5F Sigma Center, Microsoft Research Asia, No. 49 Zhichun Road, Haidian District, Beijing, 100080, P.R. China
Huai-Yuan Yang, Tie-Yan Liu & Wei-Ying Ma
Department of Scientific & Engineering Computing, School of Mathematical Sciences, Peking University, Beijing, 100871, P.R. China
Huai-Yuan Yang & Li Gao

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore
Wee-Keong Ng
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Computer Engineering, Nanyang Technological University, 639798, Singapore, Singapore
Kuiyu Chang

About this paper

Cite this paper

Yang, HY., Liu, TY., Gao, L., Ma, WY. (2006). Heterogeneous Information Integration in Hierarchical Text Classification. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_29

DOI: https://doi.org/10.1007/11731139_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science, Computer Science (R0)
