Heterogeneous Information Integration in Hierarchical Text Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

Previous work has shown that taking the category distance in the taxonomy tree into account can improve the performance of text classifiers. In this paper, we propose a new approach that integrates additional categorical information from the text corpus using the principle of multi-objective programming (MOP). That is, we consider not only the distance between categories defined by the branching of the taxonomy tree, but also the similarity between categories defined by the document/term distributions in the feature space. We then obtain a refined category distance by using MOP to combine these two kinds of information. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm in hierarchical text classification.
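As a rough illustration of the idea of combining the two kinds of category information, the sketch below blends a taxonomy-tree distance with a feature-space distance. It is not the formulation used in the paper, which this abstract does not give: the weighted-sum scalarization (one common way to collapse several objectives into one), the cosine distance between category centroids, the normalization by a maximum tree distance, and all names (`tree_distance`, `cosine_distance`, `refined_distance`, `weight`, the toy taxonomy and centroids) are assumptions made for the example.

```python
import math


def tree_distance(parent, a, b):
    """Number of edges on the path between categories a and b in the taxonomy tree."""
    def path_to_root(node):
        path = [node]
        while parent.get(node) is not None:
            node = parent[node]
            path.append(node)
        return path

    path_a, path_b = path_to_root(a), path_to_root(b)
    steps_from_a = {node: i for i, node in enumerate(path_a)}
    for steps_from_b, node in enumerate(path_b):
        if node in steps_from_a:  # first common ancestor
            return steps_from_a[node] + steps_from_b
    raise ValueError("categories do not share a root")


def cosine_distance(u, v):
    """1 - cosine similarity of two term vectors (1.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0


def refined_distance(parent, centroids, a, b, max_tree_distance, weight=0.5):
    """Weighted-sum blend of the two distances (an assumed scalarization)."""
    d_tree = tree_distance(parent, a, b) / max_tree_distance  # scaled to [0, 1]
    d_feat = cosine_distance(centroids[a], centroids[b])
    return weight * d_tree + (1.0 - weight) * d_feat


# Hypothetical toy taxonomy: child -> parent (the root maps to None).
parent = {
    "root": None,
    "sci": "root",
    "rec": "root",
    "sci.space": "sci",
    "sci.med": "sci",
    "rec.autos": "rec",
}

# Hypothetical category centroids over a two-term vocabulary.
centroids = {
    "sci.space": [1.0, 0.0],
    "sci.med": [0.0, 1.0],
    "rec.autos": [1.0, 0.0],
}

siblings = refined_distance(parent, centroids, "sci.space", "sci.med", max_tree_distance=4)
cousins = refined_distance(parent, centroids, "sci.space", "rec.autos", max_tree_distance=4)
```

In this toy setup the two sibling categories are close in the tree but use disjoint terms, while the two categories in different branches share identical term distributions, so the blended distance ranks the pairs differently from the tree distance alone. The `weight` parameter controls how much each source of information contributes.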




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, HY., Liu, TY., Gao, L., Ma, WY. (2006). Heterogeneous Information Integration in Hierarchical Text Classification. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_29


  • DOI: https://doi.org/10.1007/11731139_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33206-0

  • Online ISBN: 978-3-540-33207-7

  • eBook Packages: Computer Science, Computer Science (R0)
