Iterative Reinforcement Cross-Domain Text Classification

  • Conference paper
Advanced Data Mining and Applications (ADMA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Abstract

Traditional text classification techniques rest on the assumption that training and test data are drawn from the same underlying distribution. In many real-world applications, however, this assumption does not hold. Labeled training data are expensive to obtain, but labeled data are sometimes available in a domain that is different from, but related to, that of the test data. Making use of labeled data from a different domain to supervise classification is therefore a crucial task. In this paper, we propose a novel algorithm for cross-domain text classification using reinforcement learning. In our algorithm, the training process is iteratively reinforced by exploiting the relations between documents and words. Empirically, our method is an effective and scalable approach to text categorization when the training and test data come from different but related domains. The experimental results show that our algorithm achieves better performance than several state-of-the-art classifiers.
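The abstract says only that training is iteratively reinforced through the relations between documents and words; it does not give the update rules. The Python sketch below illustrates one generic form of such document-word mutual reinforcement, under stated assumptions, and is not the authors' algorithm: labeled source-domain documents give each word an initial class distribution, unlabeled target-domain documents are scored from their words, and those target scores are fed back to update the word scores, for a fixed number of rounds. The function names, the mixing weight `alpha`, the round count `n_iter`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def normalize(scores):
    """Rescale non-negative class scores so they sum to 1 (uniform if all zero)."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def words_from_docs(docs, doc_dists, n_classes):
    """Word -> class distribution, averaged over the documents containing the word."""
    sums = defaultdict(lambda: [0.0] * n_classes)
    for tokens, dist in zip(docs, doc_dists):
        for w in tokens:
            for c in range(n_classes):
                sums[w][c] += dist[c]
    return {w: normalize(v) for w, v in sums.items()}


def docs_from_words(docs, word_scores, n_classes):
    """Document -> class distribution, averaged over the document's words."""
    uniform = [1.0 / n_classes] * n_classes
    dists = []
    for tokens in docs:
        acc = [0.0] * n_classes
        for w in tokens:
            # Words with no evidence yet contribute a uniform distribution.
            for c, s in enumerate(word_scores.get(w, uniform)):
                acc[c] += s
        dists.append(normalize(acc))
    return dists


def classify_cross_domain(src_docs, src_labels, tgt_docs, n_classes,
                          alpha=0.5, n_iter=10):
    """Hypothetical sketch: label target-domain docs by document-word reinforcement.

    alpha weights source-domain word evidence against target-domain feedback.
    alpha and n_iter are illustrative defaults, not values from the paper.
    """
    uniform = [1.0 / n_classes] * n_classes
    # Labeled source documents keep fixed one-hot class distributions.
    src_dists = [[1.0 if c == y else 0.0 for c in range(n_classes)]
                 for y in src_labels]
    src_words = words_from_docs(src_docs, src_dists, n_classes)
    word_scores = dict(src_words)

    for _ in range(n_iter):
        # Words -> target documents.
        tgt_dists = docs_from_words(tgt_docs, word_scores, n_classes)
        # Target documents -> words, mixed with the fixed source-domain evidence.
        tgt_words = words_from_docs(tgt_docs, tgt_dists, n_classes)
        word_scores = {}
        for w in set(src_words) | set(tgt_words):
            s = src_words.get(w, uniform)
            t = tgt_words.get(w, uniform)
            word_scores[w] = [alpha * a + (1 - alpha) * b for a, b in zip(s, t)]

    final = docs_from_words(tgt_docs, word_scores, n_classes)
    # Ties go to the lowest class index (max returns the first maximum).
    return [max(range(n_classes), key=lambda c: d[c]) for d in final]


# Toy example: class 1 = sports, class 0 = politics (illustrative data).
src_docs = [["ball", "goal", "team"], ["vote", "party", "law"]]
src_labels = [1, 0]
tgt_docs = [["goal", "coach"], ["party", "senate"], ["coach", "league"]]
print(classify_cross_domain(src_docs, src_labels, tgt_docs, n_classes=2))
# -> [1, 0, 1]
```

In the toy run, the third target document shares no words with the source domain, so with `n_iter=0` its scores tie and the sketch falls back to class 0, giving `[1, 0, 0]`. After a few rounds, the target-only word "coach" picks up class 1 evidence from the first target document and pulls the third document toward class 1. Mixing each word's target-side feedback with its source-side evidence through `alpha` is one simple way to keep the iteration anchored to the labeled data.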

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, D., Xue, GR., Yu, Y. (2008). Iterative Reinforcement Cross-Domain Text Classification. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_27

  • DOI: https://doi.org/10.1007/978-3-540-88192-6_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88191-9

  • Online ISBN: 978-3-540-88192-6

  • eBook Packages: Computer Science, Computer Science (R0)
