Iterative Reinforcement Cross-Domain Text Classification

Zhang, Di; Xue, Gui-Rong; Yu, Yong

doi:10.1007/978-3-540-88192-6_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Traditional text classification techniques rest on the assumption that training and test data are drawn from the same underlying distribution. In many real-world applications, however, this assumption does not hold. Labeled training data are expensive to obtain, but labeled data may be available in a domain that is different from, yet related to, that of the test data. Making use of such out-of-domain labeled data to supervise classification is therefore a crucial task. In this paper, we propose a novel algorithm for cross-domain text classification using reinforcement learning. In our algorithm, the training process is iteratively reinforced by exploiting the relations between documents and words. Empirically, our method is an effective and scalable approach to text categorization when the training and test data come from different but related domains. Experimental results show that our algorithm achieves better performance than several state-of-the-art classifiers.
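
The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general idea of mutual reinforcement between documents and words, not the authors' method. It assumes a simple scheme: class scores flow from documents to the words they contain and back to the unlabeled target-domain documents, while labeled source-domain documents stay fixed. The function names `normalize` and `reinforce_classify`, the term-frequency weighting, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def normalize(scores):
    """Scale scores to sum to 1; fall back to uniform if all are zero."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def reinforce_classify(source_docs, source_labels, target_docs, classes, iterations=10):
    """Hypothetical document-word reinforcement loop (illustrative only)."""
    n = len(classes)
    source_bags = [Counter(d.split()) for d in source_docs]
    target_bags = [Counter(d.split()) for d in target_docs]
    # Labeled source documents are clamped to one-hot class scores.
    source_scores = [[1.0 if c == label else 0.0 for c in classes]
                     for label in source_labels]
    # Unlabeled target documents start with uniform class scores.
    target_scores = [[1.0 / n] * n for _ in target_docs]

    for _ in range(iterations):
        # Step 1: each word accumulates class scores from the documents containing it.
        word_scores = defaultdict(lambda: [0.0] * n)
        all_bags = source_bags + target_bags
        all_scores = source_scores + target_scores
        for bag, scores in zip(all_bags, all_scores):
            for word, tf in bag.items():
                for k in range(n):
                    word_scores[word][k] += tf * scores[k]
        word_scores = {w: normalize(s) for w, s in word_scores.items()}

        # Step 2: each target document is re-scored from the words it contains.
        target_scores = [
            normalize([sum(tf * word_scores[w][k] for w, tf in bag.items())
                       for k in range(n)])
            for bag in target_bags
        ]

    # Pick the highest-scoring class for each target document.
    return [classes[max(range(n), key=scores.__getitem__)] for scores in target_scores]


# Toy data: target documents contain words ("striker", "senate") unseen in the source.
predictions = reinforce_classify(
    source_docs=["goal match team", "election vote party"],
    source_labels=["sports", "politics"],
    target_docs=["team striker goal", "vote senate party"],
    classes=["sports", "politics"],
)
print(predictions)
```

In this toy setup, words shared across domains carry label information from the source documents to the target documents, and words that occur only in the target domain pick up class scores from the target documents they appear in over successive iterations.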

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Shanghai, 200240, China
Di Zhang, Gui-Rong Xue & Yong Yu

Editor information

Editors and Affiliations

School of Computer Science, Sichuan University, 610065, Chengdu, China
Changjie Tang
Department of Computer Science, The University of Western Ontario, Canada
Charles X. Ling
School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
Faculty of Science & Engineering, York University, 355 Lumbers Building, M3J 1P3, Toronto, Ontario, Canada
Nick J. Cercone
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, 4072, Queensland, Australia
Xue Li

About this paper

Cite this paper

Zhang, D., Xue, GR., Yu, Y. (2008). Iterative Reinforcement Cross-Domain Text Classification. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_27

DOI: https://doi.org/10.1007/978-3-540-88192-6_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science, Computer Science (R0)
