Research on Text Categorization Based on a Weakly-Supervised Transfer Learning Method

  • Dequan Zheng
  • Chenghe Zhang
  • Geli Fei
  • Tiejun Zhao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7182)


This paper presents a text categorization method based on weakly-supervised transfer learning. The method does not require new training documents to be tagged when a classification task arises in a new domain. Instead, it makes use of documents already tagged in other domains to accomplish the automatic categorization task. By extracting linguistic information such as part-of-speech, semantic information, and keyword co-occurrence, we construct a domain-adaptive transfer knowledge base. Experiments show that the presented method improves text categorization performance on a traditional corpus, and that on cross-domain classification tasks its results are only about 5% lower than the baseline, which demonstrates the effectiveness of the method.
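The abstract does not specify how the transfer knowledge base is built or applied. Purely as an illustration, and not as the authors' method, the Python sketch below shows one simple way keyword co-occurrence can carry category knowledge from a tagged source domain to an untagged target domain. Seed keywords are selected from source-domain documents. They are then expanded with target-domain words that frequently co-occur with them, and target documents are classified by keyword overlap. The pre-tokenized input, the scoring rule, the thresholds, and every function name (`build_keywords`, `cooccurrence`, `expand_keywords`, `classify`) are hypothetical. Part-of-speech and semantic features are omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_keywords(labeled_docs, top_k=2):
    """Pick, for each label, the top_k words most concentrated in that label.

    labeled_docs: list of (tokens, label) pairs from the SOURCE domain.
    Hypothetical score: class_count**2 / total_count, which favours words
    that are frequent in the class and rare in other classes.
    """
    per_label = defaultdict(Counter)
    total = Counter()
    for tokens, label in labeled_docs:
        per_label[label].update(tokens)
        total.update(tokens)
    keywords = {}
    for label, counts in per_label.items():
        # Sort by descending score, then alphabetically for determinism.
        ranked = sorted(counts, key=lambda w: (-counts[w] ** 2 / total[w], w))
        keywords[label] = set(ranked[:top_k])
    return keywords


def cooccurrence(docs):
    """Count how many documents each unordered word pair appears in together."""
    pairs = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs


def expand_keywords(keywords, target_docs, min_count=2):
    """Add untagged TARGET-domain words that co-occur with a seed keyword
    in at least min_count documents (one hop), bridging the vocabularies."""
    pairs = cooccurrence(target_docs)
    expanded = {}
    for label, seeds in keywords.items():
        extra = set()
        for (a, b), n in pairs.items():
            if n < min_count:
                continue
            if a in seeds:
                extra.add(b)
            if b in seeds:
                extra.add(a)
        expanded[label] = seeds | extra
    return expanded


def classify(tokens, keywords):
    """Return the label whose keyword set covers the most tokens, or None if
    no keyword matches. Ties go to the alphabetically first label."""
    scores = {label: sum(t in kws for t in tokens) for label, kws in keywords.items()}
    best = max(sorted(scores), key=lambda label: scores[label])
    return best if scores[best] > 0 else None


# Toy data (invented for illustration): tagged source domain, untagged target domain.
source = [
    (["match", "team", "goal", "coach"], "sports"),
    (["team", "goal", "league"], "sports"),
    (["stock", "market", "price", "bank"], "finance"),
    (["market", "price", "investor"], "finance"),
]
target = [
    ["team", "striker", "goal"],
    ["striker", "team", "transfer"],
    ["market", "crypto", "price"],
    ["crypto", "price", "wallet"],
]

seeds = build_keywords(source)
kb = expand_keywords(seeds, target)
print(classify(["striker", "transfer", "rumour"], seeds))  # None: no source keyword matches
print(classify(["striker", "transfer", "rumour"], kb))     # sports: "striker" was learned from the target domain
```

In this toy run, the source-only keywords cannot label a target document that uses only target-domain vocabulary. After co-occurrence expansion, "striker" is linked to "team", so the document is assigned to the sports category without any target-domain tagging.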


Keywords: Transfer learning, Text categorization





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Dequan Zheng (1)
  • Chenghe Zhang (1)
  • Geli Fei (1)
  • Tiejun Zhao (1)
  1. MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China
