A More Accurate Text Classifier for Positive and Unlabeled Data

  • Rur Ming Xin
  • Wan li Zuo
Conference paper


Almost all LPU (learning from positive and unlabeled examples) algorithms rely heavily on two steps: extracting a reliable negative dataset and supplementing the positive dataset. For these two steps, this paper proposes a novel two-step approach, CoTrain-Active. The first step employs a co-training algorithm that iteratively purifies the unlabeled set with two individual SVM base classifiers. The second step adopts an active-learning algorithm that further expands the positive set by requesting the true labels of the "suspect positive" examples. Comprehensive experiments demonstrate that our approach is superior to Biased-SVM, which is reported to be the previous best. Moreover, CoTrain-Active is especially suitable for situations where the given positive dataset P is extremely small.
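The full procedure (feature views, thresholds, stopping criterion) is given in the paper; the sketch below only illustrates the two steps as the abstract describes them. It assumes scikit-learn's LinearSVC as the SVM base classifier, a simple column split to form the two co-training views, and a callable `oracle` standing in for the annotator who supplies true labels; all of these are hypothetical choices, not the authors' settings.

```python
# Minimal sketch of the CoTrain-Active two-step idea from the abstract.
# Assumptions (not from the paper): LinearSVC as the SVM base classifier,
# a column split to create the two co-training views, illustrative
# thresholds, and a callable `oracle` standing in for the human annotator.
import numpy as np
from sklearn.svm import LinearSVC

def cotrain_purify(P, U, split, rounds=5, thresh=-1.0):
    """Step 1: two SVMs, each trained on one feature view, iteratively
    move examples that BOTH score as confident negatives out of the
    unlabeled pool U into a reliable-negative set RN."""
    RN = np.empty((0, U.shape[1]))
    for _ in range(rounds):
        if len(U) == 0:
            break
        neg = RN if len(RN) else U                # round 1: treat U as noisy negatives
        X = np.vstack([P, neg])
        y = np.r_[np.ones(len(P)), np.zeros(len(neg))]
        clf1 = LinearSVC().fit(X[:, :split], y)   # base classifier on view 1
        clf2 = LinearSVC().fit(X[:, split:], y)   # base classifier on view 2
        s1 = clf1.decision_function(U[:, :split])
        s2 = clf2.decision_function(U[:, split:])
        sure_neg = (s1 < thresh) & (s2 < thresh)  # agreement => reliable negative
        RN = np.vstack([RN, U[sure_neg]])
        U = U[~sure_neg]
    return RN, U

def active_expand(P, U, RN, oracle, band=(0.0, 1.0)):
    """Step 2: train on P vs. RN, then request the true label of the
    'suspect positive' examples -- unlabeled points whose decision score
    lands in an uncertain band near the positive side of the boundary."""
    X = np.vstack([P, RN])
    y = np.r_[np.ones(len(P)), np.zeros(len(RN))]
    clf = LinearSVC().fit(X, y)
    scores = clf.decision_function(U)
    suspects = (scores >= band[0]) & (scores <= band[1])
    for x in U[suspects]:
        if oracle(x) == 1:                        # annotator supplies the true label
            P = np.vstack([P, x])
    return P, clf
```

Because only the examples both views agree on are moved to RN, the purification step is conservative, which matters most when P is very small and a single classifier's decision boundary would be unreliable.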


Keywords: Active Learning · Unlabeled Data · Positive Data · Negative Data · True Label





Copyright information

© Springer-Verlag/Wien 2005

Authors and Affiliations

  • Rur Ming Xin (1)
  • Wan li Zuo (1)

  1. Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science, Jilin University, China
