Knowledge Supervised Text Classification with No Labeled Documents

  • Congle Zhang
  • Gui-Rong Xue
  • Yong Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5351)


In traditional text classification approaches, the semantic meanings of the classes are described by labeled documents. Since labeling documents is often time-consuming and expensive, a promising alternative is to ask users to provide a few keywords that describe each class, instead of labeling any documents. However, a short list of keywords may not contain enough information and may therefore lead to an unreliable classifier. Fortunately, a large amount of public data is easily available in web directories and similar resources, such as ODP and Wikipedia. We are interested in exploiting the crowd intelligence contained in such public data to enhance text classification. In this paper, we propose a novel text classification framework called "Knowledge Supervised Learning" (KSL), which uses the knowledge in the keywords together with this crowd intelligence to learn a classifier without any labeled documents. We design a two-stage risk minimization (TSRM) approach for the KSL problem, which optimizes the expected prediction risk and builds a high-quality classifier. Empirical results support this claim: our algorithm achieves an average Micro-F1 above 0.9, which is much better than the baselines and even comparable to an SVM classifier trained on labeled documents.
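The Micro-F1 score reported above pools true positives, false positives, and false negatives over all classes before computing precision and recall. The following is a minimal sketch of that standard definition for single-label predictions; the function name `micro_f1` and the example labels are illustrative assumptions and are not taken from the paper, whose exact evaluation setup is not described in this abstract.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label classification.

    Counts are pooled over every class before precision and recall
    are computed (standard definition, not the paper's own code).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly assigned to this class
            elif pred == label and truth != label:
                fp += 1  # wrongly assigned to this class
            elif pred != label and truth == label:
                fn += 1  # a member of this class that was missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only.
y_true = ["sports", "sports", "politics", "tech"]
y_pred = ["sports", "politics", "politics", "tech"]
score = micro_f1(y_true, y_pred)
```

When every document receives exactly one label, each misclassification counts once as a false positive and once as a false negative, so Micro-F1 coincides with accuracy (three of four documents correct in the example).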







Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Congle Zhang
    • 1
    • 2
  • Gui-Rong Xue
    • 1
    • 2
  • Yong Yu
  1. Apex Lab, Shanghai Jiaotong University, Shanghai, China
  2. State Key Lab of CAD & CG, Zhejiang University, Hangzhou, China
