Knowledge and Information Systems, Volume 28, Issue 1, pp 79–98

Single pass text classification by direct feature weighting

  • Hassan H. Malik
  • Dmitriy Fradkin
  • Fabian Moerchen
Regular Paper


The Feature Weighting Classifier (FWC) is an efficient multi-class classification algorithm for text data that uses Information Gain to directly estimate per-class feature weights in the classifier. The classifier requires only a single pass over the dataset to compute the per-class feature frequencies, is easy to implement, and has memory usage that is linear in the number of features. Experiments on 128 binary and multi-class text and web datasets show that FWC’s performance is at least comparable to, and often better than, that of Naive Bayes, TWCNB, Winnow, Balanced Winnow, and linear SVM. On a large-scale web dataset with 12,294 classes and 135,973 training instances, FWC trained in 13 s and yielded classification performance comparable to that of a state-of-the-art multi-class SVM implementation, which took over 15 min to train.
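The general idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the one-vs-rest Information Gain, the sign rule, and the additive scoring used here are assumptions, and the names `train` and `predict` are hypothetical. This simple version also keeps sparse per-class counts for every term, so it does not reproduce the memory bound stated in the abstract.

```python
import math
from collections import defaultdict


def entropy(pos, neg):
    """Binary entropy in bits of a pos/neg split; 0 for an empty or pure split."""
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def train(docs):
    """docs: iterable of (terms, label) pairs. Counts are gathered in one pass."""
    class_docs = defaultdict(int)                        # label -> number of documents
    term_class = defaultdict(lambda: defaultdict(int))   # term -> label -> document count
    n = 0
    for terms, label in docs:                            # the single pass over the data
        n += 1
        class_docs[label] += 1
        for t in set(terms):
            term_class[t][label] += 1

    # Assumed weighting: one-vs-rest Information Gain, signed by the direction
    # of the association between the term and the class.
    weights = defaultdict(dict)                          # label -> term -> signed weight
    for t, per_class in term_class.items():
        df = sum(per_class.values())                     # documents containing t
        for c, n_c in class_docs.items():
            in_with = per_class.get(c, 0)                # in class c, contains t
            out_with = df - in_with                      # outside c, contains t
            in_without = n_c - in_with                   # in class c, lacks t
            out_without = (n - n_c) - out_with           # outside c, lacks t
            ig = (entropy(n_c, n - n_c)
                  - (df / n) * entropy(in_with, out_with)
                  - ((n - df) / n) * entropy(in_without, out_without))
            if ig > 1e-12:
                # Positive when t is over-represented in c, negative otherwise.
                positive = in_with * (n - n_c) > out_with * n_c
                weights[c][t] = ig if positive else -ig
    return weights


def predict(weights, terms):
    """Return the class with the largest sum of weights over the document's terms."""
    return max(weights, key=lambda c: sum(weights[c].get(t, 0.0) for t in terms))


toy_docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"vote", "election"}, "politics"),
    ({"vote", "party"}, "politics"),
]
toy_weights = train(toy_docs)
print(predict(toy_weights, {"goal", "team"}))
```

In this toy corpus the term "goal" appears in every "sport" document and in no other document, so under the assumed rule it receives the maximum positive weight for "sport" and an equally large negative weight for "politics".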


Keywords: Text classification, Feature weighting, Linear classifiers, Information gain, Scalable learning





Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  • Hassan H. Malik (1)
  • Dmitriy Fradkin (2)
  • Fabian Moerchen (2)
  1. Thomson Reuters, New York, USA
  2. Integrated Data Systems, Siemens Corporate Research, Princeton, USA
