Fast and accurate text classification via multiple linear discriminant projections

  • Original Paper
The VLDB Journal

Abstract.

Support vector machines (SVMs) have shown superb performance for text classification tasks. They are accurate, robust, and quick to apply to test instances. Their only potential drawback is their training time and memory requirement. For n training instances held in memory, the best-known SVM implementations take time proportional to $n^a$, where $a$ is typically between 1.8 and 2.1. SVMs have been trained on data sets with several thousand instances, but Web directories today contain millions of instances that are valuable for mapping billions of Web pages into Yahoo!-like directories. We present SIMPL, a nearly linear-time classification algorithm that mimics the strengths of SVMs while avoiding the training bottleneck. It uses Fisher's linear discriminant, a classical tool from statistical pattern recognition, to project training instances to a carefully selected low-dimensional subspace before inducing a decision tree on the projected instances. SIMPL uses efficient sequential scans and sorts and is comparable in speed and memory scalability to widely used naive Bayes (NB) classifiers, but it beats NB accuracy decisively. It not only approaches and sometimes exceeds SVM accuracy, but also beats the running time of a popular SVM implementation by orders of magnitude. While describing SIMPL, we make a detailed experimental comparison of SVM-generated discriminants with Fisher's discriminants, and we also report on an analysis of the cache performance of a popular SVM implementation. Our analysis shows that SIMPL has the potential to be the method of choice for practitioners who want the accuracy of SVMs and the simplicity and speed of naive Bayes classifiers.
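
To make the projection idea in the abstract concrete, the following is a minimal sketch, not the SIMPL algorithm itself. It assumes two classes and dense toy vectors, computes a single Fisher discriminant direction, and replaces the decision tree with one threshold on the projected values (in effect a depth-one tree). All function names, the `ridge` regularizer, and the toy data are illustrative assumptions and do not come from the paper.

```python
# Sketch only: one Fisher discriminant plus a single threshold split.
# SIMPL as described uses a selected low-dimensional subspace and a full
# decision tree; names and data here are hypothetical.

def mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter(rows, mu):
    """Within-class scatter matrix: sum of outer products of (x - mu)."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fisher_direction(pos, neg, ridge=1e-6):
    """w = (S_pos + S_neg + ridge*I)^{-1} (mu_pos - mu_neg)."""
    mu_p, mu_n = mean(pos), mean(neg)
    sp, sn = scatter(pos, mu_p), scatter(neg, mu_n)
    d = len(mu_p)
    sw = [[sp[i][j] + sn[i][j] + (ridge if i == j else 0.0)
           for j in range(d)] for i in range(d)]
    return solve(sw, [p - q for p, q in zip(mu_p, mu_n)])

def project(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_threshold(w, pos, neg):
    """Split point midway between the projected class means."""
    zp = sum(project(w, x) for x in pos) / len(pos)
    zn = sum(project(w, x) for x in neg) / len(neg)
    return (zp + zn) / 2.0

def predict(w, t, x):
    return 1 if project(w, x) > t else -1

# Toy data (hypothetical): two well-separated clusters in two dimensions.
pos = [[2.0, 2.1], [2.5, 1.9], [3.0, 2.6], [2.2, 2.8]]
neg = [[0.0, 0.2], [0.5, -0.1], [-0.3, 0.4], [0.1, 0.9]]
w = fisher_direction(pos, neg)
t = fit_threshold(w, pos, neg)
```

After fitting, `predict(w, t, x)` labels a new vector by projecting it onto `w` and comparing the result with `t`. The dense scatter matrix costs time quadratic in the number of features, so this sketch says nothing about how the paper achieves near-linear training time on sparse text data.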

Author information

Corresponding author

Correspondence to Soumen Chakrabarti.

Additional information

Received: 9 September 2002, Accepted: 3 March 2003, Published online: 21 July 2003

Edited by Y. Ioannidis

About this article

Cite this article

Chakrabarti, S., Roy, S. & Soundalgekar, M.V. Fast and accurate text classification via multiple linear discriminant projections. The VLDB Journal 12, 170–185 (2003). https://doi.org/10.1007/s00778-003-0098-9

  • DOI: https://doi.org/10.1007/s00778-003-0098-9
