Fast and accurate text classification via multiple linear discriminant projections

Chakrabarti, Soumen; Roy, Shourya; Soundalgekar, Mahesh V.

doi:10.1007/s00778-003-0098-9

Original Paper
Published: August 2003

The VLDB Journal, volume 12, pages 170–185 (2003)

Soumen Chakrabarti¹,
Shourya Roy¹ &
Mahesh V. Soundalgekar¹

Abstract.

Support vector machines (SVMs) have shown superb performance for text classification tasks. They are accurate, robust, and quick to apply to test instances. Their only potential drawback is their training time and memory requirement. For n training instances held in memory, the best-known SVM implementations take time proportional to n^a, where a is typically between 1.8 and 2.1. SVMs have been trained on data sets with several thousand instances, but Web directories today contain millions of instances that are valuable for mapping billions of Web pages into Yahoo!-like directories. We present SIMPL, a nearly linear-time classification algorithm that mimics the strengths of SVMs while avoiding the training bottleneck. It uses Fisher's linear discriminant, a classical tool from statistical pattern recognition, to project training instances to a carefully selected low-dimensional subspace before inducing a decision tree on the projected instances. SIMPL uses efficient sequential scans and sorts and is comparable in speed and memory scalability to widely used naive Bayes (NB) classifiers, but it beats NB accuracy decisively. It not only approaches and sometimes exceeds SVM accuracy, but also beats the running time of a popular SVM implementation by orders of magnitude. While describing SIMPL, we make a detailed experimental comparison of SVM-generated discriminants with Fisher's discriminants, and we also report on an analysis of the cache performance of a popular SVM implementation. Our analysis shows that SIMPL has the potential to be the method of choice for practitioners who want the accuracy of SVMs and the simplicity and speed of naive Bayes classifiers.
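
As a rough illustration of the project-then-split idea described in the abstract, the following Python sketch computes a single two-class Fisher discriminant in its textbook closed form and then picks one threshold on the projected values, which amounts to a depth-1 decision tree. It is not SIMPL itself: the abstract describes multiple projections computed with sequential scans and sorts, and those details are not reproduced here. The function names, the ridge term, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fisher_direction(X, y, ridge=1e-3):
    """Textbook two-class Fisher discriminant: w proportional to Sw^{-1} (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, summed over both classes.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Ridge term (an assumption, not from the paper) keeps Sw invertible.
    Sw = Sw + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)


def best_threshold(z, y):
    """Depth-1 tree on a 1-D projection: threshold with the fewest training errors.

    Predicts class 1 when z > threshold; candidates are midpoints between
    consecutive sorted projections.
    """
    order = np.argsort(z)
    zs, ys = z[order], y[order]
    n = len(zs)
    ones_left = np.cumsum(ys)[:-1]            # class-1 points left of each split
    left_size = np.arange(1, n)
    zeros_right = (n - ys.sum()) - (left_size - ones_left)
    errors = ones_left + zeros_right
    i = int(np.argmin(errors))
    return 0.5 * (zs[i] + zs[i + 1])


# Synthetic stand-in for document vectors (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 5)),
    rng.normal(0.0, 1.0, size=(200, 5)) + np.array([3.0, 3.0, 0.0, 0.0, 0.0]),
])
y = np.array([0] * 200 + [1] * 200)

w = fisher_direction(X, y)
z = X @ w                                     # project onto one dimension
t = best_threshold(z, y)
acc = float(np.mean((z > t).astype(int) == y))
print(f"training accuracy of projected stump: {acc:.3f}")
```

The dense solve is cubic in the number of features, so this closed form only suits small feature spaces; it is meant to show the geometry of the projection, not the scalability the abstract claims for SIMPL.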

Author information

Authors and Affiliations

IIT Bombay
Soumen Chakrabarti, Shourya Roy & Mahesh V. Soundalgekar

Corresponding author

Correspondence to Soumen Chakrabarti.

Additional information

Received: 9 September 2002, Accepted: 3 March 2003, Published online: 21 July 2003

Edited by Y. Ioannidis

About this article

Cite this article

Chakrabarti, S., Roy, S. & Soundalgekar, M.V. Fast and accurate text classification via multiple linear discriminant projections. The VLDB Journal 12, 170–185 (2003). https://doi.org/10.1007/s00778-003-0098-9

Issue Date: August 2003
DOI: https://doi.org/10.1007/s00778-003-0098-9
