Information Retrieval, Volume 4, Issue 1, pp 5–31

Text Categorization Based on Regularized Linear Classification Methods

  • Tong Zhang
  • Frank J. Oles


A number of linear classification methods, such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVMs), have been applied to text categorization problems. These methods are similar in that they find hyperplanes that approximately separate a class of document vectors from its complement. However, support vector machines have so far been considered special because they have been shown to achieve state-of-the-art performance. It is therefore worthwhile to understand whether such good performance is unique to the SVM design or whether it can also be achieved by other linear classification methods. In this paper, we compare a number of known linear classification methods, as well as some variants, in the framework of regularized linear systems. We discuss the statistical and numerical properties of these algorithms, with a focus on text categorization. We also provide numerical experiments to illustrate these algorithms on a number of datasets.

Keywords: text categorization, linear classification, regularization
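To make the shared structure of these methods concrete, the following is a minimal sketch, not the authors' algorithms or experiments. It assumes each method can be written as minimizing an average loss of the margin y(w · x) plus an L2 penalty, and it fits all three with plain (sub)gradient descent. The toy term-count vectors, the function names, and the parameters `lam`, `lr`, and `epochs` are all hypothetical choices made for illustration.

```python
import math

# Derivative of each loss with respect to the margin z = y * (w . x), y in {-1, +1}.
# least_squares: (1 - z)^2, logistic: ln(1 + exp(-z)), hinge: max(0, 1 - z).
LOSS_GRADIENTS = {
    "least_squares": lambda z: -2.0 * (1.0 - z),
    "logistic": lambda z: -1.0 / (1.0 + math.exp(z)),
    "hinge": lambda z: -1.0 if z < 1.0 else 0.0,  # a subgradient at the kink
}


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train(X, y, loss, lam=0.01, lr=0.1, epochs=500):
    """Minimize (1/n) * sum_i loss(y_i * w.x_i) + lam * ||w||^2 by gradient descent."""
    grad_fn = LOSS_GRADIENTS[loss]
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [2.0 * lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            g = grad_fn(yi * dot(w, xi)) * yi / n  # chain rule through the margin
            for j in range(d):
                grad[j] += g * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """The class is decided by which side of the hyperplane w.x = 0 the vector lies on."""
    return 1 if dot(w, x) >= 0 else -1


# Hypothetical toy data: counts of two terms plus a constant bias feature.
X = [[3.0, 0.0, 1.0], [2.0, 1.0, 1.0], [0.0, 3.0, 1.0], [1.0, 2.0, 1.0]]
y = [1, 1, -1, -1]

weights = {name: train(X, y, name) for name in LOSS_GRADIENTS}
for name, w in weights.items():
    print(name, [round(wj, 3) for wj in w], [predict(w, x) for x in X])
```

Only the loss derivative changes between the three variants; the regularization term and the hyperplane decision rule stay the same, which is the sense in which such methods can be compared within one regularized framework.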





Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Tong Zhang (1)
  • Frank J. Oles (1)
  1. Mathematical Sciences Department, IBM T.J. Watson Research Center
