Abstract
A number of linear classification methods such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVMs) have been applied to text categorization problems. These methods are similar in that they find hyperplanes that approximately separate a class of document vectors from its complement. However, support vector machines have so far been considered special because they have been shown to achieve state-of-the-art performance. It is therefore worthwhile to understand whether such good performance is unique to the SVM design, or whether it can also be achieved by other linear classification methods. In this paper, we compare a number of known linear classification methods, as well as some variants, in the framework of regularized linear systems. We discuss the statistical and numerical properties of these algorithms, with a focus on text categorization. We also provide numerical experiments that illustrate these algorithms on a number of datasets.
Cite this article
Zhang, T., Oles, F.J. Text Categorization Based on Regularized Linear Classification Methods. Information Retrieval 4, 5–31 (2001). https://doi.org/10.1023/A:1011441423217