Abstract
Previous studies have shown that the one-class SVM is a rather weak learning method for text categorization. This paper argues that the poor performance observed in those studies is largely because standard term weighting schemes are inadequate for one-class SVMs. We propose several modifications to the document representation and demonstrate empirically that, with the proposed representation, a one-class SVM trained on only a small portion of positive examples can reach up to 95% of the performance of a two-class SVM trained on the whole labeled dataset.
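For readers unfamiliar with the term, the "standard term weighting schemes" mentioned in the abstract usually mean binary, term-frequency, and tf-idf weights with length normalization. The sketch below illustrates only those standard schemes on a toy corpus. It is not the representation proposed in the paper, which the abstract does not specify; the function name `term_weights`, the toy documents, and the unsmoothed `log(N/df)` form of idf are assumptions made for illustration.

```python
import math
from collections import Counter


def term_weights(docs, scheme="tfidf"):
    """Return one L2-normalized {term: weight} dict per tokenized document.

    Hypothetical helper illustrating standard weighting schemes only.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        if scheme == "binary":
            vec = {t: 1.0 for t in tf}
        elif scheme == "tf":
            vec = {t: float(c) for t, c in tf.items()}
        elif scheme == "tfidf":
            # Assumed idf form: log(N / df), with no smoothing.
            vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        # Cosine (L2) normalization; skipped when every weight is zero.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors


# Toy corpus (made up for illustration).
toy_docs = [
    ["wheat", "price", "rise"],
    ["wheat", "export", "wheat"],
    ["oil", "price", "fall"],
]

for scheme in ("binary", "tf", "tfidf"):
    print(scheme, term_weights(toy_docs, scheme)[1])
```

Under tf-idf, the rarer term "export" outweighs the repeated but more common term "wheat" in the second document, while plain term frequency ranks them the other way round.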
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, X., Srihari, R., Zheng, Z. (2004). Document Representation for One-Class SVM. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Machine Learning: ECML 2004. ECML 2004. Lecture Notes in Computer Science, vol 3201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30115-8_45
DOI: https://doi.org/10.1007/978-3-540-30115-8_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23105-9
Online ISBN: 978-3-540-30115-8
eBook Packages: Springer Book Archive