Abstract
This paper presents empirical results showing that simple attribute scaling at the data preprocessing stage can improve the performance of linear binary classifiers. In particular, a class-specific scaling method that uses information about the class distribution of the training sample can significantly improve classification accuracy. This form of scaling can raise the accuracy of a simple centroid classifier to levels similar to those of the more complex and computationally expensive support vector machine (SVM) and regression classifiers. Further, when SVMs are used, scaled data produces better results than unscaled data, for smaller amounts of training data and with smaller values of the regularisation constant.
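To make the idea concrete, the following is a minimal sketch of how a class-aware scaling step could feed a centroid classifier. The abstract does not give the paper's scaling formula, so everything here is an assumption for illustration: the function names `class_balanced_scale`, `centroid_fit` and `centroid_predict` are hypothetical, labels are assumed to be +1/-1, and the scaling rule (centre on the midpoint of the two class means, divide by the square root of the averaged per-class variances) is only one plausible way of using class information so that an unbalanced class distribution does not dominate the statistics.

```python
import numpy as np

def class_balanced_scale(X_train, y_train, X_test):
    """Hypothetical class-aware scaling (not taken from the paper).

    Each attribute is centred on the midpoint of the two class means and
    divided by the square root of the averaged per-class variances, so both
    classes contribute equally regardless of how many samples each has.
    """
    pos = X_train[y_train == 1]
    neg = X_train[y_train == -1]
    centre = 0.5 * (pos.mean(axis=0) + neg.mean(axis=0))
    sd = np.sqrt(0.5 * (pos.var(axis=0) + neg.var(axis=0)))
    sd[sd == 0] = 1.0  # leave constant attributes unscaled
    return (X_train - centre) / sd, (X_test - centre) / sd

def centroid_fit(X, y):
    """Centroid classifier: the weight vector joins the two class means."""
    mean_pos = X[y == 1].mean(axis=0)
    mean_neg = X[y == -1].mean(axis=0)
    w = mean_pos - mean_neg
    # Bias places the decision boundary halfway between the class means.
    b = -0.5 * w @ (mean_pos + mean_neg)
    return w, b

def centroid_predict(X, w, b):
    """Return +1 or -1 according to the sign of the linear score."""
    return np.where(X @ w + b >= 0, 1, -1)

# Toy data: attribute 0 separates the classes, attribute 1 is large-scale noise.
X = np.array([[0., 0.], [1., 100.], [0., 200.],
              [10., 0.], [11., 100.], [10., 200.]])
y = np.array([-1, -1, -1, 1, 1, 1])

X_scaled, _ = class_balanced_scale(X, y, X)
w, b = centroid_fit(X_scaled, y)
predictions = centroid_predict(X_scaled, w, b)
```

With this particular centring, the scaled class means are symmetric about the origin, so the centroid classifier's bias term is zero up to rounding. Whether such a scheme reproduces the accuracy gains reported in the paper would have to be checked against the paper's own definition and data.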
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Edwards, C., Raskutti, B. (2004). The Effect of Attribute Scaling on the Performance of Support Vector Machines. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_44
DOI: https://doi.org/10.1007/978-3-540-30549-1_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24059-4
Online ISBN: 978-3-540-30549-1
eBook Packages: Computer Science, Computer Science (R0)