An Efficient Alternative to SVM Based Recursive Feature Elimination with Applications in Natural Language Processing and Bioinformatics

Bedo, Justin; Sanderson, Conrad; Kowalczyk, Adam

doi:10.1007/11941439_21

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Abstract

The SVM based Recursive Feature Elimination (RFE-SVM) algorithm is a popular technique for feature selection, used in natural language processing and bioinformatics. Recently it was demonstrated that a small regularisation constant \(C\) can considerably improve the performance of RFE-SVM on microarray datasets. In this paper we show that further improvements are possible if the explicitly computable limit \(C \to 0\) is used. We prove that in this limit most forms of SVM and ridge regression classifiers scaled by the factor \(\frac{1}{C}\) converge to a centroid classifier. As this classifier can be used directly for feature ranking, in the limit we can avoid the computationally demanding recursion and convex optimisation in RFE-SVM. Comparisons on two text based author verification tasks and on three genomic microarray classification tasks indicate that this straightforward method can, surprisingly, obtain comparable (at times superior) performance and is about an order of magnitude faster.
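The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes labels in {+1, -1}, takes the centroid classifier weight vector to be the difference of the two class means, and ranks features by the absolute value of each weight. The function names, the toy data, and the particular ridge objective (no bias term, \(\frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_i (y_i - w \cdot x_i)^2\)) are all assumptions chosen for illustration. The last part checks numerically that, for this objective, \(w/C\) approaches \(X^\top y\) as \(C\) becomes small, which for balanced classes is proportional to the centroid difference.

```python
import numpy as np

def centroid_weights(X, y):
    """Difference of class means (assumed form of the centroid classifier)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)

def rank_features(X, y, k):
    """Indices of the k features with the largest |weight|, best first."""
    w = centroid_weights(X, y)
    order = np.argsort(-np.abs(w), kind="stable")
    return order[:k]

# Toy data (hypothetical): 40 samples, 6 features, balanced classes.
rng = np.random.default_rng(0)
n, d = 40, 6
y = np.repeat([1, -1], n // 2)
X = rng.normal(size=(n, d))
X[:, 2] += 3.0 * y   # strongly informative feature
X[:, 4] += 1.5 * y   # weakly informative feature

# One pass, no recursion and no convex optimisation.
top = rank_features(X, y, k=2)
print("top features:", top)

# Ridge regression without bias: (I + C X^T X) w = C X^T y,
# so w / C = (I + C X^T X)^{-1} X^T y, which tends to X^T y as C -> 0.
C = 1e-10
w_scaled = np.linalg.solve(np.eye(d) + C * (X.T @ X), X.T @ y)
limit = X.T @ y
print("max deviation from limit:", np.max(np.abs(w_scaled - limit)))
```

With balanced classes, `X.T @ y` equals `(n / 2) * centroid_weights(X, y)`, so ranking by either vector gives the same feature order. For unbalanced classes or formulations with a bias term, the exact form of the limit depends on the result proved in the paper and is not reproduced here.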

Author information

Authors and Affiliations

Australian National University, 0200, ACT, Australia
Justin Bedo, Conrad Sanderson & Adam Kowalczyk
National ICT Australia (NICTA), Locked Bag 8001, ACT, 2601, Australia
Justin Bedo, Conrad Sanderson & Adam Kowalczyk
Dept. Electrical & Electronic Eng., University of Melbourne, VIC, 3010, Australia
Adam Kowalczyk

Editor information

Editors and Affiliations

DisPRR, National ICT Australia Ltd, QLD, Australia
Abdul Sattar
School of Computing, University of Tasmania, Sandy Bay, 7005, Tasmania, Australia
Byeong-ho Kang

Cite this paper

Bedo, J., Sanderson, C., Kowalczyk, A. (2006). An Efficient Alternative to SVM Based Recursive Feature Elimination with Applications in Natural Language Processing and Bioinformatics. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_21

DOI: https://doi.org/10.1007/11941439_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)
