Abstract
In this paper we present a new method for joint feature selection and classifier learning using a sparse Bayesian approach. Both tasks are performed by optimizing a global loss function that includes a term associated with the empirical loss and another that imposes a feature-selection and regularization constraint on the parameters. To minimize this function we use a recently proposed technique, the Boosted Lasso algorithm, which follows the regularization path of the empirical risk associated with our loss function. We develop the algorithm for a well-known non-parametric classification method, the relevance vector machine, and perform experiments on a synthetic data set and three databases from the UCI Machine Learning Repository. The results show that our method selects the relevant features and, in some cases, improves classification accuracy when feature selection is performed.
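For concreteness, the global loss described above has the generic penalized-risk form

\min_{\beta} \; \sum_{i=1}^{n} \ell\bigl(y_i, f(\mathbf{x}_i; \beta)\bigr) \; + \; \lambda \, \lVert \beta \rVert_1

where the first term is the empirical loss and the second a sparsity-inducing penalty; this is a sketch in standard Lasso notation, and the symbols \ell, f, \beta and \lambda (and the choice of an L1 penalty) are illustrative conventions rather than the paper's exact formulation. The Boosted Lasso of Zhao and Yu traces the regularization path of such an objective by alternating small forward steps, which adjust the coordinate of \beta that most reduces the empirical loss by a fixed step size \varepsilon, with backward steps that shrink a coordinate whenever doing so lowers the penalized loss, so that the full path is approximated without re-solving the penalized problem at each value of \lambda.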
References
Bellman R (1961) Adaptive control processes: a guided tour. Princeton University Press, New Jersey
Duda R, Hart P, Stork D (2001) Pattern classification, 2nd edn. Wiley
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Schapire RE, Freund Y, Bartlett PL, Lee WS (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5):1651–1686
Madigan D, Genkin A, Lewis DD, Fradkin D (2005) Bayesian multinomial logistic regression for author identification. In: AIP conference proceedings of the 25th international workshop on Bayesian inference and maximum entropy methods in science and engineering, vol 803, pp 509–516
Abe N, Kudo M, Toyama J, Shimbo M (2006) Classifier-independent feature selection on the basis of divergence criterion. Pattern Anal Appl 9(2–3):127–137
Zivkovic Z, van der Heijden F (2004) Improving the selection of feature points for tracking. Pattern Anal Appl 7(2):144–150
Jain A, Zongker D (1997) Feature selection: evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell 19(2):153–158
Fisher R (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Academic Press, Boston
Masip D, Kuncheva LI, Vitrià J (2005) An ensemble-based method for linear feature extraction for two-class problems. Pattern Anal Appl 8:227–237
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1–3):389–422
Mao KZ (2004) Feature subset selection for support vector machines through discriminative function pruning analysis. IEEE Trans Syst Man Cybern Part B 34(1):60–67
Chen S, Wang X, Hong X, Harris CJ (2006) Kernel classifier construction using orthogonal forward selection and boosting with Fisher ratio class separability measure. IEEE Trans Neural Netw 17(6):1652–1656
Hong X, Mitchell RJ (2007) Backward elimination model construction for regression and classification using leave-one-out criteria. Int J Syst Sci 38(2):101–113
Weston J, Mukherjee S, Chapelle O, Pontil M, Poggio T, Vapnik V (2000) Feature selection for SVMs. In: Leen TK, Dietterich TG, Tresp V (eds) NIPS. MIT Press, Cambridge, pp 668–674
Neal RM (1996) Bayesian learning for neural networks. Lecture notes in statistics, vol 118. Springer, Heidelberg
Seeger M (1999) Bayesian model selection for support vector machines, Gaussian processes and other kernel classifiers. In: Solla SA, Leen TK, Müller K-R (eds) NIPS. The MIT Press, Cambridge, pp 603–609
Zhu J, Rosset S, Hastie T, Tibshirani R (2004) 1-norm support vector machines. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems, vol 16. MIT Press, Cambridge
Jebara T, Jaakkola T (2000) Feature selection and dualities in maximum entropy discrimination. In: Proceedings of the 16th conference on uncertainty in artificial intelligence. Morgan Kaufmann, San Francisco, pp 291–300
Li K, Peng J, Bai E (2006) A two-stage algorithm for identification of nonlinear dynamic systems. Automatica 42(7):1189–1197
Krishnapuram B, Hartemink AJ, Carin L, Figueiredo MAT (2004) A Bayesian approach to joint feature selection and classifier design. IEEE Trans Pattern Anal Mach Intell 26(9):1105–1111
Tipping ME (2001) Sparse Bayesian learning and the relevance vector machine. J Mach Learn Res 1:211–244
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–499
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288
Zhao P, Yu B (2007) Stagewise lasso. J Mach Learn Res 8:2701–2726
Osborne M, Presnell B, Turlach B (2000) A new approach to variable selection in least squares problems. IMA J Numer Anal 20(3):389–403
Osborne M, Presnell B, Turlach B (2000) On the lasso and its dual. J Comput Graph Stat 9(2):319–337
Vert J-P, Foveau N, Lajaunie C, Vandenbrouck Y (2006) An accurate and interpretable model for siRNA efficacy prediction. BMC Bioinf 7:520–537
Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B 67(1):91–108
Ghosh D, Chinnaiyan A (2005) Classification and selection of biomarkers in genomic data using lasso. J Biomed Biotechnol 2005(2):147–154
Gao J, Suzuki H, Yu B (2006) Approximation lasso methods for language modeling. In: ACL ’06: proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the ACL, Association for Computational Linguistics. Morristown, NJ, pp 225–232
Obozinski G, Taskar B, Jordan M (2006) Multi-task feature selection. Tech rep, Department of Statistics, UC Berkeley
Igual L, Seguí S, Vitrià J, Azpiroz F, Radeva P (2007) Sparse bayesian feature selection applied to intestinal motility analysis. In: XVI Congreso Argentino de Bioingeniería, pp 467–470
Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–374
Blake C, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
Pranckeviciene E, Ho T, Somorjai RL (2006) Class separability in spaces reduced by feature selection. In: ICPR (3). IEEE Computer Society, pp 254–257
Pudil P, Novovicova J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15(11):1119–1125
Acknowledgments
This work was partially supported by MEC grant TIC2006-15308-C02-01 and CONSOLIDER-INGENIO 2010 (CSD2007-00018).
Cite this article
Lapedriza, À., Seguí, S., Masip, D. et al. A sparse Bayesian approach for joint feature selection and classifier learning. Pattern Anal Applic 11, 299–308 (2008). https://doi.org/10.1007/s10044-008-0130-1