A sparse Bayesian approach for joint feature selection and classifier learning

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

In this paper we present a new method for joint feature selection and classifier learning using a sparse Bayesian approach. These tasks are performed by optimizing a global loss function that includes a term associated with the empirical loss and another term representing a feature selection and regularization constraint on the parameters. To minimize this function we use a recently proposed technique, the Boosted Lasso algorithm, which follows the regularization path of the empirical risk associated with our loss function. We develop the algorithm for a well-known non-parametric classification method, the relevance vector machine, and perform experiments using a synthetic data set and three databases from the UCI Machine Learning Repository. The results show that our method is able to select the relevant features, in some cases increasing the classification accuracy when feature selection is performed.
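
The global loss function described above can be read, in a generic penalized form, as an empirical loss plus a sparsity-inducing L1 term, e.g. L(β) = Σ_i ℓ(y_i, f(x_i; β)) + λ‖β‖₁, which is the kind of objective the Boosted Lasso (BLasso) traverses along its regularization path. The sketch below shows what such a forward/backward BLasso loop could look like; it uses a plain logistic loss and a linear model purely for illustration, and the function name, step size, and stopping rule are assumptions rather than the paper's relevance-vector-machine formulation.

```python
import numpy as np

def blasso_sketch(X, y, step=0.05, n_iter=300, tol=1e-6):
    """Toy Boosted-Lasso-style forward/backward loop on an L1-penalized
    logistic loss. The loss, step size, and stopping rule are illustrative
    assumptions, not the paper's RVM-based formulation."""
    n, d = X.shape
    beta = np.zeros(d)

    def emp_loss(b):
        # Empirical (cross-entropy) term of the global loss function.
        p = np.clip(1.0 / (1.0 + np.exp(-(X @ b))), 1e-12, 1 - 1e-12)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def best_forward(b):
        # Try a +/- step on every coordinate and keep the best candidate.
        best = (np.inf, 0, 0.0)
        for j in range(d):
            for s in (step, -step):
                b2 = b.copy()
                b2[j] += s
                best = min(best, (emp_loss(b2), j, s))
        return best

    # The first forward step also fixes the working regularization level,
    # which BLasso then decreases while following the regularization path.
    loss0 = emp_loss(beta)
    loss_fwd, j, s = best_forward(beta)
    beta[j] += s
    lam = max((loss0 - loss_fwd) / step, 0.0)

    for _ in range(n_iter):
        def penalized(b):
            return emp_loss(b) + lam * np.abs(b).sum()
        # Backward step: partially undo an active coefficient if doing so
        # lowers the penalized loss by more than the tolerance.
        back = (np.inf, -1)
        for j in np.flatnonzero(beta):
            b2 = beta.copy()
            b2[j] -= np.sign(beta[j]) * min(step, abs(beta[j]))
            back = min(back, (penalized(b2), j))
        if back[0] < penalized(beta) - tol:
            j = back[1]
            beta[j] -= np.sign(beta[j]) * min(step, abs(beta[j]))
        else:
            # Otherwise take a forward step and relax lambda if needed.
            loss_prev = emp_loss(beta)
            loss_fwd, j, s = best_forward(beta)
            beta[j] += s
            lam = min(lam, max((loss_prev - loss_fwd) / step, 0.0))
    return beta  # coefficients still at zero mark discarded features
```

In a setting like the paper's synthetic experiment (a few informative features plus noise dimensions), one would expect a loop of this kind to leave the coefficients of the irrelevant dimensions at or near zero, which is how the sparsity term performs feature selection.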



Acknowledgments

This work was partially supported by MEC grant TIC2006-15308-C02-01 and CONSOLIDER-INGENIO 2010 (CSD2007-00018).

Author information

Corresponding author

Correspondence to Àgata Lapedriza.


About this article

Cite this article

Lapedriza, À., Seguí, S., Masip, D. et al. A sparse Bayesian approach for joint feature selection and classifier learning. Pattern Anal Applic 11, 299–308 (2008). https://doi.org/10.1007/s10044-008-0130-1

