Abstract
Motivated by examples in spectroscopy, we study variable selection for discrimination in problems with very many predictor variables. Assuming multivariate normal distributions with common variance for the predictor variables within groups, we develop a Bayesian decision theory approach that balances costs for variables against a loss due to classification errors. The approach is computationally intensive, requiring a simulation to approximate the intractable expected loss and a search, using simulated annealing, over a large space of possible subsets of variables. It is illustrated by application to a spectroscopic example with 3 groups, 100 variables, and 71 training cases, where the approach finds subsets of between 5 and 14 variables whose discriminatory power is comparable with that of linear discriminant analysis using principal components derived from the full 100 variables. We study both the evaluation of expected loss and the tuning of the simulated annealing for the example, and conclude that computational effort should be concentrated on the search.
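The search component described above can be illustrated with a minimal sketch of simulated annealing over subsets of variables. This is not the authors' implementation. The function `anneal_subset`, the single-variable flip proposal, the geometric cooling schedule, and the stand-in `toy_loss` (with its hypothetical set `INFORMATIVE`) are all assumptions made for illustration. In the paper the expected loss is intractable and approximated by simulation, which is not reproduced here.

```python
import math
import random


def anneal_subset(n_vars, loss, cost_per_var, n_iter=5000,
                  t0=1.0, cooling=0.999, seed=0):
    """Minimise cost_per_var * |subset| + loss(subset) by simulated annealing.

    Assumed design: start from the empty subset, propose flipping one
    randomly chosen variable in or out, and cool the temperature geometrically.
    """
    rng = random.Random(seed)

    def objective(subset):
        # Total cost: a price per selected variable plus the classification loss.
        return cost_per_var * len(subset) + loss(subset)

    current = set()
    cur_val = objective(current)
    best, best_val = set(current), cur_val
    temp = t0

    for _ in range(n_iter):
        j = rng.randrange(n_vars)
        proposal = current ^ {j}  # add j if absent, remove it if present
        prop_val = objective(proposal)
        delta = prop_val - cur_val
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_val = proposal, prop_val
            if cur_val < best_val:
                best, best_val = set(current), cur_val
        temp *= cooling

    return best, best_val


# Hypothetical stand-in for the expected loss: each "informative"
# variable left out of the subset adds one unit of loss.
INFORMATIVE = {3, 17, 42}


def toy_loss(subset):
    return float(len(INFORMATIVE - subset))


best, value = anneal_subset(100, toy_loss, cost_per_var=0.1)
print(sorted(best), value)
```

In this toy setting the cost per variable plays the role of the variable costs in the abstract: raising `cost_per_var` pushes the search toward smaller subsets, and lowering it tolerates larger ones.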
Fearn, T., Brown, P.J. & Besbeas, P. A Bayesian decision theory approach to variable selection for discrimination. Statistics and Computing 12, 253–260 (2002). https://doi.org/10.1023/A:1020702927247