Summary
For many data approximation problems in metrology, there are a number of competing models which can potentially fit the observed data. A crucial task is to quantify the extent to which one model performs better than others, taking into account the influence of random effects associated with the data. For example, for a given data set, we can use a series of polynomials of various degrees to fit the data using a least squares criterion. The residual sum of squares is a measure of how well the model fits the data. However, it is generally necessary to balance goodness of fit against model complexity. We consider a number of criteria that aim to do this: the Akaike information criterion (AIC), the Bayesian/Schwarz information criterion (BIC), and the AIC with a correction for small sample sizes (AICc). In this paper, we compare the performance of these criteria for polynomial regression and show that, for the examples tested, the AICc criterion performs best. A second element of model selection is to determine, from a set of feature vectors, the subset that defines a model space most suitable for describing the observed response. Since there are 2^N possible model spaces defined by N feature vectors, for even a modest number of feature vectors it is necessary to reduce or prioritise the set of candidate models. The partial least squares (PLS) and least angle regression (LARS) algorithms can be used as model reduction tools. We describe these algorithms in the context of feature selection, show how they can be used with a model selection criterion such as AICc, and illustrate their performance using simulations and an application from human sensory perception.
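To make the selection procedure concrete, the sketch below fits polynomials of increasing degree by least squares and scores each fit with AIC, BIC and AICc. It is a minimal illustration under stated assumptions, not the authors' code: it assumes independent Gaussian errors (so minus twice the log-likelihood reduces to n log(RSS/n) up to an additive constant), and the function names and the synthetic test curve are ours.

import numpy as np

def polynomial_rss(x, y, degree):
    # Least-squares fit of a degree-d polynomial via a Vandermonde design matrix
    X = np.vander(x, degree + 1)
    coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ coeffs) ** 2)

def criteria(rss, n, k):
    # AIC, BIC and AICc for n data points and k estimated parameters,
    # assuming i.i.d. Gaussian errors with maximum-likelihood variance
    ll = n * np.log(rss / n)                     # -2 log-likelihood, up to a constant
    aic = ll + 2 * k
    bic = ll + k * np.log(n)
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
    return aic, bic, aicc

# Synthetic test data: a cubic plus noise (an illustrative choice, not from the paper)
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 25)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.05, size=x.size)

for degree in range(1, 8):
    k = degree + 2                               # coefficients plus the noise variance
    rss = polynomial_rss(x, y, degree)
    aic, bic, aicc = criteria(rss, x.size, k)
    print(f"degree {degree}: AIC={aic:7.2f}  BIC={bic:7.2f}  AICc={aicc:7.2f}")
# Select the degree minimising AICc rather than RSS, since RSS always
# decreases as the degree (and hence the model complexity) grows.

The same scoring idea extends to feature selection: instead of enumerating all 2^N subsets, one can score the nested sequence of candidate models produced by LARS or PLS with a criterion such as AICc and retain the minimiser.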
References
H. Akaike: A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 1974, 716–723.
A. Bialek, A.B. Forbes, T. Goodman, R. Montgomery, M. Rides, G. van der Heijden, H. van der Voet, G. Polder, and K. Overvliet: Model development to predict perceived degree of naturalness. In: IMEKO XIX World Congress, 6–11 September 2009, Lisbon, 2009.
K.P. Burnham and D.R. Anderson: Model Selection and Multimodel Inferences: A Practical Information-Theoretic Approach. 2nd edition, Springer, New York, 2002.
H. Chipman, E.I. George, and R.E. McCulloch: The practical implementation of Bayesian model selection. In: IMS Lecture Notes – Monograph Series, Vol. 38, P. Lahiri (ed.), Institute of Mathematical Statistics, Beachwood, Ohio, 2001.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani: Least angle regression. Annals of Statistics 32(2), 2004, 407–499.
G.H. Golub and C.F. Van Loan: Matrix Computations. Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
T.G. Goodman, R. Montgomery, A. Bialek, A. Forbes, M. Rides, A. Whitaker, K. Overvliet, F. McGlone, and G. van der Heijden: The measurement of naturalness (MONAT). In: Man, Science & Measurement. Proceedings of the 12th IMEKO TC1–TC7 joint symposium, Annecy, France, September 3–5, 2008.
R. Hoffman, V.I. Minkin, and B.K. Carpenter: Ockham’s razor and chemistry. International Journal for Philosophy of Chemistry 3, 1997, 3–28.
C.M. Hurvich and C. Tsai: Regression and time series model selection in small samples. Biometrika 76, 1989, 297–307.
K. Knight and W. Fu: Asymptotics for LASSO-type estimators. Annals of Statistics 28, 2000, 1356–1378.
H. Linhart and W. Zucchini: Model Selection. Wiley, New York, 1986.
D. Madigan and A.E. Raftery: Model selection and accounting for model uncertainty in graphical models using Occam’s window. Journal of the American Statistical Association 89, 1994, 1535–1546.
R. Manne: Analysis of two partial-least-squares algorithms for multivariate calibration. Chemometrics and Intelligent Laboratory Systems 2(1-3), August 1987, 187–198.
A.J. Miller: Subset Selection in Regression. Chapman & Hall, New York, 1990.
A.E. Raftery, D. Madigan, and J.A. Hoeting: Bayesian model averaging for linear regression. Journal of the American Statistical Association 92, 1997, 179–191.
G. Schwarz: Estimating the dimension of a model. Annals of Statistics 6, 1978, 461–464.
M. Sewell: Statistical inference (and what is wrong with classical statistics). In: The Social Construction of Statistics, S.M. Harding, T. and R. Thomas (eds.), Pluto Press, London, 2008.
R. Tibshirani: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 1996, 267–288.
L. Wasserman: Bayesian model selection and model averaging. Journal of Mathematical Psychology 44, 2000, 92–107.
A. Zellner, H.A. Keuzenkamp, and M. McAleer: Simplicity, Inference and Modelling: Keeping it Sophisticated Simple. Cambridge University Press, 2001.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, XS., Forbes, A.B. (2011). Model and Feature Selection in Metrology Data Approximation. In: Georgoulis, E., Iske, A., Levesley, J. (eds) Approximation Algorithms for Complex Systems. Springer Proceedings in Mathematics, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16876-5_14
DOI: https://doi.org/10.1007/978-3-642-16876-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16875-8
Online ISBN: 978-3-642-16876-5