Abstract
A statistical challenge common to many areas of research is the selection of a preferred model to describe the main features and trends in a particular data set. The objective of model selection is to balance the quality of fit to data against the complexity and predictive ability of the model achieving that fit. Several model selection techniques, including two information criteria, which aim to determine which set of model parameters the data best support, are reviewed here. The techniques rely on computing the probabilities of the different models, given the data, rather than considering the allowed values of the fitted parameters. Such information criteria have only recently been applied in radiation epidemiology, even though they have a longer tradition of application in other areas of research. The purpose of this review is to make two information criteria more accessible by fully detailing how to calculate them in a practical way and how to interpret the resulting values. This aim is supported by examples involving the computation of risk models for radiation-induced solid cancer mortality fitted to the epidemiological data from the Japanese A-bomb survivors. These examples illustrate that the Bayesian information criterion is particularly useful in concluding that the weight of evidence is in favour of excess relative risk models that depend on age-at-exposure and those that depend on age-attained.
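As a minimal sketch of the kind of calculation the abstract refers to, the snippet below assumes the two criteria are the Akaike (AIC) and Bayesian (BIC) information criteria in their standard textbook forms, AIC = dev + 2k and BIC = dev + k ln(n), where dev is the deviance (minus twice the maximised log-likelihood), k the number of fitted parameters and n the number of observations. The model names, deviances and sample size are hypothetical illustration values, not results from the review, and the choice of n for grouped data is an assumption that the review itself may treat differently.

```python
import math


def aic(deviance: float, n_params: int) -> float:
    """Akaike information criterion: deviance plus 2 per fitted parameter."""
    return deviance + 2 * n_params


def bic(deviance: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: deviance plus ln(n) per fitted parameter."""
    return deviance + n_params * math.log(n_obs)


# Hypothetical fits: model_B has two more parameters and a slightly lower deviance.
models = {
    "model_A": {"deviance": 1000.0, "n_params": 4},
    "model_B": {"deviance": 995.0, "n_params": 6},
}
n_obs = 10_000  # hypothetical number of observations

scores = {
    name: {
        "aic": aic(fit["deviance"], fit["n_params"]),
        "bic": bic(fit["deviance"], fit["n_params"], n_obs),
    }
    for name, fit in models.items()
}

# The preferred model under each criterion is the one with the smallest value.
best_aic = min(scores, key=lambda name: scores[name]["aic"])
best_bic = min(scores, key=lambda name: scores[name]["bic"])

for name, s in scores.items():
    print(f"{name}: AIC = {s['aic']:.2f}, BIC = {s['bic']:.2f}")
print("Preferred by AIC:", best_aic)
print("Preferred by BIC:", best_bic)
```

With these illustrative numbers the two criteria disagree: the AIC favours the larger model, whereas the BIC, whose penalty per parameter grows with ln(n), favours the smaller one.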
Notes
However, this should not give the impression that the standard model selection approach involving maximum likelihoods pays no attention to the number of fit parameters, which, in fact, determines the number of degrees of freedom, as explained below.
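To make the role of the degrees of freedom concrete, the standard likelihood ratio test for two nested models can be written as follows. This is a sketch in assumed notation (the symbols $D_i$, $L_i$ and $k_i$ are not taken from the text): $L_i$ is the maximised likelihood of model $i$, $D_i = -2\ln L_i$ its deviance, and $k_1 < k_2$ the numbers of fitted parameters.

```latex
\[
  \Delta D \;=\; D_1 - D_2 \;=\; -2\ln\frac{L_1}{L_2}
  \;\sim\; \chi^2_{\,k_2 - k_1}
  \qquad \text{(asymptotically, if the simpler model 1 is adequate)}
\]
```

The number of extra parameters, $k_2 - k_1$, therefore sets the threshold that the improvement in deviance must exceed before the more complex model is preferred.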
Acknowledgments
The author would like to thank Dr. W. Rühm and Dr. J. R. Walsh for critically reading the manuscript, Prof. D. Pierce and Dr. P. Jacob for useful discussions, and two anonymous reviewers for many valuable comments which led to an improvement of the original manuscript. This work makes use of the data obtained from the Radiation Effects Research Foundation (RERF) in Hiroshima, Japan. RERF is a private foundation funded equally by the Japanese Ministry of Health and Welfare and the US Department of Energy through the US National Academy of Sciences. The conclusions in this work are those of the author and do not necessarily reflect the scientific judgement of RERF or its funding agencies.
Cite this article
Walsh, L. A short review of model selection techniques for radiation epidemiology. Radiat Environ Biophys 46, 205–213 (2007). https://doi.org/10.1007/s00411-007-0109-0
DOI: https://doi.org/10.1007/s00411-007-0109-0