Parametric and semi-nonparametric model strategies for the estimation of distributions of chemical contaminant data

Environmental and Ecological Statistics

Abstract

The determination of an appropriate distribution for concentration data is of major importance in chemical risk assessment. Selecting and estimating such a distribution is hindered by observations below the limit of detection and the limit of quantification, which lead to left-censored and interval-censored data. The log-normal distribution is a typical choice. It owes its popularity to the routine use of the log transform in laboratory practice and to the convenient mathematical and computational properties of the normal distribution. However, the log-normal should not be the only choice, and other distributions need to be considered as well. Here we focus on several families of distributions that are related to the log-normal distribution, directly or indirectly, and that are parametric or semi-nonparametric extensions of it: the log-skew-normal, the log-t, the log-skew-t, the Weibull, the gamma, the generalized gamma, and the semi-nonparametric estimator of Zhang and Davidian (Biometrics 64(2):567–576, 2008). Nysen et al. (Stat Med 31:2374–2385, 2012) developed methodology to test the goodness of fit of a particular hypothesized distribution. Our focus here is instead on model selection and model averaging, using either the parametric models alone or the parametric models together with the series of log-normal extensions underlying the semi-nonparametric estimator. The models and the selection and averaging methods are further investigated through simulations and illustrated on data on cadmium concentrations in food products.
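To make the censoring structure concrete, here is a minimal sketch of the log-likelihood of a log-normal model for such data. It assumes three kinds of results: quantified values (at or above the LOQ), results reported as "< LOD" (left-censored), and results detected but not quantified (interval-censored between LOD and LOQ), each with its own LOD/LOQ. The function and argument names are hypothetical illustrations, not code from the paper.

```python
import math
from typing import Sequence, Tuple


def norm_cdf(z: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_censored_loglik(
    mu: float,
    sigma: float,
    exact: Sequence[float],
    left: Sequence[float],
    interval: Sequence[Tuple[float, float]],
) -> float:
    """Log-likelihood of a log-normal(mu, sigma) model (hypothetical helper).

    exact:    quantified concentrations; each contributes the density f(x)
    left:     LOD of each result reported as "< LOD"; contributes P(X < LOD)
    interval: (LOD, LOQ) of each result detected but not quantified;
              contributes P(LOD <= X < LOQ)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for x in exact:
        z = (math.log(x) - mu) / sigma
        # log-normal density: log f(x) = log phi(z) - log(sigma * x)
        ll += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma * x)
    for lod in left:
        # left-censored: cumulative probability up to the LOD
        ll += math.log(norm_cdf((math.log(lod) - mu) / sigma))
    for lod, loq in interval:
        # interval-censored: probability mass between the LOD and the LOQ
        p = norm_cdf((math.log(loq) - mu) / sigma) - norm_cdf((math.log(lod) - mu) / sigma)
        ll += math.log(p)
    return ll
```

Maximum likelihood estimates would then be obtained by minimizing the negative of this function over (mu, log sigma) with a numerical optimizer (for example `scipy.optimize.minimize`). The Weibull, gamma, generalized gamma and skewed or heavy-tailed candidates follow the same pattern, with their own density and CDF in place of the log-normal ones.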

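The abstract does not say which criterion drives selection and averaging. The references include AIC (Akaike 1973), BIC (Schwarz 1978) and the focused information criterion (Claeskens and Hjort 2003). The sketch below assumes AIC with Akaike weights in the style of Burnham and Anderson (1998). It takes the maximized log-likelihood and number of parameters of each candidate fitted to the same censored data, selects the model with the smallest AIC, and averages a per-model estimate using the weights. All names and numbers in it are hypothetical.

```python
import math
from typing import List, Sequence


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: AIC = -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def akaike_weights(logliks: Sequence[float], ks: Sequence[int]) -> List[float]:
    """w_m = exp(-D_m / 2) / sum_j exp(-D_j / 2), with D_m = AIC_m - min AIC."""
    aics = [aic(ll, k) for ll, k in zip(logliks, ks)]
    best = min(aics)
    # Subtracting the minimum AIC before exponentiating avoids underflow.
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def model_average(estimates: Sequence[float], weights: Sequence[float]) -> float:
    """Model-averaged estimate: weighted sum of the per-model estimates."""
    return sum(e * w for e, w in zip(estimates, weights))


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods (illustrative, not results from the paper).
    models = ["log-normal", "Weibull", "generalized gamma"]
    logliks = [-120.0, -121.5, -119.8]
    ks = [2, 2, 3]  # number of free parameters of each model
    w = akaike_weights(logliks, ks)
    selected = models[w.index(max(w))]  # model selection: largest weight = smallest AIC
    q95 = [4.1, 3.6, 4.4]  # hypothetical per-model estimates of a 95th percentile
    print(selected, [round(x, 3) for x in w], round(model_average(q95, w), 3))
```

Averaging over models, rather than keeping only the selected one, carries the uncertainty about which distribution is correct into the final estimate.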
Fig. 1
Fig. 2
Fig. 3

References

  • Aerts M, Claeskens G, Hart J (1999) Testing the fit of a parametric function. J Am Stat Assoc 94(447):869–879

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov B, Csaki F (eds) 2nd International symposium on information theory. Akademiai Kiado, Budapest, pp 267–281 (reproduced in Kotz S, Johnson NL (eds) (1992) Breakthroughs in statistics, vol 1. Springer, New York)

  • Azzalini A, Dalla Valle A (1996) The multivariate skew-normal distribution. Biometrika 83(4):715–726

  • Azzalini A, Dal Cappello T (2003) Log-skew-normal and log-skew-t distributions as model for family income data. J Income Distrib 11:12–20

  • Burnham K, Anderson D (1998) Model selection and inference: a practical information-theoretic approach. Springer, New York

  • Burnham K, Anderson D, Huyvaert K (2011) AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behav Ecol Sociobiol 65:23–35

  • Claeskens G, Carroll R (2007) An asymptotic theory for model selection inference in general semiparametric problems. Biometrika 94:249–265

  • Claeskens G, Hjort N (2003) The focused information criterion. J Am Stat Assoc 98(464):900–916

  • Claeskens G, Hjort N (2008) Model selection and model averaging. Cambridge University Press, Cambridge

  • Davison A (2003) Statistical models. Cambridge University Press, Cambridge

  • EFSA (2010) Management of left-censored data in dietary exposure assessment of chemical substances. EFSA J 8(3):96

  • Faes C, Aerts M, Geys H, Molenberghs G (2007) Model averaging using fractional polynomials to estimate a safe level of exposure. Risk Anal 27:111–123

  • Fenton V, Gallant A (1996) Qualitative and asymptotic performance of SNP density estimators. J Econom 74:77–118

  • Gallant A, Nychka D (1987) Semi-nonparametric maximum likelihood estimation. Econometrica 55(2):363–390

  • Hart J (1997) Nonparametric smoothing and lack-of-fit tests. Springer, New York

  • Hewett P, Ganser G (2007) A comparison of several methods for analyzing censored data. Ann Occup Hyg 51(7):611–632

  • Hurvich C, Simonoff J, Tsai CL (1998) Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J R Stat Soc B 60:271–293

  • Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:457–481

  • Lin G, Stoyanov J (2009) The logarithmic skew-normal distributions are moment-indeterminate. J Appl Probab 46:909–916

  • Moy G (2013) Total diet studies. Springer, New York

  • Namata H, Aerts M, Faes C, Teunis P (2008) Model averaging in microbial risk assessment using fractional polynomials. Risk Anal 28:891–905

  • Nysen R (2015) Statistical methodology for left- and interval-censored analytical data in environmental risk assessment. PhD thesis, Hasselt University, Belgium, URL http://ibiostat.be/publications

  • Nysen R, Aerts M, Faes C (2012) Testing goodness of fit of parametric models for censored data. Stat Med 31:2374–2385

  • Schmoyer R, Beauchamp J, Brandt C, Hoffman F Jr (1996) Difficulties with the lognormal model in mean estimation and testing. Environ Ecol Stat 3:81–97

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Stacy E (1962) A generalization of the gamma distribution. Ann Math Stat 33(3):1187–1192

  • Symonds MRE, Moussalli A (2011) A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike's information criterion. Behav Ecol Sociobiol 65:13–21

  • Zhang M, Davidian M (2008) “Smooth” semiparametric regression analysis for arbitrarily censored time-to-event data. Biometrics 64(2):567–576

Acknowledgments

This research was supported by the IAP research network no. P6/03 of the Belgian Government (Belgian Science Policy). For the simulations we used the infrastructure of the VSC (Flemish Supercomputer Center), funded by the Hercules Foundation and the Flemish Government (department EWI). The authors thank the members of the EFSA Working Group on Left Censored Data (Martine Bakker, Peter Fürst, Gerhard Heinemeyer, Jessica Tressou, Philippe Verger, and the EFSA staff members Pietro Ferrari, Olaf Mosbach-Schulz and Billy Amzal) and are grateful to EFSA for permission to use the cadmium data (EFSA/DATEX/2007/005). The authors thank the reviewers for their suggestions and constructive comments, which helped to improve the manuscript.

Author information

Correspondence to Ruth Nysen.

Additional information

Handling Editor: Pierre Dutilleul.

About this article

Cite this article

Nysen, R., Faes, C., Ferrari, P. et al. Parametric and semi-nonparametric model strategies for the estimation of distributions of chemical contaminant data. Environ Ecol Stat 22, 423–444 (2015). https://doi.org/10.1007/s10651-014-0304-5

  • DOI: https://doi.org/10.1007/s10651-014-0304-5
