Abstract
We introduce and study the Box–Cox symmetric class of distributions, which is useful for modeling positively skewed, possibly heavy-tailed, data. The new class of distributions includes the Box–Cox t, Box–Cox Cole-Green (or Box–Cox normal), Box–Cox power exponential distributions, and the class of the log-symmetric distributions as special cases. It provides easy parameter interpretation, which makes it convenient for regression modeling purposes. Additionally, it provides enough flexibility to handle outliers. The usefulness of the Box–Cox symmetric models is illustrated in a series of applications to nutritional data.
Notes
It is the distribution of \(Z/U^{1/q}\), where \(q>0\) and Z and U are independent random variables with standard normal and uniform distribution, respectively.
If \(\sigma |{{\lambda }}|=0\), \(1/\sigma |{{\lambda }}|\) is interpreted as \(\lim _{\sigma {{\lambda }} \rightarrow 0}{( 1/\sigma |{{\lambda }}| )}=\infty \) and \(F ( 1/\sigma |{{\lambda }}|)\) is taken as 1.
\(y^{-\infty }=\infty \), if \(0<y<1\), \(=1\), if \( y=1\), \(=0\), if \(y>1\); \(y^{\infty }=0\), if \(0<y<1\), \(=1\), if \( y=1\), \(=\infty \), if \(y>1\).
The tail indices were obtained using Maple 13; see http://www.maplesoft.com. The tail index for the log-power exponential distribution with \(\tau >1\) was obtained for \(\tau \in \mathbb {Q}\), and for the slash distribution, for \(q \in \mathbb {N}^{*}\).
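The first note describes the slash distribution constructively, as the ratio \(Z/U^{1/q}\). The block below is a minimal sampling sketch of that construction, using only the Python standard library. The function name `sample_slash` and its arguments are illustrative and are not taken from the paper.

```python
import random


def sample_slash(n, q, seed=None):
    """Draw n values of Z / U**(1/q), with Z ~ N(0, 1) and U ~ Uniform(0, 1) independent.

    Hypothetical helper for illustration only; q must be positive.
    """
    if q <= 0:
        raise ValueError("q must be positive")
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # random() lies in [0, 1); 1 - random() lies in (0, 1], so we never divide by zero.
        u = 1.0 - rng.random()
        draws.append(z / u ** (1.0 / q))
    return draws


if __name__ == "__main__":
    xs = sample_slash(10_000, q=1.0, seed=1)
    frac = sum(abs(x) > 3 for x in xs) / len(xs)
    print(f"fraction of draws with |x| > 3: {frac:.3f}")
```

Because \(U^{1/q}\le 1\), every draw is at least as large in absolute value as the underlying normal draw, which is the source of the heavier tails.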
Acknowledgements
We thank José Eduardo Corrente for providing the data used in this study, and Eliane C. Pinheiro for helpful discussions. We are grateful to the Associate Editor and two anonymous referees for constructive comments and suggestions. Funding was provided by Conselho Nacional de Desenvolvimento Científico e Tecnológico (Grant No. 304388-2014-9), Fundação de Amparo à Pesquisa do Estado de São Paulo (Grant No. 2012/21788-2), and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior.
Appendix
In this appendix, we give the first and second derivatives of the log-likelihood function with respect to the parameters. Let \(z=h(y;\mu ,\sigma ,\lambda )\), where \(h(y;\mu ,\sigma ,\lambda )\) is given in (1), \(\varpi =-2r'(z^2)/r(z^2)\), and \(\xi =r((\sigma \lambda )^{-2}) / R((\sigma |\lambda |)^{-1})\). We have
Let \(\ell \) denote the log-likelihood for a single observation y. We have
if \(\lambda \ne 0\); the last term in \(\ell \) is zero if \(\lambda =0\). The first derivatives of \(\ell \) are given by
The second derivatives of \(\ell \) are given by
The first and second derivatives of \(\ell \) are obtained after plugging the derivatives of z given above.
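The display equations of this appendix are not reproduced in the text above. The following LaTeX block is a reconstruction sketch for \(\lambda \ne 0\), not a copy of the original displays. It assumes that \(h\) in (1) is the usual Box–Cox-type transformation, that \(R(w)=\int _{-\infty }^{w} r(u^2)\,\hbox {d}u\), and that the density is \(y^{\lambda -1} r(z^2)/\{\mu ^{\lambda }\sigma R(1/(\sigma |\lambda |))\}\). These assumptions are consistent with the definitions of \(\varpi \) and \(\xi \) and with the remark that the last term of \(\ell \) vanishes at \(\lambda =0\). The second derivatives are not attempted here.

```latex
% Assumed form of the transformation (equation (1) is not shown in this text).
\[
z = h(y;\mu,\sigma,\lambda) =
\begin{cases}
\dfrac{(y/\mu)^{\lambda}-1}{\sigma\lambda}, & \lambda \neq 0,\\[2mm]
\dfrac{\log(y/\mu)}{\sigma}, & \lambda = 0.
\end{cases}
\]

% Derivatives of z for lambda != 0.
% As lambda -> 0, dz/dlambda tends to {log(y/mu)}^2 / (2 sigma).
\[
\frac{\partial z}{\partial \mu} = -\frac{(y/\mu)^{\lambda}}{\mu\sigma}, \qquad
\frac{\partial z}{\partial \sigma} = -\frac{z}{\sigma}, \qquad
\frac{\partial z}{\partial \lambda} =
\frac{(y/\mu)^{\lambda}\{\lambda\log(y/\mu)-1\}+1}{\sigma\lambda^{2}}.
\]

% Log-likelihood for a single observation y, under the assumed density.
\[
\ell = (\lambda-1)\log y - \lambda\log\mu - \log\sigma + \log r(z^{2})
       - \log R\!\left(\frac{1}{\sigma|\lambda|}\right).
\]

% First derivatives, using d log r(z^2)/dz = -varpi z and the definition of xi.
\[
\begin{aligned}
\frac{\partial \ell}{\partial \mu} &= -\frac{\lambda}{\mu} - \varpi z\,\frac{\partial z}{\partial \mu},\\
\frac{\partial \ell}{\partial \sigma} &= -\frac{1}{\sigma} - \varpi z\,\frac{\partial z}{\partial \sigma}
   + \frac{\xi}{\sigma^{2}|\lambda|},\\
\frac{\partial \ell}{\partial \lambda} &= \log\frac{y}{\mu} - \varpi z\,\frac{\partial z}{\partial \lambda}
   + \operatorname{sign}(\lambda)\,\frac{\xi}{\sigma\lambda^{2}}.
\end{aligned}
\]
```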
Note that the first derivatives of \(\ell \) depend on the weighting function \(\varpi \) (\(\varpi \) is given in Table 3 for some distributions). Consequently, \(\hbox {d}\varpi /\hbox {d}z\) appears in all the second derivatives of \(\ell \). Note that \(\partial \ell /\partial \sigma \) and \(\partial \ell /\partial \lambda \) involve \(\xi \), which in turn depends on the particular distribution in the BCS class and the truncation set. The first derivatives of \(\xi \) appear in \(\partial ^2 \ell /\partial \sigma ^2\), \(\partial ^2 \ell /\partial \lambda ^2\) and \(\partial ^2 \ell /\partial \sigma \partial \lambda \). The stability of the terms that involve \(\xi \) and its first derivatives around \(\lambda =0\) may vary from one distribution to another. For instance, they may be unstable for the Box–Cox t distribution with a small degrees of freedom parameter. Yet, in a simulation study of its type I error probability, the likelihood ratio test of \(\mathrm{H}_0: \lambda =0\) in the Box–Cox t model performed well for different values of the degrees of freedom parameter; see Sect. 4.
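As an illustration of how \(\varpi \) and \(\xi \) enter the first derivatives, the sketch below specializes to the Box–Cox normal (Cole–Green) case, where \(r(u)=(2\pi )^{-1/2}e^{-u/2}\), so \(\varpi =1\) and \(\xi =\phi (w)/\Phi (w)\) with \(w=1/(\sigma |\lambda |)\). It compares analytic scores for \(\lambda \ne 0\) with central finite differences. The form of \(h\), the form of the log-likelihood, and all function names are assumptions made for this example, not code or displays from the paper.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def h(y, mu, sigma, lam):
    """Assumed Box-Cox-type transformation z = h(y; mu, sigma, lambda)."""
    if lam == 0.0:
        return math.log(y / mu) / sigma
    return ((y / mu) ** lam - 1.0) / (sigma * lam)


def loglik(y, mu, sigma, lam):
    """Assumed single-observation log-likelihood, normal generator."""
    z = h(y, mu, sigma, lam)
    ell = (lam - 1.0) * math.log(y) - lam * math.log(mu) - math.log(sigma) + math.log(phi(z))
    if lam != 0.0:
        # Truncation term; it is zero when lambda = 0.
        ell -= math.log(Phi(1.0 / (sigma * abs(lam))))
    return ell


def score(y, mu, sigma, lam):
    """Analytic first derivatives for lambda != 0 (varpi = 1 for the normal generator)."""
    a = y / mu
    z = h(y, mu, sigma, lam)
    w = 1.0 / (sigma * abs(lam))
    xi = phi(w) / Phi(w)
    varpi = 1.0
    dz_dmu = -(a ** lam) / (mu * sigma)
    dz_dsigma = -z / sigma
    dz_dlam = (a ** lam * (lam * math.log(a) - 1.0) + 1.0) / (sigma * lam ** 2)
    d_mu = -lam / mu - varpi * z * dz_dmu
    d_sigma = -1.0 / sigma - varpi * z * dz_dsigma + xi / (sigma ** 2 * abs(lam))
    d_lam = math.log(a) - varpi * z * dz_dlam + math.copysign(1.0, lam) * xi / (sigma * lam ** 2)
    return d_mu, d_sigma, d_lam


def numeric_score(y, mu, sigma, lam, eps=1e-6):
    """Central finite-difference approximation to the same three derivatives."""
    def diff(f, x):
        return (f(x + eps) - f(x - eps)) / (2.0 * eps)

    return (
        diff(lambda m: loglik(y, m, sigma, lam), mu),
        diff(lambda s: loglik(y, mu, s, lam), sigma),
        diff(lambda l: loglik(y, mu, sigma, l), lam),
    )


if __name__ == "__main__":
    args = (2.3, 2.0, 0.5, 0.7)  # y, mu, sigma, lambda (arbitrary illustrative values)
    print("analytic:", score(*args))
    print("numeric: ", numeric_score(*args))
```

For other members of the class, only `phi`, `Phi`, and the value of `varpi` would change. The \(\xi \) terms scale with \(1/\lambda ^2\), which is one way to see why behavior near \(\lambda =0\) can depend on how fast \(r\) decays.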
Cite this article
Ferrari, S.L.P., Fumes, G. Box–Cox symmetric distributions and applications to nutritional data. AStA Adv Stat Anal 101, 321–344 (2017). https://doi.org/10.1007/s10182-017-0291-6