Abstract
Replicated data with measurement errors are frequently encountered in economic, environmental, chemical, medical and other fields. In this paper, we discuss a replicated measurement error model under the class of scale mixtures of skew-normal distributions, which extends symmetric heavy- and light-tailed distributions to asymmetric cases. We also include an equation error in the model to reflect how closely the true response matches the true covariate. Explicit iterative expressions for the maximum likelihood estimates are derived via an expectation–maximization type algorithm. Empirical Bayes estimates are used to predict the true covariate and response. We study the effectiveness and the robustness of the maximum likelihood estimates through two simulation studies. The method is applied to dietary habit data from the Continuing Survey of Food Intakes by Individuals.
Acknowledgements
This research was supported by the National Science Foundation of China (Grant No. 11301278), the Natural Science Foundation of Jiangsu Province of China (Grant No. BK2012459), the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Grant No. 13YJC910001), and Academic Degree Postgraduate innovation projects of Jiangsu province Ordinary University (Grant No. KYLX15-0883).
Appendix
1.1 The PDF of Some SMSN Distributions and the Conditional Moments
The pdfs of some important SMSN distributions and their conditional moments are as follows:
(1) The multivariate skew-t distribution \(ST _m(\varvec{\mu }, {\varvec{\Sigma }}, \varvec{\lambda };\nu )\):
\(\kappa (u)=1/u\), \(U\sim Gamma (\nu /2,\nu /2)\) with \(\nu > 0\). The pdf of \(\varvec{Y}\) is given by
where \(d=(\varvec{y}-\varvec{\mu })^{\top }{\varvec{\Sigma }}^{-1}(\varvec{y}-\varvec{\mu })\), and \(t_m(\cdot |\varvec{\mu }, {\varvec{\Sigma }};\nu )\) and \(T(\cdot ;\nu )\) denote the pdf of the m-dimensional Student-t distribution and the cdf of the standard univariate t distribution, respectively. The skew-normal distribution is the limiting case as \(\nu \rightarrow +\infty \).
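For reference, the skew-t density is commonly written in the SMSN literature in the form below. This is a sketch of that standard form rather than a transcription of the authors' display; the skewing term \(A=\varvec{\lambda }^{\top }{\varvec{\Sigma }}^{-1/2}(\varvec{y}-\varvec{\mu })\) is an assumption here, since it is not defined in this appendix.

```latex
\[
f(\varvec{y}) = 2\, t_m(\varvec{y}|\varvec{\mu }, {\varvec{\Sigma }};\nu )\,
T\!\left(\sqrt{\frac{\nu +m}{\nu +d}}\; A;\ \nu +m\right),
\qquad
A=\varvec{\lambda }^{\top }{\varvec{\Sigma }}^{-1/2}(\varvec{y}-\varvec{\mu }).
\]
```

Setting \(\varvec{\lambda }=\varvec{0}\) gives \(A=0\) and \(T(0;\nu +m)=1/2\), so the expression reduces to the symmetric Student-t pdf \(t_m(\varvec{y}|\varvec{\mu }, {\varvec{\Sigma }};\nu )\).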
The conditional moments take the forms
where \(f_0(\varvec{y})=\int \nolimits _0^{\infty }\phi _m(\varvec{y}|\varvec{\mu },\kappa (u){\varvec{\Sigma }})d H (u)\), i.e. the pdf of the class of SMN distribution when \(\varvec{\lambda }=\varvec{0}\).
(2) The multivariate skew-slash distribution \(SS _m(\varvec{\mu }, {\varvec{\Sigma }}, \varvec{\lambda };\nu )\):
\(\kappa (u)=1/u\), \(U\sim Beta (\nu , 1)\) with \(0<u<1\) and \(\nu > 0\). The pdf of \(\varvec{Y}\) is given by
When \(\nu \rightarrow +\infty \), the skew-slash distribution reduces to the skew-normal one.
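As a hedged reconstruction, the skew-slash density can be obtained by inserting the \(Beta (\nu ,1)\) density \(\nu u^{\nu -1}\) on \((0,1)\) into the general scale-mixture integral. Two things are assumed here because neither is stated in this appendix: the general SMSN form \(f(\varvec{y})=2\int _0^{\infty }\phi _m(\varvec{y}|\varvec{\mu },\kappa (u){\varvec{\Sigma }})\Phi (\kappa (u)^{-1/2}A)\,dH(u)\), and the skewing term \(A=\varvec{\lambda }^{\top }{\varvec{\Sigma }}^{-1/2}(\varvec{y}-\varvec{\mu })\).

```latex
\[
f(\varvec{y}) = 2\nu \int _0^1 u^{\nu -1}\,
\phi _m\!\left(\varvec{y}\,|\,\varvec{\mu }, u^{-1}{\varvec{\Sigma }}\right)
\Phi \!\left(u^{1/2}A\right) du .
\]
```

As \(\nu \) grows, the weight \(\nu u^{\nu -1}\) concentrates near \(u=1\), which is consistent with the stated skew-normal limit.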
The conditional moments take the forms
where \(S\sim Gamma ((2\nu +m+2r)/2,d/2)I _{(0,1)}\) and \(P_x(a,b)\) denotes the cdf of the \(Gamma (a,b)\) distribution evaluated at x.
(3) The multivariate skew-contaminated normal distribution \(SCN _m(\varvec{\mu }, {\varvec{\Sigma }}, \varvec{\lambda };\nu ,\gamma )\):
When \(\kappa (u)=1/u\) and U has the discrete probability function \(h(u;\nu ,\gamma )=\nu I _{(u=\gamma )}+(1-\nu ) I _{(u=1)}\) with given parameter vector \(\varvec{\nu }=(\nu ,\gamma )^{\top }\) and \(0<\nu<1, 0<\gamma \leqslant 1\), we get the multivariate skew-contaminated normal distribution with pdf
The SN distribution is the special case \(\gamma =1\).
The conditional moments take the forms
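Because U takes only the two values \(\gamma \) and 1 in the skew-contaminated normal case, both the pdf and the conditional moments of U follow from Bayes' rule over two terms. The following is a minimal univariate (m = 1) sketch in Python, not the authors' implementation. It assumes that the skewing term is \(A=\lambda (y-\mu )/\sigma \) and that "conditional moments" refers to \(E[U^r|y]\); the function names are hypothetical.

```python
import math


def norm_pdf(y, mu, var):
    """Density of N(mu, var) at y."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def norm_cdf(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def scn_components(y, mu, sigma2, lam, nu, gamma):
    """Joint terms h(u) * f(y | u) for u = gamma and u = 1.

    Given U = u, Y is skew-normal with scale sigma2 / u, so
    f(y | u) = 2 * phi(y | mu, sigma2 / u) * Phi(sqrt(u) * A).
    """
    a = lam * (y - mu) / math.sqrt(sigma2)  # assumed skewing term A
    contaminated = nu * 2.0 * norm_pdf(y, mu, sigma2 / gamma) * norm_cdf(math.sqrt(gamma) * a)
    regular = (1.0 - nu) * 2.0 * norm_pdf(y, mu, sigma2) * norm_cdf(a)
    return contaminated, regular


def scn_pdf(y, mu, sigma2, lam, nu, gamma):
    """Univariate skew-contaminated normal density."""
    return sum(scn_components(y, mu, sigma2, lam, nu, gamma))


def scn_cond_moment(y, mu, sigma2, lam, nu, gamma, r):
    """E[U^r | y]: posterior-weighted average of gamma^r and 1^r."""
    contaminated, regular = scn_components(y, mu, sigma2, lam, nu, gamma)
    return (gamma ** r * contaminated + regular) / (contaminated + regular)
```

For an observation far in the tail, almost all posterior weight falls on \(u=\gamma \), so \(E[U|y]\) approaches \(\gamma \). This downweighting of outlying observations is the mechanism behind the robustness of heavy-tailed scale mixtures.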
1.2 The First Derivatives of \(d_t\), \(A_t\) and \(\log |{\varvec{\Sigma }}|\) with Respect to \(\varvec{\theta }\)
By direct calculations, we have the first derivatives of \(d_t\), \(A_t\) and \(\log |{\varvec{\Sigma }}|\) as follows:
for \(d_t\):
for \(A_t\):
where \(\psi =\frac{\lambda _x\phi _x}{\sqrt{\phi _x+\lambda _x^2\Lambda _x}}\).
for \(\log |{\varvec{\Sigma }}|\):
where \(|{\varvec{\Sigma }}|=\phi _{\delta }^{p-1}\phi _{\varepsilon }^{q-1}\tau \), \(\tau =(\phi _{\delta }+p\phi _x)(\phi _{\varepsilon }+q\phi _e)+q\beta ^2\phi _{\delta }\phi _x\).
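The closed form of \(|{\varvec{\Sigma }}|\) above gives \(\log |{\varvec{\Sigma }}|=(p-1)\log \phi _{\delta }+(q-1)\log \phi _{\varepsilon }+\log \tau \), which can be differentiated term by term. The Python sketch below writes out these partial derivatives for the five parameters that appear in \(\tau \). It is derived only from the formula for \(\tau \) stated here and is not a transcription of the authors' expressions. The function names are hypothetical, and p and q are assumed to be the numbers of replicates.

```python
import math


def tau(beta, phi_x, phi_e, phi_delta, phi_eps, p, q):
    """tau as defined in the text."""
    return (phi_delta + p * phi_x) * (phi_eps + q * phi_e) + q * beta ** 2 * phi_delta * phi_x


def log_det_sigma(beta, phi_x, phi_e, phi_delta, phi_eps, p, q):
    """log|Sigma| = (p-1) log(phi_delta) + (q-1) log(phi_eps) + log(tau)."""
    t = tau(beta, phi_x, phi_e, phi_delta, phi_eps, p, q)
    return (p - 1) * math.log(phi_delta) + (q - 1) * math.log(phi_eps) + math.log(t)


def grad_log_det_sigma(beta, phi_x, phi_e, phi_delta, phi_eps, p, q):
    """Partial derivatives of log|Sigma|, obtained by the chain rule on tau."""
    t = tau(beta, phi_x, phi_e, phi_delta, phi_eps, p, q)
    return {
        "beta": 2.0 * q * beta * phi_delta * phi_x / t,
        "phi_x": (p * (phi_eps + q * phi_e) + q * beta ** 2 * phi_delta) / t,
        "phi_e": q * (phi_delta + p * phi_x) / t,
        "phi_delta": (p - 1) / phi_delta + (phi_eps + q * phi_e + q * beta ** 2 * phi_x) / t,
        "phi_eps": (q - 1) / phi_eps + (phi_delta + p * phi_x) / t,
    }
```

Any parameter that does not enter \(\tau \), \(\phi _{\delta }\) or \(\phi _{\varepsilon }\) has a zero derivative. A central finite-difference comparison against `log_det_sigma` is a convenient way to check expressions of this kind before using them in a score or Hessian computation.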
In addition, we also need the following derivatives:
for \(\varvec{\mu }\):
for \({\varvec{\Sigma }}\):
for \(\varvec{b}\):
for \(\psi \):
Cite this article
Cao, C., Wang, Y., Shi, J.Q. et al. Measurement Error Models for Replicated Data Under Asymmetric Heavy-Tailed Distributions. Comput Econ 52, 531–553 (2018). https://doi.org/10.1007/s10614-017-9702-8
DOI: https://doi.org/10.1007/s10614-017-9702-8