Linear censored regression models with scale mixtures of normal distributions

Garay, Aldo M.; Lachos, Victor H.; Bolfarine, Heleno; Cabral, Celso R. B.

doi:10.1007/s00362-015-0696-9


Regular Article
Published: 11 June 2015

Volume 58, pages 247–278, (2017)

Statistical Papers



Abstract

In the framework of censored regression models, the random errors are routinely assumed to have a normal distribution, mainly for mathematical convenience. However, this approach has been criticized in the literature because of its sensitivity to deviations from the normality assumption. Here, we first establish a new link between the censored regression model and a recently studied class of symmetric distributions, called scale mixtures of normal (SMN) distributions, which extend the normal distribution by the inclusion of kurtosis. The Student-t, Pearson type VII, slash and contaminated normal distributions, among others, are contained in this class. A member of this class can be a good alternative for modeling this kind of data, because the flexibility of the class has been shown in several applications. In this work, we develop an analytically simple and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of certain truncated SMN distributions. The proposed algorithm is implemented in the R package SMNCensReg. Applications with simulated data and a real data set are reported, illustrating the usefulness of the new methodology.

References

Akaike H (1974) A new look at the statistical model identification. Autom Control IEEE Trans 19:716–723
Arellano-Valle R, Castro L, González-Farías G, Muñoz-Gajardo K (2012) Student-t censored regression model: properties and inference. Stat Methods Appl 21:453–473
Arellano-Valle RB (1994) Distribuições elípticas: propriedades, inferência e aplicações a modelos de regressão. Ph.D. thesis, Instituto de Matemática e Estatística, Universidade de São Paulo (in Portuguese)
Bai ZD, Krishnaiah PR, Zhao LC (1989) On rates of convergence of efficient detection criteria in signal processing with white noise. Inform Theory IEEE Trans 35:380–388
Barros M, Galea M, González M, Leiva V (2010) Influence diagnostics in the Tobit censored response model. Stat Methods Appl 19:716–723
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Fang KT, Zhang YT (1990) Generalized multivariate analysis. Springer, Berlin
Garay AM, Lachos V, Massuia MB (2013) SMNCensReg: fitting univariate censored regression model under the scale mixture of normal distributions. R package version 2.2. http://CRAN.R-project.org/package=SMNCensReg
Genç AI (2012) Moments of truncated normal/independent distributions. Stat Pap 54:741–764
Greene W (2012) Econometric analysis. Prentice Hall, New York
Ibacache-Pulgar G, Paula G (2011) Local influence for Student-t partially linear models. Comput Stat Data Anal 55:1462–1478
Kim HJ (2008) Moments of truncated Student-t distribution. J Korean Stat Soc 37:81–87
Labra FV, Garay AM, Lachos VH, Ortega EMM (2012) Estimation and diagnostics for heteroscedastic nonlinear regression models based on scale mixtures of skew-normal distributions. J Stat Plan Inference 142:2149–2165
Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference for skew-normal independent linear mixed models. Stat Sin 20:303–322
Lange KL, Little R, Taylor J (1989) Robust statistical modeling using t distribution. J Am Stat Assoc 84:881–896
Lee G, Scott C (2012) EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Comput Stat Data Anal 56:2816–2829
Liu C, Rubin DB (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81:633–648
Lin TI (2010) Robust mixture modeling using multivariate skew t distributions. Stat Comput 20:343–356
Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44:226–233
Massuia MB, Cabral CRB, Matos LA, Lachos VH (2014) Influence diagnostics for Student-t censored linear regression models. Statistics 85:1–21. doi:10.1080/02331888.2014.958489
Meilijson I (1989) A fast improvement to the EM algorithm on its own terms. J R Stat Soc Ser B 51:127–138
Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80:267–278
Meza C, Osorio F, la Cruz RD (2012) Estimation in nonlinear mixed-effects models using heavy-tailed distributions. Stat Comput 22:121–139
Mroz TA (1987) The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econometrica 55:765–799
Ortega EMM, Bolfarine H, Paula GA (2003) Influence diagnostics in generalized log-gamma regression models. Comput Stat Data Anal 42:165–186
Osorio F, Paula GA, Galea M (2007) Assessment of local influence in elliptical linear models with longitudinal structure. Comput Stat Data Anal 51:4354–4368
R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, http://www.R-project.org
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Therneau TM, Grambsch PM, Fleming TR (1990) Martingale-based residuals for survival models. Biometrika 77(1):147–160
Villegas C, Paula G, Cysneiros F, Galea M (2012) Influence diagnostics in generalized symmetric linear models. Comput Stat Data Anal 59:161–170
Wei CG, Tanner MA (1990) Posterior computations for censored regression data. J Am Stat Assoc 85:829–839
Wu L (2010) Mixed effects models for complex data. Chapman & Hall/CRC, Boca Raton

Acknowledgments

We thank the editor, associate editor, and two referees whose constructive comments led to an improved presentation of the paper. The research of Víctor H. Lachos was supported by Grant 305054/2011-2 from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq-Brazil) and by Grant 2014/02938-9 from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP-Brazil). Celso R. B. Cabral was supported by CNPq (via the Universal and CT-Amazônia projects) and CAPES (via Project PROCAD 2007). The research of Aldo M. Garay was supported by Grant 161119/2012-3 from CNPq and by Grant 2013/21468-0 from FAPESP-Brazil. Heleno Bolfarine was supported by CNPq.

Author information

Authors and Affiliations

Departamento de Estatística, Universidade Estadual de Campinas, Rua Sérgio Buarque de Holanda, 651 – Cidade Universitária Zeferino Vaz Campinas, São Paulo, SP, CEP 13083-859, Brazil
Aldo M. Garay & Victor H. Lachos
Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, SP, Brazil
Heleno Bolfarine
Departamento de Estatística, Universidade Federal do Amazonas, Manaus, AM, Brazil
Celso R. B. Cabral

Corresponding author

Correspondence to Aldo M. Garay.

Appendices

Appendix 1: Lemmas and corollary

The following Lemmas, provided by Kim (2008) and Genç (2012), are useful for evaluating some integrals used in this paper as well as for the implementation of the proposed EM-type algorithm.

Lemma 1

If $Z\sim \text {TN}_{(a,b)}\left( 0,1\right) $, then

$$\begin{aligned} \left( k+1\right) E\left[ Z^k\right] -E\left[ Z^{k+2}\right] =\frac{\left( b\right) ^{k+1}\phi \left( b\right) -\left( a\right) ^{k+1}\phi \left( a\right) }{\Phi \left( b\right) -\Phi \left( a\right) }, \end{aligned}$$

for $k=-1,0,1,2,\ldots $

Proof: See Lemma 2.3 in Kim (2008).
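As a numerical sanity check of Lemma 1, the following Python sketch compares both sides of the identity using a composite Simpson rule for the truncated moments. The function names (`tn_moment`, `lemma1_sides`) and the integration settings are illustrative assumptions, not part of the paper or of SMNCensReg; for $k=-1$ the sketch assumes $a>0$ so that $z^{-1}$ is defined on the whole interval.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3.0

def tn_moment(k, a, b):
    """E[Z^k] for Z ~ TN_(a,b)(0, 1), computed by numerical integration."""
    mass = Phi(b) - Phi(a)
    return simpson(lambda z: z ** k * phi(z), a, b) / mass

def lemma1_sides(k, a, b):
    """Return (left-hand side, right-hand side) of the identity in Lemma 1."""
    lhs = (k + 1) * tn_moment(k, a, b) - tn_moment(k + 2, a, b)
    rhs = (b ** (k + 1) * phi(b) - a ** (k + 1) * phi(a)) / (Phi(b) - Phi(a))
    return lhs, rhs
```

For example, `lemma1_sides(-1, 0.5, 2.0)` returns minus the truncated mean on the left and $(\phi(b)-\phi(a))/(\Phi(b)-\Phi(a))$ on the right, which is the familiar formula for the mean of a truncated standard normal.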

Lemma 2

Let U be a positive random variable. Then ${F}_{SMN}\left( a\right) =E_U\left[ \Phi \left( aU^{\frac{1}{2}}\right) \right] ,$ where ${F}_{SMN}(\cdot )$ denotes the cdf of a standard SMN random variable, that is, when $\mu =0$ and $\sigma ^2=1$.

Proof: See Lemma 3 in Genç (2012).
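Lemma 2 can be illustrated with a case where the SMN cdf is known in closed form. The sketch below assumes the Student-t distribution with $\nu=2$, for which $U\sim Gamma(1,1)$ and the cdf is $1/2+a/(2\sqrt{2+a^2})$, and evaluates $E_U[\Phi(aU^{1/2})]$ numerically. The function names and the truncation point of the integral are assumptions made for this example only.

```python
import math

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3.0

def smn_cdf_t2(a):
    """E_U[Phi(a * sqrt(U))] with U ~ Gamma(1, 1), i.e. Student-t with nu = 2.

    The substitution u = w**2 keeps the integrand smooth at the origin;
    the mass beyond w = 8 is of order exp(-64) and is ignored.
    """
    integrand = lambda w: Phi(a * w) * math.exp(-w * w) * 2.0 * w
    return simpson(integrand, 0.0, 8.0)

def t2_cdf(a):
    """Closed-form cdf of the Student-t distribution with 2 degrees of freedom."""
    return 0.5 + a / (2.0 * math.sqrt(2.0 + a * a))
```

The two functions should agree to within the quadrature error for any real `a`; the same mixing representation is what the later appendices use to recognise $F_{PVII}$, $F_{SL}$ and $F_{CN}$.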

The following Corollary is a direct consequence of Proposition 1 given in Sect. 2.

Corollary 1

Let $Y \sim \text {SMN}(\mu ,\sigma ^2, {\varvec{\nu }})$ with scale factor U and $\mathcal{A}=(\text {a},\text {b})$. Then, for $r \ge 1$,

$$\begin{aligned} \text {E}\left[ U^{r}|Y\in \mathcal{A}\right]&=\text {E}\left[ U^{r}|X\in \mathcal{A}^*\right] ;\\ \text {E}\left[ U^{r}Y|Y\in \mathcal{A}\right]&=\mu \text {E}\left[ U^{r}|X\in \mathcal{A}^*\right] + \sigma \text {E}\left[ U^{r}X|X\in \mathcal{A}^*\right] ;\\ \text {E}\left[ U^{r}Y^{2}|Y\in \mathcal{A}\right]&=\mu ^2\text {E}\left[ U^{r}|X\in \mathcal{A}^*\right] +2\mu \sigma \text {E}\left[ U^{r}X|X\in \mathcal{A}^*\right] \\&\quad +\sigma ^2\text {E}\left[ U^{r}X^{2}|X\in \mathcal{A}^*\right] , \end{aligned}$$

where $X \sim \text {SMN }(0,1,{\varvec{\nu }})$ and $\mathcal{A}^*=\left( a^*,b^*\right) $, with $a^*=\left( a-\mu \right) /\sigma $ and $b^*=\left( b-\mu \right) /\sigma $.
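The step behind Corollary 1 can be spelled out using the location-scale representation $Y=\mu +\sigma X$, where $X$ and $Y$ share the scale factor $U$ (a restatement of the argument, not an additional result):

$$\begin{aligned} \{Y\in \mathcal{A}\}&=\{a<\mu +\sigma X<b\}=\{X\in \mathcal{A}^*\},\\ \text {E}\left[ U^{r}Y^{s}|Y\in \mathcal{A}\right]&=\text {E}\left[ U^{r}(\mu +\sigma X)^{s}|X\in \mathcal{A}^*\right] ,\quad s=0,1,2. \end{aligned}$$

Expanding $(\mu +\sigma X)^{s}$ and using the linearity of the conditional expectation gives the three identities.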

Appendix 2: Derivations of quantities $\text {E}_{\phi }\left( r,h\right) $ and $\text {E}_{\Phi }\left( r,h\right) $ for SMN distributions

In this Appendix, we calculate the expressions for the expected values $\text {E}_{\phi }\left( r,h\right) $ and $\text {E}_{\Phi }\left( r,h\right) ,$ for $r,h\ge 0$, given in Proposition 1.

1.1 Pearson type VII distribution (and the Student-t distribution)

In this case $U\sim Gamma(\nu /2,\delta /2)$, with $\nu >0~\text {and}~\delta >0.$ To facilitate the notation, let us make $\alpha _1=(\nu +2r)/2$ and $\alpha _2=(h^2+\delta )/2$. Then,

$$\begin{aligned} \text {E}_{\phi }\left( r,h\right)&=\int ^{\infty }_{0}\frac{\delta ^{\frac{\nu }{2}} u^{\frac{\nu }{2}-1} u^r}{\sqrt{2\pi }\, \Gamma \left( \frac{\nu }{2}\right) 2^{\frac{\nu }{2}} }\exp \left\{ -\frac{u(h^2+\delta ) }{2}\right\} du \nonumber \\&= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) \delta ^{\frac{\nu }{2}}\left( \frac{h^2+\delta }{2}\right) ^{-\frac{\nu +2r}{2}}}{ \sqrt{2\pi }\,\Gamma \left( \frac{\nu }{2}\right) 2^\frac{\nu }{2}} \int ^{\infty }_{0}\frac{\alpha _2^{\alpha _1}{u'}^{\alpha _1-1}}{\Gamma \left( \alpha _1\right) }\exp \left\{ -\alpha _2{u'}\right\} du' \nonumber \\&= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \sqrt{2\pi }\,\Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{\frac{\nu }{2}}\left( \frac{h^2+\delta }{2}\right) ^{-\frac{\nu +2r}{2}}, \end{aligned}$$

(26)

where the integrand in (26) is the pdf of a random variable $U'$ with distribution $Gamma\left( \alpha _1,\alpha _2\right) $.
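The closed form for $\text {E}_{\phi }(r,h)$ in the Pearson type VII case can be checked against direct numerical integration of $u^r\phi (hu^{1/2})$ times the $Gamma(\nu /2,\delta /2)$ density. The sketch below is a minimal check under stated assumptions: the function names are hypothetical, the integral is truncated at a finite upper limit, and the quadrature is only reliable when $\nu /2+r>1$ so that the integrand vanishes smoothly at the origin.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3.0

def e_phi_pvii(r, h, nu, delta):
    """Closed form for E[U^r phi(h sqrt(U))], U ~ Gamma(nu/2, delta/2)."""
    a1 = (nu + 2.0 * r) / 2.0
    return (math.gamma(a1) / (math.sqrt(2.0 * math.pi) * math.gamma(nu / 2.0))
            * (delta / 2.0) ** (nu / 2.0)
            * ((h * h + delta) / 2.0) ** (-a1))

def e_phi_pvii_numeric(r, h, nu, delta, upper=60.0):
    """Same expectation by quadrature (assumes nu/2 + r > 1)."""
    const = (delta / 2.0) ** (nu / 2.0) / math.gamma(nu / 2.0)
    def integrand(u):
        if u == 0.0:
            return 0.0
        density = const * u ** (nu / 2.0 - 1.0) * math.exp(-0.5 * delta * u)
        return u ** r * phi(h * math.sqrt(u)) * density
    return simpson(integrand, 0.0, upper)
```

Setting `delta = nu` gives the Student-t special case of the same expression.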

$$\begin{aligned} \text {E}_{\Phi }\left( r,h\right)&=\int ^{\infty }_{0} \frac{u^{\frac{2r+\nu }{2}-1}\, \Phi \left( h u^\frac{1}{2}\right) \delta ^\frac{\nu }{2} }{2^\frac{\nu }{2}\, \Gamma \left( \frac{\nu }{2} \right) } \exp \left\{ -\frac{u\delta }{2}\right\} du \nonumber \\&= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r} \int ^{\infty }_{0}\left( \frac{\delta }{2}\right) ^{\alpha _1}\frac{\Phi \left( h {u'}^{\frac{1}{2}}\right) {u'}^{\alpha _1-1}}{\Gamma \left( \alpha _1\right) }\exp \left\{ -\frac{u'\delta }{2}\right\} du' \nonumber \\&= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r}E_{U'}\left[ \Phi \left( h{U'}^{\frac{1}{2}}\right) \right] \nonumber \\&=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r}F_{PVII}(h|\nu +2r,\delta ), \end{aligned}$$

(27)

where in (27) the expectation is computed with respect to $U' \sim Gamma\left( \alpha _1,\delta /2\right) $ and $F_{PVII}(\cdot )$ represents the cdf of the Pearson type VII distribution. Then, the result follows from Lemma 2. When $\delta =\nu $, i.e., the Student-t distribution, we have that $\text { E}_{\phi }\left( r,h\right) $ and $\text { E}_{\Phi }\left( r,h\right) $ are given by

$$\begin{aligned} \text { E}_{\phi }\left( r,h\right)&=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{\Gamma \left( \frac{\nu }{2}\right) \sqrt{2\pi }}\left( \frac{\nu }{2}\right) ^{\frac{\nu }{2}}\left( \frac{h^2+\nu }{2}\right) ^{-\frac{\left( \nu +2r\right) }{2}};\\ \text {E}_{\Phi }\left( r,h\right)&=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{\Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\nu }{2}\right) ^{-r}F_{PVII}(h|\nu +2r,\nu ). \end{aligned}$$
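The Student-t expression for $\text {E}_{\Phi }(r,h)$ can be checked numerically in a case where the required cdf has a closed form. The sketch below assumes $\nu +2r=3$ (for instance $\nu =1$, $r=1$), and uses the fact that a Pearson type VII variable with mixing distribution $Gamma(\nu '/2,\delta /2)$ is a Student-t variable with $\nu '$ degrees of freedom rescaled by $\sqrt{\delta /\nu '}$, so that $F_{PVII}(h|3,\delta )=T_3(h\sqrt{3/\delta })$. All function names are hypothetical and the integral is truncated at a finite limit.

```python
import math

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3.0

def t3_cdf(x):
    """Closed-form cdf of the Student-t distribution with 3 degrees of freedom."""
    s = x / math.sqrt(3.0)
    return 0.5 + (math.atan(s) + s / (1.0 + s * s)) / math.pi

def e_Phi_t_closed(h, nu=1.0, r=1.0):
    """Closed form with delta = nu; only valid here when nu + 2r == 3."""
    f_pvii = t3_cdf(h * math.sqrt(3.0 / nu))  # F_PVII(h | 3, nu)
    return (math.gamma((nu + 2.0 * r) / 2.0) / math.gamma(nu / 2.0)
            * (nu / 2.0) ** (-r) * f_pvii)

def e_Phi_t_numeric(h, nu=1.0, r=1.0):
    """E[U^r Phi(h sqrt(U))], U ~ Gamma(nu/2, nu/2), after substituting u = w**2."""
    const = (nu / 2.0) ** (nu / 2.0) / math.gamma(nu / 2.0)
    integrand = lambda w: (2.0 * w ** (2.0 * r + nu - 1.0) * Phi(h * w)
                           * const * math.exp(-0.5 * nu * w * w))
    return simpson(integrand, 0.0, 12.0)
```

With $\nu =3$ and $r=0$ the same check reduces to Lemma 2 for the Student-t distribution with 3 degrees of freedom.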

1.2 Slash distribution

In this case $U \sim Beta(\nu ,1)$, with positive shape parameter $\nu $, and

$$\begin{aligned} \text {E}_{\phi }\left( r,h\right)&=\int ^{1}_{0}u^r\frac{1}{\sqrt{2\pi }}\exp \left\{ -\frac{h^2}{2}u\right\} \nu u^{\nu -1}du = \frac{\nu }{\sqrt{2\pi }}\int ^{1}_{0}u^{\nu +r-1}\exp \left\{ -\frac{h^2}{2}u\right\} du, \nonumber \\&= \frac{\nu }{\sqrt{2\pi }}\left( \frac{h^2}{2}\right) ^{-\left( \nu +r\right) }\gamma ^{*}\left( \nu +r,\frac{h^2}{2}\right) \!, \end{aligned}$$

(28)

where $\gamma ^{*}\left( a,x\right) =\int ^{x}_{0}e^{-t}t^{a-1}dt$ denotes the lower incomplete gamma function; Eq. (28) follows from the change of variable $t=h^2u/2$.
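The slash expression for $\text {E}_{\phi }(r,h)$ can be evaluated with a short series implementation of the lower incomplete gamma function and compared with direct quadrature over $(0,1)$. This is a minimal sketch: the function names are hypothetical, the series is the standard expansion $\int _0^x e^{-t}t^{a-1}dt=x^{a}e^{-x}\sum _{n\ge 0}x^{n}/\{a(a+1)\cdots (a+n)\}$, and the closed form assumes $h\ne 0$.

```python
import math

def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3.0

def lower_inc_gamma(a, x, terms=200):
    """Lower incomplete gamma function int_0^x e^{-t} t^{a-1} dt (power series)."""
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    return x ** a * math.exp(-x) * total

def e_phi_slash(r, h, nu):
    """Closed form for E[U^r phi(h sqrt(U))], U ~ Beta(nu, 1); requires h != 0."""
    x = 0.5 * h * h
    return (nu / math.sqrt(2.0 * math.pi)
            * x ** (-(nu + r)) * lower_inc_gamma(nu + r, x))

def e_phi_slash_numeric(r, h, nu):
    """Same expectation by quadrature on (0, 1); assumes nu + r >= 1."""
    x = 0.5 * h * h
    integrand = lambda u: u ** (nu + r - 1.0) * math.exp(-x * u)
    return nu / math.sqrt(2.0 * math.pi) * simpson(integrand, 0.0, 1.0)
```

For integer arguments the series can be cross-checked by hand, for example $\int _0^x e^{-t}t\,dt=1-(1+x)e^{-x}$.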

$$\begin{aligned} \text {E}_{\Phi }\left( r,h\right)&=\int ^{1}_{0}u^r\Phi \left( h u^\frac{1}{2}\right) \nu u^{\nu -1}du \nonumber \\&= \frac{\nu }{\nu +r}\int ^{1}_{0}\Phi \left( h {u'}^{\frac{1}{2}}\right) \left( \nu +r\right) {u'}^{\nu +r-1}du' \end{aligned}$$

(29)

$$\begin{aligned}&= \frac{\nu }{\nu +r}F_{SL}(h|\nu +r), \end{aligned}$$

(30)

where the integral in (29) is the expectation of the random variable $\Phi (h {U'}^{\frac{1}{2}})$, with ${U'}\sim \text {Beta}(\nu +r,1)$. Using Lemma 2, we obtain Eq. (30), where $F_{SL}(\cdot )$ is the cdf of the slash distribution.

1.3 Contaminated normal distribution

In this case $U$ is a discrete random variable that takes the value $\gamma $ with probability $\nu $ and the value 1 with probability $1-\nu $. Then,

$$\begin{aligned} \text {E}_{\phi }\left( r,h\right)&=\sum _{u\in \{\gamma ,1\}}u^r\phi \left( h u^\frac{1}{2}\right) \left[ \nu \mathbb {I}_{\{\gamma \}}(u)+(1-\nu ) \mathbb {I}_{\{1\}}(u)\right] \\&= \nu \gamma ^r\phi \left( h{\gamma }^\frac{1}{2}\right) +(1-\nu )\phi \left( h\right) ;\\ \text {E}_{\Phi }\left( r,h\right)&=\sum _{u\in \{\gamma ,1\}}u^r\Phi \left( h u^\frac{1}{2}\right) \left[ \nu \mathbb {I}_{\{\gamma \}}(u)+(1-\nu ) \mathbb {I}_{\{1\}}(u)\right] \\&= \nu \gamma ^r\Phi \left( h{\gamma }^\frac{1}{2}\right) +(1-\nu )\Phi \left( h \right) \\&= \gamma ^r\left[ \nu \Phi \left( h {\gamma }^\frac{1}{2}\right) + \left( 1-\nu \right) \Phi \left( h\right) \right] +\left( 1-\nu \right) \left( 1-\gamma ^r\right) \Phi \left( h\right) \\&= \gamma ^rF_{CN}(h|\nu ,\gamma )+\left( 1-\nu \right) \left( 1-\gamma ^r\right) \Phi \left( h\right) , \end{aligned}$$

where $F_{CN}(\cdot )$ is the cdf of the contaminated normal distribution.
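Because the mixing variable in the contaminated normal case has only two support points, both expectations are finite sums and are easy to code directly. The sketch below assumes $U=\gamma $ with probability $\nu $ and $U=1$ with probability $1-\nu $, and checks that the form written in terms of $F_{CN}$ agrees with the plain two-term sum; the function names are hypothetical.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def e_phi_cn(r, h, nu, gamma):
    """E[U^r phi(h sqrt(U))] for U = gamma w.p. nu and U = 1 w.p. 1 - nu."""
    return nu * gamma ** r * phi(h * math.sqrt(gamma)) + (1.0 - nu) * phi(h)

def cn_cdf(h, nu, gamma):
    """cdf of the standard contaminated normal distribution."""
    return nu * Phi(h * math.sqrt(gamma)) + (1.0 - nu) * Phi(h)

def e_Phi_cn(r, h, nu, gamma):
    """E[U^r Phi(h sqrt(U))] written in terms of the contaminated normal cdf."""
    return (gamma ** r * cn_cdf(h, nu, gamma)
            + (1.0 - nu) * (1.0 - gamma ** r) * Phi(h))

def e_Phi_cn_direct(r, h, nu, gamma):
    """The same expectation as a plain sum over the two support points of U."""
    support = ((gamma, nu), (1.0, 1.0 - nu))
    return sum(p * u ** r * Phi(h * math.sqrt(u)) for u, p in support)
```

With `r = 0` the function `e_Phi_cn` reduces to `cn_cdf`, which is Lemma 2 for this family.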

Appendix 3: Details of the EM-type algorithm

In this Appendix, we derive the EM algorithm Eqs. (20)–(22). Let ${\varvec{\theta }}=({\varvec{\beta }}^{\top }, \sigma ^2, {\varvec{\nu }})$ be the vector with all parameters in the SMN-CR model and consider the notation given in Sect. 3.2. Denoting the complete-data likelihood by $L(\cdot |\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u})$ and pdf’s in general by $f(\cdot )$, we have that

$$\begin{aligned} L({\varvec{\theta }}|\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u})&=f(\mathbf {y}_{\text {obs}}, \mathbf {y}_{L},\mathbf {u})=f(\mathbf {y}_{\text {obs}},\mathbf {y}_{L}|\mathbf {u})f(\mathbf {u})\\&= f(\mathbf {y}|\mathbf {u})f(\mathbf {u}) = \prod ^n_{i=1}f(y_i|u_i)f(u_i|{\varvec{\nu }}). \end{aligned}$$

Dropping unimportant constants, the complete-data log-likelihood function is given by

$$\begin{aligned} \ell _c({\varvec{\theta }}|\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u})&= \log (L({\varvec{\theta }}|\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u}))\\&= -\frac{n}{2} \log (\sigma ^2) +\frac{1}{2} \sum _{i=1}^{n}\log \left( u_i\right) - \frac{1}{2\sigma ^2}\sum _{i=1}^n u_i(y_i-\mathbf {x}^{\top }_i{\varvec{\beta }})^2 \\&\quad + \sum _{i=1}^{n} \log \left( f(u_i|{\varvec{\nu }})\right) . \end{aligned}$$

The Q-function at the E-step of the algorithm is given by

$$\begin{aligned} Q({\varvec{\theta }}|{\varvec{\theta }}^{(k)}) =\text {E}_{{{\varvec{\theta }}^{(k)}}}\left[ \ell _{c}\left( {\varvec{\theta }}|\mathbf {Y}_\text {obs},\mathbf {Y}_L,\mathbf {U}\right) |\mathbf {y}_\text {obs}\right] , \end{aligned}$$

so we have

$$\begin{aligned} Q({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)})&= -\frac{n}{2}\log \left( \sigma ^2\right) -\frac{1}{2\sigma ^2} \sum _{i=1}^{n} \left\{ \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i^2|y_{\text {obs}_i}] \right. \\&\quad {} \left. -\,2 \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i|y_{\text {obs}_i}] \mathbf {x}^{\top }_i{\varvec{\beta }}+\text {E}_{{\varvec{\theta }}^{(k)}}[U_i|y_{\text {obs}_i}] (\mathbf {x}^{\top }_i{\varvec{\beta }})^2\right\} \\&\quad {} +\,\frac{1}{2} \sum _{i=1}^{n}\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}] + \sum _{i=1}^{n} \text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]. \end{aligned}$$

The expectations $\mathcal{E}_{si}\big ({\varvec{\theta }}^{(k)}\big ) = \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i^s|y_{\text {obs}_i}],\,\,\, s=0,1,2$, used in the E-step of the algorithm, are computed by considering two possible cases: (i) observation i is uncensored and (ii) observation i is censored. In the former case we solve the problem using results obtained by Osorio et al. (2007). In the latter case we use Proposition 1. Then, we have

$$\begin{aligned} Q({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)}) =&-\frac{n}{2}\log (\sigma ^2)-\frac{1}{2\sigma ^2} \sum _{i=1}^{n} \left[ \mathcal{E}_{2 i}\big ({\varvec{\theta }}^{(k)}\big ) -2 \mathcal{E}_{1 i}\big ({\varvec{\theta }}^{(k)}\big ) \mathbf {x}^{\top }_i{\varvec{\beta }}+\mathcal{E}_{0 i}\big ({\varvec{\theta }}^{(k)}\big ) \big (\mathbf {x}^{\top }_i{\varvec{\beta }}\big )^2\right] \\&+\frac{1}{2} \sum _{i=1}^{n}\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}] + \sum _{i=1}^{n} \text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]. \end{aligned}$$

In the CM-step, we take the derivatives of $Q\big ({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)}\big )$ with respect to ${\varvec{\beta }}$ and $\sigma ^2$ and set them equal to zero.

The solution of $\displaystyle \frac{\partial Q\big ({\varvec{\theta }}|{\varvec{\theta }}^{(k)}\big )}{\partial {\varvec{\beta }}} = 0$ is

$$\begin{aligned} {{\varvec{\beta }}}^{(k+1)}=\left( \sum ^n_{i=1}\mathcal{E}_{0i}\big ({\varvec{\theta }}^{(k)}\big )\mathbf {x}_i\mathbf {x}^{\top }_i\right) ^ {-1}\sum ^n_{i=1}\mathbf {x}_i\mathcal{E}_{1i}\big ({\varvec{\theta }}^{(k)}\big ). \end{aligned}$$

The solution of $\displaystyle \frac{\partial Q\big ({\varvec{\theta }}|{\varvec{\theta }}^{(k)}\big )}{\partial \sigma ^2} = 0$ is

$$\begin{aligned} {\sigma ^2}^{(k+1)}&=\frac{1}{n}\sum ^n_{i=1}\left[ \mathcal{E}_{2i}\big ({\varvec{\theta }}^{(k)}\big )-2\mathcal{E}_{1i}\big ({\varvec{\theta }}^{(k)}\big ) \mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)} +\mathcal{E}_{0i}\big ({\varvec{\theta }}^{(k)}\big )\big (\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)}\big )^2\right] . \end{aligned}$$
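The two CM-step updates amount to a weighted least-squares solve followed by an average of expected squared residuals. The Python sketch below takes the E-step quantities $\mathcal{E}_{0i}$, $\mathcal{E}_{1i}$ and $\mathcal{E}_{2i}$ as given inputs and implements only these two updates; the function names (`solve`, `cm_step`) and the plain-list data layout are assumptions for illustration and do not reflect the interface of SMNCensReg.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda rr: abs(M[rr][col]))
        M[col], M[piv] = M[piv], M[col]
        for rr in range(col + 1, n):
            factor = M[rr][col] / M[col][col]
            for cc in range(col, n + 1):
                M[rr][cc] -= factor * M[col][cc]
    x = [0.0] * n
    for rr in range(n - 1, -1, -1):
        tail = sum(M[rr][cc] * x[cc] for cc in range(rr + 1, n))
        x[rr] = (M[rr][n] - tail) / M[rr][rr]
    return x

def cm_step(X, e0, e1, e2):
    """Update (beta, sigma2) from E[U_i], E[U_i Y_i], E[U_i Y_i^2] given y_obs."""
    n, p = len(X), len(X[0])
    # Left-hand matrix: sum_i e0_i x_i x_i^T
    A = [[sum(e0[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         for j in range(p)]
    # Right-hand vector: sum_i x_i e1_i
    c = [sum(X[i][j] * e1[i] for i in range(n)) for j in range(p)]
    beta = solve(A, c)
    fitted = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    sigma2 = sum(e2[i] - 2.0 * e1[i] * fitted[i] + e0[i] * fitted[i] ** 2
                 for i in range(n)) / n
    return beta, sigma2
```

In the uncensored normal case, where $\mathcal{E}_{0i}=1$, $\mathcal{E}_{1i}=y_i$ and $\mathcal{E}_{2i}=y_i^2$, the updates reduce to ordinary least squares and the usual maximum likelihood variance estimate, which gives a convenient sanity check.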

For the CML-step, we estimate ${\varvec{\nu }}$ by maximizing the marginal log-likelihood, circumventing the (in general) complicated task of computing $\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}]$ and $\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]$, i.e.,

$$\begin{aligned} {\varvec{\nu }}^{(k+1)}&=\text {argmax}_{{\varvec{\nu }}}\left\{ \sum _{i=1}^{m} \log \left[ {F}_{SMN}\left( \frac{\kappa _i-\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)}}{\sigma ^{(k+1)}}\right) \right] \right. \\&\quad {} + \left. \sum ^{n}_{i=m+1} \log \left[ f_{SMN}\big (y_i|\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)},{\sigma ^2}^{(k+1)},{\varvec{\nu }}\big )\right] \right\} . \end{aligned}$$
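The CML-step can be sketched for the contaminated normal case, where ${\varvec{\nu }}=(\nu ,\gamma )$ and both the cdf and the pdf are two-term mixtures of normal quantities. The code below evaluates the marginal log-likelihood for left-censored data, with censored cases contributing a cdf term at $\kappa _i$ and uncensored cases a pdf term, and replaces a numerical optimizer by a coarse grid search. The function names, the grid, and the convention that `y[i]` stores $\kappa _i$ for censored cases are all assumptions made for this illustration.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def marginal_loglik(nu, gam, mu, sigma, y, censored):
    """Observed-data log-likelihood under contaminated normal errors.

    mu[i] is the current linear predictor x_i^T beta; for censored cases
    y[i] holds the censoring level kappa_i (left censoring).
    """
    total = 0.0
    root = math.sqrt(gam)
    for m, yi, is_cens in zip(mu, y, censored):
        z = (yi - m) / sigma
        if is_cens:
            total += math.log(nu * Phi(z * root) + (1.0 - nu) * Phi(z))
        else:
            dens = (nu * root * phi(z * root) + (1.0 - nu) * phi(z)) / sigma
            total += math.log(dens)
    return total

def cml_step(mu, sigma, y, censored, grid):
    """Return the (nu, gamma) pair on the grid with the largest log-likelihood."""
    return max(grid, key=lambda g: marginal_loglik(g[0], g[1], mu, sigma,
                                                   y, censored))

# Hypothetical grid over the open unit square
GRID = [(i / 10.0, j / 10.0) for i in range(1, 10) for j in range(1, 10)]
```

A grid search keeps the sketch dependency-free; in practice a bounded one- or two-dimensional optimizer would play the same role with the updated ${\varvec{\beta }}^{(k+1)}$ and ${\sigma ^2}^{(k+1)}$ held fixed.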

Appendix 4: Complementary results of the simulation studies: asymptotic properties

Figures 5 and 6 depict the average bias and the average MSE of $\widehat{\beta }_1$, $\widehat{\beta }_2$ and $\widehat{\sigma ^2}$ for the levels of censoring $p=25\,\%$ and $p=45\,\%$, respectively.

About this article

Cite this article

Garay, A.M., Lachos, V.H., Bolfarine, H. et al. Linear censored regression models with scale mixtures of normal distributions. Stat Papers 58, 247–278 (2017). https://doi.org/10.1007/s00362-015-0696-9

Received: 08 March 2014
Revised: 21 January 2015
Published: 11 June 2015
Issue Date: March 2017
DOI: https://doi.org/10.1007/s00362-015-0696-9
