
Linear censored regression models with scale mixtures of normal distributions


Abstract

In the framework of censored regression models, the random errors are routinely assumed to have a normal distribution, mainly for mathematical convenience. However, this practice has been criticized in the literature because of its sensitivity to deviations from the normality assumption. Here, we first establish a new link between the censored regression model and a recently studied class of symmetric distributions, called scale mixtures of normal (SMN) distributions, which extend the normal distribution by the inclusion of a kurtosis parameter. The Student-t, Pearson type VII, slash, and contaminated normal distributions, among others, belong to this class. Members of this class can be good alternatives for modeling this kind of data, since their flexibility has been demonstrated in several applications. In this work, we develop an analytically simple and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors obtained as a by-product. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of certain truncated SMN distributions. The proposed algorithm is implemented in the R package SMNCensReg. Applications to simulated data and a real data set are reported, illustrating the usefulness of the new methodology.
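
The EM-type algorithm described above is implemented in the R package SMNCensReg (Garay et al. 2013). As a rough illustration of the intended workflow, the sketch below fits a Student-t censored regression to simulated, left-censored data; the function name CensReg.SMN and its argument names are assumptions based on the package description and may differ from the actual interface.

# Hypothetical usage sketch of SMNCensReg (function/argument names assumed,
# not verified against the package documentation).
# install.packages("SMNCensReg")
library(SMNCensReg)

set.seed(123)
n  <- 200
x1 <- rnorm(n)
y  <- 1 + 2 * x1 + rt(n, df = 4)        # regression with heavy-tailed errors
kappa <- quantile(y, 0.25)              # left-censoring point
cc <- as.numeric(y <= kappa)            # censoring indicator
y[cc == 1] <- kappa                     # observed (censored) responses

fit <- CensReg.SMN(cc, x = cbind(1, x1), y = y, nu = 3,
                   cens = "left", dist = "T")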


References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723
  • Arellano-Valle RB, Castro L, González-Farías G, Muñoz-Gajardo K (2012) Student-t censored regression model: properties and inference. Stat Methods Appl 21:453–473
  • Arellano-Valle RB (1994) Distribuições elípticas: propriedades, inferência e aplicações a modelos de regressão. Ph.D. thesis, Instituto de Matemática e Estatística, Universidade de São Paulo (in Portuguese)
  • Bai ZD, Krishnaiah PR, Zhao LC (1989) On rates of convergence of efficient detection criteria in signal processing with white noise. IEEE Trans Inform Theory 35:380–388
  • Barros M, Galea M, González M, Leiva V (2010) Influence diagnostics in the Tobit censored response model. Stat Methods Appl 19:716–723
  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
  • Fang KT, Zhang YT (1990) Generalized multivariate analysis. Springer, Berlin
  • Garay AM, Lachos VH, Massuia MB (2013) SMNCensReg: fitting univariate censored regression model under the scale mixture of normal distributions. R package version 2.2. http://CRAN.R-project.org/package=SMNCensReg
  • Genç AI (2012) Moments of truncated normal/independent distributions. Stat Pap 54:741–764
  • Greene W (2012) Econometric analysis. Prentice Hall, New York
  • Ibacache-Pulgar G, Paula G (2011) Local influence for Student-t partially linear models. Comput Stat Data Anal 55:1462–1478
  • Kim HJ (2008) Moments of truncated Student-t distribution. J Korean Stat Soc 37:81–87
  • Labra FV, Garay AM, Lachos VH, Ortega EMM (2012) Estimation and diagnostics for heteroscedastic nonlinear regression models based on scale mixtures of skew-normal distributions. J Stat Plan Inference 142:2149–2165
  • Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference for skew-normal independent linear mixed models. Stat Sin 20:303–322
  • Lange KL, Little R, Taylor J (1989) Robust statistical modeling using the t distribution. J Am Stat Assoc 84:881–896
  • Lee G, Scott C (2012) EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Comput Stat Data Anal 56:2816–2829
  • Liu C, Rubin DB (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81:633–648
  • Lin TI (2010) Robust mixture modeling using multivariate skew t distributions. Stat Comput 20:343–356
  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44:226–233
  • Massuia MB, Cabral CRB, Matos LA, Lachos VH (2014) Influence diagnostics for Student-t censored linear regression models. Statistics 85:1–21. doi:10.1080/02331888.2014.958489
  • Meilijson I (1989) A fast improvement to the EM algorithm on its own terms. J R Stat Soc Ser B 51:127–138
  • Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80:267–278
  • Meza C, Osorio F, De la Cruz R (2012) Estimation in nonlinear mixed-effects models using heavy-tailed distributions. Stat Comput 22:121–139
  • Mroz TA (1987) The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econometrica 55:765–799
  • Ortega EMM, Bolfarine H, Paula GA (2003) Influence diagnostics in generalized log-gamma regression models. Comput Stat Data Anal 42:165–186
  • Osorio F, Paula GA, Galea M (2007) Assessment of local influence in elliptical linear models with longitudinal structure. Comput Stat Data Anal 51:4354–4368
  • R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org
  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
  • Therneau TM, Grambsch PM, Fleming TR (1990) Martingale-based residuals for survival models. Biometrika 77(1):147–160
  • Villegas C, Paula G, Cysneiros F, Galea M (2012) Influence diagnostics in generalized symmetric linear models. Comput Stat Data Anal 59:161–170
  • Wei CG, Tanner MA (1990) Posterior computations for censored regression data. J Am Stat Assoc 85:829–839
  • Wu L (2010) Mixed effects models for complex data. Chapman & Hall/CRC, Boca Raton


Acknowledgments

We thank the editor, associate editor, and two referees whose constructive comments led to an improved presentation of the paper. The research of Víctor H. Lachos was supported by Grant 305054/2011-2 from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq-Brazil) and by Grant 2014/02938-9 from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP-Brazil). Celso R. B. Cabral was supported by CNPq (via the Universal and CT-Amazônia projects) and CAPES (via Project PROCAD 2007). The research of Aldo M. Garay was supported by Grant 161119/2012-3 from CNPq and by Grant 2013/21468-0 from FAPESP-Brazil, and Heleno Bolfarine was supported by CNPq.

Author information

Correspondence to Aldo M. Garay.

Appendices

Appendix 1: Lemmas and corollary

The following Lemmas, provided by Kim (2008) and Genç (2012), are useful for evaluating some integrals used in this paper as well as for the implementation of the proposed EM-type algorithm.

Lemma 1

If \(Z\sim \text {TN}_{(a,b)}\left( 0,1\right) \), then

$$\begin{aligned} \left( k+1\right) E\left[ Z^k\right] -E\left[ Z^{k+2}\right] =\frac{\left( b\right) ^{k+1}\phi \left( b\right) -\left( a\right) ^{k+1}\phi \left( a\right) }{\Phi \left( b\right) -\Phi \left( a\right) }, \end{aligned}$$

for \(k=-1,0,1,2,\ldots \)

Proof:  See Lemma 2.3 in Kim (2008).
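
As a numerical illustration (not part of the original proof), the following R sketch checks the identity of Lemma 1 for a few values of k by one-dimensional numerical integration; the truncation interval (a, b) = (−0.5, 2) is an arbitrary choice.

# Numerical check of Lemma 1 for Z ~ TN_(a,b)(0,1)
a <- -0.5; b <- 2
const <- pnorm(b) - pnorm(a)                 # Phi(b) - Phi(a)
tn_moment <- function(k) {                   # E[Z^k] for the truncated normal
  integrate(function(z) z^k * dnorm(z) / const, lower = a, upper = b)$value
}
for (k in 0:3) {
  lhs <- (k + 1) * tn_moment(k) - tn_moment(k + 2)
  rhs <- (b^(k + 1) * dnorm(b) - a^(k + 1) * dnorm(a)) / const
  cat(sprintf("k = %d: lhs = %.6f, rhs = %.6f\n", k, lhs, rhs))
}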

Lemma 2

Let U be a positive random variable. Then \({F}_{SMN}\left( a\right) =E_U\left[ \Phi \left( aU^{\frac{1}{2}}\right) \right] ,\) where \({F}_{SMN}(\cdot )\) denotes the cdf of a standard SMN random variable, that is, when \(\mu =0\) and \(\sigma ^2=1\).

Proof: See Lemma 3 in Genç (2012).
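
For instance, taking U ~ Gamma(ν/2, ν/2) makes the standard SMN distribution a Student-t with ν degrees of freedom, and Lemma 2 can then be verified by Monte Carlo, as in the sketch below; the values ν = 4 and a = 1.3 are arbitrary choices.

# Monte Carlo check of Lemma 2 in the Student-t case: U ~ Gamma(nu/2, nu/2)
set.seed(1)
nu <- 4; a <- 1.3
u  <- rgamma(1e6, shape = nu / 2, rate = nu / 2)
mc <- mean(pnorm(a * sqrt(u)))               # E_U[ Phi(a U^{1/2}) ]
cat("E_U[Phi(a sqrt(U))] =", mc, "  F_t(a; nu) =", pt(a, df = nu), "\n")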

The following Corollary is a direct consequence of Proposition 1 given in Sect. 2.

Corollary 1

Let \(Y \sim \text {SMN}(\mu ,\sigma ^2, {\varvec{\nu }})\) with scale factor U and \(\mathcal{A}=(\text {a},\text {b})\). Then, for \(r \ge 1\),

$$\begin{aligned} \text {E}\left[ U^{r}|Y\in \mathcal{A}\right]&=\text {E}\left[ U^{r}|X\in \mathcal{A}^*\right] ;\\ \text {E}\left[ U^{r}Y|Y\in \mathcal{A}\right]&=\mu \text {E}\left[ U^{r}|X\in \mathcal{A}^*\right] + \sigma \text {E}\left[ U^{r}X|X\in \mathcal{A}^*\right] ;\\ \text {E}\left[ U^{r}Y^{2}|Y\in \mathcal{A}\right]&=\mu ^2\text {E}\left[ U^{r}|X\in \mathcal{A}^*\right] +2\mu \sigma \text {E}\left[ U^{r}X|X\in \mathcal{A}^*\right] \\&\quad +\sigma ^2\text {E}\left[ U^{r}X^{2}|X\in \mathcal{A}^*\right] , \end{aligned}$$

where \(X \sim \text {SMN }(0,1,{\varvec{\nu }})\) and \(\mathcal{A}^*=\left( a^*,b^*\right) \), with \(a^*=\left( a-\mu \right) /\sigma \) and \(b^*=\left( b-\mu \right) /\sigma \).
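
A minimal Monte Carlo illustration of Corollary 1 (again in the Student-t case, with r = 1) is sketched below; the two independently simulated sides should agree up to Monte Carlo error. All numerical settings are arbitrary choices.

# Monte Carlo illustration of Corollary 1 for r = 1 (Student-t case)
set.seed(2)
nu <- 4; mu <- 1; sigma <- 2; a <- 0; b <- 3
n  <- 1e6

# Left-hand side: simulate (U, Y) directly, with Y ~ SMN(mu, sigma^2, nu)
u1   <- rgamma(n, shape = nu / 2, rate = nu / 2)
y    <- mu + sigma * rnorm(n) / sqrt(u1)
in_A <- (y > a & y < b)
lhs  <- mean((u1 * y)[in_A])                 # E[U Y | Y in (a, b)]

# Right-hand side: an independent standard sample X ~ SMN(0, 1, nu)
u2 <- rgamma(n, shape = nu / 2, rate = nu / 2)
x  <- rnorm(n) / sqrt(u2)
a_star <- (a - mu) / sigma; b_star <- (b - mu) / sigma
in_As  <- (x > a_star & x < b_star)
rhs <- mu * mean(u2[in_As]) + sigma * mean((u2 * x)[in_As])

cat("lhs =", lhs, "  rhs =", rhs, "\n")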

Appendix 2: Derivations of quantities \(\text {E}_{\phi }\left( r,h\right) \) and \(\text {E}_{\Phi }\left( r,h\right) ~\)for SMN distributions

In this Appendix, we calculate the expressions for the expected values \(\text {E}_{\phi }\left( r,h\right) \) and \(\text {E}_{\Phi }\left( r,h\right) ,\) for \(r,h\ge 0\), given in Proposition 1.

1.1 Pearson type VII distribution (and the Student-t distribution)

In this case \(U\sim Gamma(\nu /2,\delta /2)\), with \(\nu >0~\text {and}~\delta >0.\) To simplify the notation, let \(\alpha _1=(\nu +2r)/2\) and \(\alpha _2=(h^2+\delta )/2\). Then,

$$\begin{aligned} \text { E}_{\phi }\left( r,h\right)&=\int ^{\infty }_{0}\frac{\delta ^{\frac{\nu }{2}} u^{\frac{\nu }{2}-1} u^r}{\sqrt{2\pi } \Gamma \left( \frac{\nu }{2}\right) 2^{\frac{\nu }{2}} }\exp \left\{ -\frac{u(h^2+\delta ) }{2}\right\} du \nonumber \\&= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) \delta ^{\frac{\nu }{2}}\left( \frac{h^2+\delta }{2}\right) ^{-\frac{\nu +2r}{2}}}{ \sqrt{2\pi }\Gamma \left( \frac{\nu }{2}\right) 2^\frac{\nu }{2}} \int ^{\infty }_{0}\frac{\alpha _2^{\alpha _1}u'^{\left\{ \alpha _1-1\right\} }}{\Gamma \left( \alpha _1\right) }\exp \left\{ -\alpha _2{u'}\right\} du' \nonumber \\&= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \sqrt{2\pi }\Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{\nu /2}\left( \frac{h^2+\delta }{2}\right) ^{-\frac{\nu +2r}{2}}, \end{aligned}$$
(26)

where the integrand in (26) is the pdf of a random variable \(U'\) with distribution \(Gamma\left( \alpha _1,\alpha _2\right) \).

$$\begin{aligned} {\text {E}}_{\Phi }\left( r,h\right)&=\int ^{\infty }_{0} \frac{u^{\frac{2r+\nu }{2}-1} \Phi \left( h u^\frac{1}{2}\right) \delta ^\frac{\nu }{2} }{2^\frac{\nu }{2} \Gamma \left( \frac{\nu }{2} \right) } \exp \left\{ -\frac{u\delta }{2}\right\} du \nonumber \\&\quad {}= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r} \int ^{\infty }_{0}\left( \frac{\delta }{2}\right) ^{\alpha _1}\frac{\Phi \left( h u'^{\{ \frac{1}{2} \} }\right) {u'}^{\left\{ {\alpha _1-1}\right\} }}{\Gamma \left( \alpha _1\right) }\exp \left\{ -\frac{u'\delta }{2}\right\} du' \nonumber \\&\quad {}= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r}E_{U'}\left[ \Phi \left( hU'^{\{ \frac{1}{2} \} }\right) \right] \nonumber \\&\quad {}=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r}F_{PVII}(h|\nu +2r,\delta ), \end{aligned}$$
(27)

where in (27) the expectation is computed with respect to \(U' \sim Gamma\left( \alpha _1,\delta /2\right) \) and \(F_{PVII}(\cdot )\) represents the cdf of the Pearson type VII distribution. Then, the result follows from Lemma 2. When \(\delta =\nu \), i.e., the Student-t distribution, we have that \(\text { E}_{\phi }\left( r,h\right) \) and \(\text { E}_{\Phi }\left( r,h\right) \) are given by

$$\begin{aligned} \text { E}_{\phi }\left( r,h\right)&=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{\Gamma \left( \frac{\nu }{2}\right) \sqrt{2\pi }}\left( \frac{\nu }{2}\right) ^{\frac{\nu }{2}}\left( \frac{h^2+\nu }{2}\right) ^{-\frac{\left( \nu +2r\right) }{2}};\\ \text {E}_{\Phi }\left( r,h\right)&=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{\Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\nu }{2}\right) ^{-r}F_{PVII}(h|\nu +2r,\nu ). \end{aligned}$$
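
These closed forms are straightforward to evaluate; the R sketch below codes the Student-t case and cross-checks \(\text {E}_{\phi }(r,h)\) against direct numerical integration over the Gamma(ν/2, ν/2) mixing density. The helper names Ephi_t and EPhi_t are ours, and the expression of F_PVII in terms of the Student-t cdf follows from the scale-mixture representation.

# E_phi(r, h) and E_Phi(r, h) in the Student-t case (delta = nu)
Ephi_t <- function(r, h, nu) {
  gamma((nu + 2 * r) / 2) / (gamma(nu / 2) * sqrt(2 * pi)) *
    (nu / 2)^(nu / 2) * ((h^2 + nu) / 2)^(-(nu + 2 * r) / 2)
}
EPhi_t <- function(r, h, nu) {
  # F_PVII(h | nu + 2r, nu) = P(T <= h sqrt((nu + 2r) / nu)), with T ~ t_{nu + 2r}
  gamma((nu + 2 * r) / 2) / gamma(nu / 2) * (nu / 2)^(-r) *
    pt(h * sqrt((nu + 2 * r) / nu), df = nu + 2 * r)
}

# Cross-check of Ephi_t by numerical integration over U ~ Gamma(nu/2, nu/2)
r <- 1; h <- 0.7; nu <- 4
num <- integrate(function(u) u^r * dnorm(h * sqrt(u)) * dgamma(u, nu / 2, rate = nu / 2),
                 lower = 0, upper = Inf)$value
cat("closed form:", Ephi_t(r, h, nu), "  numerical:", num, "\n")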

1.2 Slash distribution

In this case \(U \sim Beta(\nu ,1)\), with positive shape parameter \(\nu \), and

$$\begin{aligned} \text {E}_{\phi }\left( r,h\right)&=\int ^{1}_{0}u^r\frac{1}{\sqrt{2\pi }}\exp \left\{ -\frac{h^2}{2}u\right\} \nu u^{\nu -1}du = \frac{\nu }{\sqrt{2\pi }}\int ^{1}_{0}u^{\nu +r-1}\exp \left\{ -\frac{h^2}{2}u\right\} du, \nonumber \\&= \frac{\nu }{\sqrt{2\pi }}\left( \frac{h^2}{2}\right) ^{-\left( \nu +r\right) }\gamma ^{*}\left( \nu +r,\frac{h^2}{2}\right) \!, \end{aligned}$$
(28)

where \(\gamma ^{*}\left( a,x\right) =\int ^{x}_{0}e^{-t}t^{a-1}dt\) denotes the lower incomplete gamma function; this yields Eq. (28).

$$\begin{aligned} \text {E}_{\Phi }\left( r,h\right)&=\int ^{1}_{0}u^r\Phi \left( h u^\frac{1}{2}\right) \nu u^{\nu -1}du \nonumber \\&= \frac{\nu }{\nu +r}\int ^{1}_{0}\Phi \left( h u'^{\{ \frac{1}{2} \} }\right) {u'}^{\{\nu +r-1\}}\left( \nu +r\right) du' \end{aligned}$$
(29)
$$\begin{aligned}&= \frac{\nu }{\nu +r}F_{SL}(h|\nu +r), \end{aligned}$$
(30)

where the integral in (29) is, up to the factor \(\nu /(\nu +r)\), the expectation of \(\Phi (h U'^{\frac{1}{2}})\) with \({U'}\sim ~\text {Beta}(\nu +r,1)\). Using Lemma 2, we obtain Eq. (30), where \(F_{SL}(\cdot )\) is the cdf of the slash distribution.
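
In R, the slash-case quantities can be computed through the incomplete gamma function available via pgamma; the sketch below (with helper names of our own choosing) also cross-checks \(\text {E}_{\phi }(r,h)\) against the defining integral.

# Slash-case quantities; gamma(a) * pgamma(x, shape = a) equals the lower
# incomplete gamma function used in Eq. (28)
Ephi_slash <- function(r, h, nu) {
  a <- nu + r
  nu / sqrt(2 * pi) * (h^2 / 2)^(-a) * gamma(a) * pgamma(h^2 / 2, shape = a)
}
F_slash <- function(h, nu) {                 # cdf of the standard slash, via Lemma 2
  integrate(function(u) pnorm(h * sqrt(u)) * nu * u^(nu - 1), 0, 1)$value
}
EPhi_slash <- function(r, h, nu) nu / (nu + r) * F_slash(h, nu + r)

# Numerical cross-check of Ephi_slash
r <- 1; h <- 0.7; nu <- 2
num <- integrate(function(u) u^r * dnorm(h * sqrt(u)) * nu * u^(nu - 1), 0, 1)$value
cat("closed form:", Ephi_slash(r, h, nu), "  numerical:", num, "\n")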

1.3 Contaminated normal distribution

In this case, U is a discrete random variable taking the value \(\gamma \) with probability \(\nu \) and the value 1 with probability \(1-\nu \), with \(0<\nu <1\) and \(0<\gamma \le 1\). Hence,

$$\begin{aligned} \text {E}_{\phi }\left( r,h\right)&=\sum _{u\in \{\gamma ,1\}}u^r\phi \left( h u^\frac{1}{2}\right) \left[ \nu \mathbb {I}_{\{\gamma \}}(u)+(1-\nu ) \mathbb {I}_{\{1\}}(u)\right] \\&= \nu \gamma ^r\phi \left( h{\gamma }^\frac{1}{2}\right) +(1-\nu )\phi \left( h\right) \!;\\ \text {E}_{\Phi }\left( r,h\right)&=\sum _{u\in \{\gamma ,1\}}u^r\Phi \left( h u^\frac{1}{2}\right) \left[ \nu \mathbb {I}_{\{\gamma \}}(u)+(1-\nu ) \mathbb {I}_{\{1\}}(u)\right] \\&= \nu \gamma ^r\Phi \left( h{\gamma }^\frac{1}{2}\right) +(1-\nu )\Phi \left( h \right) \\&= \gamma ^r\left[ \nu \Phi \left( h{\gamma }^\frac{1}{2}\right) + \left( 1-\nu \right) \Phi \left( h\right) \right] +\left( 1-\nu \right) \left( 1-\gamma ^r\right) \Phi \left( h\right) \\&= \gamma ^rF_{CN}(h|\nu ,\gamma )+\left( 1-\nu \right) \left( 1-\gamma ^r\right) \Phi \left( h\right) \!, \end{aligned}$$

where \(F_{CN}(\cdot )\) is the cdf of the contaminated normal distribution.
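
Because the mixing distribution is two-point in this case, the above quantities reduce to finite sums; a short R sketch (helper names ours) that also confirms that the two forms of \(\text {E}_{\Phi }(r,h)\) coincide:

# Contaminated-normal quantities: direct two-term sums
Ephi_cn <- function(r, h, nu, gam) nu * gam^r * dnorm(h * sqrt(gam)) + (1 - nu) * dnorm(h)
F_cn    <- function(h, nu, gam)    nu * pnorm(h * sqrt(gam)) + (1 - nu) * pnorm(h)
EPhi_cn <- function(r, h, nu, gam) {
  direct  <- nu * gam^r * pnorm(h * sqrt(gam)) + (1 - nu) * pnorm(h)
  rewrite <- gam^r * F_cn(h, nu, gam) + (1 - nu) * (1 - gam^r) * pnorm(h)
  stopifnot(isTRUE(all.equal(direct, rewrite)))    # both expressions agree
  direct
}
EPhi_cn(r = 1, h = 0.7, nu = 0.3, gam = 0.2)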

Appendix 3: Details of the EM-type algorithm

In this Appendix, we derive the EM algorithm Eqs. (20)–(22). Let \({\varvec{\theta }}=({\varvec{\beta }}^{\top }, \sigma ^2, {\varvec{\nu }})\) be the vector with all parameters in the SMN-CR model and consider the notation given in Sect. 3.2. Denoting the complete-data likelihood by \(L(\cdot |\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u})\) and pdf’s in general by \(f(\cdot )\), we have that

$$\begin{aligned} L({\varvec{\theta }}|\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u})&=f(\mathbf {y}_{\text {obs}}, \mathbf {y}_{L},\mathbf {u})=f(\mathbf {y}_{\text {obs}},\mathbf {y}_{L}|\mathbf {u})f(\mathbf {u})\\&= f(\mathbf {y}|\mathbf {u})f(\mathbf {u}) = \prod ^n_{i=1}f(y_i|u_i)f(u_i|{\varvec{\nu }}). \end{aligned}$$

Dropping unimportant constants, the complete-data log-likelihood function is given by

$$\begin{aligned} \ell _c({\varvec{\theta }}|\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u})&= \log (L({\varvec{\theta }}|\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u}))\\&\quad {}= -\frac{n}{2} \log (\sigma ^2) +\frac{1}{2} \sum _{i=1}^{n}\log \left( u_i\right) - \frac{1}{2\sigma ^2}\sum _{i=1}^n u_i(y_i-\mathbf {x}^{\top }_i{\varvec{\beta }})^2 \\&\qquad + \sum _{i=1}^{n} \log \left( f(u_i|{\varvec{\nu }})\right) . \end{aligned}$$

The Q-function at the E-step of the algorithm is given by

$$\begin{aligned} Q({\varvec{\theta }}|{\varvec{\theta }}^{(k)}) =\text {E}_{{{\varvec{\theta }}^{(k)}}}\left[ \ell _{c}\left( {\varvec{\theta }}|\mathbf {Y}_\text {obs},\mathbf {Y}_L,\mathbf {U}\right) |\mathbf {y}_\text {obs}\right] , \end{aligned}$$

so we have

$$\begin{aligned} Q({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)})&= -\frac{n}{2}\log \left( \sigma ^2\right) -\frac{1}{2\sigma ^2} \sum _{i=1}^{n} \left\{ \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i^2|y_{\text {obs}_i}] \right. \\&\quad {} \left. -\,2 \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i|y_{\text {obs}_i}] \mathbf {x}^{\top }_i{\varvec{\beta }}+\text {E}_{{\varvec{\theta }}^{(k)}}[U_i|y_{\text {obs}_i}] (\mathbf {x}^{\top }_i{\varvec{\beta }})^2\right\} \\&\quad {} +\,\frac{1}{2} \sum _{i=1}^{n}\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}] + \sum _{i=1}^{n} \text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]. \end{aligned}$$

The expectations \(\mathcal{E}_{si}\big ({\varvec{\theta }}^{(k)}\big ) = \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i^s|y_{\text {obs}_i}],\,\,\, s=0,1,2\), used in the E-step of the algorithm, are computed by considering two possible cases: (i) observation i is uncensored and (ii) observation i is censored. In the former case we solve the problem using results obtained by Osorio et al. (2007). In the latter case we use Proposition 1. Then, we have

$$\begin{aligned} Q({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)}) =&-\frac{n}{2}\log (\sigma ^2)-\frac{1}{2\sigma ^2} \sum _{i=1}^{n} \left[ \mathcal{E}_{2 i}\big ({\varvec{\theta }}^{(k)}\big ) -2 \mathcal{E}_{1 i}\big ({\varvec{\theta }}^{(k)}\big ) \mathbf {x}^{\top }_i{\varvec{\beta }}+\mathcal{E}_{0 i}\big ({\varvec{\theta }}^{(k)}\big ) \big (\mathbf {x}^{\top }_i{\varvec{\beta }}\big )^2\right] \\&+\frac{1}{2} \sum _{i=1}^{n}\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}] + \sum _{i=1}^{n} \text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]. \end{aligned}$$
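
To fix ideas, the R sketch below computes these moments in the simplest special case, the normal censored model (\(U_i \equiv 1\)) with left censoring at \(\kappa _i\), where the required truncated-normal moments follow from Lemma 1 with \(a=-\infty \) and \(b=(\kappa _i-\mathbf {x}^{\top }_i{\varvec{\beta }})/\sigma \). Both this special case and the helper name estep_normal are illustrative choices of ours.

# E-step moments E_{si} = E[U_i Y_i^s | y_obs_i], s = 0, 1, 2, for the normal
# special case (U_i = 1) with left censoring at kappa_i.
# y, cc, kappa, mu: vectors of responses, censoring indicators, censoring
# points and linear predictors x_i' beta; sigma2: current value of sigma^2.
estep_normal <- function(y, cc, kappa, mu, sigma2) {
  sigma <- sqrt(sigma2)
  E0 <- rep(1, length(y))                    # E[U_i | y_obs_i] = 1 when U_i = 1
  E1 <- y; E2 <- y^2                         # uncensored observations: Y_i observed
  cens <- which(cc == 1)
  if (length(cens) > 0) {
    b   <- (kappa[cens] - mu[cens]) / sigma  # standardized censoring point
    lam <- dnorm(b) / pnorm(b)               # phi(b) / Phi(b)
    EZ  <- -lam                              # E[Z | Z < b], Z standard normal
    EZ2 <- 1 - b * lam                       # E[Z^2 | Z < b], Lemma 1 with k = 0
    E1[cens] <- mu[cens] + sigma * EZ
    E2[cens] <- mu[cens]^2 + 2 * mu[cens] * sigma * EZ + sigma2 * EZ2
  }
  list(E0 = E0, E1 = E1, E2 = E2)
}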

In the CM-step, we take the derivatives of \(Q\big ({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)}\big )\) with respect to \({\varvec{\beta }}\) and \(\sigma ^2\) and equate them to zero.

The solution of \(\displaystyle \frac{\partial Q\big ({\varvec{\theta }}|{\varvec{\theta }}^{(k)}\big )}{\partial {\varvec{\beta }}} = 0\) is

$$\begin{aligned} {{\varvec{\beta }}}^{(k+1)}=\left( \sum ^n_{i=1}\mathcal{E}_{0i}\big ({\varvec{\theta }}^{(k)}\big )\mathbf {x}_i\mathbf {x}^{\top }_i\right) ^ {-1}\sum ^n_{i=1}\mathbf {x}_i\mathcal{E}_{1i}\big ({\varvec{\theta }}^{(k)}\big ). \end{aligned}$$

The solution of \(\displaystyle \frac{\partial Q\big ({\varvec{\theta }}|{\varvec{\theta }}^{(k)}\big )}{\partial \sigma ^2} = 0\) is

$$\begin{aligned} {\sigma ^2}^{(k+1)}&=\frac{1}{n}\sum ^n_{i=1}\left[ \mathcal{E}_{2i}\big ({\varvec{\theta }}^{(k)}\big )-2\mathcal{E}_{1i}\big ({\varvec{\theta }}^{(k)}\big ) \mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)} +\mathcal{E}_{0i}\big ({\varvec{\theta }}^{(k)}\big )\big (\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)}\big )^2\right] . \end{aligned}$$
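
Given the moments \(\mathcal{E}_{0i}\), \(\mathcal{E}_{1i}\) and \(\mathcal{E}_{2i}\) (for instance from the normal-case sketch above), both CM-step updates are one-liners: a weighted least-squares step for \({\varvec{\beta }}\) and a moment-matching update for \(\sigma ^2\). The helper name cm_step below is ours.

# CM-step updates for beta and sigma^2 given the E-step moments
cm_step <- function(X, E) {                  # X: n x p design matrix; E: list(E0, E1, E2)
  beta <- solve(crossprod(X * E$E0, X), crossprod(X, E$E1))
  xb   <- as.vector(X %*% beta)
  sigma2 <- mean(E$E2 - 2 * E$E1 * xb + E$E0 * xb^2)
  list(beta = as.vector(beta), sigma2 = sigma2)
}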

For the CML-step, we estimate \({\varvec{\nu }}\) by maximizing the marginal log-likelihood, circumventing the (in general) complicated task of computing \(\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}]\) and \(\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]\), i.e.,

$$\begin{aligned} {\varvec{\nu }}^{(k+1)}&=\text {argmax}_{{\varvec{\nu }}}\left\{ \sum _{i=1}^{m} \log \left[ {F}_{SMN}\left( \frac{\kappa _i-\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)}}{\sigma ^{(k+1)}}\right) \right] \right. \\&\quad {} + \left. \sum ^{n}_{i=m+1} \log \left[ f_{SMN}\big (y_i|\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)},{\sigma ^2}^{(k+1)},{\varvec{\nu }}\big )\right] \right\} . \end{aligned}$$
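
In the Student-t case this CML-step reduces to a one-dimensional optimization over \(\nu \), since \(F_{SMN}\) and \(f_{SMN}\) are then the cdf and pdf of a scaled t distribution; the sketch below carries it out with optimize, assuming left censoring at \(\kappa _i\). The search interval for \(\nu \) and the helper name cml_step_t are our own choices.

# CML-step for nu in the Student-t case with left censoring at kappa_i
cml_step_t <- function(y, cc, kappa, X, beta, sigma2, lower = 1.1, upper = 50) {
  xb <- as.vector(X %*% beta); s <- sqrt(sigma2)
  loglik <- function(nu) {
    z_cens <- (kappa[cc == 1] - xb[cc == 1]) / s   # standardized censoring points
    z_obs  <- (y[cc == 0] - xb[cc == 0]) / s       # standardized uncensored residuals
    sum(pt(z_cens, df = nu, log.p = TRUE)) +       # log F_SMN terms (censored part)
      sum(dt(z_obs, df = nu, log = TRUE) - log(s)) # log f_SMN terms (uncensored part)
  }
  optimize(loglik, lower = lower, upper = upper, maximum = TRUE)$maximum
}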

Appendix 4: Complementary results of the simulation studies (asymptotic properties)

Figures 5 and 6 depict the average bias and the average MSE of \(\widehat{\beta }_1\), \(\widehat{\beta }_2\) and \(\widehat{\sigma ^2}\) for the levels of censoring \(p=25\,\%\) and \(p=45\,\%\), respectively.

Fig. 5

Average bias (first row) and average MSE (second row) of \(\widehat{\beta _1},\widehat{\beta _2}\) and \(\widehat{\sigma ^2}\) from the SMN-CR models for level of censoring \(p=25\,\%\)

Fig. 6

Average bias (first row) and average MSE (second row) of \(\widehat{\beta _1},\widehat{\beta _2}\) and \(\widehat{\sigma ^2}\) from the SMN-CR models for level of censoring \(p=45\,\%\)

Cite this article

Garay, A.M., Lachos, V.H., Bolfarine, H. et al. Linear censored regression models with scale mixtures of normal distributions. Stat Papers 58, 247–278 (2017). https://doi.org/10.1007/s00362-015-0696-9
