
Shape mixtures of skew-t-normal distributions: characterizations and estimation


This paper introduces the shape mixtures of skew-t-normal (SMSTN) distributions, a flexible extension of the skew-t-normal distribution that contains one additional shape parameter to regulate skewness and kurtosis. We study some of its main characterizations, showing in particular that it arises through a mixture on the shape parameter of the skew-t-normal distribution when the mixing distribution is normal. We develop an Expectation Conditional Maximization Either (ECME) algorithm for carrying out maximum likelihood estimation. The asymptotic standard errors of the estimators are obtained via an information-based approximation. The numerical performance of the proposed methodology is illustrated through simulated and real data examples.
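As a rough illustration of the mixture characterization described above, one can simulate from the model by first drawing a shape value from a normal mixing distribution and then drawing from a skew-t-normal with that shape. The sketch below is an assumption-laden illustration, not the authors' code: it takes the mixing law to be \(N(\lambda ,\alpha )\), which via the identity \(E[\varPhi (a+bZ)]=\varPhi (a/\sqrt{1+b^{2}})\) reproduces a skewing factor of the form \(\varPhi (\lambda u/\sqrt{1+\alpha u^{2}})\) appearing in Appendix A, and it samples the skew-t-normal step by accept-reject against a Student-\(t\) proposal.

```python
import math
import random

def rskew_t_normal(lam, nu, rng):
    """One draw from the standard skew-t-normal (density prop. to t_nu(x) * Phi(lam*x))
    by accept-reject: propose X ~ t_nu, accept with probability Phi(lam * X)."""
    while True:
        # t_nu draw: normal divided by sqrt(chi-square_nu / nu)
        x = rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
        if rng.random() < 0.5 * (1.0 + math.erf(lam * x / math.sqrt(2.0))):
            return x

def rsmstn(n, xi, sigma, lam, alpha, nu, seed=0):
    """n draws from the shape mixture: shape ~ N(lam, alpha), then skew-t-normal."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        shape = rng.gauss(lam, math.sqrt(alpha))  # normal mixing on the shape
        out.append(xi + sigma * rskew_t_normal(shape, nu, rng))
    return out

y = rsmstn(1000, 0.0, 1.0, 2.0, 0.5, 5.0)
print(len(y), sum(y) / len(y))  # positive shape => right-skewed, positive mean
```

The accept-reject step is valid because the target density is proportional to the proposal density times an acceptance probability bounded by one.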






We gratefully acknowledge the chief editor, the associate editor and two anonymous referees for their valuable comments and suggestions, which led to a greatly improved version of this article. This research was supported by MOST 105-2118-M-005-003-MY2 awarded by the Ministry of Science and Technology of Taiwan.

Author information



Corresponding author

Correspondence to Tsung-I Lin.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 537 KB)


Appendix A: The score function and Hessian matrix

From (14), the log-likelihood function corresponding to the jth observation is

$$\begin{aligned} \log f(y_{j};\varvec{\theta }) ={}& -\log \sigma -\frac{1}{2}\log \nu +\log \varGamma \left( \frac{\nu +1}{2}\right) -\log \varGamma \left( \frac{\nu }{2}\right) \\ &-\frac{\nu +1}{2}\log \left( 1+\frac{u_{j}^{2}}{\nu }\right) +\log \varPhi \left( \frac{\lambda u_{j}}{\sqrt{1+\alpha u_{j}^{2}}}\right) , \end{aligned} \tag{A.1}$$

where \(u_{j}=(y_{j}-\xi )/\sigma \). Let \(s_{j}(\varvec{\theta })=(s_{j,\xi },s_{j,\sigma },s_{j,\lambda },s_{j,\alpha },s_{j,\nu })\) be a \(5\times 1\) vector. Based on the definition of \(s_{j}(\varvec{\theta })\) in Sect. 3.2, explicit expressions for the components of \(s_{j}(\varvec{\theta })\) are obtained by differentiating (A.1) with respect to each parameter. They are given by

$$\begin{aligned} s_{j,\xi } &= \frac{1}{\sigma }\left( \zeta _{j}u_{j}-\frac{\lambda R_{j}}{\omega _{j}}\right) , \qquad s_{j,\sigma }=\frac{1}{\sigma }\left( \zeta _{j}u_{j}^{2}-1-\frac{\lambda R_{j}u_{j}}{\omega _{j}}\right) ,\\ s_{j,\lambda } &= \frac{R_{j}u_{j}}{\omega _{j}^{1/3}}, \qquad s_{j,\alpha }=-\frac{\lambda R_{j}u_{j}^{3}}{2\omega _{j}},\\ s_{j,\nu } &= \frac{1}{2}\left\{ \mathrm{DG}\left( \frac{\nu +1}{2}\right) -\mathrm{DG}\left( \frac{\nu }{2}\right) -\frac{1}{\nu }-\log \left( 1+\frac{u_{j}^{2}}{\nu }\right) +\zeta _{j}\frac{u_{j}^{2}}{\nu }\right\} , \end{aligned}$$

where \(\zeta _{j}=(\nu +1)/(\nu +u_{j}^{2})\), \(\omega _{j}=(1+\alpha u_{j}^{2})^{3/2}\), \(R_{j}=\phi (\lambda u_{j}\omega _{j}^{-1/3})/\varPhi (\lambda u_{j}\omega _{j}^{-1/3})\), and \(\mathrm{DG}(x)=d\log \varGamma (x)/dx\) denotes the digamma function. The Hessian matrix consisting of the second partial derivatives of the SMSTN log-likelihood takes the form
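Since these derivatives are easy to mistype, a quick numerical sanity check is to compare the analytic score above against a central finite-difference gradient of the log-density in (A.1). The sketch below is a minimal, self-contained check; the parameter values are arbitrary, and the digamma function is itself approximated by differencing `lgamma`, which suffices for this purpose.

```python
import math

def logf(y, xi, sigma, lam, alpha, nu):
    """Log-density of one observation, term by term as in (A.1)."""
    u = (y - xi) / sigma
    t = lam * u / math.sqrt(1.0 + alpha * u * u)
    Phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return (-math.log(sigma) - 0.5 * math.log(nu)
            + math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
            - 0.5 * (nu + 1.0) * math.log(1.0 + u * u / nu)
            + math.log(Phi))

def digamma(x, h=1e-6):
    """Digamma via central difference of lgamma (adequate for a sanity check)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def score(y, xi, sigma, lam, alpha, nu):
    """Analytic score (s_xi, s_sigma, s_lambda, s_alpha, s_nu) as given above."""
    u = (y - xi) / sigma
    zeta = (nu + 1.0) / (nu + u * u)
    omega = (1.0 + alpha * u * u) ** 1.5                 # omega_j
    t = lam * u / omega ** (1.0 / 3.0)
    R = (math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)) / \
        (0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))     # R_j = phi/Phi
    s_xi = (zeta * u - lam * R / omega) / sigma
    s_sigma = (zeta * u * u - 1.0 - lam * R * u / omega) / sigma
    s_lam = R * u / omega ** (1.0 / 3.0)
    s_alpha = -lam * R * u ** 3 / (2.0 * omega)
    s_nu = 0.5 * (digamma(0.5 * (nu + 1.0)) - digamma(0.5 * nu)
                  - 1.0 / nu - math.log(1.0 + u * u / nu) + zeta * u * u / nu)
    return [s_xi, s_sigma, s_lam, s_alpha, s_nu]

# Compare against a central finite-difference gradient at an arbitrary point.
theta = [0.1, 1.3, 0.8, 0.5, 4.0]   # (xi, sigma, lambda, alpha, nu), illustrative
y_obs, h = 0.7, 1e-6
fd = []
for k in range(5):
    tp, tm = list(theta), list(theta)
    tp[k] += h
    tm[k] -= h
    fd.append((logf(y_obs, *tp) - logf(y_obs, *tm)) / (2.0 * h))
err = max(abs(a - b) for a, b in zip(score(y_obs, *theta), fd))
print(err)  # should be tiny (finite-difference noise only)
```

Note that additive constants of the log-density are irrelevant to the score, so any normalizing-constant convention gives the same check.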

$$\begin{aligned} H_{j}(\varvec{\theta })=\left[ \begin{array}{ccccc} H_{j}^{\xi \xi } & H_{j}^{\xi \sigma } & H_{j}^{\xi \lambda } & H_{j}^{\xi \alpha } & H_{j}^{\xi \nu }\\ H_{j}^{\sigma \xi } & H_{j}^{\sigma \sigma } & H_{j}^{\sigma \lambda } & H_{j}^{\sigma \alpha } & H_{j}^{\sigma \nu }\\ H_{j}^{\lambda \xi } & H_{j}^{\lambda \sigma } & H_{j}^{\lambda \lambda } & H_{j}^{\lambda \alpha } & H_{j}^{\lambda \nu }\\ H_{j}^{\alpha \xi } & H_{j}^{\alpha \sigma } & H_{j}^{\alpha \lambda } & H_{j}^{\alpha \alpha } & H_{j}^{\alpha \nu }\\ H_{j}^{\nu \xi } & H_{j}^{\nu \sigma } & H_{j}^{\nu \lambda } & H_{j}^{\nu \alpha } & H_{j}^{\nu \nu } \end{array} \right] . \end{aligned}$$

The detailed expressions for the components of \(H_{j}(\varvec{\theta })\) are shown below.

$$\begin{aligned} H_{j}^{\xi \xi } &= -\frac{1}{\sigma ^{2}}\left\{ \frac{\zeta _{j}(\nu -u_{j}^{2})}{(\nu +u_{j}^{2})}+\frac{\lambda R_{j}}{\omega _{j}^{2}}(A_j+3\alpha u_{j}\omega _{j}^{1/3})\right\} ,\\ H_{j}^{\xi \sigma } &= -\frac{1}{\sigma ^{2}}\left\{ \frac{2\zeta _{j}\nu u_{j}}{(\nu +u_{j}^{2})}-\frac{\lambda R_{j}}{\omega _{j}}+\frac{\lambda R_{j}u_{j}}{\omega _{j}^{2}}(A_j+3\alpha u_{j}\omega _{j}^{1/3})\right\} ,\\ H_{j}^{\sigma \sigma } &= -\frac{1}{\sigma ^{2}}\left\{ \frac{\zeta _{j}u_{j}^{2}}{(\nu +u_{j}^{2})}\left( 3\nu +u_{j}^{2}\right) -1-\frac{2\lambda R_{j}u_{j}}{\omega _{j}}+\frac{\lambda R_{j}u_{j}^{2}}{\omega _{j}^{2}}(A_j+3\alpha u_{j}\omega _{j}^{1/3})\right\} ,\\ H_{j}^{\xi \lambda } &= -\frac{R_{j}}{\sigma \omega _{j}}(1-u_{j}\omega _{j}^{-1/3}A_j), \qquad H_{j}^{\sigma \lambda }=-\frac{R_{j}u_{j}}{\sigma \omega _{j}}(1-u_{j}\omega _{j}^{-1/3}A_j),\\ H_{j}^{\lambda \lambda } &= -\frac{R_{j}u_{j}^{2}}{\omega _{j}^{2/3}}(R_{j}+\lambda u_{j}\omega _{j}^{-1/3}), \qquad H_{j}^{\alpha \lambda }=-\frac{R_{j}u_{j}^{3}}{2\omega _{j}}(1-u_{j}\omega _{j}^{-1/3}A_j),\\ H_{j}^{\xi \alpha } &= -\frac{\lambda R_{j}u_{j}^{2}}{2\sigma \omega _{j}^{2}}(u_jA_j-3\omega _{j}^{1/3}), \qquad H_{j}^{\sigma \alpha }=-\frac{\lambda R_{j}u_{j}^{3}}{2\sigma \omega _{j}^{2}}(u_jA_j-3\omega _{j}^{1/3}),\\ H_{j}^{\alpha \alpha } &= -\frac{\lambda R_{j}u_{j}^{5}}{4\omega _{j}^{2}}(u_jA_j-3\omega _{j}^{1/3}), \qquad H_{j}^{\xi \nu }=-\frac{1}{\sigma }\frac{u_{j}(1-u_{j}^{2})}{(\nu +u_{j}^{2})^{2}},\\ H_{j}^{\nu \nu } &= \frac{1}{2}\left\{ \frac{1}{2}\mathrm{TG}\left( \frac{\nu +1}{2}\right) -\frac{1}{2}\mathrm{TG}\left( \frac{\nu }{2}\right) +\frac{\nu +u_{j}^{4}}{\nu \left( \nu +u_{j}^{2}\right) ^{2}}\right\} ,\\ H_{j}^{\sigma \nu } &= -\frac{1}{\sigma }\frac{u_{j}^{2}(1-u_{j}^{2})}{(\nu +u_{j}^{2})^{2}}, \qquad H_{j}^{\lambda \nu }=H_{j}^{\alpha \nu }=0, \end{aligned}$$

where \(H_j^{\theta _r\theta _s}=H_j^{\theta _s\theta _r}\), \(A_j=\lambda R_{j}+\lambda ^{2}u_{j}\omega _{j}^{-1/3}\) and \(\mathrm{TG}(x)=d^{2}\log \varGamma (x)/dx^{2}\) is the trigamma function.
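The zero entries \(H_{j}^{\lambda \nu }=H_{j}^{\alpha \nu }=0\) follow because \(\nu \) enters only the Student-\(t\) kernel while \(\lambda \) and \(\alpha \) enter only the \(\varPhi \) term; a finite-difference Hessian of the log-density confirms this numerically. The sketch below is illustrative (arbitrary test point, step size chosen for a rough check only):

```python
import math

def logf(y, xi, sigma, lam, alpha, nu):
    """Log-density of one observation as in (A.1)."""
    u = (y - xi) / sigma
    t = lam * u / math.sqrt(1.0 + alpha * u * u)
    Phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return (-math.log(sigma) - 0.5 * math.log(nu)
            + math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
            - 0.5 * (nu + 1.0) * math.log(1.0 + u * u / nu)
            + math.log(Phi))

def num_hessian(y, theta, h=1e-4):
    """5x5 central finite-difference Hessian of logf in (xi, sigma, lambda, alpha, nu)."""
    def f(t):
        return logf(y, *t)
    H = [[0.0] * 5 for _ in range(5)]
    for r in range(5):
        for s in range(5):
            tpp = list(theta); tpp[r] += h; tpp[s] += h
            tpm = list(theta); tpm[r] += h; tpm[s] -= h
            tmp = list(theta); tmp[r] -= h; tmp[s] += h
            tmm = list(theta); tmm[r] -= h; tmm[s] -= h
            H[r][s] = (f(tpp) - f(tpm) - f(tmp) + f(tmm)) / (4.0 * h * h)
    return H

# Indices: 0 = xi, 1 = sigma, 2 = lambda, 3 = alpha, 4 = nu (illustrative point)
H = num_hessian(0.7, [0.1, 1.3, 0.8, 0.5, 4.0])
print(abs(H[2][4]), abs(H[3][4]))  # H^{lambda,nu} and H^{alpha,nu}: both ~0
```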

Appendix B: The procedure of the Kolmogorov–Smirnov test for continuous data

  1. Sort the data values into ascending order \(y_{(1)}\le y_{(2)}\le \cdots \le y_{(n)}\).

  2. Compute the KS test statistic

     $$\begin{aligned} D=\max _{j=1,\ldots ,n}\left\{ \frac{j}{n}-\hat{F}(y_{(j)}),~\hat{F}(y_{(j)})-\frac{j-1}{n}\right\} , \end{aligned}$$

     where \(\hat{F}(\cdot )\) is the fitted cdf under a specific distribution.

  3. For \(i=1,\ldots ,M\), generate \(n\) random numbers from U(0, 1) and sort them into ascending order to obtain \(u^{(i)}_{(1)}\le u^{(i)}_{(2)}\le \cdots \le u^{(i)}_{(n)}\).

  4. Compute

     $$\begin{aligned} d_i=\max _{j=1,\ldots ,n}\left\{ \frac{j}{n}-u_{(j)}^{(i)},~u_{(j)}^{(i)}-\frac{j-1}{n}\right\} . \end{aligned}$$

  5. Set \(I_i=1\) if \(d_i\ge D\) and \(I_i=0\) otherwise. Repeating Steps 3 and 4 \(M\) times yields \(I_1,\ldots ,I_M\), and the p-value is estimated by \(\sum ^{M}_{i=1} I_i/M\).
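The steps above can be sketched as follows. This is a minimal illustration, not the authors' code; it uses a normal fit for \(\hat F\) purely as a stand-in, and the sample sizes are arbitrary.

```python
import math
import random

def normal_cdf(x, mu=0.0, sd=1.0):
    """Stand-in fitted cdf; any fitted distribution's cdf could be used here."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def ks_stat(sorted_u):
    """Steps 2/4: D = max_j max{ j/n - F_(j), F_(j) - (j-1)/n } (0-indexed here)."""
    n = len(sorted_u)
    return max(max((j + 1) / n - sorted_u[j], sorted_u[j] - j / n)
               for j in range(n))

def ks_pvalue(data, fitted_cdf, M=2000, seed=1):
    """Steps 1-5: simulated p-value of the KS statistic."""
    rng = random.Random(seed)
    y = sorted(data)                            # Step 1
    D = ks_stat([fitted_cdf(v) for v in y])     # Step 2
    n = len(y)
    hits = 0
    for _ in range(M):                          # Steps 3-5
        u = sorted(rng.random() for _ in range(n))
        if ks_stat(u) >= D:
            hits += 1
    return D, hits / M

# Illustrative run: data actually drawn from the hypothesized N(0, 1).
rng = random.Random(7)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
D, p = ks_pvalue(data, normal_cdf)
print(D, p)
```

The simulation exploits the fact that under the null, \(\hat F(y_{(j)})\) behaves like sorted U(0, 1) draws, so the reference distribution of \(D\) can be generated without refitting.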


Cite this article

Tamandi, M., Jamalizadeh, A. & Lin, TI. Shape mixtures of skew-t-normal distributions: characterizations and estimation. Comput Stat 34, 323–347 (2019).



Keywords

  • Asymmetry
  • ECME algorithm
  • Observed information matrix
  • Robustness
  • Skew-symmetric
  • Truncated normal