
Maximum Likelihood Estimation for the Asymmetric Exponential Power Distribution


Abstract

The asymmetric exponential power (AEP) distribution has received much attention in economics and finance. A simulation study shows that the iterative methods developed for finding the maximum likelihood (ML) estimates of the AEP distribution sometimes fail to converge. In this paper, an expectation–maximization (EM) algorithm, which always converges, is proposed for finding the ML estimates of the AEP distribution. Performance of the EM algorithm is demonstrated by simulations and a real data illustration. As an application, the proposed EM algorithm is applied to find the ML estimates of the regression coefficients when the error term in a linear regression model follows the AEP distribution. Performance of the AEP distribution in robust simple regression modelling is established through a real data illustration.


References

  • Ayebo, A., & Kozubowski, T. J. (2003). An asymmetric generalization of Gaussian and Laplace laws. Journal of Probability and Statistical Science, 1, 187–210.

  • Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 171–178.

  • Basso, R. M., Lachos, V. H., Cabral, C. R. B., & Ghosh, P. (2010). Robust mixture modeling based on scale mixtures of skew-normal distributions. Computational Statistics & Data Analysis, 54, 2926–2941.

  • Butler, R. J., McDonald, J. B., Nelson, R. D., & White, S. B. (1990). Robust and partially adaptive estimation of regression models. The Review of Economics and Statistics, 72, 321–327.

  • Christoffersen, P., Dorion, C., Jacobs, K., & Wang, Y. (2010). Volatility components, affine restrictions, and nonnormal innovations. Journal of Business & Economic Statistics, 28, 483–502.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39, 1–38.

  • Devroye, L. (2009). Random variate generation for exponentially and polynomially tilted stable distributions. ACM Transactions on Modeling and Computer Simulation (TOMACS), 19, 1–20.

  • DiCiccio, T. J., & Monti, A. C. (2004). Inferential aspects of the skew exponential power distribution. Journal of the American Statistical Association, 99, 439–450.

  • Diebolt, J., & Celeux, G. (1993). Asymptotic properties of a stochastic EM algorithm for estimating mixing proportions. Stochastic Models, 9, 599–613.

  • Duan, J. C. (1999). Conditionally fat-tailed distributions and the volatility smile in options. Rotman School of Management, University of Toronto, Working Paper.

  • Eugene, N., Lee, C., & Famoye, F. (2002). Beta-normal distribution and its applications. Communications in Statistics-Theory and Methods, 31, 497–512.

  • Fama, E. F. (1963). Mandelbrot and the stable Paretian hypothesis. The Journal of Business, 36, 420–429.

  • Fama, E. F. (1965). The behavior of stock-market prices. The Journal of Business, 38, 34–105.

  • Fernandez, C., Osiewalski, J., & Steel, M. F. (1995). Modeling and inference with \(\upsilon \)-spherical distributions. Journal of the American Statistical Association, 90, 1331–1340.

  • Gilks, W. R., & Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Journal of the Royal Statistical Society: Series C (Applied Statistics), 41, 337–348.

  • Hsieh, D. A. (1989). Modeling heteroscedasticity in daily foreign-exchange rates. Journal of Business & Economic Statistics, 7, 307–317.

  • Ip, E. (1994). A stochastic EM estimator for handling missing data. Unpublished Ph. D. Thesis, Department of Statistics, Stanford University.

  • Karlis, D., & Xekalaki, E. (2003). Choosing initial values for the EM algorithm for finite mixtures. Computational Statistics & Data Analysis, 41, 577–590.

  • Komunjer, I. (2007). Asymmetric power distribution: Theory and applications to risk measurement. Journal of Applied Econometrics, 22, 891–921.

  • Lee, S., & McLachlan, G. J. (2014). Finite mixtures of multivariate skew t-distributions: Some recent and new results. Statistics and Computing, 24, 181–202.

  • Lin, T.-I. (2010). Robust mixture modeling using multivariate skew t distributions. Statistics and Computing, 20, 343–356.

  • Lin, T. I., Lee, J. C., & Hsieh, W. J. (2007a). Robust mixture modeling using the skew t distribution. Statistics and Computing, 17, 81–92.

  • Lin, T. I., Lee, J. C., & Yen, S. Y. (2007b). Finite mixture modelling using the skew normal distribution. Statistica Sinica, 17, 909–927.

  • Lindsay, B. G. (1995). Mixture models: Theory, geometry and applications. In NSF-CBMS regional conference series in probability and statistics, i-163. JSTOR.

  • Mandelbrot, B. (1963). The variation of certain speculative prices. The Journal of Business, 36, 394–419.

  • McLachlan, G., & Krishnan, T. (2007). The EM algorithm and extensions (Vol. 382). Wiley.

  • McLachlan, G. J., & Peel, D. (1998). Robust cluster analysis via mixtures of multivariate t-distributions. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 658–666). Springer.

  • Mudholkar, G. S., & Hutson, A. D. (2000). The epsilon–skew–normal distribution for analyzing near-normal data. Journal of Statistical Planning and Inference, 83, 291–309.

  • Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica, 59, 347–370.

  • Nolan, J. P. (1998). Parameterizations and modes of stable distributions. Statistics & Probability Letters, 38, 187–195.

  • Nolan, J. P., & Ojeda-Revah, D. (2013). Linear and nonlinear regression with stable errors. Journal of Econometrics, 172, 186–194.

  • Prates, M. O., Lachos, V. H., & Cabral, C. (2013). mixsmsn: Fitting finite mixture of scale mixture of skew-normal distributions. Journal of Statistical Software, 54, 1–20.

  • Rachev, S. T. (2003). Handbook of heavy tailed distributions in finance: Handbooks in finance, Book 1. Elsevier.

  • Rachev, S. T., & Mittnik, S. (2000). Stable Paretian models in finance (Vol. 7). Wiley.

  • Samorodnitsky, G., & Taqqu, M. S. (1994). Stable non-Gaussian random processes: Stochastic models with infinite variance. Chapman & Hall.

  • Teimouri, M., Doser, J. W., & Finley, A. O. (2020). ForestFit: An R package for modeling plant size distributions. Environmental Modelling & Software, 131, 104668.

  • Teimouri, M., Rezakhah, S., & Mohammadpour, A. (2018). EM algorithm for symmetric stable mixture model. Communications in Statistics-Simulation and Computation, 47, 582–604.

  • Theodossiou, P. (2015). Skewed generalized error distribution of financial assets and option pricing. Multinational Finance Journal, 19, 223–266.

  • Zhu, D., & Galbraith, J. W. (2010). A generalized asymmetric Student-t distribution with application to financial econometrics. Journal of Econometrics, 157, 297–305.

  • Zhu, D., & Zinde-Walsh, V. (2009). Properties and estimation of asymmetric exponential power distribution. Journal of Econometrics, 148, 86–99.

Acknowledgements

The authors would like to thank the Editor and the two referees for careful reading and comments which greatly improved the paper.

Funding

No funds were received for this research.

Author information

Contributions

MT - methods, SN - simulations and data analysis.

Corresponding author

Correspondence to Saralees Nadarajah.

Ethics declarations

Conflict of interest

The authors have no conflict of interest.

Ethical Standard

We have complied with all ethical standards. Research did not involve human participants and/or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Proof of Theorem 2.1

The pdf of Y in (2) can be rewritten as

$$\begin{aligned} \displaystyle f_{Y}(y | {\varvec{\theta }})=\frac{1}{2\sigma \Gamma \left( 1+\frac{1}{\alpha }\right) } \exp \left\{ -\left| \frac{y-\mu }{\sigma \left[ 1+\mathrm{sign}(y-\mu )\epsilon \right] }\right| ^{\alpha }\right\} . \end{aligned}$$

Defining \(Y= \sigma X / \sqrt{2W} +\mu \), we can write

$$\begin{aligned} \displaystyle F_{Y} \left( y | {\varvec{\theta }}\right) = P\left( Y\le y\right) =\int _{0}^{\infty }P\left( X\le \frac{y-\mu }{\sigma }\sqrt{2w}\right) f_{W}(w)dw. \end{aligned}$$
(14)

Differentiating the right-hand side of (14) with respect to y yields

$$\begin{aligned} \displaystyle f_{Y}(y|{\varvec{\theta }})=\int _{0}^{\infty }\frac{\sqrt{2w}}{\sigma }f_{X}\left( \frac{y-\mu }{\sigma }\sqrt{2w}\right) f_{W}(w)dw. \end{aligned}$$
(15)

The pdf of X in (4) can be rewritten as

$$\begin{aligned} \displaystyle f_{X}(x)=\frac{1}{\sqrt{2\pi }} \exp \left\{ -\frac{x^2}{2 \left[ 1+\mathrm{sign}(x)\epsilon \right] ^2}\right\} . \end{aligned}$$
(16)

Substituting the pdf of X in (16) into the right-hand side of (15), we see that

$$\begin{aligned} \displaystyle f_{Y}(y | {\varvec{\theta }}) = \int _{0}^{\infty }\frac{\sqrt{w}}{\sigma }\frac{1}{\sqrt{\pi }} \exp \left\{ -\frac{(y-\mu )^2}{\sigma ^2 \left[ 1+\mathrm{sign}(y-\mu )\epsilon \right] ^2}w\right\} f_{W}(w)dw. \end{aligned}$$
(17)

Further, substituting the pdf of W in (1) into the right-hand side of (17) yields

$$\begin{aligned} \displaystyle f_{Y}(y|\theta )&= \displaystyle \frac{\Gamma (1+1/2)}{\Gamma (1+1/\alpha )}\int _{0}^{\infty } \frac{\sqrt{w}}{\sigma }\frac{1}{\sqrt{\pi }} \exp \left\{ -\frac{(y-\mu )^2}{\sigma ^2 \left[ 1+\mathrm{sign}(y-\mu )\epsilon \right] ^2}w\right\} \frac{f_{P}(w)}{\sqrt{w}}dw \nonumber \\&= \displaystyle \frac{1}{2\sigma \Gamma (1+1/\alpha )}\int _{0}^{\infty } \exp \left\{ -\frac{(y-\mu )^2}{\sigma ^2 \left[ 1+\mathrm{sign}(y-\mu )\epsilon \right] ^2}w\right\} f_{P}(w)dw. \end{aligned}$$
(18)

The integral on the right-hand side of (18) is the Laplace transform of a positive \(\alpha \)-stable random variable, say P. It follows that (Samorodnitsky & Taqqu, 1994, p. 16)

$$\begin{aligned} \displaystyle E \left[ \exp (-\lambda P) \right] =\exp \left( -\lambda ^{\frac{\alpha }{2}}\right) , \ \lambda \ge 0. \end{aligned}$$
(19)

Utilizing (19) for the right-hand side of (18), we have

$$\begin{aligned} \displaystyle f_{Y}(y|\theta )=\frac{1}{2\sigma \Gamma (1+1/\alpha )} \exp \left\{ -\left| \frac{y-\mu }{\sigma \left[ 1+\mathrm{sign}(y-\mu )\epsilon \right] }\right| ^{\alpha }\right\} , \end{aligned}$$

where \(0<\alpha \le 2\), \(\sigma \in {{{\mathbb {R}}}}^{+}\), \(\mu \in {{\mathbb {R}}}\) and \(-1<\epsilon <+1\). The proof is complete.
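
For numerical work in the later appendices (the observed information matrix and the CM-step for \(\alpha \)), the density derived above can be evaluated directly. A minimal Python sketch, under the expression just obtained (the function name is ours):

```python
import numpy as np
from scipy.special import gamma

def aep_pdf(y, alpha, sigma, mu, eps):
    """AEP density f_Y(y | theta) as given in the expression above."""
    z = np.abs((y - mu) / (sigma * (1.0 + np.sign(y - mu) * eps)))
    return np.exp(-z ** alpha) / (2.0 * sigma * gamma(1.0 + 1.0 / alpha))
```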

Appendix 2: Computing \(E\left( W\big | y,\theta \right) \)

By definition, we have

$$\begin{aligned} \displaystyle E\left( W\big |y,\theta \right) =\frac{\int _{0}^{\infty }w f_{W}(w)f_{Y|W}\left( y|w\right) dw}{f_{Y}\left( y\big |\theta \right) }=\frac{I}{f_{Y}\left( y\big |\theta \right) }. \end{aligned}$$
(20)

Using the pdf of Y given W in (5), we can write

$$\begin{aligned} \displaystyle I&= \displaystyle \int _{0}^{\infty } w\frac{\Gamma \left( 1+1/2\right) }{\Gamma \left( 1+1/\alpha \right) } \frac{f_{P}(w)}{\sqrt{w}} \frac{\sqrt{w}}{\sqrt{\pi }\sigma } \exp \left\{ -\left[ \frac{y-\mu }{\sigma \left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right] ^{2}w\right\} dw \\&= \displaystyle \int _{0}^{\infty } \frac{w}{2\sigma \Gamma \left( 1+1/\alpha \right) } f_{P}(w)\exp \left\{ -\left[ \frac{y-\mu }{\sigma \left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right] ^{2}w\right\} dw. \end{aligned}$$

Equivalently,

$$\begin{aligned} \displaystyle I= & {} \frac{\sigma ^2 }{4\Gamma \left( 1+1/\alpha \right) } \left[ \frac{y-\mu }{\left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right] ^{-2} \frac{\partial }{\partial \sigma } \nonumber \\&\int _{0}^{\infty } f_{P}(w)\exp \left\{ -\left[ \frac{y-\mu }{\sigma \left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right] ^{2}w\right\} dw. \end{aligned}$$
(21)

Utilizing the fact that the integral on the right-hand side of (21) is the Laplace transform of a positive \(\alpha \)-stable random variable P, we can write

$$\begin{aligned} \displaystyle I&= \displaystyle \frac{\sigma ^2 }{4\Gamma \left( 1+1/\alpha \right) } \left[ \frac{y-\mu }{\left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right] ^{-2} \frac{\partial }{\partial \sigma } \exp \left\{ -\left| \frac{y-\mu }{\sigma \left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right| ^{\alpha }\right\} \nonumber \\&= \displaystyle \frac{\alpha }{4\sigma \Gamma \left( 1+1/\alpha \right) } \left| \frac{y-\mu }{\sigma \left( 1+\mathrm{sign}(y-\mu )\epsilon \right) } \right| ^{\alpha -2} \exp \left\{ -\left| \frac{y-\mu }{\sigma \left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right| ^{\alpha }\right\} . \end{aligned}$$
(22)

Substituting the right-hand side of (22) into the right-hand side of (20), we have

$$\begin{aligned} \displaystyle E\left( W\big |y,\theta \right)&= \displaystyle \frac{1}{f_{Y}(y|\theta )}{\frac{\alpha }{4\sigma \Gamma \left( 1+1/\alpha \right) } \left| \frac{y-\mu }{\sigma \left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right| ^{\alpha -2}} \\&\quad \exp \left\{ -\left| \frac{y-\mu }{\sigma \left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right| ^{\alpha }\right\} \\&= \displaystyle \frac{\alpha }{2}\left| \frac{y-\mu }{\sigma \left( 1+\mathrm{sign}(y-\mu )\epsilon \right) }\right| ^{\alpha -2}. \end{aligned}$$

The result follows.
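
The conditional expectation just derived is the E-step weight used in the EM algorithm. A one-line Python sketch of it (names are ours):

```python
import numpy as np

def cond_mean_w(y, alpha, sigma, mu, eps):
    """E(W | y, theta) = (alpha/2) * |z|^(alpha - 2), with z the scaled residual."""
    z = (y - mu) / (sigma * (1.0 + np.sign(y - mu) * eps))
    return 0.5 * alpha * np.abs(z) ** (alpha - 2.0)
```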

Appendix 3: Algorithm for simulating from AEP distribution

In the first part of the proposed two-step algorithm, we simulate W through steps (a)–(o) using the method given in Devroye (2009). In the second part, i.e., steps (p)–(r), we simulate X. Finally, given X and W, we obtain a realization from the AEP distribution in step (s) of the following algorithm (a Python sketch of the second part is given after the list):

  (a) read \(\alpha \), \(\sigma \), \(\mu \), and \(\epsilon \);

  (b) set \(\delta =\sqrt{2}/\sqrt{1-\alpha /2}\);

  (c) define \(B(t)=\sin (t)^{1/\alpha }\left\{ \sqrt{\sin (\alpha t/2)}\left[ \sin \left( (1-\alpha /2)t\right) \right] ^{(2-\alpha )/(2\alpha )}\right\} ^{-1}\);

  (d) define \(B(0)=(\alpha /2)^{-1/2}(1-\alpha /2)^{\alpha /2-1}\);

  (e) if \(\delta \ge \sqrt{2\pi }\):

  (f) repeat: generate independent random variables U and V, where \(U\sim U(0,1)\) and \(V\sim U(0,\pi )\);

  (g) until \(VB(0)<B(U)\);

  (h) set \(Y=U\);

  (i) if \(\delta < \sqrt{2\pi }\):

  (j) repeat: generate independent random variables N and V, where \(N\sim N(0,1)\) and \(V\sim U(0,1)\);

  (k) until \(\delta |N| <\pi \) and \(V\ B(0)\ \exp \left( -N^2/2 \right) <B \left( \delta |N|\right) \);

  (l) set \(Y=\delta |N|\);

  (m) set \(A=\left\{ \sin (\alpha Y/2) \left[ \sin \left( Y(1-\alpha /2)\right) \right] ^{1-\alpha /2} \right\} ^{2/(2-\alpha )} \div \left[ \sin (Y)\right] ^{2/(2-\alpha )}\);

  (n) generate a gamma random variable G with shape parameter \((\alpha +2)/(2\alpha )\);

  (o) set \(W= \sqrt{2}\left( A/G\right) ^{(2-\alpha )/(2\alpha )}\);

  (p) generate a Bernoulli random variable, say B, with success probability \((1+\epsilon )/2\);

  (q) generate two independent standard normal random variables, say \(N_1\) and \(N_2\);

  (r) set \(X=B(1+\epsilon ) \left| N_1 \right| -(1-B)(1-\epsilon ) \left| N_2 \right| \);

  (s) set \(\sigma X/W+\mu \) as a realization from the AEP distribution.
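
A minimal Python sketch of the second part, steps (p)–(s), is given below. It assumes a helper sample_w(alpha) (hypothetical here) that carries out steps (a)–(o) following Devroye (2009):

```python
import numpy as np

def sample_aep(alpha, sigma, mu, eps, sample_w, rng=None):
    """One draw from the AEP distribution via steps (p)-(s).

    `sample_w` is an assumed helper implementing steps (a)-(o), i.e. the
    Devroye (2009) sampler for the mixing variable W.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = sample_w(alpha)                                               # steps (a)-(o)
    b = rng.binomial(1, (1.0 + eps) / 2.0)                            # step (p)
    n1, n2 = rng.standard_normal(2)                                   # step (q)
    x = b * (1.0 + eps) * abs(n1) - (1 - b) * (1.0 - eps) * abs(n2)   # step (r)
    return sigma * x / w + mu                                         # step (s)
```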

Appendix 4: Finding initial values

Let \({\varvec{y}} = \left( y_1, \ldots , y_n \right) ^{T}\) denote a vector of n independent realizations from the AEP distribution. Here, we propose a simple method to find the vector of initial values \({\varvec{\theta }}^{(0)}=\left( \alpha ^{(0)},\sigma ^{(0)},\mu ^{(0)},\epsilon ^{(0)}\right) ^{T}\) for starting the EM algorithm. Let \(\mathcal{Q}(p)\) denote the pth sample quantile of \({\varvec{y}}\) for \(0<p<1\). We find an initial value for the skewness parameter, i.e., \(\epsilon ^{(0)}\), as follows:

$$\begin{aligned} \displaystyle \epsilon ^{(0)}=\frac{\mathcal{Q}(0.8)-2 \mathcal{Q }(0.5)+\mathcal{Q}(0.2)}{\mathcal{Q}(0.8)-\mathcal{Q}(0.2)}. \end{aligned}$$

By (3), \(P(Y-\mu<0)=P(X<0)\). Further, \(P(X<0)=(1-\epsilon )/2\). So,

$$\begin{aligned} \displaystyle \mu ^{(0)}=\mathcal{Q}\left( \frac{1-\epsilon ^{(0)}}{2}\right) . \end{aligned}$$

An initial value for the tail thickness parameter, \(\alpha \), can be obtained by assuming that \(\epsilon =0\): \(\alpha ^{(0)}\) is the root of the nonlinear equation

$$\begin{aligned} \displaystyle h(\alpha )=n\frac{\sum _{i=1}^{n}\left( y_i-{\overline{y}}\right) ^4}{\left[ \sum _{i=1}^{n}\left( y_i-{\overline{y}}\right) ^2\right] ^2} - \frac{\Gamma \left( \frac{5}{\alpha }\right) \Gamma \left( \frac{1}{\alpha }\right) }{\left[ \Gamma \left( \frac{3}{\alpha }\right) \right] ^2}=0, \end{aligned}$$

where \({\overline{y}}\) denotes the mean of \({\varvec{y}}\). Finally, we note that

$$\begin{aligned} \displaystyle E(Y)&= \displaystyle \mu +\sigma E(X) E\left( \frac{1}{\sqrt{2W}}\right) \nonumber \\&= \displaystyle \mu +\sigma \frac{2 \epsilon }{\sqrt{\pi }}E\left( \frac{1}{\sqrt{2W}}\right) \nonumber \\&= \displaystyle \mu +\frac{2\sigma \epsilon }{\sqrt{\pi }}\frac{\sqrt{\pi }\Gamma \left( 1+2/\alpha \right) }{2\Gamma \left( 1+1/\alpha \right) }, \end{aligned}$$
(23)

where E(X) is given in Devroye (2009) and \(E\left( 1/\sqrt{2W}\right) \) can be found in Mudholkar and Hutson (2000). Solving (23) for \(\sigma \) yields an initial value for \(\sigma \) as follows:

$$\begin{aligned} \displaystyle \sigma ^{(0)}=\frac{{\overline{y}}-\mu ^{(0)}}{\epsilon ^{(0)}}\frac{\Gamma \left( 1+1/\alpha ^{(0)}\right) }{\Gamma \left( 1+2/\alpha ^{(0)}\right) }, \end{aligned}$$

where \(\epsilon ^{(0)} \ne 0\). If \(\epsilon ^{(0)} = 0\), we have

$$\begin{aligned} \displaystyle \sigma ^{(0)}=\left[ \frac{1}{n}\frac{\Gamma \left( \frac{1}{\alpha ^{(0)}}\right) }{\Gamma \left( \frac{3}{\alpha ^{(0)}}\right) }\sum _{i=1}^{n}\left( y_i-{\overline{y}}\right) ^2\right] ^{\frac{1}{2}}. \end{aligned}$$
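
A compact Python sketch of this initialization, following the formulas above (scipy is assumed available; the root of \(h(\alpha )\) is bracketed by a coarse grid search, and all names are ours):

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def initial_values(y):
    """Quantile/moment-based starting values (alpha0, sigma0, mu0, eps0) for the EM algorithm."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    q2, q5, q8 = np.quantile(y, [0.2, 0.5, 0.8])
    eps0 = (q8 - 2.0 * q5 + q2) / (q8 - q2)
    mu0 = np.quantile(y, (1.0 - eps0) / 2.0)
    ybar = y.mean()
    kurt = n * np.sum((y - ybar) ** 4) / np.sum((y - ybar) ** 2) ** 2

    def h(a):
        return kurt - gamma(5.0 / a) * gamma(1.0 / a) / gamma(3.0 / a) ** 2

    # bracket the root of h on a coarse grid over (0, 2]; fall back to alpha0 = 2 if none is found
    grid = np.linspace(0.05, 2.0, 200)
    vals = np.array([h(a) for a in grid])
    idx = np.where(np.sign(vals[:-1]) != np.sign(vals[1:]))[0]
    alpha0 = brentq(h, grid[idx[0]], grid[idx[0] + 1]) if len(idx) else 2.0

    if abs(eps0) > 1e-8:
        sigma0 = (ybar - mu0) / eps0 * gamma(1.0 + 1.0 / alpha0) / gamma(1.0 + 2.0 / alpha0)
    else:
        sigma0 = np.sqrt(np.mean((y - ybar) ** 2) * gamma(1.0 / alpha0) / gamma(3.0 / alpha0))
    return alpha0, sigma0, mu0, eps0
```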

Appendix 5: Computing the Observed Fisher Information Matrix (OFIM)

Computing the OFIM has two parts. In the first part, we compute the OFIM for the parameters of the AEP distribution. The second part is devoted to computing the OFIM for the estimates of the regression model parameters. A Python sketch of the first part is given at the end of this appendix.

  1. First part: We follow the method used by Lin et al. (2007b) for computing the OFIM. Let \(l\left( \varvec{\theta }\right) \) denote the incomplete log-likelihood function defined as

    $$\begin{aligned} \displaystyle l\left( \varvec{\theta }\right) =\sum _{i=1}^{n}\log f_{Y} \left( y_i|\varvec{\theta } \right) , \end{aligned}$$

    where \(f_{Y} \left( y_i|\varvec{\theta }\right) \) is defined as in (2). The OFIM denoted by \(I_{{\varvec{y}}}\) is given by

    $$\begin{aligned} \displaystyle I_{{\varvec{y}}}=-\frac{\partial ^2 l(\varvec{\theta })}{\partial \varvec{\theta } \partial \varvec{\theta }^T}. \end{aligned}$$

    Under some regularity conditions, the inverse of \(I_{{\varvec{y}}}\), i.e., \(I^{-1}_{{\varvec{y}}}\), is an approximation of the variance-covariance matrix of the ML estimator \(\widehat{\varvec{\theta }}\). We have

    $$\begin{aligned} \displaystyle I_{{\varvec{y}}}=\sum _{i=1}^{n}\widehat{\mathbf{D}}_{i}\widehat{\mathbf{D}}_{i}^{T}, \end{aligned}$$

    where

    $$\begin{aligned} \displaystyle \widehat{\mathbf{D}}_{i}=\left( {\widehat{D}}_{i1}, \ldots , {\widehat{D}}_{i4}\right) ^{T} = \left( \frac{\partial \log f_{Y}\left( y_i|\varvec{\theta }\right) }{\partial \alpha }\Big |_{\varvec{\theta }=\widehat{\varvec{\theta }}}, \frac{\partial \log f_{Y}\left( y_i|\varvec{\theta }\right) }{\partial \sigma }\Big |_{\varvec{\theta }=\widehat{\varvec{\theta }}}, \frac{\partial \log f_{Y}\left( y_i|\varvec{\theta }\right) }{\partial \mu }\Big |_{\varvec{\theta }=\widehat{\varvec{\theta }}}, \frac{\partial \log f_{Y}\left( y_i|\varvec{\theta }\right) }{\partial \epsilon }\Big |_{\varvec{\theta }=\widehat{\varvec{\theta }}} \right) ^T. \end{aligned}$$

    The incomplete data log-likelihood function, \(l \left( \varvec{\theta }\right) \), is

    $$\begin{aligned} \displaystyle l(\varvec{\theta })=-n\log 2-n \log \sigma -n \log \Gamma (1+1/\alpha ) - \sum _{i=1}^{n}\left| \frac{y_i-\mu }{\sigma \left[ 1+\mathrm{sign} \left( y_i-\mu \right) \epsilon \right] }\right| ^{\alpha }. \end{aligned}$$

    So,

    $$\begin{aligned} \displaystyle {\widehat{D}}_{i1} =&\displaystyle \frac{\psi \left( 1+1/{\widehat{\alpha }}\right) }{{\widehat{\alpha }}^2} - \left| \frac{y_i-{\widehat{\mu }}}{{\widehat{\sigma }} \left[ 1+\mathrm{sign} \left( y_i-{\widehat{\mu }}\right) {\widehat{\epsilon }} \right] }\right| ^{{\widehat{\alpha }}} \log \left| \frac{y_i-{\widehat{\mu }}}{{\widehat{\sigma }} \left[ 1+\mathrm{sign} \left( y_i-{\widehat{\mu }}\right) {\widehat{\epsilon }} \right] } \right| , \\ {\widehat{D}}_{i2} \displaystyle =&\displaystyle -\frac{1}{{\widehat{\sigma }}}+{\widehat{\alpha }}{{\widehat{\sigma }}}^{-{\widehat{\alpha }}-1} \left| \frac{y_i-{\widehat{\mu }}}{\left[ 1+\mathrm{sign} \left( y_i-{\widehat{\mu }}\right) {\widehat{\epsilon }} \right] }\right| ^{{\widehat{\alpha }}}, \\ \displaystyle {\widehat{D}}_{i3} =&\displaystyle \frac{{\widehat{\alpha }}\mathrm{sign} \left( y_i-{\widehat{\mu }}\right) }{{\widehat{\sigma }} \left[ 1 + \mathrm{sign} \left( y_i-{\widehat{\mu }}\right) {\widehat{\epsilon }} \right] } \left| \frac{y_i-{\widehat{\mu }}}{{\widehat{\sigma }} \left[ 1 + \mathrm{sign} \left( y_i-{\widehat{\mu }}\right) {\widehat{\epsilon }} \right] }\right| ^{{\widehat{\alpha }}-1}, \\ \displaystyle {\widehat{D}}_{i4} =&\displaystyle \frac{{\widehat{\alpha }} \mathrm{sign} \left( y_i-{\widehat{\mu }}\right) }{1+\mathrm{sign} \left( y_i-{\widehat{\mu }}\right) {\widehat{\epsilon }}} \left| \frac{y_i-{\widehat{\mu }}}{{\widehat{\sigma }} \left[ 1+\mathrm{sign} \left( y_i-{\widehat{\mu }}\right) {\widehat{\epsilon }} \right] }\right| ^{{\widehat{\alpha }}}, \end{aligned}$$

    where \(\psi (\cdot )\) denotes the digamma function defined by

    $$\begin{aligned} \displaystyle \psi (x)=\frac{1}{\Gamma (x)}\frac{d}{dx} \Gamma (x). \end{aligned}$$
  2. Second part: To compute the OFIM for regression coefficient estimators, we note that the incomplete log-likelihood function becomes

    $$\begin{aligned} \displaystyle l\left( \varvec{\gamma }\right) = \sum _{i=1}^{n}\log f_{Y} \left( y_i-{\varvec{x}}_i\varvec{\beta }|\varvec{\gamma } \right) , \end{aligned}$$

    where the error term \(\nu \) follows a zero-location AEP distribution and \(\varvec{\gamma }=\left( \varvec{\beta }^{T},\alpha ,\sigma ,\epsilon \right) ^{T}\). The OFIM is given by

    $$\begin{aligned} \displaystyle \mathcal{I}_{\mathbf{y}}=-\frac{\partial ^2 l({\varvec{\gamma }})}{\partial {\varvec{\gamma }} \partial {\varvec{\gamma }}^T}. \end{aligned}$$

    \(\mathcal{I}^{-1}_\mathbf{y}\) is an approximation of the variance-covariance matrix of the ML estimator \(\widehat{\varvec{\gamma }}\). It follows that

    $$\begin{aligned} \displaystyle \mathcal{I}_\mathbf{y}=\sum _{i=1}^{n}\widehat{\mathcal{D}}_{i}\widehat{\mathcal{D}}_{i}^{T}, \end{aligned}$$

    where

    $$\begin{aligned} \displaystyle \widehat{\mathcal{D}}_{i}=\left( \widehat{\mathcal{D}}_{i1},\ldots ,\widehat{\mathcal{D}}_{i4}\right) ^{T} = \left( \frac{\partial \log f_{Y} \left( y_i-{\varvec{x}}_i\varvec{\beta }|\varvec{\gamma } \right) }{\partial \varvec{\beta }}\Big |_{\varvec{\gamma }=\widehat{\varvec{\gamma }}}, \frac{\partial \log f_{Y} \left( y_i-{\varvec{x}}_i\varvec{\beta }|\varvec{\gamma } \right) }{\partial \alpha }\Big |_{\varvec{\gamma }=\widehat{\varvec{\gamma }}}, \frac{\partial \log f_{Y} \left( y_i-{\varvec{x}}_i\varvec{\beta }|\varvec{\gamma } \right) }{\partial \sigma }\Big |_{\varvec{\gamma }=\widehat{\varvec{\gamma }}}, \frac{\partial \log f_{Y} \left( y_i-{\varvec{x}}_i\varvec{\beta }|\varvec{\gamma } \right) }{\partial \epsilon }\Big |_{\varvec{\gamma }=\widehat{\varvec{\gamma }}} \right) ^T. \end{aligned}$$

    The incomplete data log-likelihood function, \(l \left( \varvec{\gamma }\right) \), is

    $$\begin{aligned} \displaystyle l\left( \varvec{\gamma }\right) = -n\log 2-n\log \sigma -n\log \Gamma (1+1/\alpha ) - \sum _{i=1}^{n}\left| \frac{y_i-{\varvec{x}}_i\varvec{\beta }}{\sigma \left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_i\varvec{\beta } \right) \epsilon \right] }\right| ^{\alpha }. \end{aligned}$$

    So,

    $$\begin{aligned} \displaystyle {\widehat{\mathcal{D}}}_{i1} =&\displaystyle {\varvec{x}}_i\frac{{\widehat{\alpha }}\mathrm{sign} \left( y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}\right) }{{\widehat{\sigma }} \left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}\right) {\widehat{\epsilon }}\right] } \left| \frac{y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}}{{\widehat{\sigma }} \left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_i\widehat{\varvec{\beta }} \right) {\widehat{\epsilon }} \right] }\right| ^{{\widehat{\alpha }}-1}, \\ \displaystyle {\widehat{\mathcal{D}}}_{i2} =&\displaystyle \frac{\psi \left( 1+1/{\widehat{\alpha }}\right) }{{\widehat{\alpha }}^2} - \left| \frac{y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}}{{\widehat{\sigma }} \left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}\right) {\widehat{\epsilon }} \right] }\right| ^{{\widehat{\alpha }}} \log \left| \frac{y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}}{{\widehat{\sigma }} \left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}\right) {\widehat{\epsilon }} \right] }\right| , \\ \displaystyle {\widehat{\mathcal{D}}}_{i3} =&\displaystyle -\frac{1}{{\widehat{\sigma }}}+{\widehat{\alpha }}{{\widehat{\sigma }}}^{-{\widehat{\alpha }}-1} \left| \frac{y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}}{\left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}\right) {\widehat{\epsilon }}\right] }\right| ^{{\widehat{\alpha }}}, \\ \displaystyle {\widehat{\mathcal{D}}}_{i4} =&\displaystyle \frac{{\widehat{\alpha }} \mathrm{sign} \left( y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}\right) }{1+\mathrm{sign} \left( y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}\right) {\widehat{\epsilon }}}\left| \frac{y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}}{{\widehat{\sigma }} \left[ 1 + \mathrm{sign} \left( y_i-{\varvec{x}}_i\widehat{\varvec{\beta }}\right) {\widehat{\epsilon }} \right] }\right| ^{{\widehat{\alpha }}}. \end{aligned}$$

    It should be noted that \({\widehat{\mathcal{D}}}_{i1}\) is a vector of the same length as \(\widehat{\varvec{\beta }}\).
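
A minimal Python sketch of the first part, coding the per-observation scores \({\widehat{D}}_{i1},\ldots ,{\widehat{D}}_{i4}\) given above (function and variable names are ours):

```python
import numpy as np
from scipy.special import digamma

def ofim_aep(y, alpha, sigma, mu, eps):
    """Observed Fisher information I_y = sum_i D_i D_i^T for (alpha, sigma, mu, eps)."""
    y = np.asarray(y, dtype=float)
    s = np.sign(y - mu)
    z = np.abs((y - mu) / (sigma * (1.0 + s * eps)))       # scaled absolute residuals
    d1 = digamma(1.0 + 1.0 / alpha) / alpha ** 2 - z ** alpha * np.log(z)
    d2 = -1.0 / sigma + alpha / sigma * z ** alpha
    d3 = alpha * s / (sigma * (1.0 + s * eps)) * z ** (alpha - 1.0)
    d4 = alpha * s / (1.0 + s * eps) * z ** alpha
    D = np.column_stack([d1, d2, d3, d4])                  # n x 4 matrix of per-observation scores
    return D.T @ D                                         # 4 x 4 OFIM; invert for standard errors
```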

Appendix 6: Deriving the regression coefficient estimators

Consider the linear regression model given by

$$\begin{aligned} \displaystyle y_i={\varvec{x}}_{i}\varvec{\beta }+\nu _i, \ \displaystyle i=1,2,\ldots , n, \end{aligned}$$
(24)

where \({\varvec{x}}_{i} = \left( 1, x_{i1}, \ldots , x_{ik} \right) ^{T}\) contains the values of the independent variables for the ith observation, \(\varvec{\beta } = \left( \beta _0,\beta _1,\ldots ,\beta _k\right) ^T\) is the vector of regression coefficients, and \(\nu _i\) is the ith error term, which follows a zero-location AEP distribution. It is clear from (24) that \(y_i\) follows an AEP distribution with \(\mu ={\varvec{x}}_{i}\varvec{\beta }\). Following the methodology in Sect. 2, the complete data log-likelihood function is

$$\begin{aligned} \displaystyle l_{c}(\varvec{\gamma })=\text {C}+ \sum _{i=1}^{n} \log f_{W}\left( w_i\right) - n \log \sigma -\sum _{i=1}^{n}\left\{ \frac{y_i-{\varvec{x}}_{i}\varvec{\beta }}{\sigma \left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_{i}\varvec{\beta } \right) \epsilon \right] }\right\} ^{2}w_i, \end{aligned}$$

where \(\text {C}\) is a constant independent of \(\varvec{\gamma }=\left( \varvec{\beta }^{T},\alpha ,\sigma ,\epsilon \right) ^{T}\). The E- and M-steps of the EM algorithm are

  1. E-Step: Suppose we are currently at the \((t+1)\)th iteration of the EM algorithm. Given an estimate of \(\varvec{\gamma }\), i.e., \(\varvec{\gamma }^{(t)}\), the conditional expectation of \(l_{c} \left( \varvec{\gamma }\right) \) is

    $$\begin{aligned} \displaystyle Q\left( \varvec{\gamma }\big |\varvec{\gamma }^{(t)}\right)= & {} \text {C} +\sum _{i=1}^{n} E\left( \log f_{W} \left( w_i\right) \big | y_i, \varvec{\gamma }^{(t)}\right) - n \log \sigma \nonumber \\&- \sum _{i=1}^{n}\left\{ \frac{y_i-{\varvec{x}}_{i}\varvec{\beta }}{\sigma \left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_{i}\varvec{\beta } \right) \epsilon \right] }\right\} ^{2}\mathcal{E}^{(t)}_{i}, \end{aligned}$$
    (25)

    where

    $$\begin{aligned} \displaystyle \mathcal{E}^{(t)}_{i}=E\left( W_i\big |y_i,\varvec{\gamma }^{(t)}\right) = \frac{\alpha ^{(t)}}{2}\left| \frac{y_i-{\varvec{x}}_{i}\varvec{\beta }^{(t)}}{\sigma ^{(t)}\left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_{i}\varvec{\beta }^{(t)}\right) \epsilon ^{(t)}\right] }\right| ^{\alpha ^{(t)}-2}. \end{aligned}$$
  2. M-step: The parameter vector \(\varvec{\gamma }^{(t)}\) is updated as \({\varvec{\gamma }}^{(t+1)}\) by maximizing the right-hand side of (25) with respect to \(\varvec{\gamma }\); a Python sketch of one full iteration is given at the end of this appendix. The vector of regression coefficients is updated as

    $$\begin{aligned} \displaystyle {\varvec{\beta }}^{(t+1)}= & {} \left\{ \sum _{i=1}^{n} {\varvec{x}}^{T}_{i}{\varvec{x}}_{i} \frac{\mathcal{E}^{(t)}_{i}}{\left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_{i}\varvec{\beta }^{(t)}\right) \epsilon ^{(t)}\right] ^2} \right\} ^{-1} \\&\left\{ \sum _{i=1}^{n}\frac{{\varvec{x}}_{i} y_i \mathcal{E}^{(t)}_{i}}{\left[ 1+\mathrm{sign}\left( y_i-{\varvec{x}}_{i}\varvec{\beta }^{(t)}\right) \epsilon ^{(t)}\right] ^2}\right\} . \end{aligned}$$

    The nuisance parameters are updated as follows. The updated scale parameter is given by

    $$\begin{aligned} \displaystyle \sigma ^{(t+1)}=\left\{ \frac{2}{n} \sum _{i=1}^{n} \frac{\left( y_{i}-{\varvec{x}}_{i}\varvec{\beta }^{(t+1)}\right) ^2 \mathcal{E}^{(t)}_{i}}{\left[ 1+\mathrm{sign} \left( y_i-{\varvec{x}}_{i}\varvec{\beta }^{(t+1)}\right) \epsilon ^{(t)} \right] ^2} \right\} ^{\frac{1}{2}}. \end{aligned}$$

    The skewness parameter \(\epsilon \) is updated as \(\epsilon ^{(t+1)}\) by solving the nonlinear equation \(h(\epsilon )=0\), where

    $$\begin{aligned} \displaystyle h(\epsilon )= \sum _{i=1}^{n}\frac{\mathrm{sign}\left( y_i-{\varvec{x}}_{i}\varvec{\beta }^{(t+1)}\right) \left( y_{i}-{\varvec{x}}_{i}\varvec{\beta }^{(t+1)}\right) ^2 \mathcal{E}^{(t)}_{i}}{\sigma ^{2(t+1)} \left[ 1+\mathrm{sign}\left( y_i-{\varvec{x}}_{i}\varvec{\beta }^{(t+1)}\right) \epsilon \right] ^3}. \end{aligned}$$

    The tail thickness parameter is updated, through a CM-step, by maximizing the marginal log-likelihood function as follows:

    $$\begin{aligned} \displaystyle \alpha ^{(t+1)}=\arg \max \limits _{\alpha } \ \displaystyle \sum _{i=1}^{n}\log f_{Y}\left( y_i\big |\varvec{\gamma }^{*}\right) , \end{aligned}$$

    where \(\varvec{\gamma }^{*}=\left( \varvec{\beta }^{(t+1)}, \alpha , \sigma ^{(t+1)}, \epsilon ^{(t+1)}\right) ^{T}\).

The E- and M-steps described are repeated until the convergence criterion

$$\begin{aligned} \displaystyle \sum _{i=1}^{k+4} \left| \varvec{\gamma }_{i}^{(t+1)}-\varvec{\gamma }_{i}^{(t)}\right| <10^{-5} \end{aligned}$$

is satisfied, where \(\varvec{\gamma }_{i}^{(t)}\) denotes the ith element of \(\varvec{\gamma }^{(t)}=\left( \varvec{\beta }_{0}^{(t)}, \varvec{\beta }_{1}^{(t)}, \ldots , \varvec{\beta }_{k}^{(t)}, {\alpha }^{(t)}, {\sigma }^{(t)},{\epsilon }^{(t)}\right) ^{T}\) for \(t \ge 1\). The initial values of \(\alpha \), \(\sigma \), and \(\epsilon \) are determined as described in Appendix 4. The initial values of the regression coefficients are found by applying the least squares (LS) technique to the truncated data obtained after removing the lowest and highest 20% of the observations.
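
A compact Python sketch of one EM iteration for the regression model, following the updates above (names are ours; the \(\epsilon \) update solves the stationarity condition \(h(\epsilon )=0\), and scipy is assumed available):

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import brentq, minimize_scalar

def aep_logpdf(r, alpha, sigma, eps):
    """Log-density of a zero-location AEP variable evaluated at the residuals r."""
    z = np.abs(r / (sigma * (1.0 + np.sign(r) * eps)))
    return -np.log(2.0 * sigma) - gammaln(1.0 + 1.0 / alpha) - z ** alpha

def em_step(y, X, beta, alpha, sigma, eps):
    """One EM iteration (E-step weights, then the M-step updates described above)."""
    r = y - X @ beta
    s = np.sign(r)
    E = 0.5 * alpha * np.abs(r / (sigma * (1.0 + s * eps))) ** (alpha - 2.0)  # E(W_i | y_i)

    # beta update: weighted least squares with weights E_i / (1 + sign_i * eps)^2
    w = E / (1.0 + s * eps) ** 2
    beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))

    r_new = y - X @ beta_new
    s_new = np.sign(r_new)

    # scale update
    sigma_new = np.sqrt(2.0 / len(y) * np.sum(r_new ** 2 * E / (1.0 + s_new * eps) ** 2))

    # skewness update: root of h(eps) = 0 (assumes residuals of both signs,
    # so the bracket below contains a root)
    h = lambda e: np.sum(s_new * r_new ** 2 * E / (1.0 + s_new * e) ** 3)
    eps_new = brentq(h, -0.999, 0.999)

    # tail-thickness update: CM-step maximizing the marginal log-likelihood in alpha
    nll = lambda a: -np.sum(aep_logpdf(r_new, a, sigma_new, eps_new))
    alpha_new = minimize_scalar(nll, bounds=(0.1, 2.0), method="bounded").x

    return beta_new, alpha_new, sigma_new, eps_new
```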

Appendix 7: Stochastic EM algorithm

Assume that G(3/2) denotes a gamma random variable with shape parameter 3/2. Then the ratio \(U=\sqrt{G(3/2)/W}\) follows the pdf given by

$$\begin{aligned} \displaystyle f_{U}(u)=\frac{\alpha u^{\alpha }\exp \left( -u^\alpha \right) }{\Gamma \left( 1+\frac{1}{\alpha }\right) }, \end{aligned}$$

which is the pdf of \(G^{1/\alpha }(1+1/\alpha )\). So, after updating \(\sigma ^{(t+1)}\), \(\mu ^{(t+1)}\), and \(\epsilon ^{(t+1)}\) in the CM-step, we construct the observed data as \( y_{i}^{*}=\sqrt{2g_i}\left( y_i-\mu ^{(t+1)}\right) /\sigma ^{(t+1)}\), where the \(g_i\) are simulated independently from a gamma distribution with shape parameter 3/2 (for \(i=1,\ldots , n\)). Now, the complete data are \(\left( y_{1}^{*},\ldots ,y_{n}^{*}, u_{1}, \ldots , u_{n}\right) \), in which \(u_1,\ldots ,u_n\) are latent variables. The complete data log-likelihood becomes

$$\begin{aligned} \displaystyle {\widetilde{l}}(\alpha )&=\sum _{i=1}^{n} \log f_{U}\left( u_i\right) + \sum _{i=1}^{n}\log f_{Y_{i}^{*}|U}(y_{i}^{*}|u) \nonumber \\ \displaystyle&=\text {C}+ \sum _{i=1}^{n} \log f_{U} \left( u_i \right) -\frac{1}{2}\sum _{i=1}^{n}\left\{ \frac{y^{*}_i}{u_{i} \left[ 1+\mathrm{sign} \left( y_{i}^{*}\right) \epsilon ^{(t+1)}\right] }\right\} ^{2} \nonumber \\ \displaystyle&=\text {C}+ n \log \alpha + \alpha \sum _{i=1}^{n} \log u_i -\sum _{i=1}^{n} u^{\alpha }_i - n \log \Gamma \left( 1+\frac{1}{\alpha }\right) \nonumber \\&\quad - \frac{1}{2}\sum _{i=1}^{n}\left\{ \frac{y^{*}_i}{u_{i} \left[ 1+\mathrm{sign} \left( y_{i}^{*}\right) \epsilon ^{(t+1)}\right] }\right\} ^{2}. \end{aligned}$$
(26)

The E-step of the stochastic EM algorithm is completed by simulating from the posterior pdf \(f_{U|Y^{*}}\left( u|y^{*}_{i}\right) \) (for \(i=1,\ldots ,n\)), which is given by

$$\begin{aligned} \displaystyle f_{U|Y^{*}}\left( u|y^{*}_{i}\right) \propto&\frac{\alpha u^{\alpha }\exp \left( -u^\alpha \right) }{\Gamma \left( 1+\frac{1}{\alpha }\right) } \exp \left\{ -\frac{1}{2} \left[ \frac{y^{*}_i}{u \left( 1+\mathrm{sign} \left( y^{*}_{i}\right) \epsilon ^{(t+1)} \right) }\right] ^{2}\right\} . \end{aligned}$$
(27)

The pdf (27) is log-concave for \(\alpha \ge 1\), so the adaptive rejection sampling approach proposed by Gilks and Wild (1992) can be used to sample from \(f_{U|Y^{*}}\left( u|y^{*}_{i}\right) \). When \(\alpha <1\), we suggest using the Metropolis–Hastings approach with \(G^{1/\alpha }(1+1/\alpha )\) as the proposal distribution. Once the entire vector \(\left\{ u_i\right\} _{i=1}^{n}\) has been generated, the M-step of the stochastic EM algorithm is completed by maximizing \({\widetilde{l}}(\alpha )\) with respect to \(\alpha \). Generally, for each cycle of the EM algorithm, the E- and M-steps of the stochastic EM algorithm inside the CM-step are repeated \(N\ge 1\) times and the average of the updated values of \(\alpha \) is taken as the updated \(\alpha \) (here, we suggest setting \(N=40\)). Since \(\alpha \) is updated in each cycle of the EM algorithm by generating from the posterior pdf \(f_{U|Y^{*}}\left( u|y^{*}_{i}\right) \), this type of EM algorithm can be called a stochastic EM algorithm, whereby the parameter vector converges to a stationary distribution rather than to a point (Diebolt & Celeux, 1993). The general structure of the stochastic EM algorithm is given as follows (a Python sketch of the posterior sampling step appears at the end of this appendix):

  1. Update \(\mu ^{(t)}\), \(\sigma ^{(t)}\), and \(\epsilon ^{(t)}\) through (8), (9), and (10);

  2. Set \(j=1\) and go to the next step;

  3. Apply transformation \({\varvec{y}}^{*}=\sqrt{2{\varvec{g}}}\left( {\varvec{y}}-\mu ^{(t+1)}\right) /\sigma ^{(t+1)}\), where \({\varvec{g}}\) is a vector of n independent simulated observations from a gamma distribution with shape parameter 3/2;

  4. Generate \({\varvec{u}}=\left( u_1,\ldots ,u_n\right) \) from \(f_{U|Y^{*}}\left( u|y^{*}_{i}\right) \);

  5. Maximize \({\widetilde{l}}(\alpha )\) given in (26) with respect to \(\alpha \) to obtain \({\widetilde{\alpha }}_{j}\);

  6. Set \(j=j+1\) and go to step 3; repeat steps 3–5 until \(N=40\) values \({\widetilde{\alpha }}_{1},\ldots ,{\widetilde{\alpha }}_{N}\) have been obtained;

  7. Update \(\alpha \) as \({\widehat{\alpha }}^{(t+1)}=\frac{1}{N}\sum _{j=1}^{N}{\widetilde{\alpha }}_{j}\);

  8. Go back to step 1 and update the location, scale, and skewness parameters using \({\widehat{\alpha }}^{(t+1)}\) obtained in the previous step;

  9. Repeat steps 1 to 8 until convergence occurs for \(\left\{ \varvec{\theta }^{(t)}\right\} _{t \ge 1}\).

In practice, the number of iterations before convergence is determined via an exploratory data analysis approach such as a graphical display (Ip, 1994).
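
A minimal Python sketch of the posterior draw in step 4, using an independence Metropolis–Hastings update with \(G^{1/\alpha }(1+1/\alpha )\) as the proposal, as suggested above for \(\alpha <1\) (names are ours):

```python
import numpy as np

def draw_u(y_star, alpha, eps, current_u, rng):
    """One independence Metropolis-Hastings update of the latent u for a single y*.

    Target: posterior (27); proposal: G^(1/alpha)(1 + 1/alpha), i.e. the prior f_U,
    so the acceptance ratio reduces to the ratio of the normal kernels.
    """
    scale = 1.0 + np.sign(y_star) * eps
    log_lik = lambda u: -0.5 * (y_star / (u * scale)) ** 2
    proposal = rng.gamma(1.0 + 1.0 / alpha) ** (1.0 / alpha)   # draw from f_U
    log_accept = log_lik(proposal) - log_lik(current_u)
    return proposal if np.log(rng.uniform()) < log_accept else current_u

# Example usage for one stochastic E-step (y_star_vec and u_old_vec assumed given):
# rng = np.random.default_rng(1)
# u = np.array([draw_u(ys, alpha, eps, uo, rng) for ys, uo in zip(y_star_vec, u_old_vec)])
```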

Cite this article

Teimouri, M., Nadarajah, S. Maximum Likelihood Estimation for the Asymmetric Exponential Power Distribution. Comput Econ 60, 665–692 (2022). https://doi.org/10.1007/s10614-021-10162-1
