Abstract
For a finite mixture of skew normal distributions, the maximum likelihood estimator is not well-defined: the likelihood function is unbounded when scale parameters tend to zero, and the skewness parameter estimates may diverge. To overcome these two problems simultaneously, we propose constrained maximum likelihood estimators under constraints on both the scale parameters and the skewness parameters. The proposed estimators are consistent and asymptotically efficient under relaxed constraints on the scale and skewness parameters. Numerical simulations show that in finite samples the proposed estimators outperform the ordinary maximum likelihood estimators. Two real datasets are used to illustrate the success of the proposed approach.
References
Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
Azzalini A, Arellano-Valle RB (2013) Maximum penalized likelihood estimation for skew-normal and skew-\(t\) distributions. J Stat Plann Inference 143:419–433
Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew normal distribution. J R Stat Soc Ser B Stat Methodol 61:579–602
Basso RM, Lachos VH, Cabral CRB, Ghosh P (2010) Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput Stat Data Anal 54:2926–2941
Cabral CRB, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Comput Stat Data Anal 56(1):126–142
Chanda KC (1954) A note on the consistency and maxima of the roots of likelihood equations. Biometrika 41:56–61
Chen J, Li P (2009) Hypothesis test for normal mixture models: the EM approach. Ann Statist 37:2523–2542
Chen J, Tan X, Zhang R (2008) Inference for normal mixtures in mean and variance. Statistica Sinica 18:443–465
Chen J (2017) Consistency of the MLE under mixture models. Stat Sci 32:47–63
Day NE (1969) Estimating the components of a mixture of normal distributions. Biometrika 56:463–474
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39(1):1–22
DiCiccio TJ, Monti AC (2004) Inferential aspects of the skew exponential power distribution. J Amer Statist Assoc 99:439–450
Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-\(t\) distributions. Biostatistics 11:317–336
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Statist 36(3):1324–1345
Greco L (2011) Minimum Hellinger distance based inference for scalar skew-normal and skew-t distributions. Test 20:120–137
Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Statist 13:795–800
Hathaway RJ (1986) A constrained EM algorithm for univariate normal mixtures. J Stat Comput Simul 23:211–230
Ho HJ, Pyne S, Lin TI (2012) Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Stat Comput 22(1):287–299
Jin L, Xu W, Zhu L, Zhu L (2019) Penalized maximum likelihood estimator for skew normal mixtures (in Chinese). Sci Sin Math 49:1225–1250
Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27:887–906
Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivariate Anal 100(2):257–265
Lin TI, Lee JC, Hsieh WJ (2007a) Robust mixture modeling using the skew \(t\) distribution. Stat Comput 17(2):81–92
Lin TI, Lee JC, Yen SY (2007b) Finite mixture modelling using the skew normal distribution. Statist Sinica 17:909
Lin TI, McLachlan GJ, Lee SX (2016) Extending mixtures of factor models using the restricted multivariate skew-normal distribution. J Multivariate Anal 143:398–413
Lindsay BG (1995) Mixture models: theory, geometry and applications. Institute of Mathematical Statistics and American Statistical Association
McLachlan G, Peel D (2000) Finite Mixture Models. John Wiley & Sons, New York
Nettleton D (1999) Convergence properties of the EM algorithm in constrained parameter spaces. Canad J Statist 27(3):639–648
Peters BC, Walker HF (1978) An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions. SIAM J Appl Math 35:362–378
Phillips RF (1991) A constrained maximum-likelihood approach to estimating switching regressions. J Econometrics 48:241–262
Prates MO, Cabral CRB, Lachos VH (2013) mixsmsn: Fitting Finite Mixture of Scale Mixture of Skew-Normal Distributions. J Statist Softw 54(12):1–20
Redner R (1981) Note on the consistency of the maximum likelihood estimate for nonidentifiable distributions. Ann Statist 9:225–228
Tan X, Chen J, Zhang R (2007) Consistency of the constrained maximum likelihood estimator in finite normal mixture models. In: Proceedings of the American Statistical Association, Section on Statistical Education, Alexandria, VA, pp 2113–2119
Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann Math Stat 20:595–601
Wolfowitz J (1949) On Wald's proof of the consistency of the maximum likelihood estimate. Ann Math Stat 20:601–602
Xu J, Tan X, Zhang R (2010) A note on Phillips (1991): “A constrained maximum likelihood approach to estimating switching regressions’’. J Econom 154:35–41
Zhao J, Jin L, Shi L (2015) Mixture model selection via hierarchical BIC. Comput Stat Data Anal 88:139–153
Acknowledgements
The authors gratefully acknowledge the Chief Editor, the Associate Editor and two anonymous referees for their comments and suggestions, which have led to a much improved version of this article. We are also grateful to Prof. Tsung I. Lin for his help in providing code for Bayesian estimation. This research was supported by two grants from the University Grants Council of Hong Kong, Hong Kong, China, four grants from the National Natural Science Foundation of China (NSFC: 11801370, 12161089, 11761076, 11671042), and a grant from the Yunnan Natural Science Foundation (2019FB002).
Appendix
Proofs of (2.2) and its lower bound. Equation (2.2) can be derived by considering its counterpart, the probability that all \({\hat{\lambda }}_i\) are finite, denoted by \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)<\infty \}\). It is easy to confirm that, among the n observations \(\{x_i\}_{i=1}^n\), if at least one \(x_i\) is larger than \(\mu _{(g)}\), then no \({\hat{\lambda }}_i=-\infty \); likewise, if at least one \(x_i\) is smaller than \(\mu _{(1)}\), then no \({\hat{\lambda }}_i=\infty \). Therefore, \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)<\infty \}\) equals the probability that there exists at least one \(x_i>\mu _{(g)}\) and, simultaneously, at least one \(x_j<\mu _{(1)}\).
Recall \(\mu _{(1)},\cdots ,\mu _{(g)}\); denote \(\mu _{(0)}=-\infty \) and \(\mu _{(g+1)}=\infty \), and define the probabilities \({\mathbb {P}}_{k}={\mathbb {P}}\{ X\in (\mu _{(k)}, \mu _{(k+1)})\}\) for \(k=0,1,\cdots ,g\). By the multinomial theorem, the total probability is given by
with
in which \({\mathbb {N}}\) is the set of natural numbers. The probability that all \({\hat{\lambda }}_i\) are finite can then be viewed as the probability that \(n_0\ge 1\) and \(n_g\ge 1\). Denote
and we obtain
Next, we investigate the probability \({\mathbb {P}} \{ \max _i|{\hat{\lambda }}_i|=\infty \}\). First, by taking the complement of \(\Omega _1\), we have
Then, we can conclude that
This completes the proof of equation (2.2).
At the same time, we study when \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)=\infty \}\) attains its lower bound. Let \({\mathbb {P}}\{X<\mu _{(g)}\}={\mathbb {P}}_a\), \({\mathbb {P}}\{ X>\mu _{(1)}\}={\mathbb {P}}_b\) and \({\mathbb {P}}\{X\in (\mu _{(1)}, \mu _{(g)})\}={\mathbb {P}}_{c}\). Note that \({\mathbb {P}}_b=1-{\mathbb {P}}_a+{\mathbb {P}}_{c}\); then we have
Note that \({\mathbb {P}}_a, 1-{\mathbb {P}}_a\in (0,1)\) and \({\mathbb {P}}_{c}\ge 0\). Hence, \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)=\infty \}\) attains its minimum only when \({\mathbb {P}}_{c}=0\), that is, when \(\mu _{(1)}=\mu _{(g)}\). \(\square \)
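The bound just discussed is easy to evaluate numerically. The sketch below is ours and purely illustrative: it assumes a standard normal data distribution and the inclusion–exclusion form \({\mathbb {P}}_a^n+{\mathbb {P}}_b^n-{\mathbb {P}}_c^n\) implied by the complement argument above, and confirms that the probability of a divergent skewness estimate is smallest when \(\mu _{(1)}=\mu _{(g)}\).

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_divergent(mu_1, mu_g, n, cdf=norm_cdf):
    """P{some lambda-hat diverges} for n i.i.d. draws: either no
    observation exceeds mu_(g), or none falls below mu_(1)."""
    p_a = cdf(mu_g)                         # P{X < mu_(g)}
    p_b = 1.0 - cdf(mu_1)                   # P{X > mu_(1)}
    p_c = max(cdf(mu_g) - cdf(mu_1), 0.0)   # P{X in (mu_(1), mu_(g))}
    return p_a ** n + p_b ** n - p_c ** n

# the probability is minimized when mu_(1) = mu_(g), i.e. P_c = 0
print(prob_divergent(0.0, 0.0, 50))   # 2 * 0.5**50, vanishingly small
print(prob_divergent(-1.0, 1.0, 50))  # larger, since P_c > 0
```

Both probabilities decay geometrically in n, but the lower bound \({\mathbb {P}}_a^n+(1-{\mathbb {P}}_a)^n\) is attained only in the degenerate case of equal component locations.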
Proof of Theorem 2.1
Suppose that \(\Psi _1\in \Gamma _c\) has a finite ith component location parameter, \(|\mu _i|<\infty \), and that \(\Psi _2\in \Gamma _c\) is a copy of \(\Psi _1\) with \(|\mu _i|\rightarrow \infty \). Note that \(f(x;\Psi _2)<f(x;\Psi _1)\) and thus \(l_n(\Psi _2)<l_n(\Psi _1)\). Moreover, for \(\Psi \in \Gamma _c\), it is easy to show that \(\sigma _j=O(\sigma _i)\), \(j\ne i\), when either \(\sigma _i\rightarrow 0\) or \(\sigma _i\rightarrow \infty \). Hence, for any \(\Psi \in \Gamma _c\) with \(\sigma _i\rightarrow 0\) or \(\sigma _i\rightarrow \infty \), we have \(l_n(\Psi )\rightarrow -\infty \). The constraints on \(\lambda \) in \(\Gamma _c\) define a compact set and are equivalent to requiring \(\max _i|\lambda _i|\le \sqrt{c_2-1}\). From the above, we conclude that \(\sup _{\Psi \in \Gamma _c}l_n(\Psi )=\sup _{\Psi \in \Gamma ^*}l_n(\Psi )\), where
for some constants \({\overline{\mu }},{\underline{\sigma }},{\overline{\sigma }}\), and \({\overline{\lambda }}\). Hence, by the compactness of \(\Gamma ^*\) and the continuity of \(l_n(\Psi )\), there exists a \(\Psi ^*\in \Gamma _c\) such that
\(\square \)
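To make the role of the scale constraint in \(\Gamma _c\) concrete, here is a minimal sketch, ours and not the authors' algorithm: a Hathaway-type constrained EM iteration for a plain two-component normal mixture (skewness parameters omitted for brevity), in which the M-step inflates any variance that would violate a lower bound \(c\) on the variance ratio.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def constrained_em(xs, n_iter=100, c=0.1):
    """EM for a two-component normal mixture with a Hathaway-type
    constraint min(v1, v2) >= c * max(v1, v2) on the variances,
    which keeps the likelihood bounded (no degenerate component)."""
    mu1, mu2 = min(xs), max(xs)
    spread = (max(xs) - min(xs)) / 4.0
    s1 = s2 = spread if spread > 0 else 1.0
    w = 0.5
    for _ in range(n_iter):
        # E-step: responsibilities of component 1
        r = []
        for x in xs:
            p1 = w * normal_pdf(x, mu1, s1)
            p2 = (1.0 - w) * normal_pdf(x, mu2, s2)
            r.append(p1 / (p1 + p2))
        n1 = sum(r)
        n2 = len(xs) - n1
        # M-step: weighted means and variances
        w = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / n2
        v1 = sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1
        v2 = sum((1.0 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2
        # enforce the scale constraint by inflating the smaller variance
        if v1 < c * v2:
            v1 = c * v2
        if v2 < c * v1:
            v2 = c * v1
        s1, s2 = math.sqrt(v1), math.sqrt(v2)
    return w, (mu1, s1), (mu2, s2)

# demo on synthetic, well-separated data
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(5, 2) for _ in range(100)]
w, (m1, s1), (m2, s2) = constrained_em(data)
assert min(s1, s2) ** 2 >= 0.1 * max(s1, s2) ** 2 - 1e-9  # constraint holds
```

Without the clipping step, letting one scale parameter shrink toward an observation drives the likelihood to infinity; the constraint rules out exactly these degenerate solutions, mirroring the scale constraints in \(\Gamma _c\).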
Proof of Theorem 2.2
For \({\hat{\Psi }}_c\in \Gamma ^*\), let \(d(\cdot ,\cdot )\) be the distance on \(\Gamma ^*\) and for any \(\epsilon >0\), define
as an open ball centered at \(\Psi _0\) with radius \(\epsilon \). Define the complement of \(B(\Psi _0,\epsilon )\) in the space \(\Gamma ^*\) as
For any \(\delta >0\) and \(\Psi \), denote
Note that \(\lim _{\delta \rightarrow 0}\log f(x;\Psi ,\delta )=\log f(x;\Psi )\) and \(\sup \{ f(x;\Psi ): \Psi \in \Gamma ^*(\epsilon ) \}<\infty \). By the dominated convergence theorem, we have
where \(E_0(\cdot )\) is the expectation with respect to the density function \(f(x;\Psi _0)\).
For any \(\Psi \in \Gamma ^*(\epsilon )\), let \(K(\Psi )=E_0\log f(X;\Psi )\). Due to the compactness of \(\Gamma ^*(\epsilon )\), there exists a \(\Psi _{\epsilon }\in \Gamma ^*(\epsilon )\) such that \(K(\Psi _{\epsilon })=\sup \{ K(\Psi ): \Psi \in \Gamma ^*(\epsilon )\}\). Further, using Jensen’s inequality, we have \(k_0=K(\Psi _0)-K(\Psi _{\epsilon })>0\). From (A.1), it is easy to show that for all \(\Psi \in \Gamma ^*(\epsilon )\), there exists a \(\delta _{\Psi }\) such that \(E_0\log f(X;\Psi ,\delta _{\Psi })\le E_0\log f(X;\Psi )+k_0/2\).
In addition, the compactness of \(\Gamma ^*(\epsilon )\) ensures the existence of a finite number of \(\Psi _1,\Psi _2,\ldots ,\Psi _h\) such that \(\Gamma ^*(\epsilon )\subset \cup _{i=1}^h B(\Psi _i,\delta _{\Psi _i})\). By Kolmogorov’s strong law of large numbers, for \(i=1,\ldots ,h\), we get
almost surely as \(n\rightarrow \infty \). Note that
Combining (A.2) and (A.3), we conclude that when \(i=1,\ldots ,h\),
or
Hence, by the definition of the constrained MLE, \({\hat{\Psi }}_c\rightarrow \Psi _0\) almost surely. \(\square \)
Proof of Theorem 2.3
Since \({\hat{\Psi }}_c\) is strongly consistent, by Taylor’s expansion, we have
where \(\Psi '\) is an intermediate point between \({\hat{\Psi }}_c\) and \(\Psi _0\); thus \(\Psi '\rightarrow \Psi _0\) in probability as \(n\rightarrow \infty \), which we write as \(\Psi '-\Psi _0=o_p(1)\). In addition, \(R_n(\Psi ')\) is a three-dimensional array with elements
Based on (A.4), it is easy to obtain
By the weak law of large numbers, we have
Hence, \(0.5({\hat{\Psi }}_c-\Psi _0)^T\frac{R_n(\Psi ')}{n}=o_p(1)\). Besides, by the central limit theorem,
Consequently, invoking Slutsky’s theorem, we have
\(\square \)
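As a hedged sketch of this standard argument (the notation is ours and may differ from the original displays), the score expansion around \(\Psi _0\) reads:

```latex
0 = \frac{1}{\sqrt{n}}\, l_n'(\hat{\Psi}_c)
  = \frac{1}{\sqrt{n}}\, l_n'(\Psi_0)
    + \frac{l_n''(\Psi_0)}{n}\, \sqrt{n}\,(\hat{\Psi}_c - \Psi_0)
    + \frac{1}{2}\, (\hat{\Psi}_c - \Psi_0)^{T} \frac{R_n(\Psi')}{n}\, \sqrt{n}\,(\hat{\Psi}_c - \Psi_0).
```

With \(n^{-1}l_n''(\Psi _0)\xrightarrow{p}-I(\Psi _0)\) by the weak law of large numbers, the remainder term of order \(o_p(1)\), and \(n^{-1/2}l_n'(\Psi _0)\xrightarrow{d}N(0,I(\Psi _0))\) by the central limit theorem, Slutsky's theorem yields \(\sqrt{n}({\hat{\Psi }}_c-\Psi _0)\xrightarrow{d}N(0,I^{-1}(\Psi _0))\).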
Proof of Theorem 3.1
Before stating the proof of Theorem 3.1, we first introduce a useful lemma proved by Chen (2017) as follows.
Lemma 7.1
Let \(X_1,\cdots ,X_n\) be i.i.d. observations from an absolutely continuous distribution F with density function f(x). Suppose f(x) is continuous and \(M=\sup _x f(x)<\infty \). Let \(F_n(x)=n^{-1}\sum _{i=1}^nI(X_i\le x)\) be the empirical distribution function. Then, as \(n\rightarrow \infty \),
holds uniformly for all \(\epsilon >0\) almost surely.
This lemma bounds the number of observations falling in a small neighborhood of the location parameters in \(\Psi \), which is crucial for bounding \(l_n(\Psi )\) from above.
Proof of Theorem 3.1
Without loss of generality, assume \(\sigma _{1}\le \sigma _{2}\le \cdots \le \sigma _{g}\). To begin, partition the parameter space \(\Gamma _n\) into
where \(\epsilon _0, \tau _{0}\) and \(\eta _0\) are three constants. Since \(\cup _{t=1}^g\Gamma ^t_{n\sigma }=\{\Psi \in \Gamma _n: \min _i\{\sigma _i\}\le \epsilon _{0}\}\), the subspace \(\cup _{t=1}^g\Gamma ^t_{n\sigma }\) represents all cases in which the mixing distribution has at least one component scale parameter close to zero.
Let \(M=\sup _x f(x;\Psi _0)\) and recall \(K(\Psi _0)=E_{0}\{\log f(X;\Psi _0)\}\). We then select \(\epsilon _0, \tau _{0}, \eta _0\) satisfying the following conditions:
where \(\lambda _{0i}\) denotes an element of \(\Psi _0\), and \(G_{\tau _0}\), which satisfies \(K(\Psi _0)-G_{\tau _0}>0\), will be specified later. The choice of \(\epsilon _0\) and \(\eta _0\) clearly depends on \(\Psi _0\) but not on the sample size n; moreover, the existence of \(\epsilon _0\) and \(\eta _0\) is easy to verify. We proceed with the proof in four steps.
Step 1: First, we exclude the possibility that \({\hat{\Psi }}_n\in \Gamma ^g_{n\sigma }\). Define the index sets of observations \(A_{k}=\{ i:|x_i-\mu _{k}|<|\sigma _{k}\log \sigma _{k}| \}\) for \(k=1,\ldots ,g\), and let \(n(A_{k})\) be the number of elements of \(A_{k}\). Denote
Note that for any \(i\in A_{k}\), the log-likelihood contribution of \(x_i\) is bounded by \(-\log \sigma _{k}\), while for \(i\in \cap _{k=1}^{g}A_{k}^c\) it is easy to show that \(\log f(x_i;\Psi )\le -\log \sigma _g-\frac{\log ^2\sigma _g}{2}\). Hence, we get
For \(0<\sigma _k\le n^{-1}\), by Lemma 7.1 and \(\alpha <1\), we have
and for \(n^{-1}<\sigma _k\le e^{-2}\),
For small enough \(\epsilon _0\), we have \(\sum _{i=1}^g n(A_i)<n/2\) and \(n(\cap _{i=1}^{g}A_{i}^c)\ge n/2\) almost surely. By the first two conditions in (A.5), we obtain
By the strong law of large numbers, we have \(n^{-1}l_n(\Psi _0)\xrightarrow {a.s.} K(\Psi _0)\). Thus,
Step 2: Next, we prove that \({\hat{\Psi }}_n\) does not lie in \(\Gamma ^t_{n\sigma }\), \(t=1,\ldots ,g-1\), with probability one. Let \(\Gamma ^t_{\sigma }=\{ \Psi \in \Gamma : \sigma _{t}\le \tau _{0} <\epsilon _{0}\le \sigma _{t+1} \}\) for \(t=1,\ldots ,g-1\); clearly \(\Gamma ^t_{n\sigma }\subset \Gamma ^t_{\sigma }\). Define a distance on \(\Gamma ^t_{\sigma }\) by
under which \(\Gamma ^t_{\sigma }\) is totally bounded and can be compactified. Let \({\bar{\Gamma }}^t_{\sigma }\) be the compactification of \(\Gamma ^t_{\sigma }\), which includes all limit points. For \(\Psi \in {\bar{\Gamma }}^t_{\sigma }\), define
where \(g_{t}(x;\Psi )\) is bounded over \({\bar{\Gamma }}^t_{\sigma }\), even though \(\sigma _1=\cdots =\sigma _t=0\) is allowed. For \(\Psi \in {\bar{\Gamma }}_{\sigma }^t\), denote \(G_t(\Psi )=E_{0}\log \{g_{t}(X;\Psi )\}\) and \(G_{\tau _0}=\sup \{ G_t(\Psi ): \Psi \in {\bar{\Gamma }}_{\sigma }^t \}\). If \(\tau _0\) is small enough that \(\Psi _0\notin {\bar{\Gamma }}_{\sigma }^t\), then by Jensen's inequality we have
Define \(\ell _n^{t}(\Psi )=\sum _{i=1}^n \log \{ g_{t}(x_i;\Psi ) \}\) on \({\bar{\Gamma }}_{\sigma }^t\); applying the strong law of large numbers, we have
Note that for each \(i\in A_k\), it is easy to show \(\log \{f(x_i;\Psi )\}\le -\log \sigma _{k}+ \log \{g_{t}(x_i;\Psi )\}\). For \(i \not \in A_k\), since \(|x_i-\mu _{k}|\ge |\sigma _{k}\log \sigma _{k}|\), if \(\sigma _{k}\) is small enough that \(\sigma _{k}^{-1}=\exp \{ -\log \sigma _{k} \}< \exp \{ \frac{1}{4}\log ^2\sigma _{k} \}\), then the inequalities
hold with \(\sigma _{k}\le \epsilon _0\), which imply that \(\log \{f(x_i;\Psi )\}\le \log \{g_{t}(x_i;\Psi )\}\).
In summary, for \(\Psi \in {\bar{\Gamma }}_{\sigma }^t\), the log-likelihood has the following upper bound
Moreover, when \(0<\sigma _k\le n^{-1}\), \(n(A_k)\le (4M+20)\log n\) and
Hence, for \(\alpha >1\),
For \(n^{-1}<\sigma _k\le e^{-2}\),
Combining (A.6)–(A.8) and recalling the third condition in (A.5), namely \(t(4M+1)\tau _0\log ^2\tau _0\le \frac{1}{2}\{K(\Psi _0)-G_{\tau _0}\}\), it can be shown that for \(t=1,\ldots ,g-1\),
Remark 7.1
Based on the first two steps, we have excluded the possibility that the CMLE \({\hat{\Psi }}_n\) belongs to the subspace \(\Gamma _{n\sigma }=\cup _{t=1}^g\Gamma ^t_{n\sigma }\). Moreover, confining \(\Psi \) to \(\Gamma _{n\sigma }^c\) is equivalent to imposing a positive constant lower bound on \(\sigma _k\) for \(k=1,\ldots ,g\).
Step 3: We now show that
Define \(\Gamma _{\lambda }=\{ \Psi \in \Gamma : \max _i\{ |\lambda _i|\}\ge \eta _0 \}\). It is clear that \(\Gamma _{n\lambda }\subset \Gamma _{\lambda }\) and \((\Gamma ^c_{n\sigma }\cap \Gamma _{n\lambda })\subset (\Gamma ^c_{n\sigma }\cap \Gamma _{\lambda })\). Recall \(K(\Psi )=E_0\log f(X;\Psi )\) and denote \(K_{\eta _0}=\sup \{ K(\Psi ): \Psi \in \Gamma ^c_{n\sigma }\cap \Gamma _{\lambda } \}\). Suppose \(\eta _0\) is large enough that \(\Psi _0\notin \Gamma ^c_{n\sigma }\cap \Gamma _{\lambda }\). By Jensen's inequality, \(E_{0}\log \{ f(X;\Psi )/f(X;\Psi _0)\}<0\) for any \(\Psi \in \Gamma ^c_{n\sigma }\cap \Gamma _{\lambda }\), so \(K_{\eta _0}-K(\Psi _0)<0\). Note that \(K_{\eta _0}\) is an increasing function of \(\eta _0\). Consequently, it is easy to show that
Furthermore, we obtain
Thus we have \(\sup _{\Gamma ^c_{n\sigma }\cap \Gamma _{n\lambda }} l_n(\Psi )-l_n(\Psi _0)\rightarrow -\infty \) almost surely as \(n\rightarrow \infty \).
Step 4: From the above, we conclude that \({\hat{\Psi }}_n\in \Gamma ^{\dagger }\) almost surely as n goes to infinity. By the compactness of \(\Gamma ^{\dagger }\), the strong consistency of \({\hat{\Psi }}_n\) follows from Theorem 2.2. \(\square \)
Proof of Theorem 3.2
The proof is almost identical to that of Theorem 2.3. The details are omitted. \(\square \)
Proof of Theorem 3.3
Let \({\hat{\lambda }}^2_{(i)}\) be the ith order statistic of \({\hat{\lambda }}^2_1,\ldots ,{\hat{\lambda }}^2_g\). It is then easy to show that the constraint \(\max _{i,j} \ (1+\hat{\lambda }^2_i)(1+\hat{\lambda }^2_j)\le \vartheta _n\) is equivalent to \((1+\hat{\lambda }^2_{(g-1)})(1+\hat{\lambda }^2_{(g)})\le \vartheta _n\). Because \((1+\hat{\lambda }^2_{(g-1)})(1+\hat{\lambda }^2_{(g)})\le (1+\hat{\lambda }^2_{(g)})^2\), we have
in which \({\mathbf {1}}_g\) is a \(g\times 1\) vector with all elements equal to 1. By the mean value theorem for definite integrals, there exists at least one point \(|{\mathbf {x}}_0|\le \vartheta ^{1/4}_n\cdot {\mathbf {1}}_g\) such that
in which \(\phi _g({\mathbf {x}};\varvec{\mu },\Sigma )\) denotes the g-dimensional multivariate normal density function with mean vector \(\varvec{\mu }\) and covariance matrix \(\Sigma \). As \(n\rightarrow \infty \), it can be seen that
where \(\beta \) is a positive constant. This further implies that as \(n\rightarrow \infty \) and \(\vartheta _n\rightarrow \infty \),
Combining (A.9) and (A.10) leads to
Meanwhile, it is easy to verify that
Consequently, we have
which can also be derived from (A.10). Let us assume further that
in which \(\kappa (n)=o\{ n^{-2}\exp (4\beta n/g) \}\), and thus \(\Delta _n=\kappa (n)n^{2}\exp (-4\beta n/g)=o(1)\). Under this assumption, we have
Ignoring the higher order term in (A.13) and recalling the conclusion of (A.11), we get
On the other hand, the lower bound of \({\mathbb {P}}\{ \max _i|{\hat{\lambda }}_i|=\infty \}\) defined in (2.3) can be simplified to
The existence of \(\nu \) follows from the fact that \(x^{n}\) is continuous and monotone for \(x\in (0,1)\). This further implies that
Consequently, the probability condition of (3.4) links the two bounds in (A.14) and (A.15), and thus we obtain
and
Therefore, by the definition of \(\Delta _n\), it is easy to show
Applying this result to (A.12) finally leads to an approximate upper bound for \(\vartheta _n\), i.e.
in which \(\beta >0\) and \(0<\nu <1\). Moreover, absorbing 4 and g into the positive constant \(\beta \), the result can be simplified to
\(\square \)
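The reduction of the pairwise constraint to the two largest order statistics, used at the start of the proof of Theorem 3.3, is easy to verify numerically. The sketch below is ours, for illustration only: it brute-forces \(\max _{i\ne j}(1+\lambda _i^2)(1+\lambda _j^2)\) and compares the result with the product built from the two largest \(\lambda _i^2\).

```python
import itertools
import random

def max_pairwise(lams):
    # brute force over all unordered pairs i != j
    return max((1 + a * a) * (1 + b * b)
               for a, b in itertools.combinations(lams, 2))

def top_two_order_stats(lams):
    # product built from the two largest squared skewness values
    s = sorted(l * l for l in lams)
    return (1 + s[-2]) * (1 + s[-1])

# the two formulas agree for arbitrary skewness vectors
random.seed(1)
for _ in range(100):
    lams = [random.uniform(-5, 5) for _ in range(6)]
    assert abs(max_pairwise(lams) - top_two_order_stats(lams)) < 1e-9
```

The equivalence holds because \((1+a^2)(1+b^2)\) is increasing in each of \(a^2\) and \(b^2\), so the maximum over pairs is attained at the two largest squared values.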
Proof of Theorem 3.4
The proof is similar to that of Theorem 3.3 and thus omitted. \(\square \)
Jin, L., Chiu, S.N., Zhao, J. et al. A constrained maximum likelihood estimation for skew normal mixtures. Metrika 86, 391–419 (2023). https://doi.org/10.1007/s00184-022-00873-2