A constrained maximum likelihood estimation for skew normal mixtures

Abstract

For a finite mixture of skew normal distributions, the maximum likelihood estimator is not well defined because the likelihood function is unbounded when the scale parameters go to zero and because the skewness parameter estimates can diverge. To overcome these two problems simultaneously, we propose constrained maximum likelihood estimators under constraints on both the scale parameters and the skewness parameters. The proposed estimators are consistent and asymptotically efficient under relaxed constraints on the scale and skewness parameters. Numerical simulations show that in finite samples the proposed estimators outperform the ordinary maximum likelihood estimators. Two real datasets are used to illustrate the success of the proposed approach.
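
To see the unboundedness concretely, here is a minimal sketch (ours, not from the paper), assuming Azzalini's univariate skew normal density and a hypothetical two-component mixture: pinning one component location at an observation and letting its scale shrink drives the log-likelihood to infinity.

```python
import numpy as np
from scipy.stats import norm

def skew_normal_pdf(x, mu, sigma, lam):
    # Azzalini (1985): f(x) = (2/sigma) * phi(z) * Phi(lam * z), z = (x - mu)/sigma
    z = (x - mu) / sigma
    return 2.0 / sigma * norm.pdf(z) * norm.cdf(lam * z)

def mixture_loglik(x, weights, mus, sigmas, lams):
    dens = sum(w * skew_normal_pdf(x, m, s, l)
               for w, m, s, l in zip(weights, mus, sigmas, lams))
    return np.sum(np.log(dens))

rng = np.random.default_rng(0)
x = rng.normal(size=100)                      # any sample exhibits the degeneracy
for sigma1 in [1.0, 1e-1, 1e-2, 1e-4]:
    # pin mu_1 at the first observation and let sigma_1 -> 0
    ll = mixture_loglik(x, weights=[0.5, 0.5], mus=[x[0], 0.0],
                        sigmas=[sigma1, 1.0], lams=[0.0, 0.0])
    print(f"sigma_1 = {sigma1:g}  log-likelihood = {ll:.1f}")  # grows without bound
```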


References

  • Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178

  • Azzalini A, Arellano-Valle RB (2013) Maximum penalized likelihood estimation for skew-normal and skew-\(t\) distributions. J Stat Plann Inference 143:419–433

  • Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew normal distribution. J R Stat Soc Ser B Stat Methodol 61:579–602

  • Basso RM, Lachos VH, Cabral CRB, Ghosh P (2010) Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput Stat Data Anal 54:2926–2941

  • Cabral CRB, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Comput Stat Data Anal 56(1):126–142

  • Chanda KC (1954) A note on the consistency and maxima of the roots of likelihood equations. Biometrika 41:56–61

  • Chen J, Li P (2009) Hypothesis test for normal mixture models: the EM approach. Ann Stat 37:2523–2542

  • Chen J, Tan X, Zhang R (2008) Inference for normal mixtures in mean and variance. Stat Sin 18:443–465

  • Chen J (2017) Consistency of the MLE under mixture models. Stat Sci 32:47–63

  • Day NE (1969) Estimating the components of a mixture of normal distributions. Biometrika 56:463–474

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39(1):1–22

  • DiCiccio TJ, Monti AC (2004) Inferential aspects of the skew exponential power distribution. J Am Stat Assoc 99:439–450

  • Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-\(t\) distributions. Biostatistics 11:317–336

  • García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Stat 36(3):1324–1345

  • Greco L (2011) Minimum Hellinger distance based inference for scalar skew-normal and skew-t distributions. Test 20:120–137

  • Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13:795–800

  • Hathaway RJ (1986) A constrained EM algorithm for univariate normal mixtures. J Stat Comput Simul 23:211–230

  • Ho HJ, Pyne S, Lin TI (2012) Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Stat Comput 22(1):287–299

  • Jin L, Xu W, Zhu L, Zhu L (2019) Penalized maximum likelihood estimator for skew normal mixtures (in Chinese). Sci Sin Math 49:1225–1250

  • Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27:887–906

  • Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivariate Anal 100(2):257–265

  • Lin TI, Lee JC, Hsieh WJ (2007a) Robust mixture modeling using the skew \(t\) distribution. Stat Comput 17(2):81–92

  • Lin TI, Lee JC, Yen SY (2007b) Finite mixture modelling using the skew normal distribution. Stat Sin 17:909

  • Lin TI, McLachlan GJ, Lee SX (2016) Extending mixtures of factor models using the restricted multivariate skew-normal distribution. J Multivariate Anal 143:398–413

  • Lindsay BG (1995) Mixture models: theory, geometry and applications. Institute of Mathematical Statistics and American Statistical Association

  • McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York

  • Nettleton D (1999) Convergence properties of the EM algorithm in constrained parameter spaces. Can J Stat 27(3):639–648

  • Peters BC, Walker HF (1978) An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions. SIAM J Appl Math 35:362–378

  • Phillips RF (1991) A constrained maximum-likelihood approach to estimating switching regressions. J Econom 48:241–262

  • Prates MO, Cabral CRB, Lachos VH (2013) mixsmsn: fitting finite mixture of scale mixture of skew-normal distributions. J Stat Softw 54(12):1–20

  • Redner R (1981) Note on the consistency of the maximum likelihood estimate for nonidentifiable distributions. Ann Stat 9:225–228

  • Tan X, Chen J, Zhang R (2007) Consistency of the constrained maximum likelihood estimator in finite normal mixture models. The 2007 Proceedings of the American Statistical Association, Section on Statistical Education, Alexandria, VA, pp 2113–2119

  • Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann Math Stat 20:595–601

  • Wolfowitz J (1949) On Wald's proof of the consistency of the maximum likelihood estimate. Ann Math Stat 20:601–602

  • Xu J, Tan X, Zhang R (2010) A note on Phillips (1991): "A constrained maximum likelihood approach to estimating switching regressions". J Econom 154:35–41

  • Zhao J, Jin L, Shi L (2015) Mixture model selection via hierarchical BIC. Comput Stat Data Anal 88:139–153


Acknowledgements

The authors gratefully acknowledge the Chief Editor, the Associate Editor and two anonymous referees for their comments and suggestions, which have led to a much improved version of this article. We are also grateful to Prof. Tsung I. Lin for his help in providing codes about Bayesian estimation. This research was supported by two grants from the University Grants Council of Hong Kong, Hong Kong, China, four grants from the National Science Foundation of China (NSFC: 11801370, 12161089, 11761076, 11671042), and a grant from the Yunnan Natural Science Foundation (2019FB002).

Author information


Corresponding author

Correspondence to Lixing Zhu.


Appendix

Proofs of (2.2) and its lower bound. Equation (2.2) can be obtained by considering its complement, the probability that all the \({\hat{\lambda }}_i\) are finite, denoted by \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)<\infty \}\). It is easy to confirm that, among the n observations \(\{x_i\}_{i=1}^n\), if at least one \(x_i\) is larger than \(\mu _{(g)}\), then no \({\hat{\lambda }}_i=-\infty \); likewise, if at least one \(x_i\) is smaller than \(\mu _{(1)}\), then no \({\hat{\lambda }}_i=\infty \). Therefore, \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)<\infty \}\) equals the probability that there exist simultaneously at least one \(x_i>\mu _{(g)}\) and at least one \(x_j<\mu _{(1)}\).

Recall \(\mu _{(1)},\cdots ,\mu _{(g)}\), set \(\mu _{(0)}=-\infty \) and \(\mu _{(g+1)}=\infty \), and define the probability \({\mathbb {P}}_{k}={\mathbb {P}}\{ X\in (\mu _{(k)}, \mu _{(k+1)})\}\) for \(k=0,1,\cdots ,g\). By the multinomial theorem, the total probability is given by

$$\begin{aligned} 1=({\mathbb {P}}_{0}+{\mathbb {P}}_{1}+\cdots +{\mathbb {P}}_{g})^n =\sum _{\Omega }\frac{n!}{n_0!\cdot n_1!\cdots n_g!} {\mathbb {P}}_{0}^{n_0}{\mathbb {P}}_{1}^{n_1}\cdots {\mathbb {P}}_{g}^{n_g} \end{aligned}$$

with

$$\begin{aligned} \Omega = \left\{ (n_0,n_1,\cdots ,n_g)\in {\mathbb {N}}^{g+1}| \sum _{k=0}^gn_k=n\right\} , \end{aligned}$$

in which \({\mathbb {N}}\) is the set of nonnegative integers. Then the probability that all \({\hat{\lambda }}_i\) are finite is the probability that \(n_0\ge 1\) and \(n_g\ge 1\). Denote

$$\begin{aligned} \Omega _1=\{ (n_0,n_1,\cdots ,n_g)\in \Omega | n_0\ge 1,n_g\ge 1\} \end{aligned}$$

and we obtain

$$\begin{aligned} {\mathbb {P}} \left\{ \max _i(|{\hat{\lambda }}_i|)<\infty \right\} =\sum _{\Omega _1}\frac{n!}{n_0!\cdot n_1!\cdots n_g!} {\mathbb {P}}_{0}^{n_0}{\mathbb {P}}_{1}^{n_1}\cdots {\mathbb {P}}_{g}^{n_g}. \end{aligned}$$

Next, we investigate the probability \({\mathbb {P}} \{ \max _i|{\hat{\lambda }}_i|=\infty \}\). First, taking the complement of \(\Omega _1\) in \(\Omega \), we have

$$\begin{aligned} \Omega ^c_1=\{ (n_0,n_1,\cdots ,n_g)\in \Omega | n_0=0 \ \text {or} \ n_g=0\}. \end{aligned}$$

Then, we can conclude that

$$\begin{aligned} {\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)=\infty \}&=1-{\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)<\infty \}\\&=\sum _{\Omega ^c_1}\frac{n!}{n_0!\cdot n_1!\cdots n_g!} {\mathbb {P}}_{0}^{n_0}{\mathbb {P}}_{1}^{n_1}\cdots {\mathbb {P}}_{g}^{n_g}\\&=\sum _{n_0=0}\frac{n!}{n_0!\cdot n_1!\cdots n_g!} {\mathbb {P}}_{0}^{n_0}{\mathbb {P}}_{1}^{n_1}\cdots {\mathbb {P}}_{g}^{n_g}\\&\quad +\sum _{n_g=0}\frac{n!}{n_0!\cdot n_1!\cdots n_g!} {\mathbb {P}}_{0}^{n_0}{\mathbb {P}}_{1}^{n_1}\cdots {\mathbb {P}}_{g}^{n_g} \\&\quad -\sum _{n_0=0,n_g=0}\frac{n!}{n_0!\cdot n_1!\cdots n_g!} {\mathbb {P}}_{0}^{n_0}{\mathbb {P}}_{1}^{n_1}\cdots {\mathbb {P}}_{g}^{n_g}\\&=({\mathbb {P}}_{1}+\cdots +{\mathbb {P}}_{g})^n+ ({\mathbb {P}}_{0}+{\mathbb {P}}_{1}+\cdots +{\mathbb {P}}_{g-1})^n\\&\quad -({\mathbb {P}}_{1}+\cdots +{\mathbb {P}}_{g-1})^n \\&=[{\mathbb {P}}\{ X>\mu _{(1)}\}]^n+[{\mathbb {P}}\{X<\mu _{(g)}\} ]^n -[{\mathbb {P}}\{X\in (\mu _{(1)}, \mu _{(g)})\}]^n. \end{aligned}$$

This completes the proof of equation (2.2).
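
As a side check (ours, not part of the paper), equation (2.2) is easy to verify by Monte Carlo: by the argument above, the event \(\{\max _i|{\hat{\lambda }}_i|=\infty \}\) has the same probability as "no observation below \(\mu _{(1)}\) or no observation above \(\mu _{(g)}\)". A sketch with \(X\sim N(0,1)\) and hypothetical cut points:

```python
import numpy as np
from scipy.stats import norm

mu_1, mu_g, n = -0.5, 0.5, 20         # hypothetical mu_(1), mu_(g), sample size
closed_form = (norm.sf(mu_1) ** n     # P{X > mu_(1)}^n : no observation below mu_(1)
               + norm.cdf(mu_g) ** n  # P{X < mu_(g)}^n : no observation above mu_(g)
               - (norm.cdf(mu_g) - norm.cdf(mu_1)) ** n)

rng = np.random.default_rng(1)
x = rng.normal(size=(200_000, n))
event = (x.min(axis=1) >= mu_1) | (x.max(axis=1) <= mu_g)
print(closed_form, event.mean())      # the two numbers agree up to Monte Carlo error
```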

At the same time, we study the condition that \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)=\infty \}\) attains a lower bound. Let \({\mathbb {P}}\{X<\mu _{(g)}\}={\mathbb {P}}_a\), \({\mathbb {P}}\{ X>\mu _{(1)}\}={\mathbb {P}}_b\) and \({\mathbb {P}}\{X\in (\mu _{(1)}, \mu _{(g)})\}={\mathbb {P}}_{c}\). Note that \({\mathbb {P}}_b=1-{\mathbb {P}}_a+{\mathbb {P}}_{c}\), then we have

$$\begin{aligned} {\mathbb {P}} \left\{ \max _i(|{\hat{\lambda }}_i|)=\infty \right\}&=({\mathbb {P}}_a)^n+ ({\mathbb {P}}_b)^n-({\mathbb {P}}_{c})^n \\&=({\mathbb {P}}_a)^n+ (1-{\mathbb {P}}_a+{\mathbb {P}}_{c})^n-({\mathbb {P}}_{c})^n \\&=({\mathbb {P}}_a)^n+ \sum _{r=0}^nC_n^r(1-{\mathbb {P}}_a)^r({\mathbb {P}}_{c})^{n-r}-({\mathbb {P}}_{c})^n \\&=({\mathbb {P}}_a)^n+(1-{\mathbb {P}}_a)^n+\sum _{r=1}^{n-1}C_n^r(1-{\mathbb {P}}_a)^r({\mathbb {P}}_{c})^{n-r}. \end{aligned}$$

Note that \({\mathbb {P}}_a, 1-{\mathbb {P}}_a\in (0,1)\) and \({\mathbb {P}}_{c}\ge 0\). Hence \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)=\infty \}\) attains its minimum only when \({\mathbb {P}}_{c}=0\), that is, when \(\mu _{(1)}=\mu _{(g)}\). \(\square \)
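
For intuition, a worked instance (ours, not in the paper): if all the component locations coincide at the population median, then \({\mathbb {P}}_a=1-{\mathbb {P}}_a=1/2\) and \({\mathbb {P}}_{c}=0\), so

$$\begin{aligned} {\mathbb {P}} \left\{ \max _i(|{\hat{\lambda }}_i|)=\infty \right\} =\left( \frac{1}{2}\right) ^n+\left( \frac{1}{2}\right) ^n=2^{1-n}, \end{aligned}$$

which is exactly of the form \(2\cdot \nu ^{n}\) with \(\nu =1/2\) appearing later in (A.15); even at this minimum, the divergence probability is positive for every finite n.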

Proof of Theorem 2.1

Suppose that \(\Psi _1\in \Gamma _c\) has a finite ith component location parameter, \(|\mu _i|<\infty \), and that \(\Psi _2\in \Gamma _c\) is a copy of \(\Psi _1\) with \(|\mu _i|\rightarrow \infty \). Note that \(f(x;\Psi _2)<f(x;\Psi _1)\) and thus \(l_n(\Psi _2)<l_n(\Psi _1)\). Moreover, for \(\Psi \in \Gamma _c\), it is easy to show that \(\sigma _j=O(\sigma _i),j\ne i\), when either \(\sigma _i\rightarrow 0\) or \(\sigma _i\rightarrow \infty \). Hence, for any \(\Psi \in \Gamma _c\) with either \(\sigma _i\rightarrow 0\) or \(\sigma _i\rightarrow \infty \), we have \(l_n(\Psi )\rightarrow -\infty \). The constraints on \(\lambda \) in \(\Gamma _c\) define a compact set and are equivalent to requiring \(\max _i |\lambda _i|\le \sqrt{c_2-1}\). From the above, we conclude that \(\sup _{\Psi \in \Gamma _c}l_n(\Psi )=\sup _{\Psi \in \Gamma ^*}l_n(\Psi )\), where

$$\begin{aligned} \Gamma ^*= & {} \{ \Psi \in \Gamma _c: |\mu _i|\le {\overline{\mu }}<\infty , 0<{\underline{\sigma }}\le \sigma _i\le {\overline{\sigma }}<\infty ,\\&\qquad |\lambda _i|\le {\overline{\lambda }}<\infty \ \text {for} \ i=1,\cdots ,g\} \end{aligned}$$

for some constants \({\overline{\mu }},{\underline{\sigma }},{\overline{\sigma }}\), and \({\overline{\lambda }}\). Hence, by the compactness of \(\Gamma ^*\) and the continuity of \(l_n(\Psi )\), there exists a \(\Psi ^*\in \Gamma _c\) such that

$$\begin{aligned} l_n(\Psi ^*)=\sup \{l_n(\Psi ): \Psi \in \Gamma _c\}=\sup \{l_n(\Psi ):\Psi \in \Gamma ^*\}. \end{aligned}$$

\(\square \)
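
To illustrate why the compact set \(\Gamma ^*\) makes the problem well posed, here is a sketch (ours; the box bounds \({\overline{\mu }},{\underline{\sigma }},{\overline{\sigma }},{\overline{\lambda }}\) are hypothetical, and a generic quasi-Newton optimizer stands in for the authors' procedure) that maximizes a two-component skew normal mixture log-likelihood over such a box:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(theta, x):
    # theta = (mixing weight on logit scale, mu1, mu2, sigma1, sigma2, lam1, lam2)
    w = 1.0 / (1.0 + np.exp(-theta[0]))
    mu, sig, lam = theta[1:3], theta[3:5], theta[5:7]
    z = (x[:, None] - mu) / sig
    comp = 2.0 / sig * norm.pdf(z) * norm.cdf(lam * z)   # skew normal densities
    dens = w * comp[:, 0] + (1.0 - w) * comp[:, 1]
    return -np.sum(np.log(dens))

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(2.0, 1.0, 150)])
bounds = [(-5.0, 5.0),                    # mixing weight (logit scale)
          (-10.0, 10.0), (-10.0, 10.0),   # |mu_i| <= mu_bar
          (0.05, 10.0), (0.05, 10.0),     # sigma_low <= sigma_i <= sigma_bar
          (-5.0, 5.0), (-5.0, 5.0)]       # |lambda_i| <= lambda_bar
res = minimize(neg_loglik, x0=[0.0, -1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
               args=(x,), method="L-BFGS-B", bounds=bounds)
print(res.x, -res.fun)   # a maximizer exists because the feasible box is compact
```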

Proof of Theorem 2.2

For \({\hat{\Psi }}_c\in \Gamma ^*\), let \(d(\cdot ,\cdot )\) be the distance on \(\Gamma ^*\) and for any \(\epsilon >0\), define

$$\begin{aligned} B(\Psi _0,\epsilon )=\{ \Psi : d(\Psi , \Psi _0)<\epsilon \} \end{aligned}$$

as an open ball centered at \(\Psi _0\) with radius \(\epsilon \). Define the complement of \(B(\Psi _0,\epsilon )\) in the space \(\Gamma ^*\) as

$$\begin{aligned} \Gamma ^*(\epsilon )=\Gamma ^* \setminus B(\Psi _0,\epsilon ). \end{aligned}$$

For any \(\delta >0\) and \(\Psi \), denote

$$\begin{aligned} f(x;\Psi ,\delta )=\sup \{f(x;\Psi '): \, d(\Psi , \Psi ')<\delta , \Psi '\in \Gamma ^*(\epsilon )\}. \end{aligned}$$

Note that \(\lim _{\delta \rightarrow 0}\log f(x;\Psi ,\delta )=\log f(x;\Psi )\) and \(\sup \{ f(x;\Psi ): \Psi \in \Gamma ^*(\epsilon ) \}<\infty \). By the dominated convergence theorem, we have

$$\begin{aligned} \underset{\delta \rightarrow 0}{{\overline{\lim }}} E_0\log f(X;\Psi ,\delta ) \le E_0\underset{\delta \rightarrow 0}{{\overline{\lim }}}\log f(X;\Psi ,\delta ) =E_0\log f(X;\Psi ), \end{aligned}$$
(A.1)

where \(E_0(\cdot )\) is the expectation with respect to the density function \(f(x;\Psi _0)\).

For any \(\Psi \in \Gamma ^*(\epsilon )\), let \(K(\Psi )=E_0\log f(X;\Psi )\). Due to the compactness of \(\Gamma ^*(\epsilon )\), there exists a \(\Psi _{\epsilon }\in \Gamma ^*(\epsilon )\) such that \(K(\Psi _{\epsilon })=\sup \{ K(\Psi ): \Psi \in \Gamma ^*(\epsilon )\}\). Further, using Jensen’s inequality, we have \(k_0=K(\Psi _0)-K(\Psi _{\epsilon })>0\). From (A.1), it is easy to show that for all \(\Psi \in \Gamma ^*(\epsilon )\), there exists a \(\delta _{\Psi }\) such that \(E_0\log f(X;\Psi ,\delta _{\Psi })\le E_0\log f(X;\Psi )+k_0/2\).

In addition, the compactness of \(\Gamma ^*(\epsilon )\) ensures the existence of a finite number of \(\Psi _1,\Psi _2,\ldots ,\Psi _h\) such that \(\Gamma ^*(\epsilon )\subset \cup _{i=1}^h B(\Psi _i,\delta _{\Psi _i})\). By Kolmogorov’s strong law of large numbers, for \(i=1,\ldots ,h\), we get

$$\begin{aligned} \begin{aligned}&n^{-1}\sum _{t=1}^n \log f(x_t;\Psi _i,\delta _{\Psi _i}) \rightarrow E_0 \log f(X;\Psi _i,\delta _{\Psi _i}), \\&n^{-1}\sum _{t=1}^n \log f(x_t;\Psi _0)\rightarrow E_0 \log f(X;\Psi _0), \end{aligned} \end{aligned}$$
(A.2)

almost surely as \(n\rightarrow \infty \). Note that

$$\begin{aligned} \begin{aligned}&E_0 \log f(X;\Psi _i,\delta _{\Psi _i})-E_0\log f(X;\Psi _i)\le k_0/2, \\&E_0\log f(X;\Psi _i)-E_0\log f(X;\Psi _0)\le -k_0. \end{aligned} \end{aligned}$$
(A.3)

Combining (A.2) and (A.3), we conclude that when \(i=1,\ldots ,h\),

$$\begin{aligned} P\bigg [\lim _{n\rightarrow \infty }\sum _{t=1}^n \{\log f(x_t;\Psi _i,\delta _{\Psi _i})-\log f(x_t;\Psi _0)\} =-\lim _{n\rightarrow \infty }\frac{k_0}{2}n=-\infty \bigg ]=1, \end{aligned}$$

or

$$\begin{aligned} \forall \epsilon >0, \sup _{\Psi \in \Gamma ^*(\epsilon )} l_n(\Psi )-l_n(\Psi _0)\rightarrow -\infty \quad \text{ almost } \text{ surely } \text{ as } n\rightarrow \infty . \end{aligned}$$

Hence, by the definition of the constrained MLE, \({\hat{\Psi }}_c\rightarrow \Psi _0\) almost surely. \(\square \)

Proof of Theorem 2.3

Since \({\hat{\Psi }}_c\) is strongly consistent, by Taylor’s expansion, we have

$$\begin{aligned} 0= & {} \frac{\partial l_n({\hat{\Psi }}_c)}{\partial \Psi }\nonumber \\= & {} \frac{\partial l_n(\Psi _0)}{\partial \Psi } +\frac{\partial ^2 l_n(\Psi _0)}{\partial \Psi \partial \Psi ^T}({\hat{\Psi }}_c-\Psi _0)\nonumber \\&+0.5({\hat{\Psi }}_c-\Psi _0)^TR_n(\Psi ')({\hat{\Psi }}_c-\Psi _0), \end{aligned}$$
(A.4)

where \(\Psi '\) is an intermediate point between \({\hat{\Psi }}_c\) and \(\Psi _0\); thus \(\Psi '\rightarrow \Psi _0\) in probability as \(n\rightarrow \infty \), which we write as \(\Psi '-\Psi _0=o_p(1)\). In addition, \(R_n(\Psi ')\) is a three-dimensional array with elements

$$\begin{aligned} R_n(i,j,k)= \bigg (\frac{\partial ^3 l_n(\Psi ')}{\partial \Psi _i\partial \Psi _j\partial \Psi _k}\bigg ), i,j,k\in \{1,\ldots ,4g-1\}. \end{aligned}$$

Based on (A.4), it is easy to obtain

$$\begin{aligned} \sqrt{n}({\hat{\Psi }}_c-\Psi _0)=- \bigg \{0.5({\hat{\Psi }}_c-\Psi _0)^T\frac{R_n(\Psi ')}{n} +\frac{1}{n}\frac{\partial ^2 l_n(\Psi _0)}{\partial \Psi \partial \Psi ^T}\bigg \}^{-1} \frac{1}{\sqrt{n}}\frac{\partial l_n(\Psi _0)}{\partial \Psi }. \end{aligned}$$

By the weak law of large numbers, we have

$$\begin{aligned} \frac{R_n(\Psi ')}{n}=O_p(1), \quad \frac{1}{n}\frac{\partial ^2 l_n(\Psi _0)}{\partial \Psi \partial \Psi ^T}=-{\mathcal {I}}(\Psi _0)+o_p(1). \end{aligned}$$

Hence, \(0.5({\hat{\Psi }}_c-\Psi _0)^T\frac{R_n(\Psi ')}{n}=o_p(1)\). Besides, by the central limit theorem,

$$\begin{aligned} \frac{1}{\sqrt{n}}\frac{\partial l_n(\Psi _0)}{\partial \Psi }\xrightarrow {{\mathcal {L}}} N(0,{\mathcal {I}}(\Psi _0)). \end{aligned}$$

Consequently, invoking Slutsky’s theorem, we have

$$\begin{aligned} \sqrt{n}({\hat{\Psi }}_c-\Psi _0)\xrightarrow {{\mathcal {L}}} N(0,{\mathcal {I}}^{-1}(\Psi _0)). \end{aligned}$$

\(\square \)
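
A quick simulation sketch (ours; a single skew normal component is used for speed, and the box constraints are hypothetical stand-ins for the paper's constraints) of this conclusion: the standardized estimator \(\sqrt{n}({\hat{\mu }}-\mu _0)\) should be approximately centered normal.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, skewnorm

mu0, sigma0, lam0, n = 0.0, 1.0, 1.0, 500

def neg_loglik(theta, x):
    mu, sigma, lam = theta
    z = (x - mu) / sigma
    return -np.sum(np.log(2.0 / sigma * norm.pdf(z) * norm.cdf(lam * z)))

rng = np.random.default_rng(4)
est = []
for _ in range(300):
    x = skewnorm.rvs(lam0, loc=mu0, scale=sigma0, size=n, random_state=rng)
    res = minimize(neg_loglik, x0=[0.1, 1.1, 0.8], args=(x,),
                   method="L-BFGS-B",
                   bounds=[(-5.0, 5.0), (0.05, 5.0), (-5.0, 5.0)])
    est.append(res.x[0])
z = np.sqrt(n) * (np.array(est) - mu0)
# mean near 0; sd approximates the square root of the corresponding
# diagonal entry of I^{-1}(Psi_0)
print(np.mean(z), np.std(z))
```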

Proof of Theorem 3.1

Before proving Theorem 3.1, we first state a useful lemma due to Chen (2017).

Lemma 7.1

Let \(X_1,\cdots ,X_n\) be i.i.d. observations from an absolutely continuous distribution F with density function f(x). Suppose f(x) is continuous and \(M=\sup _x f(x)<\infty \). Let \(F_n(x)=n^{-1}\sum _{i=1}^nI(X_i\le x)\) be the empirical distribution function. Then, as \(n\rightarrow \infty \),

$$\begin{aligned} \underset{x\in {\mathbb {R}}}{\sup } \{ F_n(x+\epsilon )- F_n(x)\}\le 2M\epsilon +10n^{-1}\log n, \end{aligned}$$

holds uniformly for all \(\epsilon >0\) almost surely.

This lemma bounds the number of observations falling in a small neighborhood of the location parameters in \(\Psi \), which is crucial for bounding \(l_n(\Psi )\) from above.
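
A quick numerical illustration (ours) of the lemma for N(0,1) data, where \(M=1/\sqrt{2\pi }\): the maximal empirical mass of any interval of width \(\epsilon \) stays below \(2M\epsilon +10n^{-1}\log n\).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
M = 1.0 / np.sqrt(2.0 * np.pi)        # sup of the N(0,1) density
x = np.sort(rng.normal(size=n))
for eps in [0.01, 0.05, 0.2]:
    # F_n(x + eps) - F_n(x), maximized over left endpoints at the sample points
    counts = np.searchsorted(x, x + eps, side="right") - np.arange(n)
    print(eps, counts.max() / n, 2.0 * M * eps + 10.0 * np.log(n) / n)
```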

Proof of Theorem 3.1

Without loss of generality, assume \(\sigma _{1}\le \sigma _{2}\le \cdots \le \sigma _{g}\). To begin, partition the parameter space \(\Gamma _n\) into

$$\begin{aligned}&\Gamma ^t_{n\sigma }=\{ \Psi \in \Gamma _n: \sigma _{t}\le \tau _{0} <\epsilon _{0}\le \sigma _{t+1} \}, \\&\Gamma _{n\lambda }=\left\{ \Psi \in \Gamma _n: \max _i\{ |\lambda _i|\}\ge \eta _0 \right\} , \\&\Gamma ^\dag =\Gamma _n\setminus \{(\cup _{t=1}^g\Gamma ^t_{n\sigma })\cup \Gamma _{n\lambda }\}, \end{aligned}$$

where \(\epsilon _0, \tau _{0}\) and \(\eta _0\) are three constants. Since \(\cup _{t=1}^g\Gamma ^t_{n\sigma }=\{\Psi \in \Gamma _n: \min _i\{\sigma _i\}\le \epsilon _{0}\}\), the subspace \(\cup _{t=1}^g\Gamma ^t_{n\sigma }\) represents all cases in which the mixing distribution has at least one component scale close to zero.

Let \(M=\sup _x f(x;\Psi _0)\) and recall \(K(\Psi _0)=E_{0}\{\log f(X;\Psi _0)\}\). We then select \(\epsilon _0, \tau _{0}, \eta _0\) satisfying the following conditions:

$$\begin{aligned} \begin{aligned}&\text{(i) } \quad (4M+1)(g-1)\epsilon _{0}\log ^2\epsilon _{0}\le 1, \\&\text{(ii) } \quad \log \epsilon _{0}+ \frac{\log ^2\epsilon _{0}}{4} \ge 2-K(\Psi _0), \\&\text{(iii) } \quad t(4M+1)\tau _0\log ^2\tau _0\le \frac{1}{2}\{K(\Psi _0)-G_{\tau _0}\}, \\&\text{(iv) } \quad \eta _0>\max _i\{ |\lambda _{0i}| \},i=1,\ldots ,g, \end{aligned} \end{aligned}$$
(A.5)

where \(\lambda _{0i}\) is the ith skewness component of \(\Psi _0\), and \(G_{\tau _0}\), which satisfies \(K(\Psi _0)-G_{\tau _0}>0\), will be specified later. The choices of \(\epsilon _0\), \(\tau _0\) and \(\eta _0\) clearly depend on \(\Psi _0\) but not on the sample size n; their existence is easy to verify. We proceed with the proof in four steps.

Step 1: First, we exclude the possibility of \({\hat{\Psi }}_n\in \Gamma ^g_{n\sigma }\). Define index sets of observations \(A_{k}=\{ i:|x_i-\mu _{k}|<|\sigma _{k}\log \sigma _{k}| \}\) for \(k=1,\ldots ,g\) and let \(n(A_{k})\) be the number of elements belonging to \(A_{k}\). Denote

$$\begin{aligned} l_n(\Psi ;A_{k}) = \sum _{i\in A_{k}} \log f(x_i;\Psi ). \end{aligned}$$

Note that for any \(i\in A_{k}\), the log-likelihood contribution of \(x_i\) is bounded by \(-\log \sigma _{k}\), while for \(i\in \cap _{k=1}^{g}A_{k}^c\), it is easy to show that \(\log f(x_i;\Psi )\le -\log \sigma _g-\frac{\log ^2\sigma _g}{2}\). Hence, we get

$$\begin{aligned} l_n(\Psi )&=\sum _{k=1}^g l_n\{\Psi ;(\cap _{i=1}^{k-1}A_{i}^c)\cap A_{k}\}+ l_n\{\Psi ;\cap _{i=1}^{g}A_{i}^c\} \\&\le \sum _{k=1}^{g} n((\cap _{i=1}^{k-1}A_{i}^c)\cap A_{k})\log \frac{1}{\sigma _k} + n(\cap _{i=1}^{g}A_{i}^c)\left( -\log \sigma _g-\frac{\log ^2\sigma _g}{2}\right) \\&= \sum _{k=1}^{g-1} n((\cap _{i=1}^{k-1}A_{i}^c)\cap A_{k})\log \frac{\sigma _g}{\sigma _k} +n(\cup _{i=1}^{g}A_{i})\log \frac{1}{\sigma _g}\\&\quad + n(\cap _{i=1}^{g}A_{i}^c)\left( -\log \sigma _g-\frac{\log ^2\sigma _g}{2}\right) \\&\le \sum _{k=1}^{g-1} n(A_{k})\log \frac{\sigma _g}{\sigma _k} +n\log \frac{1}{\sigma _g}+ n(\cap _{i=1}^{g}A_{i}^c)\left( -\frac{\log ^2\sigma _g}{2}\right) . \end{aligned}$$

For \(0<\sigma _k\le n^{-1}\), by Lemma 7.1 and \(\alpha >1\), we have

$$\begin{aligned} n(A_{k})\log \frac{\sigma _g}{\sigma _k}\le (4M+20)\log n\frac{n}{(\log n)^{\alpha }}=o(n) \end{aligned}$$

and for \(n^{-1}<\sigma _k\le e^{-2}\),

$$\begin{aligned} n(A_{k})\log \frac{\sigma _g}{\sigma _k}\le (4M+1)n\sigma _k\log ^2\sigma _k\le (4M+1)n\epsilon _0\log ^2\epsilon _0. \end{aligned}$$

For small enough \(\epsilon _0\), we have \(\sum _{i=1}^g n(A_i)<n/2\) and thus \(n(\cap _{i=1}^{g}A_{i}^c)\ge n/2\) almost surely. Using the first two conditions in (A.5), we obtain

$$\begin{aligned} l_n(\Psi )\le n(K(\Psi _0)-1). \end{aligned}$$

By the strong law of large numbers, we have \(n^{-1}l_n(\Psi _0)\xrightarrow {a.s.} K(\Psi _0)\). Thus,

$$\begin{aligned} \underset{\Psi \in \Gamma ^g_{n\sigma }}{\sup }l_n(\Psi )-l_n(\Psi _0)\le -n\rightarrow -\infty , \ \quad \text{ almost } \text{ surely } \text{ as } n\rightarrow \infty . \end{aligned}$$

Step 2: Next we prove that \({\hat{\Psi }}_n\) does not fall in \(\Gamma ^t_{n\sigma }\), \(t=1,\ldots ,g-1\), with probability one. Let \(\Gamma ^t_{\sigma }=\{ \Psi \in \Gamma : \sigma _{t}\le \tau _{0} <\epsilon _{0}\le \sigma _{t+1} \}\) for \(t=1,\ldots ,g-1\); it is clear that \(\Gamma ^t_{n\sigma }\subset \Gamma ^t_{\sigma }\). Define a distance on \(\Gamma ^t_{\sigma }\) by

$$\begin{aligned} d(\Psi , \Psi ')= \sum _{i=1}^{4g-1}|\arctan \Psi _i-\arctan \Psi '_i|, \end{aligned}$$

under which \(\Gamma ^t_{\sigma }\) is totally bounded and can be compactified. Let \({\bar{\Gamma }}^t_{\sigma }\) be the compactification of \(\Gamma ^t_{\sigma }\), which includes all its limit points. For \(\Psi \in {\bar{\Gamma }}^t_{\sigma }\), define

$$\begin{aligned} g_{t}(x;\Psi ) = \sum _{k=1}^{t}\frac{\pi _{(k)}}{\sqrt{2}} \phi \bigg (\frac{x-\mu _{k}}{\sqrt{2}\epsilon _0}\bigg ) +\sum _{k=t+1}^{g}\pi _{k}f(x;\theta _{k}), \end{aligned}$$

where \(g_{t}(x;\Psi )\) is bounded over \({\bar{\Gamma }}^t_{\sigma }\), even though \(\sigma _1=\cdots =\sigma _t=0\) is allowed. For \(\Psi \in {\bar{\Gamma }}_{\sigma }^t\), denote \(G_t(\Psi )=E_{0}\log \{g_{t}(X;\Psi )\}\) and \(G_{\tau _0}=\sup \{ G_t(\Psi ): \Psi \in {\bar{\Gamma }}_{\sigma }^t \}\). If \(\tau _0\) is small enough that \(\Psi _0\notin {\bar{\Gamma }}_{\sigma }^t\), then by Jensen's inequality, we have

$$\begin{aligned} G_{\tau _0}-K(\Psi _0)=\sup _{\Psi \in {\bar{\Gamma }}_{\sigma }^t} E_{0}\log \{g_{t}(X;\Psi )/f(X;\Psi _0)\}<0. \end{aligned}$$

Defining \(\ell _n^{t}(\Psi )=\sum _{i=1}^n \log \{ g_{t}(x_i;\Psi ) \}\) on \({\bar{\Gamma }}_{\sigma }^t\) and applying the strong law of large numbers, we have

$$\begin{aligned} \underset{\Psi \in {\bar{\Gamma }}_{\sigma }^t}{\sup } n^{-1}\{ \ell _n^{t}(\Psi )-l_n(\Psi _0) \} \rightarrow G_{\tau _0}-K(\Psi _0) \quad \text{ almost } \text{ surely }. \end{aligned}$$
(A.6)

Note that for each \(i\in A_k\), it is easy to show \(\log \{f(x_i;\Psi )\}\le -\log \sigma _{k}+ \log \{g_{t}(x_i;\Psi )\}\). For \(i \not \in A_k\), since \(|x_i-\mu _{k}|\ge |\sigma _{k}\log \sigma _{k}|\), if \(\sigma _{k}\) is small enough that \(\sigma _{k}^{-1}=\exp \{ -\log \sigma _{k} \}< \exp \{ \frac{1}{4}\log ^2\sigma _{k} \}\), then the inequalities

$$\begin{aligned} \varphi (x;\theta _{k})\le \frac{2}{\sigma _{k}}\phi \bigg (\frac{x-\mu _{k}}{\sigma _{k}}\bigg ) \le \frac{1}{\sqrt{2}} \phi \bigg (\frac{x-\mu _{k}}{\sqrt{2}\sigma _{k}}\bigg ) \le \frac{1}{\sqrt{2}} \phi \bigg (\frac{x-\mu _{k}}{\sqrt{2}\epsilon _0}\bigg ) \end{aligned}$$

hold whenever \(\sigma _{k}\le \epsilon _0\), which implies that \(\log \{f(x_i;\Psi )\}\le \log \{g_{t}(x_i;\Psi )\}\).

In summary, for \(\Psi \in {\bar{\Gamma }}_{\sigma }^t\), the log-likelihood has the following upper bound

$$\begin{aligned} l_n(\Psi )\le \ell _n^{t}(\Psi ) -\sum _{k=1}^t n(A_{k})\log \sigma _{k}. \end{aligned}$$

Moreover, when \(0<\sigma _k\le n^{-1}\), \(n(A_k)\le (4M+20)\log n\) and

$$\begin{aligned} \log \frac{1}{\sigma _k}\le \log \frac{\sigma _g}{\sigma _k}-\log \sigma _g \le \frac{n}{(\log n)^{\alpha }} -\log \epsilon _0 =O\big (n(\log n)^{-\alpha }\big ). \end{aligned}$$
(A.7)

Hence, for \(\alpha >1\),

$$\begin{aligned} n(A_{k})\log \frac{1}{\sigma _k}\le (4M+20)O(n(\log n)^{1-\alpha })=o(n) \quad \text{ almost } \text{ surely }. \end{aligned}$$

For \(n^{-1}<\sigma _k\le e^{-2}\),

$$\begin{aligned} n(A_{k})\log \frac{\sigma _g}{\sigma _k}\le (4M+1)n\sigma _k\log ^2\sigma _k\le (4M+1)n\tau _0\log ^2\tau _0. \end{aligned}$$
(A.8)

Combining (A.6)–(A.8) and recalling the third condition in (A.5), namely \(t(4M+1)\tau _0\log ^2\tau _0\le \frac{1}{2}\{K(\Psi _0)-G_{\tau _0}\}\), it can be shown that for \(t=1,\ldots ,g-1\),

$$\begin{aligned} \sup _{\Gamma _{n\sigma }^t} \ l_n(\Psi )-l_n(\Psi _0)&\le \underset{\Gamma _{n\sigma }^t}{\sup } \{ \ell _n^{t}(\Psi )-l_n(\Psi _0) \} +\underset{\Gamma _{n\sigma }^t}{\sup }\sum _{k=1}^t n(A_{k})\log \frac{1}{\sigma _{k}}+o(n) \\&\le 0.9n\{G_{\tau _0}-K(\Psi _0)\} + t(4M+1)n\tau _0\log ^2\tau _0+o(n) \\&\le 0.4\{G_{\tau _0}-K(\Psi _0)\}n+o(n)\rightarrow -\infty \ \quad \text{ almost } \text{ surely }. \end{aligned}$$

Remark 7.1

Based on the first two steps, we have excluded the possibility of the CMLE \({\hat{\Psi }}_n\) belonging to the subspace \(\Gamma _{n\sigma }=\cup _{t=1}^g\Gamma ^t_{n\sigma }\). Moreover, confining \(\Psi \) to \(\Gamma _{n\sigma }^c\) is equivalent to imposing a positive constant lower bound on \(\sigma _k\) for \(k=1,\ldots ,g\).

Step 3: We show that

$$\begin{aligned} \sup _{\Psi \in \Gamma _{n\sigma }^c\cap \Gamma _{n\lambda }} l_n(\Psi )-l_n(\Psi _0)\rightarrow -\infty \ \quad \text{ almost } \text{ surely } \text{ as } n\rightarrow \infty . \end{aligned}$$

Define \(\Gamma _{\lambda }=\{ \Psi \in \Gamma : \max _i\{ |\lambda _i|\}\ge \eta _0 \}\). It is clear that \(\Gamma _{n\lambda }\subset \Gamma _{\lambda }\) and \((\Gamma ^c_{n\sigma }\cap \Gamma _{n\lambda })\subset (\Gamma ^c_{n\sigma }\cap \Gamma _{\lambda })\). Recall \(K(\Psi )=E_0\log f(X;\Psi )\) and denote \(K_{\eta _0}=\sup \{ K(\Psi ): \Psi \in \Gamma ^c_{n\sigma }\cap \Gamma _{\lambda } \}\). Suppose \(\eta _0\) is large enough that \(\Psi _0\notin \Gamma ^c_{n\sigma }\cap \Gamma _{\lambda }\). By Jensen's inequality, \(E_{0}\log \{ f(X;\Psi )/f(X;\Psi _0)\}<0\) for any \(\Psi \in \Gamma ^c_{n\sigma }\cap \Gamma _{\lambda }\), so \(K_{\eta _0}-K(\Psi _0)<0\). Note that \(K_{\eta _0}\) is non-increasing in \(\eta _0\). Consequently, it is easy to show that

$$\begin{aligned} \underset{\Gamma ^c_{n\sigma }\cap \Gamma _{\lambda }}{\sup }\bigg \{ \frac{1}{n}\sum _{i=1}^n\log \bigg ( \frac{f(x_i;\Psi )}{f(x_i;\Psi _0)}\bigg )\bigg \} \rightarrow K_{\eta _0}-K(\Psi _0)<0 \ \quad \text{ almost } \text{ surely } \text{ as } n\rightarrow \infty . \end{aligned}$$

Furthermore, we obtain

$$\begin{aligned} \sup _{\Gamma ^c_{n\sigma }\cap \Gamma _{\lambda }} l_n(\Psi ) -l_n(\Psi _0)&= \underset{\Gamma ^c_{n\sigma }\cap \Gamma _{\lambda }}{\sup } \sum _{i=1}^n\log \bigg ( \frac{f(x_i;\Psi )}{f(x_i;\Psi _0)}\bigg ) \\&\le \frac{K_{\eta _0}-K(\Psi _0)}{2}n+o(n)\rightarrow -\infty \quad \text{ almost } \text{ surely }. \end{aligned}$$

Thus we have \(\sup _{\Gamma ^c_{n\sigma }\cap \Gamma _{n\lambda }} l_n(\Psi )-l_n(\Psi _0)\rightarrow -\infty \) almost surely as \(n\rightarrow \infty \).

Step 4: From the above, we conclude that \({\hat{\Psi }}_n\in \Gamma ^{\dagger }\) almost surely as \(n\rightarrow \infty \). Noting the compactness of \(\Gamma ^{\dagger }\), the strong consistency of \({\hat{\Psi }}_n\) follows from Theorem 2.2. \(\square \)

Proof of Theorem 3.2

The proof is almost identical to that of Theorem 2.3. The details are omitted. \(\square \)

Proof of Theorem 3.3

Let \({\hat{\lambda }}^2_{(i)}\) denote the ith order statistic of \({\hat{\lambda }}^2_1,\ldots ,{\hat{\lambda }}^2_g\). It is then easy to show that the constraint \(\max _{i,j} \ (1+\hat{\lambda }^2_i)(1+\hat{\lambda }^2_j)\le \vartheta _n\) is equivalent to \((1+\hat{\lambda }^2_{(g-1)})(1+\hat{\lambda }^2_{(g)})\le \vartheta _n\). Because \((1+\hat{\lambda }^2_{(g-1)})(1+\hat{\lambda }^2_{(g)})\le (1+\hat{\lambda }^2_{(g)})^2\), we have

$$\begin{aligned} {\mathbb {P}}\{ \hat{\varvec{\lambda }}\in \Omega _{\vartheta _n} \}&={\mathbb {P}}\{ (1+\hat{\lambda }^2_{(g-1)})(1+\hat{\lambda }^2_{(g)})\le \vartheta _n \} \nonumber \\&\ge {\mathbb {P}}\{ (1+\hat{\lambda }^2_{(g)})^2\le \vartheta _n \} \nonumber \\&={\mathbb {P}}\big \{ |\hat{\lambda }_{(g)}|\le (\vartheta ^{1/2}_n-1)^{1/2} =O(\vartheta ^{1/4}_n)\big \} \nonumber \\&={\mathbb {P}}\{ |{\hat{\lambda }}_1|\le \vartheta ^{1/4}_n,\ldots , |{\hat{\lambda }}_g|\le \vartheta ^{1/4}_n \} \nonumber \\&={\mathbb {P}}\{ |\hat{\varvec{\lambda }}|\le \vartheta ^{1/4}_n\cdot {\mathbf {1}}_g \}. \end{aligned}$$
(A.9)

in which \({\mathbf {1}}_g\) is a \(g\times 1\) vector with all elements equal to 1. By the mean value theorem for definite integrals, there exists at least one point \({\mathbf {x}}_0\) with \(|{\mathbf {x}}_0|\le \vartheta ^{1/4}_n\cdot {\mathbf {1}}_g\) such that

$$\begin{aligned} {\mathbb {P}}\{ |\hat{\varvec{\lambda }}|\le \vartheta ^{1/4}_n\cdot {\mathbf {1}}_g \} =\int _{|{\mathbf {x}}|\le \vartheta ^{1/4}_n\cdot {\mathbf {1}}_g} \phi _g({\mathbf {x}};\varvec{\lambda }_0, n^{-1}\Sigma _{\varvec{\lambda }})d{\mathbf {x}} =(2\vartheta ^{1/4}_n)^g\cdot \phi _g({\mathbf {x}}_0;\varvec{\lambda }_0, n^{-1}\Sigma _{\varvec{\lambda }}), \end{aligned}$$

in which \(\phi _g({\mathbf {x}};\varvec{\mu },\Sigma )\) denotes the g-dimensional multivariate normal density function with mean vector \(\varvec{\mu }\) and covariance matrix \(\Sigma \). As \(n\rightarrow \infty \), it can be seen that

$$\begin{aligned} \phi _g({\mathbf {x}}_0;\varvec{\lambda }_0, n^{-1}\Sigma _{\varvec{\lambda }}) =O\{n^{g/2}\exp (-\beta n)\}, \end{aligned}$$

where \(\beta \) is a positive constant. This further implies that as \(n\rightarrow \infty \) and \(\vartheta _n\rightarrow \infty \),

$$\begin{aligned} {\mathbb {P}}\{ |\hat{\varvec{\lambda }}|\le \vartheta ^{1/4}_n\cdot {\mathbf {1}}_g \} =O\{\vartheta ^{g/4}_nn^{g/2}\exp (-\beta n)\}. \end{aligned}$$
(A.10)

Combining (A.9) and (A.10) leads to

$$\begin{aligned} {\mathbb {P}}\{ \hat{\varvec{\lambda }}\in \Omega _{\vartheta _n} \} \ge O\{\vartheta ^{g/4}_nn^{g/2}\exp (-\beta n)\}. \end{aligned}$$
(A.11)

Meanwhile, for the lower bound in (A.11) to tend to one, we require

$$\begin{aligned} \lim _{n,\vartheta _n\rightarrow \infty } \vartheta ^{g/4}_nn^{g/2}\exp (-\beta n)=1. \end{aligned}$$

Consequently, we have

$$\begin{aligned} \vartheta _n=O\{ n^{-2}\exp (4\beta n/g) \}, \end{aligned}$$

which can also be derived from (A.10). Let us assume further that

$$\begin{aligned} \vartheta _n=O\{n^{-2}\exp (4\beta n/g)\}-\kappa (n), \end{aligned}$$
(A.12)

in which \(\kappa (n)=o\{ n^{-2}\exp (4\beta n/g) \}\), and thus \(\Delta _n=\kappa (n)n^{2}\exp (-4\beta n/g)=o(1)\). Under this assumption, we have

$$\begin{aligned} \vartheta ^{g/4}_nn^{g/2}\exp (-\beta n)&=\big \{ \vartheta _nn^{2}\exp (-4\beta n/g) \big \}^{g/4} \nonumber \\&=\big \{ 1- \kappa (n)n^{2}\exp (-4\beta n/g) \big \}^{g/4} \nonumber \\&=1- \frac{g}{4}\Delta _n+o(\Delta _n). \end{aligned}$$
(A.13)

Ignoring the higher order term in (A.13) and recalling the conclusion of (A.11), we get

$$\begin{aligned} {\mathbb {P}}\{ \hat{\varvec{\lambda }}\in \Omega _{\vartheta _n} \} \ge 1- \frac{g}{4}\Delta _n. \end{aligned}$$
(A.14)

On the other hand, the lower bound of \({\mathbb {P}}\{ \max _i|{\hat{\lambda }}_i|=\infty \}\) defined in (2.3) can be simplified to

$$\begin{aligned} {\mathbb {P}}\{ \max _i|{\hat{\lambda }}_i|=\infty \}\ge 2\cdot \nu ^{n}, \quad \text{ for } \nu \in (0,1). \end{aligned}$$

The existence of such a \(\nu \) follows from the fact that \(x^{n}\) is continuous and monotone for \(x\in (0,1)\). This further implies that

$$\begin{aligned} 1-{\mathbb {P}}\{ \max _i|{\hat{\lambda }}_i|=\infty \}\le 1-2\cdot \nu ^{n}. \end{aligned}$$
(A.15)

Consequently, the probability condition in (3.4) links the two bounds (A.14) and (A.15), and thus we obtain

$$\begin{aligned} 1- \frac{g}{4}\Delta _n\le {\mathbb {P}}\{ \hat{\varvec{\lambda }}\in \Omega _{\vartheta _n} \} \le 1-{\mathbb {P}}\left\{ \max _i|{\hat{\lambda }}_i|=\infty \right\} \le 1-2\cdot \nu ^{n} \end{aligned}$$

and

$$\begin{aligned} \Delta _n\ge O(\nu ^{n}). \end{aligned}$$

Therefore, by the definition of \(\Delta _n\), it is easy to show

$$\begin{aligned} \kappa (n)=\Delta _nn^{-2}\exp (4\beta n/g) \ge O\{\nu ^{n}n^{-2}\exp (4\beta n/g)\}. \end{aligned}$$

Applying this result to (A.12) finally leads to an approximate upper bound for \(\vartheta _n\), i.e.

$$\begin{aligned} \vartheta _n\le O\{(1-\nu ^{n})n^{-2}\exp (4\beta n/g)\}, \end{aligned}$$

in which \(\beta >0\) and \(0<\nu <1\). Moreover, absorbing 4 and g into the positive constant \(\beta \), the result can be simplified to

$$\begin{aligned} \vartheta _n\le O\{(1-\nu ^{n})n^{-2}\exp (\beta n)\}. \end{aligned}$$

\(\square \)
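
The constraint set \(\Omega _{\vartheta _n}\) used above is cheap to work with in practice; the following sketch (ours) checks membership via the order-statistic reduction from the start of the proof.

```python
import numpy as np

def in_Omega(lam_hat, vartheta):
    # max_{i != j} (1 + lam_i^2)(1 + lam_j^2) <= vartheta reduces to the
    # product of the two largest factors, (1 + lam_(g-1)^2)(1 + lam_(g)^2)
    s = np.sort(1.0 + np.asarray(lam_hat) ** 2)
    return s[-1] * s[-2] <= vartheta

print(in_Omega([0.5, -1.2, 3.0], vartheta=25.0))   # True:  24.4  <= 25
print(in_Omega([0.5, -1.2, 8.0], vartheta=25.0))   # False: 158.6 >  25
```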

Proof of Theorem 3.4

The proof is similar to that of Theorem 3.3 and thus omitted. \(\square \)


Cite this article

Jin, L., Chiu, S.N., Zhao, J. et al. A constrained maximum likelihood estimation for skew normal mixtures. Metrika 86, 391–419 (2023). https://doi.org/10.1007/s00184-022-00873-2

