Abstract
For a finite mixture of skew normal distributions, the maximum likelihood estimator is not well-defined: the likelihood function is unbounded when scale parameters tend to zero, and the skewness parameter estimates may diverge. To overcome these two problems simultaneously, we propose constrained maximum likelihood estimators under constraints on both the scale parameters and the skewness parameters. The proposed estimators are consistent and asymptotically efficient under relaxed constraints on the scale and skewness parameters. Numerical simulations show that in finite samples the proposed estimators outperform the ordinary maximum likelihood estimators. Two real datasets are used to illustrate the success of the proposed approach.
References
Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
Azzalini A, Arellano-Valle RB (2013) Maximum penalized likelihood estimation for skew-normal and skew-\(t\) distributions. J Stat Plann Inference 143:419–433
Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew normal distribution. J R Stat Soc Ser B Stat Methodol 61:579–602
Basso RM, Lachos VH, Cabral CRB, Ghosh P (2010) Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput Stat Data Anal 54:2926–2941
Cabral CRB, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Comput Stat Data Anal 56(1):126–142
Chanda KC (1954) A note on the consistency and maxima of the roots of likelihood equations. Biometrika 41:56–61
Chen J, Li P (2009) Hypothesis test for normal mixture models: the EM approach. Ann Statist 37:2523–2542
Chen J, Tan X, Zhang R (2008) Inference for normal mixtures in mean and variance. Statistica Sinica 18:443–465
Chen J (2017) Consistency of the MLE under mixture models. Stat Sci 32:47–63
Day NE (1969) Estimating the components of a mixture of normal distributions. Biometrika 56:463–474
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39(1):1–22
DiCiccio TJ, Monti AC (2004) Inferential aspects of the skew exponential power distribution. J Amer Statist Assoc 99:439–450
Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-\(t\) distributions. Biostatistics 11:317–336
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Statist 36(3):1324–1345
Greco L (2011) Minimum Hellinger distance based inference for scalar skew-normal and skew-t distributions. Test 20:120–137
Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Statist 13:795–800
Hathaway RJ (1986) A constrained EM algorithm for univariate normal mixtures. J Stat Comput Simul 23:211–230
Ho HJ, Pyne S, Lin TI (2012) Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Stat Comput 22(1):287–299
Jin L, Xu W, Zhu L, Zhu L (2019) Penalized maximum likelihood estimator for skew normal mixtures (in Chinese). Sci Sin Math 49:1225–1250
Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27:887–906
Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivariate Anal 100(2):257–265
Lin TI, Lee JC, Hsieh WJ (2007a) Robust mixture modeling using the skew \(t\) distribution. Stat Comput 17(2):81–92
Lin TI, Lee JC, Yen SY (2007b) Finite mixture modelling using the skew normal distribution. Statist Sinica 17:909
Lin TI, McLachlan GJ, Lee SX (2016) Extending mixtures of factor models using the restricted multivariate skew-normal distribution. J Multivariate Anal 143:398–413
Lindsay BG (1995) Mixture models: theory, geometry and applications. Institute of Mathematical Statistics and American Statistical Association
McLachlan G, Peel D (2000) Finite Mixture Models. John Wiley & Sons, New York
Nettleton D (1999) Convergence properties of the EM algorithm in constrained parameter spaces. Canad J Statist 27(3):639–648
Peters BC, Walker HF (1978) An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions. SIAM J Appl Math 35:362–378
Phillips RF (1991) A constrained maximum-likelihood approach to estimating switching regressions. J Econometrics 48:241–262
Prates MO, Cabral CRB, Lachos VH (2013) mixsmsn: Fitting Finite Mixture of Scale Mixture of Skew-Normal Distributions. J Statist Softw 54(12):1–20
Redner R (1981) Note on the consistency of the maximum likelihood estimate for nonidentifiable distributions. Ann Statist 9:225–228
Tan X, Chen J, Zhang R (2007) Consistency of the constrained maximum likelihood estimator in finite normal mixture models. In: Proceedings of the American Statistical Association, Section on Statistical Education, Alexandria, VA, pp 2113–2119
Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann Math Stat 20:595–601
Wolfowitz J (1949) On Wald's proof of the consistency of the maximum likelihood estimate. Ann Math Stat 20:601–602
Xu J, Tan X, Zhang R (2010) A note on Phillips (1991): “A constrained maximum likelihood approach to estimating switching regressions’’. J Econom 154:35–41
Zhao J, Jin L, Shi L (2015) Mixture model selection via hierarchical BIC. Comput Stat Data Anal 88:139–153
Acknowledgements
The authors gratefully acknowledge the Chief Editor, the Associate Editor and two anonymous referees for their comments and suggestions, which have led to a much improved version of this article. We are also grateful to Prof. Tsung I. Lin for his help in providing code for Bayesian estimation. This research was supported by two grants from the University Grants Council of Hong Kong, Hong Kong, China, four grants from the National Natural Science Foundation of China (NSFC: 11801370, 12161089, 11761076, 11671042), and a grant from the Yunnan Natural Science Foundation (2019FB002).
Appendix
Proofs of (2.2) and its lower bound. Equation (2.2) can be derived by considering its counterpart, the probability that all \({\hat{\lambda }}_i\) are finite, denoted by \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)<\infty \}\). It is easy to confirm that, among the n observations \(\{x_i\}_{i=1}^n\), if at least one \(x_i\) is larger than \(\mu _{(g)}\), then no \({\hat{\lambda }}_i=-\infty \); likewise, if at least one \(x_i\) is smaller than \(\mu _{(1)}\), then no \({\hat{\lambda }}_i=\infty \). Therefore, \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)<\infty \}\) equals the probability that there exists at least one \(x_i>\mu _{(g)}\) and, simultaneously, at least one \(x_j<\mu _{(1)}\).
Recall \(\mu _{(1)},\cdots ,\mu _{(g)}\); denote \(\mu _{(0)}=-\infty \) and \(\mu _{(g+1)}=\infty \), and define the probabilities \({\mathbb {P}}_{k}={\mathbb {P}}\{ X\in (\mu _{(k)}, \mu _{(k+1)})\}\) for \(k=0,1,\cdots ,g\). By the multinomial theorem, the total probability is given by
with
in which \({\mathbb {N}}\) is the set of natural numbers. The probability that all \({\hat{\lambda }}_i\) are finite can then be viewed as the probability that \(n_0\ge 1\) and \(n_g\ge 1\). Denote
and we obtain
Next, we investigate the probability \({\mathbb {P}} \{ \max _i|{\hat{\lambda }}_i|=\infty \}\). First, by taking the complement of \(\Omega _1\), we have
Then, we can conclude that
This completes the proof of equation (2.2).
At the same time, we study when \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)=\infty \}\) attains its lower bound. Let \({\mathbb {P}}\{X<\mu _{(g)}\}={\mathbb {P}}_a\), \({\mathbb {P}}\{ X>\mu _{(1)}\}={\mathbb {P}}_b\) and \({\mathbb {P}}\{X\in (\mu _{(1)}, \mu _{(g)})\}={\mathbb {P}}_{c}\). Note that \({\mathbb {P}}_b=1-{\mathbb {P}}_a+{\mathbb {P}}_{c}\); then we have
Note that \({\mathbb {P}}_a, 1-{\mathbb {P}}_a\in (0,1)\) and \({\mathbb {P}}_{c}\ge 0\). Hence, \({\mathbb {P}} \{ \max _i(|{\hat{\lambda }}_i|)=\infty \}\) attains its minimum only when \({\mathbb {P}}_{c}=0\), that is, when \(\mu _{(1)}=\mu _{(g)}\). \(\square \)
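The bound just discussed is easy to evaluate numerically. The sketch below is ours and purely illustrative: it assumes a standard normal data distribution and the inclusion–exclusion form \({\mathbb {P}}_a^n+{\mathbb {P}}_b^n-{\mathbb {P}}_c^n\) implied by the complement argument above, and confirms that the probability of a divergent skewness estimate is smallest when \(\mu _{(1)}=\mu _{(g)}\).

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_divergent(mu_1, mu_g, n, cdf=norm_cdf):
    """P{some lambda-hat diverges} for n i.i.d. draws: either no
    observation exceeds mu_(g), or none falls below mu_(1)."""
    p_a = cdf(mu_g)                         # P{X < mu_(g)}
    p_b = 1.0 - cdf(mu_1)                   # P{X > mu_(1)}
    p_c = max(cdf(mu_g) - cdf(mu_1), 0.0)   # P{X in (mu_(1), mu_(g))}
    return p_a ** n + p_b ** n - p_c ** n

# the probability is minimized when mu_(1) = mu_(g), i.e. P_c = 0
print(prob_divergent(0.0, 0.0, 50))   # 2 * 0.5**50, vanishingly small
print(prob_divergent(-1.0, 1.0, 50))  # larger, since P_c > 0
```

Both probabilities decay geometrically in n, but the lower bound \({\mathbb {P}}_a^n+(1-{\mathbb {P}}_a)^n\) is attained only in the degenerate case of equal component locations.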
Proof of Theorem 2.1
Suppose that \(\Psi _1\in \Gamma _c\) has a finite ith component location parameter, \(|\mu _i|<\infty \), and that \(\Psi _2\in \Gamma _c\) is a copy of \(\Psi _1\) with \(|\mu _i|\rightarrow \infty \). Note that \(f(x;\Psi _2)<f(x;\Psi _1)\) and thus \(l_n(\Psi _2)<l_n(\Psi _1)\). Moreover, for \(\Psi \in \Gamma _c\), it is easy to show that \(\sigma _j=O(\sigma _i)\), \(j\ne i\), when either \(\sigma _i\rightarrow 0\) or \(\sigma _i\rightarrow \infty \). Hence, for any \(\Psi \in \Gamma _c\) with \(\sigma _i\rightarrow 0\) or \(\sigma _i\rightarrow \infty \), we have \(l_n(\Psi )\rightarrow -\infty \). The constraints on \(\lambda \) in \(\Gamma _c\) define a compact set and are equivalent to requiring \(\max _i|\lambda _i|\le \sqrt{c_2-1}\). From the above, we conclude that \(\sup _{\Psi \in \Gamma _c}l_n(\Psi )=\sup _{\Psi \in \Gamma ^*}l_n(\Psi )\), where
for some constants \({\overline{\mu }},{\underline{\sigma }},{\overline{\sigma }}\), and \({\overline{\lambda }}\). Hence, by the compactness of \(\Gamma ^*\) and the continuity of \(l_n(\Psi )\), there exists a \(\Psi ^*\in \Gamma _c\) such that
\(\square \)
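To make the role of the scale constraint in \(\Gamma _c\) concrete, here is a minimal sketch, ours and not the authors' algorithm: a Hathaway-type constrained EM iteration for a plain two-component normal mixture (skewness parameters omitted for brevity), in which the M-step inflates any variance that would violate a lower bound \(c\) on the variance ratio.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def constrained_em(xs, n_iter=100, c=0.1):
    """EM for a two-component normal mixture with a Hathaway-type
    constraint min(v1, v2) >= c * max(v1, v2) on the variances,
    which keeps the likelihood bounded (no degenerate component)."""
    mu1, mu2 = min(xs), max(xs)
    spread = (max(xs) - min(xs)) / 4.0
    s1 = s2 = spread if spread > 0 else 1.0
    w = 0.5
    for _ in range(n_iter):
        # E-step: responsibilities of component 1
        r = []
        for x in xs:
            p1 = w * normal_pdf(x, mu1, s1)
            p2 = (1.0 - w) * normal_pdf(x, mu2, s2)
            r.append(p1 / (p1 + p2))
        n1 = sum(r)
        n2 = len(xs) - n1
        # M-step: weighted means and variances
        w = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / n2
        v1 = sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1
        v2 = sum((1.0 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2
        # enforce the scale constraint by inflating the smaller variance
        if v1 < c * v2:
            v1 = c * v2
        if v2 < c * v1:
            v2 = c * v1
        s1, s2 = math.sqrt(v1), math.sqrt(v2)
    return w, (mu1, s1), (mu2, s2)

# demo on synthetic, well-separated data
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(5, 2) for _ in range(100)]
w, (m1, s1), (m2, s2) = constrained_em(data)
assert min(s1, s2) ** 2 >= 0.1 * max(s1, s2) ** 2 - 1e-9  # constraint holds
```

Without the clipping step, letting one scale parameter shrink toward an observation drives the likelihood to infinity; the constraint rules out exactly these degenerate solutions, mirroring the scale constraints in \(\Gamma _c\).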
Proof of Theorem 2.2
For \({\hat{\Psi }}_c\in \Gamma ^*\), let \(d(\cdot ,\cdot )\) be the distance on \(\Gamma ^*\) and for any \(\epsilon >0\), define
as an open ball centered at \(\Psi _0\) with radius \(\epsilon \). Define the complement of \(B(\Psi _0,\epsilon )\) in the space \(\Gamma ^*\) as
For any \(\delta >0\) and \(\Psi \), denote
Note that \(\lim _{\delta \rightarrow 0}\log f(x;\Psi ,\delta )=\log f(x;\Psi )\) and \(\sup \{ f(x;\Psi ): \Psi \in \Gamma ^*(\epsilon ) \}<\infty \). By the dominated convergence theorem, we have
where \(E_0(\cdot )\) is the expectation with respect to the density function \(f(x;\Psi _0)\).
For any \(\Psi \in \Gamma ^*(\epsilon )\), let \(K(\Psi )=E_0\log f(X;\Psi )\). Due to the compactness of \(\Gamma ^*(\epsilon )\), there exists a \(\Psi _{\epsilon }\in \Gamma ^*(\epsilon )\) such that \(K(\Psi _{\epsilon })=\sup \{ K(\Psi ): \Psi \in \Gamma ^*(\epsilon )\}\). Further, using Jensen’s inequality, we have \(k_0=K(\Psi _0)-K(\Psi _{\epsilon })>0\). From (A.1), it is easy to show that for all \(\Psi \in \Gamma ^*(\epsilon )\), there exists a \(\delta _{\Psi }\) such that \(E_0\log f(X;\Psi ,\delta _{\Psi })\le E_0\log f(X;\Psi )+k_0/2\).
In addition, the compactness of \(\Gamma ^*(\epsilon )\) ensures the existence of a finite number of \(\Psi _1,\Psi _2,\ldots ,\Psi _h\) such that \(\Gamma ^*(\epsilon )\subset \cup _{i=1}^h B(\Psi _i,\delta _{\Psi _i})\). By Kolmogorov’s strong law of large numbers, for \(i=1,\ldots ,h\), we get
almost surely as \(n\rightarrow \infty \). Note that
Combining (A.2) and (A.3), we conclude that when \(i=1,\ldots ,h\),
or
Hence, by the definition of the constrained MLE, \({\hat{\Psi }}_c\rightarrow \Psi _0\) almost surely. \(\square \)
Proof of Theorem 2.3
Since \({\hat{\Psi }}_c\) is strongly consistent, by Taylor’s expansion, we have
where \(\Psi '\) is an intermediate point between \({\hat{\Psi }}_c\) and \(\Psi _0\); thus \(\Psi '\rightarrow \Psi _0\) in probability as \(n\rightarrow \infty \), which we write as \(\Psi '-\Psi _0=o_p(1)\). In addition, \(R_n(\Psi ')\) is a three-dimensional array with elements
Based on (A.4), it is easy to obtain
By the weak law of large numbers, we have
Hence, \(0.5({\hat{\Psi }}_c-\Psi _0)^T\frac{R_n(\Psi ')}{n}=o_p(1)\). Besides, by the central limit theorem,
Consequently, invoking Slutsky’s theorem, we have
\(\square \)
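As a hedged sketch of this standard argument (the notation is ours and may differ from the original displays), the score expansion around \(\Psi _0\) reads:

```latex
0 = \frac{1}{\sqrt{n}}\, l_n'(\hat{\Psi}_c)
  = \frac{1}{\sqrt{n}}\, l_n'(\Psi_0)
    + \frac{l_n''(\Psi_0)}{n}\, \sqrt{n}\,(\hat{\Psi}_c - \Psi_0)
    + \frac{1}{2}\, (\hat{\Psi}_c - \Psi_0)^{T} \frac{R_n(\Psi')}{n}\, \sqrt{n}\,(\hat{\Psi}_c - \Psi_0).
```

With \(n^{-1}l_n''(\Psi _0)\xrightarrow{p}-I(\Psi _0)\) by the weak law of large numbers, the remainder term of order \(o_p(1)\), and \(n^{-1/2}l_n'(\Psi _0)\xrightarrow{d}N(0,I(\Psi _0))\) by the central limit theorem, Slutsky's theorem yields \(\sqrt{n}({\hat{\Psi }}_c-\Psi _0)\xrightarrow{d}N(0,I^{-1}(\Psi _0))\).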
Proof of Theorem 3.1
Before stating the proof of Theorem 3.1, we first introduce a useful lemma proved by Chen (2017) as follows.
Lemma 7.1
Let \(X_1,\cdots ,X_n\) be i.i.d. observations from an absolutely continuous distribution F with density function f(x). Suppose f(x) is continuous and \(M=\sup _x f(x)<\infty \). Let \(F_n(x)=n^{-1}\sum _{i=1}^nI(X_i\le x)\) be the empirical distribution function. Then, as \(n\rightarrow \infty \),
holds uniformly for all \(\epsilon >0\) almost surely.
This lemma bounds the number of observations falling in a small neighborhood of the location parameters in \(\Psi \), which is crucial for bounding \(l_n(\Psi )\) from above.
Proof of Theorem 3.1
Without loss of generality, assume \(\sigma _{1}\le \sigma _{2}\le \cdots \le \sigma _{g}\). To begin, partition the parameter space \(\Gamma _n\) into
where \(\epsilon _0, \tau _{0}\) and \(\eta _0\) are three constants. Since \(\cup _{t=1}^g\Gamma ^t_{n\sigma }=\{\Psi \in \Gamma _n: \min _i\{\sigma _i\}\le \epsilon _{0}\}\), the subspace \(\cup _{t=1}^g\Gamma ^t_{n\sigma }\) represents all cases in which the mixing distribution has at least one component scale parameter close to zero.
Let \(M=\sup _x f(x;\Psi _0)\) and recall \(K(\Psi _0)=E_{0}\{\log f(X;\Psi _0)\}\). We then select \(\epsilon _0, \tau _{0}, \eta _0\) satisfying the following conditions:
where \(\lambda _{0i}\) denotes an element of \(\Psi _0\), and \(G_{\tau _0}\), which satisfies \(K(\Psi _0)-G_{\tau _0}>0\), will be specified later. The choice of \(\epsilon _0\) and \(\eta _0\) clearly depends on \(\Psi _0\) but not on the sample size n; moreover, the existence of \(\epsilon _0\) and \(\eta _0\) is easy to verify. We proceed with the proof in four steps.
Step 1: First, we exclude the possibility that \({\hat{\Psi }}_n\in \Gamma ^g_{n\sigma }\). Define the index sets of observations \(A_{k}=\{ i:|x_i-\mu _{k}|<|\sigma _{k}\log \sigma _{k}| \}\) for \(k=1,\ldots ,g\), and let \(n(A_{k})\) be the number of elements of \(A_{k}\). Denote
Note that for any \(i\in A_{k}\), the log-likelihood contribution of \(x_i\) is bounded by \(-\log \sigma _{k}\), while for \(i\in \cap _{k=1}^{g}A_{k}^c\) it is easy to show that \(\log f(x_i;\Psi )\le -\log \sigma _g-\frac{\log ^2\sigma _g}{2}\). Hence, we get
For \(0<\sigma _k\le n^{-1}\), by Lemma 7.1 and \(\alpha <1\), we have
and for \(n^{-1}<\sigma _k\le e^{-2}\),
For small enough \(\epsilon _0\), we have \(\sum _{i=1}^g n(A_i)<n/2\) and \(n(\cap _{i=1}^{g}A_{i}^c)\ge n/2\) almost surely. By the first two conditions in (A.5), we obtain
By the strong law of large numbers, we have \(n^{-1}l_n(\Psi _0)\xrightarrow {a.s.} K(\Psi _0)\). Thus,
Step 2: Next, we prove that \({\hat{\Psi }}_n\) does not lie in \(\Gamma ^t_{n\sigma }\), \(t=1,\ldots ,g-1\), with probability one. Let \(\Gamma ^t_{\sigma }=\{ \Psi \in \Gamma : \sigma _{t}\le \tau _{0} <\epsilon _{0}\le \sigma _{t+1} \}\) for \(t=1,\ldots ,g-1\); clearly \(\Gamma ^t_{n\sigma }\subset \Gamma ^t_{\sigma }\). Define a distance on \(\Gamma ^t_{\sigma }\) by
under which \(\Gamma ^t_{\sigma }\) is totally bounded and can be compactified. Let \({\bar{\Gamma }}^t_{\sigma }\) be the compactification of \(\Gamma ^t_{\sigma }\), which includes all limit points. For \(\Psi \in {\bar{\Gamma }}^t_{\sigma }\), define
where \(g_{t}(x;\Psi )\) is bounded over \({\bar{\Gamma }}^t_{\sigma }\), even though \(\sigma _1=\cdots =\sigma _t=0\) is allowed. For \(\Psi \in {\bar{\Gamma }}_{\sigma }^t\), denote \(G_t(\Psi )=E_{0}\log \{g_{t}(X;\Psi )\}\) and \(G_{\tau _0}=\sup \{ G_t(\Psi ): \Psi \in {\bar{\Gamma }}_{\sigma }^t \}\). If \(\tau _0\) is small enough that \(\Psi _0\notin {\bar{\Gamma }}_{\sigma }^t\), then by Jensen's inequality we have
Define \(\ell _n^{t}(\Psi )=\sum _{i=1}^n \log \{ g_{t}(x_i;\Psi ) \}\) on \({\bar{\Gamma }}_{\sigma }^t\); applying the strong law of large numbers, we have
Note that for each \(i\in A_k\), it is easy to show \(\log \{f(x_i;\Psi )\}\le -\log \sigma _{k}+ \log \{g_{t}(x_i;\Psi )\}\). For \(i \not \in A_k\), since \(|x_i-\mu _{k}|\ge |\sigma _{k}\log \sigma _{k}|\), if \(\sigma _{k}\) is small enough that \(\sigma _{k}^{-1}=\exp \{ -\log \sigma _{k} \}< \exp \{ \frac{1}{4}\log ^2\sigma _{k} \}\), then the inequalities
hold with \(\sigma _{k}\le \epsilon _0\), which imply that \(\log \{f(x_i;\Psi )\}\le \log \{g_{t}(x_i;\Psi )\}\).
In summary, for \(\Psi \in {\bar{\Gamma }}_{\sigma }^t\), the log-likelihood has the following upper bound
Moreover, when \(0<\sigma _k\le n^{-1}\), \(n(A_k)\le (4M+20)\log n\) and
Hence, for \(\alpha >1\),
For \(n^{-1}<\sigma _k\le e^{-2}\),
Combining (A.6)–(A.8) and recalling the third condition in (A.5), namely \(t(4M+1)\tau _0\log ^2\tau _0\le \frac{1}{2}\{K(\Psi _0)-G_{\tau _0}\}\), it can be shown that for \(t=1,\ldots ,g-1\),
Remark 7.1
Based on the first two steps, we have excluded the possibility that the CMLE \({\hat{\Psi }}_n\) belongs to the subspace \(\Gamma _{n\sigma }=\cup _{t=1}^g\Gamma ^t_{n\sigma }\). Moreover, confining \(\Psi \) to \(\Gamma _{n\sigma }^c\) is equivalent to imposing a positive constant lower bound on \(\sigma _k\) for \(k=1,\ldots ,g\).
Step 3: We now show that
Define \(\Gamma _{\lambda }=\{ \Psi \in \Gamma : \max _i\{ |\lambda _i|\}\ge \eta _0 \}\). It is clear that \(\Gamma _{n\lambda }\subset \Gamma _{\lambda }\) and \((\Gamma ^c_{n\sigma }\cap \Gamma _{n\lambda })\subset (\Gamma ^c_{n\sigma }\cap \Gamma _{\lambda })\). Recall \(K(\Psi )=E_0\log f(X;\Psi )\) and denote \(K_{\eta _0}=\sup \{ K(\Psi ): \Psi \in \Gamma ^c_{n\sigma }\cap \Gamma _{\lambda } \}\). Suppose \(\eta _0\) is large enough that \(\Psi _0\notin \Gamma ^c_{n\sigma }\cap \Gamma _{\lambda }\). By Jensen's inequality, \(E_{0}\log \{ f(X;\Psi )/f(X;\Psi _0)\}<0\) for any \(\Psi \in \Gamma ^c_{n\sigma }\cap \Gamma _{\lambda }\), so \(K_{\eta _0}-K(\Psi _0)<0\). Note that \(K_{\eta _0}\) is an increasing function of \(\eta _0\). Consequently, it is easy to show that
Furthermore, we obtain
Thus we have \(\sup _{\Gamma ^c_{n\sigma }\cap \Gamma _{n\lambda }} l_n(\Psi )-l_n(\Psi _0)\rightarrow -\infty \) almost surely as \(n\rightarrow \infty \).
Step 4: From the above, we conclude that \({\hat{\Psi }}_n\in \Gamma ^{\dagger }\) almost surely as n goes to infinity. By the compactness of \(\Gamma ^{\dagger }\), the strong consistency of \({\hat{\Psi }}_n\) follows from Theorem 2.2. \(\square \)
Proof of Theorem 3.2
The proof is almost identical to that of Theorem 2.3. The details are omitted. \(\square \)
Proof of Theorem 3.3
Let \({\hat{\lambda }}^2_{(i)}\) be the ith order statistic of \({\hat{\lambda }}^2_1,\ldots ,{\hat{\lambda }}^2_g\). It is then easy to show that the constraint \(\max _{i,j} \ (1+\hat{\lambda }^2_i)(1+\hat{\lambda }^2_j)\le \vartheta _n\) is equivalent to \((1+\hat{\lambda }^2_{(g-1)})(1+\hat{\lambda }^2_{(g)})\le \vartheta _n\). Because \((1+\hat{\lambda }^2_{(g-1)})(1+\hat{\lambda }^2_{(g)})\le (1+\hat{\lambda }^2_{(g)})^2\), we have
in which \({\mathbf {1}}_g\) is a \(g\times 1\) vector with all elements equal to 1. By the mean value theorem for definite integrals, there exists at least one point \(|{\mathbf {x}}_0|\le \vartheta ^{1/4}_n\cdot {\mathbf {1}}_g\) such that
in which \(\phi _g({\mathbf {x}};\varvec{\mu },\Sigma )\) denotes the g-dimensional multivariate normal density function with mean vector \(\varvec{\mu }\) and covariance matrix \(\Sigma \). As \(n\rightarrow \infty \), it can be seen that
where \(\beta \) is a positive constant. This further implies that as \(n\rightarrow \infty \) and \(\vartheta _n\rightarrow \infty \),
Combining (A.9) and (A.10) leads to
Meanwhile, it is easy to verify that
Consequently, we have
which can also be derived from (A.10). Let us assume further that
in which \(\kappa (n)=o\{ n^{-2}\exp (4\beta n/g) \}\), and thus \(\Delta _n=\kappa (n)n^{2}\exp (-4\beta n/g)=o(1)\). Under this assumption, we have
Ignoring the higher order term in (A.13) and recalling the conclusion of (A.11), we get
On the other hand, the lower bound of \({\mathbb {P}}\{ \max _i|{\hat{\lambda }}_i|=\infty \}\) defined in (2.3) can be simplified to
The existence of \(\nu \) follows from the fact that \(x^{n}\) is continuous and monotone for \(x\in (0,1)\). This further implies that
Consequently, the probability condition of (3.4) links the two bounds in (A.14) and (A.15), and thus we obtain
and
Therefore, by the definition of \(\Delta _n\), it is easy to show
Applying this result to (A.12) finally leads to an approximate upper bound for \(\vartheta _n\), i.e.
in which \(\beta >0\) and \(0<\nu <1\). Moreover, absorbing 4 and g into the positive constant \(\beta \), the result can be simplified to
\(\square \)
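The reduction of the pairwise constraint to the two largest order statistics, used at the start of the proof of Theorem 3.3, is easy to verify numerically. The sketch below is ours, for illustration only: it brute-forces \(\max _{i\ne j}(1+\lambda _i^2)(1+\lambda _j^2)\) and compares the result with the product built from the two largest \(\lambda _i^2\).

```python
import itertools
import random

def max_pairwise(lams):
    # brute force over all unordered pairs i != j
    return max((1 + a * a) * (1 + b * b)
               for a, b in itertools.combinations(lams, 2))

def top_two_order_stats(lams):
    # product built from the two largest squared skewness values
    s = sorted(l * l for l in lams)
    return (1 + s[-2]) * (1 + s[-1])

# the two formulas agree for arbitrary skewness vectors
random.seed(1)
for _ in range(100):
    lams = [random.uniform(-5, 5) for _ in range(6)]
    assert abs(max_pairwise(lams) - top_two_order_stats(lams)) < 1e-9
```

The equivalence holds because \((1+a^2)(1+b^2)\) is increasing in each of \(a^2\) and \(b^2\), so the maximum over pairs is attained at the two largest squared values.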
Proof of Theorem 3.4
The proof is similar to that of Theorem 3.3 and thus omitted. \(\square \)
Jin, L., Chiu, S.N., Zhao, J. et al. A constrained maximum likelihood estimation for skew normal mixtures. Metrika 86, 391–419 (2023). https://doi.org/10.1007/s00184-022-00873-2