Estimation for a Class of Semiparametric Pareto Mixture Densities

Sankhya A

Abstract

We study the estimation of a class of semiparametric mixture models in which the model has a symmetric nonparametric component and a parametric Pareto component with unknown parameters. We construct an estimation procedure that minimizes a criterion function after handling the jump point of the mixture density. We study the large-sample properties of the proposed estimator and prove consistency and asymptotic normality of the parameter estimators. For the nonparametric component, we derive the bias and variance and give a rule-of-thumb bandwidth selection method. Simulation studies indicate that the proposed methodology performs well.
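
The model in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and rests on assumptions not taken from the paper: the Pareto component is Pareto type I with scale (jump point) `xm` and shape `shape`, its mixing weight is `lam`, and a normal density stands in for the unknown symmetric component. Because the Pareto density is zero below `xm` and equals `shape / xm` just above it, the mixture density jumps by `lam * shape / xm` at `xm`. The hypothetical helper `estimate_jump_point` locates that jump with a simple one-sided window difference in the spirit of Chu and Cheng (1996); it is not the authors' procedure.

```python
import math
import random
from bisect import bisect_left


def sample_pareto_mixture(n, lam, xm, shape, mu=0.0, sigma=1.0, seed=None):
    """Draw n points from lam * Pareto(xm, shape) + (1 - lam) * N(mu, sigma^2).

    The normal law is only a stand-in for the unknown symmetric component.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < lam:
            # random.paretovariate has minimum 1; rescale so the minimum is xm
            draws.append(xm * rng.paretovariate(shape))
        else:
            draws.append(rng.gauss(mu, sigma))
    return draws


def mixture_pdf(x, lam, xm, shape, mu=0.0, sigma=1.0):
    """Density of the mixture above; it jumps by lam * shape / xm at x = xm."""
    pareto = shape * xm ** shape / x ** (shape + 1) if x >= xm else 0.0
    z = (x - mu) / sigma
    normal = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return lam * pareto + (1.0 - lam) * normal


def estimate_jump_point(data, h, grid):
    """Grid point maximising |#[x, x+h) - #[x-h, x)| / (n h), and that value."""
    xs = sorted(data)
    n = len(xs)

    def count(a, b):
        # number of observations in the half-open interval [a, b)
        return bisect_left(xs, b) - bisect_left(xs, a)

    best_x, best_gap = None, -1.0
    for x in grid:
        gap = abs(count(x, x + h) - count(x - h, x)) / (n * h)
        if gap > best_gap:
            best_x, best_gap = x, gap
    return best_x, best_gap


data = sample_pareto_mixture(20000, lam=0.5, xm=2.0, shape=3.0, seed=1)
grid = [1.0 + 0.01 * k for k in range(301)]  # candidate jump points in [1, 4]
x_hat, gap = estimate_jump_point(data, h=0.1, grid=grid)
print(f"estimated jump point {x_hat:.2f}, estimated jump size {gap:.3f}")
```

With this seed the estimated jump point should fall close to 2, and the estimated jump size somewhat below the true value 0.5 · 3 / 2 = 0.75, because the Pareto density decreases to the right of `xm` and the window averages over that decrease.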

Figure 1
Figure 2
Figure 3

References

  • Benaglia, T., Chauveau, D. and Hunter, D. (2009). An EM-like algorithm for semi- and nonparametric estimation in multivariate mixtures. Journal of Computational and Graphical Statistics 18, 2, 505–526.

  • Bordes, L., Delmas, C. and Vandekerkhove, P. (2006a). Semiparametric estimation of a two-component mixture model where one component is known. Scandinavian Journal of Statistics 33, 4, 733–752.

  • Bordes, L., Mottelet, S. and Vandekerkhove, P. (2006b). Semiparametric estimation of a two-component mixture model. Annals of Statistics 34, 3, 1204–1232.

  • Bordes, L., Chauveau, D. and Vandekerkhove, P. (2007). A stochastic EM algorithm for a semiparametric mixture model. Computational Statistics & Data Analysis 51, 5429–5443.

  • Bordes, L. and Vandekerkhove, P. (2010). Semiparametric two-component mixture model with a known component: an asymptotically normal estimator. Mathematical Methods of Statistics 19, 1, 22–41.

  • Chu, C. K. and Cheng, P. E. (1996). Estimation of jump points and jump values of a density function. Statistica Sinica 6, 1, 79–95.

  • Efron, B. (2007). Size, power and false discovery rates. Annals of Statistics 35, 1351–1377.

  • Gayraud, G. (2002). Minimax estimation of a discontinuity for the density. Journal of Nonparametric Statistics 14, 1-2, 59–66.

  • Hall, P. and Zhou, X. (2003). Nonparametric estimation of component distributions in a multivariate mixture. Annals of Statistics, 201–224.

  • Hohmann, D. and Holzmann, H. (2013). Semiparametric location mixtures with distinct components. Statistics: A Journal of Theoretical and Applied Statistics 47, 348–362.

  • Huang, M., Wang, S., Wang, H. and Jin, T. (2018). Maximum smoothed likelihood estimation for a class of semiparametric Pareto mixture densities. Statistics and Its Interface 11, 1, 31–40.

  • Hunter, D., Wang, S. and Hettmansperger, T. (2007). Inference for mixtures of symmetric distributions. Annals of Statistics 35, 1, 224–251.

  • Jiang, G. and Wang, H. (2005). Should firms with two consecutive annual losses be specially treated (ST)? Economic Research Journal 38, 2, 894–942.

  • Kraft, C., Lepage, Y. and Van Eeden, C. (1985). Estimation of a symmetric density function. Communications in Statistics: Theory and Methods 14, 2, 273–288.

  • Lehmann, E. L. (1999). Elements of Large-Sample Theory. Springer, New York.

  • Levine, M., Hunter, D. R. and Chauveau, D. (2011). Maximum smoothed likelihood for multivariate mixtures. Biometrika 98, 2, 403.

  • Li, Q. and Racine, J. (2007). Nonparametric econometrics: Theory and practice. Princeton University Press, Princeton.

  • Ma, Y. and Yao, W. (2015). Flexible estimation of a semiparametric two-component mixture model with one parametric component. Electronic Journal of Statistics 9, 1, 444–474.

  • Maiboroda, R. and Sugakova, O. (2010a). Generalized estimating equations for symmetric distributions observed with admixture. Communications in Statistics: Theory and Methods 40, 1, 96–116.

  • Maiboroda, R. and Sugakova, O. (2010b). Nonparametric density estimation for symmetric distributions by contaminated data. Metrika, 1–18.

  • McLachlan, G. and Peel, D. (2004). Finite mixture models. Wiley, New York.

  • Naylor, J. and Smith, A. (1983). A contamination model in clinical chemistry: an illustration of a method for the efficient computation of posterior distributions. The Statistician, 82–87.

  • Nguyen, V. H. and Matias, C. (2014). On efficient estimators of the proportion of true null hypotheses in a multiple testing setup. Statistics and Its Interface 41, 1167–1194.

  • Pal, C. and SenGupta, A. (2000). Optimal tests for no contamination in reliability models. Lifetime Data Analysis 6, 3, 281–290.

  • Patra, R. K. and Sen, B. (2016). Estimation of a two-component mixture model with applications to multiple testing. Journal of the Royal Statistical Society: Series B 78, 869–893.

  • Silverman, B. (1986). Density estimation for statistics and data analysis, 26. Chapman & Hall/CRC, CRC Press.

  • Wang, H., Li, G. and Jiang, G. (2007). Robust regression shrinkage and consistent variable selection through the LAD-lasso. Journal of Business and Economic Statistics 25, 3, 347–355.

  • Xiang, S., Yao, W. and Wu, J. (2014). Minimum profile Hellinger distance estimation for a semiparametric mixture model. The Canadian Journal of Statistics 42, 246–267.

  • Xiang, S., Yao, W. and Yang, G. (2019). An overview of semiparametric extensions of finite mixture models. Statistical Science 34, 3, 391–404.

Acknowledgments

We thank Dr. Jiamin Zhang for support with writing assistance, technical editing and language editing in the preparation of the paper.

Author information

Correspondence to Xiyang Wang.

Appendices

Appendix A: Regularity conditions

  (i) The kernel density function K is even, bounded, uniformly continuous, square integrable and of bounded variation, and has a finite second moment.

  (ii) The first derivative \(K^{\prime }\) of K satisfies \(K^{\prime }\in L^{1}(\mathbb {R})\) and \(K^{\prime }(x)\rightarrow 0\) as \(|x|\rightarrow \infty \).

  (iii) Let ω denote the square root of the modulus of continuity of K. Then

    $$ {\int_{0}^{1}}(\log(1/u))^{1/2}\,d\omega(u)<\infty. $$

  (iv) \(h_{n}\rightarrow 0\), \(nh_{n}\rightarrow \infty \) and \(nh_{n}^{4}\rightarrow 0\).

  (v) \(nh_{n}/|\log {h_{n}}|\rightarrow \infty \), \(|\log {h_{n}}|/\log \log {n}\rightarrow \infty \), and there exists a constant c > 0 such that \(h_{n}\le c\,h_{2n}\) for all n ≥ 1.

  (vi) \(|\log {h_{n}}|/(nh_{n}^{3})\rightarrow 0\).

  (vii) F is strictly increasing on \(\mathbb {R}\).

  (viii) P is twice continuously differentiable, and G is piecewise twice continuously differentiable. All second derivatives of P and G are in \(L^{1}(\mathbb {R})\).
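
To see which bandwidths conditions (iv) to (vi) admit, here is a minimal sketch assuming a polynomial bandwidth \(h_{n}=Cn^{-\beta}\) with fixed C > 0. This family is an illustrative assumption; the paper's rule-of-thumb bandwidth is not reproduced here, and the function names are hypothetical. Since \(|\log h_{n}|\approx\beta\log n\), every limit reduces to a comparison of exponents, and together the conditions hold exactly when 1/4 < β < 1/3. In particular the familiar order \(n^{-1/5}\) violates \(nh_{n}^{4}\rightarrow 0\), so these conditions call for undersmoothing.

```python
import math


def violated_conditions(beta):
    """Conditions (iv)-(vi) that fail for h_n = C * n**(-beta) with fixed C > 0.

    For such a bandwidth |log h_n| ~ beta * log n, so each limit reduces to a
    comparison of exponents; log factors only decide the boundary cases.
    """
    if beta <= 0:
        raise ValueError("need beta > 0 so that h_n -> 0")
    # With beta > 0 these always hold: h_n -> 0, |log h_n| / log log n -> infinity,
    # and h_n <= c h_{2n} with c = 2**beta.
    violated = []
    if beta >= 1:  # n h_n = C n^(1 - beta) does not diverge
        violated.append("n h_n -> infinity")
        violated.append("n h_n / |log h_n| -> infinity")
    if beta <= 0.25:  # n h_n^4 = C^4 n^(1 - 4 beta) does not vanish
        violated.append("n h_n^4 -> 0")
    if beta >= 1.0 / 3.0:  # n h_n^3 = C^3 n^(1 - 3 beta) stays bounded while |log h_n| grows
        violated.append("|log h_n| / (n h_n^3) -> 0")
    return violated


def condition_terms(n, beta, C=1.0):
    """Numerical values of two of the limiting ratios at sample size n."""
    h = C * n ** (-beta)
    return {"n h^4": n * h ** 4, "|log h| / (n h^3)": abs(math.log(h)) / (n * h ** 3)}


for beta in (0.2, 0.3, 1.0 / 3.0):
    print(round(beta, 3), violated_conditions(beta) or "all of (iv)-(vi) hold")
for n in (1e4, 1e6, 1e9):
    print(int(n), condition_terms(n, beta=0.3))
```

For β = 0.3 and C = 1 the ratio \(|\log h_{n}|/(nh_{n}^{3})\) behaves like \(0.3\log n/n^{0.1}\), which is not even monotone for small n (it peaks near \(n=e^{10}\)), so the asymptotic regime is reached slowly.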

Proof of Theorem 1

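The proof of the first part is similar to that of Lemma 1 of Hunter et al. (2007). Because \(d(\boldsymbol {\theta }_{0})=0\) and \(d_{n}(\boldsymbol {\hat {\theta }})\le d_{n}(\hat {\alpha },\boldsymbol {\eta }_{0})\) (since \(\boldsymbol {\hat {\eta }}\) minimizes \(d_{n}(\hat {\alpha },\cdot )\)),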

$$ \begin{array}{rl} d(\boldsymbol{\hat{\theta}}) & \le [d(\boldsymbol{\hat{\theta}})-d_{n}(\boldsymbol{\hat{\theta}})] + d_{n}(\hat{\alpha},\boldsymbol{\eta}_{0})\\ & \le |d(\boldsymbol{\hat{\theta}})-d_{n}(\boldsymbol{\hat{\theta}})| + |d(\boldsymbol{\theta}_{0}) - d_{n}(\hat{\alpha},\boldsymbol{\eta}_{0})|, \end{array} $$

this inequality, together with part (iii) of Lemma 2 and Lemma 3, implies that \(d(\boldsymbol {\hat {\theta }}) = o_{a.s.}(n^{-1/2+\gamma })\) for all γ > 0. By parts (i) and (ii) of Lemma 2, \(\boldsymbol {\hat {\theta }}\) converges to \(\boldsymbol {\theta }_{0}\) almost surely. We now prove the second part. By part (iv) of Lemma 2, there exist constants \(r_{1}>0\) and \(c_{1}>0\) such that

$$ d(\alpha_{0}, \boldsymbol{\eta}_{0}+\boldsymbol{v})\ge c_{1}\|\boldsymbol{v}\|^{2}, \quad \text{for all } \|\boldsymbol{v}\|<r_{1},\ \boldsymbol{v}\in\mathbb{R}^{3}, \tag{4.1} $$

where \(\|\cdot\|\) denotes the Euclidean norm. Since d is Lipschitz continuous and \(\hat {\alpha }-\alpha _{0}\) converges to 0 at rate \((n/h_{n})^{-1/2}=n^{-1/2}h_{n}^{1/2}\), which is \(o(n^{-1/2})\) because \(h_{n}\rightarrow 0\), it follows that \(|d(\alpha _{0},\boldsymbol {\hat {\eta }}) - d(\hat {\alpha },\boldsymbol {\hat {\eta }})|=o_{a.s.}(n^{-1/2})\). Note that

$$ |d(\alpha_{0},\boldsymbol{\hat{\eta}})| \le |d(\alpha_{0},\boldsymbol{\hat{\eta}}) - d(\hat{\alpha},\boldsymbol{\hat{\eta}})| + d(\hat{\alpha},\boldsymbol{\hat{\eta}}), $$

where \((\hat {\alpha },\boldsymbol {\hat {\eta }}) = \boldsymbol {\hat {\theta }}\). Therefore, \(|d(\alpha _{0},\boldsymbol {\hat {\eta }})| = o_{a.s.}(n^{-1/2+\gamma })\) for all γ > 0. Inequality (4.1) then implies the desired rate for \(\|\boldsymbol {\hat {\eta }}-\boldsymbol {\eta }_{0}\|\) and \(\|\boldsymbol {\hat {\theta }}-\boldsymbol {\theta }_{0}\|\).
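
The last step can be spelled out as follows. Because \(\boldsymbol {\hat {\eta }}\rightarrow \boldsymbol {\eta }_{0}\) almost surely by the first part, \(\|\boldsymbol {\hat {\eta }}-\boldsymbol {\eta }_{0}\|<r_{1}\) for all large n, so (4.1) applies with \(\boldsymbol {v}=\boldsymbol {\hat {\eta }}-\boldsymbol {\eta }_{0}\). The exponent form below is our reading of "the desired rate", since the statement of Theorem 1 is not part of this excerpt.

$$ c_{1}\|\boldsymbol{\hat{\eta}}-\boldsymbol{\eta}_{0}\|^{2} \le |d(\alpha_{0},\boldsymbol{\hat{\eta}})| = o_{a.s.}(n^{-1/2+\gamma}) \quad\Longrightarrow\quad \|\boldsymbol{\hat{\eta}}-\boldsymbol{\eta}_{0}\| = o_{a.s.}(n^{-1/4+\gamma/2}) \ \text{ for all } \gamma>0, $$

$$ \|\boldsymbol{\hat{\theta}}-\boldsymbol{\theta}_{0}\| \le |\hat{\alpha}-\alpha_{0}| + \|\boldsymbol{\hat{\eta}}-\boldsymbol{\eta}_{0}\| = o_{a.s.}(n^{-1/4+\gamma/2}), $$

where the α-term, of order \((n/h_{n})^{-1/2}=o(n^{-1/2})\), is negligible.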

Proof of Theorem 2

Recall that we use dots and double dots to denote the first and second derivatives of a function with respect to η. Since \(\boldsymbol {\hat {\eta }}\) is the minimizer of \(d_{n}(\hat {\alpha },\boldsymbol {\eta })\), \(\dot {d_{n}}(\hat {\alpha },\boldsymbol {\hat {\eta }})=0\). A Taylor expansion of \(\dot {d_{n}}(\hat {\alpha },\boldsymbol {\hat {\eta }})\) around \((\hat {\alpha },\boldsymbol {\eta }_{0})\) gives

$$ \ddot{d}_{n}(\hat{\alpha},\boldsymbol{\eta}^{*}) \sqrt{n}(\boldsymbol{\hat{\eta}}-\boldsymbol{\eta}_{0}) = -\sqrt{n}\dot{d}_{n}(\hat{\alpha},\boldsymbol{\eta}_{0}), $$

where \(\boldsymbol {\eta }^{*}\) lies on the line segment connecting \(\boldsymbol {\hat {\eta }}\) and \(\boldsymbol {\eta }_{0}\).

Regularity condition (viii) implies that \(\dot {d}_{n}(\alpha ,\boldsymbol {\eta })\) is Lipschitz continuous with respect to α. Hence there exists a constant \(c_{2}\), independent of n, such that

$$ |\dot{d}_{n}(\alpha_{1},\boldsymbol{\eta}_{0}) - \dot{d}_{n}(\alpha_{2},\boldsymbol{\eta}_{0})|\le c_{2}|\alpha_{1}-\alpha_{2}|. $$

Since \(\hat {\alpha }-\alpha _{0}\) converges to 0 at rate \((n/h_{n})^{-1/2}=n^{-1/2}h_{n}^{1/2}\) and \(h_{n}\rightarrow 0\),

$$ |\dot{d}_{n}(\hat{\alpha},\boldsymbol{\eta}_{0}) - \dot{d}_{n}(\alpha_{0},\boldsymbol{\eta}_{0})|=o_{p}(n^{-1/2}). $$

By the proof of Theorem 3.2 of Bordes and Vandekerkhove (2010),

$$ \dot{d}_{n}(\boldsymbol{\theta}_{0}) = \frac{2}{n}\sum\limits_{i=1}^{n} H(X_{i};\boldsymbol{\theta}_{0},\hat{F}_{n}) \dot{H}(X_{i};\boldsymbol{\theta}_{0},F) +o_{a.s.}(n^{-1/2}). $$

It follows that

$$ \dot{d}_{n} (\hat{\alpha},\boldsymbol{\eta}_{0}) = \frac{2}{n}\sum\limits_{i=1}^{n} H(X_{i};\boldsymbol{\theta}_{0},\hat{F}_{n}) \dot{H}(X_{i};\boldsymbol{\theta}_{0},F) +o_{p}(n^{-1/2}). $$

Arguing as in the proof of Theorem 3.2 of Bordes and Vandekerkhove (2010), we can show that

$$ \ddot{d}_{n}(\hat{\alpha},\boldsymbol{\eta}^{*}) \rightarrow -\boldsymbol{J}(\boldsymbol{\theta}_{0}) \text{ a.s. as } n\rightarrow\infty. $$

The remaining part of the proof is similar to that of Theorem 3.2 of Bordes and Vandekerkhove (2010) and is omitted here.
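
Putting the displays of this proof together gives the representation below. It is a sketch that assumes \(\boldsymbol {J}(\boldsymbol {\theta }_{0})\) is invertible and that the normalized sum is \(O_{p}(1)\); the omitted remainder of the argument then has to establish asymptotic normality of that sum.

$$ \sqrt{n}(\boldsymbol{\hat{\eta}}-\boldsymbol{\eta}_{0}) = -\ddot{d}_{n}(\hat{\alpha},\boldsymbol{\eta}^{*})^{-1}\sqrt{n}\,\dot{d}_{n}(\hat{\alpha},\boldsymbol{\eta}_{0}) = \boldsymbol{J}(\boldsymbol{\theta}_{0})^{-1}\,\frac{2}{\sqrt{n}}\sum\limits_{i=1}^{n} H(X_{i};\boldsymbol{\theta}_{0},\hat{F}_{n}) \dot{H}(X_{i};\boldsymbol{\theta}_{0},F) + o_{p}(1). $$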

About this article

Cite this article

Zheng, J., Wang, X. Estimation for a Class of Semiparametric Pareto Mixture Densities. Sankhya A 84, 609–627 (2022). https://doi.org/10.1007/s13171-020-00208-1

  • DOI: https://doi.org/10.1007/s13171-020-00208-1
