Abstract
In this paper, we consider a flexible semiparametric approach for estimating multivariate probability mass functions. The corresponding estimator is governed by a parametric starter, for instance a multivariate Poisson distribution with nonnegative cross correlations estimated through an expectation–maximization algorithm, and a nonparametric part, which is an unknown discrete weight function smoothed through multiple binomial kernels. Our central focus is the selection of the matrix of bandwidths by the local Bayesian method. We additionally discuss model diagnostics for making an appropriate choice between the parametric, semiparametric and nonparametric approaches. Retaining a purely nonparametric method means losing the benefits of the parametric part in this modelling framework. Practical applications to multivariate count datasets, including a tail probability estimation, are analyzed under several scenarios of correlations and dispersions. This semiparametric approach demonstrates superior performance and better interpretability compared to the parametric and nonparametric ones.
Data availability
The data that support the findings of this study are available in this published article, and also from the corresponding author upon request.
Code availability
The code that supports the findings of this study is available from the corresponding author upon request.
Acknowledgements
This article is dedicated to the memory of Professor Longin Somé (1955–2022). The authors sincerely thank an Associate Editor and two anonymous referees for their valuable comments, which improved the paper. Part of this work was carried out while the second author was a visiting scientist at the Research Unit LaMOS, University of Béjaia. For the second author, this work is supported by the EIPHI Graduate School (contract ANR-17-EURE-0002).
Funding
The LmB (from the second author) receives support from the EIPHI Graduate School (contract ANR-17-EURE-0002).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
Proof of Theorem 1
From (6), it suffices to compute \(\textrm{Bias}[{\widetilde{w}}_{n}({\textbf{x}})]\) and \(\textrm{Var}[{\widetilde{w}}_{n}({\textbf{x}})]\), since \(\textrm{Bias}[{\widehat{f}}_n({\textbf{x}})]=p_{d}({\textbf{x}};\widehat{\varvec{\theta }}_{n}){\mathbb {E}}[{\widetilde{w}}_n({\textbf{x}})]-f({\textbf{x}})\) and \(\textrm{Var}[{\widehat{f}}_n({\textbf{x}})]=[p_{d}({\textbf{x}};\widehat{\varvec{\theta }}_{n})]^2 \textrm{Var} [{\widetilde{w}}_n({\textbf{x}})]\). Hence,
where the random variables \({\textsf{Z}}_{x_{j},h_{j}}\) are independent with mean \(\mu _j\) and variance \(\sigma _j\).
Then, using a second-order Taylor expansion and finite differences, we obtain
and (13) becomes
and the desired Bias is deduced.
The pointwise variance is successively obtained with
where
Hence, this concludes the proof. \(\square\)
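The factorisation used at the start of this proof, \({\widehat{f}}_n({\textbf{x}})=p_{d}({\textbf{x}};\widehat{\varvec{\theta }}_{n}){\widetilde{w}}_n({\textbf{x}})\), can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the parametric starter is replaced by a product of independent Poisson distributions fitted by sample means (the paper's multivariate Poisson with cross correlations and its EM fit are not reproduced), the weight estimate is assumed to take the usual form \({\widetilde{w}}_n({\textbf{x}})=n^{-1}\sum _{i}K_{{\textbf{x}},{\textbf{H}}}({\textbf{X}}_i)/p_{d}({\textbf{X}}_i;\widehat{\varvec{\theta }}_{n})\) with a diagonal bandwidth matrix, and the binomial kernel is taken as the binomial distribution with \(x+1\) trials and success probability \((x+h)/(x+1)\). All function names and the toy data are hypothetical.

```python
import math

def binomial_kernel(x, h, y):
    """Binomial kernel at target x, bandwidth h in (0, 1], evaluated at y.

    Assumed form: pmf of Binomial(x + 1, (x + h) / (x + 1)) on {0, ..., x + 1}.
    """
    n_trials = x + 1
    if y < 0 or y > n_trials:
        return 0.0
    p = (x + h) / (x + 1)
    return math.comb(n_trials, y) * p**y * (1 - p) ** (n_trials - y)

def poisson_pmf(y, mu):
    """Univariate Poisson pmf, computed on the log scale for stability."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def start_pmf(x, mus):
    """Hypothetical starter: product of independent Poisson pmfs (a simplification)."""
    return math.prod(poisson_pmf(xj, mj) for xj, mj in zip(x, mus))

def product_kernel(x, h, y):
    """Product of univariate binomial kernels (diagonal bandwidth matrix)."""
    return math.prod(binomial_kernel(xj, hj, yj) for xj, hj, yj in zip(x, h, y))

def weight_estimate(x, data, h, mus):
    """Assumed weight estimate: kernel average of 1 / starter at the observations."""
    total = sum(product_kernel(x, h, xi) / start_pmf(xi, mus) for xi in data)
    return total / len(data)

def semiparametric_estimate(x, data, h, mus):
    """Starter pmf at x multiplied by the smoothed weight at x."""
    return start_pmf(x, mus) * weight_estimate(x, data, h, mus)

# Toy bivariate counts (illustrative only, not data from the paper).
data = [(0, 1), (1, 1), (2, 0), (1, 2), (3, 1), (0, 0), (2, 2), (1, 0)]
d = 2
mus = [sum(obs[j] for obs in data) / len(data) for j in range(d)]  # Poisson means
h = (0.1, 0.1)  # fixed bandwidths chosen arbitrarily for the illustration
value = semiparametric_estimate((1, 1), data, h, mus)
print(value)
```

In this sketch the bandwidths are fixed by hand; in the paper they are selected locally by the Bayesian method of Theorem 2.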
Proof of Theorem 2
Using the binomial expansion \((x+h)^y=\sum _{k=0}^{y}\frac{y!}{k!(y-k)!}x^k h^{y-k}\) and writing \({\textbf{L}}:={\widehat{f}}_{n}({\textbf{x}})\pi ({\textbf{H}})\), we successively express \({\textbf{L}}\) as
By direct calculation, the second term \(\int _{{\mathcal {M}}}{\widehat{f}}_{n}({\textbf{x}})\pi ({\textbf{H}})d{\textbf{H}}\) of (9) becomes
Combining (14) and (15) as in (9), we easily get the closed expression of the posterior distribution \({\widehat{\pi }}({\textbf{H}}|{\textbf{x}},{\textbf{X}}_1,\dots ,{\textbf{X}}_n)\) provided in (11).
The diagonal elements of the matrix of bandwidths are obtained as:
Then
which corresponds to Eq. (12). \(\square\)
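The mechanism of this proof can be checked numerically in the univariate case. The sketch below is an illustration under assumptions, not a transcription of Eqs. (11) and (12): it assumes a \(\textrm{Beta}(\alpha ,\beta )\) prior on the bandwidth \(h\in (0,1)\), the binomial kernel with \(x+1\) trials and success probability \((x+h)/(x+1)\), and optional observation weights (hypothetically standing in for the reciprocals of the starter). Under these assumptions the binomial expansion of \((x+h)^y\) turns each integral over \(h\) into a finite sum of beta functions, and the posterior mean of \(h\) is a ratio of two such sums. The closed form is compared with a midpoint-rule quadrature; all names are hypothetical.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def kernel_beta_integral(x, y, a, b, extra_h=0):
    """Closed form of int_0^1 h^extra_h (x+h)^y (1-h)^(x+1-y) h^(a-1) (1-h)^(b-1) dh.

    Uses (x+h)^y = sum_k C(y, k) x^k h^(y-k); each term integrates to a beta function.
    """
    return sum(
        math.comb(y, k) * x**k * beta_fn(y - k + a + extra_h, x + 1 - y + b)
        for k in range(y + 1)
    )

def bayes_local_bandwidth(x, sample, a, b, weights=None):
    """Posterior mean of h at target x (assumed Beta(a, b) prior), in closed form."""
    weights = weights or [1.0] * len(sample)
    num = den = 0.0
    for y, w in zip(sample, weights):
        if y > x + 1:
            continue  # observation outside the kernel support {0, ..., x + 1}
        c = w * math.comb(x + 1, y)
        num += c * kernel_beta_integral(x, y, a, b, extra_h=1)
        den += c * kernel_beta_integral(x, y, a, b)
    if den == 0.0:
        raise ValueError("no observation falls in the kernel support at x")
    return num / den

def bayes_local_bandwidth_numeric(x, sample, a, b, weights=None, grid=20000):
    """Same posterior mean by midpoint-rule quadrature, used only as a check."""
    weights = weights or [1.0] * len(sample)
    num = den = 0.0
    for m in range(grid):
        h = (m + 0.5) / grid
        prior = h ** (a - 1) * (1 - h) ** (b - 1)  # unnormalised beta prior
        lik = sum(
            w * math.comb(x + 1, y) * (x + h) ** y * (1 - h) ** (x + 1 - y)
            for y, w in zip(sample, weights)
            if y <= x + 1
        )
        num += h * lik * prior
        den += lik * prior
    return num / den

sample = [0, 1, 1, 2, 3, 0, 2, 1]  # toy counts, illustrative only
closed = bayes_local_bandwidth(1, sample, a=2.0, b=2.0)
numeric = bayes_local_bandwidth_numeric(1, sample, a=2.0, b=2.0)
print(closed, numeric)
```

Constants common to numerator and denominator, such as \((x+1)^{-(x+1)}\), the prior normalisation and \(1/n\), cancel in the ratio and are therefore omitted from both routines.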
Proof of Proposition 1
For fixed \(j \in \{1,\dots ,d\}\), a property of the beta function allows \({\widehat{h}}_{j}(x_j)\) to be written as
Hence, \({\widehat{h}}_j(x_j)\) can be bounded from below as follows:
Since \(X_{ij}\le x_{j}\), the bandwidth \({\widehat{h}}_j(x_j)\) is bounded from above by:
Then, one gets
which leads to the desired result. \(\square\)
About this article
Cite this article
Somé, S.M., Kokonendji, C.C., Belaid, N. et al. Bayesian local bandwidths in a flexible semiparametric kernel estimation for multivariate count data with diagnostics. Stat Methods Appl 32, 843–865 (2023). https://doi.org/10.1007/s10260-023-00682-5
DOI: https://doi.org/10.1007/s10260-023-00682-5
Keywords
- Dispersion index
- EM algorithm
- Model diagnostics
- Multivariate discrete associated kernel
- Multivariate Poisson distribution
- Weighted distribution