
Estimation of the Minimum Probability of a Multinomial Distribution

Journal of Statistical Theory and Practice


The estimation of the minimum probability of a multinomial distribution is important for a variety of application areas. However, standard estimators such as the maximum likelihood estimator and the Laplace smoothing estimator fail to function reasonably in many situations as, for small sample sizes, these estimators are fully deterministic and completely ignore the data. Inspired by a smooth approximation of the minimum used in optimization theory, we introduce a new estimator, which takes advantage of the entire data set. We consider both the cases with a known and an unknown number of categories. We categorize the asymptotic distributions of the proposed estimator and conduct a small-scale simulation study to better understand its finite sample performance.



References

  1. Boyd S, Vandenberghe L (2004) Convex Optimization. Cambridge University Press, Cambridge

  2. Chao A (1984) Nonparametric estimation of the number of classes in a population. Scandinavian J Stat 11:265–270

  3. Chao A (1987) Estimating the population size for capture-recapture data with unequal catchability. Biometrics 43:783–791

  4. Chung K, AitSahlia F (2003) Elementary Probability Theory with Stochastic Processes and an Introduction to Mathematical Finance, 4th edn. Springer, New York

  5. Colwell C (1994) Estimating terrestrial biodiversity through extrapolation. Philos Trans Biol Sci 345:101–118

  6. Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Workshop on Statistical Learning in Computer Vision, ECCV, pp. 1–22

  7. Grabchak M, Marcon E, Lang G, Zhang Z (2017) The generalized Simpson’s entropy is a measure of biodiversity. PLOS ONE 12:e0173305

  8. Grabchak M, Zhang Z (2018) Asymptotic normality for plug-in estimators of diversity indices on countable alphabets. J Nonparam Stat 30:774–795

  9. Gu Z, Shao M, Li L, Fu Y (2012) Discriminative metric: Schatten norm vs. vector norm. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), pp. 1213–1216

  10. Haykin S (1994) Neural networks: a comprehensive foundation. Pearson Prentice Hall, New York

  11. Lange M, Zühlke D, Holz T, Villmann O (2014) Applications of \(l_p\)-norms and their smooth approximations for gradient based learning vector quantization. In: ESANN 2014: Proceedings of the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 271–276

  12. May WL, Johnson WD (1998) On the singularity of the covariance matrix for estimates of multinomial proportions. J Biopharmaceut Stat 8:329–336

  13. Shao J (2003) Mathematical Statistics, 2nd edn. Springer, New York

  14. Turney P, Littman ML (2003) Measuring praise and criticism: inference of semantic orientation from association. ACM Trans Inf Syst 21:315–346

  15. Zhai C, Lafferty J (2017) A study of smoothing methods for language models applied to ad hoc information retrieval. ACM SIGIR Forum 51:268–276

  16. Zhang Z, Chen C, Zhang J (2020) Estimation of population size in entropic perspective. Commun Stat Theory Methods 49:307–324


This paper was inspired by the question of Dr. Zhiyi Zhang (UNC Charlotte): How to estimate the minimum probability of a multinomial distribution? We thank Ann Marie Stewart for her editorial help. The authors wish to thank two anonymous referees whose comments have improved the presentation of this paper. The second author’s work was funded, in part, by the Russian Science Foundation (Project No. 17-11-01098).

Author information

Corresponding author

Correspondence to Michael Grabchak.

Ethics declarations

Conflict of interest.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs

Throughout this section, let \(\Sigma ={\mathrm {diag}}({\mathbf{p}} )-{\mathbf{p}} {\mathbf{p}} ^T\), \(\sigma _n=\sqrt{\nabla g_n({\mathbf{p}} )^{T}{{\Sigma }} \nabla g_n({\mathbf{p}} )}\), \(\Lambda =\lim \limits _{n \rightarrow \infty } \nabla g_n({\mathbf{p}} )\), and \(\sigma =\sqrt{\Lambda ^T \Sigma \Lambda }\). It is well known that \(\Sigma\) is a positive definite matrix; see, e.g., [12]. For simplicity, we use the standard notation \(O(\cdot )\), \(o(\cdot )\), \(O_p(\cdot )\), and \(o_p(\cdot )\); see, e.g., [13] for the definitions. In the case of matrices and vectors, this notation should be interpreted componentwise.

It may, at first, appear that Theorem 2.1 can be proved using the delta method. However, the difficulty lies in the fact that the function \(g_n(\cdot )\) depends on n. For this reason, the proof requires a more subtle approach. We begin with several lemmas.

Lemma A.1

1. There is a constant \(\epsilon >0\) such that \(p_0 \le g_n({\mathbf{p}} ) \le p_0+ (k-r)e^{-n^{\alpha } \epsilon }\). When \(r\ne k\), we can take \(\epsilon = \min \limits _{j:p_j > p_0}(p_j-p_0)\).

2. For any constant \(\beta \in {\mathbb {R}}\)

    $$\begin{aligned} n^{\beta }\{g_n({\mathbf{p}} )-p_0\} \xrightarrow []{}0 {\text{ as }} n\rightarrow \infty . \end{aligned}$$

3. For any \(1\le j \le k\) and any constant \(\beta \in {\mathbb {R}}\)

    $$\begin{aligned} n^{\beta }e^{-n^{\alpha }p_j}w^{-1}\{g_n({\mathbf{p}} )-p_j\} \xrightarrow []{}0 {\text{ as }} n\rightarrow \infty . \end{aligned}$$


We begin with the first part. First, assume that \(r=k\). In this case, it is immediate that \(g_n({\mathbf{p}} )=k^{-1}=p_0\) and the result holds with any \(\epsilon >0\). Now assume \(r\ne k\). In this case,

$$\begin{aligned} p_0=p_0\sum \limits _{i=1}^k e^{-n^\alpha p_i} w^{-1} \le \sum \limits _{i=1}^k p_i e^{-n^\alpha p_i} w^{-1} = g_n({\mathbf{p}} ). \end{aligned}$$

To show the other inequality, note that

$$\begin{aligned} e^{-n^{\alpha }p_0} w^{-1} =\left\{ \sum \limits _{i=1}^k e^{-n^{\alpha }(p_i-p_0)} \right\} ^{-1} \le (re^0)^{-1}= {r}^{-1} \end{aligned}$$

and that, for any \(p_i>p_0\), we have

$$\begin{aligned} e^{n^\alpha p_i}w= \sum \limits _{j=1}^k e^{-n^{\alpha }(p_j-p_i)} \ge e^{-n^{\alpha }(p_0-p_i)} =e^{n^{\alpha }(p_i-p_0)} \ge \exp \left\{ n^{\alpha } \min \limits _{j:p_j > p_0}(p_j-p_0)\right\} . \end{aligned}$$

Setting \(\epsilon = \min \limits _{j:p_j> p_0}(p_j-p_0)>0\), it follows that, for \(p_i>p_0\),

$$\begin{aligned} e^{-n^{\alpha }p_i} w^{-1}\le e^{-n^{\alpha }\epsilon } . \end{aligned}$$

We thus get

$$\begin{aligned} g_n({\mathbf{p}} ) =\sum \limits _{i:p_i=p_0} p_i e^{-n^\alpha p_i} w^{-1}+\sum \limits _{i:p_i>p_0} p_i e^{-n^\alpha p_i} w^{-1} \le rp_0(r)^{-1}+(k-r)e^{-n^{\alpha }\epsilon }. \end{aligned}$$

The second part follows immediately from the first. We now turn to the third part. When \(p_j=p_0\), Eq. (11) and Part 1 imply that \(e^{-n^{\alpha }p_j}w^{-1} \le r^{-1}\) and that there is an \(\epsilon >0\) such that

$$\begin{aligned} 0 \le g_n({\mathbf{p}} )-p_j \le (k-r)e^{-n^\alpha \epsilon }. \end{aligned}$$

It follows that when \(p_j=p_0\)

$$\begin{aligned} 0 \le n^{\beta }e^{-n^{\alpha }p_j}w^{-1}\{g_n({\mathbf{p}} )-p_j\} \le (k-r)r^{-1}n^{\beta }e^{-n^\alpha \epsilon } \xrightarrow []{}0 {\text { as }} n \rightarrow \infty . \end{aligned}$$

On the other hand, when \(p_j > p_0\), by Part 1 there is an \(\epsilon >0\) such that

$$\begin{aligned} 0\le |g_n({\mathbf{p}} )-p_j|\le p_j-p_0+ (k-r)e^{-n^\alpha \epsilon }. \end{aligned}$$

Using this and Eq. (12) gives

$$\begin{aligned} 0 \le |n^{\beta }e^{-n^{\alpha }p_j}w^{-1}(g_n({\mathbf{p}} )-p_j)|\le (p_j-p_0)n^\beta e^{-n^{\alpha }\epsilon }+(k-r)n^{\beta }e^{-n^{\alpha }(2\epsilon )}\xrightarrow []{} 0, \end{aligned}$$

as \(n \rightarrow \infty\). \(\square\)
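The sandwich bound in Part 1 of Lemma A.1 is easy to check numerically. The sketch below is only an illustration: the function names, the probability vector, and the choice \(\alpha = 0.4\) are assumptions made for the example, and \(g_n\) is implemented directly from its definition \(g_n({\mathbf{p}}) = \sum_i p_i e^{-n^\alpha p_i} w^{-1}\).

```python
import math

def g_n(p, n, alpha):
    """Soft-min weighted average: sum_i p_i e^{-n^alpha p_i} / w."""
    a = n ** alpha
    m = min(p)
    # Shift exponents by min(p) so the largest weight is exp(0); the ratio is unchanged.
    weights = [math.exp(-a * (pi - m)) for pi in p]
    w = sum(weights)
    return sum(pi * wi for pi, wi in zip(p, weights)) / w

def lemma_a1_bounds(p, n, alpha):
    """Return (p_0, g_n(p), upper bound) from Part 1 of Lemma A.1, assuming r != k."""
    p0 = min(p)
    k = len(p)
    r = sum(1 for pi in p if pi == p0)               # multiplicity of the minimum
    eps = min(pi - p0 for pi in p if pi > p0)        # gap to the next smallest value
    upper = p0 + (k - r) * math.exp(-(n ** alpha) * eps)
    return p0, g_n(p, n, alpha), upper

# Illustrative vector (an assumption): k = 4, r = 2, p_0 = 0.1, eps = 0.2.
p = [0.1, 0.1, 0.3, 0.5]
for n in (10, 100, 1000, 10000):
    lo, val, hi = lemma_a1_bounds(p, n, alpha=0.4)
    print(n, lo, val, hi)
```

As \(n\) grows, the printed value of \(g_n({\mathbf{p}})\) should be squeezed toward \(p_0\) between the two bounds, which is the content of Part 2.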

We now consider the case when the probabilities are estimated.

Lemma A.2

Let \({\mathbf{p}} ^*_n={\mathbf{p}} ^*=(p^*_1,\dots ,p^*_{k-1})\) be a sequence of random vectors with \(p^*_i\ge 0\) and \(\sum _{i=1}^{k-1} p_i^*\le 1\). Let \(p^*_k = 1-\sum _{i=1}^{k-1} p_i^*\), \(w^*= \sum \limits _{i=1}^k e^{-n^\alpha p^*_i}\), and assume that \({\mathbf{p}} _n^* \xrightarrow {} {\mathbf{p}}\) a.s. and \(n^\alpha ({\mathbf{p}} _n^* -{\mathbf{p}} )\xrightarrow {p}0\). For every \(j=1,2,\dots ,k\), we have

$$\begin{aligned} n^\alpha \left( p_j^*-p_0\right) e^{-n^\alpha p_j^*}\frac{1}{w^*} {\mathop {\rightarrow }\limits ^{p}}0 \end{aligned}$$

and

$$\begin{aligned} n^{\alpha }e^{-n^{\alpha }p^*_j}\frac{1}{w^*}\{g_n({\mathbf{p}} ^*_n)-p_j^*\} {\mathop {\rightarrow }\limits ^{p}}0 {\text{ as }} n\rightarrow \infty . \end{aligned}$$


First note that, from the definition of \(w^*\), we have

$$\begin{aligned} 0\le e^{-n^\alpha p_j^*}\frac{1}{w^*}\le 1. \end{aligned}$$

Assume that \(p_j=p_0\). In this case, the first equation follows from (13) and the fact that \(n^\alpha \left( p_j^*-p_0\right) =n^\alpha \left( p_j^*-p_j\right) {\mathop {\rightarrow }\limits ^{p}}0\). In particular, this completes the proof of the first equation in the case where \(k= r\).

Now assume that \(k\ne r\). Let \(p^*_0=\min \{p^*_i:i=1,2,\dots ,k\}\), \(\epsilon =\min \limits _{i:p_i\ne p_0}\{ p_i-p_0\}\), and \(\epsilon _n^*=\min \limits _{i:p_i\ne p_0}\{ p^*_i-p^*_0\}\). Since \({\mathbf{p}} _n^* \rightarrow {\mathbf{p}}\) a.s., it follows that \(\epsilon ^*_n\rightarrow \epsilon\) a.s. Further, by arguments similar to the proof of Eq. (12), we can show that, if \(p_j\ne p_0\) then there is a random variable N, which is finite a.s., such that for \(n\ge N\)

$$\begin{aligned} e^{-n^\alpha p^*_j} \frac{1}{w^*} \le e^{-n^\alpha \epsilon _n^*} \le e^{-n^\alpha \epsilon /2}. \end{aligned}$$

It follows that for such j and \(n\ge N\)

$$\begin{aligned} n^\alpha \left| p_j^*-p_0\right| e^{-n^\alpha p_j^*}\frac{1}{w^*} \le 2n^\alpha e^{-n^\alpha \epsilon /2}\rightarrow 0. \end{aligned}$$

This completes the proof of the first limit.

Now assume either \(k=r\) or \(k\ne r\). For the second limit, note that

$$\begin{aligned}&n^{\alpha }e^{-n^{\alpha }p^*_j}\frac{1}{w^*}(g_n(\mathbf{p} ^*_n)-p^*_j)\\&\quad = n^{\alpha }e^{-n^{\alpha }p^*_j}\frac{1}{w^*}(g_n({\mathbf{p}} ^*)-p_0)+n^{\alpha }e^{-n^{\alpha }p^*_j}\frac{1}{w^*}(p_0-p^*_j)\\&\quad = n^{\alpha }e^{-n^{\alpha }p^*_j}\frac{1}{w^*}\sum \limits _{i=1}^k (p^*_i-p_0) e^{-n^\alpha p^*_i} \frac{1}{w^*} +n^{\alpha }e^{-n^{\alpha }p^*_j}\frac{1}{w^*}(p_0-p^*_j). \end{aligned}$$

From here the result follows by the first limit and (13). \(\square\)

Lemma A.3

1. If \(r= k\), then for each \(i=1,2,\dots ,k\)

$$\begin{aligned} \frac{\partial g_n({\mathbf{p}} )}{\partial p_i} = 0. \end{aligned}$$

2. If \(r\ne k\), then for each \(i=1,2,\dots ,k\)

$$\begin{aligned} \lim \limits _{n \rightarrow \infty }\frac{\partial g_n({\mathbf{p}} )}{\partial p_i} = {\left\{ \begin{array}{ll} r^{-1}, &{} {\text { if }} p_k \ne p_0 {\text { and }} p_i = p_0 \\ -r^{-1}, &{}{\text { if }} p_k = p_0 {\text { and }} p_i \ne p_0 \\ 0, &{} {\text {otherwise.}} \end{array}\right. } \end{aligned}$$


When \(r= k\), the result is immediate from (4). Now assume that \(r\ne k\). We can rearrange equation (4) as

$$\begin{aligned} \frac{\partial g_n({\mathbf{p}} )}{\partial p_i}=w^{-1}\left( e^{-n^{\alpha }p_i}-e^{-n^{\alpha }p_k}\right) +r_n, \end{aligned}$$

where \(r_n=n^{\alpha }e^{-n^{\alpha }p_i}w^{-1}\{g_n({\mathbf{p}} )-p_i\}-n^{\alpha }e^{-n^{\alpha }p_k}w^{-1}\{g_n({\mathbf{p}} )-p_k\}\). Note that Lemma A.1 implies that \(r_n\rightarrow 0\) as \(n \rightarrow \infty\). It follows that

$$\begin{aligned} \lim \limits _{n \rightarrow \infty }\frac{ \partial g_n({\mathbf{p}} )}{\partial p_i}= & {} \lim \limits _{n \rightarrow \infty } e^{-n^\alpha p_i}w^{-1}-\lim \limits _{n \rightarrow \infty }e^{-n^\alpha p_k}w^{-1}\\= & {} \lim \limits _{n \rightarrow \infty } \left\{ \sum \limits _{j=1}^{k} e^{-n^\alpha (p_j-p_i)}\right\} ^{-1}-\lim \limits _{n \rightarrow \infty }\left\{ \sum \limits _{j=1}^{k} e^{-n^\alpha (p_j-p_k)} \right\} ^{-1}. \end{aligned}$$

Consider the case where \(p_k \ne p_0\) and \(p_i = p_0\). In this case, the first part has r component(s) in the denominator that are equal to one (\(e^0\)) and the remaining \(k-r\) terms go to zero individually. However, since \(p_k\ne p_0\), the denominator of the second fraction has r terms of the form \(e^{-n^{\alpha } (p_0-p_k) }\), which go to \(+\infty\), while the other terms go to 0, 1, or \(+\infty\). Thus, in this case, the limit is \(r^{-1}-0=r^{-1}\). The arguments in the other cases are similar and are thus omitted. \(\square\)
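The limits in Lemma A.3 can be sanity-checked numerically. The sketch below treats \(g_n\) as a function of the first \(k-1\) coordinates with \(p_k = 1-\sum_{i<k} p_i\), evaluates the rearranged derivative used in the proof, and compares it with a central finite difference. The helper names, the step size `h`, and the test vectors are assumptions chosen for illustration.

```python
import math

def _weights(p, n, alpha):
    """Return (a, u) with a = n^alpha and u_i = e^{-a p_i} / w."""
    a = n ** alpha
    m = min(p)
    raw = [math.exp(-a * (pi - m)) for pi in p]  # shifted by min(p) for stability
    w = sum(raw)
    return a, [x / w for x in raw]

def g_n(p_free, n, alpha):
    """g_n as a function of (p_1, ..., p_{k-1}); p_k is the remainder."""
    p = list(p_free) + [1.0 - sum(p_free)]
    _, u = _weights(p, n, alpha)
    return sum(pi * ui for pi, ui in zip(p, u))

def grad_formula(p_free, n, alpha):
    """Partial derivatives from the rearranged form w^{-1}(e_i - e_k) + r_n."""
    p = list(p_free) + [1.0 - sum(p_free)]
    a, u = _weights(p, n, alpha)
    g = sum(pi * ui for pi, ui in zip(p, u))
    last = len(p) - 1
    return [
        (u[i] - u[last]) + a * u[i] * (g - p[i]) - a * u[last] * (g - p[last])
        for i in range(last)
    ]

def grad_numeric(p_free, n, alpha, h=1e-6):
    """Central finite-difference gradient, used only as a cross-check."""
    grad = []
    for i in range(len(p_free)):
        up = list(p_free)
        dn = list(p_free)
        up[i] += h
        dn[i] -= h
        grad.append((g_n(up, n, alpha) - g_n(dn, n, alpha)) / (2 * h))
    return grad

# Illustrative case: p = (0.1, 0.1, 0.3, 0.5), so r = 2 and p_k != p_0.
p_free = [0.1, 0.1, 0.3]
print(grad_formula(p_free, n=100, alpha=0.4))
print(grad_numeric(p_free, n=100, alpha=0.4))
print(grad_formula(p_free, n=10**6, alpha=0.4))  # should be close to (1/r, 1/r, 0)
```

For moderate \(n\) the two gradients should agree up to discretization error, and for large \(n\) the formula should approach the limits stated in the lemma.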

Lemma A.4

Assume that \(r\ne k\) and let \({\mathbf{p}} ^*_n\) be as in Lemma A.2. In this case, \(\frac{\partial (g_n(\mathbf{p }))}{\partial p_i}=O(1)\), \(\frac{\partial (g_n(\mathbf{p ^*_n}))}{\partial p_i}=O_p(1)\), \(\frac{\partial ^2 (g_n(\mathbf{p }))}{\partial p_i \partial p_j}=O(n^{\alpha })\), \(\frac{\partial ^2 (g_n(\mathbf{p ^*_n}))}{\partial p_i \partial p_j}=O_p(n^{\alpha })\), \(\frac{\partial ^3 (g_n(\mathbf{p }))}{\partial p_\ell \partial p_i \partial p_j}=O(n^{2\alpha })\), and \(\frac{\partial ^3 (g_n(\mathbf{p ^*_n}))}{\partial p_\ell \partial p_i \partial p_j}=O_p(n^{2\alpha })\).


The results for the first derivatives follow immediately from (4), (13), Lemma A.2, and Lemma A.3. Now let \(\delta _{ij}\) be 1 if \(i=j\) and zero otherwise. It is straightforward to verify that

$$\begin{aligned} \frac{\partial ^2 g_n({\mathbf{p}} )}{\partial p_j \partial p_i }= & {} n^{\alpha }w^{-1} \left( e^{-n^{\alpha }p_i}-e^{-n^{\alpha }p_k}\right) \frac{\partial g_n({\mathbf{p}} )}{\partial p_j} \nonumber \\&+n^{\alpha }w^{-1} \left( e^{-n^{\alpha }p_j}-e^{-n^{\alpha }p_k}\right) \frac{\partial g_n({\mathbf{p}} )}{\partial p_i} \nonumber \\&-n^{\alpha }e^{-n^{\alpha }p_k}w^{-1}\left[ n^\alpha \left( g_n({\mathbf{p}} )-p_k\right) +2\right] \nonumber \\&- \delta _{ij} n^{\alpha } e^{-n^{\alpha }p_i}w^{-1}\left[ n^\alpha \left( g_n({\mathbf{p}} )-p_i\right) +2 \right] , \end{aligned}$$

that for \(\ell \ne i\) and \(\ell \ne j\) we have

$$\begin{aligned} \frac{\partial ^3 g_n({\mathbf{p}} )}{\partial p_\ell \partial p_j \partial p_i }= & {} n^\alpha w^{-1}\left( e^{-n^\alpha p_\ell }-e^{-n^\alpha p_k}\right) \frac{\partial ^2 g_n({\mathbf{p}} )}{\partial p_j\partial p_i} \nonumber \\&+ n^{\alpha }w^{-1} \left( e^{-n^{\alpha }p_i}-e^{-n^{\alpha }p_k}\right) \frac{\partial ^2 g_n({\mathbf{p}} )}{\partial p_\ell \partial p_j} \nonumber \\&+n^{\alpha }w^{-1} \left( e^{-n^{\alpha }p_j}-e^{-n^{\alpha }p_k}\right) \frac{\partial ^2 g_n({\mathbf{p}} )}{\partial p_\ell \partial p_i} \nonumber \\&-n^{2\alpha }e^{-n^{\alpha }p_k}w^{-1}\left( \frac{\partial g_n({\mathbf{p}} )}{\partial p_\ell }+ \frac{\partial g_n({\mathbf{p}} )}{\partial p_j} + \frac{\partial g_n({\mathbf{p}} )}{\partial p_i} +1\right) \nonumber \\&-n^{2\alpha }e^{-n^{\alpha }p_k}w^{-1}\left[ n^\alpha \left( g_n({\mathbf{p}} )-p_k\right) +2\right] \nonumber \\&- \delta _{ij} n^{2\alpha } e^{-n^{\alpha }p_i}w^{-1} \frac{\partial g_n({\mathbf{p}} )}{\partial p_\ell }, \end{aligned}$$

and that for \(i=j=\ell\) we have

$$\begin{aligned} \frac{\partial ^3 g_n({\mathbf{p}} )}{\partial p_i^3}= & {} n^\alpha w^{-1} \left( e^{-n^{\alpha }p_i}-e^{-n^{\alpha }p_k}\right) \left( 3\frac{\partial ^2 g_n({\mathbf{p}} )}{\partial p_i^2 } +2n^\alpha \right) \nonumber \\&+n^{2\alpha }\frac{\partial g_n({\mathbf{p}} )}{\partial p_i}\left[ 1-3w^{-1} \left( e^{-n^{\alpha }p_i}+e^{-n^{\alpha }p_k}\right) \right] . \end{aligned}$$

Combining this with Lemma A.2 and the fact that \(0\le w^{-1}e^{-n^\alpha p_s}\le 1\) for any \(1 \le s \le k\) gives the result. \(\square\)

Lemma A.5

Assume \(r\ne k\) and \(0<\alpha <0.5\), then \(\nabla g_n(\hat{\mathbf{p }})-\nabla g_n({\mathbf{p}} )=O_p(n^{\alpha -\frac{1}{2}})\).


By the mean value theorem, we have

$$\begin{aligned} n^{\frac{1}{2}-\alpha }\nabla g_n(\hat{\mathbf{p }})=n^{\frac{1}{2}-\alpha }\nabla g_n({\mathbf{p}} )+ n^{-\alpha } \nabla ^2 g_n({\mathbf{p}} ^*)\sqrt{n}(\hat{\mathbf{p }}-{\mathbf{p}} ), \end{aligned}$$

where \({\mathbf{p}} ^*= {\mathbf{p}} +{\mathrm {diag}}(\varvec{\omega })(\hat{\mathbf{p }}-{\mathbf{p}} )\) for some \(\varvec{\omega } \in [0,1]^{k-1}\). By the strong law of large numbers, \(\hat{\mathbf{p }}\rightarrow {\mathbf{p}}\) a.s., which implies that \({\mathbf{p}} ^*-{\mathbf{p}} \rightarrow 0\) a.s. Similarly, since \(0<\alpha <0.5\), the multivariate central limit theorem and Slutsky’s theorem give \(n^\alpha \left( \hat{\mathbf{p }}-{\mathbf{p}} \right) {\mathop {\rightarrow }\limits ^{p}}0\), which implies that \(n^\alpha \left( {\mathbf{p}} ^* -{\mathbf{p}} \right) {\mathop {\rightarrow }\limits ^{p}}0\). Thus, the assumptions of Lemma A.4 are satisfied and that lemma gives

$$\begin{aligned} n^{-\alpha } \nabla ^2 g_n({\mathbf{p}} ^*)\sqrt{n}(\hat{\mathbf{p }}-{\mathbf{p}} ) = n^{-\alpha } O_p(n^\alpha ) O_p(1). \end{aligned}$$

From here, the result is immediate. \(\square\)

Lemma A.6

Assume that \(r\ne k\). In this case, \(\sigma >0\) and \(\lim \limits _{n \rightarrow \infty }\sigma ^{-1}_n \sigma =1\). Further, if \(0<\alpha <0.5\), then \(\hat{\sigma }_n^{-1}\sigma _n \xrightarrow {p} 1\).


Since \(\Sigma\) is a positive definite matrix, and by Lemma A.3, \(\Lambda \ne 0\), it follows that \(\sigma >0\). From here, the fact that \(\lim \limits _{n \rightarrow \infty } \sigma _n=\sigma\) gives the first result. Now assume that \(0<\alpha <0.5\). It is easy to see that \(\hat{p}_i\hat{p}_j- p_i p_j=\hat{p}_j(\hat{p}_i-p_i)+p_i({\hat{p}}_j-p_j) =O_p(n^{-1/2})\) and \(\hat{p}_i(1-\hat{p}_i)- p_i(1-p_i)=({\hat{p}}_i-p_i)(1-p_i-{\hat{p}}_i)=O_p(n^{-1/2})\). Thus, \(\hat{\Sigma }=\Sigma +O_p(n^{-1/2})\). This together with Lemma A.3 and Lemma A.5 leads to

$$\begin{aligned} \frac{\hat{\sigma }_n^2}{\sigma _n^2}= & {} {\frac{ \nabla g_n(\hat{\mathbf{p }})^{T}{{\hat{\Sigma }}} \nabla g_n(\hat{\mathbf{p }}) }{\nabla g_n({\mathbf{p}} )^{T}{{\Sigma }} \nabla g_n({\mathbf{p}} )}}\nonumber \\= & {} {\frac{(\nabla g_n(\mathbf{p })+O_p(n^{\alpha -\frac{1}{2}}))^T(\Sigma +O_p(n^{-\frac{1}{2}}))(\nabla g_n(\mathbf{p })+O_p(n^{\alpha -\frac{1}{2}}))}{\nabla g_n({\mathbf{p}} )^{T}{{\Sigma }} \nabla g_n({\mathbf{p}} )}} \nonumber \\= & {} {1+O_p(n^{\alpha -\frac{1}{2}})+O_p({n^{\alpha -1}})+O_p(n^{2\alpha -\frac{3}{2}})+ O_p(n^{-\frac{1}{2}})+O_p(n^{2\alpha -1}) }\xrightarrow {p}1, \end{aligned}$$

which completes the proof. \(\square\)

Lemma A.7

If \(r\ne k\) and \(0<\alpha <0.5\), then \(\sqrt{n}\hat{\sigma }_n^{-1}\{g_n(\hat{\mathbf{p }})-g_n({\mathbf{p}} )\}\xrightarrow {D} {{\mathcal {N}}}(0,1)\).


Taylor’s theorem implies that

$$\begin{aligned}\sqrt{n}(g_n(\hat{\mathbf{p }})-g_n({\mathbf{p}} ))&=\sqrt{n}(\hat{\mathbf{p }}-{\mathbf{p}} )^T \nabla g_n({\mathbf{p}} ) \\&\quad + 0.5 \sqrt{n} (\hat{\mathbf{p }}-{\mathbf{p}} )^T n^{-\alpha }\nabla ^2 g_n({\mathbf{p}} ^*) n^{\alpha }(\hat{\mathbf{p }}-{\mathbf{p}} ), \end{aligned}$$

where \({\mathbf{p}} ^*= {\mathbf{p}} +{\text{ diag }}(\varvec{\omega })(\hat{\mathbf{p }}-{\mathbf{p}} )\) for some \(\varvec{\omega } \in [0,1]^{k-1}\). Using Lemma A.4 and arguments similar to those used in the proof of Lemma A.5 gives \(n^{-\alpha }\nabla ^2 g_n({\mathbf{p}} ^*) =O_p(1)\), \(\sqrt{n}(\hat{\mathbf{p }}-{\mathbf{p}} )=O_p(1)\), and \(n^{\alpha }(\hat{\mathbf{p }}-{\mathbf{p}} )=o_p(1)\). It follows that the second term on the RHS above is \(o_p(1)\) and hence that

$$\begin{aligned} \sqrt{n}( g_n(\hat{\mathbf{p }})-g_n({\mathbf{p}} ))=\sqrt{n}(\hat{\mathbf{p }}-{\mathbf{p}} )^T \nabla g_n({\mathbf{p}} ) + o_p(1). \end{aligned}$$

It is well known that \(\sqrt{n}(\hat{\mathbf{p }}-{\mathbf{p}} )^T\xrightarrow {D} {\mathcal N}(0,\Sigma )\). Hence

$$\begin{aligned} \sqrt{n}(\hat{\mathbf{p }}-{\mathbf{p}} )^T \Lambda \xrightarrow {D} {{\mathcal {N}}}(0, \Lambda ^T \Sigma \Lambda ) \end{aligned}$$

and, by Slutsky’s theorem,

$$\begin{aligned} \sqrt{n}( g_n(\hat{\mathbf{p }})-g_n({\mathbf{p}} )) \xrightarrow {D} {{\mathcal {N}}}(0, \Lambda ^T \Sigma \Lambda ). \end{aligned}$$

By Lemma A.6, \(\sigma ^{-1}_n \sigma \rightarrow 1\), and \(\hat{\sigma }_n^{-1}\sigma _n \xrightarrow {p} 1\). Hence, the result follows by another application of Slutsky’s theorem. \(\square\)

Lemma A.8

Let \(\mathbf {A}=-n^{-\alpha }\nabla ^2 g_n({\mathbf{p}} )\) and let \(\mathbf {I}_{k-1}\) be the \((k-1)\times (k-1)\) identity matrix. If \(r=k\), then \(\Sigma ^{\frac{1}{2}}\mathbf {A}\Sigma ^{\frac{1}{2}}=2k^{-2}\mathbf {I}_{k-1}\).


Let \(\mathbf {1}\) be the column vector in \({\mathbb {R}}^{k-1}\) with all entries equal to 1. By Eq. (16), we have

$$\begin{aligned} \mathbf {A}=-n^{-\alpha }\nabla ^2 g_n({\mathbf{p}} )= 2k^{-1}[\mathbf {1}\mathbf {1}^T+\mathbf {I}_{k-1}]. \end{aligned}$$

Note that \(\Sigma ={\mathrm {diag}}({\mathbf{p}} )-{\mathbf{p}} {} {\mathbf{p}} ^T=k^{-2}[k\mathbf {I}_{k-1}-\mathbf {1}\mathbf {1}^T]\). It follows that

$$\begin{aligned} \mathbf {A}\Sigma= & {} 2k^{-1}[\mathbf {1}\mathbf {1}^T+\mathbf {I}_{k-1}] k^{-2}[k\mathbf {I}_{k-1}-\mathbf {1}\mathbf {1}^T]\\= & {} 2k^{-3}[k \mathbf {1}\mathbf {1}^T-\mathbf {1}\mathbf {1}^T\mathbf {1}\mathbf {1}^T+k\mathbf {I}_{k-1}-\mathbf {1}\mathbf {1}^T]\\= & {} 2k^{-3}[k \mathbf {1}\mathbf {1}^T-(k-1) \mathbf {1}\mathbf {1}^T+k\mathbf {I}_{k-1} - \mathbf {1}\mathbf {1}^T] =2k^{-2}\mathbf {I}_{k-1}. \end{aligned}$$

Now multiplying both sides by \(\Sigma ^{1/2}\) on the left and \(\Sigma ^{-1/2}\) on the right gives the result. \(\square\)
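The matrix identity \(\mathbf {A}\Sigma =2k^{-2}\mathbf {I}_{k-1}\) used in this proof can be confirmed with a few lines of arithmetic. The sketch below builds \(\mathbf {A}=2k^{-1}[\mathbf {1}\mathbf {1}^T+\mathbf {I}_{k-1}]\) and \(\Sigma =k^{-2}[k\mathbf {I}_{k-1}-\mathbf {1}\mathbf {1}^T]\) as plain nested lists for the uniform case \(p_i=1/k\). The function name is hypothetical and the values of \(k\) are arbitrary.

```python
def lemma_a8_product(k):
    """Return the (k-1) x (k-1) product A * Sigma for the uniform case p_i = 1/k."""
    d = k - 1
    # A = (2/k) (1 1^T + I): every entry is 2/k, with an extra 2/k on the diagonal.
    A = [[(2.0 / k) * (1.0 + (1.0 if i == j else 0.0)) for j in range(d)]
         for i in range(d)]
    # Sigma = diag(p) - p p^T with p_i = 1/k.
    Sigma = [[(1.0 / k if i == j else 0.0) - 1.0 / k ** 2 for j in range(d)]
             for i in range(d)]
    return [[sum(A[i][m] * Sigma[m][j] for m in range(d)) for j in range(d)]
            for i in range(d)]

for k in (2, 3, 5):
    product = lemma_a8_product(k)
    print(k, product[0][0], 2.0 / k ** 2)  # diagonal entry vs. 2 k^{-2}
```

Up to floating-point rounding, the diagonal entries should equal \(2k^{-2}\) and the off-diagonal entries should vanish.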

Proof of Theorem 2.1

(i) If \(r \ne k\), then

$$\begin{aligned} \sqrt{n}\hat{\sigma }^{-1}_n \{ {\hat{p}}^*_0-p_0\} = \sqrt{n}\hat{\sigma }^{-1}_n \{g_n(\hat{\mathbf{p }})-g_n(\mathbf{p })\} + \sqrt{n}\hat{\sigma }^{-1}_n \{g_n(\mathbf{p })-p_0\} . \end{aligned}$$

The first part approaches a \({\mathcal {N}}(0,1)\) distribution by Lemma A.7, and the second part approaches zero in probability by Lemmas A.6 and A.1. From there, the first part of the theorem follows by Slutsky’s theorem.
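In practice, part (i) is what attaches a standard error to \({\hat{p}}^*_0=g_n(\hat{\mathbf{p }})\). The sketch below computes the plug-in quantity \(\hat{\sigma }_n^2=\nabla g_n(\hat{\mathbf{p }})^{T}\hat{\Sigma }\nabla g_n(\hat{\mathbf{p }})\) from observed counts and forms a Wald-type interval. The interval, the function names, the example counts, and the value \(z=1.96\) (the usual approximate 97.5% normal quantile) are assumptions added for illustration and are not taken from the text; the normal approximation applies only when \(r\ne k\).

```python
import math

def soft_min_parts(p, n, alpha):
    """Return (g, u, a): g = g_n(p), u_i = e^{-n^alpha p_i} / w, a = n^alpha."""
    a = n ** alpha
    m = min(p)
    raw = [math.exp(-a * (pi - m)) for pi in p]  # shifted by min(p) for stability
    w = sum(raw)
    u = [x / w for x in raw]
    return sum(pi * ui for pi, ui in zip(p, u)), u, a

def plug_in_interval(counts, alpha, z=1.96):
    """Hypothetical helper: estimate, plug-in sigma, and Wald-type interval."""
    n = sum(counts)
    k = len(counts)
    p_hat = [c / n for c in counts]
    g, u, a = soft_min_parts(p_hat, n, alpha)
    # Gradient with respect to the first k-1 coordinates (p_k is the remainder).
    grad = [
        (u[i] - u[-1]) + a * u[i] * (g - p_hat[i]) - a * u[-1] * (g - p_hat[-1])
        for i in range(k - 1)
    ]
    # grad^T (diag(p_hat) - p_hat p_hat^T) grad, written without building the matrix.
    second = sum(gi * gi * pi for gi, pi in zip(grad, p_hat))
    first = sum(gi * pi for gi, pi in zip(grad, p_hat))
    sigma_hat = math.sqrt(max(second - first ** 2, 0.0))
    half_width = z * sigma_hat / math.sqrt(n)
    return g, sigma_hat, (g - half_width, g + half_width)

# Illustrative counts (an assumption), n = 1000, alpha in (0, 0.5).
estimate, sigma_hat, interval = plug_in_interval([60, 140, 300, 500], alpha=0.4)
print(estimate, sigma_hat, interval)
```

The estimate always lies between the smallest empirical frequency and \(1/k\), since it is a weighted average that puts more weight on smaller frequencies.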

(ii) Assume that \(r =k\). In this case, \(g_n({\mathbf{p}} )=p_0=k^{-1}\), and by Lemma A.3, \(\nabla g_n({\mathbf{p}} )=0\). Thus, Taylor’s theorem gives

$$\begin{aligned} n^{1-\alpha }\{p_0- {\hat{p}}_0^*\}= & {} n^{1-\alpha }\{ g_n({\mathbf{p}} )-g_n(\hat{\mathbf{p }})\}\nonumber \\= & {} 0.5 \sqrt{n} (\hat{\mathbf{p }}-{\mathbf{p}} )^T (-n^{-\alpha })\nabla ^2 g_n({\mathbf{p}} )\sqrt{n}(\hat{\mathbf{p }}-{\mathbf{p}} ) +r_n, \end{aligned}$$

where \(r_n=- 6^{-1} \sum \limits _{q=1}^{k-1} \sum \limits _{r=1}^{k-1} \sum \limits _{s=1}^{k-1} \sqrt{n} (\hat{p}_q-p_q) \sqrt{n} (\hat{p}_r-p_r) n^{\alpha } (\hat{p}_s-p_s) n^{-2\alpha } \frac{\partial ^3 g_n({\mathbf{p}} ^*) }{\partial p_q \partial p_r \partial p_s}\), and \({\mathbf{p}} ^*= {\mathbf{p}} +{\text{ diag }}(\varvec{\omega })(\hat{\mathbf{p }}-{\mathbf{p}} )\) for some \(\varvec{\omega } \in [0,1]^{k-1}\). Lemma A.4 implies that \(n^{-2\alpha }\frac{\partial ^3 g_n({\mathbf{p}} ^*) }{\partial p_q \partial p_r \partial p_s}=O_p(1)\). Combining this with the facts that \(\sqrt{n} (\hat{p}_q-p_q)\) and \(\sqrt{n} (\hat{p}_r-p_r)\) are \(O_p(1)\) and that, for \(\alpha \in (0,0.5)\), \(n^{\alpha } (\hat{p}_s-p_s)=o_p(1)\), it follows that \(r_n{\mathop {\rightarrow }\limits ^{p}}0\).

Let \(\mathbf {x}_n=\sqrt{n}(\hat{\mathbf{p }}-{\mathbf{p}} )\), \(\mathbf {T}_n=\Sigma ^{-\frac{1}{2}}\mathbf {x}_n\), and \(\mathbf {A}=-n^{-\alpha }\nabla ^2 g_n({\mathbf{p}} )\). Lemma A.8 implies that

$$\begin{aligned} \mathbf {x}_n^T\mathbf {A}{\mathbf {x}}_n= & {} (\Sigma ^{-\frac{1}{2}}{\mathbf {x}}_n)^T\Sigma ^{\frac{1}{2}} {\mathbf {A}}\Sigma ^{\frac{1}{2}}(\Sigma ^{-\frac{1}{2}}{\mathbf {x}}_n)\nonumber \\= & {} {\mathbf {T}}_n^T (2k^{-2}{\mathbf {I}}_{k-1}){\mathbf {T}}_n. \end{aligned}$$

Since \(\mathbf {x}_n \xrightarrow {D} {{\mathcal {N}}}(0, \Sigma )\), we have \(\mathbf {T}_n\xrightarrow {D} \mathbf {T}\), where \(\mathbf {T}\sim {{\mathcal {N}}}(0,\mathbf {I}_{k-1}).\) Let \(T_i\) be the ith component of vector \({\mathbf {T}}\). Applying the continuous mapping theorem, we obtain

$$\begin{aligned} \mathbf {x}_n^T{\mathbf {A}}\mathbf {x}_n \xrightarrow {D} {\mathbf {T}} ^T (2k^{-2}{\mathbf {I}}_{k-1}){\mathbf {T}}=2k^{-2}\sum \limits _{i=1}^{k-1} T_i^2. \end{aligned}$$

Thus, Eq. (22) becomes

$$\begin{aligned} n^{1-\alpha }\{p_0- g_n(\hat{\mathbf{p }})\}= 0.5 \mathbf {x}_n^T{\mathbf {A}}{\mathbf {x}}_n + o_p(1)\xrightarrow {D} k^{-2}\sum \limits _{i=1}^{k-1} T_i^2 . \end{aligned}$$

The result follows from the fact that the \(T_i^2\) are independent and identically distributed random variables, each following the Chi-square distribution with 1 degree of freedom. \(\square\)
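A small Monte Carlo experiment can illustrate the limit in part (ii). In the uniform case \(r=k\), the statistic \(n^{1-\alpha }\{p_0-g_n(\hat{\mathbf{p }})\}\) should behave like \(k^{-2}\) times a chi-square variable with \(k-1\) degrees of freedom, whose mean is \((k-1)/k^{2}\). The sketch below is an assumption-laden illustration: the values of \(k\), \(n\), \(\alpha\), the number of replications, and the seed are arbitrary, and only the sample mean is compared with the limiting mean.

```python
import math
import random

def g_n(p, n, alpha):
    """Soft-min weighted average: sum_i p_i e^{-n^alpha p_i} / w."""
    a = n ** alpha
    m = min(p)
    raw = [math.exp(-a * (pi - m)) for pi in p]  # shifted by min(p) for stability
    w = sum(raw)
    return sum(pi * x for pi, x in zip(p, raw)) / w

def simulate_stat(k, n, alpha, reps, seed=0):
    """Draw reps uniform multinomial samples and return n^{1-alpha} (p_0 - g_n(p_hat))."""
    rng = random.Random(seed)
    p0 = 1.0 / k
    out = []
    for _ in range(reps):
        counts = [0] * k
        for _ in range(n):
            counts[rng.randrange(k)] += 1   # each category has probability 1/k
        p_hat = [c / n for c in counts]
        out.append(n ** (1 - alpha) * (p0 - g_n(p_hat, n, alpha)))
    return out

k, n, alpha = 3, 2000, 0.25                 # illustrative settings
stats = simulate_stat(k, n, alpha, reps=400)
sample_mean = sum(stats) / len(stats)
print(sample_mean, (k - 1) / k ** 2)        # sample mean vs. limiting mean
```

The two printed numbers should be close but not identical, since the comparison is subject to both Monte Carlo error and finite-sample bias.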

The proof of Theorem 4.1 is very similar to that of Theorem 2.1 and is thus omitted. However, to help the reader to reconstruct the proof, we note that the partial derivatives of \(g_n^\vee\) can be calculated using the facts that

$$\begin{aligned} \frac{\partial g_n^{\vee }({\mathbf{p}} )}{\partial p_j} = \frac{\partial g_n(-{\mathbf{p}} )}{ \partial p_j} {\text{ and }} \frac{\partial ^2 g_n^{\vee }({\mathbf{p}} )}{\partial p_i \partial p_j} = -\frac{\partial ^2 g_n(-{\mathbf{p}} )}{\partial p_i \partial p_j}. \end{aligned}$$

Further, we formulate a version of Lemmas A.1 and A.2 for the maximum.

Lemma A.9

1. There is a constant \(\epsilon >0\) such that \(p_\vee - (k-r_\vee )e^{-n^{\alpha } \epsilon } \le g^\vee _n({\mathbf{p}} ) \le p_\vee\). When \(r_\vee \ne k\), we can take \(\epsilon = \min \limits _{j:p_j < p_\vee }(p_\vee -p_j)\).

2. For any constant \(\beta \in {\mathbb {R}}\)

    $$\begin{aligned} n^{\beta }\{g^\vee _n({\mathbf{p}} )-p_\vee \} \xrightarrow []{}0 {\text{ as }} n\rightarrow \infty . \end{aligned}$$

3. For any \(1\le j \le k\) and any constant \(\beta \in {\mathbb {R}}\)

    $$\begin{aligned} n^{\beta }\frac{e^{n^{\alpha }p_j}}{w^\vee }\{g_n^\vee ({\mathbf{p}} )-p_j\} \xrightarrow []{}0 {\text{ as }} n\rightarrow \infty . \end{aligned}$$

4. If \({\mathbf{p}} ^*_n\) is as in Lemma A.2 and \(w^{\vee *}= \sum \limits _{i=1}^k e^{n^\alpha p^*_i}\), then for every \(j=1,2,\dots ,k\) we have

    $$\begin{aligned} n^\alpha \left( p_j^*-p_\vee \right) e^{n^\alpha p_j^*}\frac{1}{w^{\vee *}} {\mathop {\rightarrow }\limits ^{p}}0 \end{aligned}$$

    and

    $$\begin{aligned} n^{\alpha }e^{n^{\alpha }p^*_j}\frac{1}{w^{\vee *}}\{g^\vee _n({\mathbf{p}} ^*_n)-p_j^*\} {\mathop {\rightarrow }\limits ^{p}}0 {\text{ as }} n\rightarrow \infty . \end{aligned}$$


We only prove the first part, as proofs of the rest are similar to those of Lemmas A.1 and A.2. If \(r_\vee =k\), then \(g_n^\vee ({\mathbf{p}} )=1/k=p_\vee\) and the result holds with any \(\epsilon >0\). Now, assume that \(k\ne r_\vee\) and let \(\epsilon\) be as defined above. First note that

$$\begin{aligned} g_n^\vee ({\mathbf{p}} )=\sum _{j=1}^k p_j e^{n^\alpha p_j}\frac{1}{w^\vee }\le p_\vee \sum _{j=1}^k e^{n^\alpha p_j}\frac{1}{w^\vee }=p_\vee . \end{aligned}$$

Note further that for \(p_j<p_\vee\)

$$\begin{aligned}&\frac{e^{n^\alpha p_j}}{w^\vee } = \left\{ \sum _{i=1}^k e^{n^\alpha (p_i-p_j)} \right\} ^{-1} \le \left\{ \sum _{i:p_i=p_\vee } e^{n^\alpha (p_\vee -p_j)} \right\} ^{-1} \\&\quad = \frac{1}{r_\vee } e^{-n^\alpha (p_\vee -p_j)} \le \frac{1}{r_\vee } e^{-n^\alpha \epsilon }. \end{aligned}$$

It follows that

$$\begin{aligned} g_n^\vee ({\mathbf{p}} )\ge & {} \sum _{i: p_i=p_\vee } p_i e^{n^\alpha p_i}\frac{1}{w^\vee } = p_\vee \frac{ r_\vee e^{n^\alpha p_\vee }}{w^\vee } = p_\vee +p_\vee \left( \frac{ r_\vee e^{n^\alpha p_\vee }}{w^\vee }-1\right) \\= & {} p_\vee + \frac{p_\vee }{w^\vee }\left( r_\vee e^{n^\alpha p_\vee }-\sum _{i: p_i=p_\vee }e^{n^\alpha p_\vee }-\sum _{i: p_i<p_\vee }e^{n^\alpha p_i}\right) \\= & {} p_\vee - \frac{p_\vee }{w^\vee } \sum _{i: p_i<p_\vee }e^{n^\alpha p_i} \ge p_\vee - \frac{p_\vee }{r_\vee } (k-r_\vee ) e^{-n^\alpha \epsilon }. \end{aligned}$$

From here the result follows. \(\square\)
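The bound in Part 1 of Lemma A.9 can be checked numerically in the same way as for the minimum. The sketch below implements \(g_n^\vee ({\mathbf{p}})=\sum _j p_j e^{n^\alpha p_j}/w^\vee\) from the display above and also checks the identity \(g_n^\vee ({\mathbf{p}})=-g_n(-{\mathbf{p}})\), which follows from the two definitions and is consistent with the derivative relations stated before the lemma. The function names, the probability vector, and \(\alpha =0.4\) are assumptions made for the example.

```python
import math

def g_min(p, n, alpha):
    """g_n(p) = sum_i p_i e^{-n^alpha p_i} / w (soft minimum)."""
    a = n ** alpha
    m = min(p)
    raw = [math.exp(-a * (pi - m)) for pi in p]  # shifted for stability
    w = sum(raw)
    return sum(pi * x for pi, x in zip(p, raw)) / w

def g_max(p, n, alpha):
    """g_n^vee(p) = sum_i p_i e^{n^alpha p_i} / w^vee (soft maximum)."""
    a = n ** alpha
    m = max(p)
    raw = [math.exp(a * (pi - m)) for pi in p]   # shifted for stability
    w = sum(raw)
    return sum(pi * x for pi, x in zip(p, raw)) / w

def lemma_a9_bounds(p, n, alpha):
    """Return (lower bound, g_n^vee(p), p_max) from Part 1, assuming r_max != k."""
    p_max = max(p)
    k = len(p)
    r_max = sum(1 for pi in p if pi == p_max)     # multiplicity of the maximum
    eps = min(p_max - pi for pi in p if pi < p_max)
    lower = p_max - (k - r_max) * math.exp(-(n ** alpha) * eps)
    return lower, g_max(p, n, alpha), p_max

# Illustrative vector (an assumption): k = 4, r_max = 2, p_max = 0.35, eps = 0.15.
p = [0.1, 0.2, 0.35, 0.35]
for n in (10, 100, 1000, 10000):
    lo, val, hi = lemma_a9_bounds(p, n, alpha=0.4)
    print(n, lo, val, hi, -g_min([-x for x in p], n, 0.4))
```

The last printed column should match the value of \(g_n^\vee ({\mathbf{p}})\) up to rounding, and all values should lie between the lower bound and \(p_\vee\).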


Mahzarnia, A., Grabchak, M. & Jiang, J. Estimation of the Minimum Probability of a Multinomial Distribution. J Stat Theory Pract 15, 24 (2021).
