Abstract
The Generalized Chinese Restaurant Process (GCRP) describes a sequence of exchangeable random partitions of the numbers \(\{1,\dots ,n\}\). This process is related to the Ewens sampling model in genetics and to Bayesian nonparametric methods such as topic models. In this paper, we study the GCRP in a regime where the number of parts grows like \(n^{\alpha}\) with \(\alpha > 0\). We prove a non-asymptotic concentration result for the number of parts of size \(k=o(n^{\alpha /(2\alpha +4)}/(\log n)^{1/(2+\alpha )})\). In particular, we show that these random variables concentrate around \(c_{k}V_{*}n^{\alpha}\), where \(V_{*}n^{\alpha}\) is the asymptotic number of parts and \(c_{k} \approx k^{-(1+\alpha)}\) is a positive value depending on \(k\). We also obtain finite-\(n\) bounds for the total number of parts. Our theorems complement asymptotic statements by Pitman and more recent results on large and moderate deviations by Favaro, Feng and Gao.
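The regime described above can be explored numerically. The following is a minimal sketch, assuming the standard two-parameter seating rule with parameters \(0 \leq \alpha < 1\) and \(\theta > -\alpha\): when m customers are seated at k tables, the next customer opens a new table with probability \((\theta + k\alpha)/(m+\theta)\) and otherwise joins an existing table of size s with probability proportional to \(s - \alpha\). The names `simulate_gcrp` and `count_parts_of_size` are illustrative and do not come from the paper.

```python
import random


def simulate_gcrp(n, alpha, theta, seed=0):
    """Seat n customers and return the list of table (part) sizes.

    Assumed seating rule (two-parameter Chinese restaurant process):
    with m customers at k tables, open a new table with probability
    (theta + k*alpha)/(m + theta); otherwise join table i with
    probability proportional to sizes[i] - alpha.
    """
    if not (0 <= alpha < 1) or theta <= -alpha:
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    sizes = []
    for m in range(n):  # m customers are already seated
        k = len(sizes)
        # The first customer always opens a table.
        if not sizes or rng.random() * (m + theta) < theta + k * alpha:
            sizes.append(1)
            continue
        # Pick an existing table; the weights s - alpha sum to m - k*alpha.
        u = rng.random() * (m - k * alpha)
        acc = 0.0
        for i, s in enumerate(sizes):
            acc += s - alpha
            if u < acc:
                sizes[i] += 1
                break
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


def count_parts_of_size(sizes, k):
    """Number of parts (tables) with exactly k elements."""
    return sum(1 for s in sizes if s == k)
```

For instance, `len(simulate_gcrp(n, 0.5, 1.0))` gives one sample of the total number of parts, which can be compared with \(n^{\alpha}\) for increasing n, and `count_parts_of_size` gives the number of parts of a fixed size k. The loop over tables makes each step linear in the number of parts, which is adequate for small experiments only.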
References
Abramowitz, M. and Stegun, I.A. (1964). Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover, New York, ninth Dover printing, tenth GPO printing edition.
Aldous, D., Ibragimov, I. and Jacod, J. (1985). Ecole d’Ete de Probabilites de Saint-Flour XIII, 1983, volume 1117 of Ecole d’Ete de Probabilites de Saint-Flour Springer-Verlag Berlin Heidelberg.
Brightwell, G. and Luczak, M. (2012). Vertices of high degree in the preferential attachment tree. Electron J Probab 17, 1–43.
Chung, F. and Lu, L. (2006). Complex graphs and networks, Vol. 107, AMS and CBMS, Providence.
Crane, H. (2016). The ubiquitous Ewens sampling formula. Stat Sci 31, 1–19.
Crane, H. and Dempsey, W. (2017). Edge exchangeable models for interaction networks. J Am Stat Assoc 0, 0–0.
Favaro, S. and Feng, S. (2015). Large deviation principles for the Ewens-Pitman sampling model. Electron J Probab 20, 26 pp.
Favaro, S., Feng, S. and Gao, F. (2018). Large deviation principles for the Ewens-Pitman sampling model. Sankhya A - Springer India 13171, 1–12.
Freedman, D.A. (1975). On tail probabilities for martingales. Ann Probab 02, 100–118.
Griffiths, T.L., Jordan, M.I., Tenenbaum, J.B. and Blei, D.M. (2004). Hierarchical topic models and the nested Chinese restaurant process, vol. 16. Thrun, S., Saul, L.K. and Schölkopf, B. (eds.), MIT Press, p. 17–24.
Kingman, J.F.C. (1978). Uses of exchangeability. Ann Probab 6, 183–197.
Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab Th Rel Fields 102, 145–158.
Pitman, J. (2006). Combinatorial stochastic processes, volume 1875 of Lecture Notes in Mathematics. Springer, Berlin.
Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann Probab 04, 855–900.
Ewens, W.J. (1972). The sampling theory of selectively neutral alleles. Theor Popul Biol 3, 87–132.
Appendix A: Some Estimates on Γ(x)
In this appendix we prove some useful bounds regarding gamma functions and other relations involving them.
A.1. Preliminary Estimates
Lemma A.1 (Stirling's formula for the gamma function; see formula 6.1.42 in Abramowitz and Stegun (1964)).
For all x > 0 we have
Lemma A.2.
For all positive x, it follows that
Proof
Observe that, by Lemma A.1,
and the result follows by Taylor approximation. □
Lemma A.3.
Let β and λ be two positive real numbers with β > λ. Then
(1) \(\displaystyle \frac {{\Gamma }(\beta -\lambda )}{{\Gamma }(\beta )} \leq e^{\frac {1}{12(\beta -\lambda )}} \left (\frac {\beta }{\beta -\lambda }\right )^{1/2} \left (\frac {1}{\beta -\lambda } \right )^{\lambda }\);
(2) \(\displaystyle \frac {{\Gamma }(\beta )}{{\Gamma }(\beta -\lambda )} \leq e^{\frac {1}{12\beta }} \left (\frac {\beta - \lambda }{\beta }\right )^{1/2} \beta ^{\lambda }\).
Proof
For the first item, by Lemma A.1 and the bound \((1-\frac {x}{n})^{n} \leq e^{-x}\) it follows that
The second item follows analogously. □
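As a numerical sanity check of Lemma A.3 (not a substitute for the proof), the following minimal Python sketch compares both sides of items (1) and (2) on a logarithmic scale, using the standard-library function `math.lgamma`. The function names and the tolerance `tol` are illustrative choices and are not part of the paper.

```python
import math


def log_gamma_ratio(beta, lam):
    """Return log(Gamma(beta - lam) / Gamma(beta)) for beta > lam > 0."""
    return math.lgamma(beta - lam) - math.lgamma(beta)


def log_bound_item1(beta, lam):
    """Logarithm of the right-hand side of Lemma A.3 (1)."""
    d = beta - lam
    return 1.0 / (12.0 * d) + 0.5 * math.log(beta / d) - lam * math.log(d)


def log_bound_item2(beta, lam):
    """Logarithm of the right-hand side of Lemma A.3 (2)."""
    d = beta - lam
    return 1.0 / (12.0 * beta) + 0.5 * math.log(d / beta) + lam * math.log(beta)


def check_lemma_a3(beta, lam, tol=1e-12):
    """Return two booleans: whether items (1) and (2) hold numerically."""
    if not (beta > lam > 0):
        raise ValueError("Lemma A.3 requires beta > lam > 0")
    r = log_gamma_ratio(beta, lam)
    item1 = r <= log_bound_item1(beta, lam) + tol
    item2 = -r <= log_bound_item2(beta, lam) + tol
    return item1, item2
```

Working with logarithms avoids overflow of Γ for large arguments. For very large β the gap between the two sides shrinks, so floating-point error in `math.lgamma` may become comparable to it; the check is meant for moderate parameter values.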
Lemma A.4.
For 0 < x < 1 and y > 0 we have
Proof
The right-hand inequality follows directly from the inequality \(1 - x \leq e^{-x}\). For the other side, observe that \((1-x)^{y} = \exp (y \cdot \log (1-x) )\). Recalling the Taylor expansion of \(\log \)
so
□
Lemma A.5.
For every k ≤ n we have
with
Proof
Using the expression given by Lemma A.2, we obtain
Now, multiplying and dividing by \(n^{k-\alpha}\) the right-hand side of the above identity becomes and using the inequality \(e^{w} \leq 1 + 2w\), for \(w \in (0,1)\),
so
Now, let us give a lower bound for the fraction. Suppose k ≥
by Lemma A.4 and the inequality ex ≥ 1 + x, we have
so
By Bernoulli’s inequality, for k ≤ n/θ
and
Combining these inequalities, we obtain
□
A.2. Order of ϕn and ψn(k)
This part is devoted to proving bounds for the two normalizing factors ϕn and ψn(k), whose definitions we recall later.
Lemma A.6.
Let ϕn be as above. Then the following bounds hold:
(1) \(\displaystyle \frac {1}{\phi _{j}} < \frac {2{\Gamma }(1+\theta +\alpha )}{{\Gamma }(1+\theta ) \cdot (j+\theta )^{\alpha }}\);
(2) \(\displaystyle \frac {1}{(j+\theta ) \phi _{j+1}} < \frac {2{\Gamma }(1+\theta +\alpha )}{{\Gamma }(1+\theta ) \cdot (j+\theta )^{1+\alpha }}\);
(3) There exists a constant \(C_{\phi}\) such that
$$ \phi_{j} \leq C_{\phi} j^{\alpha}.$$
In particular, \(\phi_{n} = \Theta(n^{\alpha})\).
Proof
We prove the first two items; the third follows analogously.
(1). By Lemma A.3
then
(2). This part follows from the recurrence relation Γ(x + 1) = xΓ(x), the previous item, and the inequality
□
The next lemma provides similar bounds for the normalization factor ψn(k), whose definition is recalled below.
Lemma A.7.
For ψn(k) defined as above, the following bounds hold
(1) \(\psi _{n}(k) \leq \frac {2{\Gamma }(k+\theta )}{ {\Gamma }(\alpha +\theta )} \frac {1}{(n+\theta -k+\alpha )^{k-\alpha }}\), for n ≥ 2k;
(2) \(\frac {1}{\psi _{n}(k)} \leq \frac { e^{\frac {1}{12}} {\Gamma }(\alpha +\theta )}{{\Gamma }(k+\theta )} (n +\theta )^{k-\alpha }\).
Proof
(1). By Lemma A.3 we have
And for 2k ≤ n it follows that
Then
(2). Again, by Lemma A.3, we have
and the result follows from the previous inequality. □
Lemma A.8.
For the ratio of the factors ϕn and ψn(k), the following upper bound holds
Proof
Using the definition of both factors, we have
and using the bounds on the ratio of gamma functions in Lemma A.3, we have
□
Oliveira, R.I., Pereira, A. & Ribeiro, R. Concentration in the Generalized Chinese Restaurant Process. Sankhya A 84, 628–670 (2022). https://doi.org/10.1007/s13171-020-00210-7