Concentration in the Generalized Chinese Restaurant Process

Sankhya A

Abstract

The Generalized Chinese Restaurant Process (GCRP) describes a sequence of exchangeable random partitions of the numbers \(\{1,\dots ,n\}\). This process is related to the Ewens sampling model in Genetics and to Bayesian nonparametric methods such as topic models. In this paper, we study the GCRP in a regime where the number of parts grows like \(n^{\alpha}\) with \(\alpha > 0\). We prove a non-asymptotic concentration result for the number of parts of size \(k=o(n^{\alpha /(2\alpha +4)}/(\log n)^{1/(2+\alpha )})\). In particular, we show that these random variables concentrate around \(c_{k} V_{*} n^{\alpha}\), where \(V_{*} n^{\alpha}\) is the asymptotic number of parts and \(c_{k} \approx k^{-(1+\alpha)}\) is a positive value depending on \(k\). We also obtain finite-\(n\) bounds for the total number of parts. Our theorems complement asymptotic statements by Pitman and more recent results on large and moderate deviations by Favaro, Feng and Gao.

References

  • Abramowitz, M. and Stegun, I.A. (1964). Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover, New York, ninth Dover printing, tenth GPO printing edition.

  • Aldous, D., Ibragimov, I. and Jacod, J. (1985). Ecole d’Ete de Probabilites de Saint-Flour XIII, 1983, volume 1117 of Ecole d’Ete de Probabilites de Saint-Flour. Springer-Verlag, Berlin Heidelberg.

  • Brightwell, G. and Luczak, M. (2012). Vertices of high degree in the preferential attachment tree. Electron J Probab 17, 1–43.

  • Chung, F. and Lu, L. (2006). Complex graphs and networks, Vol. 107. AMS and CBMS, Providence.

  • Crane, H. (2016). The ubiquitous Ewens sampling formula. Stat Sci 31, 1–19.

  • Crane, H. and Dempsey, W. (2017). Edge exchangeable models for interaction networks. J Am Stat Assoc 0, 0–0.

  • Favaro, S. and Feng, S. (2015). Large deviation principles for the Ewens-Pitman sampling model. Electron J Probab 20, 26 pp.

  • Favaro, S., Feng, S. and Gao, F. (2018). Large deviation principles for the Ewens-Pitman sampling model. Sankhya A - Springer India 13171, 1–12.

  • Freedman, D.A. (1975). On tail probabilities for martingales. Ann Probab 02, 100–118.

  • Griffiths, T.L., Jordan, M.I., Tenenbaum, J.B. and Blei, D.M. (2004). Hierarchical topic models and the nested Chinese restaurant process, 16. MIT Press, Thrun, S., Saul, L. K. and Schölkopf, B. (eds.), p. 17–24.

  • Kingman, J.F.C. (1978). Uses of exchangeability. Ann Probab 6, 183–197.

  • Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab Th Rel Fields 102, 145–158.

  • Pitman, J. (2006). Combinatorial stochastic processes, 1875. Springer, Berlin.

  • Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann Probab 04, 855–900.

  • Ewens, W.J. (1972). The sampling theory of selectively neutral alleles. Theor Popul Biol 3, 87–132.

Author information

Corresponding author

Correspondence to R. I. Oliveira.

Appendix A: Some Estimates on Γ(x)

In this appendix we prove some useful bounds regarding gamma functions and other relations involving them.

A.1. Preliminary Estimates

Lemma A.1 (Stirling formula for Gamma function - see formula 6.1.42 in Abramowitz and Stegun (1964)).

For all x > 0 we have

$$ \frac{(2\pi)^{1/2}}{e^{x}}x^{x-\frac{1}{2}} \leq {\Gamma}(x) \leq \frac{(2\pi)^{1/2}e^{\frac{1}{12x}}}{e^{x}}x^{x-\frac{1}{2}}.$$

Lemma A.2.

For all positive x, it follows that

$$ {\Gamma}(x) = \frac{(2\pi)^{1/2}}{e^{x}}x^{x-\frac{1}{2}} \left( 1+ O\left( \frac{1}{x} \right) \right). $$

Proof

Observe that, by Lemma A.1,

$$ 0 \leq {\Gamma}(x) - \frac{(2\pi)^{1/2}}{e^{x}} x^{x-\frac{1}{2}} \leq \frac{(2\pi)^{1/2}}{e^{x}}x^{x-\frac{1}{2}} \left( e^{\frac{1}{12x}} - 1 \right), $$

and the result follows by Taylor approximation. □
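As a numerical sanity check (not part of the proof), the two-sided bound of Lemma A.1 and the \(O(1/x)\) error of Lemma A.2 can be tested in logarithmic form with Python's standard `math.lgamma`. The helper names `log_stirling` and `stirling_gap` and the sample points are illustrative assumptions, not notation from the paper.

```python
import math


def log_stirling(x: float) -> float:
    """Log of the leading Stirling term sqrt(2*pi) * x**(x - 1/2) * exp(-x)."""
    return 0.5 * math.log(2.0 * math.pi) + (x - 0.5) * math.log(x) - x


def stirling_gap(x: float) -> float:
    """log Gamma(x) minus the log of the leading Stirling term."""
    return math.lgamma(x) - log_stirling(x)


for x in (0.5, 1.0, 2.5, 10.0, 100.0):
    gap = stirling_gap(x)
    # Lemma A.1 in log form: 0 <= gap <= 1/(12x).
    assert 0.0 <= gap <= 1.0 / (12.0 * x)
    # Lemma A.2: the relative error exp(gap) - 1 is O(1/x),
    # so x times the relative error should stay bounded.
    print(x, gap, x * math.expm1(gap))
```

Working with logarithms avoids overflow in \(x^{x}\) for large \(x\); very large sample points are left out because the slack in the upper bound shrinks like \(x^{-3}\) and eventually falls below floating-point resolution.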

Lemma A.3.

Let β, λ be two positive real numbers with β > λ. Then

  1. \(\displaystyle \frac {{\Gamma }(\beta -\lambda )}{{\Gamma }(\beta )} \leq e^{\frac {1}{12(\beta -\lambda )}} \left (\frac {\beta }{\beta -\lambda }\right )^{1/2} \left (\frac {1}{\beta -\lambda } \right )^{\lambda }\);

  2. \(\displaystyle \frac {{\Gamma }(\beta )}{{\Gamma }(\beta -\lambda )} \leq e^{\frac {1}{12\beta }} \left (\frac {\beta - \lambda }{\beta }\right )^{1/2} \beta ^{\lambda }\).

Proof

For the first item, by Lemma A.1 and the bound \((1-\frac {x}{n})^{n} \leq e^{-x}\) it follows that

$$ \begin{array}{@{}rcl@{}} \frac{{\Gamma}(\beta-\lambda)}{{\Gamma}(\beta)} &\leq& \frac{e^{\frac{1}{12(\beta-\lambda)}}(\beta-\lambda)^{\beta-\lambda-1/2}}{e^{\beta-\lambda}} \frac{e^{\beta}}{\beta^{\beta-1/2}}\\ &\leq& \frac{e^{\frac{1}{12(\beta-\lambda)}}}{e^{-\lambda}}\left( 1-\frac{\lambda}{\beta}\right)^{\beta} \left( 1+\frac{\lambda}{\beta-\lambda}\right)^{1/2} (\beta-\lambda)^{-\lambda}\\ &\leq& e^{\frac{1}{12(\beta-\lambda)}} \left( \frac{\beta}{\beta-\lambda}\right)^{1/2} \left( \frac{1}{\beta-\lambda} \right)^{\lambda}. \end{array} $$

The second item follows analogously. □
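The two ratio bounds of Lemma A.3 can be spot-checked numerically. The sketch below compares logarithms of both sides using `math.lgamma`; the function names and the grid of \((\beta, \lambda)\) values are assumptions chosen for illustration.

```python
import math


def log_ratio(beta: float, lam: float) -> float:
    """Log of Gamma(beta - lam) / Gamma(beta), for beta > lam > 0."""
    return math.lgamma(beta - lam) - math.lgamma(beta)


def log_bound_item1(beta: float, lam: float) -> float:
    """Log of the right-hand side of item (1)."""
    return (1.0 / (12.0 * (beta - lam))
            + 0.5 * math.log(beta / (beta - lam))
            - lam * math.log(beta - lam))


def log_bound_item2(beta: float, lam: float) -> float:
    """Log of the right-hand side of item (2)."""
    return (1.0 / (12.0 * beta)
            + 0.5 * math.log((beta - lam) / beta)
            + lam * math.log(beta))


def check_lemma_a3(beta: float, lam: float) -> bool:
    """True when both inequalities of Lemma A.3 hold at (beta, lam)."""
    lr = log_ratio(beta, lam)
    # Item (2) bounds the reciprocal ratio, whose log is -lr.
    return lr <= log_bound_item1(beta, lam) and -lr <= log_bound_item2(beta, lam)


for beta in (1.5, 5.0, 50.0, 1000.5):
    for lam in (0.5, 1.0):
        print(beta, lam, check_lemma_a3(beta, lam))
```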

Lemma A.4.

For 0 < x < 1 and y > 0 we have

$$ e^{-xy} \exp \left( -\frac{x^{2}y}{1-x} \right) \leq (1-x)^{y} \leq e^{-xy}. $$

Proof

The right-hand side follows directly from the inequality \(1 - x \leq e^{-x}\). For the other side, observe that \((1-x)^{y} = \exp (y \cdot \log (1-x) )\). Recalling the Taylor expansion of \(\log \), we have

$$ \log(1-x) = -x - \sum\limits_{k=2}^{\infty} \frac{x^{k}}{k} \geq -x - \sum\limits_{k=2}^{\infty} x^{k} = -x - \frac{x^{2}}{1-x}, $$

so

$$ (1-x)^{y} \geq \exp(-xy) \exp \left( -\frac{x^{2}y}{1-x} \right) . $$
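A quick numerical check of Lemma A.4 is sketched below. It uses the lower bound with a negative exponent, \(e^{-xy}\exp(-x^{2}y/(1-x))\), which is what the Taylor estimate \(\log(1-x) \geq -x - x^{2}/(1-x)\) in the proof yields and the form applied later in the proof of Lemma A.5. The function name and sample points are illustrative assumptions.

```python
import math


def lemma_a4_bounds(x: float, y: float) -> tuple[float, float, float]:
    """Return (lower, middle, upper) for 0 < x < 1 and y > 0, where
    lower  = exp(-x*y) * exp(-x**2 * y / (1 - x)),
    middle = (1 - x)**y,
    upper  = exp(-x*y)."""
    lower = math.exp(-x * y - x * x * y / (1.0 - x))
    middle = (1.0 - x) ** y
    upper = math.exp(-x * y)
    return lower, middle, upper


for x in (0.01, 0.3, 0.9):
    for y in (0.5, 1.0, 20.0):
        lower, middle, upper = lemma_a4_bounds(x, y)
        # The sandwich lower <= (1 - x)**y <= upper should hold at every point.
        assert lower <= middle <= upper
        print(x, y, lower, middle, upper)
```

The lower bound is tight for small \(x\) and becomes very loose as \(x\) approaches 1, which is harmless in the appendix, where it is applied with \(x = (k-\alpha)/(n+\theta)\) small.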

Lemma A.5.

For every \(k \leq n\) we have

$$ \frac{{\Gamma}(n+\theta-k+\alpha)}{{\Gamma}(n+\theta)} = \frac{1}{n^{k-\alpha}} \left( 1 + r(n,k) \right) , $$

with

$$ - \frac{k (2k+\theta)}{n+\theta} \leq r(n,k) \leq \frac{28k^{2}}{n}. $$

Proof

Using the expression given by Lemma A.2, we obtain

$$ \begin{array}{@{}rcl@{}} \frac{{\Gamma}(n+\theta-k+\alpha)}{{\Gamma}(n+\theta)} &\leq& \frac{e^{k-\alpha}}{(n+\theta-k+\alpha)^{k-\alpha}} \left( 1-\frac{k-\alpha}{n+\theta} \right)^{n+\theta-\frac{1}{2}} \left( 1+ \frac{1}{6(n+\theta-k+\alpha)} \right) \\ &\leq& \frac{1}{(n+\theta-k+\alpha)^{k-\alpha}} \left( 1-\frac{k-\alpha}{n+\theta} \right)^{-\frac{1}{2}} \left( 1+ \frac{1}{6(n+\theta-k+\alpha)} \right) \\ &\leq& \frac{1}{n^{k-\alpha}} \left( 1+\frac{k- \theta-\alpha}{n+\theta-k+\alpha} \right)^{k-\alpha} \left( 1 +\frac{k}{n - k} \right)^{\frac{1}{2}} \left( 1+ \frac{1}{6(n-k)} \right) \end{array} $$

Now, for the first factor on the right-hand side above, which arises from multiplying and dividing by \(n^{k-\alpha}\), the inequality \(e^{w} \leq 1 + 2w\) for \(w \in (0,1)\) gives

$$ \begin{array}{@{}rcl@{}} \left( 1+\frac{k- \theta-\alpha}{n+\theta-k+\alpha} \right)^{k-\alpha} &\leq& \exp \left( \frac{(k-\alpha)(k- \theta-\alpha)}{n+\theta-k+\alpha} \right) \\ &\leq& \exp \left( \frac{k^{2}}{n-k} \right) \\ &\leq& \left( 1 + \frac{2k^{2}}{n-k} \right) \end{array} $$

so

$$ \begin{array}{@{}rcl@{}} \frac{{\Gamma}(n+\theta-k+\alpha)}{{\Gamma}(n+\theta)} &\leq& \frac{1}{n^{k-\alpha}} \left( 1 + \frac{2k^{2}}{n-k} \right) \left( 1 +\frac{k}{n - k} \right)^{\frac{1}{2}} \left( 1+ \frac{1}{6(n-k)} \right) \\ & \leq& \frac{1}{n^{k-\alpha}} \left( 1 + \frac{28k^{2}}{n} \right) \end{array} $$

Now, let us give a lower bound for the fraction. Suppose k

$$ \begin{array}{@{}rcl@{}} \frac{{\Gamma}(n+\theta-k+\alpha)}{{\Gamma}(n+\theta)} &\geq& \frac{e^{k-\alpha}}{(n+\theta-k+\alpha)^{k-\alpha}} \left( 1-\frac{k-\alpha}{n+\theta} \right)^{n+\theta-\frac{1}{2}} \frac{1 }{1+ \frac{1}{n+\theta}} \\ &\geq& \frac{e^{k-\alpha}}{(n+\theta)^{k-\alpha}} \left( 1-\frac{k-\alpha}{n+\theta} \right)^{n+\theta-\frac{1}{2}} \frac{1 }{1+ \frac{1}{n+\theta}} \\ &\geq& \frac{e^{k-\alpha}}{n^{k-\alpha}} \frac{n^{k-\alpha}}{(n+\theta)^{k-\alpha}} \left( 1-\frac{k-\alpha}{n+\theta} \right)^{n+\theta-\frac{1}{2}} \left( 1- \frac{1 }{n+\theta} \right). \end{array} $$

By Lemma A.4 and the inequality \(e^{x} \geq 1 + x\), we have

$$ \begin{array}{@{}rcl@{}} \left( 1-\frac{k-\alpha}{n+\theta} \right)^{n+\theta} &\geq& e^{\alpha -k} \exp \left( -\left( \frac{k-\alpha}{n+\theta} \right)^{2} (n +\theta) \frac{1}{1-\frac{k-\alpha}{n+\theta}} \right) \\ &\geq& e^{\alpha -k} \exp\left( - \frac{(k-\alpha)^{2}}{n+\theta-k+\alpha} \right) \\ &\geq& e^{\alpha -k} \left( 1- \frac{k^{2}}{n+\theta} \right). \end{array} $$

so

$$ \begin{array}{@{}rcl@{}} \frac{{\Gamma}(n+\theta-k+\alpha)}{{\Gamma}(n+\theta)} &\geq& \frac{1}{(n+\theta)^{k-\alpha}} \left( 1 -\frac{k^{2}}{n+\theta} \right) \left( 1- \frac{1 }{n+\theta} \right) \left( 1-\frac{k-\alpha}{n+\theta} \right)^{-\frac{1}{2}} \\ &\geq& \frac{1}{n^{k-\alpha}} \frac{n^{k-\alpha}}{(n+\theta)^{k-\alpha}} \left( 1-\frac{2k^{2}}{n+\theta} \right) \\ \end{array} $$

By Bernoulli’s inequality, for \(k \leq n/\theta\),

$$ \begin{array}{@{}rcl@{}} \frac{n^{k-\alpha}}{(n+\theta)^{k-\alpha}} = \left( 1- \frac{\theta}{n+\theta} \right)^{k-\alpha} \geq 1- \frac{k \theta}{n+\theta} \end{array} $$

Joining these inequalities, we obtain

$$ \begin{array}{@{}rcl@{}} \frac{{\Gamma}(n+\theta-k+\alpha)}{{\Gamma}(n+\theta)} &\geq& \frac{1}{n^{k-\alpha}} \left( 1- \frac{k \theta}{n+\theta} \right) \left( 1- \frac{k^{2}}{n+\theta} \right) \left( 1-\frac{k-\alpha}{n+\theta} \right)^{-\frac{1}{2}} \left( 1- \frac{1 }{n+\theta} \right) \\ &\geq& \frac{1}{n^{k-\alpha}} \left( 1- \frac{k \theta}{n+\theta} \right) \left( 1- \frac{k^{2}}{n+\theta} \right)^{\frac{3}{2}} \\ &\geq& \frac{1}{n^{k-\alpha}} \left( 1- \frac{k \theta}{n+\theta} \right) \left( 1- \frac{2k^{2}}{n+\theta} \right) \\ &\geq& \frac{1}{n^{k-\alpha}} \left( 1- \frac{k (2k+\theta)}{n+\theta} \right) \end{array} $$
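To see the size of the remainder \(r(n,k)\) in Lemma A.5 concretely, one can compute it from the defining identity and compare it with the two stated bounds. The sketch below is only a numerical illustration; the parameter values \(\theta = 1\), \(\alpha = 1/2\), the grid of \((n, k)\), and the function names are assumptions, not choices made in the paper.

```python
import math


def remainder(n: int, k: int, theta: float, alpha: float) -> float:
    """r(n, k) defined by
    Gamma(n + theta - k + alpha) / Gamma(n + theta) = n**(-(k - alpha)) * (1 + r(n, k))."""
    log_ratio = math.lgamma(n + theta - k + alpha) - math.lgamma(n + theta)
    # expm1 keeps precision when r is close to 0.
    return math.expm1(log_ratio + (k - alpha) * math.log(n))


def within_bounds(n: int, k: int, theta: float, alpha: float) -> bool:
    """Check -k(2k + theta)/(n + theta) <= r(n, k) <= 28 k^2 / n."""
    r = remainder(n, k, theta, alpha)
    lower = -k * (2 * k + theta) / (n + theta)
    upper = 28.0 * k * k / n
    return lower <= r <= upper


theta, alpha = 1.0, 0.5  # illustrative values only
for n in (100, 1000, 10000):
    for k in (1, 2, 3):
        print(n, k, remainder(n, k, theta, alpha), within_bounds(n, k, theta, alpha))
```

On this grid the computed remainder is of order \(k^{2}/n\) or smaller, so both bounds hold with considerable room.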

A.2. Order of \(\phi_{n}\) and \(\psi_{n}(k)\)

This part is devoted to proving bounds for the two normalizing factors \(\phi_{n}\) and \(\psi_{n}(k)\), whose definitions we recall below.

$$ \phi_{n} = \frac{{\Gamma}(1+\theta)}{{\Gamma}(1+\theta+\alpha)} \frac{{\Gamma}(n+\alpha+\theta)}{{\Gamma}(n+\theta)}, $$

Lemma A.6.

Let \(\phi_{n}\) be as above. Then the following bounds hold:

  1. \(\displaystyle \frac {1}{\phi _{j}} < \frac {2{\Gamma }(1+\theta +\alpha )}{{\Gamma }(1+\theta ) \cdot (j+\theta )^{\alpha }}\);

  2. \(\displaystyle \frac {1}{(j+\theta ) \phi _{j+1}} < \frac {2{\Gamma }(1+\theta +\alpha )}{{\Gamma }(1+\theta ) \cdot (j+\theta )^{1+\alpha }}\);

  3. There exists a constant \(C_{\phi}\) such that

    $$ \phi_{j} \leq C_{\phi} j^{\alpha}. $$

In particular, \(\phi_{n} = {\Theta}(n^{\alpha})\).

Proof

We prove the first two items; the third follows analogously.

(1). By Lemma A.3

$$ \frac{{\Gamma}(j+\theta)}{{\Gamma}(j+\theta+\alpha)} \leq e^{\frac{1}{12(j+\theta)}}\left( 1+\frac{\alpha}{j+\theta} \right)^{1/2} (j+\theta)^{-\alpha} \leq 2(j+\theta)^{-\alpha}. $$

Then

$$ \begin{array}{@{}rcl@{}} \frac{1}{\phi_{j}} < \frac{2{\Gamma}(1+\theta+\alpha)}{{\Gamma}(1+\theta) \cdot (j+\theta)^{\alpha}}. \end{array} $$

(2). This part follows from the previous item, using the recurrence relation Γ(x + 1) = xΓ(x) and the inequality

$$ \frac{{\Gamma}(j+1+\theta)}{{\Gamma}(j+1+\theta+\alpha)} = \frac{(j+\theta) {\Gamma}(j+\theta)}{(j+\theta+\alpha) {\Gamma}(j+\theta+\alpha)} < \frac{{\Gamma}(j+\theta)}{{\Gamma}(j+\theta+\alpha)}. $$
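The bounds on \(\phi_{n}\) can be illustrated numerically. The sketch below evaluates \(\phi_{n}\) from its definition in logarithmic form, checks items (1) and (2) of Lemma A.6 at a given \(j\), and computes \(\phi_{n}/n^{\alpha}\), which by the standard asymptotics \({\Gamma}(n+a)/{\Gamma}(n+b) \sim n^{a-b}\) approaches \({\Gamma}(1+\theta)/{\Gamma}(1+\theta+\alpha)\). The function names and the values \(\theta = 1\), \(\alpha = 1/2\) are assumptions made for the example.

```python
import math


def log_phi(n: int, theta: float, alpha: float) -> float:
    """Log of phi_n = Gamma(1+theta)/Gamma(1+theta+alpha) * Gamma(n+alpha+theta)/Gamma(n+theta)."""
    return (math.lgamma(1 + theta) - math.lgamma(1 + theta + alpha)
            + math.lgamma(n + alpha + theta) - math.lgamma(n + theta))


def log_bound(j: int, theta: float, alpha: float, power: float) -> float:
    """Log of 2*Gamma(1+theta+alpha) / (Gamma(1+theta) * (j+theta)**power)."""
    return (math.log(2.0) + math.lgamma(1 + theta + alpha)
            - math.lgamma(1 + theta) - power * math.log(j + theta))


def check_items(j: int, theta: float, alpha: float) -> tuple[bool, bool]:
    """Items (1) and (2) of Lemma A.6, compared on the log scale."""
    item1 = -log_phi(j, theta, alpha) < log_bound(j, theta, alpha, alpha)
    item2 = (-math.log(j + theta) - log_phi(j + 1, theta, alpha)
             < log_bound(j, theta, alpha, 1 + alpha))
    return item1, item2


def scaled_phi(n: int, theta: float, alpha: float) -> float:
    """phi_n / n**alpha, which stays bounded above and below when phi_n = Theta(n^alpha)."""
    return math.exp(log_phi(n, theta, alpha) - alpha * math.log(n))


theta, alpha = 1.0, 0.5  # illustrative values only
for j in (1, 10, 1000, 10**6):
    print(j, check_items(j, theta, alpha), scaled_phi(j, theta, alpha))
```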

The next lemma provides similar bounds for the normalization factor \(\psi_{n}(k)\), whose definition is recalled below.

$$ \begin{array}{ll} \psi_{n}(k) = \frac{{\Gamma}(k+\theta){\Gamma}(n-k+\alpha+\theta)}{{\Gamma}(\alpha+\theta){\Gamma}(n+\theta)}. \end{array} $$

Lemma A.7.

For \(\psi_{n}(k)\) defined as above, the following bounds hold:

  1. \(\psi _{n}(k) \leq \frac {2{\Gamma }(k+\theta )}{ {\Gamma }(\alpha +\theta )} \frac {1}{(n+\theta -k+\alpha )^{k-\alpha }}\), for \(n \geq 2k\);

  2. \(\frac {1}{\psi _{n}(k)} \leq \frac { e^{\frac {1}{12}} {\Gamma }(\alpha +\theta )}{{\Gamma }(k+\theta )} (n +\theta )^{k-\alpha }\).

Proof

(1). By Lemma A.3 we have

$$ \frac{{\Gamma}(n-k+\alpha+\theta)}{{\Gamma}(n+\theta)} \leq e^{\frac{1}{12(n+\theta-k+\alpha)}} \left( 1 + \frac{k-\alpha}{n+\theta - k + \alpha}\right)^{1/2} \frac{1}{(n+\theta - k +\alpha)^{k-\alpha}}. $$

And for \(2k \leq n\) it follows that

$$ \frac{{\Gamma}(n-k+\alpha+\theta)}{{\Gamma}(n+\theta)} \leq \frac{2}{(n+\theta - k +\alpha)^{k-\alpha}}. $$

Then

$$ \psi_{n}(k) \leq \frac{2{\Gamma}(k+\theta)}{ {\Gamma}(\alpha+\theta)} \frac{1}{(n+\theta-k+\alpha)^{k-\alpha}}. $$

(2). Again, by Lemma A.3 we have

$$ \frac{{\Gamma}(n+\theta)}{{\Gamma}(n-k+\alpha+\theta)} \leq e^{\frac{1}{12(n+\theta)}} \left( 1- \frac{k-\alpha}{n+\theta}\right)^{1/2} (n+\theta)^{k-\alpha} \leq e^{\frac{1}{12}} (n+\theta)^{k-\alpha} $$

and the result follows from the previous inequality. □
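Both items of Lemma A.7 can be checked numerically from the definition of \(\psi_{n}(k)\). The sketch below works on the log scale; the function names, the values \(\theta = 1\), \(\alpha = 1/2\), and the grid with \(n \geq 2k\) are illustrative assumptions.

```python
import math


def log_psi(n: int, k: int, theta: float, alpha: float) -> float:
    """Log of psi_n(k) = Gamma(k+theta) Gamma(n-k+alpha+theta) / (Gamma(alpha+theta) Gamma(n+theta))."""
    return (math.lgamma(k + theta) + math.lgamma(n - k + alpha + theta)
            - math.lgamma(alpha + theta) - math.lgamma(n + theta))


def check_lemma_a7(n: int, k: int, theta: float, alpha: float) -> tuple[bool, bool]:
    """Items 1 and 2 of Lemma A.7 (item 1 is stated for n >= 2k)."""
    lp = log_psi(n, k, theta, alpha)
    # log of Gamma(k+theta) / Gamma(alpha+theta), the constant shared by both items
    log_const = math.lgamma(k + theta) - math.lgamma(alpha + theta)
    item1 = lp <= (math.log(2.0) + log_const
                   - (k - alpha) * math.log(n + theta - k + alpha))
    item2 = -lp <= 1.0 / 12.0 - log_const + (k - alpha) * math.log(n + theta)
    return item1, item2


theta, alpha = 1.0, 0.5  # illustrative values only
for k in (1, 2, 3):
    for n in (2 * k, 10 * k, 1000):
        print(n, k, check_lemma_a7(n, k, theta, alpha))
```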

Lemma A.8.

For the ratio of the factors \(\phi_{n}\) and \(\psi_{n}(k)\) the following upper bound holds

$$ \frac{\phi_{j}}{(\psi_{j+1}(k))^{2} \cdot(j+\theta)} \leq \frac{e^{\frac{1}{12}}{\Gamma}(1+\theta){\Gamma}(\alpha+\theta)^{2}}{{\Gamma}(1+\theta+\alpha){\Gamma}(k+\theta)^{2}} (j+\theta)^{2k-\alpha-1}. $$

Proof

Using the definition of both factors, we have

$$ \frac{\phi_{j}}{(\psi_{j+1}(k))^{2} (j+\theta)} = \frac{{\Gamma}(1+\theta){\Gamma}(\alpha+\theta)^{2}}{{\Gamma}(1+\theta+\alpha){\Gamma}(k+\theta)^{2}} \frac{{\Gamma}(j+\theta){\Gamma}(j+\alpha+\theta)}{{\Gamma}(j+1-k+\alpha+\theta)^{2}} (j+\theta) $$

where we used \({\Gamma}(j+1+\theta) = (j+\theta){\Gamma}(j+\theta)\). Using the bounds on the ratio of gamma functions in Lemma A.3, we have

$$ \frac{\phi_{j}}{(\psi_{j+1}(k))^{2} \cdot(j+\theta)} \leq \frac{e^{\frac{1}{12}}{\Gamma}(1+\theta){\Gamma}(\alpha+\theta)^{2}}{{\Gamma}(1+\theta+\alpha){\Gamma}(k+\theta)^{2}} (j+\theta)^{2k-\alpha-1}. $$
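The final inequality of this proof, including the factor \(e^{1/12}\), can be spot-checked by evaluating the left-hand side directly from the definitions of \(\phi_{j}\) and \(\psi_{j+1}(k)\). The sketch below is a numerical illustration only; the function names, the values \(\theta = 1\), \(\alpha = 1/2\), and the grid of \((j, k)\) are assumptions.

```python
import math


def log_phi(n: int, theta: float, alpha: float) -> float:
    """Log of phi_n as defined in Section A.2."""
    return (math.lgamma(1 + theta) - math.lgamma(1 + theta + alpha)
            + math.lgamma(n + alpha + theta) - math.lgamma(n + theta))


def log_psi(n: int, k: int, theta: float, alpha: float) -> float:
    """Log of psi_n(k) as defined in Section A.2."""
    return (math.lgamma(k + theta) + math.lgamma(n - k + alpha + theta)
            - math.lgamma(alpha + theta) - math.lgamma(n + theta))


def log_lhs(j: int, k: int, theta: float, alpha: float) -> float:
    """Log of phi_j / (psi_{j+1}(k)^2 * (j + theta))."""
    return (log_phi(j, theta, alpha) - 2.0 * log_psi(j + 1, k, theta, alpha)
            - math.log(j + theta))


def log_rhs(j: int, k: int, theta: float, alpha: float) -> float:
    """Log of the bound with the e^(1/12) factor from the end of the proof."""
    log_const = (math.lgamma(1 + theta) + 2.0 * math.lgamma(alpha + theta)
                 - math.lgamma(1 + theta + alpha) - 2.0 * math.lgamma(k + theta))
    return 1.0 / 12.0 + log_const + (2 * k - alpha - 1) * math.log(j + theta)


theta, alpha = 1.0, 0.5  # illustrative values only
for j in (10, 100, 1000):
    for k in (1, 2, 3):
        print(j, k, log_lhs(j, k, theta, alpha) <= log_rhs(j, k, theta, alpha))
```

For \(k = 1\) the left-hand side slightly exceeds \((j+\theta)^{2k-\alpha-1}\) times the gamma-function constant, so on this grid the factor \(e^{1/12}\) is what makes the comparison succeed.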

Cite this article

Oliveira, R.I., Pereira, A. & Ribeiro, R. Concentration in the Generalized Chinese Restaurant Process. Sankhya A 84, 628–670 (2022). https://doi.org/10.1007/s13171-020-00210-7

  • DOI: https://doi.org/10.1007/s13171-020-00210-7
