
Non-linear Log-Sobolev Inequalities for the Potts Semigroup and Applications to Reconstruction Problems

Abstract

Consider the semigroup of random walk on a complete graph, which we call the Potts semigroup. Diaconis and Saloff-Coste (Ann Appl Probab 6(3):695–750, 1996) computed the maximum ratio between the relative entropy and the Dirichlet form, obtaining the constant \(\alpha _2\) in the 2-log-Sobolev inequality (2-LSI). In this paper, we obtain the best possible non-linear inequality relating entropy and the Dirichlet form (i.e., p-NLSI, \(p\ge 1\)). As an example, we show \(\alpha _1 = 1+\frac{1+o(1)}{\log q}\). Furthermore, p-NLSIs allow us to conclude that for \(q\ge 3\), distributions that are not a product of identical distributions can have a slower speed of convergence to equilibrium, unlike the case \(q=2\). By integrating the 1-NLSI we obtain new strong data processing inequalities (SDPIs), which in turn allow us to improve the results of Mossel and Peres (Ann Appl Probab 13(3):817–844, 2003) on reconstruction thresholds for Potts models on trees. A special case is the problem of reconstructing the color of the root of a q-colored tree given knowledge of the colors of all the leaves. We show that to have a non-trivial reconstruction probability the branching number of the tree should be at least

$$\begin{aligned} \frac{\log q}{\log q - \log (q-1)} = (1-o(1))q\log q. \end{aligned}$$

This recovers previous results (of Sly in Commun Math Phys 288(3):943–961, 2009 and Bhatnagar et al. in SIAM J Discrete Math 25(2):809–826, 2011) in (slightly) more generality, but more importantly avoids the need for any coloring-specific arguments. Similarly, we improve the state of the art on the weak recovery threshold for the stochastic block model with q balanced groups, for all \(q\ge 3\). To further show the power of our method, we prove optimal non-reconstruction results for a broadcasting on trees model with Gaussian kernels, closing a gap left open by Eldan et al. (Combin Probab Comput 31(6):1048–1069, 2022). These improvements advocate information-theoretic methods as a useful complement to the conventional techniques originating from statistical physics.



Data Availability

This article does not have any external supporting data.

Notes

  1. Throughout this paper, \(\log \) means natural logarithm.

  2. [43] requires the function \(\Phi _p\) to be concave. We do not make this assumption initially; however, to extend these inequalities to product semigroups, concavification will be necessary (see Sect. 3).

  3. In the case \(q=2\), \(b_p\) differs from [43] by a constant factor due to a different parametrization of the semigroup.

  4. We recall that for a pair of random variables (X, Y) the mutual information between X and Y is defined as \(I(X;Y) = D(P_{X,Y} \Vert P_X P_Y)\).

  5. The branching number of a tree roughly measures the growth rate of the tree. For regular trees or Galton–Watson trees, the branching number is almost surely equal to the expected offspring of a vertex. See Definition 30 for a formal definition.

  6. An earlier version of the paper incorrectly stated that \(\Phi _{\frac{1}{r}}(y)\) is convex in r for fixed y. The incorrect statement was not used elsewhere in the paper.

References

  1. Abbe, E., Sandon, C.: Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation. Adv. Neural Inform. Process. Syst. 29 (2016)

  2. Ahlswede, R., Gács, P.: Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab. 925–939 (1976)

  3. Banks, J., Moore, C., Neeman, J., Netrapalli, P.: Information-theoretic thresholds for community detection in sparse networks. In: Conference on Learning Theory, pp. 383–416. PMLR (2016)

  4. Bernstein, A.J.: Maximally connected arrays on the n-cube. SIAM J. Appl. Math. 15(6), 1485–1489 (1967)


  5. Bhatnagar, N., Sly, A., Tetali, P.: Reconstruction threshold for the hardcore model. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques: 13th International Workshop, APPROX 2010, and 14th International Workshop, RANDOM 2010, Barcelona, Spain, September 1-3, 2010. Proceedings, pp. 434–447. Springer (2010)

  6. Bhatnagar, N., Vera, J., Vigoda, E., Weitz, D.: Reconstruction for colorings on trees. SIAM J. Discret. Math. 25(2), 809–826 (2011)


  7. Bleher, P.M., Ruiz, J., Zagrebnov, V.A.: On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice. J. Stat. Phys. 79, 473–482 (1995)


  8. Bobkov, S.G., Tetali, P.: Modified logarithmic Sobolev inequalities in discrete settings. J. Theor. Probab. 19, 289–336 (2006)


  9. Braverman, M., Garg, A., Ma, T., Nguyen, H.L., Woodruff, D.P.: Communication lower bounds for statistical estimation problems via a distributed data processing inequality. In: Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pp. 1011–1020 (2016)

  10. Chansangiam, P.: Operator monotone functions: characterizations and integral representations (2013). arXiv:1305.2471

  11. Choi, M.-D., Ruskai, M.B., Seneta, E.: Equivalence of certain entropy contraction coefficients. Linear Algebra Appl. 208, 29–36 (1994)


  12. Cohen, J., Kempermann, J.H.B., Zbaganu, G.: Comparisons of Stochastic Matrices with Applications in Information Theory, Statistics, Economics and Population. Springer, New York (1998)


  13. Coja-Oghlan, A., Krzakala, F., Perkins, W., Zdeborová, L.: Information-theoretic thresholds from the cavity method. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 146–157 (2017)

  14. Diaconis, P., Saloff-Coste, L.: Logarithmic Sobolev inequalities for finite Markov chains. Ann. Appl. Probab. 6(3), 695–750 (1996)


  15. Efthymiou, C.: Reconstruction/non-reconstruction thresholds for colourings of general Galton–Watson trees (2014). arXiv:1406.3617

  16. Eldan, R., Mikulincer, D., Pieters, H.: Community detection and percolation of information in a geometric setting. Comb. Probab. Comput. 31(6), 1048–1069 (2022)


  17. Émery, M., Yukich, J.E.: A simple proof of the logarithmic Sobolev inequality on the circle. Séminaire de probabilités de Strasbourg 21, 173–175 (1987)


  18. Evans, W., Kenyon, C., Peres, Y., Schulman, L.J.: Broadcasting on trees and the Ising model. Ann. Appl. Probab. 410–433 (2000)

  19. Evans, W.S., Schulman, L.J.: Signal propagation and noisy circuits. IEEE Trans. Inf. Theory 45(7), 2367–2373 (1999)


  20. Formentin, M., Külske, C.: On the purity of the free boundary condition Potts measure on random trees. Stoch. Process. Appl. 119(9), 2992–3005 (2009)


  21. Goel, S.: Modified logarithmic Sobolev inequalities for some models of random walk. Stoch. Process. Appl. 114(1), 51–79 (2004)


  22. Gross, L.: Logarithmic Sobolev inequalities. Am. J. Math. 97(4), 1061–1083 (1975)


  23. Gu, Y.: Channel Comparison Methods and Statistical Problems on Graphs. PhD thesis, Massachusetts Institute of Technology (2023)

  24. Hadar, U., Liu, J., Polyanskiy, Y., Shayevitz, O.: Communication complexity of estimating correlations. In: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pp. 792–803 (2019)

  25. Harper, L.H.: Optimal assignments of numbers to vertices. J. Soc. Ind. Appl. Math. 12(1), 131–135 (1964)


  26. Hart, S.: A note on the edges of the n-cube. Discret. Math. 14(2), 157–163 (1976)


  27. Kesten, H., Stigum, B.P.: Additional limit theorems for indecomposable multidimensional Galton–Watson processes. Ann. Math. Stat. 37(6), 1463–1481 (1966)


  28. Külske, C., Formentin, M.: A symmetric entropy bound on the non-reconstruction regime of Markov chains on Galton–Watson trees. Electron. Commun. Probab. 14, 587–596 (2009)


  29. Lindsey, J.H., II: Assignment of numbers to vertices. Am. Math. Mon. 71(5), 508–516 (1964)


  30. Liu, W., Ning, N.: Large degree asymptotics and the reconstruction threshold of the asymmetric binary channels. J. Stat. Phys. 174, 1161–1188 (2019)


  31. Lyons, R.: Random walks and percolation on trees. Ann. Probab. 18(3), 931–958 (1990)


  32. Makur, A., Polyanskiy, Y.: Comparison of channels: criteria for domination by a symmetric channel. IEEE Trans. Inf. Theory 64(8), 5704–5725 (2018)


  33. Martinelli, F., Sinclair, A., Weitz, D.: Fast mixing for independent sets, colorings, and other models on trees. Random Struct. Algor. 31(2), 134–172 (2007)


  34. Massoulié, L.: Community detection thresholds and the weak Ramanujan property. In: Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pp. 694–703 (2014)

  35. Mézard, M., Montanari, A.: Reconstruction on trees and spin glass transition. J. Stat. Phys. 124, 1317–1350 (2006)


  36. Mossel, E., Neeman, J., Sly, A.: Reconstruction and estimation in the planted partition model. Probab. Theory Relat. Fields 162, 431–461 (2015)


  37. Mossel, E., Neeman, J., Sly, A.: A proof of the block model threshold conjecture. Combinatorica 38(3), 665–708 (2018)


  38. Mossel, E., Oleszkiewicz, K., Sen, A.: On reverse hypercontractivity. Geom. Funct. Anal. 23(3), 1062–1097 (2013)


  39. Mossel, E., Peres, Y.: Information flow on trees. Ann. Appl. Probab. 13(3), 817–844 (2003)


  40. Mossel, E., Roch, S., Sly, A.: Robust estimation of latent tree graphical models: inferring hidden states with inexact parameters. IEEE Trans. Inf. Theory 59(7), 4357–4373 (2013)


  41. Mossel, E., Sly, A., Sohn, Y.: Exact phase transitions for stochastic block models and reconstruction on trees (2022). arXiv:2212.03362

  42. Ordentlich, O., Polyanskiy, Y.: Strong data processing constant is achieved by binary inputs. IEEE Trans. Inf. Theory 68(3), 1480–1481 (2021)


  43. Polyanskiy, Y., Samorodnitsky, A.: Improved log-Sobolev inequalities, hypercontractivity and uncertainty principle on the hypercube. J. Funct. Anal. 277(11), 108280 (2019)


  44. Polyanskiy, Y., Wu, Y.: Strong data-processing inequalities for channels and Bayesian networks. In: Convexity and Concentration, pp. 211–249. Springer (2017)

  45. Polyanskiy, Y., Wu, Y.: Application of the information-percolation method to reconstruction problems on graphs. Math. Stat. Learn. 2(1), 1–24 (2020)


  46. Raginsky, M.: Strong data processing inequalities and \(\phi \)-Sobolev inequalities for discrete channels. IEEE Trans. Inf. Theory 62(6), 3355–3389 (2016)


  47. Rényi, A.: On measures of dependence. Acta Math. Hungar. 10(3–4), 441–451 (1959)


  48. Sarmanov, O.V.: Maximum correlation coefficient (nonsymmetric case). Select. Transl. Math. Stat. Probab. 2, 207–210 (1963)


  49. Sly, A.: Reconstruction of random colourings. Commun. Math. Phys. 288(3), 943–961 (2009)


  50. Sly, A.: Reconstruction for the Potts model. Ann. Probab. 39(4), 1365–1406 (2011)


  51. Wyner, A.D., Ziv, J.: A theorem on the entropy of certain binary sequences and applications: Part I. IEEE Trans. Inf. Theory 19(6), 769–772 (1973)


  52. Xu, A., Raginsky, M.: Converses for distributed estimation via strong data processing inequalities. In: 2015 IEEE International Symposium on Information Theory (ISIT), pp. 2376–2380. IEEE (2015)


Acknowledgements

The authors are grateful to Jingbo Liu for helpful discussions on reconstruction problems on trees, and to the anonymous reviewers for valuable comments.

Funding

This work was supported in part by the MIT-IBM Watson AI Lab, by the Center for Science of Information (CSoI), a National Science Foundation Science and Technology Center, under grant agreement CCF-09-39370, and by the National Science Foundation under Grant No. CCF-2131115.

Author information


Corresponding author

Correspondence to Yuzhou Gu.

Ethics declarations

Conflict of interest

The authors do not have any other competing interests to declare.

Additional information

Communicated by J. Ding.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Input-Unrestricted Contraction Coefficient of Potts Channels

Computation of (input-restricted or input-unrestricted) contraction coefficients is often a daunting task. Previously, Makur and Polyanskiy [32] obtained lower and upper bounds of input-unrestricted KL divergence contraction coefficients for Potts channels. In this section we compute the exact value of these contraction coefficients.

We remark that after our work, Ordentlich and Polyanskiy [42] proved that the input-unrestricted contraction coefficients are achieved by input distributions of support size at most two, giving an alternative (and simpler) proof for Prop. 40. We include our original proof here for completeness.

Proposition 40

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}} ({{\,\mathrm{\text {PC}}\,}}_\lambda ) = \frac{q\lambda ^2}{(q-2)\lambda +2}. \end{aligned}$$
(231)

Proof

The result is obvious for \(\lambda \in \{0, 1\}\). In the following, assume that \(\lambda \not \in \{0, 1\}\).

We use the following characterization of the contraction coefficient in terms of the Rényi maximal correlation [47] (see, e.g., Sarmanov [48]). For any channel M, we have

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}}(M) = \left( \sup _\mu \sup _{f, g} {\mathbb {E}}[f(X) g(Y)]\right) ^2 \end{aligned}$$
(232)

where \(\mu \) is a distribution on [q], \(P_X=\mu \), \(P_{Y|X}=M\), \(f: {\mathcal {X}}\rightarrow {\mathbb {R}}\) satisfies \({\mathbb {E}}_X[f] = 0\) and \({\mathbb {E}}_X[f^2]=1\), and \(g: {\mathcal {Y}}\rightarrow {\mathbb {R}}\) satisfies \({\mathbb {E}}_Y[g]=0\) and \({\mathbb {E}}_Y[g^2]=1\).

Specialize to \(M = {{\,\mathrm{\text {PC}}\,}}_\lambda \). Write \(\mu = (p_1, \ldots , p_q)\), \(f = (f_1, \ldots , f_q)\) and \(g = (g_1, \ldots , g_q)\). Then

$$\begin{aligned} {\mathbb {E}}[f(X) g(Y)] = \sum _{i,j} f_i p_i g_j {\mathbb {P}}[Y=j|X=i] = \lambda \sum f_i p_i g_i. \end{aligned}$$
(233)

When \(\lambda > 0\), we need to maximize \(\sum f_i g_i p_i\). When \(\lambda < 0\), we make the transform \(f_i \leftarrow -f_i\), and still maximize \(\sum f_i g_i p_i\). So we get the following optimization problem.

$$\begin{aligned}&\max \sum f_i g_i p_i\nonumber \\ \text {s.t.} \quad&\sum f_i p_i = 0, \end{aligned}$$
(234)
$$\begin{aligned}&\sum f_i^2 p_i = 1, \end{aligned}$$
(235)
$$\begin{aligned}&\sum g_i \left( \lambda p_i + \frac{1-\lambda }{q}\right) = 0, \end{aligned}$$
(236)
$$\begin{aligned}&\sum g_i^2 \left( \lambda p_i + \frac{1-\lambda }{q}\right) = 1, \end{aligned}$$
(237)
$$\begin{aligned}&p_i \ge 0, \sum p_i = 1. \end{aligned}$$
(238)

Lower bound. Take

$$\begin{aligned} \mu = \left( \frac{1}{2}, \frac{1}{2}, 0, \ldots , 0\right) , \quad f = (1, -1, 0, \ldots , 0), \quad g = (u, -u, 0, \ldots , 0) \end{aligned}$$
(239)

where

$$\begin{aligned} u = \sqrt{\frac{q}{(q-2)\lambda +2}}. \end{aligned}$$
(240)

Then

$$\begin{aligned} \sum f_i g_i p_i = u. \end{aligned}$$
(241)

So

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}}({{\,\mathrm{\text {PC}}\,}}_\lambda ) \ge (\lambda u)^2 = \frac{q\lambda ^2}{(q-2)\lambda +2}. \end{aligned}$$
(242)

Upper bound. Let us fix \(\mu \) and maximize over f and g. Assume for the sake of contradiction that \(\sum f_i g_i p_i > u\). The space of possible g is bounded; the coordinates \(f_i\) with \(p_i=0\) may be unbounded, but their values do not affect the objective function. By compactness, the maximum value of \(\sum f_i g_i p_i\) is achieved at some point f and g. Let us compute the derivatives.

$$\begin{aligned} \nabla _f \sum f_i g_i p_i&= (g_i p_i)_{i\in [q]}, \end{aligned}$$
(243)
$$\begin{aligned} \nabla _f \sum f_i p_i&= (p_i)_{i\in [q]}, \end{aligned}$$
(244)
$$\begin{aligned} \nabla _f \sum f_i^2 p_i&= (2 f_i p_i)_{i\in [q]}, \end{aligned}$$
(245)
$$\begin{aligned} \nabla _g \sum f_i g_i p_i&= (f_i p_i)_{i\in [q]}, \end{aligned}$$
(246)
$$\begin{aligned} \nabla _g \sum g_i \left( \lambda p_i + \frac{1-\lambda }{q}\right)&= \left( \lambda p_i + \frac{1-\lambda }{q}\right) _{i\in [q]}, \end{aligned}$$
(247)
$$\begin{aligned} \nabla _g \sum g_i^2 \left( \lambda p_i + \frac{1-\lambda }{q}\right)&= \left( 2 g_i\left( \lambda p_i + \frac{1-\lambda }{q}\right) \right) _{i\in [q]}. \end{aligned}$$
(248)

By maximality in f, there exist constants A and B such that

$$\begin{aligned} g_i p_i = A p_i + B f_i p_i \end{aligned}$$
(249)

for all i. By maximality in g, there exist constants C and D such that

$$\begin{aligned} f_i p_i = C \left( \lambda p_i + \frac{1-\lambda }{q}\right) + D g_i \left( \lambda p_i + \frac{1-\lambda }{q}\right) \end{aligned}$$
(250)

for all i.

By (249),

$$\begin{aligned} \sum f_i g_i p_i = \sum f_i (A p_i + B f_i p_i) = B. \end{aligned}$$
(251)

By (250),

$$\begin{aligned} \sum f_i g_i p_i = \sum g_i \left( C \left( \lambda p_i + \frac{1-\lambda }{q}\right) + D g_i \left( \lambda p_i + \frac{1-\lambda }{q}\right) \right) = D. \end{aligned}$$
(252)

So \(B=D> u > 0\).

For i with \(p_i\ne 0\), we have \(g_i = A + B f_i\) by (249). If for some i, \(p_i=0\), then

$$\begin{aligned} \frac{1-\lambda }{q}(C+D g_i) = 0. \end{aligned}$$
(253)

This means that all \(g_i\) with \(p_i=0\) take the same value \(-C/D\). So we can choose \(f_i\) for such i such that

$$\begin{aligned} g_i = A + B f_i \end{aligned}$$
(254)

for all i.

From (236), we get

$$\begin{aligned} 0&= \sum g_i \left( \lambda p_i + \frac{1-\lambda }{q}\right) \nonumber \\&= \sum (A + B f_i) \left( \lambda p_i + \frac{1-\lambda }{q}\right) \nonumber \\&= A + B \frac{1-\lambda }{q} \sum f_i. \end{aligned}$$
(255)

From (237), we get

$$\begin{aligned} 1&= \sum g_i^2 \left( \lambda p_i + \frac{1-\lambda }{q}\right) \nonumber \\&= \sum (A^2 + 2AB f_i + B^2 f_i^2) \left( \lambda p_i + \frac{1-\lambda }{q}\right) \nonumber \\&= A^2 + 2AB \frac{1-\lambda }{q} \sum f_i + B^2 \lambda + B^2 \frac{1-\lambda }{q} \sum f_i^2 \nonumber \\&= B^2 \left( \lambda + \frac{1-\lambda }{q} \sum f_i^2 - \left( \frac{1-\lambda }{q} \sum f_i\right) ^2 \right) . \end{aligned}$$
(256)

The result then follows from Claim 41 because we have

$$\begin{aligned} B&= \frac{1}{\sqrt{\lambda + \frac{1-\lambda }{q} \left( \sum f_i^2 - \frac{1-\lambda }{q} \left( \sum f_i\right) ^2\right) }} \nonumber \\&\le \frac{1}{\sqrt{\lambda + \frac{1-\lambda }{q} \left( \sum f_i^2 - \frac{1}{q-1} \left( \sum f_i\right) ^2\right) }} \nonumber \\&\le \frac{1}{\sqrt{\lambda + \frac{1-\lambda }{q} \cdot 2}} = u. \end{aligned}$$
(257)

The second step is because \(0\le \frac{1-\lambda }{q} \le \frac{1}{q-1}\) for all \(\lambda \in \left[ -\frac{1}{q-1}, 1\right] \). \(\square \)
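As an illustrative sanity check (not part of the proof), the lower-bound construction (239)–(242) can be verified numerically. The following sketch uses only the displayed formulas; the helper name check_lower_bound is ours.

```python
# Illustrative check of the construction (239)-(242); not part of the proof.
import math

def check_lower_bound(q, lam):
    u = math.sqrt(q / ((q - 2) * lam + 2))
    p = [0.5, 0.5] + [0.0] * (q - 2)                      # mu in (239)
    f = [1.0, -1.0] + [0.0] * (q - 2)
    g = [u, -u] + [0.0] * (q - 2)
    py = [lam * p[i] + (1 - lam) / q for i in range(q)]   # output marginal of PC_lambda
    # constraints (234)-(237)
    assert abs(sum(f[i] * p[i] for i in range(q))) < 1e-12
    assert abs(sum(f[i] ** 2 * p[i] for i in range(q)) - 1) < 1e-12
    assert abs(sum(g[i] * py[i] for i in range(q))) < 1e-12
    assert abs(sum(g[i] ** 2 * py[i] for i in range(q)) - 1) < 1e-12
    # the achieved value (lambda * u)^2 matches the closed form (231)
    val = (lam * sum(f[i] * g[i] * p[i] for i in range(q))) ** 2
    assert abs(val - q * lam ** 2 / ((q - 2) * lam + 2)) < 1e-12

for q in (3, 4, 7):
    for lam in (-1 / (q - 1), -0.1, 0.3, 0.9):
        check_lower_bound(q, lam)
print("lower-bound construction verified")
```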

Claim 41

For any distribution \(\mu \) and any f satisfying (234) and (235), we have

$$\begin{aligned} \sum f_i^2 - \frac{1}{q-1} \left( \sum f_i\right) ^2 \ge 2. \end{aligned}$$
(258)

Proof

Let us first prove the result for f with support size two. WLOG assume that \(f_1>0\), \(f_2<0\), \(f_3 = \cdots = f_q=0\). One can compute that

$$\begin{aligned} f_1 = \sqrt{\frac{p_2}{p_1(p_1+p_2)}}, \quad f_2 = -\sqrt{\frac{p_1}{p_2(p_1+p_2)}}. \end{aligned}$$
(259)

Then

$$\begin{aligned}&~f_1^2+f_2^2-\frac{1}{q-1} (f_1+f_2)^2 \nonumber \\ \ge&~ f_1^2+f_2^2-(f_1+f_2)^2 \nonumber \\ =&~ \frac{1}{p_1+p_2} \left( \frac{p_2}{p_1}+\frac{p_1}{p_2} - \left( \sqrt{\frac{p_2}{p_1}}-\sqrt{\frac{p_1}{p_2}}\right) ^2\right) \nonumber \\ =&~ \frac{2}{p_1+p_2} \ge 2. \end{aligned}$$
(260)

Let us define

$$\begin{aligned} S(\mu )&:= \left\{ f : \sum f_ip_i=0, \sum f_i^2 p_i = 1\right\} \end{aligned}$$
(261)
$$\begin{aligned} U(f)&:= \sum f_i^2 - \frac{1}{q-1} \left( \sum f_i\right) ^2. \end{aligned}$$
(262)

Now suppose that for some \(\mu \) and \(f \in S(\mu )\) we have \(U(f) < 2\). The space \(S(\mu ) / \{\pm \}\) is connected as a subspace of \(({\mathbb {R}}^q\backslash \{0\})/\{\pm \}\) (with the quotient topology), and there exists \(f\in S(\mu )\) with \(U(f) \ge 2\) (e.g., f with support size two), so since U is continuous, for sufficiently small \(\epsilon >0\) there exists \(f\in S(\mu )\) such that \(U(f) \in (2-\epsilon , 2)\).

Let \(\lambda = -\frac{1}{q-1}\). Take \(\epsilon \) small enough so that \(\lambda + \frac{1-\lambda }{q} (2-\epsilon ) > 0\) and choose \(f\in S(\mu )\) with \(U(f) \in (2-\epsilon , 2)\). Define

$$\begin{aligned} B&= \frac{1}{\sqrt{\lambda + \frac{1-\lambda }{q} U(f)}} > u, \end{aligned}$$
(263)
$$\begin{aligned} A&= - B \frac{1-\lambda }{q} \sum f_i, \end{aligned}$$
(264)
$$\begin{aligned} g_i&= A + B f_i \forall i. \end{aligned}$$
(265)

One can check that g satisfies (236) and (237), and

$$\begin{aligned} \sum f_i g_i p_i = B > u. \end{aligned}$$
(266)

By (232) and (233), this implies

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}}\left( {{\,\mathrm{\text {PC}}\,}}_{-\frac{1}{q-1}}\right) > \frac{1}{q-1}. \end{aligned}$$
(267)

However, we have

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}}\left( {{\,\mathrm{\text {PC}}\,}}_{-\frac{1}{q-1}}\right) \le \eta _{{{\,\mathrm{\text {TV}}\,}}} \left( {{\,\mathrm{\text {PC}}\,}}_{-\frac{1}{q-1}}\right) = \frac{1}{q-1}. \end{aligned}$$
(268)

Contradiction. \(\square \)
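Claim 41 can also be probed numerically; the sketch below samples random pairs \((\mu , f)\) satisfying (234)–(235) and checks (258). It is purely illustrative and assumes nothing beyond the statement of the claim; the sampling scheme is ours.

```python
# Illustrative random test of Claim 41; not a proof.
import math
import random

random.seed(0)
for _ in range(10000):
    q = random.randint(3, 8)
    p = [random.random() for _ in range(q)]
    s = sum(p)
    p = [v / s for v in p]
    f = [random.gauss(0, 1) for _ in range(q)]
    # enforce (234)-(235): center and normalize f under mu
    m = sum(f[i] * p[i] for i in range(q))
    f = [v - m for v in f]
    norm = math.sqrt(sum(f[i] ** 2 * p[i] for i in range(q)))
    if norm < 1e-9:
        continue
    f = [v / norm for v in f]
    U = sum(v ** 2 for v in f) - (sum(f)) ** 2 / (q - 1)
    assert U >= 2 - 1e-9, (q, U)
print("(258) held on all sampled instances")
```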

An Upper Bound for Input-Restricted Contraction Coefficient for Potts Channels

In this section we prove an upper bound for the input-restricted KL divergence contraction coefficient for ferromagnetic Potts channels.

Proposition 42

Fix \(q\ge 3\). For all \(\lambda \in [0, 1]\), we have

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}}(\pi , {{\,\mathrm{\text {PC}}\,}}_\lambda ) \le \frac{\lambda ^2}{(1-\lambda ) \frac{2(q-1)\log (q-1)}{q(q-2)}+ \lambda }. \end{aligned}$$
(269)

For all \(\lambda \in [-\frac{1}{q-1}, 0]\), we have

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}}(\pi , {{\,\mathrm{\text {PC}}\,}}_\lambda ) \le \frac{\lambda ^2}{(1+ (q-1)\lambda )\frac{2(q-1)\log (q-1)}{q(q-2)} - \lambda \frac{\log q}{(q-1)(\log q - \log (q-1))}}. \end{aligned}$$
(270)

We first prove a lemma.

Lemma 43

\(\frac{(qx-1)^2}{\psi (x)}\) is concave in \(x\in [0, 1]\), where \(\psi : [0, 1] \rightarrow {\mathbb {R}}\) is defined in (16).

Proof

Let \(f(x) = \frac{(qx-1)^2}{\psi (x)}\).

$$\begin{aligned} f'(x)&= \frac{2q (qx-1)}{\psi (x)} - \frac{(qx-1)^2 \psi '(x)}{\psi ^2(x)}. \end{aligned}$$
(271)
$$\begin{aligned} f''(x)&= \frac{2q^2}{\psi (x)} - \frac{4q(qx-1) \psi '(x)}{\psi ^2(x)} -\frac{(qx-1)^2\psi ''(x)}{\psi ^2(x)} + \frac{2 (qx-1)^2(\psi ')^2 (x)}{\psi ^3(x)}\nonumber \\&= \frac{2}{\psi ^3(x)} (q\psi (x)-(qx-1)\psi '(x))^2 - \frac{(qx-1)^2\psi ''(x)}{\psi ^2(x)}. \end{aligned}$$
(272)

Therefore it suffices to prove that

$$\begin{aligned} g(x) := \psi ^3(x) f''(x) = 2(q\psi (x)-(qx-1)\psi '(x))^2 - (qx-1)^2 \psi (x) \psi ''(x) \end{aligned}$$
(273)

is non-positive for \(x\in [0, 1]\). Note that \(g\left( \frac{1}{q}\right) =0\). So we only need to prove that \(g'(x)\ge 0\) for \(x\in \left[ 0, \frac{1}{q}\right] \) and \(g'(x) \le 0\) for \(x\in \left[ \frac{1}{q}, 1\right] \).

$$\begin{aligned} g'(x)&= -4(qx-1)\psi ''(x) (q\psi (x) - (qx-1) \psi '(x)) - 2q (qx-1) \psi (x) \psi ''(x) \nonumber \\&- (qx-1)^2 \psi '(x) \psi ''(x) - (qx-1)^2 \psi (x)\psi '''(x) \nonumber \\&= (qx-1) (-6q \psi (x) \psi ''(x) + (qx-1) (3 \psi '(x)\psi ''(x) - \psi (x) \psi '''(x))). \end{aligned}$$
(274)

Therefore we would like to prove that

$$\begin{aligned} u(q,x) := -6q \psi (x) \psi ''(x) + (qx-1) (3 \psi '(x)\psi ''(x) - \psi (x) \psi '''(x)) \end{aligned}$$
(275)

is non-positive. We enlarge the domain of u and prove that \(u(q,x)\le 0\) for real \(q>1\) and \(x\in (0, 1)\).

We fix \(x\in (0, 1)\) and consider \(u_x(q):= u(q, x)\). We have \(u_x\left( \frac{1}{x}\right) =0\), and from (282) below one also checks that \(u_x'\left( \frac{1}{x}\right) =0\), because \(qx-1\), \(\psi (x)\), \(\psi '(x)\) and \(\frac{1}{q} - \frac{1-x}{q-1}\) all vanish at \(q=\frac{1}{x}\). So it suffices to prove that \(u_x\) is concave in q, since a concave function that vanishes together with its first derivative at a point is non-positive everywhere. We have

$$\begin{aligned} \psi '(x)&= \log x - \log \frac{1-x}{q-1}, \end{aligned}$$
(276)
$$\begin{aligned} \psi ''(x)&= \frac{1}{x} + \frac{1}{1-x}, \end{aligned}$$
(277)
$$\begin{aligned} \psi '''(x)&= \frac{1}{(1-x)^2} - \frac{1}{x^2}, \end{aligned}$$
(278)
$$\begin{aligned} \frac{\partial }{\partial q} \psi (x)&= \frac{1}{q} - \frac{1-x}{q-1}, \end{aligned}$$
(279)
$$\begin{aligned} \frac{\partial }{\partial q} \psi '(x)&= \frac{1}{q-1}, \end{aligned}$$
(280)
$$\begin{aligned} \frac{\partial }{\partial q} \psi ''(x)&= \frac{\partial }{\partial q} \psi '''(x)=0. \end{aligned}$$
(281)

So

$$\begin{aligned} u_x'(q) =&~ -6 \psi (x) \psi ''(x) - 6 q \left( \frac{1}{q} - \frac{1-x}{q-1}\right) \psi ''(x) + x (3 \psi '(x) \psi ''(x) - \psi (x) \psi '''(x)) \nonumber \\&+ (qx-1) \left( 3 \frac{1}{q-1} \psi ''(x) - \left( \frac{1}{q} - \frac{1-x}{q-1}\right) \psi '''(x)\right) . \end{aligned}$$
(282)
$$\begin{aligned} u_x''(q) =&~ -12 \left( \frac{1}{q} - \frac{1-x}{q-1}\right) \psi ''(x) -6q \left( -\frac{1}{q^2} + \frac{1-x}{(q-1)^2}\right) \psi ''(x) \nonumber \\&+6x \frac{1}{q-1} \psi ''(x) - 2x \left( \frac{1}{q}-\frac{1-x}{q-1}\right) \psi '''(x) \nonumber \\&+(qx-1)\left( -3 \frac{1}{(q-1)^2}\psi ''(x) -\left( -\frac{1}{q^2} + \frac{1-x}{(q-1)^2}\right) \psi '''(x)\right) \nonumber \\ =&~ \frac{(qx-1)^2(1-2q+(q-2)x)}{q^2(q-1)^2 x^2(1-x)^2}\le 0. \end{aligned}$$
(283)

We are done. \(\square \)
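As a quick numerical illustration of Lemma 43 (not used anywhere in the arguments), one can check discrete concavity of \((qx-1)^2/\psi (x)\) on a grid. The sketch below assumes the explicit form \(\psi (x) = x\log x + (1-x)\log \frac{1-x}{q-1} + \log q\), which is consistent with (276)–(277) and with \(\psi (\frac{1}{q})=0\), \(\psi (1)=\log q\); the helper names are ours.

```python
# Illustrative grid check of Lemma 43 (concavity of (qx-1)^2 / psi(x)); not a proof.
import math

def psi(x, q):
    # assumed explicit form, consistent with (276)-(277)
    t1 = x * math.log(x) if x > 0 else 0.0
    t2 = (1 - x) * math.log((1 - x) / (q - 1)) if x < 1 else 0.0
    return t1 + t2 + math.log(q)

def lemma43_fn(x, q):
    # (qx-1)^2 / psi(x), extended by continuity (value 2(q-1)) at x = 1/q
    if abs(x - 1.0 / q) < 1e-9:
        return 2.0 * (q - 1)
    return (q * x - 1) ** 2 / psi(x, q)

for q in (3, 5, 10):
    xs = [i / 1000 for i in range(1, 1000)]
    vals = [lemma43_fn(x, q) for x in xs]
    # second differences of a concave function are non-positive (up to rounding)
    assert all(vals[i - 1] - 2 * vals[i] + vals[i + 1] <= 1e-9
               for i in range(1, len(vals) - 1))
print("discrete concavity verified on the grid")
```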

Proof of Prop. 42

For \(x\in [0,1]\) and \(\lambda \in \left[ -\frac{1}{q-1}, 1\right] \) we define

$$\begin{aligned} f_x(\lambda ) := \frac{\lambda ^2 \psi (x)}{\psi \left( \lambda x + \frac{1-\lambda }{q}\right) }. \end{aligned}$$
(284)

(The value of \(f_x(0)\) is defined by continuity.) Note that

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}}(\pi , {{\,\mathrm{\text {PC}}\,}}_\lambda ) = \sup _{x\in (0,1]} \frac{\lambda ^2}{f_x(\lambda )}. \end{aligned}$$
(285)

So to compute an upper bound for \(\eta _{{{\,\mathrm{\text {KL}}\,}}}(\pi , {{\,\mathrm{\text {PC}}\,}}_\lambda )\), it suffices to lower bound \(f_x(\lambda )\).

Because

$$\begin{aligned} f_x(\lambda ) = \frac{\psi (x)}{(qx-1)^2} \cdot \frac{\left( q\left( \lambda x + \frac{1-\lambda }{q}\right) -1\right) ^2}{\psi \left( \lambda x + \frac{1-\lambda }{q}\right) }, \end{aligned}$$
(286)

by Lemma 43, for fixed x, \(f_x(\lambda )\) is concave for \(\lambda \in \left[ -\frac{1}{q-1}, 1\right] \). Therefore by computing lower bounds of \(f_x(\lambda )\) for \(\lambda = -\frac{1}{q-1}, 0, 1\), we can get lower bounds on \(f_x(\lambda )\) for all \(\lambda \in \left[ -\frac{1}{q-1}, 1\right] \).

By Prop. 33, we have

$$\begin{aligned} f_x\left( -\frac{1}{q-1}\right) \ge \frac{\log q}{(q-1)^2 (\log q - \log (q-1))}. \end{aligned}$$
(287)

By L’Hôpital’s rule,

$$\begin{aligned} f_x(0)&= \psi (x) \lim _{\lambda \rightarrow 0} \frac{2\lambda }{\left( x-\frac{1}{q}\right) \psi '\left( \lambda x + \frac{1-\lambda }{q}\right) } \nonumber \\&= \psi (x) \lim _{\lambda \rightarrow 0} \frac{2}{\left( x-\frac{1}{q}\right) ^2\psi ''\left( \lambda x + \frac{1-\lambda }{q}\right) } \nonumber \\&= \frac{2(q-1) \psi (x)}{(qx-1)^2}. \end{aligned}$$
(288)

By Lemma 43, \(g(x):= \frac{(qx-1)^2}{\psi (x)}\) is concave in x. Also

$$\begin{aligned} g'\left( 1-\frac{1}{q}\right)&= \frac{2q(q-2)}{\frac{1}{q} (q-2) \log (q-1)} -\frac{(q-2)^2 \cdot 2\log (q-1)}{\left( \frac{1}{q} (q-2) \log (q-1)\right) ^2} =0. \end{aligned}$$
(289)

So

$$\begin{aligned} g(x) \le g\left( 1-\frac{1}{q}\right) = \frac{q(q-2)}{\log (q-1)} \end{aligned}$$
(290)

and

$$\begin{aligned} f_x(0) \ge \frac{2(q-1) \log (q-1)}{q(q-2)}. \end{aligned}$$
(291)

It is easy to see that

$$\begin{aligned} f_x(1) \ge 1. \end{aligned}$$
(292)

Because \(f_x(\lambda )\) is concave in \(\lambda \), (269) follows from (291) and (292), and (270) follows from (287) and (291). \(\square \)
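The bounds (269)–(270) can also be sanity-checked numerically against a grid approximation of the supremum in (285). The sketch below is illustrative only and again assumes the explicit form of \(\psi \) used above; the grid size and the sampled \((q,\lambda )\) pairs are arbitrary choices of ours.

```python
# Illustrative check of Prop. 42: grid approximation of the sup in (285) vs. (269)-(270).
import math

def psi(x, q):
    t1 = x * math.log(x) if x > 0 else 0.0
    t2 = (1 - x) * math.log((1 - x) / (q - 1)) if x < 1 else 0.0
    return t1 + t2 + math.log(q)

def eta_two_point(q, lam, n=2000):
    # grid approximation of sup_x psi(lam*x + (1-lam)/q) / psi(x), cf. (285)
    best = 0.0
    for i in range(1, n + 1):
        x = i / n
        if abs(x - 1.0 / q) < 1e-9:
            continue
        best = max(best, psi(lam * x + (1 - lam) / q, q) / psi(x, q))
    return best

def bound(q, lam):
    c = 2 * (q - 1) * math.log(q - 1) / (q * (q - 2))
    if lam >= 0:
        return lam ** 2 / ((1 - lam) * c + lam)                    # (269)
    d = math.log(q) / ((q - 1) * (math.log(q) - math.log(q - 1)))
    return lam ** 2 / ((1 + (q - 1) * lam) * c - lam * d)          # (270)

for q in (3, 4, 6):
    for lam in (-1 / (q - 1), -0.1, 0.1, 0.5, 0.9):
        assert eta_two_point(q, lam) <= bound(q, lam) + 1e-9
print("Prop. 42 bounds verified on the grid")
```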

The proof of Prop. 42 implies the first-order limiting behavior of \(\eta _{{{\,\mathrm{\text {KL}}\,}}}(\pi , {{\,\mathrm{\text {PC}}\,}}_\lambda )\) as \(\lambda \rightarrow 0\):

$$\begin{aligned} \lim _{\lambda \rightarrow 0} \frac{\eta _{{{\,\mathrm{\text {KL}}\,}}}(\pi , {{\,\mathrm{\text {PC}}\,}}_\lambda )}{\lambda ^2} = \frac{q(q-2)}{2(q-1)\log (q-1)}. \end{aligned}$$
(293)

For all \(q\ge 3\) and \(\lambda \in (0, 1]\), we have

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}}(\pi , {{\,\mathrm{\text {PC}}\,}}_\lambda )&\le \frac{\lambda ^2}{(1-\lambda ) \frac{2(q-1)\log (q-1)}{q(q-2)}+ \lambda } \nonumber \\&\le \lambda ^2(1-\lambda )\frac{q(q-2)}{2(q-1)\log (q-1)}+\lambda ^3 \nonumber \\&< \lambda ^2 \frac{q(q-2)}{2(q-1)\log (q-1)} \nonumber \\&< \lambda ^2 \frac{q-1}{2\log (q-1)}, \end{aligned}$$
(294)

where the second step is by the Cauchy–Schwarz inequality.

For comparison with the input-unrestricted contraction coefficient

$$\begin{aligned} \eta _{{{\,\mathrm{\text {KL}}\,}}}({{\,\mathrm{\text {PC}}\,}}_\lambda ) = \frac{q\lambda ^2}{(q-2)\lambda +2}, \end{aligned}$$
(295)

we note that \(\frac{\lambda ^2}{\eta _{{{\,\mathrm{\text {KL}}\,}}}({{\,\mathrm{\text {PC}}\,}}_\lambda )}\) is linear in \(\lambda \), and

$$\begin{aligned} \frac{1}{q-1}&< \frac{\log q}{(q-1)^2(\log q-\log (q-1))}, \end{aligned}$$
(296)
$$\begin{aligned} \frac{2}{q}&< \frac{2(q-1)\log (q-1)}{q(q-2)}. \end{aligned}$$
(297)

So Prop. 42 implies (31).
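As a quick, purely illustrative numerical sanity check, the comparisons (296) and (297) can be verified for a range of q:

```python
# Illustrative check of (296)-(297) for 3 <= q <= 200; not a proof.
import math

for q in range(3, 201):
    assert 1 / (q - 1) < math.log(q) / ((q - 1) ** 2 * (math.log(q) - math.log(q - 1)))
    assert 2 / q < 2 * (q - 1) * math.log(q - 1) / (q * (q - 2))
print("(296) and (297) hold for 3 <= q <= 200")
```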

Non-convexity of Certain Functions

In this section we prove Prop. 26. Let us first prove a lemma.

Lemma 44

Let g be a strictly increasing smooth function from \([x_0, x_1]\) to \([y_0, y_1]\), and f be a smooth function from \([x_0, x_1]\) to \({\mathbb {R}}\). Assume that \(g'(x_0)=f'(x_0)=0\) and \((g'' f''' - f'' g''')(x_0)>0\). Then the function \(h=f\circ g^{-1}: [y_0, y_1] \rightarrow {\mathbb {R}}\) is not concave near \(y_0\).

Proof

Derivatives of h are

$$\begin{aligned} h'(x)&= \frac{f'(g^{-1}(x))}{g'(g^{-1}(x))}, \end{aligned}$$
(298)
$$\begin{aligned} h''(x)&= \left( \frac{f''}{g'}-\frac{f' g''}{(g')^2}\right) \left( g^{-1}(x)\right) \frac{1}{g'\left( g^{-1}(x)\right) } \nonumber \\&=\left( \frac{f''}{(g')^2} - \frac{f' g''}{(g')^3}\right) \left( g^{-1}(x)\right) . \end{aligned}$$
(299)

So it suffices to study the sign of \(g' f'' - f' g''\) for x near \(x_0\). Let \(u = g' f'' - f' g''\). We have \(u(x_0)=0\). Let us compute the derivatives.

$$\begin{aligned} u'&= g' f''' - f' g''', \end{aligned}$$
(300)
$$\begin{aligned} u''&= g' f^{(4)} + g'' f''' - f'' g''' - f' g^{(4)}. \end{aligned}$$
(301)

So \(u'(x_0)=0\) and \(u''(x_0) = (g'' f''' - f'' g''')(x_0) > 0\). Hence u is positive just to the right of \(x_0\), so by (299) h is not concave near \(y_0\). \(\square \)

Proof of Prop. 26

We apply Lemma 44 to \(g=\psi \), \(x_0=\frac{1}{q}\), \(x_1=1\), \(y_0=0\), \(y_1=\log q\), and various f. We have

$$\begin{aligned} \psi '\left( \frac{1}{q}\right)&= 0, \end{aligned}$$
(302)
$$\begin{aligned} \psi ''\left( \frac{1}{q}\right)&= \frac{q^2}{q-1}, \end{aligned}$$
(303)
$$\begin{aligned} \psi '''\left( \frac{1}{q}\right)&= -\frac{q^3(q-2)}{(q-1)^2}. \end{aligned}$$
(304)

Part (i), \(p=1\). For \(b_1\), take

$$\begin{aligned} f(x) = -(q-1)\xi _1(x) = \log x + (q-1)\log \frac{1-x}{q-1} - q(\psi (x)-\log q). \end{aligned}$$
(305)

Then

$$\begin{aligned} f'(x)&= \frac{1}{x} -\frac{q-1}{1-x} - q\psi '(x), \end{aligned}$$
(306)
$$\begin{aligned} f''(x)&= -\frac{1}{x^2} - \frac{q-1}{(1-x)^2} - q \psi ''(x), \end{aligned}$$
(307)
$$\begin{aligned} f'''(x)&= \frac{2}{x^3} - \frac{2(q-1)}{(1-x)^3} - q \psi '''(x). \end{aligned}$$
(308)

So

$$\begin{aligned} f'\left( \frac{1}{q}\right)&=0, \end{aligned}$$
(309)
$$\begin{aligned} f''\left( \frac{1}{q}\right)&=-\frac{2q^3}{q-1}, \end{aligned}$$
(310)
$$\begin{aligned} f'''\left( \frac{1}{q}\right)&=\frac{3(q-2)q^4}{(q-1)^2}. \end{aligned}$$
(311)

We have

$$\begin{aligned} (\psi '' f''' - f'' \psi ''')\left( \frac{1}{q}\right)&= \frac{q^2}{q-1}\cdot \frac{3(q-2)q^4}{(q-1)^2} -\left( -\frac{2q^3}{q-1}\right) \left( -\frac{q^3(q-2)}{(q-1)^2}\right) \nonumber \\&= \frac{q^6(q-2)}{(q-1)^3} > 0. \end{aligned}$$
(312)

So Lemma 44 applies.

Part (i), \(p>1\). For \(b_p\) with \(p>1\), take

$$\begin{aligned} f(x)&= q-(q-1)\xi _p(x) \nonumber \\&= \left( x^{\frac{1}{p}} + (q-1) \left( \frac{1-x}{q-1}\right) ^{\frac{1}{p}}\right) \left( x^{1-\frac{1}{p}} + (q-1) \left( \frac{1-x}{q-1}\right) ^{1-\frac{1}{p}}\right) . \end{aligned}$$
(313)

For simplicity, write \(r = \frac{1}{p}\) and let \(u_r(x) = x^r + (q-1) \left( \frac{1-x}{q-1}\right) ^r\). Then \(f(x) = u_r(x) u_{1-r}(x)\). Let us compute derivatives of \(u_r\).

$$\begin{aligned} u_r'(x)&= r \left( x^{r-1} - \left( \frac{1-x}{q-1}\right) ^{r-1}\right) , \end{aligned}$$
(314)
$$\begin{aligned} u_r''(x)&= r(r-1) \left( x^{r-2} + \frac{1}{q-1} \left( \frac{1-x}{q-1}\right) ^{r-2}\right) , \end{aligned}$$
(315)
$$\begin{aligned} u_r'''(x)&= r(r-1) (r-2) \left( x^{r-3} - \frac{1}{(q-1)^2} \left( \frac{1-x}{q-1}\right) ^{r-3}\right) . \end{aligned}$$
(316)

So

$$\begin{aligned} u_r\left( \frac{1}{q}\right)&=q^{1-r}, \end{aligned}$$
(317)
$$\begin{aligned} u_r'\left( \frac{1}{q}\right)&=0, \end{aligned}$$
(318)
$$\begin{aligned} u_r''\left( \frac{1}{q}\right)&=r(r-1) \frac{q}{q-1} \left( \frac{1}{q}\right) ^{r-2}, \end{aligned}$$
(319)
$$\begin{aligned} u_r'''\left( \frac{1}{q}\right)&= r(r-1)(r-2) \frac{q(q-2)}{(q-1)^2} \left( \frac{1}{q}\right) ^{r-3}. \end{aligned}$$
(320)

Now we compute derivatives of f.

$$\begin{aligned} f'(x)&= u_r'(x) u_{1-r}(x) + u_r(x) u_{1-r}'(x), \end{aligned}$$
(321)
$$\begin{aligned} f''(x)&= u_r''(x) u_{1-r}(x) + 2u_r'(x) u_{1-r}'(x)+ u_r(x) u_{1-r}''(x), \end{aligned}$$
(322)
$$\begin{aligned} f'''(x)&= u_r'''(x) u_{1-r}(x) + 3u_r''(x) u_{1-r}'(x) + 3u_r'(x) u_{1-r}''(x) + u_r(x) u_{1-r}'''(x). \end{aligned}$$
(323)

So

$$\begin{aligned} f'\left( \frac{1}{q}\right)&= 0, \end{aligned}$$
(324)
$$\begin{aligned} f''\left( \frac{1}{q}\right)&= r(r-1) \frac{q}{q-1} \left( \frac{1}{q}\right) ^{r-2} \cdot q^r + (1-r)(-r) \frac{q}{q-1} \left( \frac{1}{q}\right) ^{-r-1} \cdot q^{1-r} \nonumber \\&= 2r(r-1) \frac{q^3}{(q-1)}, \end{aligned}$$
(325)
$$\begin{aligned} f'''\left( \frac{1}{q}\right)&= r(r-1)(r-2) \frac{q(q-2)}{(q-1)^2} \left( \frac{1}{q}\right) ^{r-3} \cdot q^r \nonumber \\&+ (1-r)(-r)(-r-1) \frac{q(q-2)}{(q-1)^2} \left( \frac{1}{q}\right) ^{-r-2} \cdot q^{1-r}, \nonumber \\&=-3r(r-1) \frac{q^4(q-2)}{(q-1)^2}. \end{aligned}$$
(326)

So

$$\begin{aligned} (\psi '' f''' - f'' \psi ''')\left( \frac{1}{q}\right)&= \frac{q^2}{q-1}\left( -3r(r-1) \frac{q^4(q-2)}{(q-1)^2}\right) \nonumber \\&-2r(r-1) \frac{q^3}{(q-1)} \left( -\frac{q^3(q-2)}{(q-1)^2}\right) \nonumber \\&=r(1-r) \frac{q^6(q-2)}{(q-1)^3} > 0. \end{aligned}$$
(327)

So Lemma 44 applies.

Part (ii). For \(s_\lambda \), take

$$\begin{aligned} f(x) = \psi \left( \lambda x + \frac{1-\lambda }{q}\right) . \end{aligned}$$
(328)

Then

$$\begin{aligned} f'(x)&= \lambda \psi '\left( \lambda x + \frac{1-\lambda }{q}\right) , \end{aligned}$$
(329)
$$\begin{aligned} f''(x)&= \lambda ^2 \psi ''\left( \lambda x + \frac{1-\lambda }{q}\right) , \end{aligned}$$
(330)
$$\begin{aligned} f'''(x)&= \lambda ^3 \psi '''\left( \lambda x + \frac{1-\lambda }{q}\right) . \end{aligned}$$
(331)

So

$$\begin{aligned} f'\left( \frac{1}{q}\right)&=0, \end{aligned}$$
(332)
$$\begin{aligned} f''\left( \frac{1}{q}\right)&= \lambda ^2 \psi ''\left( \frac{1}{q}\right) = \lambda ^2\frac{q^2}{q-1}, \end{aligned}$$
(333)
$$\begin{aligned} f'''\left( \frac{1}{q}\right)&= \lambda ^3 \psi '''\left( \frac{1}{q}\right) = -\lambda ^3 \frac{q^3(q-2)}{(q-1)^2}. \end{aligned}$$
(334)

We have

$$\begin{aligned} (\psi '' f''' - f'' \psi ''')\left( \frac{1}{q}\right)&= \frac{q^2}{q-1}\left( -\lambda ^3 \frac{q^3(q-2)}{(q-1)^2}\right) - \lambda ^2\frac{q^2}{q-1}\left( -\frac{q^3(q-2)}{(q-1)^2}\right) \nonumber \\&=\frac{q^5(q-2)}{(q-1)^3} (\lambda ^2-\lambda ^3) > 0. \end{aligned}$$
(335)

So Lemma 44 applies. \(\square \)
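To illustrate part (ii) concretely, the non-concavity of \(h = f\circ \psi ^{-1}\) near \(y_0=0\) can be seen numerically. The sketch below takes \(q=3\) and \(\lambda =\frac{1}{2}\), assumes the explicit form of \(\psi \) used earlier, and inverts \(\psi \) on \([\frac{1}{q},1]\) by bisection; the helper names and parameter choices are ours.

```python
# Illustrative check of Prop. 26(ii): h = f o psi^{-1} has positive second
# differences near y = 0 (hence is not concave there).  Not a proof.
import math

q, lam = 3, 0.5

def psi(x):
    t1 = x * math.log(x) if x > 0 else 0.0
    t2 = (1 - x) * math.log((1 - x) / (q - 1)) if x < 1 else 0.0
    return t1 + t2 + math.log(q)

def psi_inv(y):
    # invert psi on [1/q, 1] (psi is increasing there) by bisection
    lo, hi = 1.0 / q, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if psi(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def h(y):
    return psi(lam * psi_inv(y) + (1 - lam) / q)

for y in (1e-4, 1e-5, 1e-6):
    assert h(y) - 2 * h(2 * y) + h(3 * y) > 0
print("h is locally convex (not concave) near y = 0")
```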

Concavity of Log-Sobolev Coefficients

Let K be a Markov kernel with stationary distribution \(\pi \). Define Dirichlet form \({\mathcal {E}}(\cdot , \cdot )\) and entropy form \({{\,\mathrm{\text {Ent}}\,}}_\pi (\cdot )\) as in Sect. 1.

For \(r\in {\mathbb {R}}\), we consider the tightest \(\frac{1}{r}\)-log-Sobolev inequality, corresponding to

$$\begin{aligned} \widetilde{b}_{\frac{1}{r}}(x)&:= \inf _{\begin{array}{c} f: {\mathcal {X}}\rightarrow {\mathbb {R}}_{\ge 0},\\ {\mathbb {E}}_\pi [f]=1, {{\,\mathrm{\text {Ent}}\,}}_\pi (f)=x \end{array}} {\mathcal {E}}(f^r, f^{1-r}), \end{aligned}$$
(336)
$$\begin{aligned} \widetilde{\Phi }_{\frac{1}{r}}(y)&:= \sup _{\begin{array}{c} f: {\mathcal {X}}\rightarrow {\mathbb {R}}_{\ge 0},\\ {\mathbb {E}}_\pi [f]=1, {\mathcal {E}}(f^r, f^{1-r})=y \end{array}} {{\,\mathrm{\text {Ent}}\,}}_\pi (f). \end{aligned}$$
(337)

The \(\frac{1}{r}\)-log-Sobolev constant is

$$\begin{aligned} \widetilde{\alpha }_{\frac{1}{r}} := \inf _{x>0} \frac{\widetilde{b}_{\frac{1}{r}}(x)}{x} = \inf _{y>0} \frac{y}{\widetilde{\Phi }_{\frac{1}{r}}(y)}. \end{aligned}$$
(338)

Remark 45

When \(r=0\), the fraction \(\frac{1}{r}\) should be understood as a formal symbol, and by definition we have \(\widetilde{b}_{\frac{1}{0}}(x)=\widetilde{b}_1(x)\) and \(\widetilde{\Phi }_{\frac{1}{0}}(y)=\widetilde{\Phi }_1(y)\) whenever they are defined. For \(r\in (0, 1)\), \(\widetilde{\Phi }_{\frac{1}{r}}(y) = \Phi _{\frac{1}{r}}(y)\) where \(\Phi _{\frac{1}{r}}\) is the (pointwise) smallest function satisfying (8), and \(\widetilde{\alpha }_{\frac{1}{r}}=\alpha _{\frac{1}{r}}\) where \(\alpha _{\frac{1}{r}}\) is defined in (9). However, in general \(\widetilde{\alpha }_1\) is not equal to \(\alpha _1\). We use \(\widetilde{\cdot }\) to emphasize the difference.

The following result says that log-Sobolev constants are concave in r (see footnote 6).

Proposition 46

We have

  (i) For fixed x, \(\widetilde{b}_{\frac{1}{r}}(x)\) is concave in r.

  (ii) \(\widetilde{\alpha }_{\frac{1}{r}}\) is concave in r.

Furthermore, if \((\pi , K)\) is reversible, then

  (iii) For fixed x, \(\widetilde{b}_{\frac{1}{r}}(x)\) is maximized at \(r=\frac{1}{2}\).

  (iv) \(\widetilde{\alpha }_{\frac{1}{r}}\) is maximized at \(r=\frac{1}{2}\).

Proof

Because \(\inf \) of concave functions is still concave, it suffices to prove that for any \(f: {\mathcal {X}}\rightarrow {\mathbb {R}}_{\ge 0}\) with \({\mathbb {E}}_\pi [f]=1\), \({\mathcal {E}}(f^r, f^{1-r})\) is concave in r.

$$\begin{aligned} \frac{d^2}{dr^2} {\mathcal {E}}(f^r, f^{1-r})&= \frac{d^2}{dr^2} \sum _{x,y\in {\mathcal {X}}} (I-K)(x, y) f(y)^r f(x)^{1-r} \pi (x)\\&= \sum _{x,y\in {\mathcal {X}}} (I-K)(x, y) f(y)^r f(x)^{1-r} \pi (x) (\log f(y)-\log f(x))^2\\&= \sum _{x\ne y\in {\mathcal {X}}} -K(x, y) f(y)^r f(x)^{1-r} \pi (x) (\log f(y)-\log f(x))^2\\&\le 0. \end{aligned}$$

When the Markov chain is reversible, we have \({\mathcal {E}}(f, g) = {\mathcal {E}}(g, f)\). So \(\widetilde{b}_{\frac{1}{r}}(x) = \widetilde{b}_{\frac{1}{1-r}}(x)\) and by concavity, \(\widetilde{b}_{\frac{1}{r}}(x)\) is maximized at \(r=\frac{1}{2}\). \(\square \)
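The key concavity step can be illustrated numerically. In the sketch below we take K to be the random walk on the complete graph \(K_q\) (with uniform \(\pi \)) as an example kernel; any kernel with stationary distribution \(\pi \) would do for parts (i)–(ii), and the test function f is an arbitrary positive function of ours.

```python
# Illustrative check that E(f^r, f^{1-r}) = sum_{x,y} (I-K)(x,y) f(y)^r f(x)^{1-r} pi(x)
# is concave in r, for one example kernel and test function; not a proof.
import random

random.seed(1)
q = 5
pi = [1.0 / q] * q
K = [[0.0 if x == y else 1.0 / (q - 1) for y in range(q)] for x in range(q)]

f = [random.random() + 0.1 for _ in range(q)]
mean = sum(pi[x] * f[x] for x in range(q))
f = [v / mean for v in f]                      # normalize E_pi[f] = 1

def dirichlet(r):
    total = 0.0
    for x in range(q):
        for y in range(q):
            IK = (1.0 if x == y else 0.0) - K[x][y]
            total += IK * f[y] ** r * f[x] ** (1 - r) * pi[x]
    return total

rs = [-2 + 0.05 * i for i in range(81)]        # r ranges over [-2, 2]
vals = [dirichlet(r) for r in rs]
assert all(vals[i - 1] - 2 * vals[i] + vals[i + 1] <= 1e-10
           for i in range(1, len(vals) - 1))
print("E(f^r, f^{1-r}) is concave in r (numerically) for this example")
```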

Non-reconstruction for Broadcasting with a Gaussian Kernel

In this section, we prove optimal non-reconstruction results for a BOT model with continuous alphabet considered in Eldan et al. [16], using our method developed in Sect. 4.

Definition 47

(Broadcasting on trees with a Gaussian kernel). In this model, we are given a (possibly) infinite tree T with a marked root \(\rho \). The state space \({\mathcal {X}}\) is the unit circle \(S^1 = {\mathbb {R}}/2\pi {\mathbb {Z}}\). Let \(\pi = {{\,\mathrm{\text {Unif}}\,}}(S^1)\) be the uniform distribution. Let \(t>0\) be a parameter. The transfer kernel is \(M_t\), defined by \(Y = X + Z_t\) with \(Z_t \sim {\mathcal {N}}(0, t)\), where X is the input and Y is the output.

Now for each vertex \(v\in T\), we generate a label \(\sigma _v \in {\mathcal {X}}\) according to the following process:

  1. Generate \(\sigma _\rho \sim \pi \).

  2. Suppose we have generated a label for vertex u. For every child v of u, we generate \(\sigma _v\) according to \(\sigma _v \sim M_t(\cdot | \sigma _u)\).

Let \(L_k\) denote the set of vertices at distance k to \(\rho \). We say reconstruction is impossible if and only if

$$\begin{aligned} \lim _{k\rightarrow \infty } I(\sigma _\rho ; \sigma _{L_k}) = 0. \end{aligned}$$
(339)

Let \(\lambda (M_t)\) denote the second largest eigenvalue of \(M_t\). [16] proved that for the above BOT model on a regular tree with offspring d, reconstruction holds when \(d\lambda (M_t)^2 > 1\), and non-reconstruction holds for \(d \lambda (M_t) < 1\). Note that there is a \(\lambda (M_t)\) factor gap between the reconstruction result and the non-reconstruction result. In the following, we prove that non-reconstruction holds as long as \(d\lambda (M_t)^2 < 1\), closing the gap.

We remark that Mossel et al. [40] studied a different BOT model with Gaussian broadcasting channels, and determined the reconstruction threshold for their model (which happened to also coincide with the Kesten–Stigum threshold). While the two models share some similarities, they do not seem to be directly comparable with each other.

Theorem 48

(Non-reconstruction for Gaussian BOT model). Consider the BOT model defined in Definition 47.

Let T be an infinite rooted tree with bounded maximum degree. Then reconstruction is impossible when

$$\begin{aligned} {{\,\mathrm{\text {br}}\,}}(T) \lambda (M_t)^2 < 1. \end{aligned}$$
(340)

Let T be a Galton-Watson tree with expected offspring d. Then reconstruction is impossible when

$$\begin{aligned} d \lambda (M_t)^2 < 1. \end{aligned}$$
(341)

The proof idea is to upper bound the input-restricted KL contraction coefficient by \(\lambda (M_t)^2\), then use a tree recursion similar to that of Theorem 5. However, because we are working in a continuous space, we must be careful about what we mean by contraction coefficients.

We would like an inequality of form

$$\begin{aligned} I(\sigma _u; \sigma _{L_{v,k}}) \le \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}} (\pi , M_t) I(\sigma _v; \sigma _{L_{v,k}}) \end{aligned}$$
(342)

where \(u\in V(T)\), v is a child of u, \(L_{v,k}\) is the set of descendants of v at distance k to \(\rho \), and \(\widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}} (\pi , M_t)\) is a continuous version of the contraction coefficient \(\eta _{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t)\).

We have

$$\begin{aligned} I(\sigma _u; \sigma _{L_{v,k}})&= {\mathbb {E}}_{\sigma _{L_{v,k}}} D(P_{\sigma _u | \sigma _{L_{v,k}}} \Vert P_{\sigma _u}) \end{aligned}$$
(343)
$$\begin{aligned}&= {\mathbb {E}}_{\sigma _{L_{v,k}}} D(M_t \circ P_{\sigma _v | \sigma _{L_{v,k}}} \Vert \pi ). \end{aligned}$$
(344)

Let us consider the distribution \(P_{\sigma _u | \sigma _{L_{u,k}}}\). If \(k = d(u,\rho )\), then \(P_{\sigma _u | \sigma _{L_{u,k}}}\) is a point measure. However, as long as \(k > d(u,\rho )\), the pdf of \(P_{\sigma _u | \sigma _{L_{u,k}}}\) is smooth on \({\mathcal {X}}\), by an induction using the belief propagation equation. Therefore we make the following definition.

Definition 49

(Smooth contraction coefficient). We define

$$\begin{aligned}&\widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) := \sup _{f\in {\mathcal {C}}}\frac{{{\,\mathrm{\text {Ent}}\,}}_\pi (M_t f)}{{{\,\mathrm{\text {Ent}}\,}}_\pi (f)}, \end{aligned}$$
(345)
$$\begin{aligned}&{\mathcal {C}}:= \{f: {\mathcal {X}}\rightarrow {\mathbb {R}}_{\ge 0} | f\text { smooth}, {\mathbb {E}}_\pi [f] = 1\}. \end{aligned}$$
(346)

where \({{\,\mathrm{\text {Ent}}\,}}_\pi (f)\) is defined in (2).

Lemma 50

$$\begin{aligned} \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) \le \exp (-t). \end{aligned}$$
(347)

Proof

Note that \((M_t)_{t\ge 0}\) forms a semigroup. Therefore it suffices to prove that for all \(f\in {\mathcal {C}}\), we have

$$\begin{aligned} \frac{d}{dt}|_{t=0} {{\,\mathrm{\text {Ent}}\,}}_\pi (f_t) \le -{{\,\mathrm{\text {Ent}}\,}}_\pi (f) \end{aligned}$$
(348)

where \(f_t = M_t f\).

We have

$$\begin{aligned}&~ \frac{d}{dt}|_{t=0} {{\,\mathrm{\text {Ent}}\,}}_\pi (f_t) \\ =&~ {\mathbb {E}}\left[ \frac{d}{dt}|_{t=0} (f_t \log f_t)\right] \\ =&~ {\mathbb {E}}\left[ (1 + \log f) \frac{d}{dt}|_{t=0} f_t\right] \\ =&~ {\mathbb {E}}\left[ (\log f) \frac{d}{dt}|_{t=0} f_t\right] \\ =&~ \frac{1}{2} {\mathbb {E}}\left[ f'' \log f\right]{} & {} \text {heat equation} \\ =&~ -\frac{1}{2} {\mathbb {E}}\left[ \frac{(f')^2}{f}\right]{} & {} \text {integration by parts} \\ \le&~ -{{\,\mathrm{\text {Ent}}\,}}_\pi (f).{} & {} {[17]} \end{aligned}$$

This finishes the proof. \(\square \)
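Lemma 50 can be illustrated numerically by discretizing the circle and convolving with a (truncated) wrapped Gaussian. The sketch below is only a discretized illustration of the inequality \({{\,\mathrm{\text {Ent}}\,}}_\pi (M_t f) \le e^{-t}\,{{\,\mathrm{\text {Ent}}\,}}_\pi (f)\); the grid size, truncation, and test function are arbitrary choices of ours.

```python
# Discretized illustration of Lemma 50: Ent(M_t f) <= exp(-t) * Ent(f); not a proof.
import math

n = 512                                     # grid points on the circle [0, 2*pi)
dx = 2 * math.pi / n
xs = [i * dx for i in range(n)]

# a smooth positive test function with E_pi[f] = 1 (pi uniform)
f = [1.0 + 0.6 * math.cos(x) + 0.3 * math.sin(2 * x) for x in xs]
mean = sum(f) / n
f = [v / mean for v in f]

def ent(g):
    # E_pi[g log g] with E_pi[g] = 1
    return sum(v * math.log(v) for v in g) / n

def heat(g, t, periods=3):
    # circulant convolution with the wrapped Gaussian N(0, t); rows normalized to sum to 1
    w = [sum(math.exp(-(d * dx - 2 * math.pi * k) ** 2 / (2 * t))
             for k in range(-periods, periods + 1)) for d in range(n)]
    s = sum(w)
    w = [v / s for v in w]
    return [sum(w[(j - i) % n] * g[j] for j in range(n)) for i in range(n)]

for t in (0.5, 1.0, 2.0):
    assert ent(heat(f, t)) <= math.exp(-t) * ent(f) + 1e-6
print("Ent(M_t f) <= exp(-t) * Ent(f) holds in this discretized example")
```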

Now we are ready to prove Theorem 48.

Proof of Theorem 48

By Lemma 50, we have

$$\begin{aligned} \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) \le \exp (-t) = \lambda (M_t)^2, \end{aligned}$$
(349)

where the value of \(\lambda (M_t)\) is computed in, e.g., [16]. Therefore we only need to prove that \({{\,\mathrm{\text {br}}\,}}(T) \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) < 1\) implies non-reconstruction. Note that the channel \(M_t\) is reversible.

Bounded degree case: For \(u\in V(T)\), define

$$\begin{aligned} r_u := \lim _{k\rightarrow \infty } I(\sigma _u; \sigma _{L_{u,k}}). \end{aligned}$$
(350)

By data processing inequality, \(I(\sigma _{u}; \sigma _{L_{u,k}})\) is non-increasing for \(k\ge d(u,\rho )\), so the limit always exists. Because T has bounded maximum degree, we have

$$\begin{aligned} r_u \le I(\sigma _u; \sigma _{L_{u, d(u,\rho )+1}}). \end{aligned}$$
(351)

So there exists a constant \(C>0\) such that \(r_u\le C\) for all \(u\in V(T)\).

Now define

$$\begin{aligned} a_u = C^{-1} \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t)^{d(u,\rho )} r_u. \end{aligned}$$
(352)

Let c(u) be the set of children of u. For any \(v\in c(u)\), by the Markov chain

$$\begin{aligned} \sigma _{L_{v,k}} \rightarrow \sigma _v\rightarrow \sigma _u \end{aligned}$$
(353)

and the discussion before Lemma 50, we have

$$\begin{aligned} I(\sigma _u; \sigma _{L_{v,k}}) \le \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) I(\sigma _{v};\sigma _{L_{v,k}}). \end{aligned}$$
(354)

Because \((\sigma _{L_{v,k}})_{v\in c(u)}\) are independent conditioned on \(\sigma _u\), we have

$$\begin{aligned} I(\sigma _u; \sigma _{L_{u,k}}) \le \sum _{v\in c(u)} I(\sigma _u; \sigma _{L_{v,k}}). \end{aligned}$$
(355)

Combining the two inequalities and letting \(k\rightarrow \infty \), we get

$$\begin{aligned} a_u \le \sum _{v\in c(u)} a_v. \end{aligned}$$
(356)

Furthermore, we have \(a_u \le \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t)^{d(u,\rho )}\).

Now define a flow b as follows. For any \(u\in V(T)\), let \(u_0=\rho , \ldots , u_\ell =u\) be the shortest path from \(\rho \) to u. Define

$$\begin{aligned} b_u = a_u \prod _{0\le j\le \ell -1} \frac{a_{u_j}}{\sum _{v\in c(u_j)}a_v}. \end{aligned}$$
(357)

(If \(\sum _{v\in c(u_j)} a_v=0\) for some j, then let \(b_u = 0\).) Then we have

$$\begin{aligned} b_u = \sum _{v \in c(u)} b_v, \end{aligned}$$
(358)

and that

$$\begin{aligned} b_u \le a_u \le \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t)^{d(u,\rho )}. \end{aligned}$$
(359)

By definition of branching number, we must have \(b_\rho = 0\). Therefore \(r_\rho = 0\) and non-reconstruction holds.

Galton–Watson tree case: Let D be the offspring distribution. We have

$$\begin{aligned} I(\sigma _{\rho }; \sigma _{L_k} |T)&\le {\mathbb {E}}_{c(\rho )} \sum _{v\in c(\rho )} I(\sigma _{\rho }; \sigma _{L_{v,k}} |T) \\&\le {\mathbb {E}}_{c(\rho )} \sum _{v\in c(\rho )} \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) I(\sigma _v; \sigma _{L_{v,k}} |T_v) \\&= \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) {\mathbb {E}}_{c(\rho )} \sum _{v\in c(\rho )} I(\sigma _v; \sigma _{L_{v,k}} |T_v)\\&= \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) {\mathbb {E}}_{b\sim D} \left[ b I(\sigma _\rho ; \sigma _{L_{k-1}} |T)\right] \\&= \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) d I(\sigma _\rho ; \sigma _{L_{k-1}} |T). \end{aligned}$$

Here \(T_v\) denotes the subtree rooted at v. Because \(I(\sigma _\rho ; \sigma _{L_1} |T) < \infty \), when \(d \widetilde{\eta }_{{{\,\mathrm{\text {KL}}\,}}}(\pi , M_t) < 1\), we have

$$\begin{aligned} \lim _{k\rightarrow \infty } I(\sigma _{\rho }; \sigma _{L_k} | T) = 0. \end{aligned}$$
(360)

This finishes the proof. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gu, Y., Polyanskiy, Y. Non-linear Log-Sobolev Inequalities for the Potts Semigroup and Applications to Reconstruction Problems. Commun. Math. Phys. 404, 769–831 (2023). https://doi.org/10.1007/s00220-023-04851-1
