Minoration via mixed volumes and Cover’s problem for general channels

Abstract

We give a complete solution to an open problem posed by Thomas Cover in 1987 about the capacity of a relay channel, in the general discrete memoryless setting and without any additional assumptions. The key step in our approach is to lower bound a certain soft-max of a stochastic process by convex geometry methods, based on two ideas. First, the soft-max is lower bounded in terms of the supremum of another process, by approximating a convex set with a polytope with a bounded number of vertices. Second, using a result of Pajor, the supremum of the process is lower bounded in terms of packing numbers by means of mixed-volume inequalities (Minkowski’s first inequality).

Data availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Notes

  1. Unless otherwise noted, the bases in this paper are natural.

  2. The name “soft-max” is justified by the fact that if X is a random variable equiprobably distributed on a finite set \({\mathcal {X}}\subseteq {\mathbb {R}}\), then \(\max _{x\in {\mathcal {X}}}x-\log \vert {\mathcal {X}}\vert \le \log {\mathbb {E}}[\exp (X)]\le \max _{x\in {\mathcal {X}}}x\).

  3. Incidentally, as recounted by D. Donoho, the analogy between information-theoretic inequalities and the Brunn–Minkowski inequality is one of three that exemplify the “beauty and purity” of Cover’s interest [32].

  4. A priori, \(Q_{{\underline{Y}}{\underline{Z}}}\) is the distribution induced by \(Q_{YZ}\), the capacity-achieving output distribution for \(P_{YZ\vert X}\), and the functions \(Y\mapsto {\underline{Y}}\) and \(Z\mapsto {\underline{Z}}\). This proposition shows that the notation also coincides with the capacity-achieving output distribution for the channel \(P_{{\underline{Y}}{\underline{Z}}\vert X}\).

  5. As in (107), the capacity-achieving output distribution \(Q_{{\underline{Y}}{\underline{Z}}}\) is fully supported on \(\{({\underline{y}},{\underline{z}}):\max _xP_{{\underline{Y}}\vert X=x}({\underline{y}})P_{{\underline{Z}}\vert X=x}({\underline{z}})>0\}\), hence \(\alpha \) is finite.

  6. In principle, we can maximize \(\kappa \) by optimizing over \({\mathcal {C}}_x\) contained in the convex hull of (176), although for our purpose of solving Cover’s problem we can pick any \({\mathcal {C}}_x\).
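The two-sided soft-max bound in Note 2 can be checked numerically. The following is a minimal sketch, assuming natural logarithms as in Note 1; the helper names `soft_max` and `check_soft_max_bounds` and the sample values are hypothetical and not taken from the paper.

```python
import math

def soft_max(values):
    """log E[exp(X)] for X uniform on `values` (natural log), computed stably."""
    m = max(values)
    # Factor out exp(m) to avoid overflow for large entries.
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def check_soft_max_bounds(values, tol=1e-12):
    """Check max - log|X| <= log E[exp(X)] <= max, as stated in Note 2."""
    s = soft_max(values)
    upper = max(values)
    lower = upper - math.log(len(values))
    return lower - tol <= s <= upper + tol

sample = [-1.5, 0.0, 0.3, 2.0]  # hypothetical finite set of distinct reals
print(soft_max(sample), max(sample), check_soft_max_bounds(sample))
```

The lower bound follows from keeping only the largest term of the average, and the upper bound from replacing every term by the largest one.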

References

  1. Boucheron, S., Lugosi, G., Bousquet, O.: Concentration Inequalities, pp. 208–240. Oxford University Press, Oxford, UK (2004)

  2. Vershynin, R.: High-dimensional Probability: An Introduction with Applications in Data Science, vol. 47. Cambridge University Press, Cambridge (2018)

  3. van Handel, R.: Probability in high dimension. Technical report (This version: December 21, 2016)

  4. Dudley, R.M.: Sample functions of the Gaussian process. Ann. Probab. 1(1), 66–103 (1973)

  5. van de Geer, S.: Oracle inequalities and regularization. In: Lectures on empirical processes, EMS Ser. Lect. Math., Eur. Math. Soc., Zurich, pp. 191–252 (2007)

  6. van de Geer, S.: Applications of Empirical Process Theory. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge 6 (2000)

  7. Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Springer, Berlin (1991)

  8. Ahlswede, R., Gács, P.: Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab. 4, 925–939 (1976)

  9. Ahlswede, R., Gács, P., Körner, J.: Bounds on conditional probabilities with applications in multi-user communication. Probability Theory Relat. Fields 34(2), 157–177 (1976). Correction (1977). ibid, 39(4), 353–354

  10. Cover, T.M.: The capacity of the relay channel. In: Open Problems in Communication and Computation, edited by T. M. Cover and B. Gopinath, New York: Springer, pp. 72–73 (1987)

  11. Wu, X., Barnes, L.P., Ozgur, A.: The capacity of the relay channel: solution to Cover’s problem in the Gaussian case. IEEE Trans. Inf. Theory 65(1), 255–275 (2019)

  12. Bai, Y., Wu, X., Ozgur, A.: Information constrained optimal transport: from Talagrand, to Marton, to Cover. arXiv:2008.10249

  13. Barnes, L.P., Wu, X., Ozgur, A.: A solution to Cover’s problem for the binary symmetric relay channel: Geometry of sets on the Hamming sphere. In: 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 844–851 (2017)

  14. Zhang, Z.: Partial converse for a relay channel. IEEE Trans. Inf. Theory 34(5), 1106–1110 (1988)

  15. Wu, X., Ozgur, A.: Cut-set bound is loose for Gaussian relay networks. IEEE Trans. Inf. Theory 64(2), 1023–1037 (2018)

  16. Wu, X., Ozgur, A., Xie, L.: Improving on the cut-set bound via geometric analysis of typical sets. IEEE Trans. Inf. Theory 63(4), 2254–2277 (2017)

  17. Wu, X., Ozgur, A.: Improving on the cut-set bound for general primitive relay channels. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 1675–1679 (2016)

  18. Liu, J., Ozgur, A.: Capacity upper bounds for the relay channel via reverse hypercontractivity. IEEE Trans. Inf. Theory 66(9), 5448–5455 (2020)

  19. Liu, J., van Handel, R., Verdú, S.: Second-order converses via reverse hypercontractivity. Math. Stat. Learn. 2(2), 103–163 (2020)

  20. Liu, J.: Information Theory from A Functional Viewpoint. Ph.D. thesis, Princeton University, Princeton, NJ (2018)

  21. Liu, J.: Dispersion bound for the Wyner–Ahlswede–Körner network via a semigroup method on types. IEEE Trans. Inf. Theory 67(2), 869–885

  22. Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. (New Series) Am. Math. Soc. 39(1), 1–49 (2001)

  23. El Gamal, A., Kim, Y.-H.: Network Information Theory. Cambridge University Press, Cambridge (2011)

  24. Barvinok, A.: Thrifty approximations of convex bodies by polytopes. Int. Math. Res. Notices 2014(16), 4341–4356 (2014)

  25. Chatterjee, S.: An error bound in the Sudakov–Fernique inequality. arXiv preprint arXiv:math/0510424 (2005)

  26. Carl, B., Pajor, A.: Gelfand numbers of operators with values in a Hilbert space. Invent. Math. 94, 479–504 (1988)

  27. Talagrand, M.: The supremum of some canonical processes. Am. J. Math. 116(2), 283–325 (1994)

  28. Latała, R.: Sudakov-type minoration for log-concave vectors. Studia Math. 223(3), 251–274 (2014)

  29. Pajor, A.: Sous-espaces \(\ell _1^n\) des espaces de Banach. Ph.D. Thesis, L’Université Pierre et Marie Curie, https://perso.math.u-pem.fr/pajor.alain/recherche/docs/these.pdf (1984)

  30. Mendelson, S., Milman, E., Paouris, G.: Generalized dual Sudakov minoration via dimension-reduction program. Studia Math. 244, 159–202 (2019)

  31. Gardner, R.J.: The Brunn–Minkowski inequality. Bull. Am. Math. Soc. 39, 355–405 (2002)

  32. ISL, S.: Thomas M. Cover in memoriam. http://stanford.edu/group/isl/cgi-bin/wordpress/?page_id=3287#comment-13

  33. Schneider, R.: Convex Bodies: the Brunn–Minkowski Theory. Cambridge, expanded edition (2014)

  34. Shenfeld, Y., van Handel, R.: Mixed volumes and the Bochner method. Proc. Am. Math. Soc. 147, 5385–5402 (2019)

  35. Cordero-Erausquin, D., Klartag, B., Merigot, Q., Santambrogio, F.: One more proof of the Alexandrov–Fenchel inequality 357(8), 676–680 (2019)

  36. Grünbaum, B.: Convex Polytopes, vol. 221. Springer, New York (2003)

  37. Liu, J.: Minoration via mixed volumes and Cover’s problem for general channels. arXiv:2012.14521v2

  38. Kim, Y.-H.: Coding techniques for primitive relay channels. In: Forty-Fifth Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 26–28 (2007)

  39. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)

  40. Gamal, A.E., Gohari, A., Nair, C.: Strengthened cutset upper bounds on the capacity of the relay channel and applications. arXiv:2101.11139

  41. Haussler, D.: A general minimax result for relative entropy. IEEE Trans. Inf. Theory 43(4), 1276–1280 (1997)

  42. Wolfowitz, J.: Notes on a general strong converse. Inf. Control 12, 1–4 (1968)

  43. Tomczak-Jaegermann, N.: Banach-Mazur distance and finite-dimensional operator ideals. Number 38 in Pitman Monographs and Surveys in Pure and Applied Mathematics. Pitman (1989)

  44. Milman, V.: Random subspaces of proportional dimension of finite dimensional normed spaces, approach through the isoperimetric inequality. Semin. Anal. Fonct. 84/85, Université PARIS VI, Paris

  45. Milman, V.: Almost Euclidean quotient spaces of subspaces of finite dimensional normed spaces. Proc. Am. Math. Soc. 94, 445–449 (1985)

  46. Figiel, T., Johnson, W.: Large subspaces of \(\ell ^n_{\infty }\) and estimates of the Gordon–Lewis constant. Israel J. Math. 37(1), 92–112 (1980)

  47. Gluskin, E.D.: Extremal properties of orthogonal parallelepipeds and their applications to the geometry of Banach spaces. (Russian), Mat. Sb. (N.S.) 136(178) (1988), no. 1, 85–96; English transl., Math. USSR-Sb. 64 (1989), no. 1, 85–96

Acknowledgements

The author gratefully acknowledges Professor Ramon van Handel for very helpful comments on the initial version of the manuscript, especially for pointing out that Lemma 1 already appeared in the work of Pajor [29]. The author is indebted to Professor Ayfer Ozgur for introducing Cover’s problem, for discussions during our previous collaborative work [18], and for her guidance and support as a mentor in the IT society. The author also thanks Professor Shahar Mendelson for comments on some references about Rademacher complexity. The author thanks the anonymous reviewers for their careful reading and useful references. This work was supported by the start-up grant at the Department of Statistics, University of Illinois.

Author information

Corresponding author

Correspondence to Jingbo Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of claims in Example 3

For any fully supported \(P_X\), we can see that \(P_{X\vert Z=z}\) (induced by \(P_X\) and \(P_{Z\vert X}\)), \(z\in {\mathcal {Z}}\) are distinct distributions. Therefore we cannot combine symbols in \({\mathcal {Z}}\) to form a “more succinct” sufficient statistic for X. However, below we will see that there exists a capacity-achieving \(P_X\) which is not fully supported.

Define \(P_X=[\frac{1}{2},\frac{1}{2},0]\), \(P_{YZ\vert X}=P_{Y\vert X}P_{Z\vert X}\), and \(Q_{YZ}=\frac{1}{2}P_{YZ\vert X=1}+\frac{1}{2}P_{YZ\vert X=2}\). We will show that

$$\begin{aligned} D(P_{YZ\vert X=3}\Vert Q_{YZ})< D(P_{YZ\vert X=1}\Vert Q_{YZ})= D(P_{YZ\vert X=2}\Vert Q_{YZ}) \end{aligned}$$
(A1)

for the range of \(\epsilon \) and \(\delta \) in Example 3, which will imply that \(P_X\) maximizes \(I(X;YZ)\) in view of the saddle-point characterization of the channel capacity (Sect. 4.3). To show (A1), note that in matrix form we have

$$\begin{aligned} Q_{YZ}= \begin{bmatrix} \frac{1}{16}+\frac{\epsilon ^2}{4} &{} \frac{1}{16}+\frac{\epsilon ^2}{4} &{} \frac{1}{8}-\frac{\epsilon ^2}{2} \\ \frac{1}{16}+\frac{\epsilon ^2}{4} &{} \frac{1}{16}+\frac{\epsilon ^2}{4} &{} \frac{1}{8}-\frac{\epsilon ^2}{2} \\ \frac{1}{8}-\frac{\epsilon ^2}{2} &{} \frac{1}{8}-\frac{\epsilon ^2}{2} &{} \frac{1}{4} +\epsilon ^2 \end{bmatrix} = \begin{bmatrix} \frac{1}{16} &{} \frac{1}{16} &{} \frac{1}{8}\\ \frac{1}{16} &{} \frac{1}{16} &{} \frac{1}{8}\\ \frac{1}{8} &{} \frac{1}{8} &{} \frac{1}{4} \end{bmatrix} +\Theta (\epsilon ^2) \end{aligned}$$
(A2)

where \(\Theta (\epsilon ^2)\) denotes a matrix whose Frobenius norm is of order \(\epsilon ^2\). Therefore, we can see (for example, by approximating the relative entropy with the \(\chi ^2\)-divergence, noting \(P_{YZ\vert X=1}-Q_{YZ}=\Theta (\epsilon )\)) that

$$\begin{aligned} D(P_{YZ\vert X=1}\Vert Q_{YZ})=D(P_{YZ\vert X=2}\Vert Q_{YZ})=\Theta (\epsilon ^2). \end{aligned}$$
(A3)

Similarly,

$$\begin{aligned} D(P_{YZ\vert X=3}\Vert Q_{YZ})=\Theta (\delta ^2+\epsilon ^4)=\Theta (\epsilon ^4). \end{aligned}$$
(A4)

These establish (A1) for sufficiently small \(\epsilon \), confirming that \(P_X\) is capacity-achieving.

To see \(R_{\mathrm{crit}}=\mathrm{H}(\frac{1}{2}+2\epsilon ^2)\), note that by Definition 2 we have \({\underline{1}}={\underline{2}}\ne {\underline{3}}\). Therefore

$$\begin{aligned} Q_{{\underline{Y}}{\underline{Z}}}= \begin{bmatrix} \frac{1}{4}+\epsilon ^2 &{} \frac{1}{4}-\epsilon ^2\\ \frac{1}{4}-\epsilon ^2 &{} \frac{1}{4}+\epsilon ^2 \end{bmatrix} \end{aligned}$$
(A5)

where the first column/row corresponds to \({\underline{1}}={\underline{2}}\) and the last column/row corresponds to \({\underline{3}}\). Therefore \(R_{\mathrm{crit}}=H({\underline{Z}}\vert {\underline{Y}})=\mathrm{H}(\frac{1}{2}+2\epsilon ^2)\).

From (A2) we see that

$$\begin{aligned} H(Z\vert Y)&=\frac{1}{4}H(Z\vert Y=1)+\frac{1}{4}H(Z\vert Y=2) +\frac{1}{2}H(Z\vert Y=3) \nonumber \\&=\frac{\log 2}{2}+\mathrm{H}\left( \frac{1}{2}+2\epsilon ^2\right) . \end{aligned}$$
(A6)
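The passage from (A2) to (A5) and (A6) can be verified numerically. The sketch below assumes natural logarithms (Note 1) and an illustrative value \(\epsilon =0.1\); the function names are hypothetical, and only the matrix entries of (A2) are taken from the text.

```python
import math

def q_yz(eps):
    """Joint matrix Q_YZ from (A2); rows index y, columns index z."""
    a = 1 / 16 + eps ** 2 / 4
    b = 1 / 8 - eps ** 2 / 2
    c = 1 / 4 + eps ** 2
    return [[a, a, b], [a, a, b], [b, b, c]]

def cond_entropy(q):
    """H(Z|Y) in nats for a joint matrix q[y][z]."""
    h = 0.0
    for row in q:
        p_y = sum(row)
        for p in row:
            if p > 0:
                h -= p * math.log(p / p_y)
    return h

def binary_entropy(p):
    """Binary entropy function H(p) in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def merge_first_two(q):
    """Merge symbols 1 and 2 of both Y and Z, giving the 2x2 matrix in (A5)."""
    return [
        [q[0][0] + q[0][1] + q[1][0] + q[1][1], q[0][2] + q[1][2]],
        [q[2][0] + q[2][1], q[2][2]],
    ]

eps = 0.1  # illustrative value only; must keep all entries of (A2) positive
q = q_yz(eps)
h_bin = binary_entropy(0.5 + 2 * eps ** 2)
print(cond_entropy(merge_first_two(q)) - h_bin)          # gap in (A5): R_crit
print(cond_entropy(q) - (math.log(2) / 2 + h_bin))       # gap in (A6)
```

Both printed gaps should vanish up to floating-point error, which makes visible that \(H(Z\vert Y)\) exceeds \(R_{\mathrm{crit}}\) by exactly \(\frac{\log 2}{2}\) in this example.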

Appendix B: Achievability (Proof of Proposition 2)

Consider the encoder, relay, and decoder in Model 1 with the additional restrictions that the channel input symbols \(x_1,\dots ,x_n\) must be selected from a set \({\mathcal {X}}_{\mathrm{good}}\subseteq {\mathcal {X}}\) to be defined in (74), and that the relay and the decoder must first preprocess their received vectors \(Z^n\) and \(Y^n\) to obtain \({\underline{Z}}^n\) and \({\underline{Y}}^n\) (i.e., by applying the coordinate-wise maps \(Z_i\mapsto {\underline{Z}}_i\) and \(Y_i\mapsto {\underline{Y}}_i\)). Because of these restrictions, the capacity of the restricted model cannot exceed that of the original Model 1, i.e., \(C_{\mathrm{restrict}}(H({\underline{Z}}\vert {\underline{Y}}))\le C(H({\underline{Z}}\vert {\underline{Y}}))\). Moreover, from the perspective of the encoder, the relay, and the decoder, the restricted model is essentially also Model 1 but with channels \(P_{{\underline{Z}}\vert X}\) and \(P_{{\underline{Y}}\vert X}\) and input alphabet \({\mathcal {X}}_{\mathrm{good}}\). Using compress-and-forward (see [38, Proposition 3] with the substitutions \(Y_1\leftarrow Z\) and \({\hat{Y}}_1\leftarrow V\)), we see that \(C_\mathrm{restrict}(H({\underline{Z}}\vert {\underline{Y}}))=C_{\mathrm{restrict}}(\infty )\). It will be shown in Proposition 4 that \(C_\mathrm{restrict}(\infty )=C(\infty )\). Thus \(C(H({\underline{Z}}\vert {\underline{Y}}))\ge C(\infty )\).

Appendix C: Proof of Theorem 2

Most parts of the proofs of the upper and lower bounds on \(R_{\mathrm{crit}}\) follow the same lines as the symmetric case, with the modifications that all underlines for y and z are removed throughout, \({\mathcal {X}}_{\mathrm{bad}}=\emptyset \) in (73), and Proposition 3 and Proposition 4 are no longer used. Note that when the z-equivalence classes are singletons, E in Model 2 is a constant, and in fact Model 2 collapses to Model 1. The only essential difference in the proof is the argument for the injectivity of \(\psi _x\) in the paragraph of (182). We now show the injectivity of \(\psi _x\) as follows instead, continuing (182) (note that now the underlines are removed): Pick arbitrary \(z\ne z'\) in \({\mathcal {Z}}_x\). By the second assumption of Theorem 2 we have \({\mathcal {Y}}_x={\mathcal {Y}}\). Now by (182), if \(({\tilde{K}}_x(y,z))_{y\in {\mathcal {Y}}_x}=({\tilde{K}}_x(y,z'))_{y\in {\mathcal {Y}}_x}\) then the vectors \((Q_{YZ}(y,z))_{y\in {\mathcal {Y}}}\) and \((Q_{YZ}(y,z'))_{y\in {\mathcal {Y}}}\) differ by a multiplicative constant. Therefore \(Q_{Y\vert Z}(\cdot \vert z)=Q_{Y\vert Z}(\cdot \vert z')\), which contradicts the assumption that the z-equivalence classes are singletons. Hence \(\psi _x(z)=\psi _x(z')\) is false, and therefore \(\psi _x\) is injective.

Appendix D: Proof of Theorem 3

By Theorem 2, it suffices to show that for almost all \((P_{Y\vert X}, P_{Z\vert X})\), the z-equivalence classes are singletons.

For almost all \((P_{Y\vert X}, P_{Z\vert X})\), we have that

$$\begin{aligned} P_{Y\vert X=x_1}P_{Z\vert X=x_1}\ne P_{Y\vert X=x_2}P_{Z\vert X=x_2}, \quad \forall x_1, x_2\in {\mathcal {X}}:x_1\ne x_2. \end{aligned}$$
(D7)

Under (D7), the capacity is positive and the support of any capacity-achieving input distribution (CAID) has cardinality at least 2. We will also show that for almost all \((P_{Y\vert X}, P_{Z\vert X})\),

$$\begin{aligned} {\mathrm{rank}}([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)])\ge 2,\quad \forall z_1,z_2\in {\mathcal {Z}}:z_1\ne z_2; \,S_X:\vert {{\,\mathrm{supp}\,}}(S_X)\vert \ge 2. \end{aligned}$$
(D8)

Here, \(S_{YZ}\) is induced by \(S_X\) and \(P_{Y\vert X}P_{Z\vert X}\), and \({\mathrm{rank}}([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)])\) denotes the rank of the matrix \([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)]\) where \(S_{YZ}(\cdot , z_1)\) and \(S_{YZ}(\cdot ,z_2)\) are treated as column vectors. Note (D7) (hence CAID has support size at least 2) and (D8) combined imply that the z-equivalence classes are singletons, and hence imply the claim of the theorem.

It remains to show (D8) for almost all \((P_{Y\vert X},P_{Z\vert X})\). In turn, it suffices to show that given arbitrary \({\mathcal {S}}\subseteq {\mathcal {X}}\), \(\vert {\mathcal {S}}\vert \ge 2\), and any \(z_1,z_2\in {\mathcal {Z}}\), \(z_1\ne z_2\), we have

$$\begin{aligned} {\mathrm{rank}}([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)])\ge 2,\quad \forall S_X:{{\,\mathrm{supp}\,}}(S_X)={\mathcal {S}} \end{aligned}$$
(D9)

for almost all \((P_{Y\vert X}, P_{Z\vert X})\). To see (D9), first label the elements of the given \({\mathcal {S}}\) as \(\{x_1,\dots ,x_s\}\) where \(s\ge 2\). Pick arbitrary \(\{y_1,\dots ,y_s\}\subseteq {\mathcal {Y}}\). A submatrix of the matrix \([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)]\) equals

$$\begin{aligned}&\begin{bmatrix} P_{Y\vert X}(y_1\vert x_1) &{} \dots &{} P_{Y\vert X}(y_1\vert x_s)\\ \vdots &{} ~ &{} \vdots \\ P_{Y\vert X}(y_s\vert x_1) &{} \dots &{} P_{Y\vert X}(y_s\vert x_s) \end{bmatrix} \begin{bmatrix} S_X(x_1)&{}\dots &{} 0\\ ~&{}\ddots &{}~\\ 0&{}\dots &{}S_X(x_s) \end{bmatrix}\nonumber \\&\qquad \begin{bmatrix} P_{Z\vert X}(z_1\vert x_1) &{} P_{Z\vert X}(z_2\vert x_1)\\ \vdots &{} \vdots \\ P_{Z\vert X}(z_1\vert x_s) &{} P_{Z\vert X}(z_2\vert x_s) \end{bmatrix}. \end{aligned}$$
(D10)

For almost all \((P_{Y\vert X},P_{Z\vert X})\), the first matrix above is invertible and the third matrix has rank 2. The middle matrix is invertible when \({{\,\mathrm{supp}\,}}(S_X)={\mathcal {S}}\). The product in (D10) therefore has rank 2, and since it is a submatrix of \([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)]\), (D9) holds for almost all \((P_{Y\vert X},P_{Z\vert X})\), as desired.
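The rank condition (D9) can be illustrated numerically; this is not a substitute for the almost-everywhere argument above. The sketch assumes alphabets of size 3, randomly drawn channels with a fixed seed, and hypothetical helper names (`random_dist`, `joint_columns`, `has_rank_two`). It also includes a degenerate channel, with Z independent of X, for which the two columns are proportional.

```python
import random

def random_dist(rng, k):
    """A random probability vector of length k with (almost surely) full support."""
    w = [rng.random() for _ in range(k)]
    total = sum(w)
    return [v / total for v in w]

def joint_columns(s_x, p_y_x, p_z_x, z1, z2):
    """Rows [S_YZ(y, z1), S_YZ(y, z2)], with S_YZ induced by S_X and P_{Y|X} P_{Z|X}.

    p_y_x[x][y] and p_z_x[x][z] are the conditional probabilities given X = x.
    """
    n_y = len(p_y_x[0])
    return [
        [sum(s_x[x] * p_y_x[x][y] * p_z_x[x][z] for x in range(len(s_x)))
         for z in (z1, z2)]
        for y in range(n_y)
    ]

def has_rank_two(cols, tol=1e-12):
    """An n x 2 matrix has rank 2 iff some 2 x 2 minor is nonzero."""
    n = len(cols)
    return any(
        abs(cols[i][0] * cols[j][1] - cols[i][1] * cols[j][0]) > tol
        for i in range(n) for j in range(i + 1, n)
    )

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n_x, n_y, n_z = 3, 3, 3
p_y_x = [random_dist(rng, n_y) for _ in range(n_x)]
p_z_x = [random_dist(rng, n_z) for _ in range(n_x)]
s_x = random_dist(rng, n_x)

# Generic channel pair: the two columns should be linearly independent.
generic = has_rank_two(joint_columns(s_x, p_y_x, p_z_x, 0, 1))

# Degenerate case: Z independent of X, so S_YZ(y, z) = S_Y(y) P_Z(z) has rank 1.
flat = [1 / 3, 1 / 3, 1 / 3]
degenerate = has_rank_two(joint_columns(s_x, p_y_x, [flat] * n_x, 0, 1))
print(generic, degenerate)
```

The degenerate case shows why the statement holds only for almost all channel pairs: the exceptional set is nonempty but is cut out by polynomial equations in the channel entries.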

About this article

Cite this article

Liu, J. Minoration via mixed volumes and Cover’s problem for general channels. Probab. Theory Relat. Fields 183, 315–357 (2022). https://doi.org/10.1007/s00440-022-01111-6

  • DOI: https://doi.org/10.1007/s00440-022-01111-6

Keywords

  • Multiuser information theory
  • High dimensional probability
  • Empirical process theory
  • Convex geometry
  • Minkowski’s inequality

Mathematics Subject Classification

  • 94A05 Communication theory
  • 52A39 Mixed volumes and related topics in convex geometry
  • 46B07 Local theory of Banach spaces