Abstract
We give a complete solution to an open problem posed by Thomas Cover in 1987 about the capacity of a relay channel, in the general discrete memoryless setting and without any additional assumptions. The key step in our approach is to lower bound a certain soft-max of a stochastic process by convex-geometric methods, and it rests on two ideas. First, the soft-max is lower bounded in terms of the supremum of another process, by approximating a convex set with a polytope having a bounded number of vertices. Second, using a result of Pajor, the supremum of the process is lower bounded in terms of packing numbers by means of mixed-volume inequalities (Minkowski’s first inequality).
Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Notes
Unless otherwise noted, all logarithms and exponentials in this paper are to the natural base.
The name “soft-max” is justified by the fact that if X is a random variable equiprobably distributed on a finite set \({\mathcal {X}}\subseteq {\mathbb {R}}\), then \(\max _{x\in {\mathcal {X}}}x-\log \vert {\mathcal {X}}\vert \le \log {\mathbb {E}}[\exp (X)]\le \max _{x\in {\mathcal {X}}}x\).
Incidentally, as recounted by D. Donoho, the analogy between information-theoretic inequalities and the Brunn–Minkowski inequality is one of the three that exemplify the “beauty and purity” of Cover’s interest [32].
A priori, \(Q_{{\underline{Y}}{\underline{Z}}}\) is the distribution induced by \(Q_{YZ}\), the capacity-achieving output distribution for \(P_{YZ\vert X}\), and the functions \(Y\mapsto {\underline{Y}}\) and \(Z\mapsto {\underline{Z}}\). This proposition shows that the notation also coincides with the capacity-achieving output distribution for the channel \(P_{{\underline{Y}}{\underline{Z}}\vert X}\).
As in (107), the capacity-achieving output distribution \(Q_{{\underline{Y}}{\underline{Z}}}\) is fully supported on \(\{({\underline{y}},{\underline{z}}):\max _xP_{{\underline{Y}}\vert X=x}({\underline{y}})P_{{\underline{Z}}\vert X=x}({\underline{z}})>0\}\), hence \(\alpha \) is finite.
In principle, we can maximize \(\kappa \) by optimizing over \({\mathcal {C}}_x\) contained in the convex hull of (176), although for our purpose of solving Cover’s problem we can pick any \({\mathcal {C}}_x\).
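The soft-max bounds stated in the notes above, \(\max _{x\in {\mathcal {X}}}x-\log \vert {\mathcal {X}}\vert \le \log {\mathbb {E}}[\exp (X)]\le \max _{x\in {\mathcal {X}}}x\) for X equiprobable on a finite set, can be checked numerically. The following is a minimal sketch, not taken from the paper; the function names `soft_max` and `check_soft_max_bounds` and the tolerance `tol` are hypothetical choices made for illustration.

```python
import math
import random


def soft_max(values):
    """Return log E[exp(X)] for X equiprobable on `values` (natural log)."""
    m = max(values)
    # Subtract the maximum before exponentiating to avoid overflow.
    mean_exp = sum(math.exp(v - m) for v in values) / len(values)
    return m + math.log(mean_exp)


def check_soft_max_bounds(values, tol=1e-12):
    """Check max - log|X| <= soft-max <= max, up to a float tolerance."""
    upper = max(values)
    lower = upper - math.log(len(values))
    s = soft_max(values)
    return lower - tol <= s <= upper + tol


# Illustrative usage on an arbitrary finite set of reals (assumed data).
rng = random.Random(0)
sample = [rng.uniform(-5.0, 5.0) for _ in range(8)]
bounds_hold = check_soft_max_bounds(sample)
```

The lower bound holds because the largest element alone contributes \(\exp (\max x)/\vert {\mathcal {X}}\vert \) to the expectation, and the upper bound holds because every term is at most \(\exp (\max x)\).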
References
Boucheron, S., Lugosi, G., Bousquet, O.: Concentration Inequalities, pp. 208–240. Oxford University Press, Oxford, UK (2004)
Vershynin, R.: High-dimensional Probability: An Introduction with Applications in Data Science, vol. 47. Cambridge University Press, Cambridge (2018)
van Handel, R.: Probability in high dimension. Technical report (This version: December 21, 2016)
Dudley, R.M.: Sample functions of the Gaussian process. Ann. Probab. 1(1), 66–103 (1973)
van de Geer, S.: Oracle inequalities and regularization. In: Lectures on empirical processes, EMS Ser. Lect. Math., Eur. Math. Soc., Zurich, pp. 191–252 (2007)
van de Geer, S.: Applications of Empirical Process Theory. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge 6 (2000)
Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Springer, Berlin (1991)
Ahlswede, R., Gács, P.: Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab. 4, 925–939 (1976)
Ahlswede, R., Gács, P., Körner, J.: Bounds on conditional probabilities with applications in multi-user communication. Probability Theory Relat. Fields 34(2), 157–177 (1976). Correction (1977). ibid, 39(4), 353–354
Cover, T.M.: The capacity of the relay channel. In: Open Problems in Communication and Computation, edited by T. M. Cover and B. Gopinath, New York: Springer, pp. 72–73 (1987)
Wu, X., Barnes, L.P., Ozgur, A.: The capacity of the relay channel: solution to Cover’s problem in the Gaussian case. IEEE Trans. Inf. Theory 65(1), 255–275 (2019)
Bai, Y., Wu, X., Ozgur, A.: Information constrained optimal transport: from Talagrand, to Marton, to Cover. arXiv:2008.10249
Barnes, L.P., Wu, X., Ozgur, A.: A solution to Cover’s problem for the binary symmetric relay channel: Geometry of sets on the Hamming sphere. In: 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 844–851 (2017)
Zhang, Z.: Partial converse for a relay channel. IEEE Trans. Inf. Theory 34(5), 1106–1110 (1988)
Wu, X., Ozgur, A.: Cut-set bound is loose for Gaussian relay networks. IEEE Trans. Inf. Theory 64(2), 1023–1037 (2018)
Wu, X., Ozgur, A., Xie, L.: Improving on the cut-set bound via geometric analysis of typical sets. IEEE Trans. Inf. Theory 63(4), 2254–2277 (2017)
Wu, X., Ozgur, A.: Improving on the cut-set bound for general primitive relay channels. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 1675–1679 (2016)
Liu, J., Ozgur, A.: Capacity upper bounds for the relay channel via reverse hypercontractivity. IEEE Trans. Inf. Theory 66(9), 5448–5455 (2020)
Liu, J., van Handel, R., Verdú, S.: Second-order converses via reverse hypercontractivity. Math. Stat. Learn. 2(2), 103–163 (2020)
Liu, J.: Information Theory from A Functional Viewpoint. Ph.D. thesis, Princeton University, Princeton, NJ (2018)
Liu, J.: Dispersion bound for the Wyner–Ahlswede–Körner network via a semigroup method on types. IEEE Trans. Inf. Theory 67(2), 869–885
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. (New Series) Am. Math. Soc. 39(1), 1–49 (2001)
El Gamal, A., Kim, Y.-H.: Network Information Theory. Cambridge University Press, Cambridge (2011)
Barvinok, A.: Thrifty approximations of convex bodies by polytopes. Int. Math. Res. Notices 2014(16), 4341–4356 (2014)
Chatterjee, S.: An error bound in the Sudakov–Fernique inequality. arXiv preprint arXiv:math/0510424 (2005)
Carl, B., Pajor, A.: Gelfand numbers of operators with values in a Hilbert space. Invent. Math. 94, 479–504 (1988)
Talagrand, M.: The supremum of some canonical processes. Am. J. Math. 116(2), 283–325 (1994)
Latała, R.: Sudakov-type minoration for log-concave vectors. Studia Math. 223(3), 251–274 (2014)
Pajor, A.: Sous-espaces \(\ell _1^n\) des espaces de Banach. Ph.D. Thesis, L’Université Pierre et Marie Curie, https://perso.math.u-pem.fr/pajor.alain/recherche/docs/these.pdf (1984)
Mendelson, S., Milman, E., Paouris, G.: Generalized dual Sudakov minoration via dimension-reduction program. Studia Math. 244, 159–202 (2019)
Gardner, R.J.: The Brunn–Minkowski inequality. Bull. Am. Math. Soc. 39, 355–405 (2002)
ISL, S.: Thomas M. Cover in memoriam. http://stanford.edu/group/isl/cgi-bin/wordpress/?page_id=3287#comment-13
Schneider, R.: Convex Bodies: the Brunn–Minkowski Theory. Cambridge, expanded edition (2014)
Shenfeld, Y., van Handel, R.: Mixed volumes and the Bochner method. Proc. Am. Math. Soc. 147, 5385–5402 (2019)
Cordero-Erausquin, D., Klartag, B., Merigot, Q., Santambrogio, F.: One more proof of the Alexandrov–Fenchel inequality 357(8), 676–680 (2019)
Grünbaum, B.: Convex Polytopes, vol. 221. Springer, New York (2003)
Liu, J.: Minoration via mixed volumes and Cover’s problem for general channels. arXiv:2012.14521v2
Kim, Y.-H.: Coding techniques for primitive relay channels. In: Forty-Fifth Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 26–28 (2007)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
Gamal, A.E., Gohari, A., Nair, C.: Strengthened cutset upper bounds on the capacity of the relay channel and applications. arXiv:2101.11139
Haussler, D.: A general minimax result for relative entropy. IEEE Trans. Inf. Theory 43(4), 1276–1280 (1997)
Wolfowitz, J.: Notes on a general strong converse. Inf. Control 12, 1–4 (1968)
Tomczak-Jaegermann, N.: Banach-Mazur distance and finite-dimensional operator ideals. Number 38 in Pitman Monographs and Surveys in Pure and Applied Mathematics. Pitman (1989)
Milman, V.: Random subspaces of proportional dimension of finite dimensional normed spaces, approach through the isoperimetric inequality. Semin. Anal. Fonct. 84/85, Université PARIS VI, Paris
Milman, V.: Almost Euclidean quotient spaces of subspaces of finite dimensional normed spaces. Proc. Am. Math. Soc. 94, 445–449 (1985)
Figiel, T., Johnson, W.: Large subspaces of \(\ell ^n_{\infty }\) and estimates of the Gordon–Lewis constant. Israel J. Math. 37(1), 92–112 (1980)
Gluskin, E.D.: Extremal properties of orthogonal parallelepipeds and their applications to the geometry of Banach spaces. (Russian), Mat. Sb. (N.S.) 136(178) (1988), no. 1, 85–96; English transl., Math. USSR-Sb. 64 (1989), no. 1, 85–96
Acknowledgements
The author gratefully acknowledges Professor Ramon van Handel for very helpful comments on the initial version of the manuscript, especially for pointing out that Lemma 1 already appeared in the work of Pajor [29]. The author is indebted to Professor Ayfer Ozgur for introducing Cover’s problem, for discussions during our previous collaborative work [18], and for her guidance and support as a mentor in the IT society. The author also thanks Professor Shahar Mendelson for comments on some references about Rademacher complexity. The author thanks the anonymous reviewers for the careful reading and useful references. This work was supported by the start-up grant at the Department of Statistics, University of Illinois.
Appendices
Appendix A: Proof of claims in Example 3
For any fully supported \(P_X\), we can see that \(P_{X\vert Z=z}\) (induced by \(P_X\) and \(P_{Z\vert X}\)), \(z\in {\mathcal {Z}}\) are distinct distributions. Therefore we cannot combine symbols in \({\mathcal {Z}}\) to form a “more succinct” sufficient statistic for X. However, below we will see that there exists a capacity-achieving \(P_X\) which is not fully supported.
Define \(P_X=[\frac{1}{2},\frac{1}{2},0]\), \(P_{YZ\vert X}=P_{Y\vert X}P_{Z\vert X}\), and \(Q_{YZ}=\frac{1}{2}P_{YZ\vert X=1}+\frac{1}{2}P_{YZ\vert X=2}\). We will show that
for the range of \(\epsilon \) and \(\delta \) in Example 3, which will imply that \(P_X\) maximizes I(X; YZ) in view of the saddle-point characterization of the channel capacity (Sect. 4.3). To show (A1), note that in the matrix form, we have
where \(\Theta (\epsilon ^2)\) denotes a matrix whose Frobenius norm is order \(\epsilon ^2\). Therefore, we can see (for example, by approximating the relative entropy with the \(\chi ^2\)-divergence, noting \(P_{YZ\vert X=1}-Q_{YZ}=\Theta (\epsilon )\)) that
Similarly,
These establish (A1) for sufficiently small \(\epsilon \), confirming that \(P_X\) is capacity-achieving.
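The \(\chi ^2\) approximation of the relative entropy invoked above can be illustrated numerically: when \(P-Q=\Theta (\epsilon )\), the relative entropy \(D(P\Vert Q)\) agrees with \(\frac{1}{2}\chi ^2(P\Vert Q)\) up to higher-order terms in \(\epsilon \). The sketch below is not the channel of Example 3; the reference distribution `q`, the perturbation `direction`, and all function names are assumptions chosen only to show the approximation.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p||q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chi2_divergence(p, q):
    """Chi-square divergence: sum over symbols of (p - q)^2 / q."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


# Hypothetical reference distribution and zero-sum perturbation direction.
q = [0.25, 0.25, 0.25, 0.25]
direction = [1.0, -1.0, 0.5, -0.5]


def relative_gap(eps):
    """Relative error of the approximation D(p||q) ~ chi2(p||q) / 2."""
    p = [qi + eps * di for qi, di in zip(q, direction)]
    approx = 0.5 * chi2_divergence(p, q)
    return abs(kl_divergence(p, q) - approx) / approx


gap_coarse = relative_gap(0.1)
gap_fine = relative_gap(0.001)
```

The relative gap shrinks as `eps` decreases, which is the sense in which the second-order term dominates for sufficiently small \(\epsilon \).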
To see \(R_{\mathrm{crit}}=\mathrm{H}(\frac{1}{2}+2\epsilon ^2)\), note that by Definition 2 we have \({\underline{1}}={\underline{2}}\ne {\underline{3}}\). Therefore
where the first column/row corresponds to \({\underline{1}}={\underline{2}}\) and the last column/row corresponds to \({\underline{3}}\). Therefore \(R_{\mathrm{crit}}=H({\underline{Z}}\vert {\underline{Y}})=\mathrm{H}(\frac{1}{2}+2\epsilon ^2)\).
From (A2) we see that
Appendix B: Achievability (Proof of Proposition 2)
Consider the encoder, relay, and decoder in Model 1 with the additional restrictions that the channel input symbols \(x_1,\dots ,x_n\) must be selected from a set \({\mathcal {X}}_{\mathrm{good}}\subseteq {\mathcal {X}}\) to be defined in (74), and that the relay and the decoder must first preprocess their received vectors \(Z^n\) and \(Y^n\) to obtain \({\underline{Z}}^n\) and \({\underline{Y}}^n\) (i.e., by applying the coordinate-wise maps \(Z_i\mapsto {\underline{Z}}_i\) and \(Y_i\mapsto {\underline{Y}}_i\)). Since the restrictions cannot increase the capacity, the capacity of the restricted model is at most that of the original Model 1, i.e., \(C_{\mathrm{restrict}}(H({\underline{Z}}\vert {\underline{Y}}))\le C(H({\underline{Z}}\vert {\underline{Y}}))\). Moreover, from the perspective of the encoder, the relay, and the decoder, the restricted model is essentially also Model 1 but with channels \(P_{{\underline{Z}}\vert X}\) and \(P_{{\underline{Y}}\vert X}\) and input alphabet \({\mathcal {X}}_{\mathrm{good}}\). Using compress-and-forward (see [38, Proposition 3] with the substitutions \(Y_1\leftarrow Z\) and \({\hat{Y}}_1\leftarrow V\)), we see that \(C_\mathrm{restrict}(H({\underline{Z}}\vert {\underline{Y}}))=C_{\mathrm{restrict}}(\infty )\). It will be shown in Proposition 4 that \(C_\mathrm{restrict}(\infty )=C(\infty )\). Chaining these three relations gives \(C(H({\underline{Z}}\vert {\underline{Y}}))\ge C(\infty )\).
Appendix C: Proof of Theorem 2
Most parts of the proofs of the upper and lower bounds on \(R_{\mathrm{crit}}\) follow the same lines as the symmetric case, with the modifications that all underlines for y and z are removed throughout, \({\mathcal {X}}_{\mathrm{bad}}=\emptyset \) in (73), and Proposition 3 and Proposition 4 are no longer used. Note that when the z-equivalence classes are singletons, E in Model 2 is a constant, and in fact Model 2 collapses to Model 1. The only essential difference in the proof is the argument for the injectivity of \(\psi _x\) in the paragraph of (182). We instead show the injectivity of \(\psi _x\) as follows, continuing from (182) (note that the underlines are now removed): Pick arbitrary \(z\ne z'\) in \({\mathcal {Z}}_x\). By the second assumption of Theorem 2 we have \({\mathcal {Y}}_x={\mathcal {Y}}\). Now by (182), if \(({\tilde{K}}_x(y,z))_{y\in {\mathcal {Y}}_x}=({\tilde{K}}_x(y,z'))_{y\in {\mathcal {Y}}_x}\) then the vectors \((Q_{YZ}(y,z))_{y\in {\mathcal {Y}}}\) and \((Q_{YZ}(y,z'))_{y\in {\mathcal {Y}}}\) differ by a multiplicative constant. Therefore \(Q_{Y\vert Z}(\cdot \vert z)=Q_{Y\vert Z}(\cdot \vert z')\), which contradicts the assumption that the z-equivalence classes are singletons. Hence \(\psi _x(z)=\psi _x(z')\) is false, and therefore \(\psi _x\) is injective.
Appendix D: Proof of Theorem 3
By Theorem 2, it suffices to show that for almost all \((P_{Y\vert X}, P_{Z\vert X})\), the z-equivalence classes are singletons.
For almost all \((P_{Y\vert X}, P_{Z\vert X})\), we have that
Under (D7), the capacity is positive and the support of any capacity-achieving input distribution (CAID) has cardinality at least 2. We will also show that for almost all \((P_{Y\vert X}, P_{Z\vert X})\),
Here, \(S_{YZ}\) is induced by \(S_X\) and \(P_{Y\vert X}P_{Z\vert X}\), and \({\mathrm{rank}}([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)])\) denotes the rank of the matrix \([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)]\), where \(S_{YZ}(\cdot , z_1)\) and \(S_{YZ}(\cdot ,z_2)\) are treated as column vectors. Note that (D7) (which forces any CAID to have support size at least 2) and (D8) together imply that the z-equivalence classes are singletons, and hence imply the claim of the theorem.
It remains to show (D8) for almost all \((P_{Y\vert X},P_{Z\vert X})\). In turn, it suffices to show that given arbitrary \({\mathcal {S}}\subseteq {\mathcal {X}}\), \(\vert {\mathcal {S}}\vert \ge 2\), and any \(z_1,z_2\in {\mathcal {Z}}\), \(z_1\ne z_2\), we have
for almost all \((P_{Y\vert X}, P_{Z\vert X})\). To see (D9), first label the elements of the given \({\mathcal {S}}\) as \(\{x_1,\dots ,x_s\}\), where \(s\ge 2\). Pick arbitrary \(\{y_1,\dots ,y_s\}\subseteq {\mathcal {Y}}\). A submatrix of the matrix \([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)]\) equals
For almost all \((P_{Y\vert X},P_{Z\vert X})\), the first matrix above is invertible and the third matrix has rank 2. The middle matrix is invertible when \({{\,\mathrm{supp}\,}}(S_X)={\mathcal {S}}\). Therefore (D9) holds for almost all \((P_{Y\vert X},P_{Z\vert X})\), as desired.
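The rank claim can be sanity-checked numerically. Since \(S_{YZ}\) is induced by \(S_X\) and \(P_{Y\vert X}P_{Z\vert X}\), we have \(S_{YZ}(y,z)=\sum _x S_X(x)P_{Y\vert X}(y\vert x)P_{Z\vert X}(z\vert x)\). The sketch below builds the two columns \([S_{YZ}(\cdot , z_1),S_{YZ}(\cdot ,z_2)]\) for randomly drawn channels and tests whether some \(2\times 2\) minor is nonzero. It is an illustration under assumed alphabet sizes and a hypothetical random sampling scheme, not a proof; all function names and the tolerance `tol` are introduced here for illustration.

```python
import random


def random_distribution(size, rng):
    """Draw a strictly positive probability vector (hypothetical sampling)."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def random_channel(num_inputs, num_outputs, rng):
    """Row x of the result is the conditional distribution given X = x."""
    return [random_distribution(num_outputs, rng) for _ in range(num_inputs)]


def joint_columns(s_x, p_y_given_x, p_z_given_x, z1, z2):
    """Return the |Y| x 2 matrix [S_YZ(., z1), S_YZ(., z2)]."""
    num_y = len(p_y_given_x[0])
    return [
        [
            sum(s_x[x] * p_y_given_x[x][y] * p_z_given_x[x][z]
                for x in range(len(s_x)))
            for z in (z1, z2)
        ]
        for y in range(num_y)
    ]


def has_rank_two(matrix, tol=1e-12):
    """An n x 2 matrix has rank 2 iff some 2 x 2 minor is nonzero."""
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            det = matrix[i][0] * matrix[j][1] - matrix[i][1] * matrix[j][0]
            if abs(det) > tol:
                return True
    return False


# Illustrative usage: ternary alphabets, input support of size 2 (assumed).
rng = random.Random(1)
p_y = random_channel(3, 3, rng)
p_z = random_channel(3, 3, rng)
generic_result = has_rank_two(joint_columns([0.4, 0.6, 0.0], p_y, p_z, 0, 1))
# A point-mass input makes the two columns proportional, so the rank drops.
degenerate_result = has_rank_two(joint_columns([1.0, 0.0, 0.0], p_y, p_z, 0, 1))
```

The degenerate case shows why the support-size condition matters: with a single input symbol in the support, \(S_{YZ}\) is a product distribution and its columns are proportional.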
Cite this article
Liu, J. Minoration via mixed volumes and Cover’s problem for general channels. Probab. Theory Relat. Fields 183, 315–357 (2022). https://doi.org/10.1007/s00440-022-01111-6
DOI: https://doi.org/10.1007/s00440-022-01111-6
Keywords
- Multiuser information theory
- High dimensional probability
- Empirical process theory
- Convex geometry
- Minkowski’s inequality
Mathematics Subject Classification
- 94A05 Communication theory
- 52A39 Mixed volumes and related topics in convex geometry
- 46B07 Local theory of Banach spaces