Strong Data-Processing Inequalities for Channels and Bayesian Networks
The data-processing inequality, that is, I(U; Y) ≤ I(U; X) for a Markov chain U → X → Y, has been the method of choice for proving impossibility (converse) results in information theory and many other disciplines. Various channel-dependent improvements of this inequality, called strong data-processing inequalities (SDPIs), have been proposed both classically and in recent years. In this note we first survey known results relating various notions of contraction for a single channel. We then consider the basic extension: given an SDPI for each constituent channel in a Bayesian network, how can one produce an end-to-end SDPI?
Our approach is based on an extension of the Evans-Schulman method, which we demonstrate for three different kinds of SDPIs: the usual Ahlswede-Gács contraction coefficients (mutual information), Dobrushin's contraction coefficients (total variation), and finally the F_I-curve (the best possible non-linear SDPI for a given channel). The resulting bounds on the contraction coefficients are interpreted as probabilities of site percolation. As an example, we demonstrate how to obtain an SDPI for an n-letter memoryless channel with feedback given an SDPI for n = 1.
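To make one of the quantities above concrete, here is a small illustrative sketch (our own, not taken from the note) of Dobrushin's total-variation contraction coefficient: for a discrete channel given by a row-stochastic matrix W, it equals the maximum total variation distance between any two rows, η_TV(W) = max_{x,x'} TV(W(·|x), W(·|x')). For the binary symmetric channel with crossover probability δ this evaluates to |1 − 2δ|.

```python
import numpy as np

def dobrushin_coefficient(W):
    """Dobrushin contraction coefficient eta_TV of a row-stochastic
    matrix W: the maximum total variation distance between rows,
    eta_TV(W) = max_{x,x'} TV(W(.|x), W(.|x'))."""
    n = W.shape[0]
    return max(0.5 * np.abs(W[i] - W[j]).sum()
               for i in range(n) for j in range(n))

# Binary symmetric channel with crossover probability delta:
# eta_TV(BSC_delta) = |1 - 2*delta|.
delta = 0.1
bsc = np.array([[1 - delta, delta], [delta, 1 - delta]])
print(round(dobrushin_coefficient(bsc), 6))  # 0.8
```

A coefficient of 1 (e.g. for the identity channel) means no contraction, while 0 (all rows equal) means the output is independent of the input.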
Finally, we discuss a simple observation on the equivalence between a linear SDPI and comparison to an erasure channel (in the sense of the "less noisy" order). This leads to a simple proof of a curious inequality of Samorodnitsky (2015), and sheds light on how information spreads among the subsets of inputs of a memoryless channel.
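The linear SDPI referred to above has the form I(U; Y) ≤ η · I(U; X) for the Markov chain U → X → Y. As a hedged numerical illustration (our own check, not the note's argument), the sketch below verifies this for the binary symmetric channel with U = X ~ Bern(p), using the classical Ahlswede-Gács value η_KL(BSC_δ) = (1 − 2δ)².

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def check_linear_sdpi(delta, p):
    """For U = X ~ Bern(p) and Y the output of BSC(delta), check the
    linear SDPI I(U;Y) <= eta * I(U;X) with eta = (1 - 2*delta)^2,
    the KL contraction coefficient of the BSC (Ahlswede-Gacs)."""
    eta = (1 - 2 * delta) ** 2
    i_ux = h2(p)                           # I(U;X) = H(p) since X = U
    q = p * (1 - delta) + (1 - p) * delta  # P(Y = 1)
    i_uy = h2(q) - h2(delta)               # I(U;Y) = H(Y) - H(Y|X)
    return i_uy <= eta * i_ux

print(all(check_linear_sdpi(d, p)
          for d in np.linspace(0.05, 0.45, 9)
          for p in np.linspace(0.05, 0.95, 10)))  # True
```

Intuitively, a channel satisfying this inequality with coefficient η leaks no more information about U than an erasure channel that delivers X with probability η, which is the comparison the equivalence makes precise.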
We thank Prof. M. Raginsky for the references [BK98, Daw75, Gol79] and Prof. A. Samorodnitsky for discussing Proposition 13 with us. We also thank Aolin Xu for pointing out (41). We are grateful to an anonymous referee for helpful comments.
Yury Polyanskiy’s research has been supported by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-09-39370 and by the NSF CAREER award under grant agreement CCF-12-53205.
Yihong Wu’s research has been supported in part by NSF grants IIS-1447879, CCF-1423088 and the Strategic Research Initiative of the College of Engineering at the University of Illinois.
- [AG76] R. Ahlswede and P. Gács. Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab., pages 925–939, 1976.
- [AGKN13] Venkat Anantharam, Amin Gohari, Sudeep Kamath, and Chandra Nair. On maximal correlation, hypercontractivity, and the data processing inequality studied by Erkip and Cover. arXiv preprint arXiv:1304.6133, 2013.
- [BK98] Xavier Boyen and Daphne Koller. Tractable inference for complex stochastic processes. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI 1998), pages 33–42. San Francisco: Morgan Kaufmann, 1998. Available at http://www.cs.stanford.edu/~xb/uai98/.
- [CIR+93] J. E. Cohen, Yoh Iwasa, Gh. Rautu, M. B. Ruskai, E. Seneta, and Gh. Zbaganu. Relative entropy under mappings by stochastic matrices. Linear Algebra and its Applications, 179:211–235, 1993.
- [Cou12] T. Courtade. Two Problems in Multiterminal Information Theory. PhD thesis, U. of California, Los Angeles, CA, 2012.
- [CPW15] F. Calmon, Y. Polyanskiy, and Y. Wu. Strong data processing inequalities for input-constrained additive noise channels. arXiv preprint arXiv:1512.06429, December 2015.
- [DJW13] John C. Duchi, Michael Jordan, and Martin J. Wainwright. Local privacy and statistical minimax rates. In Proc. 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 429–438, 2013.
- [Dob56] R. L. Dobrushin. Central limit theorem for nonstationary Markov chains. I. Theory Probab. Appl., 1(1):65–80, 1956.
- [Doe37] Wolfgang Doeblin. Le cas discontinu des probabilités en chaîne. 1937.
- [EKPS00] William Evans, Claire Kenyon, Yuval Peres, and Leonard J. Schulman. Broadcasting on trees and the Ising model. Ann. Appl. Probab., 10(2):410–433, 2000.
- [ES99] William S. Evans and Leonard J. Schulman. Signal propagation and noisy circuits. IEEE Trans. Inf. Theory, 45(7):2367–2373, 1999.
- [HV11] P. Harremoës and I. Vajda. On pairs of f-divergences and their joint range. IEEE Trans. Inf. Theory, 57(6):3230–3235, June 2011.
- [Lau96] Steffen L. Lauritzen. Graphical Models. Oxford University Press, 1996.
- [LCV15] Jingbo Liu, Paul Cuff, and Sergio Verdú. Secret key generation with one communicator and a zero-rate one-shot via hypercontractivity. arXiv preprint arXiv:1504.05526, 2015.
- [Led99] M. Ledoux. Concentration of measure and logarithmic Sobolev inequalities. Séminaire de Probabilités XXXIII, pages 120–216, 1999.
- [Mar06] Andrey Andreyevich Markov. Extension of the law of large numbers to dependent quantities. Izv. Fiz.-Matem. Obsch. Kazan Univ. (2nd Ser.), 15:135–156, 1906.
- [MS77] Florence Jessie MacWilliams and Neil James Alexander Sloane. The Theory of Error-Correcting Codes. Elsevier, 1977.
- [MZ15] Anuran Makur and Lizhong Zheng. Bounds between contraction coefficients. arXiv preprint arXiv:1510.01844, 2015.
- [Nai14] C. Nair. Equivalent formulations of hypercontractivity using information measures. In Proc. 2014 Zurich Seminar on Communications, 2014.
- [Ord16] Or Ordentlich. Novel lower bounds on the entropy rate of binary hidden Markov processes. In Proc. 2016 IEEE Int. Symp. Inf. Theory (ISIT), Barcelona, Spain, July 2016.
- [PW16a] Y. Polyanskiy and Y. Wu. Lecture notes on information theory. 2016. http://people.lids.mit.edu/yp/homepage/data/itlectures_v4.pdf.
- [PW16b] Yury Polyanskiy and Yihong Wu. Dissipation of information in channels with input constraints. IEEE Trans. Inf. Theory, 62(1):35–55, January 2016. Also arXiv:1405.3629.
- [Rag13] Maxim Raginsky. Logarithmic Sobolev inequalities and strong data processing theorems for discrete channels. In Proc. 2013 IEEE Int. Symp. Inf. Theory (ISIT), pages 419–423, 2013.
- [Rag14] Maxim Raginsky. Strong data processing inequalities and ϕ-Sobolev inequalities for discrete channels. arXiv preprint arXiv:1411.3575, November 2014.
- [Sam15] Alex Samorodnitsky. On the entropy of a noisy function. arXiv preprint arXiv:1508.01464, August 2015.
- [Sar58] O. V. Sarmanov. Maximal correlation coefficient (non-symmetric case). Dokl. Akad. Nauk SSSR, 121(1):52–55, 1958.
- [Vaj09] I. Vajda. On metric divergences of probability measures. Kybernetika, 45(6):885–900, 2009.
- [Vil03] C. Villani. Topics in Optimal Transportation. American Mathematical Society, Providence, RI, 2003.
- [XR15] Aolin Xu and Maxim Raginsky. Converses for distributed estimation via strong data processing inequalities. In Proc. 2015 IEEE Int. Symp. Inf. Theory (ISIT), Hong Kong, China, July 2015.
- [ZY97] Zhen Zhang and Raymond W. Yeung. A non-Shannon-type conditional inequality of information quantities. IEEE Trans. Inf. Theory, 43(6):1982–1986, 1997.