Strong Data-Processing Inequalities for Channels and Bayesian Networks

  • Yury Polyanskiy
  • Yihong Wu
Conference paper
Part of the IMA Volumes in Mathematics and its Applications book series (IMA, volume 161)


The data-processing inequality, that is, I(U; Y) ≤ I(U; X) for a Markov chain U → X → Y, has been the method of choice for proving impossibility (converse) results in information theory and many other disciplines. Various channel-dependent improvements of this inequality (called strong data-processing inequalities, or SDPIs) have been proposed both classically and more recently. In this note we first survey known results relating various notions of contraction for a single channel. Then we consider the basic extension: given an SDPI for each constituent channel in a Bayesian network, how does one produce an end-to-end SDPI?
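As a small numerical illustration of what an SDPI adds to the plain data-processing inequality, the sketch below uses a binary chain U → X → Y in which both links are binary symmetric channels. The function names and the choice of example are ours, not the paper's. It relies on the classical fact that the mutual-information contraction coefficient of a binary symmetric channel with crossover probability δ equals (1 − 2δ)², and it assumes natural logarithms throughout.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in nats, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def bsc_compose(a, b):
    """Crossover probability of two binary symmetric channels in series."""
    return a * (1.0 - b) + (1.0 - a) * b


def mutual_information_through_bsc(p_one, flip):
    """I(U; V) in nats for U ~ Bernoulli(p_one) and V = U sent through BSC(flip)."""
    p_out_one = bsc_compose(p_one, flip)
    return binary_entropy(p_out_one) - binary_entropy(flip)


def sdpi_check(p_one, a, delta):
    """Return (I(U;X), I(U;Y), eta) for U -> X -> Y.

    Illustrative setup (an assumption of this sketch):
    U ~ Bernoulli(p_one), X = U through BSC(a), Y = X through BSC(delta).
    eta = (1 - 2*delta)**2 is the contraction coefficient of BSC(delta).
    """
    i_ux = mutual_information_through_bsc(p_one, a)
    # U -> Y is itself a BSC whose crossover is the series combination.
    i_uy = mutual_information_through_bsc(p_one, bsc_compose(a, delta))
    eta = (1.0 - 2.0 * delta) ** 2
    return i_ux, i_uy, eta


if __name__ == "__main__":
    i_ux, i_uy, eta = sdpi_check(p_one=0.5, a=0.1, delta=0.1)
    print("I(U;X) =", i_ux)
    print("I(U;Y) =", i_uy)
    print("eta * I(U;X) =", eta * i_ux)
```

The plain inequality only guarantees that the second printed value does not exceed the first; the SDPI further bounds it by the third, which is strictly smaller whenever 0 < δ < 1 and I(U; X) > 0.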

Our approach is based on the (extract of the) Evans-Schulman method, which is demonstrated for three different kinds of SDPIs, namely, the usual Ahlswede-Gács type contraction coefficients (mutual information), Dobrushin’s contraction coefficients (total variation), and finally the F_I-curve (the best possible non-linear SDPI for a given channel). Resulting bounds on the contraction coefficients are interpreted as probability of site percolation. As an example, we demonstrate how to obtain SDPI for an n-letter memoryless channel with feedback given an SDPI for n = 1.

Finally, we discuss a simple observation on the equivalence of a linear SDPI and comparison to an erasure channel (in the sense of “less noisy” order). This leads to a simple proof of a curious inequality of Samorodnitsky (2015), and sheds light on how information spreads in the subsets of inputs of a memoryless channel.
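The equivalence between a linear SDPI and comparison to an erasure channel can be written out in a few lines. The notation here is ours and may differ from the paper's: τ is an erasure probability, and E denotes X observed through an erasure channel that erases independently of everything else with probability τ.

```latex
\begin{align*}
I(U;Y) &\le (1-\tau)\, I(U;X)
  && \text{linear SDPI, for all } U \to X \to Y, \\
I(U;E) &= (1-\tau)\, I(U;X)
  && \text{since } E = X \text{ with probability } 1-\tau, \\
\Longleftrightarrow\quad I(U;Y) &\le I(U;E)
  && \text{for all } U \to X \to (Y, E).
\end{align*}
```

The last line is the statement that the erasure channel is less noisy than the channel from X to Y, so the two conditions hold or fail together.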



We thank Prof. M. Raginsky for references [BK98, Daw75, Gol79] and Prof. A. Samorodnitsky for discussing Proposition 13 with us. We also thank Aolin Xu for pointing out (41). We are grateful to an anonymous referee for helpful comments.

Yury Polyanskiy’s research has been supported by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-09-39370 and by the NSF CAREER award under grant agreement CCF-12-53205.

Yihong Wu’s research has been supported in part by NSF grants IIS-1447879, CCF-1423088 and the Strategic Research Initiative of the College of Engineering at the University of Illinois.


  1. [AG76]
    R. Ahlswede and P. Gács. Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab., pages 925–939, 1976.
  2. [AGKN13]
    Venkat Anantharam, Amin Gohari, Sudeep Kamath, and Chandra Nair. On maximal correlation, hypercontractivity, and the data processing inequality studied by Erkip and Cover. arXiv preprint arXiv:1304.6133, 2013.
  3. [Ash65]
    Robert B. Ash. Information Theory. Dover Publications Inc., New York, NY, 1965.
  4. [Bir57]
    G. Birkhoff. Extensions of Jentzsch’s theorem. Trans. of AMS, 85:219–227, 1957.
  5. [BK98]
    Xavier Boyen and Daphne Koller. Tractable inference for complex stochastic processes. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence—UAI 1998, pages 33–42. San Francisco: Morgan Kaufmann, 1998. Available at
  6. [CIR+93]
    J.E. Cohen, Yoh Iwasa, Gh. Rautu, M.B. Ruskai, E. Seneta, and Gh. Zbaganu. Relative entropy under mappings by stochastic matrices. Linear algebra and its applications, 179:211–235, 1993.
  7. [CK81]
    I. Csiszár and J. Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic, New York, 1981.
  8. [CKZ98]
    J. E. Cohen, J. H. B. Kempermann, and Gh. Zbăganu. Comparisons of Stochastic Matrices with Applications in Information Theory, Statistics, Economics and Population. Springer, 1998.
  9. [Cou12]
    T. Courtade. Two Problems in Multiterminal Information Theory. PhD thesis, U. of California, Los Angeles, CA, 2012.
  10. [CPW15]
    F. Calmon, Y. Polyanskiy, and Y. Wu. Strong data processing inequalities for input-constrained additive noise channels. arXiv, December 2015. arXiv:1512.06429.
  11. [CRS94]
    M. Choi, M.B. Ruskai, and E. Seneta. Equivalence of certain entropy contraction coefficients. Linear algebra and its applications, 208:29–36, 1994.
  12. [Csi67]
    I. Csiszár. Information-type measures of difference of probability distributions and indirect observation. Studia Sci. Math. Hungar., 2:229–318, 1967.
  13. [Daw75]
    DA Dawson. Information flow in graphs. Stoch. Proc. Appl., 3(2):137–151, 1975.
  14. [DJW13]
    John C Duchi, Michael Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 429–438. IEEE, 2013.
  15. [DMLM03]
    P. Del Moral, M. Ledoux, and L. Miclo. On contraction properties of Markov kernels. Probab. Theory Relat. Fields, 126:395–420, 2003.
  16. [Dob56]
    R. L. Dobrushin. Central limit theorem for nonstationary Markov chains. I. Theory Probab. Appl., 1(1):65–80, 1956.
  17. [Dob70]
    R. L. Dobrushin. Definition of random variables by conditional distributions. Theor. Probability Appl., 15(3):469–497, 1970.
  18. [Doe37]
    Wolfgang Doeblin. Le cas discontinu des probabilités en chaîne. na, 1937.
  19. [EC98]
    Elza Erkip and Thomas M. Cover. The efficiency of investment information. IEEE Trans. Inf. Theory, 44(3):1026–1040, 1998.
  20. [EGK11]
    Abbas El Gamal and Young-Han Kim. Network information theory. Cambridge university press, 2011.
  21. [EKPS00]
    William Evans, Claire Kenyon, Yuval Peres, and Leonard J Schulman. Broadcasting on trees and the Ising model. Ann. Appl. Probab., 10(2):410–433, 2000.
  22. [ES99]
    William S Evans and Leonard J Schulman. Signal propagation and noisy circuits. IEEE Trans. Inf. Theory, 45(7):2367–2373, 1999.
  23. [Gol79]
    Sheldon Goldstein. Maximal coupling. Probability Theory and Related Fields, 46(2):193–204, 1979.
  24. [HV11]
    P. Harremoës and I. Vajda. On pairs of f-divergences and their joint range. IEEE Trans. Inf. Theory, 57(6):3230–3235, Jun. 2011.
  25. [Lau96]
    Steffen L Lauritzen. Graphical Models. Oxford University Press, 1996.
  26. [LCV15]
    Jingbo Liu, Paul Cuff, and Sergio Verdu. Secret key generation with one communicator and a zero-rate one-shot via hypercontractivity. arXiv preprint arXiv:1504.05526, 2015.
  27. [Led99]
    M. Ledoux. Concentration of measure and logarithmic Sobolev inequalities. Seminaire de probabilites XXXIII, pages 120–216, 1999.
  28. [Mar06]
    Andrey Andreyevich Markov. Extension of the law of large numbers to dependent quantities. Izv. Fiz.-Matem. Obsch. Kazan Univ.(2nd Ser), 15:135–156, 1906.
  29. [MS77]
    Florence Jessie MacWilliams and Neil James Alexander Sloane. The theory of error correcting codes. Elsevier, 1977.
  30. [MZ15]
    Anuran Makur and Lizhong Zheng. Bounds between contraction coefficients. arXiv preprint arXiv:1510.01844, 2015.
  31. [Nai14]
    C. Nair. Equivalent formulations of hypercontractivity using information measures. In Proc. 2014 Zurich Seminar on Comm., 2014.
  32. [Ord16]
    Or Ordentlich. Novel lower bounds on the entropy rate of binary hidden Markov processes. In Proc. 2016 IEEE Int. Symp. Inf. Theory (ISIT), Barcelona, Spain, July 2016.
  33. [PW16a]
    Y. Polyanskiy and Y. Wu. Lecture notes on information theory. 2016.
  34. [PW16b]
    Yury Polyanskiy and Yihong Wu. Dissipation of information in channels with input constraints. IEEE Trans. Inf. Theory, 62(1):35–55, January 2016. Also arXiv:1405.3629.
  35. [Rag13]
    Maxim Raginsky. Logarithmic Sobolev inequalities and strong data processing theorems for discrete channels. In 2013 IEEE International Symposium on Information Theory Proceedings (ISIT), pages 419–423, 2013.
  36. [Rag14]
    Maxim Raginsky. Strong data processing inequalities and ϕ-Sobolev inequalities for discrete channels. arXiv preprint arXiv:1411.3575, November 2014.
  37. [Sam15]
    Alex Samorodnitsky. On the entropy of a noisy function. arXiv preprint arXiv:1508.01464, August 2015.
  38. [Sar58]
    O. V. Sarmanov. Maximal correlation coefficient (non-symmetric case). Dokl. Akad. Nauk SSSR, 121(1):52–55, 1958.
  39. [Vaj09]
    I. Vajda. On metric divergences of probability measures. Kybernetika, 45(6):885–900, 2009.
  40. [Vil03]
    C. Villani. Topics in optimal transportation. American Mathematical Society, Providence, RI, 2003.
  41. [XR15]
    Aolin Xu and Maxim Raginsky. Converses for distributed estimation via strong data processing inequalities. In Proc. 2015 IEEE Int. Symp. Inf. Theory (ISIT), Hong Kong, CN, July 2015.
  42. [ZY97]
    Zhen Zhang and Raymond W Yeung. A non-Shannon-type conditional inequality of information quantities. IEEE Trans. Inf. Theory, 43(6):1982–1986, 1997.

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Department of EECS, MIT, Cambridge, USA
  2. Department of Statistics, Yale University, New Haven, USA
