Abstract
Mapping the structure of the entropy region of at least four jointly distributed random variables is an important open problem. Even partial knowledge about this region has far-reaching consequences in other areas of mathematics, such as information theory, cryptography, probability theory, and combinatorics. At present, the only known method of exploring the entropy region is, or is equivalent to, the one of Zhang and Yeung from 1998. Using some non-trivial properties of the entropy function, their method can be transformed into solving high-dimensional linear multiobjective optimization problems. Benson’s outer approximation algorithm is a fundamental tool for solving such optimization problems. An improved version of Benson’s algorithm is presented, which requires solving one scalar linear program in each iteration rather than two or three as in previous versions. Special care is taken in the algorithm design to ensure numerical stability. The implemented algorithm is used to verify previous statements about the entropy region, as well as to explore it further. Experimental results demonstrate the viability of the improved Benson’s algorithm for determining the extremal set of medium-sized, numerically ill-posed optimization problems. For larger problem sizes, two limitations of Benson’s algorithm are observed: the inefficiency of the scalar LP solver, and the unexpectedly large number of intermediate vertices.
Notes
Hamel et al. [14] have observed the same improvement independently.
References
Avis, D., Bremner, D., Seidel, R.: How good are convex hull algorithms? Comput. Geom. 7(5–6), 265–301 (1997)
Baber, R., Christofides, D., Dang, A.N., Riis, S., Vaughan, E.R.: Multiple unicasts, graph guessing games, and non-Shannon inequalities. In: Proc. NetCod 2013, Calgary, pp. 1–6
Bassoli, R., Marques, H., Rodriguez, J., Shum, K.W., Tafazolli, R.: Network coding theory: a survey. IEEE Commun. Surveys Tutor. 15(4), 1950–1978 (2013)
Beimel, A.: Secret-sharing schemes: a survey. Coding and Cryptology. LNCS, pp. 11–46. Springer, Heidelberg (2011)
Benson, H.P.: An outer approximation algorithm for generating all efficient extreme points in the outcome set of a multiple objective linear program. J. Glob. Optim. 13(1), 1–24 (1998)
Bremner, D.: On the complexity of vertex and facet enumeration for convex polytopes. PhD Thesis, School of Computer Science, McGill University (1997)
Burton, B.A., Ozlen, M.: Projective geometry and the outer approximation algorithm for multiobjective linear programming, arXiv:1006.3085 (2010)
Chan, T.H.: Balanced information inequalities. IEEE Trans. Inform. Theory 49, 3261–3267 (2003)
Chan, T.H.: Recent progresses in characterising information inequalities. Entropy 13, 379–401 (2011)
Csirmaz, L.: Book inequalities. IEEE Trans. Inform. Theory 60, 6811–6818 (2014)
Dougherty, R., Freiling, C., Zeger, K.: Non-Shannon information inequalities in four random variables, arXiv:1104.3602 (2011)
Ehrgott, M., Löhne, A., Shao, L.: A dual variant of Benson’s outer approximation algorithm for multiple objective linear programming. J. Glob. Optim. 52, 757–778 (2012)
Fukuda, K., Prodon, A.: Double description method revisited. Combinatorics and Computer Science (Brest, 1995). LNCS, vol. 1120, pp. 91–111. Springer, Berlin (1996)
Hamel, A.H., Löhne, A., Rudloff, B.: Benson type algorithms for linear vector optimization and applications. J. Glob. Optim. 59, 811–836 (2013)
Heyde, F., Löhne, A.: Geometric duality in multiple objective linear programming. SIAM J. Optim. 19(2), 836–845 (2008)
Kaced, T.: Equivalence of two proof techniques for non-Shannon-type inequalities. In: Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, pp. 236–240 (2013)
Madiman, M., Marcus, A.W., Tetali, P.: Information-theoretic inequalities in additive combinatorics. In: IEEE ITW, pp. 1–4 (2010)
Makarychev, K., Makarychev, Yu., Romashchenko, A., Vereshchagin, N.: A new class of non-Shannon-type inequalities for entropies. Commun. Inf. Syst. 2(2), 147–166 (2002)
Matus, F.: Infinitely many information inequalities. In: Proceedings ISIT, pp. 41–47, 24–29 June 2007. Nice, France
Matus, F.: Two constructions on limits of entropy functions. IEEE Trans. Inform. Theory 53(1), 320–330 (2007)
Matus, F.: Personal communication (2012)
Matus, F., Studeny, M.: Conditional independencies among four random variables I. Comb. Probab. Comput. 4, 269–278 (1995)
McRae, W.B., Davidson, E.R.: An algorithm for the extreme rays of a pointed convex polyhedral cone. SIAM J. Comput. 2(4), 281–293 (1973)
Pippenger, N.: What are the laws of information theory? In: Special Problems on Communication and Computation Conference. Palo Alto, California, 3–5 Sept 1986
Studený, M.: Probabilistic Conditional Independence Structures. Springer, New York (2005)
MacLaren Walsh, J., Weber, S.: Relationships among bounds for the region of entropic vectors in four variables. In: Allerton Conference on Communication, Control, and Computing (2010)
Yeung, R.W.: A First Course in Information Theory. Kluwer Academic/Plenum Publishers, New York (2002)
Zhang, Z., Yeung, R.W.: On characterization of entropy function via information inequalities. IEEE Trans. Inform. Theory 44(4), 1440–1452 (1998)
Acknowledgments
The author would like to acknowledge the help received during the numerous insightful, fruitful, and enjoyable discussions with Frantisek Matúš on the entropy function, matroids, and on the ultimate question of everything. Supported by TAMOP-4.2.2.C-11/1/KONV-2012-0001 and the Lendulet Program.
Appendices
Appendix 1
1.1 Shannon inequalities
Recall that given a discrete random variable x with possible values \(\{a_1,a_2,\dots ,a_n\}\) and probability distribution \(\{p(a_i)\}_{i=1}^n\), the Shannon entropy of x is defined as \(\mathop {\mathbf{H}}(x)=-\sum _{i=1}^n\,p(a_i)\log p(a_i)\), which is a measure of the average uncertainty associated with x. Let \(\langle x_i:i\in I\rangle \) be a collection of random variables. For \(A\subseteq I\), we let \(x_A = \langle x_i:i\in A\rangle \), and \(\mathop {\mathbf{H}}(x_A)\) be the entropy of \(x_A\) equipped with the marginal distribution. Thus the entropy function \(\mathop {\mathbf{H}}\) associated with the collection \(\langle x_i:i\in I\rangle \) maps the non-empty subsets of I to non-negative real numbers. The Shannon inequalities say that this \(\mathop {\mathbf{H}}\) is a monotone and submodular function, that is,

\(\mathop {\mathbf{H}}(x_A)\le \mathop {\mathbf{H}}(x_B) \quad \text{whenever } A\subseteq B\subseteq I, \qquad (5)\)

and

\(\mathop {\mathbf{H}}(x_A)+\mathop {\mathbf{H}}(x_B)\ge \mathop {\mathbf{H}}(x_{A\cup B})+\mathop {\mathbf{H}}(x_{A\cap B}) \qquad (6)\)

for all subsets A, B of I. There are redundant inequalities among the Shannon inequalities. For example, the following smaller collection implies all others: consider the inequalities from (5) where \(B=I\), and A is missing only one element of I; and the inequalities from (6) where both A and B have exactly one element not in \(A\cap B\).
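The two Shannon inequalities are easy to check numerically. The sketch below computes marginal entropies of a small joint distribution (the distribution itself is a hypothetical toy example) and verifies monotonicity and submodularity on all pairs of subsets:

```python
import itertools
from math import log2

# Joint distribution of two binary random variables, given as a dict
# mapping outcome tuples to probabilities (a hypothetical toy example).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def entropy(dist, A):
    """Shannon entropy H(x_A) of the marginal of dist on coordinate set A."""
    marginal = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in sorted(A))
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

I = {0, 1}
subsets = [set(s) for r in range(len(I) + 1)
           for s in itertools.combinations(sorted(I), r)]
for A in subsets:
    for B in subsets:
        # Monotonicity (5): H(x_A) <= H(x_B) whenever A is a subset of B.
        if A <= B:
            assert entropy(joint, A) <= entropy(joint, B) + 1e-12
        # Submodularity (6): H(x_A) + H(x_B) >= H(x_{A∪B}) + H(x_{A∩B}).
        assert entropy(joint, A) + entropy(joint, B) >= \
               entropy(joint, A | B) + entropy(joint, A & B) - 1e-12
```

Only the reduced collection of inequalities mentioned above needs to be imposed in the optimization problems; the full set follows from it.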
1.2 Independent copy of random variables
We split a set of random variables into two disjoint groups \(\langle x_i: i\in I\rangle \) and \(\langle y_j: j\in J \rangle \), and create \(\langle x'_i:i\in I\rangle \) as an independent copy of \(\langle x_i\rangle \) over \(\langle y_j\rangle \). It means that \(\langle x'_i\rangle \) and \(\langle x_i\rangle \) have the same set of possible values, and

\(\Pr (x'_I = \xi \mid x_I, y_J) = \Pr (x_I = \xi \mid y_J)\) for every value \(\xi \),

expressing that \(\langle x'_i\rangle \) and \(\langle x_i\rangle \) are independent over \(\langle y_j\rangle \). The entropy of certain subsets of \(\langle x'_i,x_i,y_j\rangle \) can be computed from the entropy of other subsets as follows. Let \(A, B \subseteq I\) and \(C\subseteq J\). Then,

\(\mathop {\mathbf{H}}(x'_A x_B y_C) = \mathop {\mathbf{H}}(x_A x'_B y_C),\)

which is due to the complete symmetry between \(\langle x'_i\rangle \) and \(\langle x_i\rangle \). The fact that \(x'_I\) and \(x_I\) are independent over \(y_J\) translates into the following entropy equality:

\(\mathop {\mathbf{H}}(x'_A x_B y_J) = \mathop {\mathbf{H}}(x'_A y_J) + \mathop {\mathbf{H}}(x_B y_J) - \mathop {\mathbf{H}}(y_J)\)

for all subsets \(A,B\subseteq I\).
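The copy construction can be carried out explicitly on a toy distribution. The sketch below (the joint distribution is a hypothetical example with one x variable and one y variable) builds the distribution of \(\langle x', x, y\rangle \) via \(p(x',x,y)=p(x'|y)\,p(x|y)\,p(y)\) and checks the symmetry and independence entropy equalities:

```python
from math import log2

def entropy(dist, A):
    """Shannon entropy of the marginal of dist on coordinate set A."""
    marginal = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in sorted(A))
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

# Hypothetical joint distribution: coordinate 0 is x, coordinate 1 is y.
joint = {(0, 0): 0.3, (1, 0): 0.2, (0, 1): 0.1, (1, 1): 0.4}

# Marginal of y.
py = {}
for (x, y), p in joint.items():
    py[y] = py.get(y, 0.0) + p

# Independent copy x' of x over y: p(x', x, y) = p(x'|y) p(x|y) p(y).
# In copy_joint the coordinates are (x', x, y).
copy_joint = {}
for (x1, y1), p1 in joint.items():
    for (x2, y2), p2 in joint.items():
        if y1 == y2:
            key = (x2, x1, y1)
            copy_joint[key] = copy_joint.get(key, 0.0) + p1 * p2 / py[y1]

H = lambda A: entropy(copy_joint, A)
XP, X, Y = 0, 1, 2  # coordinate indices for x', x, y

# Symmetry: H(x' y) = H(x y).
assert abs(H({XP, Y}) - H({X, Y})) < 1e-12
# Independence over y: H(x' x y) = H(x' y) + H(x y) - H(y).
assert abs(H({XP, X, Y}) - (H({XP, Y}) + H({X, Y}) - H({Y}))) < 1e-12
```

These equalities are exactly the linear constraints that the copy step contributes to the optimization problems.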
1.3 Copy strings
The process starts by fixing four random variables a, b, c, and d with some joint distribution. Split them into two parts, create an independent copy of the first part over the second, add the newly created random variables to the group, and then repeat this process. To save on the number of variables created, in each step certain newly generated variables can be discarded, or two or more new variables can be merged into a single one. This process is described by a copy string, which has the following form:
This string describes three iterations, which are separated by semicolons. In the first step we create an independent copy of cd over ab, and name the two new variables rs such that r is a copy of c, and s is a copy of d. After this step we have six variables abcdrs with some joint distribution. In the next step we make an independent copy of cdrs over ab, merge the copies of c and r into a single variable, name it t, and add it to the pool. In the last step we create an independent copy of bdrt over acd, keep the copy of t, name it u, and discard the other three newly created variables. As a result, we get the eight random variables abcdrstu.
1.4 A unimodular matrix
It is advantageous to look at the 15 entropies of the four random variables in another coordinate system. The new coordinates can be computed using the unimodular matrix shown in Table 4. Columns represent the entropies of the subsets of the four random variables a, b, c and d, as indicated in the top row. The value of the “Ingleton row” should be set to 1, and rows marked by the letter “z” vanish for all extremal vertices, thus they should be set to 0.
Appendix 2
This section lists new entropy inequalities which were found during the experiments described in Sect. 4 and have all coefficients less than 100. Each entry in the list contains nine integers representing the coefficients \(c_0\), \(c_1\), \(\dots \), \(c_8\) for the non-Shannon information inequality of the form
Here \(\mathbf {I}(A,B) = \mathop {\mathbf{H}}(A)+\mathop {\mathbf{H}}(B)-\mathop {\mathbf{H}}(AB)\) is the mutual information, and \(\mathbf {I}(A,B\,|\,C)=\mathop {\mathbf{H}}(AC)+\mathop {\mathbf{H}}(BC)-\mathop {\mathbf{H}}(ABC)-\mathop {\mathbf{H}}(C)\) is the conditional mutual information. The expression after \(c_0\) is the Ingleton value. Following the list of coefficients is the applied copy string.
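These two definitions translate directly into code. The sketch below expresses them through an entropy function given as a table; the table itself is a hypothetical toy example where a and b are independent fair bits and c is their XOR, a classic case where the unconditional mutual information vanishes while the conditional one does not:

```python
# Hypothetical entropy table: a, b independent fair bits, c = a XOR b.
a, b, c = frozenset('a'), frozenset('b'), frozenset('c')
H = {
    frozenset(): 0.0,
    a: 1.0, b: 1.0, c: 1.0,
    a | b: 2.0, a | c: 2.0, b | c: 2.0,
    a | b | c: 2.0,
}

def mi(A, B):
    """I(A,B) = H(A) + H(B) - H(AB)."""
    return H[A] + H[B] - H[A | B]

def cmi(A, B, C):
    """I(A,B|C) = H(AC) + H(BC) - H(ABC) - H(C)."""
    return H[A | C] + H[B | C] - H[A | B | C] - H[C]

assert mi(a, b) == 0.0   # a and b are independent...
assert cmi(a, b, c) == 1.0  # ...yet dependent conditioned on c
```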
Csirmaz, L. Using multiobjective optimization to map the entropy region. Comput Optim Appl 63, 45–67 (2016). https://doi.org/10.1007/s10589-015-9760-6