Using multiobjective optimization to map the entropy region

Abstract

Mapping the structure of the entropy region of at least four jointly distributed random variables is an important open problem. Even partial knowledge about this region has far-reaching consequences in other areas of mathematics, such as information theory, cryptography, probability theory and combinatorics. At present, the only known method of exploring the entropy region is, or is equivalent to, the one of Zhang and Yeung from 1998. Using some non-trivial properties of the entropy function, their method is transformed into solving high-dimensional linear multiobjective optimization problems. Benson’s outer approximation algorithm is a fundamental tool for solving such optimization problems. An improved version of Benson’s algorithm is presented, which requires solving one scalar linear program in each iteration rather than two or three as in previous versions. During the algorithm design, special care is taken for numerical stability. The implemented algorithm is used to verify previous statements about the entropy region, as well as to explore it further. Experimental results demonstrate the viability of the improved Benson’s algorithm for determining the extremal set of medium-sized, numerically ill-posed optimization problems. With larger problem sizes, two limitations of Benson’s algorithm are observed: the inefficiency of the scalar LP solver, and the unexpectedly large number of intermediate vertices.

Notes

  1. Hamel et al. [14] have observed the same improvement independently.

References

  1. Avis, D., Bremner, D., Seidel, R.: How good are convex hull algorithms? Comput. Geom. 7(5–6), 265–301 (1997)

  2. Baber, R., Christofides, D., Dang, A.N., Riis, S., Vaughan, E.R.: Multiple unicasts, graph guessing games, and non-Shannon inequalities. In: Proc. NetCod 2013, Calgary, pp. 1–6 (2013)

  3. Bassoli, R., Marques, H., Rodriguez, J., Shum, K.W., Tafazolli, R.: Network coding theory: a survey. IEEE Commun. Surveys Tutor. 15(4), 1950–1978 (2013)

  4. Beimel, A.: Secret-sharing schemes: a survey. In: Coding and Cryptology. LNCS, pp. 11–46. Springer, Heidelberg (2011)

  5. Benson, H.P.: An outer approximation algorithm for generating all efficient extreme points in the outcome set of a multiple objective linear program. J. Glob. Optim. 13(1), 1–24 (1998)

  6. Bremner, D.: On the complexity of vertex and facet enumeration for convex polytopes. PhD thesis, School of Computer Science, McGill University (1997)

  7. Burton, B.A., Ozlen, M.: Projective geometry and the outer approximation algorithm for multiobjective linear programming. arXiv:1006.3085 (2010)

  8. Chan, T.H.: Balanced information inequalities. IEEE Trans. Inf. Theory 49, 3261–3267 (2003)

  9. Chan, T.H.: Recent progresses in characterising information inequalities. Entropy 13, 379–401 (2011)

  10. Csirmaz, L.: Book inequalities. IEEE Trans. Inf. Theory 60, 6811–6818 (2014)

  11. Dougherty, R., Freiling, C., Zeger, K.: Non-Shannon information inequalities in four random variables. arXiv:1104.3602 (2011)

  12. Ehrgott, M., Löhne, A., Shao, L.: A dual variant of Benson’s outer approximation algorithm for multiple objective linear programming. J. Glob. Optim. 52, 757–778 (2012)

  13. Fukuda, K., Prodon, A.: Double description method revisited. In: Combinatorics and Computer Science (Brest, 1995). LNCS, vol. 1120, pp. 91–111. Springer, Berlin (1996)

  14. Hamel, A.H., Löhne, A., Rudloff, B.: Benson type algorithms for linear vector optimization and applications. J. Glob. Optim. 59, 811–836 (2013)

  15. Heyde, F., Löhne, A.: Geometric duality in multiple objective linear programming. SIAM J. Optim. 19(2), 836–845 (2008)

  16. Kaced, T.: Equivalence of two proof techniques for non-Shannon-type inequalities. In: Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, pp. 236–240 (2013)

  17. Madiman, M., Marcus, A.W., Tetali, P.: Information-theoretic inequalities in additive combinatorics. In: IEEE ITW, pp. 1–4 (2010)

  18. Makarychev, K., Makarychev, Yu., Romashchenko, A., Vereshchagin, N.: A new class of non-Shannon-type inequalities for entropies. Commun. Inf. Syst. 2(2), 147–166 (2002)

  19. Matus, F.: Infinitely many information inequalities. In: Proceedings ISIT 2007, Nice, France, 24–29 June 2007, pp. 41–47

  20. Matus, F.: Two constructions on limits of entropy functions. IEEE Trans. Inf. Theory 53(1), 320–330 (2007)

  21. Matus, F.: Personal communication (2012)

  22. Matus, F., Studeny, M.: Conditional independencies among four random variables I. Comb. Probab. Comput. 4, 269–278 (1995)

  23. McRae, W.B., Davidson, E.R.: An algorithm for the extreme rays of a pointed convex polyhedral cone. SIAM J. Comput. 2(4), 281–293 (1973)

  24. Pippenger, N.: What are the laws of information theory? In: Special Problems on Communication and Computation Conference, Palo Alto, California, 3–5 Sept 1986

  25. Studený, M.: Probabilistic Conditional Independence Structures. Springer, New York (2005)

  26. MacLaren Walsh, J., Weber, S.: Relationships among bounds for the region of entropic vectors in four variables. In: Allerton Conference on Communication, Control, and Computing (2010)

  27. Yeung, R.W.: A First Course in Information Theory. Kluwer Academic/Plenum Publishers, New York (2002)

  28. Zhang, Z., Yeung, R.W.: On characterization of entropy function via information inequalities. IEEE Trans. Inf. Theory 44(4), 1440–1452 (1998)

Acknowledgments

The author would like to acknowledge the help received during the numerous insightful, fruitful, and enjoyable discussions with Frantisek Matúš on the entropy function, matroids, and on the ultimate question of everything. Supported by TAMOP-4.2.2.C-11/1/KONV-2012-0001 and the Lendulet Program.

Author information

Corresponding author

Correspondence to László Csirmaz.

Appendices

Appendix 1

1.1 Shannon inequalities

Recall that given a discrete random variable x with possible values \(\{a_1,a_2,\dots ,a_n\}\) and probability distribution \(\{p(a_i)\}_{i=1}^n\), the Shannon entropy of x is defined as \(\mathop {\mathbf{H}}(x)=-\sum _{i=1}^n\,p(a_i)\log p(a_i)\) which is a measure of the average uncertainty associated with x. Let \(\langle x_i:i\in I\rangle \) be a collection of random variables. For \(A\subseteq I\), we let \(x_A = \langle x_i:i\in A\rangle \), and \(\mathop {\mathbf{H}}(x_A)\) be the entropy of \(x_A\) equipped with the marginal distribution. Thus the entropy function \(\mathop {\mathbf{H}}\) associated with collection \(\langle x_i:i\in I\rangle \) maps the non-empty subsets of I to non-negative real numbers. The Shannon inequalities say that this \(\mathop {\mathbf{H}}\) is a monotone and submodular function, that is,

$$\begin{aligned} 0\le \mathop {\mathbf{H}}(x_A)\le \mathop {\mathbf{H}}(x_B) ~~~\text{ when }\; A\subseteq B, \end{aligned}$$
(5)

and

$$\begin{aligned} \mathop {\mathbf{H}}(x_{A\cup B}) + \mathop {\mathbf{H}}(x_{A\cap B}) \le \mathop {\mathbf{H}}(x_A)+\mathop {\mathbf{H}}(x_B), \end{aligned}$$
(6)

for all subsets A, B of I. There are redundant inequalities among the Shannon inequalities. For example, the following smaller collection implies all others: the inequalities from (5) where \(B=I\) and A is missing only one element of I, together with the inequalities from (6) where both A and B have exactly one element not in \(A\cap B\).
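
For concreteness, the following minimal Python sketch enumerates this reduced collection as coefficient vectors; the function name elemental_inequalities and the bitmask encoding of subsets are illustrative choices made for the example, not taken from the paper's implementation.

from itertools import combinations

def elemental_inequalities(n):
    # Reduced collection of Shannon inequalities for n random variables.
    # Subsets of I = {0, ..., n-1} are encoded as bitmasks; an inequality is a
    # dict mapping a non-empty subset S to its coefficient c_S, and asserts
    # sum_S c_S * H(x_S) >= 0 (the term H of the empty set is 0 and is omitted).
    full = (1 << n) - 1
    # From (5) with B = I and A = I \ {i}:  H(x_I) - H(x_{I\{i}}) >= 0.
    for i in range(n):
        yield {full: 1, full & ~(1 << i): -1}
    # From (6) with A = K + {i}, B = K + {j}, A intersect B = K:
    #   H(x_{Ki}) + H(x_{Kj}) - H(x_{Kij}) - H(x_K) >= 0.
    for i, j in combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        for r in range(len(rest) + 1):
            for ks in combinations(rest, r):
                K = sum(1 << k for k in ks)
                ineq = {K | (1 << i): 1,
                        K | (1 << j): 1,
                        K | (1 << i) | (1 << j): -1}
                if K:
                    ineq[K] = -1
                yield ineq

# For four random variables this yields 4 + 6*4 = 28 elemental inequalities.
print(sum(1 for _ in elemental_inequalities(4)))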

1.2 Independent copy of random variables

We split a set of random variables into two disjoint groups \(\langle x_i: i\in I\rangle \) and \(\langle y_j: j\in J \rangle \), and create \(\langle x'_i:i\in I\rangle \) as an independent copy of \(\langle x_i\rangle \) over \(\langle y_j\rangle \). This means that \(\langle x'_i\rangle \) and \(\langle x_i\rangle \) have the same set of possible values, and

$$\begin{aligned}&\mathop {\mathrm {Prob}}\big (\langle x'_i{=}a'_i\rangle , \langle x_i{=}a_i\rangle , \langle y_j{=}b_j\rangle \big ) \\&\quad = \frac{\mathop {\mathrm {Prob}}\big (\langle x_i{=}a'_i\rangle , \langle y_j{=}b_j\rangle \big ) \cdot \mathop {\mathrm {Prob}}\big (\langle x_i{=}a_i\rangle , \langle y_j{=}b_j\rangle \big )}{\mathop {\mathrm {Prob}}\big (\langle y_j{=}b_j\rangle \big )}, \end{aligned}$$

expressing that \(\langle x'_i\rangle \) and \(\langle x_i\rangle \) are independent over \(\langle y_j\rangle \). The entropy of certain subsets of \(\langle x'_i,x_i,y_j\rangle \) can be computed from the entropy of other subsets as follows. Let \(A, B \subseteq I\) and \(C\subseteq J\). Then,

$$\begin{aligned} \mathop {\mathbf{H}}(x'_A x_By_C) = \mathop {\mathbf{H}}(x'_B x_A y_C), \end{aligned}$$

which is due to the complete symmetry between \(\langle x'_i\rangle \) and \(\langle x_i\rangle \). The fact that \(x'_I\) and \(x_I\) are independent over \(y_J\) translates into the following entropy equality:

$$\begin{aligned} \mathop {\mathbf{H}}(x'_A x_B y_J) = \mathop {\mathbf{H}}(x'_A y_J) + \mathop {\mathbf{H}}(x_B y_J) - \mathop {\mathbf{H}}(y_J) \end{aligned}$$

for all subsets \(A,B\subseteq I\).
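
The small Python sketch below illustrates the construction numerically; the helper names (H, marginal, independent_copy) and the dictionary representation of distributions are assumptions made for this example only. It builds the joint distribution of \(\langle x', x, y\rangle\) from a given distribution of \(\langle x, y\rangle\) and checks the displayed entropy equality in the special case \(A=B=I\).

import math
from collections import defaultdict

def H(dist):
    # Shannon entropy (in bits) of a distribution given as {outcome: probability}.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(dist, keep):
    # Marginal distribution over the listed tuple positions.
    m = defaultdict(float)
    for outcome, p in dist.items():
        m[tuple(outcome[k] for k in keep)] += p
    return m

def independent_copy(pxy):
    # Joint distribution of (x', x, y), where x' is an independent copy of x
    # over y:  Prob(x'=a', x=a, y=b) = P(a', b) * P(a, b) / P(b).
    py = marginal(pxy, keep=[1])
    out = defaultdict(float)
    for (a1, b1), p1 in pxy.items():
        for (a2, b2), p2 in pxy.items():
            if b1 == b2 and py[(b1,)] > 0:
                out[(a1, a2, b1)] += p1 * p2 / py[(b1,)]
    return out

# Toy example: two correlated bits.
pxy = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}
q = independent_copy(pxy)
# Check H(x' x y) = H(x' y) + H(x y) - H(y).
lhs = H(q)
rhs = H(marginal(q, [0, 2])) + H(marginal(q, [1, 2])) - H(marginal(q, [2]))
print(abs(lhs - rhs) < 1e-12)   # True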

1.3 Copy strings

The process starts by fixing four random variables a, b, c, and d with some joint distribution. Split them into two parts, create an independent copy of the first part over the second, add the newly created random variables to the group, and then repeat this process. To save on the number of variables created, in each step certain newly generated variables can be discarded, or two or more new variables can be merged into a single one. This process is described by a copy string, which has the following form:

$$\begin{aligned} \mathtt{rs=cd:ab;\,t=(cr):ab;\,u=t:acs} \end{aligned}$$

This string describes three iterations, separated by semicolons. In the first step we create an independent copy of cd over ab, and name the two new variables r and s such that r is a copy of c and s is a copy of d. After this step we have six variables abcdrs with some joint distribution. In the next step we make an independent copy of cdrs over ab, merge the copies of c and r into a single variable, name it t, and add it to the pool. In the last step we create an independent copy of bdrt over acs, keep the copy of t, name it u, and discard the other three newly created variables. As a result, we get the eight random variables abcdrstu.
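
As an illustration, a copy string of this form could be parsed as sketched below; the function parse_copy_string and the returned step structure are hypothetical and not the paper's actual notation handling.

import re

def parse_copy_string(s):
    # Parse a copy string such as 'rs=cd:ab; t=(cr):ab; u=t:acs' into a list of
    # steps (new_names, kept_groups, conditioning): new_names[i] is the name of
    # the (possibly merged) copy of the variables in kept_groups[i], and the
    # copy is made independently over the conditioning variables.
    steps = []
    for step in s.replace(' ', '').split(';'):
        lhs, rhs = step.split('=')
        kept, cond = rhs.split(':')
        groups = re.findall(r'\(([a-z]+)\)|([a-z])', kept)   # '(cr)' = merged copy
        kept_groups = [g[0] or g[1] for g in groups]
        if len(lhs) != len(kept_groups):
            raise ValueError('each new name must correspond to one kept group')
        steps.append((list(lhs), kept_groups, list(cond)))
    return steps

print(parse_copy_string('rs=cd:ab; t=(cr):ab; u=t:acs'))
# [(['r', 's'], ['c', 'd'], ['a', 'b']),
#  (['t'], ['cr'], ['a', 'b']),
#  (['u'], ['t'], ['a', 'c', 's'])]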

1.4 A unimodular matrix

It is advantageous to look at the 15 entropies of the four random variables in another coordinate system. The new coordinates can be computed using the unimodular matrix shown in Table 4. Columns represent the entropies of the subsets of the four random variables a, b, c and d, as indicated in the top row. The value of the “Ingleton row” should be set to 1, and rows marked by the letter “z” vanish for all extremal vertices, thus they should be set to 0.

Table 4 The unimodular matrix
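
Since Table 4 itself is not reproduced here, the fragment below only sketches how such a coordinate change would be applied and how unimodularity can be checked; the matrix M is a placeholder for the actual table, and the assumption that it is a square 15 x 15 matrix follows from its 15 columns and unimodularity.

import numpy as np

def to_new_coordinates(M, h):
    # M: the integer matrix of Table 4 (placeholder), columns indexed by the
    # 15 non-empty subsets of {a, b, c, d}; h: an entropy vector in the same
    # subset order.  Returns the coordinates of h in the new system.
    M = np.asarray(M, dtype=float)
    assert M.shape == (15, 15)
    assert round(abs(np.linalg.det(M))) == 1   # unimodular: |det M| = 1
    return M @ np.asarray(h, dtype=float)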

Appendix 2

This section lists new entropy inequalities that were found during the experiments described in Sect. 4 and whose coefficients are all less than 100. Each entry in the list contains nine integers representing the coefficients \(c_0\), \(c_1\), \(\dots \), \(c_8\) of the non-Shannon information inequality of the form

$$\begin{aligned}&c_0\big (\mathbf {I}(c,d)-\mathbf {I}(a,b)+\mathbf {I}(a,b\,|\,c)+\mathbf {I}(a,b\,|\,d)\big ) \\&\quad +\, c_1\mathbf {I}(a,b\,|\,c)+c_2\mathbf {I}(a,b\,|\,d) \\&\quad +\, c_3\mathbf {I}(a,c\,|\,b) + c_4\mathbf {I}(b,c\,|\,a)+ c_5\mathbf {I}(a,d\,|\,b)+c_6\mathbf {I}(b,d\,|\,a) \\&\quad +\, c_7\mathbf {I}(c,d\,|\,a) + c_8\mathbf {I}(c,d\,|\,b) \ge 0. \end{aligned}$$

Here \(\mathbf {I}(A,B) = \mathop {\mathbf{H}}(A)+\mathop {\mathbf{H}}(B)-\mathop {\mathbf{H}}(AB)\) is the mutual information, and \(\mathbf {I}(A,B\,|\,C)=\mathop {\mathbf{H}}(AC)+\mathop {\mathbf{H}}(BC)-\mathop {\mathbf{H}}(ABC)-\mathop {\mathbf{H}}(C)\) is the conditional mutual information. The expression multiplied by \(c_0\) is the Ingleton value. Each list of coefficients is followed by the copy string that was applied.
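
A short Python sketch of how one such inequality could be evaluated for a given entropy function is shown below; the names mutual and inequality_value, and the representation of \(\mathop {\mathbf{H}}\) as a dictionary keyed by frozensets of the letters a–d, are illustrative assumptions. An entropic function must satisfy inequality_value(H, c) >= 0 for every listed coefficient vector c.

def mutual(H, A, B, C=''):
    # I(A, B | C) = H(AC) + H(BC) - H(ABC) - H(C); H is a dict mapping
    # frozensets of the letters a, b, c, d to entropy values; H of '' is 0.
    h = lambda s: H[frozenset(s)] if s else 0.0
    return h(A + C) + h(B + C) - h(A + B + C) - h(C)

def inequality_value(H, c):
    # Left-hand side of the inequality above for coefficients c[0], ..., c[8].
    ingleton = (mutual(H, 'c', 'd') - mutual(H, 'a', 'b')
                + mutual(H, 'a', 'b', 'c') + mutual(H, 'a', 'b', 'd'))
    terms = [mutual(H, 'a', 'b', 'c'), mutual(H, 'a', 'b', 'd'),
             mutual(H, 'a', 'c', 'b'), mutual(H, 'b', 'c', 'a'),
             mutual(H, 'a', 'd', 'b'), mutual(H, 'b', 'd', 'a'),
             mutual(H, 'c', 'd', 'a'), mutual(H, 'c', 'd', 'b')]
    return c[0] * ingleton + sum(ci * t for ci, t in zip(c[1:], terms))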

Figures a–c: the lists of coefficient vectors and the corresponding copy strings.

Cite this article

Csirmaz, L. Using multiobjective optimization to map the entropy region. Comput Optim Appl 63, 45–67 (2016). https://doi.org/10.1007/s10589-015-9760-6
