Using Maximum Entropy and Generalized Belief Propagation in Estimation of Distribution Algorithms

Part of the Adaptation, Learning, and Optimization book series (ALO, volume 14)

Abstract

EDAs work by sampling a population from a factorized distribution, such as the Boltzmann distribution of an additively decomposable fitness function (ADF). In the Factorized Distribution Algorithm (FDA), a factorization is built from an ADF by choosing a subset of its factors. I present a new algorithm that merges factors into larger sets, making it possible to account for all dependencies between the variables. Estimating the distribution of a larger subset is more prone to sampling noise, so the larger distribution can instead be estimated from the smaller ones with the Maximum Entropy method. Building an exact graphical model for sampling is often infeasible: in a 2-D grid, for example, the triangulated Markov network has cliques of linear size and therefore exponentially large distributions. I explore ways to use loopy models and Generalized Belief Propagation in the context of EDAs and optimization; the merging algorithm mentioned above can be combined with this approach.
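To make the Maximum Entropy step concrete, the following is a minimal sketch (not taken from the chapter) of how a larger joint distribution can be estimated from smaller overlapping marginals via Iterative Proportional Fitting, a standard way of computing the maximum-entropy distribution subject to marginal constraints. The variable names, the toy three-variable distribution, and the function signature are all illustrative assumptions.

```python
import numpy as np

def ipf(marginals, shape, iters=100):
    """Maximum-entropy joint matching each (axes, table) marginal constraint,
    computed by Iterative Proportional Fitting."""
    # Start from the uniform distribution, i.e. the unconstrained
    # maximum-entropy distribution over `shape`.
    p = np.full(shape, 1.0 / np.prod(shape))
    for _ in range(iters):
        for axes, target in marginals:
            # Marginalize the current estimate onto `axes` ...
            other = tuple(i for i in range(len(shape)) if i not in axes)
            current = p.sum(axis=other)
            # ... and rescale so that this marginal matches the target.
            ratio = np.where(current > 0, target / current, 0.0)
            idx = tuple(slice(None) if i in axes else None
                        for i in range(len(shape)))
            p = p * ratio[idx]
    return p

# Toy example: recover a joint over (x1, x2, x3) from the pairwise
# marginals p(x1, x2) and p(x2, x3) of a known binary distribution.
joint = np.arange(1.0, 9.0).reshape(2, 2, 2)
joint /= joint.sum()
m12 = joint.sum(axis=2)          # marginal over (x1, x2)
m23 = joint.sum(axis=0)          # marginal over (x2, x3)
est = ipf([((0, 1), m12), ((1, 2), m23)], (2, 2, 2))
```

The fitted `est` reproduces both input marginals; among all distributions that do so, IPF converges to the one with maximum entropy, which is exactly the estimate the abstract proposes for merged factor sets that are too large to sample reliably.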

Keywords

Maximum Entropy, Factorization System, Maximum Entropy Method, Region Graph, Additive Decomposition



Copyright information

© Springer Berlin Heidelberg 2012

Authors and Affiliations

AIS Institute, Bonn, Germany
