Machine Learning, Volume 25, Issue 1, pp 23–50

Using the Minimum Description Length Principle to Infer Reduced Ordered Decision Graphs

  • Arlindo L. Oliveira
  • Alberto Sangiovanni-Vincentelli


We propose an algorithm for the inference of decision graphs from a set of labeled instances. In particular, we propose to infer decision graphs where the variables can only be tested in accordance with a given order and no redundant nodes exist. Graphs of this type, reduced ordered decision graphs, can be used as canonical representations of Boolean functions and can be manipulated using algorithms developed for that purpose. This work proposes a local optimization algorithm that generates compact decision graphs by performing local changes in an existing graph until a minimum is reached. The algorithm uses Rissanen's minimum description length principle to control the tradeoff between accuracy on the training set and complexity of the description. Techniques for the selection of the initial decision graph and for the selection of an appropriate ordering of the variables are also presented. Experimental results obtained using this algorithm on two sets of examples are presented and analyzed.
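The two ideas the abstract combines can be illustrated with a minimal Python sketch. It builds a reduced ordered decision graph by applying the two standard reduction rules (drop a node whose branches agree, share identical nodes) under a fixed variable order, and then scores a graph with a simple two-part description length. The class `ROBDD`, the function `description_length`, and the particular bit-counting scheme are assumptions made for illustration only; the sketch does not reproduce the authors' encoding, their local search, or their techniques for choosing the initial graph and the variable order.

```python
from math import comb, log2


class ROBDD:
    """Reduced ordered binary decision graph with a unique table (illustrative)."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}   # (var, low, high) -> node id
        self.nodes = {}    # node id -> (var, low, high)
        self.next_id = 2   # ids 0 and 1 are the two terminal nodes

    def mk(self, var, low, high):
        # Reduction rule 1: a node whose two branches agree is redundant.
        if low == high:
            return low
        # Reduction rule 2: structurally identical nodes are shared.
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = self.next_id
            self.nodes[self.next_id] = key
            self.next_id += 1
        return self.unique[key]

    def build(self, f, var=0, prefix=()):
        """Expand f over variables 0..num_vars-1, tested in that fixed order."""
        if var == self.num_vars:
            return int(bool(f(prefix)))
        low = self.build(f, var + 1, prefix + (0,))
        high = self.build(f, var + 1, prefix + (1,))
        return self.mk(var, low, high)

    def evaluate(self, root, assignment):
        node = root
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if assignment[var] else low
        return node

    def size(self, root):
        """Number of internal nodes reachable from root."""
        seen, stack = set(), [root]
        while stack:
            n = stack.pop()
            if n > 1 and n not in seen:
                seen.add(n)
                stack.extend(self.nodes[n][1:])
        return len(seen)


def description_length(bdd, root, examples):
    """Two-part code length in bits, under an assumed (hypothetical) encoding."""
    n_nodes = bdd.size(root)
    # Model part: each node names its variable and its two successors.
    bits_per_node = log2(bdd.num_vars) + 2 * log2(n_nodes + 2)
    model_bits = n_nodes * bits_per_node
    # Data part: how many examples are misclassified, and which ones.
    errors = sum(bdd.evaluate(root, x) != y for x, y in examples)
    n = len(examples)
    data_bits = log2(n + 1) + log2(comb(n, errors))
    return model_bits + data_bits


bdd = ROBDD(num_vars=3)

# Canonicity: two different formulas for x0 XOR x2 yield the same node.
xor = bdd.build(lambda x: x[0] ^ x[2])
same = bdd.build(lambda x: (x[0] or x[2]) and not (x[0] and x[2]))
print("same root:", xor == same, "| nodes:", bdd.size(xor))

# Training set: x0 XOR x2 with the label of (1, 1, 1) flipped as noise.
examples = [((a, b, c), 1 if (a, b, c) == (1, 1, 1) else a ^ c)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]
table = dict(examples)
exact = bdd.build(lambda x: table[x])  # fits every label, including the noise

print("XOR graph  :", bdd.size(xor), "nodes,",
      round(description_length(bdd, xor, examples), 2), "bits")
print("exact graph:", bdd.size(exact), "nodes,",
      round(description_length(bdd, exact, examples), 2), "bits")
```

Under this assumed encoding, the smaller graph that misclassifies the single noisy example has a shorter total description than the larger graph that fits every label, which is the kind of accuracy-versus-complexity tradeoff the description length is meant to arbitrate.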

Inductive learning, MDL principle, decision trees



Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Arlindo L. Oliveira: IST/INESC, Lisboa, Portugal
  • Alberto Sangiovanni-Vincentelli: Department of EECS, UC Berkeley, Berkeley, CA
