Classification Using Bayes Averaging of Multiple, Relational Rule-based Models

  • Kamal Ali
  • Michael Pazzani
Part of the Lecture Notes in Statistics book series (LNS, volume 112)


We present a way of approximating the posterior probability of a rule-set model comprising a set of class descriptions. Each class description, in turn, consists of a set of relational rules. The ability to compute this posterior and to learn many models from the same training set allows us to approximate the expectation that an example to be classified belongs to a given class. The example is assigned to the class that maximizes this expectation. By assuming a uniform prior distribution over models, the posterior of a model does not depend on the structure of the model: it depends only on how the training examples are partitioned by the rules of the rule-set model. This uniform-prior assumption allows us to compute the posterior for models containing relational and recursive rules. Our approximation to the posterior probability yields significant improvements in accuracy as measured on four relational data sets and four attribute-value data sets from the UCI repository. We also provide evidence that learning multiple models helps most on data sets in which there are many apparently equally good rules to learn.
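As a rough illustration of the averaging scheme described above: under a uniform prior over models, each model's posterior is proportional to its likelihood, which depends only on how the model's rules partition the training examples; classification then maximizes the posterior-weighted expectation of class membership. The sketch below is our own illustrative reconstruction, not the paper's implementation — the function names, the Laplace-smoothed cell likelihood, and the `(partition_counts, predict)` model interface are all assumptions made for the example.

```python
def model_likelihood(partition_counts):
    """Likelihood of the training data given a model, computed only from
    how the model's rules partition the training examples: for each
    partition cell we take the per-class counts and score them with
    Laplace-smoothed class-probability estimates (an assumed choice)."""
    like = 1.0
    for counts in partition_counts:        # one entry of per-class counts per cell
        total = sum(counts)
        for c in counts:
            like *= ((c + 1) / (total + len(counts))) ** c
    return like

def bayes_average_classify(models, example, classes):
    """models: list of (partition_counts, predict) pairs, where
    predict(example) returns a dict of P(class | example, model).
    Returns the class maximizing the expectation
    sum_m P(model | data) * P(class | example, model),
    with P(model | data) proportional to the likelihood under a
    uniform prior over models."""
    weights = [model_likelihood(pc) for pc, _ in models]
    z = sum(weights)
    scores = {}
    for cls in classes:
        scores[cls] = sum((w / z) * predict(example)[cls]
                          for w, (_, predict) in zip(weights, models))
    return max(scores, key=scores.get)
```

For instance, a model whose rules separate the classes cleanly (cells like `[3, 0]` and `[0, 3]`) receives a higher likelihood, and hence a larger posterior weight, than one whose cells mix the classes — so its per-class predictions dominate the averaged vote.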


Keywords: Posterior Probability · Multiple Model · Class Description · Recursive Call · Default Rule





Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • Kamal Ali
  • Michael Pazzani

  1. Department of Information and Computer Science, University of California, Irvine, USA
