Learning from Data, pp. 207-217

# Classification Using Bayes Averaging of Multiple, Relational Rule-based Models

## Abstract

We present a way of approximating the posterior probability of a rule-set model that is composed of a set of class descriptions. Each class description, in turn, consists of a set of relational rules. The ability to compute this posterior and to learn many models from the same training set allows us to approximate the expectation that an example to be classified belongs to some class. The example is assigned to the class that maximizes this expectation. Under a uniform prior distribution over models, the posterior of a model does not depend on its structure; it depends only on how the training examples are partitioned by the rules of the rule-set model. This uniform-prior assumption allows us to compute the posterior for models containing relational and recursive rules. Our approximation to the posterior probability yields significant improvements in accuracy as measured on four relational data sets and four attribute-value data sets from the UCI repository. We also provide evidence that learning multiple models helps most on data sets in which there are many apparently equally good rules to learn.
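The averaging step described above can be illustrated with a minimal sketch. The abstract does not give the posterior formula, so the code below assumes one common choice: each partition of the training set induced by a rule contributes a Dirichlet-multinomial evidence term with a uniform prior over class proportions. The function names (`log_partition_evidence`, `model_posteriors`, `classify`) and the input layout are hypothetical, chosen only for illustration, and are not the paper's own procedure.

```python
from math import lgamma, exp


def log_partition_evidence(counts):
    """Log evidence of the class counts in one partition.

    ASSUMPTION (not from the abstract): a uniform Dirichlet prior over the
    class proportions, giving Gamma(k) / Gamma(n + k) * prod(Gamma(c_i + 1)).
    """
    k = len(counts)
    n = sum(counts)
    return lgamma(k) - lgamma(n + k) + sum(lgamma(c + 1) for c in counts)


def model_posteriors(models):
    """Normalized posteriors for a list of models under a uniform model prior.

    Each model is a list of partitions; each partition is a tuple of class
    counts. Only the partitioning matters, not the structure of the rules.
    """
    log_ev = [sum(log_partition_evidence(p) for p in m) for m in models]
    top = max(log_ev)  # subtract the maximum for numerical stability
    weights = [exp(v - top) for v in log_ev]
    total = sum(weights)
    return [w / total for w in weights]


def classify(posteriors, class_probs):
    """Pick the class maximizing the posterior-weighted expectation.

    class_probs[m][c] is the probability that model m assigns to class c
    for the example being classified (hypothetical input layout).
    """
    n_classes = len(class_probs[0])
    expectation = [
        sum(p * probs[c] for p, probs in zip(posteriors, class_probs))
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: expectation[c])
    return best, expectation


# Two toy models learned from the same 10 training examples (2 classes).
model_a = [(5, 0), (0, 5)]  # rules that split the classes cleanly
model_b = [(3, 2), (2, 3)]  # rules that leave the classes mixed
posteriors = model_posteriors([model_a, model_b])
best_class, expectation = classify(posteriors, [[0.9, 0.1], [0.2, 0.8]])
```

In this toy setup the model with the cleaner partitions receives nearly all of the posterior weight, so its vote dominates the averaged prediction. When several models partition the data about equally well, their weights are comparable and the average draws on all of them.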

## Keywords

Posterior Probability, Multiple Model, Class Description, Recursive Call, Default Rule
