Machine Learning, Volume 57, Issue 3, pp 233–269

Naive Bayesian Classification of Structured Data

  • Peter A. Flach
  • Nicolas Lachiche

Abstract

In this paper we present 1BC and 1BC2, two systems that perform naive Bayesian classification of structured individuals. The approach of 1BC is to project the individuals along first-order features. These features are built from the individual using structural predicates referring to related objects (e.g., atoms within molecules), and properties applying to the individual or one or several of its related objects (e.g., a bond between two atoms). We describe an individual in terms of elementary features consisting of zero or more structural predicates and one property; these features are treated as conditionally independent in the spirit of the naive Bayes assumption. 1BC2 represents an alternative first-order upgrade to the naive Bayesian classifier by considering probability distributions over structured objects (e.g., a molecule as a set of atoms), and estimating those distributions from the probabilities of its elements (which are assumed to be independent). We present a unifying view on both systems in which 1BC works in language space, and 1BC2 works in individual space. We also present a new, efficient recursive algorithm improving upon the original propositionalisation approach of 1BC. Both systems have been implemented in the context of the first-order descriptive learner Tertius, and we investigate the differences between the two systems both in computational terms and on artificially generated data. Finally, we describe a range of experiments on ILP benchmark data sets demonstrating the viability of our approach.
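As an illustration of the two decompositions sketched above, the following Python snippet (a minimal, hedged sketch, not the authors' implementation; the toy data, names, and the simple "contains an atom of element e" features are invented for illustration) contrasts a 1BC-style classifier, which propositionalises each molecule into first-order features treated as conditionally independent given the class, with a 1BC2-style classifier, which multiplies per-atom class-conditional probabilities directly. The real systems construct much richer features from structural predicates and properties, and use different probability estimates.

    # Hedged sketch only: toy naive Bayes over structured individuals.
    # Not the 1BC/1BC2 code; data and feature definitions are illustrative.
    from collections import Counter
    from math import log

    # A "molecule" is a list of atoms; here an atom is just its element symbol.
    train = [
        (["c", "c", "o"], "active"),
        (["c", "o", "o"], "active"),
        (["c", "n"], "inactive"),
        (["n", "n", "o"], "inactive"),
    ]
    ELEMENTS = ["c", "n", "o"]

    def class_priors(data):
        counts = Counter(label for _, label in data)
        total = sum(counts.values())
        return {c: counts[c] / total for c in counts}

    # 1BC-style sketch: project each molecule onto boolean first-order features
    # ("contains an atom of element e": one structural predicate plus one
    # property), then apply ordinary naive Bayes over those features.
    def featurise(mol):
        return {e: (e in mol) for e in ELEMENTS}

    def score_1bc_style(mol, data, laplace=1.0):
        priors = class_priors(data)
        feats = featurise(mol)
        scores = {}
        for c in priors:
            examples = [featurise(m) for m, label in data if label == c]
            s = log(priors[c])
            for e, value in feats.items():
                match = sum(1 for ex in examples if ex[e] == value)
                s += log((match + laplace) / (len(examples) + 2 * laplace))
            scores[c] = s
        return scores

    # 1BC2-style sketch: treat the molecule directly as a collection of atoms
    # and multiply per-atom class-conditional probabilities (atoms assumed
    # independent given the class).
    def score_1bc2_style(mol, data, laplace=1.0):
        priors = class_priors(data)
        scores = {}
        for c in priors:
            atoms = [a for m, label in data if label == c for a in m]
            counts = Counter(atoms)
            s = log(priors[c])
            for a in mol:
                s += log((counts[a] + laplace) / (len(atoms) + laplace * len(ELEMENTS)))
            scores[c] = s
        return scores

    test = ["c", "o"]
    print(score_1bc_style(test, train))
    print(score_1bc2_style(test, train))

Running the script prints unnormalised log-scores per class for the test molecule under each decomposition; in both cases the predicted class is the argmax.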

Keywords: Bayesian classifier, structured data, inductive logic programming, knowledge representation, first-order features

References

  1. Badea, L., & Stanciu, M. (1999). Refinement operators can be (weakly) perfect. In Proceedings of the Ninth International Workshop on Inductive Logic Programming (pp. 21–32). Springer-Verlag.
  2. Boström, H., & Asker, L. (1999). Combining divide-and-conquer and separate-and-conquer for efficient and effective rule induction. In Proceedings of the Ninth International Workshop on Inductive Logic Programming (pp. 33–43). Springer-Verlag.
  3. Ceci, M., Appice, A., & Malerba, D. (2003). Mr-SBC: A multi-relational naïve Bayes classifier. In Proceedings of the Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases (pp. 95–106). Springer-Verlag.
  4. Cussens, J. (2001). Parameter estimation in stochastic logic programs. Machine Learning, 43, 245–271.
  5. Date, C. (1995). An introduction to database systems. Addison Wesley.
  6. Dehaspe, L. (1997). Maximum entropy modeling with clausal constraints. In Proceedings of the Seventh International Workshop on Inductive Logic Programming (pp. 109–124). Springer-Verlag.
  7. Dietterich, T., Lathrop, R., & Lozano-Perez, T. (1997). Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89, 31–71.
  8. Dolšak, B., & Muggleton, S. (1992). The application of inductive logic programming to finite-element mesh design. In S. Muggleton (Ed.), Inductive logic programming. Academic Press.
  9. Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–130.
  10. Džeroski, S., Schulze-Kremer, S., Heidtke, K., Siems, K., Wettschereck, D., & Blockeel, H. (1998). Diterpene structure elucidation from 13C NMR spectra with inductive logic programming. Applied Artificial Intelligence, 12, 363–383.
  11. Džeroski, S., & Lavrač, N. (Eds.). (2001). Relational data mining. Springer-Verlag.
  12. Fawcett, T. (2003). ROC graphs: Notes and practical considerations for data mining researchers. Technical report HPL-2003-4, HP Laboratories, Palo Alto, CA, USA. Available at http://www.hpl.hp.com/techreports/2003/HPL-2003-4.pdf.
  13. Flach, P., Giraud-Carrier, C., & Lloyd, J. (1998). Strongly typed inductive concept learning. In Proceedings of the Eighth International Conference on Inductive Logic Programming (pp. 185–194). Springer-Verlag.
  14. Flach, P., & Lachiche, N. (1999). 1BC: A first-order Bayesian classifier. In Proceedings of the Ninth International Workshop on Inductive Logic Programming (pp. 92–103). Springer-Verlag.
  15. Flach, P. (1999). Knowledge representation for inductive learning. In Symbolic and Quantitative Approaches to Reasoning and Uncertainty (pp. 160–167). Springer-Verlag.
  16. Flach, P., Gyftodimos, E., & Lachiche, N. (to appear). Probabilistic reasoning with terms. Linköping Electronic Articles in Computer and Information Science. Available at http://www.ida.liu.se/ext/epa/cis/2002/011/tcover.html.
  17. Flach, P., & Lachiche, N. (2001). Confirmation-guided discovery of first-order rules with Tertius. Machine Learning, 42, 61–95.
  18. Gärtner, T., Lloyd, J., & Flach, P. (2004). Kernels and distances for structured data. Machine Learning, 57(3), 205–232.
  19. Getoor, L., Friedman, N., Koller, D., & Pfeffer, A. (2001). Learning probabilistic relational models. In S. Džeroski & N. Lavrač (Eds.), Relational data mining. Springer-Verlag.
  20. Gyftodimos, E., & Flach, P. (2003). Hierarchical Bayesian networks: An approach to classification and learning for structured data. In Proceedings of the Work-in-Progress Track at the Thirteenth International Conference on Inductive Logic Programming (pp. 12–21). Department of Informatics, University of Szeged.
  21. Hand, D., & Till, R. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45, 171–186.
  22. John, G., & Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 338–345). Morgan Kaufmann.
  23. Kersting, K., & De Raedt, L. (2000). Bayesian logic programs. In Proceedings of the Work-in-Progress Track at the Tenth International Conference on Inductive Logic Programming (pp. 138–155). CEUR Workshop Proceedings Series, Vol. 35.
  24. Kersting, K., & De Raedt, L. (2001). Towards combining inductive logic programming with Bayesian networks. In Proceedings of the Eleventh International Conference on Inductive Logic Programming (pp. 118–131). Springer-Verlag.
  25. Knobbe, A., de Haas, M., & Siebes, A. (2001). Propositionalisation and aggregates. In Proceedings of the Fifth European Conference on Principles of Data Mining and Knowledge Discovery (pp. 277–288). Springer-Verlag.
  26. Kramer, S., Lavrač, N., & Flach, P. (2001). Propositionalization approaches to relational data mining. In S. Džeroski & N. Lavrač (Eds.), Relational data mining. Springer-Verlag.
  27. Krogel, M.-A., & Wrobel, S. (2001). Transformation-based learning using multirelational aggregation. In Proceedings of the Eleventh International Conference on Inductive Logic Programming (pp. 142–155). Springer-Verlag.
  28. Lachiche, N., & Flach, P. (2002). 1BC2: A true first-order Bayesian classifier. In Proceedings of the Twelfth International Conference on Inductive Logic Programming (pp. 133–148). Springer-Verlag.
  29. Lachiche, N., & Flach, P. (2003). Improving accuracy and cost of two-class and multi-class probabilistic classifiers using ROC curves. In Proceedings of the Twentieth International Conference on Machine Learning (pp. 416–423). AAAI Press.
  30. Lavrač, N., & Flach, P. (2001). An extended transformation approach to inductive logic programming. ACM Transactions on Computational Logic, 2, 458–494.
  31. Lavrač, N., Železný, F., & Flach, P. (2002). RSD: Relational subgroup discovery through first-order feature construction. In Proceedings of the Twelfth International Conference on Inductive Logic Programming (pp. 149–165). Springer-Verlag.
  32. Lloyd, J. (1999). Programming in an integrated functional and logic language. Journal of Functional and Logic Programming, 1999.
  33. Lloyd, J. (2003). Logic for learning: Learning comprehensible theories from structured data. Springer-Verlag.
  34. Lu, Q., & Getoor, L. (2003). Link-based classification. In Proceedings of the Twentieth International Conference on Machine Learning (pp. 496–503). AAAI Press.
  35. Muggleton, S. (Ed.). (1992). Inductive logic programming. Academic Press.
  36. Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing, 13, 245–286.
  37. Muggleton, S. (1996). Stochastic logic programs. In L. De Raedt (Ed.), Advances in inductive logic programming. IOS Press.
  38. Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: Theory and methods. Journal of Logic Programming, 19/20, 629–679.
  39. Muggleton, S., Srinivasan, A., King, R., & Sternberg, M. (1998). Biochemical knowledge discovery using inductive logic programming. In Proceedings of the First International Conference on Discovery Science (pp. 326–341). Springer-Verlag.
  40. Pompe, U., & Kononenko, I. (1995). Naive Bayesian classifier within ILP-R. In Proceedings of the Fifth International Workshop on Inductive Logic Programming (pp. 417–436). Department of Computer Science, Katholieke Universiteit Leuven.
  41. Rouveirol, C. (1994). Flattening and saturation: Two representation changes for generalization. Machine Learning, 14, 219–232.
  42. Sato, T., & Kameya, Y. (1997). PRISM: A symbolic-statistical modeling language. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (pp. 1330–1335). Morgan Kaufmann.
  43. Sato, T., & Kameya, Y. (2001). Parameter learning of logic programs for symbolic-statistical modeling. Journal of Artificial Intelligence Research, 15, 391–454.
  44. Slattery, S., & Craven, M. (1998). Combining statistical and relational methods for learning in hypertext domains. In Proceedings of the Eighth International Conference on Inductive Logic Programming (pp. 38–52). Springer-Verlag.
  45. Srinivasan, A., Muggleton, S., King, R., & Sternberg, M. (1994). Mutagenesis: ILP experiments in a nondeterminate biological domain. In Proceedings of the Fourth International Workshop on Inductive Logic Programming (pp. 217–232). Gesellschaft für Mathematik und Datenverarbeitung MBH.
  46. Taskar, B., Segal, E., & Koller, D. (2001). Probabilistic clustering in relational data. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (pp. 870–87). Morgan Kaufmann.

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Peter A. Flach (1)
  • Nicolas Lachiche (2)

  1. Department of Computer Science, University of Bristol, United Kingdom
  2. LSIIT, Université de Strasbourg, France
