Machine Learning, Volume 70, Issue 2–3, pp 189–206

Margin-based first-order rule learning



We present a new margin-based approach to first-order rule learning. The approach addresses many of the prominent challenges in first-order rule learning, such as the computational complexity of optimization and capacity control. By optimizing the mean of the margin minus its variance, we obtain an algorithm that is linear in the number of examples and a handle for capacity control based on error bounds. A useful parameter in the optimization problem tunes how evenly the weights are spread among the rules. Moreover, the search strategy for including new rules can be adjusted flexibly to perform variants of propositionalization or relational learning. The implementation of the system includes plugins for logical queries, graphs and mathematical terms. In extensive experiments, we found that, at least on the most commonly used toxicological datasets, overfitting is hardly an issue. In another batch of experiments, a comparison with margin-based ILP approaches using kernels turns out favorably. Finally, an experiment shows how many features propositionalization and relational learning approaches need to reach a given level of predictive performance.
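The objective described in the abstract — the mean of the margins minus their variance — can be illustrated with a toy computation. The following is a rough sketch under assumed conventions (binary labels in {−1, +1}, a 0/1 rule-output matrix, and a hypothetical trade-off parameter `lam`), not the authors' implementation:

```python
import numpy as np

def margin_objective(weights, rule_outputs, labels, lam=1.0):
    """Mean margin minus lam times the margin variance.

    rule_outputs: (n_examples, n_rules) 0/1 matrix of rule firings
    weights:      (n_rules,) rule weights
    labels:       (n_examples,) class labels in {-1, +1}
    lam:          hypothetical trade-off parameter (an assumption here)
    """
    # Margin of example i: y_i * <w, h(x_i)>. Computing mean and
    # variance takes one pass over the examples, hence the cost is
    # linear in their number, as the abstract states.
    margins = labels * (rule_outputs @ weights)
    return margins.mean() - lam * margins.var()

# Toy data: three examples, two rules.
H = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
y = np.array([1, -1, 1], dtype=float)
w = np.array([0.5, 0.5])

print(margin_objective(w, H, y, lam=1.0))  # mean margin 1/3, variance 7/18
```

A larger `lam` penalizes spread among the margins more heavily; per the abstract, this term is what provides the handle for capacity control.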


Keywords: First-order learning · Relational learning · Rule learning · Margins · Capacity control



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

Institut für Informatik/I12, Technische Universität München, Garching b. München, Germany
