What Kinds of Relational Features Are Useful for Statistical Learning?

  • Amrita Saha
  • Ashwin Srinivasan
  • Ganesh Ramakrishnan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7842)


A workmanlike, but nevertheless very effective, combination of statistical and relational learning uses a statistical learner to construct models with features identified (quite often, separately) by a relational learner. This form of model-building has a long history in Inductive Logic Programming (ILP), with roots in the LINUS system of the early 1990s. Additional work has been done in the field under the categories of propositionalisation and relational subgroup discovery, where a distinction has been made between elementary and non-elementary features, and statistical models have been constructed using one or the other kind of feature. More recently, constructing relational features has become an essential step in many model-building programs in the emerging area of Statistical Relational Learning (SRL). To date, not much work, theoretical or empirical, has been done on what kinds of relational features are sufficient to build good statistical models. On the face of it, the features that are needed are those that capture diverse and complex relational structure. This suggests that the feature-constructor should examine as rich a space of relational descriptions as possible. One example is the space of all possible features in first-order logic, given the constraints of the problem being addressed. In practice, it may be intractable for a relational learner to search such a space effectively for features that are potentially useful to a statistical learner. In addition, the statistical learner may itself be able to capture some kinds of complex structure by combining simpler features. Based on these observations, we investigate empirically whether it is acceptable for a relational learner to examine a more restricted space of features than that actually necessary for the full statistical model.
Specifically, we consider five sets of features, partially ordered by the subset relation, bounded on top by the set F_d, the set of features corresponding to definite clauses subject to domain-specific restrictions; and bounded at the bottom by F_e, the set of “elementary” features with substantial additional constraints. Our results suggest that: (a) for relational datasets used in the ILP literature, features from F_d may not be required; and (b) models obtained with a standard statistical learner using features from these restricted subsets are comparable to the best obtained to date.
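The pipeline described above, in which a relational learner supplies Boolean features and a statistical learner then builds a model over them, can be illustrated with a minimal sketch. Everything in it is a hypothetical illustration rather than the paper's method or data: the toy molecules, the predicate names `atom` and `bond`, and the three feature definitions are all assumptions made for the example.

```python
# Toy relational data (hypothetical): each example is a set of ground facts.
examples = {
    "m1": {("atom", "a1", "c"), ("atom", "a2", "o"), ("bond", "a1", "a2", "double")},
    "m2": {("atom", "a1", "c"), ("atom", "a2", "n"), ("bond", "a1", "a2", "single")},
    "m3": {("atom", "a1", "o"), ("atom", "a2", "o"), ("bond", "a1", "a2", "double")},
}


def has_atom(element):
    """Simple feature: the example contains an atom of the given element."""
    return lambda facts: any(f[0] == "atom" and f[2] == element for f in facts)


def has_bond(kind):
    """Simple feature: the example contains a bond of the given kind."""
    return lambda facts: any(f[0] == "bond" and f[3] == kind for f in facts)


def carbon_in_double_bond(facts):
    """Richer feature: some carbon atom takes part in a double bond.

    The atom and bond literals share a variable (the atom identifier),
    which is what makes this feature relational rather than a plain
    conjunction of the two simple features.
    """
    carbons = {f[1] for f in facts if f[0] == "atom" and f[2] == "c"}
    return any(
        f[0] == "bond" and f[3] == "double" and (f[1] in carbons or f[2] in carbons)
        for f in facts
    )


# Hypothetical feature set handed over by the relational learner.
features = {
    "has_c": has_atom("c"),
    "has_double": has_bond("double"),
    "c_double": carbon_in_double_bond,
}


def propositionalise(examples, features):
    """Turn each relational example into a 0/1 vector, one column per feature."""
    names = sorted(features)
    table = {
        ex: [int(features[name](facts)) for name in names]
        for ex, facts in examples.items()
    }
    return names, table


names, table = propositionalise(examples, features)
for ex in sorted(table):
    print(ex, dict(zip(names, table[ex])))
```

Each row of `table` is an ordinary attribute-value vector that could be passed to any standard statistical learner. On this toy data the product of the `has_c` and `has_double` columns happens to equal the `c_double` column, which gives a flavour of how a statistical learner might recover some structure from simpler features; that coincidence does not hold in general, for instance when the double bond lies between two non-carbon atoms of a molecule that also contains carbon.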


Predictive Accuracy; Feature Class; Relational Feature; Inductive Logic Programming; Subset Relation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Amrita Saha (1)
  • Ashwin Srinivasan (2)
  • Ganesh Ramakrishnan (1)
  1. Department of Computer Science and Engineering, Indian Institute of Technology Bombay, India
  2. Department of Computer Science, Indraprastha Institute of Technology, New Delhi, India
