What Kinds of Relational Features Are Useful for Statistical Learning?
A workmanlike, but nevertheless very effective, combination of statistical and relational learning uses a statistical learner to construct models from features identified (quite often, separately) by a relational learner. This form of model-building has a long history in Inductive Logic Programming (ILP), with roots in the early 1990s in the LINUS system. Further work has been done in the field under the headings of propositionalisation and relational subgroup discovery, where a distinction has been drawn between elementary and non-elementary features, and statistical models have been constructed using one or the other kind of feature. More recently, constructing relational features has become an essential step in many model-building programs in the emerging area of Statistical Relational Learning (SRL). To date, little work, theoretical or empirical, has examined what kinds of relational features are sufficient to build good statistical models. On the face of it, the features needed are those that capture diverse and complex relational structure. This suggests that the feature-constructor should examine as rich a space of relational descriptions as possible; one example is the space of all possible features in first-order logic, given the constraints of the problem being addressed. In practice, it may be intractable for a relational learner to search such a space effectively for features that are potentially useful to a statistical learner. Moreover, the statistical learner may itself be able to capture some kinds of complex structure by combining simpler features. Based on these observations, we investigate empirically whether it is acceptable for a relational learner to examine a more restricted space of features than is actually necessary for the full statistical model.
Specifically, we consider five sets of features, partially ordered by the subset relation, bounded above by F_d, the set of features corresponding to definite clauses subject to domain-specific restrictions, and bounded below by F_e, the set of "elementary" features with substantial additional constraints. Our results suggest that: (a) for relational datasets used in the ILP literature, features from F_d may not be required; and (b) models obtained by a standard statistical learner using features from the more restricted subsets are comparable to the best obtained to date.
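The pipeline the abstract describes can be illustrated with a minimal sketch of propositionalisation: each relational feature is a boolean test over a structured example, and the tests become columns of an ordinary feature table for a statistical learner. All names below (Molecule, the f1/f2/f3 features, the toy data) are hypothetical illustrations, not the paper's actual feature classes or datasets.

```python
# Hypothetical sketch of propositionalisation. Each relational feature is a
# boolean function of a structured example; evaluating all features over all
# examples yields a 0/1 table that any standard statistical learner can use.
from dataclasses import dataclass, field

@dataclass
class Molecule:
    atoms: set                      # atom identifiers, e.g. "c1"
    bonds: set                      # (atom, atom) pairs
    charges: dict = field(default_factory=dict)  # atom -> partial charge

# Two "elementary"-style features: each tests a single relational condition.
def f1_has_carbon(m):
    return any(a.startswith("c") for a in m.atoms)

def f2_has_high_charge(m):
    return any(q > 0.2 for q in m.charges.values())

# A richer, conjunctive feature, as a longer definite-clause body might express.
# A statistical learner given only f1 and f2 may recover this conjunction
# itself (e.g. via an interaction term or a tree split).
def f3_carbon_and_high_charge(m):
    return f1_has_carbon(m) and f2_has_high_charge(m)

def propositionalise(examples, features):
    """Evaluate each boolean relational feature on each example."""
    return [[int(f(e)) for f in features] for e in examples]

mols = [
    Molecule({"c1", "o1"}, {("c1", "o1")}, {"c1": 0.3, "o1": -0.3}),
    Molecule({"n1"}, set(), {"n1": 0.1}),
]

table = propositionalise(mols, [f1_has_carbon, f2_has_high_charge])
print(table)  # [[1, 1], [0, 0]]
```

The empirical question in the paper then amounts to whether restricting the feature-constructor to the simpler columns (like f1 and f2) loses anything, given that the downstream statistical model can combine them.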
Keywords: Predictive Accuracy · Feature Class · Relational Feature · Inductive Logic Programming · Subset Relation