Propositionalisation of Multi-instance Data Using Random Forests
Multi-instance learning is a generalisation of attribute-value learning where examples for learning consist of labelled bags (i.e. multi-sets) of instances. This learning setting is more computationally challenging than attribute-value learning and is a natural fit for important application areas of machine learning such as classification of molecules and image classification. One approach to solving multi-instance learning problems is propositionalisation, where bags of data are converted into vectors of attribute-value pairs so that a standard propositional (i.e. attribute-value) learning algorithm can be applied. This approach is attractive because the large number of propositional learning algorithms that have been developed can then be applied to the propositionalised data. In this paper, we empirically investigate a variant of an existing propositionalisation method called TLC. TLC uses a single decision tree to obtain propositionalised data. Our variant applies a random forest instead and is motivated by the potential increase in robustness that this may yield. We present results on synthetic data and on real-world data from the above two application domains, showing that the random forest variant indeed yields increased classification accuracy when boosting and support vector machines are applied to classify the propositionalised data.
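To make the idea of tree-based propositionalisation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each leaf region of each tree becomes one attribute whose value is the number of instances in the bag that fall into that region; the abstract does not spell out this encoding. The trees here use randomly chosen axis-aligned splits as a stand-in for trained trees, and the names `build_random_tree`, `route`, `leaf_paths` and `propositionalise` are hypothetical.

```python
import random
from collections import Counter


def build_random_tree(instances, depth, rng):
    """Grow a tree of random axis-aligned splits.

    Assumption: random splits stand in for a tree trained on labelled
    instances. A leaf is represented by None.
    """
    if depth == 0 or len(instances) < 2:
        return None
    feature = rng.randrange(len(instances[0]))
    threshold = rng.choice(instances)[feature]
    left = [x for x in instances if x[feature] <= threshold]
    right = [x for x in instances if x[feature] > threshold]
    if not left or not right:
        return None  # split separates nothing, so stop here
    return (feature, threshold,
            build_random_tree(left, depth - 1, rng),
            build_random_tree(right, depth - 1, rng))


def leaf_paths(tree, prefix=""):
    """List every leaf of a tree as its root-to-leaf path of 'L'/'R' moves."""
    if tree is None:
        return [prefix]
    _, _, left, right = tree
    return leaf_paths(left, prefix + "L") + leaf_paths(right, prefix + "R")


def route(tree, x):
    """Return the path of the leaf that instance x falls into."""
    path = ""
    while tree is not None:
        feature, threshold, left, right = tree
        if x[feature] <= threshold:
            path, tree = path + "L", left
        else:
            path, tree = path + "R", right
    return path


def propositionalise(bag, forest):
    """Turn one bag into a fixed-length vector of per-leaf instance counts."""
    vector = []
    for tree in forest:
        counts = Counter(route(tree, x) for x in bag)
        vector.extend(counts[leaf] for leaf in leaf_paths(tree))
    return vector


# Two toy bags of two-dimensional instances (made-up values).
rng = random.Random(0)
bags = [[(0.1, 0.9), (0.2, 0.8), (0.7, 0.3)],
        [(0.6, 0.4), (0.9, 0.1)]]
all_instances = [x for bag in bags for x in bag]
forest = [build_random_tree(all_instances, depth=2, rng=rng) for _ in range(3)]
vectors = [propositionalise(bag, forest) for bag in bags]
```

Every bag, whatever its size, maps to a vector of the same length, so the resulting vectors, paired with the bag labels, could be passed to any propositional learner such as boosting or a support vector machine. Using several trees instead of one simply concatenates the count attributes from each tree.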
Keywords: Support Vector Machine, Random Forest, Linear Support Vector Machine, Instance Space, Tree Ensemble