Propositionalisation of Multi-instance Data Using Random Forests

  • Eibe Frank
  • Bernhard Pfahringer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8272)


Abstract

Multi-instance learning is a generalisation of attribute-value learning in which the examples for learning are labelled bags (i.e. multi-sets) of instances. This learning setting is computationally more challenging than attribute-value learning, but a natural fit for important application areas of machine learning such as molecule classification and image classification. One approach to solving multi-instance problems is propositionalisation, where bags of data are converted into vectors of attribute-value pairs so that a standard propositional (i.e. attribute-value) learning algorithm can be applied. This approach is attractive because of the large number of propositional learning algorithms that have been developed and can then be applied to the propositionalised data. In this paper, we empirically investigate a variant of an existing propositionalisation method called TLC. TLC uses a single decision tree to obtain propositionalised data; our variant applies a random forest instead, motivated by the potential increase in robustness that this may yield. We present results on synthetic and real-world data from the above two application domains showing that the variant indeed yields increased classification accuracy when boosting and support vector machines are applied to the propositionalised data.
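The idea described above can be sketched in a few lines. The following is a simplified illustration, not the paper's implementation (the paper builds on WEKA, and TLC derives attributes from all tree nodes, whereas this sketch counts leaf membership only): bag labels are copied down to their instances to fit a random forest, and each bag is then represented by counts of how many of its instances reach each leaf, after which any propositional learner applies. The function names and the scikit-learn toolchain are assumptions for illustration.

```python
# Hedged sketch of random-forest propositionalisation for multi-instance data.
# Assumptions: scikit-learn as the learning toolkit; leaf-membership counts as
# a stand-in for TLC's node-based attributes; function names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC


def fit_forest(bags, labels, n_trees=20, seed=0):
    """Fit a forest at the instance level, propagating each bag's label."""
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), lab) for b, lab in zip(bags, labels)])
    return RandomForestClassifier(n_estimators=n_trees, random_state=seed).fit(X, y)


def propositionalise(bags, forest):
    """Map each bag to a vector of per-(tree, leaf) instance counts."""
    sizes = [est.tree_.node_count for est in forest.estimators_]
    offsets = np.concatenate([[0], np.cumsum(sizes)[:-1]])
    feats = np.zeros((len(bags), sum(sizes)))
    for i, bag in enumerate(bags):
        # apply() returns one leaf id per instance per tree.
        leaves = forest.apply(np.asarray(bag))
        for t in range(leaves.shape[1]):
            np.add.at(feats[i], offsets[t] + leaves[:, t], 1)
    return feats


# Toy demo: bags of 3-d points; positive bags are shifted away from negatives.
rng = np.random.RandomState(1)
bags = [rng.randn(rng.randint(3, 8), 3) + (2 if i % 2 else -2) for i in range(30)]
labels = [i % 2 for i in range(30)]
forest = fit_forest(bags, labels)
feats = propositionalise(bags, forest)
svm = LinearSVC().fit(feats, labels)  # any propositional learner now applies
```

Because every instance reaches exactly one leaf per tree, each bag's feature vector sums to its bag size times the number of trees, so bag size information is implicitly preserved.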


Keywords (machine-generated): Support Vector Machine, Random Forest, Linear Support Vector Machine, Instance Space, Tree Ensemble





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Eibe Frank, Department of Computer Science, University of Waikato, New Zealand
  • Bernhard Pfahringer, Department of Computer Science, University of Waikato, New Zealand
