
Multiple Instance Learning with Bag-Level Randomized Trees

  • Tomáš Komárek
  • Petr Somol
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)

Abstract

Knowledge discovery in databases with a flexible structure poses a great challenge to the machine learning community. Multiple Instance Learning (MIL) aims at learning from samples (called bags) represented by multiple feature vectors (called instances), as opposed to the single feature vectors characteristic of the traditional data representation. This relaxation turns out to be useful in formulating many machine learning problems, including classification of molecules, cancer detection from tissue images, and identification of malicious network communications. However, despite recent progress in this area, the current set of MIL tools still tends to be application-specific and/or burdened with many tuning parameters and processing steps. In this paper, we propose a simple yet effective tree-based algorithm for solving MIL classification problems. An empirical evaluation against 28 classifiers on 29 publicly available benchmark datasets shows that the proposed solution performs at a high level even with its default parameter settings. Data related to this paper are available at: https://github.com/komartom/MIDatasets.jl. Code related to this paper is available at: https://github.com/komartom/BLRT.jl.
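
To make the bag-of-instances representation concrete, below is a minimal sketch in Julia (the language of the linked repositories) of a bag-level decision-tree split: each bag is routed through the tree as a whole, based on a statistic aggregated over its instances. The aggregation rule (mean) and all names are illustrative assumptions for exposition, not the BLRT algorithm itself.

```julia
using Statistics  # for mean

# Illustrative sketch only -- not the authors' BLRT algorithm.
# A bag is a matrix whose rows are instances (feature vectors).
# A bag-level split thresholds an aggregate of one feature over
# all instances in the bag (the mean is an assumption here).
struct BagSplit
    feature::Int        # feature column to aggregate
    threshold::Float64  # split threshold on the aggregated value
end

# Route an entire bag to the left or right child of a tree node.
goes_left(s::BagSplit, bag::AbstractMatrix) =
    mean(view(bag, :, s.feature)) <= s.threshold

# Example: a dataset is a vector of bags with varying instance counts.
bags = [randn(5, 3), randn(2, 3), randn(8, 3)]  # 3 bags, 3 features each
s = BagSplit(2, 0.0)
left = [b for b in bags if goes_left(s, b)]
```

Note how the split operates on bags rather than individual instances, so bags of different sizes can be handled uniformly without flattening the data into a single-vector representation.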

Keywords

Multiple Instance Learning · Randomized trees · Classification


Acknowledgments

This research has been supported by the Grant Agency of the Czech Technical University in Prague, grant No. SGS16/235/OHK3/3T/13.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Faculty of Electrical Engineering, Czech Technical University in Prague, Prague 6, Czech Republic
  2. Institute of Information Theory and Automation, Czech Academy of Sciences, Prague 8, Czech Republic
  3. Faculty of Management, University of Economics, Prague, Czech Republic
