Machine Learning, Volume 107, Issue 8–10, pp. 1257–1281

Global multi-output decision trees for interaction prediction

  • Konstantinos Pliakos
  • Pierre Geurts
  • Celine Vens
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2018 Journal Track


Interaction data are characterized by two sets of objects, each described by its own set of features. They are often modeled as networks, and the values of interest are the possible interactions between pairs of instances, one from each set, usually represented as a matrix. Here, a novel global decision tree learning method is proposed in which multi-output decision trees are constructed over the global interaction setting, treating interaction prediction as a multi-label classification task. More specifically, the tree is built by splitting the interaction matrix both row-wise and column-wise, thereby incorporating the features of both object sets in the learning procedure. Experiments are conducted on several heterogeneous interaction datasets from the biomedical domain. The results indicate that the proposed method outperforms other decision tree approaches in terms of predictive accuracy, model size and computational efficiency. Performance is further boosted by fully exploiting the multi-output structure of the model. We conclude that the proposed method should be considered for interaction prediction tasks, especially where interpretable models are desired.
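The abstract only outlines how a single tree can split the interaction matrix along both axes. The following is a minimal Python sketch of that idea under stated assumptions; it is not the authors' implementation. Assumed here: the split criterion is a multi-output variance reduction (for a row-wise test the columns act as outputs, for a column-wise test the rows do, i.e. the submatrix is effectively transposed), thresholds are midpoints between consecutive feature values, and a leaf predicts the mean interaction value of its submatrix. All names (`impurity`, `best_split`, `build`, `predict`, `Node`) and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def impurity(Y: Matrix, rows: List[int], cols: List[int], outputs_are_columns: bool) -> float:
    """Squared deviations from each output's mean, summed over outputs (assumed criterion).

    Row-wise split: columns are the outputs and rows are the samples.
    Column-wise split: the roles are swapped (the submatrix is transposed).
    """
    if outputs_are_columns:
        groups = [[Y[r][c] for r in rows] for c in cols]
    else:
        groups = [[Y[r][c] for c in cols] for r in rows]
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((v - mean) ** 2 for v in g)
    return total


def best_split(Y, Xr, Xc, rows, cols) -> Optional[Tuple[float, str, int, float]]:
    """Try tests on row-object features (row-wise) and column-object features (column-wise)."""
    best = None  # (gain, axis, feature, threshold)
    for axis, X, idx in (("row", Xr, rows), ("col", Xc, cols)):
        split_rows = axis == "row"
        parent = impurity(Y, rows, cols, outputs_are_columns=split_rows)
        for f in range(len(X[idx[0]])):
            values = sorted({X[i][f] for i in idx})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2  # midpoint between consecutive distinct values
                left = [i for i in idx if X[i][f] <= t]
                right = [i for i in idx if X[i][f] > t]
                if not left or not right:
                    continue  # guard against degenerate floating-point midpoints
                if split_rows:
                    child = impurity(Y, left, cols, True) + impurity(Y, right, cols, True)
                else:
                    child = impurity(Y, rows, left, False) + impurity(Y, rows, right, False)
                gain = parent - child
                if best is None or gain > best[0]:
                    best = (gain, axis, f, t)
    return best


@dataclass
class Node:
    value: float                    # mean interaction value of the node's submatrix
    axis: Optional[str] = None      # "row" or "col" for internal nodes, None for leaves
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(Y, Xr, Xc, rows, cols, min_gain=1e-12) -> Node:
    """Recursively partition the (rows x cols) submatrix until no test reduces impurity."""
    cells = [Y[r][c] for r in rows for c in cols]
    node = Node(value=sum(cells) / len(cells))
    split = best_split(Y, Xr, Xc, rows, cols)
    if split is None or split[0] <= min_gain:
        return node  # leaf
    _, node.axis, node.feature, node.threshold = split
    if node.axis == "row":
        left = [r for r in rows if Xr[r][node.feature] <= node.threshold]
        right = [r for r in rows if Xr[r][node.feature] > node.threshold]
        node.left = build(Y, Xr, Xc, left, cols, min_gain)
        node.right = build(Y, Xr, Xc, right, cols, min_gain)
    else:
        left = [c for c in cols if Xc[c][node.feature] <= node.threshold]
        right = [c for c in cols if Xc[c][node.feature] > node.threshold]
        node.left = build(Y, Xr, Xc, rows, left, min_gain)
        node.right = build(Y, Xr, Xc, rows, right, min_gain)
    return node


def predict(node: Node, x_row: List[float], x_col: List[float]) -> float:
    """Route a (row object, column object) pair: row tests read x_row, column tests read x_col."""
    while node.axis is not None:
        x = x_row if node.axis == "row" else x_col
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Hypothetical toy data: two groups of row objects, each interacting with one group of column objects.
Y = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
Xr = [[0.1], [0.2], [0.8], [0.9]]   # one feature per row object
Xc = [[1.0], [2.0], [7.0], [8.0]]   # one feature per column object
tree = build(Y, Xr, Xc, rows=list(range(4)), cols=list(range(4)))
print(predict(tree, [0.15], [1.5]))  # 1.0
print(predict(tree, [0.85], [1.5]))  # 0.0
```

On this block-structured toy matrix the root ends up as a row-wise test and its children as column-wise tests, so each leaf corresponds to one bicluster of the interaction matrix. With 0/1 labels, per-output variance is proportional to the Gini index, which is why a variance criterion is a plausible stand-in for the multi-label classification setting described above; the paper's exact criterion and leaf prototypes may differ.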


Decision tree; Interaction data; Heterogeneous networks; Multi-output learning

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Department of Public Health and Primary Care, KU Leuven Campus KULAK, Kortrijk, Belgium
  2. Department of Electrical Engineering and Computer Science, Montefiore Institute, University of Liège, Liège, Belgium