Data Mining and Knowledge Discovery, Volume 30, Issue 5, pp 1370–1394

Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models

  • Michiel Stock
  • Krzysztof Dembczyński
  • Bernard De Baets
  • Willem Waegeman


Many complex multi-target prediction problems that concern large target spaces are characterised by a need for efficient prediction strategies that avoid explicitly computing predictions for all targets. Examples of such problems emerge in several subfields of machine learning, such as collaborative filtering, multi-label classification, dyadic prediction and biological network inference. In this article we analyse efficient and exact algorithms for computing the top-K predictions in the above problem settings, using a general class of models that we refer to as separable linear relational models. We show how to use those inference algorithms, which are modifications of well-known information retrieval methods, in a variety of machine learning settings. Furthermore, we study the possibility of scoring items incompletely, while still retaining an exact top-K retrieval. Experimental results in several application domains reveal that the so-called threshold algorithm is very scalable, often performing many orders of magnitude more efficiently than the naive approach.
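To make the idea of exact top-K retrieval without scoring every target concrete, the following is a minimal sketch of a threshold-style algorithm. It is not the authors' implementation. It assumes that the score of a target y for a query is a sum of per-feature products, s(y) = Σ_r u_r · t_r(y), which is one way to read "separable linear". The names `build_sorted_lists`, `top_k_threshold`, `u`, `T` and `order` are hypothetical and chosen only for illustration.

```python
import heapq
import numpy as np


def build_sorted_lists(T):
    """Offline step: for every feature r, target indices sorted by decreasing T[:, r]."""
    return np.argsort(-T, axis=0, kind="stable")


def top_k_threshold(u, T, order, k):
    """Return (top, n_scored): the k best (score, target) pairs, best first,
    and the number of targets whose full score had to be computed."""
    n_targets, n_features = T.shape
    k = min(k, n_targets)
    if k <= 0:
        return [], 0
    # Features with zero weight cannot change any score, so they are skipped.
    active = [r for r in range(n_features) if u[r] != 0]
    if not active:  # every score is zero, so any k targets are a valid answer
        return [(0.0, j) for j in range(k)], 0

    heap = []     # min-heap holding the current k best (score, target) pairs
    seen = set()  # targets that have already been scored
    for depth in range(n_targets):
        threshold = 0.0
        for r in active:
            # Walk each list from its most favourable end: largest values first
            # for a positive weight, smallest values first for a negative one.
            pos = depth if u[r] > 0 else n_targets - 1 - depth
            j = int(order[pos, r])
            threshold += u[r] * T[j, r]
            if j in seen:
                continue
            seen.add(j)
            score = float(u @ T[j])  # random access: full score of target j
            if len(heap) < k:
                heapq.heappush(heap, (score, j))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, j))
        # The threshold bounds the score of every target not yet seen, so the
        # search can stop once the k-th best score reaches it.
        if len(heap) == k and heap[0][0] >= threshold:
            break
    return sorted(heap, reverse=True), len(seen)


# Illustrative usage on synthetic data (assumed shapes: 1000 targets, 8 features).
rng = np.random.default_rng(0)
T = rng.normal(size=(1000, 8))
u = rng.normal(size=8)
order = build_sorted_lists(T)
top, n_scored = top_k_threshold(u, T, order, k=5)
print(top, n_scored)
```

In this sketch the sorted lists are built once and reused for every query, so the per-query cost depends on how deep the lists must be traversed before the stopping condition holds. The value `n_scored` reports how many targets were fully scored, which can be compared with the total number of targets that a naive scan would score.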


Keywords: Top-K retrieval; Exact inference; Precision at K; Multi-target prediction



Part of this work was carried out using the Stevin Supercomputer Infrastructure at Ghent University, funded by Ghent University, the Hercules Foundation and the Flemish Government - department EWI. K. Dembczyński has been supported by the Polish National Science Centre under grant no. 2013/09/D/ST6/03917.



Copyright information

© The Author(s) 2016

Authors and Affiliations

  • Michiel Stock (1)
  • Krzysztof Dembczyński (2)
  • Bernard De Baets (1)
  • Willem Waegeman (1)

  1. KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium
  2. Institute of Computing Science, Poznan University of Technology, Poznan, Poland
