A Two-Step Learning Approach for Solving Full and Almost Full Cold Start Problems in Dyadic Prediction

  • Tapio Pahikkala
  • Michiel Stock
  • Antti Airola
  • Tero Aittokallio
  • Bernard De Baets
  • Willem Waegeman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8725)


Dyadic prediction methods operate on pairs of objects (dyads), aiming to infer labels for out-of-sample dyads. We consider the full and almost full cold start problems in dyadic prediction, settings that occur when neither object of an out-of-sample dyad has been observed during training, or when one of them has been observed only very few times. A popular approach to this problem is to train a model that makes predictions from a pairwise feature representation of the dyads or, in the case of kernel methods, from a tensor product pairwise kernel. As an alternative to such a kernel approach, we introduce a novel two-step learning algorithm that borrows ideas from the fields of pairwise learning and spectral filtering. We show theoretically that the two-step method is very closely related to the tensor product kernel approach, and experimentally that it yields slightly better predictive performance. Moreover, unlike existing tensor product kernel methods, the two-step method admits closed-form solutions for training and for parameter selection via cross-validation estimates, both in the full and almost full cold start settings, making the approach considerably more efficient and straightforward to implement.
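The two-step idea can be sketched concretely: run kernel ridge regression over one object domain, then over the other. With symmetric kernel matrices for the two object sets and a label matrix over the observed dyads, both steps collapse into a single closed-form dual solution thanks to the separable structure. The snippet below is a minimal illustration of this scheme, not the authors' implementation; the variable names, regularization values, and synthetic data are all made up for the example.

```python
import numpy as np

def two_step_krr(K, G, Y, lam1=1.0, lam2=1.0):
    """Two-step kernel ridge regression sketch for dyadic prediction.

    K : (n, n) kernel matrix over the first set of objects (e.g. drugs)
    G : (m, m) kernel matrix over the second set (e.g. targets)
    Y : (n, m) label matrix over the observed dyads
    Returns a dual coefficient matrix A; a new dyad (u, v) with kernel
    evaluation vectors k (n,) and g (m,) is then scored as k @ A @ g,
    so fully cold-start dyads need only the two kernel vectors.
    """
    n, m = Y.shape
    # Step 1: ridge regression in the first domain: (K + lam1*I)^-1 Y.
    A = np.linalg.solve(K + lam1 * np.eye(n), Y)
    # Step 2: ridge regression in the second domain, applied columnwise;
    # G is symmetric, so this right-multiplies by (G + lam2*I)^-1.
    A = np.linalg.solve(G + lam2 * np.eye(m), A.T).T
    return A

# Tiny synthetic example: linear kernels over random features.
rng = np.random.default_rng(0)
X_u = rng.normal(size=(5, 3))   # 5 objects of the first kind
X_v = rng.normal(size=(4, 2))   # 4 objects of the second kind
K, G = X_u @ X_u.T, X_v @ X_v.T
Y = rng.normal(size=(5, 4))     # labels for all observed dyads
A = two_step_krr(K, G, Y)
# In-sample predictions for all training dyads at once:
Y_hat = K @ A @ G
```

Because each step is an ordinary ridge regression, the standard closed-form leave-one-out shortcuts for ridge regression apply per domain, which is what makes cross-validation-based parameter selection cheap in this setting.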


Keywords: Dyadic prediction · Pairwise learning · Transfer learning · Kernel ridge regression · Kernel methods





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Tapio Pahikkala (1)
  • Michiel Stock (2)
  • Antti Airola (1)
  • Tero Aittokallio (3)
  • Bernard De Baets (2)
  • Willem Waegeman (2)
  1. University of Turku and Turku Centre for Computer Science, Turku, Finland
  2. Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium
  3. Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
