Machine Learning, Volume 97, Issue 1–2, pp. 103–127

A constrained matrix-variate Gaussian process for transposable data

Abstract

Transposable data represent interactions between two sets of entities and are typically arranged as a matrix containing the known interaction values. Additional side information may consist of feature vectors specific to the entities corresponding to the rows and/or columns of this matrix. Further information may also be available in the form of interactions or hierarchies among entities along the same mode (axis). We propose a novel approach for modeling transposable data with missing interactions given such side information. The interactions are modeled as noisy observations of a latent noise-free matrix generated from a matrix-variate Gaussian process. Constructing the row and column covariances from side information provides a flexible mechanism for specifying a priori knowledge of the row and column correlations in the data. Moreover, combining such a prior with the side information enables predictions for new rows and columns not observed in the training data. In this work, we combine the matrix-variate Gaussian process model with low-rank constraints. The constrained Gaussian process approach is applied to predicting hidden associations between genes and diseases, using a small set of observed associations together with prior covariances induced by gene-gene interaction networks and disease ontologies. The proposed approach is also applied to recommender-systems data, where the task is to predict the item ratings of users from known ratings as well as prior covariances induced by social networks. We present experimental results that highlight the performance of the constrained matrix-variate Gaussian process compared to state-of-the-art approaches in each domain.
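
To make the generative model concrete: the abstract describes an observed matrix Y treated as a noisy view of a latent matrix Z whose (column-major) vectorization is Gaussian with Kronecker covariance K_c ⊗ K_r, where K_r and K_c are row and column covariances built from side information. The sketch below is a minimal illustration of that unconstrained matrix-variate Gaussian process regression step, not the authors' implementation; the function name mvgp_predict, the toy data, and the oracle covariances are all assumptions made for this example, and the paper's nuclear-norm (low-rank) constraint on the posterior is omitted.

```python
# Minimal sketch: posterior mean of a matrix-variate GP for transposable
# data, assuming precomputed row/column covariances K_r, K_c (e.g. kernels
# induced by a gene network and a disease ontology). Illustration only;
# the paper's low-rank constraint is not reproduced here.
import numpy as np

def mvgp_predict(Y, observed, K_r, K_c, noise_var=0.1):
    """Posterior mean of the latent matrix Z, where
    vec(Z) ~ N(0, K_c kron K_r) and Y = Z + iid Gaussian noise.

    Y        : (m, n) matrix; values at unobserved entries are ignored
    observed : (m, n) boolean mask of known interactions
    K_r, K_c : (m, m) row and (n, n) column covariance matrices
    """
    m, n = Y.shape
    idx = np.flatnonzero(observed.T.ravel())   # observed positions in vec(Y), column-major
    K = np.kron(K_c, K_r)                      # full Kronecker covariance; fine for small m*n
    K_oo = K[np.ix_(idx, idx)] + noise_var * np.eye(idx.size)
    y_o = Y.T.ravel()[idx]
    alpha = np.linalg.solve(K_oo, y_o)
    z = K[:, idx] @ alpha                      # posterior mean for every entry
    return z.reshape(n, m).T                   # un-vectorize back to (m, n)

# Toy usage: recover a rank-2 matrix from ~40% observed entries.
rng = np.random.default_rng(0)
m, n = 20, 15
Z_true = rng.normal(size=(m, 2)) @ rng.normal(size=(2, n))
mask = rng.random((m, n)) < 0.4
Y = np.where(mask, Z_true + 0.1 * rng.normal(size=(m, n)), 0.0)
K_r = Z_true @ Z_true.T / n + 1e-3 * np.eye(m)  # oracle covariances, for illustration only
K_c = Z_true.T @ Z_true / m + 1e-3 * np.eye(n)
Z_hat = mvgp_predict(Y, mask, K_r, K_c)
print("RMSE on held-out entries:",
      np.sqrt(np.mean((Z_hat[~mask] - Z_true[~mask]) ** 2)))
```

Forming the dense Kronecker covariance makes this sketch cubic in the number of observed entries, so it is only practical for small matrices; scaling up would require exploiting Kronecker algebra or low-rank structure rather than the direct solve used here.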

Keywords

Constrained Bayesian inference · Gaussian process · Transposable data · Nuclear norm · Low rank


Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. Imaging Research Center, University of Texas at Austin, Austin, USA
  2. Department of Biomedical Engineering, University of Texas at Austin, Austin, USA
  3. Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, USA
