Selective Matrix Factorization for Multi-relational Data Fusion

  • Yuehui Wang
  • Guoxian Yu (corresponding author)
  • Carlotta Domeniconi
  • Jun Wang
  • Xiangliang Zhang
  • Maozu Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11446)


Data fusion solutions based on matrix factorization can account for the intrinsic structures of multi-relational data sources, but most of them either treat all sources equally or favor sparse ones, which may be irrelevant to the target task. In this paper, we introduce a Selective Matrix Factorization based Data Fusion approach (SelMFDF), which collaboratively factorizes multiple inter-relational data matrices into low-rank representation matrices of the respective object types and optimizes the weights of these data matrices. To avoid a preference for sparse data matrices, it additionally regularizes the low-rank matrices by making them approximate multiple intra-relational data matrices, whose weights it also optimizes. Both sets of weights help integrate relevant data sources automatically. Finally, it reconstructs the target relational data matrix from the optimized low-rank matrices. We applied SelMFDF to predict inter-relations (lncRNA-miRNA interactions, functional annotations of proteins) and intra-relations (protein-protein interactions). SelMFDF achieves an AUROC (area under the receiver operating characteristic curve) at least 5.88% higher, and an AUPRC (area under the precision-recall curve) at least 18.23% higher, than other related and competitive approaches. The empirical study also confirms that SelMFDF not only integrates these relational data matrices differentially, but also shows no preference toward sparse ones.
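The abstract gives no formulas, so the following is only a minimal sketch of what a weighted collective factorization objective of this kind could look like. It is not the authors' formulation. It assumes a tri-factorization R_ij ≈ G_i S_ij G_j^T for each inter-relational matrix and an approximation Θ ≈ G_i G_i^T for each intra-relational matrix. All names (`reconstruct`, `fusion_loss`, `W_inter`, `W_intra`, `Theta`) are hypothetical, and the weights are passed in as fixed numbers rather than optimized.

```python
import numpy as np


def reconstruct(G_i, S_ij, G_j):
    """Approximate an inter-relational matrix as G_i @ S_ij @ G_j.T (assumed form)."""
    return G_i @ S_ij @ G_j.T


def fusion_loss(R, G, S, W_inter, Theta, W_intra):
    """Weighted factorization loss; an illustrative form, not taken from the paper.

    R:       {(i, j): R_ij}   inter-relational matrices, shape (n_i, n_j)
    G:       {i: G_i}         low-rank representations, shape (n_i, k_i)
    S:       {(i, j): S_ij}   core matrices, shape (k_i, k_j)
    W_inter: {(i, j): float}  weight of each inter-relational matrix
    Theta:   {i: [matrix]}    intra-relational matrices, each (n_i, n_i)
    W_intra: {i: [float]}     weight of each intra-relational matrix
    """
    loss = 0.0
    # Weighted reconstruction error of every inter-relational matrix.
    for (i, j), R_ij in R.items():
        residual = R_ij - reconstruct(G[i], S[(i, j)], G[j])
        loss += W_inter[(i, j)] * np.linalg.norm(residual, "fro") ** 2
    # Weighted regularizer: G_i @ G_i.T should approximate each
    # intra-relational matrix of object type i (assumption).
    for i, mats in Theta.items():
        for w, T in zip(W_intra[i], mats):
            loss += w * np.linalg.norm(T - G[i] @ G[i].T, "fro") ** 2
    return loss


# Toy data: two object types with 6 and 5 objects, ranks 2 and 3.
rng = np.random.default_rng(0)
G = {1: rng.random((6, 2)), 2: rng.random((5, 3))}
S = {(1, 2): rng.random((2, 3))}
R = {(1, 2): reconstruct(G[1], S[(1, 2)], G[2])}  # exactly factorizable
Theta = {1: [G[1] @ G[1].T]}                      # exactly consistent with G[1]
loss = fusion_loss(R, G, S, {(1, 2): 0.7}, Theta, {1: [0.3]})
```

In this toy setup the data are built from the factors themselves, so the loss vanishes. In a selective scheme of the kind the abstract describes, the weights would be optimized together with the factors so that less relevant matrices contribute less to the fit.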


Matrix factorization, Data fusion, Multi-relational data, Association prediction



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Yuehui Wang (1)
  • Guoxian Yu (1, 3), corresponding author
  • Carlotta Domeniconi (2)
  • Jun Wang (1)
  • Xiangliang Zhang (3)
  • Maozu Guo (4)
  1. College of Computer and Information Science, Southwest University, Chongqing, China
  2. Department of Computer Science, George Mason University, Fairfax, USA
  3. King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  4. College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
