Multitask Matrix Completion for Learning Protein Interactions Across Diseases

  • Meghana Kshirsagar
  • Jaime G. Carbonell
  • Judith Klein-Seetharaman
  • Keerthiram Murugesan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9649)


Disease-causing pathogens such as viruses introduce their proteins into host cells, where they interact with the host’s proteins, enabling the virus to replicate inside the host. These interactions between pathogen and host proteins are key to understanding infectious diseases. Often, multiple diseases involve phylogenetically related or biologically similar pathogens. Here we present a multitask learning method to jointly model interactions between human proteins and three different but related viruses: Hepatitis C, Ebola virus and Influenza A. Our multitask matrix-completion-based model uses a shared low-rank structure in addition to a task-specific sparse structure to incorporate the various interactions. We obtain up to a 39% improvement in predictive performance over prior state-of-the-art models. We show how our model’s parameters can be interpreted to reveal both general and specific interaction-relevant characteristics of the viruses. Our code, data and supplement are available at:


Keywords

  • Host-pathogen protein interactions
  • Multitask learning
  • Matrix completion
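
The abstract describes a model that combines a shared low-rank structure with a task-specific sparse structure. The sketch below illustrates that general idea only and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the bilinear scoring form x^T (U V^T + G_t) y, the squared loss on observed entries, the L1 penalty handled by soft-thresholding, the plain gradient updates, the synthetic data, and all names (`fit_multitask`, `predict`, `soft_threshold`, `lam`, `lr`).

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1 (elementwise shrinkage toward zero)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def fit_multitask(tasks, d_x, d_y, rank=2, lam=0.01, lr=0.01, n_iter=200, seed=0):
    """Fit a shared low-rank matrix U V^T plus one sparse matrix G_t per task.

    tasks: list of (X, Y, rows, cols, vals), one tuple per task (e.g. per virus):
      X    (n_host, d_x)   hypothetical host-protein features
      Y    (n_path, d_y)   hypothetical pathogen-protein features
      rows, cols, vals     observed entries of that task's interaction matrix
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((d_x, rank))
    V = 0.1 * rng.standard_normal((d_y, rank))
    G = [np.zeros((d_x, d_y)) for _ in tasks]
    for _ in range(n_iter):
        grad_U = np.zeros_like(U)
        grad_V = np.zeros_like(V)
        for t, (X, Y, rows, cols, vals) in enumerate(tasks):
            W = U @ V.T + G[t]                      # shared + task-specific part
            Xo, Yo = X[rows], Y[cols]               # features of observed pairs
            pred = np.einsum("ij,jk,ik->i", Xo, W, Yo)
            resid = pred - vals
            # Gradient of 0.5 * mean(resid^2) with respect to W.
            grad_W = (Xo * resid[:, None]).T @ Yo / len(vals)
            grad_U += grad_W @ V                    # chain rule through U V^T
            grad_V += grad_W.T @ U
            # Proximal gradient step keeps the task-specific part sparse.
            G[t] = soft_threshold(G[t] - lr * grad_W, lr * lam)
        U -= lr * grad_U                            # shared factors see all tasks
        V -= lr * grad_V
    return U, V, G

def predict(X, Y, U, V, G_t):
    """Score every host/pathogen protein pair for one task."""
    return X @ (U @ V.T + G_t) @ Y.T

# Tiny synthetic demo: three tasks standing in for three related viruses.
rng = np.random.default_rng(1)
d_x, d_y = 4, 3
tasks = []
for _ in range(3):
    X = rng.standard_normal((20, d_x))
    Y = rng.standard_normal((5, d_y))
    rows = rng.integers(0, 20, size=40)
    cols = rng.integers(0, 5, size=40)
    vals = rng.integers(0, 2, size=40).astype(float)  # random 0/1 labels
    tasks.append((X, Y, rows, cols, vals))

U, V, G = fit_multitask(tasks, d_x, d_y, rank=2)
scores = predict(tasks[0][0], tasks[0][1], U, V, G[0])
```

In this sketch the factors `U` and `V` receive gradient contributions from every task, so they can only capture what the tasks have in common, while each `G[t]` is updated from its own task alone and shrunk toward zero, so it retains only the deviations that task needs.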



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Meghana Kshirsagar
    • 1
  • Jaime G. Carbonell
    • 2
  • Judith Klein-Seetharaman
    • 3
  • Keerthiram Murugesan
    • 2
  1. IBM T. J. Watson Research, Yorktown Heights, USA
  2. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
  3. Metabolic and Vascular Health, Warwick Medical School, University of Warwick, Coventry, UK
