RECOMB 2016: Research in Computational Molecular Biology, pp. 53–64
Multitask Matrix Completion for Learning Protein Interactions Across Diseases
Abstract
Disease-causing pathogens such as viruses introduce their proteins into host cells, where they interact with the host’s proteins and enable the virus to replicate inside the host. These interactions between pathogen and host proteins are key to understanding infectious diseases. Often, multiple diseases involve phylogenetically related or biologically similar pathogens. Here we present a multitask learning method to jointly model interactions between human proteins and three different but related viruses: Hepatitis C, Ebola virus, and Influenza A. Our multitask matrix-completion-based model uses a shared low-rank structure in addition to a task-specific sparse structure to incorporate the various interactions. We obtain up to a 39% improvement in predictive performance over prior state-of-the-art models. We show how our model’s parameters can be interpreted to reveal both general and specific interaction-relevant characteristics of the viruses. Our code, data, and supplement are available at: http://www.cs.cmu.edu/~mkshirsa/bsl_mtl.
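The idea of combining a shared low-rank structure with a task-specific sparse structure can be illustrated with a minimal sketch. It is not the authors' implementation: the bilinear scoring form X (U Vᵀ + S_t) Yᵀ, the squared loss on observed entries, the proximal gradient updates, and all names (`soft_threshold`, `predict`, `masked_loss`, `fit`, `rank`, `lam`, `lr`) are assumptions made for illustration. Each task is assumed to be a tuple of pathogen-protein features `X`, human-protein features `Y`, an interaction matrix `M`, and a 0/1 `mask` marking observed entries.

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1: shrink every entry toward zero."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def predict(X, Y, U, V, S):
    """Bilinear scores X (U V^T + S) Y^T for one task (assumed model form)."""
    return X @ (U @ V.T + S) @ Y.T

def masked_loss(tasks, U, V, S_list):
    """Half the squared error over observed entries, summed over all tasks."""
    total = 0.0
    for (X, Y, M, mask), S in zip(tasks, S_list):
        R = mask * (predict(X, Y, U, V, S) - M)
        total += 0.5 * np.sum(R ** 2)
    return total

def fit(tasks, rank=2, lam=0.01, lr=0.01, n_iter=300, seed=0):
    """Hypothetical trainer: gradient steps on the shared factors U, V and
    proximal (soft-thresholded) gradient steps on each task-specific S_t."""
    rng = np.random.default_rng(seed)
    d_p = tasks[0][0].shape[1]  # pathogen feature dimension
    d_h = tasks[0][1].shape[1]  # human feature dimension
    U = 0.1 * rng.standard_normal((d_p, rank))
    V = 0.1 * rng.standard_normal((d_h, rank))
    S_list = [np.zeros((d_p, d_h)) for _ in tasks]
    for _ in range(n_iter):
        grad_U = np.zeros_like(U)
        grad_V = np.zeros_like(V)
        new_S = []
        for (X, Y, M, mask), S in zip(tasks, S_list):
            R = mask * (predict(X, Y, U, V, S) - M)  # residual on observed entries
            G = X.T @ R @ Y                          # gradient w.r.t. U V^T + S
            grad_U += G @ V                          # shared factors pool all tasks
            grad_V += G.T @ U
            new_S.append(soft_threshold(S - lr * G, lr * lam))
        U = U - lr * grad_U
        V = V - lr * grad_V
        S_list = new_S
    return U, V, S_list
```

In this sketch the product U Vᵀ is shared by every task and has rank at most `rank`, while the L1 shrinkage keeps each S_t sparse so that it only captures what the shared part cannot explain for that task. The step size `lr` is an assumption as well and would need to be tuned to the scale of the features.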
Keywords
Host-pathogen protein interactions, Multitask learning, Matrix completion