Abstract
Disease-causing pathogens such as viruses introduce their proteins into host cells, where they interact with the host’s proteins and enable the virus to replicate inside the host. These interactions between pathogen and host proteins are key to understanding infectious diseases. Often, multiple diseases involve phylogenetically related or biologically similar pathogens. Here we present a multitask learning method to jointly model interactions between human proteins and three different but related viruses: Hepatitis C, Ebola virus, and Influenza A. Our multitask matrix-completion-based model uses a shared low-rank structure in addition to a task-specific sparse structure to incorporate the various interactions. We obtain up to a 39% improvement in predictive performance over prior state-of-the-art models. We show how our model’s parameters can be interpreted to reveal both general and specific interaction-relevant characteristics of the viruses. Our code, data, and supplement are available at: http://www.cs.cmu.edu/~mkshirsa/bsl_mtl.
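The idea of a shared low-rank component plus a task-specific sparse component can be illustrated with a minimal sketch. The code below is not the authors' implementation. It assumes a bilinear scoring form in which each task t has virus-protein features X_t, human-protein features Y_t, a binary interaction matrix M_t, and an observation mask, and it fits a shared matrix W (nuclear-norm penalty) and per-task matrices S_t (L1 penalty) by proximal gradient descent. All names (`fit_shared_plus_sparse`, `make_toy_tasks`, `lam_rank`, `lam_sparse`), the loss, the optimizer, and the toy data are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1 (elementwise shrinkage)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def singular_value_threshold(A, tau):
    """Proximal operator of tau * ||A||_* (shrinks singular values)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(tasks, W, S, lam_rank, lam_sparse):
    """Masked squared loss + nuclear norm of W + L1 norm of every S_t."""
    loss = 0.0
    for (X, Y, M, mask), S_t in zip(tasks, S):
        residual = mask * (X @ (W + S_t) @ Y.T - M)
        loss += 0.5 * np.sum(residual ** 2)
    nuclear = np.linalg.svd(W, compute_uv=False).sum()
    l1 = sum(np.abs(S_t).sum() for S_t in S)
    return loss + lam_rank * nuclear + lam_sparse * l1

def fit_shared_plus_sparse(tasks, lam_rank=0.1, lam_sparse=0.05, n_iter=100):
    """Hypothetical helper: tasks is a list of (X, Y, M, mask) tuples."""
    d_v, d_h = tasks[0][0].shape[1], tasks[0][1].shape[1]
    W = np.zeros((d_v, d_h))                    # shared across all tasks
    S = [np.zeros((d_v, d_h)) for _ in tasks]   # one sparse matrix per task
    # Conservative Lipschitz bound for the smooth part, so that a fixed
    # step size never increases the objective.
    lipschitz = 2.0 * sum(
        (np.linalg.norm(X, 2) * np.linalg.norm(Y, 2)) ** 2
        for X, Y, _, _ in tasks
    )
    step = 1.0 / lipschitz
    history = [objective(tasks, W, S, lam_rank, lam_sparse)]
    for _ in range(n_iter):
        # Gradient of each task's masked loss w.r.t. (W + S_t).
        grads = [
            X.T @ (mask * (X @ (W + S_t) @ Y.T - M)) @ Y
            for (X, Y, M, mask), S_t in zip(tasks, S)
        ]
        W = singular_value_threshold(W - step * sum(grads), step * lam_rank)
        S = [soft_threshold(S_t - step * g, step * lam_sparse)
             for S_t, g in zip(S, grads)]
        history.append(objective(tasks, W, S, lam_rank, lam_sparse))
    return W, S, history

def make_toy_tasks(seed=0, n_tasks=3, d_v=5, d_h=6, n_human=40):
    """Synthetic data only: three tasks with different numbers of virus proteins."""
    rng = np.random.default_rng(seed)
    W_true = rng.normal(size=(d_v, 2)) @ rng.normal(size=(2, d_h))  # rank 2
    Y = rng.normal(size=(n_human, d_h))
    tasks = []
    for t in range(n_tasks):
        X = rng.normal(size=(10 + 5 * t, d_v))
        S_true = rng.normal(size=(d_v, d_h)) * (rng.random((d_v, d_h)) < 0.1)
        M = (X @ (W_true + S_true) @ Y.T > 0).astype(float)
        mask = (rng.random(M.shape) < 0.5).astype(float)  # observed entries
        tasks.append((X, Y, M, mask))
    return tasks

if __name__ == "__main__":
    W, S, history = fit_shared_plus_sparse(make_toy_tasks())
    print("objective: %.3f -> %.3f" % (history[0], history[-1]))
    print("rank of shared W:", np.linalg.matrix_rank(W, tol=1e-6))
```

In this sketch the interaction matrices may have different numbers of rows per task, because only the feature dimensions of W and S_t need to agree. Inspecting the singular vectors of W and the nonzero entries of each S_t is one plausible way to separate shared patterns from task-specific ones.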
Notes
1. The dimensions being different does not influence the method or the optimization in any way.
2. Since we use data from several strains for each task, the PPI data contains some interactions that are interologs. Please see the supplementary Sect. S4 for details.
3. For details of these classes, please refer to the supplementary material or the original paper.
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kshirsagar, M., Carbonell, J.G., Klein-Seetharaman, J., Murugesan, K. (2016). Multitask Matrix Completion for Learning Protein Interactions Across Diseases. In: Singh, M. (ed.) Research in Computational Molecular Biology. RECOMB 2016. Lecture Notes in Computer Science, vol 9649. Springer, Cham. https://doi.org/10.1007/978-3-319-31957-5_4
DOI: https://doi.org/10.1007/978-3-319-31957-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31956-8
Online ISBN: 978-3-319-31957-5
eBook Packages: Computer Science, Computer Science (R0)