Abstract
Alternative splicing enables a gene to be spliced into different isoforms, which are closely related to diverse developmental abnormalities. Identifying isoform-disease associations helps to uncover the underlying pathology of various complex diseases and to develop precise treatments and drugs for them. Although many approaches have been proposed for predicting gene-disease associations and isoform functions, few efforts have been made toward predicting isoform-disease associations at large scale; the main bottleneck is the lack of ground-truth isoform-disease associations. To bridge this gap, we propose a multi-instance learning inspired computational approach called IDAPred, which fuses genomics and transcriptomics data for isoform-disease association prediction. Given the bag-instance relationship between a gene and its spliced isoforms, IDAPred introduces a dispatch and aggregation term to dispatch gene-disease associations to individual isoforms and, in reverse, to aggregate these dispatched associations back to the affiliated genes. Next, it fuses different genomics and transcriptomics data to replenish gene-disease associations and to induce a linear classifier for predicting isoform-disease associations in a coherent way. In addition, to alleviate the bias toward observed gene-disease associations, it adds a regularization term to differentiate the currently observed associations from the unobserved (potential) ones. Experimental results show that IDAPred significantly outperforms the related state-of-the-art methods.
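The bag-instance relationship described in the abstract can be illustrated with a minimal sketch. This is not the IDAPred formulation, which the abstract does not give. It assumes the common multi-instance convention that a gene (bag) is associated with a disease if at least one of its isoforms (instances) is, so aggregation is taken as a per-disease maximum. The function names `dispatch` and `aggregate`, the gene and isoform identifiers, and all scores are hypothetical.

```python
from typing import Dict, List


def dispatch(gene_disease: Dict[str, List[float]],
             gene_isoforms: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Copy each gene's disease labels to all of its isoforms.

    Assumption: this only gives an initial guess; a real method would
    refine the per-isoform values afterwards.
    """
    iso_labels: Dict[str, List[float]] = {}
    for gene, isoforms in gene_isoforms.items():
        for iso in isoforms:
            iso_labels[iso] = list(gene_disease[gene])
    return iso_labels


def aggregate(iso_scores: Dict[str, List[float]],
              gene_isoforms: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Aggregate isoform scores back to genes with a per-disease maximum.

    Assumption: max-aggregation (standard multi-instance convention),
    not necessarily the aggregation used by IDAPred.
    """
    gene_scores: Dict[str, List[float]] = {}
    for gene, isoforms in gene_isoforms.items():
        rows = [iso_scores[iso] for iso in isoforms]
        gene_scores[gene] = [max(col) for col in zip(*rows)]
    return gene_scores


# Hypothetical toy data: two genes, three isoforms, two diseases.
gene_isoforms = {"G1": ["G1.a", "G1.b"], "G2": ["G2.a"]}
gene_disease = {"G1": [1.0, 0.0], "G2": [0.0, 1.0]}

initial = dispatch(gene_disease, gene_isoforms)

# Hypothetical refined isoform-level scores, e.g. from a classifier.
refined = {"G1.a": [0.9, 0.1], "G1.b": [0.2, 0.0], "G2.a": [0.1, 0.7]}
gene_level = aggregate(refined, gene_isoforms)
```

In this toy example, only isoform `G1.a` carries the first disease signal, yet the aggregated score for gene `G1` still reflects it, which is the behaviour the bag-instance view is meant to capture.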
Keywords
- Isoform-disease association
- Alternative splicing
- Data fusion
- Multi-instance learning
Acknowledgements
This research is supported by NSFC (61872300), the Fundamental Research Funds for the Central Universities (XDJK2019B024 and XDJK2020B028), and the Natural Science Foundation of CQ CSTC (cstc2018jcyjAX0228).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, Q., Wang, J., Zhang, X., Yu, G. (2020). Isoform-Disease Association Prediction by Data Fusion. In: Cai, Z., Mandoiu, I., Narasimhan, G., Skums, P., Guo, X. (eds) Bioinformatics Research and Applications. ISBRA 2020. Lecture Notes in Computer Science, vol 12304. Springer, Cham. https://doi.org/10.1007/978-3-030-57821-3_5
DOI: https://doi.org/10.1007/978-3-030-57821-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57820-6
Online ISBN: 978-3-030-57821-3
eBook Packages: Computer Science, Computer Science (R0)