GC\(^2\)NMF: A Novel Matrix Factorization Framework for Gene–Phenotype Association Prediction
Gene–phenotype association prediction can reveal the inherited basis of human diseases and facilitate drug development. Gene–phenotype associations arise from complex biological processes and are influenced by various factors, such as the relationships among phenotypes and those among genes. However, because curated gene–phenotype associations are sparse and the joint effect of multiple factors has rarely been analyzed in an integrated way, existing methods are limited in both prediction accuracy and their ability to detect potential gene–phenotype associations. In this paper, we propose a novel method, Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC\(^2\)NMF), which builds on Non-negative Matrix Factorization (NMF) and exploits a weighted graph constraint learned from the hierarchical structure of phenotype data together with group prior information among genes. Specifically, we first introduce the depth of the parent–child relationship between two adjacent phenotypes in the hierarchical phenotype data as a weighted graph constraint for better phenotype understanding. Second, we use the intra-group correlation among genes in a gene group as a group constraint for gene understanding. This reflects the intuition that genes in the same group probably result in similar phenotypes. The model not only achieves high prediction performance but also learns interpretable representations of genes and phenotypes simultaneously, which facilitates future biological analysis. Experimental results on mouse and human gene–phenotype association datasets demonstrate that GC\(^2\)NMF achieves higher prediction accuracy and better interpretability for biological explanation than other state-of-the-art methods.
Keywords: NMF, Weighted graph constraint, Group centric constraint, Gene–phenotype association prediction
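The idea of constraining an NMF with a phenotype graph and gene groups can be illustrated with a small sketch. The abstract does not give the objective function, so the code below is not the GC\(^2\)NMF formulation. It is a generic dual graph-regularized NMF with multiplicative updates, used here as a stand-in. Everything named in it is an assumption made for illustration: the helpers `hierarchy_weights` and `group_weights`, the rule that a parent–child edge is weighted by the child's depth divided by the maximum depth, the use of a same-group gene graph in place of the group centric constraint, the parameters `lam` and `mu`, and the toy data.

```python
import numpy as np

def hierarchy_weights(parent):
    """Symmetric phenotype graph from a tree given as a parent array.

    parent[i] is the index of phenotype i's parent, or -1 for the root.
    Assumed weighting (not from the paper): a parent-child edge gets
    weight depth(child) / max_depth.
    """
    n = len(parent)
    depth = np.zeros(n, dtype=int)
    for i in range(n):
        j = i
        while parent[j] != -1:  # walk up to the root, counting steps
            depth[i] += 1
            j = parent[j]
    max_depth = max(depth.max(), 1)
    W = np.zeros((n, n))
    for child, par in enumerate(parent):
        if par != -1:
            W[child, par] = W[par, child] = depth[child] / max_depth
    return W

def group_weights(groups):
    """Connect genes that share a group label (assumed stand-in for the group prior)."""
    g = np.asarray(groups)
    W = (g[:, None] == g[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def graph_regularized_nmf(X, W_gene, W_pheno, k=2, lam=0.1, mu=0.1,
                          n_iter=200, seed=0, eps=1e-9):
    """Approximately minimise, subject to U >= 0 and V >= 0,
    ||X - U V^T||_F^2 + mu * Tr(U^T L_g U) + lam * Tr(V^T L_p V),
    where L = D - W is the graph Laplacian of each side.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))  # gene factors
    V = rng.random((n, k))  # phenotype factors
    D_gene = np.diag(W_gene.sum(axis=1))
    D_pheno = np.diag(W_pheno.sum(axis=1))
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        U *= (X @ V + mu * W_gene @ U) / (U @ (V.T @ V) + mu * D_gene @ U + eps)
        V *= (X.T @ U + lam * W_pheno @ V) / (V @ (U.T @ U) + lam * D_pheno @ V + eps)
    return U, V

# Toy data: 4 genes x 4 phenotypes, 1 = known association.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]], dtype=float)
parent = [-1, 0, 0, 2]   # phenotype 0 is the root; 3 is a child of 2
groups = [0, 0, 1, 1]    # genes 0,1 share a group; genes 2,3 share a group

U, V = graph_regularized_nmf(X, group_weights(groups), hierarchy_weights(parent))
scores = U @ V.T         # higher score = more plausible association
```

In this sketch the entry `scores[3, 3]` is a candidate prediction: gene 3 has no recorded link to phenotype 3, but it shares a group with gene 2 and phenotype 3 is tied to phenotype 2 in the hierarchy, so both regularizers pull its score toward that of its neighbors. How strongly they do so depends on `lam` and `mu`, which would need tuning on real data.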
This work is supported by the National Natural Science Foundation of China (nos. 61702367, 61300972) and the Research Project of Tianjin Municipal Commission of Education (no. 2017KJ033).