Inferring disease associations of the long non-coding RNAs through non-negative matrix factorization
Long non-coding RNAs (lncRNAs) have been implicated in various biological processes and are linked to many dysregulations. Over the past decade, researchers have reported a large number of human disease associations for lncRNAs, both intergenic lncRNAs (lincRNAs) and non-intergenic lncRNAs. Thanks to the next-generation sequencing platform RNA-seq, researchers have also been able to quantify the expression profile of each lncRNA in human tissue samples. In this article, we adapted the non-negative matrix factorization (NMF) method to develop a low-rank computational model that describes the existing knowledge about both non-intergenic and intergenic lncRNA-disease associations, represented as a two-dimensional association matrix, and that provides a way of ranking disease-causing lncRNAs. We proposed several NMF formulations for the problem and found that the sparsity-constrained NMF yielded the best model among them. By exploiting the inherent bi-clustering ability of the NMF models, we extracted several lncRNA groups and disease groups that possess biological significance. Moreover, we proposed an integrative NMF formulation that incorporates, along with the coding gene and lincRNA disease association data, prior knowledge about relationship networks among the coding genes and lincRNAs as well as the RNA-seq expression profile data, in order to identify potential lincRNA-coding gene co-modules. With these co-modules we further enhanced the lincRNA-disease associations and shed light on the functional chemistry of the intergenic lncRNAs. Experimental results show the superiority of our proposed method over two state-of-the-art clustering algorithms, k-means and hierarchical clustering.
Keywords: Long non-coding RNAs; Association inference; Non-negative matrix factorization; Bi-clustering; Intergenic long non-coding RNA–gene co-modules
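The abstract's central idea, factorizing a non-negative lncRNA-disease association matrix into low-rank factors and reading both rankings and bi-clusters from them, can be illustrated with a minimal sketch. It assumes the standard Lee–Seung multiplicative updates with an L1 penalty on the disease factor as a stand-in for "sparsity-constrained NMF"; this is not necessarily the authors' exact formulation. The function names (`sparse_nmf`, `bicluster_labels`, `rank_candidates`), the penalty weight `l1`, and the toy matrix are all hypothetical.

```python
import numpy as np

def sparse_nmf(A, rank, l1=0.1, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative A (lncRNAs x diseases) by W @ H.

    Assumed objective: 0.5 * ||A - W H||_F^2 + l1 * sum(H), minimised with
    multiplicative updates (illustrative, not the paper's exact model).
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # The l1 term in the denominator shrinks entries of H toward zero.
        H *= (W.T @ A) / (W.T @ W @ H + l1 + eps)
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
        # Rescale so columns of W have unit norm; W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

def bicluster_labels(W, H):
    """Assign each lncRNA (row) and disease (column) to its dominant factor."""
    return W.argmax(axis=1), H.argmax(axis=0)

def rank_candidates(W, H, A, disease):
    """Rank lncRNAs with no known link to `disease` by reconstructed score."""
    scores = W @ H[:, disease]
    candidates = np.flatnonzero(A[:, disease] == 0)
    return candidates[np.argsort(-scores[candidates])]

if __name__ == "__main__":
    # Toy association matrix: two lncRNA/disease blocks, one link held out.
    A = np.zeros((6, 6))
    A[:3, :3] = 1.0
    A[3:, 3:] = 1.0
    A[2, 0] = 0.0  # hidden association between lncRNA 2 and disease 0
    W, H = sparse_nmf(A, rank=2)
    print(bicluster_labels(W, H))
    print(rank_candidates(W, H, A, disease=0))
```

In this sketch, each latent factor pairs a group of lncRNAs (large entries in a column of W) with a group of diseases (large entries in the matching row of H), which is the sense in which NMF bi-clusters the matrix. The reconstructed scores in W @ H give a ranking for associations that are absent from the input.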
Conflict of interest
The authors declare that they have no conflict of interest.
Research involving human participants and/or animals
This research involved neither human participants nor animals. All datasets were collected from publicly accessible websites.
Not applicable to this research, since no human participants or animals were involved.