Functional Categorization of Disease Genes Based on Spectral Graph Theory and Integrated Biological Knowledge

Sreeja, A.; Krishnakumar, U.; Vinayan, K. P.

doi:10.1007/s12539-017-0279-7

Original Research Article
Published: 30 January 2018

Volume 11, pages 460–474 (2019)

Interdisciplinary Sciences: Computational Life Sciences

Abstract

The interaction of multiple genetic variants is a major challenge in developing effective treatment strategies for complex disorders. Identifying the most promising genes improves understanding of the underlying disease mechanisms, which in turn leads to better diagnostic and therapeutic predictions. Categorizing disease genes into meaningful groups also helps in analyzing correlated phenotypes, which further improves the power to detect disease-associated variants. Because experimental approaches are time-consuming and expensive, computational methods offer an accurate and efficient alternative for analyzing gene–disease associations from the vast amount of publicly available genomic information. Integrating the biological knowledge encoded in genes is necessary both for identifying significant groups of functionally similar genes and for adequately explaining, in biological terms, the patterns these clusters capture. The aim of this work is to identify gene clusters by using diverse genomic information rather than a single class of biological data in isolation, and to improve performance through efficient feature selection methods and edge pruning techniques. An optimized and streamlined procedure based on spectral clustering is proposed for the automatic detection of gene communities; it combines weighted knowledge fusion, threshold-based edge detection, and entropy-based eigenvector subset selection. The proposed approach is applied to produce communities of genes related to Autism Spectrum Disorder and is compared with standard clustering solutions.
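
The pipeline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the fusion weights, the threshold, or the entropy criterion, so everything below is an assumption made for illustration. The two toy similarity matrices (`go_sim`, `pathway_sim`), the equal weights, the threshold of 0.3, and all function names are hypothetical. The entropy-based eigenvector subset selection is left out because its criterion is not stated here; the sketch simply uses the first k eigenvectors of the symmetric normalized Laplacian, followed by a plain k-means.

```python
import numpy as np


def fuse(similarities, weights):
    """Weighted fusion: convex combination of per-source similarity matrices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * np.asarray(s, dtype=float) for wi, s in zip(w, similarities))


def prune_edges(W, threshold):
    """Keep edges whose fused similarity reaches the threshold; drop self-loops."""
    A = np.where(W >= threshold, W, 0.0)
    np.fill_diagonal(A, 0.0)
    return A


def spectral_embedding(A, k):
    """Row-normalized eigenvectors for the k smallest eigenvalues of L_sym."""
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = d[d > 0] ** -0.5  # isolated nodes keep a zero scale
    L = np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    U = vecs[:, :k]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.where(norms > 0, norms, 1.0)


def kmeans(X, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def toy_similarity(within, bridge, across):
    """Hypothetical 6-gene matrix: two groups of three genes plus one bridge edge."""
    S = np.full((6, 6), across, dtype=float)
    S[:3, :3] = within
    S[3:, 3:] = within
    S[2, 3] = S[3, 2] = bridge
    np.fill_diagonal(S, 1.0)
    return S


# Assumed stand-ins for two knowledge sources; the values are invented.
go_sim = toy_similarity(within=0.9, bridge=0.5, across=0.1)
pathway_sim = toy_similarity(within=0.7, bridge=0.3, across=0.1)

fused = fuse([go_sim, pathway_sim], weights=[0.5, 0.5])
graph = prune_edges(fused, threshold=0.3)  # weak cross-group edges are removed
labels = kmeans(spectral_embedding(graph, k=2), k=2)
print(labels)
```

On this toy input the pruning step removes the weak cross-group edges and keeps the single bridge, so the graph stays connected and the second eigenvector separates the two groups. Raising the threshold above the bridge weight would disconnect the graph, which is one reason the choice of threshold matters in practice.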

Acknowledgements

This study is supported by the Cognitive Science Research Initiative (CSRI) of the Department of Science and Technology (DST), Government of India, as part of the funded project SR/CSI/81/2011 at the Department of Computer Science, School of Arts and Sciences, Amrita University, Kochi.

Author information

Authors and Affiliations

Department of Computer Science and IT, School of Arts and Sciences, Amrita University, Kochi, Kerala, India
A. Sreeja
School of Arts and Sciences, Amrita University, Kochi, Kerala, India
U. Krishnakumar
Division of Paediatric Neurology, Department of Neurology, Amrita Institute of Medical Sciences, Amrita University, Kochi, Kerala, India
K. P. Vinayan

Corresponding author

Correspondence to A. Sreeja.

Ethics declarations

Conflict of interest

The authors state that the present manuscript presents no conflict of interest.

About this article

Cite this article

Sreeja, A., Krishnakumar, U. & Vinayan, K.P. Functional Categorization of Disease Genes Based on Spectral Graph Theory and Integrated Biological Knowledge. Interdiscip Sci Comput Life Sci 11, 460–474 (2019). https://doi.org/10.1007/s12539-017-0279-7

Received: 03 May 2017
Revised: 11 November 2017
Accepted: 15 December 2017
Published: 30 January 2018
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s12539-017-0279-7
