Functional Categorization of Disease Genes Based on Spectral Graph Theory and Integrated Biological Knowledge

  • Original Research Article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Interaction among multiple genetic variants is a major challenge in developing effective treatment strategies for complex disorders. Identifying the most promising genes improves understanding of the underlying disease mechanisms, which in turn leads to better diagnostic and therapeutic predictions. Categorizing disease genes into meaningful groups also helps in analyzing correlated phenotypes, which further improves the power to detect disease-associated variants. Because experimental approaches are time-consuming and expensive, computational methods offer an accurate and efficient alternative for analyzing gene–disease associations from the vast amount of publicly available genomic information. Integrating the biological knowledge associated with genes is necessary both for identifying significant groups of functionally similar genes and for adequately interpreting the biological patterns that these clusters capture. The aim of this work is to identify gene clusters by utilizing diverse genomic information, rather than a single class of biological data in isolation, and to improve performance through efficient feature selection and edge pruning techniques. An optimized and streamlined procedure based on spectral clustering is proposed for the automatic detection of gene communities through a combination of weighted knowledge fusion, threshold-based edge detection and entropy-based eigenvector subset selection. The proposed approach is applied to produce communities of genes related to Autism Spectrum Disorder and is compared with standard clustering solutions.
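The three ingredients named in the abstract (weighted knowledge fusion, threshold-based edge pruning and entropy-based eigenvector selection) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names, the fusion weights, the pruning threshold, the use of the symmetric normalized Laplacian, and in particular the histogram-entropy criterion, which is only a stand-in for whatever entropy measure the authors actually use.

```python
import numpy as np

def fuse_similarities(sims, weights):
    """Weighted average of same-shaped gene-gene similarity matrices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    return sum(wi * np.asarray(s, dtype=float) for wi, s in zip(w, sims))

def prune_edges(sim, threshold):
    """Drop edges weaker than `threshold` and remove self-loops."""
    adj = np.where(sim >= threshold, sim, 0.0)
    np.fill_diagonal(adj, 0.0)
    return adj

def laplacian_eigenvectors(adj, n_vectors):
    """Eigenvectors of I - D^-1/2 A D^-1/2 for the smallest eigenvalues."""
    deg = adj.sum(axis=1)
    safe = np.where(deg > 0, deg, 1.0)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return vecs[:, :n_vectors]

def histogram_entropy(values, bins=10):
    """Shannon entropy (bits) of a histogram of the entries of one vector."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def select_eigenvectors(vecs, keep):
    """Heuristic stand-in: keep the `keep` columns with the lowest entropy."""
    entropies = [histogram_entropy(vecs[:, j]) for j in range(vecs.shape[1])]
    chosen = np.sort(np.argsort(entropies, kind="stable")[:keep])
    return vecs[:, chosen]

def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def cluster_genes(sims, weights, threshold, k, candidates=None):
    """Fuse, prune, embed spectrally, select eigenvectors, then cluster."""
    adj = prune_edges(fuse_similarities(sims, weights), threshold)
    vecs = laplacian_eigenvectors(adj, candidates or k)
    emb = select_eigenvectors(vecs, k)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return kmeans(emb / norms, k)

if __name__ == "__main__":
    # Toy data: six hypothetical genes in two groups, two knowledge sources.
    block = np.kron(np.eye(2), np.ones((3, 3)))
    go_sim = 0.1 + 0.7 * block       # hypothetical ontology-based similarity
    pathway_sim = 0.2 + 0.4 * block  # hypothetical pathway-based similarity
    print(cluster_genes([go_sim, pathway_sim], [0.5, 0.5], threshold=0.4, k=2))
```

In this toy setting the threshold removes every cross-group edge, so the two groups become separate components of the graph. With real similarity data the weights, the threshold and the number of candidate eigenvectors would need to be tuned, and the entropy criterion should be replaced by the one defined in the full text.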

Acknowledgements

This study is supported by the Cognitive Science Research Initiative (CSRI) of the Department of Science and Technology (DST), Government of India, as part of the funded project SR/CSI/81/2011 at the Department of Computer Science, School of Arts and Sciences, Amrita University, Kochi.

Author information

Corresponding author

Correspondence to A. Sreeja.

Ethics declarations

Conflict of interest

The author states that the present manuscript presents no conflict of interest.

About this article

Cite this article

Sreeja, A., Krishnakumar, U. & Vinayan, K.P. Functional Categorization of Disease Genes Based on Spectral Graph Theory and Integrated Biological Knowledge. Interdiscip Sci Comput Life Sci 11, 460–474 (2019). https://doi.org/10.1007/s12539-017-0279-7

  • DOI: https://doi.org/10.1007/s12539-017-0279-7
