Identification of new disease genes from protein–protein interaction network

  • M. Mohamed Divan Masood
  • D. Manjula
  • Vijayan Sugumaran
Original Research


Human beings are susceptible to diseases with a variety of causes. Causal factors include microorganisms (viruses and bacteria), environmental factors and, to a large extent, genetics. Because most diseases are linked to the genes of the affected organism, analyzing genetic characteristics is necessary to treat a disease effectively. In this research, the genetic analysis is carried out on a protein–protein interaction (PPI) network, which records how each gene of a particular organism interacts with its other genes. New genes for a particular disease can be discovered by applying a shortest path algorithm to the PPI network, on the assumption that genes related to the same disease lie close to one another in the network. The genes retrieved by the shortest path algorithm are therefore those most closely related to the known genes of the disease. Irrelevant genes are then filtered out of these candidate genes, and the remaining genes are taken as novel genes.
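The shortest-path idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the gene names, the unweighted breadth-first search and the `min_paths` filter are all assumptions made for illustration, since the abstract does not specify the edge weighting or the filtering criteria.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Return one shortest path (fewest edges) from source to target, or None."""
    if source not in graph or target not in graph:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):  # sorted for reproducible output
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def candidate_genes(graph, disease_genes, min_paths=1):
    """Count how often each non-disease gene lies inside a shortest path
    between two known disease genes, and keep genes seen on at least
    min_paths paths (a hypothetical stand-in for the filtering step)."""
    counts = {}
    for a, b in combinations(sorted(disease_genes), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue
        for gene in path[1:-1]:  # interior nodes only
            if gene not in disease_genes:
                counts[gene] = counts.get(gene, 0) + 1
    return {gene: n for gene, n in counts.items() if n >= min_paths}


# Toy undirected PPI network; all gene names are made up.
edges = [("A", "X"), ("X", "B"), ("B", "Y"), ("Y", "C"),
         ("A", "Z"), ("Z", "W"), ("W", "C")]
ppi = {}
for u, v in edges:
    ppi.setdefault(u, set()).add(v)
    ppi.setdefault(v, set()).add(u)

known = {"A", "B", "C"}  # hypothetical known disease genes
candidates = candidate_genes(ppi, known, min_paths=1)
print(candidates)
```

In a real PPI network the edges usually carry confidence scores, so a weighted shortest path algorithm such as Dijkstra's would be the natural replacement for the breadth-first search used here.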


Candidate genes; Genetics; Novel genes; Protein–protein interaction



The work by V. Sugumaran has been supported by a 2018 School of Business Administration Spring/Summer Research Fellowship from Oakland University.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Anna University, Chennai, India
  2. Center for Data Science and Big Data Analytics, Department of Decision and Information Sciences, School of Business Administration, Oakland University, Rochester, USA
