
Combining Expression Data and Knowledge Ontology for Gene Clustering and Network Reconstruction

Cognitive Computation

Abstract

Many methods have been developed for reverse engineering gene networks from time-series expression data. However, as the number of genes and the complexity of regulation grow, inferring gene networks becomes increasingly difficult. To tackle this scalability problem, this study presents a two-phase approach: gene clustering followed by network reconstruction. For gene clustering, a hybrid data- and knowledge-based method was developed that measures both the expression-data similarity and the semantic (ontology-based) similarity between genes. For network reconstruction, a Boolean network model was inferred over the resulting gene clusters to represent the network. A series of experiments was conducted to investigate the effect of the hybrid similarity measure on gene clustering and network reconstruction. The results demonstrate the feasibility and effectiveness of the proposed approach.
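The abstract does not give the exact similarity formula, clustering algorithm, or Boolean inference procedure, so the following is only a minimal sketch of the two-phase idea under stated assumptions. It assumes the hybrid similarity is a weighted sum (hypothetical weight `w`) of a correlation-based data term and a precomputed ontology-based semantic term; it swaps in a simple threshold single-linkage grouping (hypothetical threshold `tau`) for the clustering step; and it swaps in a single-input best-fit search over binarized cluster profiles for the Boolean network inference. All function names, gene names, semantic scores, and expression values are hypothetical and chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def hybrid_similarity(expr, sem, w=0.5):
    """Assumed hybrid measure: w * data similarity + (1 - w) * semantic similarity.

    expr: gene -> expression time series.
    sem:  (gene, gene) -> precomputed semantic similarity in [0, 1]
          (assumed to come from an ontology-based measure; missing pairs count as 0).
    """
    sim = {}
    for g in expr:
        for h in expr:
            data = (pearson(expr[g], expr[h]) + 1) / 2  # map r in [-1, 1] onto [0, 1]
            know = 1.0 if g == h else sem.get((g, h), sem.get((h, g), 0.0))
            sim[g, h] = w * data + (1 - w) * know
    return sim

def cluster_genes(genes, sim, tau):
    """Single-linkage grouping: genes join if a chain of pairs has sim >= tau."""
    parent = {g: g for g in genes}
    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g
    for g in genes:
        for h in genes:
            if g < h and sim[g, h] >= tau:
                parent[find(g)] = find(h)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in groups.values())

def binarize(profile):
    """1 where the value is above the profile's own mean, else 0."""
    m = sum(profile) / len(profile)
    return [1 if v > m else 0 for v in profile]

def infer_single_input_rules(states):
    """For each node, pick the (regulator, function) that best predicts its next state.

    Only one-input functions (identity / negation) are searched; ties keep the first
    candidate in sorted order. Returns node -> (regulator, "id" or "not", mismatches).
    """
    funcs = [("id", lambda v: v), ("not", lambda v: 1 - v)]
    steps = len(next(iter(states.values()))) - 1
    rules = {}
    for target in sorted(states):
        best = None
        for reg in sorted(states):
            for name, f in funcs:
                err = sum(f(states[reg][k]) != states[target][k + 1] for k in range(steps))
                if best is None or err < best[2]:
                    best = (reg, name, err)
        rules[target] = best
    return rules

# Hypothetical toy data: five time points, three genes, made-up semantic scores.
expr = {
    "a1": [0.1, 0.2, 0.9, 0.8, 0.1],
    "a2": [0.2, 0.1, 0.8, 0.9, 0.2],
    "b1": [0.1, 0.9, 0.8, 0.2, 0.1],
}
sem = {("a1", "a2"): 0.8, ("a1", "b1"): 0.1, ("a2", "b1"): 0.2}

clusters = cluster_genes(sorted(expr), hybrid_similarity(expr, sem, w=0.5), tau=0.6)
# Cluster-level profile = mean of member genes at each time point, then binarized.
states = {
    f"C{i}": binarize([sum(expr[g][t] for g in c) / len(c) for t in range(5)])
    for i, c in enumerate(clusters)
}
rules = infer_single_input_rules(states)
print(clusters)  # [['a1', 'a2'], ['b1']]
print(rules)     # {'C0': ('C1', 'id', 0), 'C1': ('C0', 'not', 0)}
```

With the knowledge term switched off (`w=1.0`), the same threshold merges all three genes into one cluster, because `a1` and `b1` have a data similarity of about 0.66 on their own; the low semantic score is what keeps them apart here. The inferred rules read as C0(t+1) = C1(t) and C1(t+1) = NOT C0(t), a two-node negative feedback loop over clusters rather than over individual genes, which is the scalability gain the two-phase design aims for.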





Author information


Correspondence to Wei-Po Lee.


About this article


Cite this article

Lee, WP., Lin, CH. Combining Expression Data and Knowledge Ontology for Gene Clustering and Network Reconstruction. Cogn Comput 8, 217–227 (2016). https://doi.org/10.1007/s12559-015-9349-5


  • DOI: https://doi.org/10.1007/s12559-015-9349-5
