COSCEB: Comprehensive search for column-coherent evolution biclusters and its application to hub gene identification

Journal of Biosciences

Abstract

Biclustering is an increasingly used data mining technique for finding groups of genes that are co-expressed across a subset of experimental conditions in gene-expression data. Such a group of co-expressed genes, which may follow one of several patterns, is called a bicluster. Biclusters provide significant insights into gene function and play an important role in clinical applications such as drug discovery, biomarker discovery, gene network analysis, gene identification, disease diagnosis and pathway analysis. This paper presents a novel unsupervised approach, ‘COmprehensive Search for Column-Coherent Evolution Biclusters (COSCEB)’, for a comprehensive search of biologically significant column-coherent evolution biclusters. It identifies significant biclusters by extracting a column subspace from each gene pair and applying the Longest Common Contiguous Subsequence (LCCS). Experiments were performed on both synthetic and real datasets. The performance of COSCEB is evaluated on the following key issues: comprehensive search, Deep OPSM bicluster, bicluster types, bicluster accuracy, bicluster size, noise, overlapping, output nature, computational complexity and biologically significant biclusters. COSCEB is compared with six well-known biclustering algorithms: SAMBA, OPSM, xMotif, Bimax, Deep OPSM and UniBic. The results show that the proposed approach performs effectively on most of these issues and extracts all possible biologically significant column-coherent evolution biclusters, far more than the other biclustering algorithms. A case study is also presented that shows the application of significant biclusters to hub gene identification.
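The abstract names the Longest Common Contiguous Subsequence (LCCS) but does not spell out how COSCEB builds the sequences it compares. The Python sketch below is therefore only an illustration of the general idea, not the authors' implementation. It assumes, hypothetically, that each gene is represented by its column indices ranked by increasing expression value, and it finds the longest run of columns that appears contiguously in the rankings of two genes. The helper names `column_order` and `lccs` and the toy expression values are invented for this example.

```python
from typing import List, Sequence


def column_order(expression: Sequence[float]) -> List[int]:
    """Rank column indices by increasing expression value.

    Assumption for illustration only: the abstract does not state how
    COSCEB derives the per-gene sequences that LCCS is applied to.
    """
    return sorted(range(len(expression)), key=lambda j: expression[j])


def lccs(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Return the longest run of items that occurs contiguously in both a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # DP row for the previous item of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return list(a[best_end - best_len:best_end])


# Toy expression profiles of two genes over five conditions (made-up values).
gene_a = [0.1, 0.9, 0.4, 0.7, 0.2]
gene_b = [0.3, 0.8, 0.5, 0.6, 0.1]

shared_columns = lccs(column_order(gene_a), column_order(gene_b))
print(shared_columns)
```

In this toy case the two genes rise in the same order over the columns returned in `shared_columns`, so that column subset would be a candidate column subspace for the gene pair under the stated assumption.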

Acknowledgements

The authors are thankful to the Department of Computer Science and Engineering, VNIT, Nagpur (MS), India, for providing the resources and support during the course of this research. They are also very thankful to the Ministry of Electronics and Information Technology (MeitY), Government of India, for financial assistance.

Author information

Corresponding author

Correspondence to Ankush Maind.

Additional information

Communicated by NG Prasad.

Corresponding editor: NG Prasad

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 32 kb)

About this article

Cite this article

Maind, A., Raut, S. COSCEB: Comprehensive search for column-coherent evolution biclusters and its application to hub gene identification. J Biosci 44, 48 (2019). https://doi.org/10.1007/s12038-019-9862-x

  • DOI: https://doi.org/10.1007/s12038-019-9862-x
