Knowledge and Information Systems, Volume 30, Issue 2, pp 341–358

BicFinder: a biclustering algorithm for microarray data analysis

Regular Paper

Abstract

In microarray data analysis, biclustering simultaneously identifies a maximal group of genes that show highly correlated expression patterns across a maximal group of experimental conditions (samples). This paper introduces BicFinder, a heuristic algorithm for extracting biclusters from microarray data (the BicFinder software is available at http://www.info.univ-angers.fr/pub/hao/BicFinder.html). BicFinder relies on a new evaluation function, the Average Correspondence Similarity Index (ACSI), to assess the coherence of a given bicluster, and uses a directed acyclic graph to construct its biclusters. Its performance is evaluated on synthetic data and on three DNA microarray datasets. The biological significance of the results is tested with a gene annotation web tool, which shows that the algorithm produces biologically relevant biclusters. Experimental results show that BicFinder identifies coherent and overlapping biclusters.
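To make the notion of a bicluster concrete, the following minimal Python sketch selects a subset of genes (rows) and conditions (columns) from a small expression matrix and scores how strongly the selected rows agree. The score used here is the mean pairwise Pearson correlation, chosen only for illustration; it is not the ACSI function, whose definition is not given in this abstract. The toy data and the helper names `pearson`, `submatrix` and `mean_pairwise_correlation` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant profile counts as uncorrelated
    return cov / (sx * sy)


def submatrix(matrix, genes, conditions):
    """Restrict the matrix to the given gene rows and condition columns."""
    return [[matrix[g][c] for c in conditions] for g in genes]


def mean_pairwise_correlation(rows):
    """Illustrative coherence score (NOT ACSI): average correlation over row pairs."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 1.0  # a single row is trivially coherent
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy expression matrix: 4 genes x 4 conditions.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 4.0, 6.0, 1.0],
    [5.0, 1.0, 7.0, 3.0],
    [3.0, 4.0, 5.0, 0.0],
]

# Genes 0, 1 and 3 rise together over conditions 0..2.
bicluster = submatrix(data, genes=[0, 1, 3], conditions=[0, 1, 2])
score = mean_pairwise_correlation(bicluster)
whole_matrix_score = mean_pairwise_correlation(data)
print(score, whole_matrix_score)
```

In this toy example the selected genes are coherent only on the selected conditions, so the bicluster scores higher than the full matrix. That is the situation biclustering is meant to detect and that clustering on all conditions would miss.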

Keywords

Biclustering; Heuristics; Evaluation function; Data mining; Analysis of DNA microarray data

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. UTIC, Higher School of Sciences and Technologies of Tunis, University of Tunis, Tunis, Tunisia
  2. LERIA, University of Angers, Angers, France
