Proximity Measures and Results Validation in Biclustering – A Survey

  • Patryk Orzechowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7895)


The concept of biclustering evolved from traditional clustering techniques, which have proved inadequate for discovering local patterns in gene microarray data, in particular shifting and scaling patterns. In this work we compare similarity measures applied in different biclustering algorithms and review validation methodologies described in the literature. To the best of our knowledge, this is the first in-depth comparative analysis of proximity measures and validation techniques for biclustering. We present current trends in the design of similarity measures as well as a rich collection of state-of-the-art benchmark datasets, supporting algorithm designers in classifying the comparison and quality assessment criteria for emerging biclustering algorithms.


biclustering, co-clustering, shifting and scaling patterns, pattern similarity, proximity measures, results validation, microarray gene expression data, state-of-the-art survey





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Patryk Orzechowski (1)
  1. Department of Automatics and Biomedical Engineering, AGH University of Science and Technology, Cracow, Poland
