Discovering Significant Structures in Clustered Bio-molecular Data Through the Bernstein Inequality

  • Alberto Bertoni
  • Giorgio Valentini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4694)


Searching for structures in complex bio-molecular data is a central issue in several branches of bioinformatics. In particular, the reliability of clusters discovered by a given clustering algorithm has recently been assessed through methods based on the concept of stability with respect to random perturbations of the data. In this context, a major problem is assessing the confidence of the reliability measures themselves. We discuss a partially "distribution independent" method, based on the classical Bernstein inequality, to assess the statistical significance of the discovered clusterings. Experimental results with gene expression data show the effectiveness of the proposed approach.
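
The abstract does not give the exact form of the bound used in the paper. As a point of reference only, the classical Bernstein inequality states that for independent, zero-mean random variables X_1, ..., X_n with |X_i| <= M, the probability that their sum is at least t > 0 is at most exp(-(t^2/2) / (sum of E[X_i^2] + M t / 3)). The sketch below evaluates that textbook bound; the function name and parameter names are illustrative assumptions and are not taken from the paper or from any library.

```python
import math


def bernstein_tail_bound(t, variance_sum, max_abs):
    """Classical Bernstein upper bound on P(X_1 + ... + X_n >= t).

    Assumes (textbook setting, not necessarily the paper's) that the X_i are
    independent, have zero mean, and satisfy |X_i| <= max_abs.
    variance_sum is the sum of the second moments E[X_i^2].
    """
    if t < 0 or variance_sum < 0 or max_abs < 0:
        raise ValueError("t, variance_sum and max_abs must be non-negative")
    if t == 0:
        return 1.0  # the bound is trivial at t = 0
    denominator = variance_sum + max_abs * t / 3.0
    if denominator == 0:
        return 0.0  # all variables are zero almost surely, so the sum cannot reach t > 0
    return math.exp(-(t * t / 2.0) / denominator)


if __name__ == "__main__":
    # Hypothetical numbers: 100 centered variables, each with variance 0.25
    # and absolute value at most 1; bound the chance that the sum reaches 20.
    print(bernstein_tail_bound(t=20.0, variance_sum=100 * 0.25, max_abs=1.0))
```

The bound needs only the range and the total variance of the variables, not their full distribution, which is the sense in which tests built on it can be largely distribution independent.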


Acute Myeloid Leukemia, Cluster Algorithm, Gene Expression Data, Random Perturbation, Random Projection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Alberto Bertoni (1)
  • Giorgio Valentini (1)
  1. DSI, Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Via Comelico 39, 20135 Milano, Italia
