Discovering Significant Structures in Clustered Bio-molecular Data Through the Bernstein Inequality

Bertoni, Alberto; Valentini, Giorgio

doi:10.1007/978-3-540-74829-8_108

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4694)

Abstract

Searching for structures in complex bio-molecular data is a central issue in several branches of bioinformatics. In particular, the reliability of clusters discovered by a given clustering algorithm has recently been assessed through methods based on the concept of stability with respect to random perturbations of the data. In this context, a major problem is to assess the confidence of the measures of reliability. We discuss a partially "distribution independent" method based on the classical Bernstein inequality to assess the statistical significance of the discovered clusterings. Experimental results with gene expression data show the effectiveness of the proposed approach.
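
For orientation, the classical Bernstein inequality named in the abstract can be evaluated numerically. In its standard form for the mean of n independent, identically distributed variables with variance sigma^2 and |X - mu| <= M, it states P(mean - mu >= eps) <= exp(-n eps^2 / (2 sigma^2 + 2 M eps / 3)). The sketch below only computes that bound. It is not the authors' test procedure, which this page does not describe; the function name `bernstein_tail_bound` and its parameters are hypothetical, and the choice M = 1 in the usage lines assumes stability scores that lie in [0, 1].

```python
import math


def bernstein_tail_bound(n, eps, variance, bound):
    """Classical Bernstein upper bound on P(sample_mean - true_mean >= eps).

    Assumes n i.i.d. variables with the given variance and with
    |X - true_mean| <= bound almost surely. Hypothetical helper for
    illustration; not taken from the paper.
    """
    if n <= 0 or eps <= 0 or variance < 0 or bound <= 0:
        raise ValueError("n, eps, bound must be positive and variance non-negative")
    exponent = -n * eps ** 2 / (2.0 * variance + 2.0 * bound * eps / 3.0)
    return math.exp(exponent)


# Usage: 100 perturbation runs, scores assumed to lie in [0, 1] (so bound = 1),
# assumed variance 0.01, deviation of interest 0.1.
p_bound = bernstein_tail_bound(n=100, eps=0.1, variance=0.01, bound=1.0)
print(f"Bernstein bound: {p_bound:.3e}")
```

Because the bound uses the variance, it tightens when the perturbed scores vary little, which is one reason a variance-aware inequality can be preferable to a bound that depends on the range alone.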

Author information

Authors and Affiliations

DSI, Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Via Comelico 39, 20135 Milano, Italia
Alberto Bertoni & Giorgio Valentini

Editor information

Bruno Apolloni, Robert J. Howlett, Lakhmi Jain

Cite this paper

Bertoni, A., Valentini, G. (2007). Discovering Significant Structures in Clustered Bio-molecular Data Through the Bernstein Inequality. In: Apolloni, B., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2007. Lecture Notes in Computer Science, vol 4694. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74829-8_108

DOI: https://doi.org/10.1007/978-3-540-74829-8_108
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74828-1
Online ISBN: 978-3-540-74829-8
eBook Packages: Computer Science, Computer Science (R0)
