Abstract
In this article, we design an overlapping clustering method for graphs in order to address a biological problem: protein annotation. Given an unweighted, undirected graph G, we search for subgraphs of G that are dense in edges. The method consists of three steps. First, we determine initial kernels of the classes by means of a local density function; then we improve these kernels using a k-means process; finally, the kernels are extended to overlapping classes. The method is tested on random graphs and then applied to a protein interaction network.
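The three steps can be illustrated with a minimal Python sketch. The abstract does not specify the density function, the k-means criterion, or the extension rule, so everything below is an assumption made for illustration and not the authors' method: local density is taken as the edge density of a vertex's closed neighbourhood, the k-means-like step moves a vertex to the kernel it is most linked to, and a vertex joins every class whose kernel it is linked to above a threshold. All function names and the parameters min_density and threshold are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Adjacency sets for an unweighted, undirected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_density(adj, v):
    """Assumed density: fraction of possible edges present in v's closed neighbourhood."""
    nodes = adj[v] | {v}
    if len(nodes) < 2:
        return 0.0
    edges = sum(1 for a, b in combinations(nodes, 2) if b in adj[a])
    return edges / (len(nodes) * (len(nodes) - 1) / 2)


def initial_kernels(adj, min_density=0.8):
    """Step 1 (assumed rule): seed disjoint kernels around the densest unassigned vertices."""
    assigned, kernels = set(), []
    for v in sorted(adj, key=lambda u: (-local_density(adj, u), u)):
        if v in assigned or local_density(adj, v) < min_density:
            continue
        kernel = (adj[v] | {v}) - assigned
        if len(kernel) >= 2:
            kernels.append(kernel)
            assigned |= kernel
    return kernels


def link_ratio(adj, v, group):
    """Fraction of the members of `group` (other than v) that are adjacent to v."""
    others = group - {v}
    return len(adj[v] & others) / len(others) if others else 0.0


def refine_kernels(adj, kernels, max_iter=20):
    """Step 2 (assumed rule): k-means-like reassignment to the best-linked kernel."""
    kernels = [set(k) for k in kernels]
    for _ in range(max_iter):
        moved = False
        for v in sorted(set().union(*kernels)):
            current = next(i for i, k in enumerate(kernels) if v in k)
            scores = [link_ratio(adj, v, k) for k in kernels]
            best = max(range(len(kernels)), key=scores.__getitem__)
            if scores[best] > scores[current]:  # move only on strict improvement
                kernels[current].discard(v)
                kernels[best].add(v)
                moved = True
        if not moved:
            break
    return [k for k in kernels if k]


def extend_to_classes(adj, kernels, threshold=0.5):
    """Step 3 (assumed rule): a vertex joins every kernel it is sufficiently linked to."""
    classes = []
    for kernel in kernels:
        extra = {v for v in adj
                 if v not in kernel and link_ratio(adj, v, kernel) >= threshold}
        classes.append(kernel | extra)
    return classes


# Toy graph: two 4-cliques, with vertex 4 linked to half of each clique.
clique_a = list(combinations([0, 1, 2, 3], 2))
clique_b = list(combinations([5, 6, 7, 8], 2))
adj = build_graph(clique_a + clique_b + [(4, 2), (4, 3), (4, 5), (4, 6)])

kernels = refine_kernels(adj, initial_kernels(adj))
classes = extend_to_classes(adj, kernels)
print(classes)
```

On this toy graph the two cliques form the kernels, and vertex 4, which is adjacent to half of each clique, is added to both classes, so the resulting classes overlap. Changing min_density or threshold changes which vertices seed kernels and how far classes extend; the values used here are arbitrary.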
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Charon, I., Denoeud, L., Hudry, O. (2007). Overlapping Clustering in a Graph Using k-Means and Application to Protein Interactions Networks. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_16
DOI: https://doi.org/10.1007/978-3-540-73560-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics (R0)