Abstract
Many researchers now study protein complexes because these complexes, formed by proteins that interact with each other to perform specific biological functions, play a significant role in biology. A few years ago, E. C. Kenley and Y. R. Cho introduced an algorithm that uses graph entropy to cluster protein-protein interaction networks [2,3].
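To make the entropy criterion concrete, the sketch below computes vertex and graph entropy for a candidate cluster. It assumes the Kenley-Cho definition: for each vertex v, p_in is the fraction of v's neighbours inside the cluster, p_out = 1 - p_in, the vertex entropy is the binary entropy of (p_in, p_out), and the graph entropy is the sum over all vertices. The graph is a hypothetical adjacency dict of sets, and the function names are illustrative rather than taken from the paper.

```python
import math

# Sketch of vertex/graph entropy for entropy-based graph clustering,
# assuming the Kenley-Cho definition: p_in = fraction of a vertex's
# neighbours that lie inside the cluster, p_out = 1 - p_in.

def vertex_entropy(graph, cluster, v):
    """Binary entropy of vertex v with respect to the current cluster."""
    neighbours = graph[v]
    if not neighbours:
        return 0.0
    p_in = sum(1 for u in neighbours if u in cluster) / len(neighbours)
    p_out = 1.0 - p_in
    # 0 * log2(0) is taken as 0, so zero probabilities are skipped.
    return -sum(p * math.log2(p) for p in (p_in, p_out) if p > 0)

def graph_entropy(graph, cluster):
    """Graph entropy: sum of vertex entropies over all vertices."""
    return sum(vertex_entropy(graph, cluster, v) for v in graph)

if __name__ == "__main__":
    # Hypothetical toy PPI graph: triangle a-b-c with a pendant vertex d on c.
    g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    print(round(graph_entropy(g, {"a", "b", "c"}), 4))  # 0.9183
    print(graph_entropy(g, {"a", "b", "c", "d"}))       # 0.0
```

Only vertex c is "torn" between inside and outside in the first cluster, so it alone contributes entropy; absorbing d makes every vertex's neighbourhood entirely inside, driving the entropy to zero.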
In our study, we extend this work to find potential protein complexes while overcoming weaknesses of the existing algorithms, making the results more reliable. We first clean the dataset and build a graph from the protein-protein interactions, then determine locally optimal clusters by growing an initial cluster consisting of two selected seeds while minimizing the cluster's entropy. A cluster is complete when its entropy can no longer be decreased. Finally, overlapping clusters are refined to improve their quality, and the results are compared against a curated protein complex dataset. The results show that, measured by F1-score together with average cluster size, the clusters generated by our algorithm are of markedly higher quality, and the running time is also improved.
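The growth step can be sketched as a greedy local search. The abstract does not specify how seeds are chosen or in which order vertices are added and removed, so this sketch assumes (1) a hypothetical `pick_seeds` rule taking the highest-degree vertex and its highest-degree neighbour, and (2) a best-improvement loop that, at each step, either adds a boundary vertex or drops a non-seed member, whichever lowers graph entropy the most, stopping when no move lowers it. Overlap refinement and F1-score evaluation are not shown.

```python
import math

def vertex_entropy(graph, cluster, v):
    # Binary entropy of v, assuming p_in = share of v's neighbours in the cluster.
    neighbours = graph[v]
    if not neighbours:
        return 0.0
    p_in = sum(1 for u in neighbours if u in cluster) / len(neighbours)
    return -sum(p * math.log2(p) for p in (p_in, 1.0 - p_in) if p > 0)

def graph_entropy(graph, cluster):
    return sum(vertex_entropy(graph, cluster, v) for v in graph)

def pick_seeds(graph):
    """Hypothetical seed rule: highest-degree vertex plus its highest-degree
    neighbour (ties broken by vertex name so the result is deterministic)."""
    key = lambda v: (len(graph[v]), str(v))
    s1 = max((v for v in graph if graph[v]), key=key)
    s2 = max(graph[s1], key=key)
    return s1, s2

def grow_cluster(graph, seeds):
    """Greedily add boundary vertices or drop non-seed members while the
    graph entropy strictly decreases; stop at a local minimum."""
    cluster = set(seeds)
    best = graph_entropy(graph, cluster)
    while True:
        boundary = {u for v in cluster for u in graph[v]} - cluster
        candidates = [cluster | {u} for u in boundary]
        candidates += [cluster - {v} for v in cluster - set(seeds)]
        if not candidates:
            break
        new_cluster = min(candidates, key=lambda c: graph_entropy(graph, c))
        new_entropy = graph_entropy(graph, new_cluster)
        if new_entropy >= best - 1e-12:  # entropy cannot be decreased anymore
            break
        cluster, best = new_cluster, new_entropy
    return cluster, best

if __name__ == "__main__":
    # Hypothetical graph: two triangles a-b-c and d-e-f joined by edge c-d.
    g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
         "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
    cluster, h = grow_cluster(g, ("a", "b"))
    print(sorted(cluster), round(h, 4))  # ['a', 'b', 'c'] 1.8366
```

Because every accepted move strictly lowers the entropy, no cluster is visited twice and the loop always terminates. Starting from seeds a and b, the search absorbs c and then stops, since adding d or dropping c would raise the entropy back to about 2.918.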
References
Kenley, E.C., Cho, Y.-R.: Entropy-Based Graph Clustering: Application to Biological and Social Networks. In: 2011 IEEE 11th International Conference on Data Mining (ICDM), ISSN 1550-4786 (December 2011), doi:10.1109/ICDM.2011.64
Chaim, T.C., Cho, Y.-R.: Accuracy improvement in protein complex prediction from protein interaction networks by refining cluster overlaps. Proteome Sci. 10(Suppl 1), S3 (June 21, 2012), doi:10.1186/1477-5956-10-S1-S3
IntAct: Curated Yeast Protein-Protein Interaction Dataset. ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/species/yeast.zip
IntAct: Curated Yeast Complexes Dataset. ftp://ftp.ebi.ac.uk/pub/databases/intact/complex/current/psi25
Razick, S., Magklaras, G., Donaldson, I.M.: iRefIndex: A consolidated protein interaction database with provenance. BMC Bioinformatics 9, 405 (2008), doi:10.1186/1471-2105-9-405
Simonyi, G.: Graph Entropy: A Survey. http://www.renyi.hu/~simonyi/grams.pdf
Van Dongen, S.: A new clustering algorithm for graphs, National Research Institute for Mathematics and Computer Science in the Netherlands, Tech. Rep. INS-R0010 (2000)
Bader, G.D., Hogue, C.W.: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, 2 (2003)
Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Physical Review E 70, 066111 (2004)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Le, VH., Kim, SR. (2015). Using Entropy Cluster-Based Clustering for Finding Potential Protein Complexes. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9043. Springer, Cham. https://doi.org/10.1007/978-3-319-16483-0_51
DOI: https://doi.org/10.1007/978-3-319-16483-0_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16482-3
Online ISBN: 978-3-319-16483-0
eBook Packages: Computer Science, Computer Science (R0)