A Modified Markov Clustering Approach for Protein Sequence Clustering

  • Lehel Medvés
  • László Szilágyi
  • Sándor M. Szilágyi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)

Abstract

In this paper we propose a modified Markov clustering algorithm for the efficient clustering of large protein sequence databases, based on previously evaluated sequence similarity criteria. The proposed alteration consists of an exponentially decreasing inflation rate: strong inflation at the beginning quickly establishes the hard structure of the clusters, and weaker inflation thereafter produces fine partitions. The algorithm was tested and validated on the whole SCOP95 database and on randomly selected 10-50% subsets of it; it generally converges within 12-14 iteration cycles and provides clusters of high quality. Furthermore, a novel generalized formula is given for the inflation operation, and an efficient matrix symmetrization technique is presented, in order to improve the partition quality with a relatively small amount of extra computation. A large graph layout technique is also employed for the efficient visualization of the obtained clusters.
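To make the idea of a decreasing inflation rate concrete, the following is a minimal sketch of standard Markov clustering (expansion followed by inflation) in which the inflation exponent decays over the iterations. The abstract gives neither the exact decay law nor the generalized inflation formula or the symmetrization step, so everything specific here is an assumption for illustration only: the schedule r_k = r_end + (r_start - r_end) * exp(-decay * k), the default parameter values, and the function names `inflation_schedule` and `markov_cluster`. The sketch uses small dense matrices for readability; a database of SCOP95 size would call for a sparse representation.

```python
import math


def inflation_schedule(r_start, r_end, decay, n_iter):
    """Assumed decay law (not from the paper):
    r_k = r_end + (r_start - r_end) * exp(-decay * k)."""
    return [r_end + (r_start - r_end) * math.exp(-decay * k)
            for k in range(n_iter)]


def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: matrix square, i.e. two-step random walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Standard MCL inflation: elementwise power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def markov_cluster(similarity, r_start=3.0, r_end=1.5, decay=0.5,
                   max_iter=30, tol=1e-9):
    """Run MCL with a decaying inflation rate; return clusters as sets."""
    m = normalize_columns(similarity)
    for r in inflation_schedule(r_start, r_end, decay, max_iter):
        new_m = inflate(expand(m), r)
        change = max(abs(a - b) for row_a, row_b in zip(new_m, m)
                     for a, b in zip(row_a, row_b))
        m = new_m
        if change < tol:  # matrix no longer changes: stop iterating
            break
    # Rows that still carry mass act as attractors; the columns with
    # nonzero entries in such a row form one cluster.
    clusters = {frozenset(j for j, x in enumerate(row) if x > tol)
                for row in m if any(x > tol for x in row)}
    return sorted((set(c) for c in clusters), key=min)


if __name__ == "__main__":
    # Toy similarity matrix (made-up values): two groups of three sequences,
    # with self-similarity 1.0 on the diagonal and no cross-group similarity.
    sim = [
        [1.0, 0.8, 0.6, 0.0, 0.0, 0.0],
        [0.8, 1.0, 0.7, 0.0, 0.0, 0.0],
        [0.6, 0.7, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.9, 0.5],
        [0.0, 0.0, 0.0, 0.9, 1.0, 0.6],
        [0.0, 0.0, 0.0, 0.5, 0.6, 1.0],
    ]
    print(markov_cluster(sim))
```

In this sketch a larger `r_start` sharpens the matrix quickly in the first iterations, while `r_end` controls how gently the later iterations refine the partition; both values would need tuning on real similarity data.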

Keywords

Markov clustering, protein sequence clustering, sparse matrix, large graph layout, SCOP95 database

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Lehel Medvés (1)
  • László Szilágyi (1, 2)
  • Sándor M. Szilágyi (1)

  1. Sapientia - Hungarian Science University of Transylvania, Faculty of Technical and Human Science, Târgu-Mureş, Romania
  2. Department of Control Engineering and Information Technology, Budapest University of Technology and Economics, Budapest, Hungary
