Protein Function Annotation Based on Ortholog Clusters Extracted from Incomplete Genomes Using Combinatorial Optimization

  • Akshay Vashist
  • Casimir Kulikowski
  • Ilya Muchnik
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


Reliable automatic protein function annotation requires methods for detecting orthologs with known function from closely related species. While current approaches are restricted to finding ortholog clusters from complete proteomes, most annotation problems arise in the context of partially sequenced genomes. We use a combinatorial optimization method for extracting candidate ortholog clusters robustly from incomplete genomes. The proposed algorithm focuses exclusively on sequence relationships across genomes and finds a subset of sequences from multiple genomes where every sequence is highly similar to other sequences in the subset. We then use an optimization criterion similar to the one for finding ortholog clusters to annotate the target sequences.
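The cluster-extraction idea above can be illustrated with a small sketch. It is not the authors' implementation, and several of its details are hypothetical assumptions:

- Sequences form a multipartite similarity graph with one part per genome.
- A sequence's score within a candidate subset is the sum of its similarities to subset members from *other* genomes.
- The best subset is the one whose minimum score is largest.

Under those assumptions, one standard way to optimize the objective is to repeatedly discard the weakest-linked sequence and keep the best intermediate subset. This peeling is known to reach the optimum when a sequence's score can only grow as the subset grows. The names `linkage` and `extract_cluster` and the `(genome, id)` tuple encoding are illustrative.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: a sequence is (genome, sequence_id). Similarities are
# stored under frozenset({seq_a, seq_b}) keys; missing pairs count as 0.
Seq = Tuple[str, str]


def linkage(i: Seq, subset: Set[Seq], sim: Dict[frozenset, float]) -> float:
    """Sum of similarities from i to members of `subset` in *other* genomes."""
    return sum(
        sim.get(frozenset((i, j)), 0.0)
        for j in subset
        if j != i and j[0] != i[0]  # skip self and same-genome (paralog) pairs
    )


def extract_cluster(
    seqs: Set[Seq], sim: Dict[frozenset, float]
) -> Tuple[Set[Seq], float]:
    """Greedy peeling: repeatedly drop the sequence with the weakest linkage.

    Returns the intermediate subset whose minimum linkage was largest,
    together with that minimum linkage value.
    """
    current = set(seqs)
    best, best_score = set(current), float("-inf")
    while current:
        # Recomputed from scratch each round for clarity (cubic overall).
        scores = {i: linkage(i, current, sim) for i in current}
        weakest = min(scores, key=scores.get)
        if scores[weakest] > best_score:
            best, best_score = set(current), scores[weakest]
        current.remove(weakest)
    return best, best_score
```

To obtain more than one cluster, a natural (assumed) extension is to remove each extracted cluster and rerun the procedure on the remaining sequences. A practical version would update scores incrementally instead of recomputing them every round.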

We report candidate annotations for proteins in the rice genome, obtained using ortholog clusters constructed from four partially sequenced cereal genomes (barley, maize, sorghum, and wheat) and the complete genome of Arabidopsis.
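For the annotation step, the abstract says only that the criterion is similar to the one used for clustering. The sketch below shows one hypothetical reading. It scores a rice query against each existing cluster using the same cross-genome linkage, then assigns the query to the best-scoring cluster. It does so only if the query links to that cluster at least as strongly as the cluster's weakest member does, so the assignment cannot lower the cluster's minimum linkage (assuming non-negative similarities). The `thresholds` mapping, the function names, and the assignment rule are all assumptions. Under this reading, the functional label would then be transferred from annotated cluster members such as Arabidopsis proteins.

```python
from typing import Dict, List, Optional, Tuple

Seq = Tuple[str, str]  # hypothetical encoding: (genome, sequence_id)


def linkage(q: Seq, cluster: List[Seq], sim: Dict[frozenset, float]) -> float:
    """Summed similarity from q to cluster members in other genomes."""
    return sum(sim.get(frozenset((q, j)), 0.0) for j in cluster if j[0] != q[0])


def annotate(
    query: Seq,
    clusters: Dict[str, List[Seq]],  # cluster id -> member sequences
    thresholds: Dict[str, float],  # cluster id -> its minimum internal linkage
    sim: Dict[frozenset, float],
) -> Optional[str]:
    """Return the id of the cluster the query joins, or None if it joins none."""
    best_id, best_score = None, float("-inf")
    for cid, members in clusters.items():
        score = linkage(query, members, sim)
        # Accept only clusters whose minimum linkage the query would not lower.
        if score >= thresholds[cid] and score > best_score:
            best_id, best_score = cid, score
    return best_id
```

A query that returns `None` would remain unannotated rather than being forced into a weakly matching cluster.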


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Akshay Vashist (1)
  • Casimir Kulikowski (1)
  • Ilya Muchnik (1, 2)

  1. Department of Computer Science
  2. DIMACS, Rutgers, The State University of New Jersey, Piscataway, USA
