
Annotation of proteins of unknown function: initial enzyme results


Using a combination of ProMOL (a plugin for PyMOL that searches a library of enzymatic motifs for local structural homologs), BLAST and Pfam (servers that identify global sequence homologs), and Dali (a server that identifies global structural homologs), we have begun assigning functional annotations to the approximately 3,500 structures in the Protein Data Bank that are currently classified as having “unknown function”. With a limited template library of 388 motifs, ProMOL has identified over 500 promising in silico matches, 65 of which are exceptionally good. The characteristics of these exceptionally good matches are discussed.
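The abstract does not describe how ProMOL scores a match between a motif template and a candidate site, so the following is only a minimal sketch of one generic way to compare local geometry: a distance RMSD (dRMSD) over corresponding atoms. It is not ProMOL's algorithm. The function names and the three-atom coordinates are hypothetical, invented for illustration, and the sketch assumes the atom correspondence between template and candidate is already known.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return the distances between every pair of points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_rmsd(template, candidate):
    """Distance RMSD between two equally sized, identically ordered point sets.

    Only internal distances are compared, so the result does not depend on
    how either structure is rotated or translated.
    """
    if len(template) != len(candidate):
        raise ValueError("point sets must have the same number of atoms")
    if len(template) < 2:
        raise ValueError("need at least two atoms")
    d_t = pairwise_distances(template)
    d_c = pairwise_distances(candidate)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d_t, d_c)) / len(d_t))


# Hypothetical three-atom motif (coordinates in angstroms, not real data).
template = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
# Same geometry translated by (10, 10, 10): internal distances are unchanged.
shifted = [(x + 10.0, y + 10.0, z + 10.0) for x, y, z in template]
# Distorted candidate: the second atom is moved 1 angstrom along x.
distorted = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

print("translated copy:", distance_rmsd(template, shifted))
print("distorted copy: ", distance_rmsd(template, distorted))
```

A lower value indicates closer local geometry. Because dRMSD ignores handedness, a mirror-image arrangement would score as a perfect match, which is one reason a superposition-based comparison may be preferred in practice.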






This work was supported by NIGMS awards 2R15GM078077-02, 3R15GM078077-02S1, and 3R15GM078077-02S2, Dowling College, and Rochester Institute of Technology. We thank the following team members who supported our efforts on this project: (from RIT) Weinishet Tedla-Boyd, Tananda Richards; (from Dowling College) Mogjan Asadi, Limone Rosa.

Conflict of interest

None of the authors have reported any conflicts of interest in the completion of the research described in this manuscript.

Author information

Correspondence to Paul A. Craig.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOC 305 kb)


Cite this article

McKay, T., Hart, K., Horn, A. et al. Annotation of proteins of unknown function: initial enzyme results. J Struct Funct Genomics 16, 43–54 (2015).



  • Bioinformatics
  • Catalytic motif
  • Enzyme
  • ProMOL
  • Protein function
  • PyMOL
  • Structural biology