Annotation of proteins of unknown function: initial enzyme results

  • Talia McKay
  • Kaitlin Hart
  • Alison Horn
  • Haeja Kessler
  • Greg Dodge
  • Keti Bardhi
  • Kostandina Bardhi
  • Jeffrey L. Mills
  • Herbert J. Bernstein
  • Paul A. Craig


Working with a combination of ProMOL (a plugin for PyMOL that searches a library of enzymatic motifs for local structural homologs), BLAST and Pfam (servers that identify global sequence homologs), and Dali (a server that identifies global structural homologs), we have begun assigning functional annotations to the approximately 3,500 structures in the Protein Data Bank that are currently classified as having “unknown function”. Using a limited template library of 388 motifs, ProMOL identified over 500 promising in silico matches, 65 of which are exceptionally good. The characteristics of these exceptionally good matches are discussed.


Keywords: Bioinformatics; Catalytic motif; Enzyme; ProMOL; Protein function; PyMOL; Structural biology



Acknowledgments

This work was supported by NIGMS grants 2R15GM078077-02, 3R15GM078077-02S1, and 3R15GM078077-02S2, by Dowling College, and by Rochester Institute of Technology. We thank the following team members who supported our efforts on this project: Weinishet Tedla-Boyd and Tananda Richards (RIT), and Mogjan Asadi and Limone Rosa (Dowling College).

Conflict of interest

None of the authors have reported any conflicts of interest in the completion of the research described in this manuscript.

Supplementary material

Supplementary material 1: 10969_2015_9194_MOESM1_ESM.doc (DOC, 305 kb)



Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  • Talia McKay (1)
  • Kaitlin Hart (1)
  • Alison Horn (1)
  • Haeja Kessler (1)
  • Greg Dodge (1)
  • Keti Bardhi (1)
  • Kostandina Bardhi (1)
  • Jeffrey L. Mills (1)
  • Herbert J. Bernstein (2)
  • Paul A. Craig (1)
  1. College of Science, RIT, Rochester, USA
  2. Department of Mathematics and Computer Science, Dowling College, Oakdale, USA
