Abstract
Using a combination of ProMOL (a plugin for PyMOL that searches a library of enzymatic motifs for local structural homologs), BLAST and Pfam (servers that identify global sequence homologs), and Dali (a server that identifies global structural homologs), we have begun assigning functional annotations to the approximately 3,500 structures in the Protein Data Bank that are currently classified as having “unknown function”. With a limited template library of 388 motifs, ProMOL has identified over 500 promising in silico matches, 65 of which are exceptionally good. The characteristics of these exceptionally good matches are discussed.
Acknowledgments
Support was provided by NIGMS grants 2R15GM078077-02, 3R15GM078077-02S1, and 3R15GM078077-02S2, by Dowling College, and by Rochester Institute of Technology. We thank the following team members who supported our efforts on this project: (from RIT) Weinishet Tedla-Boyd, Tananda Richards; (from Dowling College) Mogjan Asadi, Limone Rosa.
Conflict of interest
None of the authors have reported any conflicts of interest in the completion of the research described in this manuscript.
Cite this article
McKay, T., Hart, K., Horn, A. et al. Annotation of proteins of unknown function: initial enzyme results. J Struct Funct Genomics 16, 43–54 (2015). https://doi.org/10.1007/s10969-015-9194-5
DOI: https://doi.org/10.1007/s10969-015-9194-5