Abstract
The creation and analysis of the 3Dfold_test database are described. This database comprises a large set of pairs of spatially similar protein domain structures and a larger control set of “decoys,” spatially dissimilar protein structures with approximately the same size and compactness as each member of each pair. The database is available at http://phys.protres.ru/resources/prediction_analogy/3Dfold
Abbreviations
- AA: amino acid
- 3D: three-dimensional
Additional information
Original Russian Text © M.Yu. Lobanov, N.S. Bogatyreva, D.N. Ivankov, A.V. Finkel’shtein, 2009, published in Molekulyarnaya Biologiya, 2009, Vol. 43, No. 4, pp. 722–732.
About this article
Cite this article
Lobanov, M.Y., Bogatyreva, N.S., Ivankov, D.N. et al. Analogy-based protein structure prediction: I. A new database of spatially similar and dissimilar structures of protein domains for testing and optimizing prediction methods. Mol Biol 43, 665–676 (2009). https://doi.org/10.1134/S0026893309040190
DOI: https://doi.org/10.1134/S0026893309040190