Genome-Wide Protein Structure Prediction
The post-genomic era has witnessed an explosion of protein sequences in the public databases, but this growth has not been matched by genome-wide structure and function information, owing to the technical difficulty and labor cost of existing experimental techniques. Rapid advances in computer-based protein structure prediction have made it possible to generate three-dimensional (3D) structural models of proteins automatically and yet reliably. Genome-scale structure prediction experiments have been conducted by a number of groups, starting as early as 1997, and some noteworthy efforts have used the MODELLER and ROSETTA methods. Along another line, TOUCHSTONE was used to predict the structures of all 85 small proteins in the Mycoplasma genitalium genome, which established template-refinement-based structure prediction as a practical approach for genome-scale experiments. This was followed by the development of the Threading ASSEmbly Refinement (TASSER) and Iterative Threading ASSEmbly Refinement (I-TASSER) algorithms, which combine threading, fragment assembly, ab initio loop modeling, and structural refinement to predict structures. This method was used to predict the structures of all medium-sized open reading frames (ORFs) in the Escherichia coli genome, achieving high-accuracy models for 920 of 1,360 proteins. G protein-coupled receptors (GPCRs) are an extremely important class of membrane proteins for which very few structures are available in the Protein Data Bank (PDB). TASSER was used to predict the structures of all 907 putative GPCRs in the human genome, and the high accuracy of these models, confirmed by newly solved GPCR structures and recent blind tests, has demonstrated the usefulness and robustness of the TASSER/I-TASSER models for the functional annotation of GPCRs.
Recently, the I-TASSER protein structure prediction method has been used as a basis for the functional annotation of protein sequences. The growing demand for such automated structure and function prediction algorithms is evident from the fact that the I-TASSER server has generated structure predictions for 35,000 proteins submitted by more than 8,000 users from 86 countries in the last 24 months. The success of these modeling experiments demonstrates significant new progress in high-throughput, genome-wide protein structure prediction.
The project is supported in part by the Alfred P. Sloan Foundation, NSF Career Award (DBI 0746198), and the National Institute of General Medical Sciences (R01GM083107, R01GM084222).