Effect of low-complexity regions on protein structure determination

Bannen, Ryan M.; Bingman, Craig A.; Phillips, George N.

doi:10.1007/s10969-008-9039-6

Published: 27 February 2008

Volume 8, pages 217–226 (2007)

Journal of Structural and Functional Genomics

Ryan M. Bannen^1,2,
Craig A. Bingman^1,3 &
George N. Phillips Jr.^1,3

Abstract

It has been previously shown that protein sequences containing a quasi-repetitive assortment of amino acids are common in genomes and databases such as Swiss-Prot but are under-represented in the structure-based Protein Data Bank (PDB). Structural genomics groups have been using the absence of these “low-complexity” sequences for several years as a way to select proteins that have a good chance of successful structure determination. In this study, we examine the data deposited in the PDB as well as the available data from structural genomics groups in TargetDB and PepcDB to reveal interesting trends that could be taken into consideration when using low-complexity sequences as part of the target selection process.

Abbreviations

CESG: Center for Eukaryotic Structural Genomics
PDB: Protein Data Bank
NMR: Nuclear magnetic resonance
HSQC: Heteronuclear single quantum coherence

Acknowledgments

The authors thank Dmitry A. Kondrashov and John L. Markley for helpful comments regarding this paper. The authors would also like to thank Sarah C. Cunningham for assistance with the statistical tests. R.M.B. was supported by NLM training grant T15LM007359 and DOE training grant DE-FG2-04ER25627. C.A.B. and G.N.P. were supported by the Center for Eukaryotic Structural Genomics NIH/NIGMS grant numbers U54 GM074901-01 and P50 GM064598.

Author information

Authors and Affiliations

Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, 53711, USA
Ryan M. Bannen, Craig A. Bingman & George N. Phillips Jr.
Bacter Institute, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, 53711, USA
Ryan M. Bannen
Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, 53711, USA
Craig A. Bingman & George N. Phillips Jr.

Corresponding author

Correspondence to George N. Phillips Jr.

About this article

Cite this article

Bannen, R.M., Bingman, C.A. & Phillips, G.N. Effect of low-complexity regions on protein structure determination. J Struct Funct Genomics 8, 217–226 (2007). https://doi.org/10.1007/s10969-008-9039-6

Received: 03 August 2007
Accepted: 05 February 2008
Published: 27 February 2008
Issue Date: December 2007
DOI: https://doi.org/10.1007/s10969-008-9039-6
