Predicting protein crystallization propensity from protein sequence

Babnigg, György; Joachimiak, Andrzej

doi:10.1007/s10969-010-9080-0


Published: 23 February 2010

Volume 11, pages 71–80, (2010)

Journal of Structural and Functional Genomics



Abstract

The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from the primary sequence correlate with a protein’s propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties was computed for over 1,300 proteins that expressed well but were insoluble, and for ~720 unique proteins that resulted in X-ray structures. The correlation of the protein’s iso-electric point (pI) and grand average hydropathy (GRAVY) with crystallizability was analyzed for full-length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses, we identified a set of attributes that correlate with a protein’s propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on them. We have created applications that analyze query sequences, provide optimal boundary information for them, and visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor.
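As an illustration of a sequence-derived property of the kind analyzed here, the sketch below computes GRAVY as the mean per-residue hydropathy. It assumes the standard Kyte-Doolittle hydropathy scale, which is the usual basis for GRAVY; the abstract does not state which scale or software the authors used, and the function name `gravy` is hypothetical rather than part of their tools.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Assumption: the abstract does not name the scale; this is the
# conventional one used to define GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average hydropathy of a protein sequence.

    GRAVY is the sum of the hydropathy values of all residues
    divided by the sequence length.
    """
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence is empty")
    # Reject non-standard symbols instead of silently skipping them.
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)
```

Positive values indicate an overall hydrophobic sequence and negative values an overall hydrophilic one. A value like this, paired with a computed pI, would be one input among several to a classifier such as the SVM described above.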


Abbreviations

GRAVY: Grand average hydropathy
MCSG: Midwest Center for Structural Genomics
pI: Iso-electric point
PSI: Protein Structure Initiative
SVM: Support vector machine


Acknowledgments

This work was supported by the National Institutes of Health grant GM074942 and by the U.S. Department of Energy, Office of Biological and Environmental Research, under contract DE-AC02-06CH11357.

Author information

Authors and Affiliations

Midwest Center for Structural Genomics, Biosciences Division, Argonne National Laboratory, 9700 S Cass Ave., Argonne, IL, 60439, USA
György Babnigg & Andrzej Joachimiak

Authors

György Babnigg
Andrzej Joachimiak

Corresponding authors

Correspondence to György Babnigg or Andrzej Joachimiak.

Additional information

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(DOCX 257 kb)


About this article

Cite this article

Babnigg, G., Joachimiak, A. Predicting protein crystallization propensity from protein sequence. J Struct Funct Genomics 11, 71–80 (2010). https://doi.org/10.1007/s10969-010-9080-0


Received: 25 November 2009
Accepted: 05 February 2010
Published: 23 February 2010
Issue Date: March 2010
DOI: https://doi.org/10.1007/s10969-010-9080-0
