Predicting the Solubility of Recombinant Proteins in Escherichia coli

  • Roger G. Harrison
  • Miguel J. Bagajewicz
Part of the Methods in Molecular Biology book series (MIMB, volume 1258)


We describe a statistical model that uses binomial logistic regression to predict the solubility of heterologous proteins expressed in E. coli. The model is based on a set of proteins reported to have been expressed in E. coli in either soluble or insoluble form. We discuss the 22 parameters of the final model, which are based on the proteins' amino acid composition. The overall accuracy of the model is 94%. We also explain how to use the model on the website to predict protein solubility.
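The prediction step of a binomial logistic regression model can be sketched as follows. This is a minimal illustration, not the chapter's model. It assumes the predictors are simply the fractions of the 20 standard amino acids in the sequence, and that a protein is called soluble when the predicted probability exceeds 0.5. The chapter's actual 22 parameters, their fitted coefficients, and its decision rule are not reproduced here. The coefficients in the example are hypothetical placeholders. The `accuracy` helper shows how an overall accuracy figure is obtained: it is the fraction of proteins in a labeled set whose predicted class matches the observed one.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard letters (e.g. X, B, Z) and whitespace are ignored.
    """
    counts = Counter(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def logistic(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probability_soluble(features: dict[str, float],
                        intercept: float,
                        coefficients: dict[str, float]) -> float:
    """Binomial logistic regression: P(soluble) = logistic(b0 + sum_i b_i * x_i)."""
    z = intercept + sum(b * features[name] for name, b in coefficients.items())
    return logistic(z)


def accuracy(predicted: list[bool], observed: list[bool]) -> float:
    """Fraction of proteins whose predicted class matches the observed class."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(predicted)


if __name__ == "__main__":
    # HYPOTHETICAL placeholder coefficients, NOT the published 22-parameter model.
    intercept = 0.0
    coefficients = {"D": 3.0, "E": 3.0, "C": -4.0}
    p = probability_soluble(composition("MKDEEDCAK"), intercept, coefficients)
    # Assumed decision rule: call the protein soluble when P > 0.5.
    print(f"P(soluble) = {p:.3f}, predicted soluble: {p > 0.5}")
```

In a fitted model, the intercept and coefficients would come from maximum-likelihood estimation on the soluble/insoluble training proteins. Only the prediction and scoring steps are shown here.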

Key words

Heterologous protein solubility prediction; Escherichia coli; Binomial logistic regression model



We thank graduate student Armando Diaz and undergraduate students Emanuele Tomba, Reese Lennarson, and Rex Richard for their help in developing the logistic regression model; undergraduate students Dolores Gutierrez-Cacciabue, Nathan Liles, and Zehra Tosun for their help in developing the protein database; and undergraduate student Andrew Lambeth for developing the website for the model.


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. School of Chemical, Biological and Materials Engineering, University of Oklahoma, Norman, USA
  2. School of Chemical, Biological and Materials Engineering, University of Oklahoma, Norman, USA
