Predicting the Solubility of Recombinant Proteins in Escherichia coli
We describe a statistical model that uses binomial logistic regression to predict the solubility of heterologous proteins expressed in E. coli. The model is based on a set of proteins reported to have been expressed in E. coli in either soluble or insoluble form. We discuss the 22 parameters, based on the proteins’ amino acid composition, that are used in the final model. The overall accuracy of the model is 94%. We also explain how to use the model on the website http://www.ou.edu/ to predict protein solubility.
Key words: Heterologous protein solubility prediction, Escherichia coli, Binomial logistic regression model
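As a rough illustration of how a binomial logistic regression on amino acid composition produces a prediction, the minimal Python sketch below computes residue fractions for a sequence and passes a weighted sum of them through the logistic function. It is not the published model. The function names, the use of single-residue fractions as features, the coefficient values, the intercept, the 0.5 threshold, and the convention that the probability refers to the soluble outcome are all assumptions made for illustration; the 22 parameters of the actual model are not reproduced here.

```python
import math
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def logistic_probability(features, coefficients, intercept):
    """Binomial logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(
        coefficients.get(name, 0.0) * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL placeholder values, chosen only so the example runs.
# They are NOT the fitted parameters of the model described in this chapter.
EXAMPLE_COEFFICIENTS = {"K": 2.0, "E": 1.5, "W": -3.0}
EXAMPLE_INTERCEPT = -0.2


def predict_soluble(sequence, threshold=0.5):
    """Return (probability, is_soluble) under the placeholder model above.

    Assumes the modeled outcome is 'soluble'; the threshold is also assumed.
    """
    p = logistic_probability(
        composition(sequence), EXAMPLE_COEFFICIENTS, EXAMPLE_INTERCEPT
    )
    return p, p >= threshold
```

In a real application the coefficients and intercept would be estimated by fitting the regression to the database of proteins with known soluble or insoluble expression, and the classification threshold would be chosen against that data.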
We thank graduate student Armando Diaz and undergraduate students Emanuele Tomba, Reese Lennarson, and Rex Richard for their help in developing the logistic regression model; undergraduate students Dolores Gutierrez-Cacciabue, Nathan Liles, and Zehra Tosun for their help in developing the protein database; and undergraduate student Andrew Lambeth for developing the website for the model.