
Quantitative Analysis of the Conservation of the Tertiary Structure of Protein Segments

The Protein Journal

 

The publication of the crystallographic structure of the calmodulin protein has offered an example suggesting that many protein sequence segments may exhibit multiple 3D structures; we refer to these as multi-structural segments. To investigate this, this paper presents a statistical analysis of the uniqueness of the 3D structure of all protein sequence segments stored in the Protein Data Bank (PDB, January 2003, release 103) that occur at least twice and whose lengths are greater than 10 amino acids (AAs). We refined the set of segments by keeping only those that are not parts of longer segments, which resulted in 9297 segments, called the sponge set. By adding 8197 signature segments, which occur uniquely in the PDB, to the sponge set, we generated a benchmark set. Statistical analysis of the sponge set demonstrates that the rotating, missing, and disarranging operations described in the text result in segments becoming multi-structural. It turns out that missing segments do not change the shape of the 3D structure of a multi-structural segment. We use the root mean square distance for unit vector sequence (URMSD) as an improved measure to describe the characteristics of hinge-rotating, missing, and disarranging segments. We estimated the rate of occurrence of rotating and disarranging segments as their number in the sponge set divided by the number of sequences in the benchmark set, and found it to be less than 0.85%. Since two of the structure-changing operations concern a negligible number of segments and the third is found to have no impact on the structure, we conclude that the 3D structure of proteins is statistically conserved for more than 98% of the segments. At the same time, the remaining 2% of the sequences may pose problems for sequence-alignment-based structure prediction methods.
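
The segment selection summarized above (segments longer than 10 AAs that occur at least twice, keeping only those not contained in a longer such segment) can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the `chains` dictionary, and the choice to count occurrences over every position in every chain are assumptions made for illustration, and the brute-force enumeration is only practical for toy inputs.

```python
from collections import defaultdict


def repeated_segments(chains, min_len=11):
    """Return {segment: [(chain_id, start), ...]} for segments seen at least twice.

    `chains` is a hypothetical {chain_id: sequence} mapping; min_len=11
    encodes "length greater than 10 AAs".
    """
    occurrences = defaultdict(list)
    for chain_id, seq in chains.items():
        for length in range(min_len, len(seq) + 1):
            for start in range(len(seq) - length + 1):
                occurrences[seq[start:start + length]].append((chain_id, start))
    return {seg: occ for seg, occ in occurrences.items() if len(occ) >= 2}


def maximal_segments(repeated):
    """Keep only repeated segments that are not substrings of a longer kept one."""
    kept = []
    # Visit long segments first, so every potential container is already kept.
    for seg in sorted(repeated, key=len, reverse=True):
        if not any(seg in longer for longer in kept if len(longer) > len(seg)):
            kept.append(seg)
    return kept


# Toy input: chains A and B share one 16-residue segment.
toy_chains = {
    "A": "MKTAYIAKQRQISFVKSHFSRQ",
    "B": "GGMKTAYIAKQRQISFVKGG",
    "C": "PPPP",
}
toy_result = maximal_segments(repeated_segments(toy_chains))
```

On a database of PDB size, a suffix array or suffix automaton would be the more realistic choice than enumerating every substring.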



Abbreviations

AA: amino acid

PDB: Protein Data Bank

RMSD: root mean square distance

URMSD: root mean square distance for unit vector sequence

3D: three-dimensional


Author information

Corresponding author

Correspondence to Lukasz A. Kurgan.

Additional information

*Jishou Ruan's research was supported by the Liuhui Center for Applied Mathematics, the China-Canada exchange program administered by MITACS, and NSFC (10271061).

#Ke Chen's and Lukasz A. Kurgan's research was partially supported by NSERC Canada.

Jack A. Tuszynski's research was supported by MITACS, NSERC Canada, and the Allard Foundation.

Appendix 1


To prove that an absolutely conserved segment must also be a conserved segment, we start from the definition of URMSD and prove the following statement.

Let

$$ d(\{v_i\},\{w_i\})=\min\nolimits_\phi\left\{{\sqrt{\frac{1}{n} \sum\limits_{i=1}^n {\vert\vert v_i-\phi (w_i)\vert\vert^2}}}\right\} $$

be the URMSD between the two unit vector sequences \(\left\{{v_i}\right\}_{i=1}^n\) and \(\left\{{w_i}\right\}_{i=1}^n\), and let \(d_i\) be the URMSD between the five-unit-vector windows \(v_{i+1},\ldots,v_{i+5}\) and \(\phi (w_{i+1}),\ldots,\phi (w_{i+5})\).

Then \(d(\{v_i\},\{w_i\})\leq \max\{d_i\;\vert \;i=0,1,2,\ldots,n-5\}\) for all \(n\geq 10\).
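
The statement can be explored numerically. The sketch below covers only a special case: it assumes that one fixed \(\phi\) has already been applied (so the second input is \(y_i=\phi(w_i)\)) and compares the distance over the whole segment with the distances over sliding windows of five unit vectors. The function names are hypothetical. When n is a multiple of 5, the non-overlapping windows partition the segment, so in this fixed-\(\phi\) setting the whole-segment mean square is an average of window mean squares and cannot exceed the largest of them.

```python
import numpy as np


def window_distances(v, y, width=5):
    """RMS distance over each sliding window of `width` consecutive unit vectors.

    `y` is assumed to be already transformed, i.e. y_i = phi(w_i) for one
    fixed phi; this function does not optimise phi.
    """
    v = np.asarray(v, dtype=float)
    y = np.asarray(y, dtype=float)
    squared = np.sum((v - y) ** 2, axis=1)  # squared residual per position
    # Moving average over `width` positions; yields n - width + 1 windows.
    means = np.convolve(squared, np.ones(width) / width, mode="valid")
    return np.sqrt(means)


def global_distance(v, y):
    """RMS distance over the whole segment for the same fixed phi."""
    v = np.asarray(v, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean(np.sum((v - y) ** 2, axis=1))))
```

For example, with two arrays of ten unit vectors, `window_distances` returns six values (windows i = 0, ..., 5), and the largest of them is at least `global_distance` for the same inputs.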

Proof. It follows directly that

$$\begin{array}{l} \min_\phi \left\{\sqrt{\frac{1}{n}\sum\limits_{i=1}^n {\vert\vert v_i-\phi (w_i)\vert\vert^2}}\right\}\\ \mathop{=}\limits^{y_i =\phi (w_i)}\min_{\{\phi\}} \left\{{\sqrt{\frac{1}{n}\sum\limits_{i=1}^n {\vert\vert v_i-\phi (w_i)\vert\vert^2}}} \right\}\\ =\min_{\{\phi\}} \left\{{\sqrt{\frac{1}{n}\sum\limits_{i=1}^n {\vert \vert v_i\vert\vert^2+\vert\vert\phi (w_i)\vert\vert^2-2(v_i,\phi (w_i))}}}\right\}\\ =\min_{\{\phi\}} \left\{{\sqrt {\frac{1}{n}\sum\limits_{i=1}^n {[2-2(v_i,\phi (w_i))]}}}\right\}\\ =\min_{\{\phi\}} \left\{\sqrt{2-\frac{2}{n}\sum\limits_{i=1}^n {(v_i,\phi (w_i))}}\right\}\\ \end{array}$$

We can regard \(\sum\limits_{i=1}^n (v_i,\phi (w_i))\) as the trace of the correlation matrix \(R(n,\phi)\), where

$$ R(n,\phi)=\left(\begin{array}{cccc} (v_1,\phi (w_1)) & (v_1,\phi (w_2)) & \ldots & (v_1,\phi (w_n))\\ (v_2,\phi (w_1)) & (v_2,\phi (w_2)) & \ldots & (v_2,\phi (w_n))\\ \vdots & \vdots & \ddots & \vdots\\ (v_n,\phi (w_1)) & (v_n,\phi (w_2)) & \ldots & (v_n,\phi (w_n))\\ \end{array}\right). $$

That is, we have

$$ d(\{v_i\},\{w_i\})=\min\nolimits_{\{\phi\}} \sqrt{\frac{2n-2\times \hbox{trace}(R(n,\phi))}{n}} $$
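
The trace form above indicates that minimising the distance is the same as maximising \(\hbox{trace}(R(n,\phi))\) over \(\phi\). A minimal sketch of one way to compute this follows, assuming that \(\phi\) ranges over proper rotations (no reflections) and using a standard SVD-based optimal-rotation construction; the function names and the use of consecutive coordinate differences to build the unit vectors are assumptions made for illustration, not details taken from the text.

```python
import numpy as np


def unit_vectors(coords):
    """Unit vectors between consecutive points of an (m, 3) coordinate array."""
    diffs = np.diff(np.asarray(coords, dtype=float), axis=0)
    return diffs / np.linalg.norm(diffs, axis=1, keepdims=True)


def optimal_rotation(v, w):
    """Proper rotation Q that maximises sum_i (v_i, Q w_i)."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    h = w.T @ v  # 3x3 matrix sum_i w_i v_i^T, so that trace(Q h) is the objective
    u, _, vt = np.linalg.svd(h)
    # Flip the weakest axis if needed so that det(Q) = +1 (assumption: no reflections).
    sign = np.sign(np.linalg.det(u) * np.linalg.det(vt))
    return vt.T @ np.diag([1.0, 1.0, sign]) @ u.T


def urmsd(v, w):
    """URMSD between two (n, 3) arrays of unit vectors, per the definition of d."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    q = optimal_rotation(v, w)
    residuals = v - w @ q.T  # row i is v_i - Q w_i
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))
```

Because all vectors have unit length, the value returned by `urmsd` agrees with \(\sqrt{2-\frac{2}{n}\hbox{trace}(R(n,\phi))}\) evaluated at the rotation returned by `optimal_rotation`.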

For a fixed ϕ and every pair of five-unit-vector windows \(v_{i+1}, \ldots ,v_{i+5}\) and \(\phi (w_{i+1}),\ldots ,\phi (w_{i+5})\), we have a correlation matrix

$$ R_i (5)=\left(\begin{array}{cccc} (v_{i+1},\phi (w_{i+1})) & (v_{i+1},\phi (w_{i+2})) & \ldots & (v_{i+1},\phi (w_{i+5}))\\ (v_{i+2},\phi (w_{i+1})) & (v_{i+2},\phi (w_{i+2})) & \ldots & (v_{i+2},\phi (w_{i+5}))\\ \vdots & \vdots & \ddots & \vdots\\ (v_{i+5},\phi (w_{i+1})) & (v_{i+5},\phi (w_{i+2})) & \ldots & (v_{i+5},\phi (w_{i+5}))\\ \end{array}\right) $$

Let \(d_i =\sqrt{\hbox{trace}(R_i (5)^\prime R_i (5))}\) for \(i=0,1,2,\ldots ,n-5\). For convenience, we may assume that \(d_0 =\max\{d_i\;\vert \;i=0,1,2,\ldots ,n-5\}\), so that

$$ \sqrt{\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 (v_i,\phi (w_j))^2} \geq\sqrt{\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 (v_{i+k},\phi (w_{j+k}))^2}\hbox{ for all }k\geq 1. $$

Consider the decomposition

$$ \begin{array}{l} \sqrt{\hbox{tr}(R(n)^\prime R(n))} =\sqrt{\sum\limits_{j=1}^n \sum\limits_{i=1}^n (v_i,\phi (w_j))^2} =\sqrt{\sum\limits_{j=1}^n \left(\sum\limits_{i=1}^5 (v_i,\phi (w_j))^2+\sum\limits_{i=6}^n (v_i,\phi (w_j))^2\right)}\\ =\sqrt{\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 (v_i,\phi (w_j))^2+ \sum\limits_{j=6}^n\sum\limits_{i=1}^5 (v_i,\phi (w_j))^2+ \sum\limits_{j=1}^5\sum\limits_{i=6}^n (v_i,\phi (w_j))^2+ \sum\limits_{j=6}^n\sum\limits_{i=6}^n (v_i,\phi (w_j))^2}.\\ \end{array} $$

By a general property of protein structure, for most AAs the state at site i is not frequently correlated with the state at site j when the two sites are more than 5 AAs apart. That is, we may assume that \(R(n)^\prime R(n)\) satisfies the following relations:

  • The size of the set \(\left\{k\;\vert \;\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 (v_i,\phi (w_j))^2 <\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 (v_{i+k},\phi (w_j))^2 \right\}\) is very small relative to n − 5.

  • The size of the set \(\left\{k\;\vert \;\sum\limits_{i=1}^5 \sum\limits_{j=1}^5 (v_i,\phi (w_j))^2 <\sum\limits_{i=1}^5 \sum\limits_{j=1}^5 (v_i,\phi (w_{j+k}))^2 \right\}\) is also very small relative to n − 5.

  • \((v_i,\phi (w_j))^2>(v_{i+k},\phi (w_j))^2\) and \((v_i,\phi (w_j))^2>(v_i,\phi (w_{j+k}))^2\) for all \(i,j\leq 5\) and almost all k > 6.

Then, writing \(y_i =\phi (w_i)\), we have

  • \(\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 (v_i,y_j)^2 \geq\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 (v_{i+k},y_j)^2\),

  • \(\sum\limits_{i=1}^5 \sum\limits_{j=1}^5 (v_i,y_j)^2 \geq\sum\limits_{i=1}^5 \sum\limits_{j=1}^5 (v_i,y_{j+k})^2\),

  • \(\sum\limits_{i=1}^5 \sum\limits_{j=1}^5 (v_i,y_j)^2 \geq\sum\limits_{i=1}^5 \sum\limits_{j=1}^5 (v_{i+k},y_{j+k})^2\)

for all k > 6. Without loss of generality, we may assume that \(n\equiv 0\;(\hbox{mod}\;5)\), and then we have

$$ \begin{array}{l} \frac{1}{n}\sqrt{\hbox{tr}(R(n)^{\prime}R(n))} =\sqrt{\frac{1}{n^2} \sum\limits_{j=1}^n\sum\limits_{i=1}^n {(v_i,y_j)^2}}\\ =\sqrt{\frac{1}{n^2}\left({\sum\limits_{j=1}^5\sum\limits_{i=1}^5 {(v_i,y_j)^2+}\sum\limits_{j=6}^n\sum\limits_{i=1}^5 {(v_i,y_j)^2+ \sum\limits_{j=1}^5\sum\limits_{i=6}^n {(v_i,y_j)^2+ \sum\limits_{j=6}^n\sum\limits_{i=6}^n {(v_i,y_j)^2}}}}\right)}\\ =\sqrt{\frac{1}{n^2}\left({\frac{n}{5}\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 {(v_i,y_j)^2+}\sum\limits_{j=6}^n\sum\limits_{i=1}^5 {(v_i,y_j)^2+\sum\limits_{j=1}^5\sum\limits_{i=6}^n {(v_i,y_j)^2}}} \right)}\\ \quad \leq\sqrt{\frac{1}{n^2}\left({\frac{n}{5}\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 {(v_i,y_j)^2+} \frac{n-5}{5}\sum\limits_{i=1}^5 {(v_i,y_j)^2+\frac{n-5}{5}\sum\limits_{j=1}^5 {(v_i,y_j)^2}}}\right)}\\ \quad \leq\sqrt{\frac{1}{n^2}\left({\frac{n}{5}\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 {(v_i,y_j)^2+} \frac{n-5}{5}\sum\limits_{i=1}^5 {\sum\limits_{j=1}^5 {(v_i,y_j)^2}}}\right)}\\ =\sqrt{\frac{2n-5}{5n^2}\left({\sum\limits_{j=1}^5\sum\limits_{i=1}^5 {(v_i,y_j)^2}}\right)} <\sqrt {\frac{2}{5n}\left({\sum\limits_{j=1}^5 \sum\limits_{i=1}^5 {(v_i,y_j)^2}} \right)} \mathop{\leq} \limits^{n\geq 10} \frac{1}{5}\sqrt {\hbox{trace}(R(5)^{\prime}R(5))}\\ \end{array} $$
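
As a check on the last two steps of the chain above, write \(S_5\) (a shorthand introduced here only for brevity) for \(\sum\limits_{j=1}^5\sum\limits_{i=1}^5 (v_i,y_j)^2=\hbox{trace}(R(5)^{\prime}R(5))\). The arithmetic behind the condition \(n\geq 10\) is then:

$$ \begin{array}{l} \frac{2n-5}{5n^2}=\frac{2}{5n}-\frac{1}{n^2}<\frac{2}{5n},\\ \frac{2}{5n}\leq \frac{1}{25}\;\Longleftrightarrow\; 50\leq 5n\;\Longleftrightarrow\; n\geq 10,\\ \hbox{so that }\sqrt{\frac{2}{5n}S_5}\leq \sqrt{\frac{1}{25}S_5}=\frac{1}{5}\sqrt{S_5}\hbox{ for }n\geq 10.\\ \end{array} $$

At n = 10 the second bound holds with equality, so the overall inequality remains strict only because of the first, strict step.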

That is, we have proved that

$$ \frac{\sqrt{\sigma_1 (n)^2+\sigma_2 (n)^2+\cdots +\sigma_n(n)^2}} {n}<\frac{\sqrt{\sigma_1 (5)^2+\sigma_2 (5)^2+\cdots +\sigma_5 (5)^2}}{5}\hbox{ if }n\geq 10. $$

where \(\sigma_i (j)\), for \(j=5,n\) and \(i\leq j\), are the singular values of R(5) and R(n), respectively. Replacing R(n) and R(5) by their entrywise “square roots”:

$$ R^{\frac{1}{2}}(n)=\left(\begin{array}{cccc} \sqrt{\vert (v_1,y_1)\vert} & \sqrt{\vert (v_1,y_2)\vert} & \ldots & \sqrt{\vert (v_1,y_n)\vert}\\ \sqrt{\vert (v_2,y_1)\vert} & \sqrt{\vert (v_2,y_2)\vert} & \ldots & \sqrt{\vert (v_2,y_n)\vert}\\ \vdots & \vdots & \ddots & \vdots\\ \sqrt{\vert (v_n,y_1)\vert} & \sqrt{\vert (v_n,y_2)\vert} & \ldots & \sqrt{\vert (v_n,y_n)\vert}\\ \end{array}\right) $$

and with the same argument, we have

$$ \frac{\sigma_1 (n)+\sigma_2 (n)+\cdots +\sigma_n (n)}{n}< \frac{\sigma_1(5)+\sigma_2 (5)+\cdots +\sigma_5 (5)}{5} \quad \hbox{ if }n\geq 10. $$

That is, \(\frac{\hbox{trace}(svd\;R(5))}{5}>\frac{\hbox{trace}(svd\;R(n))}{n}\) for \(n\geq 10\).

Therefore, the maximal URMSD among all five-unit-vector windows is greater than the URMSD of the whole segment. This ends the proof.


Cite this article

Ruan, J., Chen, K., Tuszynski, J.A. et al. Quantitative Analysis of the Conservation of the Tertiary Structure of Protein Segments. Protein J 25, 301–315 (2006). https://doi.org/10.1007/s10930-006-9016-5
