The Protein Journal, Volume 25, Issue 5, pp 301–315

Quantitative Analysis of the Conservation of the Tertiary Structure of Protein Segments

  • Jishou Ruan
  • Ke Chen
  • Jack A. Tuszynski
  • Lukasz A. Kurgan

DOI: 10.1007/s10930-006-9016-5

Cite this article as:
Ruan, J., Chen, K., Tuszynski, J.A. et al. Protein J (2006) 25: 301. doi:10.1007/s10930-006-9016-5


The published crystallographic structure of the protein calmodulin suggests that many protein sequence segments may adopt multiple 3D structures; we call such segments multi-structural segments. To test this, this paper presents a statistical analysis of the uniqueness of the 3D structure of all protein sequence segments in the Protein Data Bank (PDB, January 2003, release 103) that occur at least twice and are longer than 10 amino acids (AAs). We refined this set by keeping only segments that are not parts of longer segments, which yielded 9297 segments, called the sponge set. Adding 8197 signature segments, which occur only once in the PDB, to the sponge set produced a benchmark set. Statistical analysis of the sponge set shows that the rotating, missing and disarranging operations described in the text cause segments to become multi-structural. However, missing segments turn out not to change the shape of the 3D structure of a multi-structural segment. We use the root mean square distance for unit vector sequence (URMSD) as an improved measure to characterize rotating (hinge), missing and disarranging segments. We estimated the number of rotating and disarranging segments in the sponge set and divided it by the number of sequences in the benchmark set; the resulting rate is below 0.85%. Since two of the structure-changing operations affect a negligible number of segments and the third does not affect the structure, we conclude that the 3D structure of proteins is statistically conserved for more than 98% of the segments. At the same time, the remaining 2% of the segments may pose problems for structure prediction methods based on sequence alignment.
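The abstract names URMSD but does not define it. The sketch below assumes a definition in the spirit of unit-vector RMS comparisons: each segment is reduced to the unit vectors between consecutive Cα atoms, and the two vector sequences are compared after the best rotation, found here with the Kabsch (SVD) method. The function names `unit_vectors` and `urmsd` are hypothetical, and the paper's exact normalization may differ. Because only directions are compared, a rigidly moved copy of a segment scores (numerically) zero, whereas a hinge rotation between two parts of a segment typically does not.

```python
import numpy as np


def unit_vectors(ca: np.ndarray) -> np.ndarray:
    """Unit vectors between consecutive C-alpha atoms (rows of an (n, 3) array)."""
    diffs = np.diff(ca, axis=0)                       # (n-1, 3) virtual-bond vectors
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    return diffs / norms                              # scale each vector to length 1


def urmsd(ca_a: np.ndarray, ca_b: np.ndarray) -> float:
    """Unit-vector RMSD of two equal-length segments after optimal rotation.

    Assumed definition (hypothetical, not taken from the paper):
        URMSD = min over rotations R of sqrt(mean_i |R u_i - v_i|^2),
    where u_i and v_i are the unit vectors of the two segments.
    No translation is needed because direction vectors do not depend on position.
    """
    u, v = unit_vectors(ca_a), unit_vectors(ca_b)
    if u.shape != v.shape:
        raise ValueError("segments must have the same length")
    h = u.T @ v                                       # 3x3 correlation matrix sum_i u_i v_i^T
    left, s, right_t = np.linalg.svd(h)
    # If the best orthogonal map is a reflection, flip the smallest singular value.
    d = np.sign(np.linalg.det(right_t.T @ left.T))
    best_trace = s[0] + s[1] + d * s[2]               # max of trace(R h) over rotations R
    m = len(u)
    # Since |u_i| = |v_i| = 1: sum_i |R u_i - v_i|^2 = 2m - 2 * trace(R h).
    msd = max(2.0 * m - 2.0 * best_trace, 0.0) / m
    return float(np.sqrt(msd))
```

For example, `urmsd(ca, ca @ rot.T + shift)` for any rotation matrix `rot` and translation `shift` is close to 0, while comparing a non-planar segment with its mirror image gives a clearly positive value.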


Keywords
Multi-structural segments; protein structure; protein structure comparison; protein structure conservation; URMSD


Abbreviations
AA: amino acid
PDB: Protein Data Bank
RMSD: root mean square distance
URMSD: root mean square distance for unit vector sequence

Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Jishou Ruan (1)
  • Ke Chen (4)
  • Jack A. Tuszynski (2, 3)
  • Lukasz A. Kurgan (4)
  1. Chern Institute of Mathematics, College of Mathematical Science & LPMC, Nankai University, Tianjin, P. R. China
  2. Department of Physics, University of Alberta, Edmonton, Canada
  3. Department of Experimental Oncology, Cross Cancer Institute, Edmonton, Canada
  4. Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
