Background

Protein structural alignment plays a key role in defining gold standards for a variety of bioinformatics applications, including homology assessment, phylogenetic tree construction and multiple sequence alignment evaluation. Our recent findings [1], however, showed that superposition methods are rather sensitive to structural variation. To sidestep the problem of alignment variability, gold standards are often derived from the more conserved and 'trusted' regions. It therefore remains unclear which structural elements characterize alignment variability and what functional information the discarded flexible regions carry.

Methods

The dataset was taken from Pirovano et al. [1] and consists of 565 proteins for which a total of 2998 alternative crystal structures with redundant sequences were available. Structural alignments were made using DALI [2].

Results

In this study we shed more light on the structural features and functional importance associated with flexible alignment regions. We observe that helices and coils constitute the main source of alignment variability (around 60% and 30%, respectively), while strands appear to be more robust (see Figure 1A). Additional alignment inspection shows that many secondary structure elements are not consistently aligned, thus giving rise to mismatches between secondary structure types. Functional investigation using Prosite [3] reveals that roughly 20% of all flexible alignment positions correspond to functional sites (see Figure 1B), a proportion similar to that of stably aligned regions. Interestingly, post-translational modification sites are strongly represented, and phosphorylation sites in particular are prominent. It is therefore unwarranted to assume that these flexible regions play only a minor role in protein function. An example of how the alignment of structural motifs can be affected by tiny structural variations is given in Figure 2, which shows the alignment between a Glutaminyl-tRNA synthetase and a Caspase-8.

Figure 1

Structure and function analysis of alignment variability, measured as the standard deviation of residue shifts over an ensemble of alternative alignments (sigma). (A) The fraction of helical residues increases with higher variability at the cost of the fraction of coil residues. (B) Functional sites (as grouped by Prosite) are evenly distributed over a large range of variability.
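The variability measure sigma used in Figure 1 can be illustrated with a minimal sketch. The exact definition of a residue shift is given in [1] and is not restated here, so the code below rests on assumptions: each alignment is represented as a list that maps every master residue position to the index of its aligned partner residue (or None for a gap), and a shift is taken to be the difference between the partner index in an alternative alignment and the partner index in a chosen reference alignment. The function name `residue_shift_sigma` and this data layout are hypothetical and serve illustration only.

```python
from statistics import pstdev
from typing import List, Optional, Sequence

Alignment = Sequence[Optional[int]]  # master position -> partner index, None = gap


def residue_shift_sigma(
    reference: Alignment, alternatives: Sequence[Alignment]
) -> List[Optional[float]]:
    """Per-position standard deviation of residue shifts (assumed definition).

    For each master position, the shift of an alternative alignment is the
    partner index in that alignment minus the partner index in the reference.
    Positions that are gapped in the reference, or in every alternative,
    yield None.
    """
    sigmas: List[Optional[float]] = []
    for pos, ref_partner in enumerate(reference):
        if ref_partner is None:
            sigmas.append(None)
            continue
        # Collect shifts from all alternatives that align this position.
        shifts = [
            alt[pos] - ref_partner
            for alt in alternatives
            if alt[pos] is not None
        ]
        # Population standard deviation over the ensemble of alternatives.
        sigmas.append(pstdev(shifts) if shifts else None)
    return sigmas


if __name__ == "__main__":
    # Toy data, not taken from the study.
    reference = [0, 1, 2, None]
    alternatives = [[0, 1, 2, 3], [0, 2, 4, None], [0, 1, None, 5]]
    print(residue_shift_sigma(reference, alternatives))
```

Under these assumptions, positions with sigma equal to zero are stably aligned across the ensemble, while larger values mark flexible alignment positions. Binning positions by sigma and counting secondary structure types or Prosite sites per bin would give summaries of the kind shown in panels A and B.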

Figure 2

Dramatic variations in the alignment between a Glutaminyl-tRNA synthetase (1qtq) and a Caspase-8 (1qtn). For 1qtq, the flexible alignment regions contain two PKC phosphorylation sites ('SKR' and 'TDK') in a helix and a myristoylation site ('GNKWCI') in a coil region. (A) The 'master-slave' alignments with 1qtn as master, secondary structures at the top (1qtn) and the bottom (1qtq), functional sites marked in apposite windows. (B) The different Glutaminyl-tRNA synthetase and Caspase-8 structures with 1qtq and alternatives in orange and 1qtn and alternatives in blue. Functional motifs are marked. Image rendered using SwissPDBViewer [4] and PovRay [5].

Conclusion

Our results imply that the current 'gold' standard status of structural alignment should be considered 'silver'. In particular, our observation that helices are associated with flexible alignment regions is at odds with currently used alignment strategies. Moreover, given that functional importance is spread evenly between stably and flexibly aligned regions, we conclude that flexible regions cannot be excluded from the analysis of protein function. To explore new strategies for homology detection, phylogeny and alignment, we propose that, as a first step, more gold standards be developed that represent the structural, functional and evolutionary signals more comprehensively.