Finding Longest Common Segments in Protein Structures in Nearly Linear Time

  • Yen Kaow Ng
  • Hirotaka Ono
  • Ling Ge
  • Shuai Cheng Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7354)


The Local/Global Alignment method (Zemla, 2003), or LGA, is a popular method for comparing protein structures. One of the two components of LGA requires computing the longest common contiguous segments between two protein structures. That is, given two structures \(A = (a_1, \ldots, a_n)\) and \(B = (b_1, \ldots, b_n)\), where \(a_k, b_k \in \mathbb{R}^3\), we are to find, among all pairs of segments \(f = (a_i, \ldots, a_j)\) and \(g = (b_i, \ldots, b_j)\) that fulfill a given similarity criterion, those of maximum length. We consider two criteria: (1) the root mean square deviation (RMSD) between \(f\) and \(g\) is at most a given \(t \in \mathbb{R}\); (2) \(f\) and \(g\) can be superposed such that \(\|a_k - b_k\| \le t\) for each \(k\), \(i \le k \le j\), for a given \(t \in \mathbb{R}\). For the first criterion, we give an algorithm of \(O(n \log n + n l)\) time complexity, where \(l\) is the maximum length of the segments fulfilling the criterion. We also show an FPTAS which, for any real \(\varepsilon > 0\), finds a segment of length at least \(l\), but of RMSD up to \((1 + \varepsilon)t\), in \(O(n \log n + n/\varepsilon)\) time. For the second criterion, we propose an FPTAS which, for any given real \(\varepsilon > 0\), finds all the segments \(f\) and \(g\) of maximum length that can be superposed such that \(\|a_k - b_k\| \le (1 + \varepsilon)t\) for each \(k\), \(i \le k \le j\), thus fulfilling the criterion approximately. This algorithm has a time complexity of \(O(n \log^2 n / \varepsilon^5)\) when consecutive points in \(A\) are separated by the same distance (which is the case with protein structures).
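To make criterion (1) concrete, the following is a minimal brute-force sketch, not the near-linear algorithm of the paper. It assumes that the RMSD is taken after optimal rigid superposition (rotation and translation, reflections excluded) and evaluates it with the singular-value form of Kabsch's solution. The function names `superposed_rmsd` and `longest_common_segments` are hypothetical, NumPy is an assumed dependency, and the search performs a quadratic number of RMSD evaluations, so it is useful only as a reference baseline on small inputs.

```python
import numpy as np


def superposed_rmsd(P, Q):
    """RMSD between two k x 3 point sets after optimal rigid superposition.

    Assumption: the superposition allows rotation and translation only
    (no reflection), as in Kabsch's least-squares solution.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the optimal translation by centering both sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # 3 x 3 covariance matrix and its singular values (sorted descending).
    H = Pc.T @ Qc
    s = np.linalg.svd(H, compute_uv=False)
    # A negative determinant means the unconstrained optimum is a reflection;
    # flipping the sign of the smallest singular value excludes it.
    if np.linalg.det(H) < 0:
        s[-1] = -s[-1]
    # Minimum residual sum of squares over all proper rotations.
    residual = (Pc ** 2).sum() + (Qc ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(residual, 0.0) / len(P)))


def longest_common_segments(A, B, t):
    """Brute-force baseline for criterion (1).

    Returns (length, hits), where hits lists half-open index pairs (i, j)
    such that A[i:j] and B[i:j] have superposed RMSD at most t and
    j - i is the maximum such length.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = len(A)
    # Try lengths from longest to shortest; the first length with a hit is maximal.
    for length in range(n, 0, -1):
        hits = [
            (i, i + length)
            for i in range(n - length + 1)
            if superposed_rmsd(A[i:i + length], B[i:i + length]) <= t
        ]
        if hits:
            return length, hits
    return 0, []
```

Every candidate length is tested explicitly because a subsegment of a qualifying segment need not itself satisfy the RMSD bound, so the criterion cannot be assumed monotone in segment length. Segments are reported as half-open index pairs `(i, j)` in 0-based indexing.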


Time Complexity, Root Mean Square Deviation, Longe Length, Protein Structure Prediction, Consecutive Point
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Arun, K.S., Huang, T.S., Blostein, S.D.: Least-squares fitting of two 3-d point sets. IEEE Trans. Pattern Anal. Mach. Intell. 9(5), 698–700 (1987)
  2. Bowie, J.U., Luthy, R., Eisenberg, D.: A method to identify protein sequences that fold into a known 3-dimensional structure. Science 253(5016), 164–170 (1991)
  3. Bryant, S.H., Altschul, S.F.: Statistics of sequence-structure threading. Current Opinion in Structural Biology 5(2), 236–244 (1995)
  4. Choi, V., Goyal, N.: A Combinatorial Shape Matching Algorithm for Rigid Protein Docking. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 285–296. Springer, Heidelberg (2004)
  5. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press (2009)
  6. Cristobal, S., Zemla, A., Fischer, D., Rychlewski, L., Elofsson, A.: A study of quality measures for protein threading models. BMC Bioinformatics 2(5) (2001)
  7. Jones, D.T., Taylor, W.R., Thornton, J.M.: A new approach to protein fold recognition. Nature 358, 86–89 (1992)
  8. Kabsch, W.: A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A 32(5), 922–923 (1976)
  9. Kabsch, W.: A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A 34(5), 827–828 (1978)
  10. Rychlewski, L., Fischer, D., Elofsson, A.: LiveBench-6: large-scale automated evaluation of protein structure prediction servers. Proteins 53(suppl. 6), 542–547 (2003)
  11. Li, S.C., Bu, D., Xu, J., Li, M.: Finding nearly optimal GDT scores. J. Comput. Biol. 18(5), 693–704 (2011)
  12. Siew, N., Elofsson, A., Rychlewski, L., Fischer, D.: MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 16(9), 776–785 (2000)
  13. Simons, K.T., Kooperberg, C., Huang, E., Baker, D.: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268(1), 209–225 (1997)
  14. Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13(4), 376–380 (1991)
  15. Wu, S., Skolnick, J., Zhang, Y.: Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biology 5(17) (2007)
  16. Zemla, A.: LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Research 31(13), 3370–3374 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yen Kaow Ng (1)
  • Hirotaka Ono (2)
  • Ling Ge (3)
  • Shuai Cheng Li (4)

  1. Department of Computer Science, Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Malaysia
  2. Department of Economic Engineering, Faculty of Economics, Kyushu University, Japan
  3. College of Business, University of Massachusetts Dartmouth, North Dartmouth, USA
  4. Department of Computer Science, City University of Hong Kong, Hong Kong
