On the Convergence of Protein Structure and Dynamics. Statistical Learning Studies of Pseudo Folding Pathways

  • Alessandro Vullo
  • Andrea Passerini
  • Paolo Frasconi
  • Fabrizio Costa
  • Gianluca Pollastri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4973)


Many algorithms that attempt to predict proteins’ native structure from sequence need to generate a large set of hypotheses in order to ensure that nearly correct structures are included, which leads to the problem of assessing the quality of alternative 3D conformations. This problem has mostly been approached by focusing on the final 3D conformation, with machine learning techniques playing a leading role. We argue in this paper that additional information for recognising native-like structures can be obtained by regarding the final conformation as the result of a generative process reminiscent of the folding process that generates structures in nature. We introduce a coarse representation of protein pseudo-folding based on binary trees and define a kernel function for assessing the similarity of such trees. Kernel-based analysis techniques empirically demonstrate a significant correlation between the information contained in pseudo-folding trees and features of native folds in a large and non-redundant set of proteins.
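The abstract does not specify how the pseudo-folding trees are built or which kernel is used, so the following is only a minimal sketch of the general idea of comparing binary trees with a kernel. Everything in it is an assumption made for illustration: the `Node` class, the leaf labels `"H"` and `"E"`, and the choice of a simple kernel that counts pairs of identical subtrees. It should not be read as the authors' method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Node of a hypothetical pseudo-folding tree (illustrative only).

    A leaf carries a label; an internal node stands for the merge of
    two previously formed substructures.
    """
    label: str = ""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def subtree_signatures(node: Node, bag: Counter) -> str:
    """Record a string signature for every subtree under `node` in `bag`.

    Returns the signature of `node` itself. Child order is preserved,
    which is an assumption of this sketch.
    """
    if node.left is None and node.right is None:
        sig = node.label
    else:
        parts = [subtree_signatures(child, bag)
                 for child in (node.left, node.right) if child is not None]
        sig = "(" + ",".join(parts) + ")"
    bag[sig] += 1
    return sig


def tree_kernel(a: Node, b: Node) -> int:
    """Count pairs of identical subtrees, one taken from each tree.

    This is the dot product of the two subtree-count vectors, so it is
    a valid (positive semidefinite) kernel.
    """
    bag_a, bag_b = Counter(), Counter()
    subtree_signatures(a, bag_a)
    subtree_signatures(b, bag_b)
    return sum(count * bag_b[sig] for sig, count in bag_a.items())


def normalized_kernel(a: Node, b: Node) -> float:
    """Cosine-normalised kernel, equal to 1 for identical trees."""
    return tree_kernel(a, b) / (tree_kernel(a, a) * tree_kernel(b, b)) ** 0.5


# Two toy trees over the same three leaves, merged in a different order.
H, E = Node("H"), Node("E")
t1 = Node(left=Node(left=H, right=E), right=H)
t2 = Node(left=H, right=Node(left=H, right=E))

print(tree_kernel(t1, t2))        # 6
print(tree_kernel(t1, t1))        # 7
print(normalized_kernel(t1, t2))  # 6/7, about 0.857
```

The two toy trees share their leaves and the `(H,E)` subtree but differ at the root, so the normalised similarity is high but below 1. A kernel of this kind could then be passed to any kernel-based analysis method; which methods and which kernel the paper actually uses is not stated in the abstract.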


Keywords: Binary Tree, Conformational Space, Protein Structure Prediction, Folding Process, Folding Pathway





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Alessandro Vullo (1)
  • Andrea Passerini (2)
  • Paolo Frasconi (2)
  • Fabrizio Costa (2)
  • Gianluca Pollastri (1)

  1. School of Computer Science and Informatics, University College Dublin, Dublin 4, Ireland
  2. Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy
