Quantitative Biology, Volume 6, Issue 4, pp 307–312

Selecting near-native protein structures from ab initio models using ensemble clustering

  • Li Li
  • Huanqian Yan
  • Yonggang Lu
Research Article



Ab initio protein structure prediction aims to predict the tertiary structure of a protein from its amino acid sequence alone. It is an important topic in bioinformatics, and considerable effort has been devoted to designing ab initio methods. However, because no perfect energy function is available, selecting a good near-native structure from the predicted decoy structures in the final step remains a difficult task.


Here we propose an ensemble clustering method based on k-medoids to deal with this problem. The k-medoids method is run many times to generate a clustering ensemble, and a voting method is then used to combine the clustering results. A confidence score that considers both the cluster size and the cluster similarity is defined to select the final near-native model.
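The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the voting rule or the confidence score, so the sketch assumes a co-association (majority) vote over repeated k-medoids runs and a hypothetical score equal to the cluster's size fraction times its mean pairwise similarity, with similarity taken as 1 / (1 + distance). The input `D` is assumed to be a precomputed pairwise distance matrix between decoys (for example RMSD), and all function names are illustrative.

```python
import numpy as np

def k_medoids(D, k, rng, max_iter=100):
    """Plain alternating k-medoids on a precomputed distance matrix D."""
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)  # random initial medoids
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # New medoid: member with the smallest total distance to the others.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)

def co_association(D, k, n_runs, seed=0):
    """Fraction of runs in which each pair of decoys shares a cluster."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    C = np.zeros((n, n))
    for _ in range(n_runs):
        labels = k_medoids(D, k, rng)
        C += labels[:, None] == labels[None, :]
    return C / n_runs

def consensus_clusters(C, threshold=0.5):
    """Assumed voting rule: link pairs co-clustered in a majority of runs,
    then take connected components as the consensus clusters."""
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((C[i] > threshold) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def select_model(D, k=3, n_runs=50, seed=0):
    """Return (index of selected decoy, confidence score of its cluster)."""
    labels = consensus_clusters(co_association(D, k, n_runs, seed))
    S = 1.0 / (1.0 + D)  # hypothetical similarity derived from distance
    best_score, best_model = -1.0, -1
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        block = np.ix_(members, members)
        # Hypothetical confidence: size fraction times mean similarity
        # (the mean includes each decoy's self-similarity of 1).
        score = (members.size / len(D)) * S[block].mean()
        if score > best_score:
            medoid = members[np.argmin(D[block].sum(axis=1))]
            best_score, best_model = score, int(medoid)
    return best_model, best_score
```

Calling `select_model(D, k=3, n_runs=50)` on a decoy distance matrix returns the medoid of the highest-scoring consensus cluster. Under these assumptions a large, tight cluster is preferred over a small or diffuse one, which mirrors the abstract's statement that the score considers both cluster size and cluster similarity.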


We applied the method to 54 single-domain targets in CASP-11. For about 70.4% of these targets, the proposed method selected better near-native structures than the SPICKER method used by the I-TASSER server.


The experiments show that the proposed method is effective at selecting near-native structures from decoy sets for different targets, as measured by the similarity between the selected structure and the native structure.


near-native structure; protein structure prediction; ab initio; decoy; ensemble clustering; k-medoids



This work is supported by the National Key R&D Program of China (No. 2017YFE0111900), and the Lanzhou Talents Program for Innovation and Entrepreneurship (No. 2016-RC-93).



Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. School of Information Science and Engineering, Lanzhou University, Lanzhou, China
