Effect of simple ensemble methods on protein secondary structure prediction

Bouziane, Hafida; Messabih, Belhadri; Chouarfia, Abdallah

doi:10.1007/s00500-014-1355-0

Methodologies and Application
Published: 19 August 2014

Volume 19, pages 1663–1678 (2015)

Soft Computing

Abstract

Ensemble methods for building improved classifier models have been an important topic in machine learning, pattern recognition and data mining, where they have shown great promise. Their robustness has driven their application to many practical classification problems, especially when the ensemble members are significantly diverse. They have replaced traditional machine learning techniques in many applications, and special attention has been devoted to them as a means of improving prediction accuracy on highly complex problems. Several combination rules have been investigated in this context. However, it is claimed that no single rule is always better than the others for reaching an optimal decision. The present study evaluates the performance of two different ensemble methods for protein secondary structure prediction. We focus on weighted opinion pooling and the most common aggregation rules for decision inference. The ensemble members are accurate single-model protein secondary structure predictors, namely Multi-Class Support Vector Machines and Artificial Neural Networks. Experiments are carried out using cross-validation tests on the RS126 and CB513 benchmark datasets. Our results clearly confirm that ensembles are more accurate than a single model. The experimental comparison of the investigated ensemble schemes shows that the newly introduced rule, called the Exponential Opinion Pool, competes well against state-of-the-art fixed rules, especially the sum rule, which in some cases is able to achieve better performance.
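To make the combination rules named in the abstract more concrete, the following is a minimal Python sketch of the textbook forms of four of them: the Linear Opinion Pool (LinOP), the Logarithmic Opinion Pool (LogOP), the unweighted sum rule and majority vote (MV). It is an illustration under stated assumptions, not the authors' implementation. The three-class labels H/E/C, the member posteriors, the weights and all function names are hypothetical. The abstract does not define the Exponential Opinion Pool, so it is not sketched here.

```python
import math
from collections import Counter

# Assumed three-state labelling: helix, strand, coil.
CLASSES = ("H", "E", "C")


def normalize(values):
    """Rescale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def lin_op(posteriors, weights):
    """Linear Opinion Pool: weighted arithmetic mean of member posteriors."""
    w = normalize(weights)
    n_classes = len(posteriors[0])
    return [sum(w[m] * p[k] for m, p in enumerate(posteriors))
            for k in range(n_classes)]


def log_op(posteriors, weights, eps=1e-12):
    """Logarithmic Opinion Pool: weighted geometric mean, renormalised."""
    w = normalize(weights)
    n_classes = len(posteriors[0])
    pooled = [math.exp(sum(w[m] * math.log(max(p[k], eps))
                           for m, p in enumerate(posteriors)))
              for k in range(n_classes)]
    return normalize(pooled)


def sum_rule(posteriors):
    """Fixed sum rule: LinOP with equal weights for every member."""
    return lin_op(posteriors, [1.0] * len(posteriors))


def decide(pooled):
    """Return the label with the highest pooled score."""
    best = max(range(len(pooled)), key=pooled.__getitem__)
    return CLASSES[best]


def majority_vote(posteriors):
    """Each member votes for its top class; ties go to the first class seen."""
    votes = Counter(decide(p) for p in posteriors)
    return votes.most_common(1)[0][0]


# Hypothetical posteriors for one residue from three ensemble members.
member_posteriors = [
    [0.50, 0.30, 0.20],
    [0.20, 0.50, 0.30],
    [0.40, 0.35, 0.25],
]
# Hypothetical member weights, e.g. derived from validation accuracy.
member_weights = [0.5, 0.2, 0.3]

linop_label = decide(lin_op(member_posteriors, member_weights))
logop_label = decide(log_op(member_posteriors, member_weights))
sum_label = decide(sum_rule(member_posteriors))
mv_label = majority_vote(member_posteriors)
```

With these made-up numbers the weighted pools and the unweighted sum rule do not pick the same class, which illustrates why the choice of rule and of weights can matter for the final decision.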

Notes

Abbreviations

ANN: Artificial Neural Networks
BLAST: Basic Local Alignment Search Tool
BLOSUM: BLOck SUbstitution Matrix
ExpOP: Exponential Opinion Pool
FNN: Feed-Forward Neural Network
IFS: Ideal fold selection
LinOP: Linear Opinion Pool
LogOP: Logarithm Opinion Pool
MLP: Multi-Layer Perceptron
M-SVM: Multi-Class Support Vector Machines
MV: Majority vote
PSI-BLAST: Position-Specific Iterative BLAST
PSSP: Protein secondary structure prediction
RBFNN: Radial Basis Function Neural Network
SVM: Support Vector Machines
WMax: Weighted Max
WMin: Weighted Min

Author information

Authors and Affiliations

Department of Computer Science, USTO-MB University, BP 1505, El M’naouer, Oran, Algeria
Hafida Bouziane, Belhadri Messabih & Abdallah Chouarfia

Corresponding author

Correspondence to Hafida Bouziane.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Bouziane, H., Messabih, B. & Chouarfia, A. Effect of simple ensemble methods on protein secondary structure prediction. Soft Comput 19, 1663–1678 (2015). https://doi.org/10.1007/s00500-014-1355-0

Published: 19 August 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s00500-014-1355-0
