Hierarchical classification of protein function with ensembles of rules and particle swarm optimisation

Holden, Nicholas; Freitas, Alex A.

doi:10.1007/s00500-008-0321-0

Focus
Published: 27 May 2008

Volume 13, pages 259–272, (2009)

Soft Computing

Abstract

This paper focuses on hierarchical classification problems where the classes to be predicted are organized in the form of a tree. The standard top-down divide-and-conquer approach for hierarchical classification consists of building a hierarchy of classifiers, where a classifier is built for each internal (non-leaf) node in the class tree. Each classifier discriminates only between its child classes. After the tree of classifiers is built, the system uses them to classify test examples one class level at a time, so that once an example is assigned a class at a given level, only the child classes of that class need to be considered at the next level. This approach has the drawback that, if a test example is misclassified at a certain class level, it will be misclassified at deeper levels too. In this paper we propose hierarchical classification methods to mitigate this drawback. More precisely, we propose a method called hierarchical ensemble of hierarchical rule sets (HEHRS), where different ensembles are built at different levels in the class tree and each ensemble consists of different rule sets built from training examples at different levels of the class tree. We also use a particle swarm optimisation (PSO) algorithm to optimise the rule weights used by HEHRS to combine the predictions of different rules into a class to be assigned to a given test example. In addition, we propose a variant of a method to mitigate the aforementioned drawback of top-down classification. These three types of methods are compared against the standard top-down hierarchical classification method on six challenging bioinformatics datasets, involving the prediction of protein function. Overall, HEHRS with the rule weights optimised by the PSO algorithm obtains the best predictive accuracy of the four types of hierarchical classification method.
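The standard top-down procedure described in the abstract, and the error propagation it suffers from, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class tree, the threshold-based toy classifiers, and the names `CLASS_TREE`, `TOY_CLASSIFIERS` and `top_down_classify` are all hypothetical, chosen only to show one classifier per internal node being applied level by level.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical class tree: each internal (non-leaf) node maps to its child classes.
CLASS_TREE: Dict[str, List[str]] = {
    "root": ["A", "B"],
    "A": ["A.1", "A.2"],
    "B": ["B.1", "B.2"],
}

Classifier = Callable[[Sequence[float]], str]

# Toy stand-ins for trained classifiers, one per internal node.
# Each one discriminates only between the children of its own node.
TOY_CLASSIFIERS: Dict[str, Classifier] = {
    "root": lambda x: "A" if x[0] < 0.5 else "B",
    "A": lambda x: "A.1" if x[1] < 0.5 else "A.2",
    "B": lambda x: "B.1" if x[1] < 0.5 else "B.2",
}


def top_down_classify(
    example: Sequence[float],
    classifiers: Dict[str, Classifier],
    tree: Dict[str, List[str]],
    root: str = "root",
) -> List[str]:
    """Return the predicted class at each level, from the top of the tree down."""
    path: List[str] = []
    node = root
    while node in tree:  # stop once a leaf class is reached
        predicted = classifiers[node](example)
        if predicted not in tree[node]:
            raise ValueError(f"{predicted!r} is not a child of {node!r}")
        path.append(predicted)
        node = predicted  # only this class's children are considered next
    return path


# An example well inside class A at the first level.
print(top_down_classify([0.2, 0.9], TOY_CLASSIFIERS, CLASS_TREE))

# Suppose the true class of this example is B.2. The root classifier picks A,
# so no second-level prediction can recover: the error propagates downward.
print(top_down_classify([0.45, 0.9], TOY_CLASSIFIERS, CLASS_TREE))
```

In the second call the first-level mistake restricts the second-level classifier to the children of A, so the true class B.2 is never considered. This is the drawback that the methods proposed in the paper aim to mitigate.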

Author information

Authors and Affiliations

Computing Laboratory, University of Kent, Canterbury, CT2 7NF, UK
Nicholas Holden & Alex A. Freitas

Corresponding author

Correspondence to Nicholas Holden.

About this article

Cite this article

Holden, N., Freitas, A.A. Hierarchical classification of protein function with ensembles of rules and particle swarm optimisation. Soft Comput 13, 259–272 (2009). https://doi.org/10.1007/s00500-008-0321-0

Published: 27 May 2008
Issue Date: February 2009
DOI: https://doi.org/10.1007/s00500-008-0321-0
