Abstract
Multi-class classification is an important and challenging problem in the classification of biological data. Typical methods for multi-class classification use a single powerful classifier, such as a neural network, to assign each sample to one of many classes. Alternatively, binary classifiers are combined in one-versus-one (OVO) and one-versus-all (OVA) schemes. However, it is not clear whether OVO or OVA performs better. In this paper, we propose a greedy method for developing a hierarchical classifier in which each node corresponds to a binary classifier. The advantage of our greedy hierarchical classifier is that any type of classifier can be used at the nodes. We analyze the performance of the proposed technique using neural networks and naive Bayesian classifiers and compare our results with OVO, OVA, and exhaustive methods. In general, our greedy technique provided better and more robust accuracy than the other methods on biological data sets with 3 to 8 classes.
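To make the three decomposition schemes concrete, the following minimal sketch enumerates the binary problems that OVA and OVO create and builds a small class hierarchy greedily. It is an illustration under stated assumptions, not the method evaluated in the paper: the abstract does not specify the greedy criterion, so the one-class-versus-rest split rule, the `score` callback, and the toy `separability` values are all hypothetical. In practice `score` might be the validation accuracy of whichever binary classifier is trained at that node.

```python
from itertools import combinations


def ova_tasks(classes):
    """One-versus-all: one binary task per class (that class vs. the rest)."""
    return [((c,), tuple(o for o in classes if o != c)) for c in classes]


def ovo_tasks(classes):
    """One-versus-one: one binary task per unordered pair of classes."""
    return [((a,), (b,)) for a, b in combinations(classes, 2)]


def greedy_tree(classes, score):
    """Greedily build a binary hierarchy over class labels.

    ASSUMPTION (not from the paper): at each node only splits of the form
    {one class} vs. {remaining classes} are tried, and the split with the
    highest score is kept. `score(left, right)` is a hypothetical callback.
    Returns a nested tuple (leaf_class, subtree); a lone class is a leaf.
    """
    classes = tuple(classes)
    if len(classes) == 1:
        return classes[0]

    def rest_without(c):
        return tuple(o for o in classes if o != c)

    # Pick the class that is easiest to separate from the others.
    best = max(classes, key=lambda c: score((c,), rest_without(c)))
    return (best, greedy_tree(rest_without(best), score))


# Toy, made-up separability values standing in for measured accuracy.
separability = {"A": 0.90, "B": 0.70, "C": 0.95, "D": 0.60}


def toy_score(left, right):
    """Hypothetical score: how well the single class in `left` separates."""
    return separability[left[0]]


labels = ["A", "B", "C", "D"]
print(len(ova_tasks(labels)))          # 4 binary problems for 4 classes
print(len(ovo_tasks(labels)))          # 6 binary problems for 4 classes
print(greedy_tree(labels, toy_score))  # ('C', ('A', ('B', 'D')))
```

For K classes, OVA trains K binary classifiers and OVO trains K(K-1)/2, while a binary hierarchy such as the one sketched here has K-1 internal nodes, each of which can hold any type of binary classifier.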
Acknowledgments
We would like to acknowledge Marc Pusey, Ph.D., of iXpressGenes, Inc. for providing the Protein Crystallization dataset and Madhav Sigdel for extracting features from this dataset.
Cite this article
Begum, S., Aygun, R.S. Greedy hierarchical binary classifiers for multi-class classification of biological data. Netw Model Anal Health Inform Bioinforma 3, 53 (2014). https://doi.org/10.1007/s13721-014-0053-2
DOI: https://doi.org/10.1007/s13721-014-0053-2