A New Approach to Multi-class SVM-Based Classification Using Error Correcting Output Codes
Protein fold classification is the prediction of a protein's tertiary structure (fold) from its amino acid sequence without relying on sequence similarity. Predicting the fold from the amino acid sequence is regarded as a great challenge in computational biology and bioinformatics. The support vector machine (SVM) classifier has been applied to this problem; however, the SVM is a binary classifier, whereas protein fold recognition is a multi-class problem. A method based on error correcting output codes (ECOC) is proposed to bridge this gap. The key problem in this approach is how to construct the optimal ECOC codewords. This paper presents three strategies for constructing them, based on the recognition ratios obtained by binary classifiers on the training data set. An SVM classifier using the ECOC codewords constructed with these strategies was applied to a real-world data set. The obtained results (57.1% to 62.6%) are better than the best results published in the literature.
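To make the ECOC idea concrete, the following is a minimal sketch of generic ECOC decoding, not the codeword construction strategies of this paper. It assumes a hypothetical four-class problem with the labels `fold_A` to `fold_D` and a hand-written 7-bit exhaustive code; in practice each bit would be produced by one trained binary SVM, which is simulated here by a fixed bit vector.

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical code matrix: one codeword (row) per class.
# Each column defines one binary problem that a binary classifier
# (for example an SVM) would be trained to solve.
CODE_MATRIX: Dict[str, List[int]] = {
    "fold_A": [1, 1, 1, 1, 1, 1, 1],
    "fold_B": [0, 0, 0, 0, 1, 1, 1],
    "fold_C": [0, 0, 1, 1, 0, 0, 1],
    "fold_D": [0, 1, 0, 1, 0, 1, 0],
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def min_row_distance(code_matrix: Dict[str, List[int]]) -> int:
    """Smallest Hamming distance between any two class codewords."""
    return min(
        hamming(code_matrix[p], code_matrix[q])
        for p, q in combinations(code_matrix, 2)
    )


def decode(bits: Sequence[int], code_matrix: Dict[str, List[int]]) -> str:
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda label: hamming(bits, code_matrix[label]))


# Simulated outputs of the seven binary classifiers for one sample:
# the codeword of fold_C with its first bit flipped (one classifier wrong).
predicted_bits = [1, 0, 1, 1, 0, 0, 1]
predicted_class = decode(predicted_bits, CODE_MATRIX)
```

With a minimum row distance of d, nearest-codeword decoding tolerates up to floor((d - 1) / 2) wrong binary decisions, which is why the choice of codewords matters for the final multi-class accuracy.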