RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification
The Naive Bayes (NB) classifier relies on the assumption that the instances in each class can be described by a single generative model. This assumption can be restrictive in many real-world classification tasks. We describe RNBL-MN, which relaxes this assumption by constructing a tree of Naive Bayes classifiers for sequence classification, where each individual NB classifier in the tree is based on a multinomial event model (one for each class at each node in the tree). In our experiments on protein sequence and text classification tasks, we observe that RNBL-MN substantially outperforms the NB classifier. Furthermore, our experiments show that RNBL-MN outperforms the C4.5 decision tree learner (using tests on sequence composition statistics as the splitting criterion) and yields accuracies that are comparable to those of support vector machines (SVM) using similar information.
Keywords: Support Vector Machine, Class Label, Nominal Attribute, Splitting Criterion, Decision Tree Learner
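The idea of a tree of multinomial Naive Bayes classifiers can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how nodes are split or when growth stops, so this sketch assumes that each node partitions its training data by the label its own NB classifier predicts, and that growth stops at a fixed depth, on small partitions, or on partitions containing a single class. The names `MultinomialNB`, `RecursiveNB`, `max_depth`, and `min_size` are hypothetical, and the features are single-symbol counts with Laplace smoothing.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Naive Bayes with a multinomial event model over symbol counts."""

    def fit(self, seqs, labels):
        self.classes = sorted(set(labels))
        vocab = sorted({s for seq in seqs for s in seq})
        n = len(labels)
        class_freq = Counter(labels)
        self.log_prior = {c: math.log(class_freq[c] / n) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for seq, y in zip(seqs, labels):
            counts[y].update(seq)
        self.log_like = {}
        for c in self.classes:
            # Laplace smoothing: add one pseudo-count per vocabulary symbol.
            total = sum(counts[c].values()) + len(vocab)
            self.log_like[c] = {s: math.log((counts[c][s] + 1) / total) for s in vocab}
        return self

    def predict(self, seq):
        def score(c):
            table = self.log_like[c]
            # Symbols never seen in training are skipped.
            return self.log_prior[c] + sum(table[s] for s in seq if s in table)
        return max(self.classes, key=score)


class RecursiveNB:
    """Tree of NB classifiers; splitting and stopping rules are assumptions."""

    def __init__(self, max_depth=2, min_size=4):
        self.max_depth = max_depth
        self.min_size = min_size

    def fit(self, seqs, labels, depth=0):
        self.nb = MultinomialNB().fit(seqs, labels)
        self.children = {}
        if depth >= self.max_depth:
            return self
        # Assumed split: group training instances by the node's predicted label.
        parts = defaultdict(list)
        for seq, y in zip(seqs, labels):
            parts[self.nb.predict(seq)].append((seq, y))
        for pred, items in parts.items():
            ys = [y for _, y in items]
            # Skip partitions that are small, pure, or identical to the parent.
            if len(items) < self.min_size or len(items) == len(seqs) or len(set(ys)) < 2:
                continue
            child = RecursiveNB(self.max_depth, self.min_size)
            self.children[pred] = child.fit([s for s, _ in items], ys, depth + 1)
        return self

    def predict(self, seq):
        pred = self.nb.predict(seq)
        child = self.children.get(pred)
        # Descend into the child for the predicted branch, if one was grown.
        return child.predict(seq) if child else pred
```

A sequence can be a string (for example, a protein as a string of amino acid letters) or a list of word tokens for text, since both are iterated symbol by symbol. For instance, `RecursiveNB().fit(["aaaa", "aaab", "bbbb", "bbba"], ["A", "A", "B", "B"])` trains a single-node tree because the root classifier already separates the two classes.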