Text Classification: Combining Grouping, LSA and kNN vs Support Vector Machine
Text classification is a key technique for handling and organizing text data. Among well-known methods, the support vector machine (SVM) has been shown to perform better for classification. In this paper, a method that groups similar words is proposed for the classification of documents. Applied to Reuters news, the grouping of words is shown to achieve classification accuracy equivalent to that of Latent Semantic Analysis (LSA). Further, a new combined method is proposed for classification, which consists of Grouping and LSA followed by k-Nearest Neighbor (k-NN) classification. The combined method shows higher classification accuracy than the conventional methods, namely k-NN alone and LSA followed by k-NN, and it achieves almost the same accuracy as SVM.
Keywords: Support Vector Machine, Classification Accuracy, Text Classification, Latent Semantic Analysis, Vector Space Model
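
The combined method described in the abstract (Grouping, then LSA, then k-NN) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similar words are grouped, so the `WORD_GROUPS` table, the toy documents, the labels, the rank of 2, and the helper names (`to_group_counts`, `fit_lsa`, `knn_predict`) are all assumptions made for illustration. LSA is approximated here by a truncated SVD from NumPy, and k-NN uses cosine similarity with a majority vote.

```python
from collections import Counter

import numpy as np

# Hypothetical word groups (an assumption, not taken from the paper):
# each word is replaced by the label of its group before counting.
WORD_GROUPS = {
    "profit": "earnings", "revenue": "earnings", "dividend": "earnings",
    "wheat": "grain", "corn": "grain", "barley": "grain",
}


def to_group_counts(docs, vocab=None):
    """Grouping step: map tokens to group labels and count labels per document."""
    grouped = [[WORD_GROUPS.get(w, w) for w in d.lower().split()] for d in docs]
    if vocab is None:
        vocab = sorted({g for d in grouped for g in d})
    index = {g: i for i, g in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, tokens in enumerate(grouped):
        for g in tokens:
            if g in index:  # labels unseen at training time are ignored
                X[row, index[g]] += 1.0
    return X, vocab


def fit_lsa(X, rank):
    """LSA step: projection onto the top `rank` right singular vectors of X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank].T


def knn_predict(train_z, train_labels, query_z, k):
    """k-NN step: cosine similarity in the latent space, then majority vote."""
    def unit(a):
        norms = np.linalg.norm(a, axis=1, keepdims=True)
        return a / np.where(norms == 0, 1.0, norms)

    sims = unit(query_z) @ unit(train_z).T
    preds = []
    for row in sims:
        nearest = np.argsort(-row)[:k]
        votes = Counter(train_labels[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


# Toy data (assumed for illustration; the paper uses Reuters news).
train_docs = ["profit rose sharply", "revenue and dividend up",
              "wheat harvest large", "corn prices fall"]
train_labels = ["earn", "earn", "grain", "grain"]

X_train, vocab = to_group_counts(train_docs)
P = fit_lsa(X_train, rank=2)
X_test, _ = to_group_counts(["barley harvest", "dividend rose"], vocab)
predictions = knn_predict(X_train @ P, train_labels, X_test @ P, k=3)
print(predictions)
```

In this toy setting the two test documents share no surface word with most of the training documents of their class ("barley" and "dividend" occur in no training document as written), yet grouping maps them onto the shared labels `grain` and `earnings`, so the sketch assigns them to `grain` and `earn` respectively. A real experiment would need a principled way to build the groups and to choose the LSA rank and `k`.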