Variant Nearest Neighbor Classification Algorithm for Text Document
Text classification is the task of assigning text documents to a predefined set of categories. This paper analyzes various ways of applying nearest neighbor classification to text documents. Cosine similarity is used to measure the similarity between documents, and it is applied to a term frequency-inverse document frequency (TF-IDF) vector space model representation of the preprocessed Classic data set. The documents most similar to a given document are called its nearest neighbors. In this work, nearest neighbor and k-nearest neighbor classification algorithms are used to classify the documents into the predefined classes, and classifier accuracy is measured.
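The pipeline summarized in the abstract (TF-IDF weighting, cosine similarity, nearest neighbor voting, accuracy) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the plain log(N/df) form of the IDF weight, the majority-vote rule, and the four toy training documents with their "med" and "aero" labels are all assumptions made for illustration, and the toy documents are not taken from the Classic data set.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Compute IDF weights log(N / df) from tokenized training documents (assumed form)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are ignored."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_classify(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training documents most similar to the query.
    With k=1 this reduces to plain nearest neighbor classification."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of documents whose predicted class matches the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical toy corpus, already tokenized (stemming and stop-word removal omitted).
train_docs = [
    "patient blood pressure treatment".split(),
    "clinical treatment of patient infection".split(),
    "wing airflow boundary layer".split(),
    "supersonic airflow over wing".split(),
]
train_labels = ["med", "med", "aero", "aero"]

idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]

test_docs = ["patient infection treatment".split(), "airflow wing".split()]
test_labels = ["med", "aero"]
predictions = [knn_classify(tfidf(d, idf), train_vecs, train_labels, k=3)
               for d in test_docs]
print(predictions, accuracy(predictions, test_labels))
```

Sparse dictionaries keep the sketch dependency-free. For a corpus of realistic size, a library vectorizer and a sparse matrix representation would likely be the more practical choice.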
Keywords: Text document classification; Stemming Algorithm; Cosine Similarity Measure; Classifier Accuracy