Variant Nearest Neighbor Classification Algorithm for Text Document

  • M. S. V. S. Bhadri Raju
  • B. Vishnu Vardhan
  • V. Sowmya
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 249)

Abstract

Text classification assigns text documents to a predefined set of categories. This paper analyzes several ways of applying nearest neighbor classification to text documents. Documents from the preprocessed Classic data set are represented in the term frequency-inverse document frequency (TF-IDF) vector space model, and the cosine similarity measure is applied to these vectors to compute the similarity between documents. The documents most similar to a given document are called its nearest neighbors. In this work, the nearest neighbor and k-nearest neighbor classification algorithms are used to assign the documents to the predefined classes, and the accuracy of each classifier is measured.
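The pipeline summarized in the abstract (TF-IDF vectors, cosine similarity, nearest neighbor voting, accuracy) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the tf × log(N/df) weighting, the toy documents and the labels "medical"/"aero" are all hypothetical, and neither stemming nor the Classic data set is used.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal preprocessing: lowercase and split on whitespace.
    # The paper also stems terms; that step is omitted in this sketch.
    return text.lower().split()


def build_idf(token_lists):
    # Assumed weighting: idf(t) = log(N / df(t)), one common TF-IDF variant.
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Raw term frequency times idf; terms unseen in training are dropped.
    tf = Counter(tokens)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def knn_classify(query_vec, train_vecs, train_labels, k=3):
    # Rank training documents by cosine similarity and take a majority vote
    # over the k most similar ones; k=1 is plain nearest neighbor.
    # sorted() is stable, so equal similarities keep training order.
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predicted, actual):
    # Fraction of test documents assigned to their true class.
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy corpus standing in for a real labeled collection.
train_docs = [
    "blood pressure heart patients",
    "heart disease clinical patients",
    "aircraft wing flow pressure",
    "boundary layer flow wing",
]
train_labels = ["medical", "medical", "aero", "aero"]
test_docs = ["heart patients treatment", "wing flow experiment"]
test_labels = ["medical", "aero"]

train_tokens = [tokenize(d) for d in train_docs]
idf = build_idf(train_tokens)
train_vecs = [tfidf_vector(t, idf) for t in train_tokens]

preds = [knn_classify(tfidf_vector(tokenize(d), idf), train_vecs,
                      train_labels, k=3)
         for d in test_docs]
print(preds, accuracy(preds, test_labels))  # ['medical', 'aero'] 1.0
```

With k=1 each test document simply takes the class of its single most similar training document; a larger k replaces that with a majority vote, which is less sensitive to one atypical neighbor. How ties in similarity or in the vote are broken is an arbitrary choice in this sketch.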

Keywords

Text document classification; Stemming Algorithm; Cosine Similarity Measure; Classifier Accuracy

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • M. S. V. S. Bhadri Raju, Department of CSE, SRKR Engg. College, Bhimavaram, India
  • B. Vishnu Vardhan, Department of IT, JNTUCE, Jagityala, India
  • V. Sowmya, Department of CSE, GRIET, Hyderabad, India