Classifying News Articles Using Feature Similarity K Nearest Neighbor

  • Conference paper
Proceedings of the Sixth International Conference on Green and Human Information Technology (ICGHIT 2018)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 502)

Abstract

This research proposes a version of KNN (K Nearest Neighbor) that computes the similarity between data items by considering the similarities among features (attributes) as well as the one-to-one correspondence between feature values. The assumption that attributes are independent of each other does not hold in practice, especially in text classification, where words are used as the features of texts. In this research, we define a similarity measure that considers both attributes and attribute values, modify the traditional version of KNN to use this similarity measure, and apply it to the task of text classification. The reported benefits are more compact representations of texts and better performance. The goal of this research is therefore to implement a text categorization system with more efficient data representations and better performance.
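The idea in the abstract can be illustrated with a minimal sketch. The exact similarity measure is not given in this preview, so the code below assumes a soft-cosine-style measure, x^T S y / sqrt((x^T S x)(y^T S y)), where S is a feature-to-feature similarity matrix; this is a stand-in, not necessarily the paper's formula. The function names, the matrix S, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def feature_aware_similarity(x, y, S):
    """Soft-cosine-style similarity (an assumed stand-in measure).

    x, y: numeric feature vectors of equal length.
    S:    square matrix; S[i][j] is the assumed similarity between
          feature i and feature j (1.0 on the diagonal).
    """
    def bilinear(a, b):
        # a^T S b: every pair of features contributes, not only i == j.
        return sum(a[i] * S[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))

    denom = math.sqrt(bilinear(x, x) * bilinear(y, y))
    return bilinear(x, y) / denom if denom > 0 else 0.0


def knn_classify(query, examples, S, k=3):
    """Majority vote among the k training examples most similar to query."""
    ranked = sorted(examples,
                    key=lambda ex: feature_aware_similarity(query, ex[0], S),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: features are the words
# ["soccer", "football", "stock", "market"].
S = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.7, 1.0],
]
train = [
    ([2, 0, 0, 0], "sports"),
    ([1, 0, 0, 0], "sports"),
    ([0, 0, 2, 0], "business"),
    ([0, 0, 1, 1], "business"),
]
query = [0, 3, 0, 0]  # contains only the word "football"
predicted = knn_classify(query, train, S, k=3)
print(predicted)
```

In this toy case the query shares no word with any training text, so a measure that compares values feature by feature (S set to the identity matrix) scores every training example as 0. With the assumed feature similarities, the query is close to the "soccer" texts because "football" and "soccer" are treated as related features.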

Acknowledgement

The research was supported by the International Science and Business Belt Program through the Ministry of Science and ICT (2017K000451).

Author information

Corresponding author

Correspondence to Taeho Jo.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jo, T. (2019). Classifying News Articles Using Feature Similarity K Nearest Neighbor. In: Hwang, S., Tan, S., Bien, F. (eds) Proceedings of the Sixth International Conference on Green and Human Information Technology. ICGHIT 2018. Lecture Notes in Electrical Engineering, vol 502. Springer, Singapore. https://doi.org/10.1007/978-981-13-0311-1_14

  • DOI: https://doi.org/10.1007/978-981-13-0311-1_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-0310-4

  • Online ISBN: 978-981-13-0311-1

  • eBook Packages: Engineering, Engineering (R0)
