Classifying News Articles Using Feature Similarity K Nearest Neighbor

Jo, Taeho

doi:10.1007/978-981-13-0311-1_14

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 502)

Included in the following conference series:

International Conference on Green and Human Information Technology

Abstract

This research proposes a version of KNN (K Nearest Neighbor) that computes the similarity between data items by considering the similarities among features (attributes) as well as the one-to-one correspondence between their values. The assumption that attributes are independent of one another does not hold in reality, especially in text classification, where words are used as the features of texts. In this research, we define a similarity measure that considers both attributes and attribute values, modify the traditional version of KNN to use this similarity measure, and apply it to the task of text classification. The benefits of this research are more compact representations of texts and better performance. Therefore, the goal of this research is to implement a text categorization system with more efficient data representations and better performance.
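The abstract does not give the similarity formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the chapter's actual method. It assumes a soft-cosine-style measure in which a hypothetical feature-to-feature similarity matrix (`FEATURE_SIM`) weights every pair of attribute values, and it plugs that measure into an ordinary majority-vote KNN. The function names, the toy vocabulary, and all numeric values are illustrative assumptions.

```python
import math
from collections import Counter


def soft_similarity(x, y, feature_sim):
    """Similarity between two numeric vectors that also credits related features.

    Assumed form (not taken from the chapter): a cosine-like ratio in which
    each product x[i] * y[j] is weighted by feature_sim[i][j].
    """
    def bilinear(a, b):
        return sum(feature_sim[i][j] * a[i] * b[j]
                   for i in range(len(a)) for j in range(len(b)))

    denom = math.sqrt(bilinear(x, x)) * math.sqrt(bilinear(y, y))
    return bilinear(x, y) / denom if denom > 0 else 0.0


def knn_classify(query, training, feature_sim, k=3):
    """Label the query by majority vote among its k most similar training items."""
    ranked = sorted(training,
                    key=lambda item: soft_similarity(query, item[0], feature_sim),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical vocabulary: ["soccer", "football", "election"].
# "soccer" and "football" are assumed to be closely related words.
FEATURE_SIM = [[1.0, 0.8, 0.0],
               [0.8, 1.0, 0.0],
               [0.0, 0.0, 1.0]]

# Identity matrix: features treated as independent (traditional cosine).
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Toy training set of (word-count vector, label) pairs.
TRAINING = [([0, 1, 0], "sports"),
            ([0, 2, 1], "sports"),
            ([0, 0, 1], "politics"),
            ([0, 0, 3], "politics")]

if __name__ == "__main__":
    query = [1, 0, 0]  # a text that mentions only "soccer"
    print(soft_similarity(query, [0, 1, 0], IDENTITY))
    print(soft_similarity(query, [0, 1, 0], FEATURE_SIM))
    print(knn_classify(query, TRAINING, FEATURE_SIM, k=3))
```

With the identity matrix, a text containing only "soccer" shares no feature with a text containing only "football", so their similarity is zero. Under the assumed feature similarity matrix, the two texts become comparable even though they share no word, which illustrates the independence issue the abstract raises.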

Acknowledgement

The research was supported by the International Science and Business Belt Program through the Ministry of Science and ICT (2017K000451).

Author information

Authors and Affiliations

Hongik University, Sejong, 30016, South Korea
Taeho Jo

Authors

Taeho Jo

Corresponding author

Correspondence to Taeho Jo.

Editor information

Editors and Affiliations

Department of Computer Information and Communications, Hongik University, Seoul, Soul-t’ukpyolsi, Korea (Republic of)
Seong Oun Hwang
Multimedia University, Cyberjaya, Malaysia
Syh Yuan Tan
School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology, Ulsan, Korea (Republic of)
Franklin Bien

About this paper

Cite this paper

Jo, T. (2019). Classifying News Articles Using Feature Similarity K Nearest Neighbor. In: Hwang, S., Tan, S., Bien, F. (eds) Proceedings of the Sixth International Conference on Green and Human Information Technology. ICGHIT 2018. Lecture Notes in Electrical Engineering, vol 502. Springer, Singapore. https://doi.org/10.1007/978-981-13-0311-1_14

DOI: https://doi.org/10.1007/978-981-13-0311-1_14
Published: 30 June 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0310-4
Online ISBN: 978-981-13-0311-1
eBook Packages: Engineering, Engineering (R0)
