Abstract
This research proposes a modified KNN (K Nearest Neighbor) that computes the similarity between data items by considering the similarities among the features (attributes) themselves, as well as one-to-one matches of their values. The traditional assumption that attributes are independent of one another does not hold in reality, especially in text classification, where words serve as the features of texts. In this research, we define a similarity measure that considers both attributes and attribute values, modify the traditional KNN to use this measure, and apply it to the task of text classification. The benefits of this approach are more compact representations of texts and better classification performance. The goal of this research is therefore to implement a text categorization system with more efficient data representations and better performance.
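The abstract does not give the similarity formula itself, so the following Python code is only a minimal sketch under stated assumptions, not the author's method. Each text is reduced to a set of words. The similarity between two words (attributes) is assumed to be the Jaccard overlap of the training documents that contain them. Two texts are compared by averaging best-match word similarities in both directions. The function names `build_word_index`, `word_similarity`, `text_similarity`, and `knn_classify` are hypothetical. What the sketch illustrates is that a text sharing few or no exact words with a training text, but sharing related words, can still receive a nonzero similarity, whereas a purely one-to-one value match would score it lower or zero.

```python
from collections import Counter, defaultdict


def build_word_index(docs):
    """Map each word to the set of training-document ids that contain it."""
    index = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for w in words:
            index[w].add(doc_id)
    return index


def word_similarity(a, b, index):
    """ASSUMED attribute similarity (not from the paper): Jaccard overlap of
    the document sets in which words a and b occur. Identical words score 1.0."""
    if a == b:
        return 1.0
    da, db = index.get(a, set()), index.get(b, set())
    union = da | db
    return len(da & db) / len(union) if union else 0.0


def text_similarity(x, y, index):
    """ASSUMED text similarity: each word in one text takes its best match in
    the other text; the two directional averages are then averaged. Related
    but non-identical words still contribute to the score."""
    if not x or not y:
        return 0.0
    forward = sum(max(word_similarity(a, b, index) for b in y) for a in x) / len(x)
    backward = sum(max(word_similarity(b, a, index) for a in x) for b in y) / len(y)
    return (forward + backward) / 2


def knn_classify(query, train_docs, train_labels, k, index):
    """Ordinary KNN majority vote, with the feature-aware similarity swapped in."""
    scored = sorted(
        ((text_similarity(query, doc, index), label)
         for doc, label in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy, made-up training texts (illustration only, not data from the paper).
    train_docs = [
        {"stock", "market", "shares"},
        {"market", "shares", "investors"},
        {"match", "goal", "team"},
        {"team", "coach", "goal"},
    ]
    train_labels = ["business", "business", "sports", "sports"]
    index = build_word_index(train_docs)
    print(knn_classify({"investors", "stock"}, train_docs, train_labels, k=3, index=index))
    # → business
```

Only `word_similarity` encodes the assumption about how attributes relate; replacing it with another measure (for example, one based on word co-occurrence in a larger news corpus) leaves the KNN voting step unchanged.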
Acknowledgement
The research was supported by the International Science and Business Belt Program through the Ministry of Science and ICT (2017K000451).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jo, T. (2019). Classifying News Articles Using Feature Similarity K Nearest Neighbor. In: Hwang, S., Tan, S., Bien, F. (eds) Proceedings of the Sixth International Conference on Green and Human Information Technology. ICGHIT 2018. Lecture Notes in Electrical Engineering, vol 502. Springer, Singapore. https://doi.org/10.1007/978-981-13-0311-1_14
DOI: https://doi.org/10.1007/978-981-13-0311-1_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0310-4
Online ISBN: 978-981-13-0311-1
eBook Packages: Engineering, Engineering (R0)