Abstract
This paper presents a method for classifying text data based on feature selection over interval-valued features, a symbolic data type. Initially, the documents are represented in the form of interval-valued features. The proposed method works in a supervised setting, in which every feature is reduced to a single crisp score by the proposed ranking criterion. The features are then ranked by these scores. The top-ranked Q′ features are chosen from the set of Q evaluated features, where Q′ is decided through empirical evaluation. The proposed feature selection criterion is validated using a symbolic classifier on the standard text datasets Reuters-21578 and TDT2. The experimental results show that the proposed method is more effective compared to other existing techniques.
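The pipeline outlined in the abstract (interval-valued representation, one crisp score per feature, top-Q′ selection) can be illustrated with a minimal sketch. The abstract does not give the interval construction or the ranking criterion, so everything specific here is an assumption made for illustration: the per-class interval is taken as [mean − std, mean + std] of the term weights, and the score `separation_score` is a hypothetical stand-in that rewards features whose class intervals overlap little. The function names `class_intervals`, `separation_score`, and `select_top` are likewise invented for this example and are not the authors' method.

```python
from itertools import combinations
from statistics import mean, pstdev


def class_intervals(docs, labels):
    """Assumed representation: one interval [mean - std, mean + std]
    per (class, feature), built from crisp term weights."""
    by_class = {}
    for vec, label in zip(docs, labels):
        by_class.setdefault(label, []).append(vec)
    intervals = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))  # one tuple of weights per feature
        intervals[label] = [
            (mean(c) - pstdev(c), mean(c) + pstdev(c)) for c in columns
        ]
    return intervals


def separation_score(feature_intervals):
    """Hypothetical crisp score in [0, 1] for one feature.

    For every pair of class intervals, compute overlap length divided by
    the length of their combined span, then return 1 minus the average.
    Higher means the classes are better separated on this feature."""
    ratios = []
    for (lo1, hi1), (lo2, hi2) in combinations(feature_intervals, 2):
        overlap = max(0.0, min(hi1, hi2) - max(lo1, lo2))
        span = max(hi1, hi2) - min(lo1, lo2)
        # Two identical point intervals are treated as fully overlapping.
        ratios.append(overlap / span if span > 0 else 1.0)
    return 1.0 - mean(ratios) if ratios else 0.0


def select_top(intervals, q_prime):
    """Rank all Q features by score and return the indices of the top Q'."""
    classes = list(intervals)
    n_features = len(intervals[classes[0]])
    scores = [
        separation_score([intervals[c][j] for c in classes])
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return ranked[:q_prime]


# Toy data: 4 documents, Q = 3 features, 2 classes.
docs = [[0.9, 0.5, 0.2], [1.0, 0.5, 0.4], [0.1, 0.5, 0.3], [0.0, 0.5, 0.5]]
labels = ["a", "a", "b", "b"]
selected = select_top(class_intervals(docs, labels), q_prime=2)
print(selected)  # [0, 2]
```

In this toy data, feature 0 has disjoint class intervals and ranks first, feature 1 is identical across classes and ranks last, and feature 2 overlaps partially. In practice Q′ would be chosen empirically, as the abstract states, for example by evaluating classifier accuracy over a range of Q′ values.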
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vinay Kumar, N., Swarnalatha, K., Guru, D.S., Anami, B.S. (2021). Interval-Valued Feature Selection for Classification of Text Documents. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_95
DOI: https://doi.org/10.1007/978-3-030-71187-0_95
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71186-3
Online ISBN: 978-3-030-71187-0
eBook Packages: Intelligent Technologies and Robotics (R0)