Abstract
Text clustering is a suitable method for partitioning a large collection of text documents into groups. The size of the documents degrades clustering performance. In addition, text documents contain sparse and uninformative features, which reduce the performance of the underlying text clustering algorithm and increase the computational time. Feature selection is a fundamental unsupervised learning technique used to select a new subset of informative text features, improving clustering performance and reducing computational time. This paper proposes a hybrid of the particle swarm optimization algorithm with genetic operators for the feature selection problem. The k-means clustering algorithm is used to evaluate the effectiveness of the obtained feature subsets. The experiments were conducted on eight common text datasets with varying characteristics. The results show that the proposed hybrid algorithm (H-FSPSOTC) improved the performance of the clustering algorithm by generating a new subset of more informative features. The proposed algorithm is also compared with other algorithms published in the literature. Overall, the feature selection technique helps the clustering algorithm obtain accurate clusters.
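The general idea of combining particle swarm optimization with genetic operators for feature selection can be illustrated with a minimal sketch. It is not the authors' H-FSPSOTC implementation: the abstract does not state the fitness function, the update rules, or the parameter values, so everything below is an assumption. That includes the hypothetical names `fitness` and `select_features`, the sigmoid-based binary position update, the one-point crossover, the bit-flip mutation, and the mean-absolute-deviation objective. Each document is assumed to be a list of term weights (for example TF-IDF values), and a candidate solution is a 0/1 mask over the terms.

```python
import math
import random


def fitness(mask, docs):
    """Hypothetical objective (an assumption, not taken from the abstract):
    mean absolute deviation of the selected term weights, averaged over documents."""
    total, count = 0.0, 0
    for doc in docs:
        selected = [w for w, keep in zip(doc, mask) if keep]
        if not selected:
            continue  # a document with no selected terms contributes nothing
        mean = sum(selected) / len(selected)
        total += sum(abs(w - mean) for w in selected) / len(selected)
        count += 1
    return total / count if count else 0.0


def select_features(docs, n_particles=10, n_iters=30, w=0.7, c1=1.5, c2=1.5,
                    p_cross=0.8, p_mut=0.02, seed=0):
    """Binary PSO followed by crossover and mutation in every iteration.
    All parameter values are illustrative defaults, not values from the paper."""
    rng = random.Random(seed)
    n = len(docs[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, docs) for p in pos]
    best = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iters):
        # PSO step: update velocities, then resample each bit via a sigmoid.
        for i in range(n_particles):
            for j in range(n):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to avoid saturation
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0

        # Genetic operators: one-point crossover between neighbouring particles.
        for i in range(0, n_particles - 1, 2):
            if n > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n)
                pos[i][cut:], pos[i + 1][cut:] = pos[i + 1][cut:], pos[i][cut:]
        # Bit-flip mutation on every position.
        for p in pos:
            for j in range(n):
                if rng.random() < p_mut:
                    p[j] = 1 - p[j]

        # Update personal and global bests (the objective is maximized).
        for i in range(n_particles):
            f = fitness(pos[i], docs)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

In a pipeline like the one the abstract describes, the returned mask would be used to keep only the selected term columns before the documents are passed to a clustering algorithm such as k-means. Clustering quality on the reduced representation then indicates how useful the selected subset is.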
Notes
Porter stemmer website: http://tartarus.org/martin/PorterStemmer/
Cite this article
Abualigah, L.M., Khader, A.T. Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering. J Supercomput 73, 4773–4795 (2017). https://doi.org/10.1007/s11227-017-2046-2
DOI: https://doi.org/10.1007/s11227-017-2046-2