Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering

The Journal of Supercomputing

Abstract

Text clustering is a suitable method for partitioning a large collection of text documents into groups. The size of the documents degrades clustering performance. Moreover, text documents contain sparse and uninformative features, which reduce the performance of the underlying clustering algorithm and increase computational time. Feature selection is a fundamental unsupervised learning technique used to select a new subset of informative text features in order to improve clustering performance and reduce computational time. This paper proposes a hybrid of the particle swarm optimization algorithm with genetic operators for the feature selection problem. K-means clustering is used to evaluate the effectiveness of the obtained feature subsets. The experiments were conducted on eight common text datasets with varying characteristics. The results show that the proposed hybrid algorithm (H-FSPSOTC) improved the performance of the clustering algorithm by generating a new subset of more informative features. The proposed algorithm is also compared with other algorithms published in the literature. Overall, the feature selection technique helps the clustering algorithm obtain more accurate clusters.
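
The general idea of combining a binary particle swarm with genetic operators for feature selection can be sketched as follows. This is only an illustrative sketch, not the H-FSPSOTC procedure itself: the abstract does not state the fitness function, the update rules, or the parameter values, so the `fitness` function (a mean absolute deviation of the selected term weights), the sigmoid transfer step, the single-point crossover with the global best, and all default parameters are assumptions made for the example. Each document is assumed to be a list of term weights of equal length, with at least two terms.

```python
import math
import random


def fitness(mask, docs):
    """Hypothetical objective: average per-document mean absolute deviation
    of the selected term weights. A stand-in, not the paper's objective."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    total = 0.0
    for doc in docs:
        weights = [doc[j] for j in selected]
        mean = sum(weights) / len(weights)
        total += sum(abs(w - mean) for w in weights) / len(weights)
    return total / len(docs)


def crossover(a, b, rng):
    """Single-point crossover of two bit masks, producing one child."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def hybrid_pso_select(docs, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5,
                      mutation_rate=0.05, seed=0):
    """Return (best_mask, best_fitness); a 1 in the mask keeps that feature."""
    rng = random.Random(seed)
    n_features = len(docs[0])
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p, docs) for p in positions]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            # Binary PSO step: update velocity, then sample each bit
            # from a sigmoid of the (clamped) velocity.
            for j in range(n_features):
                v = (w * velocities[i][j]
                     + c1 * rng.random() * (pbest[i][j] - positions[i][j])
                     + c2 * rng.random() * (gbest[j] - positions[i][j]))
                v = max(-6.0, min(6.0, v))
                velocities[i][j] = v
                positions[i][j] = 1 if rng.random() < 1 / (1 + math.exp(-v)) else 0
            # Genetic step: cross the particle with the global best, mutate
            # the child, and keep it only if it scores higher.
            child = mutate(crossover(positions[i], gbest, rng), mutation_rate, rng)
            if fitness(child, docs) > fitness(positions[i], docs):
                positions[i] = child
            fit = fitness(positions[i], docs)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = positions[i][:], fit
    return gbest, gbest_fit


# Toy term-weight matrix: three documents, four terms (made-up values).
toy_docs = [[0.9, 0.0, 0.1, 0.0],
            [0.0, 0.8, 0.0, 0.2],
            [0.5, 0.5, 0.0, 0.0]]
best_mask, best_score = hybrid_pso_select(toy_docs, seed=1)
```

In a pipeline like the one the abstract describes, the columns kept by `best_mask` would then be passed to k-means, and the clustering quality on the reduced matrix would indicate how useful the selected subset is.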

Notes

  1. http://sites.labic.icmc.usp.br/text_collections/.

  2. Porter stemmer, available at http://tartarus.org/martin/PorterStemmer/.

Author information

Corresponding author

Correspondence to Laith Mohammad Abualigah.

About this article

Cite this article

Abualigah, L.M., Khader, A.T. Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering. J Supercomput 73, 4773–4795 (2017). https://doi.org/10.1007/s11227-017-2046-2

  • DOI: https://doi.org/10.1007/s11227-017-2046-2
