Text Dimensionality Reduction for Document Clustering Using Hybrid Memetic Feature Selection

  • Ibraheem Al-Jadir
  • Kok Wai Wong
  • Chun Che Fung
  • Hong Xie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10607)

Abstract

This paper proposes a document clustering method built on a hybrid feature selection method. The hybrid method, named Memetic Algorithm-Feature Selection (MA-FS), integrates a genetic-based wrapper method with a ranking filter. MA-FS is combined with the K-means and Spherical K-means (SK-means) clustering methods to perform document clustering. For comparison, another unsupervised feature selection method, Feature Selection Genetic Text Clustering (FSGATC), is used. Two real-world criminal report document sets and two popular benchmark datasets, Reuters and 20newsgroup, were used in the comparisons. F-Micro, F-Macro and Average Distance of Document to Cluster (ADDC) measures were used for evaluation. The test results showed that MA-FS outperformed both the FSGATC method and the baseline that uses the entire feature space (ALL).
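As a rough illustration of how a candidate feature subset might be scored with a compactness measure such as ADDC, the sketch below applies a binary feature mask to a small document-term matrix and computes an ADDC-style value. The abstract does not give the paper's formula, so the definition used here (mean over clusters of the mean cosine distance between each document and its cluster centroid) is an assumption, and the names `select_features` and `addc` are hypothetical helpers, not the authors' code.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_distance(a, b):
    """Return 1 - cosine similarity for vectors that are already unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def select_features(docs, mask):
    """Keep only the columns whose mask bit is 1 (a candidate feature subset)."""
    return [[x for x, keep in zip(doc, mask) if keep] for doc in docs]


def addc(docs, labels, k):
    """ADDC under the assumed definition: the mean, over non-empty clusters,
    of the mean cosine distance between member documents and their centroid."""
    unit_docs = [normalize(d) for d in docs]
    per_cluster = []
    for c in range(k):
        members = [d for d, label in zip(unit_docs, labels) if label == c]
        if not members:
            continue  # empty clusters contribute nothing
        centroid = normalize([sum(col) / len(members) for col in zip(*members)])
        distances = [cosine_distance(d, centroid) for d in members]
        per_cluster.append(sum(distances) / len(distances))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


# Toy term-frequency matrix: the third term occurs equally in every document,
# so it carries no information about the two clusters.
docs = [[1, 0, 5], [2, 0, 5], [0, 1, 5], [0, 3, 5]]
labels = [0, 0, 1, 1]

addc_all = addc(docs, labels, k=2)
addc_subset = addc(select_features(docs, [1, 1, 0]), labels, k=2)
```

Under this definition a lower value means more compact clusters, so on the toy data the subset that drops the uninformative third term scores lower than the full feature space.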

Keywords

Clustering, Feature, F-measure, Hybrid, Memetic, Selection

Notes

Acknowledgment

Ibraheem wants to thank the Higher Committee for Education Development in Iraq (HCED) for the funding of his scholarship.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ibraheem Al-Jadir (1, 2)
  • Kok Wai Wong (1)
  • Chun Che Fung (1)
  • Hong Xie (1)
  1. School of Engineering and Information Technology, Murdoch University, Perth, Australia
  2. College of Science, Baghdad University, Baghdad, Iraq
