
An Improved Bisecting K-Means Text Clustering Method

  • Ye Zi
  • Liang Kun
  • Zhiyuan Zhang
  • Chunfeng Wang
  • Zhe Peng
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1084)

Abstract

Bisecting K-means is a hierarchical algorithm for text clustering in which the choice of the K value and of the initial centroids affects the final clustering result. Chinese word segmentation is further complicated by ambiguous words and unclear word boundaries, among other issues. We transformed the corpus into word vectors with word2vec, reduced the dimensionality of the data through ontology modeling, and cleaned the data with jieba word segmentation and TF-IDF to improve its accuracy. We propose an improved algorithm that combines hierarchical clustering with Bisecting K-means clustering and clusters the data repeatedly until it converges. Experiments show that this method produces better clustering results than both the K-means and the standard Bisecting K-means algorithms.
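The abstract does not detail the improved algorithm itself, so the following is only a minimal sketch of the standard Bisecting K-means baseline it builds on, written in Python with NumPy. It assumes each document has already been segmented (for example with jieba) and mapped to a dense vector (for example averaged word2vec embeddings or a TF-IDF row). The function names `kmeans`, `sse` and `bisecting_kmeans`, the parameters `n_trials` and `seed`, and the rule of splitting the cluster with the largest sum of squared errors (SSE) are illustrative assumptions, not the authors' method.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's K-means on the rows of X; returns one label per row."""
    # Start from k distinct rows of X chosen at random as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its rows; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def sse(points):
    """Sum of squared distances from each row to the rows' mean."""
    return float(((points - points.mean(axis=0)) ** 2).sum())


def bisecting_kmeans(X, k, n_trials=5, seed=0):
    """Baseline bisecting K-means (hypothetical helper): split the highest-SSE cluster until k clusters exist."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(X))]  # each cluster is an array of row indices into X
    while len(clusters) < k:
        # Only clusters with at least two rows can be bisected.
        splittable = [i for i, c in enumerate(clusters) if len(c) >= 2]
        if not splittable:
            raise ValueError("k exceeds the number of splittable points")
        # Assumed split criterion: bisect the cluster with the largest SSE.
        target = max(splittable, key=lambda i: sse(X[clusters[i]]))
        members = clusters.pop(target)
        # Try several random 2-means splits and keep the one with the lowest total SSE.
        best = None
        for _ in range(n_trials):
            labels = kmeans(X[members], 2, rng)
            left, right = members[labels == 0], members[labels == 1]
            if len(left) == 0 or len(right) == 0:
                continue
            total = sse(X[left]) + sse(X[right])
            if best is None or total < best[0]:
                best = (total, left, right)
        if best is None:
            raise ValueError("could not bisect a cluster (duplicate points?)")
        clusters += [best[1], best[2]]
    # Turn the list of index arrays into one integer label per row of X.
    out = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        out[members] = label
    return out


if __name__ == "__main__":
    # Toy data: three well-separated 2-D blobs standing in for document vectors.
    demo_rng = np.random.default_rng(42)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + demo_rng.normal(scale=0.5, size=(20, 2)) for c in centers])
    print(bisecting_kmeans(X, k=3))
```

Splitting the highest-SSE cluster is one common criterion; splitting the largest cluster is another. Running several trial bisections per step (`n_trials`) reduces, but does not remove, the sensitivity to initial centroids that the abstract points out.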

Keywords

Text clustering, Bisecting K-means, Ontology theory, Hierarchical clustering

Notes

Acknowledgements

This work was partially supported by NSFC (No. 61807024).


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Ye Zi (1)
  • Liang Kun (1), corresponding author
  • Zhiyuan Zhang (1)
  • Chunfeng Wang (1)
  • Zhe Peng (1)
  1. College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin, China
