Word Mover’s Distance for Agglomerative Short Text Clustering

  • Nigel Franciscus
  • Xuguang Ren
  • Junhu Wang
  • Bela Stantic
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11431)


In the era of information overload, text clustering plays an important part in the analysis pipeline. Partitioning high-quality texts into previously unseen categories greatly helps applications in information retrieval, databases, and business intelligence. Short texts from social media, such as tweets, remain difficult to interpret, however, because the contexts they cover are so broad. Traditional text similarity approaches rely only on lexical matching and ignore the semantic meaning of words. Recent advances in distributional semantic spaces have opened an alternative approach that uses high-quality word embeddings to aid the interpretation of text semantics. In this paper, we investigate the word mover’s distance metric for automatically clustering short texts using word-level semantic information. We use an agglomerative strategy as the clustering method to efficiently group texts based on their similarity. The experiments indicate that the word mover’s distance outperforms other standard metrics in the short text clustering task.
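The pipeline described above (an embedding-based document distance fed into agglomerative clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. To stay dependency-free, it swaps the exact word mover's distance for a relaxed lower bound in which each word simply moves to its nearest counterpart in the other text, and it assumes average linkage, which the abstract does not specify. The 2-D vectors in `EMBEDDINGS`, the four sample texts, and the function names are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical 2-D "word embeddings", invented for this sketch only.
EMBEDDINGS = {
    "obama": (0.9, 0.1), "president": (1.0, 0.0),
    "speaks": (0.8, 0.3), "greets": (0.7, 0.2),
    "media": (0.9, 0.4), "press": (1.0, 0.5),
    "band": (0.1, 0.9), "singer": (0.0, 1.0),
    "plays": (0.2, 0.7), "performs": (0.1, 0.8),
    "concert": (0.2, 0.8), "stage": (0.3, 1.0),
}


def nbow(text):
    """Normalized bag-of-words weights over in-vocabulary tokens."""
    counts = Counter(t for t in text.lower().split() if t in EMBEDDINGS)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def relaxed_wmd(text_a, text_b):
    """Relaxed lower bound on the word mover's distance (not the exact value).

    Each word sends all of its weight to the nearest word of the other text;
    the larger of the two one-way costs is returned. Assumes both texts
    contain at least one in-vocabulary token.
    """
    a, b = nbow(text_a), nbow(text_b)

    def one_way(src, dst):
        return sum(
            weight * min(math.dist(EMBEDDINGS[s], EMBEDDINGS[t]) for t in dst)
            for s, weight in src.items()
        )

    return max(one_way(a, b), one_way(b, a))


def agglomerative(texts, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of text indices."""
    n = len(texts)
    dist = [[relaxed_wmd(texts[i], texts[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with every text in its own cluster

    def avg_link(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        p, q = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: avg_link(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


texts = [
    "obama speaks media",
    "president greets press",
    "band plays concert",
    "singer performs stage",
]
print(agglomerative(texts, 2))
```

Note that the first two texts share no words, so a purely lexical measure would see no overlap between them, whereas the embedding-based distance places them close together. With real data, the toy dictionary would be replaced by pretrained word embeddings and the relaxed bound by an exact optimal-transport solver.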


Word mover’s distance · Text clustering · Short text · Social media



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Nigel Franciscus (1), email author
  • Xuguang Ren (1)
  • Junhu Wang (1)
  • Bela Stantic (1)
  1. Institute for Integrated and Intelligent Systems, Brisbane, Australia
