Semantic Enriched Short Text Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10352)

Abstract

The paper addresses the clustering of short texts, namely free-text answers gathered during brainstorming seminars. These answers are short, often incomplete, and highly biased toward the question, so establishing a notion of proximity between texts is challenging. In addition, the number of answers reaches at most about a hundred instances, which causes sparsity. We present three text clustering methods in order to choose the best one for this specific task, and then show how that method can be improved by semantic enrichment, including neural-based distributional models and external knowledge resources. The algorithms have been evaluated on unique data sets collected at the seminars.
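
The effect of semantic enrichment on proximity between short answers can be illustrated with a minimal sketch. It is not the authors' method: the `RELATED` lexicon is a hypothetical, hand-made stand-in for an external knowledge resource or for the nearest neighbours of a distributional model, and the two example answers are invented. The sketch only shows that two answers sharing no words can still become comparable once related terms are added.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon (an assumption for illustration only); in practice
# the related terms might come from a knowledge base or word embeddings.
RELATED = {
    "car": ["vehicle", "transport"],
    "bus": ["vehicle", "transport"],
    "cheap": ["price"],
    "tickets": ["price"],
}


def tokens(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def enrich(text):
    """Return the tokens plus the related terms of every token found in RELATED."""
    words = tokens(text)
    extra = [r for w in words for r in RELATED.get(w, [])]
    return words + extra


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Two invented answers with no words in common.
a, b = "cheap car sharing", "more bus tickets"
plain = cosine(tokens(a), tokens(b))     # no shared tokens, so similarity is zero
enriched = cosine(enrich(a), enrich(b))  # shared related terms raise the similarity
print(plain, enriched)
```

Any clustering algorithm that relies on pairwise similarity could then operate on the enriched representation instead of the raw tokens; which resource supplies the related terms is a separate design choice.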

Notes

  1. https://code.google.com/p/word2vec/

  2. http://snowball.tartarus.org/texts/introduction.html

  3. http://norvig.com/spell-correct.html

  4. https://babelfy.io/v1/disambiguate

  5. https://babelnet.io/v3/getSynset

Author information

Corresponding author

Correspondence to Marek Kozlowski.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kozlowski, M., Rybinski, H. (2017). Semantic Enriched Short Text Clustering. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_43

  • DOI: https://doi.org/10.1007/978-3-319-60438-1_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60437-4

  • Online ISBN: 978-3-319-60438-1

  • eBook Packages: Computer Science, Computer Science (R0)
