Semantic Enriched Short Text Clustering

Part of the Lecture Notes in Computer Science book series (LNAI, volume 10352)

Abstract

This paper addresses the clustering of short texts, namely free-text answers gathered during brainstorming seminars. The answers are short, often incomplete, and highly biased toward the question asked, so establishing a notion of proximity between texts is challenging. In addition, the number of answers reaches at most about a hundred instances, which makes the data sparse. We compare three text clustering methods to choose the best one for this specific task, and then show how the chosen method can be improved by semantic enrichment using neural distributional models and external knowledge resources. The algorithms were evaluated on unique data sets collected at the seminars.
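As a rough illustration of why enrichment can help with sparse short texts, the sketch below compares two answers that share no words, before and after adding related terms. It is not the paper's method: the `RELATED` table is a made-up stand-in for an external knowledge resource or a distributional model, and the helper names (`tokenize`, `enrich`, `cosine`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical related-term table (assumption): a stand-in for an external
# knowledge resource or a distributional model. The entries are invented.
RELATED = {
    "car": ["vehicle"],
    "bus": ["vehicle"],
    "cheap": ["price"],
    "affordable": ["price"],
}

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def enrich(tokens):
    """Append the related terms of every token that has an entry in RELATED."""
    extra = [rel for tok in tokens for rel in RELATED.get(tok, [])]
    return tokens + extra

def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two token lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

a = tokenize("cheap car")
b = tokenize("affordable bus")

plain = cosine(a, b)                     # the raw answers share no tokens
enriched = cosine(enrich(a), enrich(b))  # shared related terms now overlap

print(plain, enriched)
```

With this toy table the raw answers have zero similarity, while the enriched versions overlap on the added terms, so a clustering algorithm working on the enriched vectors has a proximity signal to use.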

Keywords

  • Document clustering
  • Information retrieval
  • Semantic enrichment

Notes

  1. https://code.google.com/p/word2vec/

  2. http://snowball.tartarus.org/texts/introduction.html

  3. http://norvig.com/spell-correct.html

  4. https://babelfy.io/v1/disambiguate

  5. https://babelnet.io/v3/getSynset

Author information

Corresponding author

Correspondence to Marek Kozlowski.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kozlowski, M., Rybinski, H. (2017). Semantic Enriched Short Text Clustering. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_43

  • DOI: https://doi.org/10.1007/978-3-319-60438-1_43

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60437-4

  • Online ISBN: 978-3-319-60438-1

  • eBook Packages: Computer Science, Computer Science (R0)