Semantic Enriched Short Text Clustering

Kozlowski, Marek; Rybinski, Henryk

doi:10.1007/978-3-319-60438-1_43

Conference paper
First Online: 14 June 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10352)

Abstract

This paper addresses the clustering of short texts, specifically free-text answers gathered during brainstorming seminars. The answers are short, often incomplete, and highly biased toward the question, so establishing a notion of proximity between texts is challenging. In addition, each data set contains up to a hundred answers, which causes sparsity. We compare three text clustering methods to choose the best one for this task, and then show how the chosen method can be improved by semantic enrichment using neural-based distributional models and external knowledge resources. The algorithms were evaluated on unique data sets collected at the seminars.
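
To make the idea of semantic enrichment concrete, the following minimal Python sketch shows how adding related terms to a short answer can create proximity between texts that share no surface words. It is an illustration only, not the method evaluated in the paper: the `RELATED` lexicon is a hypothetical stand-in for neighbours from a distributional model or an external knowledge resource, and the greedy threshold clustering, the `weight` of 0.5, and the `threshold` of 0.2 are all assumptions chosen for the example.

```python
import math
from collections import Counter

# Hypothetical related-term lexicon (an assumption for this sketch). In practice
# such terms might come from a word-embedding model or a knowledge resource.
RELATED = {
    "cost": ["price", "expense"],
    "cheap": ["price", "cost"],
    "staff": ["employee", "team"],
    "employees": ["staff", "team"],
}

def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]

def vectorize(text, enrich=False, weight=0.5):
    """Bag-of-words vector; optionally add related terms with a reduced weight."""
    tokens = tokenize(text)
    vec = Counter(tokens)
    if enrich:
        for tok in tokens:
            for rel in RELATED.get(tok, []):
                vec[rel] += weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def cluster(texts, threshold=0.2, enrich=False):
    """Greedy single-pass clustering: each text joins the most similar
    cluster centroid if the similarity reaches the threshold, otherwise
    it starts a new cluster. Returns clusters as lists of text indices."""
    centroids, clusters = [], []
    for idx, text in enumerate(texts):
        vec = vectorize(text, enrich=enrich)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].append(idx)
            centroids[best].update(vec)  # centroid kept as a running sum
        else:
            clusters.append([idx])
            centroids.append(Counter(vec))
    return clusters

# Made-up example answers, not taken from the seminar data sets.
TEXTS = ["lower cost", "cheap products", "train staff", "motivate employees"]

if __name__ == "__main__":
    print("plain:   ", cluster(TEXTS, enrich=False))
    print("enriched:", cluster(TEXTS, enrich=True))
```

With the plain bag-of-words vectors the four example answers share no terms, so each ends up in its own cluster. With enrichment, the two cost-related answers and the two staff-related answers become similar enough to be grouped.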

Author information

Authors and Affiliations

Warsaw University of Technology, Warsaw, Poland
Marek Kozlowski & Henryk Rybinski

Corresponding author

Correspondence to Marek Kozlowski.

Editor information

Editors and Affiliations

Warsaw University of Technology, Warsaw, Poland
Marzena Kryszkiewicz
University of Bari Aldo Moro, Bari, Italy
Annalisa Appice
Institute of Informatics, University of Warsaw, Warsaw, Poland
Dominik Ślęzak
Faculty of Electronics & Information, Warsaw University of Technology, Warsaw, Poland
Henryk Rybinski
Institute of Mathematics, Warsaw University, Warsaw, Poland
Andrzej Skowron
Department of Computer Science, University of North Carolina at Charlotte, North Carolina, USA
Zbigniew W. Raś

Cite this paper

Kozlowski, M., Rybinski, H. (2017). Semantic Enriched Short Text Clustering. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_43

DOI: https://doi.org/10.1007/978-3-319-60438-1_43
Published: 14 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60437-4
Online ISBN: 978-3-319-60438-1
eBook Packages: Computer Science, Computer Science (R0)