ITSA^⋆: An Effective Iterative Method for Short-Text Clustering Tasks

Errecalde, Marcelo; Ingaramo, Diego; Rosso, Paolo

doi:10.1007/978-3-642-13022-9_55

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6096)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

The growing use of very short documents, such as blog posts, text messages and news items, has increased interest in automatic processing techniques that can handle documents of this kind. In this context, “short-text clustering” is an important research field in which new clustering algorithms have recently been proposed to address this difficult problem. This work proposes ITSA^⋆, an iterative method based on the bio-inspired method PAntSA^⋆. ITSA^⋆ takes as input the results obtained by arbitrary clustering algorithms and refines them by iteratively applying the PAntSA^⋆ algorithm. The proposal improves the results obtained with different algorithms on several short-text collections. ITSA^⋆ is not, however, limited to serving as an improvement method: starting from random initial clusterings, it outperforms well-known clustering algorithms in most of the experimental instances.
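The general idea of refining an existing clustering iteratively can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not describe PAntSA^⋆ in detail, so `refine_once` below is a hypothetical stand-in that simply moves each document to the cluster whose members are most similar to it on average. The function names (`cosine`, `silhouette`, `refine_once`, `iterative_refine`), the use of a silhouette-style score to judge each pass, and the stopping rule (stop when the score no longer improves) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Labels = List[int]  # one cluster label per document


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def silhouette(docs: Sequence[Vector], labels: Labels) -> float:
    """Mean silhouette coefficient using distance = 1 - cosine similarity."""
    n = len(docs)
    total = 0.0
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if j != i:
                d = 1.0 - cosine(docs[i], docs[j])
                sums[labels[j]] = sums.get(labels[j], 0.0) + d
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        others = [sums[c] / counts[c] for c in sums if c != own]
        if own not in sums or not others:
            continue  # singleton cluster or a single cluster: contributes 0
        a = sums[own] / counts[own]  # mean distance to own cluster
        b = min(others)              # mean distance to nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n if n else 0.0


def refine_once(docs: Sequence[Vector], labels: Labels) -> Labels:
    """Assumed stand-in for the refinement step (NOT PAntSA^*): move each
    document, in turn, to the cluster with the highest mean similarity."""
    new = list(labels)
    for i in range(len(docs)):
        sums, counts = {}, {}
        for j in range(len(docs)):
            if j != i:
                sums[new[j]] = sums.get(new[j], 0.0) + cosine(docs[i], docs[j])
                counts[new[j]] = counts.get(new[j], 0) + 1
        if new[i] not in sums:
            continue  # sole member of its cluster: keep it so no cluster empties
        new[i] = max(sums, key=lambda c: sums[c] / counts[c])
    return new


def iterative_refine(
    docs: Sequence[Vector],
    initial: Labels,
    refine: Callable[[Sequence[Vector], Labels], Labels] = refine_once,
    score: Callable[[Sequence[Vector], Labels], float] = silhouette,
    max_iters: int = 20,
) -> Labels:
    """Feed each refined clustering back into the refinement step until the
    score stops improving (assumed stopping rule)."""
    best, best_score = list(initial), score(docs, initial)
    for _ in range(max_iters):
        candidate = refine(docs, best)
        candidate_score = score(docs, candidate)
        if candidate_score <= best_score:
            break
        best, best_score = candidate, candidate_score
    return best
```

The initial labels may come from any clustering algorithm or from a random assignment, which mirrors the two usage modes mentioned in the abstract. For example, `iterative_refine([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [0, 1, 0, 1])` starts from a poor grouping of four toy vectors and returns a labelling in which the two similar pairs share a cluster.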

Author information

Authors and Affiliations

LIDIC, Universidad Nacional de San Luis, Argentina
Marcelo Errecalde & Diego Ingaramo
Natural Language Eng. Lab. ELiRF, DSIC, Universidad Politécnica de Valencia, Spain
Paolo Rosso

Editor information

Editors and Affiliations

Dept. of Computing and Numerical Analysis, University of Cordoba, Campus Universitario de Rabanales, Einstein Building, 3rd floor, 14071, Cordoba, Spain
Nicolás García-Pedrajas
Dept. of Computer Science and Artificial Intelligence, ETS de Ingenierias Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
Francisco Herrera
School of Computing, University of the West of Scotland, PA1 2BE, Paisley, UK
Colin Fyfe
Dept. Computer Science and Artificial Intelligence, ETS de Ingenierias Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
José Manuel Benítez
Department of Computer Science, Texas State University-San Marcos, 601 University Drive, TX 78666-4616, San Marcos, USA
Moonis Ali

Cite this paper

Errecalde, M., Ingaramo, D., Rosso, P. (2010). ITSA^⋆: An Effective Iterative Method for Short-Text Clustering Tasks. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_55

DOI: https://doi.org/10.1007/978-3-642-13022-9_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13021-2
Online ISBN: 978-3-642-13022-9
eBook Packages: Computer Science, Computer Science (R0)
