Abstract
Overlapping clustering algorithms have been applied successfully in several contexts. Among the overlapping clustering algorithms reported in the literature, OClustR shows the best trade-off between cluster quality and efficiency in the task of document clustering; however, its computational complexity is quadratic, which makes it less useful in applications that deal with a very large number of documents. In this paper, we propose two parallel versions of the OClustR algorithm, specifically tailored for GPUs and multi-core CPUs, which improve the efficiency of OClustR on problems involving a very large number of documents. The experimental evaluation over standard document collections shows the correctness and good performance of our proposals.
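To illustrate where a quadratic cost can come from and why it parallelizes well, the following is a minimal sketch of an all-pairs similarity step for a graph-based document clustering method. It is not the authors' implementation. It assumes documents are sparse term-weight vectors compared with cosine similarity, and the names `cosine`, `similarity_edges`, the threshold `beta`, and the `workers` parameter are hypothetical choices made for this example.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_edges(docs, beta, workers=4):
    """Return edges (i, j, sim) with sim >= beta.

    All n*(n-1)/2 document pairs are scored, which is the quadratic part.
    Each pair is independent of the others, so the work can be split
    across workers. Threads are used here only to show that structure;
    in CPython they do not run this pure-Python code truly in parallel.
    """
    pairs = list(combinations(range(len(docs)), 2))

    def score(pair):
        i, j = pair
        return i, j, cosine(docs[i], docs[j])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = pool.map(score, pairs)  # results keep the order of `pairs`
        return [(i, j, s) for i, j, s in scored if s >= beta]


# Toy collection (made-up weights, assumption for illustration only).
docs = [
    {"gpu": 1.0, "cluster": 1.0},
    {"gpu": 1.0, "cluster": 1.0, "cpu": 1.0},
    {"music": 1.0},
]
edges = similarity_edges(docs, beta=0.5)
```

Because every pair is scored independently, the same loop maps naturally onto one GPU thread per pair or onto a block of pairs per CPU core, which is the kind of structure a parallel version can exploit.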
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Soler, L.J.G., Pérez-Suárez, A., Chang, L. (2014). Efficient Overlapping Document Clustering Using GPUs and Multi-core Systems. In: Bayro-Corrochano, E., Hancock, E. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2014. Lecture Notes in Computer Science, vol 8827. Springer, Cham. https://doi.org/10.1007/978-3-319-12568-8_33
DOI: https://doi.org/10.1007/978-3-319-12568-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12567-1
Online ISBN: 978-3-319-12568-8
eBook Packages: Computer Science, Computer Science (R0)