A Sampling-PSO-K-means Algorithm for Document Clustering

  • Nadjet Kamel
  • Imane Ouchen
  • Karim Baali
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 238)


Clustering groups objects so that objects within the same cluster are similar and objects in different clusters are dissimilar. Many clustering algorithms have been proposed in the literature, and they are used in areas such as security, marketing, documentation, and social networks. The K-means algorithm is one of the best clustering algorithms: it is very efficient, but its performance is very sensitive to the initialization of the clusters. Several solutions have been proposed to address this problem. In this paper we propose a hybrid algorithm for web document clustering. The proposed algorithm combines K-means, particle swarm optimization (PSO), and sampling. It is evaluated on four datasets, and the results are compared with those obtained by K-means, PSO, Sampling+K-means, and PSO+K-means. The results show that the proposed algorithm generates the most compact clusters.
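The abstract does not spell out how the three components are combined, so the following is only a minimal sketch of one plausible pipeline, not the authors' algorithm: draw a random sample of the data, run a global-best PSO on the sample to find candidate centroids, then refine those centroids with K-means on the full dataset. Everything here is an assumption made for illustration: the function names, the dense Euclidean feature vectors (document clustering would more typically use TF-IDF vectors and cosine similarity), the compactness measure (mean distance to the nearest centroid), and the PSO parameters `w`, `c1`, `c2`, which are commonly used defaults rather than values from the paper.

```python
import numpy as np

def assign(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def compactness(X, centroids):
    """Mean distance from each point to its nearest centroid (lower is better)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.min(axis=1).mean()

def kmeans(X, centroids, iters=20):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(iters):
        labels = assign(X, centroids)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids

def pso_centroids(X, k, rng, particles=10, iters=30, w=0.72, c1=1.49, c2=1.49):
    """Global-best PSO; each particle encodes k candidate centroids.
    The parameter values are illustrative assumptions, not from the paper."""
    # Initialise every particle with k distinct points drawn from the data.
    pos = np.stack([X[rng.choice(len(X), k, replace=False)]
                    for _ in range(particles)])
    vel = np.zeros_like(pos)
    best_pos = pos.copy()
    best_fit = np.array([compactness(X, p) for p in pos])
    g = best_fit.argmin()
    g_pos, g_fit = best_pos[g].copy(), best_fit[g]
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_pos - pos)
        pos = pos + vel
        fit = np.array([compactness(X, p) for p in pos])
        improved = fit < best_fit
        best_pos[improved], best_fit[improved] = pos[improved], fit[improved]
        if best_fit.min() < g_fit:
            g = best_fit.argmin()
            g_pos, g_fit = best_pos[g].copy(), best_fit[g]
    return g_pos

def sampling_pso_kmeans(X, k, sample_frac=0.2, seed=0):
    """Hypothetical pipeline: sample, PSO on the sample, K-means on all data."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = max(k, int(sample_frac * len(X)))
    sample = X[rng.choice(len(X), n, replace=False)]
    seeds = pso_centroids(sample, k, rng)  # search only on the small sample
    return kmeans(X, seeds)                # refine on the full dataset
```

Calling `sampling_pso_kmeans(X, k)` on an `(n, d)` array returns a `(k, d)` array of centroids, and `assign(X, centroids)` then gives the cluster label of each row. Running PSO on the sample only keeps the expensive swarm search cheap, while the final K-means pass uses every document; whether the paper orders the stages this way is not stated in the abstract.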


Clustering algorithms, PSO, K-means, Sampling





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Computer Science Department, Faculty of Sciences, UFAS, Setif, Algeria
  2. LRIA, Computer Science Department, USTHB, Algiers, Algeria
