Weight-Based Firefly Algorithm for Document Clustering

  • Athraa Jasim Mohammed
  • Yuhanis Yusof
  • Husniza Husni
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 285)


Existing clustering techniques have many drawbacks, including becoming trapped in local optima. In this paper, we introduce the use of a new meta-heuristic algorithm, the Firefly Algorithm (FA), to increase solution diversity. FA is a nature-inspired algorithm that has been applied to many optimization problems. We apply FA to document clustering by running it on the Reuters-21578 dataset. The algorithm identifies the document with the highest light intensity in the search space and uses it as a centroid. Documents similar to the centroid are then identified with the cosine similarity function: documents that are similar to the centroid are placed in one cluster and dissimilar documents in the other. Experiments on the chosen dataset produce high Purity and F-measure values, suggesting that the proposed Firefly algorithm is a viable approach to document clustering.
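The abstract does not give the update rule or say how light intensity is computed, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It uses the standard Firefly movement rule (a dimmer firefly moves toward a brighter one with attractiveness `beta0 * exp(-gamma * r**2)` plus a small random step), assumes that a firefly's light intensity is its mean cosine similarity to all documents, takes the document most similar to the brightest firefly as the centroid, and splits the collection in two with a cosine-similarity threshold. The function names `light_intensity`, `firefly_search` and `split_by_centroid`, the threshold of 0.5 and all parameter values are hypothetical. Purity uses the usual majority-class definition.

```python
import math
import random
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two term-weight vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def light_intensity(position, docs):
    """ASSUMPTION: brightness = mean cosine similarity of a firefly to all documents."""
    return sum(cosine(position, d) for d in docs) / len(docs)


def firefly_search(docs, n_iter=30, beta0=1.0, gamma=1.0, alpha=0.05, seed=0):
    """Standard FA loop: each firefly moves toward every brighter one.

    Parameter values are hypothetical; returns the brightest position found.
    """
    rng = random.Random(seed)
    swarm = [list(d) for d in docs]  # ASSUMPTION: one firefly starts at each document
    brightness = [light_intensity(f, docs) for f in swarm]
    for _ in range(n_iter):
        for i in range(len(swarm)):
            for j in range(len(swarm)):
                if brightness[j] > brightness[i]:
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    swarm[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                for a, b in zip(swarm[i], swarm[j])]
                    brightness[i] = light_intensity(swarm[i], docs)  # re-evaluate after moving
    return swarm[max(range(len(swarm)), key=brightness.__getitem__)]


def split_by_centroid(docs, threshold=0.5, **fa_params):
    """Use the document closest to the brightest firefly as centroid, then split in two."""
    best = firefly_search(docs, **fa_params)
    centroid = max(range(len(docs)), key=lambda k: cosine(best, docs[k]))
    similar = [k for k in range(len(docs)) if cosine(docs[k], docs[centroid]) >= threshold]
    dissimilar = [k for k in range(len(docs)) if k not in similar]
    return centroid, similar, dissimilar


def purity(clusters, labels):
    """Fraction of documents that belong to their cluster's majority class."""
    total = sum(len(c) for c in clusters)
    hits = sum(max(Counter(labels[k] for k in c).values()) for c in clusters if c)
    return hits / total


if __name__ == "__main__":
    # Toy term-weight vectors: documents 0-2 use terms 0-1, documents 3-4 use terms 2-3.
    docs = [[3, 1, 0, 0], [2, 1, 0, 0], [3, 2, 0, 0], [0, 0, 2, 3], [0, 0, 1, 2]]
    labels = ["A", "A", "A", "B", "B"]
    centroid, similar, dissimilar = split_by_centroid(docs)
    print("centroid:", centroid, "clusters:", similar, dissimilar)
    print("purity:", purity([similar, dissimilar], labels))
```

Documents here are plain lists of term weights (for example TF-IDF values); on Reuters-21578 they would come from a preprocessing step not shown. With a mean-cosine fitness the swarm can settle between groups, so mapping the brightest firefly back to its nearest document keeps the centroid an actual document, as the abstract describes.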


Keywords: Firefly algorithm · Partitional clustering · Hierarchical clustering · Text clustering





Copyright information

© Springer Science+Business Media Singapore 2014

Authors and Affiliations

  • Athraa Jasim Mohammed (1)
  • Yuhanis Yusof (1)
  • Husniza Husni (1)

  1. School of Computing, College of Arts and Sciences, Universiti Utara Malaysia, Sintok, Malaysia
