Improving Clustering via a Fine-Grained Parallel Genetic Algorithm with Information Sharing

  • Storm Bartlett
  • Md Zahidul Islam
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1127)

Abstract

Clustering is a very common unsupervised machine learning task, used to organise datasets into groups that can provide useful insight. Genetic algorithms (GAs) are often applied to clustering because they are effective at finding viable solutions to optimization problems. Parallel genetic algorithms (PGAs) are an existing approach that maximizes the effectiveness of GAs by running multiple independent subpopulations in parallel. The subpopulations can also communicate by exchanging information throughout the genetic process, enhancing their overall effectiveness. PGAs offer greater performance by mitigating some of the weaknesses of GAs. Firstly, having multiple subpopulations enables the algorithm to explore the solution space more widely. This can reduce the probability of converging to poor-quality local optima, while increasing the chance of finding high-quality local optima. Secondly, PGAs offer improved execution time, as each subpopulation is processed in parallel on a separate thread. Our technique advances an existing GA-based method called GenClust++ by employing a PGA along with a novel information sharing technique. We also compare our technique with two alternative information sharing functions, as well as with no information sharing. On five commonly researched datasets, our approach consistently yields improved cluster quality and a markedly reduced runtime compared to GenClust++.
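
As a rough illustration of the general idea of subpopulations that evolve on separate threads and periodically exchange information, the following minimal Python sketch may help. It is not GenClust++ and not the information sharing technique proposed in the paper: the ring-style "copy the neighbour's best" sharing rule, the sum-of-squared-error fitness, the simple crossover and mutation, and all names (`sse`, `evolve`, `share_best`, `run_pga`) and parameter values are assumptions made purely for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid (lower is better)."""
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids)
               for px, py in points)

def evolve(subpop, points, rng, generations=10):
    """Run a few generations of a deliberately plain GA on one subpopulation (in place)."""
    for _ in range(generations):
        subpop.sort(key=lambda c: sse(c, points))
        survivors = subpop[: len(subpop) // 2]        # keep the better half (elitism)
        children = []
        while len(survivors) + len(children) < len(subpop):
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # pick each centroid from a parent
            i = rng.randrange(len(child))                      # mutate one centroid slightly
            x, y = child[i]
            child[i] = (x + rng.gauss(0, 0.3), y + rng.gauss(0, 0.3))
            children.append(child)
        subpop[:] = survivors + children
    return subpop

def share_best(subpops, points):
    """Assumed sharing rule: each subpopulation's worst member is replaced by a
    copy of the best member of the previous subpopulation in a ring."""
    bests = [min(sp, key=lambda c: sse(c, points)) for sp in subpops]
    for i, sp in enumerate(subpops):
        worst = max(range(len(sp)), key=lambda j: sse(sp[j], points))
        sp[worst] = list(bests[i - 1])

def global_best(subpops, points):
    return min((c for sp in subpops for c in sp), key=lambda c: sse(c, points))

def run_pga(points, k=2, n_subpops=4, subpop_size=8, epochs=5, seed=0):
    """Evolve subpopulations on separate threads, sharing information after each epoch."""
    init = random.Random(seed)
    subpops = [[[init.choice(points) for _ in range(k)] for _ in range(subpop_size)]
               for _ in range(n_subpops)]
    rngs = [random.Random(seed + 1 + i) for i in range(n_subpops)]  # one RNG per thread
    history = [sse(global_best(subpops, points), points)]
    with ThreadPoolExecutor(max_workers=n_subpops) as pool:
        for _ in range(epochs):
            # One task per subpopulation; list() waits for all of them to finish.
            list(pool.map(evolve, subpops, [points] * n_subpops, rngs))
            share_best(subpops, points)
            history.append(sse(global_best(subpops, points), points))
    return global_best(subpops, points), history

if __name__ == "__main__":
    data_rng = random.Random(1)
    points = ([(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(30)]
              + [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(30)])
    best, history = run_pga(points)
    print("best centroids:", best)
    print("best SSE per epoch:", history)
```

Because each subpopulation keeps its best member through both evolution and sharing, the best SSE recorded per epoch cannot get worse. Note that in standard CPython the global interpreter lock limits the speed-up threads can give for CPU-bound work, so this sketch shows the structure of the approach rather than the runtime gains reported in the abstract.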

Keywords

Genetic algorithm · Clustering · K-Means · Parallel genetic algorithm · Data mining · Machine learning

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Computing and Mathematics, Charles Sturt University, Bathurst, Australia
