A high-performance parallel coral reef optimization for data clustering

  • Chun-Wei Tsai
  • Wei-Yan Chang
  • Yi-Chung Wang
  • Huan Chen

Abstract

How to develop a high-performance data analytics system is a critical research topic for the era of big data, and it has received significant attention from different disciplines since the 2000s. Many recent works have attempted to build such systems to handle large amounts of data (volume) from different information systems (variety) that are typically created in a very short time (velocity). In particular, several recent studies have shown that metaheuristic algorithms can be applied to many data mining optimization problems and can find higher-quality results than traditional deterministic algorithms. This paper presents a high-performance clustering algorithm for big data analytics systems. The proposed algorithm is based on a recent metaheuristic, coral reef optimization with substrate layers (CRO-SL), to obtain better clustering results. To improve both effectiveness and efficiency, the proposed CRO-SL scheme is also applied to a cloud computing platform to reduce the response time of the data analytics system. Simulation results show that, in terms of the sum of squared errors, the proposed algorithm provides better clustering results than the other clustering algorithms compared in this study: k-means, the genetic k-means algorithm, particle swarm optimization, and the simple coral reef optimization algorithm.
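
The comparison criterion named in the abstract, the sum of squared errors (SSE), can be illustrated with a minimal sketch. It assumes the common definition in which each point contributes its squared Euclidean distance to the nearest centroid. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_of_squared_errors(points: List[Point], centroids: List[Point]) -> float:
    """SSE of a clustering: every point is charged the squared distance
    to its nearest centroid (assumed definition, see the text above)."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_centroids = [(0.0, 1.0), (10.0, 1.0)]  # one centroid per group
poor_centroids = [(5.0, 0.0), (5.0, 2.0)]   # centroids placed between the groups

print(sum_of_squared_errors(points, good_centroids))  # 4.0
print(sum_of_squared_errors(points, poor_centroids))  # 100.0
```

A lower SSE indicates more compact clusters. A metaheuristic clustering algorithm of the kind discussed here would typically treat a set of centroids as one candidate solution and use a function like this as the objective to minimize.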

Keywords

  • Data clustering
  • Metaheuristic algorithm
  • Coral reef optimization
  • Cloud computing

Notes

Funding

This work was supported in part by the Ministry of Science and Technology of Taiwan, R.O.C., under Contracts MOST106-2221-E-005-094, MOST107-2221-E-005-029, MOST107-2221-E-005-022 and MOST107-2218-E-005-018.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, ROC
  2. Department of Computer Science and Engineering, National Chung Hsing University, Taichung, Taiwan, ROC
