Abstract

As data volumes grow exponentially, extracting useful information in reasonable time becomes increasingly difficult. Clustering is one of the most important techniques for exploiting data, and many algorithms have been proposed, including k-means and its variations (k-medians, kernel k-means, etc.), DBSCAN, OPTICS, and others. The time complexity of these methods is often prohibitive for timely decision-making (finding an optimal k-means clustering, for instance, is NP-hard), so the solution is either to invent new, faster algorithms or to increase the performance of the old, well-tested ones. Distributed, parallel, and multi-core GPU computing, or even a combination of these platforms, constitutes a very promising way to speed up clustering techniques. In this paper, parallel versions of the above-mentioned algorithms were implemented and used in order to increase their performance and, consequently, their prospects in fields such as industry, political and social sciences, telecommunications businesses, and intrusion detection in large networks. The parallel versions of the clustering techniques are presented, and two cases of their application in different fields are illustrated. The results obtained are very promising in terms of both quality and performance, and they therefore improve the prospects of using clustering techniques in industry and the sciences.
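To make the idea of a parallel k-means concrete, the following is a minimal sketch of the common data-parallel pattern: the data set is split into chunks, each worker assigns its points to the nearest centroid and returns partial sums, and the partial results are reduced into new centroids. It is an illustration under stated assumptions, not the implementation described in the paper. The function names `partial_sums` and `parallel_kmeans`, the `workers` and `iterations` parameters, and the use of Python threads are all hypothetical choices made here. In CPython, threads mainly show the structure rather than deliver a speedup; a distributed version would typically replace the reduce step with a collective operation such as an MPI all-reduce.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk, centroids):
    """Assign each point of one chunk to its nearest centroid and
    return per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        # Nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def parallel_kmeans(points, centroids, workers=4, iterations=10):
    """Data-parallel k-means sketch: map chunks to workers, then reduce."""
    # Hypothetical partitioning: round-robin split, dropping empty chunks.
    chunks = [points[i::workers] for i in range(workers)]
    chunks = [c for c in chunks if c]
    centroids = [list(c) for c in centroids]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            # Map step: every worker processes its own chunk.
            results = list(
                pool.map(partial_sums, chunks, [centroids] * len(chunks))
            )
            # Reduce step: combine partial sums and counts per cluster.
            new_centroids = []
            for c in range(len(centroids)):
                count = sum(r[1][c] for r in results)
                if count == 0:
                    # Keep the old centroid if a cluster received no points.
                    new_centroids.append(centroids[c])
                    continue
                dim = len(centroids[c])
                new_centroids.append(
                    [sum(r[0][c][d] for r in results) / count for d in range(dim)]
                )
            if new_centroids == centroids:
                break  # assignments are stable, so the centroids stopped moving
            centroids = new_centroids
    return centroids
```

Only the per-cluster sums and counts cross worker boundaries, not the points themselves, which is what keeps the communication cost of this pattern small compared with the distance computations.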

Keywords

Clustering, k-means, DBSCAN, Parallel computing

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, T.E.I. of Thessaly, Larissa, Greece
