Toward a MapReduce-Based K-Means Method for Multi-dimensional Time Serial Data Clustering

  • Yongzheng Lin
  • Kun Ma
  • Runyuan Sun
  • Ajith Abraham
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 736)

Abstract

Time series data is a sequence of real numbers that represent the measurements of a real variable at equal time intervals. Processing such data at large scale runs into bottlenecks. In this paper, we first propose a K-means method for clustering multi-dimensional time serial data. As an improvement, we use the MapReduce framework to implement this method in parallel. We compare versions of K-means built on several distance measures, and the experiments show that the MapReduce-based K-means achieves better speedup as the scale of the data grows.
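
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how one K-means iteration can be split into map and reduce steps. It assumes Euclidean distance (the paper compares several measures that are not named here), represents each series as a list of time steps with one value per dimension, and simulates the framework in a single process rather than on a Hadoop cluster. All function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def distance(a, b):
    """Euclidean distance between two series of equal shape (time steps x dimensions)."""
    return sqrt(sum((x - y) ** 2
                    for row_a, row_b in zip(a, b)
                    for x, y in zip(row_a, row_b)))


def map_phase(series_chunk, centroids):
    """Map: for each series in one data split, emit (index of nearest centroid, series)."""
    for series in series_chunk:
        nearest = min(range(len(centroids)),
                      key=lambda i: distance(series, centroids[i]))
        yield nearest, series


def reduce_phase(members):
    """Reduce: average all series assigned to one centroid, element by element."""
    n = len(members)
    steps, dims = len(members[0]), len(members[0][0])
    return [[sum(m[t][d] for m in members) / n for d in range(dims)]
            for t in range(steps)]


def kmeans_iteration(chunks, centroids):
    """Run one map/shuffle/reduce round and return the updated centroids."""
    grouped = defaultdict(list)  # shuffle: group mapper output by centroid index
    for chunk in chunks:         # each chunk stands in for one mapper's input split
        for key, series in map_phase(chunk, centroids):
            grouped[key].append(series)
    # Assumption: a centroid with no assigned series keeps its previous position.
    return [reduce_phase(grouped[i]) if grouped[i] else centroids[i]
            for i in range(len(centroids))]


# Three series, each with 2 time steps and 2 dimensions, spread over two splits.
chunks = [
    [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]],
    [[[10.0, 10.0], [10.0, 10.0]]],
]
centroids = [[[0.0, 0.0], [0.0, 0.0]], [[10.0, 10.0], [10.0, 10.0]]]
new_centroids = kmeans_iteration(chunks, centroids)
```

In a real MapReduce job the loop over `chunks` would run as parallel mapper tasks, and the iteration would repeat until the centroids stop moving, which is where the speedup on larger data described above would come from.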

Keywords

Clustering, Time serial data, K-means, MapReduce

Notes

Acknowledgments

This work was supported by the Science and Technology Program of University of Jinan (XKY1623 & XKY1734), the National Natural Science Foundation of China (61772231), the Shandong Provincial Natural Science Foundation (ZR2017MF025), the Shandong Provincial Key R&D Program of China (2015GGX106007), and the Project of Shandong Province Higher Educational Science and Technology Program (J16LN13).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Yongzheng Lin (1, 2)
  • Kun Ma (1, 2)
  • Runyuan Sun (1, 2)
  • Ajith Abraham (3)
  1. School of Information Science and Engineering, University of Jinan, Jinan, China
  2. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
  3. Machines Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, Auburn, USA
