A Parallel Clustering Algorithm for Power Big Data Analysis

Part of the Communications in Computer and Information Science book series (CCIS, volume 729)


With the rapid development of information technology, power data is growing at an exponential rate. Faced with multi-dimensional and complicated power network data, the performance of traditional clustering algorithms is unsatisfactory. How to cope effectively with power network data is becoming a hot topic. This paper proposes a parallel implementation of the K-means clustering algorithm, based on the Hadoop Distributed File System and the MapReduce distributed computing framework, to address this problem. The experimental results show that the proposed parallel algorithm significantly outperforms the traditional clustering algorithm, significantly reduces the time complexity, and can be applied to the analysis and mining of power network data.


  • Parallel algorithm
  • K-means clustering
  • Power data
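
The abstract names a parallel K-means built on MapReduce but gives no algorithmic detail, so the following is only a minimal sketch of how one such iteration is commonly structured, not the authors' implementation. It simulates the map, shuffle, and reduce phases in plain Python rather than on Hadoop; the function names (`kmeans_map`, `kmeans_reduce`, `parallel_kmeans`), the `splits` argument standing in for HDFS input splits, and the stopping tolerance are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points assigned to one centroid."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for point, n in values:
        for d in range(dims):
            total[d] += point[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(splits, centroids):
    """One simulated MapReduce round over all input splits."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for split in splits:  # on a cluster, each split would go to its own mapper
        for point in split:
            key, value = kmeans_map(point, centroids)
            grouped[key].append(value)
    # Assumption: a centroid that attracts no points is left unchanged.
    return [
        kmeans_reduce(grouped[i]) if grouped[i] else centroids[i]
        for i in range(len(centroids))
    ]


def parallel_kmeans(splits, centroids, max_iter=20, tol=1e-6):
    """Repeat rounds until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(splits, centroids)
        shift = max(dist(a, b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift < tol:
            break
    return centroids


if __name__ == "__main__":
    # Two hypothetical input splits of 2-D points and two starting centroids.
    splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(parallel_kmeans(splits, [(1.0, 1.0), (9.0, 9.0)]))
```

In a real Hadoop job the current centroids would typically be shared with every mapper and the reducer output would feed the next job; those mechanics are omitted here.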



This work is supported by the National Science Foundation of China Grant #61672088, the Fundamental Research Funds for the Central Universities #2016JBM022 and #2015ZBJ007, and the Open Research Funds of Guangdong Key Laboratory of Popular High Performance Computers. The corresponding author is Yidong Li.

Corresponding author

Correspondence to Yidong Li.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd

About this paper

Cite this paper

Meng, X., Chen, L., Li, Y. (2017). A Parallel Clustering Algorithm for Power Big Data Analysis. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore.

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6441-8

  • Online ISBN: 978-981-10-6442-5

  • eBook Packages: Computer Science, Computer Science (R0)