Abstract
With the rapid development of information technology, power data is growing at an exponential rate. Traditional clustering algorithms do not perform satisfactorily on multi-dimensional, complex power network data, and how to process such data effectively has become a hot topic. To address this problem, this paper proposes a parallel implementation of the K-means clustering algorithm based on the Hadoop Distributed File System (HDFS) and the MapReduce distributed computing framework. The experimental results show that the proposed algorithm significantly outperforms the traditional clustering algorithm, and that the parallel clustering algorithm can significantly reduce the time complexity and can be applied to the analysis and mining of power network data.
Keywords
- Parallel algorithm
- K-means clustering
- Power data
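The abstract names the approach (K-means expressed as MapReduce jobs) without giving its steps. A minimal single-process sketch of how one K-means iteration is commonly split into a map step and a reduce step might look like the following. This is an illustration under stated assumptions, not the authors' implementation: the function names (`map_assign`, `reduce_update`, `kmeans_iteration`, `kmeans`) are hypothetical, plain Python lists stand in for HDFS input splits, and no Hadoop API is used.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centroids):
    """Map step (hypothetical name): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_update(pairs):
    """Reduce step (hypothetical name): average the points sharing a centroid index."""
    sums = {}
    counts = defaultdict(int)
    for key, (point, n) in pairs:
        if key not in sums:
            sums[key] = list(point)
        else:
            sums[key] = [s + x for s, x in zip(sums[key], point)]
        counts[key] += n
    return {k: tuple(s / counts[k] for s in sums[k]) for k in sums}


def kmeans_iteration(points, centroids):
    """One iteration: map every point, then reduce by centroid index."""
    pairs = [map_assign(p, centroids) for p in points]
    updated = reduce_update(pairs)
    # A centroid that attracted no points keeps its previous position.
    return [updated.get(i, c) for i, c in enumerate(centroids)]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations until no centroid moves more than tol (assumed stopping rule)."""
    for _ in range(max_iter):
        new = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(new, centroids)):
            return new
        centroids = new
    return centroids


if __name__ == "__main__":
    sample = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(sample, [(0, 0), (10, 10)]))
```

In an actual Hadoop job the map calls would run in parallel over file blocks, and each iteration would typically be a separate job that reads the current centroids from shared storage. The sketch only shows the data flow between the two steps.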
Acknowledgment
This work is supported by the National Science Foundation of China Grant #61672088, the Fundamental Research Funds for the Central Universities #2016JBM022 and #2015ZBJ007, and the Open Research Funds of Guangdong Key Laboratory of Popular High Performance Computers. The corresponding author is Yidong Li.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd
About this paper
Cite this paper
Meng, X., Chen, L., Li, Y. (2017). A Parallel Clustering Algorithm for Power Big Data Analysis. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_51
DOI: https://doi.org/10.1007/978-981-10-6442-5_51
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6441-8
Online ISBN: 978-981-10-6442-5
eBook Packages: Computer Science, Computer Science (R0)