Abstract
To address two weaknesses of the density peaks clustering (DPC) algorithm, namely its sensitivity to the cutoff distance and the subjectivity of its cluster center selection, we propose an improved density peaks algorithm based on information entropy and a merging strategy (DPC-IEMS) for power load curve clustering. First, a cutoff distance optimization method based on information entropy is proposed. This method uses the sparrow search algorithm (SSA) to minimize the information entropy of the product of local density and relative distance, and takes the minimizing value as the optimal cutoff distance for the load dataset. Then, a merging strategy is proposed to select the cluster centers adaptively. This strategy first uses DPC to generate a large number of initial sub-clusters and then merges them according to a fusion condition until the termination condition is satisfied. DPC-IEMS is evaluated on U.S. and Chinese load datasets, and the results demonstrate its effectiveness and practicality for power load curve clustering.
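The entropy-based cutoff selection summarized above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian-kernel local density, takes the information entropy of \(\gamma_{i}=\rho_{i}\delta_{i}\) to be \(H(d_{c})=-\sum_{i} p_{i}\ln p_{i}\) with \(p_{i}=\gamma_{i}/\sum_{j}\gamma_{j}\), and replaces SSA with a plain grid search over candidate \(d_{c}\) values drawn from quantiles of the pairwise distances. The function names (`pairwise_distances`, `density_and_distance`, `gamma_entropy`, `select_cutoff`) and the synthetic curves are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix d_ij between all load curves (rows of X)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def density_and_distance(D, dc):
    """Local density rho_i and relative distance delta_i for a cutoff distance dc."""
    n = D.shape[0]
    # Gaussian kernel (an assumption); "- 1.0" removes each curve's self-term exp(0).
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # curve indices by decreasing density
    delta = np.empty(n)
    delta[order[0]] = D.max()  # usual DPC convention for the densest curve
    for k in range(1, n):
        i = order[k]
        delta[i] = D[i, order[:k]].min()  # distance to the nearest denser curve
    return rho, delta


def gamma_entropy(D, dc):
    """Shannon entropy of the normalized gamma_i = rho_i * delta_i (assumed form)."""
    rho, delta = density_and_distance(D, dc)
    gamma = rho * delta
    p = gamma / gamma.sum()
    p = p[p > 0]  # treat 0 * log 0 as 0
    return float(-(p * np.log(p)).sum())


def select_cutoff(D, candidates):
    """Grid-search stand-in for SSA: the candidate dc with minimum entropy."""
    entropies = [gamma_entropy(D, dc) for dc in candidates]
    return float(candidates[int(np.argmin(entropies))]), entropies


# Usage on synthetic 24-point daily curves (two hypothetical load shapes plus noise).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 24)
X = np.vstack([np.sin(t) + 0.1 * rng.standard_normal((30, 24)),
               np.cos(t) + 0.1 * rng.standard_normal((30, 24))])
D = pairwise_distances(X)
pair_d = D[np.triu_indices_from(D, k=1)]
candidates = np.quantile(pair_d, np.linspace(0.01, 0.2, 20))
dc, H = select_cutoff(D, candidates)
print(f"selected d_c = {dc:.3f}, minimum entropy = {min(H):.3f}")
```

Swapping the grid search for SSA would only change `select_cutoff`; the entropy objective that the optimizer minimizes stays the same.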
Availability of data and materials
The U.S. load data are available for download from https://dx.doi.org/10.25984/1876417. The Chinese load data cannot be shared for privacy reasons.
Abbreviations
- DPC: Density peaks clustering algorithm
- SSA: Sparrow search algorithm
- FCM: Fuzzy C-means
- KNN: K-nearest neighbor
- SC: Silhouette coefficient
- CH: Calinski-Harabasz score
- DVI: Dunn validity index
- DBI: Davies-Bouldin score
- WPD: Wavelet packet decomposition
- DWT: Discrete wavelet transform
- PCA: Principal component analysis
- DPC-IE: DPC algorithm based on information entropy
- DPC-MS: DPC algorithm based on a merging strategy
- DPC-IEMS: Density peaks algorithm based on information entropy and merging strategy
- \(\rho_{i}\): The local density of point i in DPC
- \(\delta_{i}\): The relative distance of point i in DPC
- \(d_{c}\): The cutoff distance of DPC
- \(d_{ij}\): The Euclidean distance between point i and point j
- \(\gamma_{i}\): The product of local density and relative distance, \(\gamma_{i}=\rho_{i}\delta_{i}\)
- \(CL^{\prime}\): The initial sub-clusters
- \(ICl^{\prime}\): The initial sub-cluster center indexes
- \(Cl^{\prime}_{j}\): The j-th initial sub-cluster
- \(icl^{\prime}_{j}\): The j-th initial sub-cluster center index
- \(d_{near}\): The distance between the center of an initial sub-cluster and the sub-cluster's nearest neighbor curve
- FT: The fusion threshold
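The two-stage idea behind the merging strategy (over-segment the data with DPC, then fuse sub-clusters until no more can be merged) can be sketched as below, reusing the notation listed above. The details are hypothetical: the fusion condition of DPC-IEMS involves \(d_{near}\) and the threshold FT, but its exact form is not given in this section, so the code substitutes a simpler rule that merges two sub-clusters whenever their closest curves are within a threshold `ft`. The Gaussian-kernel density, the common 2% quantile heuristic for \(d_{c}\), the parameters `n_init` and `ft`, the function names, and the synthetic data are all assumptions.

```python
import numpy as np


def dpc_initial_subclusters(X, dc, n_init):
    """DPC with deliberately many centers: returns D, labels (Cl') and center indexes (ICl')."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    n = len(X)
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0  # Gaussian-kernel density (assumption)
    order = np.argsort(-rho, kind="stable")  # decreasing density
    delta = np.empty(n)
    parent = np.full(n, -1)  # nearest denser curve of each curve
    delta[order[0]] = D.max()
    for k in range(1, n):
        i, denser = order[k], order[:k]
        j = denser[np.argmin(D[i, denser])]
        delta[i], parent[i] = D[i, j], j
    score = rho * delta  # gamma_i
    score[order[0]] = np.inf  # the densest curve has no denser neighbor, so it must be a center
    centers = np.argsort(-score)[:n_init]  # ICl': the n_init largest gamma_i
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:  # denser curves are labeled first, so parent[i] already has a label
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return D, labels, centers


def merge_subclusters(D, labels, ft):
    """Hypothetical fusion rule: merge two sub-clusters whose closest curves are within ft."""
    labels = labels.copy()
    while True:
        ids = np.unique(labels)
        pair = next(((a, b) for ai, a in enumerate(ids) for b in ids[ai + 1:]
                     if D[np.ix_(labels == a, labels == b)].min() <= ft), None)
        if pair is None:  # termination: no pair satisfies the fusion condition
            return np.unique(labels, return_inverse=True)[1]
        labels[labels == pair[1]] = pair[0]


# Usage on synthetic curves: two hypothetical load shapes, 30 noisy curves each.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 24)
X = np.vstack([np.sin(t) + 0.1 * rng.standard_normal((30, 24)),
               np.cos(t) + 0.1 * rng.standard_normal((30, 24))])
D0 = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
dc = np.quantile(D0[np.triu_indices(len(X), k=1)], 0.02)  # 2% quantile heuristic
D, init_labels, centers = dpc_initial_subclusters(X, dc, n_init=6)
final = merge_subclusters(D, init_labels, ft=2.0)
print(len(np.unique(init_labels)), "initial sub-clusters ->", len(np.unique(final)), "clusters")
```

Because the number of clusters is decided by the merging loop rather than by picking centers from a decision graph, `n_init` only needs to be large enough to over-segment the data.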
Funding
This work was supported by the National Natural Science Foundation of China (No.42075129) and Hebei Province Natural Science Foundation (No.E2021202179).
Author information
Contributions
YY: Conceptualization, Methodology, Formal analysis, Validation, Investigation, Software, Writing - Original Draft, Resources, Visualization. LW: Conceptualization, Methodology, Formal analysis, Validation, Writing - Original Draft, Writing - Review & Editing, Funding acquisition. ZC: Data Curation, Visualization.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Y., Wang, L. & Cheng, Z. Density peaks algorithm based on information entropy and merging strategy for power load curve clustering. J Supercomput 80, 8801–8832 (2024). https://doi.org/10.1007/s11227-023-05793-0
DOI: https://doi.org/10.1007/s11227-023-05793-0