2.1 Introduction

Ports are important hubs for maritime activities and trade transportation. The large amount of handling machinery, vehicles, and berthed ships make ports a major energy consumer and emitter. Driven by the goals of “carbon peaking” and “carbon neutrality,” it is urgent to explore the carrying capacity of port power grids for the electrification transformation of high-energy consumption and high-emission loads, and to develop, utilize, and absorb high-proportion new energy on site. To make the most of the new energy for transportation, handling, and ship power supply, optimizing the configuration of distributed power sources and energy storage has become the key to energy conservation and emission reduction in green port areas.

Unlike conventional microgrids, port microgrids include not only micro power sources, conventional power loads, energy storage systems, and control devices, but also a large number of electric loading and unloading machinery and shore power equipment. The load characteristics of the port area are closely related to logistics and transportation, as well as production operation plans, especially affected by the number of ships, tonnage, and types of cargo, without obvious natural periodic characteristics. On the other hand, new energy sources such as photovoltaics in the port area show power generation characteristics strongly correlated with natural periods such as days, weeks, and months. The inconsistency of randomness and periodic fluctuations between sources and loads poses challenges to the depiction of port operation scenarios, and also poses severe challenges to the optimization and configuration methods based on typical periods such as daily and weekly cycles [1, 2].

The scenario method can represent uncertain variables through multiple deterministic typical scenarios to achieve optimal planning of high-stochastic systems [3]. Increasing the quantity of typical scenario data can effectively improve the accuracy of configuration, but too many operating points not only create redundancy but also result in enormous computational complexity, which can even make the problem unsolvable. Therefore, how to utilize massive data such as load, wind speed, and light intensity, extract key information to generate typical scenarios containing limited operating points, and accurately reflect the system’s operating conditions is crucial for the optimization of renewable energy and hybrid storage configuration in a port microgrid.

Selecting a series of typical days or weeks as typical scenarios is a common method in microgrid planning. Literature [4] selects typical days from each month included in the planning period for capacity expansion optimization. Literature [5] selects operating days with the highest load, lowest load, and maximum load fluctuation as typical days. These two methods are easy to implement and operate but difficult to ensure the representativeness of the selected typical scenarios. The above literature uses days as the length of the scenario, which cannot reflect the fluctuations of load and distributed generation over a longer period of time and is not suitable for port microgrids with unclear natural cycles. Literature [6] uses planning scenarios with a length of 5 days to optimize the configuration of large-capacity energy storage, and literature [7] selects typical week data for optimization. The above methods can effectively reduce the computational complexity of microgrid optimization configuration but are difficult to reflect the correlation between different regions and data types. For example, when the load in the port production area is high, the load in the living area is usually low, and on sunny days, there is often more air convection and wind compared to cloudy days.

To extract latent correlation information from massive data, some studies have proposed data compression algorithms to generate typical scenarios. The K-means clustering algorithm [8] and the hierarchical clustering algorithm [9] are often used to filter typical information from large data sets. Literature [10] integrates these two algorithms to select typical scenarios, retaining both the fast computation of the K-means algorithm and the sensitivity of hierarchical clustering to outliers. Literature [11] analyzes the spatiotemporal correlation of wind power using a multivariate normal distribution and a Copula function, and combines them with Monte Carlo sampling to generate a set of wind power output scenarios with spatiotemporal correlation. Literature [12] applies the vector K-medoids clustering method to classify loads and generate typical loads, and then applies the K-means clustering method to compress the typical loads and distributed generation data to obtain typical scenarios. Literature [13] constructs spatial and temporal feature extraction units for time-series generative adversarial networks to generate daily/monthly wind and solar power time series. These studies address the selection of typical scenarios in planning and perform well in conventional microgrids. However, the selected scenarios are divided into time scales according to natural cycles, so their performance degrades when they are applied to weakly periodic data such as port load.

This chapter proposes a high-fidelity compression and reconstruction method for the operating scenarios of port microgrids and applies it to the optimal configuration of port generation and storage. Because the natural cycle of port microgrid operation data is not obvious, a hierarchical clustering algorithm and a continuous hierarchical clustering algorithm are proposed. The time scale corresponding to each operating point is adjusted dynamically according to the operating characteristics of the port microgrid, and the high-fidelity compression and reconstruction of operating scenarios is realized through two data processing steps to obtain the typical operating scenarios of the port. Then, the source and storage optimization configuration model is established, and the time scale of the model is dynamically adjusted based on the obtained typical scenarios to obtain the optimal configuration of the port microgrid source and storage. Finally, taking the Rizhao Port microgrid as an example, the representativeness of the selected typical scenarios and the accuracy of the optimization configuration results are verified.

2.2 Scenario-Based Depiction of the Fluctuating Characteristics of Port Microgrid

In green port microgrids with rich distributed generation resources and concentrated loads, the penetration rate of renewable energy will increase to over 80%. Therefore, port microgrids should also be equipped with various types of energy storage devices to improve system regulation capabilities and promote renewable energy consumption. Compared to traditional power grids, renewable energy in port microgrids does not have a scale effect, and its randomness and volatility are stronger. At the same time, the load in port microgrids includes a high proportion of shore power, belt conveyors, shore cranes, and other equipment, and the load curve is closely related to port production operations. The load of conventional microgrids follows a clear intra-day pattern that repeats from day to day, so the daily load curve basically coincides with the average of the load over two days. As shown in Fig. 2.1, the load in port microgrids has no such intra-day pattern; instead, it shows both long-lasting operating states and large fluctuations within short periods. Therefore, accurately characterizing the fluctuation characteristics of renewable energy and load in the system is both the key and the main difficulty in optimizing the source and storage configuration of a port microgrid.

Fig. 2.1 The load curve of port microgrids

This chapter uses a scenario-based optimization method to address the impact of source load fluctuations on system optimization configuration. The operating scenarios contain multiple operating points, each of which includes system operating information such as load, renewable energy generation, and energy prices at a port within the same period. In existing methods, the time scale corresponding to the operating point is determined based on the scenario's purpose. In most literature focused on optimization configuration, an operating point corresponds to one hour. This chapter focuses on the characteristics of a port microgrid and proposes a high-fidelity compression and reconstruction method for operating scenarios with dynamically adjustable time scales for each operating point. The time scale corresponding to each operating point in the scenario is determined by the system’s operating state, ranging from one hour to several hours, thereby increasing the density of data information in typical scenarios. The overall process of source-storage optimization configuration for a port microgrid is shown in Fig. 2.2. First, the fluctuation characteristics of uncertain sources, such as load, wind speed, and solar irradiance, are extracted from a massive amount of operating data to construct a series of typical operating scenarios for the port microgrid. Through simulation and approximation of the distribution of random variables in scenarios, the random variables’ future situations are characterized. Then, using the typical scenarios as input, the scenario-based stochastic planning problem is transformed into a deterministic model corresponding to the scenario set Ω. Finally, the model is solved to obtain the optimal capacity of renewable energy and energy storage devices.

Fig. 2.2 The optimal generation and storage configuration of port microgrids

2.3 High-Fidelity Compression and Reconstruction Method

The accurate characterization of operational characteristics of microgrids in ports through typical scenarios is of great significance for improving the accuracy and feasibility of optimizing source and storage configurations in green ports. This chapter uses a vector autoregression model to generate microgrid operational scenarios based on massive historical data inputs, simulating the operational situation of microgrids in ports over the next few years. The more operational points included in the scenario, the more accurate the simulation of the random variable, but a large number of operational points can make the model difficult to solve. Selecting typical scenarios with an appropriate number of operational points and accurately describing the random variables is crucial to determine the accuracy of the optimization results.

This section proposes a high-fidelity compression and reconstruction method for generating typical scenarios of port microgrids. Based on existing system operational scenarios, this method dynamically adjusts the time scale of operational points according to operational characteristics, as shown in Fig. 2.3. The method reduces the data size by exploiting the similarities between operational weeks and between adjacent operational points in port microgrids, while retaining the properties of the original data and the correlation between different data types, thereby increasing the density of data information.

Fig. 2.3 High-fidelity compression and reconstruction method of port microgrid operation scenarios

2.3.1 Port Data Extraction and Integration

The data contained in the operating scenarios of a port microgrid may exhibit differences in both the temporal scale and the data types. Therefore, prior to further processing of data, various types of data contained in the scenarios need to be extracted and integrated. Firstly, the entire temporal cycle data is processed into hourly units using systematic sampling or interpolation methods. Then, a two-dimensional data matrix is obtained by using time as one dimension and different nodes or data types as the other. This matrix serves as the input matrix for data processing. For instance, when the initial operating scenario covers a one-year period, including data on solar irradiance, wind speed, load level, and electricity price of three nodes, the input matrix format should be \(8760 \times 12\). A year corresponds to 8760 h, and the four types of data for three nodes correspond to \(3 \times 4 = 12\).

Because the values of different data types and nodes may differ greatly in magnitude, and the data processing requires calculating the distance between the corresponding data of different operating points, data standardization is required to ensure that the various types of data carry equal weight. This chapter employs the min–max normalization method, as shown in Eq. (2.1).

$$ \begin{array}{*{20}c} {\alpha_{nor} = \frac{\alpha - \min }{{\max - \min }}} \\ \end{array} $$
(2.1)

where \(\alpha_{nor}\) represents the standardized data; \(\alpha\) represents the original data; \(\min\) represents the minimum value of the data set; \(\max\) represents the maximum value of the data set.
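
As an illustration of Eq. (2.1), the following minimal sketch normalizes each column of a small input matrix. The function name `min_max_normalize`, the toy values, and the handling of a constant column are assumptions made for this example and are not taken from the chapter.

```python
def min_max_normalize(matrix):
    """Normalize each column of a row-major matrix to [0, 1] per Eq. (2.1).

    Returns the normalized matrix and the per-column (min, max) pairs,
    which are needed later to restore the original units.
    """
    columns = list(zip(*matrix))
    bounds = [(min(c), max(c)) for c in columns]
    normalized = []
    for row in matrix:
        new_row = []
        for value, (lo, hi) in zip(row, bounds):
            # A constant column has max == min; mapping it to 0.0 is an
            # assumption of this sketch, not something the chapter specifies.
            new_row.append(0.0 if hi == lo else (value - lo) / (hi - lo))
        normalized.append(new_row)
    return normalized, bounds


# Hypothetical toy input: 4 hourly rows x 2 columns (load, wind speed).
# The matrix described in the text would be 8760 x 12 instead.
raw = [
    [12.0, 3.0],
    [30.0, 7.0],
    [21.0, 5.0],
    [48.0, 11.0],
]
normalized, bounds = min_max_normalize(raw)
```

Keeping the per-column minimum and maximum alongside the normalized matrix allows the scaling to be inverted after clustering.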

2.3.2 High-Fidelity Compression Based on Operational Week

In the operation data of the port microgrid, the data such as load and wind speed do not exhibit obvious diurnal patterns. In order to capture the fluctuation characteristics of the data over a longer period of time, the number of operating points is reduced by utilizing the similarity between operating weeks in the port. Firstly, all operating weeks included in the operation scenario are treated as an initial cluster, and then the number of clusters is reduced using hierarchical clustering until the remaining number of clusters meets the expected target.

The Euclidean distance is used to measure the difference between the data of each week, as shown in Eq. (2.2).

$$ \begin{array}{*{20}c} {E\left( {A,B} \right) = \sqrt {\left( {x_{A1} - x_{B1} } \right)^{2} + \left( {x_{A2} - x_{B2} } \right)^{2} + \cdots + \left( {x_{An} - x_{Bn} } \right)^{2} } } \\ \end{array} $$
(2.2)

where \(E\left( {A,B} \right)\) is the Euclidean distance between week A and week B; \(x_{A1}\) to \(x_{An}\) are the data in week A; \(x_{B1}\) to \(x_{Bn}\) are the data in week B.

The Ward’s minimum variance method is used as the criterion for merging clusters in the hierarchical clustering process. This method requires calculating the centroid of each cluster, using the formula shown in Eq. (2.3).

$$ \begin{array}{*{20}c} {\overline{I} = \frac{1}{\left| I \right|}\mathop \sum \limits_{A \in I} A} \\ \end{array} $$
(2.3)

where \(\overline{I}\) is the center point of class I; \(A\) represents all operating weeks included in Class I; \(\left| I \right|\) is the number of running weeks in class I.

Then, calculate the within-group sum of squares for each class, which is the sum of the squared Euclidean distances between all observations and the class centroid. The calculation formula is shown in Eq. (2.4).

$$ \begin{array}{*{20}c} {DSS_{I} = \mathop \sum \limits_{A \in I} \left[ {E\left( {A,\overline{I}} \right)} \right]^{2} } \\ \end{array} $$
(2.4)

where \(DSS_{I}\) is the sum of squares of the intra class dispersion of class I.

The within-class sum of squares can characterize the dispersion among the running weeks within each class and measure the clustering of elements within each class. As the number of running weeks in a class increases, its within-class sum of squares will be greater than or equal to the original within-class sum of squares. During the clustering process, the total sum of squares of the data is iteratively calculated, as shown in Eq. (2.5).

$$ \begin{array}{*{20}c} {SDSS = \mathop \sum \limits_{I \in data} DSS_{I} } \\ \end{array} $$
(2.5)

where \(SDSS\) is the sum of squares of the total dispersion of the data; \(data\) represents all data during data processing.

When two classes are merged, the total sum of squares of the data increases. Since a smaller sum of squares indicates a higher degree of clustering among the elements, the pair of classes whose merger yields the smallest total sum of squares is selected for merging at each step. After the clustering is completed, this chapter uses the operating week closest to the centroid of each class, rather than the centroid itself, as the representative week for that class. This avoids the smoothing of fluctuations between adjacent operating points within the representative week that would result from averaging multiple weeks.

During data processing, each run of the clustering algorithm merges two classes and therefore reduces the number of classes by one. When the entire operating scenario contains i operating weeks and must be reduced to j representative weeks, the clustering algorithm is run i − j times. The number of operating weeks contained in each class in the clustering result is the weight of that class's representative week.
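
The week-level compression can be sketched in plain Python as follows. This is a brute-force illustration under stated assumptions, not the chapter's implementation: the names `compress_weeks`, `dss`, and `centroid` are hypothetical, each week is represented by a short toy vector (a real week would be the flattened, normalized hourly data of all columns), and every merge step re-evaluates Eqs. (2.3) to (2.5) for all pairs of classes.

```python
from itertools import combinations


def centroid(vectors):
    """Eq. (2.3): element-wise mean of the weeks in a class."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance, i.e. the square of Eq. (2.2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dss(vectors):
    """Eq. (2.4): within-class sum of squared distances to the centroid."""
    c = centroid(vectors)
    return sum(sq_dist(v, c) for v in vectors)


def compress_weeks(weeks, target):
    """Merge classes until `target` remain; return (representative index, weight)."""
    clusters = [[i] for i in range(len(weeks))]
    while len(clusters) > target:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            merged = clusters[a] + clusters[b]
            # Increase of the total sum of squares (Eq. 2.5) caused by this merge.
            increase = (dss([weeks[i] for i in merged])
                        - dss([weeks[i] for i in clusters[a]])
                        - dss([weeks[i] for i in clusters[b]]))
            if best is None or increase < best[0]:
                best = (increase, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b always holds, so index a stays valid
    result = []
    for members in clusters:
        c = centroid([weeks[i] for i in members])
        # Representative week: the actual week closest to the centroid.
        rep = min(members, key=lambda i: sq_dist(weeks[i], c))
        result.append((rep, len(members)))
    return result


# Hypothetical toy data: six "weeks", each reduced to a 2-element vector.
weeks = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
         [1.0, 1.0], [0.7, 0.9], [0.9, 1.0]]
representatives = compress_weeks(weeks, target=2)
```

With six initial weeks and a target of two, the loop runs 6 − 2 = 4 times, matching the i − j count given in the text. Each returned pair holds the index of a representative week and the number of weeks it stands for.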

2.3.3 Variable-Time-Scale Data Reconstruction

The operation of a microgrid at a port is closely related to the production and operational behavior of the port. Under unchanged operational tasks, the same operational state can persist for a long time. Therefore, to further reduce the data scale of typical scenarios, the time scale of each operational point can be dynamically adjusted based on the similarity between operational points, in order to reconstruct the operational scenarios. For example, when the operational state at the port changes little, the time scale of an operational point can be increased to represent multiple hours, thereby reducing the data scale. Conversely, when the operational state changes rapidly, the time scale of an operational point should be decreased to represent one hour, in order to retain key operational information.

Based on the data processing results from the previous stage, the continuous hierarchical clustering method is used to cluster adjacent operating points within the representative weeks. Each operating point in each representative week is taken as an initial class, and the two classes that are adjacent in time and have the smallest difference are merged. Classes from different representative weeks cannot be merged. The difference between classes is measured by the distance between their center points, which are calculated in the same way as in Eq. (2.3). The weighted Euclidean distance shown in Eq. (2.6) is used to calculate the distance.

$$ WE\left( {X,Y} \right) = \frac{{2\sqrt {\sigma^{A} } }}{{\frac{1}{\left| X \right|} + \frac{1}{\left| Y \right|}}}E\left( {X,Y} \right), \quad \forall X,Y \in A $$
(2.6)

where \(WE(X,Y)\) is the weighted Euclidean distance between class X and class Y; \(\sigma^{A}\) is the weight of representative week A; \(\left| X \right|\) and \(\left| Y \right|\) are the number of elements in class X and class Y, respectively.

From Eq. (2.6), it can be seen that the distance calculation incorporates both the weight of the representative week and the number of elements in each class. This amplifies the distances between classes in high-weight representative weeks, so that operating points in those weeks are merged more cautiously. At the same time, the distance grows with the number of elements in the two classes, which prevents any single class from containing too many elements.

After the distance calculation between each class is completed, the two adjacent classes with the closest distance in time sequence are merged, and the centroid of each class is taken as the representative running point of that class. This process is repeated until the desired target number of representative running points is reached, completing the reconstruction of running points. The number of running points contained in each class is the weight of the representative running point for that class.
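
The reconstruction step can be sketched as a time-adjacency-constrained merge. The code below is an illustration under assumptions: the names `reconstruct` and `weighted_distance` are hypothetical, the toy points are one-dimensional, and the search is brute force over all adjacent pairs in all representative weeks.

```python
import math


def centroid(vectors):
    """Element-wise mean of the operating points in a class (as in Eq. 2.3)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def weighted_distance(x, y, week_weight):
    """Eq. (2.6), evaluated on the centroids of classes x and y."""
    e = math.dist(centroid(x), centroid(y))
    return 2.0 * math.sqrt(week_weight) / (1.0 / len(x) + 1.0 / len(y)) * e


def reconstruct(weeks, target):
    """weeks: list of (week_weight, [point, ...]) in chronological order.

    Returns, per week, a list of (representative_point, point_weight).
    """
    # Every operating point starts as its own class.
    classes = [[[p] for p in points] for _, points in weeks]
    total = sum(len(c) for c in classes)
    while total > target:
        best = None
        for w, (weight, _) in enumerate(weeks):
            # Only classes adjacent in time within the same week may merge.
            for k in range(len(classes[w]) - 1):
                d = weighted_distance(classes[w][k], classes[w][k + 1], weight)
                if best is None or d < best[0]:
                    best = (d, w, k)
        if best is None:
            break  # every week has already collapsed to a single class
        _, w, k = best
        classes[w][k:k + 2] = [classes[w][k] + classes[w][k + 1]]
        total -= 1
    return [[(centroid(c), len(c)) for c in week] for week in classes]


# Hypothetical toy data: two representative weeks with four points each.
weeks = [
    (4, [[0.10], [0.11], [0.50], [0.53]]),  # high-weight week
    (1, [[0.30], [0.34], [0.80], [0.90]]),  # low-weight week
]
result = reconstruct(weeks, target=5)
```

In this toy case the pair 0.30/0.34 in the low-weight week is merged before the pair 0.50/0.53 in the high-weight week, even though its raw gap is larger, because the factor \(\sqrt{\sigma^{A}}\) doubles the distances in the week with weight 4.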

2.3.4 Data Output

In the preprocessing stage of port data, the data in the running scenario is standardized, so the processing results need to be restored to their original form in order to represent the correct meaning. The method of data restoration corresponds to the min–max normalization method, as shown in Eq. (2.7).

$$ \alpha = \alpha_{nor} \left( {\max - \min } \right) + \min $$
(2.7)

where \(\alpha\) is the restored data. \(\alpha_{nor}\) is the standardized data obtained after data processing.

The high-fidelity compression and reconstruction of data are based on representative operating points, which are the basic unit of composition. Multiple adjacent representative operating points on different time sequences constitute a representative week. Each operating point corresponds not only to multiple types of input data but also to corresponding week weights and operating point weights. The typical output scenario is a two-dimensional data matrix. For example, when three nodes with four types of data are reduced to 672 representative operating points, the output matrix format is \(672 \times 14\). The 672 corresponds to different operating points, the first 12 columns correspond to the four types of data from three nodes \(3 \times 4 = 12\), the 13th column corresponds to the representative week weight, and the 14th column corresponds to the representative operating point weight.
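
The restoration of Eq. (2.7) and the assembly of the output matrix can be sketched as follows. The function names, the input layout, and the toy values are assumptions for illustration. With two data columns the rows have 4 entries; with the 12 data columns of the example in the text they would have 14.

```python
def restore(value, lo, hi):
    """Eq. (2.7): invert the min-max normalization for one value."""
    return value * (hi - lo) + lo


def build_output(weeks, bounds):
    """Assemble output rows: restored data columns, week weight, point weight.

    weeks:  list of (week_weight, [(normalized_point, point_weight), ...])
    bounds: per-column (min, max) pairs recorded during normalization
    """
    rows = []
    for week_weight, points in weeks:
        for point, point_weight in points:
            data = [restore(v, lo, hi) for v, (lo, hi) in zip(point, bounds)]
            rows.append(data + [week_weight, point_weight])
    return rows


# Hypothetical toy input: two data columns, two representative weeks.
bounds = [(10.0, 50.0), (0.0, 20.0)]
weeks = [
    (3, [([0.5, 0.25], 2), ([1.0, 0.0], 1)]),
    (1, [([0.0, 1.0], 4)]),
]
output = build_output(weeks, bounds)
```

Each row then carries everything the optimization model needs for one operating point: the data in original units, the weight of its representative week, and its own time scale.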

2.4 Optimal Configuration Model for Port’s Renewable Energies and Energy Storages

In order to show the advantages of the proposed high-fidelity compression and reconstruction methods for the microgrid's operating scenario, an optimization configuration model for source and storage in microgrid for ports is established. The model's decision variables include the installation capacity of each renewable energy source and energy storage device, and the objective function is to minimize the total system cost. The model parameters include representative week weights and operating point weights, which dynamically adjust the time scale of the operating points in the model.

2.4.1 Objective Function

The objective function includes three parts: investment cost (\(C^{inv}\)), operation cost (\(C^{ope}\)), and equipment residual value (\(C^{rem}\)). The investment cost includes the investment costs of each renewable energy source and each energy storage device. The operation cost includes the operating costs of each energy storage device and the system’s purchased electricity costs. The equipment residual value refers to the residual value that can be recovered when the equipment is retired or scrapped during the planning period.

$$ \begin{array}{*{20}c} {\min :C^{inv} + C^{ope} - C^{rem} } \\ \end{array} $$
(2.8)
$$ \begin{array}{*{20}c} {C^{inv} = \mathop \sum \limits_{n \in N} \mathop \sum \limits_{g \in G} A_{g}^{gen} \cdot C_{g}^{gen} \cdot \overline{P}_{gn}^{gen} + \mathop \sum \limits_{n \in N} \mathop \sum \limits_{s \in S} A_{s}^{sto} \cdot C_{s}^{sto} \cdot \overline{S}_{sn} } \\ \end{array} $$
(2.9)
$$ \begin{array}{*{20}c} {C^{ope} = \mathop \sum \limits_{{w \in {\Omega }}} \sigma_{w} \mathop \sum \limits_{n \in N} \mathop \sum \limits_{t \in w} \sigma_{t} \left( {\mathop \sum \limits_{s \in S} O_{s}^{sto} \cdot \overline{S}_{sn} + O_{wt}^{grid} \cdot P_{wnt}^{grid} } \right)} \\ \end{array} $$
(2.10)
$$ \begin{array}{*{20}c} {C^{rem} = \mathop \sum \limits_{n \in N} \mathop \sum \limits_{g \in G} B_{g}^{gen} \cdot R_{g}^{gen} \cdot \overline{P}_{gn}^{gen} + \mathop \sum \limits_{n \in N} \mathop \sum \limits_{s \in S} B_{s}^{sto} \cdot R_{s}^{sto} \cdot \overline{S}_{sn} } \\ \end{array} $$
(2.11)

where \(N\) is the set of system nodes; \(G\) is the set of renewable energy types; \(S\) is the set of energy storage devices; \({\Omega }\) is the set of representative weeks; \(A_{g}^{gen}\) and \(A_{s}^{sto}\) are the fund recovery rates of renewable energy \(g\) and energy storage equipment \(s\) within the planning cycle, respectively; \(C_{g}^{gen}\) is the power cost coefficient of renewable energy \(g\); \(\overline{P}_{gn}^{gen}\) is the rated power of renewable energy \(g\) at the node \(n\); \(C_{s}^{sto}\) is the capacity cost coefficient of the energy storage equipment \(s\); \(\overline{S}_{sn}\) is the installed capacity of the energy storage device \(s\) at the node \(n\); \(\sigma_{w}\) and \(\sigma_{t}\) are the weights of representative week \(w\) and operating point \(t\), respectively; \(O_{s}^{sto}\) is the operating cost per unit capacity of the energy storage equipment \(s\); \(O_{wt}^{grid}\) is the electricity price at the representative week \(w\) and operating point \(t\); \(P_{wnt}^{grid}\) is the active power transmitted from the grid to the node \(n\) at the representative week \(w\) and operating point \(t\); \(B_{g}^{gen}\) and \(B_{s}^{sto}\) are the residual value discount rates of renewable energy \(g\) and energy storage equipment \(s\) within the planning cycle, respectively; \(R_{g}^{gen}\) is the equipment residual value coefficient of renewable energy \(g\); \(R_{s}^{sto}\) is the equipment residual value coefficient of the energy storage equipment \(s\).

2.4.2 Constraints

(1) Hybrid Energy Storage Constraints

The port microgrid system comprises various energy storage devices, and this model assumes a fixed ratio of capacity to power for each type of energy storage device. The time scale of operating points in the high-fidelity compression and reconstruction results of the port microgrid operating scenario depends on their weights. Therefore, when calculating the energy level of the energy storage device, this model takes into account the corresponding weights and matches different time scales accordingly.

$$ \begin{array}{*{20}c} {0 \le P_{wsnt}^{ + } \le \frac{{\overline{S}_{sn} }}{{\phi_{s} }}} \\ \end{array} $$
(2.12)
$$ \begin{array}{*{20}c} {0 \le P_{wsnt}^{ - } \le \frac{{\overline{S}_{sn} }}{{\phi_{s} }}} \\ \end{array} $$
(2.13)
$$ \begin{array}{*{20}c} {E_{wsnt} = E_{wsnt - 1} + \sigma_{t} \left( {\eta_{s}^{ + } P_{wsnt}^{ + } - \frac{{P_{wsnt}^{ - } }}{{\eta_{s}^{ - } }}} \right)} \\ \end{array} $$
(2.14)
$$ \alpha_{s}^{\min } \overline{S}_{sn} \le E_{wsnt} \le \alpha_{s}^{\max } \overline{S}_{sn} $$
(2.15)
$$ \begin{array}{*{20}c} {E_{wsnt_{0} } = E_{wsnt_{\max } } = \frac{1}{2}\overline{S}_{sn} } \\ \end{array} $$
(2.16)

where \(P_{wsnt}^{ + }\) and \(P_{wsnt}^{ - }\) are respectively the charging and discharging power of the energy storage device; \(\phi_{s}\) is the full power charging duration of the energy storage device; \(E_{wsnt}\) is the energy level of the energy storage device; \(\eta_{s}^{ + }\) and \(\eta_{s}^{ - }\) are respectively the charging and discharging efficiency of the energy storage device; \(\alpha_{s}^{min}\) and \(\alpha_{s}^{max}\) are the minimum and maximum charge levels of the energy storage device, respectively.
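
To show how the operating point weight \(\sigma_{t}\) acts as a variable time step in Eq. (2.14), the sketch below propagates the energy level of one storage device over three operating points and checks it against the limits of Eqs. (2.12), (2.13), and (2.15). All function names and numerical values are hypothetical.

```python
def energy_trajectory(e0, steps, eta_charge, eta_discharge):
    """Apply Eq. (2.14) step by step.

    steps: list of (sigma_t, p_charge, p_discharge), where sigma_t is the
    number of hours represented by the operating point.
    """
    levels = [e0]
    for sigma_t, p_plus, p_minus in steps:
        delta = sigma_t * (eta_charge * p_plus - p_minus / eta_discharge)
        levels.append(levels[-1] + delta)
    return levels


def power_within_limit(steps, capacity, phi):
    """Eqs. (2.12)-(2.13): power is bounded by capacity / charging duration."""
    limit = capacity / phi
    return all(0 <= p_plus <= limit and 0 <= p_minus <= limit
               for _, p_plus, p_minus in steps)


def energy_within_limit(levels, capacity, alpha_min, alpha_max):
    """Eq. (2.15): energy level stays within the allowed charge range."""
    return all(alpha_min * capacity <= e <= alpha_max * capacity for e in levels)


# Hypothetical device: 100 kWh capacity, 4 h full-power duration, 80 % efficiencies.
capacity = 100.0
steps = [
    (1, 10.0, 0.0),    # 1 h of charging at 10 kW
    (3, 0.0, 4.0),     # one 3 h operating point discharging at 4 kW
    (2, 4.375, 0.0),   # one 2 h operating point charging at 4.375 kW
]
levels = energy_trajectory(0.5 * capacity, steps, eta_charge=0.8, eta_discharge=0.8)
feasible = (power_within_limit(steps, capacity, phi=4.0)
            and energy_within_limit(levels, capacity, alpha_min=0.1, alpha_max=0.9))
```

The trajectory starts and ends at half of the installed capacity, which is the boundary condition imposed by Eq. (2.16). The 3 h operating point changes the energy level three times as much as a 1 h point at the same power would.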

(2) Renewable Energy Generation Constraints

Given the significant trend towards increasing penetration of renewable energy generation under the “Dual Carbon” goal, this model imposes a minimum share for renewable energy generation. The calculation of renewable energy generation takes into account the dual effects of representative weekly weights and operational point weights.

$$ \begin{aligned} \kappa & \cdot \left[ {\mathop \sum \limits_{w \in \Omega } \sigma_{w} \mathop \sum \limits_{n \in N} \mathop \sum \limits_{t \in w} \sigma_{t} \left( {P_{wnt}^{{{\text{grid}}}} + \mathop \sum \limits_{g \in G} P_{wgnt}^{{{\text{gen}}}} } \right)} \right] \\ & \quad \le \begin{array}{*{20}c} {\mathop \sum \limits_{w \in \Omega } \sigma_{w} \mathop \sum \limits_{n \in N} \mathop \sum \limits_{t \in w} \sigma_{t} \mathop \sum \limits_{g \in G} P_{wgnt}^{{{\text{gen}}}} } \\ \end{array} \\ \end{aligned} $$
(2.17)

where \(\kappa\) is the proportion of renewable energy power generation to the total power supply of the system; \(P_{wgnt}^{{{\text{gen}}}}\) refers to active power generation from renewable energy.
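
The double weighting in Eq. (2.17) can be checked numerically with a short sketch. The single-node setting, the function name `renewable_share`, and the data are assumptions for illustration only.

```python
def renewable_share(weeks):
    """Weighted renewable share of total supply for a single node.

    weeks: list of (sigma_w, [(sigma_t, grid_power, renewable_power), ...])
    """
    renewable = 0.0
    total = 0.0
    for sigma_w, points in weeks:
        for sigma_t, p_grid, p_gen in points:
            renewable += sigma_w * sigma_t * p_gen
            total += sigma_w * sigma_t * (p_grid + p_gen)
    return renewable / total


# Hypothetical data: two representative weeks with weights 3 and 1.
weeks = [
    (3, [(2, 4.0, 6.0), (1, 10.0, 0.0)]),
    (1, [(4, 1.0, 9.0)]),
]
share = renewable_share(weeks)
kappa = 0.5                          # assumed minimum renewable share
constraint_satisfied = kappa <= share
```

Here the weighted renewable energy is 72 and the weighted total supply is 130, so the share is about 0.554 and the constraint holds for \(\kappa = 0.5\).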

(3) Other Constraints

The model's remaining constraints include branch flow constraints, node injection power balance constraints, and system operating constraints, as shown in Appendix 1. Among them, the branch flow constraint contains nonlinear terms. To facilitate solving, the nonlinear model is transformed into a second-order cone programming model, as shown in Appendix 2. Literature [14, 15] has proven that the second-order cone transformation is exact for radial distribution networks when the objective function is convex and strictly increasing. The transformed source and storage configuration model of the port microgrid can be solved quickly using the commercial solver Gurobi.

2.5 Case Studies

2.5.1 Case Description

This chapter uses the topology and historical data of the Shijiu Port microgrid in Rizhao Port to verify the accuracy and effectiveness of the proposed method. The microgrid at this port is planned to include wind power and photovoltaic power, and is connected to the upstream power grid via a point of common coupling (PCC). To increase the proportion of clean energy in the total energy supply, wind and photovoltaic generation is required to account for more than half of the total system power supply. Additionally, to improve the overall performance of the energy storage system, including longer cycle life and excellent discharge performance, the microgrid contains two types of energy storage devices: lithium-ion batteries (S1) and lead-acid batteries (S2). The topology and location of the installed equipment in the port microgrid are shown in Fig. 2.4. The planning period for the case study is four years. The time-of-use electricity price is based on the general industrial and commercial electricity price announced by the National Development and Reform Commission. The parameters for renewable energy and energy storage are shown in Table 2.1.

Fig. 2.4 The topology of Rizhao Port microgrid system

Table 2.1 Device parameters

This chapter employs a Vector Autoregression (VAR) model to generate future data on wind speed, irradiance, and load for a microgrid operating in a port based on 6 years of historical data from the Rizhao Port. To simplify the calculation process, the wind speed and irradiance are converted into generating equipment capacity factors during the data extraction and integration stage.

The case study also includes the results of typical day scenario generation method (Method 1) and typical week scenario generation method (Method 2) to compare and analyze the effectiveness of the high-fidelity compression and reconstruction method for the microgrid operating scenarios in the port proposed in this chapter (Method 3). The three methods generate typical scenarios with the same data compression rate based on the operating scenarios. Based on the elbow rule, the compression rate of the data is determined to be 1.92% (672 ÷ 4 ÷ 8760 ≈ 1.92%) by finding the “elbow” point according to the degree of data distortion, which reduces 4 years of data to 672 operating points. Method 1 generates 28 (28 × 24 = 672) typical operating days; Method 2 generates 4 (4 × 7 × 24 = 672) typical operating weeks; Method 3 generates 16 representative weeks during the high-fidelity compression stage based on the operating weeks, and further reduces the 16 representative weeks to 672 operating points during the data reconstruction stage based on the operating points.
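
The operating point counts quoted above can be verified with a few lines of arithmetic. The variable names are chosen for this sketch only.

```python
HOURS_PER_YEAR = 8760
HOURS_PER_WEEK = 7 * 24

original_points = 4 * HOURS_PER_YEAR        # four years of hourly data
target_points = 672

compression_rate = target_points / original_points

method1_points = 28 * 24                    # 28 typical days
method2_points = 4 * HOURS_PER_WEEK         # 4 typical weeks
method3_stage1_points = 16 * HOURS_PER_WEEK # 16 representative weeks, hourly

# Average number of hours represented by one operating point in Method 3.
method3_mean_hours_per_point = method3_stage1_points / target_points
```

All three methods end with 672 operating points. In Method 3 the 16 representative weeks contain 2688 hourly points before reconstruction, so each final operating point represents 4 h on average.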

2.5.2 Analysis of Typical Scenario Results

The representative scenarios generated from the operating scenarios directly affect the optimization results of microgrid configuration in green ports. Load is the most fundamental parameter in typical scenarios, and the ability of each data processing method to retain the load distribution is measured by the load duration curve. The load duration curve is not arranged in chronological order. Instead, the load values within a statistical period are rearranged according to the accumulated duration of each load value within the period. When the level of renewable energy generation is high, the system may need to charge the energy storage devices and curtail wind and solar power; when the level of generation is low, the load may be supplied by grid purchases and energy storage. Therefore, the distribution of the renewable energy generation capacity factor is also important. Analogous to the load duration curve, a wind power duration curve and a solar power duration curve are established. The comparison of the duration curves for the initial operating scenarios and the typical scenarios obtained by different methods is shown in Fig. 2.5.
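
A duration curve can be built by sorting the values in descending order. For a compressed scenario, the sketch below repeats each operating point according to an integer weight before sorting. This weighting is an assumption of the example, since the chapter does not state how weights enter the duration curves; the function name and data are hypothetical.

```python
def duration_curve(values, weights=None):
    """Return the values sorted in descending order.

    weights: optional integer number of hours each value stands for.
    Expanding by weight is an assumption of this sketch.
    """
    if weights is None:
        weights = [1] * len(values)
    expanded = []
    for value, weight in zip(values, weights):
        expanded.extend([value] * weight)
    return sorted(expanded, reverse=True)


# Hypothetical compressed load data: three operating points covering 6 h.
load = [20.0, 50.0, 30.0]
hours = [2, 1, 3]
curve = duration_curve(load, hours)
```

The resulting list has one entry per hour, so it can be compared point by point with the duration curve of hourly original data of the same length.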

Fig. 2.5 The comparison of duration curves

The figure shows that all three methods fit the load duration curve well. For the wind power and solar power duration curves, however, Methods 1 and 3 fit significantly better than Method 2. To quantify the fit, a fitting accuracy index, the range-normalized root-mean-square deviation (RMSD), is defined in Eq. (2.18).

$$ \begin{array}{*{20}c} {RMSD\left( {X,Y} \right) = \frac{{\sqrt {\frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} (X_{i} - Y_{i} )^{2} } }}{\max (Y) - \min (Y)}} \\ \end{array} $$
(2.18)

where \(X\) is the duration curve of a typical scenario, \(Y\) is the duration curve of the original data, and \(n\) is the number of points at which the two curves are compared. The smaller the RMSD, the closer the two curves are.
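Eq. (2.18) can be sketched in a few lines of Python. The example assumes that both duration curves have already been sampled at the same \(n\) points; the function name and the sample curves are illustrative only.

```python
import math


def rmsd(x, y):
    """Range-normalized root-mean-square deviation of Eq. (2.18).

    x: duration curve of a typical scenario
    y: duration curve of the original data (its range is the normalizer)
    """
    if len(x) != len(y) or not x:
        raise ValueError("curves must be non-empty and of equal length")
    span = max(y) - min(y)
    if span == 0:
        raise ValueError("reference curve is constant; RMSD is undefined")
    mean_square = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)
    return math.sqrt(mean_square) / span


# Hypothetical duration curves (not data from the case study).
original = [5.0, 4.0, 3.0, 2.0, 1.0]
typical = [4.8, 4.1, 3.0, 2.2, 1.0]

print(f"RMSD = {rmsd(typical, original):.4f}")
```

Dividing by the range of the original curve makes the index dimensionless, so the values for load, wind power, and solar power can be compared in one table.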

Table 2.2 compares the duration curve fits of the three methods. Method 1 provides the closest fit to the load and also fits the wind and photovoltaic power duration curves well. Method 2 exhibits significant errors in fitting the wind power duration curve, indicating that the four typical operating weeks it selects do not fully represent the wind power generation scenarios. Taking the load, wind, and photovoltaic duration curves together, Method 3 yields the highest fitting accuracy.

Table 2.2 The RMSD of duration curves

Another measure of the accuracy of the typical scenarios is the correlation between different types of data. The correlation between load and the renewable energy capacity factor reflects the degree of synchronization between demand and supply, which determines the net load of the port microgrid. The correlation between the capacity factors of different renewable sources reflects the degree of complementarity between the two generation technologies. The correlations of electricity price with load and with the renewable energy capacity factors directly affect the system's electricity purchasing cost. The Pearson correlation coefficient (PCC) is used to measure these correlations, as defined in Eq. (2.19).

$$ \begin{array}{*{20}c} {PCC\left( {X,Y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {X_{i} - \overline{X}} \right) \cdot \left( {Y_{i} - \overline{Y}} \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} (X_{i} - \overline{X})^{2} } \cdot \sqrt {\mathop \sum \nolimits_{i = 1}^{n} (Y_{i} - \overline{Y})^{2} } }}} \\ \end{array} $$
(2.19)

where \(\overline{X}\) and \(\overline{Y}\) are the average values of data X and Y. The calculated result of \(PCC\) is between −1 and 1. “1” represents a complete positive correlation between the two types of data, “−1” represents a complete negative correlation, and “0” represents no correlation.
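A minimal sketch of Eq. (2.19) follows. The function name and the sample series are hypothetical, and the two series are assumed to be aligned in time and of equal length.

```python
import math


def pcc(x, y):
    """Pearson correlation coefficient of Eq. (2.19)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance_sum = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance_sum / (spread_x * spread_y)


# Hypothetical aligned series (not data from the case study).
load = [1.0, 2.0, 3.0, 4.0]
pv_factor = [2.0, 4.0, 6.0, 8.0]      # rises with the load
wind_factor = [4.0, 3.0, 2.0, 1.0]    # falls as the load rises

print(f"load vs. PV:   {pcc(load, pv_factor):+.2f}")
print(f"load vs. wind: {pcc(load, wind_factor):+.2f}")
```

Applying the same function to the original data and to a set of typical scenarios gives the two coefficients whose difference indicates how well a method preserves a given correlation.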

Fig. 2.6 shows the correlation coefficients between different data types in the initial scenarios and in the typical scenarios obtained by the different methods. Table 2.3 lists the errors between the correlations in the typical scenarios and those in the original data. Except for its poor preservation of the “wind power-photovoltaics” and “wind power-electricity price” correlations, Method 1 yields correlation coefficients that are generally close to those of the original data. Method 2 yields relatively poor correlation coefficients, indicating that this method cannot preserve the data characteristics of the initial scenarios well when the typical scenarios contain only 672 operating points. Among the three methods, Method 3 preserves the data correlations best: apart from the “load-wind power” and “photovoltaics-electricity price” correlations, which it preserves less well than Method 1, its correlations are significantly closer to the original data than those of the other methods.

Fig. 2.6 Comparison of correlation coefficients

Table 2.3 Correlation coefficient error between raw data and processed data

2.5.3 Analysis of Optimization Configuration Results of Renewable Energy and Energy Storage

Firstly, the optimization configuration was performed using original data spanning 4 years as the baseline to evaluate the accuracy of other methods. Then, the optimized configuration model was solved using the typical scenarios obtained from the three methods as input to obtain the optimized configuration results of the port microgrid source and storage.

Table 2.4 shows the results obtained by each method and their errors relative to the baseline. Among the three methods, Method 1 yields the wind power capacity closest to the baseline, and its error in total cost is also relatively small. However, the storage capacity it configures is too large; in particular, the lead-acid battery capacity is much larger than the baseline. This is because the typical scenarios obtained by Method 1 fit the photovoltaic data poorly, which leads to a small photovoltaic capacity in the configuration result and requires more storage equipment to supply peak loads. The larger storage capacity also enables Method 1 to exploit time-of-use electricity pricing more fully, so its purchased electricity cost is significantly lower than in the baseline case. In addition, Method 1 uses the typical day as the basic unit of scenario construction, so the storage equipment operates on a daily cycle and cannot easily balance energy over longer time intervals, which further increases the difference between the storage configuration results and the baseline.

Table 2.4 Optimal sizing results of port microgrid

The overall optimization configuration results of Method 2 are the worst. Although the total cost and purchased electricity cost have some reference value, the configured capacities of the generation and storage equipment deviate substantially from the baseline. The typical scenarios constructed by this method characterize the solar and wind conditions poorly, so the wind power capacity is overestimated and the photovoltaic capacity is underestimated. Moreover, wind power has no obvious intraday pattern, and its output fluctuates over longer cycles. As a result, compared with the baseline case, Method 2 overestimates the capacity of the lead-acid battery, which has a large capacity-to-power ratio, and underestimates the capacity of the lithium battery, which has a small capacity-to-power ratio.

The optimized configuration results obtained by Method 3 are the closest to the baseline, with a total cost error of only 0.60%. Its largest capacity error, 24.31%, occurs for the lead-acid battery, but even this result is significantly better than those of the other two methods. This is because Method 3 dynamically adjusts the time scale of the operating points and thus preserves the system operating characteristics more accurately with the same amount of data, which makes the wind and photovoltaic configuration results more accurate and benefits the storage capacity configuration. In addition, the operating cycle of this method is one week, which lets the storage act over a longer operating cycle than in Method 1 and makes the storage capacity configuration more accurate. The method proposed in this chapter can therefore be applied effectively to the optimization configuration of renewable energy and hybrid storage in port microgrids.

2.5.4 Calculation Time Analysis

The purpose of using typical scenarios for optimization configuration is to reduce the difficulty of solving the problem and to speed up the model solution. The numerical results were obtained on a computer with 8 cores, a clock frequency of 3.6 GHz, and 64 GB of memory. Fig. 2.7 compares the total computation time for optimizing source and storage with the three methods, including both data processing time and model solving time. Method 3 has a longer data processing time than Methods 1 and 2, but once the model solving time is included, the total time for all three methods remains at the same level. This indicates that the method proposed in this chapter improves the accuracy of the solution without making the problem harder to solve. By contrast, when the initial operating scenarios were used directly for optimization configuration, solving the model took 25 h, which demonstrates how much typical scenarios accelerate the solution.

Fig. 2.7 Comparison of calculation time

2.6 Conclusion

The operational status of a port microgrid is closely related to the production and berthing of ships. Describing the system's operating characteristics directly through natural periods such as days and weeks would generate redundant information. This chapter proposes a high-fidelity compression and reconstruction method for port microgrid operating scenarios, which effectively increases the data density of the system's typical operating scenarios. The method combines weeks with similar operating characteristics using the Ward minimum variance method, and then proposes a hierarchical clustering method that dynamically adjusts the time scale of each operating point in the representative week to further reduce redundant information. With the high-fidelity compression and reconstruction method, an optimization model for the source and storage configuration of the port microgrid is established. An example analysis is conducted using Rizhao Port to demonstrate that the proposed method better preserves the fluctuation characteristics of the port's source and load and the correlation between different types of data. In addition, the proposed method provides more accurate source and storage optimization configuration results without increasing the computational resources required for problem-solving, with a total cost error of only 0.60%, which is better than existing methods.