Synthetic weather data generation for simulations
This paper uses a copula-based approach, described in Niemi and Sill Torres (2021), to generate synthetic weather data for simulation. That paper listed some features that were not yet fully developed, so this paper provides an update to the approach. This section introduces the approach in Sect. 2.1.1 and presents the update in Sect. 2.1.2. The earlier paper (Niemi and Sill Torres 2021) noted that the approach lacked means to deal with negative values and seasonality; Sects. 2.1.3 and 2.1.4 describe solutions to these issues.
Description of the copula approach
Copulas are mechanisms that isolate the dependency structure of a multivariate distribution (Nelsen 2006). A multivariate distribution can therefore be constructed by separately specifying the marginal distribution of each dimension and the copula. According to Sklar's theorem, for an n-dimensional cumulative distribution function F with marginals \(F_1,\dots,F_n\), there exists a copula C such that
$$F\left(x_1,\dots,x_n\right)=C\left(F_1(x_1),\dots,F_n\left(x_n\right)\right)$$
(1)
for all \(x_i \in [-\infty, \infty]\), \(i = 1,\dots,n\). A Gaussian copula is the copula C of a multivariate normal distribution. A particular reason for using a copula here is to generate random variables from the multivariate distribution. In this case, the variables are wind speed and wave height, which must have a realistic correlation. As described in Niemi and Sill Torres (2021), our approach uses the GaussianMultivariate class from the Copulas Python library (MIT 2018). This class implements a multivariate distribution by using a Gaussian copula to combine marginal probabilities estimated with univariate distributions, and it automatically chooses the best-fitting univariate marginal distribution for each dimension. The Copulas library contains the following univariate distributions: Beta, Gamma, Gaussian, log-Laplace, Student's t, and truncated Gaussian distributions, as well as a kernel density estimate that uses a Gaussian kernel.
The approach generates time series in which the next wind speed \(s_{n+1}\) and wave height \(h_{n+1}\) are conditional on the current wind speed \(s_n\) and wave height \(h_n\). This is achieved by forming a four-dimensional copula whose dimensions are the wind speed \(s_n\), the wave height \(h_n\), the wind speed change Δs, and the wave height change Δh. As the current values are known, the problem is to generate values Δs and Δh that are conditional on \(s_n\) and \(h_n\). The next values are then calculated with the following equations:
$$s_{n+1}=s_n+\Delta s,$$
(2)
and
$$h_{n+1}=h_n+\Delta h.$$
(3)
Creating the conditional values Δs and Δh relies on the copula being Gaussian and is achieved by transforming the multivariate Gaussian distribution into a conditional one. The normalized variables are ordered as \(x_1\) for Δs, \(x_2\) for Δh, \(x_3\) for \(s_n\), and \(x_4\) for \(h_n\). The mean vector of this four-dimensional distribution is
$$\overline{\mu}=\begin{bmatrix}\mu_1&\mu_2&\mu_3&\mu_4\end{bmatrix}^{T},$$
(4)
and the covariances are
$$\overline\Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}&\Sigma_{13}&\Sigma_{14}\\\Sigma_{21}&\Sigma_{22}&\Sigma_{23}&\Sigma_{24}\\\Sigma_{31}&\Sigma_{32}&\Sigma_{33}&\Sigma_{34}\\\Sigma_{41}&\Sigma_{42}&\Sigma_{43}&\Sigma_{44}\end{bmatrix}.$$
(5)
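The parameters in Eqs. (4) and (5) are estimated from normalized data. The following is a minimal sketch of one way to obtain them. It assumes empirical (rank-based) marginals in place of the parametric or KDE marginals that the Copulas library selects. The helper names normal_scores and fit_gaussian_copula are hypothetical, and the toy series are illustrative, not ERA5 data.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def normal_scores(column):
    """Map a sample to standard-normal space through its empirical CDF.

    Ranks are divided by (n + 1) so every percent point lies strictly
    inside (0, 1), where the inverse normal CDF is finite.
    """
    column = np.asarray(column, dtype=float)
    ranks = np.argsort(np.argsort(column)) + 1  # 1..n; ties broken by position
    u = ranks / (len(column) + 1)
    return np.array([_STD_NORMAL.inv_cdf(p) for p in u])


def fit_gaussian_copula(s, h):
    """Estimate the mean vector (Eq. 4) and covariance (Eq. 5) of the normal scores.

    Dimension order: x1 = ds, x2 = dh, x3 = s_n, x4 = h_n.
    """
    s = np.asarray(s, dtype=float)
    h = np.asarray(h, dtype=float)
    ds = np.diff(s)  # s_{n+1} - s_n
    dh = np.diff(h)  # h_{n+1} - h_n
    data = np.column_stack([ds, dh, s[:-1], h[:-1]])
    z = np.column_stack([normal_scores(col) for col in data.T])
    return z.mean(axis=0), np.cov(z, rowvar=False)


# Toy hourly series standing in for ERA5 data (not real measurements).
rng = np.random.default_rng(0)
s = np.abs(8.0 + np.cumsum(rng.normal(0.0, 0.5, 500)))  # wind speed, m/s
h = np.abs(0.2 * s + rng.normal(0.0, 0.1, 500))         # wave height, m
mu, sigma = fit_gaussian_copula(s, h)
```

With rank-based normal scores, the estimated means are close to zero and the variances are close to one. The off-diagonal entries of sigma then carry the dependence between the changes and the current state.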
Let us denote
$$\overline{\mathrm C}=\begin{bmatrix}\Sigma_{13}&\Sigma_{14}\\\Sigma_{23}&\Sigma_{24}\end{bmatrix}\begin{bmatrix}\Sigma_{33}&\Sigma_{34}\\\Sigma_{43}&\Sigma_{44}\end{bmatrix}^{-1}.$$
(6)
Conditioning on \(x_3\) (the normalized \(s_n\)) and \(x_4\) (the normalized \(h_n\)) yields a new distribution of \(x_1\) and \(x_2\) whose mean and covariance are
$$\overline\mu^\ast=\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}+\overline{\mathrm C}\left(\begin{bmatrix}x_3\\x_4\end{bmatrix}-\begin{bmatrix}\mu_3\\\mu_4\end{bmatrix}\right),$$
(7)
and
$$\overline\Sigma^\ast=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix}-\overline{\mathrm C}\begin{bmatrix}\Sigma_{31}&\Sigma_{32}\\\Sigma_{41}&\Sigma_{42}\end{bmatrix}.$$
(8)
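Eqs. (6) to (8) are the standard conditioning formulas for a partitioned Gaussian. Below is a minimal NumPy sketch of them under the variable ordering used here. The function name conditional_gaussian and the example covariance values are illustrative assumptions, not fitted results.

```python
import numpy as np


def conditional_gaussian(mu, sigma, x_cond):
    """Condition a 4-D Gaussian on its last two components, Eqs. (6)-(8).

    mu:     mean vector [mu1, mu2, mu3, mu4] in normalized space
    sigma:  4x4 covariance matrix
    x_cond: observed (x3, x4), i.e. the normalized s_n and h_n
    Returns the conditional mean and covariance of (x1, x2).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s_aa = sigma[:2, :2]  # [[S11, S12], [S21, S22]]
    s_ab = sigma[:2, 2:]  # [[S13, S14], [S23, S24]]
    s_bb = sigma[2:, 2:]  # [[S33, S34], [S43, S44]]
    s_ba = sigma[2:, :2]  # [[S31, S32], [S41, S42]]
    # Eq. (6): C = S_ab @ inv(S_bb), computed with solve() instead of an explicit inverse.
    c = np.linalg.solve(s_bb.T, s_ab.T).T
    mu_star = mu[:2] + c @ (np.asarray(x_cond, dtype=float) - mu[2:])  # Eq. (7)
    sigma_star = s_aa - c @ s_ba                                        # Eq. (8)
    return mu_star, sigma_star


# Illustrative covariance (made-up numbers, not fitted to ERA5).
sigma = np.array([
    [1.0, 0.3, -0.2, -0.1],
    [0.3, 1.0, -0.1, -0.3],
    [-0.2, -0.1, 1.0, 0.5],
    [-0.1, -0.3, 0.5, 1.0],
])
mu = np.zeros(4)
mu_star, sigma_star = conditional_gaussian(mu, sigma, x_cond=[1.5, 0.5])

# Draw one conditional pair (x1, x2).
rng = np.random.default_rng(1)
x1, x2 = rng.multivariate_normal(mu_star, sigma_star)
```

You can check the result independently. The inverse of sigma_star should equal the upper-left 2×2 block of the inverse of the full covariance matrix.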
A new random pair \(x_1\), \(x_2\) is drawn from this conditional distribution. The new values Δs and Δh are then obtained by finding the percent points of \(x_1\) and \(x_2\) in the normal space and taking the values at the same percent points from the univariate marginal distributions of Δs and Δh. The concept is shown in Fig. 1.
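The percent-point mapping and the update of Eqs. (2) and (3) can be sketched as follows. The sketch makes two assumptions. First, the copula margins are standard normal. Second, each fitted marginal is approximated by the empirical quantile function of the observed changes (numpy.quantile) rather than the distribution the Copulas library would choose. The names to_marginal and next_values are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def to_marginal(x, observed):
    """Return the value at the same percent point as x in a marginal distribution.

    Assumes x comes from a standard normal copula margin. The fitted marginal
    is approximated by the empirical quantile function of `observed`.
    """
    u = NormalDist().cdf(x)  # percent point of x under N(0, 1)
    return float(np.quantile(observed, u))


def next_values(s_n, h_n, x_pair, ds_observed, dh_observed):
    """Turn a conditional draw (x1, x2) into s_{n+1} and h_{n+1}, Eqs. (2)-(3)."""
    ds = to_marginal(x_pair[0], ds_observed)
    dh = to_marginal(x_pair[1], dh_observed)
    return s_n + ds, h_n + dh


# Toy observed changes (illustrative only).
ds_observed = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # m/s per hour
dh_observed = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])  # m per hour
s_next, h_next = next_values(8.0, 1.2, (0.0, 0.0), ds_observed, dh_observed)
# x = 0 is the 50 % point, so both changes equal the medians (0.0):
# s_next == 8.0, h_next == 1.2
```

Because the mapping is monotone, the rank order of the Gaussian draw is preserved. The marginal only reshapes the values.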
Update to the approach
Section 2.1.3 describes a solution to the issue that the approach can generate negative values, and Sect. 2.1.4 introduces seasonality into the approach by using copulas fitted with season-specific data. The solutions are validated with the ERA5 weather data set, which consists of hourly estimates of various weather variables and is provided by the European Centre for Medium-Range Weather Forecasts (Hersbach 2018). The weather data cover the years 2019 to 2020 and come from the North Sea area, which is of high interest to the German offshore wind energy industry. The location is near Helgoland, at about the same distance from the shore as most German offshore wind farms. The exact coordinates are shown in Fig. 2, together with the locations of the German offshore wind farms. We use the 10 m u and v wind components to calculate the wind speed s with the following equation:
$$s= \sqrt{({u}^{2}+{v}^{2})}.$$
(9)
The wave height h is the significant height of combined wind waves and swell, also provided by Hersbach (2018).
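Eq. (9) applied element-wise to arrays of wind components could look like the following minimal sketch. The function and parameter names are illustrative.

```python
import numpy as np


def wind_speed(u10, v10):
    """Eq. (9): wind speed s from the 10 m u (eastward) and v (northward) components."""
    # np.hypot computes sqrt(u**2 + v**2) element-wise without intermediate overflow.
    return np.hypot(u10, v10)


s = wind_speed(np.array([3.0, -6.0]), np.array([4.0, 8.0]))
# s -> array([ 5., 10.])
```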
Solving an issue with negative values
A problem arises when the current wind speed or wave height is small and the approach generates a large negative change. In this case, Eq. (2) or (3) produces a negative wind speed or wave height, which is not physically realistic. Niemi and Sill Torres (2021) proposed two optional ways to deal with negative values when calculating the next value \(v_{n+1}\). The first takes the absolute value:
$$v_{n+1}=\left|v_n+\Delta v\right|,$$
(10)
or forcing the negative values to be zero:
$$v_{n+1}=\left\{\begin{array}{cc}v_n+\Delta v&\mathrm{if}\;v_n+\Delta v>0\\0&\mathrm{otherwise}\end{array}\right..$$
(11)
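The three update rules (unconstrained, Eq. (10), and Eq. (11)) differ only in how a negative result is treated. The minimal sketch below compares them on a few illustrative values. The function names are hypothetical.

```python
import numpy as np


def step_unbounded(v_n, dv):
    """Original update, Eqs. (2)/(3): the result may be negative."""
    return v_n + dv


def step_absolute(v_n, dv):
    """Eq. (10): reflect a negative result to its absolute value."""
    return np.abs(v_n + dv)


def step_clip(v_n, dv):
    """Eq. (11): replace a non-positive result with zero."""
    return np.where(v_n + dv > 0, v_n + dv, 0.0)


v_n = np.array([0.5, 0.5, 3.0])
dv = np.array([-1.5, -0.2, 1.0])
unbounded = step_unbounded(v_n, dv)  # [-1.0, 0.3, 4.0]
absolute = step_absolute(v_n, dv)    # [ 1.0, 0.3, 4.0]
clipped = step_clip(v_n, dv)         # [ 0.0, 0.3, 4.0]
```

In the first element, clipping moves the value −1.0 by 1.0, while reflection moves it by 2.0. This is the distance argument used when interpreting Fig. 3.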
The two options can be compared by generating synthetic data with each equation and comparing the resulting series with the real data. Figure 3 shows the pointwise error between the real cumulative wind speed distribution and those calculated from the synthetic data.
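A comparison of this kind can be sketched with empirical CDFs evaluated on a common grid. The sketch assumes that the "cumulative error" is the sum of absolute pointwise differences over that grid, which is our interpretation. The grid, the helper names, and the sample values are illustrative.

```python
import numpy as np


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    sorted_sample = np.sort(sample)
    return np.searchsorted(sorted_sample, grid, side="right") / len(sorted_sample)


def cdf_errors(real, synthetic, grid):
    """Pointwise CDF difference and its summed absolute value."""
    diff = ecdf(synthetic, grid) - ecdf(real, grid)
    return diff, float(np.abs(diff).sum())


real = np.array([1.0, 2.0, 3.0, 4.0])
synthetic = np.array([1.0, 2.0, 2.0, 5.0])
grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
diff, total = cdf_errors(real, synthetic, grid)
# diff -> [0.0, 0.0, 0.25, 0.0, -0.25, 0.0], total -> 0.5
```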
Surprisingly, Fig. 3 shows that the cumulative error is lowest with the original approach, which allows negative values. Forcing negative values to zero leads to a smaller error than taking absolute values. This result seems reasonable: clipping moves a negative result \(v < 0\) to zero, a shift of \(|v|\), whereas taking the absolute value moves it to \(|v|\), a shift of \(2|v|\).
Introducing seasonality
In the literature, seasonality of weather has been introduced by fitting a copula with data from a specified time interval. Leontaris et al. (2016) used an individual copula for each month, and Jäger and Morales Nápoles (2017) considered only winter. Here we use one copula for the winter period and another for summer, and compare the results with data produced by a single copula. As in Jäger and Morales Nápoles (2017), we assume the oceanographic winter period in the North Sea to run from 1 November to 15 April.
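Splitting hourly records into the winter and summer sets used for the two copulas can be sketched as follows. Treating both boundary dates as inclusive is our assumption, and the helper names are hypothetical.

```python
from datetime import datetime


def is_winter(t):
    """Oceanographic winter in the North Sea: 1 November to 15 April (inclusive)."""
    return t.month in (11, 12, 1, 2, 3) or (t.month == 4 and t.day <= 15)


def split_by_season(times, records):
    """Split records into winter and summer lists according to their time stamps."""
    winter = [r for t, r in zip(times, records) if is_winter(t)]
    summer = [r for t, r in zip(times, records) if not is_winter(t)]
    return winter, summer


times = [
    datetime(2019, 10, 31, 23),
    datetime(2019, 11, 1, 0),
    datetime(2020, 4, 15, 23),
    datetime(2020, 4, 16, 0),
]
# Each record would be one (ds, dh, s_n, h_n) row; strings keep the example short.
records = ["a", "b", "c", "d"]
winter, summer = split_by_season(times, records)
# winter == ["b", "c"], summer == ["a", "d"]
```

A possible design choice is to form the (Δs, Δh, \(s_n\), \(h_n\)) rows from consecutive hours before splitting. That way, no change is computed across the gap between two separate seasonal periods.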
Figure 4 compares the all-data approach with the seasonal-copula approach, again by plotting the differences between the real and synthetic cumulative distributions. When the absolute errors between the distributions are calculated, the seasonal copulas perform slightly better than the single copula fitted to all data.
Synthetic weather data overview against reality
The simulated weather conditions are based on the copula approach described in the previous section. In this validation step, we compare the results with several historical data sources:
- ERA5 (54° 18′ N, 7° 78° E): wind speed (WSPD) and wave height (WVHT) data provided by the European Centre for Medium-Range Weather Forecasts. Since this data source was used to fit the copulas, we use it as the reference point.
- NOAA (58.270° N, 138.019° W): data obtained from the National Data Buoy Center (NDBC) of the National Oceanic and Atmospheric Administration (NOAA) (NOAA 2021), collected at station 46083 with 1 h resolution. The station is in the Gulf of Alaska, at a latitude comparable to that of the North Sea data sources.
- UFS G-B (54° 10.775′ N, 7° 27.523′ E): wind speed data provided by Germany's National Meteorological Service, Deutscher Wetterdienst (DWD), from the unmanned vessel “UFS German-Bight” (DWD 2021).
When examining the parameters in Table 1, it is important to emphasize that each sea area has its own characteristics. Keeping in mind that the copula results were based on the ERA5 data set, the NOAA data stand out as an outlier among the data sets. In particular, the fact that a lower average wind speed (6.33 m s−1) coincides with higher waves (2.09 m) than in the other data sets requires an explanation. A simple comparison of the two maritime areas, the Gulf of Alaska and the North Sea, suggests a significant effect of wind fetch on the NOAA data. We suspect that the long fetch in the Gulf of Alaska produces higher waves. This effect is less visible in the North Sea, which is surrounded by land on three sides. The comparison also shows that the copula method reflects the local characteristics of its data source (ERA5), which places its results close to the UFS G-B and ERA5 data sets. This tendency further suggests that, to reproduce extreme weather conditions, the copula method requires data sources that contain them, which may be difficult to obtain for local events such as downbursts and micro-downbursts.
Table 1 Selected statistical parameters of the weather data sets