1 Introduction

Knowledge of extreme wind patterns, in terms of both speed and direction, is important for studies in climatology and hydrology and for engineering applications in structural projects related to onshore and offshore activities, wind farms, and oil and gas exploitation (Holmes and Moriarty 1999; Cheng and Yeung 2002; Rajabi and Modarres 2008; Michler-Cieluch et al. 2009).

The south and southeastern coast of Brazil is frequently affected by the passage of cold fronts, with cyclones developing throughout the region. These meteorological systems can cause severe storms that put the coastline at risk of damage.

Extratropical cyclones, considered the most severe events that reach the region between 40°S and 25°S, can cause considerable damage to onshore and offshore structures. de Oliveira (2004, 2009) and de Oliveira et al. (2008, 2009) showed that sea level rise (storm surges) along the south and southeastern Brazilian coastline was caused by extratropical cyclones located in this region.

de Oliveira (2009) analyzed tide gauge series from the southeast coast and wind speed series at reanalysis grid points 0, 1, and 2 (Fig. 1), finding energy peaks at periods of about 7.5 and 5 days. These periods correspond to the typical intervals between passages of frontal systems over the cyclogenetic region.

Fig. 1

The map shows part of South America with the track followed by most cold fronts that reach the south and southeast Brazilian coast. This track is represented by the grid points of the wind components obtained from the NCEP/NCAR reanalysis data set: grid0_40°56′S/52°30′W, grid1_35°14′S/50°38′W, grid2_29°31′S/45°00′W, grid3_25°42′S/45°00′W, grid4_25°42′S/46°52′W, and grid5_23°48′S/43°07′W

South and southeastern Brazilian oceanic areas are characterized by cyclogenetic activity, with cold fronts moving in the southwest–northeast direction (Gan and Rao 1991). Those authors found that extratropical cyclones over South America are more frequent in winter than in any other season, whereas Satyamurty et al. (1990) had earlier found the highest cyclogenesis frequency in summer. Both studies identified the same highly active cyclogenetic areas: the Gulf of San Mathias (42.5°S, 62°W) in summer and Uruguay (around 31°S, 55°W) in winter. Sinclair (1995) and Simmonds and Keay (2000) observed that eastern Argentina, Uruguay, and southern Brazil have a higher density of cyclogenesis, with more frequent occurrence of the phenomenon in winter and less in summer. Mendes (2006) found that cyclones also travel longer distances in winter. Extratropical cyclones that reach the South Atlantic Ocean near the Brazilian coast occasionally intensify off the Uruguayan and Argentinean coasts, generating extreme winds that eventually reach the south and southeast coast of Brazil (Seluchi 1995; Gonçalves 2007). Mendes et al. (2010) showed that the predominant tracks of extratropical cyclones, and their interannual variability, affect the weather in South America.

Fedorova et al. (2007) found that extreme wind speeds over this region are related to large-scale systems such as cold fronts associated with low-pressure systems, to instability lines, and to mesoscale convective systems. Using six grid points of the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) reanalysis data set (Kistler et al. 2001) for the period 1975–2006, de Oliveira (2009) found a positive trend in the number of extreme wind speed events in the South Atlantic Ocean region between 40°S and 23°S (from northern Argentina to southeast Brazil).

In the South Atlantic Ocean, wind speed time series are too sparse in space and time to analyze and simulate storms. The scarcity of long uninterrupted series (30 to 50 years or more) makes it difficult to characterize the behavior of severe events in this region and to extrapolate return levels for long return periods.

Extreme value analysis (EVA) characterizes the tails of the distributions of random processes through the statistical study of the properties of their extremes (Fisher and Tippett 1928). It can be used in risk analysis to estimate potential losses by modeling the behavior of infrequent or even rare phenomena (Embrechts et al. 1997).

Applying EVA to meteorological variables raises some problems, because such variables are rarely independent, as the theory requires. The approach remains feasible, however, when there is a large number of observations, the variables are identically distributed, and there is no serial correlation between successive extreme values (Sen 1997).

The generalized extreme value (GEV) distribution and the generalized Pareto distribution (GPD) have many applications to environmental data because they are appropriate for modeling extremes (Brabson and Palutikof 1999; Katz et al. 2002; Ramesh and Davison 2002; Assumpção 2004; Bautista et al. 2004; Bazán 2005; Silva and Zocchi 2006). Applying the GEV distribution requires long series of annual extremes (more than 30 years of observations), and even then considerable information can be lost. The GPD has an advantage over the GEV distribution because it uses more of the information on extremes: instead of only the maxima of long blocks of time (typically years), it considers all excesses over a sufficiently high threshold in the time series, the peaks-over-threshold (POT) approach (Mendes 2004). The larger data set reduces sampling uncertainty. This method gained wide acceptance in extreme value estimation after Pickands (1975) proved that the GPD is the limiting distribution of excesses over a high threshold (An and Pandey 2005). Because meteorological variables tend to produce successive, dependent extreme values, declustering is applied, a technique that treats successive extremes as belonging to the same event.
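To make the difference in sample size concrete, the sketch below extracts both kinds of extremes from one daily series. It is only an illustration under stated assumptions: the data are synthetic (Weibull-distributed draws standing in for 32 years of daily maximum wind speed), years are treated as 365-day blocks, and the 95th-percentile threshold is an arbitrary choice, not a threshold used in this study.

```python
import numpy as np

def annual_maxima(daily, days_per_year=365):
    """Block maxima: one maximum per complete block of days_per_year values."""
    daily = np.asarray(daily, dtype=float)
    n_years = daily.size // days_per_year
    blocks = daily[: n_years * days_per_year].reshape(n_years, days_per_year)
    return blocks.max(axis=1)

def threshold_excesses(daily, u):
    """Peaks over threshold: excesses y = x - u for every observation x > u."""
    x = np.asarray(daily, dtype=float)
    return x[x > u] - u

rng = np.random.default_rng(42)
# Synthetic stand-in for 32 years of daily maximum wind speed (m/s); not real data
daily = rng.weibull(2.0, size=32 * 365) * 9.0

maxima = annual_maxima(daily)        # GEV input: one value per year
u = np.quantile(daily, 0.95)         # illustrative threshold choice
excesses = threshold_excesses(daily, u)  # GPD input: all excesses over u
```

With 32 years of record, the block-maxima approach keeps 32 values, whereas the POT approach keeps several hundred excesses from the same series, which is the enlargement of the data set referred to above.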

The aim of this work is to apply extreme value analysis to wind data from the NCEP/NCAR reanalysis to estimate extreme wind speeds over the oceanic region between 40°S and 23°S, latitudes that are frequently affected by the passage of cold fronts with cyclones. We used the R package extRemes, developed at NCAR, to select the best-fitting extreme value distribution for each grid point and then to investigate the climatological characteristics of the extreme value distributions in this area.

The next section presents the data and methodology, Section 3 presents the results and discussion, and Section 4 summarizes the major findings.

2 Data and methodology

2.1 Data

Long-term series of meteorological observations over the South Atlantic Ocean near the Brazilian coast, which are necessary for research on extremes, are scarce. We therefore used the NCEP/NCAR reanalysis data set. The climatological series comprises 32 years (1975–2006) of zonal (U) and meridional (V) wind components at 10-m height at 00, 06, 12, and 18 UTC over the ocean region bounded by 23°S, 40°S, and 42°W, toward the south and southeastern Brazilian coast. The wind data belong to the most accurate class of variables in the NCEP/NCAR reanalysis (Kalnay et al. 1996): except in the tropics, they are classified as type A variables, which are strongly influenced by the available observations and are therefore the most reliable (Kistler et al. 2001).

The quality of this data set in the Southern Hemisphere before 1979, when satellite data began to be assimilated, is questionable (Tennant and Reason 2005). Nevertheless, the period from January 1, 1975 to December 31, 2006 was used here for extreme value modeling, because de Oliveira (2009) compared basic statistics for 1975–1979 with those of other periods and confirmed the stationarity of the series. Several studies have used the NCEP/NCAR reanalysis to investigate the behavior of extreme meteorological variables (Brooks et al. 2003; Zolina et al. 2004; Fang et al. 2008).

Wind speed and direction time series are calculated from these components at six grid points along the coast from northern Argentina to southeast Brazil. These grid points lie along the main track of the frontal systems that affect the Brazilian coastline, so the domain used to identify extreme wind values is bounded by 23°S, 40°S, and 42°W toward the Brazilian coast. Figure 1 shows the study area and the grid points used in this work.
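The speed and direction calculation can be sketched as follows. This is a minimal illustration that assumes the usual meteorological convention (direction from which the wind blows, in degrees clockwise from north) and uses hypothetical component values; the convention actually adopted in the study is not stated.

```python
import numpy as np

def wind_speed_direction(u, v):
    """Speed (m/s) and meteorological direction (degrees clockwise from north,
    the direction the wind blows FROM) from zonal u and meridional v components."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)
    # arctan2(-u, -v) points back along the wind vector, measured from north
    direction = np.degrees(np.arctan2(-u, -v)) % 360.0
    return speed, direction

# Hypothetical 6-hourly components at one grid point (m/s)
u = np.array([3.0, 0.0, -5.0, 0.0])
v = np.array([4.0, -6.0, 0.0, 2.0])
speed, direction = wind_speed_direction(u, v)
```

Here the second record (u = 0, v = −6) is a northerly wind (0°), and the third (u = −5, v = 0) is an easterly wind (90°).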

2.2 Methodology

EVA is the application of results from extreme value theory to investigations of extreme or rare phenomena. The theory states that, under certain regularity conditions, if the suitably normalized maximum of random variables taken over large blocks has a non-degenerate limiting distribution, then that distribution must be the GEV distribution. Analogously, the distribution of excesses over a suitably high threshold is the GPD. A point process model can also be formulated, which shows the connection between the GEV and the GPD. The theory is well established, and more details can be found in numerous texts (e.g., Coles 2007). Because different texts use different parameterizations, the log-likelihoods of the models used here are given explicitly. For block maxima z_1, ..., z_n, the log-likelihood function of the GEV distribution, defined where 1 + ξ(z_i − μ)/σ > 0 for all i, is

$$ \begin{gathered} l(\mu, \sigma, \xi ) = - n\log \sigma - (1 + 1/\xi )\sum\limits_{i = 1}^n {\log \left[ {1 + \xi \left( {\frac{{{z_i} - \mu }}{\sigma }} \right)} \right]} \hfill \\- \sum\limits_{i = 1}^n {\left[ {1 + \xi \left( {\frac{{{z_i} - \mu }}{\sigma }} \right)} \right]}_{+}^{ - \frac{1}{\xi }} \hfill \\\end{gathered} $$
(1)
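As a check on the parameterization in (1), the sketch below codes the GEV log-likelihood directly and compares it with SciPy's `genextreme`. SciPy's shape parameter `c` has the opposite sign to ξ (c = −ξ), an example of the differing parameterizations mentioned above. The annual maxima and parameter values are hypothetical, and the function assumes ξ ≠ 0.

```python
import numpy as np
from scipy.stats import genextreme

def gev_loglik(mu, sigma, xi, z):
    """GEV log-likelihood (1) for block maxima z; valid for xi != 0."""
    z = np.asarray(z, dtype=float)
    if sigma <= 0:
        return -np.inf
    t = 1.0 + xi * (z - mu) / sigma
    if np.any(t <= 0):              # an observation outside the support
        return -np.inf
    return (-z.size * np.log(sigma)
            - (1.0 + 1.0 / xi) * np.sum(np.log(t))
            - np.sum(t ** (-1.0 / xi)))

z = np.array([21.3, 23.8, 19.9, 25.1, 22.4, 24.0, 20.7, 26.2])  # hypothetical annual maxima (m/s)
mu, sigma, xi = 22.0, 2.0, -0.2     # xi < 0: reverse Weibull, upper end point mu - sigma/xi = 32
ll = gev_loglik(mu, sigma, xi, z)
# SciPy uses the opposite sign convention for the shape: c = -xi
ll_scipy = genextreme.logpdf(z, c=-xi, loc=mu, scale=sigma).sum()
```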

Note that the GEV is a family of distributions in which the shape parameter ξ determines the type: the reverse Weibull (ξ < 0, bounded upper tail), the Gumbel (ξ = 0, light tail), or the Fréchet (ξ > 0, heavy tail). The Gumbel case requires special treatment: it is a single point in a continuous parameter space, so maximizing the log-likelihood (1) yields an estimate of exactly ξ = 0 with probability zero (and (1) itself is undefined at ξ = 0). The log-likelihood for the Gumbel case is

$$ l(\mu, \sigma) = -n\log\sigma - \sum_{i=1}^{n}\left(\frac{z_i - \mu}{\sigma}\right) - \sum_{i=1}^{n}\exp\left\{-\left(\frac{z_i - \mu}{\sigma}\right)\right\} $$
(2)
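The Gumbel log-likelihood (2) can be checked the same way against SciPy's `gumbel_r`, and against (1) evaluated at a very small ξ, which should give almost the same value because (2) is the limit of (1) as ξ → 0. The annual maxima and parameters are hypothetical.

```python
import numpy as np
from scipy.stats import gumbel_r

def gumbel_loglik(mu, sigma, z):
    """Gumbel (xi = 0) log-likelihood (2)."""
    z = np.asarray(z, dtype=float)
    s = (z - mu) / sigma
    return -z.size * np.log(sigma) - np.sum(s) - np.sum(np.exp(-s))

def gev_loglik(mu, sigma, xi, z):
    """GEV log-likelihood (1) for xi != 0, assuming all z lie inside the support."""
    t = 1.0 + xi * (np.asarray(z, dtype=float) - mu) / sigma
    return -t.size * np.log(sigma) - (1.0 + 1.0 / xi) * np.sum(np.log(t)) - np.sum(t ** (-1.0 / xi))

z = np.array([21.3, 23.8, 19.9, 25.1, 22.4, 24.0, 20.7, 26.2])  # hypothetical annual maxima (m/s)
ll_gumbel = gumbel_loglik(22.0, 2.0, z)
ll_scipy = gumbel_r.logpdf(z, loc=22.0, scale=2.0).sum()
ll_near_zero = gev_loglik(22.0, 2.0, 1e-6, z)   # (1) approaches (2) as xi -> 0
```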

The log-likelihood for the GPD is given by

$$ l(\sigma, \xi) = -k\log\sigma - \left(1 + \frac{1}{\xi}\right)\sum_{i=1}^{k}\log\left[1 + \frac{\xi y_i}{\sigma}\right]_{+} $$
(3)

where y_i = x_i − u are the excesses and k is the number of excesses over the threshold u.

For the case where ξ = 0,

$$ l(\sigma ) = - k\log \sigma - \frac{1}{\sigma }\sum\limits_{i = 1}^k {{y_i}} $$
(4)
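A direct implementation of (3) and (4), verified against SciPy's `genpareto` (whose shape `c` has the same sign as ξ) and `expon`, might look as follows. The excesses and parameters are hypothetical; the function returns −∞ when an excess lies beyond the upper end point −σ/ξ that exists when ξ < 0.

```python
import numpy as np
from scipy.stats import genpareto, expon

def gpd_loglik(sigma, xi, y):
    """GPD log-likelihood for excesses y = x - u: eq. (3), or eq. (4) when xi == 0."""
    y = np.asarray(y, dtype=float)
    k = y.size
    if sigma <= 0:
        return -np.inf
    if xi == 0.0:                      # exponential case, eq. (4)
        return -k * np.log(sigma) - np.sum(y) / sigma
    t = 1.0 + xi * y / sigma
    if np.any(t <= 0):                 # excess beyond the end point (xi < 0)
        return -np.inf
    return -k * np.log(sigma) - (1.0 + 1.0 / xi) * np.sum(np.log(t))

y = np.array([0.4, 1.9, 0.7, 3.2, 0.1, 2.5, 1.1, 0.9])   # hypothetical excesses (m/s)
ll_beta = gpd_loglik(2.0, -0.2, y)     # xi < 0: bounded ("beta") tail, end point 10 m/s
ll_exp = gpd_loglik(2.0, 0.0, y)       # xi = 0: exponential tail
ll_beta_scipy = genpareto.logpdf(y, c=-0.2, scale=2.0).sum()
ll_exp_scipy = expon.logpdf(y, scale=2.0).sum()
```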

Finally, for the point process (PP) model, there are two approaches to parameter estimation: the orthogonal approach and the GEV re-parameterization. The orthogonal approach fits the rate of threshold excesses and their intensity separately; that is, the GPD is simply fitted using likelihood (3) or (4), and in the stationary case the maximum likelihood estimate (MLE) of the rate parameter is the mean number of excesses per unit time. For the GEV re-parameterization, the likelihood function to maximize is

$$ L_A(\mu, \sigma, \xi; x_1, \ldots, x_{N(A)}) \propto \exp\left\{-n_y\left[1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}\prod_{i=1}^{N(A)}\frac{1}{\sigma}\left[1 + \xi\left(\frac{x_i - \mu}{\sigma}\right)\right]^{-1/\xi - 1} $$
(5)

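where n_y is the number of years of observation (or of another desired time period), so that μ, σ, and ξ retain their block-maximum interpretation, x_i are the observations exceeding u, and N(A) is the number of threshold excesses.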
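The sketch below evaluates the logarithm of the PP likelihood up to an additive constant and checks numerically its connection with the GPD. It assumes the standard form of this likelihood (Coles 2007), whose first factor is exp{−n_y[1 + ξ(u − μ)/σ]^(−1/ξ)}, and takes n_y as the number of years spanned by the record. Under these assumptions, the expected number of excesses per year is Λ = [1 + ξ(u − μ)/σ]^(−1/ξ), the excesses follow a GPD with scale σ_u = σ + ξ(u − μ), and the PP log-likelihood equals the GPD log-likelihood (3) plus −n_yΛ + N log Λ. All numbers are hypothetical.

```python
import numpy as np

def pp_loglik(mu, sigma, xi, x, u, n_years):
    """Log of the point-process likelihood, up to a constant (xi != 0)."""
    x = np.asarray(x, dtype=float)
    tu = 1.0 + xi * (u - mu) / sigma
    tx = 1.0 + xi * (x - mu) / sigma
    if sigma <= 0 or tu <= 0 or np.any(tx <= 0):
        return -np.inf
    return (-n_years * tu ** (-1.0 / xi)
            - x.size * np.log(sigma)
            - (1.0 / xi + 1.0) * np.sum(np.log(tx)))

def gpd_loglik(sigma_u, xi, y):
    """GPD log-likelihood (3) for excesses y = x - u (xi != 0, y inside the support)."""
    y = np.asarray(y, dtype=float)
    return -y.size * np.log(sigma_u) - (1.0 + 1.0 / xi) * np.sum(np.log1p(xi * y / sigma_u))

# Hypothetical threshold (m/s), record length, and PP parameters on the annual-maximum scale
u, n_years = 14.0, 2.0
mu, sigma, xi = 20.0, 3.0, -0.1
x = np.array([14.5, 15.2, 16.8, 14.1, 18.3, 15.9])   # hypothetical values above u

lam = (1.0 + xi * (u - mu) / sigma) ** (-1.0 / xi)   # expected excesses per year
sigma_u = sigma + xi * (u - mu)                      # implied GPD scale at threshold u
ll_pp = pp_loglik(mu, sigma, xi, x, u, n_years)
ll_gpd = gpd_loglik(sigma_u, xi, x - u)
lam_hat = x.size / n_years                           # orthogonal approach: rate MLE per year
```

The identity ll_pp − ll_gpd = −n_yΛ + N log Λ is what makes the orthogonal (rate plus GPD) and the GEV re-parameterized fits describe the same model.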

The R package extRemes, developed at NCAR, is used to fit the extreme value models following Gilleland and Katz (2005, 2006). The steps applied at each grid point are described below:

  • The GEV distribution and the GPD are applied to annual and daily data, respectively, using block maxima for the GEV distribution and peaks over a threshold for the GPD. Because the interest of the present research is to model absolute excesses, thresholds are held constant over time.

  • The Pareto–Poisson model (GPD-P) is also fitted to daily excesses over a threshold. This model gives results similar to those of the GPD alone, but it also incorporates information about the frequency of extreme events (Gilleland and Katz 2006).

  • Threshold selection is based on statistical criteria. Extreme value distributions are justified only for excesses over a high threshold, so the threshold must be high enough for the GPD assumptions to be valid. It must also be low enough to leave enough data that the resulting confidence intervals are not too wide. This trade-off is not a limitation here, because the chosen thresholds are still low enough to support meaningful inferences about extreme winds for meteorological purposes. The mean excess (mean residual life) plot and plots of the scale and shape parameter estimates against the threshold are used to choose the thresholds (Coles 2007).

  • MLE is used to estimate the distribution parameters.

  • Probability and quantile plots are used to evaluate the validity of the assumptions for applying the extreme value distributions used here.

  • Return levels z_p associated with return periods of 1/p years are calculated. Confidence intervals are estimated by the delta method, which assumes that the return-level estimators are approximately normally distributed. For shorter return periods this assumption is generally valid, but for longer return periods the sampling distributions are typically skewed, so the symmetric delta-method intervals tend to be too wide, particularly at the lower bound.

  • The extremal index θ (0 < θ ≤ 1), which measures the degree of independence among excesses, is calculated for the wind extremes: the higher its value, the more independent the excesses over the threshold, with θ = 1 for fully independent excesses.

  • Declustering (the independent storm method) is used to obtain excesses that are more likely to be independent than the raw excesses.

  • Specifically, the runs declustering method is used: a cluster starts at the first excess and ends after r consecutive values fall below the threshold. Different values of r are tested for cluster identification. The GPD and GPD-P models are fitted both with and without runs declustering.
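The runs declustering and extremal index steps in the list above can be sketched as follows. This is an illustrative implementation, not the extRemes code: it assumes the simple runs estimator θ = (number of clusters)/(number of excesses), summarizes each cluster by its maximum, and uses a short hypothetical series of daily maximum wind speeds (m/s).

```python
import numpy as np

def runs_decluster(x, u, r):
    """Runs declustering: a cluster starts at an excess of u and ends once
    r consecutive observations are at or below u.
    Returns the maximum of each cluster and the cluster label of each excess."""
    x = np.asarray(x, dtype=float)
    idx = np.flatnonzero(x > u)                  # positions of the excesses
    if idx.size == 0:
        return np.array([]), np.array([], dtype=int)
    # excesses at i and j are separated by j - i - 1 non-excesses;
    # r or more of them close the current cluster
    starts_new = (np.diff(idx) - 1) >= r
    labels = np.concatenate(([0], np.cumsum(starts_new)))
    cluster_max = np.array([x[idx[labels == c]].max() for c in range(labels[-1] + 1)])
    return cluster_max, labels

def extremal_index_runs(x, u, r):
    """Runs estimator of the extremal index: clusters / excesses."""
    cluster_max, labels = runs_decluster(x, u, r)
    if labels.size == 0:
        return np.nan
    return cluster_max.size / labels.size

x = np.array([10, 16, 17, 12, 15, 11, 11, 11, 18, 12, 12, 12, 12, 19])  # hypothetical (m/s)
peaks, labels = runs_decluster(x, u=14.0, r=3)
theta = extremal_index_runs(x, u=14.0, r=3)
theta_r1 = extremal_index_runs(x, u=14.0, r=1)
```

With r = 3 the five excesses form three clusters (θ = 0.6); with r = 1 they form four (θ = 0.8). A larger r can only merge clusters, so the runs estimate of θ cannot increase as r grows.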

3 Results and discussion

3.1 GEV

The reverse Weibull form of the GEV distribution (ξ < 0) gave the best fit to the annual maximum wind speeds at grid points 0, 1, 2, and 4. However, at grid points 2 and 4 the confidence intervals for the shape parameter include zero, so the Gumbel hypothesis cannot be rejected. At grid points 3 and 5 the shape parameter is very close to zero and lies well within the confidence intervals; the Gumbel distribution therefore fits the annual maximum series best at these grid points. Table 1 shows the best-fitting model for each grid point with the respective standard errors, the 100-year return levels estimated from the best-fitting distribution, and the maximum and mean values of each series. Note that the predicted 100-year return levels are close to the maxima that have already occurred in the region.

Table 1 Shape, location, and scale parameters of GEV distributions with their respective standard errors in parentheses

As Fig. 2 shows, most reanalysis data fall within return periods of 1–10 years, suggesting that strong winds with low probability (return periods of 50–100 years) have rarely been recorded in the region. For return periods over 100 years, the confidence limits depart strongly from the straight line, mainly at grid point 3. Such wide confidence intervals for extreme return levels show that there is not enough information to make predictions with any degree of certainty for return periods over 100 years (Coles 2007). Because the reverse Weibull distribution has a bounded upper tail, its return level plot is convex, with return levels reaching a plateau at this upper bound.
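The plateau described above follows from the GEV return-level formula, z_p = μ − (σ/ξ)[1 − y_p^(−ξ)] with y_p = −log(1 − p) for ξ ≠ 0, and z_p = μ − σ log y_p in the Gumbel case (Coles 2007). For ξ < 0, z_p approaches the finite upper end point μ − σ/ξ as the return period 1/p grows. The sketch evaluates this for hypothetical parameters, not the fitted values of Table 1.

```python
import numpy as np

def gev_return_level(mu, sigma, xi, T):
    """Return level z_p for return period T = 1/p (years) of a GEV fitted to annual maxima."""
    p = 1.0 / np.asarray(T, dtype=float)
    y = -np.log1p(-p)                        # y_p = -log(1 - p)
    if xi == 0.0:
        return mu - sigma * np.log(y)        # Gumbel case
    return mu - sigma / xi * (1.0 - y ** (-xi))

T = np.array([2.0, 10.0, 100.0, 1000.0, 1e6])   # return periods (years)
mu, sigma, xi = 22.0, 2.0, -0.2                 # hypothetical reverse-Weibull fit (m/s)
z = gev_return_level(mu, sigma, xi, T)
upper_bound = mu - sigma / xi                   # finite upper end point (32 m/s here)
z_gumbel = gev_return_level(mu, sigma, 0.0, T)  # unbounded, for comparison
```

Even at a return period of one million years, the reverse Weibull level stays below 32 m/s, whereas the Gumbel levels keep growing linearly in log y_p.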

Fig. 2

Return level plots of annual maximum wind speed against return period, with 95% confidence intervals calculated by the delta method, for each grid point

Figure 3 shows the probability and quantile plots for the GEV fitted to annual maxima. The points lie reasonably close to straight lines at most grid points, indicating that the assumptions needed to justify the GEV are met; grid points 3 and 5 show some curvature.

Fig. 3

Probability and quantile plots of annual maximum wind speed for each grid point

3.2 Excesses over a threshold: POT and GPD models

Table 2 shows the thresholds selected for fitting the excesses to the GPD and the number of excesses over each threshold at each grid point. The number of such extreme values per year corresponds to the rate parameter (λ) of the Poisson process. The GPD is fitted to the tails of the daily maximum wind speeds using thresholds of about 13 to 15 m/s (about 47 to 54 km/h) at grid points 0 and 1, yielding λ between 25 and 32 occurrences per year. These values represent independent events and agree with the values found by Gan and Rao (1991) for cyclogenesis in the region.
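The two quantities involved in threshold selection and in Table 2, the mean excess shown in the mean residual life plot and the Poisson rate λ, can be computed as in the sketch below. It is a minimal illustration on synthetic data (8 m/s plus exponential excesses with mean 2 m/s over 32 hypothetical years). Under a GPD the mean excess is linear in the threshold with slope ξ/(1 − ξ) (Coles 2007), so for these exponential (ξ = 0) data it should stay roughly constant near 2 m/s above 8 m/s.

```python
import numpy as np

def mean_residual_life(x, thresholds):
    """Mean excess e(u) = mean(x - u | x > u) for each candidate threshold u."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean(x[x > u] - u) if np.any(x > u) else np.nan
                     for u in thresholds])

def poisson_rate(x, u, n_years):
    """Rate parameter lambda: mean number of excesses of u per year."""
    return np.count_nonzero(np.asarray(x, dtype=float) > u) / n_years

rng = np.random.default_rng(0)
n_years = 32
x = 8.0 + rng.exponential(scale=2.0, size=n_years * 365)   # synthetic daily maxima (m/s)
thresholds = np.array([9.0, 10.0, 11.0, 12.0])
mrl = mean_residual_life(x, thresholds)    # roughly flat for exponential excesses
lam = poisson_rate(x, 12.0, n_years)       # excesses of 12 m/s per year
```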

Table 2 Number of excesses, λ, and shape and scale parameters of the GPD fits

Table 2 also gives the parameters of the GPD fitted to the excesses, with their standard errors. The shape parameter values (ξ < 0) indicate that the data are best fitted by the beta distribution at all grid points. At grid points 0, 2, and 4, the values are closer to zero, so the hypothesis that this parameter is zero (i.e., the exponential distribution) cannot be rejected.

Figure 4 shows that, as in the GEV case (cf. Fig. 2), very few wind speeds with return periods longer than 10 years have been recorded in the available period. However, the return level curves are less curved than those of the GEV fits, tending toward linearity, with confidence intervals much closer to the straight line for longer return periods. This is consistent with not being able to reject the exponential case, which has a straight-line return level curve. Because the GPD is fitted to more data than the GEV distribution, the estimates are more certain and the confidence intervals narrower. However, if the excesses over the threshold are strongly dependent, these intervals will be unrealistically narrow; such dependence is therefore investigated in the next section.
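The straight-line behavior of the exponential case can be seen from the GPD return-level formula. Writing λ for the mean number of excesses per year, the T-year return level is x_T = u + (σ/ξ)[(λT)^ξ − 1] for ξ ≠ 0 and x_T = u + σ log(λT) for ξ = 0 (Coles 2007), valid for λT ≥ 1. The sketch uses hypothetical values of u, σ, ξ, and λ, not the fitted values of Table 2.

```python
import numpy as np

def gpd_return_level(u, sigma, xi, lam, T):
    """T-year return level of a GPD fitted to excesses of u, with lam excesses per year.
    Requires lam * T >= 1."""
    m = lam * np.asarray(T, dtype=float)    # expected number of excesses in T years
    if xi == 0.0:
        return u + sigma * np.log(m)        # exponential: linear in log T
    return u + sigma / xi * (m ** xi - 1.0)

T = np.array([10.0, 100.0, 1000.0])         # return periods (years)
u, sigma, lam = 14.0, 2.5, 28.0             # hypothetical threshold (m/s), scale, rate
x_exp = gpd_return_level(u, sigma, 0.0, lam, T)
x_beta = gpd_return_level(u, sigma, -0.1, lam, T)   # xi < 0: bounded ("beta") tail
```

For the exponential fit, each tenfold increase in T adds the same amount, σ log 10, to the return level; for ξ < 0 the increments shrink as the levels approach the end point u − σ/ξ.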

Fig. 4

Return level plots of wind speed excesses over a threshold fitted by the GPD, with 95% confidence intervals, for each grid point

The quantile–quantile plots, in which the ordered excesses above the threshold are plotted against the corresponding quantiles of the fitted distribution, closely follow straight lines at every grid point (not shown), indicating that the assumptions for using the GPD are reasonable.

3.3 Extremal index and declustering

The extremal index (θ), calculated using thresholds at the 90th percentile of the data at each grid point, has a mean value of about 0.58. These θ values indicate moderate dependence between excesses over the thresholds, reflecting the occurrence of severe storms on consecutive days. They can therefore be associated with transient systems such as cold fronts, which pass every 3–5 days (Gan and Rao 1991).
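As a rough reading of this value, and assuming the standard interpretation of the extremal index as the reciprocal of the limiting mean cluster size (Coles 2007), a mean θ of about 0.58 corresponds to clusters of roughly 1.7 excesses, that is, about 1.7 extreme days per storm event on average:

$$ \bar{C} \approx \frac{1}{\theta} \approx \frac{1}{0.58} \approx 1.7 $$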

Table 3 presents θ values for thresholds above the 90th percentile and the wind speed return levels for the 10-, 25-, 50-, and 100-year return periods, with their 95% confidence intervals in parentheses.

Table 3 Extremal index (θ) and wind speed return level (m/s) for 10-, 25-, 50-, and 100-year return period

Runs declustering is performed with r = 2 and r = 3, run lengths consistent with the passage of frontal systems over the region, so that separate clusters can be treated as independent severe events. The best GPD fits were obtained with r = 3. Figure 5a and b show that, at each grid point, the number of excesses varies more with increasing threshold when declustering is not performed, whereas the number of clusters (Fig. 5b) is less sensitive to the threshold. As the threshold increases, fewer excesses remain and consequently fewer clusters, which may suggest that more severe excesses are associated with shorter durations. The large number of peaks within each cluster is consistent with the severe events that caused sea level rise along the south and southeastern coasts of Brazil.

Fig. 5

a Number of excesses over several thresholds using the POT model and b number of clusters after runs declustering with r = 3

4 Conclusions

The main conclusions of the present extreme value analysis concern the comparison between the GEV and GPD models, assessed through probability and quantile plots, using reanalysis wind data at grid points over the South Atlantic from northern Argentina to southeast Brazil. The GPD fits appear more satisfactory than the GEV fits and give better results for the 100-year return levels. The GEV fits show that the reverse Weibull distribution is more suitable at points between 40°S and 25°S, whereas the Gumbel distribution appears to be a better model between 25°S and 23°S.

The best-fitting GPD is the beta distribution at all locations; the 100-year return levels are exceeded at some points and, on average, lie close to the peaks that have already occurred. The GPD and PP models are closely related and give consistent results: apart from small differences arising from estimating the parameters of the two formulations, they should be nearly identical by construction. Return levels for periods up to 100 years from both the GEV and GPD distributions are very close to the maxima already recorded, indicating that rare strong winds occurred during the analyzed period.

The extremal index values indicate that excesses persist over consecutive days, showing that extreme wind speeds are related to the passage of transient systems. Declustering is a useful tool for recognizing, through the clusters, the severity of such meteorological events.

The methodology presented in this paper for analyzing the temporal and spatial behavior of extreme wind speeds at grid points over the ocean off southeastern South America proved useful for identifying the extreme value distributions of this variable in a region where climatological information is scarce. Using reanalysis data at grid points over the Atlantic Ocean from northern Argentina to southeast Brazil, covering the main track of cold fronts, made it possible to analyze the statistics of wind speed behavior. Beyond the extreme value distributions, the analysis showed that the occurrence of extremes is more closely associated with the passage of cold fronts than with convective systems, including near the southeast coast of Brazil.

Mean wind speeds over the 32-year period range from about 8.0 m/s at the higher latitudes to about 5.0 m/s at the lower latitudes (at 10-m height), values considered suitable for wind farms, in agreement with Pimenta et al. (2008). The south and southeast Brazilian coasts therefore have wind resources suitable for economically attractive offshore activities.