Peaks-over-threshold model in flood frequency analysis: a scoping review

In flood frequency analysis (FFA), the annual maximum (AM) model is widely adopted in practice due to its straightforward sampling process. However, the AM model has been criticized for its limited flexibility. FFA using the peaks-over-threshold (POT) model is an alternative to the AM model that offers several theoretical advantages; however, this model is currently underemployed internationally. This study aims to bridge the current knowledge gap by conducting a scoping review covering several aspects of the POT approach, including model assumptions, independence criteria, threshold selection, parameter estimation, probability distribution, regionalization and stationarity. We have reviewed the previously published articles on the POT model to investigate: (a) possible reasons for underemployment of the POT model in FFA; and (b) challenges in applying the POT model. It is highlighted that the POT model offers greater flexibility than the AM model due to the nature of its sampling process. The POT model is more capable of providing less biased flood estimates for frequent floods. The underemployment of the POT model in FFA is mainly due to the complexity of selecting a threshold (e.g., a physical threshold to satisfy the independence criteria and a statistical threshold for the Generalized Pareto distribution, the most commonly applied distribution in POT modelling). It is also found that the uncertainty due to individual variables and to their combined effects is not well assessed in previous research, and there is a lack of established guidelines for applying the POT model in FFA.


Introduction
Flooding is one of the worst natural disasters worldwide, leading to significant economic losses (Acosta et al. 2016; Lathouris 2020). To effectively reduce flood damage, which arises from the stochastic nature of extreme rainfall and runoff, assessments are usually undertaken using statistical methods. Flood frequency analysis (FFA) is one of the most preferred statistical methods and is widely used in infrastructure planning and design. FFA aims to estimate flood discharge with an associated frequency by fitting a probability distribution function to observed flood data. Hydrologists generally apply two modelling frameworks to perform FFA: annual maximum (AM) and peaks-over-threshold (POT). The AM model uses the maximum discharge value from each year (i.e. one value per year) at the location of interest. The POT model, on the other hand, extracts all the flow data above a threshold. The POT model has recently received greater interest as a means of understanding the nature of frequent floods, which are useful in characterising channel morphology and aquatic habitat and in helping river restoration efforts (Karim et al. 2017).
The AM model is the most popular method in practice given the straightforwardness of its sampling process. However, this sampling process eliminates a large portion of the data from the recorded streamflow time series. For example, for a station with 50 years of streamflow record, the AM model considers only 50 elements for modelling, each being the highest discharge in a single year. Several studies note that the AM model results in a loss of useful information, e.g., the second highest flow in a year (which could be higher than many data points in the AM series) is not selected (Bačová-Mitková and Onderka 2010; Bezak et al. 2014; Gottschalk and Krasovskaia 2002; Robson and Reed 1999). Traditional at-site FFA favors a 2T rule, e.g. 100 years of streamflow record is needed to estimate the 2% annual exceedance probability (AEP), or 50-year, flood quantile. The AM model has been criticized for its biased flood estimates in arid and semi-arid regions, in particular for smaller average recurrence intervals (ARIs) or frequent floods (Metzger et al. 2020; Zaman et al. 2012). The AM model has also been criticized for its considerable uncertainty in estimating frequent floods (Karim et al. 2017).
The POT model has been employed in extreme value analysis using the Generalised Pareto (GP) distribution (Bernardara, Andreewsky and Benoit 2011; Coles 2001; Coles 2003; Coles et al. 2003; Liang et al. 2019; Northrop and Jonathan 2011; Pan and Rahman 2021; Thompson et al. 2009). The POT sampling process extracts a greater number of data points from the historical record than the AM model. The extracted POT series provides additional information by retaining all the data points above a selected threshold (Kumar et al. 2020; Madsen et al. 1997). The POT model is also advantageous in terms of the flexibility of its sampling process, i.e. depending on the purpose of the analysis, the POT model can extract the desired number of data points by adjusting the threshold level (Pan and Rahman 2018). However, the additional complexity associated with the POT model in relation to data independence is a drawback, and is one of the reasons for its under-employment (Lang et al. 1999; Pan and Rahman 2021). There is no unique procedure for selecting a threshold value in POT modelling, and hence an iterative process is commonly adopted. As the threshold is lowered, the number of selected flood peaks in the POT model increases; however, a very low threshold can compromise the independence of some of the selected flood peaks. The threshold varies from catchment to catchment depending on catchment characteristics and flood generation mechanisms. In contrast, in the AM series the selected peak floods are most likely to be independent, as only one discharge value per year is selected.
As mentioned earlier, one of the complexities associated with the POT model is threshold selection (Beguería 2005; Gharib et al. 2017; Sccarrott and Macdonald 2012). Several methods have been proposed for selecting the threshold, including graphical methods (e.g. the mean residual life plot) and the shape stability plot of the GP distribution. These methods assume that all thresholds above a well-chosen level result in a stable shape parameter of the GP distribution (Durocher et al. 2018a, b). The graphical methods are subjective and difficult to apply to a large number of stations. To select a threshold without human intervention, some studies (Irvine and Waylen 1986; Lang, Ouarda and Bobée 1999) proposed selecting the threshold associated with a given exceedance rate, which is mainly governed by site characteristics (with an acceptable range of 1.2-3 events per year on average). Selecting a threshold based on a given exceedance rate may not guarantee that the POT model assumptions are fulfilled. To overcome this problem, Davison and Smith (1990) proposed the Anderson-Darling (AD) test to identify a range of thresholds for which the GP distribution hypothesis cannot be rejected.
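As an illustration of the exceedance-rate approach mentioned above, the following is a minimal sketch rather than the procedure of any cited study. It assumes that the peaks have already been de-clustered into an independent series; the function name `threshold_for_rate` and the synthetic data are hypothetical.

```python
import numpy as np

def threshold_for_rate(peaks, n_years, events_per_year):
    """Return the threshold that retains about events_per_year * n_years peaks.

    `peaks` is assumed to hold independent (already de-clustered) flood peaks.
    An exceedance is counted as peak >= threshold.
    """
    ordered = np.sort(np.asarray(peaks, dtype=float))[::-1]  # descending order
    k = int(round(events_per_year * n_years))                # peaks to retain
    if k < 1 or k > ordered.size:
        raise ValueError("target rate is not attainable with this record")
    return ordered[k - 1]                                    # k-th largest peak

# Hypothetical example: 10 years of record, 2 events per year on average
rng = np.random.default_rng(0)
example_peaks = rng.gamma(shape=2.0, scale=50.0, size=60)
u = threshold_for_rate(example_peaks, n_years=10, events_per_year=2.0)
n_kept = int(np.sum(example_peaks >= u))
```

A threshold chosen this way fixes only the sample size; whether the retained exceedances are compatible with the GP distribution still has to be checked separately.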
Despite the rising interest in applying the POT model, there is a lack of commonly accepted guidelines for its wider application, and only a limited number of reviews of this approach have been conducted. For example, Lang et al. (1999) reviewed POT modelling and prepared a guide for application. Sccarrott and Macdonald (2012) reviewed advances in POT modelling from a statistical perspective. Later, Langousis et al. (2016) performed a critical review of threshold selection. Recently, automated threshold detection techniques have been compared (Curceac et al. 2020; Durocher et al. 2019; Durocher et al. 2018a, b). To the best of our knowledge, scoping reviews of the POT approach are limited and the literature on applying the POT model remains sparse compared to that on the AM model. To bridge this knowledge gap, this study aims to review and summarise the current status of the POT model and to identify the difficulties in applying it in FFA. It is expected that this scoping review will enhance the applicability of the POT model and provide guidance on future research needs.

Review methodology
To undertake this scoping review, we followed the framework recommended by Sccarrott and Macdonald (2012) and Langousis et al. (2016). We first formulated the research questions and then identified the relevant keywords. To initialize the thought process, we asked: 'Why is POT modelling framework under-employed in FFA?' Once this question is answered, we then ask: 'What are the conveniences in applying POT in FFA?' and 'What challenges do we face in applying POT in FFA?' Besides the questions mentioned above, we also formulated additional questions to fully develop the framework of this scoping review:
i. What is the current progress in applying the POT model, and at what scale (at-site or regional)?
ii. What are the major research gaps and future research needs in applying POT based FFA?
Based on the above research questions, we identified the relevant keywords for searching published articles on the POT model. The following keywords were applied to maximize the searching performance and to locate relevant articles from scientific database: 'extreme value' & limited to 'Pareto', 'partial duration' & limited to 'POT' and partial duration series ('PDS'), 'peaks over the threshold', 'Pareto', 'pooled analysis', 'threshold selection', 'annual maximum' & limited to 'POT'. The following scientific databases were used to locate relevant publications: Science Direct, Google Scholar and Scopus. We found more than 500 articles related to POT based FFA and only selected 135 publications that satisfied the above-mentioned criteria. The final step was to examine the selected articles and compile the review results in the form of this article. Figure 1 presents components and sub-components that were considered in this study.
The paper is organized as follows. Section 3 focuses on the common model assumptions applied to POT model, followed by reviewing the independence criteria and threshold selection in Sect. 4. Section 5 and Sect. 6 cover the parameter estimators and distribution functions for the POT model, respectively. Section 7 contains regional techniques, followed by discussion on stationarity in Sect. 8. Section 9 presents discussion, and Sect. 10 presents a summary of this review.

Model assumptions
Two common forms of model assumptions are associated with POT modelling: Poisson and binomial (or negative binomial) processes, coupled with either exponential or GP distributions. The Poisson arrival assumes that the occurrence of flood peaks above the selected threshold follows a Poisson process, provided the flood peak magnitudes are independent and identically distributed (i.i.d.). The most useful aspect of the Poisson arrival is that if the model follows a Poisson process for a given threshold, X, then it also follows a Poisson process for any threshold greater than X. The Poisson assumption is the most commonly applied in constructing POT data series. However, the Poisson arrival assumption is not always valid. Cunnane (1979) used recorded flood data from 26 stations in the U.K. to assess the validity of the Poisson assumption coupled with the exponential distribution, and found that the variance of the number of flow peaks per year is significantly larger than the mean, which rejects Poisson arrivals and suggests fitting binomial or negative binomial distributions. Ben-Zvi (1991) applied the Chi-square test to evaluate the fit of the Poisson and negative binomial distributions using data from eight gauged stations in Israel. This study supported the negative binomial arrival over the Poisson arrival; however, it was inconclusive due to the limitations of the Chi-square test. Lang et al. (1997) suggested using a dispersion index test at the given threshold to choose the model assumption. They stated that if the dispersion index is greater or smaller than one, the negative binomial or binomial assumption, respectively, should replace the Poisson process. Lang et al. (1999) proposed a practical guideline for POT based FFA, which validates the Poisson assumption only if the dispersion index lies within a 5% confidence interval for the selected threshold.
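The dispersion index test discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure of Lang et al. (1997, 1999): it uses the standard result that, for Poisson counts over N years, (N - 1) times the variance-to-mean ratio is approximately chi-square distributed with N - 1 degrees of freedom. The function name and the two-sided 5% level are assumptions made here.

```python
import numpy as np
from scipy.stats import chi2

def dispersion_index_test(annual_counts, alpha=0.05):
    """Two-sided dispersion index test on the yearly number of POT events."""
    counts = np.asarray(annual_counts, dtype=float)
    n = counts.size
    index = counts.var(ddof=1) / counts.mean()   # close to 1 for a Poisson process
    stat = (n - 1) * index                       # approx. chi-square(n - 1) under Poisson
    lower = chi2.ppf(alpha / 2.0, n - 1)
    upper = chi2.ppf(1.0 - alpha / 2.0, n - 1)
    if stat > upper:
        verdict = "overdispersed: consider negative binomial"
    elif stat < lower:
        verdict = "underdispersed: consider binomial"
    else:
        verdict = "Poisson not rejected"
    return index, verdict

# Hypothetical yearly counts of peaks above a threshold (20 years)
index, verdict = dispersion_index_test([0, 1, 2, 3, 4] * 4)
```

Because the index depends on the threshold, the test would normally be repeated for each candidate threshold.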
Önöz and Bayazit (2001) further evaluated the validity of applying binomial or negative binomial process in combination with the exponential probability distribution in POT based FFA. They found that the flood estimates based on binomial or negative binomial variates are identical to the ones obtained from the Poisson process. The study concluded that Poisson arrival is the preferred process over binomial or negative binomial ones even if the Poisson process hypothesis is rejected. Furthermore, Eastoe and Tawn (2010) proposed mixed models to account for the overdispersion issue of annual peak count concerning the Poisson assumption. This study included the use of regression and mixed model to extend the homogeneous Poisson process.
To summarize the findings on the model assumption in POT based FFA, the Poisson arrival is the most preferred sampling assumption provided that the i.i.d. criterion is fulfilled. A dispersion index test can indicate whether a substitute for the Poisson arrival process should be used. However, only limited studies have evaluated the combined effects of model assumptions and sample size (Ben-Zvi 1991; Cunnane 1973, 1979). This potentially poses some degree of uncertainty in the design flood estimates based on POT modelling, which requires further investigation.

Independence and threshold selection
POT sampling requires a dual-domain approach including time and magnitude. Coles (2001) proposed a de-clustering method, which filters the dependent elements from natural streamflow records (Solari and Losada 2012). Bernardara et al. (2014) later proposed a two-step framework for POT modelling for estimating environmental extremes through defining physical and statistical thresholds for de-clustering and GP distribution, respectively. Two commonly applied independence criteria are discussed below.
(i) USWRC (1976) stated that independent flood peaks must be separated by a period (h) of at least five days plus the natural logarithm of the basin area (A) measured in square miles. In addition, the intermediate flow between two consecutive peaks must drop below three-quarters of the lower of these two flow values. The second or any other flood peak must be rejected if any of the criteria in Eq. 1 is met. The independence criteria specified by USWRC (1976) have been applied to POT based FFA by several studies (Bezak et al. 2014; Hu et al. 2020; Nagy et al. 2017).
$$h < 5\ \text{days} + \ln(A) \quad \text{or} \quad Q_{\min} > \tfrac{3}{4}\,\min(Q_1, Q_2) \tag{1}$$
where A is the basin area in square miles, Q_1 and Q_2 are the discharges of the two consecutive peaks, and Q_min is the minimum flow between them.
(ii) Another commonly applied set of independence criteria is recommended by Cunnane (1979): two peaks must be separated by at least three times the average time to peak, which is obtained by assessing the hydrographs. Also, the minimum discharge between two consecutive peaks must be less than two-thirds of the discharge of the first of the two peaks. The second or any other flood peak must be rejected if any of the criteria in Eq. 2 is met. Silva et al. (2012) and Chen et al. (2010) adopted the following independence criteria:
$$h < 3\,T_p \quad \text{or} \quad Q_{\min} > \tfrac{2}{3}\,Q_1 \tag{2}$$
where h is the separation between the two peaks, T_p is the average time to peak, Q_1 is the discharge of the first peak and Q_min is the minimum discharge between the two peaks. Notably, the criteria in Eqs. 1 and 2 for extracting POT data series have been criticized due to the associated uncertainty. Ashkar and Rousselle (1983) reviewed the above two equations and concluded that the restriction for independence might render the Poisson process inapplicable.
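The two sets of independence criteria described above can be sketched as simple pairwise checks, followed by a de-clustering pass over a daily flow series. This is a minimal sketch under stated assumptions: the function names are hypothetical, the series is assumed to be daily so that index differences are days, and, of two dependent peaks, the larger one is retained (a common convention that is assumed here rather than taken from the cited sources).

```python
import math

def independent_uswrc(sep_days, q_min, q1, q2, area_sq_mi):
    """USWRC (1976): enough separation AND a deep enough trough between peaks."""
    enough_time = sep_days >= 5.0 + math.log(area_sq_mi)
    enough_drop = q_min < 0.75 * min(q1, q2)
    return enough_time and enough_drop

def independent_cunnane(sep, q_min, q1, time_to_peak):
    """Cunnane (1979): separation of at least 3*Tp AND trough below 2/3 of first peak."""
    enough_time = sep >= 3.0 * time_to_peak
    enough_drop = q_min < (2.0 / 3.0) * q1
    return enough_time and enough_drop

def decluster(flow, threshold, area_sq_mi):
    """Return indices of independent peaks above `threshold` in a daily flow list."""
    # Candidate peaks: local maxima above the threshold (index = day number)
    candidates = [i for i in range(1, len(flow) - 1)
                  if flow[i] > threshold
                  and flow[i] >= flow[i - 1] and flow[i] > flow[i + 1]]
    kept = []
    for i in candidates:
        if not kept:
            kept.append(i)
            continue
        j = kept[-1]
        q_min = min(flow[j:i + 1])                # trough between the two peaks
        if independent_uswrc(i - j, q_min, flow[j], flow[i], area_sq_mi):
            kept.append(i)
        elif flow[i] > flow[j]:
            kept[-1] = i                          # assumption: keep the larger dependent peak
    return kept

# Hypothetical daily flows for a 1 square mile basin (ln(1) = 0, so 5 days)
flow = [1, 10, 6, 9, 1, 1, 1, 1, 1, 1, 1, 1, 12, 1]
peaks_idx = decluster(flow, threshold=5, area_sq_mi=1.0)
```

In this example the peak of 9 on day 3 is discarded because it follows the peak of 10 by only two days, while the peak of 12 on day 12 is retained.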
Besides the criteria mentioned above, other methods are also proposed. For example, Lang et al. (1999)   period; (iii) Retaining flood peaks based on a predefined frequency factor; and (iv) Selecting flood peaks that exceed 60% of bankfull discharge at a given station (Page et al. 2005). The bankfull discharge can be found by examining the stage-discharge relationship and riverbank elevation at the gauging location, e.g. an inflection point on the stage-discharge relationship points to the bankfull discharge (Karim et al. 2017). Bankfull discharge is unique to each river, depends on factors such as catchment area, geology and channel geometry, and is generally taken as the AMF discharge with a 1.5-year return period (Edwards et al. 2019). Solari and Losada (2012) proposed a unified statistical model for POT based FFA. Dupuis (1999) applied optimal bias-robust estimates (OBRE) to detect the threshold.
Another complexity associated with POT based FFA is determining a statistically sound threshold for fitting the GP distribution (POT-GP). Based on Pickands (1975), for a POT series extracted with a sufficiently high threshold, the tail behavior follows a GP distribution. One of the most well-known properties of the GP distribution is that the shape and modified scale parameters remain constant as the threshold increases. This property of the GP distribution has been widely employed to verify the suitability of the POT-GP approach in frequency analysis.
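The threshold stability property mentioned above can be written out explicitly. The derivation below is a standard result of extreme value theory, shown here as a sketch; the symbols u, u', \sigma_u, \xi and \sigma^{*} are notation assumed for this illustration and the case \xi \neq 0 is shown.

```latex
% GP distribution of the excess Y = X - u over threshold u:
G(y;\sigma_u,\xi) = 1 - \left(1 + \frac{\xi y}{\sigma_u}\right)^{-1/\xi}, \qquad y > 0 .
% For a higher threshold u' > u the excess X - u' is again GP with the same shape:
\Pr(X - u' > y \mid X > u')
  = \frac{\left[1 + \xi (y + u' - u)/\sigma_u\right]^{-1/\xi}}
         {\left[1 + \xi (u' - u)/\sigma_u\right]^{-1/\xi}}
  = \left(1 + \frac{\xi y}{\sigma_u + \xi (u' - u)}\right)^{-1/\xi},
% so the scale changes linearly with the threshold while the modified scale is constant:
\sigma_{u'} = \sigma_u + \xi (u' - u), \qquad
\sigma^{*} = \sigma_{u'} - \xi u' = \sigma_u - \xi u .
```

Plotting the estimated \xi and \sigma^{*} against the threshold and looking for the level above which both are roughly constant is the idea behind the threshold stability diagnostics.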
Traditionally, graphical diagnostics are employed to find a suitable threshold. Coles (2001) suggested three types of graphical diagnostics: the mean residual life plot (MRLP), the threshold stability (TS) plot, and other tools such as Q-Q, P-P and return level plots. Li, Cai and Campbell (2004) studied extreme rainfall in Southwest Western Australia using the POT-GP approach and identified the threshold based on the TS plot. Langousis et al. (2016) performed a critical review of representative methods for threshold selection for the POT model based on the GP distribution. This study suggested using the MRLP over other methods as it is less sensitive to record length. However, graphical diagnostics have some drawbacks. For example, Sccarrott and Macdonald (2012) noted that the graphical method requires the practitioner to have substantial experience, that selecting a threshold can be subjective, and that the associated uncertainty is difficult to quantify. Several proposals aim to overcome this drawback by automating the threshold selection for the POT-GP model through a computer program (Dupuis 1999; Liang et al. 2019; Solari and Losada 2012; Thompson et al. 2009). The main objectives of such computer programs are to quantify the uncertainty in the flood estimates associated with different sets of estimated parameters, and to guide the threshold selection using goodness-of-fit (GOF) statistics, so as to determine a range of suitable thresholds for which the GP distribution hypothesis is not rejected at a given significance level.
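The quantity plotted in an MRLP can be computed with a few lines of code. The following is a minimal sketch, assuming an independent series of peaks; the function name and the normal-approximation 95% band are choices made here for illustration. For a GP tail with shape below one, the mean excess is linear in the threshold, so the analyst looks for the level above which the curve is approximately straight.

```python
import numpy as np

def mean_residual_life(peaks, thresholds):
    """Mean excess over each threshold with an approximate 95% band.

    Returns rows of (threshold, mean excess, lower, upper, sample size).
    """
    peaks = np.asarray(peaks, dtype=float)
    rows = []
    for u in thresholds:
        excess = peaks[peaks > u] - u
        if excess.size < 2:                 # too few exceedances to estimate a spread
            continue
        mean_excess = excess.mean()
        half_width = 1.96 * excess.std(ddof=1) / np.sqrt(excess.size)
        rows.append((u, mean_excess, mean_excess - half_width,
                     mean_excess + half_width, excess.size))
    return np.array(rows)

# Synthetic check: for exponential data the mean excess is flat and equals the scale
rng = np.random.default_rng(1)
synthetic = rng.exponential(scale=100.0, size=20000)
mrl = mean_residual_life(synthetic, thresholds=[0.0, 50.0, 100.0])
```

The widening of the band at high thresholds, where few exceedances remain, is one reason why reading an MRLP is regarded as subjective.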
The automation of threshold selection builds on the properties of the POT-GP model. Davison and Smith (1990) suggested using the Anderson-Darling (AD) GOF test to select a range of threshold candidates for the POT-GP model based on an acceptable normality p-value (ND). This methodology was applied by Solari et al. (2017); in their study, the thresholds obtained with the POT-GP-ND approach mostly agreed with those obtained using the traditional graphical method. Durocher et al. (2018a, b) compared several automated threshold selection techniques based on the POT-GP-ND approach and then proposed a hybrid method. The suggested method has a lower limit of one peak per year (PPY) and an upper limit of 5 PPY, which accommodates both the 1.6 PPY recommended by Cunnane (1973) (at least 1.63 PPY is needed for the POT framework to have a smaller sample variance than the AM framework) and the practical guideline of between 1 and 3 PPY by Lang et al. (1999). This study found that the hybrid method gave a consistent shape parameter at most of the selected sites. Zoglat et al. (2014) proposed another method using square error (SE), which was applied by Gharib et al. (2017). This method aims to find the optimum threshold by locating the minimum SE between simulated and observed flood quantiles. Curceac et al. (2020) evaluated the POT-GP-SE and POT-GP-ND approaches and proposed an empirical automated threshold selection method based on fitting a cubic curve to the TS plot. They found that the proposed method based on the TS plot had the greatest agreement of indices between empirical and theoretical quantiles at different time scales (15 min to daily). They highlighted the need for further research on the combined effects of data scale, threshold selection and the estimator of the shape parameter of the GP distribution. Hu et al. (2020) applied the POT model in the USA and noted that when an automatic threshold selection method was adopted with a shorter data length, the POT model was unable to offer any additional benefit over the AMF model.
Noteworthy, a single threshold for POT-GP approach might not be suitable for all situations. To overcome this, Deidda (2010) proposed a multiple threshold method (MTM) and found that MTM was better as compared to a single threshold through Monte Carlo simulation. Later, a quantitative assessment was performed by Emmanouil et al. (2020) for comparison of estimated quantiles using several approaches such as, AM, POT-MTM and multifractal approach.
Based on the above discussion, it is evident that the associated uncertainty with the POT-GP model does not arise only due to model assumption as noted in Sect. 3, but it is due to the combined effect of model assumption, threshold selection, data scale and parameter estimator. Future research could include the extent of the statistical argument in an automated algorithm and explore the upper limit of applying MRLP. Figure 3 illustrates different threshold selection methods in POT modelling.

Parameter estimator
Parameter estimation is an essential step in POT based FFA, as it is in the AM model. Commonly applied estimators include maximum likelihood (ML), method of moments (MOM), probability weighted moments (biased/unbiased (PWMB/PWMU)), generalized probability weighted moments (GPWM) and other methods. This section briefly reviews the advances in parameter estimators in relation to FFA based on POT and the GP distribution. Pickands (1975) introduced the GP distribution, and Hosking and Wallis (1987) first adopted this distribution in FFA. Hosking and Wallis (1987) compared the performance of MOM, PWM and ML for estimating the GP parameters. They concluded that PWM and MOM are preferred over ML for quantile estimation, except for large sample sizes, which aligns with Bobée et al. (1993) and Zhou et al. (2017a, b, c). MOM and PWM were preferred by Hu et al. (2020) and Metzger et al. (2020), respectively. ML has been widely adopted in many studies despite limited sample sizes (Martins and Stedinger 2001; Mostofi Zadeh et al. 2019; Nagy et al. 2017; Ngongondo, Zhou and Xu 2020; Zhao et al. 2019a, b; Zhou et al. 2017a, b, c). Madsen et al. (1997) compared the performance of parameter estimators between the AM and POT-GP models. Recently, Curceac et al. (2020) applied several commonly used estimators in a Monte Carlo experiment and found that PWMU and PWMB are consistently the least biased and the least sensitive to sample size, which is also in agreement with Hosking and Wallis (1987).
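The MOM and PWM estimators for the two-parameter GP distribution have closed forms, which the following minimal sketch implements. It assumes the excesses over the threshold are supplied, uses the unbiased sample estimate of the first probability weighted moment (the PWMU variant), and returns the shape with the sign convention in which a positive value denotes a heavy tail (the negative of the shape k used by Hosking and Wallis 1987). The function names are hypothetical.

```python
import numpy as np

def gp_mom(excess):
    """Method-of-moments estimates (scale, shape xi) of the GP distribution."""
    x = np.asarray(excess, dtype=float)
    m, v = x.mean(), x.var(ddof=1)
    k = 0.5 * (m * m / v - 1.0)             # Hosking-Wallis shape k = -xi
    scale = 0.5 * m * (m * m / v + 1.0)
    return scale, -k

def gp_pwm(excess):
    """Unbiased probability-weighted-moments estimates (scale, shape xi)."""
    x = np.sort(np.asarray(excess, dtype=float))   # ascending order statistics
    n = x.size
    j = np.arange(1, n + 1)
    a0 = x.mean()                                  # estimate of E[X]
    a1 = np.sum((n - j) / (n - 1.0) * x) / n       # estimate of E[X (1 - F(X))]
    k = a0 / (a0 - 2.0 * a1) - 2.0
    scale = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return scale, -k

# Synthetic GP excesses generated by inverse transform (xi = 0.1, scale = 50)
rng = np.random.default_rng(42)
uniform = rng.uniform(size=50000)
sample = 50.0 / 0.1 * ((1.0 - uniform) ** (-0.1) - 1.0)
mom_scale, mom_shape = gp_mom(sample)
pwm_scale, pwm_shape = gp_pwm(sample)
```

Note that the MOM estimator relies on a finite variance, so it is only meaningful for shape values below one half in this sign convention.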
Other estimators were also evaluated in FFA using the POT-GP model. For example, Ashkar and Ouarda (1996) evaluated the generalized method of moments (GMOM) for different shape parameters using observed and simulated data. Rasmussen (2001) developed a GPWM method and provided practical guidelines on it. In this study, GPWM was found to outperform PWM, but only a relatively small difference was observed when compared with MOM. Martins and Stedinger (2001) examined the performance of the generalized maximum likelihood (GML) estimator and compared it with MOM and L moments (LMOM). Kang and Song (2017) reviewed six estimators for the GP distribution and found that the nonlinear least square-based method with a modified POT series outperformed the other estimators.
Selection of the best estimator for the POT-GP model is an area that needs further research. The uncertainty due to the parameter estimator is difficult to quantify. Ashkar and Tatsambon (2007) evaluated the upper bound of the GP distribution by applying different estimators, including MOM, ML, PWM and GPWM, through simulation studies. They found that the upper bound of the GP distribution is inconsistent between estimated and observed data. Later, Gharib et al. (2017) proposed a two-step framework for selecting both threshold selection techniques and the associated parameter estimators. However, the study was inconclusive (only the POT-GP-SE method was assessed), and the study stressed the need to assess the combined effects of estimator and threshold selection, similar to Curceac et al. (2020).
The Bayesian approach can be used to estimate distributional parameters in FFA for both AM and POT models. Here, the parameter of a distribution is treated as a random variable, and the knowledge on a parameter is expressed by a prior distribution. In the context of PDS, Madsen et al. (1994) adopted a regional Bayesian approach for extreme rainfall modelling in Denmark where the empirical regional distributions of the parameters of the POT model were used as prior information for both exponential and GP distributions. Parent and Bariner (2003) also adopted a Bayesian approach to deal with the classical Poisson-Pareto-POT model for design flood estimation in the Garonne river in France. The Bayesian POT approach was also adopted by Ribatet et al. (2007) and Silva et al. (2017). Figure 4 illustrates the MRLP for an Australian stream gauging station (223204 Nicholson River at Deptford) having 56 years of streamflow data. This presents a comparison of the thresholds identified based on MLE and PWMU in the MRLP. It can be seen that there is no remarkable distinction in the thresholds identified using the two estimators having the same p-value (Pan and Rahman 2021).
To summarize, it may be stated that the most applied parameter estimators for POT model include MOM, PWM and ML provided the sample size is large. The application of Bayesian approach in POT modelling is limited. We conclude that the sample size and range of the shape parameters are the primary considerations for selecting the parameter estimator. A choice of parameter estimator should be made based on the nature of a given data set. There is a lack of general practice guideline for selecting parameter estimator combined with threshold selection technique, and there is a lack in assessment of the upper bound of GP distribution.

Probability distributions
Fitting a probability distribution to observed flood data is a primary step in any FFA exercise. The GP distribution and its reduced form, the exponential distribution, remain the most popular distributions in POT based FFA (Bobée et al. 1993; Davison and Smith 1990; Lang et al. 1997; Lang et al. 1999; Madsen et al. 1997; Rasmussen and Rosbjerg 1991; Rosbjerg, Madsen and Rasmussen 1992; Silva et al. 2014; Yiou et al. 2006; Zhao et al. 2019a, b). In this regard, use of extreme value theory by Pickands (1975) is justified. The GP distribution (two-parameter) is preferred over the exponential (one-parameter) based on flexibility in modelling. However, use of the GP distribution is restricted by the complexity of selecting a statistically sound threshold, as discussed in Sect. 4. The GP distribution is well-known for its flexibility in modelling upper tail behavior, which is typical for observed flood records and other environmental extreme events. The GP distribution is also widely employed in other disciplines for forecasting, trend analysis and risk assessment (Kiriliouk et al. 2019). Other probability distributions have also been applied in POT based FFA but have remained unpopular. For example, Bačová-Mitková and Onderka (2010) applied the Weibull distribution in POT based FFA and compared the obtained results with AM based FFA. This study concluded that POT based FFA could produce comparable quantile estimates, especially for a shorter record length. Ashkar and Ba (2017) compared the Kappa distribution with the GP due to their inherent similarity. Chen et al. (2010) proposed a bi-variate joint distribution for POT based FFA. Figure 5 illustrates fitting of the GP distribution to the POT series for stream gauging station 419016 (Cockburn River at Mulla Crossing in New South Wales, Australia). It shows that the POT2-ND-MLE model provides the best fit to the observed flood data (POT2 indicates 2 events per year being selected in the POT series). Also, the POT3-ND-MLE model provides a very good fit to the observed flood data, in particular in the frequent flood range (smaller ARIs).
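To make the link between a fitted POT-GP model and a design flood explicit, the sketch below computes the T-year flood quantile from the threshold u, the GP scale and shape, and the mean number of exceedances per year. It uses the standard return level expression x_T = u + (scale/shape) * ((rate * T)^shape - 1), with the logarithmic limit for zero shape. The function names and the synthetic data are hypothetical; the ML fit relies on `scipy.stats.genpareto.fit` with the location fixed at zero.

```python
import numpy as np
from scipy.stats import genpareto

def pot_gp_quantile(u, scale, shape, rate, T):
    """Flood quantile exceeded on average once every T years (POT sense)."""
    lam_t = rate * np.asarray(T, dtype=float)   # expected exceedances in T years
    if abs(shape) < 1e-8:                       # exponential limit of the GP
        return u + scale * np.log(lam_t)
    return u + scale / shape * (lam_t ** shape - 1.0)

def fit_pot_gp(peaks, u, n_years):
    """ML fit of the GP to excesses over u; returns (scale, shape, rate)."""
    peaks = np.asarray(peaks, dtype=float)
    excess = peaks[peaks > u] - u
    shape, _, scale = genpareto.fit(excess, floc=0.0)   # location fixed at 0
    rate = excess.size / n_years                        # mean events per year
    return scale, shape, rate

# Hypothetical record: 200 years with 400 independent peaks above u = 100
rng = np.random.default_rng(3)
peaks = 100.0 + rng.exponential(50.0, size=400)
scale, shape, rate = fit_pot_gp(peaks, u=100.0, n_years=200)
q50 = pot_gp_quantile(100.0, scale, shape, rate, T=50.0)
```

The recurrence interval here is defined on the POT series; for frequent floods it differs from the AM-based return period, which should be kept in mind when the two models are compared.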
Although the GP distribution remains the most popular distribution in POT based FFA, the associated uncertainty at higher return periods is difficult to quantify. There are proposals to enhance the model fitting to POT series by introducing GP-based mixture models. For example, Solari and Losada (2012) proposed a unified statistical model, a log-normal distribution mixed with the GP, and quantified the uncertainty associated with its tail behavior. However, the study was inconclusive and required further investigation to assess the combined effects (Curceac et al. 2020).
Another overlooked area is the bulk data below the threshold, as the GP distribution is only suited to approximating the behavior of elements above the threshold. Several mixture models (covering the data below and above the threshold) are reviewed by Sccarrott and Macdonald (2012), yet the uncertainty associated with these models still cannot be quantified. The difficulties introduced by mixture models are the transition point between the two distribution functions and the accommodation of site specifics in the modelling.
Goodness-of-fit (GOF) measures, such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), Kolmogorov-Smirnov (KS) test and Anderson-Darling (AD) test, are commonly applied in selecting probability distribution(s) by comparing the empirical and theoretical distributions. GOF is commonly applied in AM based FFA and can be used as a supplementary verification tool in POT based FFA. Choulakian and Stephens (2001) assessed the fit of the GP distribution at 238 stream gauging stations in Canada by applying the AD and Cramér-von Mises statistics and found that the GP distribution provided an adequate fit to the observed POT data series. Gharib et al. (2017) assessed six parameter estimators using Relative Mean Square Error (RMSE) and the AD test for the proposed framework based on the shape parameter. This study found that, using the AD test, the proposed framework improved the fit for 38% of the stations by an average of 65%. Laio, Di Baldassarre and Montanari (2009) reviewed AIC, BIC and AD, and concluded that the GOF tests produced satisfactory results. Haddad and Rahman (2011) also assessed several GOF tests and found from Monte Carlo simulation that the AD criterion (ADC) was more successful than AIC and BIC in recognizing the parent distribution correctly when the parent was a three-parameter distribution. On the other hand, AIC and BIC were better than ADC in recognizing the parent distribution correctly when the parent was a two-parameter distribution. Heo et al. (2013) proposed a modified AD test to assess the POT model. In a regional POT framework, GOF is vital for assessing the fit for individual sites and groups of sites. Silva et al. (2016) applied AIC, BIC and the likelihood ratio test in their study.
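As a simple illustration of how the information criteria mentioned above can be used in a POT setting, the following sketch compares the exponential and GP distributions fitted to the same excesses. It is not taken from any of the cited studies; the function names are hypothetical, AIC = 2p - 2 ln L and BIC = p ln n - 2 ln L are the standard definitions, and the GP is fitted by ML with `scipy.stats.genpareto.fit`.

```python
import numpy as np
from scipy.stats import expon, genpareto

def aic_bic(loglik, n_params, n_obs):
    """Standard AIC and BIC from a maximized log-likelihood."""
    aic = 2.0 * n_params - 2.0 * loglik
    bic = n_params * np.log(n_obs) - 2.0 * loglik
    return aic, bic

def compare_exp_gp(excess):
    """AIC/BIC of the exponential (1 parameter) and GP (2 parameters) fits."""
    x = np.asarray(excess, dtype=float)
    n = x.size
    scale_e = x.mean()                                   # ML estimate, exponential scale
    ll_e = expon.logpdf(x, scale=scale_e).sum()
    shape_g, _, scale_g = genpareto.fit(x, floc=0.0)     # ML fit, location fixed at 0
    ll_g = genpareto.logpdf(x, shape_g, loc=0.0, scale=scale_g).sum()
    return {"exponential": aic_bic(ll_e, 1, n), "GP": aic_bic(ll_g, 2, n)}

# Hypothetical excesses over a threshold
rng = np.random.default_rng(11)
criteria = compare_exp_gp(rng.exponential(50.0, size=500))
```

The distribution with the lower AIC or BIC would be preferred; since BIC penalizes the extra GP parameter more heavily for large samples, the two criteria can disagree.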
The parent probability distribution remains unknown at a given site. There are limited studies to quantify the uncertainty in fitting GP distribution to the POT data series, and current practice in comparing the flood estimates with either observed data or estimated flood quantiles by AM based FFA may not be adequate.

Regional flood frequency analysis
Regional flood frequency analysis (RFFA) is used to estimate design floods at ungauged catchments or at gauged catchments with limited data length or having data with poor quality (Komi et al. 2016;Walega et al. 2016). In RFFA, AM flood data have widely been adopted (Haddad, Rahman and Stedinger 2012), and only minor attention has been given to the POT-based RFFA methods (Kiran and Srinivas 2021).
Identifying a hydrologically similar group, or homogeneous region, is the first step in any RFFA. Three main categories of homogeneous regions are considered, which are based on catchment attributes, flood data and geographical proximity (Ashkar 2017; Paixao et al. 2015; Shu & Burn 2004). Traditionally, geographical proximity is commonly adopted to form homogeneous groups. Other methods use catchment characteristics data to form homogeneous groups (Bates et al. 1998). The popular attributes considered in forming homogeneous groups for RFFA in Canada have been reviewed and include geographical proximity, flood seasonality, physiographic variables, monthly precipitation pattern, and monthly temperature pattern. A revised RFFA algorithm, based on AM based RFFA and designed to satisfy the 5T rule initially suggested by Hosking and Wallis (1987) and Reed et al. (1999), has also been proposed. Rahman et al. (2020) performed an independent component analysis using data from New South Wales, Australia; however, it considered AM flood data and homogeneity was not specifically considered. All of the above methods to identify homogeneous regions used AM flood data. Research on homogeneous regions using POT data is limited to date. Cunderlik and Burn (2002) proposed a site-focused pooling technique based on flood seasonality, namely the flood regime index, to increase the number of initial homogeneous groups and found it to be superior to the mean day of flood (MDF) descriptor; however, the sampling variability was not considered. Cunderlik and Burn (2006) later proposed a new pooling group for flood seasonality based on nonparametric sampling, where the similarity between the target site and potential site was assessed by the minimum confidence interval of the intersection of Mahalanobis ellipses. Shu and Burn (2004) developed a method using a fuzzy expert system to derive an objective similarity measure between catchments.
There are also other methods for forming homogeneous groups (Burn and Goel 2000; Cord 2001). Carreau et al. (2017) proposed an alternative approach that uses hazard level to partition the region into sub-regions for the POT model, formulating the approach as a mixture of GP distributions.
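Flood seasonality descriptors such as the MDF mentioned above are usually computed with directional (circular) statistics, because dates late in December and early in January are close in time but far apart numerically. The sketch below is a generic illustration of that idea under stated assumptions (event dates given as day of year, a 365.25-day year, and a hypothetical function name); it is not the flood regime index of Cunderlik and Burn (2002).

```python
import math

def seasonality(days_of_year, year_length=365.25):
    """Return the mean day of flood and a regularity measure in [0, 1].

    Each event date is mapped to an angle on the unit circle, the
    angles are averaged as vectors, and the mean angle is mapped
    back to a day of year.
    """
    angles = [2.0 * math.pi * d / year_length for d in days_of_year]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    mean_angle = math.atan2(y, x) % (2.0 * math.pi)
    mean_day = mean_angle * year_length / (2.0 * math.pi)
    regularity = math.hypot(x, y)  # 1 = all events on the same day
    return mean_day, regularity

# Hypothetical POT event dates (day of year) at one site
events = [352, 360, 3, 11, 340, 358, 7]
mean_day, regularity = seasonality(events)
print(round(mean_day, 1), round(regularity, 3))
```

Because POT sampling retains several events per year, it typically provides more dates per site for this kind of descriptor than AM sampling does.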
The index flood method was proposed by Dalrymple (1960) and remains one of the most popular methods in AM based RFFA (Hosking & Wallis 1993; O'Brien and Burn 2014; Robson and Reed 1999). This is due to its simplicity in developing a regional growth curve and weighting the sites by index variables such as the mean annual flood. The index flood approach was applied with POT data by Madsen and Rosbjerg (1997b), where the GP shape parameter was regionalized. They examined the impacts of regional heterogeneity and inter-site dependence on the accuracy of quantile estimation and found that POT-based RFFA was more accurate than the at-site FFA estimate even for extremely heterogeneous regions. They also noted that modest inter-site dependence had only minor effects on the accuracy of the POT based index flood method. POT based RFFA was further explored by Madsen and Rosbjerg (1997a) and Madsen et al. (2017). Roth et al. (2012) developed a nonstationary index flood method using POT data based on a composite likelihood test. Mostofi Zadeh et al. (2019) performed a pooled analysis based on both AM and POT data (forming pooling groups with the AM technique and applying them to the POT model, and vice versa) and concluded that the POT model pooling group reduced uncertainty in design flood estimates. The quantile regression technique has been widely employed under the AM model. Durocher et al. (2019) compared four estimators based on the index flood method and the quantile regression technique, including regression analysis, L-moments and the likelihood method, using POT data.
Gupta et al. (1994) noted that the coefficient of variation of AM flows should not vary with catchment area in a proposed region/group. This condition may not be satisfied in many regions, which led to Bayesian approaches. Madsen, Rosbjerg and Harremoës (1994) studied such an approach by using an exponential distribution in RFFA with POT data. This study adopted total precipitation depth and maximum 10-min rainfall intensity of individual storms for Bayesian inference. The proposed method was found to be preferable for estimating design floods of return period less than 20 years. Madsen and Rosbjerg (1997a) proposed an index flood method based on a Bayesian approach, which combined the concept of index flood with empirical Bayesian approximation so that inference on regional information can be made with more accuracy. Ribatet et al. (2007) implemented the Markov Chain Monte Carlo (MCMC) technique along with the GP distribution to sample the posterior distribution. Silva et al. (2017) studied the Bayesian inferential paradigm coupled with MCMC under the POT framework. Some attention has been drawn to using historical information in RFFA to enhance the accuracy of flood estimates. Sabourin and Renard (2015) proposed a new model utilising historical information, similar to Hamdi et al. (2019). Kiran and Srinivas (2021) used POT data from 1031 USA catchments to develop a regression based RFFA technique. They noted that the scale and shape parameters of the GP distribution fitted to the POT data were largely governed by catchment size and 24-h rainfall intensity corresponding to the 2-year return period.
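The index flood idea with POT data can be sketched as follows. This is a simplified illustration, not the procedure of Madsen and Rosbjerg (1997b): it assumes hypothetical site data, takes the mean exceedance as the index variable, estimates each at-site GP shape by the method of moments, and regionalizes the shape as a record-length-weighted average. The sign convention (positive shape means a heavy tail) and all names are assumptions.

```python
import math

def gp_shape_moments(exceedances):
    """Method-of-moments estimate of the GP shape (valid for shape < 0.5)."""
    n = len(exceedances)
    mean = sum(exceedances) / n
    var = sum((y - mean) ** 2 for y in exceedances) / (n - 1)
    return 0.5 * (1.0 - mean * mean / var)

def regional_shape(sites):
    """Average of at-site shapes, weighted by the number of events."""
    total = sum(len(s["exceedances"]) for s in sites)
    return sum(len(s["exceedances"]) * gp_shape_moments(s["exceedances"])
               for s in sites) / total

def growth_factor(shape, rate, T):
    """Dimensionless T-year exceedance for a GP with unit mean.

    A unit-mean GP has scale = 1 - shape; `rate` is the mean number
    of exceedances per year, so rate * T events are expected in T years.
    """
    m = rate * T
    if abs(shape) < 1e-9:
        return math.log(m)
    return (1.0 - shape) / shape * (m ** shape - 1.0)

def site_quantile(site, shape, T):
    """Rescale the regional growth factor by the site's index variable."""
    exc = site["exceedances"]
    index_flood = sum(exc) / len(exc)   # mean exceedance as index variable
    rate = len(exc) / site["years"]
    return site["threshold"] + index_flood * growth_factor(shape, rate, T)

# Hypothetical sites: threshold, record length (years), exceedances
sites = [
    {"threshold": 50.0, "years": 10,
     "exceedances": [5, 12, 3, 30, 8, 17, 2, 45, 9, 14, 6, 22]},
    {"threshold": 120.0, "years": 8,
     "exceedances": [10, 40, 25, 7, 90, 15, 33, 5, 60, 20]},
    {"threshold": 20.0, "years": 12,
     "exceedances": [1.5, 4, 0.8, 9, 2.2, 6.5, 3.1, 12, 0.5, 5, 2.8, 7.4, 1.1, 15]},
]
xi_regional = regional_shape(sites)
for s in sites:
    print(round(site_quantile(s, xi_regional, 10), 1),
          round(site_quantile(s, xi_regional, 50), 1))
```

In this sketch only the shape parameter is shared across sites, while the threshold, exceedance rate and index variable remain site specific.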

Stationarity
Stationarity is one of the most critical concepts in applying extreme value theory in hydrology. It implies that the estimated parameters of the given probability distributions do not change with time, i.e., the current parameters used for modelling remain constant for the future so that the quantile estimates remain consistent over time. However, in assessing the AM and POT time series data, a trend or a jump may be found in many cases, which undermines the stationarity assumption (Ishak and Rahman 2015; Ishak et al. 2013). The identified anomalies may be statistically significant or insignificant, and may be due to climate change or other reasons such as land use changes (Burn et al. 2010; Cunderlik et al. 2007; Cunderlik and Ouarda 2009; Ngongondo et al. 2013; Ngongondo, Zhou and Xu 2020; Silva et al. 2012; Zhang, Duan and Dong 2019). A recent application of POT with non-stationarity was proposed by Lee, Sim and Kim (2019) for extreme rainfall analysis.
To account for non-stationarity, the parameters of the distribution require adjustment as a function of time. El Adlouni et al. (2007) and Villarini et al. (2009) expressed the parameters of a probability distribution as functions of temporal covariates. Koutsoyiannis (2006) evaluated the two commonly applied approaches for nonstationary analysis and argued that the common FFA approaches are not consistent with the rationale of the stationary analysis.
In POT modelling, the GP distribution is the most commonly used distribution. In the non-stationary approach, the GP distribution commonly has a constant shape parameter and a varying scale parameter (time-dependent, or dependent on climate indices used as covariates). In this regard, Coles (2001) argued that even under stationary conditions, the shape parameter is difficult to estimate. Regression analysis is commonly applied to quantify the trend before applying the variation in the scale parameter (Vogel, Yaindl and Walter 2011). Moreover, for the GP distribution, the threshold can also be treated as time-dependent (Kyselý et al. 2010). Recently, Vogel and Kroll (2020) compared several estimators for non-stationary frequency analysis.
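A GP model with constant shape and time-dependent scale, as described above, can be written down in a few lines. The sketch below assumes a log-linear link, scale(t) = exp(a + b t), which is one common choice but is not prescribed by the studies cited; the function and parameter names are hypothetical. It gives the negative log-likelihood that an optimiser would minimise, and the resulting time-dependent return level.

```python
import math

def gp_nll_timevarying(params, times, exceedances):
    """Negative log-likelihood of a GP with scale(t) = exp(a + b*t).

    params = (a, b, shape); the shape is held constant in time.
    Returns infinity when an observation lies outside the support.
    """
    a, b, shape = params
    nll = 0.0
    for t, y in zip(times, exceedances):
        scale = math.exp(a + b * t)
        z = 1.0 + shape * y / scale
        if z <= 0.0:
            return math.inf
        if abs(shape) < 1e-9:            # exponential limit
            nll += math.log(scale) + y / scale
        else:
            nll += math.log(scale) + (1.0 / shape + 1.0) * math.log(z)
    return nll

def return_level(a, b, shape, threshold, rate, T, t):
    """T-year level evaluated with the scale parameter of year t.

    `rate` is the mean number of exceedances per year (assumed constant).
    """
    scale = math.exp(a + b * t)
    m = rate * T
    if abs(shape) < 1e-9:
        return threshold + scale * math.log(m)
    return threshold + scale / shape * (m ** shape - 1.0)

# Hypothetical parameter values, for illustration only
a, b, shape = math.log(20.0), 0.01, 0.1
for year in (0, 25, 50):
    print(year, round(return_level(a, b, shape, 100.0, 2.0, 50, year), 1))
```

Setting b = 0 recovers the stationary GP model, so the two models are nested and could be compared with a likelihood ratio test or an information criterion.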
In the context of POT based RFFA, Roth et al. (2012) examined non-stationarity by varying the scale parameter using the index flood method and suggested a time-dependent regional growth curve for temporal trends observed in the study data set. Silva et al. (2014) proposed a zero-inflated Poisson GP model for the non-stationary condition, and a non-stationary RFFA technique based on the Bayesian method has also been proposed (Le Vine 2016; Parent and Bernier 2003; Silva et al. 2017). Mailhot et al. (2013) proposed a POT based RFFA approach at a finer resolution using rainfall time series data. O'Brien and Burn (2014) studied the nonstationary index flood method using AM flood data. They noted the challenge of forming a homogeneous group, which was due to several sites presenting a significant level of nonstationarity. Durocher et al. (2019) compared several estimators under nonstationary conditions using the index flood method for a data set of 425 Canadian stations and found that the L-moments approach was more robust and less biased than the ML estimator. This study also found that a hybrid pooling group approach, which included sites with stationary and nonstationary conditions, improved the accuracy of the quantile estimates. The recent study by Agilan, Umamahesh and Mujumdar (2020) reported that the uncertainty due to threshold selection under non-stationary conditions is 54% higher than that under stationary conditions. Reed and Vogel (2015) questioned the applicability of the return period concept in FFA under nonstationary conditions. They demonstrated how a parsimonious nonstationary lognormal distribution can be linked with nonstationary return periods, risk, and reliability to gain a deeper understanding of future flood risk. For non-stationary POT models, the risk and reliability concepts need to be further explored, as suggested by Reed and Vogel (2015).
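The difficulty with the return period concept under nonstationarity can be seen from a small reliability calculation. The sketch below is a generic illustration rather than the lognormal model of Reed and Vogel (2015): it assumes independent years and a hypothetical 2% annual growth in the exceedance probability of a fixed design flood, and the function name is illustrative.

```python
def reliability(annual_exceedance_probs):
    """Probability that the design flood is not exceeded in any year.

    Assumes independence between years; each entry is the exceedance
    probability of the design flood in one year of the planning horizon.
    """
    r = 1.0
    for p in annual_exceedance_probs:
        r *= 1.0 - p
    return r

horizon = 50  # years

# Stationary case: a "100-year" flood has p = 0.01 in every year
stationary = [0.01] * horizon

# Hypothetical nonstationary case: p grows by 2% per year
nonstationary = [0.01 * 1.02 ** t for t in range(horizon)]

print(round(reliability(stationary), 3))
print(round(reliability(nonstationary), 3))
```

In the nonstationary case no single return period describes the design flood, whereas reliability over the planning horizon remains well defined.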
Iliopoulou and Koutsoyiannis (2019) developed a probabilistic index based on the probability of occurrence of POT events that can reveal clustering linked to the persistence of the parent process. They found that rainfall extremes could exhibit notable departures from independence, which could have important implications for POT based FFA under both stationary and non-stationary regimes. Thiombiano et al. (2017, 2018) showed how climate change indices can be used as covariates in a non-stationary POT modelling framework.

Discussion
Traditional FFA based on the AM model is the most popular FFA method given its straightforward sampling process and the availability of a wide range of literature and guidelines. Even though POT based FFA has theoretical advantages, it is still under-employed. Table 1 summarises studies in which POT based FFA has been examined. It should be mentioned that although many researchers have examined the suitability of POT based FFA, its inclusion in flood estimation guides is limited. For example, Australian Rainfall and Runoff (ARR) 2019 stated that a POT-based method can be adopted for FFA, but its FLIKE software does not include any POT-based analysis (Kuczera and Franks 2019). They stated that POT is more appropriate in urban stormwater applications and for diversion works, coffer dams and other temporary structures; for most of these cases recorded flow data are unavailable. In design rainfall estimation for Australia in ARR 2019, POT-based methods have been adopted to estimate more frequent design rainfalls (Green et al. 2019).
Unlike the AM model, which extracts only a single value per year, the POT model is more complex in its sampling process. Cunnane (1973) argued for at least 1.6 events per year on average in the POT model to provide less biased estimates than the AM model. However, with the recent advances in computational modelling, Durocher et al. (2018a, b) applied an upper limit of 5 events per year in POT based FFA and obtained results comparable to the AM model. Ensuring the independence of the extracted data is another difficulty in applying the POT model. The two commonly applied criteria for constructing POT series, described in Sect. 4, have been criticized by Ashkar and Rousselle (1983) for possible violation of the model assumptions. POT based FFA is also constrained by the model assumptions, with either Poisson or binomial (or negative binomial) arrivals. Önöz and Bayazit (2001) reported comparable results even when the assumptions are violated, although the associated uncertainty is not well studied.
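The POT sampling step can be sketched with a simple time-separation rule. The sketch below does not reproduce the independence criteria of Sect. 4; it assumes a hypothetical minimum separation between retained peaks (in time steps) and keeps the larger peak when two exceedances are too close. The function name and the flow values are illustrative only.

```python
def extract_pot(flows, threshold, min_sep):
    """Extract independent peaks above a threshold.

    Exceedances are visited from largest to smallest; a candidate is
    kept only if it lies at least `min_sep` time steps away from every
    peak already kept. Returns (index, flow) pairs in time order.
    """
    candidates = [(q, i) for i, q in enumerate(flows) if q > threshold]
    candidates.sort(key=lambda c: (-c[0], c[1]))  # largest first
    kept = []
    for q, i in candidates:
        if all(abs(i - j) >= min_sep for _, j in kept):
            kept.append((q, i))
    return sorted((i, q) for q, i in kept)

# Hypothetical flow series covering 2 years (illustrative values)
flows = [1, 5, 2, 6, 1, 1, 1, 7, 1, 3, 1]
peaks = extract_pot(flows, threshold=2.5, min_sep=3)
years = 2
print(peaks)                 # [(3, 6), (7, 7)]
print(len(peaks) / years)    # average number of events per year: 1.0
```

Raising the threshold or the minimum separation lowers the average number of events per year, which is how the sample size of a POT series is controlled in practice.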
Another well-known difficulty is identifying the threshold for the GP distribution in POT modelling. Based on the critical review of current methods by Langousis et al. (2016), MRLP is found to be the most effective detection method, leading to the least biased design flood estimates. However, this study suggested further research on an automated procedure for POT data construction under stationary conditions with additional statistical arguments. Nonstationary POT based FFA has attracted more attention recently (Mostofi Zadeh et al. 2019); however, the findings of these studies are not conclusive, and further research is warranted on nonstationary POT based FFA.
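Assuming that MRLP refers to the mean residual life plot, the quantity behind it is the mean excess over each candidate threshold. For GP-distributed exceedances the mean excess is a linear function of the threshold, so the lowest threshold above which the plot looks approximately linear is a common candidate. The sketch below computes the plotted values only; the data and the function name are hypothetical.

```python
def mean_excess(data, thresholds):
    """Mean excess over each threshold, with the number of exceedances.

    Returns (threshold, mean excess, count) tuples; thresholds with no
    exceedances are skipped.
    """
    rows = []
    for u in thresholds:
        excesses = [x - u for x in data if x > u]
        if excesses:
            rows.append((u, sum(excesses) / len(excesses), len(excesses)))
    return rows

# Hypothetical independent flood peaks (illustrative values)
peaks = [112, 95, 150, 210, 88, 134, 176, 99, 305, 121,
         143, 92, 260, 108, 187, 97, 166, 118, 230, 101]
for u, me, n in mean_excess(peaks, range(90, 200, 20)):
    print(u, round(me, 1), n)
```

The count column matters as much as the mean excess: at high thresholds few exceedances remain, so the plotted values become noisy and visual judgement of linearity becomes subjective.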
Table 1 (recovered citation list): Cunnane (1973, 1979); Hosking and Wallis (1987); Acreman (1987); Davison and Smith (1990); Reed et al. (1999); Eastoe and Tawn (2010); Northrop and Jonathan (2011); Le Vine (2016); Ashkar et al. (1987); Ashkar and El Adlouni (2015); Ashkar and Ouarda (1996); Ashkar and Rousselle (1983); Ashkar and Tatsambon (2007); Irvine and Waylen (1986); Burn (1990a, b); Burn and Goel (2000); Burn and Whitfield (2016); Bobée et al. (1993); Adamowski (2000); Adamowski et al. (1998); Dupuis (1999); Shu and Burn (2004)
Uncertainty in flood estimates remains a challenging topic. With the recent floods in Europe and China, it is noted that traditional FFA approaches need an overhaul, and a comprehensive uncertainty analysis is warranted. The POT model is flexible in data extraction compared to the AM model, but this brings additional levels of uncertainty such as sample size variation (i.e., the average number of events per year is not fixed) and effects of the time scale of data extraction (e.g., 15 min, hourly, daily or monthly). Besides, the POT model requires a threshold to be determined for the GP distribution, for which no universal method has been proposed. To successfully extract the POT series, two independence criteria for retaining flood peaks are discussed in Sect. 4. However, the uncertainty associated with the independence criteria is not fully understood, although studies have shown that the independence criteria need to be situation specific. Threshold determination to suit the assumption of the GP distribution is another concern. The associated uncertainty is more complex in homogeneous group formation for POT modelling. Beguería (2005) performed a sensitivity analysis of the parameter and quantile estimates to threshold selection and stated that no unique optimum threshold value could be detected. Durocher et al. (2018a, b) also stated that the threshold selection affects trend detection significantly and that, currently, no acceptable method has been found. Moreover, as discussed in Sects. 5 and 6, various estimators, distribution functions, and GOF tests aim to reduce the uncertainty in POT based FFA.
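The sensitivity of quantile estimates to the threshold can be explored by refitting the model over a range of thresholds. The sketch below is not the analysis of Beguería (2005): it assumes hypothetical peaks from a 20-year record, fits the GP by the method of moments (valid for shape below 0.5, with positive shape meaning a heavy tail), and uses the standard POT quantile expression with the exceedance rate taken as events per year. All names and values are illustrative.

```python
import math

def pot_quantile(peaks, years, threshold, T):
    """T-year flood from a GP fitted by moments to exceedances of `threshold`."""
    exc = [q - threshold for q in peaks if q > threshold]
    n = len(exc)
    mean = sum(exc) / n
    var = sum((y - mean) ** 2 for y in exc) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)          # method-of-moments estimates
    scale = 0.5 * mean * (1.0 + ratio)
    m = (n / years) * T                  # expected exceedances in T years
    if abs(shape) < 1e-9:
        return threshold + scale * math.log(m)
    return threshold + scale / shape * (m ** shape - 1.0)

# Hypothetical independent peaks from a 20-year record
peaks = [112, 95, 150, 210, 88, 134, 176, 99, 305, 121, 143, 92, 260,
         108, 187, 97, 166, 118, 230, 101, 139, 155, 90, 410, 126]
years = 20
results = [(u, pot_quantile(peaks, years, u, 50)) for u in (85, 100, 120)]
for u, q in results:
    print(u, round(q, 1))
```

If the estimated quantile changes markedly between neighbouring thresholds, the threshold choice is contributing materially to the uncertainty of the design flood estimate.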
The AM model is certainly the most popular and well-studied FFA approach, but it also has limitations. The POT model is an alternative FFA approach, which has proven advantageous in many studies. The key advantages of applying the POT model in FFA are summarized below.
• The POT model is less limited than the AM model by short data length, because its sampling process makes the overall sample size controllable. This provides additional flexibility with the POT model.
• The POT model is proven to be efficient for arid/semi-arid regions, where streams may have low/zero flows in many years over the gauging period.
• The POT model is proven to be efficient in estimating very frequent to frequent flood quantiles, which are needed in environmental and ecological studies.
• Due to the controllable resolution of the time series, the POT model is better suited to representing trends and performing nonstationary FFA.
• The POT model can provide a bigger data set in the context of RFFA due to the nature of its data extraction process, which may be useful to regionalize very frequent floods.

Conclusion
Two main modelling approaches, annual maximum (AM) and peaks-over-threshold (POT), are adopted in FFA. The AM model is widely employed due to its straightforward sampling process, while the POT model is under-employed internationally. In this scoping review, we found that the POT model is more flexible than the AM model due to the nature of the data extraction process. It is found that POT based FFA can provide less biased estimates for small to medium-sized flood quantiles (in the more frequent ranges). Furthermore, the POT model is more suitable for design flood estimation in arid and semi-arid regions, as in these regions many years do not experience any runoff. Despite its advantages, the POT model has several complexities. The physical threshold determination (to ensure the independence of the extracted data points and to satisfy the model assumptions) is the first obstacle that discourages the wider application of the POT model in FFA. The effects of independence criteria on the uncertainty in the final flood estimates have not been examined thoroughly, nor have the combined effects of the extracted sample size and independence criteria.
The statistical threshold selection in applying the GP distribution in POT based FFA is the second obstacle. In this regard, a commonly accepted guideline has not been produced yet. In the critical review by Langousis et al. (2016), the MRLP is found to be a promising method of threshold detection; however, the scaling effect is not well examined (e.g., at what scale the MRLP is efficient for threshold detection). The suggested approach is to examine a more suitable statistical argument in the iterative/automated process. Additional complexity arises from the model assumption (Poisson or binomial/negative binomial), parameter estimators and distribution functions. The uncertainty of each individual component may not be significant; however, the combined effects of these aspects may increase the level of uncertainty in flood quantile estimates by the POT model, which requires further investigation.
We have found few recent studies involving a mixture of AM and POT modelling frameworks, and this needs further research as there are several unanswered questions with this combination. POT based RFFA also requires further research, as there are only a handful of studies on this topic; such research would be very useful to increase the accuracy of design flood estimates in ungauged catchments for smaller return periods, which are often needed in environmental and ecological studies. Non-stationary FFA based on the POT model needs further research, as the future of FFA lies in non-stationary approaches. Since the POT model has more parameters, the estimation of the effects of climate change on this model is more challenging, and this is an area that needs further research.
We also acknowledge the Australian Government for providing streamflow data used in this study.
Authors contribution XP selected articles for review, carried out analysis, and drafted the manuscript. AR reviewed and revised the manuscript thoroughly. KH reviewed and revised the manuscript, and TO discussed the concepts and reviewed the manuscript.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions. No funding was received for this study.
Data availability All the articles used in preparing this manuscript are available via Scopus and/or Google. Streamflow data used in this study can be obtained from Australian stream gauging authorities by paying a prescribed fee.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.