Best unbiased graphical estimators of location-scale distribution parameters: application to the Pozzuoli’s bradyseism earthquake data

Graphical techniques are recommended for critical applications because they allow for visual analysis and helpful understanding of the results, also for non-statisticians. However, “graphical” estimation methods are often criticized because they are less efficient with respect to “analytical” methods. This paper proposes a new general graphical method that leads to the best linear unbiased estimators of location-scale distribution parameters. Therefore, the reputation of graphical methods is raised and their strategic use is encouraged. An applicative example analyzes the earthquake magnitudes registered during the serious 1983–1984 bradyseismic crisis in Campi Flegrei, Italy. This graphical analysis will certainly work as a strategic reference picture to which the data, arising from the further bradyseismic crises expected in the next future, can be compared.


Introduction
As confirmed by the renewed interest appeared in the recent literature (Rigdon and Basu 1989;Makkonen 2006Makkonen , 2008ade Haan 2007;Cook 2011Cook , 2012Kim et al. 2012;Erto and Lepore 2013;Fuglem et al. 2013;Lozano-Aguilera et al. 2014) practitioners are used to exploiting modern software that adopts graphical estimation methods via probability papers, even if there is a variety of effective analytical methods available, such as Maximum Likelihood and Bayesian techniques. In fact, especially in critical applications, the graphical estimation gives the unique opportunity to share statistical information with non-statisticians (e.g., by allowing a visual check of the fit of the chosen model and by giving helpful understanding of the consequent conclusions). Clearly, if the approach is to be purely analytical there is no point in using a probability paper (Kimball 1960).
If we consider the observations x (1) , . . . , x (i) , . . . , x (N ) of the order statistics X (1) , . . . , X (i) , . . . , X (N ) arranged in non-decreasing order, which correspond to mutually independent and identically-distributed N random variables X 1 , . . . , X i , . . . , X N , the basic problem of graphical methods is how to establish the estimateF i of the cumulative distribution function (cdf) F X x (i) (i.e., the plotting position) that can ensure a required property (e.g., unbiasedness) for the resulting estimators of the distribution parameters.
Plotting positions have been used and discussed for many years by engineers, hydrologists and statisticians. Noticeable remarks on classical extreme value analysis and plotting positions are included in (Hazen 1914;Gringorten 1963;Jenkinson 1969;Harris 1996;Palutikof et al. 1999;Simiu et al. 2001;Folland and Anderson 2002;Cook et al. 2003;Rasmussen and Gautam 2003;Whalen et al. 2004;Cook and Harris 2004;McRobie 2004;Jordaan 2005;Kharin and Zwiers 2005;Kidson and Richards 2005). A comprehensive review of the main plotting positions can be found in Harter (1984).
In Sect. 2, a new graphical method is proposed that allows best linear unbiased estimation of location-scale distribution parameters. As an example, Sect. 3 exploits Monte Carlo simulation in the case of Gumbel parent distribution in order to confirm the unbiasedness of the resulting estimators of the distribution parameters as well as to compare the proposed solution to classical methods. In Sect. 4, critical data registered during the serious 1983-1984 bradyseismic crisis in Campi Flegrei (Italy) (Luongo 1986) shows the applicative advantage of the proposed method.

The plotting position
In general, by choosing suitable real constants A and B (Table 1), most of the plotting positions appeared in the literature are in the practical form  Blom 1958). It can be easily shown that (2) implies the following assumptionF which, if N is odd, includes the resultsF (N +1)/2 = 1/2, stated by Erto and Lepore (2013). The issue of determining a unique (distribution-free) plotting position formula has recently come to light again (Lozano-Aguilera et al. 2014;Erto and Lepore 2013;Makkonen 2008a, b). It is interesting to note that some of the arguments addressed in the above papers were already clear to Hahn and Shapiro (1967).
Most of the distribution-free plotting positions are essentially based on the median or the mean value of the cdf F X (X (i) ), which, apart from the parent distribution, can be shown to be a Beta random variable U (i) with probability density function pdf where a = i and b = N − i + 1. In particular, Makkonen (2008a) interprets the plotting position as the nonexceedance probability of the next observation in an order ranked sample P X ≤ X (i) and obtains ) which coincides with the classical distribution-free plotting position proposed by Gumbel (1958) widely known as the Weibull plotting position. This formula, also promoted by Makkonen (2008b) has given rise to a wide controversial discussion (de Haan 2007;Makkonen 2007Makkonen , 2011Cook 2011Cook , 2012Erto and Lepore 2013;Fuglem et al. 2013). However, independently from this controversy the following graphical method is focused only on achieving best linear unbiased estimators (BLUEs) of the location-scale parent distribution parameters.

Best linear unbiased estimators of location-scale distribution parameters from graphical method
If X (and then X (i) ) is a continuous location-scale random variable, we can introduce the standardized variable where a and b are the location and the non-negative scale parameters, respectively. In order to graphically estimate a and b through probability papers, the following regression model is assumed where the x (i) s are the observations of the order statistics X (i) s, y (i) = F −1 Z F i and ε i represents the error/residual. In the proposed graphical method, we assume y (i) = E Z (i) in accordance with a well-known approach (Cunnane 1978). However, differently from Cunnane (1978), we take into account that the covariance σ (X (i) ,X ( j) ) between X (i) and X ( j) is nonzero and the variances σ 2 X (i) = σ (X (i) ,X (i) ) of the X (i) 's are not equal. Note that for locationscale distributions, the covariance σ (X (i) ,X ( j) ) can be expressed in term of the covariance σ (i, j) between Z (i) and Z ( j) as follows Therefore, the covariance matrix of the error ε = [ε 1 , . . . , (1,N ) . . . σ (i, j) . . .
is symmetrical, has nonzero off-diagonal elements and different diagonal elements. Apart from the unknown constant b 2 , V represents the covariance structure among the errors and can be shown to be non-singular and positive definite.
In matrix notation, being X = x (1) . . . x (N ) , the regression model can be expressed as where θ = (a, b) and the n × 2 matrix Therefore we propose to utilize the generalized least-squares solution to the regression model (7)θ which can be shown to be the BLUEs of θ (Lieblein 1953;Draper and Smith 1981) and that the variance matrix ofθ can be expressed as Now it is clear that the Cunnane (1978) plotting position approach, recently encouraged by Hong and Li (2013) and Fuglem et al. (2013), does not allow for BLUEs of location-scale distribution parameters because the generalized least-squares method is not applied to the regression model (7). Unfortunately, in many cases the solution (12) is too complex to be analytically evaluated (see, e.g., Lieblein and Salzer (1957) and, even when the sample size is not dramatically small, the ordinary least-squares method cannot be used for practical estimations such as the return period [see conclusion 4 by (Cunnane 1978) and motives 1-2 by (Lozano-Aguilera et al. 2014)].
To overcome this problem, we propose to use the k-th order Taylor polynomial of in the matrix (11) and in the matrix (9). Note that the Taylor polynomial (16) is obtained by using the results of David and Johnson (1954). From a practical point of view, we found that a higher k value does not offer any significant advantage for a sample size N ≥ 10. However, it is always possible to calculate the matrices V and A (and their inverses) by Monte Carlo method. The Weibull plotting position proposed by Gumbel (1958) (coincides with the first term of the Taylor polynomial (15). Moreover, let us remark that the plotting positions proposed in the past decades (Table 1)-generally in the form (1) or (2)-are different formulas used to obtain approximations for E Z (i) (see e.g., Gringorten 1963;Cunnane 1978;Guo 1990).

A new Gumbel probability paper
Since the graphical estimatorsâ andb of location-scale distribution parameters are linear and equivariant (Erto 1981), the quantities are parameter-free (Lawless 1978). In order to compare bias and efficiency of the estimatorsâ andb, note that the Root Mean Square Deviation (RMSD) and the bias modulus of the estimatorsâ andb can be expressed as follows Therefore, it is sufficient to compare BIAS and RMSD for b = 1. As an example, M = 10 5 pseudo-random samples of size n = 5, 10, 30 are drawn from the Gumbel parent distribution (cdf) which will be used for the critical application of the next section. The RMSD and the BIAS modulus of the proposed estimators (12) of location and scale parameters are compared (Tables 2, 3) with the usual estimators obtained through the ordinary leastsquare method (i.e., σ (i, j) = σ if i = j and zero otherwise) and the classical plotting positions (Table 1) as well as with the Maximum Likelihood Estimators (MLEs). The attained results clearly show that only the proposed graphical estimators are unbiased [as k goes to infinity, see (15) and (16)] at each sample size. Their efficiency is higher than the classical graphical ones for both location and scale parameters. However, the latter result can be not true in general, and it could be theoretically possible to find more efficient biased solutions. Consequently, the resulting Gumbel probability paper does not suffer from the typical bias related to the classical probability papers. This is relevant especially for small sample sizes.

A critical application: the Pozzuoli's bradyseism
Campi Flegrei is a large volcanic complex located west of the city of Naples, around the town of Pozzuoli Italy. During the 1983-1984 bradyseismic crisis (slow vertical ground uplift) a total seismic energy of about 4 · 1013J (Lima et al. 2009) was released. The ground uplift and continuous seismic activity diffused highly unsettling emotions and the conviction that a volcanic explosion was going to happen. The "scientific" proof of this upcoming event was given by the Mogi's model (Mogi 1958). This model explains the uplift of a volcanic area as the consequence of the instability due to the increasing pressure in the underlying magma that tries to reach the surface. The event induced  city managers to order a devastating full-scale evacuation of the area. The alternative hypothesis, that explained the ground movement as the consequence of the specific thermo-fluid-dynamics activity of the subsoil of the Campi Flegrei area (Casertano et al. 1976), was immediately abandoned. Probably, the careful consideration of -The time stability of the earthquakes' magnitude -The complete independence of both levels and times of the magnitudes from the focus depths of the corresponding earthquakes should have been enough to judge the hypothesis of an ascending magmatic intrusion to be unlikely. In fact, that would have caused ascending rock fractures and consequent ascending earthquake focuses (with time decreasing depths). In the "Appendix", the magnitudes x i (greater than or equal to 1) registered from July 1983 to July 1984 are reported and grouped by lunar month (labelled by I,…,  Table 4) and the corresponding R 2 (Buse 1979) and modified Anderson-Darling (D'Agostino and Stephens 1986) statistics  Table 4) because of the high correlation among bradyseism and short and long period tidal components (Casertano et al. 1976).
For each lunar month, the log-magnitudes greater than or equal to 1 are analysed in Fig. 1 via the proposed Gumbel probability paper (i.e., by using the proposed plotting positions (15) and the generalized least-squares method). The corresponding R 2 statistics (Buse 1979) are calculated for each month in order to give a measure of the goodness-of-fit of the Gumbel distribution. Around the regression lineâ +by, the following approximate confidence intervals at level 1 − α = 0.98 are reported on each probability paper (Fig. 1) x =â +b y ± t N −2;α/2 Y A V −1 A −1 Y (23) where Y = [1 y] and t v; p is the 100 p-th percentile of the t-distribution with v degrees of freedom. From the second probability paper on, it is also plotted a bold reference line withθ obtained on the basis of the cumulative sample of all the previous month(s). Since the monthly confidence intervals always include the reference line, the hypothesis of earthquakes' magnitude stability cannot be rejected and can be graphically shown to non-statisticians in a very concise and informative way. In addition, it is worth to note that the estimated modest probability of a monthly magnitude X greater than 5 (Table 5), which in expert opinion is the critical threshold for concrete structures, could have helped to warn against the alarmism caused by the apocalyptical newspaper titles at the time (Gore and Mazzatenta 1984). This real scenario is one of the typical critical cases where an unbiased graphical analysis of the data can work as a "reliable" way to share statistical conclusions with non-statistician managers that have to utilize them to make grave decisions on territory and citizens.
However, analytical goodness-of-fit tests for the Gumbel distribution are carried out through the modified Anderson-Darling upper-tail test (D'Agostino and Stephens 1986) and by estimating the population (unknown) parameter estimators through (12). In particular, Table 6 reports the modified Anderson-Darling statistic values to test the goodness-of-fit of the Gumbel distribution for the log-magnitudes (greater than or equal to 1) at each lunar month. Since they are far smaller than the critical value 0.64 corresponding to a significance level 0.1 (D'Agostino and Stephens 1986), it is very likely that the data come from the hypothesized distribution. Moreover, the modified Anderson-Darling statistic values reported in Table 7 show that for each month, the log-magnitudes likely belongs to the Gumbel distribution with the population (unknown) parameters estimated on the basis of the cumulative sample of all the previous month(s).
Because further bradyseismic crisis are expected for next future, the above graphical analysis will surely be able to provide a strategic reference picture to which the new data can be compared as soon as they are collected.

Conclusions
On the basis of theoretical considerations, a new probability paper based on the generalized least-squares method is proposed. Correlation between order statistics and heteroscedasticity are taken into account. The resulting new graphical estimators are shown to be the best linear unbiased estimators (BLUEs) of location and scale parameters of the parent distribution. Consequently, the resulting population line does not suffer from the typical bias related to classical probability papers. This is relevant especially for small sample sizes. An approximate solution is also provided in order to overcome any computational issue and the bias introduced by such approximation can be made as small as needed.
As an example, for the Gumbel parent distribution, a Monte Carlo simulation confirms that the proposed graphical estimators outperform the usual estimators obtained through ordinary least-square method and classical plotting positions in terms of the bias modulus for all the considered sample sizes (n = 5, 10, 30). As the proposed estimators are BLUEs, this result is expected for every distribution in the location scale family even though it could be theoretically possible to find more efficient (in terms of root mean square deviation) but biased solutions. However, in the Gumbel case, the proposed graphical estimators show root mean square deviations that are comparable with those achieved by the corresponding maximum likelihood ones.
The attained results reduce the efficiency gap between probability papers and the concurrent analytical methods, so encouraging the use of graphical procedures. The latter are very strategic especially in critical applications where the visual representation of the results of statistical analysis are to be fully understood also by non-statisticians in order to make correct decisions.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.