Abstract
In this paper, we propose an algorithm to derive the exact distributions of discordancy tests for exponential samples under the slippage alternative, provided that their survival functions involve linear combinations of independent and identically distributed exponential random variables with arbitrary real coefficients. In addition, we define various performance measures in terms of the conditional probabilities that the observed value of the test statistic exceeds the critical value, given that the contaminants occupy specific positions in the ordered sample. This makes it possible to calculate the performance measures of discordancy tests for exponential samples to any desired degree of accuracy. For illustration, we derive the distributions of the maximum likelihood ratio tests for single and multiple outliers in exponential samples and calculate their performance measures accurately to six decimal places. Moreover, the definitions of the performance criteria are not restricted to discordancy tests for exponential samples; they are equally applicable to discordancy tests for samples from other distributions.
References
Balasooriya U, Gadag V (1994) Tests for upper outliers in the two-parameter exponential distribution. J Stat Comput Simul 50:249–259
Barnett VA, Lewis T (1994) Outliers in statistical data. Wiley, Chichester
Chikkagoudar MS, Kunchur SM (1983) Distributions of test statistics for multiple outliers in exponential samples. Commun Stat 12:2127–2142
Chikkagoudar MS, Kunchur SM (1987) Comparison of many outlier procedures for exponential samples. Commun Stat 16:627–645
Dixon WJ (1950) Analysis of extreme values. Ann Math Stat 21:488–506
Dumitrescu MEB, Enchescu DN, Hristea FT (1994) On the performances of an outlier test in the case of the exponential distribution. Comput Stat Data Anal 17(2):119–127
Fieller N (2014) Multivariate outliers. Wiley, New York
Fisher RA (1929) Tests of significance in harmonic analysis. Proc R Soc Ser A 125:54–59
Fung KY, Paul SR (1985) Comparison of outlier detection procedures in Weibull or extreme-value distribution. Commun Stat 14:895–917
Giraudeau B, Chastang C (1999) Two discordancy tests for location slippage and dispersion slippage outlier detection in agreement studies. Statistician 48:517–527
Hadi AS (1994) A modification of a method for the detection of outliers in multivariate samples. J Roy Stat Soc B 56:393–396
Hawkins DM (1972) Analysis of a slippage test for the chi squared distribution. S Afr Stat J 6:11–17
Hawkins DM (1980) Identification of outliers. Chapman and Hall, London
Hayes K, Kinsella T (2003) Spurious and non-spurious power in performance criteria for tests of discordancy. J R Stat Soc Ser D (Stat) 52:69–82
Huffer F (1988) Divided differences and the joint distribution of linear combinations of spacings. J Appl Probab 25:346–354
Huffer FW, Lin CT (2001) Computing the joint distribution of general linear combinations of spacings or exponential variates. Stat Sin 11:1141–1157
Huffer FW, Lin CT (2006) Linear combinations of spacings. In: Kotz S, Balakrishnan N, Read CB, Vidakovic B (eds) Encyclopedia of statistical sciences, Wiley, Hoboken, vol 12, pp 7866–7875
Jain RB, Pingel LA (1981) A procedure for estimating the number of outliers. Commun Stat 10:1029–1041
Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions. Wiley, New York
Kabe DG (1970) Testing outliers from an exponential population. Metrika 15:15–18
Kimber AC (1982) Tests for many outliers in an exponential sample. Appl Stat 31:263–271
Kimber AC (1988) Testing upper and lower outlier pairs in gamma samples. Commun Stat Simul 17:1055–1072
Kimber AC, Stevens HJ (1981) The null distribution of a test for two upper outliers in an exponential sample. Appl Stat 30:153–157
Kumar N (2013) A procedure for testing suspected observations. Stat Pap 54:471–478
Kumar N (2015) Testing of suspected observations in an exponential sample with unknown origin. Commun Stat 44:3668–3679
Kumar N, Lin CT (2017) Testing for multiple upper and lower outliers in an exponential sample. J Stat Comput Simul 87:870–881
Lalitha S, Kumar N (2012) Multiple outlier test for upper outliers in an exponential sample. J Appl Stat 39:1323–1330
Lewis T, Fieller NRJ (1979) A recursive algorithm for null distribution for outliers: I. gamma samples. Technometrics 21:371–376
Lin CT, Balakrishnan N (2009) Exact computation of the null distribution of a test for multiple outliers in an exponential sample. Comput Stat Data Anal 53:3281–3290
Lin CT, Wang SC (2015) Discordancy tests for two-parameter exponential samples. Stat Pap 56:569–582
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley, New York
Prescott P (1979) Critical values for a sequential test for many outliers. Appl Stat 28:36–39
Rosner B (1975) On the detection of many outliers. Technometrics 17:221–227
Zerbet A, Nikulin M (2003) A new statistic for detecting outliers in exponential case. Commun Stat 32:573–583
Zhang J (1998) Tests for multiple upper or lower outliers in an exponential sample. J Appl Stat 25:245–255
Acknowledgements
The author would like to thank two anonymous reviewers and the editor for their helpful and constructive comments that have improved the paper.
Appendices
Exact distribution of MLR test for testing a single outlier for the sample size \(n=3\)
Consider a single outlier problem with \(n=3\). The probabilities \(\Pr [T>d|B(r)]\) (\(r=1,2,3\)) in (2) can be written as
Using (1), we first calculate
where \(\mathbf {A}=((1-3d)/(b+2),(1-2d)/(b+1),(1-d)/(b))\). Clearly, \(\Pr [T>d|B(3)]\) is 1 for \(0<d\le 1/3\), and 0 for \(d>1\). Letting \(\mathbf {c}=(0,(b+1)(1-d)/[bd+(1-d)], -b(1-2d)/[bd+(1-d)])'\), we have \(\mathbf {Ac}=0\). Now using recursion from (4) and the properties of deleting zero entries and renumbering the random variables again, we have
The same process is applied to the obtained two probabilities with \(\mathbf {A_1}=((1-3d)/(b+2),(1-d)/b)\) and \(\mathbf {A_2}=((1-3d)/(b+2),(1-2d)/(b+1))\). Taking \(\mathbf {c_1}=((b+2)(1-d)/[2(bd+(1-d))], -b(1-3d)/[2(bd+(1-d))])'\) to calculate \(\Pr (\mathbf {A_1 Z}>0)\) and \(\mathbf {c_2}=((b+2)(1-2d)/[(bd+(1-d))], -(b+1)(1-3d)/[(bd+(1-d))])'\) to calculate \(\Pr (\mathbf {A_2 Z}>0)\), it follows that
In a similar manner, we can obtain
and
From (6), we have
Now using (31–34), we get the distribution function of T under H for \(n=3\) which is
The density function of T for \(n=3\) can be obtained by differentiating (35) with respect to d. Also, the density function under the labelled slippage \(H_n\) can be obtained by differentiating \(1-\Pr [T>d|B(3)]\) with respect to d and is equivalent to the density obtained by Chikkagoudar and Kunchur (1983) for \(n=3\).
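The vectors \(\mathbf {c}\), \(\mathbf {c_1}\) and \(\mathbf {c_2}\) used in the recursion steps of this appendix can be checked with exact rational arithmetic. The sketch below confirms, for one illustrative pair \((b,d)\), that each vector is orthogonal to its coefficient vector and that its entries sum to one. The values of `b` and `d` and the helper name `check` are assumptions made for this example only.

```python
from fractions import Fraction as F


def check(A, c):
    """Return (A . c, sum of the entries of c) in exact rational arithmetic."""
    return sum(a * x for a, x in zip(A, c)), sum(c)


# Illustrative values only (assumed); b*d + (1 - d) must be nonzero.
b, d = F(1, 2), F(7, 8)
D = b * d + (1 - d)

# Coefficient vector A and the vector c from the first recursion step.
A = [(1 - 3 * d) / (b + 2), (1 - 2 * d) / (b + 1), (1 - d) / b]
c = [F(0), (b + 1) * (1 - d) / D, -b * (1 - 2 * d) / D]

# Reduced vectors A_1, A_2 with their companions c_1, c_2.
A1 = [A[0], A[2]]
c1 = [(b + 2) * (1 - d) / (2 * D), -b * (1 - 3 * d) / (2 * D)]
A2 = [A[0], A[1]]
c2 = [(b + 2) * (1 - 2 * d) / D, -(b + 1) * (1 - 3 * d) / D]

results = [check(A, c), check(A1, c1), check(A2, c2)]
for dot, total in results:
    print(dot, total)
```

Using `Fraction` avoids floating-point noise, so the orthogonality and the unit sum are verified exactly rather than approximately.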
The critical value of the test T for a single outlier can be obtained from (35) by substituting \(b=1\); for the significance level \(\alpha =0.05\), this yields \(d=0.8709\). Now, using (18–19), we can calculate the various performance measures. For example, for a shift of size \(b=0.5\), the P, \(\textit{NSP}\), \(\textit{SP}\), \(\textit{NSE}\) and \(\textit{SE}\) are 0.0701, 0.0523, 0.0178, 0.4810 and 0.4488, respectively.
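The figures quoted above can be reproduced numerically with a short sketch. It rests on assumptions not spelled out in this appendix: that \(T\) is the largest observation divided by the sample total; that for \(1/2\le d<1\) the event \(T>d\) reduces to \(a_3Z_3>|a_1|Z_1+|a_2|Z_2\) with \(Z_i\) i.i.d. standard exponential; that the position probabilities follow from the usual competing-exponentials argument; and that NSE and SE are the non-rejection counterparts of NSP and SP. The function names are hypothetical.

```python
import math


def tail_prob(d, rates):
    """Pr[T > d] for n = 3 and 1/2 <= d < 1, given spacing rates (u1, u2, u3).

    Uses Pr[a3*Z3 > a1*Z1 + a2*Z2] = a3^2 / ((a3 + a1) * (a3 + a2))
    for positive a1, a2, a3 and iid standard exponential Z_i.
    """
    u1, u2, u3 = rates
    a1, a2, a3 = (3 * d - 1) / u1, (2 * d - 1) / u2, (1 - d) / u3
    return a3 * a3 / ((a3 + a1) * (a3 + a2))


def single_outlier_measures(b, alpha=0.05):
    # Null critical value: solves 3 * (1 - d)^2 = alpha (valid for alpha <= 3/4).
    d = 1.0 - math.sqrt(alpha / 3.0)
    # Position r of the contaminant -> (Pr[B(r)], spacing rates); assumed forms.
    positions = {
        1: (b / (b + 2), (b + 2, 2.0, 1.0)),
        2: (2 / (b + 2) * b / (b + 1), (b + 2, b + 1, 1.0)),
        3: (2 / (b + 2) * 1 / (b + 1), (b + 2, b + 1, b)),
    }
    joint = {r: p * tail_prob(d, u) for r, (p, u) in positions.items()}
    nsp = joint[3]               # reject, contaminant is the largest
    sp = joint[1] + joint[2]     # reject, contaminant is not the largest
    p3 = positions[3][0]
    return {"d": d, "P": nsp + sp, "NSP": nsp, "SP": sp,
            "NSE": p3 - nsp, "SE": (1 - p3) - sp}


m = single_outlier_measures(b=0.5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the sketch agrees with the values reported above to four decimal places.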
Exact distribution of MLR test for testing two outliers for the sample size \(n=3\)
Consider the two-outlier problem with \(n=3\). The probabilities \(\Pr [T_2>d|B(r,s)]\) \(((r,s)\in S^{(2)}=\{(1,2),(1,3),(2,3)\})\) in (14) can be written as
Using (10), we first calculate
where \(\mathbf {A}=((2-3d)/(2b+1),2(1-d)/(2b),(1-d)/b)\). Clearly, \(\Pr [T_2>d|B(1,2)]\) is 1 for \(0<d\le 2/3\) and 0 for \(d>1\). Letting \(\mathbf {c}=(-2(2b+1)(d-1)/(2b+d-bd), (3d-2)(b+1)/(2b+d-b d),0)'\), we have \(\mathbf {Ac}=0\). Now using recursion from (4) and the properties of deleting zero entries and renumbering the random variables again, we have
The same process is applied to calculate the two probabilities with \(\mathbf {A_1}=(2(1-d)/(2b),(1-d)/b)\) and \(\mathbf {A_2}=((2-3d)/(2b+1),(1-d)/b)\). Note that \(\Pr (\mathbf {A_1 Z}>0)\) is 1 for \(0<d\le 1\). Taking \(\mathbf {c_2}=((2b+1)(1-d)/[(2b+2d-2bd-1)], (3d-2)/[(2b+2d-2bd-1)])'\) to calculate \(\Pr (\mathbf {A_2 Z}>0)\), it follows that
In a similar manner, we can obtain
From (11), we have
Combining the probabilities in (36–38) and using (39), we obtain the distribution function of test statistic \(T_2\) as follows.
The null distribution of \(T_2\) can be obtained from (40) by letting \(b=1\). For the significance level \(\alpha =0.05\), the critical value for the test \(T_2\) can be calculated from the null distribution of \(T_2\) which is equal to \(d=0.991559\).
Now, using (22–24) and plugging (36–39) into them, we can calculate the various performance measures for the test. For example, for the size of the shift \(b=0.5\), the P, \(\textit{NSP}\), \(\textit{SP}\), \(\textit{SW}\), \(\textit{NSE}\), \(\textit{SE}\) and \(\textit{PSE}\) are 0.0579, 0.0329, 0, 0.0250, 0.4671, 0, 0.4750 and 0.0658 respectively.
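Several of the numbers above can be cross-checked with a minimal sketch. It assumes that \(T_2\) is the sum of the two largest observations divided by the sample total, that for \(2/3\le d<1\) the event \(T_2\le d\) reduces to \(a_1Z_1\ge a_2Z_2+a_3Z_3\) with \(Z_i\) i.i.d. standard exponential, and that the position probabilities follow from the competing-exponentials argument. Only P, NSP, the rejection probability when the contaminants are not the two largest observations, and \(\Pr [T_2>d|B(2,3)]\) are computed; all function and key names are hypothetical.

```python
import math


def tail_prob_t2(d, rates):
    """Pr[T_2 > d] for n = 3 and 2/3 <= d < 1, given spacing rates (u1, u2, u3)."""
    u1, u2, u3 = rates
    a1, a2, a3 = (3 * d - 2) / u1, 2 * (1 - d) / u2, (1 - d) / u3
    # T_2 <= d  iff  a1*Z1 >= a2*Z2 + a3*Z3, with iid standard exponential Z_i.
    return 1.0 - a1 * a1 / ((a1 + a2) * (a1 + a3))


def two_outlier_measures(b, alpha=0.05):
    # Null critical value: solves 1 - (3d - 2)^2 = alpha.
    d = (2.0 + math.sqrt(1.0 - alpha)) / 3.0
    # Positions (r, s) of the contaminants -> (probability, spacing rates); assumed forms.
    positions = {
        (2, 3): (1 / (2 * b + 1), (2 * b + 1, 2 * b, b)),
        (1, 3): (2 * b / (2 * b + 1) / (b + 1), (2 * b + 1, b + 1, b)),
        (1, 2): (2 * b / (2 * b + 1) * b / (b + 1), (2 * b + 1, b + 1, 1.0)),
    }
    joint = {k: p * tail_prob_t2(d, u) for k, (p, u) in positions.items()}
    nsp = joint[(2, 3)]
    other = joint[(1, 3)] + joint[(1, 2)]
    return {"d": d, "P": nsp + other, "NSP": nsp, "reject_other_positions": other,
            "cond_reject_B23": tail_prob_t2(d, positions[(2, 3)][1])}


t2 = two_outlier_measures(b=0.5)
for name, value in t2.items():
    print(f"{name}: {value:.6f}")
```

Under these assumptions the sketch gives a critical value of about 0.99156 and values that round to 0.0579, 0.0329, 0.0250 and 0.0658, matching the corresponding figures quoted above.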
Exact distribution of MLR sequential test for testing two outliers for the sample size \(n=3\)
To calculate the joint probability expression \(\Pr [U_1<d_1, U_2<d_2|B(2,3)]\), we first need to calculate the probabilities \(\Pr [U_1<d_1|B(2,3)]\) and \(\Pr [U_2<d_2|B(2,3)]\). Thus
and
where \(u^{2,3}_1 =2 b+1\), \(u^{2,3}_2 = 2 b\) and \(u^{2,3}_3 = b\) have been defined previously in (10).
Following the line of argument used to calculate the probability in Appendix A, we have
and
For \(0<d_1\le 1, 0<d_2\le 1\), the joint probability \(\Pr [U_1>d_1, U_2>d_2|B(2,3)]\) can be written as \(P(\mathbf {AZ}>0|B(2,3))\) using (10) and (14) where
and \(u^{2,3}_1 =2 b+1\), \(u^{2,3}_2 = 2 b\) and \(u^{2,3}_3 = b\).
Let \(\mathbf {c}=(-((2b + 1)(d_2 - 1))/(2bd_2 - d_2 + 1),(2b(2d_2 - 1))/(2bd_2 - d_2 + 1),0)'\), then \(\mathbf {Ac}=((d_2 - d_1 - d_1 d_2)/(2b d_2 - d_2 + 1),0)'\) and using recursion (4), we get
where
and
Since \(1/2 < d_2 \le 1\), all the entries in the second row of \(\mathbf {A_2}\) are less than or equal to 0, which implies that \(\Pr [\mathbf {A_2 Z} > 0|B(2,3)] = 0\). Moreover, all the entries in the second row of \(\mathbf {A_1}\) are greater than or equal to 0, which implies that the second row of \(\mathbf {A_1}\) can be deleted.
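The product \(\mathbf {Ac}\) for the sequential test can be verified in exact arithmetic. The rows of \(\mathbf {A}\) used below are an assumption: they are written for \(U_1=X_{(3)}/(X_{(1)}+X_{(2)}+X_{(3)})\) and \(U_2=X_{(2)}/(X_{(1)}+X_{(2)})\) in terms of the spacings, with the rates \(u^{2,3}_1, u^{2,3}_2, u^{2,3}_3\) given earlier. The numerical values of \(b\), \(d_1\) and \(d_2\) are illustrative only.

```python
from fractions import Fraction as F

# Illustrative values only (assumed).
b, d1, d2 = F(1, 2), F(9, 10), F(19, 20)
u = [2 * b + 1, 2 * b, b]      # u^{2,3}_1, u^{2,3}_2, u^{2,3}_3
D = 2 * b * d2 - d2 + 1

# Assumed rows of A: row 1 encodes U_1 > d_1, row 2 encodes U_2 > d_2.
A = [
    [(1 - 3 * d1) / u[0], (1 - 2 * d1) / u[1], (1 - d1) / u[2]],
    [(1 - 2 * d2) / u[0], (1 - d2) / u[1], F(0)],
]
c = [-((2 * b + 1) * (d2 - 1)) / D, (2 * b * (2 * d2 - 1)) / D, F(0)]

Ac = [sum(a * x for a, x in zip(row, c)) for row in A]
expected_first = (d2 - d1 - d1 * d2) / D

print(Ac, expected_first, sum(c))
```

With these rows, the second entry of \(\mathbf {Ac}\) vanishes, the first equals \((d_2 - d_1 - d_1 d_2)/(2b d_2 - d_2 + 1)\), and the entries of \(\mathbf {c}\) sum to one.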
Thus, we have
Before proceeding further, we first check the value of \((d_2 - d_1 - d_1 d_2)/(2b d_2 - d_2 + 1)\). It can be easily shown that
where
The values of \(d_1\) and \(d_2\) are chosen such that, under the null hypothesis \(H_0\), \(\Pr [U_1>d_1]=\Pr [U_2>d_2]=\beta \) and the condition \(\Pr [U_1<d_1,U_2<d_2]=1-\Pr [U_1>d_1]-\Pr [U_2>d_2]+\Pr [U_1>d_1,U_2>d_2]=1-\alpha \) is satisfied, where \(\alpha \) is the significance level. Note that under the null hypothesis, \(\Pr [U_1>d_1]\) and \(\Pr [U_2>d_2]\) are equal to \(\Pr [U_1>d_1|B(2,3)]\) and \(\Pr [U_2>d_2|B(2,3)]\), respectively.
Using the recursion (4), we obtain
Combining the probabilities obtained in (41), (42) and (45), we can obtain the required probability using
It is also worthwhile to mention that the range of \(\beta \) can be determined using Bonferroni’s inequality (Lin and Balakrishnan 2009). Since under \(H_0\),
which gives \(\beta \ge \alpha /k\). In particular, when \(k=2\), we have \(\Pr [U_1>d_1,U_2>d_2|B(2,3)]=2\beta -\alpha \le 1\), so the range of \(\beta \) for testing two upper outliers is \(\alpha /k \le \beta \le (1+\alpha )/2\).
In a similar way, in order to calculate the probability \(\Pr [U_1<d_1,U_2<d_2|B(1,2)]\), we can obtain
To calculate \(\Pr [U_1<d_1,U_2<d_2|B(1,3)]\), we get
Finally, we can obtain the joint distribution of \((U_1,U_2)\) by plugging the probabilities in (41–42), (45–51) and (39) as follows
where \(S^{(2)}=\{(1,2), (1,3), (2,3)\}\).
Note that the exact critical values of the sequential tests for testing \(k=2\) upper outliers can be obtained by plugging the expressions of \(d_1\) in (43) and \(d_2\) in (44) into
where \(\alpha \) is the significance level.
Therefore, for \(\alpha =0.05\), we obtain the value of \(\beta =0.025427\) which yields the critical values to be \(d_1= 0.907936\) and \(d_2=0.983191\) using (43) and (44).
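The value of \(\beta \) and the two critical values can be recovered by a simple root search. The sketch below is not the recursion of this appendix: it uses closed forms derived under the assumptions that \(U_1=X_{(3)}/(X_{(1)}+X_{(2)}+X_{(3)})\), \(U_2=X_{(2)}/(X_{(1)}+X_{(2)})\), and that the null spacings have rates 3, 2 and 1, valid for \(1/2<d_1,d_2<1\). The function names and the bisection bracket \([\alpha /2,\alpha ]\) are choices made for this example.

```python
import math


def critical_values(beta):
    """d1, d2 with Pr[U_1 > d1] = Pr[U_2 > d2] = beta under H_0 (n = 3)."""
    d1 = 1.0 - math.sqrt(beta / 3.0)      # from 3 * (1 - d1)^2 = beta
    d2 = (3.0 - beta) / (3.0 + beta)      # from 3 * (1 - d2) / (1 + d2) = beta
    return d1, d2


def joint_tail(d1, d2):
    """Pr[U_1 > d1, U_2 > d2] under H_0 for n = 3 and 1/2 < d1, d2 < 1."""
    p = (3 * d1 - 1) / (3 * (1 - d1))     # U_1 > d1  iff  Z3 > p*Z1 + q*Z2
    q = (2 * d1 - 1) / (2 * (1 - d1))
    k = 2 * (2 * d2 - 1) / (3 * (1 - d2))  # U_2 > d2  iff  Z2 > k*Z1
    return 1.0 / ((1 + q) * (1 + p + (1 + q) * k))


def solve_beta(alpha, tol=1e-12):
    """Bisection for beta such that 2*beta - Pr[U_1 > d1, U_2 > d2] = alpha."""
    def f(beta):
        return 2 * beta - joint_tail(*critical_values(beta)) - alpha

    lo, hi = alpha / 2, alpha   # f(lo) <= 0 <= f(hi) because 0 <= joint tail <= beta
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


beta = solve_beta(0.05)
d1, d2 = critical_values(beta)
print(f"beta={beta:.6f}, d1={d1:.6f}, d2={d2:.6f}")
```

Under these assumptions the search returns values that agree with \(\beta =0.025427\), \(d_1=0.907936\) and \(d_2=0.983191\) to within about \(10^{-6}\).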
Once the critical values have been calculated, we can compute the various performance measures discussed in Sect. 6 by plugging in the probabilities in (41–51) and (39) for different sizes of the shift \(b<1\). For example, for \(b=0.5\), the P, \(\textit{NSP}\), \(\textit{SP}\), \(\textit{NSE}\), \({ PL}_1\), \({ PL}_2\) and \({ PL}_3\) are 0.00113, 0.00057, 0, 0.48379, 0.00008, 0.00048 and 0.01564, respectively.
Cite this article
Kumar, N. Exact distributions of tests of outliers for exponential samples. Stat Papers 60, 2031–2061 (2019). https://doi.org/10.1007/s00362-017-0908-6
DOI: https://doi.org/10.1007/s00362-017-0908-6