1 Introduction

Bol'shev (1969) generalized Chauvenet's test for rejecting outlying observations (see Bol'shev 1969; Voinov and Nikulin 1993, 1996). This method is suitable for detecting k outliers in a univariate data set, and Chauvenet's test can be used in the exponential case. Ibragimov and Khalna (1978) considered various modifications of this test. Several authors have considered the problem of testing for a single outlier in the exponential distribution (Chikkagoudar and Kunchur 1983; Kabe 1970; Lewis and Fieller 1979; Likes 1966). Only two types of statistics exist for testing multiple outliers: the first is Dixon's, while the second is based on the ratio of the sum of the observations suspected to be outliers to the sum of all observations in the sample. Most of these authors considered the general case of a gamma model, with the results for the exponential model given as a special case. This approach focuses on alternative models, namely slippage alternatives in exponential samples (see Barnett and Lewis 1978). Zerbet and Nikulin (2003) proposed a statistic, different from the well-known Dixon statistic \({D}_{k}\), for detecting multiple outliers. In this paper, we generalize the statistic \({Z}_{k}^{*}\) for detecting outliers in the Rayleigh distribution. The distribution of the test statistic under slippage alternatives is obtained, and tables of critical values are given for various sample sizes n and numbers of outliers k. The powers of these tests are also calculated and compared. The results show that the test based on the statistic \({Z}_{k}^{*}\) is more powerful than the test based on the statistic \({D}_{k}\).

2 Statistical inference

Let \({X}_{1},{X}_{2},\dots ,{X}_{n}\) be independent random variables. In this paper, we want to test the hypothesis \({H}_{0}\): \({X}_{1},{X}_{2},\dots ,{X}_{n}\) come from a Rayleigh distribution, i.e.

$$\Pr \{ X \le x \mid H_{0} \} = F(x;\theta ) = 1 - \exp \left( - \frac{x^{2}}{\theta } \right),\quad \theta > 0,\ \theta \ {\text{is}}\ {\text{unknown}}$$

The probability density function of these variables under the null hypothesis is therefore:

$$f_{X} (x;\theta ) = \frac{2}{\theta }\,x\exp \left( - \frac{x^{2}}{\theta } \right),\quad \theta > 0,\ x > 0$$
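For intuition, this model is easy to simulate. Below is a minimal Python sketch (the helper name `rvs_rayleigh` is ours, not from the paper) that draws from this parametrization by inverse transform; note that \(X^2\) is then exponential with mean \(\theta\), a fact used repeatedly in Sect. 3.

```python
import numpy as np

def rvs_rayleigh(theta, size, seed=None):
    """Draw from F(x; theta) = 1 - exp(-x^2 / theta) by inverse transform.

    Note: this parametrization differs from numpy's `rayleigh(scale)`,
    which uses 1 - exp(-x^2 / (2*scale^2)); here theta = 2*scale^2.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    return np.sqrt(-theta * np.log(u))

x = rvs_rayleigh(theta=2.0, size=100_000, seed=0)
print(x.mean(), (x**2).mean())  # E[X] = sqrt(pi*theta)/2 ≈ 1.253, E[X^2] = theta = 2.0
```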

Under the slippage alternative \({H}_{k}\), however, we have:

$$X_{(1)} ,X_{(2)} , \ldots ,X_{(n - k)} \ {\text{come}}\ {\text{from}}\ F(x;\theta )$$
$$X_{(n - k + 1)} ,X_{(n - k + 2)} , \ldots ,X_{(n)} \ {\text{come}}\ {\text{from}}\ F(x;\theta /\beta )$$

where \(\beta \ge 1\), \(\beta\) is unknown, and \({X}_{(1)},{X}_{(2)},\dots ,{X}_{(n)}\) are the order statistics corresponding to the observations \({X}_{1},{X}_{2},\dots ,{X}_{n}\). This hypothesis can be considered an important sub-hypothesis of the hypothesis that k of the n observations are suspected to be outliers (for \(\beta >1\), these k observations are called upper outliers). The hypothesis \({H}_{0}\) corresponds to \(\beta =1\). To test \({H}_{0}\), we propose the following statistic:

$${Z}_{k}^{*}=\frac{{X}_{(n-k)}^{2} -{X}_{(1)}^{2}}{\sum_{j=n-k+1}^{n}({X}_{(j)}^{2} -{X}_{(1)}^{2})}$$

For \(m=1\), i.e., with the squares \({X}_{(j)}^{2}\) replaced by the observations \({X}_{(j)}\) themselves, the above statistic \({Z}_{k}^{*}\) reduces to the one proposed by Zerbet and Nikulin (2003).

Following the idea of Chauvenet's test, the decision criterion is: the hypothesis \({H}_{0}\) is rejected when \({Z}_{k}^{*}>{z}_{c}\), where \({z}_{c}={z}_{c}(\alpha )\) is the critical value corresponding to the significance level \(\alpha\).
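As an illustration, \({Z}_{k}^{*}\) and the decision rule can be computed as follows. This is a sketch under our own naming (`z_star`, `reject_h0`), with the critical value \(z_c\) taken from Table 1.

```python
import numpy as np

def z_star(x, k):
    """Z_k^* = (X_(n-k)^2 - X_(1)^2) / sum_{j=n-k+1}^{n} (X_(j)^2 - X_(1)^2)."""
    s = np.sort(np.asarray(x, dtype=float)) ** 2   # squared order statistics
    num = s[-k - 1] - s[0]                         # X_(n-k)^2 - X_(1)^2
    den = np.sum(s[-k:] - s[0])                    # sum over the k largest
    return num / den

def reject_h0(x, k, z_crit):
    """Reject H0 (no outliers) when Z_k^* exceeds the critical value z_c(alpha)."""
    return z_star(x, k) > z_crit
```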

3 The distribution of the statistic \({Z}_{k}^{*}\) under alternatives

In this section, we derive the distribution of the statistic \({Z}_{k}^{*}\) following the method of Zerbet and Nikulin (2003). The distribution of this statistic under the slippage alternative hypothesis \({H}_{k}\) is given by the following theorem.

Theorem 3.1.

The distribution of the statistic \({Z}_{k}^{*}\) under \({H}_{k}\) is given by:

$$\Pr\left\{{Z}_{k}^{*}<z \mid {H}_{k}\right\}=\frac{{(-1)}^{n-k}\,\Gamma \left(k\beta +n-k\right)\Gamma \left(k+2\right)}{2{\beta }^{2}\,\Gamma \left(k\right)\Gamma \left(k\beta +1\right)}\left(\frac{z}{1-kz}\right)^{2}$$
$$\times \sum_{j=2}^{n-k}\frac{{(-1)}^{j}\,(k\beta +n-k-j+1)}{\Gamma (j-1)\,\Gamma (n-j-k+1)}\ {}_{2}F_{1}\left(2,\,k+1;\,3;\,-\frac{z(k\beta +n-k-j+1)}{(1-kz)\beta }\right),\quad 0<z<\frac{1}{k}$$

where

$${}_{2}F_{1}({a}_{0},{a}_{1};{b}_{1};z)=\sum_{j=0}^{\infty }\frac{\Gamma ({a}_{0}+j)\,\Gamma ({a}_{1}+j)\,\Gamma ({b}_{1})}{\Gamma \left({a}_{0}\right)\Gamma \left({a}_{1}\right)\Gamma ({b}_{1}+j)}\,\frac{{z}^{j}}{j!}$$
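This Gauss series can be evaluated directly; a short check against SciPy's implementation (for \(|z|<1\), where the series converges) might look like this:

```python
from scipy.special import hyp2f1

def hyp2f1_series(a0, a1, b1, z, nterms=60):
    """Evaluate the Gauss series 2F1(a0, a1; b1; z) term by term (|z| < 1).

    The gamma-ratio coefficient from the definition above is accumulated as a
    running product, which avoids overflowing the gamma function for large j.
    """
    term, total = 1.0, 1.0                          # j = 0 term is 1
    for j in range(nterms):
        term *= (a0 + j) * (a1 + j) / ((b1 + j) * (j + 1)) * z
        total += term
    return total

print(hyp2f1_series(2, 3, 3, -0.4), hyp2f1(2, 3, 3, -0.4))  # both ≈ 1.4**-2 ≈ 0.5102
```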

Proof.

To prove this theorem, we must obtain the distribution of the statistic \({Z}_{k}^{*}\) under the alternative hypothesis \({H}_{k}\).

We first compute the corresponding alternative distribution of the statistic:

$${U}_{k}=\frac{{X}_{(n-k)}^{2} -{X}_{(1)}^{2}}{\sum_{j=n-k+1}^{n}({X}_{(j)}^{2} -{X}_{(n-k)}^{2})}=\frac{V}{W},\quad k\ge 1$$

where \(V={X}_{(n-k)}^{2} -{X}_{(1)}^{2}\) and \(W=\sum_{j=n-k+1}^{n}({X}_{(j)}^{2} -{X}_{(n-k)}^{2})\).

Let \({Y}_{j}={X}_{(j)}^{2} -{X}_{(j-1)}^{2}\). We then immediately obtain:

\(\sum_{j=2}^{n-k}{Y}_{j}={X}_{(n-k)}^{2} -{X}_{(1)}^{2}\) and \(\sum_{j=n-k+1}^{n}(n-j+1){Y}_{j}= \sum_{j=n-k+1}^{n} ({X}_{(j)}^{2} -{X}_{(n-k)}^{2})\). Then,

$${U}_{k}=\frac{\sum_{j=2}^{n-k}{Y}_{j}}{\sum_{j=n-k+1}^{n}(n-j+1){Y}_{j}}=\frac{V}{W}$$
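These telescoping identities are easy to verify numerically; the following sketch checks both sums for an arbitrary increasing sequence of squared order statistics.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 10, 2
s = np.sort(rng.uniform(size=n)) ** 2          # any increasing X_(j)^2 will do
y = np.diff(s)                                 # y[j-2] = Y_j, for j = 2, ..., n

v = s[n - k - 1] - s[0]                        # X_(n-k)^2 - X_(1)^2
w = np.sum(s[n - k:] - s[n - k - 1])           # sum_{j>n-k} (X_(j)^2 - X_(n-k)^2)
weights = n - np.arange(2, n + 1) + 1          # n - j + 1, for j = 2, ..., n
assert np.isclose(v, np.sum(y[: n - k - 1]))                       # V as a sum of Y_j
assert np.isclose(w, np.sum(weights[n - k - 1:] * y[n - k - 1:]))  # W as a weighted sum
```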

The characteristic function of \((V,W)\) is

$${\varphi }_{(V,W)}\left(t,z\right)=E\left({e}^{i(tV+zW)}\right)=E\left({e}^{i\left(t\sum_{j=2}^{n-k}{Y}_{j} + z\sum_{j=n-k+1}^{n}(n-j+1){Y}_{j}\right)}\right)$$
$$={\int }_{0}^{+\infty }\!\!\cdots {\int }_{0}^{+\infty }{e}^{i\left(t\sum_{j=2}^{n-k}{y}_{j}+\sum_{j=n-k+1}^{n}(n-j+1){y}_{j}z\right)}\,{f}_{({Y}_{2},\dots,{Y}_{n})}({y}_{2},\dots,{y}_{n})\,{dy}_{2}\cdots {dy}_{n}$$

Knowing that \({Y}_{j}\), \(j=2,\dots ,n-k\), follows an exponential (gamma with shape parameter 1) distribution with mean \({a}_{j}=\theta (k\beta +n-k-j+1{)}^{-1}\), and that \({Y}_{n-k+j}\), \(j=1,2,\dots ,k\), has the same distribution but with mean \({b}_{j}=\left(\frac{\theta }{\beta }\right)(k-j+1{)}^{-1}\) (see Chikkagoudar and Kunchur 1983), the characteristic function \({\varphi}_{(V,W)}\) is

$${\varphi }_{(V,W)}\left(t,z\right)={\int }_{0}^{+\infty }\!\!\cdots {\int }_{0}^{+\infty }{e}^{it\sum_{j=2}^{n-k}{y}_{j}}\left[\prod_{r=2}^{n-k}\frac{1}{{a}_{r}}{e}^{-{y}_{r}/{a}_{r}}\right] {e}^{iz\sum_{j=n-k+1}^{n}\left(n-j+1\right){y}_{j}}\left[\prod_{r=1}^{k}\frac{1}{{b}_{r}}{e}^{-{y}_{n-k+r}/{b}_{r}}\right]{dy}_{2}\dots {dy}_{n}$$
$$=\prod_{j=2}^{n-k}\left[{\int }_{0}^{+\infty }\frac{1}{{a}_{j}}\,{e}^{-{y}_{j}\left(\frac{1}{{a}_{j}}-it\right)}{dy}_{j}\right] \times \prod_{j=1}^{k}\left[{\int }_{0}^{+\infty }\frac{1}{{b}_{j}}\,{e}^{-{y}_{n-k+j}\left(\frac{1}{{b}_{j}}-iz(k-j+1)\right)}{dy}_{n-k+j}\right]$$

Therefore we have,

$${\varphi }_{(V,W)}\left(t,z\right)=\prod_{j=2}^{n-k}\frac{1}{{a}_{j}}\left(\frac{1}{{a}_{j}}-it\right)^{-1} \times \prod_{j=1}^{k}\frac{1}{{b}_{j}}\left(\frac{1}{{b}_{j}}-iz(k-j+1)\right)^{-1}$$

with \({a}_{j}=\theta (k\beta +n-k-j+1{)}^{-1}\) and \({b}_{j}=(\frac{\theta }{\beta })(k-j+1{)}^{-1}\). The joint density function of \((V,W)\) can therefore be obtained by Fourier inversion:

$${f}_{(V,W)}(v,w)=\frac{1}{{(2\pi )}^{2}}{\int }_{-\infty }^{+\infty }{\int }_{-\infty }^{+\infty }{\varphi }_{(V,W)}\left(t,z\right)\,{e}^{-i\left(tv+zw\right)}\,dt\,dz$$
$$=\frac{1}{{\left(2\pi \right)}^{2}}{\int }_{-\infty }^{+\infty }\left[\prod_{j=2}^{n-k}\frac{1}{{a}_{j}}{\left\{\frac{1}{{a}_{j}}-it\right\}}^{-1}\right]{e}^{-itv}\,dt \times {\int }_{-\infty }^{+\infty }\left[\prod_{j=1}^{k}\frac{1}{{b}_{j}}{\left\{\frac{1}{{b}_{j}}-iz(k-j+1)\right\}}^{-1}\right]{e}^{-izw}\,dz$$
(1)
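Before inverting, the product form of the characteristic function can be sanity-checked by Monte Carlo: simulate independent exponential spacings with the means \(a_j\) and \(b_j\) defined above and compare the empirical characteristic function of \((V,W)\) with the product. A sketch (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, theta, beta = 10, 2, 1.5, 2.0
j_lo = np.arange(2, n - k + 1)
a = theta / (k * beta + n - k - j_lo + 1)           # means of Y_2, ..., Y_{n-k}
b = (theta / beta) / (k - np.arange(1, k + 1) + 1)  # means of Y_{n-k+1}, ..., Y_n

m = 200_000
y_lo = rng.exponential(a, size=(m, a.size))         # one column per spacing
y_hi = rng.exponential(b, size=(m, b.size))
V = y_lo.sum(axis=1)
W = ((np.arange(k, 0, -1)) * y_hi).sum(axis=1)      # weights n - j + 1 = k, ..., 1

t, z = 0.8, -0.5
empirical = np.mean(np.exp(1j * (t * V + z * W)))
product = (np.prod(1 / (1 - 1j * t * a))
           * np.prod(1 / (1 - 1j * z * np.arange(k, 0, -1) * b)))
print(empirical, product)                            # agree up to Monte Carlo error
```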

To find the joint probability density function of the variables \((V,W)\), we first calculate the following products:

$$\prod_{j=2}^{n-k}\frac{1}{\frac{1}{{a}_{j}}-it}=\sum_{j=2}^{n-k}\frac{{(-1)}^{n+j-k-1}{\theta }^{n-k-2}}{\left(it-\frac{1}{{a}_{j}}\right)\left(j-2\right)!\,(n-j-k)!}$$
$$=\frac{\Gamma (-k\beta -n+k+1+it\theta )\,{\theta }^{n-k-1}}{{\left(-1\right)}^{n-k+1}\,\Gamma (-k\beta +it\theta )}$$
(2)
$$\prod_{j=2}^{n-k}\frac{1}{{a}_{j}}=\frac{{\left(-1\right)}^{n-k+1}\Gamma \left(-k\beta \right)}{{\theta }^{n-k-1}\,\Gamma \left(-k\beta -n+k+1\right)}=\frac{\Gamma (k\beta +n-k)}{{\theta }^{n-k-1}\,\Gamma (k\beta +1)}$$
(3)
$$\prod_{j=1}^{k}\frac{1}{\frac{1}{{b}_{j}}-iz(k-j+1)}=\frac{1}{\left(\frac{\beta }{\theta }-iz\right)^{k}\,k!}$$
(4)

And

$$\prod_{j=1}^{k}\frac{1}{{b}_{j}}=k!\left(\frac{\beta }{\theta }\right)^{k}$$
(5)

Substituting Eqs. (2)–(5) into Eq. (1), the joint pdf of the variables \((V,W)\) is obtained as follows:

$${f}_{(V,W)}\left(v,w\right)=\sum_{j=2}^{n-k}\frac{{\left(-1\right)}^{n+j-k}\,\Gamma \left(k\beta +n-k\right)\,v\,{e}^{-v/{a}_{j}}\,{\beta }^{k-1}\,{w}^{k-1}\,{e}^{-w\beta /\theta }}{\Gamma \left(k\beta +1\right)\,{\theta }^{k+2}\,\left(j-2\right)!\,\left(n-j-k\right)!\,\left(k-1\right)!}$$
(6)

In the derivation of the joint pdf of \((V,W)\), we used the fact that

$${\int }_{-\infty }^{+\infty }\frac{{e}^{-itv}}{it-\frac{1}{{a}_{j}}}\,dt=-2\pi {e}^{-v/{a}_{j}},\quad v,w>0,\ \theta >0,\ \beta \ge 1,$$

which follows by closing the contour in the lower half-plane and evaluating the residue at the simple pole \(t=-i/{a}_{j}\).

Consequently, the pdf of \({U}_{k}\) is

$${f}_{{U}_{k}}\left(u\right)=\frac{{\left(-1\right)}^{n-k}\,u\,\Gamma \left(k+2\right)\Gamma \left(k\beta +n-k\right)}{\beta \theta\,\Gamma \left(k\right)\Gamma \left(k\beta +1\right)} \times \sum_{j=2}^{n-k}\frac{{(-1)}^{j}\,{\beta }^{k+1}\,(k\beta +n-k-j+1)}{[\beta +u(k\beta +n-k-j+1){]}^{k+2}\,\Gamma (j-1)\,\Gamma (n-j-k+1)}$$
(7)

Then,

$$\Pr\left\{{U}_{k} < u\right\}=\frac{{\left(-1\right)}^{n-k}{u}^{2}\,\Gamma \left(k+2\right)\Gamma \left(k\beta +n-k\right)}{2{\beta }^{2}\,\Gamma \left(k\right)\Gamma \left(k\beta +1\right)} \times \sum_{j=2}^{n-k}\frac{{\left(-1\right)}^{j}\left(k\beta +n-k-j+1\right)}{\Gamma \left(j-1\right)\Gamma \left(n-j-k+1\right)}\ {}_{2}F_{1}\left(2,\,k+1;\,3;\,-\frac{\left(k\beta +n-k-j+1\right)u}{\beta }\right),\quad u>0$$
(8)

The distribution function of \({Z}_{k}^{*}\) can then be found from (8) using the relation

$$\Pr\left\{{Z}_{k}^{*}<z \mid {H}_{k}\right\}=\Pr\left\{{U}_{k}<\frac{z}{1-kz}\ \Big|\ {H}_{k}\right\},\quad 0<z<1/k,$$

and the proof is complete.

Corollary:

Under \({H}_{0}\), the distribution of the statistic \({Z}_{k}^{*}\) is obtained from Theorem 3.1 by setting \(\beta =1\).

4 Power comparison of the tests and conclusions

The critical values of the statistics \({Z}_{k}^{*}\) and \({D}_{k}\), for the significance levels \(\alpha =0.05\) and \(\alpha =0.1\), for \(k=1,2,\dots\) with \(k<n\) and \(n=8\left(1\right)12\) (i.e., \(n=8,9,\dots ,12\)), are given in Tables 1 and 2, respectively. The Dixon statistic \({D}_{k}\) is given by

Table 1 Critical values of \({Z}_{k}^{*}\) for \(\alpha =0.05\) and \(\alpha =0.1\)
Table 2 Critical values of \({D}_{k}\) for \(\alpha =0.05\) and \(\alpha =0.1\)
$${D}_{k}=\frac{{X}_{(n)}-{X}_{(n-k)}}{{X}_{(n)}}$$

For more details about the distribution of the Dixon statistic, see Likes (1966) and Chikkagoudar and Kunchur (1983).
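For completeness, a one-line implementation of \({D}_{k}\) in the same style as the earlier sketches (the name `dixon_d` is ours):

```python
import numpy as np

def dixon_d(x, k):
    """Dixon-type statistic D_k = (X_(n) - X_(n-k)) / X_(n) for k upper outliers."""
    s = np.sort(np.asarray(x, dtype=float))
    return (s[-1] - s[-k - 1]) / s[-1]
```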

According to Tables 1 and 2, the critical value of \({Z}_{k}^{*}\) increases as \(n\) increases, whereas the critical value of \({D}_{k}\) decreases as \(n\) increases. Conversely, the critical value of \({Z}_{k}^{*}\) decreases as \(k\) increases, whereas the critical value of \({D}_{k}\) increases as \(k\) increases.
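The behavior of the critical values, and the power comparison itself, can be checked in outline by simulation. The sketch below reuses the hypothetical `z_star` and `dixon_d` helpers defined earlier and estimates critical values from simulated null samples rather than from Tables 1 and 2; since the slippage model replaces \(\theta\) by \(\theta /\beta\) for the contaminated observations, the sketch should be read as a template under that stated parametrization, not as a reproduction of the paper's exact figures.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n, k, theta, beta):
    """n - k observations from F(x; theta) and k from F(x; theta/beta)."""
    x = np.sqrt(-theta * np.log(rng.uniform(size=n)))
    x[-k:] = np.sqrt(-(theta / beta) * np.log(rng.uniform(size=k)))
    return x

def power(stat, n, k, beta, alpha=0.05, reps=20_000, theta=1.0):
    """Estimate the critical value under H0 (beta = 1), then the power under H_k."""
    null = np.array([stat(sample(n, k, theta, 1.0), k) for _ in range(reps)])
    crit = np.quantile(null, 1 - alpha)            # simulated critical value
    alt = np.array([stat(sample(n, k, theta, beta), k) for _ in range(reps)])
    return np.mean(alt > crit)

print(power(z_star, n=10, k=2, beta=3.0), power(dixon_d, n=10, k=2, beta=3.0))
```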