1 Introduction

Bol'shev (1969) generalized Chauvenet's test for rejecting outlying observations (see Bol'shev 1969; Voinov and Nikulin 1993, 1996). This method is suitable for detecting k outliers in a univariate data set, and Chauvenet's test can be used in the exponential case. Ibragimov and Khalna (1978) considered various modifications of this test. Several authors have considered the problem of testing for a single outlier in the exponential distribution (Chikkagoudar and Kunchur 1983; Kabe 1970; Lewis and Fieller 1979; Likes 1966). Only two types of statistics exist for testing multiple outliers: the first is Dixon's, while the second is based on the ratio of the sum of the observations suspected to be outliers to the sum of all observations in the sample. Most of these authors considered the general case of a gamma model, with the results for the exponential model given as a special case. This approach focuses on alternative models, namely slippage alternatives in exponential samples (see Barnett and Lewis 1978). Zerbet and Nikulin (2003) proposed a statistic, different from the well-known Dixon statistic \({D}_{k}\), for detecting multiple outliers. In this paper, we generalize the statistic \({Z}_{k}^{*}\) for detecting outliers in the Rayleigh distribution. The distribution of the test statistic under slippage alternatives is obtained, and tables of critical values are given for various sample sizes n and numbers of outliers k. The powers of these tests are also calculated and compared. The results show that the test based on the statistic \({Z}_{k}^{*}\) is more powerful than the test based on the statistic \({D}_{k}\).

2 Statistical inference

Let \({X}_{1},{X}_{2},\dots ,{X}_{n}\) be independent random variables. In this paper, we want to test the hypothesis \({H}_{0}\): \({X}_{1},{X}_{2},\dots ,{X}_{n}\) come from a Rayleigh distribution, i.e.

$$\Pr \{ X \le x \mid H_{0} \} = F(x;\theta ) = 1 - \exp \left( - \frac{x^{2}}{\theta } \right),\quad \theta > 0,\ \theta \ {\text{is}}\ {\text{unknown}}$$

The probability density function of these variables under the null hypothesis is therefore:

$$f_{X} (x;\theta ) = \frac{2}{\theta }\,x\exp \left( - \frac{x^{2}}{\theta } \right),\quad \theta > 0,\ x > 0$$
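For intuition, this model is easy to simulate. Below is a minimal Python sketch (the helper name `rvs_rayleigh` is ours, not from the paper) that draws from this parametrization by inverse transform; note that \(X^2\) is then exponential with mean \(\theta\), a fact used repeatedly in Sect. 3.

```python
import numpy as np

def rvs_rayleigh(theta, size, seed=None):
    """Draw from F(x; theta) = 1 - exp(-x^2 / theta) by inverse transform.

    Note: this parametrization differs from numpy's `rayleigh(scale)`,
    which uses 1 - exp(-x^2 / (2*scale^2)); here theta = 2*scale^2.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    return np.sqrt(-theta * np.log(u))

x = rvs_rayleigh(theta=2.0, size=100_000, seed=0)
print(x.mean(), (x**2).mean())  # E[X] = sqrt(pi*theta)/2 ≈ 1.253, E[X^2] = theta = 2.0
```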

Under the slippage alternative \({H}_{k}\), however, we have:

$$X_{(1)} ,X_{(2)} , \ldots ,X_{(n - k)} \ {\text{come}}\ {\text{from}}\ F(x;\theta )$$
$$X_{(n - k + 1)} ,X_{(n - k + 2)} , \ldots ,X_{(n)} \ {\text{come}}\ {\text{from}}\ F(x;\theta /\beta )$$

where \(\beta \ge 1\), \(\beta\) is unknown, and \({X}_{(1)},{X}_{(2)},\dots ,{X}_{(n)}\) are the order statistics corresponding to the observations \({X}_{1},{X}_{2},\dots ,{X}_{n}\). This hypothesis can be considered an important sub-hypothesis of the hypothesis that k of the n observations are suspected to be outliers (for \(\beta >1\), these k observations are called upper outliers). The hypothesis \({H}_{0}\) corresponds to \(\beta =1\). To test \({H}_{0}\), we propose the following statistic:

$${Z}_{k}^{*}=\frac{{X}_{(n-k)}^{2} -{X}_{(1)}^{2}}{\sum_{j=n-k+1}^{n}({X}_{(j)}^{2} -{X}_{(1)}^{2})}$$

For \(m=1\), i.e., with the squares \({X}_{(j)}^{2}\) replaced by the observations \({X}_{(j)}\) themselves, the above statistic \({Z}_{k}^{*}\) reduces to the one proposed by Zerbet and Nikulin (2003).

Following the idea of Chauvenet's test, the decision criterion is: the hypothesis \({H}_{0}\) is rejected when \({Z}_{k}^{*}>{z}_{c}\), where \({z}_{c}={z}_{c}(\alpha )\) is the critical value corresponding to the significance level \(\alpha\).
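As an illustration, \({Z}_{k}^{*}\) and the decision rule can be computed as follows. This is a sketch under our own naming (`z_star`, `reject_h0`), with the critical value \(z_c\) taken from Table 1.

```python
import numpy as np

def z_star(x, k):
    """Z_k^* = (X_(n-k)^2 - X_(1)^2) / sum_{j=n-k+1}^{n} (X_(j)^2 - X_(1)^2)."""
    s = np.sort(np.asarray(x, dtype=float)) ** 2   # squared order statistics
    num = s[-k - 1] - s[0]                         # X_(n-k)^2 - X_(1)^2
    den = np.sum(s[-k:] - s[0])                    # sum over the k largest
    return num / den

def reject_h0(x, k, z_crit):
    """Reject H0 (no outliers) when Z_k^* exceeds the critical value z_c(alpha)."""
    return z_star(x, k) > z_crit
```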

3 The distribution of the statistic \({Z}_{k}^{*}\) under alternatives

In this section, we derive the distribution of the statistic \({Z}_{k}^{*}\) following the method of Zerbet and Nikulin (2003). The distribution of this statistic under the slippage alternative hypothesis \({H}_{k}\) is given by the following theorem.

Theorem 3.1.

The distribution of the statistic \({Z}_{k}^{*}\) under \({H}_{k}\) is given by:

$$\Pr\left\{{Z}_{k}^{*}<z \mid {H}_{k}\right\}=\frac{{(-1)}^{n-k}\,\Gamma \left(k\beta +n-k\right)\Gamma \left(k+2\right)}{2{\beta }^{2}\,\Gamma \left(k\right)\Gamma \left(k\beta +1\right)}\left(\frac{z}{1-kz}\right)^{2}$$
$$\times \sum_{j=2}^{n-k}\frac{{(-1)}^{j}\,(k\beta +n-k-j+1)}{\Gamma (j-1)\,\Gamma (n-j-k+1)}\ {}_{2}F_{1}\left(2,\,k+1;\,3;\,-\frac{z(k\beta +n-k-j+1)}{(1-kz)\beta }\right),\quad 0<z<\frac{1}{k}$$

where

$${}_{2}F_{1}({a}_{0},{a}_{1};{b}_{1};z)=\sum_{j=0}^{\infty }\frac{\Gamma ({a}_{0}+j)\,\Gamma ({a}_{1}+j)\,\Gamma ({b}_{1})}{\Gamma \left({a}_{0}\right)\Gamma \left({a}_{1}\right)\Gamma ({b}_{1}+j)}\,\frac{{z}^{j}}{j!}$$
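This Gauss series can be evaluated directly; a short check against SciPy's implementation (for \(|z|<1\), where the series converges) might look like this:

```python
from scipy.special import hyp2f1

def hyp2f1_series(a0, a1, b1, z, nterms=60):
    """Evaluate the Gauss series 2F1(a0, a1; b1; z) term by term (|z| < 1).

    The gamma-ratio coefficient from the definition above is accumulated as a
    running product, which avoids overflowing the gamma function for large j.
    """
    term, total = 1.0, 1.0                          # j = 0 term is 1
    for j in range(nterms):
        term *= (a0 + j) * (a1 + j) / ((b1 + j) * (j + 1)) * z
        total += term
    return total

print(hyp2f1_series(2, 3, 3, -0.4), hyp2f1(2, 3, 3, -0.4))  # both ≈ 1.4**-2 ≈ 0.5102
```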

Proof.

To prove this theorem, we must obtain the distribution of the statistic \({Z}_{k}^{*}\) under the alternative hypothesis \({H}_{k}\).

We first compute the corresponding alternative distribution of the statistic:

$${U}_{k}=\frac{{X}_{(n-k)}^{2} -{X}_{(1)}^{2}}{\sum_{j=n-k+1}^{n}({X}_{(j)}^{2} -{X}_{(n-k)}^{2})}=\frac{V}{W},\quad k\ge 1$$

where \(V={X}_{(n-k)}^{2} -{X}_{(1)}^{2}\) and \(W=\sum_{j=n-k+1}^{n}({X}_{(j)}^{2} -{X}_{(n-k)}^{2})\).

Let \({Y}_{j}={X}_{(j)}^{2} -{X}_{(j-1)}^{2}\). We then immediately obtain:

\(\sum_{j=2}^{n-k}{Y}_{j}={X}_{(n-k)}^{2} -{X}_{(1)}^{2}\) and \(\sum_{j=n-k+1}^{n}(n-j+1){Y}_{j}= \sum_{j=n-k+1}^{n} ({X}_{(j)}^{2} -{X}_{(n-k)}^{2})\). Then,

$${U}_{k}=\frac{\sum_{j=2}^{n-k}{Y}_{j}}{\sum_{j=n-k+1}^{n}(n-j+1){Y}_{j}}=\frac{V}{W}$$
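These telescoping identities are easy to verify numerically; the following sketch checks both sums for an arbitrary increasing sequence of squared order statistics.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 10, 2
s = np.sort(rng.uniform(size=n)) ** 2          # any increasing X_(j)^2 will do
y = np.diff(s)                                 # y[j-2] = Y_j, for j = 2, ..., n

v = s[n - k - 1] - s[0]                        # X_(n-k)^2 - X_(1)^2
w = np.sum(s[n - k:] - s[n - k - 1])           # sum_{j>n-k} (X_(j)^2 - X_(n-k)^2)
weights = n - np.arange(2, n + 1) + 1          # n - j + 1, for j = 2, ..., n
assert np.isclose(v, np.sum(y[: n - k - 1]))                       # V as a sum of Y_j
assert np.isclose(w, np.sum(weights[n - k - 1:] * y[n - k - 1:]))  # W as a weighted sum
```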

The characteristic function of \((V,W)\) is

$${\varphi }_{(V,W)}\left(t,z\right)=E\left({e}^{i(tV+zW)}\right)=E\left({e}^{i\left(t\sum_{j=2}^{n-k}{Y}_{j} + z\sum_{j=n-k+1}^{n}(n-j+1){Y}_{j}\right)}\right)$$
$$={\int }_{0}^{+\infty }\!\!\cdots {\int }_{0}^{+\infty }{e}^{i\left(t\sum_{j=2}^{n-k}{y}_{j}+\sum_{j=n-k+1}^{n}(n-j+1){y}_{j}z\right)}\,{f}_{({Y}_{2},\dots,{Y}_{n})}({y}_{2},\dots,{y}_{n})\,{dy}_{2}\cdots {dy}_{n}$$

Knowing that \({Y}_{j}\), \(j=2,\dots ,n-k\), follows an exponential (gamma with shape parameter 1) distribution with mean \({a}_{j}=\theta (k\beta +n-k-j+1{)}^{-1}\), and that \({Y}_{n-k+j}\), \(j=1,2,\dots ,k\), has the same distribution but with mean \({b}_{j}=\left(\frac{\theta }{\beta }\right)(k-j+1{)}^{-1}\) (see Chikkagoudar and Kunchur 1983), the characteristic function \({\varphi}_{(V,W)}\) is

$${\varphi }_{(V,W)}\left(t,z\right)={\int }_{0}^{+\infty }\!\!\cdots {\int }_{0}^{+\infty }{e}^{it\sum_{j=2}^{n-k}{y}_{j}}\left[\prod_{r=2}^{n-k}\frac{1}{{a}_{r}}{e}^{-{y}_{r}/{a}_{r}}\right] {e}^{iz\sum_{j=n-k+1}^{n}\left(n-j+1\right){y}_{j}}\left[\prod_{r=1}^{k}\frac{1}{{b}_{r}}{e}^{-{y}_{n-k+r}/{b}_{r}}\right]{dy}_{2}\dots {dy}_{n}$$
$$=\prod_{j=2}^{n-k}\left[{\int }_{0}^{+\infty }\frac{1}{{a}_{j}}\,{e}^{-{y}_{j}\left(\frac{1}{{a}_{j}}-it\right)}{dy}_{j}\right] \times \prod_{j=1}^{k}\left[{\int }_{0}^{+\infty }\frac{1}{{b}_{j}}\,{e}^{-{y}_{n-k+j}\left(\frac{1}{{b}_{j}}-iz(k-j+1)\right)}{dy}_{n-k+j}\right]$$

Therefore we have,

$${\varphi }_{(V,W)}\left(t,z\right)=\prod_{j=2}^{n-k}\frac{1}{{a}_{j}}\left(\frac{1}{{a}_{j}}-it\right)^{-1} \times \prod_{j=1}^{k}\frac{1}{{b}_{j}}\left(\frac{1}{{b}_{j}}-iz(k-j+1)\right)^{-1}$$

with \({a}_{j}=\theta (k\beta +n-k-j+1{)}^{-1}\) and \({b}_{j}=(\frac{\theta }{\beta })(k-j+1{)}^{-1}\). The joint density function of \((V,W)\) can therefore be obtained by Fourier inversion:

$${f}_{(V,W)}(v,w)=\frac{1}{{(2\pi )}^{2}}{\int }_{-\infty }^{+\infty }{\int }_{-\infty }^{+\infty }{\varphi }_{(V,W)}\left(t,z\right)\,{e}^{-i\left(tv+zw\right)}\,dt\,dz$$
$$=\frac{1}{{\left(2\pi \right)}^{2}}{\int }_{-\infty }^{+\infty }\left[\prod_{j=2}^{n-k}\frac{1}{{a}_{j}}{\left\{\frac{1}{{a}_{j}}-it\right\}}^{-1}\right]{e}^{-itv}\,dt \times {\int }_{-\infty }^{+\infty }\left[\prod_{j=1}^{k}\frac{1}{{b}_{j}}{\left\{\frac{1}{{b}_{j}}-iz(k-j+1)\right\}}^{-1}\right]{e}^{-izw}\,dz$$
(1)
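Before inverting, the product form of the characteristic function can be sanity-checked by Monte Carlo: simulate independent exponential spacings with the means \(a_j\) and \(b_j\) defined above and compare the empirical characteristic function of \((V,W)\) with the product. A sketch (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, theta, beta = 10, 2, 1.5, 2.0
j_lo = np.arange(2, n - k + 1)
a = theta / (k * beta + n - k - j_lo + 1)           # means of Y_2, ..., Y_{n-k}
b = (theta / beta) / (k - np.arange(1, k + 1) + 1)  # means of Y_{n-k+1}, ..., Y_n

m = 200_000
y_lo = rng.exponential(a, size=(m, a.size))         # one column per spacing
y_hi = rng.exponential(b, size=(m, b.size))
V = y_lo.sum(axis=1)
W = ((np.arange(k, 0, -1)) * y_hi).sum(axis=1)      # weights n - j + 1 = k, ..., 1

t, z = 0.8, -0.5
empirical = np.mean(np.exp(1j * (t * V + z * W)))
product = (np.prod(1 / (1 - 1j * t * a))
           * np.prod(1 / (1 - 1j * z * np.arange(k, 0, -1) * b)))
print(empirical, product)                            # agree up to Monte Carlo error
```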

To find the joint probability density function of the variables \((V,W)\), we first calculate the following products:

$$\prod_{j=2}^{n-k}\frac{1}{\frac{1}{{a}_{j}}-it}=\sum_{j=2}^{n-k}\frac{{(-1)}^{n+j-k-1}{\theta }^{n-k-2}}{\left(it-\frac{1}{{a}_{j}}\right)\left(j-2\right)!\,(n-j-k)!}$$
$$=\frac{\Gamma (-k\beta -n+k+1+it\theta )\,{\theta }^{n-k-1}}{{\left(-1\right)}^{n-k+1}\,\Gamma (-k\beta +it\theta )}$$
(2)
$$\prod_{j=2}^{n-k}\frac{1}{{a}_{j}}=\frac{{\left(-1\right)}^{n-k+1}\Gamma \left(-k\beta \right)}{{\theta }^{n-k-1}\,\Gamma \left(-k\beta -n+k+1\right)}=\frac{\Gamma (k\beta +n-k)}{{\theta }^{n-k-1}\,\Gamma (k\beta +1)}$$
(3)
$$\prod_{j=1}^{k}\frac{1}{\frac{1}{{b}_{j}}-iz(k-j+1)}=\frac{1}{\left(\frac{\beta }{\theta }-iz\right)^{k}\,k!}$$
(4)

And

$$\prod_{j=1}^{k}\frac{1}{{b}_{j}}=k!\left(\frac{\beta }{\theta }\right)^{k}$$
(5)

Substituting Eqs. (2)–(5) into Eq. (1), the joint pdf of the variables \((V,W)\) is obtained as follows:

$${f}_{(V,W)}\left(v,w\right)=\sum_{j=2}^{n-k}\frac{{\left(-1\right)}^{n+j-k}\,\Gamma \left(k\beta +n-k\right)\,v\,{e}^{-v/{a}_{j}}\,{\beta }^{k-1}\,{w}^{k-1}\,{e}^{-w\beta /\theta }}{\Gamma \left(k\beta +1\right)\,{\theta }^{k+2}\,\left(j-2\right)!\,\left(n-j-k\right)!\,\left(k-1\right)!}$$
(6)

In the derivation of the joint pdf of \((V,W)\), we used the fact that

$${\int }_{-\infty }^{+\infty }\frac{{e}^{-itv}}{it-\frac{1}{{a}_{j}}}\,dt=-2\pi {e}^{-v/{a}_{j}},\quad v,w>0,\ \theta >0,\ \beta \ge 1,$$

which follows by closing the contour in the lower half-plane and evaluating the residue at the simple pole \(t=-i/{a}_{j}\).

Consequently, the pdf of \({U}_{k}\) is

$${f}_{{U}_{k}}\left(u\right)=\frac{{\left(-1\right)}^{n-k}\,u\,\Gamma \left(k+2\right)\Gamma \left(k\beta +n-k\right)}{\beta \theta\,\Gamma \left(k\right)\Gamma \left(k\beta +1\right)} \times \sum_{j=2}^{n-k}\frac{{(-1)}^{j}\,{\beta }^{k+1}\,(k\beta +n-k-j+1)}{[\beta +u(k\beta +n-k-j+1){]}^{k+2}\,\Gamma (j-1)\,\Gamma (n-j-k+1)}$$
(7)

Then,

$$\Pr\left\{{U}_{k} < u\right\}=\frac{{\left(-1\right)}^{n-k}{u}^{2}\,\Gamma \left(k+2\right)\Gamma \left(k\beta +n-k\right)}{2{\beta }^{2}\,\Gamma \left(k\right)\Gamma \left(k\beta +1\right)} \times \sum_{j=2}^{n-k}\frac{{\left(-1\right)}^{j}\left(k\beta +n-k-j+1\right)}{\Gamma \left(j-1\right)\Gamma \left(n-j-k+1\right)}\ {}_{2}F_{1}\left(2,\,k+1;\,3;\,-\frac{\left(k\beta +n-k-j+1\right)u}{\beta }\right),\quad u>0$$
(8)

The distribution function of \({Z}_{k}^{*}\) can then be found from (8) using the relation

$$\Pr\left\{{Z}_{k}^{*}<z \mid {H}_{k}\right\}=\Pr\left\{{U}_{k}<\frac{z}{1-kz}\ \Big|\ {H}_{k}\right\},\quad 0<z<1/k,$$

and the proof is complete.

Corollary:

Under \({H}_{0}\), the distribution of the statistic \({Z}_{k}^{*}\) is obtained from Theorem 3.1 by setting \(\beta =1\).

4 Power comparison of the tests and conclusions

The critical values of the statistics \({Z}_{k}^{*}\) and \({D}_{k}\), for the significance levels \(\alpha =0.05\) and \(\alpha =0.1\), for \(k=1,2,\dots\) with \(k<n\) and \(n=8\left(1\right)12\) (i.e., \(n=8,9,\dots ,12\)), are given in Tables 1 and 2, respectively. The Dixon statistic \({D}_{k}\) is given by

Table 1 Critical values of \({Z}_{k}^{*}\) for \(\alpha =0.05\) and \(\alpha =0.1\)
Table 2 Critical values of \({D}_{k}\) for \(\alpha =0.05\) and \(\alpha =0.1\)
$${D}_{k}=\frac{{X}_{(n)}-{X}_{(n-k)}}{{X}_{(n)}}$$

For more details about the distribution of the Dixon statistic, see Likes (1966) and Chikkagoudar and Kunchur (1983).
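For completeness, a one-line implementation of \({D}_{k}\) in the same style as the earlier sketches (the name `dixon_d` is ours):

```python
import numpy as np

def dixon_d(x, k):
    """Dixon-type statistic D_k = (X_(n) - X_(n-k)) / X_(n) for k upper outliers."""
    s = np.sort(np.asarray(x, dtype=float))
    return (s[-1] - s[-k - 1]) / s[-1]
```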

According to Tables 1 and 2, the critical value of \({Z}_{k}^{*}\) increases as \(n\) increases, whereas the critical value of \({D}_{k}\) decreases as \(n\) increases. Conversely, the critical value of \({Z}_{k}^{*}\) decreases as \(k\) increases, whereas the critical value of \({D}_{k}\) increases as \(k\) increases.
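The behavior of the critical values, and the power comparison itself, can be checked in outline by simulation. The sketch below reuses the hypothetical `z_star` and `dixon_d` helpers defined earlier and estimates critical values from simulated null samples rather than from Tables 1 and 2; since the slippage model replaces \(\theta\) by \(\theta /\beta\) for the contaminated observations, the sketch should be read as a template under that stated parametrization, not as a reproduction of the paper's exact figures.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n, k, theta, beta):
    """n - k observations from F(x; theta) and k from F(x; theta/beta)."""
    x = np.sqrt(-theta * np.log(rng.uniform(size=n)))
    x[-k:] = np.sqrt(-(theta / beta) * np.log(rng.uniform(size=k)))
    return x

def power(stat, n, k, beta, alpha=0.05, reps=20_000, theta=1.0):
    """Estimate the critical value under H0 (beta = 1), then the power under H_k."""
    null = np.array([stat(sample(n, k, theta, 1.0), k) for _ in range(reps)])
    crit = np.quantile(null, 1 - alpha)            # simulated critical value
    alt = np.array([stat(sample(n, k, theta, beta), k) for _ in range(reps)])
    return np.mean(alt > crit)

print(power(z_star, n=10, k=2, beta=3.0), power(dixon_d, n=10, k=2, beta=3.0))
```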