Analysis of variance in uncertain environments
Abstract
One-way analysis of variance (ANOVA) is used to analyze experimental data in which there is a continuous response variable and a single independent classification variable. In this paper, we extend one-way ANOVA to the case where the observed data are imprecise numbers rather than real numbers. Several fast computational formulas are derived for symmetric triangular and normal fuzzy data. As in classical ANOVA, the total observed variation in the response variable is decomposed into the observed variation due to the effects of the classification variable and the observed variation due to random error. A real case study is given to illustrate the proposed method.
Keywords
Symmetric triangular fuzzy number, Symmetric normal fuzzy number, Arithmetic operations, Testing hypotheses, Analysis of variance
Introduction and literature review
ANOVA
Analysis of variance (ANOVA), which was developed by Ronald Fisher, is concerned with analyzing variation in the means of several independent populations. The central element of traditional ANOVA is a test of the significance of the differences among population means. This test permits the user to conclude whether the differences among the means of several populations are too large to be attributed to sampling error [7]. The basic formulas of the classical ANOVA model can be found in any statistics textbook [1, 10, 19]; classical ANOVA is briefly reviewed in “Classical ANOVA” of this manuscript, following [21].
In many applied sciences, such as geology, astrology, economics, medical sciences, engineering and environmental sciences, there are real cases in which imprecise quantities are assigned to experimental results. In such cases, non-precise numbers are suitable models for formalizing and handling these imprecise quantities, which justifies the use of fuzzy sets in ANOVA. Previous related work on fuzzy ANOVA is listed in the next subsection.
Fuzzy ANOVA
Over the past three decades, several works applying fuzzy set theory to different areas of ANOVA have been published. They include: a one-way ANOVA testing procedure based on normal fuzzy random variables (FRVs) [18]; an asymptotic bootstrap ANOVA test for FRVs based on one sample [17]; an ANOVA method for functional data in Hilbert space [2]; an asymptotic bootstrap ANOVA test for simple FRVs based on multiple samples [7, 8]; an optimization method for the one-way ANOVA problem using the cuts of FRVs when observations are imprecise [29]; an investigation of one-way fuzzy ANOVA and its comparison with a regression model [4]; an ANOVA processing method for vague data using moment correction [15]; one-way ANOVA for triangular fuzzy numbers based on the extension principle [21]; one-way ANOVA for closed-interval observations [9]; the R package ‘SAFD’ for ANOVA tests based on defuzzification of FRVs [16]; an extension of the classical ANOVA test statistics to the fuzzy means of imprecise observations using a bootstrap approach [25]; an ANOVA test for FRVs based on the least squares approach [12]; and a decision-rule approach for testing fuzzy hypotheses in ANOVA with triangular imprecise observations and vague hypotheses [13].
Unlike these studies, this paper presents another simple method for one-way ANOVA, as a meaningful extension of classical ANOVA, for observations that are symmetric triangular or normal fuzzy numbers. The proposed method is based on a distance between symmetric fuzzy numbers and is easy for practitioners to use.
The paper is organized as follows. After symmetric triangular and normal fuzzy numbers are introduced, some arithmetic operations on imprecise numbers are reviewed in “Preliminaries”. In “Classical ANOVA”, classical ANOVA is briefly explained following [21]. In “Analysis of variance based on fuzzy observations”, a new ANOVA test for fuzzy observations is provided and discussed. In “Fast computation formulas”, several fast computational formulas are presented for symmetric triangular and normal fuzzy data. A case study on several brands of soap, presented in “A case study on the process of soap production”, illustrates the ideas of this paper. Conclusions are given in the final section.
Preliminaries
In this section, we briefly review some definitions that are used throughout this paper. Let X be a universal set and \(F(X)=\left\{ {A|A:X\rightarrow [0,1]} \right\} \). Any \(A\in F(X)\) is called a fuzzy set on X. In particular, let F(R) be the set of fuzzy numbers on R (the set of real numbers). The \(\alpha \)-cut of A is the crisp set given by \(A_\alpha =\{x\in X| A(x)\ge \alpha \}\), for any \(\alpha \in (0,1]\).
Definition 2.1
- (i) A special case of fuzzy numbers, called a symmetric triangular fuzzy number (STFN), is defined by
$$\begin{aligned} \widetilde{T}(x)=\left\{ {\begin{array}{l} {(x-a+s_a )}/{s_a } \quad {\text {if}} \, a-s_a \le x<a \\ {(a+s_a -x)}/{s_a } \quad {\text {if}} \, a\le x<a+s_a \\ 0 \quad {\text {elsewhere,}} \\ \end{array}} \right. \end{aligned}$$ (1)
and it is denoted by \(T(a,s_a )\).
- (ii) Another special case of fuzzy numbers, called a normal fuzzy number (NFN), is defined by
$$\begin{aligned} \widetilde{N}(x)=\exp \left[ {-\left( {\frac{x-a}{s_a }} \right) ^{2}} \right] ,\hbox { } x\in R, s_a >0 \end{aligned}$$ (2)
and it is denoted by \(N(a,s_a )\). Here, \(a\in R\) and \(s_a \in R^{+}\) are called the core/mean and the spread of STFNs and NFNs, respectively. Moreover, \(F_T (R)\) and \(F_N (R)\) denote the sets of all STFNs and NFNs on R [23].
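The two membership functions in Eqs. (1) and (2) can be evaluated directly. The following is a minimal Python sketch, assuming plain floats for the core `a` and the spread `s`; the function names `stfn_membership` and `nfn_membership` are illustrative and do not come from the paper.

```python
import math


def stfn_membership(x: float, a: float, s: float) -> float:
    """Membership degree of x in the symmetric triangular fuzzy number T(a, s), Eq. (1)."""
    if s <= 0:
        raise ValueError("the spread s must be positive")
    if a - s <= x < a:          # rising branch
        return (x - a + s) / s
    if a <= x < a + s:          # falling branch
        return (a + s - x) / s
    return 0.0                  # outside the support


def nfn_membership(x: float, a: float, s: float) -> float:
    """Membership degree of x in the normal fuzzy number N(a, s), Eq. (2)."""
    if s <= 0:
        raise ValueError("the spread s must be positive")
    return math.exp(-((x - a) / s) ** 2)


if __name__ == "__main__":
    # Degrees at the core and half a spread below it (hypothetical values).
    print(stfn_membership(2.0, 2.0, 0.5), stfn_membership(1.75, 2.0, 0.5))
    print(nfn_membership(2.0, 2.0, 0.5), nfn_membership(2.5, 2.0, 0.5))
```

Both functions equal 1 at the core; the STFN has bounded support, whereas the NFN is positive on the whole real line.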
The following distance will be used throughout Sect. 4 to solve the problem of testing ANOVA based on fuzzy observations.
Definition 2.2
Remark 2.1
Theorem 2.1
Proof
For any \(\alpha \in (0,1]\),
\(\tilde{A}_\alpha =\left[ {a-s_a (1-\alpha ) , a+s_a (1-\alpha )} \right] \) and \(\tilde{B}_\alpha =[ b-s_b (1-\alpha ) , b+s_b (1-\alpha ) ]\).
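The α-cut expression used in this proof is easy to check numerically. A minimal Python sketch, assuming an STFN is given by its core `a` and spread `s`; the helper name `stfn_alpha_cut` is hypothetical.

```python
from typing import Tuple


def stfn_alpha_cut(a: float, s: float, alpha: float) -> Tuple[float, float]:
    """Alpha-cut [a - s(1 - alpha), a + s(1 - alpha)] of the STFN T(a, s)."""
    if s <= 0:
        raise ValueError("the spread s must be positive")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    half_width = s * (1.0 - alpha)
    return (a - half_width, a + half_width)


if __name__ == "__main__":
    # The cut shrinks to the single point {a} as alpha approaches 1.
    for alpha in (0.25, 0.5, 1.0):
        print(alpha, stfn_alpha_cut(2.0, 0.5, alpha))
```

The interval is symmetric about the core and its half-width decreases linearly in α, which is what makes closed-form computations with STFNs tractable.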
Theorem 2.2
Proof
Classical ANOVA
Details of ANOVA
Source of variation | SS | Degrees of freedom | MS | F |
---|---|---|---|---|
Between treatments | \({\text {SSTR}}=\sum \limits _{i=1}^r {n_i (\overline{Y_{i.} } -\overline{Y_{..} } )^{2}} \) | \(r-1\) | \({\text {MSTR}}=\frac{\text {SSTR}}{r-1}\) | \(F=\frac{\text {MSTR}}{\text {MSE}}\sim F_{r-1,n_t -r} \) |
Within treatments (error) | \({\text {SSE}}=\sum \limits _{i=1}^r {\sum \limits _{j=1}^{n_i } {(Y_{ij} -\overline{Y_{i.} } )^{2}} } \) | \(n_t -r\) | \({\text {MSE}}=\frac{\text {SSE}}{n_t -r}\) | |
Total | \({\text {SST}}=\sum \limits _{i=1}^r {\sum \limits _{j=1}^{n_i } {(Y_{ij} -\overline{Y_{..} } )^{2}} } \) | \(n_t -1\) | | |
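The quantities in the table above can be computed with a few lines of code. The following Python sketch implements the classical (crisp) one-way ANOVA decomposition exactly as tabulated; the function name `one_way_anova` and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Sequence


def one_way_anova(groups: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Classical one-way ANOVA: SSTR, SSE, SST, MSTR, MSE and F for crisp data."""
    r = len(groups)                              # number of treatments
    n_t = sum(len(g) for g in groups)            # total number of observations
    if r < 2 or n_t <= r:
        raise ValueError("need at least 2 treatments and more observations than treatments")
    grand_mean = sum(sum(g) for g in groups) / n_t
    means = [sum(g) / len(g) for g in groups]    # treatment means
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)
    mstr = sstr / (r - 1)
    mse = sse / (n_t - r)
    f = mstr / mse if mse > 0 else float("inf")
    return {"SSTR": sstr, "SSE": sse, "SST": sst, "MSTR": mstr, "MSE": mse, "F": f}


if __name__ == "__main__":
    toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]  # hypothetical data
    print(one_way_anova(toy))
```

A quick sanity check on any output is the identity SST = SSTR + SSE, which holds up to floating-point rounding.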
Analysis of variance based on fuzzy observations
In light of the above discussion, in this section and hereafter we are concerned with a classical analysis of variance in which all components of the problem, such as the random variables, the hypotheses and the population parameters, are crisp, as introduced in “Classical ANOVA”. The ANOVA model considered is therefore \(Y_{ij} =\mu _i +\varepsilon _{ij} \), where \(\varepsilon _{ij} \sim N(0,\sigma ^{2})\), and so the \(Y_{ij}\)’s are ordinary random variables. The only departure from the classical ANOVA assumptions in model (7) is that the sampled observations are fuzzy numbers rather than real numbers; nothing else in the classical ANOVA test is altered.
Remark 4.1
- (i) Although the square of an LR fuzzy number can be computed by the extension principle, the result does not necessarily belong to the class of LR fuzzy numbers [23]. Hence, one has to calculate the membership function of a sum of fuzzy numbers whose shape functions are not necessarily the same, based on Zadeh’s extension principle.
- (ii) Calculating \(\widetilde{\text {sst}}\), \(\widetilde{\text {sstr}}\) and \(\widetilde{\text {sse}}\) by Zadeh’s extension principle leads to a very vague observed statistic at the end of the computations, since the experimenter usually faces \(\sum _{i=1}^r {n_i } \) fuzzy numbers in any ANOVA test. In other words, as \(\sum _{i=1}^r {n_i } \) increases, the supports of \(\widetilde{\text {sst}}\), \(\widetilde{\text {sstr}}\) and \(\widetilde{\text {sse}}\) become larger, which makes the observed Fisher statistic vaguer. For instance, see the observed Fisher statistic in Sect. 8.2 of [21], which is obtained via Zadeh’s extension principle.
- (iii) Following (ii), the truth level of the decision can be low in many applied situations. See the definition of “the fuzziness of the decision” in [11].
The decision rule: Let \(\tilde{f}\) be the observed value of the test statistic and \(F_{1-\alpha ;r-1,n_t -r} \) be the \((1-\alpha )\)th quantile of the Fisher distribution with \(r-1\) and \(n_t -r\) degrees of freedom. At the given significance level \(\alpha \), \(H_{0}\) is accepted if \(\tilde{f}\le F_{1-\alpha ;r-1,n_t -r} \); otherwise \(H_{1}\) is accepted.
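For the special case of \(r=3\) treatments (two numerator degrees of freedom, as in the soap case study later in the paper), the Fisher quantile has a closed form: since \(P(F>x)=(1+2x/d)^{-d/2}\) for \(F\sim F_{2,d}\), the \((1-\alpha )\) quantile is \((d/2)(\alpha ^{-2/d}-1)\). The following Python sketch applies the decision rule under that assumption only; the function names are hypothetical, and for other values of \(r\) a statistical library or table would be needed for the quantile.

```python
def f_quantile_two_numerator_df(alpha: float, df_error: float) -> float:
    """(1 - alpha) quantile of the F distribution with 2 and df_error degrees of freedom.

    Closed form valid ONLY for 2 numerator degrees of freedom (r = 3 treatments).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if df_error <= 0:
        raise ValueError("df_error must be positive")
    return (df_error / 2.0) * (alpha ** (-2.0 / df_error) - 1.0)


def decide(f_obs: float, alpha: float, df_error: float) -> str:
    """Decision rule: accept H0 if f_obs <= critical value, otherwise accept H1."""
    critical = f_quantile_two_numerator_df(alpha, df_error)
    return "accept H0" if f_obs <= critical else "accept H1"


if __name__ == "__main__":
    # r = 3 treatments and n_t = 12 observations give 2 and 9 degrees of freedom.
    print(f_quantile_two_numerator_df(0.05, 9))
    print(decide(26.103, 0.05, 9))   # observed value reported in the soap case study
```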
Remark 4.2
Considering Remark 2.1, if the observed data for the test are real numbers \(y_{ij} \), which can be represented by the indicator functions \(I_{\{y_{ij} \}} \) for \(i = 1,\ldots ,r\) and \(j = 1,\ldots ,n_i \), then all the extended statistics introduced in Eqs. (8)–(11) coincide with the statistics of classical ANOVA.
\(p{\text {-value}}=P\left( {F>\tilde{f}} \right) \)
in which \(\tilde{f}\), introduced by (11), is the observed value of the test statistic based on the fuzzy observations.
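Once the crisp observed value \(\tilde{f}\) is available, this p-value is an ordinary upper-tail probability of the Fisher distribution. The sketch below evaluates it in pure Python through the standard identity \(P(F>f)=I_x(d_2/2,d_1/2)\) with \(x=d_2/(d_2+d_1 f)\), where \(I_x\) is the regularized incomplete beta function computed by a continued fraction. This is one possible implementation under those assumptions, not the authors' code, and all function names are illustrative.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 500, eps: float = 1e-13) -> float:
    """Continued fraction of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for this m.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            if abs(c) < tiny:
                c = tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:   # converged
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b   # use the symmetry relation


def f_p_value(f_obs: float, df_num: float, df_den: float) -> float:
    """Upper-tail probability P(F > f_obs) for F with df_num and df_den degrees of freedom."""
    if f_obs <= 0.0:
        return 1.0
    x = df_den / (df_den + df_num * f_obs)
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


if __name__ == "__main__":
    print(f_p_value(26.103, 2, 9))   # observed value reported in the soap case study
```

At significance level \(\alpha \), rejecting \(H_0\) when this p-value is smaller than \(\alpha \) is equivalent to the decision rule stated above.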
Fast computation formulas
ANOVA for symmetric triangular fuzzy observations
In this subsection, we assume that all observations are STFNs. The observed values of the ANOVA statistics based on STFNs are then given by the following theorems.
Theorem 5.1
Proof
Theorem 5.2
Proof
ANOVA for normal fuzzy observations
In this subsection, we assume that all observations in the ANOVA test are NFNs. The observed values of the statistics in the ANOVA model can then be easily obtained from the following theorems.
Theorem 5.3
Proof
Remark 5.1
By Eq. (19), in ANOVA based on normal fuzzy observations, an increase in either \({\text {sst}}_y \) or \({\text {sst}}_{s_y } \) increases \(\widetilde{{\text {sst}}}\). Similar results can be stated on the basis of formulas (19) and (20) for ANOVA based on normal fuzzy numbers, and on the basis of formulas (12)–(14) for ANOVA based on symmetric triangular fuzzy numbers.
Theorem 5.4
Proof
Remark 5.2
In ANOVA tests based on STFNs and NFNs, all the extended statistics introduced in Eqs. (12)–(24) reduce to the statistics presented in Sect. 3 of [21] for classical ANOVA when the spreads of the \(\tilde{y}_{ij}\)’s are equal to a fixed number for \(i = 1,\ldots ,r\) and \(j = 1,\ldots ,n_i \).
Remark 5.3
The ANOVA test discussed here differs from Wu’s approach [28], which is constructed on the cuts of FRVs and on optimistic and pessimistic degrees obtained by optimization. Also, the work presented in [21] is based on the extension principle, which in practice can lead the user to a very fuzzy decision through a very vague Fisher statistic. The major advantages of the method presented in this paper are its efficiency for large samples and its simplicity of use in real situations.
In the next section, the presented ANOVA method will be exemplified for fuzzy observations by a relevant study.
A case study on the process of soap production
The aim of this study is to investigate the solubility of three different kinds of soap in tepid water. The treatment factor, soap, has three levels: regular, deodorant, and moisturizing brands, all from the same manufacturer. This example is taken from an experiment done by Suyapa Silvia [3], where the data are fuzzified. It must be stressed that the ANOVA problem treated here is exactly the same as the one in [21], but the solutions differ. Therefore, to save space, we do not describe the experimental procedure and refer readers to Sect. 8.2 of [21] for more details on data collection. We now address the following question: Is there a significant difference in weight loss among the three kinds of soap due to dissolution in water when they are allowed to soak for the same time?
Weight loss for soaps
Type of soap (i) | Fuzzy weight loss (grams), j = 1 | j = 2 | j = 3 | j = 4 |
---|---|---|---|---|
Regular | \( \tilde{y}_{11} =T(-0.3,0.03)\) | \( \tilde{y}_{12} =T(-0.10,0.01)\) | \( \tilde{y}_{13} =T(-0.14,0.014)\) | \( \tilde{y}_{14} =T(0.40,0.04)\) |
Deodorant | \( \tilde{y}_{21} =T(2.63,0.263)\) | \( \tilde{y}_{22} =T(2.61,0.261)\) | \( \tilde{y}_{23} =T(2.41,0.241)\) | \( \tilde{y}_{24} =T(3.15,0.315)\) |
Moisturizing brand | \( \tilde{y}_{31} =T(1.86,0.186)\) | \( \tilde{y}_{32} =T(2.03,0.203)\) | \( \tilde{y}_{33} =T(2.26,0.226)\) | \( \tilde{y}_{34} =T(1.82,0.182)\) |
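For computation, each observation in the table above can be stored as a (center, spread) pair. The Python sketch below encodes the data this way and checks a pattern visible in the table, namely that every spread equals 10% of the absolute value of its center. This is only an observation about the tabulated values; the names `STFN` and `soap_data` are hypothetical, and the fuzzification rule actually used by the authors is not stated in this section.

```python
from typing import Dict, List, NamedTuple


class STFN(NamedTuple):
    """Symmetric triangular fuzzy number T(center, spread)."""
    center: float
    spread: float


# Observations copied from the table, ordered j = 1, ..., 4 within each treatment.
soap_data: Dict[str, List[STFN]] = {
    "Regular": [STFN(-0.30, 0.030), STFN(-0.10, 0.010), STFN(-0.14, 0.014), STFN(0.40, 0.040)],
    "Deodorant": [STFN(2.63, 0.263), STFN(2.61, 0.261), STFN(2.41, 0.241), STFN(3.15, 0.315)],
    "Moisturizing brand": [STFN(1.86, 0.186), STFN(2.03, 0.203), STFN(2.26, 0.226), STFN(1.82, 0.182)],
}


def spread_ratio(obs: STFN) -> float:
    """Ratio of the spread to the absolute value of the center."""
    return obs.spread / abs(obs.center)


if __name__ == "__main__":
    n_t = sum(len(obs) for obs in soap_data.values())
    print("treatments:", len(soap_data), "observations:", n_t)
    for soap, observations in soap_data.items():
        print(soap, [round(spread_ratio(o), 6) for o in observations])
```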
Details of ANOVA for soap experiment
Source of variation | \(\widetilde{\text {ss}}\) | Degrees of freedom | \(\widetilde{\text {ms}}\) | \(\widetilde{f}\) |
---|---|---|---|---|
Between treatments | 4.036 | 2 | 2.018 | 26.103 |
Within treatments (error) | 0.695 | 9 | 0.077 | |
Total | 16.839 | 11 | | |
For the triangular fuzzy observations presented in Table 2, we wish to test whether the weight losses of the three soaps are the same. In other words, we test “\(H_0 :\mu _1 =\mu _2 =\mu _3 \)” versus “\(H_1 :\) not all \(\mu _i \)’s are equal, for \( i=1, 2, 3\)”.
Conclusions and future works
In most applied sciences, there are situations in which non-precise values are assigned to experimental outcomes. Fuzzy/non-precise numbers are suitable models for formalizing the observed values in such situations, and the observed data can be represented by fuzzy sets in order to analyze the experiment. The main contribution of this article is the extension of the ANOVA test to fuzzy observations based on a distance between symmetric fuzzy numbers. Several fast computational formulas are obtained for the case where the observations are STFNs or NFNs. When all observations are real numbers, the presented ANOVA method reduces to the classical ANOVA approach, since the vagueness of the observed statistics vanishes and only the central points of the statistics remain, as one can see in Theorems 5.1, 5.3 and Remark 4.2. The proposed approach is thus a meaningful generalization of classical ANOVA. Its major advantages are its efficiency for large samples and its ease of use for practitioners who are familiar with ANOVA.
For future research, the approach of this paper could be used to extend other experimental designs, such as the randomized block design and the Latin square design, to the case where the observations are fuzzy numbers rather than real numbers. Another interesting topic is to extend the results of this paper to trapezoidal and non-symmetric fuzzy numbers, or in general to LR fuzzy numbers.
References
- 1. Cochran WG, Cox GM (1957) Experimental designs, 2nd edn. Wiley, New York
- 2. Cuevas A, Febrero M, Fraiman R (2004) An ANOVA test for functional data. Comput Stat Data Anal 47:111–122
- 3. Dean A, Voss D (1999) Design and analysis of experiments. Springer, New York
- 4. De Garibay VG (1987) Behaviour of Fuzzy ANOVA. Kybernetes 16(2):107–112
- 5. Dubois D, Prade H (1980) Fuzzy sets and systems: theory and application. Academic, New York
- 6. Filzmoser P, Viertl R (2004) Testing hypotheses with fuzzy data: the fuzzy \(p\)-value. Metrika 59:21–29
- 7. Gil MA, Montenegro M, González-Rodríguez G, Colubi A, Casals MR (2006) Bootstrap approach to the multi-sample test of means with imprecise data. Comput Stat Data Anal 51(1):148–162
- 8. González-Rodríguez G, Colubi A, Gil MA (2011) Fuzzy data treated as functional data: a one-way ANOVA test approach. Comput Stat Data Anal 56:943–955
- 9. Hesamian G (2016) One-way ANOVA based on interval information. Int J Syst Sci 47(11):2682–2690
- 10. Hocking RR (1996) Methods and applications of linear models: regression and the analysis of variance. Wiley, New York
- 11. Ivani R, Sanaei Nejad SH, Ghahraman B, Astaraei AR, Feizi H (2016) Fuzzy analysis of variance and its practical application in agriculture. In: Kahraman C, Kabak O (eds) Fuzzy statistical decision-making: theory and applications. Studies in fuzziness and soft computing. Springer, Switzerland, pp 315–327
- 12. Jiryaei A, Parchami A, Mashinchi M (2013) One-way ANOVA and least squares method based on fuzzy random variables. Turk J Fuzzy Syst 4(1):18–33
- 13. Kalpanapriya D, Pandian P (2012) Fuzzy hypothesis testing of ANOVA model with fuzzy data. Int J Mod Eng Res 2:2951–2956
- 14. Kaya I, Kahraman C (2011) Process capability analyses with fuzzy parameters. Expert Syst Appl 38(9):11918–11927
- 15. Konishi M, Okuda T, Asai K (2006) Analysis of variance based on fuzzy interval data using moment correction method. Int J Innov Comput Inf Control 2(1):83–99
- 16. Lubiano MA, Trutschnig W (2010) ANOVA for fuzzy random variables using the R-package SAFD. Comb Soft Comput Stat Methods Data Anal Adv Intell Soft Comput 77:449–456
- 17. Montenegro M, Colubi A, Casals MR, Gil MA (2004) Asymptotic and bootstrap techniques for testing the expected value of a fuzzy random variable. Metrika 59:31–49
- 18. Montenegro M, Gonzalez-Rodriguez G, Gil MA, Colubi A, Casals MR (2004) Introduction to ANOVA with fuzzy random variables. In: López-Díaz MC, Angeles Gil M, Grzegorzewski P, Hryniewicz O, Lawry J (eds) Soft Methodol Random Inf Syst. Springer, Berlin, pp 487–494
- 19. Montgomery DC (1991) Design and analysis of experiments, 3rd edn. Wiley, New York
- 20. Nguyen HT, Walker EA (2005) A first course in fuzzy logic, 3rd edn. Chapman Hall/CRC, Paris
- 21. Nourbakhsh M, Parchami A, Mashinchi M (2013) Analysis of variance based on fuzzy observations. Int J Syst Sci 44(4):714–726
- 22. Parchami A, Ivani R, Mashinchi M, Kaya İ (2017) An implication of fuzzy ANOVA: metal uptake and transport by corn grown on a contaminated soil. Chemom Intell Lab Syst 164:56–63
- 23. Parchami A, Sadeghpour-Gildeh B, Nourbakhsh M, Mashinchi M (2014) A new generation of process capability indices based on fuzzy measurements. J Appl Stat 41(5):1122–1136
- 24. Parchami A, Taheri SM, Mashinchi M (2012) Testing fuzzy hypotheses based on vague observations: a \(p\)-value approach. Stat Pap 53(2):469–484
- 25. Rodriguez G, Colubi A, Gil MA (2012) Fuzzy data treated as functional data: a one-way ANOVA test approach. Comput Stat Data Anal 56(4):943–955
- 26. Taheri SM, Arefi M (2009) Testing hypotheses based on fuzzy test statistic. Soft Comput 13:617–625
- 27. Viertl R (2011) Statistical methods for fuzzy data. Wiley, New York
- 28. Wu HC (2007) Analysis of variance for fuzzy data. Int J Syst Sci 38:235–246
- 29. Xu R, Li C (2001) Multidimensional least-squares fitting with a fuzzy model. Fuzzy Sets Syst 119:215–223
- 30. Zadeh LA (1975) The concept of a linguistic variable and its application to approximate reasoning—I. Inf Sci 8:199–249
- 31. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–359
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.