Improved Approximation Scales for Unreplicated Factorial Experiments

Quick and powerful methods for assessing the sizes of active contrasts in unreplicated factorial and fractional factorial experiments are required when analyzing big data in various areas of research. One of the older methods, due to Lenth (1989), is used in some statistical and data-analytic applications; it is fast but less efficient. We propose a new class of tests that are simpler, faster, and more powerful, using the location median-function $\psi_{med}(x)$ after it has been skipped one and/or two times. An empirical simulation study that computes critical values, sizes, and powers for various sample sizes demonstrates the superiority of our methods. The proposed methods are illustrated with examples and can be employed in various fields of research for data analytics using high computing power and machine learning.


Introduction
In recent years, social and physical scientists have been searching for quick and powerful data analytics techniques, based on computing technologies, the Internet of Things (IoT), and machine learning, to analyze big data. For example, Salmaso et al. [16] suggested a two-step workflow in which design of experiments (DOE) is conducted prior to the usual big data analytics and machine learning modeling phase, and they demonstrated their DOE on an industrial application. Others have pointed out the use of unreplicated factorials in DOE and other areas: Deepa et al. [9] give an interesting review of the literature on the subject, whereas Guerra-Zubiaga and Luong [10] address the energy consumption parameter analysis of industrial robots using factorial DOE, without due regard being given to the factors and factor interactions that do exist when designing DOE methodologies with powerful and speedy tests. This paper fills this gap in the literature and presents applications with various examples to demonstrate that the sizes and powers of the proposed tests are better than those of competitors such as Lenth [14] and Aboukalam [2].
It is common in data analytics that the response in many practical experiments may be affected by multiple factors, but sometimes most of the factors, or some of the factor interactions, are not active in various research areas related to big data (see, for example, [5,16], Tsai [17], Hassan et al. (2022), Wosiak et al. [18], and the references in Deepa et al. (2022)). Considering experimentation cost, time, effort, limited data resources, and missing values in the data, researchers usually estimate and test the main and interaction effects of each factor at two levels and do not replicate the experiment. They therefore employ unreplicated $2^p$ factorial experiments with only one observation per treatment combination. There are many methods in the literature for analyzing unreplicated $2^p$ factorial experiments; some of them are discussed in Lenth [14], Hamada and Balakrishnan [12], Loughin and Noble [15], Haaland and O'Connell (1995), and Aboukalam [2], among others. In general, Lenth's method seems popular and is used by many researchers. Aboukalam and his associates improved Lenth's method (see Aboukalam and Al-Shiha [1], Aboukalam [2], and Al-Shiha and Aboukalam [3]), using two kinds of power and the size as numerical criteria for the comparison. With the recent revolution in computing power, in the presence of IoT and machine learning, these methods are no longer as powerful as was reported in the earlier literature. Moreover, given the advances in computing power and the free availability of computing software such as R, the speed and complexity of computation are no longer an issue. This paper attempts to demonstrate how to use both powerful statistical tools to obtain better results.
We employ Huber's skew-symmetric function $\psi$, used in the location step, to find a scale $S$ that solves the one-step equation of the scale part of the M-estimate, Eq. (1), where $S_0$ is an initial robust scale estimate and the constant $B$ is chosen equal to $E(\psi^2)$ so that the estimate is consistent under normality. The quick and powerful methods of Al-Shiha and Aboukalam [3] and Aboukalam [2] used $\psi_{med}(x) = \mathrm{Sign}(x)$ (the sign of $x$), the location median-function, after being skipped one time and two times, respectively. Here, we aim to apply some useful approximations to Eq. (1) with $\psi_{med}(x)$ to obtain new forms that are both quick and powerful. The improved forms are called SKM1 and SKM2 because $\psi_{med}(x)$ is skipped one and two times, respectively. The improved form SKM2 enables us to generate a class of three competitive quick methods, SKM2($k \times A$), $k = 1, 2, 3$, where $A$ is a constant. This was not possible with the old form used in Aboukalam [2]. The results for SKM1 and SKM2(A) update those of Al-Shiha and Aboukalam [3] and Aboukalam [2], and are then compared with the old results obtained without approximations. An empirical comparison showed that both sets of results are comparable, which gives us confidence that the approximations are successful and safe.
This paper shows that the ranking of the quick methods under study, in terms of quality from lowest to highest, is: Lenth, SKM1, SKM2(A), SKM2(2A), and SKM2(3A). The superiority of the new methods SKM2(2A) and SKM2(3A) over Lenth may sometimes be several times that of SKM2(A) over Lenth. Thus, the added value of these two new quick methods is remarkably high. Moreover, the study serves as a list, in one place, of all the author's quick methods competing with Lenth's method. Finally, the necessary proofs are provided, and numerical simulation experiments are conducted at given significance levels and for four selected sample sizes often used in the field to compare the methods. Tables of critical points, sizes, and powers are computed empirically.

Theoretical Objectives
Suppose that $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_n$ are the best linear unbiased estimators (BLUE) of the main and interaction effects obtained from an unreplicated $2^p$ factorial experiment in the standard order, where the effects are denoted by $\theta_1, \theta_2, \ldots, \theta_n$. Under the usual assumptions about the experimental errors (normality, independence, and a common unknown variance $\sigma^2$), the estimates $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_n$ are independently normally distributed random variables with a common unknown variance $\tau^2 = \sigma^2/2^p$ and means $\theta_1, \theta_2, \ldots, \theta_n$, respectively. The statistical inference problem may be stated formally as follows. There are $n$ normal distributions with unknown means $\theta_1, \theta_2, \ldots, \theta_n$ and a common unknown variance $\tau^2$. From each distribution, a single observation $\hat{\theta}_j$ is available.
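To make the setting concrete, the following minimal Python sketch computes the $n = 2^p - 1$ estimates from $2^p$ single observations listed in standard order. It assumes the coefficient scaling (each contrast divided by $2^p$), which is the scaling consistent with the variance $\sigma^2/2^p$ stated above. The function name `estimate_effects` and the letter labels A, B, ... are illustrative and not part of the original.

```python
from itertools import product


def estimate_effects(y, p):
    """Estimate the 2**p - 1 main and interaction effects of an
    unreplicated 2**p factorial whose runs are in standard order."""
    runs = 2 ** p
    if len(y) != runs:
        raise ValueError("expected exactly 2**p observations")
    # Factor levels in standard order: the first factor changes fastest.
    levels = [tuple(reversed(r)) for r in product((-1, 1), repeat=p)]
    effects = {}
    for mask in range(1, runs):
        # Bit k of mask says whether factor k takes part in this effect.
        name = "".join(chr(ord("A") + k) for k in range(p) if (mask >> k) & 1)
        contrast = 0.0
        for row, obs in zip(levels, y):
            sign = 1
            for k in range(p):
                if (mask >> k) & 1:
                    sign *= row[k]
            contrast += sign * obs
        # Assumed scaling: dividing by 2**p gives variance sigma^2 / 2**p.
        effects[name] = contrast / runs
    return effects
```

For instance, `estimate_effects([1, 3, 5, 7], 2)` returns the estimates for A, B, and AB of a $2^2$ experiment, one value per effect, which is the kind of input the scales discussed in the following sections operate on.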

Skipped Median (SKM) scale
Let $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_n$ be $n$ estimated effects from the scale model $F(\hat{\theta}/\tau)$, and let $S_0$ be an initial scale estimate. The one-step scale part of M-estimates is then given by expression (3). Indeed, we can produce simple forms of scales by using the location median-function $\psi_{med}(x)$ after it has been skipped one time (SKM1), at $\mp 2.5$, giving $\psi_{SKM1}(x)$, or two times (SKM2), at $\mp 1$ and $\mp 2.5$, giving $\psi_{SKM2}(x)$. The points $\mp 2.5$ are chosen to facilitate comparison with $S_L$, the scale of Lenth.
Next, we approximate expression (3) with $\psi_{SKM1}(x)$ and $\psi_{SKM2}(x)$.
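For reference, Lenth's scale, with which the cut-off 2.5 is aligned, can be sketched as follows. The sketch uses the standard definition from Lenth (1989): an initial scale $s_0$ equal to 1.5 times the median of the absolute estimates, followed by 1.5 times the median of those absolute estimates that fall below $2.5 s_0$. The function name `lenth_scale` is illustrative, and returning zero for a degenerate input is an assumption made only for this sketch.

```python
from statistics import median


def lenth_scale(effects, cutoff=2.5):
    """Lenth's pseudo standard error for a list of estimated effects."""
    abs_eff = [abs(e) for e in effects]
    # Initial robust scale: 1.5 times the median absolute estimate.
    s0 = 1.5 * median(abs_eff)
    # Trim estimates that look active, then take the median again.
    kept = [a for a in abs_eff if a < cutoff * s0]
    # kept is empty only when s0 is zero; returning 0.0 is an assumption.
    return 1.5 * median(kept) if kept else 0.0
```

The trimming step is what protects the scale from inflation when a few effects are large: those estimates are excluded before the second median is taken.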

Skipped Median Scale One Time (SKM1)
If $\psi_{SKM1}$ is used in expression (3), the scale can be simplified under an approximating assumption. Since simplification is the aim, we neglect the constant $\frac{n}{(n-1) \times B}$, which affects nothing except the critical point. Then $s_{SKM1}$ takes a simpler form that involves the constant $A_0 = 0.75/n$.

Skipped Median Scale Two Times (SKM2)
If $\psi_{SKM2}$ is used in expression (3), a similar approximating assumption applies. Again, neglecting $\frac{n}{(n-1) \times B}$ affects only the critical point, and $s_{SKM2}$ takes a simpler form that involves the constant $A = \frac{1.5}{8n} = \frac{0.1875}{n}$. Overall, we propose the class of scales $s_{SKM2(k \times A)}$ with three degrees, $k = 1, 2, 3$, of multiplying the constant $A$. The values 2 and 3 of the factor $k$ can be seen as reflecting some suitable shapes of the $\psi$-function used, or may simply be regarded as proposals justified by their success.
Finally, to help the experimenter remember all the aforesaid quick scales, we propose to read $s_{SKM1}$ and $s_{SKM2(kA)}$ using a unified form, with Lenth's scale read in a parallel way.

Critical Points of the Test Statistics
Scales are usually compared under equal probabilities ($= \alpha$) of rejecting the null hypothesis $H_0: \theta_j = 0$ given that all the $n$ effects $\theta_j$ are inactive, that is, $H_0$ is true. The critical points are given in Table 1 in the appendix.
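One way such critical points can be obtained empirically is by Monte Carlo simulation under $H_0$. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses Lenth's scale as a stand-in for $S$ (any scale function with the same signature could be substituted), standard normal null estimates, and the pooled empirical $(1-\alpha)$ quantile of $|\hat{\theta}_j|/S$ as the critical point. The names `critical_point`, `reps`, and `seed` are hypothetical.

```python
import random
from statistics import median


def lenth_scale(effects, cutoff=2.5):
    """Lenth's pseudo standard error (stand-in scale for this sketch)."""
    abs_eff = [abs(e) for e in effects]
    s0 = 1.5 * median(abs_eff)
    kept = [a for a in abs_eff if a < cutoff * s0]
    return 1.5 * median(kept) if kept else 0.0


def critical_point(n, alpha, scale=lenth_scale, reps=5000, seed=1):
    """Simulated critical point: the (1 - alpha) quantile of |estimate| / S
    when all n effects are inactive (H0 true)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        # Null case: every estimate is N(0, 1), i.e. tau = 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = scale(sample)
        ratios.extend(abs(x) / s for x in sample)
    ratios.sort()
    # Index of the empirical (1 - alpha) quantile, clamped to the last item.
    idx = min(len(ratios) - 1, int((1.0 - alpha) * len(ratios)))
    return ratios[idx]
```

Because the ratio $|\hat{\theta}_j|/S$ does not depend on $\tau$, simulating with $\tau = 1$ loses no generality. Increasing `reps` reduces the simulation error of the estimated quantile.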

Numerical Criteria; Sizes and Powers
The five techniques, Lenth, SKM1, SKM2(A), SKM2(2A), and SKM2(3A), are assessed and compared in this study using numerical criteria. We focus on power, defined as the probability of declaring an effect active given that it is active. The power of a test is accompanied by its size, by which we mean the probability of declaring an effect active given that it is inactive. As is well known, the size must be close to the significance level $\alpha$ when $H_0$ is true. Here, $\alpha$ is regarded as an error premium that is paid in the null case, and many experimenters may not care about it in the non-null case as long as the size stays within $\alpha$. In general, a technique is best if its powers are higher than those of the competing tests and its sizes are lower.
A decline in power indicates that the test misses some truly active effects. This usually happens if the band $S \times cr(\alpha, n)$ becomes larger, that is, if the scale $S$ suffers from inflation. An increase in size indicates that the test wrongly declares some truly inactive effects active. This usually happens if the band becomes smaller, that is, if the scale $S$ suffers from shrinkage. Powers should be looked at first. If two techniques have quite comparable powers, the better technique is the one whose sizes are lower, because it causes fewer false alarms and, consequently, further investigations cost less.
The results are given in Table 2 and plotted in Fig. 1. The behavior of the results for $\alpha = 0.20$ is not different. For SKM2(3A), the values are similar at low shifts and again lower otherwise. Briefly, the quick methods under study can be ranked by their protection against the shrinkage or inflation deficiencies that may occur, from lowest to highest, as follows: Lenth, SKM1, SKM2(A), SKM2(2A), and SKM2(3A). The following applications highlight this property.
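The two criteria can be estimated by simulation along the following lines. This is a minimal sketch under assumptions that are not taken from the paper: Lenth's scale stands in for $S$, the first `n_active` effects share a common mean `shift` (in units of $\tau = 1$), and the critical point `crit` is supplied by the user. The function and parameter names are hypothetical.

```python
import random
from statistics import median


def lenth_scale(effects, cutoff=2.5):
    """Lenth's pseudo standard error (stand-in scale for this sketch)."""
    abs_eff = [abs(e) for e in effects]
    s0 = 1.5 * median(abs_eff)
    kept = [a for a in abs_eff if a < cutoff * s0]
    return 1.5 * median(kept) if kept else 0.0


def size_and_power(n, n_active, shift, crit, scale=lenth_scale,
                   reps=2000, seed=1):
    """Estimate (size, power) of the rule |estimate| > S * crit."""
    rng = random.Random(seed)
    hits_active = 0    # active effects declared active
    hits_inactive = 0  # inactive effects declared active
    for _ in range(reps):
        # The first n_active estimates have mean `shift`, the rest mean 0.
        est = [rng.gauss(shift if j < n_active else 0.0, 1.0)
               for j in range(n)]
        band = scale(est) * crit
        for j, e in enumerate(est):
            if abs(e) > band:
                if j < n_active:
                    hits_active += 1
                else:
                    hits_inactive += 1
    n_inactive = n - n_active
    power = hits_active / (reps * n_active) if n_active else float("nan")
    size = hits_inactive / (reps * n_inactive) if n_inactive else float("nan")
    return size, power
```

Running the function over a grid of `shift` values for each competing scale gives curves of the kind compared in this section: an inflated scale widens the band and lowers the power, while a shrunken scale narrows the band and raises the size.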

Applications
Application (1) Here, 2 effects are detected as active, and none is false. SKM2(2A)-scale: Here, 2 effects are detected as active, and none is false. SKM2(3A)-scale: Here, 2 effects are detected as active, and none is false. Figure 3 displays the bands and the shrinkage deficiency of Lenth in comparison with SKM1 and SKM2.

Concluding Remarks
This paper proposes a new class of simpler, faster, more efficient, and more powerful approximation methods using the location median-function $\psi_{med}(x)$. The speed of the approximation is achieved by skipping the median-function one and/or two times. We observe that the SKM2(3A) technique is far better than its competitors recorded in the literature, including Lenth's technique. The proposed technique is simpler, faster, and free from the inflation and/or shrinkage deficiencies that may exist. The comparison with previous studies is made by presenting mathematical proofs together with simulated tables of critical values, sizes, and powers. The proposed methods are illustrated with examples to demonstrate their application in data analytics.