Statistical Methods and Their Applications

Skormin, Victor A.

doi:10.1007/978-3-319-42258-9_1


Part of the book series: Springer Texts in Business and Economics ((STBE))



Author information

Authors and Affiliations

T.J. Watson School of Engineering, Binghamton University, Binghamton, NY, USA
Victor A. Skormin


Solutions

1.1.1 Exercise 1.1: Problem 1

The following probabilities can be extracted from the given information. Note that fail represents the event of a machine tool failure, good represents the event that there is no machine tool failure, vibration represents that there was excessive vibration, and overheat represents the event of overheating.

$$ P(fail)=0.083 $$

$$ P(good)=0.917 $$

$$ P\left( vibration| fail\right)=\frac{16}{56}=0.286 $$

$$ P\left( vibration| good\right)=0.05 $$

$$ P\left( overheat| fail\right)=\frac{7}{56}=0.125 $$

$$ P\left( overheat| good\right)=0.1 $$

Given these initial probabilities, the conditional probability that there was a failure given an observed vibration can be calculated as follows.

$$ P\left( fail| vibration\right)=\frac{P\left( vibration\Big| fail\right)\bullet P(fail)}{P\left( vibration| fail\right)\bullet P(fail)+P\left( vibration\Big| good\right)\bullet P(good)} $$

$$ P\left( fail| vibration\right)=\frac{0.286\bullet 0.083}{0.286\bullet 0.083+0.05\bullet 0.917}=0.34 $$

This posterior probability of failure now serves as the prior probability of failure for the next update. Because the probabilities of the two complementary events must sum to one, the updated probability that the machine tool is good is 1 - 0.34 = 0.66. With these updated priors, the probability of failure given the next event, overheating, can be calculated in the same way.

$$ P\left( fail\Big| overheat\right)=\frac{P\left( overheat\Big| fail\right)\bullet P(fail)}{P\left( overheat| fail\right)\bullet P(fail)+P\left( overheat\Big| good\right)\bullet P(good)} $$

$$ P\left( fail| overheat\right)=\frac{0.125\bullet 0.34}{0.125\bullet 0.34+0.1\bullet 0.66}=0.392 $$

So, after both events occurred, the probability of a failure is 0.392.
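The two-step update above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the helper name `bayes_update` and its parameter names are assumptions introduced here, and the inputs are the rounded probabilities listed in this solution.

```python
def bayes_update(prior, p_event_given_h, p_event_given_not_h):
    """Posterior P(H | event) for a binary hypothesis H, using Bayes' rule."""
    numerator = p_event_given_h * prior
    evidence = numerator + p_event_given_not_h * (1.0 - prior)
    return numerator / evidence


# Rounded values taken from the solution text
p_fail = 0.083

# First event: excessive vibration (the text rounds the posterior to 0.34)
p_fail = round(bayes_update(p_fail, 0.286, 0.05), 2)

# Second event: overheating, with the first posterior used as the new prior
p_fail_final = bayes_update(p_fail, 0.125, 0.10)
print(round(p_fail_final, 3))
```

Chaining the updates in this way implicitly treats vibration and overheating as conditionally independent given the state of the machine tool.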

1.1.2 Exercise 1.1: Problem 2

The following frequencies (probabilities) can be extracted from the Example data.

$$ P\left(A| poor\right)=\frac{136}{879}=0.155\kern0.5em P\left(B| poor\right)=\frac{177}{879}=0.201\kern0.5em P\left(C| poor\right)=\frac{83}{879}=0.094 $$

$$ P\left(A| good\right)=\frac{36}{4621}=0.0078\kern0.5em P\left(B| good\right)=\frac{81}{4621}=0.0175\kern0.5em P\left(C| good\right)=\frac{63}{4621}=0.0136 $$

$$ P(poor)=0.16\kern0.5em P(good)=0.84 $$

Now, the probability of poor quality given event A at 10 am:

$$ P\left( poor|A\right)=\frac{P\left(A\Big| poor\right)\bullet P(poor)}{P\left(A| poor\right)\bullet P(poor)+P\left(A| good\right)\bullet P(good)} $$

$$ P\left( poor|A\right)=\frac{0.155\bullet 0.16}{0.155\bullet 0.16+0.0078\bullet 0.84}=0.791 $$

The new probabilities are:

$$ P(poor)=0.791\ P(good)=0.209 $$

Now, the probability of poor quality given the consecutive event B at 2 pm:

$$ P\left( poor|B\right)=\frac{P\left(B\Big| poor\right)\bullet P(poor)}{P\left(B| poor\right)\bullet P(poor)+P\left(B| good\right)\bullet P(good)} $$

$$ P\left( poor|B\right)=\frac{0.201\bullet 0.791}{0.201\bullet 0.791+0.0175\bullet 0.209}=0.978 $$

The new probabilities are:

$$ P(poor)=0.978\ P(good)=0.022 $$

Now, the probability of poor quality given the consecutive event C at 4 pm:

$$ P\left( poor|C\right)=\frac{P\left(C\Big| poor\right)\bullet P(poor)}{P\left(C| poor\right)\bullet P(poor)+P\left(C| good\right)\bullet P(good)} $$

$$ P\left( poor|C\right)=\frac{0.094\bullet 0.978}{0.094\bullet 0.978+0.0136\bullet 0.022}=0.9968 $$

Given the three sequential events, the probability that the product is of poor quality rises to 99.68 %.

1.1.3 Exercise 1.1: Problem 3

The probability of passing the requirements can be calculated for the original system as the area between two z-scores on a standard normal curve. The upper and lower bound z-scores are calculated as:

$$ {z_{pass}}^{+}=\frac{0.001-0}{0.0009}=1.111\ {z_{pass}}^{-}=\frac{-0.001-0}{0.0009}=-1.111 $$

The probability inside these bounds is 0.7335, or 73.4 %.

Now, to check the improvement from the controls added, the same process will be done with the statistical data for the controlled process.

$$ {z_{pass}}^{+}=\frac{0.001-0}{0.0003}=3.333\kern2em {z_{pass}}^{-}=\frac{-0.001-0}{0.0003}=-3.333 $$

The probability inside these bounds is 0.9991, or 99.91 %.

Given this information, we can see that the probability of passing the requirements rose from 73.4 % to 99.91 % with the addition of controls. This is an increase of 26.51 percentage points in the success rate of this procedure.
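The two areas under the normal curve can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `prob_within` is an assumption introduced here, and the process is taken to be centered at zero, as in the z-score calculations above.

```python
from statistics import NormalDist


def prob_within(tolerance, sigma, mean=0.0):
    """P(mean - tolerance <= X <= mean + tolerance) for X ~ N(mean, sigma^2)."""
    dist = NormalDist(mu=mean, sigma=sigma)
    return dist.cdf(mean + tolerance) - dist.cdf(mean - tolerance)


# Original process: standard deviation 0.0009; controlled process: 0.0003
p_original = prob_within(0.001, 0.0009)
p_controlled = prob_within(0.001, 0.0003)
print(f"{p_original:.4f} {p_controlled:.4f}")
```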

1.1.4 Exercise 1.1: Problem 4

In this problem, we are looking for the amount of time sufficient for 90 % of students to complete the reading task. For this, we will look at a normal distribution with a mean of 2.5 min and a standard deviation of 0.6 min. The z-value below which 90 % of the standard normal distribution lies is 1.282.

The reading time value associated with this z is 3.3 min. We can conclude that within 3.3 min, 90 % of students will be done reading one page.
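The conversion from the z-value to a reading time can be written out explicitly. The symbol t_90 is notation introduced here for the time by which 90 % of students finish; it does not appear in the original solution.

$$ {t}_{90}=\mu +z\bullet \sigma =2.5+1.282\bullet 0.6=3.269\approx 3.3\ \mathrm{min} $$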

1.1.5 Exercise 1.1: Problem 5

1.1.5.1 Part A

For a 90 % confidence interval and N = 17 students, the t-values for the mean and the standard deviation are

$$ mean:\;t\left(\alpha, N\right)=t\left(.05,17\right)=1.740 $$

$$ stddev:\;t\left(\alpha, N-1\right)=t\left(.05,16\right)=1.746 $$

The 90 % confidence interval for mean:

$$ \varDelta =t\left(\alpha, N\right)\bullet \frac{\sigma_N}{\sqrt{N}}=1.740\bullet \frac{0.6}{\sqrt{17}}=0.253 $$

$$ P\left(\mu -\varDelta \le {\mu}_{TRUE}\le \mu +\varDelta \right)=90\% $$

$$ P\left(2.5-.253\le {\mu}_{TRUE}\le 2.5+.253\right)=90\% $$

$$ P\left(2.247\le {\mu}_{TRUE}\le 2.753\right)=90\% $$

The 90 % confidence interval for standard deviation:

$$ \varDelta =t\left(\alpha, N-1\right)\bullet \frac{\sigma_N}{\sqrt{2N}}=1.746\bullet \frac{0.6}{\sqrt{2\bullet 17}}=0.18 $$

$$ P\left(\sigma -\varDelta \le {\sigma}_{TRUE}\le \sigma +\varDelta \right)=90\% $$

$$ P\left(0.6-.18\le {\sigma}_{TRUE}\le 0.6+.18\right)=90\% $$

$$ P\left(0.42\le {\sigma}_{TRUE}\le 0.78\right)=90\% $$

For a 95 % confidence interval and N = 17 students, the t-values for the mean and the standard deviation are

$$ mean:\;t\left(\alpha, N\right)=t\left(.025,17\right)=2.11 $$

$$ stddev:\;t\left(\alpha, N-1\right)=t\left(.025,16\right)=2.12 $$

The 95 % confidence interval for mean:

$$ \varDelta =t\left(\alpha, N\right)\bullet \frac{\sigma_N}{\sqrt{N}}=2.11\bullet \frac{0.6}{\sqrt{17}}=0.307 $$

$$ P\left(\mu -\varDelta \le {\mu}_{TRUE}\le \mu +\varDelta \right)=95\% $$

$$ P\left(2.5-.307\le {\mu}_{TRUE}\le 2.5+.307\right)=95\% $$

$$ P\left(2.193\le {\mu}_{TRUE}\le 2.807\right)=95\% $$

The 95 % confidence interval for standard deviation:

$$ \varDelta =t\left(\alpha, N-1\right)\bullet \frac{\sigma_N}{\sqrt{2N}}=2.12\bullet \frac{0.6}{\sqrt{2\bullet 17}}=.218 $$

$$ P\left(\sigma -\varDelta \le {\sigma}_{TRUE}\le \sigma +\varDelta \right)=95\% $$

$$ P\left(0.6-.218\le {\sigma}_{TRUE}\le 0.6+.218\right)=95\% $$

$$ P\left(0.382\le {\sigma}_{TRUE}\le 0.818\right)=95\% $$
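The four intervals in Part A follow from the same two half-width formulas, which the sketch below evaluates. The t-values are copied from the text rather than recomputed, and the function and variable names are assumptions introduced for illustration.

```python
import math


def mean_half_width(t, sigma, n):
    """Half-width of the confidence interval for the mean: t * sigma / sqrt(N)."""
    return t * sigma / math.sqrt(n)


def std_half_width(t, sigma, n):
    """Half-width of the interval for the standard deviation: t * sigma / sqrt(2N)."""
    return t * sigma / math.sqrt(2 * n)


N, MEAN, SIGMA = 17, 2.5, 0.6

# (t for the mean, t for the standard deviation), as quoted in the text
T_VALUES = {90: (1.740, 1.746), 95: (2.11, 2.12)}

intervals = {}
for level, (t_mean, t_std) in T_VALUES.items():
    d_mean = mean_half_width(t_mean, SIGMA, N)
    d_std = std_half_width(t_std, SIGMA, N)
    intervals[level] = {
        "mean": (MEAN - d_mean, MEAN + d_mean),
        "std": (SIGMA - d_std, SIGMA + d_std),
    }

for level, bounds in intervals.items():
    print(level, bounds)
```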

1.1.5.2 Part B

Doubling the accuracy of the mean estimate for a 90 % confidence interval would require the following N.

$$ \varDelta =0.253\to {\varDelta}_{NEW}=0.1265 $$

If we make N = 63, for which the t-value is 1.669, the half-width of the interval is halved, which doubles the accuracy.

$$ {\varDelta}_{NEW}=1.669\bullet \frac{0.6}{\sqrt{63}}=0.126 $$

Doubling the accuracy of the mean estimate for a 95 % confidence interval would require the following N.

$$ \varDelta =0.307\to {\varDelta}_{NEW}=0.1535 $$

If we make N = 61, for which the t-value is 1.9996, the half-width of the interval is halved, which doubles the accuracy.

$$ {\varDelta}_{NEW}=1.9996\bullet \frac{0.6}{\sqrt{61}}=0.1536 $$
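The sample sizes in Part B can be cross-checked by solving the half-width formula for N. Because the t-value itself depends on N, this is only a sketch that takes the t-values quoted above as given; it is not a closed-form solution.

$$ N\approx {\left(\frac{t\bullet {\sigma}_N}{{\varDelta}_{NEW}}\right)}^2 $$

$$ 90\%:\;{\left(\frac{1.669\bullet 0.6}{0.1265}\right)}^2\approx 62.7\to N=63 $$

$$ 95\%:\;{\left(\frac{1.9996\bullet 0.6}{0.1535}\right)}^2\approx 61.1\to N\approx 61 $$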

1.1.5.3 Part C

Doubling the accuracy of the standard deviation estimate for a 90 % confidence interval would require the following N.

$$ \varDelta =0.18\to {\varDelta}_{NEW}=0.09 $$

If we make N = 60, while keeping the t-value of 1.746 from Part A, the half-width of the interval is approximately halved.

$$ {\varDelta}_{NEW}=1.746\bullet \frac{0.6}{\sqrt{2\bullet 60}}=0.096 $$

Doubling the accuracy of the standard deviation estimate for a 95 % confidence interval would require the following N.

$$ \varDelta =0.218\to {\varDelta}_{NEW}=0.109 $$

If we make N = 61, while keeping the t-value of 2.12 from Part A, the half-width of the interval is approximately halved.

$$ {\varDelta}_{NEW}=2.12\bullet \frac{0.6}{\sqrt{2\bullet 61}}=0.115 $$

1.1.6 Exercise 1.2: Problem 1

The correlation matrix was calculated with the following configuration:

$$ {R}_{xyz}=\left[\begin{array}{ccc}{r}_{xx}& {r}_{xy}& {r}_{xz}\\ {}{r}_{xy}& {r}_{yy}& {r}_{yz}\\ {}{r}_{xz}& {r}_{yz}& {r}_{zz}\end{array}\right] $$

Here, the correlation coefficient for two variables, x and y, is defined as

$$ {r}_{xy}=\frac{1}{N\bullet {\sigma}_x\bullet {\sigma}_y}\bullet \sum_{n=1}^N\left[x(n)-\overline{x}\right]\bullet \left[y(n)-\overline{y}\right] $$

The correlation matrix for x, y, and z is:

$$ {R}_{xyz}=\left[\begin{array}{ccc}0.9967& 0.0251& 0.0719\\ {}0.0251& 0.9967& 0.6053\\ {}0.0719& 0.6053& 0.9967\end{array}\right] $$

Then, statistical significance was evaluated by comparing the half-width of the confidence interval for each correlation coefficient with the coefficient itself. A correlation coefficient was deemed significant if it fell outside the confidence interval, meaning that its magnitude was greater than the half-width of the interval. The half-widths of the intervals were calculated as

$$ {\varDelta}_{xy}=t\left(\alpha =.005,N=300\right)\bullet \frac{1-{r_{xy}}^2}{\sqrt{N}} $$

Δ_xy = 0.14956 and r_xy = 0.02506, so the x-y correlation is not significant.
Δ_xz = 0.14888 and r_xz = 0.071861, so the x-z correlation is not significant.
Δ_yz = 0.094818 and r_yz = 0.60531, so the y-z correlation is significant.
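The coefficient formula and the significance check can be sketched as follows. Two assumptions are made here and are not stated in the original: the standard deviations use the N - 1 divisor (which would explain the diagonal value 0.9967 = 299/300), and the t-value is taken as approximately 2.592, a tabulated value consistent with the half-widths reported above. All function names are illustrative.

```python
import math
import statistics


def corr(x, y):
    """r_xy = 1 / (N * sigma_x * sigma_y) * sum((x - mean_x) * (y - mean_y))."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    # Assumption: sample standard deviations (N - 1 divisor), so corr(x, x) = (N - 1) / N
    sigma_x, sigma_y = statistics.stdev(x), statistics.stdev(y)
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (n * sigma_x * sigma_y)


def half_width(r, n, t):
    """Half-width of the confidence interval for a correlation coefficient."""
    return t * (1.0 - r * r) / math.sqrt(n)


def is_significant(r, n, t):
    """A coefficient is deemed significant if its magnitude exceeds the half-width."""
    return abs(r) > half_width(r, n, t)


T_ASSUMED = 2.592  # assumed value of t(alpha = .005, N = 300)
for name, r in [("xy", 0.02506), ("xz", 0.071861), ("yz", 0.60531)]:
    print(name, round(half_width(r, 300, T_ASSUMED), 5), is_significant(r, 300, T_ASSUMED))
```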

1.1.7 Exercise 1.2: Problem 2

The multiple correlation coefficient R_y,xvz is:

$$ {R}_{y,xvz}=\sqrt{\frac{Det\left(\begin{array}{cccc}0.9967& 0.8146& 0.0719& 0.0251\\ {}0.8146& 0.9967& 0.0822& 0.0577\\ {}0.0719& 0.0822& 0.9967& 0.6053\\ {}0.0251& 0.0577& 0.6053& 0.9967\end{array}\right)}{Det\left(\begin{array}{ccc}0.9967& 0.8146& 0.0719\\ {}0.8146& 0.9967& 0.0822\\ {}0.0719& 0.0822& 0.9967\end{array}\right)}}=0.7919 $$
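The determinant ratio above can be evaluated numerically as a check. The sketch below uses a small Gaussian-elimination determinant in plain Python; the function name `det`, the variable names, and the row ordering (x, v, z, y, inferred from the matrix entries) are assumptions introduced here.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return result


# Rows and columns assumed to be ordered x, v, z, y
R_FULL = [
    [0.9967, 0.8146, 0.0719, 0.0251],
    [0.8146, 0.9967, 0.0822, 0.0577],
    [0.0719, 0.0822, 0.9967, 0.6053],
    [0.0251, 0.0577, 0.6053, 0.9967],
]
# Upper-left 3x3 block: correlations among x, v, z only
R_INPUTS = [row[:3] for row in R_FULL[:3]]

ratio_root = math.sqrt(det(R_FULL) / det(R_INPUTS))
print(round(ratio_root, 4))
```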

1.1.8 Exercise 1.2: Problem 3

First, the array of X was ordered from minimum to maximum value and then split into two equal parts. The associated Z values were split respectively into two equal-length sets. The mean value for Z for each set was calculated separately. The difference of these two mean values was compared to the half-widths of their confidence intervals.

The half-widths were calculated as:

$$ {\varDelta}_{Z1}=t\left(\alpha =.025,N=50\right)\bullet \frac{\sigma_{Z1}}{\sqrt{N}}=2.01\bullet \frac{1.965}{\sqrt{50}}=0.5586 $$

$$ {\varDelta}_{Z2}=t\left(\alpha =.025,N=50\right)\bullet \frac{\sigma_{Z2}}{\sqrt{N}}=2.01\bullet \frac{2.0875}{\sqrt{50}}=0.5934 $$

The difference between the mean values of Z₁ and Z₂ is 0.3387, and the half-widths of the 95 % confidence intervals for the two Z sets are 0.5586 and 0.5934. Because the difference is smaller than either half-width, there is no evidence that the value of variable X has an effect on the mean value of variable Z.
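The sort-and-split procedure described above can be sketched as follows. The function name and the decision rule (an effect is reported only when the difference in means exceeds both half-widths) are assumptions made for illustration, and the data in the usage lines are synthetic, not the exercise data.

```python
import math
import statistics


def split_mean_test(x, z, t):
    """Order samples by x, split into halves, and compare the mean of z in each half."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    z_low = [z[i] for i in order[:half]]
    z_high = [z[i] for i in order[half:]]
    diff = abs(statistics.fmean(z_high) - statistics.fmean(z_low))
    # Half-width of the confidence interval for each half's mean
    d_low = t * statistics.stdev(z_low) / math.sqrt(len(z_low))
    d_high = t * statistics.stdev(z_high) / math.sqrt(len(z_high))
    # Assumed rule: effect only if the difference exceeds both half-widths
    return diff, d_low, d_high, diff > max(d_low, d_high)


# Synthetic illustration: z jumps by 10 in the upper half of x
x = list(range(100))
z_effect = [(i % 5) + (10.0 if i >= 50 else 0.0) for i in x]
z_no_effect = [float(i % 5) for i in x]
print(split_mean_test(x, z_effect, 2.01)[3], split_mean_test(x, z_no_effect, 2.01)[3])
```

Applied to the values reported in the text (difference 0.3387, half-widths 0.5586 and 0.5934), this rule reports no effect, in line with the conclusion above.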

1.1.9 Exercise 1.2: Problem 4

The cross-correlation function is evaluated at each discrete lag m, which varies as m=0,1,2,…,N/4. The value of this function at lag m is:

$$ {r}_{cross\_vw}(m)=\frac{1}{\left(N-m\right)\bullet {\sigma}_v\bullet {\sigma}_w}\bullet \sum_{i=1}^{N-m}\left[v(i)-\overline{v}\right]\bullet \left[w\left(i+m\right)-\overline{w}\right] $$

The resulting cross-correlation function is plotted below against m.
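As a sketch of how the cross-correlation function could be evaluated, the Python code below computes one value per lag m, using zero-based indices. The function name, the use of population standard deviations, and the synthetic test signals are assumptions introduced here; the exercise data are not reproduced.

```python
import random
import statistics


def cross_correlation(v, w, max_lag=None):
    """Cross-correlation r(m) for m = 0 .. max_lag (default N // 4)."""
    n = len(v)
    if max_lag is None:
        max_lag = n // 4
    v_mean, w_mean = statistics.fmean(v), statistics.fmean(w)
    # Assumption: population standard deviations for sigma_v and sigma_w
    scale = statistics.pstdev(v) * statistics.pstdev(w)
    r = []
    for m in range(max_lag + 1):
        total = sum((v[i] - v_mean) * (w[i + m] - w_mean) for i in range(n - m))
        r.append(total / ((n - m) * scale))
    return r


# Synthetic illustration: w is v delayed by 3 samples
rng = random.Random(0)
v = [rng.gauss(0.0, 1.0) for _ in range(200)]
w = [0.0] * 3 + v[:-3]
r = cross_correlation(v, w)
best_lag = max(range(len(r)), key=r.__getitem__)
print(best_lag)
```

For signals related by a pure delay, the lag at which the function peaks estimates that delay.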

1.1.10 Exercise 1.2: Problem 5

A frequency analysis tool in MATLAB was used to break down the frequency spectrum of x(i).

The detected peaks are consistent with the frequencies and magnitudes of the sinusoidal components of the signal:

Peak at 0.016 Hz (0.10 rad/s) with amplitude 7
Peak at 0.11 Hz (0.70 rad/s) with amplitude 2
Peak at 0.28 Hz (1.77 rad/s) with amplitude 0.5
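The original analysis used a MATLAB tool that is not shown. As a rough stand-in, the Python sketch below builds a synthetic signal from the three components listed above and locates the spectral peaks with a Hann-windowed direct DFT. The sampling interval of 1 s, the record length of 1024 samples, the use of sine components with zero phase, and all function names are assumptions.

```python
import cmath
import math


def amplitude_spectrum(x, dt=1.0):
    """One-sided amplitude spectrum from a Hann-windowed direct DFT."""
    n = len(x)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    xw = [xi * wi for xi, wi in zip(x, window)]
    gain = sum(window)
    omegas, amps = [], []
    for k in range(1, n // 2):
        step = -2j * math.pi * k / n
        coeff = sum(xw[i] * cmath.exp(step * i) for i in range(n))
        omegas.append(2.0 * math.pi * k / (n * dt))  # angular frequency, rad/s
        amps.append(2.0 * abs(coeff) / gain)
    return omegas, amps


def top_peaks(omegas, amps, count):
    """Return the `count` largest local maxima as (omega, amplitude), sorted by omega."""
    peaks = [
        (amps[k], omegas[k])
        for k in range(1, len(amps) - 1)
        if amps[k - 1] < amps[k] >= amps[k + 1]
    ]
    strongest = sorted(peaks, reverse=True)[:count]
    return sorted((w, a) for a, w in strongest)


# Synthetic signal with the three components reported in the text
signal = [
    7.0 * math.sin(0.10 * i) + 2.0 * math.sin(0.70 * i) + 0.5 * math.sin(1.77 * i)
    for i in range(1024)
]
omegas, amps = amplitude_spectrum(signal)
peaks = top_peaks(omegas, amps, 3)
for omega, amp in peaks:
    print(f"{omega:.3f} rad/s ({omega / (2 * math.pi):.3f} Hz)")
```

Amplitudes read from the spectrum bins are only approximate, because a component that falls between two bins is spread across both.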


Cite this chapter

Skormin, V.A. (2016). Statistical Methods and Their Applications. In: Introduction to Process Control. Springer Texts in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-319-42258-9_1


DOI: https://doi.org/10.1007/978-3-319-42258-9_1
Published: 20 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42257-2
Online ISBN: 978-3-319-42258-9
eBook Packages: Business and Management (R0)
