
Statistical Methods and Their Applications

Introduction to Process Control

Part of the book series: Springer Texts in Business and Economics ((STBE))




Solutions


1.1.1 Exercise 1.1: Problem 1

The following probabilities can be extracted from the given information. Note that fail represents the event of a machine tool failure, good represents the event that there is no machine tool failure, vibration represents that there was excessive vibration, and overheat represents the event of overheating.

$$ P(fail)=0.083 $$
$$ P(good)=0.917 $$
$$ P\left( vibration| fail\right)=\frac{16}{56}=0.286 $$
$$ P\left( vibration| good\right)=0.05 $$
$$ P\left( overheat| fail\right)=\frac{7}{56}=0.125 $$
$$ P\left( overheat| good\right)=0.1 $$

Given these initial probabilities, the conditional probability that there was a failure given an observed vibration can be calculated as follows.

$$ P\left( fail| vibration\right)=\frac{P\left( vibration\Big| fail\right)\bullet P(fail)}{P\left( vibration| fail\right)\bullet P(fail)+P\left( vibration\Big| good\right)\bullet P(good)} $$
$$ P\left( fail| vibration\right)=\frac{0.286\bullet 0.083}{0.286\bullet 0.083+0.05\bullet 0.917}=0.34 $$

This posterior probability of failure now serves as the prior probability of failure for the next update. Because the two probabilities must sum to one, the probability of no failure is 0.66. With the vibration event accounted for, the probability of failure given the next event, overheating, can be calculated.

$$ P\left( fail\Big| overheat\right)=\frac{P\left( overheat\Big| fail\right)\bullet P(fail)}{P\left( overheat| fail\right)\bullet P(fail)+P\left( overheat\Big| good\right)\bullet P(good)} $$
$$ P\left( fail| overheat\right)=\frac{0.125\bullet 0.34}{0.125\bullet 0.34+0.1\bullet 0.66}=0.392 $$

So, after both events occurred, the probability of a failure is 0.392.
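
The two-step update above can be sketched in Python. This is a minimal illustration rather than the author's code, and the function name `bayes_update` is hypothetical. The sketch carries the unrounded posterior into the second step, so its final value (about 0.393) differs in the third decimal from the 0.392 obtained with the rounded intermediate value 0.34.

```python
def bayes_update(prior, p_event_given_fail, p_event_given_good):
    """Return P(fail | event) from the prior P(fail) and the two likelihoods."""
    numerator = p_event_given_fail * prior
    evidence = numerator + p_event_given_good * (1.0 - prior)
    return numerator / evidence


p_fail = 0.083  # prior probability of a machine tool failure

# First observation: excessive vibration
p_after_vibration = bayes_update(p_fail, 0.286, 0.05)

# Second observation: overheating, with the previous posterior as the new prior
p_after_overheat = bayes_update(p_after_vibration, 0.125, 0.10)

print(round(p_after_vibration, 3), round(p_after_overheat, 3))
```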

1.1.2 Exercise 1.1: Problem 2

The following frequencies (probabilities) can be extracted from the Example data.

$$ P\left(A| poor\right)=\frac{136}{879}=0.155\kern0.5em P\left(B| poor\right)=\frac{177}{879}=0.201\kern0.5em P\left(C| poor\right)=\frac{83}{879}=0.094 $$
$$ P\left(A| good\right)\kern-0.2em =\frac{36}{4621}=0.0078\kern0.5em P\left(B| good\right)=\frac{81}{4621}=0.0175\kern0.5em P\left(C| good\right)\kern-0.1em =\frac{63}{4621}=0.0136 $$
$$ P(poor)=0.16\kern0.5em P(good)=0.84 $$

Now, the probability of poor quality given event A at 10 am:

$$ P\left( poor|A\right)=\frac{P\left(A\Big| poor\right)\bullet P(poor)}{P\left(A| poor\right)\bullet P(poor)+P\left(A| good\right)\bullet P(good)} $$
$$ P\left( poor|A\right)=\frac{0.155\bullet 0.16}{0.155\bullet 0.16+0.0078\bullet 0.84}=0.791 $$

The new probabilities are:

$$ P(poor)=0.791\ P(good)=0.209 $$

Now, the probability of poor quality given the consecutive event B at 2 pm:

$$ P\left( poor|B\right)=\frac{P\left(B\Big| poor\right)\bullet P(poor)}{P\left(B| poor\right)\bullet P(poor)+P\left(B| good\right)\bullet P(good)} $$
$$ P\left( poor|B\right)=\frac{0.201\bullet 0.791}{0.201\bullet 0.791+0.0175\bullet 0.209}=0.978 $$

The new probabilities are:

$$ P(poor)=0.978\ P(good)=0.022 $$

Now, the probability of poor quality given the consecutive event C at 4 pm:

$$ P\left( poor|C\right)=\frac{P\left(C\Big| poor\right)\bullet P(poor)}{P\left(C| poor\right)\bullet P(poor)+P\left(C| good\right)\bullet P(good)} $$
$$ P\left( poor|C\right)=\frac{0.094\bullet 0.978}{0.094\bullet 0.978+0.0136\bullet 0.022}=0.9968 $$

Given the three sequential events, the probability of poor product quality rises to 99.68 %.
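
The chain of three updates can also be checked in odds form: the posterior odds equal the prior odds multiplied by the likelihood ratio of each observed event. The sketch below is illustrative only; the function name is hypothetical, and it makes the same implicit assumption as the sequential calculation, namely that events A, B, and C are conditionally independent given the product quality. With the rounded likelihoods listed above, it evaluates to about 0.997.

```python
from math import prod


def posterior_from_likelihood_ratios(prior, likelihood_ratios):
    """Combine a prior probability with a list of likelihood ratios in odds form."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Likelihood ratios P(event | poor) / P(event | good) for events A, B, C
ratios = [0.155 / 0.0078, 0.201 / 0.0175, 0.094 / 0.0136]

p_poor = posterior_from_likelihood_ratios(0.16, ratios)
print(round(p_poor, 4))
```

Because no intermediate value is rounded, the result can differ slightly in the fourth decimal from a step-by-step hand calculation.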

1.1.3 Exercise 1.1: Problem 3

The probability of passing the requirements can be calculated for the original system as the area between two z-scores on a standard normal curve. The upper and lower bound z-scores are calculated as:

$$ {z_{pass}}^{+}=\frac{0.001-0}{0.0009}=1.111\ {z_{pass}}^{-}=\frac{-0.001-0}{0.0009}=-1.111 $$

The probability inside these bounds is 0.7335, or 73.4 %.

figure a

Now, to check the improvement from the controls added, the same process will be done with the statistical data for the controlled process.

$$ {z_{pass}}^{+}=\frac{0.001-0}{0.0003}=3.333\kern2em {z_{pass}}^{-}=\frac{-0.001-0}{0.0003}=-3.333 $$

The probability inside these bounds is 0.9991, or 99.91 %.

figure b

Given this information, the probability of passing the requirements rose from 73.4 % to 99.91 % with the addition of controls, an increase of about 26.5 percentage points in the success rate of this procedure.
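
Both pass probabilities can be reproduced with the normal distribution in Python's standard library. This is a sketch under the assumptions stated in the solution (zero mean, tolerance of ±0.001, standard deviations of 0.0009 and 0.0003); the helper name `prob_within` is hypothetical.

```python
from statistics import NormalDist


def prob_within(lower, upper, mean, sigma):
    """Probability that a normal variable falls between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sigma)
    return dist.cdf(upper) - dist.cdf(lower)


p_original = prob_within(-0.001, 0.001, 0.0, 0.0009)    # process without controls
p_controlled = prob_within(-0.001, 0.001, 0.0, 0.0003)  # process with controls

print(round(p_original, 4), round(p_controlled, 4))
```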

1.1.4 Exercise 1.1: Problem 4

In this problem, we are looking for the amount of time sufficient for 90 % of students to complete the reading task. For this, we use a normal distribution with a mean of 2.5 min and a standard deviation of 0.6 min. The z-value below which 90 % of the probability lies is 1.282.

figure c

The reading time associated with this z is 2.5 + 1.282 • 0.6 = 3.27 ≈ 3.3 min. We can conclude that within 3.3 min, 90 % of students will be done reading one page.
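
The same percentile can be obtained directly from the inverse normal CDF. A minimal sketch using Python's standard library, with the mean and standard deviation taken from the problem statement:

```python
from statistics import NormalDist

z90 = NormalDist().inv_cdf(0.90)                  # z-value with 90 % of the area below it
t90 = NormalDist(mu=2.5, sigma=0.6).inv_cdf(0.90)  # reading time in minutes

print(round(z90, 3), round(t90, 2))
```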

1.1.5 Exercise 1.1: Problem 5

1.1.5.1 Part A

For a 90 % confidence interval and 17 students, the t-values for the mean and the standard deviation are

$$ mean:\;t\left(\alpha, N\right)=t\left(.05,17\right)=1.740 $$
$$ stddev:\;t\left(\alpha, N-1\right)=t\left(.05,16\right)=1.746 $$

The 90 % confidence interval for mean:

$$ \varDelta =t\left(\alpha, N\right)\bullet \frac{\sigma_N}{\sqrt{N}}=1.740\bullet \frac{0.6}{\sqrt{17}}=0.253 $$
$$ P\left(\mu -\varDelta \le {\mu}_{TRUE}\le \mu +\varDelta \right)=90\% $$
$$ P\left(2.5-.253\le {\mu}_{TRUE}\le 2.5+.253\right)=90\% $$
$$ P\left(2.247\le {\mu}_{TRUE}\le 2.753\right)=90\% $$

The 90 % confidence interval for standard deviation:

$$ \varDelta =t\left(\alpha, N-1\right)\bullet \frac{\sigma_N}{\sqrt{2N}}=1.746\bullet \frac{0.6}{\sqrt{2\bullet 17}}=0.18 $$
$$ P\left(\sigma -\varDelta \le {\sigma}_{TRUE}\le \sigma +\varDelta \right)=90\% $$
$$ P\left(0.6-.18\le {\sigma}_{TRUE}\le 0.6+.18\right)=90\% $$
$$ P\left(0.42\le {\sigma}_{TRUE}\le 0.78\right)=90\% $$

For a 95 % confidence interval and 17 students, the t-values for the mean and the standard deviation are

$$ mean:\;t\left(\alpha, N\right)=t\left(.025,17\right)=2.11 $$
$$ stddev:\;t\left(\alpha, N-1\right)=t\left(.025,16\right)=2.12 $$

The 95 % confidence interval for mean:

$$ \varDelta =t\left(\alpha, N\right)\bullet \frac{\sigma_N}{\sqrt{N}}=2.11\bullet \frac{0.6}{\sqrt{17}}=0.307 $$
$$ P\left(\mu -\varDelta \le {\mu}_{TRUE}\le \mu +\varDelta \right)=95\% $$
$$ P\left(2.5-.307\le {\mu}_{TRUE}\le 2.5+.307\right)=95\% $$
$$ P\left(2.193\le {\mu}_{TRUE}\le 2.807\right)=95\% $$

The 95 % confidence interval for standard deviation:

$$ \varDelta =t\left(\alpha, N-1\right)\bullet \frac{\sigma_N}{\sqrt{2N}}=2.12\bullet \frac{0.6}{\sqrt{2\bullet 17}}=.218 $$
$$ P\left(\sigma -\varDelta \le {\sigma}_{TRUE}\le \sigma +\varDelta \right)=95\% $$
$$ P\left(0.6-.218\le {\sigma}_{TRUE}\le 0.6+.218\right)=95\% $$
$$ P\left(0.382\le {\sigma}_{TRUE}\le 0.818\right)=95\% $$
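
The four half-widths in Part A follow from two short formulas, sketched below. The function names are hypothetical, and the t-values are copied from the solution because Python's standard library has no Student t quantile function (a statistics package could supply them instead).

```python
from math import sqrt


def half_width_mean(t_value, s, n):
    """Half-width of the confidence interval for the mean: t * s / sqrt(N)."""
    return t_value * s / sqrt(n)


def half_width_std(t_value, s, n):
    """Half-width of the confidence interval for the standard deviation: t * s / sqrt(2N)."""
    return t_value * s / sqrt(2 * n)


s, n = 0.6, 17
for label, t_mean, t_std in [("90 %", 1.740, 1.746), ("95 %", 2.11, 2.12)]:
    d_mean = half_width_mean(t_mean, s, n)
    d_std = half_width_std(t_std, s, n)
    print(label, round(d_mean, 3), round(d_std, 3))
```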

1.1.5.2 Part B

Doubling the accuracy of the 90 % confidence interval for the mean means halving its half-width, which requires a larger N.

$$ \varDelta =0.253\to {\varDelta}_{NEW}=0.1265 $$

With N = 63, the half-width is halved, so the accuracy is doubled.

$$ {\varDelta}_{NEW}=1.669\bullet \frac{0.6}{\sqrt{63}}=0.126 $$

Doubling the accuracy of the 95 % confidence interval for the mean likewise requires halving its half-width.

$$ \varDelta =0.307\to {\varDelta}_{NEW}=0.1535 $$

With N = 61, the half-width is halved, so the accuracy is doubled.

$$ {\varDelta}_{NEW}=1.9996\bullet \frac{0.6}{\sqrt{61}}=0.1536 $$
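
As a worked check of the 90 % case for the mean, the half-width formula can be solved for N. This is a sketch that assumes the t-value 1.669 quoted above; because t itself depends on N, the result is normally found by iterating until t and N agree.

$$ N_{NEW}={\left(\frac{t\bullet {\sigma}_N}{{\varDelta}_{NEW}}\right)}^2={\left(\frac{1.669\bullet 0.6}{0.1265}\right)}^2\approx 62.7\to 63 $$

If t stayed fixed at its N = 17 value, halving the half-width would require four times the sample, 68 students; the smaller t-value at larger N brings the requirement down to 63.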

1.1.5.3 Part C

Doubling the accuracy of the 90 % confidence interval for the standard deviation means halving its half-width.

$$ \varDelta =0.18\to {\varDelta}_{NEW}=0.09 $$

With N = 60, and keeping the t-value from Part A, the half-width is approximately halved.

$$ {\varDelta}_{NEW}=1.746\bullet \frac{0.6}{\sqrt{120}}=0.096 $$

Doubling the accuracy of the 95 % confidence interval for the standard deviation likewise requires halving its half-width.

$$ \varDelta =0.218\to {\varDelta}_{NEW}=0.109 $$

With N = 61, and keeping the t-value from Part A, the half-width is approximately halved.

$$ {\varDelta}_{NEW}=2.12\bullet \frac{0.6}{\sqrt{122}}=0.115 $$

1.1.6 Exercise 1.2: Problem 1

The correlation matrix was calculated with the following configuration:

$$ {R}_{xyz}=\left[\begin{array}{ccc}{r}_{xx}& {r}_{xy}& {r}_{xz}\\ {}{r}_{xy}& {r}_{yy}& {r}_{yz}\\ {}{r}_{xz}& {r}_{yz}& {r}_{zz}\end{array}\right] $$

In which the correlation coefficient for two variables, x and y, is defined as

$$ {r}_{xy}=\frac{1}{N\bullet {\sigma}_x\bullet {\sigma}_y}\bullet \sum_{n=1}^N\left[x(n)-\overline{x}\right]\bullet \left[y(n)-\overline{y}\right] $$

The correlation matrix for x, y, and z is:

$$ {R}_{xyz}=\left[\begin{array}{ccc}0.9967& 0.0251& 0.0719\\ {}0.0251& 0.9967& 0.6053\\ {}0.0719& 0.6053& 0.9967\end{array}\right] $$
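
The diagonal entries are 0.9967 rather than exactly 1. One possible explanation, offered here as an assumption and not confirmed by the source, is that the sum is divided by N while the standard deviations are computed with the N − 1 (sample) normalization, which gives a self-correlation of (N − 1)/N = 299/300 ≈ 0.9967. The sketch below illustrates the effect on a short hypothetical data set; the function name is also hypothetical.

```python
from statistics import mean, stdev


def corr_1_over_n(x, y):
    """Correlation coefficient with a 1/N factor and sample (N - 1) standard deviations."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    total = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return total / (n * stdev(x) * stdev(y))


sample = [0.3, 1.7, 2.2, 2.9, 4.1, 5.6]  # hypothetical data, N = 6
r_xx = corr_1_over_n(sample, sample)

# With this mix of normalizations, the self-correlation equals (N - 1) / N
print(r_xx, (len(sample) - 1) / len(sample))
```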

Then, the statistical significance was evaluated by comparing the half-width of the confidence interval for each correlation coefficient to the coefficient itself. A correlation coefficient was deemed significant if it fell outside the confidence interval, meaning that the coefficient was greater than the half-width of the interval. The half-widths of the intervals were calculated as

$$ {\varDelta}_{xy}=t\left(\alpha =.005,N=300\right)\bullet \frac{1-{r_{xy}}^2}{\sqrt{N}} $$
  • Δ_xy = 0.14956 and r_xy = 0.02506, so the x-y correlation is not significant.

  • Δ_xz = 0.14888 and r_xz = 0.071861, so the x-z correlation is not significant.

  • Δ_yz = 0.094818 and r_yz = 0.60531, so the y-z correlation is significant.
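
The significance test above can be sketched as follows. The t-value 2.592 is an assumption: the solution does not print t(0.005, 300), so it was back-computed here from the reported half-widths. The function and variable names are hypothetical.

```python
from math import sqrt

T_VALUE = 2.592  # assumed value of t(alpha = 0.005, N = 300), inferred from the reported half-widths
N = 300


def corr_half_width(r, n, t_value):
    """Half-width of the confidence interval for a correlation coefficient."""
    return t_value * (1.0 - r * r) / sqrt(n)


coefficients = {"xy": 0.02506, "xz": 0.071861, "yz": 0.60531}

results = {}
for pair, r in coefficients.items():
    delta = corr_half_width(r, N, T_VALUE)
    results[pair] = (delta, abs(r) > delta)  # significant if |r| exceeds the half-width
    print(pair, round(delta, 5), results[pair][1])
```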

1.1.7 Exercise 1.2: Problem 2

The multiple correlation coefficient R_{y,xvz} is:

$$ {R}_{y,xvz}=\sqrt{\frac{Det\left(\begin{array}{cccc}0.9967& 0.8146& 0.0719& 0.0251\\ {}0.8146& 0.9967& 0.0822& 0.0577\\ {}0.0719& 0.0822& 0.9967& 0.6053\\ {}0.0251& 0.0577& 0.6053& 0.9967\end{array}\right)}{Det\left(\begin{array}{ccc}0.9967& 0.8146& 0.0719\\ {}0.8146& 0.9967& 0.0822\\ {}0.0719& 0.0822& 0.9967\end{array}\right)}}=0.7919 $$
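
The arithmetic of the determinant ratio can be checked with a small pure-Python routine. The sketch below evaluates the expression exactly as written above, using the 4×4 matrix in the numerator and its upper-left 3×3 block in the denominator; the helper `det` is a hypothetical name for a plain Gaussian-elimination determinant.

```python
from math import sqrt


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


full = [
    [0.9967, 0.8146, 0.0719, 0.0251],
    [0.8146, 0.9967, 0.0822, 0.0577],
    [0.0719, 0.0822, 0.9967, 0.6053],
    [0.0251, 0.0577, 0.6053, 0.9967],
]
sub = [row[:3] for row in full[:3]]  # upper-left 3x3 block

value = sqrt(det(full) / det(sub))
print(round(value, 4))
```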

1.1.8 Exercise 1.2: Problem 3

First, the array of X was ordered from minimum to maximum value and then split into two equal parts. The associated Z values were split respectively into two equal-length sets. The mean value for Z for each set was calculated separately. The difference of these two mean values was compared to the half-widths of their confidence intervals.

The half-widths were calculated as:

$$ {\varDelta}_{Z1}=t\left(\alpha =.025,N=50\right)\bullet \frac{\sigma_{Z1}}{\sqrt{N}}=2.01\bullet \frac{1.965}{\sqrt{50}}=0.5586 $$
$$ {\varDelta}_{Z2}=t\left(\alpha =.025,N=50\right)\bullet \frac{\sigma_{Z2}}{\sqrt{N}}=2.01\bullet \frac{2.0875}{\sqrt{50}}=0.5934 $$

The difference between the mean values of Z1 and Z2 is 0.3387, and the half-widths of the 95 % confidence intervals for the two Z sets are 0.5586 and 0.5934. Because the difference is smaller than either half-width, there is no evidence that the value of variable X has an effect on the mean value of variable Z.
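
The split-and-compare procedure can be sketched as below. The function names are hypothetical, and the decision rule (an effect is flagged only when the difference in means exceeds both half-widths) is an assumption about how the comparison is applied. The original X and Z data are not reproduced here, so only the reported summary values are checked.

```python
from math import sqrt
from statistics import mean, stdev


def half_width(t_value, s, n):
    """Half-width of the confidence interval for a mean: t * s / sqrt(n)."""
    return t_value * s / sqrt(n)


def compare_split_means(x, z, t_value):
    """Sort (x, z) pairs by x, split them in half, and compare the mean of z in each half."""
    pairs = sorted(zip(x, z), key=lambda p: p[0])
    half = len(pairs) // 2
    z_low = [zv for _, zv in pairs[:half]]
    z_high = [zv for _, zv in pairs[half:]]
    diff = abs(mean(z_high) - mean(z_low))
    hw_low = half_width(t_value, stdev(z_low), len(z_low))
    hw_high = half_width(t_value, stdev(z_high), len(z_high))
    # Assumed rule: report an effect only if the difference exceeds both half-widths
    return diff, hw_low, hw_high, diff > max(hw_low, hw_high)


# Summary values from the solution: t = 2.01, N = 50 in each half
hw_z1 = half_width(2.01, 1.965, 50)
hw_z2 = half_width(2.01, 2.0875, 50)
effect_found = 0.3387 > max(hw_z1, hw_z2)
print(round(hw_z1, 4), round(hw_z2, 4), effect_found)
```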

1.1.9 Exercise 1.2: Problem 4

The cross-correlation function is evaluated at each discrete lag m = 0, 1, 2, …, N/4. Its value at lag m is:

$$ {r}_{cross\_vw}(m)=\frac{1}{\left(N-m\right)\bullet {\sigma}_v\bullet {\sigma}_w}\bullet \sum_{i=1}^{N-m}\left[v(i)-\overline{v}\right]\bullet \left[w\left(i+m\right)-\overline{w}\right] $$

The resulting cross-correlation function is plotted below against m.

figure d
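
A cross-correlation function of this kind yields one value per lag m. The sketch below is a minimal pure-Python version with a hypothetical function name; it assumes population (1/N) standard deviations, and the example series are invented for illustration only (w is v delayed by three samples).

```python
import math
from statistics import mean, pstdev


def cross_correlation(v, w, max_lag):
    """Return the cross-correlation of v and w for lags m = 0 .. max_lag."""
    n = len(v)
    v_bar, w_bar = mean(v), mean(w)
    scale = pstdev(v) * pstdev(w)  # population standard deviations (assumption)
    values = []
    for m in range(max_lag + 1):
        total = sum((v[i] - v_bar) * (w[i + m] - w_bar) for i in range(n - m))
        values.append(total / ((n - m) * scale))
    return values


# Hypothetical example: w repeats v with a delay of three samples
v = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(80)]
w = [0.0, 0.0, 0.0] + v[:-3]

r = cross_correlation(v, w, len(v) // 4)
for m, value in enumerate(r):
    print(m, round(value, 3))
```

Plotting the returned list against the lag index gives the kind of curve the figure shows.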

1.1.10 Exercise 1.2: Problem 5

A frequency analysis tool in MATLAB was used to break down the frequency spectrum of x(i).

figure e

The detected peaks are consistent with the frequencies and magnitudes of the sinusoidal components of the signal:

  • Peak at 0.016 Hz (0.10 radians/s) with amplitude 7

  • Peak at 0.11 Hz (0.70 radians/s) with amplitude 2

  • Peak at 0.28 Hz (1.77 radians/s) with amplitude 0.5
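
The solution used a MATLAB tool; as a rough Python counterpart, the sketch below builds a hypothetical test signal and recovers the component amplitudes with a direct discrete Fourier transform. Everything about the signal is an assumption made for illustration: a record of 200 samples at a 1 s interval, and frequencies placed exactly on DFT bins (0.015, 0.11, and 0.28 Hz) so that no spectral leakage occurs. It is not the original x(i) data.

```python
import cmath
import math


def amplitude_spectrum(x):
    """Single-sided amplitude spectrum from a direct DFT (O(N^2), adequate for short records)."""
    n = len(x)
    amplitudes = []
    for k in range(n // 2 + 1):
        xk = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        # DC and Nyquist bins are not doubled; all other bins are
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        amplitudes.append(scale * abs(xk) / n)
    return amplitudes


N = 200                             # hypothetical record length, 1 s sampling interval
BINS = {3: 7.0, 22: 2.0, 56: 0.5}   # bin index -> amplitude; bin k corresponds to k / N Hz

signal = [
    sum(a * math.sin(2 * math.pi * k * i / N) for k, a in BINS.items())
    for i in range(N)
]
spectrum = amplitude_spectrum(signal)

for k in BINS:
    print(k / N, "Hz:", round(spectrum[k], 3))
```

With real data the component frequencies rarely fall exactly on a bin, so peaks spread over neighboring bins and the recovered amplitudes are approximate.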


© 2016 Springer International Publishing Switzerland


Cite this chapter

Skormin, V.A. (2016). Statistical Methods and Their Applications. In: Introduction to Process Control. Springer Texts in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-319-42258-9_1
