
Statistical Intervals Based on a Single Sample

Modern Mathematical Statistics with Applications

Part of the book series: Springer Texts in Statistics (STS)


Abstract

A point estimate, because it is a single number, by itself provides no information about the precision and reliability of estimation. Consider, for example, using the statistic \( \overline{X} \) to calculate a point estimate for the true average breaking strength of a certain brand of paper towels, and suppose that \( \bar{x} \) = 9322.7 g. Because of sampling variability, it is virtually never the case that \( \bar{x} = \mu \).



Correspondence to Jay L. Devore.

Supplementary Exercises (71–92)


  1. 71.

    A manufacturer of college textbooks is interested in estimating the strength of the bindings produced by a particular binding machine. Strength can be measured by recording the force required to pull the pages from the binding. If this force is measured in pounds, how many books should be tested to estimate the average force required to break the binding to within .1 lb with 95% confidence? Assume that σ is known to be .8.
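The sample-size arithmetic can be sketched as follows, assuming the known-σ normal interval \( \bar{x} \pm z_{\alpha /2} \sigma /\sqrt n \) and reading "to within .1 lb" as a bound on the half-width. The function name `sample_size_for_mean` is illustrative, not from the text.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, conf=0.95):
    """Smallest n with z * sigma / sqrt(n) <= bound (known-sigma z interval)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    return math.ceil((z * sigma / bound) ** 2)    # round up to a whole book

n_books = sample_size_for_mean(sigma=0.8, bound=0.1)
print(n_books)
```

Rounding up rather than to the nearest integer keeps the half-width at or below the stated bound.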

  1. 72.

    According to the article “Fatigue Testing of Condoms” (Polymer Testing 2009: 567–571), “tests currently used for condoms are surrogates for the challenges they face in use,” including a test for holes, an inflation test, a package seal test, and tests of dimensions and lubricant quality (all fertile territory for the use of statistical methodology!). The investigators developed a new test that adds cyclic strain to a level well below breakage and determines the number of cycles to break. A sample of 20 condoms of one particular type resulted in a sample mean number of 1584 and a sample standard deviation of 607. Calculate and interpret a confidence interval at the 99% confidence level for the true average number of cycles to break. [Note: The article presented the results of hypothesis tests based on the t distribution; the validity of these depends on assuming normal population distributions.]

  1. 73.

    Before opening a new location, franchise companies conduct market research to determine if sufficient demand exists for their products. A national sandwich chain recently conducted a survey to investigate opening a franchise in a particular town. Among 300 households contacted through random-digit dialing, 198 respondents indicated they would patronize this shop.

  1. a.

    Let p = the proportion of all households in this town that would patronize the sandwich franchise. Calculate and interpret a 95% lower confidence bound for p.

  2. b.

    From years of marketing experience, the company knows they need more than 5000 households in the population to patronize the shop—this accounts for competing local businesses and variation in frequency of visitation by potential patrons. This particular town has 7700 households. Determine a 95% lower confidence bound for the number of households that will eat at the new store. Can the company be confident they will have enough customers?

  3. c.

    Imagine the company ignored sampling variability and simply used the sample proportion from the survey to determine the expected number of customers (rather than the lower confidence bound). Would that change their opinion regarding the viability of the new location? Explain.
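A minimal sketch of the comparison in parts (a) through (c), assuming the traditional large-sample (Wald) lower bound \( \hat{p} - z_{\alpha } \sqrt {\hat{p}(1 - \hat{p})/n} \). The chapter's preferred bound for a proportion may differ (for example, a score-type bound), so treat the numbers as approximate; `wald_lower_bound` is an illustrative name.

```python
import math
from statistics import NormalDist

def wald_lower_bound(successes, n, conf=0.95):
    """One-sided large-sample lower confidence bound for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(conf)  # one-sided critical value z_alpha
    return p_hat - z * math.sqrt(p_hat * (1 - p_hat) / n)

households = 7700
p_lower = wald_lower_bound(198, 300)
count_lower = households * p_lower        # lower bound on patronizing households
count_naive = households * 198 / 300      # ignores sampling variability

print(round(p_lower, 3), round(count_lower), round(count_naive))
```

Scaling the bound for p by the fixed population size 7700 gives a bound for the count with the same confidence level.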

  1. 74.

    The Pew Forum on Religion and Public Life reported on Dec. 9, 2009 that in a survey of 2003 American adults, 25% said they believed in astrology.

  1. a.

    Calculate and interpret a confidence interval at the 99% confidence level for the proportion of all adult Americans who believe in astrology.

  2. b.

    What sample size would be required for the width of a 99% CI to be at most .05 irrespective of the value of \( \hat{p} \)?

  3. c.

    The upper limit of the CI in (a) gives an upper confidence bound for the proportion being estimated. What is the corresponding confidence level?
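For part (b), a rough sketch of the worst-case sample size, assuming the traditional interval whose width is \( 2z_{\alpha /2} \sqrt {\hat{p}(1 - \hat{p})/n} \); this is largest at \( \hat{p} = .5 \). A formula based on the score interval would give a slightly different n, and `n_for_ci_width` is an illustrative name.

```python
import math
from statistics import NormalDist

def n_for_ci_width(width, conf=0.99):
    """Worst case p_hat = 0.5: width = z / sqrt(n), so n >= (z / width)**2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z / width) ** 2)

n_required = n_for_ci_width(0.05)
print(n_required)
```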

  1. 75.

    There were 12 first-round heats in the men’s 100-m race at the 1996 Atlanta Summer Olympics. Here are the reaction times in seconds (time to first movement) of the top four finishers of each heat. The first 12 are the 12 winners, then the second-place finishers, and so on.

    1st   .187  .152  .137  .175  .172  .165  .184  .185  .147  .189  .172  .156
    2nd   .168  .140  .214  .163  .202  .173  .175  .154  .160  .169  .148  .144
    3rd   .159  .145  .187  .222  .190  .158  .202  .162  .156  .141  .167  .155
    4th   .156  .164  .160  .145  .163  .170  .182  .187  .148  .183  .162  .186

    Because reaction time has little if any relationship to the order of finish, it is reasonable to view the times as coming from a single population.

  1. a.

    Estimate the population mean in a way that conveys information about precision and reliability. [Note: \( \sum x_{i} = 8.08100,\; \sum x_{i}^{2} = 1.37813 \).]

  2. b.

    Calculate a 95% confidence interval for the population proportion of reaction times that are below .15. (Reaction times below .10 are regarded as false starts, meaning that the runner anticipates the starter’s gun, because such times are considered physically impossible. Linford Christie, who had a reaction time of .160 in placing second in his first-round heat, had two such false starts in the finals and was disqualified.)
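The summary quantities in the note can be reproduced directly from the 48 reaction times. The sketch below assumes a large-sample z interval for the mean, which is one reasonable reading of part (a) with n = 48; the variable names are illustrative.

```python
import math
from statistics import NormalDist

times = [
    .187, .152, .137, .175, .172, .165, .184, .185, .147, .189, .172, .156,  # 1st
    .168, .140, .214, .163, .202, .173, .175, .154, .160, .169, .148, .144,  # 2nd
    .159, .145, .187, .222, .190, .158, .202, .162, .156, .141, .167, .155,  # 3rd
    .156, .164, .160, .145, .163, .170, .182, .187, .148, .183, .162, .186,  # 4th
]

n = len(times)
sum_x = sum(times)
sum_x2 = sum(t * t for t in times)

mean = sum_x / n
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))  # computing formula for s

z = NormalDist().inv_cdf(0.975)
half_width = z * s / math.sqrt(n)
below = sum(t < 0.15 for t in times)  # count of times under .15 for part (b)

print(n, round(sum_x, 5), round(sum_x2, 5))
print(round(mean - half_width, 4), round(mean + half_width, 4))
print(below, below / n)
```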

  1. 76.

    Aphid infestation of fruit trees can be controlled either by spraying with pesticide or by inundation with ladybugs. In a particular area, four different groves of fruit trees are selected for experimentation. The first three groves are sprayed with pesticides 1, 2, and 3, respectively, and the fourth is treated with ladybugs, with the following results on yield:

    Treatment   \( n_{i} \) (number of trees)   \( \bar{x}_{i} \) (bushels/tree)   \( s_{i} \)
    1           100                             10.5                               1.5
    2           90                              10.0                               1.3
    3           100                             10.1                               1.8
    4           120                             10.7                               1.6

    Let \( \mu_{i} \) = the true average yield (bushels/tree) after receiving the ith treatment. Then

    $$ \theta = \frac{1}{3}(\mu_{1} + \mu_{2} + \mu_{3} ) - \mu_{4} $$

    measures the difference in true average yields between treatment with pesticides and treatment with ladybugs. When \( n_{1} \), \( n_{2} \), \( n_{3} \), and \( n_{4} \) are all large, the estimator \( \hat{\theta } \) obtained by replacing each \( \mu_{i} \) by \( \overline{X}_{i} \) is approximately normal. Use this to derive a large-sample 100(1 − α)% CI for θ, and compute the 95% interval for the given data.
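A sketch of the computation, assuming the four samples are independent so that the variance of a linear combination \( \sum c_{i} \overline{X}_{i} \) is \( \sum c_{i}^{2} \sigma_{i}^{2} /n_{i} \), with each \( \sigma_{i} \) estimated by \( s_{i} \). The coefficient list `coef` is introduced here for illustration.

```python
import math
from statistics import NormalDist

n = [100, 90, 100, 120]
xbar = [10.5, 10.0, 10.1, 10.7]
s = [1.5, 1.3, 1.8, 1.6]
coef = [1 / 3, 1 / 3, 1 / 3, -1]  # weights defining theta

theta_hat = sum(c * m for c, m in zip(coef, xbar))
# Estimated standard error of the linear combination (independent samples assumed)
se = math.sqrt(sum(c ** 2 * sd ** 2 / k for c, sd, k in zip(coef, s, n)))

z = NormalDist().inv_cdf(0.975)
ci = (theta_hat - z * se, theta_hat + z * se)
print(round(theta_hat, 3), round(se, 4), tuple(round(e, 3) for e in ci))
```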

  1. 77.

    It is important that face masks used by firefighters be able to withstand high temperatures because firefighters commonly work in temperatures of 200–500 °F. In a test of one type of mask, 11 of 55 masks had lenses pop out at 250°. Construct a 90% CI for the true proportion of masks of this type whose lenses would pop out at 250°.
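One way to compute such an interval is the score (Wilson) interval, sketched below. Whether the exercise intends this interval or the traditional \( \hat{p} \pm z\sqrt {\hat{p}\hat{q}/n} \) form is an assumption here; `score_interval` is an illustrative name.

```python
import math
from statistics import NormalDist

def score_interval(successes, n, conf):
    """Two-sided score (Wilson) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

lo, hi = score_interval(11, 55, 0.90)
print(round(lo, 3), round(hi, 3))
```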

  1. 78.

    A journal article reports that a sample of size 5 was used as a basis for calculating a 95% CI for the true average natural frequency (Hz) of delaminated beams of a certain type. The resulting interval was (229.764, 233.504). You decide that a confidence level of 99% is more appropriate than the 95% level used. What are the limits of the 99% interval? [Hint: Use the center of the interval and its width to determine \( \bar{x} \) and s.]
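The hint can be followed numerically as in this sketch. It assumes the reported interval is a one-sample t interval and uses tabled critical values for 4 df (2.776 and 4.604), hard-coded here rather than computed.

```python
import math

T_025_4 = 2.776  # tabled t critical value, 4 df, upper-tail area .025
T_005_4 = 4.604  # tabled t critical value, 4 df, upper-tail area .005

lower95, upper95, n = 229.764, 233.504, 5

xbar = (lower95 + upper95) / 2            # center of the interval
half95 = (upper95 - lower95) / 2          # half-width = t * s / sqrt(n)
s = half95 * math.sqrt(n) / T_025_4       # back out the sample standard deviation

half99 = T_005_4 * s / math.sqrt(n)
ci99 = (xbar - half99, xbar + half99)
print(round(xbar, 3), round(s, 3), tuple(round(e, 3) for e in ci99))
```

Only the critical value changes between the two levels, so the 99% half-width is the 95% half-width scaled by the ratio of the two t values.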

  1. 79.

    The article “The Association Between Television Viewing and Irregular Sleep Schedules Among Children Less Than 3 Years of Age” (Pediatrics 2005: 851–856) reported the following 95% confidence intervals for average TV viewing time (hours per day) for three different age groups.

    0–11 months old   12–23 months old   24–35 months old
    (0.8, 1.0)        (1.4, 1.8)         (2.1, 2.5)

  1. a.

    Interpret each of these three intervals.

  2. b.

    The three intervals are not the same width. What might explain this?

  3. c.

    Do the intervals suggest a relationship between age and TV viewing time among children of this age range? Explain.

  4. 80.

    In Example 7.12, we introduced the concept of a censored experiment in which n components are put on test and the experiment terminates as soon as r of the components have failed. Suppose component lifetimes are independent, each having an exponential distribution with parameter λ. Let \( Y_{1} \) denote the time at which the first failure occurs, \( Y_{2} \) the time at which the second failure occurs, and so on, so that \( T_{r} = Y_{1} + \cdots + Y_{r} + (n - r)Y_{r} \) is the total accumulated lifetime at termination. Then it can be shown that \( 2\lambda T_{r} \) has a chi-squared distribution with 2r df. Use this fact to develop a 100(1 − α)% CI formula for true average lifetime 1/λ. Compute a 95% CI from the data in Example 7.12.
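One way to organize the derivation, as a sketch: write \( \chi_{\alpha ,\nu }^{2} \) for the value that captures upper-tail area α under the chi-squared curve with ν df (this notation is an assumption here, chosen to mirror the z and t critical values). Starting from the stated distribution of \( 2\lambda T_{r} \) and isolating 1/λ:

$$ P\left( {\chi_{1 - \alpha /2,2r}^{2} < 2\lambda T_{r} < \chi_{\alpha /2,2r}^{2} } \right) = 1 - \alpha \quad \Rightarrow \quad P\left( {\frac{{2T_{r} }}{{\chi_{\alpha /2,2r}^{2} }} < \frac{1}{\lambda } < \frac{{2T_{r} }}{{\chi_{1 - \alpha /2,2r}^{2} }}} \right) = 1 - \alpha $$

Substituting the observed value of \( T_{r} \) then gives the interval's endpoints.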

  5. 81.

    Exercises 77–78 from Chapter 7 introduced “regression through the origin” to relate a dependent variable y to an independent variable x. The assumption there was that for any fixed x value, the dependent variable is a random variable Y with mean value βx and variance \( \sigma^{2} \) (so that Y has mean value zero when x = 0). The data consists of n independent \( (x_{i} ,Y_{i} ) \) pairs, where each \( Y_{i} \) is normally distributed with mean \( \beta x_{i} \) and variance \( \sigma^{2} \). The likelihood is then a product of normal pdfs with different mean values but the same variance.

    1. a.

      Show that the mle of β is \( \hat{\beta } =\Sigma x_{i} Y_{i} /\Sigma x_{i}^{2} \).

    2. b.

      Verify that the mle of (a) is unbiased.

    3. c.

      Obtain an expression for \( V(\hat{\beta }) \) and then for \( \sigma_{{\hat{\beta }}} \).

    4. d.

      For purposes of obtaining a precise estimate of β, is it better to have the \( x_{i} \)’s all close to 0 (the origin) or quite far from 0? Explain your reasoning.

    5. e.

      The natural prediction of \( Y_{i} \) is \( \hat{\beta }x_{i} \). Let \( S^{2} =\Sigma (Y_{i} - \hat{\beta }x_{i} )^{2} /(n - 1) \), which is analogous to sample variance. It can be shown that \( T = (\hat{\beta } - \beta )/\left( {S/\sqrt {\Sigma x_{i}^{2} } } \right) \) has a t distribution with n − 1 df. Use this to obtain a CI formula for estimating β, and calculate a 95% CI using the data from the cited exercises.

  6. 82.

    Let \( X_{1} , \ldots ,X_{n} \) be a random sample from a uniform distribution on the interval [0, θ] and \( Y = \max (X_{1} , \ldots ,X_{n} ) \). Then methods from Section 5.7 can be used to show that the rv \( U = Y/\theta \) has pdf

    $$ f_{U} (u) = nu^{n - 1} \quad 0 \le u \le 1 $$

    1. a.

      Verify that

      $$ P\left[ {(\alpha /2)^{1/n} \le \frac{Y}{\theta } \le (1 - \alpha /2)^{1/n} } \right] = 1 - \alpha $$

      and use this to derive a 100(1 − α)% CI for θ.

    2. b.

      Verify that \( P(\alpha^{1/n} \le Y/\theta \le 1) = 1 - \alpha \), and derive a 100(1 − α)% CI for θ based on this probability statement.

    3. c.

      Which of the two intervals derived in (a) and (b) is shorter? If your waiting time for a morning bus is uniformly distributed and observed waiting times are \( x_{1} = 4.2 \), \( x_{2} = 3.5 \), \( x_{3} = 1.7 \), \( x_{4} = 1.2 \), and \( x_{5} = 2.4 \), obtain a 95% CI for θ by using the shorter of the two intervals.
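The two probability statements in (a) and (b) can be inverted for θ and compared numerically. This sketch assumes the intervals take the form \( (y/(1 - \alpha /2)^{1/n} ,\;y/(\alpha /2)^{1/n} ) \) and \( (y,\;y/\alpha^{1/n} ) \), obtained by solving each inequality for θ; the function name is illustrative.

```python
def uniform_max_intervals(y, n, alpha=0.05):
    """CIs for theta from the sample maximum y of a Uniform[0, theta] sample."""
    equal_tail = (y / (1 - alpha / 2) ** (1 / n), y / (alpha / 2) ** (1 / n))
    one_sided = (y, y / alpha ** (1 / n))
    return equal_tail, one_sided

waits = [4.2, 3.5, 1.7, 1.2, 2.4]
equal_tail, one_sided = uniform_max_intervals(max(waits), len(waits))

for name, (lo, hi) in (("equal-tail", equal_tail), ("one-sided", one_sided)):
    print(name, round(lo, 3), round(hi, 3), "width", round(hi - lo, 3))
```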

  7. 83.

    Let 0 < γ < α. Then a 100(1 − α)% CI for μ when n is large is

    $$ \left( {\bar{x} - z_{\gamma } \cdot \frac{s}{\sqrt n },\bar{x} + z_{\alpha - \gamma } \cdot \frac{s}{\sqrt n }} \right) $$

    The choice γ = α/2 yields the large-sample interval derived in Section 8.2; if γ ≠ α/2, this confidence interval is not symmetric about \( \bar{x} \). The width of the interval is \( w = s(z_{\gamma } + z_{\alpha - \gamma } )/\sqrt n \). Show that w is minimized for the choice γ = α/2, so that the symmetric interval is the shortest. [Hints: (1) By definition of \( z_{\alpha } \), \( \Phi (z_{\alpha } ) = 1 - \alpha \), so that \( z_{\alpha } = \Phi^{ - 1} (1 - \alpha ) \); (2) the relationship between the derivative of a function y = f(x) and the inverse function \( x = f^{ - 1} (y) \) is \( (d/dy)f^{ - 1} (y) = 1/f^{\prime}(x) \).]
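A numerical check is no substitute for the calculus argument, but it can confirm the claim for a particular α. The sketch below evaluates \( z_{\gamma } + z_{\alpha - \gamma } \) on a grid of γ values for α = .05; the grid size and names are arbitrary choices.

```python
from statistics import NormalDist

def width_factor(gamma, alpha=0.05):
    """z_gamma + z_(alpha - gamma): the width w divided by s / sqrt(n)."""
    std = NormalDist()
    return std.inv_cdf(1 - gamma) + std.inv_cdf(1 - (alpha - gamma))

alpha = 0.05
grid = [alpha * k / 1000 for k in range(1, 1000)]  # gamma strictly inside (0, alpha)
best_gamma = min(grid, key=lambda g: width_factor(g, alpha))

print(best_gamma, round(width_factor(best_gamma, alpha), 4))
```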

  8. 84.

    Suppose \( x_{1} ,x_{2} , \ldots ,x_{n} \) are observed values resulting from a random sample from a symmetric but possibly heavy-tailed distribution. Chapter 11 of Understanding Robust and Exploratory Data Analysis (see the bibliography) suggests the following robust 95% CI for the population mean (point of symmetry):

    $$ \tilde{x} \pm \left( {\frac{{{\text{conservative }}t{\text{ critical value}}}}{1.075}} \right) \cdot \frac{\text{iqr}}{\sqrt n } $$

    The value of the quantity in parentheses is 2.10 for n = 10, 1.94 for n = 20, and 1.91 for n = 30. Compute this CI for the restaurant tip data of Example 8.17, and compare to the t CI appropriate for a normal population distribution.

  9. 85.
    1. a.

      Use the results of Example 8.5 to obtain a 95% lower confidence bound for the parameter λ of an exponential distribution, and calculate the bound based on the data given in the example.

    2. b.

      If lifetime X has an exponential distribution, the probability that lifetime exceeds t is given by \( P(X > t) = e^{ - \lambda t} \). Use the result of part (a) to obtain a 95% lower confidence bound for the probability that lifetime exceeds 100 min.

  10. 86.

    Let \( \theta_{1} \) and \( \theta_{2} \) denote the mean weights for animals of two different species. A biologist wishes to estimate the ratio \( \theta_{1} /\theta_{2} \). Unfortunately the species are extremely rare, so the estimate will be based on finding a single animal of each species. Let \( X_{i} \) denote the weight of the species i animal (i = 1, 2), assumed to be normally distributed with mean \( \theta_{i} \) and standard deviation 1.

    1. a.

      Show that the rv \( h(X_{1} ,X_{2} ;\theta_{1} ,\theta_{2} ) = (\theta_{2} X_{1} - \theta_{1}X_{2} )/\sqrt {\theta_{1}^{2} + \theta_{2}^{2} } \) is a pivotal quantity by determining the distribution of h.

    2. b.

      Show that h depends on \( \theta_{1} \) and \( \theta_{2} \) only through \( \theta_{1} /\theta_{2} \). [Hint: Divide numerator and denominator by \( \theta_{2} \).]

    3. c.

      Consider Expression (8.7) from the first section of this chapter with a = −1.96 and b = 1.96. Now replace < by = and solve for \( \theta_{1} /\theta_{2} \). Then show that a confidence interval results if \( x_{1}^{2} + x_{2}^{2} \ge 1.96^{2} \), whereas if this inequality is not satisfied, the resulting confidence set is the complement of an interval.

  11. 87.

    The one-sample CI for a normal mean and PI for a single observation from a normal distribution were both based on the central t distribution. A CI for a particular percentile (e.g., the 1st percentile or the 95th percentile) of a normal population distribution is based on the noncentral t distribution. A particular distribution of this type is specified by both df and the value of the noncentrality parameter δ (δ = 0 gives the central t distribution). The key result is that the variable

    $$ T = \frac{{\frac{{\overline{X} - \mu }}{\sigma /\sqrt n } - (z{\text{ percentile}})\sqrt n }}{S/\sigma } $$

    has a noncentral t distribution with \( {\text{df}} = n - 1 \) and \( \delta = -\left( {z{\text{ percentile}}} \right)\sqrt n \).

    Let \( t_{.025,\nu ,\delta } \) and \( t_{.975,\nu ,\delta } \) denote the critical values that capture upper-tail area .025 and lower-tail area .025, respectively, under the noncentral t curve with ν df and noncentrality parameter δ (when δ = 0, \( t_{.975} = - t_{.025} \), since central t distributions are symmetric about 0).

    1. a.

      Use the given information to obtain a formula for a 95% confidence interval for the (100p)th percentile of a normal population distribution.

    2. b.

      For δ = 6.58 and df = 15, \( t_{.975} \) and \( t_{.025} \) are (from software) 4.1690 and 10.9684, respectively. Use this information to obtain a 95% CI for the 5th percentile of the beer alcohol distribution considered in Exercise 17.

  12. 88.

    In this exercise, we develop a CI for \( \tilde{\mu } \) that is valid whatever the shape of the population distribution as long as it is continuous. Let \( X_{1} , \ldots ,X_{n} \) be a random sample from the distribution and \( Y_{1} < \cdots < Y_{n} \) denote the corresponding ordered values (smallest observation, second smallest, and so on).

    1. a.

      What is \( P(X_{1} {\mkern 1mu} < {\mkern 1mu} \tilde{\mu }) \)? What is \( P(\{ X_{1} {\mkern 1mu} < {\mkern 1mu} \tilde{\mu }\} \cap \{ X_{2} {\mkern 1mu} < {\mkern 1mu} \tilde{\mu }\} ) \)?

    2. b.

      What is \( P(Y_{n} {\mkern 1mu} < {\mkern 1mu} \tilde{\mu }) \)? What is \( P(Y_{1} {\mkern 1mu} > {\mkern 1mu} \tilde{\mu }) \)? [Hint: What condition involving all of the \( X_{i} \)’s is equivalent to the largest being smaller than the population median?]

    3. c.

      What is \( P(Y_{1} {\mkern 1mu} < {\mkern 1mu} \tilde{\mu }{\mkern 1mu} < {\mkern 1mu} Y_{n} ) \)? What does this imply about the confidence level associated with the CI \( (y_{1} ,y_{n} ) \) for \( \tilde{\mu } \)?

    4. d.

      An experiment carried out to study the time (min) necessary for an anesthetic to produce the desired result yielded the following data: 31.2, 36.0, 31.5, 28.7, 37.2, 35.4, 33.3, 39.3, 42.0, 29.9. Determine the confidence interval of (c) and the associated confidence level.

  13. 89.

    Consider the situation described in the previous exercise.

    1. a.

      What is \( P(\{ X_{1} {\mkern 1mu} < {\mkern 1mu} \tilde{\mu }\} \cap \{ X_{2} {\mkern 1mu} > {\mkern 1mu} \tilde{\mu }\} \cap \cdots \cap \{ X_{n} {\mkern 1mu} > {\mkern 1mu} \tilde{\mu }\} ) \), that is, the probability that only the first observation is smaller than the median?

    2. b.

      What is the probability that exactly one of the n original observations is smaller than the median?

    3. c.

      What is \( P(\tilde{\mu }{\mkern 1mu} < {\mkern 1mu} Y_{2} ) \)? [Hint: The event in parentheses occurs if all n of the observations exceed the median. How else can it occur?]

    4. d.

      What is \( P(Y_{2} {\mkern 1mu} < {\mkern 1mu} \tilde{\mu }{\mkern 1mu} < {\mkern 1mu} Y_{n - 1} ) \)? What does this imply about the confidence level associated with the CI \( (y_{2} ,y_{n - 1} ) \) for \( \tilde{\mu } \)?

    5. e.

      Determine the confidence level and CI using part (d) with the data given in the previous exercise.
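The confidence levels in this exercise and the previous one follow from counting how many observations fall below the median, a Binomial(n, 1/2) count when the distribution is continuous. The sketch below generalizes to the interval \( (y_{k} ,y_{n - k + 1} ) \); that generalization and the function name are additions here, with k = 1 and k = 2 matching the two exercises.

```python
from math import comb

def median_ci_level(n, k):
    """Confidence level of (y_k, y_(n-k+1)): 1 - 2 * P(Binomial(n, 1/2) <= k - 1)."""
    tail = sum(comb(n, j) for j in range(k)) / 2 ** n
    return 1 - 2 * tail

times = sorted([31.2, 36.0, 31.5, 28.7, 37.2, 35.4, 33.3, 39.3, 42.0, 29.9])
n = len(times)

for k in (1, 2):
    interval = (times[k - 1], times[n - k])  # kth smallest and kth largest
    print(k, interval, round(median_ci_level(n, k), 4))
```

Dropping one order statistic from each end narrows the interval at the cost of a lower confidence level.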

  14. 90.

    The previous two exercises considered a CI for a population median \( \tilde{\mu } \) based on the ordered values from a random sample. Let’s now consider a prediction interval for the next observation \( X_{n + 1} \), which is assumed to be independent of \( X_{1} , \ldots ,X_{n} \).

    1. a.

      What is \( P(X_{n + 1} < X_{1} ) \)? What is \( P(\{ X_{n + 1} < X_{1} \} \cap \{ X_{n + 1} < X_{2} \} ) \)?

    2. b.

      What is \( P(X_{n + 1} < Y_{1} ) \)? What is \( P(X_{n + 1} > Y_{n} ) \)?

    3. c.

      What is \( P(Y_{1} < X_{n + 1} < Y_{n} ) \)? What does this say about the prediction level for the PI \( (y_{1} ,y_{n} ) \)? Determine the prediction level and interval for the data in the previous two exercises.
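A small simulation can be used to sanity-check whatever prediction level is derived in part (c). The sketch assumes the exchangeability argument gives level (n − 1)/(n + 1), and it draws from a uniform distribution purely for convenience, since the result should not depend on the continuous distribution chosen.

```python
import random

def pi_level_exact(n):
    """Candidate prediction level for (y_1, y_n), from the exchangeability argument."""
    return (n - 1) / (n + 1)

def pi_level_simulated(n, reps=20000, seed=0):
    """Fraction of replications in which a new draw falls between sample min and max."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        new = rng.random()
        hits += min(sample) < new < max(sample)
    return hits / reps

print(round(pi_level_exact(10), 4), round(pi_level_simulated(10), 4))
```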

  15. 91.

    Consider 95% CIs for two different parameters \( \theta_{1} \) and \( \theta_{2} \), and let \( A_{i} \) (i = 1, 2) denote the event that the value of \( \theta_{i} \) is included in the random interval that results in the CI. Thus \( P(A_{i} ) \) = .95.

    1. a.

      Suppose that the data on which the CI for \( \theta_{1} \) is based is independent of the data used to obtain the CI for \( \theta_{2} \) (e.g., we might have \( \theta_{1} = \mu \), the population mean height for American females, and \( \theta_{2} = p \), the proportion of all iPhones that don’t need warranty service). What can be said about the simultaneous confidence level for the two intervals? That is, how confident can we be that the first interval contains the value of \( \theta_{1} \) and that the second contains the value of \( \theta_{2} \)? [Hint: Consider \( P(A_{1} \cap A_{2} ) \).]

    2. b.

      Now suppose the data for the first CI is not independent of that for the second one. What now can be said about the simultaneous confidence level for both intervals? [Hint: Consider \( P(A^{\prime}_{1} \cup A^{\prime}_{2} ) \), the probability that at least one interval fails to include the value of what it is estimating. Now use the fact that \( P(A^{\prime}_{1} \cup A^{\prime}_{2} ) \le P(A^{\prime}_{1} ) + P(A^{\prime}_{2} ) \). The generalization of the bound on \( P(A^{\prime}_{1} \cup A^{\prime}_{2} ) \) to the probability of a k-fold union is one version of the Bonferroni inequality.]

    3. c.

      What can be said about the simultaneous confidence level if the confidence level for each interval separately is 100(1 − α)%? What can be said about the simultaneous confidence level if a 100(1 – α)% CI is computed separately for each of k parameters \( \theta_{1} , \ldots ,\theta_{k} \)?
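The two situations in this exercise can be compared numerically. The sketch assumes the product rule under independence and the Bonferroni lower bound otherwise; the function name is illustrative.

```python
def simultaneous_levels(alpha, k):
    """Return (exact level under independence, Bonferroni lower bound) for k intervals."""
    independent = (1 - alpha) ** k
    bonferroni = max(0.0, 1 - k * alpha)  # a lower bound only; can be far from sharp
    return independent, bonferroni

for k in (2, 5, 10):
    ind, bon = simultaneous_levels(0.05, k)
    print(k, round(ind, 4), round(bon, 4))
```

The Bonferroni value never exceeds the independent-case value, which is consistent with it being a guarantee that holds under any dependence structure.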

  16. 92.

    The Bonett CI for a population variance \( \sigma^{2} \) mentioned at the end of Section 8.4, unlike the chi-squared method, does not hinge on population normality. This interval involves a transformation along with an estimate of the kurtosis of the underlying distribution, a measure of its “tail” behavior. Specifically, Bonett defines a kurtosis estimate by

    $$ \bar{\gamma }_{4} = \frac{{n\sum {(x_{i} - \bar{x}_{\text{tr}} )^{4} } }}{{\left( {\sum {(x_{i} - \bar{x})^{2} } } \right)^{2} }} $$

    where \( \bar{x}_{\text{tr}} \) is the trimmed mean with trim proportion \( 1/[2\sqrt {n - 4} ] \). Then the Bonett CI for \( \sigma^{2} \) with confidence level 100(1 − α)% has endpoints

    $$ \exp \left[ {\ln (c \cdot S^{2} ) \pm z_{\alpha /2} \cdot c \cdot \sqrt {\frac{{(n - 3)\bar{\gamma }_{4} }}{n(n - 1)}} } \right] $$

    where \( c = n/(n - z_{\alpha /2} ) \) is “an empirically determined, small-sample adjustment” (meaning Bonett found this value by trial and error).

    1. a.

      For the study hours data in Exercise 63, n = 22, s = 4.603 and \( \bar{\gamma }_{4} \) = 7.003. Use Bonett’s formula to calculate a 95% CI for the population variance \( \sigma^{2} \).

    2. b.

      Use part (a) to determine a 95% CI for σ.

    3. c.

      Show that as \( n \to \infty \), both endpoints of the Bonett CI converge to \( \sigma^{2} \). [Hint: The kurtosis estimate \( \bar{\gamma }_{4} \) converges to a constant, while \( S^{2} \to \sigma^{2} \).]

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this chapter

Devore, J.L., Berk, K.N., Carlton, M.A. (2021). Statistical Intervals Based on a Single Sample. In: Modern Mathematical Statistics with Applications. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-55156-8_8
