
Sampling Distributions and Approximations

Intuitive Introductory Statistics

Part of the book series: Springer Texts in Statistics (STS)

Abstract

Now that we have built a solid foundation based on exploratory data analysis techniques, proper design of experiments, and basic probability, we will put the finishing touches on this foundation by studying sampling distributions, the most important part of statistical inference. Before we can use the information we have collected and analyzed from our sample to make inferences about the population of interest, we must make sure that we understand how the statistic we have computed varies with repeated sampling. Our goal here is to describe what might happen if we repeat the entire sampling process and computation of the desired statistic again and again. Do you think that if you take a different sample you will get exactly the same value for the statistic? While it is certainly possible that this could happen, in most practical settings it is very unlikely that you will get exactly the same value of the statistic. At first, it might appear that this would be a major problem for the field of statistics. If we collect different samples and they usually give us different results, how can we make any inferences? The fact is that even though the values of the statistic are likely to differ from sample to sample, they will follow a pattern. This pattern of variation in repeated sampling is described by the sampling distribution of the statistic.
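The pattern of variation described above can be seen in a small simulation. The Python sketch below (standard library only) is an illustration under stated assumptions, not an example from this chapter. It assumes a hypothetical normal population with mean 50 and standard deviation 10, repeats the sampling process 2000 times with samples of size 25, and records the sample mean each time. The individual sample means differ from one another. Their distribution, however, is centered near the population mean, with spread near 10/√25 = 2.

```python
import random
import statistics

def simulate_sample_means(pop_mean, pop_sd, n, reps, seed=None):
    """Repeat the sampling process `reps` times: draw n values from a
    (hypothetical) normal population and record each sample mean."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.gauss(pop_mean, pop_sd) for _ in range(n)])
        for _ in range(reps)
    ]

means = simulate_sample_means(pop_mean=50, pop_sd=10, n=25, reps=2000, seed=1)

print([round(m, 2) for m in means[:5]])   # the statistic changes from sample to sample
print(round(statistics.mean(means), 2))   # ...but the values center near 50
print(round(statistics.stdev(means), 2))  # ...with spread near 10 / sqrt(25) = 2
```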

Notes

  1. We have made the sample size very small here to simplify the calculations, but small sample sizes are typical in clinical trials of treatments for rare illnesses.

  2. The term bootstrap originates from an old expression that encouraged individuals to improve matters by lifting themselves up by their own bootstraps. Real-life bootstraps are not very common now, but the process is similar to the more familiar concept of rebooting a computer from a core set of instructions: when all else fails, reboot. In statistical settings, when the calculations are tedious or the situation is difficult because you cannot remember the detailed formulas for a distribution, try bootstrapping.

Bibliography

  • Ault, R. G., Hudson, E. J., Linehan, D. J., & Woodward, J. D. (1967). A practical approach to the assessment of head retention of bottled beers. Journal of the Institute of Brewing, 73(6), 558–566.

  • Baxley, F., & Miller, M. (2006). Parental misperceptions about children and firearms. Archives of Pediatrics & Adolescent Medicine, 160(5), 542–547.

  • Devanand, D. P., Lee, S., Manly, J., Andrews, H., Schupf, N., Masurkar, A., Stern, Y., Mayeux, R., & Doty, R. L. (2015). Olfactory identification deficits and increased mortality in the community. Annals of Neurology, 78(3), 401–411.

  • Gallup, Inc. (2012a). Business Journal: Your employees don’t “Get” your brand. J. H. Fleming and D. Witters. Report issued July 26, 2012. www.gallup.com

  • Gallup, Inc. (2012b). Economy: U. S. workers least happy with their work stress and pay. Report issued November 12, 2012. www.gallup.com

  • Gallup, Inc. (2014). Well-being: Americans favor ban on smoking in public, but not total ban. R. Riffkin; report issued July 30, 2014. www.gallup.com

  • Gallup, Inc. (2015d). Politics: More Republicans now prefer one-party government. J. M. Jones; report issued October 12, 2015. www.gallup.com

  • Intellicast. (2016). http://www.intellicast.com/Local/History.aspx?month=9. Accessed 30 June 2016.

  • Mackowiak, P. A., Wasserman, S. S., & Levine, M. M. (1992). A critical appraisal of 98.6 °F, the upper limit of the normal body temperature, and other legacies of Carl Reinhold August Wunderlich. Journal of the American Medical Association, 268(12), 1578–1580.

  • Moore, T. L. (2006). Paradoxes in film ratings. Journal of Statistics Education, 14(1), 8 pages online.

  • National Public Radio. (2014). Scott Neuman, February 14, 2014. http://www.npr.org/sections/thetwo-way/2014/02/14/277058739/. Accessed 21 July 2016.

  • National Public Radio. (2015). Sports and health in America. Prepared in conjunction with the Robert Wood Johnson Foundation and the Harvard T. H. Chan School of Public Health. Released June 2015. www.npr.org

  • Neuman, M. D., & Werner, R. M. (2015). Marital status and postoperative functional recovery. Journal of the American Medical Association Surgery, 151, 194, Published online October 28, 2015, 3 pages.

  • Shkedy, Z., Aerts, M., & Callaert, H. (2006). The weight of Euro coins: Its distribution might not be as normal as you would expect. Journal of Statistics Education, 14(2), 15 pages online.

  • Shoemaker, A. L. (1996). What’s normal?—Temperature, gender, and heart rate. Journal of Statistics Education, 4(2), 4 pages online.

  • USA Today. (2013). Chris Chase, USA TODAY Sports. January 30, 2013. http://www.usatoday.com/story/gameon/2013/01/30. Accessed 21 June 2016.

  • VisitingAngels. (2013). National survey reveals children choose Mom over Dad. Released June 7, 2013. www.visitingangels.com

  • Wolfe, J., Martinez, R., & Scott, W. A. (1998). Baseball and beer: An analysis of alcohol consumption patterns among male spectators at major-league sporting events. Annals of Emergency Medicine, 31, 629–632.

  • Woodward, W. F. (1970). A comparison of base running methods in baseball. MSc thesis, Florida State University.

  • Wypijewski, J. (1997). Painting by numbers: Komar and Melamid’s scientific guide to art. Farrar, Straus, & Giroux, Inc.

Chapter 5 Comprehensive Exercises

5.A. Conceptual

5.A.1. Continuity Correction for the Normal Approximation to a Binomial Distribution. The accuracy of the normal approximation for binomial counts can be improved by making one simple modification. Since B is a count of the total number of successes, it can assume only integer values. When we treat the sampling distribution of B as approximately normal, however, we are also permitting B to assume values between the integers. To correct for the fact that we are using a continuous distribution to approximate a distribution that can assume only integer values, we can act as if the observed values of B were obtained by rounding to the nearest integer. This amounts to adding 1/2 to, or subtracting 1/2 from, the number of successes and is referred to as the continuity correction. Figure 5.23 illustrates the use of this continuity correction for the probability

Fig. 5.23. Normal approximation for binomial counts using the continuity correction

$$ {\displaystyle \begin{array}{lll}P\left(a\le B\le b\right)\hfill & =\hfill & P\left(\frac{a-\frac{1}{2}- np}{\sqrt{np\left(1-p\right)}}\le {Z}_{sum}\le \frac{b+\frac{1}{2}- np}{\sqrt{np\left(1-p\right)}}\right)\hfill \\ {}\hfill & \approx \hfill & \Phi \left(\frac{b+\frac{1}{2}- np}{\sqrt{np\left(1-p\right)}}\right)-\Phi \left(\frac{a-\frac{1}{2}- np}{\sqrt{np\left(1-p\right)}}\right).\hfill \end{array}} $$

Compare Fig. 5.23 with Fig. 5.9 and then apply the normal approximation with continuity correction to approximate P(B ≤ 25) when B ∼ B(30, .6) using the R function . Compare your answer with that obtained in Example 5.9 using the normal approximation without the continuity correction.
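The Python code below is a sketch of the comparison requested in Exercise 5.A.1. It uses only the standard library and stands in for the R functions the exercises mention. It computes three values for P(B ≤ k): the exact binomial probability, the normal approximation without the continuity correction, and the normal approximation with it. The helper names `phi`, `binom_cdf`, and `normal_approx_cdf` are hypothetical. `phi` evaluates Φ through the error function, using Φ(z) = (1 + erf(z/√2))/2.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF, Phi(z), written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_cdf(k, n, p):
    """Exact P(B <= k) for B ~ B(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(0, k + 1))

def normal_approx_cdf(k, n, p, continuity=True):
    """Normal approximation to P(B <= k). With continuity=True the upper
    limit k is replaced by k + 1/2, as in the formula above."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    upper = k + 0.5 if continuity else k
    return phi((upper - mean) / sd)

# Setting of Exercise 5.A.1: P(B <= 25) for B ~ B(30, .6)
n, p, k = 30, 0.6, 25
print(binom_cdf(k, n, p))
print(normal_approx_cdf(k, n, p, continuity=False))
print(normal_approx_cdf(k, n, p, continuity=True))
```

The same functions apply to Exercises 5.A.2–5.A.5 once each event is rewritten as P(B ≤ k) or its complement. For example, P(B < 8) = P(B ≤ 7) and P(B ≥ 20) = 1 − P(B ≤ 19).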

5.A.2. Continuity Correction for the Normal Approximation to a Binomial Distribution. Consider the sampling distribution for the number of successes in 20 independent Bernoulli trials when p = 0.5. Calculate the probability of getting fewer than 8 successes using:

  (a) the exact binomial distribution;

  (b) the normal approximation without the continuity correction;

  (c) the normal approximation with the continuity correction.

5.A.3. Continuity Correction for the Normal Approximation to a Binomial Distribution. Consider the sampling distribution for the number of successes in 40 independent Bernoulli trials when p = 0.4. Calculate the probability of getting at most 20 successes using:

  (a) the exact binomial distribution;

  (b) the normal approximation without the continuity correction;

  (c) the normal approximation with the continuity correction.

5.A.4. Continuity Correction for the Normal Approximation to a Binomial Distribution. Consider the sampling distribution for the number of successes in 50 independent Bernoulli trials when p = 0.45. Calculate the probability of getting at least 20 successes using:

  (a) the exact binomial distribution;

  (b) the normal approximation without the continuity correction;

  (c) the normal approximation with the continuity correction.

5.A.5. Continuity Correction for the Normal Approximation to a Binomial Distribution. Consider the sampling distribution for the number of successes in 100 independent Bernoulli trials when p = 0.8. Calculate the probability of getting more than 82 successes using:

  (a) the exact binomial distribution;

  (b) the normal approximation without the continuity correction;

  (c) the normal approximation with the continuity correction.

5.A.6. Continuity Correction for the Normal Approximation to the Distribution of a Sample Proportion. Modify the normal approximation to the sampling distribution for a proportion (corresponding to counts of Bernoulli random variables) to adjust for the fact that we are approximating a discrete distribution with a continuous normal curve. That is, specify the appropriate continuity correction for using the normal distribution to approximate the sampling distribution for a proportion.

5.B. Data Analysis/Computational

5.B.1. Football Yardage. A football team averages 3.5 yards per run with a standard deviation of 2 yards. If the team calls 30 running plays per game and the number of yards gained per rush is approximately normally distributed, what is the probability that they will rush for over 120 yards in a game?
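Here is a minimal sketch for Exercise 5.B.1. It assumes that the yardages on the 30 running plays are independent, which the exercise implies but does not state. A sum of independent normal variables is again normal, so the game total T has mean 30 × 3.5 = 105 yards and standard deviation 2√30 ≈ 10.95 yards. The function name is hypothetical.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_total_exceeds(threshold, mean_per_play, sd_per_play, plays):
    """P(T > threshold), where T is the sum of `plays` independent
    N(mean_per_play, sd_per_play**2) yardages, so that
    T ~ N(plays * mean_per_play, plays * sd_per_play**2)."""
    total_mean = plays * mean_per_play
    total_sd = sd_per_play * sqrt(plays)
    return 1 - phi((threshold - total_mean) / total_sd)

print(prob_total_exceeds(120, mean_per_play=3.5, sd_per_play=2, plays=30))
```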

5.B.2. Loss of Smell?—Might Not Bode Well. In a study investigating a link between the loss of the sense of smell and an increased risk of dying, Devanand et al. (2015) collected information from 1169 adults in New York City. At their initial evaluations, participants took a “scratch and sniff” test (known as UPSIT) in which they were asked to identify 40 common odors. In a follow-up study several years later, the authors found that 45.36% of the participants with the lowest UPSIT scores, in the range [0, 20], had died during the follow-up period. Suppose we selected a random sample of n = 100 of the participants in this study with UPSIT scores in the range [0, 20].

  (a) How many of the participants in our sample would we expect to still be alive at the end of the follow-up period?

  (b) Use the R function to determine the exact probability that at least half of the participants in our sample were still alive at the end of the follow-up period.

  (c) Use the R function to approximate the probability that at least half of the participants in our sample were still alive at the end of the follow-up period. Compare this result with your finding in part (b).

  (d) What is the approximate probability that more than 40 of the participants in our sample died during the follow-up period?

  (e) What is the approximate probability that fewer than 20 of the participants in our sample were still alive at the end of the follow-up period?

5.B.3. Smelling Fine?—Enjoy the Wine. Consider the olfactory study by Devanand et al. (2015) described in Exercise 5.B.2. The authors also reported that only 18.39% of the participants with the highest UPSIT scores, in the range (31, 40], had died during the follow-up period. Suppose we selected a random sample of n = 100 of the participants in this study with UPSIT scores in the range (31, 40].

  (a) How many of the participants in our sample would we expect to still be alive at the end of the follow-up period?

  (b) Use the R function to determine the exact probability that at least half of the participants in our sample were still alive at the end of the follow-up period.

  (c) Use the R function to approximate the probability that at least half of the participants in our sample were still alive at the end of the follow-up period. Compare this result with your finding in part (b).

  (d) What is the approximate probability that more than 40 of the participants in our sample died during the follow-up period?

  (e) What is the approximate probability that fewer than 20 of the participants in our sample were still alive at the end of the follow-up period?

  (f) Compare your findings for these participants with high UPSIT scores with what you obtained in Exercise 5.B.2 for participants with low UPSIT scores.

5.B.4. Having Heart Surgery?—Get Married First. Neuman and Werner (2015) used data from the University of Michigan Health and Retirement Study (http://hrsonline.isr.umich.edu), a longitudinal panel survey that has enrolled 29,053 adults 50 years of age or older since 1998, to study the postoperative functional recovery of married and unmarried patients undergoing cardiac surgery between 2002 and 2010. Of the 1026 married cardiac surgery patients, 199 died or developed new ADL (Activities of Daily Living) dependencies following surgery and prior to their first scheduled postoperative interview. Of the 550 unmarried (separated, divorced, widowed, or never married) cardiac surgery patients, 172 died or developed new ADL dependencies following surgery and prior to their first scheduled postoperative interview. Suppose we select a random sample of n = 100 of the married cardiac surgery patients in this study and an independent random sample of m = 50 of the unmarried cardiac surgery patients in the study.

  (a) How many of the married cardiac surgery patients in our sample would we expect to have either died or developed new ADL dependencies following surgery prior to their first scheduled postoperative interview?

  (b) Use the normal distribution to approximate the probability that fewer than 20% of the married cardiac surgery patients in our sample had either died or developed new ADL dependencies following surgery prior to their first scheduled postoperative interview.

  (c) What is the approximate probability that more than 15% of the married cardiac surgery patients in our sample had either died or developed new ADL dependencies following surgery prior to their first scheduled postoperative interview?

  (d) Repeat parts (a)–(c) for the unmarried cardiac surgery patients in our second sample. Compare with your findings in parts (a)–(c).

  (e) Let and denote the percentages of the 100 married and 50 unmarried cardiac surgery patients, respectively, in our random samples who either died or developed new ADL dependencies following surgery prior to their first scheduled postoperative interview. What is the approximate sampling distribution of ?
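The sketch below shows one standard way to approximate the sampling distribution of the difference between two independent sample proportions. It rests on assumptions that the exercise does not state:

- Each sample is treated as independent Bernoulli trials with the study proportions 199/1026 and 172/550.
- No finite-population adjustment is made, even though the samples are drawn from fixed study groups.

Under these assumptions, p̂₁ − p̂₂ is approximately normal with mean p₁ − p₂ and standard deviation √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂). Multiply both by 100 to express them in percentage points. The helper name is hypothetical.

```python
from math import sqrt

def diff_proportion_approx(p1, n1, p2, n2):
    """Return (mean, sd) of the approximate normal distribution of
    phat1 - phat2 for independent samples of sizes n1 and n2."""
    mean = p1 - p2
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return mean, sd

p_married = 199 / 1026    # married patients who died or developed new ADL dependencies
p_unmarried = 172 / 550   # unmarried patients with the same outcome
mean, sd = diff_proportion_approx(p_married, 100, p_unmarried, 50)
print(f"mean = {100 * mean:.2f} points, sd = {100 * sd:.2f} points")
```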

5.B.5. One-Party Government?—Depends on When You Ask. Gallup, Inc. (2015d) conducted telephone interviews September 9–13, 2015 with a random sample of 1025 adults, aged 18 and older, living in all 50 U.S. states and the District of Columbia. The participants were asked to state their political party preference (Democrat, Independent, or Republican) and whether or not they favored one-party control of both Congress and the Presidency. Among those participants whose party preference was Republican, 40% preferred that the same party control both Congress and the Presidency. Suppose we collect a random sample of n = 80 from the Republican participants in these telephone interviews.

  (a) How many individuals in our random sample would we expect to prefer that the same party control both Congress and the Presidency?

  (b) What is the exact probability that more than half of the Republicans in our random sample prefer that the same party control both Congress and the Presidency?

  (c) What is the approximate probability that fewer than 30 of the Republicans in our random sample prefer that the same party control both Congress and the Presidency?

  (d) Gallup, Inc. had conducted a similar poll in 2014, prior to the November 2014 elections, and found that only 24% of Republicans in that poll preferred that the same party control both Congress and the Presidency. Suppose we were able to collect a random sample of n = 80 Republican participants in the 2014 poll. Answer parts (a)–(c) for this random sample of 2014 poll participants.

  (e) Compare your answers in parts (a)–(c) with those obtained in part (d). Can you think of possible reasons that might have led to such a sudden change in governance principles among Republicans from 2014 to 2015?

5.B.6. Company Branding—Do Employees Get It? Gallup, Inc. (2012a) asked more than 3000 randomly selected workers whether they agreed with the statement: “I know what my company stands for and what makes our brand(s) different from our competitors.” Only 41% of the respondents strongly agreed with this statement. Suppose we were able to collect a random sample of n = 200 from the workers who participated in this survey.

  (a) How many individuals in our random sample would we expect to have strongly agreed with the statement?

  (b) What is the probability that exactly 41% of the individuals in our sample strongly agreed with the statement?

  (c) What is the approximate probability that less than 41% of the individuals in our sample strongly agreed with the statement?

  (d) What is the approximate probability that between 80 and 120, inclusive, of the individuals in our sample strongly agreed with the statement?
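Part (b) of Exercise 5.B.6 asks for the probability of a single value. A continuous normal curve assigns probability zero to any single point. So either the exact binomial probability is needed, or a continuity-corrected normal area from k − 1/2 to k + 1/2. The sketch below computes both for k = 82, which is 41% of 200. It uses Python's standard library only, and its helper names are hypothetical. It assumes the sample can be treated as 200 independent Bernoulli trials with p = 0.41.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_pmf(k, n, p):
    """Exact P(B = k) for B ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_point_approx(k, n, p):
    """Continuity-corrected normal approximation to P(B = k):
    the normal area between k - 1/2 and k + 1/2."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return phi((k + 0.5 - mean) / sd) - phi((k - 0.5 - mean) / sd)

n, p, k = 200, 0.41, 82   # 41% of 200 respondents is 82
print(binom_pmf(k, n, p), normal_point_approx(k, n, p))
```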

5.B.7. Smoking in Public Places. In recent years there has been a strong push by states and cities to ban cigarette smoking in public places (such as restaurants and bars) to protect nonsmokers from secondhand smoke. How does the American public feel about such laws? As part of their annual Consumption Habits survey during the period July 7–10, 2014, Gallup, Inc. (2014) conducted telephone interviews with a random sample of 1013 adults, aged 18 and older, living in the U.S. states and the District of Columbia. They found that 56% of the respondents supported a ban on smoking in public places. Suppose we were able to collect a random sample of n = 150 from among the participants in these telephone interviews.

  (a) How many individuals in our random sample would we expect to have supported the ban on smoking in public places?

  (b) What is the approximate probability that more than 2/3 of the individuals in our random sample supported the ban on smoking in public places?

  (c) What is the approximate probability that less than half of the individuals in our random sample supported the ban on smoking in public places?

  (d) Do you think the public opinion on this issue might differ among different age groups? When do you think the public opinion changed from supporting smoking in public places to banning it? Do you think the public supports a total ban on smoking? Go to Gallup, Inc. (2014) and find out!

5.B.8. Teenagers and Sports. National Public Radio (2015), in conjunction with the Robert Wood Johnson Foundation and the Harvard T. H. Chan School of Public Health, supported a major study about sports and health in America. They polled 2506 adults during the period January 29–March 8, 2015. In particular, they asked 437 parents whose children were currently attending middle school, junior high school, or high school and participating in a sport to name the sport their child participated in MOST OFTEN during the previous year. The results of their poll are as follows:

Sport participated in MOST OFTEN        Percentage of children
Basketball                              16
Soccer                                  14
Baseball/softball                       11
Football                                9
Running/jogging/trail running/track     7
Volleyball                              6
Swimming                                5
Others                                  32

Suppose we select a random sample of n = 8 children from the group represented in this poll. Use the R functions and to answer the following questions.

  (a) What is the probability that the sports participated in most often by the children in our sample were basketball (3), soccer (2), baseball/softball (2), and other (1)?

  (b) What is the probability that more than half of the children in our sample participated most often in basketball?

  (c) What is the probability that all eight children participated most often in “other” sports?
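The multinomial probabilities in Exercise 5.B.8 can be sketched in Python as follows. The code uses the standard library only, stands in for the R functions the exercise mentions, and its helper names are hypothetical. It treats the 8 children as independent trials with the category probabilities from the table, taken in the order listed. For part (b), collapsing the categories into basketball versus everything else gives a binomial count B(8, 0.16), so "more than half" means at least 5.

```python
from math import comb, factorial, prod

# Category probabilities from the table, in the order listed:
# basketball, soccer, baseball/softball, football, running/track,
# volleyball, swimming, others
probs = [0.16, 0.14, 0.11, 0.09, 0.07, 0.06, 0.05, 0.32]

def multinomial_pmf(counts, probs):
    """P(N_1 = counts[0], ..., N_k = counts[k-1]) for n = sum(counts) trials."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # exact integer multinomial coefficient
    return coef * prod(p**c for p, c in zip(probs, counts))

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# (a) basketball 3, soccer 2, baseball/softball 2, other 1
print(multinomial_pmf([3, 2, 2, 0, 0, 0, 0, 1], probs))
# (b) more than half of 8 children, i.e. at least 5, chose basketball
print(binom_upper_tail(5, 8, 0.16))
# (c) all eight children chose "other" sports
print(multinomial_pmf([0, 0, 0, 0, 0, 0, 0, 8], probs))
```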

5.B.9. Adults and Sports/Moderate/Vigorous Exercise. National Public Radio (2015), in conjunction with the Robert Wood Johnson Foundation and the Harvard T. H. Chan School of Public Health, supported a major study about sports and health in America. They polled 2506 adults during the period January 29–March 8, 2015 and asked them about their sports/exercise participation during the previous year. Of the 690 respondents who indicated that they DID NOT play sports or do vigorous- or moderate-intensity exercise in the previous year, 47% gave health-related reasons, 38% gave time availability/cost/lack of opportunity reasons, and the remaining 15% cited a lack of interest. Suppose we select a random sample of n = 30 of the adults in this study who DID NOT play sports or do vigorous- or moderate-intensity exercise in the previous year. Use the R functions and to answer the following questions.

  (a) What is the probability that 10 adults in our sample gave health-related reasons, 10 gave time availability/cost/lack of opportunity reasons, and 10 cited lack of interest?

  (b) What is the probability that more than half of the adults in our sample gave health-related reasons?

  (c) What is the probability that none of the adults in our sample cited lack of interest?

  (d) What is the probability that none of the adults in our sample cited health-related reasons and twice as many of them gave time availability/cost/lack of opportunity reasons as cited lack of interest?

5.B.10. Math SAT Scores. Consider the Math SAT scores for seniors graduating in 2013 or 2014 from a small private school, as presented in Table 1.15.

  (a) Find the sample average and sample standard deviation for the 79 male graduates.

  (b) Find the sample average and sample standard deviation for the 50 female graduates.

    Assume that these sample averages and standard deviations can be used as reasonable surrogates for the corresponding population means and standard deviations for all graduating seniors from small private schools. Suppose you collect additional random samples of m = 60 male and n = 50 female seniors graduating from other small private schools.

  (c) What is the probability that the average SAT score for your sample of 60 male graduating seniors will be less than 600?

  (d) What is the probability that the average SAT score for your sample of 50 female graduating seniors will exceed 550?

  (e) What is the probability that the average SAT score for your sample of 60 male graduating seniors will be larger than the average SAT score for your sample of 50 female graduating seniors?
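For part (e) of Exercise 5.B.10, the difference of the two independent sample means is approximately normal. Its mean is μ_M − μ_F and its standard deviation is √(σ_M²/m + σ_F²/n). The Python sketch below encodes that approximation. The Table 1.15 values are not reproduced in this section, so the numbers in the usage lines are hypothetical placeholders. Replace them with the sample statistics from parts (a) and (b).

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_first_mean_larger(mu1, sd1, m, mu2, sd2, n):
    """Normal approximation to P(Xbar1 > Xbar2) for independent samples
    of sizes m and n: Xbar1 - Xbar2 ~ N(mu1 - mu2, sd1**2/m + sd2**2/n)."""
    diff_mean = mu1 - mu2
    diff_sd = sqrt(sd1**2 / m + sd2**2 / n)
    return 1 - phi(-diff_mean / diff_sd)

# HYPOTHETICAL placeholder summaries, not the Table 1.15 results
male_mean, male_sd = 620.0, 90.0
female_mean, female_sd = 590.0, 85.0
print(prob_first_mean_larger(male_mean, male_sd, 60, female_mean, female_sd, 50))
```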

5.B.11. Stretching a Hit into a Double. Woodward (1970) conducted a study of different methods of running to first base, with the goal of minimizing the time it would take to get from home plate to second base (i.e., get a double on a base hit). The times (in seconds) given in Table 1.33 are averages of two runs from a point on the first base line 35 ft from home plate to a point 15 ft short of second base for the method of running known as “wide angle” for each of 22 different runners.

  (a) Find the sample average and sample standard deviation for the 22 runners.

    Assume that this sample average and sample standard deviation can be used as reasonable surrogates for the corresponding population mean and population standard deviation for all baseball players similar in caliber to the sampled runners. Suppose you collect an additional random sample of n = 40 baseball players and measure the average of two “wide-angle” runs from home plate to second base for each of them.

  (b) What is the probability that the “wide-angle” time for a single ballplayer in your sample is between 5.35 and 5.45 seconds?

  (c) What is the probability that the average “wide-angle” time for your 40 ballplayers is between 5.35 and 5.45 seconds?

  (d) What is the probability that the average “wide-angle” time for your 40 ballplayers is less than 5.20 seconds?

  (e) How many of your ballplayers would you expect to have a “wide-angle” time between 5.35 and 5.45 seconds?

  (f) What is the probability that all 40 of your ballplayers record “wide-angle” times less than 5.20 seconds?
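Exercise 5.B.11 contrasts three things: a single runner, the average of 40 runners, and all 40 runners at once. The sketch below makes these distinctions explicit. It assumes, beyond what the exercise states, that individual "wide-angle" times are approximately normal and independent across runners. The mean and standard deviation in the usage lines are hypothetical placeholders for the part (a) results from Table 1.33.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_between(a, b, mean, sd):
    """P(a < X < b) for X ~ N(mean, sd**2)."""
    return phi((b - mean) / sd) - phi((a - mean) / sd)

mu, sigma, n = 5.40, 0.25, 40   # HYPOTHETICAL placeholders, not Table 1.33 values

single = prob_between(5.35, 5.45, mu, sigma)              # one runner
average = prob_between(5.35, 5.45, mu, sigma / sqrt(n))   # mean of n runners, SD sigma/sqrt(n)
expected_count = n * single                                # runners expected in the interval
all_below = phi((5.20 - mu) / sigma) ** n                  # every one of n independent runners under 5.20 s
print(single, average, expected_count, all_below)
```

With these placeholder values, averaging shrinks the standard deviation by a factor of √40 ≈ 6.3. As a result, an interval centered near the mean captures the average far more often than it captures a single runner's time. The "all 40 runners" event multiplies 40 individual probabilities and is correspondingly tiny.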

5.B.12. How Long Are Movies? The Movie and Video Guide is a ratings and information guide to movies that was prepared annually by Leonard Maltin. Moore (2006) selected a random sample of 100 movies from the 1996 edition of the Guide. He compiled the dataset movie_facts containing relevant information about the selected movies. One of the pieces of information provided is the running length of each movie, in minutes.

  (a) Find the mean and standard deviation for the running length of this random sample of 100 movies.

    Suppose you collect a new random sample of running length times for 50 current movies. Assume that the distribution of running lengths for current movies is similar to that of the movies in the 1996 edition of the Guide.

  (b) What are the mean and standard deviation of the average running length for your sample of 50 current movies?

  (c) What is the probability that the average running length for your sample of 50 current movies is less than 2 h?

  (d) What is the probability that the average running length for your sample of 50 current movies is between 1 h 50 min and 2 h 10 min?

  (e) How many of the current movies in your sample would you expect to have running lengths of at least 2 h?

  (f) What is the probability that none of your movies has a running length of more than 2 h 15 min?

5.B.13. Gender Differences in Body Temperature. Mackowiak et al. (1992) collected body temperature data from 148 individuals aged 18 through 40 years. The dataset body_temperature_and_heart_rate contains body temperature values (artificially generated by Shoemaker, 1996, to closely recreate the original data obtained by Mackowiak et al.) for 65 male and 65 female subjects.

  (a) Obtain the mean and standard deviation for the body temperatures of the 65 male subjects in this dataset.

  (b) Obtain the mean and standard deviation for the body temperatures of the 65 female subjects in this dataset.

    Suppose you now collect additional random samples of 50 female and 50 male subjects and measure their body temperatures. Assume that the current populations from which you selected your random samples are similar to the populations that led to the random samples in the Mackowiak et al. study.

  (c) What is the probability that the average body temperature for your sample of 50 female subjects is greater than 98.6 degrees Fahrenheit?

  (d) What is the probability that the average body temperature for your sample of 50 male subjects is between 98 and 99 degrees Fahrenheit?

  (e) What is the probability that the average body temperature for your sample of 50 male subjects will be greater than the average body temperature for your sample of 50 female subjects?

  (f) How many of your 50 male subjects would you expect to have body temperatures between 98 and 99 degrees Fahrenheit?

5.B.14. Gender Differences in Heart Rate. Mackowiak et al. (1992) collected heart rate data from 148 individuals aged 18 through 40 years. The dataset body_temperature_and_heart_rate contains heart rate values (artificially generated by Shoemaker, 1996, to closely recreate the original data obtained by Mackowiak et al.) for 65 male and 65 female subjects.

  (a) Use the R function to simulate r = 750 bootstrap samples of size n = 15 each from the heart rates for the 65 male subjects and find the sample average for each sample. Display the approximate bootstrap sampling distribution for the sample average for samples of size 15.

  (b) Simulate r = 500 bootstrap samples of size n = 12 each from the heart rates for the 65 female subjects and find the sample standard deviation for each sample. Display the approximate bootstrap sampling distribution for the sample standard deviation for samples of size 12.

  (c) Simulate r = 1000 bootstrap samples of size n = 20 each separately from the male and female subjects. Find the difference in the sample medians for the male and female subjects for each of the 1000 pairs of samples. Display the approximate bootstrap sampling distribution for the difference in sample medians for common sample sizes of 20 each.
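Here is a minimal Python sketch of the bootstrap resampling in Exercise 5.B.14. It uses the standard library only, stands in for the R function the exercise mentions, and its function names are hypothetical. Each bootstrap sample is drawn with replacement from the observed values, and the statistic is recomputed on each one. The heart-rate lists below are hypothetical placeholders, not the body_temperature_and_heart_rate data. Substitute the 65 male and 65 female values from that dataset.

```python
import random
import statistics

def bootstrap_distribution(data, size, reps, statistic, seed=None):
    """Draw `reps` bootstrap samples of `size` values (with replacement)
    from `data` and return the statistic of each sample."""
    rng = random.Random(seed)
    return [statistic(rng.choices(data, k=size)) for _ in range(reps)]

def bootstrap_difference(data1, data2, size, reps, statistic, seed=None):
    """For each of `reps` pairs of bootstrap samples of equal `size`, return
    statistic(sample from data1) - statistic(sample from data2)."""
    rng = random.Random(seed)
    return [
        statistic(rng.choices(data1, k=size)) - statistic(rng.choices(data2, k=size))
        for _ in range(reps)
    ]

# HYPOTHETICAL placeholder heart rates (65 values each), not the real dataset
male = [60 + (7 * i) % 25 for i in range(65)]
female = [62 + (5 * i) % 27 for i in range(65)]

means = bootstrap_distribution(male, 15, 750, statistics.mean, seed=1)                  # part (a)
sds = bootstrap_distribution(female, 12, 500, statistics.stdev, seed=2)                 # part (b)
median_diffs = bootstrap_difference(male, female, 20, 1000, statistics.median, seed=3)  # part (c)

# A quick text summary (quartiles) of each approximate sampling distribution
for name, values in [("mean", means), ("sd", sds), ("median diff", median_diffs)]:
    print(name, [round(q, 2) for q in statistics.quantiles(values, n=4)])
```

The same `bootstrap_distribution` helper covers Exercise 5.B.15 by passing `min` or a range function such as `lambda s: max(s) - min(s)` as the statistic.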

5.B.15. How Much Do Euros Weigh? The Euro is the common currency of the euro-area member countries of the European Union. According to information from the “National Bank of Belgium”, the 1 Euro coin is stipulated to weigh 7.5 g. Shkedy et al. (2006) obtained eight separate packages of 250 Euro coins each from a Belgian bank, and their assistants Sofie Bogaerts and Saskia Litière individually weighed each of these 2000 coins using a weighing scale of the type Sartorius BP310, which provided an accurate reading up to one thousandth of a gram. These 2000 weights, indexed by package number, are provided in the dataset weight_of_Euros.

  (a) Using only the 250 coins from package number 2, simulate r = 500 bootstrap samples of size n = 25 each and find the minimum Euro weight for each of the 500 samples. Display the approximate bootstrap sampling distribution for the minimum Euro weight for samples of size 25.

  (b) Using only the 250 coins from package number 4, simulate r = 1000 bootstrap samples of size n = 30 each and find the average Euro weight for each of the 1000 samples. Display the approximate bootstrap sampling distribution for the sample average for samples of size 30.

  (c) Using only the 250 coins from package number 5, simulate r = 750 bootstrap samples of size n = 40 each and find the range of the Euro weights for each of the 750 samples. Display the approximate bootstrap sampling distribution for the sample range for samples of size 40.

  (d) Combining the 500 coins from packages numbered 4 and 5, simulate r = 1000 bootstrap samples of size n = 30 each and find the average Euro weight for each of the 1000 samples. Display the approximate bootstrap sampling distribution for the sample average for samples of size 30. Compare your results with those obtained in part (b).

5.B.16. An automobile manufacturer claims that the fuel consumption for a certain make and model of car averages 28 miles per gallon with a standard deviation of 3 miles per gallon. Suppose you test a random sample of n = 25 cars of this make and model.

  1. (a)

    What is the probability that the average fuel consumption for your random sample of 25 cars will be greater than 30 miles per gallon?

  2. (b)

    What is the probability that the average fuel consumption for your random sample of 25 cars will be between 26 and 31 miles per gallon?

  3. (c)

    How many of your cars would you expect to have fuel consumptions between 26 and 31 miles per gallon?

  4. (d)

    What is the probability that none of the 25 cars in your random sample has fuel consumption greater than 28 miles per gallon?
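Parts (a) and (b) follow from the Central Limit Theorem: the sample mean of n = 25 cars is approximately normal with mean 28 and standard error 3/√25 = 0.6 mpg. Parts (c) and (d) concern individual cars, so the sketch below additionally *assumes* that individual fuel consumptions are themselves normal with mean 28 and standard deviation 3. The problem does not state this. It uses only the Python standard library (`statistics.NormalDist`).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 28.0, 3.0, 25          # claimed mean and SD (mpg), sample size
se = sigma / sqrt(n)                  # standard error of the sample mean = 0.6
xbar = NormalDist(mu, se)             # approximate sampling distribution of X-bar

# (a) P(X-bar > 30)
p_a = 1 - xbar.cdf(30)
# (b) P(26 < X-bar < 31)
p_b = xbar.cdf(31) - xbar.cdf(26)

# (c) and (d) ASSUME each car's consumption is N(28, 3); the CLT alone
# says nothing about individual cars.
car = NormalDist(mu, sigma)
p_car = car.cdf(31) - car.cdf(26)     # P(one car between 26 and 31)
expected_c = n * p_car                # expected number of the 25 cars in that range
p_d = car.cdf(28) ** n                # P(all 25 cars at or below 28), by independence

print(f"(a) {p_a:.5f}  (b) {p_b:.5f}  (c) {expected_c:.2f}  (d) {p_d:.3g}")
```

Under these assumptions the answers come out to roughly 0.0004, 0.9996, 14.7 cars, and 3 × 10⁻⁸, respectively.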

5.B.17. How Well Does Your Beer Hold Its Foam? Two features of bottled beer that are important to beer consumers are the amount of initial head formation when a beer is poured and how long the head lasts. Ault et al. (1967) measured the height of the initial head formation upon pouring, the percentage adhesion of the head, and the percentage collapse of the head 4 min after pouring for 20 bottles from each of two different production lots of the beer. The dataset beer_head contains the results of their study.

  1. (a)

    Find the sample average and standard deviation for the maximum head formation for the sample of 20 bottles of beer from the first production lot.

  2. (b)

    If you were to collect another random sample of n = 30 bottles of beer from the first production lot, what is the probability that the average maximum head formation for your sample of 30 bottles would be greater than 175?

  3. (c)

    Find the sample average and standard deviation for the percentage collapse of the head 4 min after pouring for the sample of 20 bottles of beer from the second production lot.

  4. (d)

    If you were to collect another random sample of n = 40 bottles of beer from the second production lot, what is the probability that the average percentage collapse for your sample of 40 bottles would be less than 80 percent?

  5. (e)

    Find the sample averages and standard deviations for the percentage adhesion of the head separately for the two production lots.

  6. (f)

    If you were to collect additional random samples of n = 20 bottles of beer from each of the two production lots, what is the probability that the average percentage adhesion for the sample from the second production lot would exceed the average percentage adhesion for the sample from the first production lot?
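Parts (b), (d), and (f) can be approximated with the Central Limit Theorem once the summaries from parts (a), (c), and (e) are in hand. The sketch below is one way to do this. It *assumes* that the earlier sample mean and standard deviation can stand in for the lot's population values, and that the two new samples in part (f) are independent, so the difference of the two sample means is approximately normal with standard error √(s₁²/n₁ + s₂²/n₂). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_above(xbar, s, n, threshold):
    """Approximate P(mean of n new bottles > threshold), treating the
    earlier sample mean and SD as the lot's population mean and SD."""
    return 1 - NormalDist(xbar, s / sqrt(n)).cdf(threshold)

def prob_mean_below(xbar, s, n, threshold):
    """Approximate P(mean of n new bottles < threshold), same assumptions."""
    return NormalDist(xbar, s / sqrt(n)).cdf(threshold)

def prob_second_exceeds_first(xbar1, s1, n1, xbar2, s2, n2):
    """Approximate P(mean from lot 2 - mean from lot 1 > 0) for
    independent samples, using the normal approximation to the difference."""
    se_diff = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return 1 - NormalDist(xbar2 - xbar1, se_diff).cdf(0)

# Hypothetical usage with the summaries computed in (a), (c), (e):
# prob_mean_above(xbar_a, s_a, 30, 175)                          # part (b)
# prob_mean_below(xbar_c, s_c, 40, 80)                           # part (d)
# prob_second_exceeds_first(xbar_e1, s_e1, 20, xbar_e2, s_e2, 20)  # part (f)
```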

5.1.3 5.C. Activities

5.C.1. Age of U.S. Dimes. Collect a large sample of U.S. dimes and find the ages of the dimes (in years). Simulate r = 1000 bootstrap samples of size 5 each, 1000 bootstrap samples of size 10 each, 1000 bootstrap samples of size 20 each, and 1000 bootstrap samples of size 40 each from this large sample of dimes and compute the sample mean for each of these 4000 samples. Display the approximate sampling distributions for sample averages of sizes 5, 10, 20, and 40 from the age distribution of all dimes in circulation at the time. Compare your average age distributions with those for U.S. pennies, as displayed in Figs. 5.14, 5.15, 5.16, and 5.17.
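The simulation in this activity repeats one bootstrap step for four sample sizes. The Python/NumPy sketch below assumes the collected ages are in a numeric sequence (the name `dime_ages` is hypothetical) and draws each bootstrap sample with replacement. It is an illustration, not the book's implementation.

```python
import numpy as np

def bootstrap_means_by_size(ages, sizes=(5, 10, 20, 40), r=1000, seed=None):
    """For each sample size n in `sizes`, return r bootstrap sample means,
    each computed from n ages drawn with replacement from the collected coins."""
    rng = np.random.default_rng(seed)
    ages = np.asarray(ages, dtype=float)
    return {n: rng.choice(ages, size=(r, n), replace=True).mean(axis=1)
            for n in sizes}

# Hypothetical usage:
# means = bootstrap_means_by_size(dime_ages, seed=1)
# for n, m in means.items():
#     print(n, m.mean(), m.std(ddof=1))   # spread should shrink roughly like 1/sqrt(n)
```

Histograms of the four arrays give the displays the activity asks for. Applying the same function to a sample of quarter ages covers Exercise 5.C.2.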

5.C.2. Age of U.S. Quarters. Repeat Exercise 5.C.1 for U.S. quarters.

5.C.3. Peanut M&Ms. Collect a sample of Peanut M&M’s and compute the Pearson goodness of fit statistic for the sample relative to the claim by Mars, Inc. that the color combination in M&M’s Peanuts is 20% brown, 20% yellow, 20% red, 10% orange, 10% green, and 20% blue. Use simulation to assess their claim.
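The Pearson goodness of fit statistic is G = Σ (O − E)²/E, where O is the observed count for each color and E is the count expected under the claimed proportions. One way to "use simulation" is to generate many samples of the same size from the claimed color mix, compute G for each, and see how often the simulated G is at least as large as the observed one. The Python/NumPy sketch below does this. The function names and the color order are choices made for illustration.

```python
import numpy as np

def pearson_G(observed, probs):
    """Pearson goodness-of-fit statistic: sum over categories of (O - E)^2 / E."""
    observed = np.asarray(observed, dtype=float)
    expected = observed.sum() * np.asarray(probs, dtype=float)
    return np.sum((observed - expected) ** 2 / expected)

def simulate_G(probs, n, r=10000, seed=None):
    """Values of G for r samples of size n drawn under the claimed probabilities."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n, probs, size=r)          # r x k matrix of color counts
    expected = n * np.asarray(probs, dtype=float)
    return np.sum((counts - expected) ** 2 / expected, axis=1)

# Claimed Peanut M&M's mix, in the order brown, yellow, red, orange, green, blue
peanut_probs = [0.20, 0.20, 0.20, 0.10, 0.10, 0.20]

# Hypothetical usage with your own color counts, in the same order:
# g_obs = pearson_G(my_counts, peanut_probs)
# sims = simulate_G(peanut_probs, n=sum(my_counts), seed=1)
# print(np.mean(sims >= g_obs))   # share of simulated G at least as large as observed
```

The same two functions apply to Exercises 5.C.4 and 5.C.5 with the probability vectors `[0.2] * 5` and `[0.5, 0.5]`, respectively.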

5.C.4. Peanut Butter and Almond M&Ms. Collect a sample of Peanut Butter or Almond M&M’s and compute the Pearson goodness of fit statistic for the sample relative to the claim by Mars, Inc. that the color combination in both M&M’s Peanut Butter and M&M’s Almond is 20% brown, 20% yellow, 20% red, 20% green, and 20% blue. Use simulation to assess their claim.

5.C.5. Reese’s Pieces. Collect a sample of Reese’s Pieces and compute the Pearson goodness of fit statistic relative to the claim by Mars, Inc. that Reese’s Pieces are evenly mixed among two colors, orange and brown. Use simulation to assess their claim.

5.1.4 5.D. Internet Archives

5.D.1. Country Characteristics/Attributes. Search the Internet to find a dataset that contains the characteristics or attributes for a sample of individuals residing in a particular country of interest to you. Use these data and bootstrap simulation to obtain the approximate sampling distribution for the sample average of 30 individuals from this country for one of these characteristics or attributes.

5.D.2. Drug Studies/Medical Treatment. Search the Internet to find an article that provides details for the results of a clinical trial studying the effectiveness of a new drug or medical procedure. Use these data and bootstrap simulation to obtain the approximate sampling distribution for the average effect of the new drug or medical procedure on a sample of 20 additional individuals who might use it.

5.D.3. Church Attendance. Search the Internet to find the results of a survey dealing with weekly church attendance in the United States. Using these results and bootstrap simulation, obtain the approximate sampling distribution for the sample percentage of Americans who attend church weekly.
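Survey reports usually give only a count or percentage, so a bootstrap for this exercise first has to rebuild the respondent-level yes/no data. The Python/NumPy sketch below assumes you know how many respondents reported weekly attendance (`n_yes`) out of how many were asked (`n_total`). Both names are hypothetical. It resamples respondents with replacement and records the sample percentage for each bootstrap sample.

```python
import numpy as np

def bootstrap_percentages(n_yes, n_total, n=None, r=1000, seed=None):
    """Bootstrap distribution of the sample percentage attending weekly.

    Rebuilds the survey's 0/1 responses from the reported counts and
    resamples n respondents with replacement (n defaults to the survey size).
    """
    rng = np.random.default_rng(seed)
    responses = np.concatenate([np.ones(n_yes), np.zeros(n_total - n_yes)])
    n = n_total if n is None else n
    samples = rng.choice(responses, size=(r, n), replace=True)
    return 100 * samples.mean(axis=1)   # percentage of 1s in each bootstrap sample

# Hypothetical usage for a survey in which n_yes of n_total respondents attend weekly:
# pct = bootstrap_percentages(n_yes, n_total, r=1000, seed=1)
```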

5.D.4. Political Survey. Search the Internet to find the results of a political survey that includes questions of interest to you. Select a question with at least five possible responses indexed by a categorical variable with at least four categories. Use the responses of the survey participants to this question and bootstrap simulation to obtain the approximate sampling distribution for the Pearson goodness of fit statistic G for a sample of size n = 200 from the population of interest.

5.D.5. Time to Degree. Search the Internet for a study that provides information about times to degree for a collection of undergraduate students. Use the results of this study to simulate the sampling distribution for the average time to degree for a random sample of 100 undergraduate students.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Wolfe, D.A., Schneider, G. (2017). Sampling Distributions and Approximations. In: Intuitive Introductory Statistics. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-56072-4_5
