Statistical Inference for Two-Way Tables of Count Data

Part of the book series: Springer Texts in Statistics (STS)

Abstract

In Sect. 9.1 we discussed procedures designed for making statistical inference about the difference in the probabilities of a common event A for two populations. Those procedures are based on independent random samples of Bernoulli variables (i.e., either the event A occurs or it does not) from each of the two populations. One way to represent the observed outcomes of such Bernoulli random samples is in the following 2 × 2 table:

Bibliography

  • Bagley, F. (1985). Personal communication for report in Statistics 661. Columbus: Ohio State University.

  • Cruickshanks, K. J., Klein, R., Klein, B. E. K., Wiley, T. L., Nondahl, D. M., & Tweed, T. S. (1998). Cigarette smoking and hearing loss: The epidemiology of hearing loss study. Journal of the American Medical Association, 279(21), 1715–1719.

  • Doblhammer, G., & Vaupel, J. W. (2001). Lifespan depends on the month of birth. Proceedings of the National Academy of Sciences, 98(5), 2934–2939.

  • Gallup-Purdue Index Report. (2015). Great Jobs, Great lives. The relationship between student debt, experiences and perceptions of College worth. Report issued September 29, 2015. www.gallup.com

  • Jarausch, K. H., & Arminger, G. (1989). The German teaching profession and Nazi party membership: A demographic Logit model. The Journal of Interdisciplinary History, 20(2), 197–225.

  • Johnson, B. (1984). Personal communication for report in Statistics 661. Columbus: Ohio State University.

  • Lee, C. (1999). Selective assignment of military positions in the Union Army: Implications for the impact of the Civil War. Social Science History, 23(1), 67–97.

  • Lee, H., Deng, X., Unnava, H. R., & Fujita, K. (2014). Monochrome forests and colorful trees: The effect of black-and-white versus color imagery on construal level. Journal of Consumer Research, 41(4), 1015–1032.

  • Meilman, P. W., Leichliter, J. S., & Presley, C. A. (1998). Analysis of weapon carrying among college students, by region and institution type. Journal of American College Health, 46(6), 291–291.

  • Pew Research Center. (2009). U.S. politics & policy: Public praises science; scientists fault public, media. Report issued July 9, 2009. www.pewresearch.org

  • Pew Research Center. (2015a). U.S. politics & policy: Support for Iran Nuclear Agreement Falls. Report issued September 8, 2015. www.pewresearch.org

  • Pew Research Center. (2015c). U.S. politics & policy: On immigration policy, Wider Partisan divide over border Fence than path to legal status. Report issued October 8, 2015. www.pewresearch.org

  • Rosen, M. (1979). Personal communication for report in Statistics 661. Columbus: Ohio State University.

  • Sax, L. J. (1997). Health trends among college freshmen. Journal of American College Health, 45(6), 252–264.

  • Shepherd, J., Irish, M., Scully, C., & Leslie, I. (1988). Alcohol intoxication and severity of injury in victims of assault. British Medical Journal (Clinical Research Education), 296(6632), 1299.

  • Stoupel, E., Tamoshiunas, A., Radishauskas, R., Bernotiene, G., Abramson, E., & Israelevich, P. (2011). Acute myocardial infarction (AMI) in context with the paradigm—Month of birth and longevity. Health, 3(12), 732–736.

  • Vigorito, A. J., & Curry, T. J. (1998). Marketing masculinity: Gender identity and popular magazines. Sex Roles, 39(1/2), 135–152.

  • Williams, R. D., Jr. (2012). Alcohol consumption and policy perception among college freshman athletes. American Journal of Health Sciences, 3(1), 17–22.

  • Wypijewski, J. (1997). Painting by numbers: Komar and Melamid’s scientific guide to art. Farrar, Straus, & Giroux, Inc.

Chapter 10 Comprehensive Exercises

10.1.1 10.A. Conceptual

10.A.1. Below are two questions of potential research interest. For each of these questions, describe the appropriate data to collect in order to address the question and state which procedure discussed in this chapter would provide the proper statistical analysis of these data.

  • Question 1: Does participation in intercollegiate athletics affect the length of time to graduation, or even the graduation rate itself, for a college student?

  • Question 2: Are the percentages of individuals between the ages of 16 and 25 who do not smoke at all, smoke occasionally (less than a pack a week), smoke regularly, but not heavily (between one and three packs a week), or are heavy smokers (more than three packs a week) the same for males and females?

10.A.2. Degrees of Freedom. There are k categories for the goodness-of-fit test procedure discussed in Sect. 4. However, the degrees of freedom for the chi-square approximation associated with the test statistic Q₂ (10.25) is only k − 1, despite the fact that there are k terms (one for each category) in the sum for Q₂. Discuss an intuitive reason why the degrees of freedom for this chi-square approximation should not be k.
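One concrete way to see the constraint behind k − 1: when the expected counts are N·p_i and the hypothesized probabilities p_i sum to one, the k observed-minus-expected deviations always sum to zero, so the last deviation is determined by the other k − 1. The minimal Python sketch below checks this on a simulated sample. The six equally likely categories and the sample size of 300 are arbitrary illustrative choices, not data from the chapter, and the check illustrates the constraint rather than proving the chi-square result.

```python
import random


def deviations(observed, probs):
    """Observed minus expected counts, with expected count N * p_i."""
    n = sum(observed)
    return [o - n * p for o, p in zip(observed, probs)]


# Hypothetical setup: k = 6 equally likely categories, N = 300 draws.
rng = random.Random(1)
k = 6
probs = [1 / k] * k
counts = [0] * k
for outcome in rng.choices(range(k), k=300):
    counts[outcome] += 1

devs = deviations(counts, probs)
print("deviations:", [round(d, 3) for d in devs])
# The deviations sum to zero (up to rounding error), so the last one
# is fixed once the first k - 1 are known.
print("sum of deviations:", round(sum(devs), 10))
print("last deviation:", round(devs[-1], 3), "= minus the others:", round(-sum(devs[:-1]), 3))
```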

10.A.3. Below are two questions of potential research interest. For each of these questions, describe the appropriate data to collect in order to address the question and state which procedure discussed in this chapter would provide the proper statistical analysis of these data.

  • Question 1: Is there any relationship between an individual’s religious preference and her tolerance of the religious preferences of others?

  • Question 2: Are there differences in drinking habits between college students who belong to sororities, fraternities or neither?

10.A.4. The test procedures discussed in Sects. 1 and 3 are both designed for settings where fixed numbers of sample observations are collected from at least two different populations. The question of interest for both procedures is whether certain categorical proportions are the same for the populations. Thus, under certain conditions they are competing test procedures.

  1. (a)

    Specify the setting(s) where both procedures are applicable. What are the advantages and disadvantages of each procedure for such setting(s)?

  2. (b)

    Which of the two procedures is more broadly applicable and why?

10.A.5. Both the test for differences in category proportions for two or more populations discussed in Sect. 1 and the test for association (independence) between two categorical attributes discussed in Sect. 2 use the same chi-square statistic Q₁ (10.7). Moreover, the two test procedures use the data in the observed I × J tables of counts in exactly the same way to compute the expected counts (Eqs. (10.6) and (10.16), respectively) when the appropriate null hypotheses are true. However, the ways in which the observed tables of counts are obtained are quite different for the two procedures. Discuss these data collection differences for the two settings, particularly with respect to the sampling methods for the two procedures and the interpretations of the column and row totals for the observed tables of counts.

10.A.6. Degrees of Freedom. Consider the test procedure discussed in Sect. 1 that is designed to test for differences in population proportions. If there are J populations and I categories, then we have IJ category-population cross entries in the observed data count Table 10.2. However, the degrees of freedom associated with the chi-square approximation for the associated test statistic Q₁ (10.7) is only (I − 1)(J − 1), despite the fact that there are IJ terms (one for each category-population combination) in the sum for Q₁. Discuss an intuitive reason why the degrees of freedom for this chi-square approximation should not be IJ.
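A concrete way to see where (I − 1)(J − 1) comes from: assuming the expected counts are (row total)(column total)/N, the table of observed-minus-expected counts has every row and every column summing to zero. The Python sketch below, using a hypothetical 3 × 3 table, shows that the upper-left (I − 1) × (J − 1) block of these residuals already determines the whole table. The function names are our own, chosen for illustration.

```python
def residuals(table):
    """Observed minus expected counts, with the assumed expected count
    (row total)(column total)/N in each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [o - r * c / n for o, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]


def rebuild_from_free_cells(free_block):
    """Rebuild a full I x J residual table from its upper-left
    (I - 1) x (J - 1) block, using the fact that every row and every
    column of residuals sums to zero."""
    full = [row[:] + [-sum(row)] for row in free_block]  # fill last column
    full.append([-sum(col) for col in zip(*full)])       # fill last row
    return full


# Hypothetical 3 x 3 table of counts (I = 3 categories, J = 3 populations).
table = [[10, 20, 30], [20, 10, 30], [15, 15, 20]]
res = residuals(table)
free = [row[:-1] for row in res[:-1]]  # (I - 1)(J - 1) = 4 free cells
rebuilt = rebuild_from_free_cells(free)
print(all(abs(a - b) < 1e-9 for r1, r2 in zip(res, rebuilt) for a, b in zip(r1, r2)))
```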

10.A.7. Lotteries. In a lottery, each integer from 0 to 9, inclusive, is designed to have the same chance of being drawn.

  1. (a)

    In 6000 draws for the lottery, how many times should we expect the number 6 to appear? How many times should we expect the number 9 to appear?

  2. (b)

    Suppose we make two-digit numbers from each consecutive pair of numbers drawn. For 6000 draws from the lottery, how many times should we expect the number 45 to appear? How many times should we expect that the two-digit number will be at least as large as 60?

  3. (c)

    Suppose we make three-digit numbers from each consecutive triple of numbers drawn. For 6000 draws from the lottery, how many times should we expect the number 369 to appear? How many of the three-digit numbers should we expect to be less than 250?
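Each part of this exercise reduces to (number of digits or numbers formed) × (probability of the outcome). The phrase "each consecutive pair" can be read as disjoint pairs or as overlapping pairs, so the Python sketch below (our own illustration, not a procedure from the chapter) takes that choice as a parameter. It assumes fair, independent draws and treats numbers with leading zeros, such as 05 or 007, as valid two- or three-digit numbers.

```python
from fractions import Fraction


def numbers_formed(n_draws: int, width: int, overlapping: bool) -> int:
    """How many width-digit numbers are built from n_draws single digits."""
    if overlapping:  # windows (d1 d2), (d2 d3), ...
        return n_draws - width + 1
    return n_draws // width  # disjoint blocks (d1 d2), (d3 d4), ...


def expected_count(n_draws: int, width: int, event_size: int,
                   overlapping: bool = False) -> Fraction:
    """Expected number of formed numbers that land in an event.

    Assumes fair, independent draws, so each of the 10**width possible
    numbers (leading zeros allowed) has probability 1 / 10**width.
    Linearity of expectation keeps this valid for overlapping windows,
    even though those windows are not independent.
    event_size is how many of the 10**width numbers belong to the event.
    """
    per_number = Fraction(event_size, 10 ** width)
    return numbers_formed(n_draws, width, overlapping) * per_number


# Part (a): a single digit such as 6 (one outcome out of 10).
print(expected_count(6000, 1, 1))
# Part (b): the number 45, and the 40 numbers from 60 through 99.
print(expected_count(6000, 2, 1), expected_count(6000, 2, 40))
# Part (c): the number 369, and the 250 numbers from 000 through 249.
print(expected_count(6000, 3, 1), expected_count(6000, 3, 250))
```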

10.1.2 10.B. Data Analysis/Computational

10.B.1. The following is a 5 × 4 table of observed counts collected from an experiment involving two categorical attributes, A and B:

                    Category for attribute A
                        1     2     3     4
Category for      1    13     9    15     9
attribute B       2    11     5     8    10
                  3     6     4     2    16
                  4     8     9    10     5
                  5     3     9     7    11

Find the following expected cross-category counts if there is no association between the two attributes (that is, they are independent).

  1. (a)

    [expected count in category 1 for both attribute A and B]

  2. (b)

    [expected count in category 3 for attribute A and category 2 for attribute B]

  3. (c)

    [expected count in category 4 for attribute A]

  4. (d)

    [expected count in category 3 for attribute B]

10.B.2. Consider the 5 × 4 table of observed counts in Exercise 10.B.1.

  1. (a)

    Construct the corresponding table of expected cross-category counts if there is no association between the two attributes (that is, they are independent).

  2. (b)

    Find the approximate P-value for an appropriate test of the hypothesis that there is no association between Attributes A and B (that is, they are independent).
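The expected counts in Exercises 10.B.1 and 10.B.2(a) and the P-value in 10.B.2(b) can be sketched in plain Python. The sketch assumes the usual expected-count formula (row total × column total / N), the Pearson statistic Q₁ = Σ (O − E)²/E, and (I − 1)(J − 1) degrees of freedom; we take these to match the chapter's Eqs. (10.7) and (10.16). The helper chi2_sf is our own: it evaluates the chi-square upper-tail probability with the closed-form series that holds for integer degrees of freedom, so the block needs only the standard library.

```python
import math

# Observed counts from Exercise 10.B.1: rows are categories 1-5 of
# attribute B, columns are categories 1-4 of attribute A.
OBSERVED = [
    [13, 9, 15, 9],
    [11, 5, 8, 10],
    [6, 4, 2, 16],
    [8, 9, 10, 5],
    [3, 9, 7, 11],
]


def expected_counts(table):
    """Assumed expected counts under independence: (row total)(column total)/N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), df a positive integer."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^(-x/2) (1 + x/3 + x^2/15 + ...)
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(x / 2)) + total


def independence_test(table):
    """Return expected counts, Q1, degrees of freedom and approximate P-value."""
    expected = expected_counts(table)
    q1 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, q1, df, chi2_sf(q1, df)


expected, q1, df, p_value = independence_test(OBSERVED)
for row in expected:
    print(" ".join(f"{e:7.3f}" for e in row))
print(f"Q1 = {q1:.3f} on {df} df, approximate P-value = {p_value:.4f}")
```

For a quick sanity check, chi2_sf(21.026, 12) should come out very close to .05, since 21.026 is the tabled upper .05 point of the chi-square distribution with 12 degrees of freedom.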

10.B.3. Lotteries. In a fair lottery, each integer from 0 to 9, inclusive, is supposed to be equally likely to be drawn. Suppose we draw 500 such numbers using the lottery's specified method and observe the following counts for the ten possible outcomes:

Number:             0    1    2    3    4    5    6    7    8    9
Observed count:    60   42   35   88   50   32   70   55   40   28

  1. (a)

    If the lottery is fair, what should the expected counts be for the ten possible outcomes?

  2. (b)

    State the hypothesis that corresponds to the lottery being fair. Be explicit about all terms and numerical values.

  3. (c)

    Using the observed counts given above, find an approximate P-value for an appropriate test for the fairness of this lottery system.
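A minimal Python sketch of the goodness-of-fit computation for Exercise 10.B.3 follows. It assumes Q₂ = Σ (O − E)²/E with expected counts E = N·p and k − 1 degrees of freedom, which we take to be the chapter's Eq. (10.25). The helper chi2_sf is our own closed-form evaluation of the chi-square upper tail for integer degrees of freedom, and goodness_of_fit is likewise a hypothetical name.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), df a positive integer."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^(-x/2) (1 + x/3 + x^2/15 + ...)
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(x / 2)) + total


def goodness_of_fit(observed, probs):
    """Return expected counts, Q2, degrees of freedom and approximate P-value."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probs]
    q2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, q2, df, chi2_sf(q2, df)


# Observed counts for the numbers 0-9 in Exercise 10.B.3 (N = 500).
counts = [60, 42, 35, 88, 50, 32, 70, 55, 40, 28]
expected, q2, df, p_value = goodness_of_fit(counts, [0.1] * 10)
print("expected:", expected)
print(f"Q2 = {q2:.2f} on {df} df, approximate P-value = {p_value:.3g}")
```

The same goodness_of_fit function accepts unequal hypothesized proportions, for example [0.2, 0.2, 0.2, 0.1, 0.1, 0.2] for the brown, yellow, red, orange, green, and blue mix claimed in Exercise 10.B.9.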

10.B.4. Alcohol Consumption and Severity of Assault Injuries. Shepherd et al. (1988) were interested in the possibility of a link between the amount of alcohol consumed by a victim of an assault and the severity of the injuries suffered as a result of the assault. For this purpose, they classified the severity of injuries from an assault into five categories:

  • I = one hematoma or one laceration

  • II = multiple hematomas or lacerations

  • III = one fracture

  • IV = one fracture and hematomas and/or lacerations

  • V = more than one fracture.

A victim’s alcohol consumption was categorized as none, light (1–10 units), or heavy (> 10 units), where a unit of alcohol corresponded to 1/2 pint of beer or lager, one measure of spirits, or one glass of wine. Following interviews and examinations of 470 consecutive victims of assault who came to an inner-city accident and emergency service in 1986, they obtained the cross-categorized counts given in Table 10.24.

Table 10.24 Numbers of treated patients with various combinations of severity of injury and level of alcohol consumption
  1. (a)

    State the null hypothesis of interest here.

  2. (b)

    Construct the table of expected counts if the null hypothesis in (a) is true.

  3. (c)

    Find the approximate P-value for an appropriate test of the null hypothesis in (a). What is your conclusion at significance level .05?

10.B.5. Smoking and Hearing Loss. Smoking has been linked to the occurrence of a number of serious diseases, including lung cancer, emphysema, and various forms of heart disease. However, cigarette smoking may also lead to increased deterioration of other health functions, especially those that tend to worsen with age anyway. In particular, clinical studies have suggested that cigarette smoking may be associated with accelerated hearing loss as a person ages. Cruickshanks et al. (1998) conducted an extensive population-based study related to this issue. During the years 1993–1995, they gathered relevant information from questionnaires and examinations on over 3500 residents of the city/township of Beaver Dam, Wisconsin. For purposes of their study, Cruickshanks et al. defined a hearing loss to be a pure-tone average (PTA) of thresholds at 500, 1000, 2000, and 4000 Hz greater than 25 dB hearing level (dB HL) in the worse ear. In this exercise we concentrate on the relevant data for comparison of 534 nonsmokers (who had smoked fewer than 100 cigarettes in their lifetime), 445 ex-smokers, and 255 current smokers (at the time of the study) in the age group 48–59. The smoking-related breakdown of the subjects in this age group who were diagnosed as having hearing losses is presented in Table 10.25.

Table 10.25 Prevalence of hearing loss for smoking/nonsmoking groups of subjects ages 48–59 in the Beaver Dam, Wisconsin study
  1. (a)

    State the null hypothesis of interest here.

  2. (b)

    Construct the table of expected counts if the null hypothesis in (a) is true.

  3. (c)

    Find the approximate P-value for an appropriate test of the null hypothesis in (a). What is your conclusion at significance level .01?

10.B.6. Intensity of Smoking and Hearing Loss. In Exercise 10.B.5 we discussed a portion of the study by Cruickshanks et al. (1998) dealing with the overall effect of smoking on acceleration of hearing loss as one ages. They also collected data on the impact of the number of cigarettes smoked and the duration of smoking on hearing. They defined the total pack-years smoked by a subject to be the number of cigarettes smoked per day divided by 20 cigarettes per pack, then multiplied by the number of years that the subject had smoked. Table 10.26 records the total pack-years smoked by each current or ex-smoker in the study, along with whether or not that subject was diagnosed as having a hearing loss.

  1. (a)

    State the null hypothesis of interest here.

  2. (b)

    Construct the table of expected counts if the null hypothesis in (a) is true.

  3. (c)

    Find the approximate P-value for an appropriate test of the null hypothesis in (a). What is your conclusion at significance level .01?

Table 10.26 Prevalence of hearing loss by total pack-years smoked for subjects ages 48–59 in the Beaver Dam, Wisconsin study
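The pack-years definition in Exercise 10.B.6 is a one-line formula, and the small Python helper below is a sketch of it. The function name is our own, and the cut points used to group pack-years into the categories of Table 10.26 are not reproduced here, so the helper stops at the raw total.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Total pack-years: (cigarettes per day / 20 per pack) * years smoked."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("cigarettes per day and years smoked must be non-negative")
    packs_per_day = cigarettes_per_day / 20
    return packs_per_day * years_smoked


# Hypothetical smoker: 30 cigarettes a day for 10 years.
print(pack_years(30, 10))  # 15.0
```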

10.B.7. Smoking and Hearing Loss. In Exercise 10.B.5 we discussed a portion of the study by Cruickshanks et al. (1998) dealing with the overall effect of smoking on acceleration of hearing loss as one ages. There we discussed the association between smoking behavior and hearing loss for subjects in the age group 48–59. Similar data for the age group 70–79 are given in Table 10.27.

Table 10.27 Prevalence of hearing loss for smoking/nonsmoking groups for subjects ages 70–79 in the Beaver Dam, Wisconsin study

Find the approximate P-value for a test of the hypothesis that smoking behavior and hearing loss are independent for this age grouping. Compare and contrast this result with your conclusion in Exercise 10.B.5.

10.B.8. Intensity of Smoking and Hearing Loss. In Exercise 10.B.6 we discussed a portion of the study by Cruickshanks et al. (1998) involving the overall relationship between smoking and hearing loss as one ages. There we discussed the association between total pack-years smoking and hearing loss for smokers in the age group 48–59. Similar data for the age group 70–79 are given in Table 10.28.

Table 10.28 Prevalence of hearing loss by total pack-years smoked for subjects ages 70–79 in the Beaver Dam, Wisconsin study

Find the approximate P-value for a test of the hypothesis that total pack-years smoking and hearing loss are independent for subjects in the age group 70–79. Compare and contrast this result with your conclusion in Exercise 10.B.6.

10.B.9. M&M Colors. Mars, Inc. claims that the color mix for M&M’s Peanut candy is 20% brown, 20% yellow, 20% red, 10% orange, 10% green, and 20% blue. Suppose we observe the color counts for a bag containing N = 750 Peanut M&M’s to be as specified in Table 10.29.

Table 10.29 Observed color counts for a bag of 750 Peanut M&M candies
  1. (a)

    If the company’s claim is correct, what would be the expected counts for the six colors?

  2. (b)

    State the hypothesis that corresponds to the company’s claim. Be explicit about all terms and numerical values.

  3. (c)

    Using the observed counts given above, find an approximate P-value for an appropriate test of the company’s claim. What is your conclusion for significance level .10?

10.B.10. Flicker Squawks and Keos. The common flicker, Colaptes auratus, has a diverse vocal repertoire. Flicker nestlings, however, produce only two distinct calls, the squawk and the keo. The keo is a common vocalization used by adult birds, both male and female, as well as by older nestlings to attract the parent(s) to the nest cavity to feed them (or to express agitation when they do not receive sufficient food from a parent!). In a number of bird species it is known that the vocalizations of the young change with time until a final innate template for the vocalization is achieved. Rosen (1979) conducted a study with flicker nestlings to see if the duration of their keo vocalizations changed as they matured. She observed the keo vocalizations for a group of flicker nestlings on four different days, corresponding to the nestlings being 17, 21, 22, and 24 days old. Using a Kay Electronic Company 6061B Sonograph, she made sonograms (visual representations that plot the frequency of the sound on the ordinate against time on the abscissa) of the keo vocalizations. The length (in mm) of the strip that an individual keo vocalization creates on the sonogram represents its duration. The durations for the 71 keo vocalizations recorded by Rosen over the four days of observation are given in Table 10.30.

  1. (a)

    State the null hypothesis of interest here.

  2. (b)

    Construct the table of expected counts if the null hypothesis in (a) is true.

  3. (c)

    Find the approximate P-value for an appropriate test of the null hypothesis in (a). What is your conclusion at significance level .03?

Table 10.30 Observed counts for lengths (in mm) of keo durations for flicker nestlings of various ages

10.B.11. Saving or Not? Developing a systematic approach to saving is important in order to provide sufficient funds to support one’s retirement years. Not surprisingly, people in higher-income households tend to be better at this than those in lower-income households, simply because of the economics of the matter. What other factors might affect whether a person has taken this step toward systematically putting money away for her retirement? Princeton Survey Research Associates (1998) addressed a number of possible important factors, including gender, age, education, and race/ethnicity, in a nationwide survey conducted for the organization Americans Discuss Social Security. The results of their survey that relate to the age of the respondent are given in Table 10.31.

Table 10.31 Survey frequencies of saving approaches among various age groups of non-retired adults

Find the approximate P-value for a test of the hypothesis that age and saving approach are independent attributes.

10.B.12. Freshman Party Schools. The public often perceives certain institutions of higher education as ‘party schools’. However, are all such institutions created equal with regard to their students’ participation in such activities? Sax (1997) addressed a number of these and related issues in a comprehensive survey of college freshmen in 1995. Respondents were asked whether or not they partied at least 6 h per week. The numbers of respondents from four-year institutions categorized as public, nonsectarian private, Protestant private, or Catholic private who answered yes to this question are provided in Table 10.32. Were there differences among the types of four-year institutions with respect to partying by their freshmen in 1995?

Table 10.32 Numbers of college freshmen indicating that they partied at least 6 hours per week, categorized by type of four-year institution
  1. (a)

    Formally state the null hypothesis of interest here. Be sure to clearly identify all relevant parameters.

  2. (b)

    Construct the table of expected counts if the null hypothesis in (a) is true.

  3. (c)

    Find the approximate P-value for an appropriate test of the null hypothesis in (a). What is your conclusion at significance level .075?

10.B.13. Danger for Union Soldiers Based on Rank During the American Civil War. The American Civil War was a deadly conflict for both Confederate and Union soldiers, but was it more deadly for some Union Army soldiers than for others? Lee (1999) investigated the pattern and causes of fatalities among Union soldiers by rank and placement in the battlefields. Based on a sample of 4295 recruits who enlisted in 45 companies organized in Ohio for whom information on both rank and duty was available, Lee compiled the wartime mortality data presented in Table 10.33 categorized by the soldiers’ ranks.

Table 10.33 Union soldiers’ rank and wartime mortality in the American Civil War
  1. (a)

    Formally state the null hypothesis of interest here.

  2. (b)

    Construct the table of expected counts if the null hypothesis in (a) is true.

  3. (c)

    Find the approximate P-value for an appropriate test of the null hypothesis in (a). What is your conclusion at significance level .061?

10.B.14. Danger for Union Soldiers Based on Battlefield Placement During the American Civil War. The American Civil War was a deadly conflict for both Confederate and Union soldiers, but was it more deadly for some Union Army soldiers than for others? Lee (1999) investigated the pattern and causes of fatalities among Union soldiers by rank and placement in the battlefields. Based on a sample of 4295 recruits who enlisted in 45 companies organized in Ohio for whom information on both rank and duty was available, Lee compiled the wartime mortality data presented in Table 10.34 categorized by the soldiers’ battlefield placements.

Table 10.34 Union soldiers’ battlefield placement and wartime mortality in the American Civil War
  1. (a)

    Formally state the null hypothesis of interest here.

  2. (b)

    Construct the table of expected counts if the null hypothesis in (a) is true.

  3. (c)

    Find the approximate P-value for an appropriate test of the null hypothesis in (a). What is your conclusion at significance level .022?

10.B.15. Danger for Union Soldiers Based on Combination of Rank and Battlefield Placement During the American Civil War. The American Civil War was a deadly conflict for both Confederate and Union soldiers, but was it more deadly for some Union Army soldiers than for others? Lee (1999) investigated the pattern and causes of fatalities among Union soldiers by rank and placement in the battlefields. Based on a sample of 4295 recruits who enlisted in 45 companies organized in Ohio for whom information on both rank and duty was available, Lee compiled the wartime mortality data presented in Table 10.35 categorized by the combination of soldiers’ ranks and battlefield placements.

Table 10.35 Union soldiers’ rank and battlefield placement and wartime mortality in the American Civil War
  1. (a)

    Formally state the null hypothesis of interest here.

  2. (b)

    Construct the table of expected counts if the null hypothesis in (a) is true.

  3. (c)

    Find the approximate P-value for an appropriate test of the null hypothesis in (a). What is your conclusion at significance level .061?

  4. (d)

    Compare your findings with those in Exercises 10.B.13 and 10.B.14.

10.B.16. Science and Religion. Does the degree of a person’s religious participation affect their outlook on the impact of science on society? The Pew Research Center (2009) asked 1976 adults, 18 years of age or older, about their participation in religious services and whether they saw science as having a mostly positive or mostly negative effect on society. The results of the survey are given in Table 10.36.

Table 10.36 Participation in religious services and view of impact of science on society
  1. (a)

    Formally state the null hypothesis of interest here.

  2. (b)

    Construct the table of expected counts if the null hypothesis in (a) is true.

  3. (c)

    Find the approximate P-value for an appropriate test of the null hypothesis in (a). What is your conclusion at significance level .025?

10.1.3 10.C. Activities

10.C.1. Hair Color and Educational Level. Discuss how to design an experiment to ascertain if there is any relationship between (true!) hair color and the highest educational degree a person attains. Specify both the numbers and types of categories for each of these attributes.

  1. (a)

    State the hypothesis of interest here.

  2. (b)

    Collect the necessary data from 20 students, 20 non-faculty employees and 20 faculty members at your university and test the hypothesis in (a).

10.C.2. Hair Color and Educational Level. In Exercise 10.C.1 you were asked to collect data from 20 students, 20 non-faculty employees and 20 faculty members at your university to assess whether there is any relationship between true hair color and a person’s highest educational degree. Couldn’t you have addressed the same issue more easily by simply sampling 60 students (or 60 non-faculty employees or 60 faculty members)? Why or why not?

10.C.3. Who Studies More? Do freshman, sophomore, junior, and senior college students spend about the same amount of time studying (on average) per week?

  1. (a)

    Discuss how to design an experiment to address this question.

  2. (b)

    State the hypothesis of interest here.

  3. (c)

    Conduct the experiment and use your data to test the hypothesis in (b).

10.C.4. Bridge. In the card game of bridge, an ordinary deck of 52 cards is dealt to four players, each receiving a 13-card hand. One of the important features of the game of bridge is the number of honor cards (ace, king, queen, jack, and ten) that an individual holds in her hand.

  1. (a)

    If a bridge hand is dealt at random to a player, what is the probability that the hand will not contain any honor cards? one honor card? two honor cards? more than two honor cards?

  2. (b)

    Using your answer to (a), state an appropriate null hypothesis regarding honor cards that corresponds to dealing a 13-card hand at random from an ordinary deck of 52 cards.

  3. (c)

    Using an ordinary (shuffled) deck of 52 cards, deal a hand of 13 cards and record whether the hand contains zero, one, two, or more than two honor cards.

  4. (d)

    Repeat the experiment in (c) 160 times, with reshuffling between the dealing of each hand, and use the obtained counts to test the fairness of your deals.
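For part (a), a 13-card hand is a random sample without replacement from 20 honor cards (5 ranks × 4 suits) and 32 other cards, so the number of honor cards follows a hypergeometric distribution. The Python sketch below computes these probabilities as exact fractions, along with the expected number of hands in each category out of the 160 deals in part (d). The category labels and names such as prob_honors are our own framing of the exercise.

```python
from fractions import Fraction
from math import comb

HONORS = 20          # ace, king, queen, jack, ten in each of 4 suits
OTHERS = 52 - HONORS
HAND = 13


def prob_honors(k: int) -> Fraction:
    """P(exactly k honor cards in a random 13-card hand), hypergeometric."""
    return Fraction(comb(HONORS, k) * comb(OTHERS, HAND - k), comb(52, HAND))


probs = {
    "0": prob_honors(0),
    "1": prob_honors(1),
    "2": prob_honors(2),
}
probs["more than 2"] = 1 - sum(probs.values())

for label, p in probs.items():
    # Expected number of hands in this category out of 160 random deals.
    print(f"{label:>11}: p = {float(p):.5f}, expected in 160 hands = {float(160 * p):.2f}")
```

The expected counts for the zero-honor and one-honor categories come out small for 160 hands (well under one hand and only about one hand, respectively), which is worth keeping in mind when relying on the chi-square approximation in part (d).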

10.C.5. Bridge Again. In part (d) of Exercise 10.C.4 you were asked to deal 160 separate 13-card hands from an ordinary deck of 52 cards, with reshuffling between the dealing of each hand. It would have been much easier to simply deal four complete 13-card hands each time you shuffled the deck. You would then have had to reshuffle the deck only 40 times, rather than the 160 required in part (d) of Exercise 10.C.4.

  1. (a)

    Discuss why this proposed short-cut method is not equivalent to the more lengthy approach of part (d) of Exercise 10.C.4.

  2. (b)

    Repeat part (d) of Exercise 10.C.4, but this time deal all four 13-card hands each time you shuffle the deck and repeat the complete process only 40 times. Record the number of hands that contain zero, one, two, or more than two honor cards.

  3. (c)

    Compare the table of counts you obtained in part (b) with the table of counts you found in part (d) of Exercise 10.C.4. Are there substantial differences in the two tables? Discuss your finding.
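Before dealing real cards, both schemes can be explored by simulation. In the Python sketch below (the card representation and function names are our own choices), the key point appears in a comment: the four hands dealt from one shuffle must together hold all 20 honor cards, so their honor counts are dependent, unlike hands dealt after separate shuffles. A single simulated run shows what typical tables look like; it does not by itself establish or rule out a difference between the schemes.

```python
import random

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
HONOR_RANKS = {"10", "J", "Q", "K", "A"}
DECK = [(rank, suit) for suit in "SHDC" for rank in RANKS]


def honor_category(hand):
    """0, 1, 2, or 3, where 3 stands for 'more than two' honor cards."""
    return min(sum(rank in HONOR_RANKS for rank, _ in hand), 3)


def deal_one_hand_per_shuffle(n_hands, rng):
    """Reshuffle before every hand and keep only the first 13 cards."""
    counts = [0, 0, 0, 0]
    for _ in range(n_hands):
        deck = DECK[:]
        rng.shuffle(deck)
        counts[honor_category(deck[:13])] += 1
    return counts


def deal_four_hands_per_shuffle(n_shuffles, rng):
    """Deal all four 13-card hands from each shuffle."""
    counts = [0, 0, 0, 0]
    for _ in range(n_shuffles):
        deck = DECK[:]
        rng.shuffle(deck)
        # The four hands share one deck, so their honor counts always
        # add up to 20 and are therefore not independent.
        for seat in range(4):
            counts[honor_category(deck[13 * seat:13 * (seat + 1)])] += 1
    return counts


rng = random.Random(2024)
print("one hand per shuffle:  ", deal_one_hand_per_shuffle(160, rng))
print("four hands per shuffle:", deal_four_hands_per_shuffle(40, rng))
```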

10.C.6. Religious Preference and Political Affiliation. Is there a relationship between religious preference and political affiliation?

  1. (a)

    Using five religious categories and three political preferences, discuss how to design an experiment to address this question.

  2. (b)

    State the hypothesis of interest here.

  3. (c)

    Conduct the experiment and use your data to test the hypothesis in (b).

10.C.7. Fair Die? Roll a six-sided die 120 times and record the frequency with which each of the numbers 1, 2, 3, 4, 5, and 6 occurs. Using your data, find the P-value for a test of the hypothesis that the die is fair (i.e., that the outcomes are equally likely).
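As a cross-check on the chi-square approximation, the P-value for the die can also be estimated by simulation: generate many sets of 120 fair rolls, compute Q₂ for each, and record how often the simulated Q₂ is at least as large as the observed one. This Monte Carlo approach is a swapped-in technique, not the chapter's procedure. The counts in the example are hypothetical placeholders for your own data, and the (extreme + 1)/(n_sim + 1) estimate is one common convention rather than the only one.

```python
import random


def q2_statistic(counts, probs):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E with E = N p."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def monte_carlo_p_value(counts, probs, n_sim=10_000, seed=0):
    """Simulation-based P-value for the goodness-of-fit hypothesis."""
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    observed_q = q2_statistic(counts, probs)
    extreme = 0
    for _ in range(n_sim):
        simulated = [0] * k
        for outcome in rng.choices(range(k), weights=probs, k=n):
            simulated[outcome] += 1
        # Small tolerance so ties with the observed value count as extreme.
        if q2_statistic(simulated, probs) >= observed_q - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_sim + 1)


# Hypothetical counts of faces 1-6 from 120 rolls; replace with your data.
rolls = [22, 17, 20, 25, 18, 18]
p_value = monte_carlo_p_value(rolls, [1 / 6] * 6)
print(f"Q2 = {q2_statistic(rolls, [1 / 6] * 6):.2f}, simulated P-value = {p_value:.3f}")
```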

10.C.8. Does Your Life Expectancy Depend on Your Month of Birth? Doblhammer and Vaupel (2001) studied the relationship between month of birth and adult life expectancy. They concluded that people born in the Northern Hemisphere in autumn (October–December) live longer than those born in spring (April–June), but that the opposite is true in the Southern Hemisphere. Suppose you were asked to provide the statistical support for these conclusions.

  1. (a)

    State the null and alternative hypotheses of interest here.

  2. (b)

    What data would you need to test the null hypothesis?

  3. (c)

    How would you design an experiment to collect the necessary data?

  4. (d)

    Choose one of the states in the United States to serve as the data source and use public birth and death records to collect a small sample of the needed data.

  5. (e)

    Using your sample data, obtain the approximate P-value for an appropriate test of the null hypothesis.

10.1.4 10.D. Internet Archives

10.D.1. Original Skittles Flavors. In Exercise 10.B.9 we discussed the proportions of the various colors of Peanut M&M’s manufactured by Mars, Inc. Skittles is another popular candy brand produced and marketed by the Wrigley Company, a division of Mars, Inc. Search the Internet to discover which flavors (colors) make up the original Skittles and in what proportions the Wrigley Company claims they are produced. Buy ten individual packages of the original Skittles and count the number of pieces of each color in these ten packages combined. Using these counts, test the hypothesis that the flavor proportions claimed by the Wrigley Company are correct. Then you can enjoy your Skittles!

10.D.2. Tropical Skittles Flavors. Repeat Activity 10.D.1 for the variety of Tropical Skittles.

10.D.3. Starburst Flavors. Starburst is another candy brand produced and marketed by the Wrigley Company, a division of Mars, Inc. Search the Internet to discover which flavors make up the original Starburst and in what proportions the Wrigley Company claims they are produced. Buy ten individual packages of the original Starburst and count the number of pieces of each flavor (color) in the ten packages combined. Using these counts, test the hypothesis that the flavor proportions claimed by the Wrigley Company are correct. Then you can enjoy your Starburst!

10.D.4. Freshman Party Schools: Updated. In Exercise 10.B.12 you were asked to assess whether, in 1995, there were any differences among the types of four-year institutions with respect to partying by their freshmen. Search the Internet to find a published article that addresses a similar question for a more recent year. Discuss the findings of this update and compare them with the results for 1995.

10.D.5. Search the Internet to find a journal article that reports on a research study in which the data collected were used to test for differences in population proportions, as discussed in Sect. 1. Prepare a brief summary of the study and the associated statistical analyses carried out by the authors.

10.D.6. Search the Internet to find a journal article that reports on a research study in which the data collected were used to test for association (independence) between two categorical attributes, as discussed in Sect. 2. Prepare a brief summary of the study and the associated statistical analyses carried out by the authors.
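When checking an article's analysis, it can help to recompute the Pearson chi-square test of independence from the published counts. The Python sketch below does this for a two-way table of any size. Each expected count is (row total × column total)/n, and the degrees of freedom are (number of rows − 1) × (number of columns − 1). The 2 × 3 table is hypothetical, and the `chi2_sf` helper is part of this sketch rather than a function from the text.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    total, term = 0.0, 1.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def chi2_independence(table):
    """Pearson chi-square test of independence for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# HYPOTHETICAL 2 x 3 table: rows = two groups, columns = three response categories
table = [
    [20, 30, 50],
    [30, 30, 40],
]
stat, df, p_value = chi2_independence(table)
print(f"X^2 = {stat:.3f}, df = {df}, P-value = {p_value:.3f}")
```

These illustrative counts give X² ≈ 3.11 on 2 degrees of freedom and a P-value of about 0.21.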

10.D.7. Search the Internet to find a journal article that reports on a research study in which the data collected were used to test for goodness-of-fit for probabilities in a multinomial distribution with I > 2 categories, as described in Sect. 4. Prepare a brief summary of the study and the associated statistical analyses carried out by the authors.

10.D.8. Search the Internet to find a journal article that reports on a research study in which the data collected were used to test for a difference between two population proportions, as described in Sect. 3. Prepare a brief summary of the study and the associated statistical analyses carried out by the authors. If they used the approximate test procedure discussed in Sect. 1, repeat their analyses using the exact test procedure presented in Sect. 3. Compare the results of the exact and approximate tests.
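The Python sketch below compares the two kinds of procedure on a small example. It computes a one-sided exact P-value from the hypergeometric distribution, which conditions on the total number of successes, alongside the one-sided large-sample z approximation. This assumes the exact procedure of Sect. 3 is of this Fisher type, so check the text before relying on it. The 2 × 2 counts are hypothetical and were chosen small enough that the approximation is questionable.

```python
import math


def exact_upper_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """One-sided exact P-value P(X >= x1) for Ha: p1 > p2.

    Given the total number of successes t = x1 + x2, the number of successes
    X in sample 1 is hypergeometric under H0: p1 = p2.
    """
    n, t = n1 + n2, x1 + x2
    denom = math.comb(n, n1)
    return sum(math.comb(t, k) * math.comb(n - t, n1 - k)
               for k in range(x1, min(n1, t) + 1)) / denom


def approx_upper_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """One-sided large-sample P-value from the pooled two-proportion z statistic."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 0.5 * math.erfc(z / math.sqrt(2))


# HYPOTHETICAL data: 7 successes out of 10 in sample 1, 2 out of 10 in sample 2
exact = exact_upper_p(7, 10, 2, 10)
approx = approx_upper_p(7, 10, 2, 10)
print(f"exact P = {exact:.4f}, approximate P = {approx:.4f}")
```

With these counts the exact one-sided P-value is about 0.035, while the normal approximation gives about 0.012. The expected cell counts (4.5 and 5.5) are at or below the usual minimum of 5, which is the kind of situation in which the approximation is least trustworthy.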

10.D.9. Gallup, Inc., is an American research-based global performance-management consulting company that “provides data-driven news based on U.S. and world polls, daily tracking and public opinion research”. Its website www.gallup.com contains information about current and past public opinion on education, politics, the economy, and wellbeing, as well as other topics. Go to the website and find a report on a topic of interest to you that involves categorical data as discussed in this chapter. Prepare a short summary of Gallup’s sampling methods, data collection, and statistical analyses as described in the report.

10.D.10. The Pew Research Center is a nonpartisan American organization that uses “public opinion polling, demographic research, content analysis, and other empirical social science research” to inform the public about “the issues, attitudes and trends shaping America and the world”. Its interests include U.S. politics and policy; Internet, science and technology; religion and public life; and social and demographic trends. The website www.pewresearch.org contains past and current information it has gathered about these and other topics. Go to the website and find a report on a topic of interest to you that involves categorical data as discussed in this chapter. Prepare a short summary of the Pew Research Center’s sampling methods, data collection, and statistical analyses as described in the report.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Wolfe, D.A., Schneider, G. (2017). Statistical Inference for Two-Way Tables of Count Data. In: Intuitive Introductory Statistics. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-56072-4_10
