Abstract
Categorical variables, including nominal (where numbers are simply labels) and ordinal, rank order variables, are described by tabulating their frequencies or probability. If two categorical variables are associated, the frequencies of values of one will depend on the frequencies of values of the other. Chi square tests the hypothesized association between two categorical variables and contingency analysis quantifies their association.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Author information
Authors and Affiliations
3.1 Electronic Supplementary Material(s)
Appendices
Excel 3.1: Construct Crosstabulations and Assess Association Between Categorical Variables with PivotTables and PivotCharts
Vastly Tree Free Paper Products
In order to explore the possible association between spend on paper products and household size, begin by making a PivotTable to see the crosstabulation.
Open Vastly.xlsx.
Select filled cells in the Number in Household and Spend categories columns and then insert a PivotTable.
Drag Number in Household to ROW, Spend category to COLUMN, and Spend category to ∑ Values.
Managers agreed that those who reported spending less than $5 monthly on paper products were not of interest, or were not purchasers of paper products. Remove them by clicking the column labels box and unselecting that category.
Reorder spend categories. Select the column label and drag to the position desired.
Excel 3.2: Use Chi Square to Test Association
For chi square, make a table of expected cell counts and a table of cell contributions to chi square.
Find expected counts from the row, column and grand totals.
In J5, enter the formula for the expected count, multiplying cells containing the Grand Total of households of 1, J5, and the Grand Total of spend category 5 to 15, B10, and then dividing by the Grand Total, I10:
Down fill and right fill in the remaining cells in the table with this formula.
In a third table, find each cell’s contribution to chi square, the squared difference between expected counts, in the second table, and actual counts, in the first table, divided by expected counts in the second table.
In the first cell, O5, of the third contributions to chi square table, enter:
Down fill the column and right fill the rows:
Use the Excel function SUM(array1,array2) to find contributions to chi square from each of the household sizes.
In X5,=sum(Q5:W5), then downfill.
Add the cell contributions in the 5 to 15 spend category column:
In Q10, =sum(q5:q9).
Right fill to add contributions to chi square from the other spend categories and chi square:
Find the p value for this chi square using the Excel function
CHISQ.DIST.RT(chisquare,df) with degrees of freedom df of 24
To see the conditional probabilities of spending categories given number in household category, convert cell counts to % of row.
From a data cell in the table, right click, Show Values As, % Row Total.
Make a PivotChart of Spend category by number in household: PivotTable Analyze, PivotChart. Choose a 100% stacked column.
Choose gradient colors:
Add axis titles and a stand alone title featuring your conclusion, as shown in Fig. 3.7:
Excel 3.3: Conduct Contingency Analysis with Three Categories
To control for household size, make a new pivot table with all three variables, purchase likelihood, spend category and household size. Drag household size, then spend category, to the rows. Drag purchase likelihood to the columns and ∑ values. Again, unselect spend <5 and drag spend categories so that they are in increasing order.
Find expected counts for each of the household sizes. Use household size subtotals instead of the grand total.
To easily see sparse cells, select the expected cells, then choose Conditional Formatting, with Less Than, 5:
Six of the cells have fewer than five expected.
To begin eliminating the sparse cells, combine 100+ with 60 to 100.
Select A11:A12, then combine.
PivotTable Analyze, Group Selection:
In the PivotTable Analyze menu, choose the collapse icon to remove the unnecessary labels:
Change the Group 1 label to 60+Â .
Update expected counts.
Two sparse cells remain. Combining spend of 16 to 24 and 5 to 15 and spend of 25 to 30 with 31 to 49 will eliminate those.
Update expected counts.
Find contributions to chi square, then sum by rows and by columns to find separate chisquares (with three degrees of freedom) and pvalues for each of the household size categories.
Controlling for household size reveals that only among the largest 5+ households is there a significant association between spend and purchase likelihood. Larger spenders are more likely to purchase, among households of 5 or more.
To illustrate, first show counts as percents of row total. Then copy cells in A and B and paste special (without formulas below the PivotTable. Move subtotals to a third column for each household size:
Select these new cells and insert a column chart:
Select the chart, then in Design, Change the Chart type to Combo:
Reduce decimals in the table (which will reduce decimals in the graph), add vertical and horizontal axis titles and chart title.
Case 3.1: Generation, Purchase Frequency and Paper Spend
Tranlin managers seek to identify the generations that spend more than average on paper products, and learn whether promising generations purchase more frequently.
A concept test was conducted with 1522 consumers who reacted to a verbal description of Tranlin’s sustainable toilet tissue. Data in Vastly 2023.xlsx contain consumer paper spend, generation and purchase frequency.
Managers sense that Millennials and Gen Z are likely to be the biggest spenders. If you encounter sparse cells, avoid combining these two generations with each other or with other generations in case they differ.
Spend by Generation
-
1.
Is paper spend associated with generation? Cite chi square with dfs and pvalue, as well as your conclusion.
-
2.
Produce a 100% stacked column chart of %s in spending categories by generation. Focus on the highest spenders in your chart title.
Purchase Frequency by Generation
-
3.
Is purchase frequency associated with generation? Cite chi square with dfs and pvalue, as well as your conclusion.
-
4.
Produce a 100% stacked column chart of %s in purchase frequency categories by generation. Focus on the most frequent consumers in your chart title.
Case 3.2: Contingency Analysis Case: Generation, Household Size and Interest in Vastly Sustainable Toilet Tissue
Tranlin managers seek to identify the target market(s) to whom their sustainable paper concepts appeal.
A concept test was conducted with 1522 consumers who reacted to a verbal description of Tranlin’s sustainable toilet tissue. Consumers were asked to rate their interest in purchasing Vastly toilet tissue. Data in Vastly.xlsx contain consumer purchase likelihood, generation and household size. Generations, from youngest to oldest, are: Gen Z, Millennials, Gen X, Baby Boomers, Oldest. Order Generations this way in your analysis and graphs.
Purchase Likelihood by Generation
Test the hypothesis that purchase likelihood is associated with generation.
-
1.
Is purchase likelihood associated with generation? Cite chi square with degrees of freedom and pvalue to support your answer.
-
2.
Produce a column chart of % likely to purchase by generation, including a reference line of the percent interested overall. (Since there are just two interest categories, show only Likely Trier percents.)
Generation by Household Size
Prior analysis revealed that larger households are more likely to purchase.
-
3.
Do Millennials or Gen Z live in larger households?
Test the hypothesis that household size is associated with generation in this sample.
-
4.
Is household size associated with generation? Cite chi square with degrees of freedom and pvalue to support your answer.
-
5.
Produce a stacked column chart of % of household sizes by generation.
Conclusions
-
6.
Which should Vastly target?
-
___ women in larger Gen Z households
-
___ women in larger Millennial households
-
___women in larger households in both Gen z and Millennials
-
___ women in larger households in all generations
-
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Fraser, C. (2024). Association Between Categorical Variables: Contingency Analysis with Chi Square. In: Business Statistics for Competitive Advantage with Excel and JMP . Springer, Cham. https://doi.org/10.1007/978-3-031-42555-4_3
Download citation
DOI: https://doi.org/10.1007/978-3-031-42555-4_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42554-7
Online ISBN: 978-3-031-42555-4
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)