Simpson’s paradox in GDP and per capita GDP growths

Empirical Economics

Abstract

Simpson’s paradox occurs frequently in economic data analysis, in which aggregation is a common practice. Yet the paradox is not well known among researchers in economics. In this article, we present several real-world examples of Simpson’s paradox in economic statistics, including aggregations of gross domestic product (GDP) growth and per capita GDP growth across developing and developed countries. These manifestations of Simpson’s paradox highlight some important issues in developing economies and have implications for social and economic policies. We also present Simpson’s paradox for continuous variables and its relationship with ecological correlation, using empirical economic data. We show that failure to recognize Simpson’s paradox and ecological correlation can cause inaccurate interpretations of economic data. Furthermore, even when one recognizes Simpson’s paradox in the data, one may still make wrong interpretations or wrong policy or business decisions. We recommend causal analysis to discern the confounding variable(s) and the application of sound statistical modeling in the face of such a paradox.
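The aggregation effect behind per capita GDP growth can be sketched with two hypothetical country groups. The GDP and population figures below are invented for illustration (they are not the World Bank data used in the article), and the pooled per capita GDP is assumed to be total GDP divided by total population. The names `Group`, `per_capita_growth`, and `aggregate_per_capita_growth` are hypothetical helpers introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Group:
    """Hypothetical country group; all figures are illustrative assumptions."""
    gdp_0: float  # GDP in year 0 (arbitrary units)
    pop_0: float  # population in year 0
    gdp_1: float  # GDP in year 1
    pop_1: float  # population in year 1

    def per_capita_growth(self) -> float:
        """Growth rate of GDP per capita from year 0 to year 1."""
        return (self.gdp_1 / self.pop_1) / (self.gdp_0 / self.pop_0) - 1.0


def aggregate_per_capita_growth(groups: Sequence[Group]) -> float:
    """Per capita growth of the pooled groups: total GDP over total population."""
    y0 = sum(g.gdp_0 for g in groups)
    p0 = sum(g.pop_0 for g in groups)
    y1 = sum(g.gdp_1 for g in groups)
    p1 = sum(g.pop_1 for g in groups)
    return (y1 / p1) / (y0 / p0) - 1.0


# Invented numbers: a small rich group and a large, faster-growing poor group.
developed = Group(gdp_0=40.0, pop_0=1.0, gdp_1=40.8, pop_1=1.0)
developing = Group(gdp_0=10.0, pop_0=5.0, gdp_1=11.0, pop_1=5.3)

print(f"developed:  {developed.per_capita_growth():+.2%}")
print(f"developing: {developing.per_capita_growth():+.2%}")
print(f"aggregate:  {aggregate_per_capita_growth([developed, developing]):+.2%}")
```

In this made-up example both groups raise their per capita GDP (about +2.00% and +3.77%), yet the pooled per capita GDP falls (about -1.33%), because population growth in the lower-income group shifts the population mix toward it.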

Fig. 1
Fig. 2

References

  • Alin A (2010) Simpson’s paradox. Comput Stat Wiley Interdiscip Rev 2(2):247–250

  • Anderson DR, Sweeney DJ, Williams TA (2009) Statistics for business and economics, 10th revised edn. South-Western College Pub

  • Bickel PJ, Hammel EA, O’Connell JW (1975) Sex bias in graduate admissions: data from Berkeley. Science 187:398–404

  • Blyth CR (1972) On Simpson’s paradox and the sure-thing principle. J Am Stat Assoc 67(338):364–366

  • Gehlke CE, Biehl K (1934) Certain effects of grouping upon the size of the correlation coefficient in census tract material. J Am Stat Assoc 29(185 Supplement):169–170

  • Gildenhuys P (2003) The evolution of altruism: Sober/Wilson model. Philos Sci 70:27–48

  • Grotenhuis et al (2011) Robinson’s ecological correlations and the behavior of individuals: methodological corrections. Int J Epidemiol 40(4):1123–1125

  • Hernandez-Diaz S, Schisterman EF, Hernan MA (2006) The birth weight “paradox” uncovered? Am J Epidemiol 164:1115–1120

  • Holland PW (2001) Causal inference and statistical fallacies. In: International encyclopedia of social & behavioral sciences. Pergamon

  • Liu KL, Meng X (2014) Comment: A fruitful resolution to Simpson’s paradox via multi-resolution inference. Am Stat 68(1):17–29

  • Ma YZ (2009) Simpson’s paradox in natural resource evaluation. Math Geosci 41(2):193–213

  • Ma YZ, Ma AM (2011) Simpson’s paradox and other reversals in basketball: examples from 2011 NBA playoffs. Int J Sports Sci Eng 5(3):145–154

  • Ma YZ, Zhang Y (2014) Resolution of happiness-income paradox. Soc Indic Res 119(2):705–721. doi:10.1007/s11205-013-0502-9

  • Paris MG (2012) Two quantum Simpson’s paradoxes. J Phys A 45:132001

  • Pavlides MG, Perlman MD (2009) How likely is Simpson’s paradox? Am Stat 63(3):226–233

  • Pearl J (2009) Causality: models, reasoning and inference, 2nd edn. Cambridge University Press, Cambridge

  • Pearson K, Lee A, Bramley-Moore L (1899) Mathematical contributions to the theory of evolution—VI. Genetic (reproductive) selection: inheritance of fertility in man, and of fertility in thorough-bred racehorses. Philos Trans R Soc Lond Ser A 192:257–278

  • Robinson WS (1950) Ecological correlations and the behavior of individuals. Am Sociol Rev 15(3):351–357. doi:10.2307/2087176

  • Schmitt J, Baker D (2009) Is the US unemployment rate today already as high as it was in 1982? Center for Economic and Policy Research, Report 202–293-5380

  • Simon CP, Blume L (1994) Mathematics for economists. W.W. Norton and Company, New York

  • Simpson EH (1951) The interpretation of interaction in contingency tables. J R Stat Soc Ser B 13:238–241

  • Son HH (2012) A welfare-based approach to aggregating growth rates across countries. Oxf Bull Econ Stat 74(1):152–161

  • Spellman BA, Price CM, Logan JM (2001) How two causes are different from one: the use of (un)conditional information in Simpson’s paradox. Mem Cogn 29:193–208

  • Tuna C (2009) When combined data reveal the flaw of averages. Wall Street J December 2, 2009, Column “The Number Guy”

  • Wagner CH (1982) Simpson’s paradox in real life. Am Stat 36:46–48

  • Wainer H, Brown L (2004) Two statistical paradoxes in the interpretation of group differences: illustrated with medical school admission and licensing data. Am Stat 58(2):117–123

  • Wikipedia (2012) http://en.wikipedia.org/wiki/Indian_rupee. Last Accessed 22 Feb 2013

  • Wilcox AJ (2006) Invited commentary: the perils of birth weight—a lesson from directed acyclic graphs. Am J Epidemiol 164(11):1121–1123

  • Wilcox AJ (2002) The analysis of birth weight and infant mortality: an alternative hypothesis. Epidemiology Branch, National Institute of Environmental Health Sciences, Durham, NC

  • World Bank (2011) http://data.worldbank.org/. Last Accessed 5 Feb 2013

  • World Bank (2014) http://data.worldbank.org/. Last Accessed 10 July 2014

  • Yule GU (1903) Notes on the theory of association of attributes in statistics. Biometrika 2:121–134

  • Yule GU, Kendall MG (1968) An introduction to the theory of statistics, 14th edn Revised and Enlarged, Fifth Impression. Hafner Pub. Co., New York

Acknowledgments

The author thanks Dr. Nikita Chugunov, Ernest Gomez, Dave Phillips, Andrew Ma, and Dr. Thomas Jones for discussing some of the data and for reading and commenting on the manuscript. The opinions expressed in this article are those of the author only; they do not necessarily reflect the opinions of his institution.

Author information

Correspondence to Y. Zee Ma.

Appendices

Appendix 1: Mathematical formulation

Given three random events, \(A, B,\) and \(C\), and their complements, \({A}^{\mathrm{c}},\,{B}^{\mathrm{c}},\, {C}^{\mathrm{c}}\), Simpson’s paradox occurs when

$$\begin{aligned} P(A|B \& C) &> P(A|B^{\mathrm{c}} \& C) \\ P(A|B \& C^{\mathrm{c}}) &> P(A|B^{\mathrm{c}} \& C^{\mathrm{c}}) \end{aligned} \tag{4}$$

and yet

$$P(A|B) \le P(A|B^{\mathrm{c}}) \tag{5}$$
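Inequalities (4) and (5) can be checked numerically from a table of counts. The sketch below uses invented counts (not data from the article), estimates each conditional probability as a cell proportion, and also reports P(C|B) and P(C|B^c) to show that B and C are dependent in this example. The helper names `p_a_given` and `p_c_given` are hypothetical.

```python
from fractions import Fraction
from typing import Optional

# Invented counts for illustration only. Key: (B or Bc, C or Cc).
# Value: (units in which event A occurred, total units in that cell).
counts = {
    ("B", "C"): (90, 100),
    ("Bc", "C"): (800, 1000),
    ("B", "Cc"): (300, 1000),
    ("Bc", "Cc"): (20, 100),
}


def p_a_given(b: str, c: Optional[str] = None) -> Fraction:
    """Estimate P(A | b & c); if c is None, pool both C cells to get P(A | b)."""
    cells = [k for k in counts if k[0] == b and (c is None or k[1] == c)]
    a_count = sum(counts[k][0] for k in cells)
    n = sum(counts[k][1] for k in cells)
    return Fraction(a_count, n)


def p_c_given(b: str) -> Fraction:
    """Estimate P(C | b), the share of b-units that fall in cell C."""
    n_c = counts[(b, "C")][1]
    return Fraction(n_c, n_c + counts[(b, "Cc")][1])


ineq4 = (p_a_given("B", "C") > p_a_given("Bc", "C")
         and p_a_given("B", "Cc") > p_a_given("Bc", "Cc"))
ineq5 = p_a_given("B") <= p_a_given("Bc")

print("(4) holds:", ineq4, " (5) holds:", ineq5)
print("P(A|B) =", round(float(p_a_given("B")), 3),
      " P(A|Bc) =", round(float(p_a_given("Bc")), 3))
print("P(C|B) =", p_c_given("B"), " P(C|Bc) =", p_c_given("Bc"))
```

With these counts, P(C|B) = 1/11 while P(C|B^c) = 10/11: most B-units sit in the low-rate C^c cell and most B^c-units in the high-rate C cell, so the pooled comparison reverses the within-cell comparisons.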

The paradox can arise only when the random events B and C are dependent. If B and C are independent, inequalities (4) and (5) cannot hold simultaneously. Thus, the paradox results from the dependence, or interaction, between B and C (Simpson 1951; Blyth 1972). Event C can be interpreted as a confounding variable that alters the conditional relationships when the conditional cells are collapsed into marginal cells. Note that C can represent one physical variable or the composite effect of several variables that jointly cause the confounding. By symmetry, the paradox can also occur with all the inequality signs in (4) and (5) reversed.
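A short derivation using only the law of total probability shows why independence of B and C rules out the reversal (assuming every conditioning event has positive probability, and writing w for P(C), a symbol introduced here for illustration). First,

$$\begin{aligned} P(A|B) &= P(A|B \& C)\,P(C|B) + P(A|B \& C^{\mathrm{c}})\,P(C^{\mathrm{c}}|B) \\ P(A|B^{\mathrm{c}}) &= P(A|B^{\mathrm{c}} \& C)\,P(C|B^{\mathrm{c}}) + P(A|B^{\mathrm{c}} \& C^{\mathrm{c}})\,P(C^{\mathrm{c}}|B^{\mathrm{c}}) \end{aligned}$$

If B and C are independent, then P(C|B) = P(C|B^c) = w with 0 < w < 1, and subtracting the two lines gives

$$P(A|B) - P(A|B^{\mathrm{c}}) = w\,\big[P(A|B \& C) - P(A|B^{\mathrm{c}} \& C)\big] + (1-w)\,\big[P(A|B \& C^{\mathrm{c}}) - P(A|B^{\mathrm{c}} \& C^{\mathrm{c}})\big]$$

Both brackets are positive by (4) and both weights are positive, so the difference is positive, which contradicts (5). When B and C are dependent, the two lines use different weights, and that difference in weighting is what allows the reversal.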

Several boundary cases exist. When inequality (5) holds as an equality, the aggregate of two larger (or smaller) quantities equals the aggregate of two smaller (or larger) quantities. Another boundary case occurs when both inequalities in (4) hold as equalities while (5) remains a strict inequality in either direction.

The main reason people are perplexed by the reversal is that they intuitively confuse aggregation with arithmetic addition. In fact, the paradox can be formulated more simply with ratios. Given positive quantities a, b, c, d, A, B, C, and D such that

$$a/b < A/B \quad \text{and} \quad c/d < C/D \tag{6}$$

Simpson’s paradox occurs when

$$(a+c)/(b+d) \ge (A+C)/(B+D) \tag{7}$$
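The ratio form (6)–(7) can be tested directly. This sketch assumes integer inputs with positive denominators and uses exact fractions so that the comparisons are not affected by rounding; the function name `simpson_reversal` and the numbers in the call are hypothetical.

```python
from fractions import Fraction


def simpson_reversal(a: int, b: int, c: int, d: int,
                     A: int, B: int, C: int, D: int) -> bool:
    """True if (6) a/b < A/B and c/d < C/D, yet (7) (a+c)/(b+d) >= (A+C)/(B+D).

    Inputs are integer counts with positive denominators b, d, B, D;
    Fraction keeps every comparison exact.
    """
    within = Fraction(a, b) < Fraction(A, B) and Fraction(c, d) < Fraction(C, D)
    pooled = Fraction(a + c, b + d) >= Fraction(A + C, B + D)
    return within and pooled


# Invented numbers: 1/5 < 6/20 and 9/10 < 19/20, yet 10/15 >= 25/40.
print(simpson_reversal(1, 5, 9, 10, 6, 20, 19, 20))  # → True
```

The pooled ratio (a+c)/(b+d) is a weighted average of a/b and c/d with weights b/(b+d) and d/(b+d). In the example, the left side puts weight 2/3 on its high ratio c/d, while the right side puts only 1/2 on C/D, which is enough to flip the order.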

The reader can find other mathematical formulations of the paradox and related boundary conditions in Pavlides and Perlman (2009).

Appendix 2: Birth weight paradox

The birth weight paradox, as an example of Simpson’s paradox, appears when comparing infant mortality across two or more groups of mothers, such as smokers and nonsmokers, different races, or different social status groups (Wilcox 2002, 2006). It is generally agreed that lower birth weight is strongly correlated with the risk of infant mortality. The Wilcox-Russell hypothesis describes the relationship between birth weight and infant mortality (see Fig. 3).

Fig. 3

(a) Mortality rates for the two groups (solid line: Group A in Table 7), illustrating the third variable’s double effect (reducing birth weight and increasing mortality risk); (b) frequency distribution of birth weight for the two groups (adapted from Wilcox 2002, 2006). The figure depicts the so-called Wilcox-Russell hypothesis.

Consider Group A as the maternal nonsmoking population and Group B as the maternal smoking population. At the same birth weight, mortality of infants born to smokers is generally lower than that of infants born to nonsmoking mothers, as shown in Table 7. If these conditional associations were genuine, maternal smoking would reduce the risk of infant mortality, and one would conclude that maternal smoking is good prenatal care for infant survival.

Table 7 Illustration of birth weight paradox as a manifestation of Simpson’s paradox

In fact, maternal smoking both reduces infants’ birth weights and increases their mortality risk, but the birth weight reduction obscures the mortality effect (see the two separate effects in Fig. 3). At a given birth weight, infants of smokers have lower mortality because the reduction in their birth weights masks the increased mortality. Had smoking not decreased the infants’ birth weights, the within-class comparisons conditioned on birth weight would show higher mortality rates for infants of smokers.

The overall mortality of infants in the maternal smoking group is higher than that of infants of nonsmoking mothers; here inference from the aggregated data is sound, whereas the within-class comparisons are spurious. It is possible to adjust for the confounding variable so that the within-class comparisons become meaningful. The adjustment, however, is not always straightforward and can distort the results (Hernandez-Diaz et al. 2006; Wilcox 2006).
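The arithmetic of this reversal can be reproduced with a hypothetical two-stratum table. The counts below are invented for illustration and are not the values of Table 7; the sketch shows only the pattern (lower mortality for smokers’ infants within each birth weight stratum, higher mortality overall). It does not model the causal mechanism and does not perform the confounder adjustment discussed above. `DATA`, `stratum_rates`, and `overall_rate` are hypothetical names.

```python
# Invented (deaths, births) by maternal smoking status and birth-weight
# stratum, for illustration only; these are not the Table 7 values.
DATA = {
    "smoker": {"low": (40, 400), "normal": (3, 600)},
    "nonsmoker": {"low": (12, 100), "normal": (6, 900)},
}


def stratum_rates(group: str) -> dict:
    """Mortality rate within each birth-weight stratum for one group."""
    return {s: deaths / births for s, (deaths, births) in DATA[group].items()}


def overall_rate(group: str) -> float:
    """Crude (pooled) mortality rate for one group across all strata."""
    deaths = sum(d for d, _ in DATA[group].values())
    births = sum(n for _, n in DATA[group].values())
    return deaths / births


for group in DATA:
    rates = {s: round(r, 4) for s, r in stratum_rates(group).items()}
    print(f"{group:>9}: by stratum {rates}, overall {overall_rate(group):.4f}")
```

The reversal comes from the mix of births: 40% of the hypothetical smokers’ births fall in the low-weight stratum versus 10% for nonsmokers, so the smokers’ overall rate is dominated by the high-risk stratum.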

According to Wilcox (2002), high altitude can also lead to lower birth weight. Colorado-born infants, for instance, have a lower average birth weight than the US national average. In each birth weight bracket, infants born in Colorado have lower mortality, yet the overall mortality of Colorado’s infants is the same as the US national average. In other words, the third variable (high altitude) reduces birth weight but neither increases nor decreases mortality. Here the conditional associations are spurious, and the marginal association is the more meaningful one.
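The Colorado example follows from writing overall mortality as a weighted average of bracket-specific rates. With m_k the mortality rate in birth weight bracket k and w_k the share of births falling in that bracket (notation introduced here for illustration),

$$m = \sum_{k} w_k\, m_k, \qquad \sum_{k} w_k = 1$$

Colorado has a lower m_k in every bracket, but high altitude shifts its weights w_k toward the lighter brackets, where m_k is highest; the two effects can offset each other, leaving the overall m equal to the national average.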

Cite this article

Ma, Y.Z. Simpson’s paradox in GDP and per capita GDP growths. Empir Econ 49, 1301–1315 (2015). https://doi.org/10.1007/s00181-015-0921-3

  • DOI: https://doi.org/10.1007/s00181-015-0921-3
