Abstract
The social sciences face a problem of sample non-representation, where the majority of samples consist of undergraduate students from Euro-American institutions. The problem has been identified for decades with little sign of improvement. In this paper, I trace the history of sampling theory. The dominant framework, called the design-based approach, takes random sampling as the gold standard. The idea is that a sampling procedure that is maximally uninformative prevents samplers from introducing arbitrary bias, thus preserving sample representation. I show how this framework, while good in theory, faces many challenges in application. Instead, I advocate for an alternative framework, called the model-based approach to sampling, where representative samples are those balanced in composition, however they were drawn. I argue that the model-based framework is more appropriate in the social sciences because it allows for systematic assessment of imperfect samples and methodical improvement in resource-limited scientific contexts. I end with practical proposals for improving sample quality in the social sciences.
Notes
The popular story told in statistics textbooks is that the Literary Digest used its own subscriber list, automobile registration lists and telephone books to choose its sample, and hence was biased towards wealthy Republicans (e.g., Likert 1948; Mendenhall et al. 1971). This story is disputed by Bryson (1976), who instead favours an explanation in terms of nonresponse bias.
In particular, Bowley and Fisher remained skeptical; see Brewer (2013).
I shall use the terms “random” and “probabilistic” interchangeably. Practically speaking, random selection implies that every element of the population has an equal chance of being included in the sample, whereas probabilistic selection allows that chance to differ from element to element. However, probabilistic sampling is almost always accompanied by a correction procedure in which elements with a greater chance of selection are weighted less in analysis. Theoretically, the two methods are the same.
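The correction procedure mentioned here can be sketched in code. The function below is an illustrative inverse-probability-weighted mean (in the spirit of Hájek-style estimation), not anything given in the paper; the function name and the toy data are my own. It shows why the two methods coincide in theory: with equal inclusion probabilities the estimate reduces to the plain sample mean, while unequal probabilities are compensated by downweighting.

```python
def weighted_mean(values, inclusion_probs):
    """Estimate a population mean from a probabilistic sample: elements
    with a greater chance of selection are weighted less in analysis."""
    weights = [1.0 / p for p in inclusion_probs]  # inverse-probability weights
    total = sum(w * v for w, v in zip(weights, values))
    return total / sum(weights)

sample = [2.0, 4.0, 6.0]

# Equal inclusion probabilities: the estimate is just the sample mean.
print(weighted_mean(sample, [0.1, 0.1, 0.1]))  # 4.0

# Unequal probabilities: the first element was twice as likely to be
# selected, so it counts half as much, shifting the estimate upward.
print(weighted_mean(sample, [0.2, 0.1, 0.1]))  # 4.4
```

Under this correction, random sampling is simply the special case in which all the weights are equal.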
It seems that other statisticians, such as Bowley, attempted to provide mathematical foundations for sampling before Neyman. However, Neyman does not discuss these alternative approaches in detail in his (1934) paper, and his paper is widely considered the statistical landmark (see, e.g., Rao and Fuller 2017; Srivastava 2016). It seems reasonable to conclude that whatever mathematical foundations of survey sampling existed before Neyman, adequate or not, they have had limited historical influence.
A reviewer has pointed out that the validity and interpretation of Segall’s results have been disputed. Indeed, it is a persistent difficulty to determine whether an observed difference is due to a difference in sample composition, methodological variation, or a number of other factors deemed irrelevant. One goal of the framework advocated in this paper is to help better systematize the variations in sample composition so as to facilitate better hypothesis testing regarding the source of a variation.
In certain special cases and with strong additional assumptions, a method may guarantee uniform convergence, where estimation always improves with increased sample size. When that happens, one can obtain an \(\varepsilon \)-\(\delta \) bound on how far “off” we can be at a given confidence threshold. However, this option is only open to fields where it is easy to repeatedly gather large, truly random samples, which is unrealistic for the social sciences.
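For concreteness, such a bound takes the following form (the notation is mine, not the paper's; \(\hat{\theta}_n\) stands for the estimate from a sample of size \(n\) and \(\theta \) for the population quantity being estimated):

```latex
% For any tolerance \varepsilon > 0 and confidence threshold \delta > 0,
% there exists a sample size N(\varepsilon, \delta) such that
\[
  \Pr\bigl( \lvert \hat{\theta}_n - \theta \rvert > \varepsilon \bigr) < \delta
  \qquad \text{for all } n \ge N(\varepsilon, \delta).
\]
```

That is, past a known sample size, the probability of being more than \(\varepsilon \) “off” is guaranteed to stay below \(\delta \).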
Neyman’s original analysis was based on stratified versions of random and purposive sampling. In his rendition of purposive selection, each stratum was sampled such that the mean of Y in the stratum sample equaled the mean of Y in the overall stratum. Allowing the means of Y to differ among strata, Neyman’s description of stratified purposive sampling is equivalent to sampling from the entire population in a way that the sample distribution of Y matches the population distribution of Y.
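The purposive balancing described here can be illustrated with a small sketch. The greedy procedure below is a hypothetical illustration of my own (the function name and data are not from Neyman): it selects a sample whose mean of the auxiliary variable Y tracks the population mean of Y, which is the within-stratum condition in Neyman's rendition of purposive selection.

```python
def balanced_sample(population_y, n):
    """Greedily pick n elements so that the running sample mean of Y
    stays as close as possible to the population mean of Y."""
    target = sum(population_y) / len(population_y)  # population mean of Y
    remaining = list(population_y)
    chosen = []
    for _ in range(n):
        # Pick the element that keeps the sample mean nearest the target.
        best = min(
            remaining,
            key=lambda y: abs((sum(chosen) + y) / (len(chosen) + 1) - target),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # population mean: 5.5
sample = balanced_sample(population, 4)
# The sample mean stays close to the population mean of 5.5.
print(sample, sum(sample) / len(sample))
```

Applying this stratum by stratum, with stratum means allowed to differ, recovers the equivalence noted above: the sample distribution of Y is made to match the population distribution of Y.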
Exchangeability is a Bayesian perspective on how random sampling works. The design-based framework is, by and large, developed and used under the frequentist paradigm, where random selection is defined as i.i.d. (independent and identically distributed) sampling, which grounds the application of the Law of Large Numbers. Exchangeability is presented here because it offers a more intuitive description of the inferential process.
A design-model hybrid approach, called model-assisted sampling, was developed not long after the model-based approach itself. The hybrid approach aims to use properties of random selection to help guard against model misspecification (Cassel et al. 1976; see also Brewer 1999). I will not discuss the hybrid approach for two reasons. First, the importance of purposive balancing, which is my main thesis, is equally emphasized in both the model-based and hybrid approaches. Second, the guarding power of the hybrid approach against model misspecification appears only in large samples with relatively good randomization, which is not part of my target.
These estimators are approximately unbiased if the sample is approximately balanced.
Although the acronym “WEIRD” refers to a set of demographic features, the metascientific data Henrich et al. (2010b) relied on primarily concerned where samples were drawn, e.g., from undergraduate psychology classes at the researchers’ universities, supplemented by secondary data on the demographics of students of such universities.
Interpreted from this perspective, the preferential reporting of gender as a control variable brings up a series of questions concerning the presumed roles (and the presumed univocality of such roles) gender plays in shaping behaviour. Similar observations can also be made about the overreporting of some demographic variables and the underreporting of others. Indeed, since design-based principles cannot guide reporting or poststratification, culturally entrenched ideologies often substitute for this role. The philosophical implications of this dynamic are beyond the scope of the current paper but will be the subject of future work.
References
Andrews, G., Henderson, S., & Hall, W. (2001). Prevalence, comorbidity, disability and service utilisation: Overview of the Australian National Mental Health Survey. The British Journal of Psychiatry, 178(2), 145–153.
Arnett, J. J. (2008). The neglected 95%: Why American psychology needs to become less American. American Psychologist, 63(7), 602.
Bornstein, M. H., Jager, J., & Putnick, D. L. (2013). Sampling in developmental science: Situations, shortcomings, solutions, and standards. Developmental Review, 33(4), 357–370.
Brewer, K. (1999). Design-based or prediction-based inference? Stratified random vs stratified balanced sampling. International Statistical Review, 67(1), 35–47.
Brewer, K. (2013). Three controversies in the history of survey sampling. Survey Methodology, 39(2), 249–262.
Bryson, M. C. (1976). The Literary Digest poll: Making of a statistical myth. The American Statistician, 30(4), 184–185.
Cassel, C. M., Särndal, C. E., & Wretman, J. H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika, 63(3), 615–620.
Gächter, S. (2010). (Dis) advantages of student subjects: What is your research question? Behavioral and Brain Sciences, 33(2–3), 92–93.
Godambe, V. (1955). A unified theory of sampling from finite populations. Journal of the Royal Statistical Society: Series B (Methodological), 17(2), 269–278.
Godambe, V. (1966). A new approach to sampling from finite populations. Journal of the Royal Statistical Society: Series B (Methodological), 28(2), 310–328.
Graham, S. (1992). “Most of the subjects were white and middle class”: Trends in published research on African Americans in selected APA journals, 1970–1989. American Psychologist, 47(5), 629.
Hall, C. C. I. (1997). Cultural malpractice: The growing obsolescence of psychology with the changing US population. American Psychologist, 52(6), 642.
Hansen, M. H., Madow, W. G., & Tepping, B. J. (1983). An evaluation of model-dependent and probability-sampling inferences in sample surveys. Journal of the American Statistical Association, 78(384), 776–793.
Hart, C. C., Saperstein, A., Magliozzi, D., & Westbrook, L. (2019). Gender and health: Beyond binary categorical measurement. Journal of Health and Social Behavior, 60(1), 101–118.
Henderson, S., Andrews, G., & Hall, W. (2000). Australia’s mental health: An overview of the general population survey. Australian and New Zealand Journal of Psychiatry, 34(2), 197–205.
Henrich, J., Ensminger, J., McElreath, R., Barr, A., Barrett, C., Bolyanatz, A., et al. (2010a). Markets, religion, community size, and the evolution of fairness and punishment. Science, 327(5972), 1480–1484.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010b). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83.
Jacobi, F., Wittchen, H.-U., Hölting, C., Sommer, S., Lieb, R., Höfler, M., et al. (2002). Estimating the prevalence of mental and somatic disorders in the community: Aims and methods of the German National Health Interview and Examination Survey. International Journal of Methods in Psychiatric Research, 11(1), 1–18.
Kessler, R. C. (1994). The national comorbidity survey of the United States. International Review of Psychiatry, 6(4), 365–376.
Kruskal, W., & Mosteller, F. (1980). Representative sampling, IV: The history of the concept in statistics, 1895–1939. International Statistical Review/Revue Internationale de Statistique, 48(2), 169–195.
Likert, R. (1948). Public opinion polls. Scientific American, 179(6), 7–11.
Mendenhall, W., Ott, L., & Scheaffer, R. L. (1971). Elementary survey sampling. Belmont: Wadsworth Pub. Co.
Mickelson, K. D., Kessler, R. C., & Shaver, P. R. (1997). Adult attachment in a nationally representative sample. Journal of Personality and Social Psychology, 73(5), 1092.
Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4), 558–625.
Peterson, R. A. (2001). On the use of college students in social science research: Insights from a second-order meta-analysis. Journal of Consumer Research, 28(3), 450–461.
Pollet, T. V., & Saxton, T. K. (2018). How diverse are the samples used in the journals ‘Evolution and Human Behavior’ and ‘Evolutionary Psychology’? Evolutionary Psychological Science, 5, 357–368.
Rad, M. S., Martingano, A. J., & Ginges, J. (2018). Toward a psychology of Homo sapiens: Making psychological science more representative of the human population. Proceedings of the National Academy of Sciences, 115(45), 11401–11405.
Rao, J., & Fuller, W. A. (2017). Sample survey theory and methods: Past, present, and future directions. Survey Methodology, 43(2), 145–160.
Royall, R. (1968). An old approach to finite population sampling theory. Journal of the American Statistical Association, 63(324), 1269–1279.
Royall, R. M. (1970). On finite population sampling theory under certain linear regression models. Biometrika, 57(2), 377–387.
Royall, R. M. (1992). The model based (prediction) approach to finite population sampling theory. Lecture Notes-Monograph Series, 17, 225–240.
Royall, R. M., & Herson, J. (1973). Robust estimation in finite populations. Journal of the American Statistical Association, 68(344), 880–893.
Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data base on social psychology’s view of human nature. Journal of Personality and Social Psychology, 51(3), 515.
Segall, M. H., Campbell, D. T., & Herskovits, M. J. (1966). The influence of culture on visual perception. Indianapolis: Bobbs-Merrill.
Seng, Y. P. (1951). Historical survey of the development of sampling theories and practice. Journal of the Royal Statistical Society: Series A (General), 114(2), 214–231.
Sifers, S. K., Puddy, R. W., Warren, J. S., & Roberts, M. C. (2002). Reporting of demographics, methodology, and ethical procedures in journals in pediatric and child psychology. Journal of Pediatric Psychology, 27(1), 19–25.
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123–1128.
Smith, T. (1976). The foundations of survey sampling: A review. Journal of the Royal Statistical Society: Series A (General), 139(2), 183–195.
Smith, T. (1983). On the validity of inferences from non-random samples. Journal of the Royal Statistical Society: Series A (General), 146(4), 394–403.
Smith, T. (1991). Post-stratification. Journal of the Royal Statistical Society: Series D (The Statistician), 40(3), 315–323.
Srivastava, A. (2016). Historical perspective and some recent trends in sample survey applications. Statistics and Applications, 14(1–2), 131–143.
US Census Bureau. (1989). Statistical abstract of the United States, 1989. Suitland: Bureau of the Census.
Westbrook, L., & Saperstein, A. (2015). New categories are not enough: Rethinking the measurement of sex and gender in social surveys. Gender and Society, 29(4), 534–560.
Wintre, M. G., North, C., & Sugar, L. A. (2001). Psychologists’ response to criticisms about research based on undergraduate participants: A developmental perspective. Canadian Psychology/Psychologie Canadienne, 42(3), 216.
Acknowledgements
I am grateful to Cailin O’Connor, Simon Huttegger, Michael Schneider, Greg Lauro, William Stafford, Jan-Willem Romeijn, the members of the philosophy of statistics reading group (in particular, Conor Mayo-Wilson, Samuel Fletcher, and Kathleen Creel), the audience at the Greater Cascadia HPS Workshop, and the audience at the University of Washington for invaluable discussion and encouragement, and two anonymous reviewers for the helpful comments.
Cite this article
Zhao, K. Sample representation in the social sciences. Synthese 198, 9097–9115 (2021). https://doi.org/10.1007/s11229-020-02621-3