A recurrent criticism of psychology as a science is the lack of diversity. This lack of diversity refers not solely to the producers of psychological science (e.g. Adair et al. 2002; Bauserman 1997; Cole 2006) or to the topics studied (e.g. Berry 2013), but also to the samples that are used as the basis for drawing inferences. The criticism has been voiced approximately once a decade since the mid-1960s (e.g. Arnett 2008; Gallander Wintre et al. 2001; Graham 1992; Henrich et al. 2010; Henry 2008; Nielsen et al. 2017; Schultz 1969; Smart 1966). Following a review of the participants in psychological studies, Schultz (1969:218) wrote: ‘The extremely small percentage of studies sampling the general adult population was particularly disturbing; none of the studies published in the Journal of Experimental Psychology during those years used a sample of the general population’.

There is an indisputable geographical bias to the populations sampled by psychologists. For example, a review of articles published between 2006 and 2010 in the three experimental developmental psychology journals with the highest impact factors found that over 90% of the research participants came from Australia, Canada, Europe, the USA, or New Zealand, while under 3% were from Africa, Asia, Central and South America, the Middle East and Israel (Nielsen et al. 2017). Similarly, in the flagship journal ‘Journal of Personality and Social Psychology’, 96% of the papers published in 2012 were based on WEIRD samples (Kurzban 2013). Further, from its inception, psychology has relied heavily on undergraduate samples, a situation that has not changed substantially over time. For example, Gallander Wintre et al. (2001) reviewed 1179 articles spanning six journals across the different subdivisions of psychology and found that 68% of the samples were student samples. They also found that, if anything, the reliance on student samples had increased between 1975 and 1995. A classic paper by Sears (1986) reviewed papers published in 1980 in three mainstream social psychology journals and found that 82% of the samples used students in some form, and 75% used undergraduate students (mainly from the USA) exclusively. Likewise, the 1995 editions of two leading social psychology journals (‘Journal of Experimental Social Psychology’ and ‘Journal of Personality and Social Psychology’) used undergraduate students as participants in 95.8% and 70.6% of all cases respectively (Gallander Wintre et al. 2001), and Arnett (2008) calculated that 74% of the samples in the journal ‘Social Psychological and Personality Science’ were from student populations. The issues relating to sampling are not limited to (social) psychology, and similar concerns have been voiced in other related disciplines such as consumer research (Peterson 2001), education research (Usher 2018), behavioural economics (Levitt and List 2007) and business research (Bello et al. 2009). For example, Peterson (2001) reviewed the literature in consumer research and found that 86% of the samples were from students.

Online Participants

Perhaps in part as a response to these sorts of criticisms, (social) psychologists have increasingly turned to online platforms to recruit participants who are not students (e.g. Gosling et al. 2004; Gosling et al. 2010; Gosling and Johnson 2010). Over the past decade, there has been a strong increase in the use of participants crowdsourced via online platforms, such as Amazon MTurk or CrowdFlower (e.g. Buhrmester et al. 2011; Paolacci and Chandler 2014). This expansion has benefited psychological research in many ways. For example, results from classic behavioural experiments (e.g. the Stroop Task; Stroop 1935) have been shown to replicate well on these online platforms (Crump et al. 2013), and MTurk even appears well-suited to the study of political ideology (Clifford et al. 2015). However, although Amazon boasts more than half a million registered workers, the actual pools from which participants are sampled are much smaller, with the effective population estimated at around 7300 individuals (Bohannon 2016). Moreover, while crowdsourcing platforms such as Amazon’s MTurk allow for more diverse sampling than typical student samples, for example with respect to age range, the participants remain predominantly WEIRD, with some notable exceptions (e.g. Raihani et al. 2013).

Evolutionary Psychology as an Exception?

Since its inception, evolutionary psychology has stressed the importance of human universals (e.g. Buss 1989, 1994, 1995; Cosmides and Tooby 1997; Tooby and Cosmides 1990). Empirical examples where researchers have evaluated whether universals exist by testing across different populations include studies of homicide (Daly and Wilson 1988), economic behaviour (e.g. Henrich et al. 2005) and mate preferences (e.g. Buss et al. 2000; Buss 1989; Schmitt 2005; Shackelford et al. 2005). Cross-cultural universals (see for example those listed in Brown 1991, 2000) are often used by evolutionary psychologists as evidence for adaptive psychological mechanisms (e.g. Buss 1995). It would thus seem, as Apicella and Barrett (2016: p. 92) have argued, that ‘perhaps no field of psychology is more strongly motivated and better equipped than evolutionary psychology to respond to the recent call for psychologists to expand their empirical base beyond WEIRD (Western Educated Industrialized Rich Democratic) samples’. Similarly, Kurzban (2013) argued on the Evolutionary Psychology blog that ‘adding evolution to psychology makes the science less WEIRD’. He found that for the 2012 volume, 65% of the articles in the journal ‘Evolution & Human Behavior’ were WEIRD, which contrasts favourably with data for other fields as cited above. This initial evidence suggests that evolutionary psychology is indeed less WEIRD than some subdivisions of psychology.

Here, we examine the samples used in two leading evolutionary psychology journals in more depth. We have no explicit hypotheses, but rather describe the samples used in these two journals according to their geographical origin, age group (adult or child), student status and source (online vs. offline). In addition, we test whether the sample sizes vary based on these categories. Our aim is to provide an up-to-date snapshot of contemporary evolutionary psychology sampling practice, while responding to calls to increase description within science (Scott-Phillips 2018); it is easier to move forward if we better know where we currently stand.

Methods

Coding

As part of a larger project, data relevant to our research questions were captured from all of the articles published in 2015 and 2016 within the journals ‘Evolutionary Psychology’ (EP, published by Sage) and ‘Evolution & Human Behavior’ (E&HB, published by Elsevier). The 2015 articles were coded by eight coders under the supervision of the first author, and the 2016 articles were coded by the first author. Of course, many key papers on evolutionary psychology are published in other outlets, such as ‘Journal of Personality and Social Psychology’ (e.g. Buss and Shackelford 1997; Kenrick et al. 1995) or ‘Psychological Science’ (e.g. Buss et al. 1992). However, it is reasonable to assume that any article published in EP and E&HB is allied to the discipline of evolutionary psychology, broadly conceived. In addition, the choice follows Kurzban’s (2013) selection of E&HB for analysis and the publication of his analysis within the Evolutionary Psychology blog, on the website of the eponymous journal.

The coders recorded the geographical region from which the data originated based on the M49 UNDP codes (United Nations 2013: Africa, North America, Latin America and the Caribbean, Asia, Australia, Europe and Oceania (excluding Australia)). If a paper listed more than five countries, we labelled it ‘cross-cultural’; for papers with five or fewer geographical samples, we coded each sample individually. We acknowledge that some papers could have an explicitly ‘cross-cultural’ goal even with just two samples, for example studies establishing measurement invariance. However, our focus here is on the samples being used rather than on the paper as a whole, as we believed this to be easier to assess.

After piloting, we settled on the following eight categories for the sample participants: online (paid crowdsourced, such as a sample recruited via MTurk); online (unpaid crowdsourced, such as a sample recruited via Facebook or Twitter); offline (western child); offline (western student); offline (western non-student adult); offline (non-western child); offline (non-western student); offline (non-western, non-student adult). Online studies were subdivided only into paid and unpaid samples given the focus of our research, together with the difficulty of confirming online participants’ age, student/non-student status and location. Samples from Europe, Australia, New Zealand and North America were coded as Western, while samples from other countries were coded as non-Western, following Stulp et al. (2017). Where disagreement between coders existed, this was resolved via discussion.
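
For concreteness, the coding scheme amounts to a single categorical variable with eight levels. The sketch below is purely illustrative: the variable and level names are ours and do not correspond to the archived dataset.

```r
# Illustrative representation of the eight-category coding scheme
# (variable and level names are ours, not those of the archived OSF data)
sample_type_levels <- c(
  "online_paid_crowdsourced",            # e.g. recruited via MTurk
  "online_unpaid_crowdsourced",          # e.g. recruited via Facebook/Twitter
  "offline_western_child",
  "offline_western_student",
  "offline_western_nonstudent_adult",
  "offline_nonwestern_child",
  "offline_nonwestern_student",
  "offline_nonwestern_nonstudent_adult"
)

# Two hypothetical coded samples
samples <- data.frame(
  paper_id    = c("example_paper_1", "example_paper_2"),
  sample_type = factor(c("offline_western_student", "online_paid_crowdsourced"),
                       levels = sample_type_levels)
)
table(samples$sample_type)
```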

Data Analysis

We used R (3.5.1, R Development Core Team 2008) with, among other packages, bindrcpp (0.2.2, Müller 2017), broom (0.5.0, Robinson 2017), dplyr (0.7.6, Wickham and Francois 2017), ggplot2 (3.3.0, Wickham 2009), knitr (1.17, Xie 2015), papaja (0.1.0.9709, Aust and Barth 2016), plyr (1.8.4, Wickham and Wickham 2017), readxl (1.010, Wickham and Bryan 2017), stargazer (5.2.2, Hlavac 2014) and tidyr (0.8.1, Wickham 2014) for our analyses. To compare sample sizes, we relied on non-parametric statistics (Siegel and Castellan 1988), with post-hoc comparisons adjusted for multiple testing (Benjamini and Hochberg 1995), given that visual inspection showed that the data were non-normally distributed. We used logarithmic transformations when presenting figures on sample sizes because the largest samples are orders of magnitude greater than the smallest (Keene 1995). The data and analysis document, including a list of all R packages used in the analysis, are available from the Open Science Framework (http://osf.io/pajhy).
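
The snippet below sketches the presentational choice of a logarithmic axis, using made-up sample sizes rather than the coded data; the full analysis script is available from the OSF link above.

```r
library(ggplot2)

# Made-up sample sizes spanning several orders of magnitude (illustration only)
samples <- data.frame(sample_size = c(11, 97, 186, 335, 5000, 927134))

# A log10 axis makes heavily skewed sample-size distributions readable (Keene 1995)
ggplot(samples, aes(x = "", y = sample_size)) +
  geom_violin() +
  geom_jitter(width = 0.05, alpha = 0.5) +
  scale_y_log10() +
  labs(x = NULL, y = "Sample size (log10 scale)")
```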

Results

Descriptive Statistics

There were 219 papers, of which 180 contained codable samples (EP 76; E&HB 104). Thirty-nine papers could not be coded because they consisted of, for instance, mathematical models, work on non-humans, or literature reviews. Within the 180 codable papers, there were 311 samples, and the median number of samples per paper was 1. The mean sample size was 4094, but this figure was driven by one extremely large sample (N = 927,134). The median sample size was 186, but sample sizes varied substantially (minimum 11; first quartile 96.5; third quartile 334.5; maximum 927,134).

Figure 1 shows the distribution of samples by geographical region. The majority of samples were from North America (153), followed by Europe (93) and Asia (37). Of the Asian samples, the majority were from Japan (11), followed by China (7) and Israel (6). There were only 6 samples from Africa (4 from Tanzania, 1 from Namibia, 1 from Nigeria) and only 8 from Latin America and the Caribbean (2 from Guatemala; 2 from Curaçao; 2 from Bolivia; 1 community spanning northern Brazil, southern Guyana and eastern Venezuela; and 1 undefined [Latin American students studying in Germany]). There were 7 samples from Australia and 1 from Oceania (excluding Australia), namely Fiji. Only 6 samples were cross-cultural (containing samples from more than five different countries). Combining these figures, we found that around 8 out of 10 samples were from WEIRD populations (81%; Europe/North America/Australia), and that 87% of the samples were from developed regions (following the UN classification; United Nations 2013).

Fig. 1 Origin of samples. N. America, North America; L. America, Latin America and Caribbean

In terms of sample type, 113 of the 311 samples were Western student samples, while 24 were non-Western student samples; 60 samples were online paid crowdsourced, while 20 were unpaid crowdsourced. Thus, 70% of the samples were either online samples or student samples. Twenty-five samples were based on children (21 from Western and 4 from non-Western populations). Only a small fraction of the samples consisted of non-Western adults who were not students (24 out of 311 samples, or 8%).

Are Samples from Certain Geographical Locations Larger than Others?

Given that there was only one sample from Oceania (excluding Australia) (see Figs. 1 and 2), we combined this with Australia for the analysis of variation in sample sizes between regions (see the ESM for additional analyses using this combination). Variation in sample size between geographical regions was not statistically significant (Kruskal-Wallis test: χ2(6) = 10.095, p = .12). Following adjustment for multiple testing, the median sample size was found to be significantly larger for cross-cultural samples (which, according to our coding criteria, had to contain data from more than five different countries) than for Latin American and Caribbean samples (p = .037). All other post-hoc comparisons were non-significant (all remaining p values > .09; the ESM contains the full set of comparisons).
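
A minimal sketch of this regional comparison is shown below, using simulated data rather than the coded dataset; the column names (`region`, `sample_size`) are ours, and the archived OSF script contains the actual analysis.

```r
# Simulated data for illustration only (not the coded dataset)
set.seed(1)
samples <- data.frame(
  region = sample(c("Africa", "Asia", "Australia", "Cross-cultural", "Europe",
                    "L. America", "N. America", "Oceania"),
                  100, replace = TRUE),
  sample_size = round(rlnorm(100, meanlog = 5, sdlog = 1))
)

# Fold the single Oceania (excluding Australia) sample into Australia
samples$region[samples$region == "Oceania"] <- "Australia"
samples$region <- factor(samples$region)

# Kruskal-Wallis test of sample size across the remaining seven regions
kruskal.test(sample_size ~ region, data = samples)
```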

Fig. 2 Violin plot of log sample size by geographical origin: density distribution (curve), median (horizontal line), interquartile range (IQR, box), whiskers (1.5 times the IQR) and individual samples (dots)

Are Some Types of Samples Larger than Others?

The sample sizes differed significantly according to type (Fig. 3; Kruskal-Wallis test: χ2(7) = 63.9, p < .0001). Post-hoc comparisons adjusted for multiple testing using the Benjamini-Hochberg procedure showed that online (paid crowdsourced) and online (unpaid crowdsourced) samples tended to be larger than other types of samples (Fig. 3; Table 1).
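
The post-hoc step can be sketched as follows, again with simulated data and assumed column names; the Benjamini-Hochberg adjustment is requested via `p.adjust.method = "BH"`.

```r
# Simulated data for illustration only (not the coded dataset)
set.seed(2)
samples <- data.frame(
  sample_type = factor(sample(c("online_paid", "online_unpaid", "western_student",
                                "western_adult", "nonwestern_adult"),
                              150, replace = TRUE)),
  sample_size = round(rlnorm(150, meanlog = 5, sdlog = 1))
)

# Overall Kruskal-Wallis test, followed by pairwise Wilcoxon (Mann-Whitney)
# comparisons with Benjamini-Hochberg adjustment of the p values
kruskal.test(sample_size ~ sample_type, data = samples)
pairwise.wilcox.test(samples$sample_size, samples$sample_type,
                     p.adjust.method = "BH")
```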

Fig. 3 Violin plot of log sample size by sample type: density distribution (curve), median (horizontal line), interquartile range (IQR, box), whiskers (1.5 times the IQR) and individual samples (dots)

Table 1 Post-hoc comparisons based on the type of sample

Discussion

Our survey of papers published in 2015–2016 in two key journals relevant to evolutionary psychology, ‘Evolution & Human Behavior’ and ‘Evolutionary Psychology’, indicated a clear dominance of adult samples from Western, developed countries, with a particular preponderance of North American samples. Seventy percent of samples were sourced online or from student populations. Asian samples mainly consisted of samples from Japan. Notable under-representations included samples collected from Africa and Latin America (including the Caribbean). Data collected online typically gave rise to the largest sample sizes.

Implications of Relying on WEIRD Samples

In our survey, 81% of samples were from WEIRD populations. The main advantages of relying on WEIRD samples arise from the fact that most authors are WEIRD, and so WEIRD samples are more practical and convenient, particularly in terms of ease of access and the low cost of data collection. Requiring costly and time-consuming data collection can stifle scientific research, given that it is often poorly resourced (Lakens et al. 2018). There are obvious practical difficulties in collecting data outside of one’s country of residence, or in countries where attitudes and familiarity may vary in relation to the psychological procedures that will be well-known to readers of this journal, such as models of obtaining informed consent, or methods of data elicitation. Moreover, we do not always need to go far to find diverse samples (e.g. Hill et al. 2014; Nettle 2017; Wilson 2011; Wilson et al. 2009). WEIRD samples themselves are certainly not homogeneous; for instance, even people from different neighbourhoods within the same city can vary as much as people from entirely different cultures (Nettle 2017; Nettle et al. 2011). Thus, even samples from WEIRD cultures can be sufficiently diverse that they provide some useful evidence in support of generalisability.

On the other hand, evolutionary psychologists are often keen to sample outside WEIRD populations because of their interest in assessing and recording the nature of humans as a species. WEIRD populations may experience environments (in terms of novel technology, experience of hunger, exposure to death and so on) that are particularly dissimilar from those experienced by many of our ancestors, something that needs to be borne in mind when constructing and testing evolutionary theories of behaviour. Human universals (Brown 1991, 2000) may only be uncovered by assessing multiple human populations, if not all populations, and such universals assist in developing and evaluating evolutionary theories of behaviour. If we are to test how individual differences may be functionally adaptive (Tybur et al. 2014; Wilson 1998), we need to compare individuals from different ecological settings (Nettle 2009). An awareness of the diversity of worldwide human behaviours would help us understand how human behaviours emerge from an interaction between local ecologies and our evolved brains (Henrich et al. 2010). For these sorts of reasons, classic studies that seek to test adaptive reasoning have taken pains to survey different populations (e.g. Buss 1989; Daly and Wilson 1988; Kenrick and Keefe 1992; Schaller and Murray 2008; Schmitt 2005; Scott et al. 2014). Reliance on WEIRD populations limits discovery of any patterns that might allow us to predict the domains where psychological phenomena are more likely to be universal, and the domains where they are more likely to show variability (Henrich et al. 2010). As an additional step, WEIRD authors (including ourselves) could usefully reach out to non-WEIRD collaborators in an attempt to draw from wider samples. Encouraging greater diversity among authors should automatically increase participant diversity (Medin et al. 2017).

Henrich et al.’s (2010) renowned position piece explains that participants from WEIRD (Western, Educated, Industrialised, Rich, Democratic) populations can be more or less universally representative depending on the area of research, and goes on to detail where reliance on WEIRD populations might not present a complete picture. To summarise Henrich et al.’s findings in so far as they are of particular concern to evolutionary psychologists: behavioural economics games used to assess fairness and co-operation showed that western undergraduate samples behaved very differently from participants from other societies. Similarly, folkbiological reasoning develops differently in rural American children compared with children from other settings. Further, research on moral reasoning has also now shown significant differences between the original data collected from western cultures and data collected later among more diverse cultures. In each case, theories were initially developed that assumed that the results from cultures more familiar to the researchers were universal. On the other hand, some research topics that will be familiar to readers of this journal seem to be those for which we have good evidence of universality, and where a reliance on less diverse samples is less problematic. Such topics include emotional expression and pride displays, false belief tasks, some mate preferences, personality structure, psychological essentialism, punishment of free-riders and social relationships (Henrich et al. 2010).

Implications of Relying on Student Samples

We found that 44% of the samples that we coded were student samples (137 out of 311 samples). The advantages of using student samples are similar to the advantages of using WEIRD samples, together with the additional advantages that student participants should be comfortable within the university setting and accustomed to following task instructions (Rosenthal 1965). On the other hand, a reliance on student samples may be particularly problematic when dealing with topics where there is a clear impact of the variables that distinguish students from the general population. These may include broad variables such as age, experience, socio-economic background and educational level, as well as more specific tendencies including students’ greater level of cognitive ability and obedience to authority, more transient friendships and still nascent attitudes and sense of self (Sears 1986). Areas that have received specific criticism due to their reliance on student sampling include research on economic decision making (Levitt and List 2007), socio-political attitudes (Schultz 1969), the psychological processes relating to prejudice (Henry 2008) and industrial and organisational psychology (Bergman and Jean 2016). Effect sizes calculated from student data can differ from other populations not merely in magnitude, but also in direction (Peterson 2001). Further, if researchers are interested in features of a variable (e.g. its range, distribution, mean), then it will not be possible to assess that accurately from a sample that is partially selected in relation to that variable: thus, population-level IQ scores cannot be assessed from student samples.
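
The last point can be illustrated with a short simulation (purely illustrative numbers; we simply assume that selection into the student sample depends in part on the variable of interest):

```r
# Illustrative simulation of range restriction (no real IQ data are involved)
set.seed(3)
population_iq <- rnorm(100000, mean = 100, sd = 15)

# Suppose admission to university depends partly on the measured variable:
# people scoring above ~105 are much more likely to end up as students
admission_prob <- plogis((population_iq - 105) / 5)
is_student     <- runif(100000) < admission_prob
student_iq     <- population_iq[is_student]

c(mean(population_iq), sd(population_iq))  # roughly 100 and 15 by construction
c(mean(student_iq), sd(student_iq))        # mean inflated, spread reduced
```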

Implications of Relying on Online Samples

Our survey pointed to a substantial reliance (around one-quarter of all samples) on online data collection. It has been suggested that the internet offers a practical solution to reliance upon WEIRD samples (Gosling et al. 2004). Advantages of online sampling include cheap, quick and convenient access to participants who can often be recruited in larger numbers than are readily available for offline studies, and this ease of access to large online samples appears to be reflected in our analyses above. Indeed, online sampling can reach a greater diversity of participants, including some difficult-to-reach and under-represented populations (Andrews et al. 2003). Further, the anonymous setting of an online survey might arguably provoke more honest answers to questions on sensitive topics, compared with lab data collection (Joinson 1999). From another perspective though, internet sampling is limited in terms of the kinds of research tools that can be used, and in addition internet access itself is only available to a proportion of the population (and in some countries, a smaller proportion of the population than those who constitute undergraduate samples in other countries), meaning that the sample is still restricted (Gosling et al. 2010). Researchers sometimes raise concerns that data collected via online sampling might be of lower quality than that collected using more traditional methods (e.g. Matzat and Snijders 2010; Paolacci and Chandler 2014). Online participants do not have easy access to the researcher to raise queries, might enter data carelessly or thoughtlessly, or might have chosen to enter data merely in order to view the study content (Aust et al. 2013). Accordingly, to test the quality of online data collection, various studies have compared data collected online and offline and concluded that in many instances the two sampling methodologies give rise to very similar outcomes (Krantz and Dalal 2000). For instance, judgements of the attractiveness of different female body shapes were similar, irrespective of whether data were collected from laboratory studies of psychology undergraduate students or online from visitors to psychology webpages hosted by the same university (Krantz et al. 1997). Nevertheless, data collected online and offline are not identical (Birnbaum 2004; Epstein et al. 2001). This is unsurprising, given that context and environment can influence behaviour. The demographic differences between people with and without internet access may be particularly stark in developing countries, and so online sampling may not be the most suitable way to reach diverse populations in those countries (Batres and Perrett 2014), and in some instances of course internet access can contribute to behaviour that we might want to assess; for example, media exposure appears to explain differences in preferences for faces and body types (Boothroyd et al. 2016). However, despite the differences between online and offline samples, one should not be seen as the poor cousin of the other; both have strengths and can be used in complementary fashion to rigorously test inferences.

Implications of Relying on Restricted Samples

From a statistical point of view, restricted sampling can lead to selection bias, which in turn can lead to confounding (Bareinboim and Pearl 2012; Elwert and Winship 2014; Fiedler 2000; Rohrer 2018). Recently, statisticians have more explicitly defined the conditions under which causal inferences can be made when combining data collected under heterogeneous conditions (Bareinboim and Pearl 2016). Depending on the type of inference researchers want to make, they could face confounding, sampling bias or transportability bias. Importantly, these issues apply both to the decision to focus on (for example) a WEIRD population and to the decision to expand the research to non-WEIRD populations. An exclusive focus on restricted samples comes at the cost of external validity. In medical research, there have been repeated calls to revalue external validity (e.g. Burchett et al. 2011; Green and Glasgow 2006; Steckler and McLeroy 2008): while the focus had previously been on internal validity (e.g. whether confounding can be effectively ruled out in randomised controlled trials), researchers have been urged not to lose sight of external validity (can the findings of a trial be generalised?). An obvious issue with relying on WEIRD, student and/or online samples is the degree to which any conclusions would hold in different populations (Henrich et al. 2010; Henry 2008; Sue 1999).
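
To make the statistical point concrete, the small simulation below (illustrative only, with arbitrary variable names) shows how selecting participants on a variable that is affected by two causes that are independent in the population can induce a spurious association between those causes in the selected sample:

```r
# Illustrative selection-bias (collider) simulation; no real data are involved
set.seed(4)
n <- 100000
trait_a <- rnorm(n)  # two causes that are independent in the population
trait_b <- rnorm(n)

# Suppose inclusion in the sample (e.g. enrolling at university, or joining
# an online participant panel) depends on both traits
selected <- (trait_a + trait_b + rnorm(n)) > 1

cor(trait_a, trait_b)                      # close to zero in the population
cor(trait_a[selected], trait_b[selected])  # clearly negative among the selected
```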

Even more fundamentally, and before making causal inferences about other populations, researchers face the more basic problem of knowing whether they are measuring the same ‘thing’ in different populations. This issue is well understood in the field of psychometrics and has led to the development of measurement techniques and tests to examine the degree to which constructs are measured consistently across cultures (Heine et al. 2002; Hui and Triandis 1985; Nasif et al. 1991; Poortinga 1989; Van de Vijver and Leung 1997). We did not explicitly assess how many papers established equivalence of measurement between different samples, as our focus was on the samples themselves. Our standard psychological instruments, often developed by researchers working within WEIRD settings, may limit the generalisability of research findings (Ceci et al. 2010; Konečni 2010; Rochat 2010). We therefore call for more research explicitly establishing that the same ‘thing’ is measured in different populations. Depending on the sampling scheme, broadening the research to non-WEIRD populations could also give rise to problems such as non-independence (Mace and Pagel 1994; Naroll 1965; Pollet et al. 2014; Ross and Homer 1976), which would then need to be addressed. We do not discuss these issues in further detail here, as the degree to which they matter will differ depending on the design (experimental/correlational), the covariates and the research question. For example, for many psychophysical studies, and also for evolutionary psychological studies (Tybur et al. 2014), the focus is on within-individual differences, with the implicit assumption that these do not vary depending on the population studied. For such studies, it would be useful for authors to be more explicit about the degree to which these within-individual differences are expected to generalise to other samples. In some cases, restricted sampling is in itself useful to determine whether a behaviour exists or not, and as such testing a WEIRD, student and/or online population could constitute a necessary first step (Greenwood 1982; Mook 1983). Given that every sample is restricted in some way, authors can usefully make a statement pertaining to the constraints on generality, to explain the boundaries of the population to which they believe their results apply (Simons et al. 2017). More broadly, the field would benefit from setting out the conditions under which causal inferences can be made (Pearl 2009a, b).
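
For researchers wanting to check whether the same construct is measured across groups, one widely used approach is multi-group confirmatory factor analysis with increasingly constrained models. The sketch below is a minimal example using the lavaan package and its bundled HolzingerSwineford1939 demonstration data; the one-factor model and the grouping variable are illustrative only and are not a recommendation for any particular instrument.

```r
library(lavaan)

# Illustrative one-factor measurement model for three indicators
model <- 'visual =~ x1 + x2 + x3'

# Configural model: same structure in each group, parameters free to differ
fit_configural <- cfa(model, data = HolzingerSwineford1939, group = "school")

# Metric invariance: factor loadings constrained to be equal across groups
fit_metric <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = "loadings")

# Scalar invariance: loadings and intercepts constrained to be equal
fit_scalar <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = c("loadings", "intercepts"))

# Compare the nested models; a clear deterioration in fit signals non-invariance
anova(fit_configural, fit_metric, fit_scalar)
```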

Limitations and Future Directions

Our analysis did not cover all of the journals that publish evolutionary psychological studies. Instead, mirroring the work that has been done in other reviews of sample diversity (e.g. Arnett 2008; Gallander Wintre et al. 2001; Sears 1986), we focussed on key journals. Many papers on evolutionary psychology are published outside of those two journals. Similarly, there might be papers in our sample that have a different focus than evolutionary psychology, or that might be better classified as relating to fields such as comparative cognition, behavioural economics, linguistics, demography or anthropology, among others. Future work might compare one journal with the next, and compare across a sequence of different years, to determine the variance in sample diversity. Alternatively, one could define keywords to more clearly delineate articles covering evolutionary psychology. Further, future research might seek to uncover whether different research areas within evolutionary psychology are more or less reliant upon non-diverse samples, and how this corresponds to their development as a research area. Exploratory studies might well choose to focus on easily accessible samples such as undergraduates to test their initial ideas, whereas more mature research areas ought to seek to diversify their samples further in order to test the generalisability of their findings. Even within specific research areas, some topics have been studied in more diverse worldwide samples than others; for instance, research on sex differences in partner preferences draws on data from many cultures (e.g. Buss 1989; Shackelford et al. 2005), whereas research on ovulatory shifts in partner preferences rests very much upon studies carried out in English-speaking WEIRD countries (Gildersleeve et al. 2014).

Our survey does not present cause for despair. In terms of participant diversity, evolutionary psychology does rely on WEIRD, student samples less heavily than some fields (Apicella and Barrett 2016; Kurzban 2013). However, this is perhaps in part because of the discipline’s need for cross-cultural surveys to validate theories that purport to apply to humans as a species. Evolutionary psychology has a greater need for cross-cultural replications than other disciplines, such as those focussed on basic psychophysics, where we might more easily assume universal underlying mechanisms, or more descriptive research approaches that aim to uncover behaviour in culturally specific environments, such as the workplace or social media sites. We do not mean to imply that sample diversity should be the only goal; there are many valuable ways to add to our understanding of any phenomenon. Valuable extensions to research on a WEIRD, student sample can arise, for instance, from adding methodological diversity, developing theoretical frameworks, creating models of the behaviour, or testing similar behaviours in other species. Developmental approaches can make a useful testing ground for adaptive predictions, given that individuals have different adaptive needs across their life course, but we note that only 8% of the samples that we coded used child participants. Scientists, including evolutionary psychologists, are increasingly recognising the value of replication across multiple labs and samples (e.g. Camerer et al. 2016; Ebersole et al. 2016; Errington et al. 2014; Zwaan et al. 2018). In this light, it is of interest that the first study to be accepted by the Psychological Science Accelerator (https://psysciacc.wordpress.com/), a project that uses multiple laboratories to test hypotheses, was proposed by two researchers, Jones and DeBruine, whose work often draws upon a functional framework. For now, we conclude that while two key journals use more diverse samples than many typical (social or developmental) psychology journals, as Kurzban (2013) suggested, it is important to realise that, given the glaring under-representation of certain regions, we still have a long road ahead.