The importance of precision in the history of science is difficult to overestimate. As a famous example in chemistry, Lavoisier’s advances in the precision of measurement helped overturn the phlogiston theory that had dominated for approximately two centuries (see Asimov, 1965, and Trafimow & Rice, 2009, for accessible descriptions). Advances in measurement precision have also enabled important developments in physics, genetics, astronomy, and many other fields. Psychology has benefited, too. For example, advances in the measurement of psychological constructs, including ways to increase measurement reliability—thereby also increasing measurement validity—have importantly influenced many areas in psychology. There also have been advances in psychological measurement pertaining directly to validity, rather than being mediated through reliability. It is not too dramatic, for example, to credit measurement advances by Fishbein (e.g., Fishbein & Ajzen, 1975) for rescuing social psychology from the crisis of a lack of attitude–behavior correlations approximately half a century ago (Wicker, 1969). Yet, measurement precision is not the only kind of precision with which psychologists need to be concerned.

In addition to measurement precision, there also is the issue of sampling precision (Trafimow, 2018b). That is, under the usual assumptions of random and independent sampling from a population, we might ask: How well do our summary statistics—such as the sample means that psychology researchers typically compute—represent the populations they allegedly estimate? As with measurement precision, the importance of sampling precision with respect to means in the history of science is difficult to overestimate. For example, Stigler (1986) provided a compelling description of how astronomers learned to take multiple readings from their telescopes because they came to realize that the mean of many readings is superior to having only one reading. The astronomy case is particularly interesting, because it may have been the first time that scientists consciously took advantage of the ability of increased sample sizes to increase the precision of means. Porter’s (1986) excellent review reveals how, particularly through the work of Quetelet (1796–1874) in the 19th century, scientists have increasingly appreciated the relevance of the law of errors to their research.

A well-known and important implication of the law of errors is that as sample sizes increase, sampling precision also increases. In turn, as sampling precision increases, so does the probability of replication (Trafimow, 2018a). In the present environment, where there is much concern about replication in the sciences, especially in psychology (Open Science Collaboration, 2015), the importance of sampling precision should be particularly salient. In addition, as sampling precision increases, sample statistics more accurately reflect the corresponding population parameters they are used to estimate. Thus, our focus here is on sampling precision in psychology. To put our concern in the form of questions: Does psychology research have adequate sampling precision? Does sampling precision vary across areas in psychology? And does sampling precision vary depending on whether articles are sampled from journals in the upper or lower echelon with respect to impact factor?

The a priori procedure

To answer the foregoing questions, we used Trafimow’s a priori procedure (Trafimow, 2017, 2018a; Trafimow & MacDonald, 2017; Trafimow, Wang, & Wang, 2018a, 2018b), which allows researchers to estimate sampling precision regardless of the results of the study. Because this procedure is relatively new, and consequently less familiar than other procedures, this section is devoted to explaining it.

On the basis of the assumption that researchers wish to be confident that their sample statistics are close to their corresponding population parameters, Trafimow (2017) suggested that researchers could ask two questions pertaining to the sample size needed for researchers to be confident of being close:

  • How close is close?

  • How confident is confident?

The issue is for researchers to decide how close they want their sample statistics to be to their corresponding population parameters and what probability (confidence) they wish to have of being that close. The a priori procedure is not limited to means or to normal distributions (Trafimow, Wang, & Wang, 2018a, 2018b); however, because researchers typically use means and assume normal distributions when performing inferential statistics pertaining to means, the present research followed suit.

Trafimow (2017) provided an accessible derivation of Eq. 1 below, in which the necessary sample size n to meet specifications is a function of f, the fraction of a standard deviation within which the researcher wishes the sample mean to be of the population mean, and of Z_C, the z-score that corresponds to the desired degree of confidence of being within that distance (Footnote 1).

$$ n=\left(\frac{Z_C}{f}\right)^2\quad \mathrm{or}\quad f=\frac{Z_C}{\sqrt{n}}. $$
(1)

For example, suppose the researcher wishes to have what Trafimow (2018a) characterized as “excellent” precision: 95% confidence that the sample mean to be obtained will be within one-tenth of a standard deviation of the population mean. The z-score that corresponds to 95% confidence is 1.96, so 1.96 and .1 can be instantiated into Eq. 1 for Z_C and f, respectively: \( n=\left(\frac{1.96}{.1}\right)^2=384.16 \). Rounding upward to the nearest whole number, the researcher would need to recruit 385 participants to meet the specifications for closeness and confidence. Alternatively, Eq. 1 can be used in a posteriori fashion to estimate the precision of a study that has already been run. For example, suppose that the researcher collected 100 participants and wished to estimate precision, using the typical 95% standard for confidence. In that case, the precision could be computed as follows: \( f=\frac{1.96}{\sqrt{100}}=.196 \).
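
For readers who prefer to compute these quantities directly, the following is a minimal sketch of Eq. 1 in Python with SciPy; the function names (required_n, attained_f) are ours and are purely illustrative.

```python
# Minimal sketch of Eq. 1; function names are ours (illustrative only).
from math import ceil, sqrt
from scipy.stats import norm

def required_n(f, confidence=0.95):
    """Sample size needed for the sample mean to fall within f standard
    deviations of the population mean with the stated confidence."""
    z_c = norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    return ceil((z_c / f) ** 2)

def attained_f(n, confidence=0.95):
    """Precision (in standard deviation units) attained by a study of size n."""
    z_c = norm.ppf(1 - (1 - confidence) / 2)
    return z_c / sqrt(n)

print(required_n(0.1))            # 385, matching the worked example above
print(round(attained_f(100), 3))  # 0.196 for a study with 100 participants
```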

Before continuing, it is useful to draw a contrast between the a priori procedure and other frequentist methods for constructing intervals. Typically, the standard deviation is an important factor that influences frequentist intervals: The sample standard deviation is used to estimate the population standard deviation, thereby enabling interval computations to proceed. In the a priori equations, however, the standard deviation cancels out, so there is no need to estimate it from the sample data (see Trafimow, 2017, for a proof). Rather, the interval is expressed in standard deviation units, that is, in fractions of a standard deviation. One advantage of using standard deviation units is that the standard deviation does not need to be known or estimated, which allows the calculations to be made prior to the acquisition of data. Another advantage is that studies with very different standard deviations nevertheless can be compared or contrasted in an a posteriori fashion, in standard deviation units, as will be done here. (A possible disadvantage is that there is no mechanism for including prior knowledge in the calculations, though this is unimportant for the present purposes.) We emphasize that, according to the a priori procedure, precision is a function of the study procedure rather than of how the data turn out.

Trafimow and MacDonald (2017) expanded the a priori procedure to apply to the means of as many groups as the researcher wishes, that is, k groups. Assuming equal sample sizes per condition (convenient for the present purposes, because not all researchers specify exactly how many participants were in each condition), Trafimow and MacDonald derived Eq. 2, which parallels Eq. 1 but works for k groups rather than only one (Footnote 2). In Eq. 2, p(k means) refers to the probability that the sample means in all k groups are within the desired distances of their population means, and Φ refers to the cumulative distribution function of the standard normal distribution, so that Φ⁻¹ in Eq. 2 is its inverse.

$$ n=\left(\frac{\Phi^{-1}\left(\frac{\sqrt[k]{p\left(k\ \mathrm{means}\right)}+1}{2}\right)}{f}\right)^2\quad \mathrm{or}\quad f=\frac{\Phi^{-1}\left(\frac{\sqrt[k]{p\left(k\ \mathrm{means}\right)}+1}{2}\right)}{\sqrt{n}}. $$
(2)
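
As an illustration of Eq. 2, here is a minimal sketch in Python with SciPy; the function names are ours and are only meant to make the formula concrete.

```python
# Minimal sketch of Eq. 2; function names are ours (illustrative only).
from math import ceil, sqrt
from scipy.stats import norm

def required_n_k_means(f, k, p_k_means=0.95):
    """Per-group sample size needed for all k sample means to fall within
    f standard deviations of their population means with probability p_k_means."""
    z = norm.ppf((p_k_means ** (1 / k) + 1) / 2)
    return ceil((z / f) ** 2)

def attained_f_k_means(n, k, p_k_means=0.95):
    """Precision attained with n participants per group across k groups."""
    z = norm.ppf((p_k_means ** (1 / k) + 1) / 2)
    return z / sqrt(n)

print(required_n_k_means(0.1, k=2))            # more than the 385 needed for one mean
print(round(attained_f_k_means(100, k=2), 3))  # poorer (larger) f than the single-group 0.196
```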

How the a priori procedure differs from traditional power analysis

Others have analyzed published articles from the point of view of power analysis (e.g., Fraley & Vazire, 2014), so the present contribution may seem similar. It is not, because the a priori procedure differs fundamentally from power analysis, as this section clarifies.

To see the differences, it is useful to begin by considering that power analysis and the a priori procedure have very different goals. The goal of power analysis is to find the sample size necessary to have a good chance of obtaining a statistically significant p value, whereas the goal of the a priori procedure is to find the sample size necessary to be confident that one’s sample statistics are close to the population parameters of interest. To see the difference qualitatively, imagine that the expected effect size is either extremely large or near 0. If the expected effect size is extremely large, a power analysis would indicate that only a small sample size is needed in order to have a good chance of obtaining a statistically significant p value. In contrast, if the expected effect size is near 0, a power analysis would indicate that a huge sample size would be needed to have a good chance of obtaining a statistically significant p value. The a priori procedure, however, is indifferent to the expected effect size. What matters is obtaining sample statistics (e.g., sample means) that are close to the corresponding population parameters (e.g., population means). Thus, according to the a priori procedure, the necessary sample size does not depend on the expected effect size at all.

The foregoing paragraph shows that power analysis is sensitive to the expected effect size, whereas the a priori procedure is not. Another qualitative difference is that the a priori procedure is sensitive to the desired closeness of the sample means to the population means, whereas power analysis is not, though we hasten to add that power analysis is sensitive to the threshold for statistical significance (e.g., .05 or .01), whereas the a priori procedure is not. This last difference arises because the a priori procedure does not presuppose an eventual significance test.

The strong qualitative differences imply strong quantitative differences, too. Imagine a study concerning a single mean, based on a sample size of 50, where the expected effect size varies between .1 and .7, keeping the threshold for statistical significance at the usual .05 level (with confidence for the a priori procedure at the usual .95 level). How does the power vary as the expected effect size varies? Figure 1 illustrates the dramatic effect that the expected effect size (along the horizontal axis) has on power (along the vertical axis); the power increases from .109 to .999 as the effect size increases from .1 to .7 (Footnote 3). Note that the precision in this scenario, according to the a priori procedure, always equals \( \frac{1.96}{\sqrt{50}}=.28 \), no matter the expected effect size.

Fig. 1 Power expressed as a function of effect size for a single sample when the sample size is 50
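
The pattern in Fig. 1 can be checked with a short sketch. We use a normal approximation to the one-sample test, so the values may differ slightly from the published curve if the latter was based on the t distribution; the variable names are ours.

```python
# Sketch of power versus effect size for n = 50 (normal approximation),
# alongside the constant a priori precision; variable names are ours.
from math import sqrt
from scipy.stats import norm

n, alpha = 50, 0.05
z_crit = norm.ppf(1 - alpha / 2)

for d in (0.1, 0.3, 0.5, 0.7):
    ncp = d * sqrt(n)  # noncentrality of a one-sample test with effect size d
    power = 1 - norm.cdf(z_crit - ncp) + norm.cdf(-z_crit - ncp)
    print(f"d = {d:.1f}  power = {power:.3f}")   # rises from about .109 to .999

print(f"a priori precision f = {z_crit / sqrt(n):.2f}")  # .28 regardless of d
```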

Another way to see a quantitative difference between the two procedures is to consider the sample size needed to reach arbitrary levels of precision. Continuing with the simple case of a single mean, imagine that the desired level of precision varies from .1 to .7 (remember that a smaller number implies better precision). As Fig. 2 illustrates, the necessary sample size decreases from 384.16 to 7.84 along the vertical axis as the desired level of precision becomes increasingly poor, from .1 to .7 along the horizontal axis. Note that the required sample size for any desired level of power in this scenario remains the same, no matter the desired level of precision.

Fig. 2 Sample size expressed as a function of the desired degree of precision, with a confidence level of .95. Lower values along the horizontal axis indicate better precision
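
The curve in Fig. 2 is simply Eq. 1 evaluated across desired precision values; the brief sketch below (our own, using the rounded z value of 1.96, as in the text) reproduces its endpoints.

```python
# Required sample size as a function of desired precision f, at 95% confidence.
z_c = 1.96  # rounded z-score for 95% confidence, as used in the text
for f in (0.1, 0.3, 0.5, 0.7):
    print(f"f = {f:.1f}  n = {(z_c / f) ** 2:.2f}")
# n falls from 384.16 at f = .1 to 7.84 at f = .7, as in Fig. 2
```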

Yet another way to see a difference between the two procedures is to consider the role played by the standard deviation. In the case of power analysis, keeping the mean (in a one-sample experiment) or the difference in means (in a two-sample experiment) constant, the standard deviation has a strong effect on power. As the standard deviation increases, the effect size decreases, so power likewise decreases. In contrast, because precision is defined in standard deviation units in a priori calculations, increasing the standard deviation does not influence precision (Footnote 4). To avoid overkill, we do not provide a figure.

For a dramatic conclusion to this section, imagine a one-sample experiment in which Researcher A can afford to recruit only 13 participants. Fortunately, this researcher expects an effect size of .8, which, with 13 participants, is barely sufficient for 80% power (the exact power level is 82.2%). According to the received view, this would be satisfactory. In contrast, Researcher B uses the a priori procedure to calculate that the precision is .54, which is much worse than Trafimow’s (2018a) criterion of .4 for “poor precision.” Thus, whereas Researcher A is satisfied with the experiment because it meets the traditional power requirement, Researcher B is not, because she knows that the precision with which the sample mean estimates the corresponding population mean is terrible. In sum, although power analysis works well for improving null hypothesis significance tests, it is not useful for ensuring that sample means are close to the corresponding population means they are used to estimate. The a priori procedure is necessary to accomplish that goal.
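
The Researcher A versus Researcher B contrast can be verified numerically. The sketch below uses a normal approximation for power, so the 82.2% figure is reproduced only approximately if the original calculation used a t test; the variable names are ours.

```python
# Researcher A (power) versus Researcher B (a priori precision) for n = 13, d = .8.
from math import sqrt
from scipy.stats import norm

n, d = 13, 0.8
z_crit = norm.ppf(0.975)
ncp = d * sqrt(n)

power = 1 - norm.cdf(z_crit - ncp) + norm.cdf(-z_crit - ncp)
precision = 1.96 / sqrt(n)

print(f"power     = {power:.3f}")      # about .822: acceptable by the received view
print(f"precision = {precision:.2f}")  # about .54: worse than the .4 "poor" cutoff
```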

Present goal

Using the standard 95% probability for the remainder of this article, Eq. 2 reduces to \( f=\frac{\Phi^{-1}\left(\frac{\sqrt[k]{.95}+1}{2}\right)}{\sqrt{n}} \). In other words, if one knows the number of groups in an experiment and the total sample size, and is willing to divide the total sample size by the number of groups to approximate n, one can estimate the precision under the stricture of 95% confidence. Or, if there are multiple experiments, one can include all the groups across all the experiments in k, to arrive at an article-wise estimate of precision. This is the practice followed here (Footnote 5).
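
A minimal sketch of this article-wise calculation follows, in Python with SciPy; the function name and the example numbers are ours, purely for illustration.

```python
# Article-wise precision at 95% confidence: k groups across all experiments,
# total_n participants in the article overall. Function name is ours.
from math import sqrt
from scipy.stats import norm

def articlewise_f(total_n, k, p=0.95):
    n_per_group = total_n / k                 # approximate equal group sizes
    z = norm.ppf((p ** (1 / k) + 1) / 2)
    return z / sqrt(n_per_group)

# Hypothetical article: 4 groups and 200 participants in total.
print(round(articlewise_f(200, 4), 2))  # about .35
```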

Using the a priori procedure in a posteriori fashion provided the opportunity to investigate the conditions under which psychology experiments in the literature have been performed. Specifically, we randomly chose articles in upper- and lower-tier journals, in social, cognitive, neuro-, developmental, and clinical psychology, to investigate their article-wise levels of sampling precision and to address our questions.

Method

In each subfield of psychology, we selected three journals from what might be considered a relatively upper tier and three from a relatively lower tier. We used the journal rankings by impact factor for the year 2015 according to Thomson Reuters’ annual Journal Citation Reports. Thirty journals were thus represented (e.g., Top Social, Bottom Social, etc.); see Table 1 (Footnote 6). The journal articles were selected using a random-digit table (Dowdy, Wearden, & Chilko, 2004) (Footnote 7). Selected articles had to meet the additional criteria of using exclusively between-participants designs, reporting group means, and being published in 2015. The number of studies, the number of groups per study, the sample size per study, and the total sample size across studies were obtained for each article.

Table 1 List of journals and rank information

Results

Because the distributions of precision values were skewed, it made more sense to use median rather than mean precision values (Footnote 8). Table 2 and Fig. 3 illustrate the differences in median precision between upper- and lower-echelon journals (with respect to impact factor) in the different subfields of psychology. Table 2 also gives the standard deviations of the precision values. Developmental and social psychology did relatively well in comparison to cognitive psychology and neuropsychology, with cognitive psychology exhibiting particularly extreme imprecision in lower-echelon journals. Clinical psychology was mixed in terms of precision: Upper-echelon clinical psychology journals exhibited precision near that of social and developmental psychology, whereas lower-echelon clinical psychology journals exhibited imprecision in the ballpark of cognitive psychology and neuropsychology. Finally, there was an interesting effect whereby lower-echelon neuropsychology journal articles exhibited precision superior to that of articles in the upper echelon.

Table 2 Median precision (f) and standard deviation of this precision, as a function of psychology subfield and journal tier
Fig. 3 Median precision levels in social, cognitive, neuro-, developmental, and clinical psychology, for upper- and lower-level journals in those subfields. Lower values indicate better precision

Discussion

Figure 3 illustrates the median precision levels of the different areas in psychology; but how should these be evaluated, in both relative and absolute terms? In relative terms, the data are reasonably clear. Developmental and social psychology perform relatively well; cognitive and neuropsychology perform relatively poorly; and clinical psychology performs relatively well or poorly, depending on whether researchers consider articles in upper- or lower-echelon journals, respectively.

In absolute terms, however, it is not even clear that developmental and social psychology perform all that well. The median precision levels were .23 and .29, respectively. If we take .1 or less as “excellent” precision (Trafimow, 2018a), even developmental and social psychology are quite far from the goal. Of course, such designations are arbitrary, and perhaps should not be taken too seriously, but our best guess is that few researchers would be happy with median precision levels exceeding .2. And with median precision levels in cognitive and neuropsychology exceeding .4 in upper-echelon journals, there is much room for improvement.

Why should we care about sampling precision?

An obvious way to avoid having what some might consider to be the negative implications of Table 2 and Fig. 3 would be to question the importance of sampling precision for psychology research. After all, the argument might commence, researchers are interested in testing empirical hypotheses, in the interest of confirming or disconfirming the theories from which they are derived. Consequently, researchers do not, and should not, care about the sampling precision of the means they obtain. Rather, they should care about whether the hypothesized differences between means are statistically significant or not, in the interest of determining whether the hypothesized effect is “there” or “not there.”

This objection can be addressed in several ways. The first way is mathematical: Under the assumption of normality, and assuming equal sample sizes for two groups, more participants are needed to achieve the same level of precision for a difference between means than for the individual means (Footnote 9). Thus, if one sees Table 2 and Fig. 3 as pessimistic, the pessimism remains even if the focus switches to differences between means.
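
One way to see why, as a sketch under the simplifying assumption that closeness is expressed in units of the common within-group standard deviation σ and that each of the two groups contains n participants, is that the sampling variance of the difference between two independent means is twice that of a single mean:

$$ \operatorname{Var}\left(\bar{X}_1-\bar{X}_2\right)=\frac{2\sigma^2}{n},\quad \mathrm{implying}\quad n=2\left(\frac{Z_C}{f}\right)^2, $$

that is, twice the per-group sample size required by Eq. 1 for a single mean at the same f and confidence level.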

Although the mathematical argument is sufficient, this seems a good opportunity to address what can be considered a poor philosophy underlying the objection. First, from a statistical perspective, an argument can be made that researchers make a huge mistake by thinking dichotomously in terms of whether an effect “is there” or “is not there.” It is unlikely in the extreme that the population effect size is exactly zero: With an infinitude of possible values, it is extremely unlikely that the population effect size adheres exactly to any single value, such as zero. Thus, it is tantamount to certainty that there is some effect, which renders the question of whether there is an effect or not moot. Rather, the important issue concerns the size of the effect. Are the effect size and direction consonant or dissonant with a theory? Does the effect size suggest or fail to suggest effective applications? Does the effect size support or fail to support the validity of new methods? And so on (Footnote 10).

More generally, from the philosophy of science there is the issue of the importance of establishing the facts. There are many different philosophies of science, emphasizing verification (e.g., Hempel, 1965), falsification (e.g., Popper, 1959), abduction (e.g., Haig, 2014), and many other concerns. These philosophies posit different ideal orderings of fact-finding and theory development, as well as different sorts of relations between facts and theories. But all of them have in common that, at some point in the scientific process, it is necessary to determine what the facts are. Admitting, then, that under most respectable philosophical perspectives the facts matter at some point, researchers ought to care about the facts! And if the facts matter, it is difficult to avoid the implication that knowing the facts as precisely as possible matters, too, as exemplified by Lavoisier’s disconfirmation of phlogiston theory, with which the present article commenced. Although it is arguable whether researchers should depend as much on means as they do (Speelman & McGann, 2016; Trafimow et al., 2018), given that dependence, the desirability of obtaining means with the best possible sampling precision should be clear to all. Consequently, the implications of Table 2 and Fig. 3 retain their full force.

The connection between precision and replicability

Although the present work pertains to precision rather than replicability, there may be a strong connection between them, depending on how one conceptualizes replicability. The typical way (we might even say the received way) that researchers judge a replication to be successful is whether the experiment yields a statistically significant finding in both iterations. Using this conceptualization, it should be obvious that, all else being equal, the more participants the researcher has in both iterations, the greater the probability of a successful replication. In addition, the greater the population effect size, the greater the replicability. A problem with the received view, however, is that the null hypothesis significance testing procedure has come under fire in the last several years (see Hubbard, 2016, and Ziliak & McCloskey, 2016, for reviews), and it was widely criticized at the 2017 American Statistical Association Symposium on Statistical Inference. Aside from statistical issues, though, consider what might be called the Michelson and Morley problem. Michelson and Morley (1887) performed an experiment that disconfirmed the existence of the luminiferous ether (Footnote 11). The theoretical effect size was zero (though they did obtain a small sample effect size), and physicists consider the experiment to be highly replicable. Yet according to the received view, in which replicability depends importantly on having a large population effect size, the Michelson and Morley experiment would have to be considered not very replicable.

Trafimow (2018a) suggested a more creative alternative that solves the Michelson and Morley problem, featuring the goal of calculating the probability of obtaining sample statistics within desired distances of the corresponding population parameters in both iterations of an experiment. This approach distinguishes between replicability in an idealized universe, where we imagine it is possible to duplicate conditions exactly so that the only differences between iterations are due to randomness, and replicability in the real universe, where it is impossible to duplicate conditions exactly and there are systematic as well as random differences between the two iterations. Trafimow (2018a) showed that the probability of replication in the idealized universe places an upper bound on the probability of replication in the real universe. Moreover, by using the a priori equations, it is possible to calculate the probability of replication in the idealized universe even before performing the original experiment. Trafimow (2018a) showed that the a priori equations can be rearranged algebraically to give the probability of obtaining sample statistics within the desired distances of the corresponding population parameters at the sample size the researcher plans to collect. Once this probability has been obtained, squaring it to take both iterations into account yields the probability of replication in the idealized universe. The probability of replication in the real universe must be less than that figure, which implies that if the calculated figure is already poor (and it usually is), the true figure must be even worse.
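
To make the idealized-universe calculation concrete, here is a minimal sketch for a single mean, based on our reading of the rearrangement just described: Rearranging Eq. 1 gives the probability of closeness in one iteration as 2Φ(f√n) - 1, which is then squared. The function name is ours.

```python
# Idealized-universe replication probability for a single mean: the probability
# of being within f standard deviations of the population mean in one iteration
# is 2*Phi(f*sqrt(n)) - 1 (rearranged Eq. 1); squaring covers both iterations.
from math import sqrt
from scipy.stats import norm

def p_replication_idealized(n, f):
    p_close_once = 2 * norm.cdf(f * sqrt(n)) - 1
    return p_close_once ** 2

print(round(p_replication_idealized(50, 0.1), 2))   # about .27: poor
print(round(p_replication_idealized(500, 0.1), 2))  # about .95: much better
```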

Four interesting features are worth emphasizing when using the a priori equations to calculate the probability of replication in the idealized universe. First, analogous to precision in the present article, replication is a function of the procedure rather than of the data. Second, precision and replicability are intimately connected. Third, just as precision does not depend on the population effect size, neither does the probability of replication, which, we reiterate, solves the Michelson and Morley problem. Finally, it is quite possible for the received view of replicability and the Trafimow view to result in opposite conclusions. To see this last point, suppose there is a huge population effect size in two iterations of an experiment, though the sample sizes are only moderate. Given the huge population effect size, according to the received view of replicability there is quite a good chance of obtaining statistically significant findings in both iterations, and hence impressive replicability. In contrast, the probability of replication according to the a priori equations is poor in the idealized universe, and even worse in the real one, and there is little reason to believe that the obtained sample statistics are close to their corresponding population parameters in both iterations. Conversely, suppose that the population effect size is very small for two iterations of an experiment, but the sample sizes are very large in both cases. In that case, the experimenter can be confident that the sample statistics are close to their corresponding population parameters in both iterations, so replicability is impressive, even though the small population effect size renders replicability poor according to the received view.

Arguments not being made

Given the predilections of psychologists, misinterpretations of the foregoing argument seem likely. The present section is an attempt to forestall them.

The most obvious misinterpretation is that developmental and social psychology are “better” than cognitive and neuropsychology. On the contrary, although sampling precision is important, it is only one of many criteria that can be used to evaluate psychology subfields. Other criteria include the explanatory breadth of theories, the validity of auxiliary assumptions, practical applications, and so on. There is no implication here that some psychology subfields are better than others.

Another misinterpretation is that in areas such as developmental and social psychology, where upper- and lower-echelon journal articles do not differ much with respect to sampling precision, there is no reason to distinguish different echelons of journals. In fact, there are many ways to evaluate journals. In addition to impact factors, there are acceptance rates, submission rates, reading rates, and so on. There is no reason for anyone to use the present article as a reason to argue that it is wrong to distinguish different echelons of journals in developmental and social psychology. However, that the present article should not be used in this way is not a reason to support distinguishing different echelons of journals, either. An intermediate position might be that there are different journal echelons in different areas of psychology, but the extent of the differences may be less than is typically assumed. The present authors are not committed to any position on this issue. Having said that, however, it is noteworthy that in social, cognitive, developmental, and clinical psychology, the standard deviation of precision is larger for lower- than for upper-echelon journals (see Table 2). Perhaps one difference between journal echelons in some areas of psychology pertains to variability in precision.

A third misinterpretation is that because the present focus was on between-participants designs, the present authors believe these are superior to within-participants designs. No such implication is intended, and we focused on between-participants designs for other reasons. The main reason is that it was easier to find between-participants designs across areas, and across different echelons of journals. Another reason is that the analyses are more mathematically straightforward for between-participants than for within-participants designs, and it made sense to keep the first article of this type as simple as possible. Nevertheless, there are important reasons to favor either between-participants or within-participants designs, depending on the researcher’s goals, manipulations, and other considerations (see, e.g., Smith & Little, 2018; Trafimow & Rice, 2008).

Because some areas of psychology use within-participants designs more frequently than other areas, it could be argued that the present focus on between-participants designs might underrepresent such areas. For example, cognitive psychology experiments employ within-participants designs more frequently than do social psychology experiments. This is a limitation. Future work analogous to the present work is being planned to analyze articles featuring within-participants analyses.

Should researchers increase sample sizes?

The equations make it obvious that for psychology researchers to increase sampling precision, they will need to increase their sample sizes substantially. It is possible to take more than one perspective on this. One view is that psychology researchers should use larger sample sizes and thereby obtain greater sampling precision. An alternative view is that increased sample sizes would come at the cost of researchers being able to perform fewer experiments, and that it is better to have more experiments even at the cost of decreased sampling precision. A third view is that many experiments in psychology are flawed in multiple ways, so an advantage of performing fewer experiments might be that researchers would think them through more rigorously. According to this third perspective, then, increased sampling precision could come at little cost for rigorous experiments, even if the cost is increased for nonrigorous ones.

Although this is not an argument that is going to be settled here, three straightforward points can be made. First, researchers who favor the first or third point of view also should favor increasing sample sizes in order to increase sampling precision. Second, researchers who favor many small experiments should be up front about admitting that the sample means cannot be trusted as accurate estimates of their corresponding population means. This issue can be extended to effect sizes; that is, effect sizes based on sample means with low sampling precision cannot be trusted as accurate estimates of the corresponding population effect sizes.

The third point to be made is that, if psychology moves in the direction of multiple small experiments, a few issues—both positive and negative—would stem from the decision. On the negative side, most “replications” nevertheless differ from each other with respect to the populations sampled from, dates, locations, and so on. Thus, an argument can be made that using multiple small experiments rather than a single large experiment risks multiple confounding across experiments. On the positive side, a counterargument could be that what can be considered “multiple confounding” alternatively can be considered “increasing generalizability.” Our expectation is that the issue of a few large studies versus many small studies invokes important conceptual and philosophical issues that require a separate article, or many such articles. A careful examination of the relevant conceptual and philosophical issues by future researchers would be a positive consequence of the present work concerning the sampling precision of different psychology subfields.

Conclusion

The present work commenced with a general point about the importance of precision in science: There is no substitute for knowing the facts as precisely as possible. The present work focused on one type of precision, namely sampling precision, but there are other types. Trafimow (2018b) detailed two other types of precision that matter in the behavioral sciences: measurement precision and the precision of homogeneity. Measurement precision is a characteristic of the measuring instrument: As random measurement error decreases, measurement precision increases. Regarding the precision of homogeneity, the more homogeneous the participants are, the easier it is to observe differences between the groups in different conditions, and so the greater the precision of homogeneity.

We bring up measurement precision and precision of homogeneity to emphasize that sampling precision is only one piece of the larger precision mosaic. Given the importance of precision in the history of science, investigations of different types of precision are desirable. It might well be that psychology does better or worse with respect to other types of precision than with respect to sampling precision. In addition, the relative ordering of the performance of psychology subfields might differ for different types of precision. Thus, although the present investigation concerning the sampling precision of psychology subfields is a good start, the findings raise questions about other types of precision that can only be addressed by future research.