This study demonstrates that inferential statistics are frequently overused in articles reporting educational studies in two prominent, highly cited journals. Indeed, since 1997 the majority of articles in these journals that report enumeration studies have inappropriately included inferential statistics.
The authors of one article that did not include statistics addressed the issue directly in their methods section: "All analyses are based on the entire population of interest. Therefore, tests of statistical significance are not provided" [3]. In another article, the authors appeared conflicted about the issue: "These statistics are included for the interested reader but should be interpreted carefully, for the journals in this study do not, by one way of thinking, represent a sample but rather the entire population of interest" [4]. These authors need not have been uncertain: because their data described the entire population of interest, they should not have reported inferential statistics.
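A standard result from survey sampling, added here only as an illustration and not drawn from the articles reviewed, makes this point concrete. Assume a simple random sample of $n$ units drawn without replacement from a finite population of $N$ units, with population mean $\bar{Y}$ and population variance $S^2$ (divisor $N-1$). The variance of the sample mean $\bar{y}$ then carries a finite population correction:

```latex
\operatorname{Var}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{S^2}{n},
\qquad
S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2 .
% When the whole population is enumerated, n = N:
\left.\operatorname{Var}(\bar{y})\right|_{n=N} = \left(1 - \frac{N}{N}\right)\frac{S^2}{N} = 0 .
```

When every member of the population is counted, the correction factor is zero, so the sample mean coincides with the population mean and there is no sampling error for a test of statistical significance to assess.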
The findings of our study are worrisome for several reasons. The two journals studied are among the most prominent in their field and are highly cited, so authors may look to them as models for reporting educational studies. In addition, the concept of sampling is central to all of inferential statistics and is usually discussed in the early chapters of statistics textbooks [5]. If researchers are confused about an issue as fundamental as whether a group of subjects is a sample or an entire population, how can readers be confident that other, more complex analytical issues have been addressed validly?
A reviewer of this article suggested that statistics may still have a role in finite populations. On this view, the scores in a population are generated by underlying probability distributions, and statistical tests are appropriate for comparing those distributions, which the population values indicate only imperfectly, rather than the actual numbers in the population. One situation in which this might apply is a survey that questions an entire population (e.g., deans, or program or clerkship directors) about values, preferences, or impressions, as is often done with Likert-type questions. In this case, the reference distribution might be envisioned as the impressions of all persons who might hold the respondent's office. This consideration would not apply to the factual information that surveys often request. Dusoir has suggested that "statistics is a collection of warring factions, with deep disagreements over fundamentals," and whether to report statistics for finite populations may be one of these fundamental disagreements [6]. On the other hand, Oakes may be correct that "many researchers retain an infatuation with statistical tests" [7].
Beyond confusion about fundamental issues in statistics, the increasing prevalence of statistics in these studies over time suggests that the inappropriate use of statistical packages may be partly to blame. Many of the studies stated that data were entered into a statistical package, although a spreadsheet program would have been adequate for every one of them. Statistical packages generate tests readily, but the investigator remains responsible for interpreting their output properly. Anthony has suggested that the "use of such (packages) does, unfortunately, also allow you to perform meaningless statistics and incorrect statistical tests, and give misleading or wrong interpretations" [8].
This study did not examine all articles in the medical literature that report enumeration studies. However, it covers all such studies published in two leading journals of medical education. We suspect that the problem is also widespread in other journals that publish this type of study. The proband case that prompted this study was published in another leading journal [9].