Deficiencies in the quality of medical education research are widely acknowledged.1–4 Medical education leaders have appealed for increased methodological rigor, including larger multi-institutional studies,5,6 greater attention to validity and reliability of assessments,7 and examination of clinically relevant outcomes.8,9 Nonetheless, the quality of the current body of published education research remains suboptimal, with the majority of articles reporting single-institution studies10 and less rigorous outcomes, such as learner satisfaction or acquisition of knowledge and skills.10,11

An instrument measuring the quality of education research studies could be useful to investigators designing studies and to journal editors reviewing submitted manuscripts. We have developed a Medical Education Research Study Quality Instrument (MERSQI)10 to measure the methodological quality of experimental, quasi-experimental, and observational studies in medical education. In a previous study we demonstrated content, internal structure, and criterion validity evidence for MERSQI scores, including the relationship of one factor, funding, to study quality.10 However, predictive validity evidence has not been established for MERSQI scores.

In this study, we examined whether MERSQI scores predicted editorial decisions for the 2008 medical education special issue of the Journal of General Internal Medicine (JGIM). JGIM regularly publishes a special issue containing medical education research pertinent to general internal medicine. We hypothesized that submitted manuscripts with higher MERSQI scores would be more likely than those with lower scores to be sent for peer review, to be invited to revise after review, and ultimately to be accepted for publication.


Study Design

We conducted a cross-sectional assessment of the quality of manuscripts submitted to the 2008 JGIM medical education special issue. Submitting authors were given the opportunity to decline to include their manuscript in this study. JGIM editors were not aware of authors’ study participation status; all editorial decisions were made independent of study participation. The Mayo Foundation Institutional Review Board deemed this study exempt from review.

Data Collection

A team of investigators (DAR, TJB, SMW, and RBL) who were not involved in JGIM editorial decisions used the Medical Education Research Study Quality Instrument (MERSQI)10 to measure the quality of studies submitted to JGIM’s medical education issue. All studies were de-identified using the procedures described below. Although high interrater reliability of MERSQI scores has already been established,10 two investigators independently scored a subset of 55 of the 100 studies (55%) to confirm reliability in this sample, and disagreements were resolved by consensus. After high interrater reliability was confirmed, one investigator scored the remaining 45 studies.

Information on manuscript type (educational innovation, original article, brief report, perspective, review, resource paper, and recommendations/guidelines), initial publication decision (reject without peer review, reject after review, or revise after review), and final decision (reject or accept) was provided by the JGIM editorial office.

De-identification of Studies

The JGIM editorial office removed author names and affiliations from manuscripts and then sent them to an administrative assistant, who removed all other identifying information, including acknowledgments, institution names in the manuscript text, and references. After MERSQI scoring was complete, the JGIM editorial office provided initial and final publication decisions linked to each manuscript’s unique identifier.

JGIM editorial decisions were made without knowledge of MERSQI scores or other study results. As Co-Editor for the JGIM medical education special issue, investigator DAC was not involved in data collection (grading studies) or data analysis and had no knowledge of individual manuscripts’ MERSQI scores.

Quality Assessment Instrument

The MERSQI is a ten-item instrument designed to assess the methodological quality of experimental, quasi-experimental, and observational medical education research studies.10 The ten items reflect six domains of study quality: study design, sampling, data type (subjective or objective), validity of assessments, data analysis, and outcomes. The maximum score for each domain is 3. A total MERSQI score is calculated as the sum of domain scores, with appropriate reductions in the denominator for “not applicable” responses. Possible total MERSQI scores range from 5 to 18. Total MERSQI scores are adjusted to a denominator of 18 to allow comparison of scores across studies. The MERSQI instrument and scoring algorithm are available online (Appendix).
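The denominator adjustment described above can be illustrated with a short Python sketch. It assumes each item is stored as a pair of (points awarded, maximum points) and that a “not applicable” item is recorded as None and dropped from both the numerator and the denominator before the total is rescaled to 18. The item names and maxima below are hypothetical placeholders chosen only so that they sum to 18; the authoritative item values and scoring rules are those in the Appendix.

```python
from typing import Optional

# Each item is (points awarded, maximum points); None marks "not applicable".
# Item names and maxima are hypothetical placeholders, not the published weights.
ItemScore = Optional[tuple[float, float]]

MAX_TOTAL = 18.0  # six domains x a maximum of 3 points each


def adjusted_total(items: dict[str, ItemScore]) -> float:
    """Sum applicable item scores and rescale them to a denominator of 18."""
    awarded = 0.0
    possible = 0.0
    for name, score in items.items():
        if score is None:
            continue  # "not applicable": excluded from numerator and denominator
        points, maximum = score
        if not 0 <= points <= maximum:
            raise ValueError(f"{name}: {points} is outside 0..{maximum}")
        awarded += points
        possible += maximum
    if possible == 0:
        raise ValueError("no applicable items")
    return awarded * MAX_TOTAL / possible


# Hypothetical study in which the response-rate item does not apply.
example = {
    "study_design": (1.5, 3.0),
    "institutions": (0.5, 1.5),
    "response_rate": None,
    "data_type": (3.0, 3.0),
    "validity_content": (1.0, 1.0),
    "validity_internal_structure": (0.0, 1.0),
    "validity_relationships": (0.0, 1.0),
    "analysis_appropriateness": (1.0, 1.0),
    "analysis_sophistication": (2.0, 2.0),
    "outcome": (2.0, 3.0),
}
print(adjusted_total(example))  # 12.0 (11 of 16.5 applicable points, rescaled to 18)
```

Dropping the item from the denominator, rather than scoring it as zero, keeps a study from being penalized for a design feature that does not apply to it.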

We have previously demonstrated strong validity evidence for MERSQI scores including: (1) content evidence based on expert consensus and published literature supporting instrument items, (2) internal structure evidence based on factor analysis and excellent interrater, intrarater, and internal consistency reliability, and (3) criterion validity evidence (relationships to other variables) demonstrated by strong correlations between MERSQI scores and the impact factor of the journal in which the study was published, the number of times the study was cited in the 3 years after publication, and global quality ratings by independent experts.10

Data Analysis

Total, domain, and item MERSQI scores for submitted and published studies were summarized using descriptive statistics and compared using the Wilcoxon rank sum test. We used logistic regression to examine associations between total MERSQI scores and initial editorial decisions (sent for peer review versus rejected without review; invited to revise versus rejected after review) and the final editorial decision (accepted versus rejected). Interrater reliability was determined using intraclass correlation coefficients (ICCs). We considered a two-tailed p < 0.05 statistically significant for all analyses. Data were analyzed using Stata 8.0 (StataCorp, College Station, TX).
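For readers unfamiliar with the Wilcoxon rank sum test used for the group comparisons, the Python sketch below computes the rank sum statistic and a two-sided p value from the large-sample normal approximation, with the usual correction for tied ranks (ties are common when scores move in half-point steps). It is an illustrative re-implementation written for this explanation, not the Stata routine used in the analysis, and very small samples may call for an exact test instead. The example values are toy numbers, not study data.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one average rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    w = sum(ranks[:n1])                  # rank sum of the first group
    expected = n1 * (n + 1) / 2          # mean of W under the null hypothesis
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (w - expected) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Toy scores for two hypothetical groups of manuscripts.
group_a = [8.0, 9.5, 10.0, 10.5, 12.0, 13.5]
group_b = [6.0, 7.5, 8.0, 9.0, 9.5]
w, z, p = rank_sum_test(group_a, group_b)
print(w, round(z, 2), round(p, 3))
```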


Characteristics of Submitted Manuscripts

One hundred thirty-one manuscripts were submitted to the 2008 JGIM medical education special issue. Thirty-one were excluded (16 used qualitative methods exclusively, 14 were not original research, and 1 author declined to include his or her manuscript), leaving 100 quantitative, original research manuscripts for analysis.

Of the remaining 100 manuscripts, 58 were submitted as original articles, 35 as educational innovations, and 7 as brief reports. Almost half of the studies (46%) involved residents as participants, 37% involved medical students, and only 7% involved faculty. Ten percent of studies included a combination of students, residents, and faculty as participants.

Quality of Submitted Studies

The interrater reliability of MERSQI scores was excellent, with ICCs for individual items ranging from 0.76 (95% CI 0.67–0.83) to 0.98 (95% CI 0.97–0.99) (Table 1).
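The article does not state which form of the intraclass correlation coefficient was used. As one illustration, the Python sketch below assumes the two-way random-effects, absolute-agreement, single-rater form commonly labeled ICC(2,1), computed from two-way ANOVA mean squares, with one row per study and one column per rater. Other ICC forms combine the mean squares differently and can give different values, so this should be read as an example of the technique rather than a reproduction of the reported analysis.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per study (subject) and one column per rater.
    """
    n = len(ratings)      # number of studies
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between studies
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy example: two raters scoring the same five hypothetical studies.
toy = [[9.0, 9.0], [10.5, 10.0], [7.0, 7.5], [12.0, 12.0], [8.0, 8.0]]
print(round(icc_2_1(toy), 3))
```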

Table 1 Interrater Reliability of MERSQI Scores

The mean (SD) total MERSQI score of studies was 9.6 (2.6), range 5–15.5. Most studies used single-group cross-sectional (54%) or pre-post designs (32%). Fourteen percent of studies included a control or comparison group, and 5% were randomized. Nearly one quarter (22%) of studies were multi-institutional. Nineteen percent failed to report a response rate. Fewer than half (42%) included objective measurements. Thirty-six percent of manuscripts reported at least one category of validity evidence for scores from their evaluation instruments: 29% reported content evidence, 20% internal structure evidence, and 9% evidence of relationships to other variables (e.g., criterion, concurrent, or predictive validity). Errors in data analysis were identified in 30% of submitted manuscripts. Most studies (56%) reported satisfaction or opinion outcomes, while a minority reported knowledge or skills (32%), behavior (7%), or patient-related outcomes (5%).

The mean (SD) total MERSQI score of the 35 manuscripts submitted as “educational innovations” was lower than that of the 65 studies submitted as “original articles” or “brief reports” [8.3 (2.7) versus 10.3 (2.2), p < 0.001] (Table 2). Manuscripts submitted as original articles or brief reports had higher MERSQI scores than those submitted as educational innovations in the domains of sampling [2.0 (0.6) versus 1.6 (0.5), p = 0.002], validity of evaluation instruments’ scores [0.8 (0.9) versus 0.3 (0.7), p = 0.003], and data analysis [2.7 (0.5) versus 2.0 (0.8), p < 0.001]. MERSQI scores did not differ significantly by submission category in the domains of study design, type of data, and outcomes.

Table 2 Mean Total MERSQI Scores for Manuscripts Submitted to the 2008 JGIM Medical Education Special Issue

Association Between MERSQI Scores and Editorial Decisions

Of the 100 submitted manuscripts in the analysis, 75 were sent for peer review and 25 were rejected without peer review. Of the 75 sent for peer review, 41 received an invitation to revise and 34 were rejected after peer review. Ultimately, 35 manuscripts were accepted for publication and 65 were rejected. For logistical reasons, some accepted manuscripts will be published in a regular issue of JGIM rather than in the special issue.
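The decision counts above can be reconciled with a few lines of arithmetic. The Python sketch assumes, as the three initial decision categories imply, that every accepted manuscript passed through the revise-after-review stage; under that assumption it derives how many manuscripts invited to revise were not ultimately accepted.

```python
# Counts reported in the text.
submitted = 100
sent_for_review, rejected_without_review = 75, 25
invited_to_revise, rejected_after_review = 41, 34
accepted, rejected = 35, 65

# Each stage partitions the manuscripts that reached it.
assert sent_for_review + rejected_without_review == submitted
assert invited_to_revise + rejected_after_review == sent_for_review
assert accepted + rejected == submitted

# Assumption: accepted manuscripts are a subset of those invited to revise.
not_accepted_after_revision = invited_to_revise - accepted

# Rejections at the three stages should add up to the final total.
assert rejected_without_review + rejected_after_review + not_accepted_after_revision == rejected

print(not_accepted_after_revision)             # 6
print(round(accepted / invited_to_revise, 2))  # 0.85
```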

MERSQI scores were associated with the initial editorial decision to send a manuscript for peer review versus reject it without review [OR 1.31 for a one-point increase in MERSQI score; 95% confidence interval (95% CI) 1.07–1.61, p = 0.009] and to invite revision versus reject after review (OR 1.29; 95% CI 1.05–1.58, p = 0.02). MERSQI scores also predicted final acceptance versus rejection (OR 1.32; 95% CI 1.10–1.58, p = 0.003). Thus, each one-point increase in MERSQI score was associated with a 32% increase in the odds of manuscript acceptance.
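Because logistic regression models the log odds, a per-point odds ratio compounds multiplicatively across points and is not a ratio of probabilities. The Python sketch below uses only the reported final-decision estimate (OR 1.32; 95% CI 1.10–1.58) to show the odds ratio implied by a multi-point difference, how a hypothetical baseline acceptance probability would shift, and a rough back-calculation of the standard error and p value from the confidence interval. The 35% baseline is an illustrative assumption, and because the published values are rounded, the recovered p value only approximately matches the reported one.

```python
import math
from statistics import NormalDist

OR_PER_POINT = 1.32           # reported OR for final acceptance, per MERSQI point
CI_LOW, CI_HIGH = 1.10, 1.58  # reported 95% CI


def or_for_difference(or_per_point: float, points: float) -> float:
    """Odds ratio for a `points`-point difference, assuming log odds linear in the score."""
    return or_per_point ** points


def shifted_probability(p0: float, or_per_point: float, points: float) -> float:
    """Probability after a `points`-point increase from baseline probability p0."""
    odds = p0 / (1 - p0) * or_for_difference(or_per_point, points)
    return odds / (1 + odds)


def wald_from_ci(or_hat: float, low: float, high: float, level: float = 0.95):
    """Approximate SE of log(OR), z statistic, and two-sided p value from a Wald CI."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    se = (math.log(high) - math.log(low)) / (2 * z_crit)
    z = math.log(or_hat) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


print(round(or_for_difference(OR_PER_POINT, 2), 2))  # 1.74
# Hypothetical manuscript with a 35% baseline chance of acceptance, one point higher.
print(round(shifted_probability(0.35, OR_PER_POINT, 1), 2))
se, z, p = wald_from_ci(OR_PER_POINT, CI_LOW, CI_HIGH)
print(round(se, 3), round(z, 2), round(p, 4))
```

A two-point difference therefore corresponds to roughly 1.74 times the odds, not 2 × 1.32.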

The mean total MERSQI score of the 35 accepted manuscripts was significantly higher than that of the 65 rejected manuscripts [10.7 (2.5) versus 9.0 (2.4), p = 0.003] (Table 2). Accepted manuscripts received higher mean MERSQI scores than rejected manuscripts in the domains of sampling [2.1 (0.6) versus 1.8 (0.6), p = 0.03], validity of evaluation instruments’ scores [0.9 (1.0) versus 0.5 (0.8), p = 0.02], data analysis [2.7 (0.6) versus 2.4 (0.7), p = 0.01], and outcomes [1.5 (0.5) versus 1.3 (0.5), p = 0.006] (Figure 1). MERSQI scores were similar for accepted and rejected manuscripts in the domains of study design and type of data.

Figure 1

Quality of rejected and accepted manuscripts in the 2008 JGIM medical education special issue. Calculated for 65 rejected and 35 accepted manuscripts. Columns represent mean domain-specific MERSQI scores; error bars represent standard deviations. The maximum possible domain-specific MERSQI score is 3.


The quality of manuscripts submitted to the 2008 JGIM medical education special issue was modest. Most submissions described single-institution studies using cross-sectional designs and reporting satisfaction or opinion outcomes. However, our results indicate that high-quality submissions, as measured by MERSQI scores, were ultimately selected for publication. As a result, many of the accepted manuscripts are outstanding examples of methodologically rigorous medical education research.

Few submitted manuscripts reported validity evidence for scores from evaluation instruments. MERSQI scores were lowest in this domain for both accepted and rejected manuscripts. This is consistent with prior observations that many categories of validity evidence are underreported in medical education studies.12 However, descriptions of validity evidence for evaluation instruments’ scores were associated with acceptance for publication, suggesting that reviewers and editors agree that validity evidence is important. Because studies that use measurement instruments with weak or no validity evidence are less likely to be published, authors are advised to gather validity evidence in the beginning stages of study design and implementation. Published frameworks that describe and classify validity evidence may facilitate this effort.7,13,14

Fewer than one quarter (22%) of studies submitted to this issue were multi-institutional, and very few measured learner behaviors (7%) or health care outcomes (5%). These results confirm prior assertions that multi-institutional studies examining clinically relevant outcomes are lacking. Given appeals for greater generalizability5,6 and clinically relevant education research,8,9,11 multi-institutional studies measuring higher-level outcomes should be prioritized where appropriate for the research question and study aims.

The associations between MERSQI scores and editorial decisions have meaningful implications. First, they provide evidence for the predictive validity of MERSQI scores, supporting the instrument’s role as a measure of education research study quality. Second, the MERSQI may facilitate peer review and editorial decision-making. For example, editors could use it to screen articles for peer review versus rejection or to help resolve discordant peer reviews. Because peer reviewers frequently disagree15 and reviews may be influenced by relationships with authors,16 MERSQI scores could help standardize the peer review process and identify important methodological issues. Third, the association between MERSQI scores and editorial decisions supports the credibility of the editorial process by showing that editors’ decisions are congruent with an established measure of methodological quality.

We acknowledge that the MERSQI focuses solely on methodological quality. Study methods are only one aspect of the multifaceted “quality” of a manuscript. Other important aspects include the quality of the research question,17,18 accuracy of interpretations drawn from study results,19 and the quality of reporting.20 Yet the methods largely determine the confidence one can place in the interpretations drawn from study results. MERSQI scores now have substantial validity evidence supporting their use in assessing the methodological quality of medical education scholarship, and this instrument should thus prove useful to educators, scholars, and editors.

This study has several limitations. First, we assigned MERSQI scores to manuscripts at the time of initial submission but did not re-score manuscripts after revision. Although many MERSQI items are unlikely to change with revision (e.g., study design, number of institutions, response rate, outcomes), errors in data analysis and gaps in reporting of validity evidence may be identified during peer review and corrected before publication. Therefore, MERSQI scores of published studies may be higher than those of the initial submissions. Second, we excluded qualitative studies from this analysis because fundamental differences in study design, sampling, evaluation instruments, and analysis preclude summative comparison to other study types.21,22 Although we were unable to assess the quality of qualitative manuscripts using the MERSQI, we observed that similar percentages of qualitative and quantitative submissions were accepted for publication (31% and 35%, respectively), suggesting that editors value both approaches to education research. Finally, we examined the quality of studies submitted to a single journal, which limits the generalizability of our findings to other journals. However, the mean total MERSQI score in this sample [9.6 (SD 2.6)] is similar to that of a sample of published studies from 13 peer-reviewed journals, including general medicine, subspecialty medicine, and medical education journals [9.9 (2.3)].10

Limitations notwithstanding, this study characterizes the quality of a sample of submitted medical education manuscripts and identifies their methodological strengths and limitations. The results also provide predictive validity evidence for MERSQI scores as a measure of the quality of medical education research. The MERSQI may be a useful tool for education researchers, reviewers, and journal editors to gauge the quality of submitted and published education scholarship.