Proposed Standards for Medical Education Submissions to the Journal of General Internal Medicine
To help authors design rigorous studies and prepare clear and informative manuscripts, improve the transparency of editorial decisions, and raise the bar on educational scholarship, the Deputy Editors of the Journal of General Internal Medicine articulate standards for medical education submissions to the Journal. General standards include: (1) quality questions, (2) quality methods to match the questions, (3) insightful interpretation of findings, (4) transparent, unbiased reporting, and (5) attention to human subjects’ protection and ethical research conduct. Additional standards for specific study types are described. We hope these proposed standards will generate discussion that will foster their continued evolution.
KEY WORDS: medical education; scholarship; research design; research methods; writing
As part of its mission to serve the needs of generalist physicians, the Journal of General Internal Medicine (JGIM) publishes a substantial number of medical education articles. In 2007, JGIM published 58 articles related to medical education, and the 2008 medical education special issue alone contains over 40 articles and editorials.
Journal editors are responsible for selecting manuscripts most relevant to their readership and of the highest quality. JGIM has embraced efforts to evaluate and improve the quality of its medical education publications.1 In this issue, Reed et al.2 report a study evaluating the quality of all submissions to this medical education special issue. Their study noted wide variation in methodological quality among the submissions, yet also found that the highest quality manuscripts were ultimately selected for publication. This study has generated much discussion among the Journal editors, and we anticipate that it will likewise stimulate dialogue in the general community.
Moving forward, we wish to articulate the standards by which medical education submissions to JGIM are currently judged. The guidelines that follow reflect other published guidelines,3, 4, 5, 6 with an emphasis on the types of studies commonly submitted to JGIM and the issues they raise. We developed and refined these guidelines as we reviewed submissions for this special issue and made editorial decisions. By articulating these guidelines we hope to: (1) help scholars design rigorous studies and prepare clear and informative manuscripts, (2) assist manuscript reviewers in providing high quality critiques, (3) improve the transparency of editorial decisions, and (4) raise the bar on educational scholarship published in JGIM. A concise summary of these guidelines is available on the JGIM website and in the Appendix. We recognize these guidelines represent our views, but hope they will stimulate discussion among the broader community of education researchers and journal editors.
WHAT CONSTITUTES QUALITY?
JGIM embraces the broad concept of scholarship outlined by Boyer7 and will consider manuscripts demonstrating high-quality scholarship of any type. We also endorse the six standards described by Glassick8 for assessing the quality of scholarship: clear goals, adequate preparation, appropriate methods, significant results, effective presentation, and reflective critique. Quality is multifaceted and includes not only rigorous research methods,9 but also starting with an important question or goal,10, 11, 12, 13 interpreting results objectively and making valid inferences,4,14,15 and reporting all of these clearly.4,16 In the discussion that follows, we review these principles concisely and use them to frame recommendations for medical education manuscripts submitted to JGIM.
The research question is arguably the most important part of any scholarly activity.10,17,18 The research question can be framed in many ways (purpose, objective, goal, aim, or hypothesis), but should illustrate the relationship between the variables being studied (population, independent, and dependent variables). A focused question dictates appropriate methods and frames the interpretation of results.
JGIM emphasizes research relevant to the needs of generalist physicians, which includes both applied and theoretical education research. Education researchers can ask questions ranging19 from concrete and practical (“How can we effectively teach students to perform medication reconciliation?”) to more abstract (“Why do faculty feel certain student behaviors are appropriate, while others are not?”).
Asking good questions requires a firm grasp of what has been previously done.13,17,20 Adequate preparation is a hallmark of scholarship,8 and is typified by a thorough and critical literature review that culminates in a “problem statement”11 highlighting the gap in understanding that the present study seeks to fill. Yet medical education research studies frequently lack a critical literature review.16 Without demonstration of adequate preparation, it is impossible to judge how a scholarly effort advances the field. Authors should present a concise but thorough examination of relevant literature, including strengths and weaknesses of previous studies.
Some questions are more important than others based on their implications for practice and research. Many questions are important by virtue of their relevance to pressing issues such as work hours or health disparities. However, importance is best supported through the use of a conceptual framework.11,13,21 A conceptual framework “situates the research question, intervention methods, or study design within a model or theoretical framework that facilitates meaningful interpretation of the methods and results”16 and subsequent application to new settings and future research. While frameworks may take the form of formal theories,22 more often they are models for how things work (Glassick’s criteria8 provide a framework for thinking about quality of scholarship) or systematic approaches to a problem (for example, an approach to the study of computer-based learning23). Questions that incorporate conceptual frameworks, and seek to clarify educational processes,24 will be most relevant to other educators and researchers. Unfortunately, conceptual frameworks are frequently absent from reported education research.16
Methods to Match the Question
Authors should select methods appropriate for the question. Guidelines relevant to specific study types are outlined below and in other sources.25, 26, 27, 28, 29, 30 In general, authors should explain critical methodological decisions, particularly when decisions lead to unusual or suboptimal methods. Justification can be logistical (“it was not feasible to randomize”), logical (“after careful consideration of various options, we decided to ... because ...”), or supported by literature.
Outcomes studied should also match the study goals. Studies that aim to improve knowledge should measure knowledge, and studies designed to improve skills should measure skills. Unfortunately, we often see this principle violated. Given concerns about the accuracy of learners’ subjective (i.e., self-assessed) ratings of knowledge or skills,31 objective assessments are preferred over subjective measures. Although “higher-order” outcomes (behaviors in practice or patient outcomes32) are desirable,33,34 they are not currently the standard35 and may inadvertently weaken a study if measurements are questionable or outcomes do not align with objectives.
Authors should use appropriate statistical tests. Among the errors we commonly see are failure to adjust for multiple independent comparisons,36 use of statistical tests of inference without verifying underlying assumptions,37 and ambiguity about the statistical tests used. Investigators should consider collaboration and/or consultation with a statistician beginning in the planning stages (when the study design can still be adjusted and strengthened).
Glassick’s “significant results” refer not to statistical significance, but rather to the impact of results on the field–in this case, the needs of generalist educators and researchers. A good question aligned with rigorous methods will facilitate relevance and defensible interpretations, but meaningful inferences also require objective analysis and reflective critique.38 In addition to reporting percentages and statistical test results, a reflective scholar will identify strengths and shortcomings, situate the work in the context of prior studies, and identify immediate applications (often few) and directions for future research.4
No study is perfect, and even modestly flawed studies can support valid inferences. Authors should carefully consider how best to convey the study findings and integrate these with prior work without overstating the scope or importance of a study (over-generalization) or understating either the limitations or the implications of their work. Finding this delicate balance constitutes the art of reflective critique.8
Even the best study will fail to have an impact without effective communication of findings. Yet we know that medical education research reporting has much room for improvement.16,39,40 Authors should consult appropriate guidelines,3,4,37 follow JGIM’s “Instructions to Authors,” and obtain assistance if needed to clearly communicate their results. In addition to using accepted or prescribed organizing headings, authors must use clear, concise language and avoid jargon (i.e., locally developed or specialty-specific terminology).
In quantitative research, authors should report means and standard deviations for continuous variables, numerator and denominator (not just percentages) for categorical variables, and in all cases emphasize confidence intervals and effect sizes rather than p values alone.37,41,42 Qualitative research should report specific themes along with supporting quotations and excerpts. Abstracts should be as “informative” as possible.40,43
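As an illustration of the quantitative reporting recommended above, the following minimal Python sketch computes the per-group mean and standard deviation, a standardized effect size (Cohen's d with a pooled standard deviation), and a large-sample confidence interval for a difference in means. The function names and the example scores are hypothetical and are not drawn from any JGIM submission. The interval relies on a normal approximation, so a small study would more appropriately use a t-based interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize(scores):
    """Return (n, mean, sample standard deviation) for a list of scores."""
    return len(scores), mean(scores), stdev(scores)


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled sample SD."""
    n_a, m_a, s_a = summarize(group_a)
    n_b, m_b, s_b = summarize(group_b)
    pooled_var = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)
    return (m_a - m_b) / sqrt(pooled_var)


def mean_difference_ci(group_a, group_b, level=0.95):
    """Normal-approximation confidence interval for mean(a) - mean(b)."""
    n_a, m_a, s_a = summarize(group_a)
    n_b, m_b, s_b = summarize(group_b)
    standard_error = sqrt(s_a**2 / n_a + s_b**2 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = m_a - m_b
    return diff - z * standard_error, diff + z * standard_error


if __name__ == "__main__":
    # Hypothetical post-test scores (percent correct) for two learner groups.
    intervention = [78, 85, 90, 72, 88, 81, 94, 79]
    control = [70, 75, 82, 68, 77, 73, 85, 71]

    for label, scores in (("Intervention", intervention), ("Control", control)):
        n, m, sd = summarize(scores)
        print(f"{label}: n={n}, mean={m:.1f}, SD={sd:.1f}")

    low, high = mean_difference_ci(intervention, control)
    print(f"Cohen's d = {cohens_d(intervention, control):.2f}")
    print(f"95% CI for the mean difference: {low:.1f} to {high:.1f}")
```

Reporting the interval and the effect size alongside (or instead of) a p value lets readers judge the magnitude and precision of a difference, not merely whether it crossed a significance threshold.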
ETHICAL ISSUES IN EDUCATION RESEARCH
Education studies pose risks to human subjects,44 yet many reported studies fail to comment on human subjects’ protections.16 The power differential between the teacher and trainee, similar to that between the physician and patient, characterizes the trainee as a “vulnerable subject.” Furthermore, educational outcomes, both measured (e.g., grades) and unmeasured (e.g., acquired knowledge), are important to learners and can have lasting effects. Institutional review board (IRB) and informed consent requirements for education research vary across institutions and study designs. Investigators should obtain IRB review (with approval, exemption, or waiver as appropriate) and then follow JGIM’s “Instructions to Authors” to “include a statement about informed consent and institutional review board approval in the methods section.” Regardless of local requirements, investigators should diligently protect human subject rights.
Scholars should also adhere to standards of scientific integrity. Authors must be able to take public responsibility for the full content of an article to justify authorship, meaning they have not only read it, but have contributed meaningfully to the ideas presented. JGIM strongly opposes ghostwriting,45 honorary authorship, undisclosed conflicts of interest,46 duplicate publication, and plagiarism. Reporting a study’s findings as a series of separate articles in order to maximize the number of publications is inappropriate.47
STANDARDS FOR SELECTED STUDY TYPES
Below we highlight key or often-neglected quality elements for specific study “types.” These types comprise a mixture of study purposes, designs, methods, and manuscript categories frequently found among JGIM medical education submissions, and are neither mutually exclusive nor all inclusive. Many studies will use a combination of these types.
This is not a comprehensive list of standards, even for a given type. Absence from discussion below does not mean a quality element is unimportant, but might simply mean we perceive it as less frequently problematic. Authors should continue to refer to relevant sources to guide the systematic design, conduct, and reporting of research.26, 27, 28, 29,48, 49, 50 Likewise, the absence of a specific study type does not indicate that we do not value such scholarship. We do not address systematic reviews here, but would welcome rigorous reviews on important education questions51 and refer authors to published guidelines.52, 53, 54 Similarly, we accept and encourage theory-building and programmatic research.21,22,24
JGIM’s “Educational Innovations” are “succinct descriptions of innovative approaches to improving medical education” and often represent the product of scholarship of teaching.7 The JGIM “Instructions for Authors” contain detailed specifications. The most important part of an “Innovation” study is demonstration that it is indeed innovative. This necessitates documentation of a thorough literature search. Evaluations of activities that have already been described might merit publication as “Original Research,” but they are not appropriate as Innovations. Yet even when an idea has never been previously described, a diligent search will invariably identify previous work (empiric and theoretical) to support the approach followed. Scholarly innovations do not appear from thin air; they build on prior work.
Authors must describe the innovation, including both the educational objectives and the innovation itself, sufficiently well that a reader could implement or adapt the innovation at his/her own institution. As most of these articles represent the scholarship of teaching rather than research, a rigorous evaluation of the innovation is not mandatory. However, only the most innovative and best prepared ideas will merit publication without adequate evaluation. As the degree of innovation goes down, the evaluation rigor must go up. Even then, the key to a successful Educational Innovation publication will be a novel, well-described idea that addresses an important need, has an adequate theoretical/empirical foundation, and builds on prior work.
Authors must demonstrate reflective critique by discussing what went well, what did not work as planned, how and why results vary from other studies, and areas for improvement and future research. Honesty and candor are not penalized. Indeed, an innovation with neutral or unexpectedly negative effects may be as important to publish as an innovation with statistically significant positive effects, or more so. However, the usual caveats of sample size, sensitivity of outcome measures, and strength of intervention apply in studies showing no effect.
Survey Research
Much medical education research relies on a survey as a means of collecting data. Although this is strictly a method rather than a study type, its ubiquity justifies a brief discussion.
Surveys are subject to various sources of bias. They are susceptible to researcher bias in the wording of questionnaire items and the sample selection. Low response rates also introduce possible bias. Surveys often generate large amounts of data, which introduce the danger of bias from conducting multiple statistical analyses and then reporting only the statistically significant findings. Lengthy surveys can also breed long Results sections in which key points are lost amidst excessive data. We propose the following as a starting point for studies using surveys (in addition to the general standards of scholarship noted above) and refer authors to other sources26,55 for details.
The research question should be clearly stated and justified. This will focus authors’ collection, analysis, and presentation of data, and also ensure that the survey addresses an important issue.
Based on the research question, a study sample should be selected to reflect the population to which results will be generalized.
The questionnaire must have evidence to support the validity of its scores for answering this research question (see guidelines for the development and evaluation of assessment tools detailed below). If the study uses a previously published instrument, validity evidence should be concisely summarized and referenced. If the instrument is new, the study at a minimum should report evidence of content (breadth and depth of coverage of topic, systematic development process, qualifications of item writers, expert review, and pilot testing) and score reliability (from pilot testing, actual administration, or both).
Authors should report information on the format of survey administration (mail, Web, phone, other) and describe methods used to encourage response. Although there is no universal definition of an adequate response rate, authors should keep the response rate in mind while interpreting results.
It is virtually impossible to avoid investigator bias in studies that conduct multiple analyses and then report only those that are significant or interesting. The best defense against such problems is to develop a focused research question and plan all analyses in advance. When reporting, authors should describe all analyses conducted (including those whose results are not reported). Authors should account for multiple independent comparisons using methods such as omnibus tests of statistical significance or Bonferroni’s adjustment.36 The Results and Discussion should highlight key points that support a clear message.
Authors should generally report verbatim the survey questions, along with any scoring rubrics, either in a table (reporting questions and results in the same table) or as an appendix. It is rarely necessary to publish the actual instrument, and reporting only the questions saves space. If not all questions are reported, authors should report at least a few examples of typical questions.
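The adjustment for multiple comparisons mentioned in the survey guidelines above can be sketched in a few lines of Python. This is a minimal illustration of Bonferroni's adjustment only; the function name, the default significance level, and the example p values are hypothetical. Other adjustment methods exist, and the choice among them is best made with a statistician during study planning.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply Bonferroni's adjustment to a list of unadjusted p values.

    Returns one (unadjusted p, adjusted p, significant) tuple per comparison.
    A comparison is significant when its unadjusted p value does not exceed
    alpha divided by the number of comparisons.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)  # adjusted p values are capped at 1
        results.append((p, adjusted, p <= threshold))
    return results


if __name__ == "__main__":
    # Hypothetical p values from four planned comparisons in one survey.
    planned_comparisons = [0.004, 0.020, 0.030, 0.410]
    for p, adjusted, significant in bonferroni(planned_comparisons):
        print(f"p={p:.3f}  adjusted p={adjusted:.3f}  significant={significant}")
```

The number of comparisons passed to the function should include every analysis conducted, not only those reported, which is one reason to plan all analyses in advance.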
Needs Analysis Studies
“Needs analyses” are intended to identify the current state of a specific medical education issue. These frequently address potential deficiencies in content knowledge, but can also explore other educational “gaps,” such as work hour violations or inequities in academic promotion. Most studies evaluating educational interventions will have at least a rudimentary needs analysis, but studies designed as needs analyses face a higher bar.
Needs analyses can employ a variety of methods, including surveys and tests, focus groups, chart audits, task analyses, and literature reviews, but all pose challenges. First, such studies are particularly susceptible to researcher biases and special interests. If we looked hard enough, we would likely conclude that every issue in medical education has unmet needs–at least through the eyes of a person with a particular interest in that topic. Second, the results of a needs analysis depend greatly on the participants sampled and the instrument used. Unfortunately, we frequently see needs analyses employing poorly designed measures administered to convenience samples (e.g., a locally developed and administered anatomy exam). Finally, needs analysis studies often collect far more information than can reasonably be (or needs to be) reported and are susceptible to data analysis problems discussed under Survey Research.
Thus, we propose that needs analysis manuscripts meet four minimum requirements (in addition to other relevant standards). First, the research question should be clearly stated and justified. Second, the study sample must reasonably represent the target population (typically a national scope). Since a deficiency at one institution rarely indicates a national need, single-institution needs analyses–though important to an institution–will generally meet with skepticism. Third, the outcome measures must have evidence to support the “plausible” validity of scores. Authors should generally quote verbatim at least a subset of the items including the scoring rubric (i.e., in a table or as an appendix). Fourth, the Results and Discussion should highlight a clear, concise take-home message.
Development and Evaluation of Assessment Tools
Studies describing the development, evaluation, or revision of assessment tools employ a variety of designs, but in all cases the investigators seek to support the validity of an instrument’s scores for making specific inferences.56,57 Rather than try to address all possible study designs, we will discuss a framework or approach to validity that will facilitate high-quality studies.
The current conceptualization of validity unifies all different “types” of validity (content, criterion, construct, etc.) as “construct validity.”6,58, 59, 60, 61 Instruments are intended to generate scores reflecting some underlying construct, and validity is the degree to which scores truly reflect that construct.6 We emphasize that validity is a property of scores, not instruments. Instruments are not inherently valid, but rather scores are valid for a particular purpose.
Validity is best viewed as a hypothesis supported by evidence from various sources.61 As with any hypothesis, validity cannot be proven. Rather, investigators should create a validity argument62 by first stating an initial hypothesis about what construct the scores should reflect; second, collecting evidence (see below) to support or refute that hypothesis (ideally testing the weakest assumption first); third, revising the hypothesis (either the instrument, the construct, or the context of application) if needed; and fourth, repeating the second and third steps until sufficient evidence has been collected to support (or reject) the validity argument. The sufficiency of evidence will vary depending on the application (a high-stakes Boards exam will require more evidence than a medical school second-year midterm). The evidence should answer the question: is it plausible that the scores reflect the intended construct?
There are five currently accepted sources of validity evidence6: content (how well does the instrument match the intended construct domain?), response process (how do idiosyncrasies of the actual responses affect scores?), internal structure (typically psychometric data such as reliability or factor analysis), relations to other variables (how do scores relate to other variables that purport to measure a similar or different construct?), and consequences (do the scores make a difference?63). A publishable validity study will present data from several (but rarely all) complementary sources of evidence,64 and ideally address the most critical or questionable aspects of the validity argument. Instruments intended for broad use often warrant a series of studies. Other sources contain further details and examples.56,58,59,61 We discourage use of the term face validity61,65 and note that this term is frequently misused to allude to content evidence.
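As one concrete example of internal structure evidence, score reliability is often summarized with an internal consistency coefficient such as Cronbach's alpha. The Python sketch below is illustrative only: the choice of coefficient, the function name, and the response data are our assumptions for the example, and a single reliability coefficient addresses just one of the five sources of evidence described above.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of numeric scores.

    `responses` is a list of rows, one row per respondent, each row holding
    that respondent's score on every item. Sample variances are used.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance(item) for item in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical ratings from five respondents on a four-item scale (1 to 5).
    ratings = [
        [4, 5, 4, 4],
        [2, 3, 2, 3],
        [5, 5, 4, 5],
        [3, 3, 3, 2],
        [1, 2, 2, 1],
    ]
    print(f"Cronbach's alpha = {cronbach_alpha(ratings):.2f}")
```

Because validity is a property of scores rather than instruments, a coefficient computed in one sample supports inferences for that population and purpose; it does not certify the instrument for other settings.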
Investigators can employ a similar approach when evaluating or adapting instruments for use in a particular study. When reporting the use of a previously described instrument, authors should briefly summarize the evidence supporting its scores for this application. For example, authors might write, “Felder and Solomon developed the Index of Learning Styles (ILS) to assess the ... learning style dimensions defined by Felder and Silverman. ... [Studies] have used internal consistency, test-retest reliability, and factor analysis to support the internal structure of ILS scores. ILS scores have also been shown to discriminate college students with different majors and college students from faculty.”66
Although much education research evaluates the outcomes of specific interventions, we touch only lightly on this study type because other sources25, 26, 27, 28,67,68 provide adequate guidance for authors. Guidelines developed for behavioral interventions in clinical medicine and public health, such as the TREND guidelines,69 STROBE statement,70 and CONSORT extension,71 are also relevant. Authors should highlight an empirical or theoretical grounding for the intervention, focus on a gap in theory or educational practice, and conduct an appropriate evaluation using outcomes aligned with both the educational intervention and the study goals. Conceptual frameworks are useful for both applied and theory-building work.24 Randomized designs are not required, but authors must carefully consider relevant validity threats.38,67
Qualitative research will continue to proliferate as researchers recognize the limitations of quantitative methods in answering many important questions and gain necessary skills.72 JGIM has long supported such studies.73 However, such studies must adhere to rigorous standards.30,74, 75, 76, 77, 78, 79 Key standards include a focused research question; appropriate sampling and data collection methodologies; inductive analytic methods that promote trustworthiness, credibility, dependability, and transferability (duplicate coding, triangulation, member checks, saturation, peer review, etc.); results that demonstrate a clear logic of inquiry and present appropriate data (i.e., themes and supporting quotations); and a synthesis with clear conclusions. We encourage use of accepted qualitative paradigms or approaches (grounded theory, ethnography, discourse analysis, etc.). Mixed methods approaches (using both quantitative and qualitative methods) can often answer questions better than either approach alone.
We anticipate that these standards will generate discussion and that they will continue to evolve with input from the education research community. In the meantime, JGIM editors will use these guidelines as part of the evaluation process for manuscripts received. We hope that medical education scholars will welcome our attempt to continue to raise the bar, and respond by submitting high-quality work to this journal and thereby advance scholarship in medical education.
Conflict of Interest
REFERENCES
5. Education Group for Guidelines on Evaluation. Guidelines for evaluating papers on educational interventions. BMJ. 1999;318:1265–7.
6. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 1999.
7. Boyer EL. Scholarship Reconsidered: Priorities of the Professoriate. Princeton, NJ: Carnegie Foundation for the Advancement of Teaching; 1990.
8. Glassick CE, Huber MT, Maeroff G. Scholarship Assessed: Evaluation of the Professoriate. San Francisco, CA: Jossey-Bass; 1997.
11. McGaghie WC, Bordage G, Shea JA. Problem statement, conceptual framework, and research question. Acad Med. 2001;76:923–4.
14. Regehr G. Reporting of statistical analyses. Acad Med. 2001;76:938–9.
15. Regehr G. Presentation of results. Acad Med. 2001;76:940–2.
25. Green JL, Camilli G, Elmore PB. Handbook of Complementary Methods in Education Research. Mahwah, NJ: Lawrence Erlbaum; 2006.
26. Fraenkel JR, Wallen NE. How to Design and Evaluate Research in Education. New York, NY: McGraw-Hill; 2003.
27. Cook TD, Campbell DT. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Boston: Houghton Mifflin; 1979.
28. Cronbach LJ. Designing Evaluations of Educational and Social Problems. San Francisco: Jossey-Bass; 1982.
29. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. 3rd ed. Hamilton: BC Decker; 2007.
30. Miles MB, Huberman AM. Qualitative Data Analysis: An Expanded Sourcebook. Thousand Oaks, CA: Sage; 1994.
32. Kirkpatrick D. Revisiting Kirkpatrick’s four-level model. Train Dev. 1996;50(1):54–9.
41. Thompson B. Research synthesis: effect sizes. In: Green JL, Camilli G, Elmore PB, eds. Handbook of Complementary Methods in Education Research. Mahwah, NJ: Lawrence Erlbaum; 2006:583–603.
55. Fink A. How to Conduct Surveys: A Step-by-Step Guide. Thousand Oaks, CA: Sage; 2005.
56. Kane MT. Validation. In: Brennan RL, ed. Educational Measurement. 4th ed. Westport: Praeger; 2006:17–64.
57. DeVellis RF. Scale Development: Theory and Applications. 2nd ed. Thousand Oaks, CA: Sage Publications; 2003.
67. Cook DA, Beckman TJ. Reflections on experimental research in medical education. Adv Health Sci Educ Theory Pract. 2008; Epub ahead of print 22 April 2008; DOI 10.1007/s10459-008-9117-3.
72. Harris I. Qualitative methods. In: Norman G, Van der Vleuten C, Newble D, eds. International Handbook of Research in Medical Education. Dordrecht: Kluwer Academic Publishers; 2002:97–126.