Plain language summary

Outcome measurement instruments are tools that measure certain aspects of health. When researchers want to know which tool is the best for their study, they do something called a systematic review. They gather all facts about these tools from the scientific literature, put them together, and then make a decision on which tool is the best for their research. The problem is that too many systematic review reports about these tools are missing important information. This makes it hard for readers of these reviews to understand them clearly and to pick the best tool. This study tried to solve this problem by creating a new guideline, called “PRISMA-COSMIN for Outcome Measurement Instruments”. This guideline helps researchers to report their reviews on tools in a clear and thorough way. The study identified 54 things that should be reported in any review of tools, covering everything from the report's title to the discussion section. PRISMA-COSMIN for Outcome Measurement Instruments will make the reporting of these reviews of tools better, so people can understand them and choose the right tool for their needs.

Introduction

An outcome measurement instrument (OMI) refers to the tool used to measure a health outcome domain. Different types of OMIs exist, such as questionnaires or patient-reported outcome measures (PROMs) and their variations, clinical rating scales, performance-based tests, laboratory tests, scores obtained through a physical examination or observations of an image, or responses to single questions [1, 2]. OMIs are used to monitor patients’ health status and evaluate treatments in research and clinical practice [3, 4]. Systematic reviews of OMIs synthesize data from primary studies on the OMIs’ measurement properties, feasibility, and interpretability to provide insight into the suitability of an OMI for a particular use [2]. Systematic reviews of OMIs are an important tool in the evidence-based selection of an OMI for research and/or clinical practice.

Several organizations have developed methodology for conducting systematic reviews of OMIs, including Outcome Measures in Rheumatology (OMERACT) [5], JBI (formerly Joanna Briggs Institute) [6], and the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative [2], the latter being the most widely used. Despite the availability of methodological guidance on the conduct of OMI systematic reviews, such reviews are often not reported completely [7,8,9]. For example, a recent study into the quality of 100 recent OMI systematic reviews shows that reporting is lacking on feasibility and interpretability aspects of OMIs, the process of data synthesis, raw data on measurement properties, and the number of independent reviewers involved in each of the steps of the review process (unpublished data). Incomplete reporting limits reproducibility and hinders the selection of the most suitable OMI for a specific application [10]. At present, a reporting guideline for systematic reviews of OMIs does not exist.

Reporting guidelines outline a minimum set of items to include in research reports, and their endorsement by journals has been shown to improve adherence, methodological transparency, and uptake of findings [11,12,13]. To improve the reporting of systematic reviews, the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guideline was developed, containing a checklist, an explanation and elaboration (E&E) document, and flow diagrams [14]. Endorsement of PRISMA has resulted in improved quality of reporting and methodological quality of systematic reviews [15]. PRISMA was updated in 2020 and is primarily focused on systematic reviews of interventions [16]. Although systematic reviews of OMIs share common elements with systematic reviews of interventions, there are also several differences: for example, in a systematic review of OMIs, multiple reviews (i.e., one review per measurement property) are often included [17], and effect measures and evidence synthesis methods are different in systematic reviews of OMIs. As such, some PRISMA 2020 items are not appropriate for systematic reviews of OMIs, other items need to be adapted, and some items that are important are not included.

There is thus a need for reporting guidance specifically for systematic reviews of OMIs [18], which might also help to reduce the ongoing publication of poor-quality reviews in the literature [7, 8]. This study therefore aimed to develop the PRISMA-COSMIN for OMIs 2024 guideline as a stand-alone extension of PRISMA 2020 [16]. New in reporting guideline development, this study also aimed to integrate patient/public involvement in the development of PRISMA-COSMIN for OMIs 2024, as patients/members of the public are ultimately impacted by the results of these systematic reviews.

Methods

Details on integrating patient/public involvement in the development of PRISMA-COSMIN for OMIs 2024, our lessons learned and recommendations for future reporting guideline developers are outlined elsewhere [19]. Patient/public involvement has been reported according to the GRIPP2 short form reporting checklist in the current manuscript [20].

Project launch and preparation

We registered the development of PRISMA-COSMIN for OMIs 2024 on the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) website [21] and the Open Science Framework [22]. Figure 1 shows the PRISMA-COSMIN for OMIs 2024 development process. A protocol was published previously [23] and Online Resource 1 states deviations from the protocol. The protocol details the project launch, preparation and PRISMA-COSMIN for OMIs 2024 item generation process. Briefly, a steering committee for project oversight, including a patient partner, and a technical advisory group for support and feedback were appointed (Online Resource 2 shows group membership). In the item generation process, we used PRISMA 2020 as the framework on which to modify, add, or delete items [16]. Potential items were identified by searching the literature for scientific articles and existing guidelines that describe potentially relevant reporting recommendations [2, 5, 6, 16, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51]. We applied this initial list of items to three different types of OMI systematic reviews: a systematic review of all available PROMs that measure a certain outcome domain in a certain population [52], a systematic review of one specific PROM [53], and a systematic review of a non-PROM (digital monitoring devices for oxygen saturation and respiratory rate) [54]. Application of the initial item list to these systematic reviews resulted in supporting, refuting, refining and supplementing the items. Findings were shared with the steering committee and technical advisory group, resulting in a list of preliminary items that were presented during the first round of the Delphi study [23].

Fig. 1 Development process of PRISMA-COSMIN for OMIs 2024. E&E explanation and elaboration; EQUATOR Enhancing the QUAlity and Transparency Of health Research; OMI outcome measurement instrument

Delphi study

We conducted a 3-round international Delphi study between April and September 2022 using Research Electronic Data Capture (REDCap) [55]. The aim of the Delphi study was to obtain consensus on the inclusion and wording of items for PRISMA-COSMIN for OMIs 2024. We invited persons involved in the design, conduct, publication, and/or application of systematic reviews of OMIs as panelists. They were identified by the steering committee (researchers in the steering committee were not able to participate) and from other relevant Delphi studies [1, 40, 56,57,58]. Persons who co-authored at least three systematic reviews of OMIs, identified through the COSMIN database for systematic reviews [59], were also invited. Invitees could forward the invitation to other qualified colleagues. Besides the patient partner, we selected five patients/members of the public to join through newsletters and contact persons of relevant organizations [60,61,62,63]. Patients/members of the public attended a 90-min virtual onboarding session led by the patient partner and project lead with information about the purpose of the study, OMIs, systematic reviews, reporting guidelines, and the Delphi method. Support was offered throughout the process, if needed.

Registered panelists were invited for each round, irrespective of their responses to previous rounds. Each round was open for approximately four weeks, and weekly reminders were sent starting two weeks after the initial invitation. For each proposed item, panelists indicated whether it should be reported in a systematic review of OMIs, and whether the wording was clear. Both questions were scored on a five-point Likert scale: strongly disagree, disagree, neutral, agree, strongly agree. Panelists could also opt to select ‘not my expertise’; these responses were excluded when calculating consensus. As decided a priori, consensus for inclusion was achieved when at least 67% of the panelists agreed or strongly agreed with a proposal [24, 56, 57, 64] and less than 15% disagreed or strongly disagreed [1, 58]. Panelists were encouraged to provide a rationale for their ratings and suggestions for improved wording.
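The consensus rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function name `reached_consensus`, the response labels as strings, and the example ratings are all hypothetical. It assumes that ‘not my expertise’ responses are removed from the denominator before the two cut-offs are applied.

```python
from collections import Counter

AGREE = {"agree", "strongly agree"}
DISAGREE = {"disagree", "strongly disagree"}


def reached_consensus(responses, agree_cutoff=0.67, disagree_cutoff=0.15):
    """Apply the a priori rule: >= 67% (strongly) agree and < 15% (strongly) disagree."""
    # 'not my expertise' responses do not count towards consensus
    rated = [r for r in responses if r != "not my expertise"]
    if not rated:
        return False
    counts = Counter(rated)
    agree = sum(counts[r] for r in AGREE) / len(rated)
    disagree = sum(counts[r] for r in DISAGREE) / len(rated)
    return agree >= agree_cutoff and disagree < disagree_cutoff


# Hypothetical ratings for one item (not data from the study)
ratings = (
    ["strongly agree"] * 40
    + ["agree"] * 30
    + ["neutral"] * 20
    + ["disagree"] * 10
    + ["not my expertise"] * 5
)
print(reached_consensus(ratings))  # True: 70% agree, 10% disagree among 100 raters
```

Note that both conditions must hold at once, so an item with strong support can still fail to reach consensus if 15% or more of the raters disagree.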

In round 1, panelists also voted on original PRISMA 2020 items that were thought to have limited relevance for systematic reviews of OMIs. For these items, panelists indicated whether they were indeed not applicable, using the five-point Likert scale described above. In addition, panelists were asked to suggest new items not included in the list. Round 2 of the Delphi study included all round 1 items (except original PRISMA 2020 items that achieved consensus for inclusion and wording), as well as any new items that were suggested during round 1. If panelists made compelling arguments for the deletion of an item in round 1, this was brought forward in round 2, where panelists indicated whether they agreed with the deletion. Round 3 included items that did not reach consensus during rounds 1 or 2, or items with modified wording.

Following each round, frequencies of responses across all panelists and for each group (academia, patients/members of the public, other) were calculated. The project lead (EE) reviewed and summarized qualitative arguments to identify arguments against the overall trend in frequencies. The steering committee checked the summaries of qualitative arguments. A feedback report detailing frequencies and all anonymized qualitative comments was created and shared with panelists in each subsequent round. Each subsequent round also included the summary of qualitative arguments, the percentage consensus for inclusion and wording, and panelists’ own rating from the previous Delphi round.
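As an illustration of the tabulation step, the sketch below counts ratings across all panelists and within each group. It is an assumption about how such a summary could be produced, not the procedure used in the study; the function name `response_frequencies`, the record layout, and the example records are hypothetical.

```python
from collections import Counter, defaultdict


def response_frequencies(records):
    """Count ratings overall and per panelist group.

    records: iterable of (group, rating) pairs for a single item.
    """
    overall = Counter()
    by_group = defaultdict(Counter)
    for group, rating in records:
        overall[rating] += 1
        by_group[group][rating] += 1
    return overall, dict(by_group)


# Hypothetical responses for one item (not data from the study)
records = [
    ("academia", "agree"),
    ("academia", "neutral"),
    ("patients/members of the public", "strongly agree"),
    ("other", "agree"),
]
overall, by_group = response_frequencies(records)
print(overall["agree"])               # 2
print(by_group["academia"]["agree"])  # 1
```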

Workgroup meeting

We held a 3-h hybrid workgroup meeting in Toronto, Canada, and through Zoom in November 2022. This meeting was held to reach agreement on the inclusion and wording of items that had no consensus for inclusion after round 3 of the Delphi study, or for items for which the wording was revised. The steering committee selected participants with a variety of backgrounds from diverse geographic locations from the Delphi panelists who completed all three rounds; however, we did not use the specific responses of panelists in the Delphi study as a criterion for their selection to participate in the workgroup meeting. Additionally, certain members of the technical advisory group, knowledge users, and a limited number of editors were invited, irrespective of their participation in the Delphi study.

Ten days before the meeting, all attendees received an information package via email, including 1) an agenda, meeting details, and practical preparation steps for the meeting, 2) a full list of items detailing their changes over the Delphi rounds, specifying the items that needed discussion at the meeting, 3) the feedback report from Delphi round 3, and 4) short bio statements and photos from participants in the workgroup meeting. Attendees were asked to review the information prior to the meeting. A pre-meeting was held with patients/members of the public to go over the aims and materials for the workgroup meeting.

A facilitator presented each item selected to be discussed, providing a summary of Delphi round 3 results orally and visually on slides. For items that needed agreement on wording, the chair of the meeting summarized main points, and final wording was decided. Where consensus for inclusion was required, attendees voted on each item via a poll. Voting options were “include”, “exclude”, or “abstain”, and ≥ 70% include/exclude was needed for consensus [65], not taking the abstainers into account. The meeting was audio recorded and a notetaker documented the results of each poll, as well as the final wording of the items agreed upon.
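The voting rule for the workgroup meeting can be expressed as a short function. This is a sketch under the assumption that the 70% threshold is applied to the votes cast for “include” or “exclude” only; the function name `workgroup_decision` and the vote counts in the example are hypothetical.

```python
def workgroup_decision(include, exclude, abstain=0, cutoff=0.70):
    """Return 'include', 'exclude', or 'no consensus' for one item.

    The abstain count is accepted for completeness but is deliberately
    left out of the denominator.
    """
    voters = include + exclude
    if voters == 0:
        return "no consensus"
    if include / voters >= cutoff:
        return "include"
    if exclude / voters >= cutoff:
        return "exclude"
    return "no consensus"


# Hypothetical vote counts (not data from the meeting)
print(workgroup_decision(include=15, exclude=5, abstain=4))   # include (15/20 = 75%)
print(workgroup_decision(include=10, exclude=10, abstain=0))  # no consensus
```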

Developing the guideline

Drafting the pre-final guideline

After the workgroup meeting, we drafted the pre-final guideline, consisting of 1) the PRISMA-COSMIN for OMIs 2024 checklists (a checklist for full reports and a checklist for abstracts) with a glossary explaining technical terms used; 2) their respective explanation and elaboration (E&E) documents, including a rationale and detailed guidance for the reporting of each item; and 3) the PRISMA-COSMIN for OMIs 2024 flow diagram. We invited workgroup participants to contribute to drafting the E&E document by signing up for specific items in teams of two writers and two reviewers. We made explicit effort to align the wording and structure with PRISMA 2020 [16], as this is expected to facilitate the usability and uptake of PRISMA-COSMIN for OMIs 2024.

Pilot testing

Authors in the process of drafting or publishing their systematic review of OMIs, or who recently (2022/2023) published their review, were eligible for pilot testing the pre-final guideline. Pilot testers were recruited through the network of the steering committee, by emailing corresponding authors of systematic reviews published in 2022/2023 included in the COSMIN database [59], and by emailing contact persons of ongoing or completed (but not yet published) systematic reviews of OMIs registered in PROSPERO between January 1, 2020, and January 1, 2023 [66]. Pilot testers received the pre-final guideline and were asked to apply it to their drafted, submitted or recently published systematic review of OMIs. Pilot testers provided feedback on the relevance and understandability of each item and its E&E text using a structured survey in REDCap [55]. Responses from pilot testers were reviewed and used to improve the guideline.

End-of-project meeting

We held a hybrid two-day end-of-project meeting in Toronto, Canada, and over Zoom in June 2023, with most members of the steering committee attending in person. The main goals of the meeting were to finalize the guideline based on the feedback from the pilot testers and discuss its implementation, dissemination, and endorsement. We held hybrid sessions of 60–90 min on Zoom with the following groups: patients/members of the public, journal editors, pilot testers, and data visualization/OMI systematic review experts. Two weeks before the meeting, attendees received an information package via email, including 1) the agenda, session aims, meeting details, and practical information, 2) the bios and photos from participants relevant to their session, and 3) any session-specific documents, if applicable. Attendees were asked to review the information ahead of the meeting.

Results

Delphi study

In total, 252 potential panelists were invited for the Delphi study, of whom 81 registered (response rate 32%). Additionally, 38 persons registered through referral. One person withdrew before the start of the first Delphi round, resulting in 118 invited panelists for each round. Of these, 109 panelists responded to at least one round (Online Resource 3a); their characteristics are presented in Table 1. Round 1 was completed by 103 panelists, whereas rounds 2 and 3 were completed by 78 panelists.
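The recruitment figures above can be reproduced with simple arithmetic. The snippet below is only a worked check using the numbers reported in this paragraph; the variable names are illustrative.

```python
invited = 252                  # potential panelists invited directly
registered_from_invites = 81   # invitees who registered
registered_by_referral = 38    # additional registrations through referral
withdrew = 1                   # withdrew before round 1

response_rate = registered_from_invites / invited
panel_size = registered_from_invites + registered_by_referral - withdrew

print(f"{response_rate:.0%}")  # 32%
print(panel_size)              # 118
```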

Table 1 Characteristics of Delphi panelists and participants in the workgroup meeting

In round 1, 49 potentially relevant items were proposed. Thirteen original PRISMA 2020 items reached consensus for inclusion and wording, whereas 4 original PRISMA 2020 items with limited relevance to systematic reviews of OMIs (related to data items, effect measures, and reporting biases) reached consensus for deletion (Fig. 2, Online Resource 4). Panelists made many qualitative arguments and suggestions for rewording. Wording was revised for all other items based on suggestions from panelists, and these items moved forward to round 2. For two items, related to the name and description of the OMI of interest and citing studies that appear to meet inclusion criteria but were excluded, panelists made compelling arguments for deletion in round 1. Panelists were asked to confirm deletion of these items in round 2, despite the high percentage of consensus for inclusion obtained in round 1.

Fig. 2 Proposals and consensus for items in each Delphi round and the workgroup meeting. COSMIN COnsensus-based Standards for the selection of health Measurement Instruments; OMI outcome measurement instrument; PRISMA Preferred Reporting Items for Systematic reviews and Meta-Analyses

While analyzing responses from round 1, we observed misunderstanding among panelists for the item pertaining to the abstract and for the items pertaining to the syntheses. Therefore, we extensively revised these items for round 2. Instead of one abstract item covering all elements, we added thirteen more specific abstract items, based on the PRISMA for Abstracts checklist [16, 67]. Three syntheses items were thought to be of limited relevance for systematic reviews of OMIs, and panelists were asked to confirm the deletion of these items in round 2. Based on suggestions for additional items, we drafted 1 new item pertaining to author contributions for consideration in round 2.

In round 2, 19 additional items reached consensus for inclusion and wording, whereas 4 items reached consensus for deletion (Fig. 2, Online Resource 4). Wording was revised for the other items based on suggestions of the panelists, and these items moved forward to round 3, despite having mostly high percentages of consensus for inclusion and wording.

In round 3, 12 additional items reached consensus for inclusion and wording (Fig. 2, Online Resource 4). Wording was slightly revised for 9 remaining items, although most of these items had high percentages of consensus for inclusion and wording. Consensus for inclusion was not reached for 2 items. These 11 items moved forward for discussion during the workgroup meeting.

Besides the confusion on the abstract item and syntheses items in round 1, panelists’ comments revolved around terminology for ‘studies’ and ‘reports’ as the unit of analysis within these types of reviews. Within the context of measurement property evaluation, there is ongoing confusion about what constitutes a ‘study’. To avoid such confusion among review authors, we suggested replacing the wording of the PRISMA 2020 items that ask to report “the number of studies included in the review” with “the number of reports included in the review”. Ultimately, consensus on terminology was reached (see Table 4 for definitions of ‘study’, ‘report’, and ‘study report’) and the term “study reports” was used in those items.

Notably, patient/public involvement impacted the inclusion of reporting items pertaining to 1) feasibility and interpretability of the OMI, 2) recommendations on which OMI (not) to use, and 3) the plain language summary. Although other Delphi panelists saw little relevance for these items in the first Delphi round, patients/members of the public felt strongly about including these items. Their arguments ultimately persuaded other Delphi panelists to vote for inclusion of these items.

Workgroup meeting

In total, 33 persons were invited to the hybrid workgroup meeting, of whom 24 (72%) attended the meeting (16 through Zoom, 8 in person). Attendees included nine steering committee members (one member was unable to attend), four members of the technical advisory group (all Delphi panelists), three knowledge users (two Delphi panelists), three patients/members of the public (all Delphi panelists), and five Delphi panelists (Online Resource 2). Their characteristics are presented in Table 1.

Through discussions, we reached agreement for wording for the 9 items that had their wording revised based on comments of panelists in Delphi round 3 (Fig. 2, Online Resource 4). Two items required voting on inclusion/deletion (one on citing reports that were excluded, one on author contributions). The first item reached 86% agreement for inclusion; the second item reached 76% agreement for deletion.
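The counts reported for the Delphi rounds and the workgroup meeting can be tallied as a consistency check. The sketch below assumes that each reported count refers to distinct (sub)items that were retained in the final checklist; that reading is an interpretation of the figures in the text, and the dictionary keys are illustrative labels.

```python
# Items reaching consensus for inclusion, as reported for each stage
included = {
    "Delphi round 1 (original PRISMA 2020 items)": 13,
    "Delphi round 2": 19,
    "Delphi round 3": 12,
    "workgroup meeting (wording agreed)": 9,
    "workgroup meeting (voted for inclusion)": 1,
}

total_included = sum(included.values())
print(total_included)  # 54
```

Under this assumption, the total agrees with the 54 (sub)items of the final checklist for full reports.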

Developing the guideline

All but three workgroup meeting participants contributed to drafting the E&E document for specific items. E&E text for each item was drafted by at least two writers and checked by at least two reviewers. Patients/members of the public signed up to be reviewers for reporting items that would benefit greatly from their input (e.g., items pertaining to the plain language summary, feasibility and interpretability of the OMI, and recommendations on which OMI (not) to use), as well as some other items, resulting in a clearer guideline. Select members of the steering committee made editorial edits for accuracy and consistency across items. A PRISMA-COSMIN for OMIs 2024 flow diagram was created.

We approached 515 potential pilot testers, of whom 65 registered (response rate 13%). Additionally, 27 persons registered through referral, resulting in 92 registered pilot testers. These pilot testers were all in the process of drafting or publishing their systematic review, or had recently published their review. Of these, 65 contributed to pilot testing by applying the guideline to their systematic review (Online Resource 3b); their characteristics are presented in Table 2. Pilot testers commented on the usability of the guideline and E&E document and made suggestions to improve clarity of the items and the E&E document.

Table 2 Characteristics of pilot testers

Seven members of the steering committee met in person for the two-day end-of-project meeting to finalize the guideline and E&E document based on the feedback of pilot testing, whereas two joined through Zoom (one was unable to attend). In addition, the following groups attended hybrid sessions: patients/members of the public (n = 5), journal editors (n = 7), pilot testers (n = 4), and data visualization/OMI systematic review experts (n = 6).

Feedback from pilot testing resulted in minor changes in wording and restructuring of the items, but not in changes to the content of the checklist. Most importantly, we changed the title of the section ‘other information’ to ‘open science’ and moved this section before the items on the introduction, consistent with the recently published CONSORT 2023 statement.

The PRISMA-COSMIN for OMIs 2024 guideline consists of a checklist for full systematic review reports with 54 (sub)items (Table 3), and a glossary of technical terms used (Table 4). The 13 items pertaining to the title and abstract are also included in a separate checklist that authors could use when drafting, for example, conference abstracts. Their respective E&E documents (Online Resource 5 shows the E&E for full reports) contain a rationale for each item, essential and additional elements, and quoted examples from a published systematic review of OMIs. The PRISMA-COSMIN for OMIs 2024 flow diagram is shown in Fig. 3.

Table 3 PRISMA-COSMIN for OMIs 2024 checklist with Abstract items featured

Fig. 3 PRISMA-COSMIN for OMIs 2024 flow diagram

Table 4 Glossary of terms used in PRISMA-COSMIN for OMIs 2024

Discussion

This paper outlines the development of PRISMA-COSMIN for OMIs 2024, including a Delphi study, workgroup meeting, pilot testing and an end-of-project meeting, and contains the checklist and E&E document for full reports. PRISMA-COSMIN for OMIs 2024 is intended to guide the reporting of systematic reviews of OMIs, in which at least one measurement property of at least one OMI is evaluated. These systematic reviews support decision making on the suitability of an OMI for a specific application. PRISMA-COSMIN for OMIs 2024 is not intended for reviews that only provide an overview (characteristics) of OMIs used, as these reviews are more scoping in nature. Systematic reviews of OMIs conducted with any methodology can use PRISMA-COSMIN for OMIs 2024; it does not apply specifically to systematic reviews conducted with the methodology or tools from the COSMIN initiative, although it is consistent with COSMIN guidance [2].

Similar to PRISMA 2020 [16], PRISMA-COSMIN for OMIs 2024 consists of two checklists (one for full reports and one for abstracts), their respective E&E documents, and a flow diagram. To develop PRISMA-COSMIN for OMIs 2024, we adapted PRISMA 2020 and made the following revisions to the checklist for full reports: 9 new items were added, 8 items were deleted because they were deemed not relevant for systematic reviews of OMIs, 24 items were modified, and 21 items were kept as original. This checklist thus contains 54 (sub)items addressing the title, abstract, plain language summary, open science, introduction, methods, results, and discussion sections of a systematic review report. The 13 items pertaining to the title and abstract are also included in a separate Abstract checklist, accompanied by a separate E&E document that authors could use when drafting abstracts (e.g., conference abstracts).

The rigorous development process ensured that PRISMA-COSMIN for OMIs 2024 was informed by the knowledge of those who have expertise in OMIs and OMI systematic review methods, and patients/members of the public with lived experience. We were fortunate to include a good cross-section of stakeholders. Pilot testing with a large sample of authors of various OMI systematic reviews further improved PRISMA-COSMIN for OMIs 2024, confirming its broad applicability to different types and fields of OMI systematic reviews. We included patients/members of the public in the development process, as they are ultimately impacted by the results of systematic reviews of OMIs and the OMIs that are selected based on these reviews. Impact of patient/public involvement was evident, as four items were included that might have been disregarded, and their suggestions for rewording made the guideline clearer. As patient/public involvement in reporting guideline development is still in its infancy [76], we extensively evaluated this part of the process, reflected on lessons learned and provide recommendations for future reporting guideline developers elsewhere [19].

The field of evaluating OMIs is continuously evolving. For the development of PRISMA-COSMIN for OMIs 2024, we took PRISMA 2020 [16] as a guiding framework and used consensus methodology to modify, add, and delete reporting items based on the OMI literature and existing guidelines. The COSMIN guideline for systematic reviews of OMIs [2] was particularly important, as this currently is the most comprehensive and widely used guideline. Novel developments to evaluate OMIs, such as modern validity theory [77, 78] and qualitative research methods to investigate the impact of response processes and consequences of measurement [79, 80], might become increasingly important. Review authors who apply these methods are also able to use PRISMA-COSMIN for OMIs 2024 to guide their reporting. We will monitor the need for adaptations to the guideline should these methods be applied more frequently in OMI systematic reviews and require specific additional reporting items.

Despite the rigorous development process, we cannot be certain that we would have obtained exactly the same results if we had repeated the process, either with the same or with different participants. For example, in the Delphi study and workgroup meeting, we had relatively low representation of people from low- and middle-income countries. This might have impacted our results, although representation in the pilot study was better. Another potential limitation is that we did not systematically search the literature to identify potential items in the preparation phase of the process. This was largely for pragmatic reasons, as we assumed that not much information on reporting recommendations for systematic reviews of OMIs would exist, as opposed to reporting guidance for primary studies on measurement properties [40]. Instead, we took PRISMA 2020 [16] as an evidence-informed and consensus-based framework and, based on our experiences with conducting, authoring, and reviewing systematic reviews of OMIs, we modified, added or deleted items. By applying the initial item list to three high-quality OMI systematic reviews, we were able to confirm the relevance of items. The Delphi study and pilot testing with large and diverse samples validated these decisions. Moreover, our definition of consensus (67%) is somewhat arbitrary, although it has been used in other Delphi studies [24, 57, 64]. However, we ultimately reached at least 80% agreement on inclusion and wording in the Delphi study, so even if we had used a higher cut-off, this would not have changed our results.

Complete and transparent reporting of systematic reviews of OMIs is essential to foster reproducibility of systematic reviews and allow end-users to select the most appropriate OMI for a specific application. We hope that PRISMA-COSMIN for OMIs 2024 will improve the reporting of systematic reviews of OMIs as well as the quality of such reviews [7, 8]. PRISMA-COSMIN for OMIs 2024 will be published on the websites of the EQUATOR network, PRISMA, COSMIN, and www.prisma-cosmin.ca. To promote its uptake, a social media campaign to increase awareness, a short video (2–3 min) explaining the resources available to guide reporting systematic reviews of OMIs, and 1-page tip sheets outlining how to report each item will be created, in addition to patient-targeted materials. Furthermore, we are considering an automated e-mail system, whereby authors who register their OMI systematic review in PROSPERO [66] receive PRISMA-COSMIN for OMIs 2024. We will monitor the need for updating PRISMA-COSMIN for OMIs 2024, to reflect changes in best practice health research reporting and to stay consistent with PRISMA terminology.