Background

Therapeutic decision-making processes should be based on the best available evidence [1]. Documents that synthesise evidence concerning a particular subject facilitate access to such information for the consumers of the product in question (physicians, pharmacists, hospital committees, regulatory organisations). Systematic reviews (SRs) are the standard documents that provide syntheses of evidence. Their conclusions are often used as a starting point for the development of clinical practice guidelines, and also for establishing recommendations concerning diagnostic, prognostic, and/or therapeutic interventions [2]. However, applying the information contained within these documents requires authors to follow rigorous procedures to ensure adequate methodological quality is present, minimise the risk of bias, and facilitate reporting and dissemination. A large number of primary studies and evidence-synthesis documents have been published to date, but many are redundant, do not reach the necessary methodological quality, or have a high risk of bias [3]. Considering this situation, it is not easy for consumers to identify synthesis documents that are of good quality and have a low risk of bias.

Psoriasis is a chronic disease, with moderate and severe forms associated with significant comorbidity, impaired quality of life, and high direct and indirect costs [4]. An increasing number of elective therapies have been developed during the last decade, but these usually have potentially significant adverse side effects and high costs, which puts patients at risk and brings the sustainability of the health systems into question [5, 6]. Assessing full-text documents using the A MeaSurement Tool to Assess systematic Reviews (AMSTAR) and Risk of Bias in Systematic Reviews (ROBIS) tools, we recently observed that most SRs relating to interventions in psoriasis are of low methodological quality (28.8%) and have a high bias risk (86%) [7]. However, it is impractical to suggest that interested parties apply this same method to assess the methodological quality and the risk of bias of SRs, as it is a time-consuming process that requires systematic literature searching, abstract screening, and full, in-depth manuscript assessment; further, two or more evaluators are required to control for rating discrepancies [8]. In recent years, efforts have been made to automate some steps of SR development. In this regard, machine learning methods have been evaluated both for assisting the conduct of SRs [9] and for assessing the risk of bias of SRs [10].

In 2013, the Preferred Reporting Items for Systematic Reviews and Meta-analyses for Abstracts (PRISMA-A) was published, featuring guidelines concerning methods of writing and presenting abstracts for systematic reviews and meta-analyses [10]. PRISMA-A is a checklist developed to help authors report all types of SRs, although it mainly relates to SRs concerning evaluations of interventions in which one or more meta-analyses are conducted. This tool features 12 items related to information that should be provided in order to present the methods, results, and conclusions in a manner that accurately reflects the core components of the full review. However, the relationship between the reporting quality of such abstracts, the methodological quality of the full texts, and the risk of bias in these texts is still unknown.

Thus, the primary objective of our study is to apply PRISMA-A to evaluate the reporting quality of SR abstracts relating to psoriasis interventions. Our secondary objective is to determine if this instrument indirectly captures the methodological quality of and the risk of bias in the full reviews, which we measured using AMSTAR and ROBIS instruments. Finally, we discuss our attempt to develop classification algorithms using PRISMA-A that can provide deeper analysis of reviews based only on abstract data.

Methods

Protocol and eligibility criteria

To begin, we established an a priori protocol to evaluate AMSTAR vs ROBIS, in which we planned to measure compliance with PRISMA-A, and published it in the PROSPERO International Prospective Register of Systematic Reviews (PROSPERO 2016: CRD42016053181). In this protocol, we included SRs or MAs published in scientific journals that related to interventions in skin psoriasis. Historical articles, conference abstracts, case reports, surveys, narrative reviews, narrative reports (i.e., reports that have a particular focus on understanding a concept), clinical practice guidelines, consensus documents, MAs performed without a systematic literature search, and reviews titled as literature reviews or integrative reviews were not included. Further, because of time constraints on completing the project, the documents retrieved were restricted to English-language reviews. There was no limitation on the year of publication or study population.

Search and selection methods

A systematic literature search had already been conducted in a previous study [7]; we took its results and retained only those reviews published by July 5th 2016. Then, new SRs and MAs published by January 2017 were identified using MEDLINE, EMBASE, and the Cochrane Database. Details regarding the search methods applied for identifying and selecting these documents are provided in Additional file 1.

Quality assessment of abstract reporting

Two investigators (JL-HR and JL-SC) independently assessed the abstract-reporting quality of each review; they used the same data abstraction forms for each review and were blinded to the names of the journals, the authors, and the authors’ affiliations. As mentioned above, we applied PRISMA-A, a checklist designed to determine if the content of an SR abstract is truthful, to assess reviews of psoriasis interventions [11]. PRISMA-A features a 12-item checklist concerning information that should be provided in SR abstracts; specifically, these are: title; objectives; the eligibility criteria of included studies; information sources, including key databases and dates of searches; methods of assessing bias risk; number and type of included studies; synthesis of results for main outcomes; description and direction of the effect; summary of the strengths and limitations of the evidence; general interpretation of results; funding sources, and registration number.

Methodological quality of SRs

Two investigators (FG-G and JG-M) independently assessed the methodological quality of each review using the AMSTAR tool; again, these investigators were blinded to the names of the journals, names of the authors, and authors’ criteria. In the case of a disagreement, an independent researcher (JR) was consulted. Review quality was classified by total AMSTAR score following one of the most commonly used sets of cutoff points for AMSTAR levels: low (0-4), moderate (5-8), and high (9-11) methodological quality [12]. Detailed information about the AMSTAR checklist and the rating system is presented in Additional file 2.
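The cutoff points above can be sketched in a few lines of Python. This is only an illustration and not the code used in the study (the analyses were run in R); the function names are hypothetical. The second function mirrors the binary recoding that the statistical analysis section describes for the classification tree.

```python
def amstar_level(total_score: int) -> str:
    """Map a total AMSTAR score (0-11) to a methodological quality level.

    Cutoffs follow the text: low (0-4), moderate (5-8), high (9-11).
    """
    if not 0 <= total_score <= 11:
        raise ValueError("AMSTAR total score must be between 0 and 11")
    if total_score <= 4:
        return "low"
    if total_score <= 8:
        return "moderate"
    return "high"


def binary_amstar_level(total_score: int) -> str:
    """Collapse 'high' and 'moderate' into 'high-moderate' (binary response)."""
    return "low" if amstar_level(total_score) == "low" else "high-moderate"


if __name__ == "__main__":
    for score in (3, 6, 10):
        print(score, amstar_level(score), binary_amstar_level(score))
```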

Bias risk of SRs

Two investigators (FG-G and MA-L) independently assessed the bias risk of each review using the same data abstraction forms for each and while being blinded to the names of the journals, the names of the authors, and the authors’ affiliations; specifically, we used ROBIS to assess this bias risk [11]. ROBIS is conducted over three phases. Phase 1 involves assessing the relevance of the review, and is considered optional. Phase 2 includes four domains: 1) study eligibility criteria, 2) identification and selection of studies, 3) data collection and study appraisal, and 4) synthesis and findings. Finally, phase 3 assesses the overall risk of bias in the interpretation of the review findings and whether limitations identified in any of the phase 2 domains have been considered. To simplify analyses, SRs that were rated as having an unclear risk of bias using the ROBIS tool were discussed with a third evaluator, who made the final decision to categorise them as having a high or low risk of bias. Recently, good validity, reliability and applicability of the ROBIS tool have been demonstrated [13]. Detailed information about the ROBIS tool and the rating system is presented in Additional file 3.

Data extraction and statistical analysis

For studies that fulfilled the inclusion criteria, five investigators (FG-G, JG-M, PA-M, JLS-C, and MG-P) independently obtained metadata from each. Studies were then classified as Cochrane or non-Cochrane reviews. Cochrane affiliation was defined for authors of Cochrane Reviews published in the Cochrane Database of Systematic Reviews (CDSR) and authors using a Cochrane group name even if the paper was not published in the CDSR. PRISMA-A results are represented on Likert scales as percentages of achievement per item. PRISMA-A results are also summarised on Likert scales in regard to methodological quality and risk of bias. Total and per-item interrater reliability (IRR) of PRISMA-A was assessed using the irr R package. Differences in the mean total of PRISMA-A scores when comparing methodological quality and risk of bias levels were assessed using the Kruskal-Wallis and Wilcoxon tests, respectively. Evidence against the null hypothesis was considered for a two-tailed p value of < 0.05. Further, generalised linear models were obtained using the median total PRISMA-A score as the dependent variable. Adjustments were made for several metadata: actual observed ‘abstract word count’ (≤ 300 versus >300), ‘abstract format’ (8-headings, IMRAD, and free format), ‘Cochrane affiliation authors’, ‘number of authors’ (≤ 6 versus >6), ‘number of authors with conflict of interest’, ‘source of funding’ (pharma, academic or none/UNK), ‘PRISMA endorser journal’ (‘yes’ versus ‘no’), ‘PRISMA-A statement’ (review published before or after 2013), and ‘journal impact factor’. The ‘IMRAD’ format includes: introduction, methods, results, and discussion. The ‘8-headings abstract’ format includes: background, objectives, search methods, selection criteria, data collection, analysis, main results, and authors’ conclusions. We checked the list of journals endorsing PRISMA on the PRISMA website (URL: http://www.prisma-statement.org/Endorsement/PRISMAEndorsers.aspx).
A multivariate predictive model was created including those variables that were statistically significant in the univariate predictive models (p<0.05). Recursive partitioning of our dataset helped us to develop easily visualised decision rules for predicting the methodological quality of SRs based on abstract analysis. Next, two classification trees were created, one for methodological quality (‘high’ and ‘moderate’ levels were recoded as ‘high-moderate’ in order to produce a simpler model with a binary response) and one for risk of bias. Decision trees were obtained using the rpart R package, which implements several algorithms. Cutoff points were obtained as the result of the internal processes of these algorithms, and therefore they were not selected by the authors. We used cross-validation to evaluate the predictive accuracy of our model compared with the other tree models. We performed a sensitivity analysis for both the AMSTAR and ROBIS classification trees by randomly selecting the training dataset to build 2000 models in each case. Values of the ‘variable importance’ parameter obtained for every node and model were plotted. Graphs were produced and statistics were analysed using several packages of the R language (R Development Core Team).
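As an illustration of the interrater reliability statistic, the following Python sketch computes Cohen's unweighted kappa for two raters. It is not the study's code: the analysis used the irr R package, and the text does not state which function or kappa variant was applied, so the choice of unweighted Cohen's kappa, the function name, and the toy ratings are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's unweighted kappa for two raters rating the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical 'yes'/'no' ratings of one PRISMA-A item for eight abstracts.
    rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
    rater_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]
    print(round(cohen_kappa(rater_1, rater_2), 2))
```

In this toy example the raters agree on seven of eight abstracts (observed agreement 0.875) against a chance agreement of 0.5, giving κ = 0.75.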

Protocol vs. overview

Our planned search strategy was recorded in PROSPERO and was compared with the final reported review methods. We decided to use the machine learning classification procedure to obtain classification trees based on PRISMA-A after our protocol was published.

Results

Review selection

Our new database search (from July 5th 2016 to January 1st 2017) yielded 161 titles with potential relevance (125 from EMBASE & MEDLINE, 10 from EMBASE only, three from MEDLINE only, and 23 from the Cochrane Database). After excluding duplicated articles and screening titles and abstracts, 44 new studies were judged to be potentially eligible for full-text review, and after assessment, 20 reviews were added to the previously obtained 119 reviews (Fig. 1). Thus, we included 139 reviews, comprising 4357 primary studies about interventions in psoriasis and published in 62 journals from 1997 to 2017. Lists of included and excluded articles are shown in Additional files 4 and 5.

Fig. 1
PRISMA flow diagram of article selection process

Reporting characteristics of SRs

The interrater reliability (IRR) between the two raters for the total score was substantial (κ=0.77; 95% CI, 0.59-0.88). IRR was highest for question PEA1 (κ=0.86) and lowest for question PEA8 (κ=0.08) (Additional file 6). As shown in Fig. 2, of the 12 PRISMA-A items, there were three items for which more than 90% of the included reviews received a ‘yes’ rating: item 2 (objectives; 94.9%), item 10 (interpretation of results; 94.1%), and item 8 (description of the effect; 93.4%). However, fewer than 50% of the SRs fulfilled the criteria for item 5 (risk of bias; 23.3%) and item 9 (strengths and limitations of evidence; 27%). Finally, almost none of the SR abstracts fulfilled item 12 (registration; 1.4%) or item 11 (funding; 0.7%). Considering item ratings for each SR, six of the 139 reviews received a ‘yes’ rating for 10 or 11 of the 12 PRISMA-A items. The median number of fulfilled items for each review was six (range: 2-11).

Fig. 2
Plot of Likert scales with PRISMA-A. This graph shows the frequency distributions of responses (‘yes’, ‘no’) to the 12 items of PRISMA for Abstracts in the assessment of SR reporting

Reporting quality and risk of bias

For reviews with a high risk of bias, the median number of PRISMA-A items with a ‘yes’ rating was six (2-10). Interestingly, for reviews with low bias risk, the minimum number of items with a ‘yes’ rating was also six (Table 1). Fig. 3a-b shows PRISMA-A Likert scales in which the percentage of achievement per item for high-bias-risk SRs was compared with reviews that had low bias risk; this was performed using the ROBIS tool. Overall, the response profiles are quite similar, with only a slight increase of compliance found in the low-bias-risk subgroup for the ‘interpretation’, ‘funding’, and ‘registration’ items. Lastly, SRs with a low risk of bias showed higher total PRISMA-A values than reviews with high bias risk (7.7±1.26 vs 6.75±1.59, p=0.012) (Additional file 7).

Fig. 3
Frequency distributions of responses to reporting assessment using PRISMA for Abstracts comparing SRs based on methodological quality and risk of bias. This panel of plots contains different graphs showing PRISMA for Abstracts results when reviews are subgrouped by ROBIS (a,b) and AMSTAR (c,d,e) classifications. (a-b) These plots display frequency distributions of responses (‘no’, ‘yes’) to PRISMA for Abstracts comparing reviews by risk of bias using the ROBIS tool (‘high’ or ‘low’). (c-e) These plots show frequency distributions of PRISMA for Abstracts responses (‘no’ or ‘yes’) comparing reviews by AMSTAR-derived methodological quality levels (‘high’, ‘moderate’, or ‘low’)

Table 1 Number of PRISMA-A items reported in abstracts of SRs on psoriasis interventions classified by methodological quality (AMSTAR) or risk of bias (ROBIS)

Reporting quality and methodological quality

Figure 3c-e presents the percentage of achievement per PRISMA-A item, comparing SRs classified using the AMSTAR instrument (as high, moderate, or low methodological quality). In this case, unlike the findings concerning the bias-risk subgroups, there are different patterns for each level of methodological quality. For high-methodological-quality reviews, the median number of items with a ‘yes’ rating was eight (6-11), compared with six (4-10) and five (2-8) for moderate- and low-quality reviews, respectively (Table 1). Item 5, ‘risk of bias’, showed the widest variation between the subgroups, and items 11 (‘funding’) and 12 (‘registration’) displayed minimal variation. Lastly, the mean total PRISMA-A score was significantly higher for SRs with high methodological quality than for those with moderate (7.73±0.13 vs 7.05±0.13, p=0.031) and low methodological quality (7.73±0.13 vs 5.77±0.13, p=0.001) (Additional file 8).

Factors influencing reporting quality

Univariable and multivariable ordinal logistic regressions were performed in order to predict PRISMA-A results (Table 2). The univariable regression models showed ‘abstract word count > 300’, ‘Cochrane author affiliation’, ‘authors per review > 6’, and ‘academic source of funding’ to be predictors of high achievement in regard to PRISMA-A items; meanwhile, IMRAD and free abstract formats were associated with a lower number of PRISMA-A items than the 8-headings abstract format. Publication in journals with an impact factor of ≤ 3 or in journals that did not endorse the PRISMA-A statement also predicted low reporting scores. In the final model, only ‘authors per review > 6’ (OR: 1.098; 95% CI: 1.012-1.194), ‘academic source of funding’ (OR: 3.630; 95% CI: 1.788-7.542), and ‘PRISMA endorser journal’ (OR: 4.370; 95% CI: 1.785-10.98) predicted PRISMA-A variability.

Table 2 Univariate and multivariate predictive models of PRISMA-A items reported in abstracts of SRs on psoriasis interventions

Classification trees for SRs methodological quality prediction based on abstract reporting assessment

We used classification trees as a visual tool to identify the abstract-related variables that are important for predicting SRs with low methodological quality and high bias risk, and to see how these variables relate to each other; we chose trees because they can capture nonlinear relationships among predictors. Total and by-item results for PRISMA-A were included as predictor variables. Figures 4 and 5 display pruned classification trees for methodological quality and risk of bias, respectively. Essentially, Fig. 4 shows that abstracts that had a total PRISMA-A score of less than six, that did not identify the report in the title as an SR or MA, and that lacked an explanation of the methods applied for assessing bias risk were classified using AMSTAR as having low methodological quality, with a root node error of 0.15 and a misclassification rate of 22.6% in the cross-validation. In Fig. 5, abstracts with a total PRISMA-A score equal to or higher than nine that included the results of the main outcomes and an explanation of the methods used for assessing bias risk were classified as having low bias risk, with a root node error of 0.14 and a misclassification rate of 20.6% in the cross-validation. We found that the nodes included in our tree models were also among the top-ranked nodes when ordered by median importance after sensitivity analysis (Additional files 9 and 10). Overall, the higher dispersion of ‘variable importance’ values for the AMSTAR-derived trees compared with the ROBIS trees suggests that the AMSTAR classification tree is less robust than the ROBIS classification tree.
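The two rules described in this paragraph can be written as simple predicates. The Python sketch below is a simplified reading of the prose only: it assumes that each rule is a conjunction of the three stated conditions, whereas the fitted trees in Figs. 4 and 5 may contain additional branches and split orders. The function and parameter names are hypothetical, and the misclassification helper only illustrates how such a rate is computed, not how the cross-validated figures were obtained.

```python
def predicts_low_quality(total_score, title_identifies_review, reports_bias_methods):
    """Assumed conjunction reading of the Fig. 4 rule (low AMSTAR quality)."""
    return (total_score < 6
            and not title_identifies_review
            and not reports_bias_methods)


def predicts_low_bias_risk(total_score, reports_main_outcomes, reports_bias_methods):
    """Assumed conjunction reading of the Fig. 5 rule (low ROBIS bias risk)."""
    return (total_score >= 9
            and reports_main_outcomes
            and reports_bias_methods)


def misclassification_rate(predicted, actual):
    """Share of cases where the predicted class differs from the observed one."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Hypothetical abstract: 4 items fulfilled, title does not flag an SR/MA,
    # no description of bias-assessment methods.
    print(predicts_low_quality(4, False, False))
    # Hypothetical abstract: 10 items fulfilled, main outcomes and bias methods given.
    print(predicts_low_bias_risk(10, True, True))
```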

Fig. 4
Tree classification model of the methodological quality of SRs based on PRISMA-A total and per item scores. Each node shows from top to bottom the predicted class (high-moderate, low), the predicted probability of each class, and the percentage of observations in the node

Fig. 5
Tree classification model of the bias risk of SRs based on PRISMA-A total and per item scores. Each node shows from top to bottom the predicted class (high, low), the predicted probability of each class, and the percentage of observations in the node

Discussion

Main findings

To the best of our knowledge, this is the first study to evaluate the capacity of PRISMA-A to determine the methodological quality and bias risk of SRs or MAs relating to psoriasis interventions. In short, this study suggests that the reporting quality of abstracts of reviews published concerning psoriasis interventions is suboptimal. Overall, the average percentage of PRISMA-A items featured in each abstract was 50-67%. While ‘objectives’, ‘interpretation of results’, and ‘description of effect’ were included in almost all abstracts, the majority failed to adequately report ‘strengths and limitations’ and ‘risk of bias’; furthermore, registration numbers and disclosures of sources of funding were almost universally absent.

We found that methodological quality and risk of bias, assessed using AMSTAR and ROBIS instruments, correlated positively with the PRISMA-A evaluations of the quality and completeness of abstract reporting. Previous studies have supported the theory that improving the abstract quality of SRs may provide a more accurate reflection of their methodological quality. Previous studies, applying AMSTAR, evaluated the quality of SRs with regard to adherence to PRISMA statements, and found that PRISMA endorsement enhanced compliance with AMSTAR scale items in gastroenterology/hepatology and surgical journals [14, 15]. Further, using a masked randomised trial, Cobo et al. analysed the feasibility of using CONSORT- and STROBE-reporting guidelines to support the peer-review process performed by a general medicine journal editorial team [16]. Moreover, Rice et al., using AMSTAR, found a positive correlation between the overall quality ratings of SRs with MAs and the number of PRISMA-A items adequately reported [17].

The above findings are similar to our own, as we also found that the methodological quality of reviews assessed using the AMSTAR instrument correlated positively with PRISMA-A evaluations of the quality and completeness of abstract reporting. However, no study has yet been published presenting a significant correlation between PRISMA-A compliance and risk of bias; in our study, significant differences in terms of abstract quality were observed between SRs with high and low bias risk.

Strengths and limitations

In this study, we explored, for the first time, the capacity of PRISMA-A to determine both the methodological quality and the bias risk of full-text reviews using ROBIS and AMSTAR tools. Our study includes a large sample of over 15 years of reviews (n=139) concerning interventions in psoriasis. The study was performed using a systematic search strategy and following an a priori protocol published in PROSPERO; the AMSTAR and ROBIS assessments were performed independently by two authors, and there were few disagreements during the process, all of which were solved through discussion. Nevertheless, our study has some limitations. First, this study only featured SRs and MAs relating to interventions in psoriasis, so there is a limitation in terms of the generalisability of the data, as we did not compare our results to reviews conducted in relation to other diseases or areas of healthcare. Second, the search was restricted to MEDLINE, EMBASE, and the Cochrane database; this was because our intention was to obtain a representative sample of published systematic reviews concerning psoriasis interventions, rather than cover all such reviews. We did not search for SRs in grey literature databases, and, therefore, we cannot establish differences in terms of methodological quality and risk of bias with respect to those that were examined. Third, during the cross-validation, we found a misclassification rate of 20-22%; this means that for one in five abstracts, the methodological quality and risk of bias are mistakenly classified. To rectify this, we would require external validation to test the performance of our models with other datasets. In any case, a desirable improvement in the quality of reporting could result in the disambiguation of many SRs classified as having moderate quality and causing level overlapping during the cross-validation. Fourth, a limitation of this work is that different reviewers applied PRISMA-A, AMSTAR and ROBIS. 
Only one of the three raters carried out the evaluations with both the AMSTAR and ROBIS tools. Although their results were compared in pairs and discrepancies were discussed with a fourth rater, there is a risk that this issue affects the validity of our results. Finally, a further limitation is that we did not consider the year in which each journal began endorsing PRISMA-A, which introduces a risk of bias in this regard.

Our findings in context

Our findings were similar to those of a previous study conducted by Bigna et al. In that study, the authors found that the quality of reporting was declining in terms of the ‘strength and limitations of evidence’ and ‘funding’ of reviews [18]. Further, Tsou et al. used PRISMA-A to analyse 200 randomly selected abstracts of SRs relating to health interventions and found that less than 50% of the abstracts contained information concerning the ‘risk of bias assessment’ (23%), ‘study protocol registration’ (2%), and ‘funding source’ (1%) [19]. Moreover, Seehra et al. studied the reporting completeness of abstracts of SRs published in dental speciality journals [20]. They developed a checklist that included several items from different sources: PRISMA statement guidelines [21], the Cochrane Handbook for Systematic Reviews of Interventions [22], and the paper by Beller et al. [14]. We did not find differences in reporting quality between reviews published before and after the PRISMA-A statement. Our results are similar to those found by Panic et al. [23]. These authors demonstrated that the quality of reporting improved only sub-optimally in the years following the publication of PRISMA.

The capacity of abstract length to predict PRISMA-A variability has also been addressed in other studies. Interestingly, the number of words per abstract explains only a very small part of this variability [17], and better reporting results were even observed for abstracts with < 300 words [16]. In the latter study, a better abstract structure (8-headings vs IMRAD formats) also predicted improved reporting quality. These results are similar to ours and suggest that the systematic structure and concision of an abstract are more important than its length in defining the quality of the summary report.

Implications of results

Motivated by the possibility of capturing through an abstract, at least in part, the methodological quality and the risk of bias of a study, which are normally evaluated using information contained in the full text of the document, we explored the possibility of obtaining simple and feasible decision models that are easy to interpret and intuitive to follow. Our method is offered as a support for decision making and is not intended to replace the rigorous final analysis of each synthesis document; rather, it allows professionals who are not experts in this type of methodology to prioritise, simply and rapidly, the documents obtained in a first search. We believe that the information contained in the abstract is a good source for this purpose, and this is the original contribution that we make. The importance of our proposed tree models lies in their capacity to assist in abstract filtering using just the predicted methodological quality and bias risk determined through PRISMA-A abstract analysis, which is a more feasible instrument than the AMSTAR or ROBIS tools. Our decision trees have been constructed using a machine learning tool. This type of technology is currently being used to systematise some aspects of SRs, such as article selection or risk of bias assessment [9]. We believe that combining validated tools that measure quality or bias risk with machine learning technology may improve methodological assessment processes. Better meta-epidemiological knowledge, together with the development of text mining strategies, will allow the development of models that help clinicians to simplify decision making in the clinical setting.
Finally, the classification determined in both decision trees is congruent with the idea that methodological quality explains only part of the risk of bias of SRs, as we found that the degree of compliance with PRISMA-A required to predict SRs with a low risk of bias is greater than that required to predict high-methodological-quality SRs. Therefore, we can conclude that the methodological quality and the risk of bias of SRs may be captured by analysing the quality and completeness of abstract reporting, and that by applying our decision tree models, the review-filtering process may be improved through rapid abstract analysis.

Conclusions

Our proposal aims to facilitate the evaluation of evidence syntheses by clinical professionals who lack methodological knowledge and skills. It is not intended to replace the rigorous final analysis of each review, but it allows the documents obtained in a first search to be prioritised simply and rapidly. We believe that abstracts are a good source for investigating methodological quality and risk of bias through an assessment of their quality and completeness. We are aware that our decision trees could be improved and that an external validation of our models in different research fields is necessary.