Background

Sarcomas are rare malignant tumours of mesenchymal origin that arise in connective tissue. They comprise dozens of histological categories, may develop at any age including childhood, can affect any anatomical site, and vary in aggressiveness, even within the same histological subtype [1]. There are three main types of sarcoma, corresponding to different clinicopathological entities that require a multidisciplinary approach: bone sarcomas, visceral sarcomas (GIST being the most typical) and soft tissue sarcomas [2]. Gastrointestinal stromal tumours (GISTs) are the most common mesenchymal tumours of the gastrointestinal tract [3, 4]. They constitute 1% to 3% of all malignant gastrointestinal tumours [5].

Classically, systematic reviews (SRs) summarise the results of available healthcare studies and provide a high level of evidence on the effectiveness of healthcare interventions [6]. However, SRs frequently address very specific questions, which prevents them from providing a comprehensive overview of a given topic [7, 8]. To overcome this barrier, new review formats (e.g., scoping reviews, evidence maps and rapid reviews) have been developed [9,10,11]. These formats describe the extent and distribution of evidence in a broad clinical area, highlighting both what is known and any gaps in evidence [10].

In 2007, the Global Evidence Mapping Initiative (GEM) was established as a collaboration of clinical research and policy stakeholders to provide an overview of existing research about traumatic brain injury and spinal cord injury [12]. Evidence mapping is an emerging tool to systematically and comprehensively identify, organise and summarise the distribution of scientific evidence on any topic. It can be the first step towards conducting systematic reviews or a framework to inform policy development [11,12,13,14].

The purpose of this evidence mapping project is to identify, describe and organise the currently available evidence about therapeutic interventions for sarcomas. This approach aims to determine which clinical questions have been assessed in the scientific literature and the quality of the supporting evidence, and to give general information about the claimed effectiveness of the interventions. This information should make it easier to detect research gaps and help stakeholders in the decision-making process. For clarity, this publication focuses exclusively on GIST, whereas the mapping of evidence on soft tissue sarcomas will be addressed in future publications.

Methods

We mapped the evidence following the methodology proposed by GEM [12]. Accordingly, we ran a comprehensive search and assessed the quality of the included SRs. We included only systematic reviews (with or without meta-analysis) because they provide the most reliable empirical evidence for answering specific research questions on therapeutic effects. We divided the process into four stages (Fig. 1).

Fig. 1 Tasks performed to map evidence in sarcomas

Setting the boundaries and context of the evidence map

To frame our mapping project, we consulted the 2013 World Health Organization (WHO) classification [15], the related clinical guidelines, and an oncologist with expertise in sarcomas. With this information we established the eligibility criteria for study inclusion. We selected SRs assessing therapeutic interventions in patients diagnosed with GIST that summarised randomised controlled trials (RCTs), phase I and II clinical trials, observational studies (including cohort studies, case-control studies and case series), or case reports.

We used a broad definition of SR in order to obtain the largest possible number of documents: systematic reviews that searched at least two databases were considered eligible. We included the most recent version if more than one version of a review was identified. We excluded narrative reviews and systematic reviews that focused on prognosis or cost-effectiveness. We also excluded studies on patients with Kaposi sarcoma and/or Ewing’s tumours because of their unique biological characteristics and management [16].

The selection was done independently by two researchers.

Searching and selection of systematic reviews

We conducted searches in PubMed, EMBASE, The Cochrane Library, and Epistemonikos from 1990 to March 2016; the former was updated in November 2016. The lower date boundary took into account that the biological mechanisms of the disease were discovered in 1998, which opened the way to biological agents as the key therapeutic approach to GIST and completely changed its management [17]. However, we extended the search back to 1990 to allow a reasonable margin and guarantee higher sensitivity.

We combined keywords and Medical Subject Headings (MeSH terms) for all types of sarcoma according to the WHO 2013 classification [15]. We adapted the search strategy to the specific characteristics of each database. We did not limit searches by language. In addition, a clinical expert (AL) was consulted to help identify any other relevant reviews. Likewise, we reviewed all references in the relevant articles to identify potential additional reviews. Detailed search strategies are reported in Additional file 1.

We managed the search results with the systematic review management software Covidence [18]. After removing duplicates, two reviewers (MB, NM) independently screened all titles and abstracts to exclude irrelevant reviews. Full texts of potentially relevant reviews were obtained for a final decision. Disagreements were resolved through discussion and consensus; if necessary, an additional reviewer (GU) was consulted. Reasons for exclusion are reported in Additional file 2.

Data analysis

We built a data extraction form to record the main characteristics and quality of the included systematic reviews. To ensure consistency among reviewers, we tested this pre-defined form in a pilot study with 20% of the eligible SRs. Two authors extracted data (MB, NM). Disagreements were resolved by discussion with a third author (AL). We collected data at three levels:

  a) General characteristics of the systematic reviews: authors, year of publication, type of systematic review (with or without meta-analysis), objective, search methods, design and number of included studies, type and number of patients included, and quality of the systematic review.

     Two researchers (MB, NM) independently assessed the methodological quality of the included reviews with the AMSTAR tool. Disagreements were discussed until consensus was reached. We calculated AMSTAR scores by adding one point for each item rated as “yes” and no points for items rated as “no”, “cannot answer”, or “not applicable”, resulting in an overall score ranging from 0 to 11. According to the total score, SRs were grouped into three categories: low quality (0 to 3 points), moderate quality (4 to 7 points), and high quality (8 to 11 points) [19].

  b) Clinical questions assessed in the systematic reviews: we converted the main aim reported in each included systematic review, together with its eligibility criteria, into clinical questions framed in PICO format (specifying the four key components: population, intervention, comparison and outcomes). The resulting PICOs were classified into five therapeutic scenarios with the help of a clinical expert (AL). We then extracted details about the population characteristics (e.g. adult population or children, type of sarcoma, localisation of tumours), the intervention and comparator (e.g. type of intervention and comparison, broadly categorised as chemotherapy, surgery, radiotherapy and others; intention and timing of the intervention and comparison; drugs used in chemotherapy), and outcomes. For descriptive purposes, we also classified the conclusions reported by the authors of the included SRs into five categories: “inconclusive”, “no effect”, “harmful”, “probably beneficial” and “beneficial” (see Fig. 2 for the criteria used in this categorisation). Two authors completed this assessment independently (MB, NM); disagreements were resolved by discussion until consensus was reached. In any case, this judgement does not represent a formal assessment of the evidence on the benefits and harms of the interventions; it reflects the conclusions as reported by the review authors.

  c) Characteristics of other research questions addressed in the systematic reviews, here named secondary PICOs: we defined secondary research questions as those for which all the elements of the PICO question were provided but the conclusions about the direction of the effect were described only marginally in the article. We extracted the same information described above for the main research question.
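The AMSTAR scoring rule described under item a) can be sketched in a few lines of Python. This is a minimal illustration, assuming that each review's 11 item ratings are available as plain strings; the function names and the example ratings are hypothetical and are not taken from the data extraction form.

```python
from typing import Iterable

AMSTAR_ITEMS = 11  # the score described in the text ranges from 0 to 11


def amstar_score(ratings: Iterable[str]) -> int:
    """Add one point per item rated "yes"; every other rating scores zero."""
    cleaned = [r.strip().lower() for r in ratings]
    if len(cleaned) != AMSTAR_ITEMS:
        raise ValueError(f"expected {AMSTAR_ITEMS} ratings, got {len(cleaned)}")
    return sum(1 for r in cleaned if r == "yes")


def amstar_category(score: int) -> str:
    """Map a total score to the three quality bands used in the text."""
    if not 0 <= score <= AMSTAR_ITEMS:
        raise ValueError("score must be between 0 and 11")
    if score <= 3:
        return "low"
    if score <= 7:
        return "moderate"
    return "high"


# Hypothetical ratings for one review (illustrative only, not study data).
example_ratings = ["yes"] * 6 + ["no"] * 3 + ["cannot answer", "not applicable"]
example_score = amstar_score(example_ratings)
example_band = amstar_category(example_score)
```

With these hypothetical ratings, six items are rated “yes”, so the review scores 6 points and falls in the moderate-quality band.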

Fig. 2 Classification of the conclusions according to results reported by authors

Synthesising findings

We framed every clinical question addressed in each included review in PICO format, which specifies the types of population (participants), interventions (and comparisons) and outcomes of interest [6]. We classified the PICO questions according to disease stage (localised, or unresectable and/or metastatic GIST) and summarised the findings for each included review using: a) one table describing the characteristics of the included systematic reviews and another with the characteristics of all PICOs identified (main and secondary); and b) a graphic display of the mapping based on bubble plots. Each bubble in the chart represents one included systematic review. The chart displays information in three dimensions: (i) the rating of the authors’ conclusions on the x-axis (“beneficial”, “probably beneficial”, “harmful”, “no effect” and “inconclusive”, as described in the Data analysis section); (ii) the AMSTAR assessment on the y-axis; and (iii) bubble size according to the number of individual studies included in the SR. Each bubble is also a pie chart in which a bold black line shows the proportion of randomised controlled trials included in the SR.
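The mapping from one systematic review to one bubble can be sketched as follows. This is only an illustration under stated assumptions: the class and function names, the scaling factor `area_per_study` and the example review are hypothetical, and the sketch prepares the plotting values without drawing the chart itself.

```python
from dataclasses import dataclass

# Order of the x-axis categories as listed in the text.
CONCLUSION_ORDER = [
    "beneficial",
    "probably beneficial",
    "harmful",
    "no effect",
    "inconclusive",
]


@dataclass
class ReviewSummary:
    label: str        # short identifier of the SR (hypothetical)
    conclusion: str   # rating of the authors' conclusions
    amstar: int       # AMSTAR score, 0 to 11
    n_studies: int    # individual studies included in the SR
    n_rcts: int       # how many of those are RCTs


def bubble_spec(review: ReviewSummary, area_per_study: float = 40.0) -> dict:
    """Return x position, y position, bubble area and RCT share for one SR."""
    if review.conclusion not in CONCLUSION_ORDER:
        raise ValueError(f"unknown conclusion: {review.conclusion!r}")
    if not 0 <= review.n_rcts <= review.n_studies:
        raise ValueError("n_rcts must lie between 0 and n_studies")
    # An SR with no included studies gets an RCT share of zero.
    rct_share = review.n_rcts / review.n_studies if review.n_studies else 0.0
    return {
        "x": CONCLUSION_ORDER.index(review.conclusion),
        "y": review.amstar,
        "area": area_per_study * review.n_studies,
        "rct_share": rct_share,
    }


# Hypothetical review used only to show the shape of the output.
example = ReviewSummary("hypothetical SR", "probably beneficial", 8, 10, 4)
spec = bubble_spec(example)
```

The resulting dictionary could be passed to any plotting library; the RCT share is the fraction of the bubble that the pie-chart marking would cover.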

Results

We obtained 1791 records from the search after removal of duplicates. After screening titles and abstracts, we obtained 143 articles in full text for a final decision. A total of 41 reviews fulfilled the inclusion criteria for the final analysis, of which 17 SRs [17, 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36] focused on GIST; the searches in these SRs ran up to 2014. A flow chart showing the selection of eligible reviews is presented in Fig. 3. The list of the 102 reviews excluded from the evidence mapping, along with the reasons for exclusion, is available in Additional file 2.
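The selection counts above can be cross-checked with simple arithmetic. In the short sketch below, the four input figures come from the text; the variable names are illustrative, and the number of records excluded at title and abstract screening and the number of included reviews on other sarcomas are derived values that the text does not state explicitly.

```python
# Figures reported in the text.
records_after_deduplication = 1791
full_text_assessed = 143
included_reviews = 41
included_gist_reviews = 17

# Derived values (not stated explicitly in the text).
excluded_on_title_abstract = records_after_deduplication - full_text_assessed
excluded_on_full_text = full_text_assessed - included_reviews
included_other_sarcoma_reviews = included_reviews - included_gist_reviews

print(excluded_on_title_abstract)      # 1648
print(excluded_on_full_text)           # 102, matching Additional file 2
print(included_other_sarcoma_reviews)  # 24
```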

Fig. 3 Flow chart outlining the study selection process

Characteristics and quality of systematic reviews

Among the 17 included SRs, 10 included a meta-analysis. All SRs were published between 2005 and 2015 and included studies conducted between 2001 and 2014. Only two SRs reported a search strategy detailed enough to allow replication [23, 30]. After accounting for overlapping or duplicated studies, the SRs included a total of 66 individual studies, of which 43 were observational studies, 15 were randomised controlled trials, and 8 were phase II clinical trials.
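As a quick consistency check, the three study-design counts can be summed and the share of non-experimental studies computed. The sketch below assumes that the phase II trials are counted as uncontrolled (non-experimental) studies, which the text does not state explicitly; under that assumption the share agrees with the "three quarters" quoted in the Discussion.

```python
# Counts of individual studies by design, as reported in the text.
observational = 43
rcts = 15
phase_ii = 8

total = observational + rcts + phase_ii

# Assumption: phase II trials are treated as uncontrolled, non-experimental studies.
non_experimental_share = (observational + phase_ii) / total

print(total)                            # 66
print(f"{non_experimental_share:.1%}")  # 77.3%
```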

Seven systematic reviews did not include any controlled clinical trials [21, 23, 27, 30,31,32, 35], and one of them did not include any study at all [23]. The 10 remaining systematic reviews included at least one clinical trial. The number of patients included in the systematic reviews ranged from 233 to 2018, and all were adults.

Twelve SRs assessed chemotherapy interventions [17, 20, 22,23,24,25,26, 28, 29, 32,33,34, 36] and five evaluated surgical interventions [21, 27, 30, 31, 35]. Only 3 of the 12 chemotherapy SRs assessed chemotherapy with curative intent [17, 24, 29], whereas in the remaining 9 SRs the chemotherapy had palliative intent [20, 22, 23, 25, 26, 28, 32,33,34, 36]. All SRs on surgery stated a curative intent.

All SRs assessed clinical end-points, and five also reported intermediate surgical outcomes such as blood loss, time to flatus and time to oral diet [21, 27, 30, 31, 35]. All SRs except two [27, 35] reported overall survival; progression-free survival, response rate and local or distant recurrence rate were reported in seven reviews [17, 22, 25, 26, 28, 33, 34, 36]; and quality of life was assessed in only three SRs [22, 28, 32]. Two reviews reported data on adverse events [23, 33].

Overall, the quality of the included SRs was moderate to high according to AMSTAR scores (Fig. 4). The most frequent shortcomings were failure to report the included and excluded studies [17, 20,21,22, 24, 27,28,29,30,31, 33,34,35,36], to declare possible conflicts of interest [17, 23, 24, 30, 35], to evaluate the likelihood of publication bias [17, 23, 30, 34, 36], and to assess bias in individual studies and use that assessment appropriately in drawing conclusions [20, 22, 23, 34]. The characteristics of the included SRs are summarised in Table 1.

Fig. 4 Quality of the included SRs

Table 1 Summary of systematic reviews included in the Evidence Mapping

PICO questions included in systematic reviews

We extracted 14 PICO questions from the SRs on GIST. The key characteristics of the PICOs are presented in Table 2. Depending on the specific type of GIST, the PICOs were grouped into the following five clinical scenarios, which cover the entire clinical spectrum of the disease (from non-metastatic to metastatic cancer):

  (1) Patients with localised GIST: five systematic reviews [21, 27, 30, 31, 35] covered a total of 28 observational studies and no RCTs. All compared laparoscopic versus open resection in adult patients with GIST. In general, the outcomes analysed were surgical (blood loss, time to flatus, operative time, time to oral intake, length of hospital stay, complication rate) and oncological (overall survival, disease-specific survival). The overall conclusion of the SRs favoured laparoscopic resection, and the effects were categorised as “beneficial” [27, 30] and “probably beneficial” [21, 31, 35] due to concerns about the lack of clinical trials.

  (2) Patients with localised KIT (CD117)-positive GIST after complete surgical resection: two SRs [24, 29] addressed whether imatinib should be given as adjuvant treatment, compared with surgery alone without imatinib. With regard to the risk of recurrence, one SR assessed adjuvant imatinib in the overall population (all risk categories, divided into three subgroups: high, intermediate and low) [24] and the other focused on high-risk patients [29]. These two SRs based their conclusions on two controlled trials, six uncontrolled trials and nine observational studies. Overall, the results of the included reviews favoured adjuvant imatinib for patients at intermediate and high risk of recurrence, and the conclusions could be categorised as “beneficial” [24] and “probably beneficial” [29], respectively. For patients at low risk of recurrence, the conclusion of one systematic review [24] was rated as “no effect” based on the subgroup analysis of one controlled trial. One of these SRs [29] also evaluated the duration of adjuvant imatinib in this population, and rated the use of adjuvant imatinib for ≥3 years as “probably beneficial” based on one controlled trial and one observational study.

  (3) Patients with unresectable and/or metastatic GIST: five SRs evaluated different comparisons [17, 23, 32, 34, 36]. One SR assessed imatinib versus other standard treatments (interventions for symptom relief, best supportive care and placebo) and did not find controlled trials that directly evaluated this comparison [32]. Based on an indirect comparison with six uncontrolled trials and one observational study, the use of imatinib in this population was classified as “probably beneficial”. One SR assessed the use of preoperative imatinib in the same population compared with surgery alone, but found no eligible study addressing this issue [23]. Three SRs [17, 34, 36], with five RCTs in this category, assessed whether high or standard doses of imatinib should be used. Overall, high imatinib doses were considered “harmful” due to an imbalance between benefits and harms.

  (4) Patients with unresectable and/or metastatic GIST after failure of imatinib due to resistance or intolerance: four SRs addressed three different comparisons [20, 25, 26, 28, 33]. The first comparison, sunitinib plus best supportive care versus imatinib at escalated doses, was assessed in two SRs [20, 25, 26]. No studies directly assessing this comparison were found. These two SRs presented inconsistent conclusions based on indirect comparisons from three trials and four observational studies: “beneficial” in one SR [20] and “no effect” in the other [25, 26]. The second comparison was sunitinib plus best supportive care (as defined by the respective authors) versus best supportive care or placebo. This comparison was assessed in three SRs including one controlled trial and one observational study [20, 28, 33]. Sunitinib plus best supportive care was categorised as “beneficial” [28] and “probably beneficial” [20, 33] for this population due to limitations in the included studies. The third comparison, masitinib versus sunitinib, was assessed in one systematic review based on one controlled trial, which concluded that masitinib is “probably beneficial” [20].

  (5) Patients with unresectable and/or metastatic GIST after failure of imatinib and sunitinib due to resistance or intolerance: three SRs comprised three different comparisons [20, 22, 33]: a) resumption of imatinib versus placebo was rated as “probably beneficial” in one systematic review including one controlled trial [20]; b) regorafenib plus best supportive care versus best supportive care or placebo was rated as “probably beneficial” in three SRs including one controlled trial and one uncontrolled trial [20, 22, 33]; and c) nilotinib versus placebo was rated as “no effect” and “inconclusive” by two SRs including the same controlled trial [20, 33].

Table 2 PICOs included in systematic reviews

Discussion

Although no standard definition of evidence mapping has emerged [11], these reviews share some common characteristics: (a) they are appropriate for addressing broad topics that are often too expansive for an individual systematic review; (b) they involve experts in the area of study to set the inclusion and exclusion criteria; (c) they are based on a systematic search; and (d) they include user-friendly summaries of results.

Following these criteria, this evidence mapping has identified, described and organised the currently available evidence for GIST, as part of a broader project aimed at mapping the existing evidence for the treatment of soft-tissue sarcomas. This mapping was based on 17 published systematic reviews including 66 individual studies conducted between 2001 and 2014. Regardless of the type of intervention evaluated, three quarters of the studies included in the SRs were non-experimental (observational studies or uncontrolled clinical trials). This has important clinical and ethical implications, since experimental studies are the best design to evaluate the efficacy of new therapeutic options. For instance, it is noteworthy that some clinical guidelines or systematic reviews [3, 23, 32, 37,38,39] already consider surgical resection the current standard of care for localised GIST. However, according to the SRs included in this evidence mapping, none of the studies used to support that recommendation about surgery were randomised controlled trials; the total number of included patients was less than 1000; and the results consisted of intermediate surgery-related end-points rather than patient-centred outcomes. Another example is the use of imatinib, a new biologic agent, in patients with unresectable and/or metastatic GIST, evaluated in one SR (37) that only included uncontrolled trials and one observational study.

The majority of the interventions reported as “beneficial” were palliative, probably because a high proportion of patients experience relapse or metastatic disease. Another interesting finding was that only three SRs assessed quality of life as an outcome and none of them conducted an economic evaluation. Quality of life measures are very important in cancer care because they provide information about the impact of diseases and their treatment on the well-being of patients, and complement efficacy and safety data [40]. Likewise, economic evaluation contributes to allocating resources within society as efficiently as possible [41].

The majority of the interventions were reported by authors as “beneficial” or “probably beneficial”. Only in one comparison between biologic agents (sunitinib versus escalated doses of imatinib in unresectable and/or metastatic GIST after failure of imatinib) were the results controversial. As shown in the bubble plot (Fig. 5), one SR [20] concluded that sunitinib is “probably beneficial” over imatinib at escalated doses, whereas another one [25, 26] considered that sunitinib has no effect in this type of patient. This discrepancy may be due to the fact that the Hislop SR [25, 26] was based on indirect comparisons from small phase II non-randomised studies, whereas the Abdel-Rahman SR [20] evaluated two retrospective observational comparative studies with direct comparisons. Currently, sunitinib is usually recommended after failure of escalated doses of imatinib for this type of patient [5, 38, 42, 43].

Fig. 5 Mapping of evidence of GIST

Overall, the quality of the SRs according to AMSTAR was moderate to high. However, the following domains have yet to be improved: reporting the excluded studies (only 4 of the 17 SRs did so), the conflicts of interest, and the assessment of the likelihood of publication bias (only 12 did so). Similarly, 4 SRs neither reported the quality of the included studies nor used it appropriately in formulating conclusions, one of them being (31) the review with the most studies, the second largest number of included patients, and the most up-to-date evidence about metastatic GIST after failure of imatinib and sunitinib due to resistance or intolerance. Although evidence mapping does not usually include a quality assessment process [9], we consider that any type of review (e.g. rapid review, scoping review, umbrella review) should evaluate this aspect in order to assess the reliability of the conclusions, particularly in this case, where most studies included in SRs on GIST sarcomas are non-experimental and have small sample sizes.

Strengths

The Global Evidence Mapping Initiative (GEM) approach that we used is a rational, systematic and constantly improved methodology. A recent systematic review [11] showed that among the 16 documents that met the common characteristics of evidence mapping, seven referenced the GEM.

Some authors consider using highly specific search strategies for evidence mapping [44, 45]. However, for the purposes of our project, we preferred a sensitive and adapted search strategy, taking into account the fourth edition of the World Health Organization Classification of Tumours of Soft Tissue and Bone [15] and the clinical background and expertise of one of the members of the research group, which proved of great value in resolving content-related doubts. Likewise, we used a broad definition of systematic review in order to obtain the largest number of documents. Thus, we consider that our search strategy is comprehensive as well as reproducible; hence, it is unlikely that any relevant systematic review on sarcomas has been missed. We used the PICO format to organise and classify the information obtained from SRs, which was very useful in establishing thematic areas in this broad field. We also organised the results in graphical formats and corroborated them with other related clinical documents (e.g. clinical guidelines and consensus). Following the recommendations of GEM, we used two data extraction methods [12]: general (for characteristics of included systematic reviews) and specific (for main and secondary PICOs).

We added two components that are uncommon in evidence mapping. Firstly, we rated the interventions included in the systematic reviews as “beneficial”, “probably beneficial”, “harmful”, “no effect” or “inconclusive” according to the authors’ conclusions, irrespective of the reported outcomes. It is important to highlight that we did not evaluate the quality of the evidence of the studies included in each SR, which makes this approach shorter than the one required for an SR and appropriate for its descriptive purposes, although the information provided is less complete. Secondly, we assessed the quality of the included systematic reviews with AMSTAR. This approach allows each SR to be displayed on a bubble plot alongside the others addressing the same comparison, providing a quick view of the existing evidence and its quality.

In our experience, the most time-consuming phases were classifying the interventions and extracting the secondary PICOs. Although the time spent on an evidence mapping can vary depending on the topic, we recommend drawing up a protocol before starting the project and performing a pilot study, as we did.

Limitations

This study has some limitations. Firstly, our search for SRs was conducted in 2016 but the searches in the SRs themselves were done much earlier, the most recent running up to 2014. Therefore, we cannot guarantee a comprehensive identification of all primary studies about sarcomas that may have been published after this date. However, we believe that this limitation would not substantially change the main results of this evidence mapping. Secondly, as is characteristic of all evidence mapping methodologies, we did not assess the quality of the evidence supporting the conclusions, which would have required the use of complementary criteria such as GRADE [46]. In order to provide some qualitative information about the validity of the SRs, we assessed them with AMSTAR, which is a validated tool [19]. However, a noteworthy drawback of our evidence mapping is that it merely organises and describes the available evidence as reported by the respective authors. This explains why many treatments are presented as beneficial even though they are based on non-experimental studies.

Therefore, the main practical applications of evidence mapping are: to orientate further research projects; to stimulate the design of more focused RCTs and other rigorous evaluative studies to fill the detected gaps in knowledge; to provide useful comprehensive information for establishing priorities when funding research in this field; to compare the obtained results with the recommendations from clinical guidelines in order to identify and resolve potential contradictions between them; to help future authors of SRs, rapid reviews and scoping reviews avoid redundant effort and improve efficiency; and to explore innovative tools and user-friendly formats to disseminate the results to interested stakeholders.

Conclusions

From a practical point of view, this evidence mapping shows a relatively high consistency of effects reported by the different SRs, except for two SRs (comparing sunitinib versus escalated imatinib in patients with metastatic GIST after failure of imatinib) (Fig. 5). The quality of the included SRs based on the AMSTAR criteria is moderate to high, which gives some confidence about the validity of their results. The scarcity of clinical trials in this field is striking, although we consider that the most important clinical questions have been covered.

In conclusion, the most common type of study used to evaluate therapeutic interventions in GIST sarcomas has been the non-experimental study (observational studies or uncontrolled clinical trials), frequently based on small sample sizes. The quality of the included SRs was moderate to high. Evidence mapping is a useful and reliable methodology to identify and present the currently available evidence about therapeutic interventions. Therefore, these results can help to facilitate any review process that may be conducted and to orientate research priorities.