Background

Rationale

Prostate cancer is the most prevalent cancer in men globally, with 1.4 million new cases reported in 2013 [1]. Prostate cancer cases increased by 217% between 1990 and 2013 as a result of population growth and aging and increased uptake of opportunistic screening, particularly in developing countries [1]. Prostate cancer remains the leading cause of death among males in 24 of 188 countries covered by the Global Burden of Disease Cancer Collaboration [1].

Prostate cancer treatments are varied and include: deferred treatment (active surveillance), watchful waiting, radical prostatectomy, radiation therapy (with or without androgen deprivation therapy) or androgen deprivation therapy (ADT) [2, 3]. Each treatment will achieve different outcomes in terms of oncology (e.g., survival or time to biochemical recurrence), adverse events and patient-reported outcomes such as urinary incontinence and impotence. These outcomes are important considerations when selecting a treatment for prostate cancer patients and are considered in the context of patient age, life expectancy, co-morbidities, tumour size, grade and stage and other risk indicators that influence outcomes and treatment choice. Determining which treatment choice is optimal for each patient remains an important challenge, particularly where directly relevant randomised controlled data are lacking.

To aid this decision-making process, a number of tools have been developed, with nomograms and risk stratification systems most commonly used [4]. Nomograms are graphic tools developed to aid clinical decision making and are well established in clinical practice for prostate cancer, particularly for assisting selection of treatment approaches based on risk stratification. Such tools have been shown to improve prediction of outcomes when compared with clinician judgement alone [5, 6]. Unfortunately, most nomograms currently in use are likely to be based on dated treatment modalities. Furthermore, predictions based on observations made in one setting may not be accurate in another (e.g., where ethnicity or health services differ). Extrapolation of published international results to local practice is a known pitfall that has potential to mislead both clinicians and patients [7]. These limitations are particularly relevant to predictive tools designed for use in patients treated with radiation therapy, as this modality has changed significantly over the past decade.

Objectives

We aim to identify papers predicting clinical outcomes for patients with prostate cancer who have been treated with radiation therapy. We particularly set out to assess whether the tools identified were adequately developed and validated, and whether they provide accurate predictions.

Methods

Protocol and registration

A systematic literature review protocol was developed for this study and registered before searches commenced with PROSPERO, an international prospective register of systematic reviews. The protocol can be accessed at: http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42015025428.

Inclusion criteria

Papers were eligible for inclusion where they met the following criteria. Population: Patients with prostate cancer. Exposure: Treatment with radiation therapy (including external beam radiation therapy and/or brachytherapy). Outcome: The generation or validation of a tool for the prediction of clinical outcomes (biochemical failure [BF], progression to metastases, prostate cancer specific survival, overall survival). Papers had to be written in English and published after July 2007. This date was chosen as it is the search date up to which a previous systematic review of prognostic tools for prostate cancer treated by any therapy was undertaken [4]. Studies were included if they described tools using variables which are currently available in a clinical setting. This excluded papers including genetic or molecular variables.

Information sources

Searches were conducted of the Medline database (PubMed interface) and the EMBASE database.

Search

Disease-specific search terms included: prostate cancer, prostatic neoplasms, cancer of the prostate, adenocarcinoma of the prostate, prostatic cancer, prostate gland cancer and prostate tumour. Treatment specific search terms included: radiation therapy, radiotherapy, external beam radiotherapy, EBRT, brachytherapy, high dose radiotherapy, low dose radiotherapy and targeted radiotherapies. Outcome-specific search terms included: overall survival, progression-free survival, PFS, mortality, event free survival, EFS, disease free survival, prostate cancer specific survival, progression to metastases, time to progression, TTP, biochemical recurrence, BCR, biochemical failure, neoplasm recurrence. Search terms used to identify predictive models included: predictive tools, nomograms, risk stratification, Partin tables, regression tree analysis, Artificial Neural Networks, CAPRA-S or CAPRA score, risk estimates, algorithms, predictive accuracy, diagnostic test accuracy, Kattan tables/nomograms.
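The four groups of search terms above can be combined into a single Boolean query. The sketch below is illustrative only: the text does not state how the terms were combined, so the conventional approach (OR within a concept group, AND between groups) is an assumption, as are the function name `build_query` and the shortened term lists.

```python
def build_query(concept_blocks):
    """Join terms with OR inside each concept block, then AND the blocks.

    Assumption: this is the conventional structure for a systematic
    review search; the review does not report its exact search string.
    """
    grouped = []
    for terms in concept_blocks:
        quoted = ['"{}"'.format(term) for term in terms]
        grouped.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(grouped)


# Shortened, illustrative subsets of the term lists given in the text.
disease = ["prostate cancer", "prostatic neoplasms"]
treatment = ["radiotherapy", "brachytherapy"]
outcome = ["biochemical failure", "overall survival"]
model = ["nomograms", "risk stratification"]

print(build_query([disease, treatment, outcome, model]))
```

Database-specific syntax (field tags, subject headings, truncation) would still need to be added separately for Medline and EMBASE.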

Study selection

Study selection included three phases. The titles and abstracts of all studies identified by the search strategy were compared to the inclusion criteria detailed above by two authors working independently (ER and MOC). All studies that appeared likely to meet the inclusion criteria were progressed to full-text review. All discrepancies, where authors reached different conclusions about the same papers, were resolved through discussion. The full texts of these papers were then retrieved and assessed against the inclusion criteria, again by two authors (ER, JC or MOC) working independently in order to minimise the impact of human error. Studies that were identified as meeting all inclusion criteria were included in the review, while those which did not were excluded. Again, where there were differences in the authors’ conclusions, consensus on the correct decision was reached through discussion. Finally, the reference lists of included papers were screened for any additional relevant papers which may have been missed by the search strategy. All new titles identified were then reviewed as described above.

Data collection process and data items

After full-text review, data extraction was undertaken by one reviewer (ER, JC or MOC). Items for extraction included: manuscript identifiers (author, contact, country, setting), study methods, population studied (inclusion criteria, exclusion criteria, baseline characteristics – dates of recruitment, age, ethnicity, number of patients, primary treatment, treatment subtype, adjuvant therapies, neoadjuvant therapies), and predictive model characteristics (type of model, variables included, if internal validation was reported and the type, external validation, variable definitions, if variables were readily available, sample size, number of events, definition of outcome, model accuracy, sensitivity, specificity, concordance index and receiver operator curve area under the curve). To assess whether or not variables were considered ‘readily available’, the minimum data set used by the only national prostate cancer registry (Prostate Cancer Outcomes Registry, Australia and New Zealand [8]) was used as a guide.

Quality assessment

Quality assessment was performed by two reviewers (ER, JC or MOC) for each paper. Four questions were selected for this assessment: 1. Was the defined representative sample of patients assembled at a common (usually early) point in the course of their disease? 2. Was patient follow-up sufficiently long and complete? 3. Were outcome criteria either objective or applied in a ‘blind’ fashion? and 4. If subgroups with different prognoses were identified, did adjustment for important prognostic factors take place? These questions were selected from the Centre for Evidence Based Medicine ‘Critical appraisal of prognostic studies’ tool [9]. Discrepancies between reviewers were discussed and consensus reached. Questions that were answered positively >75% of the time were considered to present a low risk of bias, those answered positively >50% to ≤75% of the time a moderate risk of bias, and those answered positively ≤50% of the time a high risk of bias. Data extraction and quality assessment were performed using the online tool ‘Covidence’.
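The risk-of-bias thresholds described above can be written as a small classification rule. This is a minimal sketch of the stated cut-offs, not the authors' own code; the function name `risk_of_bias` and the input convention (a percentage between 0 and 100) are assumptions made for illustration.

```python
def risk_of_bias(percent_positive):
    """Classify an appraisal question by the share of positive answers.

    Thresholds follow the text: >75% low, >50% to <=75% moderate,
    <=50% high. The function name and the 0-100 scale are illustrative.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive > 75:
        return "low"
    if percent_positive > 50:
        return "moderate"
    return "high"


# Boundary values fall into the lower-quality band on each side.
for value in (76, 75, 51, 50):
    print(value, risk_of_bias(value))
```

Note that a question answered positively exactly 75% of the time is classed as moderate risk, and exactly 50% as high risk.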

Results

The search strategy resulted in 165 potentially relevant abstracts/articles and these were reduced to 72 once duplicates were removed and titles and abstracts were screened (Fig. 1). The full text of these papers was reviewed against the inclusion criteria (reasons for exclusion are reported in Additional file 1: Table S1a and b) and 47 were finally selected. Study recruitment periods varied considerably, with the earliest patients being from 1984 [10] and the latest from 2009 [10–13] (Table 1). The populations of individual studies varied from 80 [14] to 7,839 [14, 15] with a combined population of 60,457 (Tables 2, 3 and 4). The majority of studies were retrospective (n = 38); however, seven studies recruited prospective cohorts (for one study [16] it was not stated whether it was retrospective or prospective).

Fig. 1 Flow diagram

Table 1 Summary of papers describing prognostic tools relating to clinical outcomes following radiation therapy (2007–2015)
Table 2 Prognostic tools relating to brachytherapy
Table 3 Prognostic tools relating to external beam radiation therapy
Table 4 Prognostic tools relating to combinations of brachytherapy and external beam radiation therapy

The 47 papers finally included in this review described 97 individual predictive models. Of these models, 16 related to brachytherapy treatment (Table 2), 72 to external beam radiation therapy (Table 3) and nine to a combination of brachytherapy and external beam radiation therapy (Table 4).
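As a quick arithmetic check of the breakdown above, the counts per modality can be tallied and converted to shares of all models. This is a sketch only; the variable and function names are illustrative and the counts are taken directly from the text.

```python
# Model counts per treatment modality, as reported in the text.
models_by_modality = {
    "brachytherapy": 16,
    "external beam radiation therapy": 72,
    "combination": 9,
}

total_models = sum(models_by_modality.values())


def share(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


shares = {name: share(n, total_models) for name, n in models_by_modality.items()}

print(total_models)
for name, pct in shares.items():
    print(name, pct)
```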

Across all radiation treatment modalities, outcomes relating to PSA levels post treatment were most common (39 models), followed by prostate cancer specific mortality (29 models). Measures of metastases (17 models) and overall survival (14 models) were less common (note that some papers report more than one outcome and model). Of the newly developed models (66), only nine reported validation either internally or in an additional cohort. Only 67/97 (69%) models included variables which were considered to be readily available in existing data sets.

Critical appraisal considered the criteria set by the CEBM appraisal tool for prognostic studies [9]. Risk of bias ranged from moderate (Q1; Was the defined representative sample of patients assembled at a common point in the course of their disease? (72%), Q2; Was patient follow-up sufficiently long and complete? (64%)) to low (Q3; Were outcome criteria either objective or applied in a ‘blind’ fashion? (85%), Q4; If subgroups with different prognoses are identified, did adjustment for important prognostic factors take place? (91%)) (Table 5).

Table 5 Risk of bias assessment summary table

Brachytherapy

With regard to models predicting outcomes following brachytherapy, Potters et al. [17] report the highest c-index in a model developed and internally validated using a cohort of 5,931 patients. This model predicts 9-year freedom from biochemical failure and remains to be validated externally. Eleven models relating to brachytherapy (69%) did not report model accuracy, and all of the models which did report accuracy related to biochemical failure endpoints. Three studies report external validations of the Prostogram nomogram (also known as the Kattan nomogram), all of which have low c-indices (0.49, 0.51 and 0.66), suggesting that this model is of limited clinical utility. A c-index of 1 ‘indicates a perfect ability to rank the outcomes in the order they actually occurred (100% sensitivity and specificity), whereas 0.5 is a purely random ranking and is analogous to the area under the receiver operator characteristic curve’ (definition from [18]).
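To make the c-index definition above concrete, the sketch below computes a simple pairwise concordance index for right-censored data. It is an illustration under stated assumptions, not the estimator used by any of the reviewed studies: pairs with tied event times are skipped, tied risk scores count as half-concordant, and the function name `concordance_index` and the toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Share of comparable patient pairs ranked correctly by the model.

    times: follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    risk_scores: model output, higher meaning an earlier expected event
    Simplifying assumptions: pairs with tied times are skipped and
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # 'first' is the patient with the shorter follow-up time.
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
print(perfect, uninformative)
```

A model that assigns every patient the same score lands at 0.5, which is why external validations with c-indices near 0.5 indicate little discriminative value.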

External beam radiation therapy

The majority of models identified in this review related to external beam radiation therapy (72/97 = 74%). Fifty-four percent (39 of 72) of these models did not have their accuracy reported, and 61% did not report validation (either internal or external, including external validation of already published models).

The model relating to external beam radiation therapy with the highest accuracy was described by Vainshtein [19], which was an external validation of the CAPRA stratification in the context of external beam radiation therapy. The cohort included 374 patients and the endpoint of prostate cancer specific mortality was predicted with a c-index of 0.86. Accuracy of this model is also reported for the outcome of biochemical failure and for subgroups of patients receiving long-term ADT or short-term ADT, all of which had lower accuracy.

External beam radiation therapy with brachytherapy

Nine models were identified which were specific to patients treated with external beam radiation therapy in combination with brachytherapy. Of these models, five (56%) did not report accuracy. The highest accuracy was reported by Delouya [15, 20] (c-index 0.69), predicting biochemical failure free survival at 2 years. This study was based on a cohort of 744 patients and was an external validation of the CAPRA score. Prediction at 5 years was achieved with a c-index of 0.62.

Discussion

Since the publication of previous reviews, there has been considerable progress in the field of outcomes prediction following prostate cancer treatment. This review identified 47 papers published between 2007 and 2015, which describe 97 predictive tools for men receiving radiotherapy. This includes 66 models which were newly developed and 31 which were validations of already published predictive tools. Consistent with previous reports, most tools (65%) are yet to be validated in a population outside the derivation set. Studies were included from 2007 as the modality of radiation therapy has changed significantly over the past decade, and historic data may not be a useful basis for prognosis. Apart from modality, the total dose has also significantly increased; however, we found that only five studies [13, 16, 20–22] did not use data from men treated as far back as the 1990s.

The volume of research carried out in the field of prognostics has exploded over the last decade. A systematic review that included all studies published before July 2007 (the cut-off date for inclusion in the present review) identified 17 studies on prognostic models that related to prostate cancer patients treated with radiotherapy [4]. In this review 39 new studies were identified which investigated prognostic markers for biochemical failure (BF). Unfortunately, the majority of new studies did not undertake validation, mirroring the finding of the previous systematic review. As validation – particularly external validation – is vital for the appropriate clinical implementation of prognostic models, this suggests that resources and efforts are not being efficiently targeted to improve tools available for clinical practice.

With regard to the methodological quality of the literature, our critical appraisal found that overall studies were at low to moderate risk of bias. The greatest risk was created by insufficient follow-up: sufficient follow-up (defined as a mean or median of ≥5 years) was reported in only 64% of studies. There was also a moderate risk of bias created by the possibility of included patients being at different points in the course of their prostate cancer; however, in the majority of cases this was due to insufficient specificity in the description of inclusion criteria as opposed to reported differences. There was little risk of bias created by the measurement of outcomes, as the main outcomes (biochemical failure [various definitions], metastasis, survival) were objective, or by a lack of adjustment for important prognostic factors, as the essential factors of prostate cancer prognosis (PSA, Gleason score, and clinical stage) were used nearly universally.

Model accuracy was not reported in 57% of the models included. Model accuracy was reported to be highest in Vainshtein 2014 [23] with a c-index of 0.86 derived for prediction of prostate cancer specific mortality with the CAPRA score (originally established in [24]), including the addition of variables for the presence of Gleason 5 and treatment with ADT (this c-index relates to patients not receiving ADT). This study acts to externally validate the CAPRA scoring system (with modifications) in patients treated with external beam radiation therapy, though this improvement to the score requires further validation in other populations. Of the remaining 42 models which reported predictive accuracy, c-indices were typically in the 0.70–0.80 range, which would be considered ‘reasonable’ according to Hosmer and Lemeshow [25]. Notably, those papers which did not report external validation typically had higher c-indices, suggesting that original model developments should be considered optimistic in their predictive capacity. The lowest c-index (0.49, 95% CI 0.37 to 0.61) was reported for a study [26] performing external validation of the Prostogram nomogram (originally established in [27]), suggesting this nomogram may have little predictive value.

The predictive tools identified in this review included joint-modelling approaches but not neural networks which have featured in previous reviews. This may reflect a change in statistical tools available since publication of earlier catalogues [4]. Two of the survival models [28, 29] did not account for competing risks when predicting prostate cancer specific mortality, a potential weakness which could easily be addressed.

The majority of papers attempted prediction relating to biochemical recurrence, prostate cancer specific mortality or overall survival, with a smaller subset predicting metastases. Sixteen of the 97 models identified related to brachytherapy, with 72 for external beam radiation therapy and nine for a combination of the two. This could reflect more widespread use of external beam radiation therapy, and we might anticipate more tools relating to HDR brachytherapy (with or without EBRT) in the future. There is a dearth of externally validated nomograms focusing on brachytherapy and brachytherapy in combination with external beam radiation therapy, particularly for overall survival and cancer specific survival outcomes.

This study did not explicitly set out to uncover tools incorporating novel variables, but only those which could be used in current clinical settings. Despite this, 31% of studies included reference to variables which have been less studied to date (e.g. mid-point PSA levels). While such variables may prove useful, there is currently limited opportunity to validate these observations using existing datasets. It is possible that additional variables, including standardised measures of comorbidity, imaging features or genetic markers, which are becoming more accessible, may help to improve the accuracy of future models. For a recent review of potential molecular and genetic candidates, see Hall et al. 2016 [30].

Most predictive tools identified in this review were developed in US populations. This observation should be considered by clinicians who are based outside the US when selecting a predictive model to assist treatment decision making. Where possible, tools validated in a setting similar to one’s own clinical practice should be selected for use. The number of tools available internationally would be increased with additional validation work conducted outside the US and particularly in multi-national cohorts.

We observed a large degree of variation in the quality of reporting of clinical predictive tools. This may stem from the fact that authors are not aware of reporting guidelines in the field or indeed that such guidelines exist. The TRIPOD guidelines (http://www.equator-network.org/reporting-guidelines/tripod-statement/) for reporting of multivariable prediction models were published in March 2015, shortly before the cut-off for papers included in this review. These guidelines have been widely endorsed and published in key journals [31–39]. Further publication of multivariable models would benefit greatly from adherence to these guidelines.

Conclusions

Tools which aid decision making offer more accurate prediction of clinical outcomes when compared to clinical judgement alone. This understanding has led to a large increase in the number of predictive tools relating to clinical outcomes post radiation therapy between 2007 and 2015. This review identifies 47 papers describing 97 models published in the period, a substantial increase compared to the 17 models previously described between 1966 and 2007. Of the models identified, 65% had no external validation and 57% did not report accuracy. Thirty-one percent of models included variables which are not part of typical registry data sets, and are therefore difficult to validate. Despite these limitations, there are accurate and externally validated models for external beam radiation therapy treatment which predict prostate cancer specific mortality. There are fewer models which accurately predict outcomes following brachytherapy (alone or in combination with external beam radiation therapy). This review provides an accessible catalogue of predictive tools which could be used currently (i.e. those with high accuracy after external validation) and identifies those which should be prioritised for future validation.