Introduction

Prostate cancer (PCa) is the most common non-cutaneous cancer in men in the Western world1. A prostate tissue biopsy is a key step in the diagnosis of PCa. However, the decision to refer a patient for a biopsy is challenging, as transrectal ultrasound-guided (TRUS) biopsies are associated with significant morbidity2. Clinicians usually base this decision on serum prostate specific antigen (PSA), an abnormal digital rectal exam (DRE)3 and, increasingly, multiparametric magnetic resonance imaging (mpMRI), as well as other factors such as family history and previous biopsy results. PSA lacks specificity1 and has led to over-diagnosis and over-treatment of clinically indolent disease and to a large number of unnecessary biopsies4. In addition, the PROMIS trial of mpMRI5 showed that there is still a chance of missing clinically significant disease at PI-RADS scores of 1 or 2.

Improved detection methods for high-grade significant disease that would reduce unnecessary biopsies are highly sought. Risk stratification of men suspected of having PCa and high-grade significant disease would allow clinicians and patients to make a more informed decision on whether or not to biopsy. Risk calculators that utilise patient clinical data have already been developed for cardiovascular disease6 and stroke7. Several guidelines suggest that a risk-adapted approach, which considers clinical information along with serum PSA, should be used to predict PCa risk8. Risk calculators such as the European Randomised Study of Screening for Prostate Cancer Risk Calculator (ERSPC-RC)9 and the Prostate Cancer Prevention Trial Risk Calculator (PCPT-RC)10,11 have been developed and assessed in large multi-institutional cohorts. However, the accuracy of these risk score nomograms in detecting high-grade cancer (Gleason score ≥ 7) is ~ 69–79% for both the PCPT-RC and the ERSPC-RC.

The use of multiple serum biomarkers may be the key to improving the identification of significant disease. Commercial tests that utilise serum biomarkers are already on the market, such as the Prostate Health Index (PHI)12, which comprises total PSA, free PSA and [-2] proPSA13, and the 4Kscore, which assesses total PSA, free PSA, intact PSA and human kallikrein-related peptidase 2 (hK2)14. Although promising, these commercial tests are not yet in widespread routine use, as there is still uncertainty as to their utility and interpretation in a clinical setting. Previous studies have shown that the inclusion of serum and urine biomarkers improves the performance of risk calculators. Our own studies have shown that adding the PHI score to an Irish clinical risk calculator improved its accuracy15. The urinary biomarkers prostate cancer antigen 3 and the gene fusion product of transmembrane protease serine 2 with the transcription factor v-ets erythroblastosis virus E26 oncogene homolog (TMPRSS2-ERG) also improved the accuracy of the ERSPC-RC16. Together, these findings indicate that the use of multiple biomarkers can improve the accuracy of risk calculators.

It is well recognised that inflammation plays a causal role in the development of several types of cancer17, and there is direct evidence of an inflammatory microenvironment18 and of higher inflammatory marker levels conferring a greater PCa risk19. This environment is associated with impaired differentiation of prostate epithelial cells20 and aberrant basal to luminal differentiation promoting cancer initiation21.

The aim of this study was to investigate the utility of inflammatory serum biomarkers combined with clinical information for the detection of (i) PCa and (ii) high-grade PCa in patients that are suspected of having PCa.

Materials and methods

Patient cohort and sample collection

The study cohort consisted of 436 Caucasian Irish men referred for a TRUS biopsy on the basis of an elevated PSA and/or abnormal DRE between April 2012 and June 2016. Blood samples (9 mL) were collected in a serum separator tube prior to biopsy and processed within 3 h of collection. Samples were centrifuged at 1500×g for 15 min at room temperature. Serum (~ 3 mL) was removed and stored at − 80 °C until further analysis. Patients were classified as either biopsy-negative (having no detectable PCa) or biopsy-positive (having detectable PCa) and further sub-divided into low-grade (Gleason score 6) and high-grade (Gleason score 7 or above)15 disease. The clinicopathologic details of the cohort are summarised in Table 1.

Table 1 Clinical Features of the Patient Cohort.

Sample collection and processing were ethically approved by the St James Hospital and Mater Misericordiae University Hospital ethics committees. The patient information leaflet and consent form were written in line with best practice, the EU Data Protection Directive and the Data Protection Acts 1988 and 2018, and were approved by the two ethics committees. All patients gave written informed consent to participate in the study. All steps were carried out in accordance with national guidelines and regulations.

Biomarker analysis

In total, 20 biomarkers were analysed. The Evidence Investigator platform (Randox) uses sandwich chemiluminescent immunoassay methods for the simultaneous detection of multiple analytes. Two multiplexed Evidence Investigator biomarker panels were employed: the Cytokine and Growth Factors High Sensitivity Array (Cat. No. EV 3623) for IL-1α, IL-1β, IL-2, IL-4, IL-6, IL-8, IL-10, VEGF, IFN-γ, TNF-α, MCP-1 and EGF22, and the Adhesion Molecules Array (Cat. No. EV3519) for VCAM-1, ICAM-1, E-Selectin, P-Selectin and L-Selectin23. Biochip analyses were performed according to the manufacturer’s instructions. Briefly, serum samples (diluted where appropriate), reconstituted calibrators and assay-specific quality controls (run in duplicate) were incubated on the biochip. Following washing, detector conjugate solution was applied to the biochip and binding was revealed using chemiluminescent detection. The concentrations of all analytes in the samples were calculated by the Evidence Investigator analyser using a nine-point calibration curve. Individual assay runs were deemed to have passed if the measured values for the quality control samples were within the specified range of the target value for each analyte, as per kit instructions. Values measured below the lower limit of detection were taken as zero.

IL-18 was measured in all serum samples by ELISA (Cat. No. ILE10068, Randox) according to the manufacturer’s instructions. Serum samples, calibrators and quality control samples were added to each well in duplicate. The IL-18 standard provided in the kit was reconstituted in deionised water to make up the calibrators and the two quality control samples (312.5 pg mL−1 and 56.25 pg mL−1). Total PSA and Free PSA were determined in all samples using the Roche COBAS 8000 system according to manufacturer’s instructions at Randox Clinical Laboratory Services (Antrim, UK).

Although PSA values were available for each patient, we re-measured PSA because the patients were recruited from different clinics and their pre-biopsy PSA values had been obtained using various platforms. Serum PSA was therefore reanalysed for all patients on a single platform.

Clinical information (age, family history, DRE and prior negative biopsy) was collected as part of the study from the patient’s chart or at the time of recruitment.

Statistical analysis

Basic analysis of patient information

Basic statistical analysis of the study population’s characteristics was performed using GraphPad Prism (ver. 5.0). Descriptive statistics were computed for the dataset, which was divided into patients with and without a PCa diagnosis, and into patients with high-grade PCa (Gleason score ≥ 7) versus all other patients. The unpaired Student’s t-test and the Wilcoxon rank test were used to test for significant differences in the means and medians of continuous variables, respectively. Pearson’s chi-squared test was used to test for significant differences in categorical variables.
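As an illustration of the categorical comparison described above, the following is a minimal Python sketch of Pearson’s chi-squared test for a 2 × 2 table. It is not the GraphPad Prism procedure used in the study: the function name `chi2_2x2` and the counts are hypothetical, and no continuity correction is applied, which is an assumption.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    # Closed form of sum((observed - expected)^2 / expected) for a 2x2 table.
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts for illustration only (NOT taken from Table 1):
# rows = abnormal / normal DRE, columns = PCa / no PCa.
stat, p = chi2_2x2(60, 40, 90, 110)
print(f"chi-squared = {stat:.2f}, p = {p:.4f}")
```

The closed-form statistic is equivalent to summing (observed − expected)² / expected over the four cells; tables larger than 2 × 2 would need the general form and a chi-squared distribution with more degrees of freedom.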

Risk calculator model development and performance

Development of the risk calculators for the prediction of PCa and high-grade PCa was performed in R software (version 3.4.3)24. Logistic and multinomial regression methods were used to model the linear and nonlinear effects of serum biomarkers combined with clinical information (age, DRE, family history of PCa, previous negative biopsy). Both modelling strategies are suitable for stratifying patients into those with high-grade PCa, those with low-grade PCa and those without PCa. Stepwise selection was applied as the variable selection technique to integrate potentially relevant biomarkers into the risk calculator. In both methods, the log odds for each patient were modelled as a function of the risk factors and then transformed into probabilities, giving a percentage risk for each patient. Internal validation was performed using tenfold cross-validation to guard against overfitting.
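The transformation from log odds to a percentage risk can be sketched as follows. This is a Python illustration under stated assumptions, not the R code used in the study: the function names, variable names and all coefficients are hypothetical (they are not the fitted values in Table 3), and obtaining the multinomial risk of any PCa by summing the low-grade and high-grade class probabilities is one plausible reading of how the two outcomes were combined.

```python
import math


def logistic_risk(intercept, coefs, x):
    """Binary logistic model: linear predictor (log odds) -> probability."""
    log_odds = intercept + sum(coefs[k] * x[k] for k in coefs)
    return 1.0 / (1.0 + math.exp(-log_odds))


def multinomial_risk(models, x):
    """Multinomial logit with 'no_pca' as the reference class.

    models maps each non-reference class to (intercept, coefs).
    Returns a dict of class probabilities that sum to 1.
    """
    scores = {"no_pca": 0.0}  # reference class has log odds 0 by definition
    for label, (intercept, coefs) in models.items():
        scores[label] = intercept + sum(coefs[k] * x[k] for k in coefs)
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}


# Hypothetical patient and coefficients, for illustration only.
patient = {"age": 65.0, "abnormal_dre": 1.0, "log_psa": math.log(6.0)}
binary = logistic_risk(-4.0, {"age": 0.03, "abnormal_dre": 0.8, "log_psa": 0.9}, patient)
multi = multinomial_risk(
    {
        "low_grade": (-3.5, {"age": 0.01, "abnormal_dre": 0.3, "log_psa": 0.6}),
        "high_grade": (-6.0, {"age": 0.05, "abnormal_dre": 1.0, "log_psa": 1.1}),
    },
    patient,
)
risk_any_pca = multi["low_grade"] + multi["high_grade"]
print(f"logistic risk: {100 * binary:.1f}%")
print(f"multinomial risk of any PCa: {100 * risk_any_pca:.1f}%")
print(f"multinomial risk of high-grade PCa: {100 * multi['high_grade']:.1f}%")
```

The multinomial form yields one coherent set of class probabilities per patient, whereas separate logistic models give two independent risks, one per outcome definition.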

The final models for diagnosis of PCa and/or high-grade PCa were compared to the Irish prostate risk calculator (IPRC), which was developed previously and outperformed the available risk calculators in the Irish population25. Accuracy of the models was determined using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, which plots sensitivity against 1 − specificity at each risk threshold. ROC curves were compared using the method described by DeLong et al.26. Decision-curve analysis was undertaken to examine the potential net benefit of applying each model over the strategies of performing a biopsy in all patients and performing a biopsy in none27. Calibration plots were used to visualise the agreement between the predicted risk and the observed incidence of cancer28. The chi-square Hosmer–Lemeshow test was used to assess the goodness of fit of the models, where p < 0.05 indicates poor agreement between the predicted risk and the observed incidence of cancer, and hence a poorly calibrated model.
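The AUC described above can be illustrated with a short Python sketch. It is a minimal example rather than the study’s R implementation: the function names and toy data are hypothetical, and the DeLong comparison of two curves is not implemented here.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(1 - specificity, sensitivity) at each distinct risk threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy predicted risks and biopsy outcomes (1 = cancer), invented for illustration.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(toy_scores, toy_labels))
print(roc_points(toy_scores, toy_labels))
```

The pairwise definition gives the same value as the trapezoidal area under the points returned by `roc_points`; the pairwise loop is quadratic in cohort size, which is acceptable for a cohort of a few hundred patients.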

Results

Baseline cohort characteristics

This was a retrospective biomarker study intended to improve the detection of clinically relevant disease. The clinical endpoint of the study was the histopathological findings from the TRUS biopsy. The study cohort consisted of 436 patient biopsies, of which 211 (48%) were diagnosed with PCa with different Gleason scores (Table 1).

In Table 1, the univariate effects of 'DRE' and 'Previous negative biopsy' were statistically significant for detecting PCa, and that of 'age' for detecting high-grade PCa. The effect of 'PSA' was significant in both cases. This implies that patients who are older, have a higher PSA, have an abnormal DRE or have not had a previous negative biopsy have, on average, a higher chance of PCa and high-grade PCa.

Statistical modelling

Descriptive analysis of all biomarkers assessed is presented in Table 2.

Table 2 Descriptive analysis of serum biomarkers (median and interquartile range) grouped by biopsy and grading outcomes.

Integrating serum biomarkers with the clinical risk factors using multinomial and logistic models identified 8 biomarkers (TNF-α, VEGF, IL-1α, IL-1β, ICAM-1, E-selectin, P-selectin, L-selectin) that conferred significant additional predictive ability, 4 of which (IL-1α, IL-1β, E-selectin, P-selectin) were identified in both models. Two separate risk calculators were developed to predict PCa and high-grade PCa, one using a multinomial model (Multi-RC) and one using a logistic model (Logst-RC); Table 3 presents the models and Table 4 evaluates their performance. We also built models using the biomarkers alone, for both the multinomial model (Multi-bio) and the logistic model (Logst-bio); these are presented in Table 4 and showed no significant improvement over the clinical model (IPRC).

Table 3 Summary of Multi-RC and Logst-RC models using odds ratio, standard error and p-value for each risk factor in the model.
Table 4 The discriminative ability of IPRC, Multi-bio, Logst-bio, Multi-RC and Logst-RC using the areas under the curve (AUC) and 95% confidence interval of the calculated probabilities.

The odds ratios of the Multi-RC model for detecting low-grade (column A) and high-grade PCa (column B) compared to not detecting PCa are presented in Table 3. We combined the odds ratios of low-grade and high-grade PCa to evaluate the performance of the Multi-RC model for detecting PCa and high-grade PCa; the results are presented in Table 4 and show a significant improvement over the IPRC model. The model variables are age, DRE, family history, previous biopsy, PSA, TNF-α, IL-1α, IL-1β, ICAM-1, E-Selectin, P-Selectin, free PSA (FPSA) and free-to-total PSA (FTPSA).

The odds ratios of the Logst-RC model for detecting PCa compared to not detecting PCa (column C) and for detecting high-grade PCa compared to low-grade or no PCa (column D) are presented in Table 3. The model performance for detecting PCa and high-grade PCa is presented in Table 4 and showed a significant improvement over the IPRC model. The model variables are age, DRE, family history, previous biopsy, PSA, VEGF, IL-1α, IL-1β, E-Selectin, P-Selectin, L-Selectin and FPSA.

To give some insight into the clinical significance of the study, we selected thresholds manually based on the Youden index. Using a threshold of 0.3 for the high-grade Logst-RC (0.275 for the high-grade Multi-RC) would have saved 71.2% (72.6%) of the biopsies at the cost of delaying the diagnosis of 27.9% (33.8%) of the high-grade cancers. The negative predictive value of test results below this threshold would be 0.833 (0.828).
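The quantities reported above (biopsies saved, high-grade cancers delayed, negative predictive value and the Youden index) all follow from the confusion matrix at a given risk threshold. The Python sketch below shows one way to compute them; the function name and the toy data are hypothetical, and the example does not reproduce the study’s figures.

```python
def threshold_summary(risks, has_high_grade, threshold):
    """Consequences of biopsying only patients whose predicted risk >= threshold.

    Assumes the data contain at least one patient with and one without
    high-grade disease, so sensitivity and specificity are defined.
    """
    pairs = list(zip(risks, has_high_grade))
    tp = sum(1 for r, y in pairs if r >= threshold and y)
    fp = sum(1 for r, y in pairs if r >= threshold and not y)
    fn = sum(1 for r, y in pairs if r < threshold and y)
    tn = sum(1 for r, y in pairs if r < threshold and not y)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "youden_j": sensitivity + specificity - 1,
        "biopsies_avoided": (tn + fn) / len(pairs),
        "high_grade_delayed": fn / (tp + fn),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
    }


# Toy predicted risks and outcomes (1 = high-grade PCa), invented for illustration.
toy_risks = [0.05, 0.10, 0.20, 0.25, 0.35, 0.40, 0.60, 0.80]
toy_outcomes = [0, 0, 1, 0, 0, 1, 1, 1]
summary = threshold_summary(toy_risks, toy_outcomes, 0.3)

# Threshold with the largest Youden index among the observed risk values.
best_threshold = max(
    sorted(set(toy_risks)),
    key=lambda t: threshold_summary(toy_risks, toy_outcomes, t)["youden_j"],
)
print(summary, best_threshold)
```

A threshold that maximises the Youden index weights sensitivity and specificity equally; in practice a clinically chosen threshold may deliberately favour sensitivity for high-grade disease.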

Model performance

Table 4 presents the discriminative abilities of both risk calculators for the diagnosis of PCa and high-grade PCa using the AUC. Multi-RC showed AUCs of 0.7126 and 0.7671, and Logst-RC AUCs of 0.7308 and 0.7847, for the diagnosis of PCa and high-grade PCa, respectively. Both models significantly improved on the predictive ability of the IPRC model, as demonstrated by the ROC curves in Fig. 1A,D.

Figure 1 The receiver operating characteristic (ROC) curves (A,D) and decision curves (B,E) represent the discriminative ability of IPRC (green), Multi-RC (red) and Logst-RC (blue) in diagnosis of cancer (A,B) and high-grade cancer (D,E). Calibration curves are represented in (C,F).

Figure 1 shows the decision curve analyses of the clinical utility of both models in detecting PCa (Fig. 1B) or high-grade PCa (Fig. 1E). Compared to the IPRC clinical model alone, there was an improved net benefit over the threshold range of 0.35 to 1.0 for detecting PCa and over the threshold range of 0.15 to 1.0 for detecting high-grade PCa. The calibration curves (Fig. 1C,F) show good agreement between the predicted probabilities and the actual outcomes, indicating that all models are well calibrated; this was confirmed by the non-significant Hosmer–Lemeshow results.
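For readers unfamiliar with decision-curve analysis, net benefit at a threshold probability p_t is conventionally defined as TP/n − (FP/n) × p_t / (1 − p_t), with "biopsy none" fixed at zero. The following Python sketch computes it for a model and for the "biopsy all" strategy; the function names and toy data are hypothetical and the output does not correspond to Fig. 1.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of biopsying patients with predicted risk >= threshold."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and not y)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_biopsy_all(outcomes, threshold):
    """Net benefit of the reference strategy of biopsying every patient."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    prevalence = sum(1 for y in outcomes if y) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy predicted risks and biopsy outcomes (1 = cancer), invented for illustration.
dca_risks = [0.05, 0.10, 0.20, 0.25, 0.35, 0.40, 0.60, 0.80]
dca_outcomes = [0, 0, 1, 0, 0, 1, 1, 1]
for pt in (0.1, 0.2, 0.3, 0.4, 0.5):
    model_nb = net_benefit(dca_risks, dca_outcomes, pt)
    all_nb = net_benefit_biopsy_all(dca_outcomes, pt)
    print(f"p_t={pt:.1f}  model={model_nb:.3f}  biopsy all={all_nb:.3f}  biopsy none=0.000")
```

A model is useful at a given threshold only if its net benefit exceeds both reference strategies; plotting these values across thresholds gives the decision curve.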

Integrating the panel of serum biomarkers with the clinical risk factors outperforms the previously developed Irish risk calculator25. Logst-RC showed a slightly greater improvement when internally validated; however, further validation in an independent cohort will be required to confirm the improvements and identify the most appropriate model. Such a cohort could also be used to select the best clinically accepted threshold for use in practice.

Discussion

In this study, we have used a retrospective approach to show that integrating inflammatory serum biomarkers with the clinical risk factors significantly improves on the discriminatory power and clinical utility of the clinical risk factors alone for PCa and high-grade PCa (Gleason score ≥ 7). This suggests that the Multi-RC or Logst-RC models would improve the detection rate and/or reduce unnecessary biopsies compared to the IPRC risk calculator based on clinical features alone. These models demonstrated consistently higher net benefits across different preferences for wanting or avoiding a biopsy29 and, following further validation and threshold selection, could have clinical utility.

Chronic inflammation is associated with the development of many cancers, including PCa, and may play a role in its formation and development30. In the current study we identified a number of inflammatory mediators that improved the prediction of PCa and high-grade PCa when integrated with the current clinical features, compared to clinical features alone. These included TNF-α, VEGF, IL-1α, IL-1β, ICAM-1, E-Selectin, P-Selectin and L-Selectin. There is evidence in the literature that some of these mediators are associated with tumour development and progression, including in PCa. VEGF has been shown to be overexpressed in patients with colorectal cancer31 and PCa. Fryczkowski et al. demonstrated that VEGF concentrations were significantly higher in the PCa groups than in the benign prostatic hyperplasia (BPH) patient group; however, on multiple logistic regression analysis VEGF was not an independent predictor of PCa and did not add to the clinical features alone32. The soluble adhesion molecules ICAM-1 and the selectins have been shown to be increased in breast33 and colorectal cancer31, but there is no evidence in PCa to date. TNF-α levels have also been correlated with disease stage in breast cancer34, but there is no evidence that serum levels of TNF-α, IL-1α or IL-1β are associated with high-grade or lethal PCa at the time of diagnosis of localised disease35. The strength of our study is that we evaluated a number of inflammatory serum mediators and built a model selecting the biomarkers that gave the best prediction of PCa and high-grade PCa.

The multinomial regression modelling approach identified a single combination of biomarkers for the risk assessment of PCa and high-grade PCa, whereas the logistic regression approach selected two different sets to estimate the risk of PCa and of high-grade PCa. Logistic regression helps to assess the partial effect of the biomarkers on detecting either PCa or high-grade PCa, while multinomial regression reduces the standard errors36. Both methods have been employed in previous studies: the European risk calculator (ERSPC-RC9) used the logistic regression approach, and two American risk calculators (PCPT-RC10 and PBCG-RC11) were developed using multinomial regression.

The use of a logarithmic transformation for some biomarkers in the model (e.g. IL-1β) represents a nonlinear effect of the biomarker on the risk, in which a small change in the biomarker is critical, particularly at low concentrations. In contrast, the linear effect of some biomarkers (e.g. E-selectin) means that a given change has the same importance at any value of the biomarker. The use of both linear and logarithmic effects for a biomarker (e.g. IL-1α) indicates that, although any change in the biomarker is important, a small change in its value is more critical.
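The difference between linear and logarithmic terms can be made concrete with a small numerical example. The Python sketch below uses hypothetical coefficients and concentrations, not values from Table 3, and assumes strictly positive biomarker values so that the logarithm is defined.

```python
import math


def log_odds_change(beta_linear, beta_log, start, end):
    """Change in log odds when a biomarker moves from start to end under a
    model containing the terms beta_linear * x + beta_log * log(x)."""
    if start <= 0 or end <= 0:
        raise ValueError("biomarker values must be positive for the log term")
    return beta_linear * (end - start) + beta_log * (math.log(end) - math.log(start))


# Hypothetical coefficients; the same +1 unit change at a low and a high baseline.
log_only_low = log_odds_change(0.0, 0.5, 1.0, 2.0)
log_only_high = log_odds_change(0.0, 0.5, 10.0, 11.0)
linear_low = log_odds_change(0.02, 0.0, 1.0, 2.0)
linear_high = log_odds_change(0.02, 0.0, 10.0, 11.0)

print(f"log term:    low baseline {log_only_low:.3f}, high baseline {log_only_high:.3f}")
print(f"linear term: low baseline {linear_low:.3f}, high baseline {linear_high:.3f}")
```

Under the logarithmic term the effect depends on the ratio end / start, so each doubling of the biomarker multiplies the odds by the same factor (2 raised to the power of the coefficient), whereas under the linear term each additional unit multiplies the odds by the same factor regardless of the baseline.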

A limitation of the study is the absence of PSA density as a variable, which is part of the ERSPC-RC8. We did not have access to prostate volume data at the time of patient recruitment, as the Irish health care setting did not facilitate the collection of prostate volume until the TRUS biopsy was carried out.

Conclusion

Our study has demonstrated that both models are well calibrated and use variables that are available from the patient (age, family history, DRE and previous biopsy) or measured in their blood sample, making them appropriate for individualised risk assessment. Both models show a statistically significant improvement over the IPRC, justifying the addition of the serum biomarkers and supporting their clinical use. Selecting the best model requires additional cohorts to independently validate the models and to select clinically accepted thresholds that maximise discrimination and clinical benefit.