Introduction

There is increasing interest in developing the quantitative imaging of biomarkers in personalised medicine. Biomarkers are defined as “characteristics that are objectively measured and evaluated as indicators of normal biological processes, pathological processes, or pharmaceutical responses to a therapeutic intervention” [1]. Broadly, biomarkers fall into two categories: bio-specimen biomarkers, including molecular biomarkers and genetic biomarkers, and bio-signal biomarkers or imaging biomarkers. Bio-specimen biomarkers are obtained by removing a sample from a patient. Examples of these molecular biomarkers are genes and proteins detected in fluids or tissue samples. Bio-signal biomarkers remove no material from the patient, but rather detect and analyse an electromagnetic, photonic or acoustic signal emitted by the patient [2]. These imaging biomarkers have the advantage of being non-invasive, spatially resolved and repeatable [3]. They are of particular interest if they can overcome the limitations of the established histological “gold standards”. Indeed, invasive reference examinations, such as biopsy, can be inconclusive, are not representative of the whole tissue (a major limitation when assessing malignant tumours, which are known to be heterogeneous) and carry non-negligible risks of mortality and morbidity.

Genetic biomarkers indicate whether a disease may occur, but they are usually poorly suited to assessing the presence and stage of a disease. Like molecular biomarkers, imaging biomarkers can be used for early detection of diseases, staging and grading, and predicting or assessing the response to treatment [3]. However, because of their relatively lower cost compared with imaging, molecular biomarkers may be more appropriate for disease screening and early detection than imaging biomarkers. With their high sensitivity, molecular biomarkers could also detect subclinical stages of disease before any morphological or functional change is detectable on imaging. In contrast, imaging biomarkers are often more useful than molecular biomarkers for disease staging and grading and for assessing tumour response, because localised information is crucial.

Similar to new drugs, biomarkers have to pass along a development pipeline going from discovery, through verification in different laboratories, to validation and qualification before they can be used in clinical routine. Validation includes the determination of the accuracy and the precision (reproducibility) of the biomarker, and standardisation concerns both acquisition and analysis. Qualification, defined as a “graded, fit-for-purpose evidentiary process linking a biomarker with biological processes and clinical end-points”, is a validation process in large cohorts of patients involving multiple centres, similar to phase III clinical trials, to obtain regulatory approval as surrogate endpoints [4]. A more extensive path to biomarker development has been reported [5]. The first step is the proof of concept, which defines any specific change relevant to the disease that can be studied using the available imaging and computational techniques. The relationship between this change and the presence, grading and response to treatment of the disease constitutes the proof of mechanism. The images needed to extract the biomarker must be appropriate (in terms of resolution, signal and contrast behaviour). Preparation of images relates to improving the data before the analysis (such as segmentation, filtering, interpolation or registration). The analysis and modelling of the signal, by numerical fitting of a mathematical model, allow the needed information to be extracted (such as structural, physical, chemical, biological and functional properties). After this voxel-by-voxel computation, the spatial distribution of the biomarker can be depicted by parametric images, defined as derived secondary images whose pixels represent the values of a given parameter. Multivariate parametric images obtained by statistical modelling of the relevant parameters allow the reduction of data and a clear definition of the disease target. The abnormal values should be defined and measured through histogram analysis. A pilot test on a small sample of subjects, with and without the disease, has to be performed to validate the process (also called proof of principle) and to evaluate the influence of potential variations related to age, sex or any other source of bias. Finally, proofs of efficacy and effectiveness on larger and well-defined series of patients will show the ability of a biomarker to measure the clinical endpoint (Fig. 1).

Fig. 1 Steps for the development of imaging biomarkers (adapted from [5])

Accuracy

Before being routinely used in the clinic, imaging biomarkers must be validated. Determining the accuracy implies calculating the sensitivity and specificity of the biomarker when compared with a biological process, such as tumour necrosis, which can be assessed at histopathological examination.
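As a minimal sketch of the accuracy calculation described above, the following Python fragment dichotomises a biomarker with a cut-off and compares the result with a binary histopathological reference. The ADC values, the necrosis labels and the cut-off are invented for illustration and are not taken from any study; the function name is likewise hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    biomarker_positive: Sequence[bool],
    reference_positive: Sequence[bool],
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a binary biomarker reading
    against a binary reference standard such as histopathology."""
    if len(biomarker_positive) != len(reference_positive):
        raise ValueError("both sequences must have the same length")

    tp = fp = tn = fn = 0
    for test, truth in zip(biomarker_positive, reference_positive):
        if truth and test:
            tp += 1  # disease present, biomarker positive
        elif truth and not test:
            fn += 1  # disease present, biomarker negative
        elif not truth and test:
            fp += 1  # disease absent, biomarker positive
        else:
            tn += 1  # disease absent, biomarker negative

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")

    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: ADC values (10^-3 mm^2/s) and histology for 8 lesions.
adc = [0.9, 1.0, 1.6, 1.1, 1.7, 1.8, 1.2, 1.9]
necrosis_at_histology = [False, False, True, False, True, True, True, False]
threshold = 1.5  # assumed cut-off, for illustration only

reading = [value > threshold for value in adc]
sens, spec = sensitivity_specificity(reading, necrosis_at_histology)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In practice the cut-off itself would have to be chosen and validated on independent data, which this sketch does not attempt.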

This validation process is challenging because changes in tissue properties due to diseases that are measured by imaging, such as the diffusion coefficients at DW-MRI or the mechanical properties at MR elastography, are only indirectly linked to structural changes such as necrosis, cellularity, fibrosis and vascular architecture. Moreover, the functional properties that are measured may be influenced by other co-existing factors, such as inflammation, perfusion, permeability and interstitial pressure. For example, the apparent diffusion coefficient (ADC) is decreased in chronic liver disease. This ADC decrease has been shown to be influenced by increased fibrosis, inflammation and steatosis, as well as by decreased perfusion [6–9]. Equating what is measured by imaging and what is occurring at the cellular level in tissue is a difficult task because our understanding of the biophysical underpinnings of many imaging biomarkers, such as diffusion measurements of in vivo systems, remains partial [10, 11].

To help in this understanding, pre-validation studies are conducted in animal models of the disease of interest, where histopathological analysis and other invasive reference examinations can be easily conducted [12]. More basic ex-vivo research in tissues, phantoms or theoretical models may also help in the understanding of the relationship between signal formation and underlying physiopathology [13]. The transition to patients must then be made, and the biomarker validated once again in small-cohort and then large-cohort clinical studies.

The ultimate goal for an imaging biomarker is to understand its predictivity so well that it can become a surrogate for clinical outcome. One primary end-point in therapy assessment studies is patient survival. No imaging biomarker, even the familiar “response evaluation criteria in solid tumours” (RECIST) [14], universally employed in oncology drug development, is widely accepted as a surrogate for survival. The RECIST criteria can be used to define time to progression, but an increase in time to progression as a result of therapy is not necessarily a surrogate of improved overall survival, as shown by the Avastin (bevacizumab) story [14]. In 2011, the FDA withdrew approval for the combined use of Avastin and chemotherapy for the treatment of metastatic breast cancer because preliminary licensing was predicated on future demonstration of improvement in survival or quality of life, neither of which was forthcoming when clinical trials were completed.

Surrogacy can only be reliably established with a large number of adequately powered clinical studies using a variety of interventions, and with the aid of meta-analyses. This is a daunting goal, which constitutes the very last step in biomarker qualification [2].

Reproducibility

Repeatability (measurements at short intervals on the same subjects using the same equipment in the same centres) and reproducibility (measurements at short intervals on the same subjects using different facilities in the same and different centres) studies must be conducted for image acquisition and image analysis. These studies have to be performed with the same observer (intra-observer variability) and with different observers (inter-observer variability). Repeatability and reproducibility are particularly important to assess if the imaging biomarkers are to be used in longitudinal studies, for example for treatment follow-up, to ensure that the changes in the parameter are caused by a response to treatment and not by inherent technical or physiological variation. The reproducibility will affect the diagnostic usefulness of the biomarker. As an example, it is known that perfusion parameters are markedly variable between subjects. Therefore, it has been reported that a post-therapy decrease of Ktrans should be at least in the 30–50 % range to represent a significant therapy-induced change, whereas for ADC at DW-MRI a change of 10–20 % would be sufficient [15]. Reproducibility studies are now very often included in scientific papers, as advised by the “standards for reporting of diagnostic accuracy” (STARD) criteria, and should ideally include Bland-Altman plots and coefficients of repeatability [16, 17].
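The Bland-Altman quantities mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed method: the test-retest values are invented, the function names are hypothetical, and the coefficient of repeatability is taken here as 1.96 × √2 × within-subject standard deviation, which is one common convention among several.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def _paired_differences(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Per-subject differences between two measurement sessions."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 subjects")
    return [a - b for a, b in zip(first, second)]


def bland_altman_limits(
    first: Sequence[float], second: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the 95 % limits of agreement."""
    diffs = _paired_differences(first, second)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def coefficient_of_repeatability(
    first: Sequence[float], second: Sequence[float]
) -> float:
    """1.96 * sqrt(2) * within-subject SD, estimated from paired measurements."""
    diffs = _paired_differences(first, second)
    within_subject_variance = sum(d * d for d in diffs) / (2 * len(diffs))
    return 1.96 * math.sqrt(2.0 * within_subject_variance)


# Hypothetical test-retest ADC values (10^-3 mm^2/s) in six subjects.
scan_1 = [1.10, 1.25, 0.98, 1.40, 1.15, 1.30]
scan_2 = [1.14, 1.21, 1.02, 1.36, 1.19, 1.26]

bias, lower, upper = bland_altman_limits(scan_1, scan_2)
cr = coefficient_of_repeatability(scan_1, scan_2)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
print(f"coefficient of repeatability={cr:.3f}")
```

Under this convention, a change between two examinations of the same patient that is smaller than the coefficient of repeatability cannot be distinguished from measurement variation.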

Standardisation

Standardisation relates to the establishment of norms or requirements about technical aspects. In the development of imaging biomarkers, two main aspects should be considered.

  • Standardisation of image acquisition: similar acquisition parameters should be used across imaging platforms when these parameters affect the results of the biomarker. For example, the calculation of ADC depends on the number and choice of the gradient “b” values. A collaborative paper by Padhani et al. [18] lays the foundation for acquisition standardisation, notably by recommending that monoexponential assessments of ADC should use two b values above 100 s/mm².

    Moreover, DW-MRI is very sensitive to motion. Motion correction schemes are thus advised for DW-MRI acquisition. However, it is still unclear which scheme is optimal. As an example for upper abdominal studies, some consider that free breathing acquisition produces reliable enough data, even with a better reproducibility than breath-hold, and that a respiratory-triggered scheme produces less reproducible data, while others recommend using tracking-only navigator techniques [19–21].

  • Standardisation of image analysis: volume and region of interest (ROI) determinations and parameter calculation (mathematical models) should be standardised. In tumour perfusion imaging, it has been shown that the ROI placements in the vascular input and in the tumour influence the results and reproducibility of the parameter measurements [22]. To take motion into account, rigid and non-rigid registration of images at different time points can be used. In heterogeneous lesions such as tumours, imaging biomarkers are frequently calculated as spatially resolved parametric maps. We need to define how to handle the histogram that displays the obtained values. Descriptive statistics such as mean value, standard deviation, and range can be directly obtained from the histogram. The main drawback of this approach is its clear tendency to underestimate the changes in body tissues and organs, since the values indicative of disease, or its most relevant manifestations, are averaged out. For this reason, percentiles are used in some settings to obtain a better relationship with the most relevant predictive clinical variables. The optimal type of approach must be defined for each problem (complete histogram, partial histogram in quartiles, partial histogram in deciles). A further approach involves the analysis of the heterogeneity in the spatial distribution of a biomarker provided by its parametric image. To this end, some distribution shape statistics such as kurtosis can be used [23–26]. Finally, the choice of the mathematical model that is used to calculate the quantitative parameters also has a major influence on the results that are obtained [27, 28]. Standardisation procedures are currently being developed [18, 29, 30]. It is important that standardisation be a collaborative effort of academia and industry. Standardisation of data reporting should also be performed. For example, to describe the liver elasticity in cirrhosis, different units (Young modulus in kPa, shear modulus in kPa, wave speed in m/s) and different cut-off values are currently used [31–33]. Standardisation of these data would improve the communication between research groups.
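The two standardisation aspects above can be illustrated together in a short Python sketch: a two-point monoexponential ADC estimate per voxel, followed by the histogram descriptors discussed for parametric maps. The monoexponential model S(b) = S0 · exp(−b · ADC) is the standard one; everything else is an assumption made for illustration, including the b values (150 and 800 s/mm²), the simulated voxel values, the function names, the choice of the 10th and 90th percentiles, and the addition of skewness alongside kurtosis.

```python
import math
import statistics
from typing import Dict, Sequence


def adc_two_point(signal_low: float, signal_high: float,
                  b_low: float, b_high: float) -> float:
    """Monoexponential ADC from two b values: S(b) = S0 * exp(-b * ADC)."""
    if b_high <= b_low:
        raise ValueError("b_high must be larger than b_low")
    if signal_low <= 0 or signal_high <= 0:
        raise ValueError("signals must be positive")
    return math.log(signal_low / signal_high) / (b_high - b_low)


def histogram_descriptors(values: Sequence[float]) -> Dict[str, float]:
    """Summary, percentile and shape statistics of a parametric map."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.fmean(values)
    # Central moments (population form) for the shape statistics.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    # Nine cut points; "inclusive" keeps percentiles within the data range.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "mean": mean,
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "p10": deciles[0],
        "median": statistics.median(values),
        "p90": deciles[8],
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Simulated voxels (hypothetical): five similar ADCs and one high-ADC voxel.
B_LOW, B_HIGH = 150.0, 800.0  # s/mm^2, assumed protocol
S0 = 1000.0                   # arbitrary signal at b = 0
true_adc = [0.8e-3, 0.9e-3, 1.0e-3, 1.1e-3, 1.2e-3, 2.4e-3]  # mm^2/s
signals = [(S0 * math.exp(-B_LOW * a), S0 * math.exp(-B_HIGH * a))
           for a in true_adc]

adc_map = [adc_two_point(low, high, B_LOW, B_HIGH) for low, high in signals]
summary = histogram_descriptors(adc_map)
for name, value in summary.items():
    print(f"{name}: {value:.6g}")
```

In this invented example a single high-ADC voxel pulls the mean above the median, which is the kind of effect that motivates reporting percentiles and shape statistics rather than the mean alone.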

Quality control

Suitable phantoms can be used to check, on a day-to-day basis, that the biomarker remains robust and to detect any drift in the machine, acquisition or processing protocol. The advantage of using phantoms is that the sequence can be optimised in detail before being performed in patients (which is particularly suited to CT studies, as it limits the radiation dose to the patient), and distribution of the same phantom across imaging platforms allows control of the quality and standardisation of the procedures. Multicentre quality control studies have already been conducted using a simple, ice-water filled, DW-MRI phantom containing tubes of solutions of known diffusion coefficients, which allowed for comparing machines and centres [34]. For ultrasound and CT, phantoms ranging from simple gels with inclusions of different shapes and sizes (for control of tumour size measurement) to complex thoracic models including vasculature inserts (to test perfusion acquisitions) are available [30, 35]. Mechanically-induced motion of these phantoms can also be realised [36]. Another possibility is to simulate images based on computerised phantoms [37]. This computerised phantom dataset can even incorporate deformation information mimicking respiration of patients [23].
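A day-to-day phantom check of the kind described above could, in its simplest form, flag sessions whose measured value departs from the known phantom value by more than a chosen tolerance. The sketch below is hypothetical throughout: the reference value, the 5 % tolerance, the daily measurements and the function name are all assumptions for illustration, and a real quality control programme would set them from the phantom specification and the measured repeatability.

```python
from typing import List, Sequence


def out_of_tolerance_sessions(
    measured: Sequence[float],
    reference: float,
    tolerance_fraction: float,
) -> List[int]:
    """Return the indices of sessions whose phantom measurement deviates
    from the reference value by more than the allowed fraction."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    if tolerance_fraction <= 0:
        raise ValueError("tolerance_fraction must be positive")
    return [
        index
        for index, value in enumerate(measured)
        if abs(value - reference) / reference > tolerance_fraction
    ]


# Hypothetical phantom tube with a nominal ADC of 1.10 (10^-3 mm^2/s).
reference_adc = 1.10
daily_adc = [1.10, 1.11, 1.09, 1.12, 1.18, 1.10]

flagged = out_of_tolerance_sessions(daily_adc, reference_adc, tolerance_fraction=0.05)
print("sessions outside tolerance:", flagged)
```

A fixed tolerance is the simplest possible rule; trend-based checks that detect slow drift before it crosses the tolerance would be a natural refinement.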

Clinical use

When imaging biomarkers are validated for use in drug development studies or clinical trials, several additional points should be considered. First, the imaging biomarker should bring new information on top of existing diagnostic tools or existing risk factors and have the potential to modify patient management [38]. The coronary artery calcium score, one of the most evaluated cardiovascular imaging biomarkers, is not only associated with the risk of future cardiovascular events but also improves the traditional classification of risk by shifting patients from intermediate to high risk categories [39]. It is likely that a panel of biomarkers will be required to achieve the high accuracy required at the clinical level.

Second, the imaging biomarker should be completely non-invasive, so as not to lose the advantage of safe imaging methods over invasive reference examinations. Third, the imaging biomarker should be cost-effective. If the biomarker is to be added to the routine clinical examination without further burdening the public health system with increased costs of care, its diagnostic advantages have to offset its cost. The imaging biomarker should also be easy to implement in the clinic, meaning that the machinery must already exist or be easily available, that there should be no need for specific expertise from hospital employees, and that the parameter must be easy to measure and interpret. Few guidelines currently exist for imaging biomarker use [40, 41]. Developing, evaluating and implementing such guidelines, together with other agencies, may be an important task for the biomarkers subcommittee of the ESR.

Biomarkers also have potential for the industry as pharmacodynamic markers and even surrogate endpoints for targeted clinical phase I to III studies [42]. Development of new biomarkers was identified as the highest priority for scientific effort by the FDA to ease the marketing of newly developed drugs [43].

Development of new biomarkers

Given the difficulties in the qualification and standardisation of existing imaging biomarkers, is there a need to develop additional ones? The answer is yes, for example in the field of oncology, where the palette of reasonably well-understood biomarkers has major gaps. The hallmarks of cancer include sustaining proliferative signalling, evading growth suppressors, resisting cell death, enabling replicative immortality, inducing angiogenesis, activating invasion and metastasis, reprogramming of energy metabolism and evading immune destruction [44]. Regarding angiogenesis, there are useful biomarkers utilising MRI, CT, ultrasound or PET. For drugs affecting the deregulated cellular energetics of the Warburg effect, FDG-PET offers an obvious assessment. For cellularity, proliferation and apoptosis, a joint public-private partnership between the EU and pharmaceutical companies called “Quantitative Imaging in Cancer: Connecting Cellular Processes (QuIC-ConCePT)” is currently devoted to the validation of imaging biomarkers, namely ADC at DW-MRI, [18F]3′-deoxy-3′-fluorothymidine PET (FLT–PET) and isatin-5-sulphonamide PET ([18F]ICMT-11), an apoptosis radiotracer with subnanomolar affinity for caspase-3 [12, 45, 46]. However, we currently do not have good markers for activation of invasion and appearance of metastasis before these events become macroscopically evident. Thus, development of new imaging biomarkers is still needed.

The European Society of Radiology and its related European Institute for Biomedical Imaging Research (EIBIR) should have a relevant role in coordinating future developments of biomarkers and in the assessment and validation of imaging biomarkers as surrogate end points.