The evaluation of diagnostic technologies includes assessment of test accuracy and of clinical, process or economic outcomes following testing (see additional file 1)[1]. The impact of a test depends on a variety of factors in addition to test accuracy, including: interpretation of test results, the possibility that the new information does not contribute enough to cross a treatment threshold, clinician awareness of the availability of cost-effective treatments, lack of patient access to treatments, acceptability of treatments to patients and the possibility that the patient is already receiving optimal care[2].
Once test results arrive, clinicians use the information to categorise patients into those with and those without disease (the diagnostic yield) and then to make decisions about the treatment required (the therapeutic yield)[1] (see additional file 1). Diagnostic and therapeutic yield should be considered separately from the accuracy of the test. For example, a test could be 100% sensitive and 100% specific yet still have a low therapeutic yield for any of the reasons listed in the previous paragraph.
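One common way of expressing the two yields is as simple proportions of the tested cohort; the following sketch uses purely hypothetical figures, not data from any study cited here:

\[
\text{diagnostic yield} = \frac{\text{patients whose diagnosis changes after the test}}{\text{patients tested}}, \qquad
\text{therapeutic yield} = \frac{\text{patients whose management changes after the test}}{\text{patients tested}}
\]

If, say, 100 patients underwent a perfectly sensitive and specific test and 8 received a new or revised diagnosis (diagnostic yield 8%) but only 2 had their treatment altered as a result (therapeutic yield 2%), the test would be highly accurate yet of limited therapeutic consequence.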
Study designs that may be employed to evaluate diagnostic and therapeutic yield include randomised controlled trials (RCTs) and non-randomised experimental or observational studies. Randomised trials may be impractical because of large sample size requirements,[1] the speed of technological advance in diagnostics, which risks trial results being obsolete by the time they are available, and ethical considerations arising from the potential to deny patients beneficial treatments.
An observational study design that allows evaluation of diagnostic or therapeutic yield is the diagnostic before-after study (see additional file 2). In its most basic form, a group of patients undergo an existing test or battery of tests and the therapeutic strategy indicated by the test results is noted. They then undergo the new test under evaluation, and any change of diagnosis or treatment strategy is noted and compared. The design can be elaborated to include measurement of test accuracy, if the new test is not the reference standard, and assessment of patient outcomes following treatment. Diagnostic before-after studies may be retrospective or prospective, in contrast to the prospective data collection traditionally implied by before-after evaluation studies.
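As a minimal sketch of how data from the basic design might be tabulated (the counts and management categories below are hypothetical), the stated management plan before the new test can be cross-classified against the plan after it, with the off-diagonal cells constituting the therapeutic yield:

\begin{tabular}{lccc}
\hline
 & \multicolumn{2}{c}{Plan after new test} & \\
Plan before new test & Conservative & Active & Total \\
\hline
Conservative & 158 & 9 & 167 \\
Active & 3 & 30 & 33 \\
\hline
Total & 161 & 39 & 200 \\
\hline
\end{tabular}

In this hypothetical example the management plan changed for $(9+3)/200 = 6\%$ of patients, which would be reported as the therapeutic yield of adding the new test.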
Diagnostic before-after studies are subject to a number of limitations[2], such as discrepancies between stated clinical assessment and actual clinical action, and possible subconscious bias about the benefits of the new technology; for example, a clinician may delay making a definitive diagnosis if they know that another test is going to be performed. In addition, there can be no direct comparison of patient outcomes, because all patients have had the new test. However, some of these limitations can be overcome by careful planning and conduct of the study. For example, using a prospective design may ameliorate review bias, while independent review of pre- and post-test clinical assessments and strict adherence to a study protocol may reduce discrepancies between stated clinical assessment and actual clinical action.
Observational studies, such as diagnostic before-after studies, are easier and quicker to conduct than RCTs[3]. In addition, because diagnostic before-after studies are considered to be biased in favour of new interventions, when no benefit is found it is unlikely that a stronger study design addressing the same question, such as an RCT, will find one[2]. Therefore, despite their limitations, diagnostic before-after studies may have a role in evaluating the therapeutic impact of diagnostic tests.
This paper discusses an example of the use of diagnostic before-after studies to evaluate the effectiveness of structural neuroimaging in psychosis, in the context of a health technology assessment undertaken for the NICE technology appraisals programme in the UK. The systematic review underpinning this methodological paper is published as an HTA monograph[4]. The decision problem for the systematic review was to evaluate the added value of structural neuroimaging with computed tomography (CT) or magnetic resonance imaging (MRI) compared with current practice alone. Current practice was defined as any test(s) or investigation(s), or any combination of tests, that would be carried out as part of the initial care of a psychotic patient to identify brain lesions, in two patient groups: acutely psychotic patients, and psychotic patients who are treatment-resistant or deteriorating despite treatment[4].

This decision problem can be conceptualised as a before and after comparison of two diagnostic strategies, current practice only and current practice with CT and/or MRI (see additional file 3), where CT and MRI are considered reference standards for the pathology investigated (target disorders). However, unlike most diagnostic yield studies, in which a single target condition is investigated, this review had several target conditions, i.e. any organic disorder with the potential to cause psychosis as well as any treatable organic condition that may coexist with psychosis, including cerebrovascular accident (CVA), various vascular disorders and brain tumours. The best structural neuroimaging method for determining the presence or absence of these conditions varies with the condition: for example, CT is considered better than MRI for diagnosing calcification, whereas MRI is the gold standard for the diagnosis of space-occupying lesions. For the purposes of this review, CT and/or MRI were considered reference standard tests for the pathologies being investigated, so additional assessment of test accuracy was not considered a necessary component of included studies.

The key question to be answered by the systematic review was whether the addition of neuroimaging would affect diagnostic yield, patient management (therapeutic yield) and ultimately patient outcomes.
In this situation an RCT of diagnostic or therapeutic yield would not be useful because multiple conditions were being sought. If patient outcomes such as health-related quality of life and mortality due to undetected treatable conditions were the outcomes measured, the required sample size would be very large, as the illustrative calculation below suggests. Therefore the most likely design to be found in a systematic review would be a diagnostic before-after study.
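To give a sense of scale (the proportions used here are illustrative assumptions, not estimates from the review), a standard two-group sample size calculation shows why an RCT powered for such patient outcomes would be prohibitively large. Detecting a halving of a rare adverse outcome, say from 1.0% in an unscanned arm to 0.5% in a scanned arm, with 80% power at a two-sided 5% significance level requires approximately

\[
n \approx \frac{(z_{1-\alpha/2}+z_{1-\beta})^{2}\,[p_{1}(1-p_{1})+p_{2}(1-p_{2})]}{(p_{1}-p_{2})^{2}}
 = \frac{(1.96+0.84)^{2}\,(0.0099+0.0050)}{(0.005)^{2}} \approx 4{,}700
\]

patients per arm, i.e. well over 9,000 randomised patients in total, and more still for rarer target conditions.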
Some background information about psychosis is needed to appreciate the clinical scenario. In 2005–6 there were 41,600 NHS finished episodes and 2,617,500 bed days in England due to psychotic illnesses[5]. Psychosis secondary to a brain tumour is rare. The prevalence of brain tumours in psychiatric patients has been estimated, in a review of cross-sectional studies of prevalent cases using CT scanning, to be approximately 1.2%. However, this estimate does not distinguish between psychotic patients with coincidental brain tumours and patients with brain tumours causing their psychotic symptoms[6]. Psychotic patients can develop additional pathology at any time in their lives. Structural neuroimaging (MRI and CT scanning) allows non-invasive visualisation of the anatomical structure of the brain to assist in the diagnosis of intracranial pathology. As an estimated 4.3–10% of patients have psychological reactions severe enough that MRI has to be modified, postponed or cancelled, it is important to know whether subjecting psychotic patients to this procedure is clinically warranted[7].
When conducting the systematic review, we discovered that there was no existing quality assessment tool for diagnostic before-after studies. Therefore, we had to modify a validated quality assessment tool for diagnostic accuracy studies. We describe the modifications that we made to the QUADAS[8] tool in relation to published theory on diagnostic or therapeutic yield studies[2, 3] and our experience of using the modified tool in practice.
Standard systematic review methods were used to find suitable studies to answer the clinical question. The inclusion criteria were any design reporting diagnostic or therapeutic yield, including prospective or retrospective diagnostic before-after studies, that reported the additional diagnostic benefit of structural MRI, CT or combinations of these in patients with psychosis compared with any current standard practice of diagnostic workup without structural neuroimaging. An added complication was whether or not patients in the included studies had symptoms or signs of a space-occupying lesion. Diagnostic tests conducted before or in addition to structural neuroimaging were often poorly detailed in the included studies but, when described, comprised a variety of medical and psychiatric histories, physical and neurological examinations, biochemical tests, blood tests, toxicological screens, mental state examinations, electroencephalography (EEG) and psychiatric rating scales. Only studies reporting clinically relevant outcomes were included in the review, such as the proportion of patients whose scans identified pathology, not suspected from the history and/or physical examination, that would influence patient treatment (therapeutic yield), and patient outcomes.
Standard systematic review methods include quality assessment of included studies. Quality assessment tools for primary studies of test accuracy are relatively well developed, although only one, QUADAS, has been validated[8]. The recent draft NICE Guide to the Methods of Technology Appraisal (2007) suggests the QUADAS quality assessment tool "as a useful starting point for appraising studies that evaluate the sensitivity and specificity of a test", but no guidance is provided on quality assessment of diagnostic before-after studies. This lack of a validated quality assessment tool appears not to have been noticed until now, perhaps because there are very few systematic reviews of diagnostic or therapeutic yield studies. However, it is likely that in future NICE will appraise more devices and diagnostic tests (Personal Communication, Carole Longson, NICE, December 2008).