Observer reliability of CT angiography in the assessment of acute ischaemic stroke: data from the Third International Stroke Trial

Introduction CT angiography (CTA) is often used for assessing patients with acute ischaemic stroke. Only limited observer reliability data exist. We tested inter- and intra-observer reliability for the assessment of CTA in acute ischaemic stroke.
Methods We selected 15 cases from the Third International Stroke Trial (IST-3, ISRCTN25765518) with various degrees of arterial obstruction in different intracranial locations on CTA. To assess inter-observer reliability, seven members of the IST-3 expert image reading panel (>5 years' experience reading CTA) and seven radiology trainees (<2 years' experience) rated all 15 scans independently and blind to clinical data for: presence (versus absence) of any intracranial arterial abnormality (stenosis or occlusion), severity of arterial abnormality using relevant scales (IST-3 angiography score, Thrombolysis in Cerebral Infarction (TICI) score, Clot Burden Score), collateral supply and visibility of a perfusion defect on CTA source images (CTA-SI). Intra-observer reliability was assessed using independently repeated expert panel scan ratings. We assessed observer agreement with Krippendorff's alpha (K-alpha).
Results Among experienced observers, inter-observer agreement was substantial for the identification of any angiographic abnormality (K-alpha = 0.70) and with an angiography assessment scale (K-alpha = 0.60–0.66). There was less agreement for grades of collateral supply (K-alpha = 0.56) or for identification of a perfusion defect on CTA-SI (K-alpha = 0.32). Radiology trainees performed as well as expert readers when additional training was undertaken (neuroradiology specialist trainees). Intra-observer agreement among experts provided similar results (K-alpha = 0.33–0.72).
Conclusion For most imaging characteristics assessed, CTA has moderate to substantial observer agreement in acute ischaemic stroke. Experienced readers and those with specialist training perform best.
Electronic supplementary material The online version of this article (doi:10.1007/s00234-014-1441-0) contains supplementary material, which is available to authorized users.


Introduction
Non-contrast CT (NCCT) is the most widely available imaging modality for assessing patients with acute stroke. Many centres now also perform CT angiography (CTA) as part of their stroke imaging protocol [1]. However, only limited observer reliability data exist for the reporting of CTA in acute stroke [2,3]. A recent consensus statement on angiography grading standards for acute ischaemic stroke recommended that further reliability studies should be performed [4].
The Third International Stroke Trial (IST-3) was a multicentre, randomised controlled trial in 3035 patients that tested whether intravenous recombinant tissue plasminogen activator (rt-PA), given within 6 h of ischaemic stroke, improved functional outcome at 6 months [5]. Standardised brain imaging (predominantly NCCT) was mandatory for all IST-3 patients prior to randomisation in the trial. In some centres, CTA was also routinely obtained.
Using CTA to assess intracerebral arterial patency is limited by the lack of a grading scale developed specifically for cross-sectional imaging [6]. To date, most trials incorporating CTA have used one of the catheter angiography scales, e.g. Thrombolysis in Cerebral Infarction (TICI) [7]. However, applying catheter angiography scales to CTA (or MR angiography) without modification is problematic for two reasons: (1) catheter angiography scales assess distal tissue perfusion, but perfusion is not appreciable on CTA (unless it is time resolved) [6]; and (2) catheter angiography scales conflate features of cerebral arterial patency, flow and perfusion into one scale, thereby potentially increasing sources of observer disagreement. Unlike catheter angiography, standard CTA provides only a snapshot in time and can only be used to assess arterial patency rather than flow. A new IST-3 angiography score was developed in an attempt to overcome the limitations of applying catheter angiography scores to CTA. The IST-3 angiography score aims to assess only those characteristics of angiography that are identifiable on CTA, especially arterial luminal patency at the main point of occlusion [6].
Our primary aim was to investigate inter- and intra-observer reliability of expert readers assessing CTA in acute ischaemic stroke. We also sought to establish how less experienced readers perform and to evaluate a new CTA grading scale, the IST-3 angiography score.

Materials and methods
The Third International Stroke Trial
IST-3 was an international, multicentre, prospective, randomised, open, blinded endpoint (PROBE) trial of intravenous rt-PA in acute ischaemic stroke. Enrolment, data collection and image analysis have been fully described elsewhere [6,8]. Briefly, patients with acute stroke of any severity were eligible for inclusion in the trial if intravenous rt-PA (alteplase) could be started within 6 h of known stroke onset, and CT or MR imaging had reliably excluded both intracranial haemorrhage and any structural stroke mimic. Patients were aged 18 years or above with no upper age limit. Informed consent was obtained from all patients. IST-3 is registered as ISRCTN25765518.

Scan acquisition and management
Prior to joining IST-3, all centres had to submit a test scan to ensure adequate image acquisition parameters. Minimum acquisition standards were specified in the trial protocol and centre participation criteria. All scans were checked against quality assurance standards centrally. Due to the large number of participating sites, CT scans were inevitably obtained from many different scanners (and generations of scanner) across the trial.
In centres where CTA was routinely performed in the assessment of acute ischaemic stroke, these images were also submitted to the IST-3 central trials office. Subgroup analysis of IST-3 angiography was pre-specified [6].
Once received by the IST-3 central trials office, all images were anonymised and uploaded to a local server. Image analysis was undertaken using the Systematic Image Review System 2 (SIRS2). The use of this system for remote multireader scan assessment has been fully described [9,10]. Briefly, SIRS2 provides an environment for viewing images via a web browser (available at www.neuroimage.co.uk). Scan ratings are entered simultaneously and are automatically submitted securely to the trial database. Users are assigned specific image datasets upon which standard image manipulation functions (e.g. zoom, pan and scroll) can be applied. SIRS2 allows multiple images to be viewed in parallel. Scan ratings were entered on a structured pro forma: www.sbirc.ed.ac.uk/research/imageanalysis.html [6].

Image assessment panel
All imaging in IST-3 was assessed centrally by a panel of expert readers comprising neuroradiologists, neurologists and stroke physicians with extensive experience in the assessment of acute stroke imaging. All readers underwent scan rating training prior to joining the panel by completing the ACCESS study [9,10]. All readers were completely blinded to all clinical information including stroke symptoms, treatment allocation, time after stroke and any other image data acquired at different time points.

Image analysis in IST-3
Non-contrast CT assessment
NCCT was evaluated for the extent, depth and location of acute ischaemia using an IST-3 scale and the Alberta Stroke Program Early CT Score (ASPECTS) [11,12], ischaemic tissue swelling, the presence and location of any hyperattenuated artery and background pre-stroke brain changes (brain atrophy, leukoaraiosis, prior infarct or haemorrhage) [13,14], using validated qualitative scales (details in Table 1). The IST-3 scale grades infarct location and extent in any arterial territory, with up to eight categories in the middle cerebral artery (MCA) territory. ASPECTS is a 10-point scale developed to assess infarct extent within the MCA territory, where points are deducted for each of the ten MCA territory regions involved; the anterior cerebral artery (ACA) and posterior cerebral artery (PCA) territories can be included by adding 1 point for each.
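The ASPECTS arithmetic described above can be sketched in a few lines of Python. This is only an illustration and not the IST-3 rating software: the function name `aspects_score` is hypothetical, the region labels are assumed from the standard ASPECTS definition (caudate, lentiform nucleus, internal capsule, insula and cortical regions M1 to M6), and the optional ACA/PCA extension mentioned above is not modelled.

```python
# Hypothetical sketch of ASPECTS arithmetic (not the IST-3 rating software).
# Region labels are an assumption based on the standard ASPECTS definition.
ASPECTS_REGIONS = frozenset({
    "caudate", "lentiform", "internal_capsule", "insula",
    "M1", "M2", "M3", "M4", "M5", "M6",
})


def aspects_score(involved_regions):
    """Return 10 minus the number of distinct MCA regions showing ischaemia."""
    involved = set(involved_regions)  # duplicates count once
    unknown = involved - ASPECTS_REGIONS
    if unknown:
        raise ValueError(f"Unknown ASPECTS region(s): {sorted(unknown)}")
    return 10 - len(involved)


# A scan with no visible ischaemia scores 10; involvement of the insula
# and M2 deducts one point each.
normal_scan = aspects_score([])
insula_and_m2 = aspects_score(["insula", "M2"])
```

Under these assumptions a lower score indicates more extensive MCA territory involvement, with 0 meaning all ten regions are affected.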
CTA assessment
CTA was first categorised as 'normal' or 'abnormal' (when any arterial stenosis and/or occlusion was identified in any intracranial location). A modified version of the TICI scale and the IST-3 angiography (modified Mori) scale were then applied [6,7,15]. Both are scalar and range from occlusion (0) through lesser grades of obstruction to normal patency (3 for TICI, 4 for IST-3) as detailed in Table 1.
The Clot Burden Score assesses the extent of contrast deficits (as a surrogate for clot) in the internal carotid artery (ICA), MCA and ACA [16]. From an initial score of 10, points are deducted for each vessel segment involved; a score of 0 implies all segments of all named vessels are occluded.
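The deduction logic of the Clot Burden Score can be illustrated with a short Python sketch. The per-segment weights below are not stated in this article; they are an assumption based on the original description of the score [16] (2 points each for the supraclinoid ICA and the proximal and distal halves of the MCA trunk; 1 point each for the infraclinoid ICA, the A1 segment and up to two M2 branches). The segment labels and the function name `clot_burden_score` are hypothetical.

```python
# Hypothetical sketch of Clot Burden Score arithmetic.
# ASSUMPTION: segment weights follow the original description of the score
# [16]; they are not given in this article.
CBS_DEDUCTIONS = {
    "ica_infraclinoid": 1,
    "ica_supraclinoid": 2,
    "m1_proximal": 2,
    "m1_distal": 2,
    "m2_branch_a": 1,
    "m2_branch_b": 1,
    "a1": 1,
}


def clot_burden_score(segments_with_contrast_deficit):
    """Start from 10 and deduct the weight of each affected segment."""
    affected = set(segments_with_contrast_deficit)  # duplicates count once
    unknown = affected - set(CBS_DEDUCTIONS)
    if unknown:
        raise ValueError(f"Unknown segment(s): {sorted(unknown)}")
    return 10 - sum(CBS_DEDUCTIONS[segment] for segment in affected)


# No contrast deficit gives 10; a deficit along the whole MCA trunk
# (both M1 halves) deducts 4 points.
patent_vessels = clot_burden_score([])
whole_m1 = clot_burden_score(["m1_proximal", "m1_distal"])
```

With these assumed weights the deductions sum to 10, so a score of 0 corresponds to a contrast deficit in every named segment, consistent with the description above.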
The quality of collateral vessel supply in patients with ICA or MCA occlusion was categorised as good, moderate or poor [17].
CTA source images (CTA-SI) were assessed for deficits in contrast enhancement of brain tissue as a surrogate of impaired cerebral blood flow (CBF) and low cerebral blood volume (CBV), indicative of infarction. Extent of any perfusion deficit on CTA-SI was categorised using ASPECTS [18,19].

Observer reliability analysis
Selection of cases
We identified 15 cases from the IST-3 angiography subgroup that had both NCCT and concurrent CTA performed pre-randomisation. Time-resolved CTA was not included. These 15 cases were chosen to represent a range of angiographic findings (e.g. presence/absence of arterial obstruction in various locations, clot burden) including normal appearances based on the consensus opinion of three senior neuroradiologists (details in Table 2). In three of these cases, angiography was deemed to be normal. In the remaining 12 cases, arterial obstruction of varying severity (TICI 1-2b) was identified in an ICA (n=4), in an MCA (n=7) or in the basilar artery (n=1). Clot Burden Scores ranged from 1 to 10. There were no significant differences between the full IST-3 CTA subgroup (n=269) and the 15 cases selected for reliability analysis for the following variables (full subgroup data are presented): age (median 81 years), sex (56 % female), National Institutes of Health Stroke Scale (median 10) and time from stroke onset to scan (median 170 min).

Selection of readers
We identified 14 readers comprising seven (of the original ten) expert IST-3 angiography panel members (each with more than 5 years of experience in assessing CTA in acute stroke) and seven non-expert readers (radiology trainees with less than 2 years of experience in assessing CTA).
Scan rating
All 15 cases for reliability analysis were independently rated by the 14 readers. These scan ratings (of both NCCT and CTA) were performed purely to assess reader reliability and were undertaken in addition to, and separate from, scan ratings performed during the main IST-3 trial and the IST-3 angiography subgroup analysis.
Inter-observer reliability comparisons
Three distinct inter-observer analyses were performed.
Intra-observer reliability comparisons
We used expert panel readings performed during the primary IST-3 angiography rating (i.e. prior to this observer reliability study) for intra-observer reliability analysis. Each of the seven experts had read at least one of the same 15 cases in the primary angiography assessment, and that reading could be compared with their subsequent readings performed specifically for this observer reliability analysis (Fig. 1). The readers had no knowledge of their previous scan assessment or even that a previous assessment of the same case was undertaken. All scan reads for intra-observer analysis were separated in time by at least 4 weeks, but in many cases, up to 1 year passed between scan reads.
Table 1 (excerpt)
[13] Ten MCA territory divisions available: insula, caudate head, lentiform nucleus, internal capsule
IST-3 ischaemia score [12] (condensed code): 0=None; 1=Small cortical, border zone or lacunar infarct, etc.; 2=More than half ACA territory or >2 cm within basal ganglia, etc.; 3=More than half cerebellar hemisphere or peripheral MCA territory, etc.
Poor=only distal superficial MCA branches fill with contrast
Brain atrophy [14]: 0=None, 1=modest, 2=severe; assessed for both central and cortical regions
Leukoaraiosis [15]: 0=None, 1=periventricular only, 2=from ventricle to cortical surface; assessed for both anterior and posterior regions

Statistical analysis
All reliability analyses were performed using Krippendorff's alpha (K-alpha) with 1000 bootstrap samples for each. K-alpha results range from −1.0 to +1.0, where +1.0 equates to perfect agreement, 0.0 means no agreement beyond that expected by chance and −1.0 implies perfect disagreement [20]. We adopted the Landis and Koch approach for interpreting these results: K-alpha 0.00-0.20 = slight agreement, 0.21-0.40 = fair agreement, 0.41-0.60 = moderate agreement, 0.61-0.80 = substantial agreement and 0.81-1.00 = almost perfect agreement [21]. Differences in K-alpha between expert and non-expert groups (including between neuroradiology specialist trainees and others) and between imaging characteristics assessed on NCCT and CTA were not tested for significance. All analyses were performed using IBM SPSS Statistics software, version 20.0 (IBM Corporation, Armonk, NY, USA). SPSS does not provide native support for K-alpha; an appropriate macro was applied [22].
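For readers unfamiliar with the statistic, the following Python sketch shows one way to compute a K-alpha point estimate from a coincidence matrix. It is not the SPSS macro used in this study and omits the bootstrap; the function names are hypothetical, only nominal and interval distance metrics are shown (the ordinal metric is left out for brevity), and the ratings are toy data rather than IST-3 scan reads.

```python
from collections import Counter
from itertools import permutations


def nominal_distance(a, b):
    """Squared distance for nominal categories: 0 if equal, else 1."""
    return 0.0 if a == b else 1.0


def interval_distance(a, b):
    """Squared distance for interval-scaled values."""
    return float(a - b) ** 2


def krippendorff_alpha(units, distance=nominal_distance):
    """Point estimate of K-alpha; each unit is a sequence of ratings, None = missing."""
    coincidences = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit rated by fewer than two observers is not pairable
        # Every ordered pair of ratings from different observers, weighted 1/(m-1)
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one unit rated by two observers")
    observed = sum(w * distance(a, b) for (a, b), w in coincidences.items()) / n
    expected = sum(
        totals[a] * totals[b] * distance(a, b) for a in totals for b in totals
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - observed / expected


# Toy data (NOT IST-3 ratings): three observers, 15 cases, None = not rated.
N = None
by_observer = [
    [N, N, N, N, N, 3, 4, 1, 2, 1, 1, 3, 3, N, 3],
    [1, N, 2, 1, 3, 3, 4, 3, N, N, N, N, N, N, N],
    [N, N, 2, 1, 3, 4, 4, N, 2, 1, 1, 3, 3, N, 4],
]
units = list(zip(*by_observer))  # one tuple of ratings per case
alpha_nominal = krippendorff_alpha(units)                       # about 0.691
alpha_interval = krippendorff_alpha(units, interval_distance)   # about 0.811
```

The sketch illustrates two properties relevant here: cases with missing ratings still contribute as long as two observers rated them, and the same routine accommodates different data types by swapping the distance function.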

Results
CTA inter-observer agreement
Inter-observer reliability analyses for the assessment of CTA among expert and non-expert readers (n=7 for both groups) are presented in Table 3. Inter-observer agreement among experts and among non-experts with additional training was greatest for assessing whether CTA was 'normal' or 'abnormal' (i.e. any intracranial arterial stenosis or occlusion). For assessing the extent of angiographic abnormality, IST-3 scoring performed better than TICI in all groups, although the 95 % confidence intervals overlap, so this difference is unlikely to be significant. Expert panel IST-3 angiography scores for all 15 cases are presented in Table 4. Note that while there are some discrepancies, most of the disagreements do not extend by more than 2 points.
The assessment of CTA collateral supply and the identification of a perfusion deficit on CTA-SI scored lowest for inter-observer agreement in all groups.
CTA intra-observer agreement
Intra-observer agreement for the expert panel assessment of CTA (K-alpha 0.33-0.72) was generally similar to their inter-observer results (above). However, the wide confidence intervals for the intra-observer analysis suggest that it may be underpowered (Table 3).

Observer agreement for NCCT versus CTA
Online appendix 1 provides results of the inter- and intra-observer reliability analyses for NCCT findings of the angiography expert panel. These results show fair to moderate agreement for most imaging characteristics. Identification and classification of ischaemia (using either ASPECTS or the IST-3 ischaemia score) showed the best agreement (K-alpha 0.56-0.66 for both inter- and intra-observer analyses). Identification of a hyperattenuated artery showed only fair inter-observer agreement but almost perfect intra-observer agreement (K-alpha 0.37 and 0.83, respectively). Figure 2 compares IST-3 expert panel inter-observer agreement for NCCT and CTA findings. Agreement was generally greater for the assessment of CTA features (K-alpha 0.32-0.70) than for NCCT features (K-alpha 0.13-0.66) although the ranges were similar. In addition, four of the top five agreement scores were for imaging characteristics assessed on CTA.
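The verbal labels used in this section follow from the Landis and Koch bands given under Statistical analysis. A minimal Python sketch of that mapping is shown below; the function name `landis_koch_label` is hypothetical, the treatment of each upper bound as inclusive is an assumption (the bands in the text leave small gaps, e.g. between 0.20 and 0.21), and the "poor" label for negative values comes from the original Landis and Koch scheme rather than from the bands listed in this article.

```python
# Hypothetical helper mapping a K-alpha value to a Landis and Koch label.
# ASSUMPTION: upper bounds are treated as inclusive.
LANDIS_KOCH_BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def landis_koch_label(k_alpha):
    """Return the agreement label for a K-alpha value in [-1, 1]."""
    if not -1.0 <= k_alpha <= 1.0:
        raise ValueError("K-alpha must lie between -1.0 and +1.0")
    if k_alpha < 0.0:
        # Not listed in this article; taken from the original scheme.
        return "poor"
    for upper_bound, label in LANDIS_KOCH_BANDS:
        if k_alpha <= upper_bound:
            return label
    return "almost perfect"  # unreachable given the range check above


# Hyperattenuated artery sign: inter-observer 0.37, intra-observer 0.83.
inter_label = landis_koch_label(0.37)
intra_label = landis_koch_label(0.83)
```

Applied to the hyperattenuated artery results above, the helper returns "fair" for the inter-observer value and "almost perfect" for the intra-observer value, matching the wording in the text.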

Discussion
In this study, where 14 observers with differing levels of experience assessed a purposive sample of 15 examinations, we show that CTA features have slightly higher levels of agreement than non-contrast CT features. Imaging characteristics that are likely to have the greatest clinical impact (e.g. the presence and severity of arterial occlusion) are reported with the highest inter-observer agreement, both by experienced (K-alpha >0.60) and inexperienced observers. There was less agreement over arterial collateral supply and use of CTA-SI to identify perfusion deficits, even among experienced observers (K-alpha 0.30-0.60). Despite being comparatively inexperienced, the participating radiology trainees who had undertaken additional neuroradiology training (neuroradiology fellows) performed as well as experts in the assessment of CTA. This implies that, with adequate training, CTA can be reliably assessed even by readers with less experience.

Table 3 Observer reliability analyses for CT angiography (CTA) among expert and non-expert readers (column groups: IST-3 expert panel readers; non-expert readers)
The IST-3 angiography score is an adaptation of earlier scores (TICI, Mori). It is designed to overcome the limitations of using a catheter angiography score for the assessment of CTA by primarily assessing residual arterial calibre at the point of stenosis and contrast penetration into the major distal vessels only; it makes no attempt to assess distal tissue perfusion [6]. The present work represents the first external testing of observer reliability for the IST-3 angiography score, and it compares favourably with TICI.
To the best of our knowledge, there are only a few previous studies of CTA reliability in stroke; all had fewer than seven observers, and none tested all the CTA signs assessed in our study. Knauth et al. reported an inter-reader kappa=0.78 for two readers identifying the correct location of occlusion on CTA in acute ischaemic stroke [23]. Suh et al. compared TICI versus a modified TICI score and found both scales were moderately repeatable (intra-class correlation coefficients (ICC), 0.67 and 0.73, respectively) across five readers [24]. We did not replicate the inter- and intra-observer reliability demonstrated by Puetz and colleagues in their original report of the Clot Burden Score (six readers, ICC=0.87 and 0.96, respectively) despite similar reader numbers [16]. Similarly, in the original report defining their classification of collateral status, Miteff and colleagues demonstrated an inter-observer reliability of kappa=0.93 for two observers [17]. We were unable to replicate those findings, but our results are more consistent with other methods of assessing leptomeningeal flow as reported in a systematic review (0.49-0.87) [3]. These previous studies represent a mixture of kappa statistics and ICC and are not directly comparable with our K-alpha results; any comparisons should be treated with caution. Nevertheless, kappa, ICC and K-alpha work on the same numerical scale and are therefore broadly similar. We opted to use K-alpha for several reasons. Kappa is only suitable for assessing two observers rating nominal data and even then may not be the most suitable test [28,29]; we had up to seven observers per analysis and a mixture of nominal and ordinal data.
K-alpha has been shown to provide a more robust measure of observer variance than kappa or ICC and offers several advantages: it allows comparisons between any number of observers, it can handle both categorical and ordinal data, it is less prone to the influences of observer bias and result prevalence and it can still be computed in the presence of missing data [20,30].
Other strengths of our work include more readers than in previous studies; calculation of both inter- and intra-reader reliability; use of a robust, standardised image analysis platform, previously shown to provide consistent multiuser reporting [9,10]; complete blinding of readers to all clinical information and to any other scan assessments; and use of representative cases from a multicentre trial, which increases the generalisability and real-world relevance of our results.
Our work also has some limitations. Firstly, in contrast to previous work [10], we did not formally produce a single reference standard for the 'correct' interpretation of the 15 scans to compare with other readers. Use of a reference standard would have allowed us to assess reader accuracy in addition to reader reliability. The results in Table 2 represent the consensus opinion of three senior neuroradiologists but are nevertheless still open to interpretation error. By confirming high observer agreement among a group of seven experienced readers, including several senior neuroradiologists, we believe that our results are as informative as reader comparisons set against any reference standard created from the same data. We do, however, acknowledge the possibility that the expert panel was reliable in making false diagnoses but feel this is highly unlikely. Secondly, our intra-observer analyses of several of the characteristics we tested are probably underpowered.

Conclusions
Experienced observers report CTA in acute ischaemic stroke with substantial levels of agreement for most imaging characteristics. Non-expert readers perform well if given specialist training. The IST-3 angiography score is reported as reliably as TICI and has some face validity and practical advantages for the assessment of CTA.