Introduction

Lobar fissures consist of a double layer of visceral pleura that separates the anatomical lung lobes [1]. As such, they serve as important anatomical landmarks for recognising the pulmonary lobar structure and for the regional assessment of the extent and distribution of lung disease. Lobar fissures are often incomplete, leading to parenchymal fusion between the lobes [1–3]. The relationship between fissure incompleteness and collateral flow across adjacent lobes is unknown. Recent treatment strategies aimed at causing regional volume loss by occluding or disrupting airflow to diseased segments of the lung have resulted in a need to better understand the anatomy and functional consequences of fissure completeness. In a recent randomised study of bronchoscopic lung volume reduction (BLVR) treatment for emphysema patients, radiographic evidence of fissure completeness was associated with a better response to treatment [4].

Incomplete pulmonary fissures and their variations were first reported, based on autopsy data, by Medlar [5] in 1947. Subsequently, several papers have reported the percentages of incomplete fissures observed in computed tomography (CT) data by visual scoring [1–3, 6, 7]. The present study advances the existing literature in two ways. First, a method for quantitatively assessing fissural completeness is proposed. Second, the extent to which fissures are classified as complete versus incomplete is evaluated in a group of patients with severe emphysema, whereas previous studies focused on disease-free lung parenchyma. The purpose of this study is to evaluate the proposed automatic fissural completeness quantification on a database of emphysema patients by comparison with a three-reader consensus.

Materials and methods

Data collection

From an anonymised image database of patients with emphysema collected in the setting of clinical trials, 1495 CT studies were identified with severe homogeneous emphysema (defined as having an emphysema score at −910 Hounsfield Units (HU) of at least 25% in each lobe, and a difference in emphysema score between the lobes of less than 25%). A subset of 116 patient CT studies was selected that met the criteria of near-isotropic voxel size and slice spacing (axial resolution between 0.6 and 0.8 mm and slice spacing of at most 1 mm). Twenty cases were used for developing the computer system and the other 96 were read for testing. Table 1 provides descriptive baseline statistics for the 96 CT studies used in this paper. The patient characteristics and emphysema severity were the same for the development set and the reading set. Image acquisition protocols were standardised for specific CT systems at each site to maintain consistent image quality across the different sites and systems. Eleven different CT systems from four manufacturers were used to obtain the 96 CT studies. Images were obtained at maximum breath hold at total lung capacity after careful breath hold coaching (120 kV, 80 to 110 mAs; reference mAs for regular to large patients). Informed consent and IRB approval for participation in the trials and for the analysis of the images by our imaging laboratory were obtained for each participating centre.
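The inclusion rule for severe homogeneous emphysema can be illustrated with a minimal sketch. It assumes that the emphysema score is the percentage of lobe voxels below −910 HU and that the "difference between lobes" is the largest pairwise difference (maximum minus minimum); neither detail is spelled out in the text. The function names and the lobe labels in the usage note are hypothetical.

```python
from typing import Dict, Sequence

THRESHOLD_HU = -910.0  # densitometric threshold quoted in the text
MIN_SCORE = 25.0       # minimum emphysema score (%) required in each lobe
MAX_SPREAD = 25.0      # largest allowed difference (%) between lobes


def emphysema_score(lobe_hu: Sequence[float],
                    threshold: float = THRESHOLD_HU) -> float:
    """Percentage of lobe voxels with attenuation below the threshold.

    Assumption: "below" is taken as strictly less than the threshold.
    """
    if not lobe_hu:
        raise ValueError("lobe contains no voxels")
    below = sum(1 for hu in lobe_hu if hu < threshold)
    return 100.0 * below / len(lobe_hu)


def is_severe_homogeneous(scores: Dict[str, float]) -> bool:
    """Apply both inclusion rules to per-lobe emphysema scores (in %).

    Assumption: lobe-to-lobe difference is measured as max minus min.
    """
    values = list(scores.values())
    if not values:
        raise ValueError("no lobe scores given")
    severe = all(score >= MIN_SCORE for score in values)
    homogeneous = (max(values) - min(values)) < MAX_SPREAD
    return severe and homogeneous
```

For example, hypothetical per-lobe scores of 40, 30, 45, 35 and 50% satisfy both rules, whereas a study with one lobe at 20% fails the severity rule and a study with lobes at 30% and 60% fails the homogeneity rule.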

Table 1 Subject baseline characteristics for all 96 subjects

Visual fissural completeness scores

Before analysis, visual assessment of fissural completeness was performed by carefully inspecting the axial, sagittal and coronal views, without any reference to the computer output. For visual scoring a trichotomous scale was used: complete, partial, and absent. The fissure was scored as complete when it was thought to be visually complete and there was no evidence of bronchovascular structures crossing from adjacent lobes. The score absent was used when no appreciable fissure could be seen. In all other cases, the fissures were scored as partial. For all 96 CT studies, two radiologists with four and six years' experience independently scored the completeness of all lobar fissures. For cases where the two radiologists disagreed, a consensus reading was performed with a chest radiologist who had 14 years' experience. The two radiologists disagreed in 36 out of 288 (12.5%) fissures, of which 12 were right major fissures, 13 right minor fissures and 11 left major fissures. The kappa values between the readers were 0.75, 0.67 and 0.74 for the right major, right minor and left major fissures, respectively.
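Inter-reader agreement of the kind reported above can be computed as sketched below. The sketch assumes the unweighted Cohen's kappa; the text does not state whether a weighted variant was used. The function name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(reader_a: Sequence[str], reader_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two readers scoring the same fissures."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must score the same non-empty set")
    n = len(reader_a)

    # Observed agreement: fraction of fissures given identical scores.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n

    # Chance agreement: product of the readers' marginal frequencies,
    # summed over categories (Counter returns 0 for unused categories).
    freq_a = Counter(reader_a)
    freq_b = Counter(reader_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)
```

With four hypothetical fissures scored (complete, complete, partial, partial) by one reader and (complete, partial, partial, partial) by the other, the observed agreement is 0.75, the chance agreement is 0.5, and kappa is 0.5.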

Automatic fissural completeness quantification

The lungs, fissures and lobes were automatically segmented using previously proposed and evaluated methods [8–10]. The lung segmentation was performed using a hybrid method specifically designed to overcome segmentation errors typical in CT studies exhibiting pathological features [8]. The lung segmentation begins with a conventional approach to segmenting the lungs using region-growing and morphology. Segmentation failure was automatically detected based on statistical deviation from a range of volume and shape measurements. To CT studies with segmentation errors, an advanced, multi-atlas-based algorithm using non-rigid registration was applied.

Fissure detection was based on pattern classification using image derivatives, image gradient and Hessian features. The pattern classification was trained based on manually identified examples of fissure and non-fissure voxels, as described in van Rikxoort et al. [9]. For this study, the fissure detection method was trained using manually segmented fissures in the 20 CT studies set aside for development.

Given the fissures detected, the lobe borders were automatically segmented by fitting an atlas with complete fissures (see van Rikxoort et al. [10]). As a result, the lobe borders are derived, even when the fissures are incomplete. The completeness of a pulmonary fissure is determined from the segmentations of the lobe and fissure by assigning each point on the detected fissure to the closest point on the lobe border; voxels on the lobe border not assigned to a fissure voxel are considered to be non-fissure and therefore incomplete. The percentage of lobar border voxels identified as complete quantifies the completeness of the fissure. The result of the automatic fissural completeness quantification is visually displayed as a colour-coded lobar boundary. The procedure of fissural completeness calculation is illustrated in Fig. 1.
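The completeness calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: voxels are given as integer (x, y, z) coordinates, the nearest lobe-border voxel is found by brute-force Euclidean search, and no maximum assignment distance is applied (the text does not mention one). The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def squared_distance(p: Voxel, q: Voxel) -> int:
    """Squared Euclidean distance between two voxel coordinates."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fissure_completeness(border: Iterable[Voxel],
                         fissure: Iterable[Voxel]) -> float:
    """Percentage of lobe-border voxels that have a fissure voxel assigned.

    Each detected fissure voxel is assigned to its closest border voxel;
    border voxels that receive no assignment count as incomplete.
    """
    # Remove duplicate border voxels while keeping a stable order.
    border_list = list(dict.fromkeys(border))
    if not border_list:
        raise ValueError("lobe border is empty")

    covered: Set[Voxel] = set()
    for f in fissure:
        # Brute-force nearest neighbour; fine for a sketch, slow for real data.
        nearest = min(border_list, key=lambda b: squared_distance(f, b))
        covered.add(nearest)

    return 100.0 * len(covered) / len(border_list)
```

As a usage example, a hypothetical border of ten voxels along a line with seven detected fissure voxels lying one voxel away from the first seven yields a completeness of 70%. For full-resolution 3D data the quadratic search would need to be replaced by a spatial index such as a k-d tree.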

Fig. 1

Illustration of the calculation of the percentage of the fissure’s completeness. The left frame shows an original coronal slice of a right lung. In the second frame the automatically found (incomplete) fissure is shown, followed by the automatically determined lobar border in the third frame. The last frame shows the colour-coded lobar boundary that is the output of the system. Voxels in yellow indicate fissure voxels, voxels in red indicate a lobar boundary without a fissure. The percentage complete is now simply calculated as the number of yellow voxels divided by the total number of voxels on the lobar boundary. Note that this image is a 2D slice, but the calculation of fissural completeness is performed in 3D

Statistical analysis

Descriptive statistics on fissural completeness for both the visual scoring and the automatic quantification are provided. Box-plots are provided for the automatic score stratified by the visual categories of the consensus read. Rank sum tests were performed to test whether the automatic quantification is able to distinguish complete from partial fissures. Where three categories were scored by the visual read (complete, partial, absent), the Kruskal-Wallis test was applied. To test the sensitivity of the automatic quantification, receiver operating characteristic (ROC) curves were constructed for the automatic quantification compared with each visual read, defining a complete fissure as a positive test. The level of statistical significance was set to 0.05. All analyses were performed using Stata V.10.0 (College Station, TX, USA).
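The area under the ROC curve is closely related to the rank sum test: it equals the Mann-Whitney U statistic divided by the number of positive-negative pairs, i.e. the probability that a randomly chosen visually complete fissure receives a higher automatic score than a randomly chosen non-complete one. The sketch below illustrates this in Python rather than Stata, which was used in the study; the function name and the scores in the usage note are hypothetical.

```python
from typing import Sequence


def auc_from_scores(positive: Sequence[float],
                    negative: Sequence[float]) -> float:
    """AUC as the normalised Mann-Whitney U statistic.

    positive: automatic completeness scores of fissures read as complete.
    negative: automatic completeness scores of all other fissures.
    Ties between a positive and a negative score count as half a win.
    """
    if not positive or not negative:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))
```

For hypothetical automatic scores of 100, 98, 95 and 90% in visually complete fissures and 97, 85 and 80% in the remaining fissures, 10 of the 12 pairs are ordered correctly, giving an AUC of 10/12 (approximately 0.83).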

Results

Figure 2 shows descriptive statistics for both the automatic quantification and the visual consensus scores. It can be appreciated that the left major fissure is most often scored as complete by the visual consensus as well as by the automatic method. The automatic computation was successfully run on all CT studies. Figure 3 shows examples of the output of the automatic fissural completeness method on coronal slices of 4 CT studies.

Fig. 2

a: Descriptive statistics for the 96 CT studies used in this paper for the automatic method. b: Descriptive statistics for the 96 CT studies used in this paper for the visual consensus read

Fig. 3

Sets of 4 coronal slices from 4 different CT studies and the corresponding output of the automatic fissural completeness quantification method. Voxels in yellow indicate fissure voxels, voxels in red indicate a lobar boundary without a fissure. a Example of a CT study for which all lobar fissures are complete. In (b) an example of an incomplete right minor fissure is shown. c and d Examples of incomplete major fissure(s)

Figure 4 shows boxplots for the completeness of each fissure compared with the consensus read, divided into the different categories of the visual reads. For all three fissures, compared with each visual read, the automatic quantifications are significantly different for the complete fissures than for the partial fissures (all p values <0.001). In addition, for the right minor fissure, the automatic scores for the absent fissures are significantly different from those of the partial and complete fissures (p < 0.001).

Fig. 4

a: Box-plots showing the quantitative fissural completeness score for each visually scored category for the right major fissure. b: Box-plots showing the quantitative fissural completeness score for each visually scored category for the right minor fissure. c: Box-plots showing the quantitative fissural completeness score for each visually scored category for the left major fissure

The ROC curves comparing the automatic fissural completeness quantification with the individual reads and the consensus read are shown in Fig. 5, along with the corresponding areas under the ROC curve (AUC). The AUCs for the consensus read were 0.88, 0.91 and 0.83 for the right major, right minor and left major fissures, respectively.

Fig. 5

a: ROC curves comparing the automatic continuous completeness score with those of the individual readers and the consensus read for the right major fissure. b: ROC curves comparing the automatic continuous completeness score with the individual readers and the consensus read for the right minor fissure. c: ROC curves comparing the automatic continuous completeness score with those of the individual readers and the consensus read for the left major fissure

Discussion

An automatic method of detecting and assessing the completeness of fissural surfaces has been described and tested. The completeness of pulmonary fissures may be important in the correct selection of patients for novel therapeutic techniques including BLVR treatment [4] and is commonly assessed by radiologists from a chest CT. Visual assessment of fissural completeness is time-consuming and tedious on large thin section reconstructed chest CT. In addition, visual assessment of fissural surface completeness is challenging given the anatomy of the fissures and the need to assess them on multiple planes of reconstruction. Several previous studies have attempted to evaluate fissural completeness in the last decade and their results are provided in Table 2. These studies were based on visual assessments in cohorts of relatively normal subjects. Interestingly, despite our study being performed on patients with severe COPD and differing definitions of completeness, our findings were within the range published. The results of Mahmut and Nishitani [6] differ most from all other studies listed and the present study. This is probably caused by their definition of incomplete fissures—more than 1 cm missing—as well as their cohort of normal subjects.

Table 2 Summary of studies evaluating fissural completeness using computed tomography. The percentages in columns 4–6 indicate the percentage of CT studies for which incomplete fissures were observed

Volumetric thin-section CT data allow for precise delineation of the interlobar fissures and allow automatic methods to recognise the fissures with high accuracy (e.g. [9–12]). Precise quantification is difficult using the eye alone. Gülsün et al. [2] visually determined the percentage of incompleteness of the major fissures in five categories: 0%, 1–25%, 26–50%, 51–75% and 76–99%. In their study population of 144 CT studies of disease-free patients they found that most incomplete major fissures were between 1 and 25% incomplete (around 70%), and that most of the remaining incomplete fissures were scored as 26–50% incomplete; no fissures were found to be more than 75% incomplete. More recently, Ozmen et al. [7] used a similar visual quantification score for all lobar fissures. For the major fissures, their results were similar to those of Gülsün et al. [2]. For the right minor fissure, they found the extent of incompleteness to be almost equally divided over the different categories. Advantages of computer-based schemes are that quantification is more robust, precise, and reproducible. The descriptive statistics provided in Fig. 2 show a similar trend to that described in Ozmen et al. [7]: if the major fissures are incomplete, the percentage of incompleteness is mostly between 10% and 30%; for the right minor fissure the percentages are more spread out.

It is interesting that the agreement between the automatic quantification and the consensus read differed between the right major and minor fissures and the left major fissure, with better agreement seen on the right (AUCs of 0.88 and 0.91 versus 0.83). A possible explanation for this is that the degree of incompleteness in the left major fissure is often less than in the right major and minor fissures, as reported by Aziz et al. [1], and therefore the readers tended to have more variation in the ability to recognise completeness.

We suggest that the best use of the proposed method is as an aid to a visual read by a radiologist. The method indicates incomplete parts of the fissure and provides the exact quantification. The reader can quickly assess whether they agree with the method; only in cases where the reader disagrees is another strategy or a complete visual read necessary. Most disagreement between the readers in this study occurred for fissures visually thought to be complete or incomplete to a small degree; out of the 36 discordant fissure reads, 27 (75%) were scored by the automatic method as between 86% and 97% complete. This is exactly where the automatic method would be beneficial: the readers agree that there might be a gap in the fissure, but the exact size of the gap is difficult for humans to judge and is assessed much more accurately by computers.

The limitations of our study include the use of a truth panel as the gold standard and the use of a trichotomous scale for the visual read. This protocol was used for the visual read as it is the current standard for visual fissure completeness assessment in several BLVR trials [4].

To conclude, a fully automatic, quantitative method is able to quantify fissural completeness in a manner comparable to expert radiologists in a cohort of subjects with severe emphysema.