Trends in the incidence of pulmonary nodules in chest computed tomography: 10-year results from two Dutch hospitals

Objective
To study trends in the incidence of reported pulmonary nodules and stage I lung cancer in chest CT.

Methods
We analyzed the trends in the incidence of detected pulmonary nodules and stage I lung cancer in chest CT scans in the period between 2008 and 2019. Imaging metadata and radiology reports from all chest CT studies were collected from two large Dutch hospitals. A natural language processing algorithm was developed to identify studies with any reported pulmonary nodule.

Results
Between 2008 and 2019, a total of 74,803 patients underwent 166,688 chest CT examinations at both hospitals combined. During this period, the annual number of chest CT scans increased from 9955 scans in 6845 patients in 2008 to 20,476 scans in 13,286 patients in 2019. The proportion of patients in whom nodules (old or new) were reported increased from 38% (2595/6845) in 2008 to 50% (6654/13,286) in 2019. The proportion of patients in whom significant new nodules (≥ 5 mm) were reported increased from 9% (608/6954) in 2010 to 17% (1660/9883) in 2017. The number of patients with new nodules and corresponding stage I lung cancer diagnosis tripled and their proportion doubled, from 0.4% (26/6954) in 2010 to 0.8% (78/9883) in 2017.

Conclusion
The identification of incidental pulmonary nodules in chest CT has steadily increased over the past decade and has been accompanied by more stage I lung cancer diagnoses.

Clinical relevance statement
These findings stress the importance of identifying and efficiently managing incidental pulmonary nodules in routine clinical practice.

Key Points
• The number of patients who underwent chest CT examinations substantially increased over the past decade, as did the number of patients in whom pulmonary nodules were identified.
• The increased use of chest CT and more frequently identified pulmonary nodules were associated with more stage I lung cancer diagnoses.
Supplementary Information The online version contains supplementary material available at 10.1007/s00330-023-09826-3.


Appendix E1 - NLP algorithm
An NLP algorithm was developed to automatically identify pulmonary nodules that are mentioned in radiology reports.

Pre-processing
Each report is pre-processed by lowercasing all words, removing letter accents, splitting punctuation and digits from words, and splitting the report into sentences. After pre-processing, the algorithm processes each report sentence by sentence.
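The pre-processing steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the naive sentence splitter, and the ASCII-only token pattern are assumptions made for illustration.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Remove letter accents, e.g. 'ë' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Split letters, numbers, and punctuation marks into separate tokens.

    Assumes lowercased ASCII input; a number may carry a decimal part.
    """
    return re.findall(r"[a-z]+|\d+(?:[.,]\d+)?|[^\sa-z\d]", sentence)


def preprocess(report: str) -> list[list[str]]:
    """Lowercase, strip accents, split into sentences, then tokenize each one."""
    text = strip_accents(report.lower())
    return [tokenize(s) for s in split_sentences(text)]
```

For example, `preprocess("Patiënt: nodulaire afwijking van 6mm.")` separates "6" from "mm" and turns "Patiënt" into "patient". A production splitter would also need to handle abbreviations, which this sketch does not.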

Nodule detection
The algorithm uses keywords to (partially) match pulmonary nodules. Tables E1-E4 contain all keywords that were used to identify reported pulmonary nodules. The keywords in each table are divided into sub-keywords, which may match part of a word, and keywords that are matched as a whole. To find all nodule synonyms, we first generated word vectors with the Word2Vec algorithm [1], using all available radiology reports as input. We then manually inspected the 200 nearest neighbours of the vector for "nodule" to compile a list of positive keywords (Tables E1 and E2). Subsequently, the algorithm was iteratively updated with new rules to minimize the number of false positives on the development set (see the Materials and methods section in the main article).
To increase the precision of the algorithm, a nodule keyword was only accepted if another lung-related keyword (Table E3) occurred in the previous, same, or next sentence. If a blacklisted keyword (Table E4) was used in the same sentence as a nodule keyword, the nodule was ignored. Precision was further increased by categorizing nodule keywords as unambiguous or ambiguous. Unambiguous keywords are synonyms of a nodule, while ambiguous keywords may describe a nodule only in specific contexts. Therefore, ambiguous keywords were only accepted in combination with a diameter measurement or an adjective that indicates a small or nodular shape: pointy (Dutch: "puntvormig"), spherical (Dutch: "bolvormig"), nodular (Dutch: "nodulair"), or small (Dutch: "kleine"). Any nodule description preceded by the word "no" (Dutch: "geen") was ignored, because this indicates a negation.
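The rules described above could be combined roughly as in the following Python sketch. The keyword sets are hypothetical placeholders, not the contents of Tables E1-E4; only the shape adjectives and the negation word are taken from the text. The three-token negation window is also an assumption, and partial (asterisk) matching is left out for brevity.

```python
# Hypothetical placeholder keyword sets; the real lists are in Tables E1-E4.
UNAMBIGUOUS = {"nodule", "nodus"}
AMBIGUOUS = {"afwijking", "laesie"}
LUNG = {"long", "kwab", "pulmonaal"}
BLACKLIST = {"schildklier"}
# Shape adjectives and negation word as named in the text.
SHAPE_ADJECTIVES = {"puntvormig", "bolvormig", "nodulair", "kleine"}
NEGATION = "geen"
UNITS = {"mm", "cm"}


def has_measurement(tokens):
    """True if a numeric token is directly followed by 'mm' or 'cm'."""
    return any(
        tok[0].isdigit() and nxt in UNITS
        for tok, nxt in zip(tokens, tokens[1:])
    )


def sentence_reports_nodule(sentences, i, negation_window=3):
    """Apply the precision rules to sentence i of a tokenized report."""
    tokens = sentences[i]
    # Rule: a blacklisted keyword in the same sentence cancels the match.
    if BLACKLIST & set(tokens):
        return False
    # Rule: a lung-related keyword must occur in the previous, same, or next sentence.
    context = sentences[max(i - 1, 0): i + 2]
    if not any(LUNG & set(s) for s in context):
        return False
    for pos, tok in enumerate(tokens):
        if tok not in UNAMBIGUOUS | AMBIGUOUS:
            continue
        # Rule: skip negated mentions (window size is an assumption).
        if NEGATION in tokens[max(pos - negation_window, 0): pos]:
            continue
        # Rule: ambiguous keywords need a shape adjective or a measurement.
        if tok in AMBIGUOUS and not (
            SHAPE_ADJECTIVES & set(tokens) or has_measurement(tokens)
        ):
            continue
        return True
    return False


def report_has_nodule(sentences):
    """True if any sentence in the tokenized report mentions a nodule."""
    return any(sentence_reports_nodule(sentences, i) for i in range(len(sentences)))
```

Checking the blacklist and the lung context before scanning for keywords keeps the per-token loop limited to the negation and ambiguity rules.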

Nodule diameter detection
To determine the nodule diameter, regular expressions were used to find any valid combination of a number and a unit (mm or cm) in the same sentence as the detected pulmonary nodule. A diameter measurement was ignored if it was preceded by the words "was" or "previously", which indicate that the measurement originates from a previous study.
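A regular expression for this step might look like the sketch below. The exact patterns used in the study are not given, so the expression, the function name `diameters_mm`, and the interpretation of "preceded by" as "immediately preceded by" are assumptions. The English words "was" and "previously" stand in for their Dutch equivalents.

```python
import re

# A number with an optional decimal part, followed by 'mm' or 'cm'.
MEASUREMENT = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b")
# Words that flag a measurement from an earlier study (English stand-ins).
PRIOR_WORDS = {"was", "previously"}


def diameters_mm(sentence: str) -> list[float]:
    """Return current-study diameters in millimetres found in one sentence."""
    text = sentence.lower()
    results = []
    for match in MEASUREMENT.finditer(text):
        preceding = text[: match.start()].split()
        # Skip measurements directly preceded by a prior-study word.
        if preceding and preceding[-1].strip("(,:;") in PRIOR_WORDS:
            continue
        value = float(match.group(1).replace(",", "."))
        if match.group(2) == "cm":
            value *= 10  # convert centimetres to millimetres
        results.append(value)
    return results
```

For a sentence such as "Nodule of 8 mm (previously 6 mm) in the right upper lobe.", only the 8 mm measurement is kept.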

Word2Vec training procedure
We trained a Word2Vec algorithm [1] with the continuous bag-of-words (CBOW) architecture on all 166,688 radiology reports in our corpus to find nodule keywords for the NLP algorithm.
We developed a custom Word2Vec implementation in PyTorch, a Python deep learning library. The model was trained on an NVIDIA TITAN Xp GPU (12 GB).
The input dataset was pre-processed by removing letter accents, lowercasing words, and tokenizing the texts. Punctuation marks were kept in the dataset. All infrequent words in the corpus (fewer than 5 occurrences) were replaced by a placeholder token in order to prune the word vocabulary (36K words). A context window of 10 words was used to generate samples: 5 words before and 5 words after the target word. Any sentence with fewer than 11 words was padded with a placeholder token. Subsequently, the dataset (24M samples) was randomly split into a training and a validation set: 90% of the generated samples were used for training and the remaining 10% for validation.
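As an illustration of how CBOW samples can be generated from tokenized sentences, consider the Python sketch below. It is a simplification under stated assumptions: the token names `<unk>` and `<pad>` are hypothetical, and the sketch pads both ends of every sentence so that each word serves as a target once, whereas the text only states that sentences shorter than 11 words were padded.

```python
import random
from collections import Counter

UNK, PAD = "<unk>", "<pad>"  # hypothetical placeholder token names


def build_vocab(sentences, min_count=5):
    """Keep only words that occur at least `min_count` times in the corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, n in counts.items() if n >= min_count}


def cbow_samples(sentence, vocab, half_window=5):
    """Yield (context, target) pairs with `half_window` tokens on each side."""
    tokens = [tok if tok in vocab else UNK for tok in sentence]
    # Assumption: pad both ends so every token has a full context.
    padded = [PAD] * half_window + tokens + [PAD] * half_window
    for i in range(half_window, half_window + len(tokens)):
        context = padded[i - half_window:i] + padded[i + 1:i + 1 + half_window]
        yield context, padded[i]


def train_val_split(samples, val_fraction=0.1, seed=0):
    """Randomly split samples into training and validation sets (90/10 by default)."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_fraction)
    return samples[n_val:], samples[:n_val]
```

With `half_window=5`, each sample has a context of 10 tokens, matching the window size described above.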
The model was trained until its performance on the validation set stopped improving (this occurred after two epochs). The batch size was set to 1024 and the learning rate of the Adam optimizer was set to 0.001. The dimensionality of the word vectors was set to 128. The subsampling threshold was set to 0.001, so that higher-frequency words were randomly downsampled.
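The effect of the subsampling threshold can be made concrete with a short sketch. The exact formula in the custom implementation is not stated, so this assumes the rule from the original Word2Vec paper, in which an occurrence of a word with relative corpus frequency f is kept with probability sqrt(t / f) for threshold t. The function name is hypothetical.

```python
import math


def keep_probability(word_freq: float, threshold: float = 1e-3) -> float:
    """Probability of keeping one occurrence of a word during subsampling.

    Assumes the Word2Vec paper's rule: discard with probability
    1 - sqrt(threshold / word_freq); rarer words are always kept.
    """
    if word_freq <= threshold:
        return 1.0
    return math.sqrt(threshold / word_freq)
```

Under this assumption, a word that makes up 0.4% of all tokens would be kept about half of the time, while words at or below 0.1% would never be dropped.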

Appendix E2 - Nodule malignancy verification for stage I lung cancer
We manually verified the malignancy of newly reported pulmonary nodules in patients with a subsequent stage I lung cancer diagnosis within two years.

Procedure
In the period 2010-2017, 479 new positive studies with a subsequent stage I lung cancer diagnosis were identified. For each case, all corresponding radiology reports were collected and inspected by an experienced radiologist (E.T.S.). The reported nodules were linked to the lung cancer diagnoses based on the size and lobe location as recorded in the Netherlands Cancer Registry (NCR). Any described morphology (e.g., spiculation or lobulation) or reported growth was checked as well. If the information in the radiology reports was inconclusive, the corresponding CT scans were also assessed. Any discrepancies between the automated analysis and the manual check were recorded.

Results
Of the 479 new positive studies, 417 (87%) had a nodule that matched a subsequent stage I lung cancer diagnosis. Table E5 shows an overview of the 62 discrepancies that were found (13%). Most cases (n=53) concerned a matched mass instead of a nodule. All matched nodular lung cancers were found to be at least 5 mm in size.

1 Excluded all patients with a positive chest CT scan within the previous 2 years.
2 Also includes pulmonary nodules without reported diameter.
3 Nodule and corresponding lung cancer location were manually verified (see Appendix E2).

Table E1.
Unambiguous keywords for finding nodules. Keywords with an asterisk can return partial matches.

Table E2.
Ambiguous keywords for finding nodules.

Table E3.
Keywords related to the lungs or bronchi. Keywords with an asterisk can return partial matches.

Table E4.
Blacklisted keywords. Keywords with an asterisk can be matched partially.

Table E5.
Discrepancies between manual and automated analysis of reported nodules with subsequent stage I lung cancer diagnosis in 479 new positive studies.

Table E6.
Overview of the number of lung cancer diagnoses of patients who underwent a chest CT in hospitals A and B between 2000 and 2019, stratified by year and basis. Lung cancer diagnoses were selected based on the ICD-O codes C340-C349 [2] with any morphology.

Table E7.
Overview of the number of extrapulmonary cancer diagnoses of patients who underwent a chest CT in hospitals A and B between 2000 and 2019, stratified by year and basis. Together with the lung cancer diagnoses, they were used to determine the history of malignancy for each patient.

Table E8.
Annual number of positive chest CT scans in hospital A, patient and scan-level data (2008-2019).

Table E9.
Annual number of patients with new positive chest CT scans and those followed by lung cancer diagnosis within two years in hospital A (2010-2019).

Table E10.
Annual number of patients with new positive chest CT scans followed by lung cancer diagnosis in hospital A (2010-2017), stratified by cancer stage (II-IV) according to the respective TNM Classification at the time of diagnosis.

Table E11.
Annual number of positive chest CT scans in hospital B, patient and scan-level data (2008-2019).
1 Also includes pulmonary nodules without reported diameter.

Table E12.
Annual number of patients with new positive chest CT scans and those followed by lung cancer diagnosis within two years in hospital B (2010-2019).

Table E13.
Annual number of patients with new positive chest CT scans followed by lung cancer diagnosis in hospital B (2010-2017), stratified by cancer stage (II-IV) according to the respective TNM Classification at the time of diagnosis.

Stage I lung cancer analysis for patients without history of malignancy

Table E16.
Annual number of patients with new positive chest CT scans and those followed by lung cancer diagnosis within two years in hospitals A and B (2010-2019). None of the patients had a history of malignancy (extrapulmonary or pulmonary) in the 10 years before the CT examination.

Table E17.
Diameter distribution of newly reported pulmonary nodules per year, hospitals A and B combined (2010-2019). Only the largest reported pulmonary nodule per study is counted, and only if it was not preceded by another pulmonary nodule finding within the previous two years.

Annual number of patients with new positive chest CT scans followed by lung cancer diagnosis stage II-IV

Table E18.
Annual number of patients with new positive chest CT scans followed by lung cancer diagnosis in hospitals A and B (2010-2017), stratified by cancer stage (II-IV) according to the respective TNM Classification at the time of diagnosis.