We read with much interest the correspondence from Mento and colleagues, as well as the original study by Volpicelli and colleagues [1, 2]. We would like to contribute results from our own study to move the discourse on optimal lung ultrasound methodology forward.

Although the work by Mento and colleagues provides an interesting perspective, we believe that the method used to study agreement between lung ultrasound protocols may have inherently led to the presented conclusions.

The authors compare the proportion of worst lung ultrasound scores (LUS) across different protocols, using a subjectively selected 14-zone protocol as the reference standard.

First, it is uncertain whether the reference standard accurately represents total pulmonary involvement. Indeed, previous research has shown that both 6- and 12-zone protocols are equivalent to the gold standard, chest computed tomography (CT) [3, 4]. Second, the 14-zone protocol's overrepresentation of posterior zones (43%) constitutes a scan-location bias, which is problematic when examining a disease with a gravity-dependent distribution. Consequently, comparing the worst scores of a predominantly posterior LUS protocol with the worst scores of predominantly lateral or anterior LUS protocols inevitably leads to lower agreement. Third, evaluating only the worst LUS disregards much of the information needed to assess true agreement between protocols.

Here we present the results of a study that used robust methods to evaluate agreement between LUS protocols comprehensively.

We performed a prospective observational study at the tertiary intensive care unit of the Amsterdam University Medical Centers, location VUmc. The study was approved by the local ethics board, and the need for informed consent was waived. A total of 191 examinations in 102 critically ill patients with coronavirus disease 2019 (COVID-19) (81.4% male; mean age 64.9 ± 11.4 years) were performed and analyzed. The full methodology is described in Supplementary S1. The reference test was a 12-zone LUS protocol, which has been shown to be equivalent to CT for monitoring [4], and the index test was a 6-zone LUS protocol (Fig. 1A). Each LUS zone was scored from 0 (A-pattern) to 3 (consolidation). A LUS index, LUSI = (LUS / maximum achievable LUS) × 100, was calculated for both protocols.
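As a minimal sketch of the LUSI arithmetic, the Python function below assumes only what is stated above: every zone is scored 0 to 3, so the maximum achievable LUS is three times the number of scanned zones. The function name `lus_index` and the zone scores in the usage lines are hypothetical and purely illustrative, not study data.

```python
from typing import Sequence

MAX_ZONE_SCORE = 3  # each zone is scored 0 (A-pattern) to 3 (consolidation)


def lus_index(zone_scores: Sequence[int]) -> float:
    """Return the LUS index: total LUS as a percentage of the maximum achievable LUS."""
    if not zone_scores:
        raise ValueError("at least one zone score is required")
    for score in zone_scores:
        if not 0 <= score <= MAX_ZONE_SCORE:
            raise ValueError(f"zone score {score} is outside 0-{MAX_ZONE_SCORE}")
    achievable = MAX_ZONE_SCORE * len(zone_scores)  # e.g. 36 for 12 zones, 18 for 6
    return sum(zone_scores) / achievable * 100


# Hypothetical scores for one examination (illustrative only)
twelve_zone = [0, 1, 1, 2, 2, 3, 0, 1, 2, 2, 3, 3]
six_zone = [1, 2, 3, 1, 2, 3]
print(round(lus_index(twelve_zone), 1))  # → 55.6
print(round(lus_index(six_zone), 1))  # → 66.7
```

Normalizing by the number of zones is what places a 6-zone and a 12-zone examination on the same 0 to 100 scale, so that the two protocols can be compared directly.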

Agreement was assessed using Spearman's correlation coefficient, a Bland–Altman plot, and the smallest detectable change, with accompanying 95% confidence intervals (Supplementary S2).
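To make these agreement statistics concrete, the standard-library Python sketch below computes Spearman's rho (as the Pearson correlation of tie-averaged ranks), the Bland–Altman bias with conventional 95% limits of agreement (bias ± 1.96 SD of the paired differences), and the slope of the differences on the pairwise means as a simple check for proportional bias. Differences are taken as index (6-zone) minus reference (12-zone), so a positive bias means the 6-zone LUSI is higher. All function names are hypothetical; the procedures actually used in the study, including the smallest detectable change, are those described in Supplementary S2, not this code.

```python
import math
import statistics
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)


def bland_altman(index: Sequence[float], reference: Sequence[float], z: float = 1.96):
    """Return (bias, lower LoA, upper LoA) for index-minus-reference differences."""
    diffs = [a - b for a, b in zip(index, reference)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


def proportional_bias_slope(index: Sequence[float], reference: Sequence[float]) -> float:
    """OLS slope of the differences on the pairwise means; near 0 suggests no proportional bias."""
    means = [(a + b) / 2 for a, b in zip(index, reference)]
    diffs = [a - b for a, b in zip(index, reference)]
    mm, md = statistics.fmean(means), statistics.fmean(diffs)
    sxy = sum((m - mm) * (d - md) for m, d in zip(means, diffs))
    sxx = sum((m - mm) ** 2 for m in means)
    return sxy / sxx
```

With real data, the slope would normally be reported with a confidence interval or a significance test before concluding that no proportional bias is present.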

Spearman's correlation coefficient was 0.944, indicating a strong correlation. The Bland–Altman plot (Fig. 1B) showed a constant bias: the 6-zone LUSI was consistently 1.9% (95% CI 1.1%, 2.7%) higher than the 12-zone LUSI. No proportional bias was found, indicating that the protocols agreed equally well across disease severities. The limits of agreement of 10.8% (95% CI 7.4%, 14.2%) were smaller than the calculated smallest detectable change of 17.4% (95% CI 11.8%, 26.1%) (p = 0.019, derived from 10,000 bootstrapped comparisons). This indicates that the differences between protocols were smaller than the measurement error; comparing each protocol with itself would have yielded similar limits of agreement.
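Because some patients contributed more than one examination (191 examinations from 102 patients), one option for bootstrapped confidence intervals is to resample patients rather than individual examinations. The sketch below is a generic percentile cluster bootstrap for the Bland–Altman bias. It is an illustration under that assumption, not necessarily the bootstrap procedure behind the reported intervals or p-value, and `cluster_bootstrap_bias_ci` is a hypothetical name.

```python
import random
import statistics
from collections import defaultdict
from collections.abc import Hashable, Sequence


def cluster_bootstrap_bias_ci(
    patient_ids: Sequence[Hashable],
    index: Sequence[float],
    reference: Sequence[float],
    n_boot: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile CI for the mean index-minus-reference difference, resampling patients."""
    # Group each examination's difference under the patient it came from.
    by_patient: dict = defaultdict(list)
    for pid, a, b in zip(patient_ids, index, reference):
        by_patient[pid].append(a - b)
    clusters = list(by_patient.values())

    rng = random.Random(seed)  # seeded so the interval is reproducible
    estimates = []
    for _ in range(n_boot):
        # Draw patients with replacement, keeping all of each patient's examinations.
        sample = rng.choices(clusters, k=len(clusters))
        diffs = [d for cluster in sample for d in cluster]
        estimates.append(statistics.fmean(diffs))

    # Percentile interval: read the alpha/2 and 1 - alpha/2 quantiles off the sorted estimates.
    estimates.sort()
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Resampling whole patients keeps each patient's repeated examinations together, so correlated examinations from the same patient are not treated as independent observations, which would otherwise tend to make the interval too narrow.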

Fig. 1

Lung ultrasound zones of the reference standard (blue) and index test (asterisk) (a), and the Bland–Altman plot (b). Each point represents agreement between the index and reference test in one examination of one patient. Jitter was added to improve visualization of the data and avoid direct overlap of multiple examinations. LoA, limits of agreement; LUSI, lung ultrasound score index (the lung ultrasound score expressed as a percentage of the maximum achievable score).

Monitoring COVID-19 with more than six zones does not appear to provide additional clinical information. This is important, because much of lung ultrasound's value lies in its efficient bedside applicability, particularly in time- and resource-constrained settings such as the COVID-19 pandemic. Although these results need comprehensive validation, this study agrees with previous investigations concerning the optimal number of lung ultrasound zones: less is more [5].