Introduction

Diaphragm ultrasonography is a rapidly growing field of research, with close to 3000 PubMed-listed publications over the last decade. It has been shown to be a feasible and accurate tool to assess diaphragm anatomy, respiratory physiology and, especially in ventilated critically ill patients, pathology [1,2,3,4,5,6].

The most extensively studied methods of diaphragm ultrasonography include assessment of changes in muscular thickness over time, contractile activity (i.e. thickening fraction) and excursion during active breathing [1, 7, 8]. With these parameters, the physician can quickly obtain valuable information at the bedside with little patient burden. Important applications include mapping loss of muscle mass through repeated measurements of thickness, determining adequate ventilatory support through assessment of excessive and insufficient contractile activity, predicting the outcome of liberation attempts from mechanical ventilation, and detecting patient–ventilator interaction through temporal comparison of the ventilator pressure curves with contractile activity and excursion of the diaphragm [9,10,11,12,13,14,15,16,17,18].

While the areas of implementation are well understood, guidelines for methodology, such as transducer settings, image acquisition and ventilator impact on measurements, are absent or only partial, and are mostly derived from narrative reviews. Significant variability in diaphragm ultrasonography methodology hampers the quality and comparability of studies in this field and, consequently, implementation in daily clinical practice.

We therefore set out to perform a Delphi process across seven categories, including diaphragm anatomy, transducer settings, image acquisition technique, limitations of mechanical ventilation through passive displacement of the diaphragm, guidance for learning and obtaining expertise and application in clinical practice. The aim of this study was to provide a consensus statement towards a universal measurement protocol for diaphragm ultrasonography in research and daily practice and to determine key areas for future research.

Methods

Between November 2020 and March 2021, international experts were invited to participate in a Delphi procedure using web-based questionnaires as the method for consensus development. This method was chosen as it serves to establish consensus on topics with unclear and/or conflicting evidence, while at the same time allowing exploration of fields beyond existing knowledge [19].

Panelists were invited based on their proven expertise in diaphragm ultrasonography, as shown by prior publications. This entailed at least two peer-reviewed publications with original data, including one as lead author, with diaphragm excursion, thickness and/or thickening as the main outcome variable in an adult critical care setting. Experts from different hospitals, countries and continents were invited to minimize the risk of establishing local viewpoints as consensus.

Development of the survey consisted of several steps. First, an epidemiologist (LM) specialized in Delphi methodology was consulted for the process design. A two-round survey was selected as an appropriate method to form consensus, providing sufficient rounds to reach consensus without risking dropouts due to the extent of the survey [20]. Second, a literature review was performed of recently published (systematic) review articles listed in PubMed on diaphragm ultrasound in critical care medicine. Based on previous knowledge on the topic and information gathered from the literature, topics relevant for the survey were established by two researchers (MH and PT). These topics were then grouped within seven overarching categories to organize the framework for a measurement protocol and provide a better overview of it. These included “Anatomy and physiology”, “Transducer settings”, “Technique”, “Ventilator Impact”, “Learning and Expertise”, “Daily Practice” and “Future Directions”. Third, based on these categories, five investigators (MH, EL, PT, JS and LM) of the lead research group created the questions and statements to be included in the pilot survey. Fourth, the pilot survey was conducted within a local expert group (HV, AJ, MW, MHe) to evaluate comprehensiveness and comprehensibility of the questionnaire, which was adapted accordingly.

The pilot round and both study rounds contained questions based on a 5-point Likert scale, ranging from strongly disagree to strongly agree [21]. No default answers were preselected to avoid introducing bias to the experts’ responses, and every question contained a free-text field in case panelists wished to provide additional comments. In addition, the questionnaire contained open questions to explore the panelists’ views and opinions across several fields. Questions were organized by seven domains (as outlined above), each on its own page with a bar indicating progress across the questionnaire, to create a better overview and to minimize straightlining (selecting the same response down a line of survey answers). An a priori cut-off of ≥ 68% was determined as the minimum threshold to reach consensus on an individual question and provide a statement [22,23,24]. This threshold was deemed appropriate to facilitate formation of consensus while allowing for disagreement and collection of arguments in case of the latter.
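The consensus rule described above can be illustrated with a minimal sketch. It assumes that agreement is scored as the share of panelists answering "agree" or "strongly agree"; the text states only the ≥ 68% cut-off, so this scoring rule and all function names are hypothetical and serve illustration only.

```python
from collections import Counter

# Assumption: consensus is scored as the share of panelists answering
# "agree" or "strongly agree". The source states only the >= 68% cut-off.
CONSENSUS_PCT = 68


def agreement_count(responses):
    """Count the responses that fall in the two 'agree' categories."""
    counts = Counter(responses)
    return counts["agree"] + counts["strongly agree"]


def reaches_consensus(responses, threshold_pct=CONSENSUS_PCT):
    """Return True if the agreeing share meets or exceeds the threshold."""
    if not responses:
        raise ValueError("at least one response is required")
    # Integer comparison avoids floating-point rounding at the boundary.
    return agreement_count(responses) * 100 >= threshold_pct * len(responses)


def min_votes_needed(panel_size, threshold_pct=CONSENSUS_PCT):
    """Smallest number of agreeing panelists that satisfies the threshold."""
    # Ceiling division using integers only.
    return -(-panel_size * threshold_pct // 100)
```

Under this assumption, a panel of 14 would need at least 10 concordant answers (10/14 ≈ 71%), as 9 of 14 (≈ 64%) falls below the cut-off.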

After the first round, a detailed summary of all statements and corresponding answers, arguments and percentage consensus was distributed to all panelists. All panelists were blinded to the identity of other panelists. While the steering committee was not blinded to the identity of the panelists, they were blinded to the individuals to whom the answers and arguments pertained. The second round contained modified and new questions based on the panelists’ answers and feedback from the first round. Questions on which consensus was already achieved in round one were not repeated in this round. A summary of study proceedings is provided in Fig. 1.

Fig. 1 Flow chart of the study procedure

The study was pre-registered on the Open Science Framework with registration digital object identifier https://doi.org/10.17605/OSF.IO/HM8UG [25].

Results

Eighteen panelists were invited to participate in the Delphi process, of whom 15 replied: 14 agreed to participate and 1 declined. The panel was formed by intensivists from Canada, China, France, Greece, Italy and The Netherlands. Among the 14 participants, a response rate of 100% was achieved for all questions in both rounds. A full list of experts is provided in the Acknowledgements.

In the pilot round, 89 questions were established and grouped into seven categories. Several changes were made, which included omission of redundant questions, addition of new questions and changes regarding completeness and comprehensibility. This resulted in a survey with a total of 88 questions for Round 1. Of these, 35 questions were designed to collect opinions and arguments and 53 to reach consensus on the respective question. In Round 1, consensus was reached on 33 questions. With the answers provided in Round 1, the survey for Round 2 was established. Round 2 contained 29 new questions, of which 7 were designed to collect opinions and arguments and 22 to reach consensus on the respective question. In Round 2, consensus was reached on 13 questions. In total, this resulted in 75 questions with the possibility for consensus across Rounds 1 and 2. Consensus was reached on 46 of these (61%).
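The totals above follow directly from the per-round counts. The short sketch below reproduces the arithmetic; the dictionary layout and function name are illustrative only and are not part of the study materials.

```python
# Per-round question counts as reported in the text.
ROUNDS = {
    "Round 1": {"opinion": 35, "consensus_eligible": 53, "consensus_reached": 33},
    "Round 2": {"opinion": 7, "consensus_eligible": 22, "consensus_reached": 13},
}


def summarize(rounds):
    """Aggregate consensus-eligible questions and consensus reached across rounds."""
    eligible = sum(r["consensus_eligible"] for r in rounds.values())
    reached = sum(r["consensus_reached"] for r in rounds.values())
    return {
        "eligible": eligible,
        "reached": reached,
        "percent": round(100 * reached / eligible),
    }


def total_questions(round_counts):
    """Total questions in one round: opinion questions plus consensus questions."""
    return round_counts["opinion"] + round_counts["consensus_eligible"]
```

Calling summarize(ROUNDS) yields 75 eligible questions, 46 with consensus and 61%, matching the reported figures. The per-round totals of 88 and 29 questions are recovered by total_questions.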

A summary of the number of questions organized by rounds and categories is provided in Table 1. A more detailed overview of the questionnaire and outcome reached is provided in Additional files 1–4. The results are summarized per category and provided in Tables 2, 3, 4 and 5. Consensus statements for anatomy and physiology are presented in Table 2. Consensus statements for transducer settings and technique are presented in Table 3. Visual examples of the statements are provided in Additional file 5. Statements for learning and expertise are presented in Table 4. Areas for future research are presented in Table 5.

Table 1 Summary of survey rounds
Table 2 Diaphragm anatomy and physiology, and ventilator impact in diaphragm ultrasonography
Table 3 Diaphragm ultrasonography: transducer settings and technique
Table 4 Learning, expertise and applications of diaphragm ultrasonography in clinical practice
Table 5 Future directives

Discussion

This Delphi study on diaphragm ultrasonography is the first consensus-based approach to formulate statements on methodology taking into account the effects of diaphragm anatomy, physiology, impact of ventilator settings on ultrasound measurements, transducer settings and technique of image acquisition. Statements for learning and reaching expertise in diaphragm ultrasonography were also formulated. Through this process, we defined areas for application in daily clinical practice, identified areas of controversy and established key opportunities for future research.

Given the rapid growth of diaphragm ultrasonography as a tool in daily clinical practice and in research, a measurement protocol and recommendations for acquiring expertise in a critical care setting were urgently needed. While some clinical studies and literature reviews address aspects of image acquisition and areas for clinical implementation, none fully encompasses the additional key components, such as the effects of diaphragm anatomy, physiology and ventilator settings on ultrasound measurements. In addition, previously reported methodologies for diaphragm ultrasonography reflected local, rather than international, consensus. With the collaboration of a large group of international experts, we aimed to overcome these limitations and generate guidelines with direct implications for clinical practice and research. In the following paragraphs, we discuss areas of consensus and controversy of special interest.

First, regarding ultrasonographic anatomy of the diaphragm, consensus was established on a > 10% decrease in thickness as the relevant cut-off for atrophy. This is highly relevant, as it has been shown that diaphragm atrophy impacts clinical outcomes such as duration of mechanical ventilation. However, for an increase in thickness, which is equally interesting in terms of potential clinical impact, no cut-off was established. It was considered impossible to distinguish the cause of the increase in thickness, for example true muscular hypertrophy from inflammation, oedema or fibrosis. In this regard, evaluating the echogenicity of the diaphragm, and thereby potentially the quality of the diaphragm, was agreed to be an area of special interest for future research [26].
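As an illustration of how the atrophy cut-off could be applied to repeated thickness measurements, consider the following minimal sketch. It assumes that the decrease is expressed relative to a baseline (first) measurement in the same patient, which the statement does not spell out; the function names and the millimetre unit are hypothetical.

```python
ATROPHY_CUTOFF_PCT = 10.0  # consensus cut-off: > 10% decrease in thickness


def percent_change(baseline_mm, current_mm):
    """Relative change in thickness versus baseline, in percent.

    Assumption: the baseline is the first measurement in the same patient.
    """
    if baseline_mm <= 0:
        raise ValueError("baseline thickness must be positive")
    return (current_mm - baseline_mm) / baseline_mm * 100.0


def suggests_atrophy(baseline_mm, current_mm, cutoff_pct=ATROPHY_CUTOFF_PCT):
    """True if thickness decreased by more than the cut-off percentage."""
    return percent_change(baseline_mm, current_mm) < -cutoff_pct
```

For example, a fall from 2.0 mm to 1.7 mm is a decrease of about 15% and would be flagged, whereas a fall to 1.9 mm (about 5%) would not. No corresponding rule is given for an increase in thickness, as no cut-off was established.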

Another important point of controversy was the continuity of muscle thickness throughout the zone of apposition. Settling this debate is pertinent because, if differences in thickness do exist, the location of measurement could affect the obtained thickness and derived parameters of functionality such as the thickening fraction. Available evidence is scarce and only reflects local and not global thickness [27, 28].

Second, various controversies remain regarding the physiology of diaphragmatic contraction. For one, no consensus was reached on which moments of the respiratory cycle are better for taking measurements, e.g. peak inspiration versus end-inspiration. Whether these subtle differences impact final measurements is unknown and remains to be investigated. Until then, a pragmatic approach would be to take measurements at the thickest and thinnest state of the diaphragm. Another point of debate is the clinical utility of measurements of maximum effort. While panelists agreed that they theoretically provide clinically useful information about diaphragm functionality, the critical concern is variability in eliciting maximum efforts and in estimating whether maximum effort was given by the patient. Accordingly, methods that allow standardization of eliciting maximum efforts are necessary for implementation in clinical practice [29]. An additional point of controversy was the cut-off value for diaphragm dysfunction assessed by the thickening fraction. While various cut-offs for thickening fraction as a parameter for failing spontaneous breathing trials or extubation exist, the conducted studies vary strongly in terms of outcome definition and patient population [14, 18, 30, 31]. Even in healthy individuals, normal values have been shown to be highly variable and body position dependent [27, 32, 33]. We hypothesize that these aspects are key limiting factors in forming consensus, as cut-off values might vary according to the context of the measurement. Determining context-specific (e.g. during (un-)assisted breathing, respiratory distress, spontaneous breathing trial, etc.) or outcome-related cut-off values (e.g. failing extubation, at risk for exhaustion and intubation, stratification of over- or under-assistance by ventilator) is therefore an important next step.
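For readers less familiar with the parameter, the thickening fraction can be sketched from the pragmatic approach mentioned above, using the thickest and thinnest state of the diaphragm. The formula below is the conventional definition (thickening relative to the thinnest, end-expiratory thickness) and is not restated in this text; the function name and units are hypothetical, and no dysfunction cut-off is encoded because none reached consensus.

```python
def thickening_fraction(thickest_mm, thinnest_mm):
    """Thickening fraction in percent.

    Conventional definition (an assumption here, not quoted from the text):
    (thickest - thinnest) / thinnest * 100, with the thinnest state taken
    as the end-expiratory thickness.
    """
    if thinnest_mm <= 0:
        raise ValueError("thinnest thickness must be positive")
    if thickest_mm < thinnest_mm:
        raise ValueError("thickest value must not be below the thinnest value")
    return (thickest_mm - thinnest_mm) / thinnest_mm * 100.0
```

With a thinnest thickness of 2.0 mm and a thickest thickness of 3.0 mm, the function returns 50.0. Interpretation of such a value would depend on the context of the measurement, as discussed above.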

Third, vital steps were taken towards a measurement protocol for diaphragm excursion and thickness in the critically ill. Choice of transducer, ideal depth and gain, and transducer positioning and alignment with regard to the diaphragm were agreed upon. A crucial statement is that diaphragm thickness should be measured between the pleura and peritoneum, without including them in the total thickness. These are important steps in reducing heterogeneity in measurement methodology and thus increasing external validity in research and clinical practice. Nevertheless, some important aspects remain without consensus. These include making thickness measurements in line with or crossing the intercostal space and obtaining images in B-mode or M-mode. The advantage of M-mode is that it allows more accurate timing of the respiratory cycle, while the advantages of B-mode are better spatial orientation and ease of use. For now, there is no evidence favouring either method and no clear advantages are directly apparent [34, 35]. Until this issue is resolved, clinicians are encouraged to use the method they are most comfortable with and researchers to clearly state their method of choice.

Fourth, essential areas of application for diaphragm ultrasonography in daily practice were determined. These included, among others, evaluating diaphragm dysfunction, prognosticating difficult weaning and detecting patient–ventilator asynchrony. At the same time, factors limiting the applicability and/or interpretability of measurements were also established. In this regard, clinicians are advised to interpret measurements in light of the impact of positive end-expiratory pressure, which diminishes excursion and increases resting thickness owing to the lower resting position of the diaphragm [36]. The same holds true for positive pressure ventilation, which reduces patient effort and causes passive displacement of the diaphragm.

Lastly, a new consensus was reached on the minimum training necessary to achieve sufficient proficiency to use diaphragm ultrasound, including excursion, thickness and thickening, in clinical practice. The agreed threshold was at least 40 examinations, ideally performed bilaterally, of which half should be supervised. However, the consensus reached is a general statement that does not take prior ultrasound experience into account [37]. In addition, we emphasize that this statement does not address the minimum number of examinations needed to obtain high reproducibility, which was demonstrated in a previous study, but rather the minimum training necessary to use diaphragm ultrasonography in clinical practice and to guide decision-making [4].

Strengths and limitations

There are important strengths and limitations to this study that merit consideration. First, the selection of panelists was limited to physicians with a strong scientific background. This resulted in a selected group, and local experts and educators with thorough knowledge and clinical experience but without peer-reviewed publications might have been missed. Nevertheless, the advantage of this approach is guaranteed expertise with in-depth knowledge of current scientific viewpoints, which strengthens the statements formulated. Second, the classical Delphi approach does not include the possibility of a live discussion among the panelists, which could potentially help elucidate, clarify and resolve complex issues. However, it does allow for complete anonymity, which prevents interference of group dynamics and provides the opportunity to express unpopular or controversial opinions [38]. In addition, panelists could provide arguments for the answers selected, and these arguments were presented to all panelists. Third, this study included two rounds. More rounds could have provided the opportunity to discuss unresolved issues. Nevertheless, limiting the study to two rounds did result in full completion of the surveys by all panelists, which would become less likely with an increasing number of rounds [20].

Conclusion

This expert consensus statement presents the first set of evidence-based statements on diaphragm ultrasonography methodology. They serve to ensure high-quality measurements in daily clinical practice and in research. In addition, important gaps in current knowledge and thereby key areas for future research are established.