Improving the quality of care given to critically ill patients is highly desirable. The Institute of Medicine (IOM) in 1999 [1] described quality as the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge. Safety was defined in that report as the absence of clinical error, either by commission (unintentionally doing the wrong thing) or omission (unintentionally not doing the right thing) [2], and error as the failure of a planned action to be completed as intended or the use of a wrong plan to achieve an aim. This is similar to the definition used by the World Health Organization, in which safety is described as the reduction of risk of unnecessary harm to an acceptable minimum [3]. Patient safety is an integral part of the quality agenda; it is difficult to provide effective care where safety is compromised [4]. As the safety and quality agendas have developed, the boundaries between safety and quality have blurred significantly, and it is now hard to consider one in isolation from the other.

This definition of quality is, in fact, very similar to an older, more conceptual framework proposed by Donabedian, in which he described the measurement of quality of healthcare as being related to three distinct dimensions: structure, process and outcomes [5]. This model has subsequently been refined by many different authors to include a number of extra dimensions, such as the patient's experience of care as an entity distinct from the outcomes of that care, and the timeliness and accessibility of care, which allow for assessment of equity and cost-effectiveness [6].

There are many examples of performance improvement processes in the literature. These are all grounded in a philosophy of striving to improve the delivery of quality care. These processes use many different methods, some quantitative and others qualitative in design. Both methodologies are important. Qualitative techniques are often used to look at complex interactions between caregivers and patients, and can explain what is happening, how and why. Quantitative measures, on the other hand, can be used to develop and test hypotheses about whether an intervention improves care and by how much [7, 8].

In 2009, as part of a series of actions to raise the awareness of both professionals and the public to the issue of patient safety, the European Society of Intensive Care Medicine (ESICM) initiated a task force with the aim of improving the safety and quality of care provided to critically ill patients. This initiative followed a series of ESICM-supported studies that investigated the level of patient safety events in Intensive Care Units (ICUs) around Europe and documented the scale of the problem [9, 10]. The task force developed a directive for change that was signed by 57 national and international critical care organizations in the Declaration of Vienna [11]. One of the outputs requested of the task force was the identification of a set of indicators that could be used to measure the quality of care provided in any ICU and so drive future improvements in performance.

This paper describes the results of this task force. In this study the group assessed a number of indicators that could be used to measure and improve the quality and safety of care in ICUs by being mandated and integrated into routine practice. Using a modified Delphi process, a group of these indicators that should be recommended for more widespread use as mandatory safety indicators was delineated.


This study was performed by the Safety and Quality Task Force of the ESICM and used a nominal (expert) group together with a modified Delphi process. The nominal group consisted of 18 nominated experts coming from nine different countries, all of whom had a pre-declared interest in safety and quality.

The first stage of the process was to identify all possible indicators in current use that relate to both quality and safety of care. These were identified by the nominal group through contacting national authorities and benchmarking organizations, as well as through personal contacts and a search of online databases (MEDLINE and EMBASE). After eliminating duplications and imprecise indicators, these were then consolidated into a list of 111 indicators that are described in more detail in the Electronic Supplementary Material (ESM).

A series of iterative processes was then followed in order to gain a consensus of the nominal group with regard to the specific indicators that could be recommended as a mandatory set to describe and improve the quality and safety of care for any individual ICU. The members of the Safety and Quality Task Force agreed that this set should be applicable to any unit and not specific to any individual disease process or specialty. Agreement was sought from 100% of the group wherever possible, although it was agreed a priori that any agreement above 90% would be sufficient to include an indicator in the final set. Any indicator that achieved a consensus of 100% was automatically included in the final set, and any indicator that had an agreement of less than 75% was excluded.
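The thresholds described above can be sketched as a small decision rule. This is only an illustration under stated assumptions: the function and enum names are hypothetical, the text does not say what happened to indicators with agreement between 75% and 90%, so the sketch assumes they were carried forward for further discussion, and it is not clear from the text whether the 90% rule applied at every round or only at the final one.

```python
from enum import Enum


class Decision(Enum):
    INCLUDE_AUTOMATICALLY = "100% agreement: included automatically"
    INCLUDE = "above 90% agreement: sufficient for inclusion"
    CARRY_FORWARD = "75-90% agreement: discussed further (assumption)"
    EXCLUDE = "below 75% agreement: excluded"


def classify(agree: int, experts: int) -> Decision:
    """Apply the stated thresholds to a count of agreeing experts.

    Integer arithmetic is used so that boundary cases (exactly 75% or
    exactly 90%) are not affected by floating-point rounding.
    """
    if experts <= 0 or not 0 <= agree <= experts:
        raise ValueError("need 0 <= agree <= experts and experts > 0")
    if agree == experts:
        return Decision.INCLUDE_AUTOMATICALLY
    if agree * 100 > 90 * experts:
        return Decision.INCLUDE
    if agree * 100 < 75 * experts:
        return Decision.EXCLUDE
    return Decision.CARRY_FORWARD  # assumed handling of the middle band


# With an 18-member group, 17 agreeing votes (94%) clear the 90% bar,
# whereas 16 votes (89%) do not.
for votes in (18, 17, 16, 13):
    print(votes, classify(votes, 18).name)
```

With a group of 18, a single dissenting vote still permits inclusion, but two dissenting votes do not, which shows how strict the 90% threshold is in practice.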

The iterative processes consisted of three online surveys that requested the views of each expert as to which indicators should be mandated into the set. Following each survey, the expert's own responses were fed back to him/her and compared anonymously against the statistics for the whole group. The first survey allowed the expert to answer each question using a 5-point Likert scale (strongly agree, agree, neither agree nor disagree, disagree and strongly disagree). At this stage, an answer of either agree or strongly agree was counted towards consensus. The second survey used a three-way descriptive response (strongly agree/agree/disagree), with either of the first two options counting towards consensus, and the final survey used a binary answer (agree/disagree). Between these surveys, opinions were honed using email discussions and online cloud-based forums (Basecamp). Before the final decision was made, participants were copied into the arguments to either include or exclude a given indicator, and each participant was given the opportunity to change his/her position.
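As a rough sketch of how responses from the three surveys could be tallied, the following maps each response scale to the answers that count towards consensus. The data structures and the function name are hypothetical and are not taken from the study; only the response options and the answers counted as agreement come from the description above.

```python
# Response options offered in each of the three surveys.
ALLOWED = {
    1: {"strongly agree", "agree", "neither agree nor disagree",
        "disagree", "strongly disagree"},           # 5-point Likert scale
    2: {"strongly agree", "agree", "disagree"},     # three-way response
    3: {"agree", "disagree"},                       # binary response
}

# Answers that count towards consensus in each survey.
AGREEING = {
    1: {"strongly agree", "agree"},
    2: {"strongly agree", "agree"},
    3: {"agree"},
}


def agreement_count(survey_no: int, responses: list[str]) -> tuple[int, int]:
    """Return (number of agreeing experts, number of respondents)."""
    normalised = [r.strip().lower() for r in responses]
    unknown = [r for r in normalised if r not in ALLOWED[survey_no]]
    if unknown:
        raise ValueError(f"invalid answers for survey {survey_no}: {unknown}")
    agreeing = sum(r in AGREEING[survey_no] for r in normalised)
    return agreeing, len(normalised)


# Hypothetical answers from four experts in the first survey.
answers = ["Strongly agree", "agree", "neither agree nor disagree", "disagree"]
print(agreement_count(1, answers))
```

Narrowing the scale from five options to two removes the neutral answer, so by the final survey every expert has to commit to a position on each remaining indicator.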


This study was performed in five phases, which are described in Fig. 1. In the first phase of the study, 111 potential indicators were identified for discussion (ESM). These were then consolidated into 102 discrete entities that could be entered into phase 2 to be discussed and rated. At this point none of the indicators achieved the 100% consensus necessary to be automatically included in the final set, and 66 had a level of consensus of less than 75% and so were excluded, leaving 36. In a similar fashion, a further 13 indicators were discarded in phase 3, leaving a total of 23 potential indicators to enter phase 4. In phase 4, the potential survey answers were reduced to force the experts to make an agree/disagree decision. At this point, three indicators achieved the level of consensus required for inclusion in the final set and a further ten were carried into a fifth phase of discussion. A final set of nine indicators was then agreed upon, all of which had a greater than 90% agreement rate (Table 1). These indicators can be used to describe the structures (3), processes (2) and outcomes (4) of intensive care (Table 2). A detailed description of these indicators, including their formulae for calculation, is included in the ESM.

Fig. 1 The five phases in the development of the set of safety indicators

Table 1 List of all indicators obtaining over 75% consensus from the group in the final stage of the Delphi Process
Table 2 Table describing the agreed definitions of the final set of Indicators

The four indicators that failed to reach the required threshold at the final iteration all had considerable support but did not reach the level of agreement that was sought. These indicators were (1) continuing medical education according to national standards, (2) the maintenance of bed occupancy rates of less than 90%, (3) the endotracheal re-intubation rate within 48 h of a planned extubation and (4) the prevalence of ventilator-associated pneumonia (VAP). For each of these, disagreements on either the quality of the indicator (e.g. prevalence of VAP) or the specific cut-off level (e.g. endotracheal re-intubation rate within 48 h of a planned extubation) prevented a consensus from being reached.


This study has described the consensual agreement from a nominal group on a set of indicators that could be used to evaluate quality and to improve the safety of care provided to critically ill patients in ICUs. Nine indicators were agreed upon that could be used to improve the quality of care provided in any ICU in the world. Agreement was easier to reach on the indicators describing the structures and the outcomes relating to care than on the underpinning processes. This difference may reflect the multi-national status of the nominal group, which represented nine separate countries with very disparate healthcare systems and cultural traditions. It will be especially relevant when considering how different regions regard the registering and analysing of indicators and their disclosure to health authorities and to the general public.

Process usually refers to the way something is done (or fails to be done), with the creation of added value for patient care. Process indicators therefore describe the complex tasks performed by healthcare teams and their interactions with the patient and their family in order to achieve a given outcome. Considering that many process indicators can be easily collected in settings provided with patient management systems, the low number of process indicators included in this set may reflect the need to identify indicators suitable for application in units without such technologies. Although we started with many different indicators that described the many different mechanisms for delivering and protocolizing care, it proved to be very difficult to translate these across borders and achieve consensus [12].

The IOM round table on Quality of Care described three main threats to the ability of any system to provide quality care, namely, underuse, overuse and misuse of care [6]. All of these can result in safety threats to patients, either at the individual level (usually failing to avoid harm or actively causing harm) or at the collective level (e.g. not using diagnostic, preventive or therapeutic measures in a way that is consistent with the state of the art) [4]. Until we start to routinely collect, disclose and compare safety and quality indicators, we cannot expect to continuously improve the safety and the quality of our practices. It has been reported that an indicator of safety or quality should be important, valid, reliable, responsive, interpretable, feasible and underpinned by a robust evidence-based literature [13]. It should probably also focus more on the processes of care than on outcomes alone [14]. This is exactly the opposite of what we were able to agree upon in this study, where the processes were more difficult both to define and (subsequently) to reach consensus on. Doctors, as well as patients and families, are mainly interested in the effects of treatment (outcome) and variables to be changed (structure). Processes are more difficult to define precisely, despite emerging evidence that adherence to bundled process-describing tools can improve outcomes [15, 16].

There has been considerable interest recently in the development of guidelines, protocols, bundles and checklists that could be used to reduce clinical variation and improve quality. These mechanistic technical approaches to a socio-cultural problem are not the complete answer [17]. The implementation and subsequent use of the indicator, or list of indicators, is a vital step towards improving performance. In order for this to happen, the indicators need to be uncontroversial, achievable and measurable and be believed to work.

This study started from a review of the literature and also a compilation of experts’ views and national databases. The list was then refined through a Delphi process. It remains possible that the original list was not complete and that some indicators were missed. Although we can never be 100% certain, we do not believe it is likely that an important indicator was missed at this stage which would have changed our final set. The final results of the Delphi process do reflect the opinion of the experts participating in the Task Force and, as such, could be challenged. However, due to the diverse geographical and cultural backgrounds of the group, it is likely that the final list is both relevant and representative of current international thinking.

We identified nine indicators that adhere to these principles. In addition, there are a number of other indicators that we identified that could also be used. We decided to use a rigid and high threshold for consensus (>90% of the group) in order to ensure that we ended up with a manageable set of indicators that were relevant across geographical and cultural boundaries and which could be used in practice without overloading practising teams. This has inevitably led to several indicators being left out of the final set, which some clinicians may find surprising. One such indicator is the calculation of the rate of VAP. Although there is considerable support for this as a marker of quality, there still remain some doubts as to the best way of defining the entity and also whether it should then be used to change practice [18].

We deliberately chose to keep the indicators relatively simple and straightforward in order to aid utilization and uptake. Many of the indicators could be criticized for this approach; however, the view of the group was that the use and review of the data acquired through the use of these sets would in itself improve quality. This is perhaps most obvious for the indicator describing the calculation and review of the standardized mortality ratio (SMR). The SMR as a single number does not describe the whole situation (a normal SMR may be the result of the alternation of bad and good performance across different risk classes) and will not change performance. The fact that a unit actually measures, reviews and reflects on its performance through this indicator, however, should lead to an improvement in processes and outcomes.
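The point about a normal overall SMR masking uneven performance can be illustrated with a short calculation. The sketch assumes the conventional definition of the SMR, observed deaths divided by the deaths expected from a risk-prediction model; the exact formula used by the task force is given in the ESM and is not reproduced here. The risk classes and all numbers below are invented purely for illustration.

```python
def smr(observed_deaths: float, expected_deaths: float) -> float:
    """Standardized mortality ratio: observed deaths / expected deaths."""
    if expected_deaths <= 0:
        raise ValueError("expected_deaths must be positive")
    return observed_deaths / expected_deaths


# Hypothetical unit with two risk classes: (observed, expected) deaths.
strata = {
    "low risk": (10, 5.0),    # twice as many deaths as predicted
    "high risk": (20, 25.0),  # fewer deaths than predicted
}

for name, (observed, expected) in strata.items():
    print(f"{name}: SMR = {smr(observed, expected):.2f}")

# Pooling the strata hides the difference between them.
total_observed = sum(obs for obs, _ in strata.values())
total_expected = sum(exp for _, exp in strata.values())
overall = smr(total_observed, total_expected)
print(f"overall: SMR = {overall:.2f}")
```

In this invented example the overall SMR is 1.00 even though the low-risk class has an SMR of 2.00 and the high-risk class 0.80, which is why reviewing the figure by risk class is more informative than the single number.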

The implementation of this set of quality indicators into clinical practice will require considerable ‘buy-in’ from clinical teams and the willingness to review practice and change in accordance with the findings. This work is therefore only the first step in a performance improvement process. The next step will be to use these sets of measures in clinical practice and to test the hypothesis that their use is associated with a high level of quality of care and can result in improved patient outcomes and satisfaction when adhered to.