Development and validation of a taxonomy of adverse handover events in hospital settings

This study aimed to develop and validate a taxonomy to classify and support the analysis of adverse events related to patient handovers in hospital settings. A taxonomy was established using descriptions of handover events extracted from incident reports, interviews and root cause analysis reports. The main outcome measures were inter-rater reliability and the distribution of types of handover failures and causal factors. The taxonomy contains five types of failures and seven types of main causal factors. It was validated against 432 adverse handover event descriptions contained in incident reports (a stratified random sample from the Danish Patient Safety Database, 200 events) and in 47 interviews with staff conducted at a large hospital in the Capital Region (232 events). The most prevalent causes of adverse events were inadequate competence (30 %), inadequate infrastructure (22 %) and busy ward (18 %). Inter-rater reliability (kappa) for the classification of failure types was 0.76 and 0.87 for reports and interviews, respectively. Communication in clinical contexts has been widely recognized as giving rise to potentially hazardous events, and handover situations are particularly prone to failures of communication or unclear allocation of responsibility. The taxonomy provides a tool for analyzing adverse handover events to identify frequent causes among reported handover failures. In turn, this provides a basis for selecting safety measures, including handover protocols and training programmes.


Introduction
In recent years, there has been an increased focus on patient safety during patient handovers. If a handover is carried out improperly, so that wrong or inadequate information is received, important information is missing, or responsibility for the patient's care becomes unclear, the patient may suffer serious harm.
Several studies have shown that handovers are associated with adverse events (Arora et al. 2005; Pezzolesi et al. 2010; Cohen and Hilligoss 2010), and initiatives have recently been launched to reduce such events, including an extensive programme introduced by the Australian Commission on Safety and Quality in Healthcare to develop and improve clinical handover communication (ACSQHC 2011). Similarly, the WHO Patient Safety Alliance has identified communication failures during patient handovers, as well as medication accuracy at transitions in care, as part of its High 5s initiative (WHO 2007).
The study reported in this paper aimed to develop and validate a taxonomy to support the analysis and classification of adverse events related to patient handovers. Like other incident taxonomies, it distinguishes between types of failures involved in accidents and types of direct and indirect causes (or 'contributory factors') (Wiegmann and Shappell 2003; Taylor-Adams and Vincent 2004; Itoh et al. 2009; Mikkelsen et al. 2013). However, communication failures are heterogeneous (Clark 1996; Sperber and Wilson 1995), and any single episode of communication, even in a highly circumscribed environment such as professional work in hospitals, will typically involve a rich set of (tacit) background assumptions. Moreover, handovers serve functions that go beyond the mere transmission and uptake of discrete items of information, e.g., the exploration of uncertainties (Wilson et al. 2010; Lingard et al. 2011; Patterson et al. 2007). Therefore, a taxonomy to support the classification of communication failures cannot be too detailed if it is to be usable.
A large number of taxonomies have been developed for the analysis of accidents and incidents, nearly always with the explicit purpose of supporting learning from accidents. General taxonomies for healthcare include the World Health Organization's classification scheme for adverse events (Runciman et al. 2006, 2009; Kaplan et al. 1998; Mikkelsen et al. 2013) and the London Protocol (Taylor-Adams and Vincent 2004; Woloshynowych et al. 2005). Specific taxonomies have been developed for, inter alia, anesthesia (Marcus 2006), intensive care units (Pronovost et al. 2008), general practice (Rubin et al. 2003) and surgery (Antonacci et al. 2008a, b). Nearly all modern taxonomies for accident or incident analysis are based on a systems view of mishaps (Reason 1998); one of the earliest examples is Rasmussen's taxonomy for capturing human-machine interaction failures (Rasmussen 1982).
A taxonomy to classify incidents and accidents serves the overall goal of improving safety by supporting learning from experience. The specific goals of a taxonomy are to support (a) case analysis, by providing a conceptual framework in terms of which events and relations may be captured and linked, and (b) the establishment of a database of cases indexed by the taxonomic categories, so that causal patterns can be identified across a (possibly large) number of 'similar' cases. A further benefit of a database of events structured in terms of types of failures, and possibly types of causes, is that it provides users dealing with a concrete case or safety issue with the means of retrieving 'similar' cases or safety issues from the database. This feature is particularly useful when the database contains not only the causal and demographic categories but also possible recommendations for intervention and the results of evaluating interventions.
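The retrieval of 'similar' cases from a categorized database can be illustrated with a minimal sketch. It rests on our own simplifying assumptions, not on the authors' method: each case is indexed by a single failure type and a set of causal factors, and two cases count as similar when they share the failure type, with their causal factors compared by the Jaccard index (shared factors divided by all factors used). The class `Case`, the functions `jaccard` and `similar_cases`, and all category labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """Hypothetical database entry: one handover event and its taxonomic categories."""
    case_id: str
    failure_type: str
    causal_factors: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Shared categories divided by all categories used (1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similar_cases(query: Case, database: list, top_k: int = 3) -> list:
    """Return (case_id, score) pairs for cases of the same failure type, most similar first."""
    scored = [
        (case.case_id, jaccard(query.causal_factors, case.causal_factors))
        for case in database
        if case.case_id != query.case_id and case.failure_type == query.failure_type
    ]
    # Sort by descending similarity; ties are broken by case id for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


database = [
    Case("r1", "omission", frozenset({"competence", "busy ward"})),
    Case("r2", "omission", frozenset({"competence"})),
    Case("r3", "omission", frozenset({"infrastructure"})),
    Case("r4", "unclear responsibility", frozenset({"competence", "busy ward"})),
]
query = Case("q", "omission", frozenset({"competence", "busy ward"}))
print(similar_cases(query, database, top_k=2))  # [('r1', 1.0), ('r2', 0.5)]
```

Requiring the same failure type before comparing causes mirrors the idea that useful cases show similar failures and similar causes; a real system might weight categories or search free-text narratives as well.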

Materials
Having considered using more generic taxonomies to analyze handover events, the authors concluded that a classification system specifically targeted at handovers would be needed to capture the types of failures and factors involved. For the development of the taxonomy, descriptions of adverse events were drawn from two sources. First, adverse event reports submitted to the Danish Patient Safety Database (DPSD) were retrieved. The database, created in 2004 when the law on patient safety introduced the first national, non-punitive and mandatory system for reporting adverse events, receives about 25,000 reports per year and contains anonymized descriptions of adverse events. Reports are supplied by healthcare professionals and may be edited or supplemented by a local patient safety manager (Bjorn et al. 2009). The original DPSD sample (N = 3,246) comprised all reports submitted to the DPSD by one of the five regions in 2007 and classified as 'breach of continuity of care' or 'failure of communication or confounding' (two of the nine categories under which reports were classified until 2010). From the original sample, a random sub-sample (200 events) was drawn from the reports with a SAC score of 2 or 3 (18 % of the original sample). The SAC (severity assessment code) is a widely used risk matrix for assessing the degree of risk of any given adverse event, where '3' denotes serious events and '1' less serious ones (Bagian et al. 2001). The decision to include only reports with SAC > 1 was based on the assumption that longer reports contain more information about the causes of the adverse event reported (the median lengths of SAC 2 and SAC 3 reports in the original sample were 131 and 178 words, respectively, compared with 91 words for SAC 1 reports). Event reports from the DPSD would sometimes describe several independent adverse events involving different handover situations. In such cases, individual events were differentiated and extracted as independent events (178 event reports yielded the sample of 200 independent events).
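The selection of DPSD reports described above can be sketched as follows, purely as an illustration under our own assumptions rather than as the authors' procedure. Reports are hypothetical dictionaries with `sac` and `text` keys, report length is counted by splitting on whitespace, and reports with SAC > 1 are shuffled with a fixed seed and taken until the target number of events is reached. The paper does not state how its sample was stratified, so the sketch draws uniformly from SAC 2 and SAC 3 reports. `split_events` is a hypothetical stand-in for the manual step in which a report describing several independent events is split into separate events.

```python
import random
from statistics import median
from typing import Callable


def median_words_by_sac(reports: list) -> dict:
    """Median report length in words, grouped by SAC score."""
    lengths: dict = {}
    for report in reports:
        lengths.setdefault(report["sac"], []).append(len(report["text"].split()))
    return {sac: median(words) for sac, words in lengths.items()}


def draw_event_sample(reports: list, split_events: Callable[[dict], list],
                      target: int = 200, seed: int = 0) -> tuple:
    """Randomly draw SAC 2/3 reports until at least `target` events have been extracted."""
    eligible = [r for r in reports if r["sac"] > 1]  # SAC 1 reports are excluded
    random.Random(seed).shuffle(eligible)
    events: list = []
    reports_used = 0
    for report in eligible:
        if len(events) >= target:
            break
        events.extend(split_events(report))  # one report may yield several events
        reports_used += 1
    return events, reports_used


reports = [
    {"sac": 1, "text": "a b c"},
    {"sac": 2, "text": "a b c d e"},
    {"sac": 3, "text": "a b c d e f g"},
    {"sac": 2, "text": "a b c d e f g h i"},
]
print(median_words_by_sac(reports))  # {1: 3, 2: 7.0, 3: 7}
```

Because several events can come from one report, the number of reports used is returned alongside the events, analogous to the 178 reports that yielded 200 events.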
The second sample used for development and testing was drawn from an interview study of adverse handover events. The study, which used the critical incident interview technique to gather information about handover problems (Siemsen et al. 2012), collected descriptions of adverse events related to handovers from 47 individual interviews with staff members (23 nurses, three nurse assistants, 13 physicians, five paramedics, two orderlies and one radiographer) from different departments and units: the emergency department, two medical and two surgical departments, an intensive care unit, a radiology unit, the orderly unit and two ambulance stations. Both senior and junior staff members were included from each unit.
The interviewees, who were promised anonymity when recruited, received oral and written information about the study emphasizing that the goal of the interview was to obtain a comprehensive picture of the interviewee's subjective perceptions of critical episodes experienced at first hand during patient handovers. Interviews were conducted by two interviewers, one or both of whom had a healthcare background.
In the 45 interviews analyzed (two pilot interviews were excluded), the adverse events described by interviewees were transcribed, yielding a sample of 232 separate critical event descriptions. In addition, a small sample of descriptions of individual handover-related events (n = 12) from root cause analysis reports was used to ensure that the taxonomy was not too coarse-grained but capable of capturing the causal factors identified through more detailed accident analysis.

Methods
The authors developed the taxonomy iteratively: each revised version was codified and then tested against 'new' events drawn from the DPSD and interview samples. For each iteration, event descriptions were drawn randomly from each sub-sample. To reduce the risk that the taxonomy would be biased by the kind of event descriptions contained in the DPSD, event descriptions drawn from interviews and root cause analyses (as described in Materials) were used as well. The main structure of the taxonomy remained the same throughout development.
The development process followed the standard inductive approach of quality assurance of design: based on meeting notes, one of the authors (HBA) would elaborate a new version of the proposed categorization of failures and factors, including a definition of each category. The proposed categories and their definitions were then tested against samples of 5-20 reports drawn randomly from the original sample of DPSD reports or interviews; each sample was classified by two of the authors (IMS, LFP), first independently and then by consensus. In case of disagreement, all authors contributed until consensus was reached. This iterative refinement of the category definitions was carried out over more than ten iterations. The development phase was concluded when the team was satisfied that each definition was sufficiently precise and that the categories, as a whole, were sufficiently comprehensive to capture relevant distinctions among handover events.

Results
The taxonomy consists of two groups of categories: active failures and causal factors. Failures are divided into types of handover failures, which include acts of miscommunication and refused, unclear or deferred responsibility among healthcare staff in relation to patient handovers (Table 1; Appendix). Inadequate communication is divided into communication related to tests and communication not related to tests, each of which is divided into sub-types. Inadequate communication not related to tests comprises omissions as well as unsuccessful acts of communication. It also comprises failures to address given aspects of patient care, for instance, failing to ask relevant questions or to address aspects of the patient's condition that, according to accepted standards of care, should have been explored.
Causal factors associated with types of failure comprise seven main groups, including deviation from procedure or guideline and inadequate professional competence or knowledge of tasks (Table 1; Appendix).
The test phase of the taxonomy development involved a two-part test of the usability and reliability of the classification system against two separate samples of adverse events. Two of the authors (IMS, LFP) independently classified two samples: a random sample of 200 incident reports (SAC score > 1) from the DPSD, and a sample of 232 handover incident descriptions contained in the transcriptions of face-to-face interviews with hospital staff. Of the 200 DPSD reports, 40 were used for consensus discussions, so only the remaining 160 were classified independently, prior to discussion, to assess inter-rater reliability.
Inter-rater reliability of classification into the five failure types showed kappa values of 0.76 (DPSD) and 0.87 (interviews). A chance-corrected measure of inter-rater reliability, such as the kappa statistic, cannot be computed for the causal factors of the taxonomy, since this statistic cannot be applied to inclusive (overlapping) classification, in which a given incident may be assigned to more than one causal category. Pairwise agreement for two raters may instead be defined as the number of agreed category assignments to a given event divided by the number of assignments made by either rater, i.e., agreement / (agreement + disagreement). This corresponds to the likelihood, for any event, that if one of the raters has assigned a given failure type or causal factor to the event, then the other rater has done so as well. Pairwise agreement across all main causal factors was 62 % (for types of failures, it was 81 %).
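The two agreement measures described above can be made concrete with a short sketch. Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement), is the standard chance-corrected statistic for two raters who each assign exactly one category per event, which matches the classification into failure types. The second function computes pooled pairwise agreement, agreement / (agreement + disagreement), for overlapping causal-factor assignments, counting as agreement the factors both raters assigned to an event and as disagreement those assigned by only one rater. Function names and category labels are hypothetical; the paper does not describe its computations in code.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters who each assign exactly one category per event."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same, non-empty set of events")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement implied by each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


def pairwise_agreement(sets_a: list, sets_b: list) -> float:
    """Pooled agreement / (agreement + disagreement) for overlapping assignments."""
    if len(sets_a) != len(sets_b):
        raise ValueError("both raters must classify the same events")
    agreed = disagreed = 0
    for a, b in zip(sets_a, sets_b):
        agreed += len(a & b)     # factors assigned by both raters
        disagreed += len(a ^ b)  # factors assigned by only one rater
    total = agreed + disagreed
    # If neither rater assigned anything at all, we treat this as full agreement.
    return agreed / total if total else 1.0


print(round(cohens_kappa(list("AABCBA"), list("ABBCBA")), 3))  # 0.739
print(pairwise_agreement(
    [{"competence", "busy ward"}, {"infrastructure"}],
    [{"competence"}, {"infrastructure", "procedure"}],
))  # 0.5
```

Note that categories neither rater assigned to an event do not enter the pairwise measure at all, which is one reason it tends to be lower than measures that also credit agreement on absent categories.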
The distribution of causes for each of the failure types and sub-types is shown in Table 2. To avoid clutter, 'deviation from procedure' is shown as a single column: only four events (1 % of the sample) could be further classified as individual or organizational deviations from procedure. The most prevalent causes of adverse events were inadequate competence (30 %), inadequate infrastructure (22 %) and busy ward (18 %). Communication failures related to and not related to tests accounted for 33 and 44 %, respectively, of all failures.
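Building a cross-tabulation like Table 2 from multi-label classifications can be sketched as below. The assumptions are ours: each event is a pair of one failure type and a possibly empty set of causal factors, and factor prevalence is expressed as the share of events to which a factor was assigned at least once, since the text does not state which denominator underlies its percentages. All labels are hypothetical.

```python
from collections import Counter, defaultdict


def tabulate(events: list) -> tuple:
    """Count events per failure type and causal-factor assignments per failure type."""
    events_per_type = Counter(failure for failure, _ in events)
    causes_per_type = defaultdict(Counter)
    for failure, factors in events:
        # An event with no identifiable cause adds to its type's event count only.
        for factor in factors:
            causes_per_type[failure][factor] += 1
    return events_per_type, dict(causes_per_type)


def factor_share(events: list) -> dict:
    """Share of all events to which each causal factor was assigned (assumed denominator)."""
    counts = Counter(factor for _, factors in events for factor in set(factors))
    return {factor: count / len(events) for factor, count in counts.items()}


events = [
    ("omission", {"competence"}),
    ("omission", set()),  # no cause identifiable from the description
    ("not followed up", {"competence", "busy ward"}),
    ("not followed up", {"infrastructure"}),
]
per_type, causes = tabulate(events)
print(per_type["omission"], sum(causes["omission"].values()))  # 2 1
```

In this sketch, an event without an identified cause raises the event count for its failure type but adds nothing to the cause counts, so the event count of a row can exceed the sum of its cause counts.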

Discussion
Inter-rater reliability of the taxonomy is satisfactory for the part that can be assessed by a chance-corrected statistic. Most authors follow the interpretation of kappa values suggested by Landis and Koch (1977): 0.01-0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-0.99, almost perfect agreement. Fleiss (1971) suggests that a kappa above 0.75 is excellent. The kappa values achieved, 0.76 and 0.87, therefore indicate that the taxonomy is reasonably robust with respect to the division into types of failures. A pairwise agreement rate of 62 % is not impressive but may be regarded as satisfactory, as this method of computing agreement tends to yield lower values (Martin and Bateson 1993), since agreement on the absence of a category does not contribute to it.
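The interpretive bands quoted above can be expressed as a small lookup, shown only as an illustration. The upper bounds follow the ranges listed above, so values falling between two-decimal bands go to the lower band; negative values, which the quoted ranges do not cover, are labelled 'poor' here, following Landis and Koch's own table. The 0.75 cut-off is the Fleiss criterion cited above. Both function names are hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) descriptive bands."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


def fleiss_excellent(kappa: float) -> bool:
    """Fleiss's criterion as cited in the text: a kappa above 0.75 is excellent."""
    return kappa > 0.75


for kappa in (0.76, 0.87):
    print(kappa, landis_koch_label(kappa), fleiss_excellent(kappa))
# 0.76 substantial True
# 0.87 almost perfect True
```

Applied to the two values reported here, the DPSD result falls in the 'substantial' band and the interview result in the 'almost perfect' band, while both exceed the Fleiss threshold.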
To assess an incident taxonomy, one should, besides determining reliability, ask whether its categories are suitable and at an appropriate level of granularity (Wiegmann and Shappell 2003). Our distinction within failure types between communication related to and not related to tests may appear ad hoc. However, the settings for these failures are typically quite different: handover communication related to tests is predominantly written and schematic, whereas handover communication not related to tests takes place in a clinical setting and is often face-to-face and oral. Moreover, the distinction makes it possible to identify differences in the causal factors behind different types of communication failures, and therefore the possibility of different types of interventions to reduce their occurrence (Siemsen et al. 2012).
Data about handover incidents are predominantly retrospective (incident reports, root cause analyses) and therefore typically contain few details about the dialogue taking place during handover. Hence, taxonomies that are useful for prospective studies (Lingard et al. 2004), in which communication patterns may be recorded and coded by observers, are less suitable for analyzing the sparser data of incident reports. Arora et al.'s (2005) taxonomy of handover communication distinguishes between 'content omission' and 'failure-prone communication processes.' The former is similar to our sub-type 'communication omission,' which is reserved for situations where, as far as the evidence goes, no specific parameter was misstated or misheard, but the staff members involved in the handover do not address overall patient needs, e.g., acuteness (Appendix).
Cogn Tech Work
Table 2 (excerpt; column headings and the first row label were lost in extraction)
(row label lost) [1] [0] [1] [0] [1] [3] [1] [0] [6]
Abnormal finding not alerted [4] [1] [1] [0] [2] [3] [1] [0] [11]
Received but not followed up [5] [0] [2] [0] [2] [5] [7] [0] [19]
(C) Refusal of/unclear responsibility
Note: Numbers in the second column denote all events of the given type or sub-type. Some events have no specific cause that may be identified from the description provided and hence are not assigned a cause; therefore, the number of events of a given type may be larger than the horizontal sum of that row (right-most column). Numbers in brackets represent sub-types and sub-factors and are therefore not included in sums.
The causal factors contained in the taxonomy are relatively coarse and not nearly as detailed as, for instance, those in the classification system of the WHO Patient Safety Alliance (2007). The reason for not dividing causal categories into finer and more precise ones is the same as above: descriptions of handover events in incident reports and interviews (e.g., interviews that supply information to root cause analyses) will typically not contain information that allows sub-categories of causal factors to be identified with any certainty. Like ours, most incident taxonomies allow a given failure to be assigned to more than one causal factor (WHO 2007; Wiegmann and Shappell 2003; Woloshynowych et al. 2005; Pronovost et al. 2008), though not all do (Kaplan et al. 1998). The drawback is that there is no widely agreed method of assessing inter-rater reliability or agreement, chance-corrected or not, when causal categories overlap (Olsen and Shorrock 2010).
In its original version, as tested, the taxonomy included an optional division of 'deviation from procedure' into individual and organizational causes (Table 2). The term 'individual' is meant to suggest that individual training or instruction will be the most direct remedy against repetition of the failure, whereas 'organizational' is meant to suggest that the deviation is customary practice in the group or clinic (similar to 'routine violations' as described by Reason 1998). However, as the two samples used for validating the taxonomy contained very few incident descriptions that allowed this distinction to be applied (1 % of the sample), we conclude that the additional effort of applying it does not justify its inclusion.
The causal factor 'inadequate competence' is the most frequently applied factor (30 %). Pronovost et al. (2008) find in their analysis of ICU incident reports that (inadequate) 'training and education' is a contributory factor in 49 % of all cases, but their category also includes 'failure to follow established protocol,' which overlaps with our 'deviation from procedure.' Within 'inadequate competence,' our taxonomy distinguishes between an 'individual' and an 'organizational' contribution to the failure. Again, this distinction rests on the assumption that different interventions are called for depending on whether the available staff lack the competence that may be expected given their qualifications, or lack the qualifications required for the tasks at hand. The event descriptions were sufficiently detailed for this distinction to be applied: in about one-third of all cases of inadequate competence, there were enough details to determine whether the causal contribution was organizational or individual, with the two approximately evenly divided (Table 2). Using a somewhat different approach, Pezzolesi et al. (2010) analyzed and classified handover incidents from a UK hospital in terms of the type of scenario they exemplify, e.g., 'poor/incomplete handover,' which refers to situations in which 'essential elements of patient's care' are not handed over. Their results are complementary to ours, except that they find that in 29 % of their cases, a patient is admitted to a ward without the staff being informed. We had no such case in our samples, and we do not know why this difference exists between the UK sample and our Danish samples.
In the authors' subjective experience of the workload involved in applying the taxonomy, considerable effort is spent on trying to apply distinctions below the first level of types and categories (i.e., what we have called sub-types and sub-factors). Distinctions within failure types and causal factors that are theoretically justified may therefore not be worth applying unless the information available clearly justifies their use.
This study has limitations. A chief limitation is that the validation was carried out with the involvement of the development team. This dual role of developer and validator may have led the raters to develop a tacit understanding of how to apply the categories that they helped name and define. A more rigorous test of the taxonomy would require an independent team of raters to apply the taxonomy, independently, to a body of incident descriptions. Another potential limitation is the selection of adverse event data. There is no reason to believe that the DPSD reports used are not representative of reports collected through reporting schemes, nor do the settings for the critical incident interviews appear to invite any bias (Siemsen et al. 2012). However, the level of detail in the data about handover failures, both in respect of communication failures and fuzzy allocation of responsibility, may be presumed to be greater if prospective data were used, e.g., data derived from an observational study.

Conclusion
Patient handovers are potentially hazardous situations, and so far, there has been no dedicated taxonomy that captures both types of handover failures and their causal factors. The proposed taxonomy, being at a medium level of detail, has been shown to be usable and robust and is therefore suitable for capturing handover failures and causal factors in incident reports and narratives.
Spending resources on classifying incident reports into a system-based taxonomy, resources that might perhaps be spent more profitably on other safety-enhancing work, is justified only if it enables efficient production of the knowledge needed for reducing risk to patients. Results of such classification will show patterns of causal factors, but knowledge about how to avert failures does not necessarily come from scrutinizing classification results from a database of incidents. Rather, the diversity of handover situations and handover mishaps is such that insights into what goes wrong, and how to counteract failure mechanisms, must be based on the recognition that different types of failures and different kinds of causal antecedents may require different kinds of interventions. Therefore, a chief function of a taxonomy such as the present one lies in its ability, when used to populate a database, to deliver to safety managers cases that are sufficiently 'similar', i.e., cases in which narratives and analysis reveal similar failures and similar causes.
Table 1 (continued): category and explanation
(F2) Inadequate competence, individual: A staff member involved is nominally qualified by position and formal professional qualifications for the handover-related tasks at hand, but shows inadequate competence in carrying these out
(G) Memory lapse or slip: Lapse of memory or action slip not related to knowledge and competence
(H) Inadequate procedures or guideline: Procedure, guideline or instruction is missing, inadequate or wrong
(I) Problems with physical or functional infrastructure: Problems with physical or functional infrastructures, including access to or availability of records and other information
(J) Busy ward or interruptions: Staff members required for handover-related tasks are not available due to busyness or become interrupted while involved in such tasks
(K) Crowded ward: Insufficient room/space, beds or physical facilities to accommodate or aid patients