Background

Central venous catheterization is a commonly performed procedure, with an estimated 15 million central-line-days per year in intensive care units in U.S. hospitals (Mermel 2000). Because simulation-based training has previously been shown to be associated with improved performance and clinical outcomes (Ma et al. 2011; Barsuk et al. 2009b), multiple institutions have implemented simulation-based training programs (Ma et al. 2011; Cook et al. 2011). These training programs require significant human and material resources (Ogden et al. 2007). Thus, to evaluate the return on such departmental investments, assessment tools that yield valid and reliable data are needed to evaluate the procedural competence of trainees (Evans and Dodge 2010).

Two general approaches have traditionally been used for the assessment of technical skills: checklists and global rating scales; a combination of the two may also be considered (Lammers et al. 2008). A checklist consists of a list of observable behaviors organized in a consistent manner, allowing the evaluator to record the presence or absence of each demonstrated behavior (Hales et al. 2008). Global rating scales, on the other hand, use a Likert scale to rate either an overall impression of the performance or individual items within it (Bould et al. 2009).

Because steps in a procedure are often sequential and predictable, checklists are often considered better suited for the assessment of technical skills, as they are perceived to be more objective than global rating scales (Lammers et al. 2008; Evans et al. 2005). However, the pitfalls of using checklists have been extensively debated in the health professional education literature (Norman et al. 1991; Van Der Vleuten et al. 1991; Hodges et al. 1999; Swartz et al. 1999; Epstein and Hundert 2002). In the hands of expert raters, global rating scales may in fact demonstrate better psychometric properties than checklists (Hodges and McIlroy 2003; Regehr et al. 1998; Ma et al. 2012). Despite this, checklists continue to be commonly used in the assessment of procedural skills. For central venous catheterization, in 2009 alone, there were seven publications that included assessment tools, each of which used a checklist (Evans and Dodge 2010).

In the evaluation of any skill, a clear understanding of the underlying task is critical. Items in the assessment tool should be both relevant to and representative of the task in question (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. Standards for educational and psychological testing. 1999). In a systematic review of checklists for procedural skills in general, seven themes were identified (McKinley et al. 2008): 1) Procedural competence, 2) Preparation, 3) Safety, 4) Communication and working with the patient, 5) Infection control, 6) Post-procedural care, and 7) Team-working. In that review, a third to a half of the checklists did not assess key competencies in the domains of “infection control” and “safety” (McKinley et al. 2008). Unfortunately, incompetence in these same domains has significant adverse clinical consequences. It may therefore be problematic to simply borrow an existing published tool and assume that it evaluates procedural competency accurately.

The objective of this study is to review existing assessment tools for rating central venous catheterization and determine the individual steps and key competencies evaluated by these tools. This information can help 1) better define the underlying task of central venous catheterization itself, and 2) assist evaluators in deciding which tools to use. To accomplish the above objective, we conducted a systematic review of published evaluation tools used during direct observation of performances of central venous catheterization. We used the database of our recently published systematic review of simulation-based education on central venous catheterization (Ma et al. 2011) as the basis of this current study.

Results

Search results and article overview

Our previous search strategy from our systematic review (Ma et al. 2011) yielded 110 articles (Figure 1). These 110 articles resulted from the exclusion of 1,241 articles from the initial search of 1,351 citations (kappa 0.87; 95% CI 0.82-0.92).

Figure 1

Flow diagram of study selection process.

In this review, 75 of these 110 publications were excluded (Figure 1). Agreement at this stage was high (kappa 0.82; 95% CI 0.71-0.93), leaving 35 articles for full review. Of these 35 articles, an additional 10 were excluded (kappa 0.85; 95% CI 0.66-1.00), yielding a final pool of 25 publications for this systematic review. Figure 1 illustrates the results of the study selection process.

Baseline description of tools

Overall, a total of 147 items were included in the assessment tools of the 25 studies (Additional file 1). The median number of items per study was 17 (IQR 8–22; range 2–63). All studies (100%) reported using checklists (i.e., at least one binary item for assessing central venous catheterization skills). Only six studies also reported using global rating scales (Britt et al. 2009; Huang et al. 2009; Lee et al. 2009; Millington et al. 2009; Murphy et al. 2008; Ramakrishna et al. 2005). Other baseline characteristics of the tools are listed in Table 1.

Table 1 Baseline characteristics of 25 studies describing directly observed central venous catheterization performances

Procedural checklists

Checklist items were scored in a binary fashion in all but two studies. One study (Ramakrishna et al. 2005) used a Likert scale of 1–5 (1 = “very unsatisfactory”; 3 = “neutral”; 5 = “very satisfactory”) to score the seven items in its checklist, while the other (Rosen et al. 2009) used a behaviorally anchored scale of 0–5, with a descriptor for each score, to rate each of its 22 checklist items (0 = “displays complete unfamiliarity with the step, needs visual and verbal instruction in order to perform the step [‘stumped’], or omits step completely”; 5 = “executes procedure step independently, smoothly, with total confidence, and without error”).

Thematic content of checklist items

There were 11 checklists applied to assessments of procedural performances on simulators (simulation checklists) and 14 checklists applied to assessments of procedural performances on patients (clinical checklists) (Table 1).

Clinical checklists had a higher percentage of items representing “Preparation” and “Infection control” than simulation checklists (67 ± 26% vs. 32 ± 26%; p = 0.003 for “Preparation” and 60 ± 41% vs. 11 ± 17%; p = 0.002 for “Infection control”, respectively). Simulation checklists, on the other hand, had a higher percentage of items on “Procedural competence” than clinical checklists (60 ± 36% vs. 17 ± 15%; p = 0.002).

Representation and underrepresentation of themes

A number of checklists were comprehensive in their representation of themes (Table 2). For example, six checklists (20%) contained at least one item in each of the seven domains (Barsuk et al. 2009a; Barsuk et al. 2009c; Evans et al. 2009; Huang et al. 2009; Wall et al. 2005; Dong et al. 2010). “Preparation” and “Infection control” were assessed in most checklists: only three checklists (12%) contained no items on “Preparation” (Blaivas and Adhikari 2009; Carvalho 2007; Stone et al. 2010) and only four checklists (16%) contained no items on “Infection control” (Blaivas and Adhikari 2009; Carvalho 2007; Kilbourne et al. 2009; Stone et al. 2010).

Table 2 Themes represented by checklist items in 25 studies with checklists

Other themes were less well-represented by checklists: 13 checklists (52%) contained no items on “Team working” (Lee et al. 2009; Lobo et al. 2005; Millington et al. 2009; Murphy et al. 2008; Rosen et al. 2009; Ramakrishna et al. 2005; Blaivas and Adhikari 2009; Carvalho 2007; Stone et al. 2010; Kilbourne et al. 2009; Coopersmith et al. 2002; Xiao et al. 2007; Yilmaz et al. 2007); 14 checklists (56%) contained no items on “Communication and working with the patient” (Berenholtz et al. 2004; Blaivas and Adhikari 2009; Britt et al. 2009; Carvalho 2007; Coopersmith et al. 2002; Kilbourne et al. 2009; Lobo et al. 2005; McKee et al. 2008; Millington et al. 2009; Papadimos et al. 2008; Stone et al. 2010; Velmahos et al. 2004; Xiao et al. 2007; Yilmaz et al. 2007); seven checklists (28%) contained no items on “Post-procedure” (Ramakrishna et al. 2005; Blaivas and Adhikari 2009; Carvalho 2007; Stone et al. 2010; Kilbourne et al. 2009; Xiao et al. 2007; Yilmaz et al. 2007); seven checklists (28%) contained no items on “Safety” (Berenholtz et al. 2004; McKee et al. 2008; Millington et al. 2009; Papadimos et al. 2008; Ramakrishna et al. 2005; Xiao et al. 2007; Yilmaz et al. 2007); and six checklists (24%) contained no items on “Procedural competence” (Coopersmith et al. 2002; Costello et al. 2008; Lobo et al. 2005; McKee et al. 2008; Xiao et al. 2007; Yilmaz et al. 2007).

Global rating scales and additional items assessed

Only six studies reported the use of global rating scales (Britt et al. 2009; Huang et al. 2009; Lee et al. 2009; Millington et al. 2009; Murphy et al. 2008; Ramakrishna et al. 2005), all of which were used in conjunction with checklist items (Table 3). The median number of items assessed was 2 (IQR 1–5; range 1–7). Additional items assessed frequently included number of attempts and time taken to perform the procedure (Table 4).

Table 3 Global rating scale assessed
Table 4 Additional items assessed

Validity and reliability evidence for the assessment tools

Inter-rater reliability was reported for 12 (48%) of the studies (Barsuk et al. 2009a; Barsuk et al. 2009c; Dong et al. 2010; Evans et al. 2009; Huang et al. 2009; Lee et al. 2009; Millington et al. 2009; Murphy et al. 2008; Rosen et al. 2009; Kilbourne et al. 2009; Stone et al. 2010; Xiao et al. 2007), with a range of reliability coefficients and absolute agreement [range 0.43 (Millington et al. 2009) to 0.97 (Evans et al. 2009)]. Only 12 studies (48%) specified the process used for content validation (Velmahos et al. 2004; Barsuk et al. 2009a; Barsuk et al. 2009c; Costello et al. 2008; Dong et al. 2010; Evans et al. 2009; Huang et al. 2009; Lee et al. 2009; Rosen et al. 2009; Wall et al. 2005; Kilbourne et al. 2009; Coopersmith et al. 2002).

Discussion

Our study identified 25 published tools for the assessment of procedural skills in central venous catheterization. All of these tools included at least one item scored in a binary checklist fashion, and only six studies reported also using a global rating scale.

We also found that only 20% of the assessment tools incorporated at least one item in each of the seven key procedural competence domains; the majority of tools did not assess competency in the domains of “Team working” and “Communication and working with the patient.”

In an effort to improve clinical outcomes through simulation-based training, trainers need to be mindful of assessing domains with implications for patient safety, such as “Team working,” “Safety,” and “Infection control.” Therefore, wherever possible, a tool should include items in as many of the seven key competency domains as possible. When the procedure cannot be assessed in a systematic and comprehensive manner, a global rating scale should be considered instead.

Not every tool is created equal; tools are frequently created with specific purposes in mind. Thus, for evaluators wishing to borrow a pre-existing assessment tool from the published literature, this study provides a comprehensive list of assessment items to facilitate the choice of an appropriate tool.

There are some limitations to this systematic review that affect the interpretation of our conclusions. First, although our systematic review included only publications describing an educational intervention, the assessment purposes of the studies were not uniform. Tools designed to be used by nurses solely for documenting infectious risks, or tools designed for assessing performances on simulators, are unlikely to be as comprehensive as tools designed to assess overall competence of procedural skills on patients. Indeed, our results suggest that clinical checklists were more focused on steps involving preparation and infection control than simulation checklists, while simulation checklists were more focused on procedural technical competence itself. The contextual features of each published tool are therefore important to recognize, since ultimately, the validity of any assessment tool refers to the “degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of tests” (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. Standards for educational and psychological testing. 1999). Second, although we contacted authors to obtain the actual checklists and a number provided them (Wall et al. 2005; Lobo et al. 2005; Costello et al. 2008), a few studies were excluded because the authors did not respond.

Despite these limitations, this study has a number of strengths. By providing a systematic and comprehensive evaluation and description of existing tools for central venous catheterization, this study assists educators, researchers, and hospital administrators wishing to use, study, or develop tools for assessing competency in this procedure. Furthermore, this study compiles, for the first time, a “catalog” of all the potential aspects of the procedure that could be assessed (see Additional file 1). This “catalog” represents the end product of work from multiple groups using various methods, such as cognitive task analysis, literature review, and expert panels.

Conclusions

In this systematic review of published assessment tools for central venous catheterization, we present a comprehensive list of assessment items. We found that procedural checklists far outnumber global rating scales. The majority of these tools did not assess competency in the domains of “Team working” and “Communication and working with the patient.” Lastly, the rigor with which the tools were developed varied greatly.

Methods

Data sources and search strategy

The search strategy was previously published (Ma et al. 2011). In short, searches for relevant articles published between January 1950 and May 2010 were conducted in the following databases: PubMed, MEDLINE, Education Resource Information Center (ERIC), the Cumulative Index to Nursing and Allied Health Literature (CINAHL), Excerpta Medica, and the Cochrane Central Register of Controlled Trials. Our search strategy was developed with the assistance of a research librarian and used the following keywords: catheterization, central venous; catheterization; catheter$; jugular veins; subclavian veins; and femoral veins. These terms were searched as subject headings, medical subject headings, and text words, and combined with the Boolean operator “and” with education terms. Education terms used were: education; learning; teaching; and teach$. We did not place a language restriction on the search. The initial screening of search results was done independently by two authors (I.M., M.B.), using titles and abstracts. An additional hand search of references in included articles and relevant review articles was conducted. From this initial search (Ma et al. 2011), citations that were clearly not primary research, involved animal studies, or did not involve an educational intervention were excluded. For the remaining citations, full-length articles were retrieved.

Selection of articles

From these full-length articles, we included primary research articles that described the assessment of central venous catheterization skills under direct observation; that is, we excluded articles in which the procedures were performed without an observer present. We also excluded studies on peripherally placed venous access devices as well as studies without an educational intervention. Articles that did not provide an assessment tool or did not include descriptions of assessment items were excluded. For studies that reported only descriptions of assessment items without providing the assessment tool, we contacted the authors to obtain the full tool. Selection of articles was done independently by two authors (I.M., N.S.), with disagreements resolved by consensus.

Data extraction

Independent data abstraction on baseline characteristics of each study was performed by two authors (IM, NS) using a standardized data form. Information on learner population, observers, and tools was obtained from each publication. We also abstracted information on whether or not the tool was used on patients (clinical) or on simulators.

We defined any observable action item scored in a binary fashion (yes/no) as part of a “checklist,” whether or not the authors described the tool as a “checklist.” For example, if “need for help from senior resident” (Velmahos et al. 2004) was routinely assessed in the observed performances, this item was considered a checklist item. Checklist items scored in a non-binary fashion were also included. We defined global rating scale items as those using a Likert scale to rate either an overall impression of the performance or individual qualities within the performance (Bould et al. 2009).
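As a hypothetical illustration of this operational definition (the item texts and field names below are invented for the sketch, not taken from any of the reviewed studies), the distinction can be expressed as:

```python
# Hypothetical item records illustrating the operational definitions above;
# the item texts and field names are invented for this sketch.
checklist_item = {"text": "Flushes all catheter ports", "scoring": "binary", "score": True}
global_item = {"text": "Overall performance", "scoring": "likert_1_5", "score": 4}

def is_checklist_item(item):
    # Any observable action scored yes/no counts as a checklist item,
    # regardless of whether the authors called the tool a "checklist."
    return item["scoring"] == "binary"
```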

Classification of items into seven competency themes

Each checklist item was classified by two authors (IM, MB) according to one or more of the seven competency themes previously identified (McKinley et al. 2008): 1) Preparation, 2) Infection control, 3) Communication and working with the patient, 4) Team working, 5) Safety, 6) Procedural competence, and 7) Post-procedure.

Disagreements were resolved by consensus. Items may be classified into more than one theme. For example, an item on obtaining informed consent was classified into both “Preparation” as it involves assessing for indications and contraindications for the procedure (McKinley et al. 2008) as well as “Communication and working with the patient,” which involves sharing information about the procedure with the patient (McKinley et al. 2008).
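This multi-label coding scheme can be sketched as follows (a minimal illustration; the example items and their theme assignments are ours, not data from the reviewed studies):

```python
# The seven competency themes from McKinley et al. 2008.
THEMES = [
    "Preparation", "Infection control", "Communication and working with the patient",
    "Team working", "Safety", "Procedural competence", "Post-procedure",
]

# Example coding: each checklist item maps to one or more themes
# (e.g., informed consent counts toward two themes, as described above).
example_items = {
    "Obtains informed consent": ["Preparation",
                                 "Communication and working with the patient"],
    "Washes hands and dons sterile gown and gloves": ["Infection control"],
    "Secures catheter and applies dressing": ["Post-procedure"],
}

def theme_counts(items):
    # Count items per theme; an item contributes to every theme it is coded under.
    counts = {t: 0 for t in THEMES}
    for themes in items.values():
        for t in themes:
            counts[t] += 1
    return counts
```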

We defined “Preparation” as any step prior to the breach of the patient’s skin (i.e., administration of local anesthetic or insertion of the needle). Steps after the administration of anesthetic but before the securing of the catheter were considered part of “Procedural competence.” Lastly, we defined any step including or after the securing of the catheter as “Post-procedure,” such as placement of the dressing, obtaining a chest x-ray, documentation of the procedure, and equipment clean-up.

Immediate complications were included as assessment items only if they were part of the directly observed evaluation; examples include carotid puncture, pneumothorax, hemothorax, malignant arrhythmia, and number of needle passes. Long-term complications, such as catheter-related infections, were excluded, as these “distal” outcomes may or may not be directly related to learner performance.

Statistical analysis

Data were analyzed using standard parametric and non-parametric methods. Comparisons of continuous variables between groups were performed using Student’s t-tests. Inter-rater agreement in study selection was estimated using the kappa statistic. All analyses were performed using SAS version 9.2 (SAS Institute Inc., Cary, NC, USA) and Stata 11.0 (StataCorp LP, College Station, TX, USA).
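For illustration, the two statistics named above can be computed from first principles (a minimal sketch in stdlib Python, not the study’s actual analysis code; the inputs would be the per-checklist item percentages and the 2x2 rater-agreement table, neither of which is reproduced here):

```python
import statistics

def cohens_kappa(a, b, c, d):
    # Cohen's kappa from a 2x2 agreement table for two raters:
    # a = both raters include, d = both exclude, b and c = disagreements.
    n = a + b + c + d
    p_observed = (a + d) / n
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

def students_t(x, y):
    # Pooled-variance two-sample t statistic (Student's t-test), as used
    # to compare item percentages between checklist groups.
    nx, ny = len(x), len(y)
    pooled_var = (((nx - 1) * statistics.variance(x)
                   + (ny - 1) * statistics.variance(y)) / (nx + ny - 2))
    return ((statistics.mean(x) - statistics.mean(y))
            / (pooled_var * (1 / nx + 1 / ny)) ** 0.5)
```

In practice these would be obtained from SAS or Stata, as stated above; the sketch only makes the underlying formulas explicit.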