Background

Nurses play an important role in ensuring optimal health outcomes by engaging in evidence-informed decision making (EIDM). EIDM, used synonymously with the term evidence-based practice (EBP) [1], involves “the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients” [2] (p. 71). The use of the word ‘informed’ in EIDM denotes that research alone is insufficient for clinical decision making and cannot take precedence over other factors [3]. Evidence, in this sense, is defined as credible knowledge from different sources including research, professional/clinical experience, patient experiences/preferences, and local data and information [4, 5]. There are numerous examples of improved patient outcomes following implementation of best practice guidelines, such as reductions in length of hospital stay [6] and in adverse patient events related to falls and pressure ulcers in long-term care settings [7].

Despite knowledge of such benefits, competency gaps and low implementation rates in EIDM persist among nurses across diverse practice settings [8,9,10]. A barrier to EIDM implementation has been the lack of clarity and understanding about what nurses should be accountable for with respect to EIDM as well as how it can be best measured [11, 12]. As such, considerable effort has gone into the development of EIDM competence measures as a strategy to support EIDM implementation in nursing practice [12].

EIDM competence attributes of knowledge, skills, attitudes/beliefs, and behaviours have been well defined in the literature. EIDM knowledge is an understanding of the primary concepts and principles of EIDM and hierarchy of evidence [13,14,15,16,17]. Skills in EIDM refer to the application of knowledge required to complete EIDM tasks (e.g., developing a comprehensive strategy to search for research evidence) [13,14,15,16,17]. Attitudes and beliefs related to EIDM include perceptions, beliefs, and values ascribed to EIDM (e.g., belief that EIDM improves patient outcomes) [13, 15]. EIDM behaviours are defined by the performance of EIDM steps in real-life clinical practice (e.g., identifying a clinical problem to be addressed) [13, 15, 17].

Multiple uses for measures assessing EIDM competence attributes in nursing practice and research exist. Such measures can be integrated into performance appraisals [18] to monitor progressive changes in overall EIDM competence or specific domains. At an organizational level, EIDM competence standards can support human resource management by establishing clear EIDM role expectations for prospective, newly hired, or employed nurses [18, 19]. With respect to nursing research, there has been great attention afforded to the development and testing of different interventions to increase EIDM knowledge, attitudes, skills, and behaviours among nurses [20,21,22]. The use of EIDM competence instruments that produce valid and reliable scores can help to ascertain effective interventions in developing EIDM competence areas.

Previous systematic reviews have focused on EIDM competence attribute measures used among allied health care professionals [13, 16, 23] as well as nurses and midwives [14]. However, several limitations exist among these reviews. A conceptual limitation is that many reviews included research utilization measures despite stating a focus on EIDM [13, 14, 23]. Research utilization, while considered a component of EIDM, is conceptually distinct from it. Research utilization includes the use of scientific research evidence in health care practice [24]. In contrast, EIDM encompasses the application of multiple forms of evidence such as clinical experience, patient preferences, and local context or setting [5]. Conceptual clarity is of critical importance in a psychometric systematic review, as it can impact findings of reported validity evidence. Reviews by Glegg and Holsti [16] and Leung et al. [14] were also limited in focus, as they included measures that assessed only a few, but not all four, of the attributes that comprise competence, potentially resulting in the exclusion of existing EIDM measures. Methodologically, across all reviews, psychometric assessment was limited as validity evidence was either not assessed [16] or assessed only by reviewing data that were formally reported as content, construct, or criterion validity [13, 14, 23], neglecting other critical data that could support validity evidence of a measure. As well, none of the reviews reported on or extracted data on specific practice settings. This is an essential component of psychometric assessment, as Streiner et al. [25] identify that reliability and validity are contingent not solely on scale properties, but on the sample with whom and specific situation in which measures are tested. Consideration of setting is important when determining the applicability of a measure for a specific population due to differences in role and environment.
Despite these existing reviews, most importantly, none of them focused only on nurses. A systematic review unique to nursing is imperative given the diversity of needs, reception to, and expectations of EIDM across health care professional groups [16]. These differences may be reflected across measures to assess discipline specific EIDM competence.

The current review aimed to address limitations of existing reviews by: including measures that address a holistic conceptualization of EIDM which includes the use of multiple forms of evidence in nursing practice; focusing on the four EIDM competence attributes of knowledge, skills, attitudes and behaviours; utilizing a modern understanding of validity evidence in which sources based on test content, response process, internal structure, and relations to other variables were assessed according to the Standards for Educational and Psychological Testing [26]; extracting data on and presenting findings within the context of practice setting; and targeting the unique population of nurses.

The objectives of this systematic review were to: 1) identify existing measures of EIDM competence attributes of knowledge, skills, attitudes/beliefs, and/or behaviours used among nurses in any healthcare setting; and 2) determine the psychometric properties of test scores for these existing measures.

Methods

The protocol for this systematic review was registered (PROSPERO #CRD42018088754), was published [27] a priori, and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline.

Search strategy

A comprehensive search strategy consisting of online databases, hand searches, grey literature, and content experts, was developed in consultation with a Health Sciences Librarian. Searches were limited from 1990 until December 2017, as the term evidence-based medicine was first introduced and defined in 1990 [28]. Search strategy sources are summarized in Table 1. A detailed search strategy is provided in Additional file 1.

Table 1 Search strategy

Inclusion and exclusion criteria

Studies were included if they met the following criteria: study sample consisted of all nurses or a portion of nurses; conducted in any healthcare setting; reported findings from the use or psychometric testing of measures that assess EIDM knowledge, skills, attitudes/values, and/or behaviours; quantitative or mixed-method design; and English language. Studies were excluded if the sample consisted solely of other healthcare professionals or nursing undergraduate students, or if data specific to nurses were not reported separately. As well, studies testing or using measures assessing research utilization were excluded [5, 24].

Study selection

Titles and abstracts of initial references and full-text records were screened independently by two team members (EB and TB) for inclusion/exclusion. All disagreements were resolved by consensus between the two reviewers.

Data extraction

Data extraction was piloted using a standard form completed independently by two team members (EB and TB) on five randomly selected references. Data extracted pertaining to study and measure characteristics included: study design, sample size, professional designation of sample, healthcare setting, study country, funding, name of measure, format, purpose of measure, item development process, number of items, theoretical framework used, conceptual definition of competence established, EIDM attributes measured, EIDM domains/steps covered, and marking key or scale for self-report measures. Data extraction on these characteristics was performed by one team member (EB) and checked for accuracy by a second team member (TB/TD).

Data extraction of primary outcomes included psychometric outcomes of acceptability, reliability, and validity evidence. Data extracted relating to acceptability consisted of completion time and missing data reported for each measure. Missing data were extracted from reports of incomplete surveys or calculated based on the number of complete surveys included in the analysis. Reliability data extracted for scores of measures related to internal consistency, inter-rater, and test-retest reliability coefficients. Sources of validity evidence were extracted following guidelines from the Standards for Educational and Psychological Testing [26]. Data were extracted on four sources of validity evidence: test content, response process, internal structure, and relationships to other variables. Test content refers to the relationship between the content of the items and the construct under measure, which includes analyzing the adequacy and relevance of items [26]. Validity evidence of response process involves understanding the thought processes participants use when responding to items and their consistency with the construct of focus [26]. Internal structure is defined as the degree to which test items are related to one another and coincide with the construct for which test scores are being interpreted [26]. The last source of validity evidence, relationships to other variables, concerns the relationship of test scores to external variables, from which the degree to which these relationships align with the construct under measure can be determined [26].

To determine if study findings supported validity evidence based on relationships to other variables, a review of the literature was conducted and guiding tables on variable relationships were established (see Additional file 2). Data on psychometric outcomes were extracted by two independent reviewers (EB and TB/TD). All disagreements were resolved by consensus between those who extracted the data. Measures were grouped according to the number of sources of validity evidence that were reported in the study(ies) associated with each measure. In the event that multiple studies were reported for a measure, group classification was determined based on the number of sources indicated by 50% or more of the associated studies [29].
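The grouping rule described above can be illustrated with a minimal sketch. It assumes one possible reading of the rule, namely that a source of validity evidence counts toward a measure when at least 50% of its associated studies report that source. The function name `classify_measure` and the input format are hypothetical and are not taken from the review protocol.

```python
from typing import List, Set

# The four sources of validity evidence named in the text.
SOURCES = (
    "test content",
    "response process",
    "internal structure",
    "relationships to other variables",
)


def classify_measure(studies: List[Set[str]]) -> int:
    """Return the group (1 to 5) for one measure.

    `studies` holds, for each associated study, the set of validity
    evidence sources that study reported. Assumption: a source counts
    when it appears in 50% or more of the studies.
    """
    if not studies:
        raise ValueError("at least one associated study is required")
    threshold = len(studies) / 2
    supported = [
        source
        for source in SOURCES
        if sum(source in study for study in studies) >= threshold
    ]
    # Group 1 = four sources, Group 2 = three, ..., Group 5 = none.
    return 5 - len(supported)


# Hypothetical measure with two associated studies.
example = [
    {"test content", "internal structure"},
    {"test content"},
]
print(classify_measure(example))
```

Under this reading, a source reported by one of two studies still counts, because one of two meets the 50% threshold.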

Quality assessment was not conducted due to limitations across varying and inconsistent criteria for appraising studies involving psychometric measures [27]. Instead, aligning with previous reviews [17, 29], a thorough assessment of reliability and validity evidence for scores of measures was conducted to align with the Standards for Educational and Psychological Testing [26].

Data synthesis

A narrative synthesis of results is presented. Study statistics as they relate to setting and population are summarized. Measures are also categorized according to the number of EIDM attributes addressed. Acceptability, defined as completion time and overall missing data, is summarized across measures and settings. Reliability data are summarized for each measure across settings. Similar to previous psychometric systematic reviews [17, 29], measures are categorized into distinct groups based on the number of validity evidence sources reported for each measure (e.g., Group 1 = 4 sources of validity evidence). This aligns with the Standards for Educational and Psychological Testing [26], which identify that the strength of a validity argument for scores on a measure is cumulative and contingent on the number of validity evidence sources established. As psychometric properties are based on the context in which a measure is used or tested, healthcare settings are integrated into the presentation of results.

Results

Review statistics

In total, 5883 references were screened for eligibility at the title and abstract level. Of the 336 screened at full-text, 109 articles were included in the final review. Six pairs of articles (n = 12) were linked (i.e., associated with the same parent study) and the remainder of the articles were unique studies. Therefore, the review included 103 studies (see Additional file 3) and 35 unique measures (see Fig. 1 for PRISMA details).

Fig. 1 PRISMA details

Study characteristics

Of the 103 studies, over half were conducted in the United States (n = 57; 55.3%). Twenty studies were conducted in Europe (19.4%), with 19 (18.4%) taking place in Asia. Two studies each were conducted in Africa, Australia, and Canada, and one in New Zealand. Publication years spanned 2004–2017. One additional measure was identified after contacting content experts; its associated study was published in 2018.

Settings

The 35 included measures were used or tested most often in acute care (n = 31 measures) followed by primary care (n = 9 measures). Measures were used less often in public health (n = 4 measures), home health (n = 4 measures), and long-term care (n = 1 measure). An overview of measures with identified settings is presented in Table 2.

Table 2 Description of EIDM competence attributes measures across setting, population (35 measures)

Population

Measures were primarily used or tested among registered nurses (n = 26 measures; 74.3%), followed by advanced practice nurses (n = 7 measures; 20%), and licensed/registered practical nurses (n = 4 measures; 11.4%). A licensure group for 13 of the measures (37.1%) was not specified. Associated population groups are presented for each measure in Table 2.

EIDM competence attributes addressed

Measures addressed a variety of EIDM competence attributes (see Table 2). Only three measures (8.6%) assessed all four EIDM competence attributes of knowledge, skills, attitudes/beliefs, and behaviours. These included the Evidence-Based Practice Questionnaire (EBPQ) [30], the School Nursing Evidence-based Practice Questionnaire [67] and a self-developed measure by Chiu et al. [68]. Seven measures (20%) assessed three of the four EIDM competence attributes, with differing foci [69,70,71,72,73,74,75]. These measures all assessed knowledge, but varied on assessment of attitudes/beliefs, skills, and behaviours. Six measures (17%) addressed two EIDM competence attributes [77, 78, 80,81,82,83]. Over half of the total measures (n = 19; 54.3%) assessed only a single EIDM attribute. Among these single attribute measures, attitudes/beliefs were assessed the most (n = 6 measures) [31,32,33, 84, 134,135,136,137]. Overall, knowledge was the attribute addressed by most measures (n = 19), followed closely by attitudes/beliefs (n = 17 measures), skills (n = 15 measures), and behaviours (n = 13 measures; see Table 2).

Psychometric outcomes

Acceptability

Missing data

Overall, missing data, expressed as the percentage of incomplete surveys, were reported for 10 measures (28.6%). Missing data ranged from 1.6% (EBP Beliefs Scale) to 25.6% (EBPQ) and differed across health care settings. For seven measures, missing data fell below the limit of > 10% considered excessive [138]. Reported missing data are summarized in Table 3.
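As a minimal illustration of how a missing data percentage could be derived from counts of returned and complete surveys and compared with the > 10% limit, consider the sketch below. The function names and the survey counts are hypothetical; only the 10% limit and the two reported extremes (1.6% and 25.6%) come from the text.

```python
def percent_missing(surveys_returned: int, surveys_complete: int) -> float:
    """Percentage of returned surveys that were incomplete."""
    if surveys_returned <= 0:
        raise ValueError("surveys_returned must be positive")
    if not 0 <= surveys_complete <= surveys_returned:
        raise ValueError("surveys_complete must lie between 0 and surveys_returned")
    return 100.0 * (surveys_returned - surveys_complete) / surveys_returned


def exceeds_limit(pct_missing: float, limit: float = 10.0) -> bool:
    """True when missing data are above the limit treated as excessive."""
    return pct_missing > limit


# Hypothetical counts, for illustration only.
pct = percent_missing(surveys_returned=200, surveys_complete=190)
print(pct, exceeds_limit(pct))

# The reported extremes from the review, checked against the same limit.
print(exceeds_limit(1.6), exceeds_limit(25.6))
```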

Table 3 Acceptability findings: Missing data and completion time [related citations]

Completion time

Data for completion time were extracted where times were explicitly stated, or were calculated from the time to complete each item when a study reported only a combined time for completing multiple measures. Completion time was reported for four measures, ranging from 5 minutes (EBP Beliefs Scale) to 25 minutes (EBPQ) [34, 82, 84, 85]. A summary of reported completion time is provided in Table 3.
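The per-item calculation mentioned above might look like the following sketch. It assumes that a combined completion time is spread evenly over all items administered, which is a simplification introduced here for illustration; the function name and the numbers in the example are hypothetical.

```python
def estimate_completion_minutes(
    combined_minutes: float, total_items: int, measure_items: int
) -> float:
    """Estimate completion time for one measure from a combined time.

    Assumption: every item takes the same time, so the combined time is
    allocated in proportion to the number of items in the measure.
    """
    if total_items <= 0 or not 0 < measure_items <= total_items:
        raise ValueError("item counts must be positive and consistent")
    minutes_per_item = combined_minutes / total_items
    return minutes_per_item * measure_items


# Hypothetical study: 30 minutes for 60 items overall, 16 of which
# belong to the measure of interest.
print(estimate_completion_minutes(30, 60, 16))
```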

Reliability

Across measures and studies reporting reliability evidence, internal consistency was the most commonly assessed. Inter-rater and test-retest reliability were also reported, although for only one measure each.

Internal consistency

Internal consistency of scores, expressed as Cronbach’s alpha (α), was reported for 21 measures (60%). Cronbach’s alpha values ranged widely across settings: acute care (0.45–0.99); primary care (0.57–0.98); public health (0.79–0.91); home health (0.63–0.87); and long-term care (0.79–0.96). Cronbach’s alphas are presented for individual measures and settings in Table 4.

Table 4 Reported Cronbach’s alphas for measures (n = 21) across settings [related citations]

Out of the 21 measures for which internal consistency was reported, seven measures had multiple study findings reported across unique practice settings. Reported Cronbach’s alphas were varied across and within settings for the same measure as evident by wide alpha ranges (see Table 4). Among these findings, two measures assessing EIDM attitudes with the lowest reported alphas were the Evidence-based Nursing Attitude Questionnaire (0.45) and the EBPQ (0.63 for attitude subscale) in acute care settings. The Modified Evidence-based Nursing Education Questionnaire also had a low alpha reported (0.57) in both acute and primary care settings. Regarding high range values, the EBPQ had the highest overall reported alpha (0.99) also in an acute care setting.

All 21 measures met a minimum of Cronbach’s alpha ≥0.80 [139] in at least one study instance (see Table 4).
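For readers less familiar with the statistic, the sketch below computes Cronbach’s alpha from a respondents-by-items score matrix using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). It is a generic illustration, not the computation used in any included study, and the small data set is invented.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha, where responses[r][i] is respondent r's score on item i."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented scores for four respondents on a two-item subscale.
scores = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(scores)
print(round(alpha, 2), alpha >= 0.80)
```

Because the item and total variances depend on the sample, the same items can yield different alphas in different samples, which is consistent with the variation across settings reported in Table 4.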

Inter-rater and test-retest reliability

Test-retest reliability was assessed in only one measure, the Quick EBP Values, Implementation, Knowledge Survey [75]. Average item level test-retest coefficients ranged from below marginal to acceptable [140] at 0.51–0.70 [75].

Inter-rater reliability was reported for scores on the Knowledge and Skills in Evidence-Based Nursing measure [82]. Intraclass correlations were reported for three sections of this measure and exceeded a guideline of ≥0.80 [140].

Sources of validity evidence

Group 1: measures reporting four sources of validity evidence

Two of the 35 measures (5.7%) used/tested across three studies, were assigned to Group 1 [67, 135, 136] (see Table 5). Common across these two measures was the use of exploratory factor analysis to assess internal structure. Pertaining to validity based on relationships with other variables, this differed between the two measures. For the School Nursing Evidence Based Practice Questionnaire, the use of correlation and regression analyses supported validity evidence with significant associations between use of EBP and demographic variables (e.g., education; see Additional file 4). For the Evidence-Based Nursing Attitude Questionnaire, correlation and t-test analyses were used to establish relationships between EBP attitudes and variables related to EBP knowledge, EBP training, and education level (see Additional file 4). The measures also varied with respect to setting with the former being tested in a public health setting and the latter in acute care, primary care, and home healthcare settings.

Table 5 Group 1: Measures with four sources of validity evidence (n = 2)

Group 2: measures with three sources of validity evidence

Five measures (14%) used/tested across seven studies, were categorized in group 2 [35, 71, 75, 76, 79, 82, 137] (see Table 6). Common across all these measures was the report of validity evidence related to content and relationships to other variables. Similar to group 1, the strength of variable relationships differed, with varied use of correlational, t-test, ANOVA, and regression analyses to report significant relationships between EBP competence attributes (i.e., knowledge, implementation, skills, attitudes) and demographic, organizational variables or education interventions (see Additional file 4). Internal structure validity evidence via exploratory factor analysis was reported for three measures [71, 75, 76, 137], while response process validity evidence was reported for two measures [35, 82]. All measures were tested or used in acute care.

Table 6 Group 2: Measures with three sources of validity evidence (n = 5)

Group 3: measures with two sources of validity evidence

Six measures (17%) were categorized in group 3 [10, 69, 70, 73, 80, 120] (see Table 7). Content validity evidence was commonly reported across all six measures using an expert group. Validity evidence based on relationships to other variables was reported for five of the six measures with correlational and ANOVA analyses used most often (n = 3 measures). Once again, regarding this source of validity evidence, significant relationships were demonstrated between EBP knowledge, attitudes, skills, and individual characteristics or organizational factors (see Additional file 4). Acute care was the most common healthcare setting (n = 5 measures).

Table 7 Group 3: Measures with two sources of validity evidence (n = 6)

Group 4: measures with one source of validity evidence

Over half of the measures were categorized in group 4 (n = 19; 54%; see Table 8). For all these measures, except one [122], validity evidence based on relationships to other variables was reported. With respect to strength of these variable relationships, t-test (n = 12 measures), correlational (n = 11 measures), and ANOVA (n = 8 measures) analyses were primarily conducted. Regression analyses were used less commonly (n = 6 measures). Similarly, as in previous groups, significant relationships between EIDM competence attributes and demographic, organizational factors, and interventions were established (see Additional file 4).

Table 8 Group 4: Measures with one source of validity evidence (n = 19)

Group 5: measures with no sources of validity evidence

No sources of validity evidence were found for three measures [68, 72, 121].

See Additional file 4 for detailed information on validity evidence sources for each measure with supporting evidence.

Validity evidence and settings

Most of the measures (n = 29; 83%) reported validity evidence in the context of acute care settings. For nine measures, validity evidence was reported across multiple settings. For three of these measures (EBP Implementation Scale, EBP-Beliefs Scale, EBPQ), multiple sources of validity evidence (> 1) were more often reported in acute care settings compared to other practice settings, where only one source of validity evidence was commonly found. In contrast, one measure (Evidence-based Nursing Attitude Questionnaire) had four sources of validity evidence established in primary and home care settings but not in acute care. For the five remaining measures (Developing Evidence-based Practice Questionnaire, modified Evidence-based Nursing Education Questionnaire, two unnamed self-developed measures, EBP Competency Tool), the same number of validity evidence sources was established across varied healthcare settings.

Discussion

This review furthers our understanding about measures assessing EIDM competence attributes in nursing practice. Findings highlight limitations in the existing literature with respect to use or testing of measures across practice settings, the diversity in EIDM competence attributes addressed, and variability in the process and outcomes of psychometric assessment of existing measures.

Settings

This review contributes new insight about settings in which EIDM measures have been used or tested that previous systematic reviews have not addressed. This review reveals a concentration on use or testing of EIDM measures in acute care (n = 31 measures; 89%) compared to other healthcare contexts (primary care, home health, public health, long-term care). This imbalance was also observed in an integrative review of 37 studies exploring the knowledge, skills, attitudes and capabilities of nurses in EIDM [9], where the majority of studies (n = 27) were conducted in hospitals, with fewer conducted in primary, community, and home healthcare, and none in long-term care. While there is a large body of evidence to support understanding of the psychometric rigor of EIDM measures in acute care, more attention and investment are required for this type of understanding in community-based and long-term care contexts. Given current trends and priorities in healthcare such as the reorientation toward home care [141], attention toward disease prevention and management, and health promotion [142], and a large aging population with growing projections of residence in long-term care facilities [143], it is of great importance to assess EIDM competence across all nursing practice settings to ensure efficient, safe, and patient-centred care.

EIDM competence attributes addressed

This review also adds to the current literature on nursing EIDM competence measures using a broader conceptualization of competence. That is, the measures reviewed focus on four competence attributes of knowledge, skills, attitudes/beliefs, and behaviours. In comparison, Leung et al. [14] assess measures focused on three attributes: knowledge, attitudes and skills. In our current review, three measures [30, 67, 68] addressed all four EIDM attributes (i.e., knowledge, skills, attitudes/beliefs, behaviours). Measures that address all four attributes are of critical importance given the inextricable link between knowledge, skills, attitudes and behaviours to comprise professional competence [144,145,146]. Professional competence cannot sufficiently develop if each attribute were to support it independently [147]. Knowledge without skill, or the ability to use knowledge, renders knowledge useless [148]. Similarly, performing a skill without understanding the reasoning behind it contributes to unsafe and incompetent practice [148, 149]. And lastly, possessing knowledge and skill without the experience of their application in the real world is insufficient to qualify as competent [150].

However, despite these measures addressing all four competence attributes, based on their response scales used, they do not conceptually reflect an assessment of competence, defined as quality of ability or performance to an expected standard [150], but rather, focus on mere completion or frequency of completing tasks. Quality versus frequency of behaviours are distinct concepts and have been measured separately in nursing performance studies [19, 151]. The provision of a high standard of patient care includes nursing competence assessment, which is a critical component of quality improvement processes, workforce development and management [19, 152]. This conceptual limitation of existing EIDM measures highlights a need for a measure that aligns with the conceptual understanding of competence as an interrelation between knowledge, skills, attitudes/beliefs, behaviours [144] and quality of ability [150].

Psychometric outcomes

Acceptability

Despite acceptability, measured as amount of missing data and completion times, being identified as a critical aspect of psychometric assessment [153], discussion of acceptability among included primary studies was lacking compared to an emphasis on reliability or validity. In this review, only 10 measures (28.6%) reported missing data. In addition, only four measures (11%) reported completion times. This limited discussion of acceptability is reinforced by findings from a systematic review of research utilization measures by Squires et al. [29] in which no studies reported acceptability data. As well, acceptability was not mentioned or discussed in systematic reviews of EIDM measures for nurses, midwives [14], medical practitioners [17] and allied health professionals [23]. Discussions about acceptability have typically been explored in the context of patient-reported outcome measures [153]. These discussions also hold relevance for measures with healthcare professionals as end users [154, 155]. Time and ease of completing a measure are important considerations for nurses or managers who work in fast-paced clinical settings, which can influence their decision to integrate these measures into their practice.

Reliability

Findings from the current review reveal gaps in reliability testing of measures in addition to variable findings across EIDM measures and healthcare contexts.

Internal consistency reported as Cronbach’s alpha was the most commonly assessed type of reliability in this review. This appears to be a trend similarly found among EIDM related psychometric reviews [14, 23]. Cronbach’s alpha is a commonly used statistic in psychometric research perhaps due to its ease of calculation as it can be computed with a one-time administration [156]. While Nunnally [157] identifies that the “coefficient alpha provides a good estimate of reliability in most cases” (p. 211), there are important considerations with its use. One consideration is that interpretation of Cronbach’s alpha requires an understanding that it must be re-evaluated in each new setting or population a measure is used in [158]. In the current review, many of the studies associated with frequently used measures (EBP-Implementation Scale, EBP Beliefs Scale) did not re-evaluate internal consistency when using the measure in a new or different setting from where it was originally tested. This was evident from unreported data in multiple studies associated with the same measure but taking place across various healthcare settings. Other reviews have reported similar findings, whereby measures have not been re-assessed in new contexts, and have reported either no data or only original internal consistency findings [13, 16]. The importance of re-assessing and interpreting this reliability statistic in new contexts is further underscored by current review findings in which Cronbach’s alphas varied widely across unique practice settings for the same measure.

Moreover, there were heterogeneous findings among studies taking place in the same type of setting for the same measure. Within each setting, there were instances in which the same measure would result in varying Cronbach’s alphas with range values falling both below and above minimum guidelines of ≥0.80 [139]. For example, Mooney [86] reported a Cronbach’s alpha of 0.776 for the EBP Beliefs Scale when used in an acute care setting, while Underhill et al. [87] reported α = 0.95 with the same measure also used in acute care practice. Variability in internal consistency findings has been reported in other systematic reviews as well [16, 23], perhaps due to the use of measures in diverse populations, settings, and countries. This further indicates the effect of nuanced populations within similar practice settings on internal consistency findings.

In addition, lower alphas were typically reported for EIDM attitude scales, such as for the self-developed measure by Yip et al. [71] (α = 0.69), the EBNAQ [135, 136] (α = 0.45) and the EBPQ (α = 0.63) [30]. A possible explanation of these low alphas may be related to the low number of items on an EIDM attitude subscale compared to other EIDM competence attributes. As Streiner et al. [25] indicate, the length of a scale highly impacts internal consistency, and as such, reliability could plausibly be improved through the addition of conceptually robust items. Further to this, in a literature review of the uses of the EBPQ [159], authors note that low alpha scores for the attitude subscale were consistently reported, due to repeated item deletions or modifications, calling for further refinement of EIDM attitudes items.
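The relationship between scale length and reliability can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that is not cited in the included studies and is offered here only as a sketch. It assumes that any added items are parallel to the existing ones, which real items may not satisfy; the function name is hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing scale length by `length_factor`.

    Assumption: new items are parallel to (as good as) the existing items.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Starting from the reported EBPQ attitude subscale alpha of 0.63,
# predict reliability if the subscale were doubled in length.
print(round(spearman_brown(0.63, 2), 2))
```

Under these assumptions, doubling the number of items would raise a reliability of 0.63 to roughly 0.77, still short of the ≥0.80 guideline, which suggests that item quality as well as item count would need attention.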

Overall, there was a lack of reliability assessment, as 40% of measures did not report reliability. This occurred for both newly developed and established measures. The lack of reliability testing has also been identified in existing reviews assessing EIDM measures among allied healthcare professionals [13, 16, 23] as early as 2010. The ongoing lack of attention to reliability assessment highlights a need for more rigorous and standardized reliability testing not only in the original development of measures but also in their subsequent use in different healthcare environments.

Validity

When compared with existing literature, findings pertaining to validity evidence show both alignment and contrast with respect to how validity evidence was assessed and the number and type of validity evidence sources established across measures.

As noted, psychometric assessment in the current review was based on the contemporary understanding that the strength of a validity argument depends on the accumulation of different validity evidence sources [26]. In this review, only one source of validity evidence was reported for over half of the measures (n = 19; 54%). Very few measures were reported with four (n = 2 measures) or three (n = 5 measures) validity evidence sources established. Employing a similar approach to validity evidence assessment, Squires et al. [29] reported similar findings in their review of research utilization measures: the majority of measures were categorized under level three of their hierarchy (i.e., one source of validity evidence); no measures were reported as having all four sources of validity evidence; and six measures were associated with three sources of validity evidence.

Because existing reviews did not present validity evidence in the context of practice settings, results are difficult to compare. However, this review offers some insight into contextualizing validity evidence. In the current review, much of the validity evidence was presented in the context of an acute care setting. In particular, for the three most widely used measures (EBP Implementation Scale, EBP Beliefs Scale, EBPQ), more sources of validity evidence were established by the original developers in acute care practice. Similar to the reliability findings, this brings to light a critical gap in nursing research: measures are used after their original development without validity evidence being assessed in different settings and populations. This is a call to action for nursing researchers to apply a consistent level of rigor and comprehensively re-assess sources of validity evidence for a measure when using it in a new practice setting. Doing so strengthens the cumulative body of validity evidence supporting continued use of a measure in varied nursing contexts.

Compared to the current review, previous EIDM psychometric systematic reviews [13, 14, 16] included traditional assessments of content, criterion, and construct validity and demonstrated variable findings. Buchanan et al. [13] reported no findings related to validity for 18 measures and noted that authors failed to re-test validity when original measures were used in a new study setting. Glegg and Holsti [16] only provided a description of validity data and did not perform an assessment through scoring or ranking of this evidence. Leung et al. [14], in contrast, used their self-developed Psychometric Grading Framework [160] to assess validity of instruments in their review. These authors determined that most of the studies reported measures as having ‘weak’ or ‘very weak’ validity according to their matrix scoring, with only three studies reporting the tested measures as having adequate validity [14].

Included studies in this review also limited validity assessment to sources based on test content and relationships to other variables, focusing on construct validity. This appears to be a consistent theme reported across existing reviews as well [14, 23]. A new contribution from this review is an in-depth understanding of the strength of validity evidence based on relationships to other variables. Data extracted on the statistical analyses associated with this source of validity evidence showed relationships established primarily through correlational, t-test, or ANOVA analyses. In fewer instances, regression analyses were used to demonstrate strong relationships, highlighting a need for psychometric evaluations of tools to validate more robust relationships between variables.

Findings from the current review and existing literature highlight limitations in assessing validity evidence and the psychometric rigor of existing EIDM measures. Variability in testing and results of validity evidence creates challenges and confusion for end users in research or nursing practice who look to this body of literature to determine appropriate and robust EIDM measures. Scholarly support for the use of a comprehensive and contemporary approach in psychometric development of tools can help to standardize assessments and produce findings representative of a unified understanding of validity evidence.

Considerations for tool selection in nursing practice or research

This systematic review can serve as a helpful resource for nursing administrators, frontline staff, or researchers who are interested in using a measure to assess a specific EIDM competence attribute. In selecting measures for nursing practice or research, the specific population and setting in which measures have been previously used or tested, in addition to the specific EIDM competence attributes they address, all serve as important considerations. The acceptability of measures is also a critical factor in decision making, including tool completion time, given the demands of busy clinical environments, and whether high rates of missing data (> 10%) are present [138]. Acceptable reliability of a measure (α ≥ 0.80) [139] should also be given weight in tool selection, in addition to determining how comprehensively all four sources of validity evidence (content, internal structure, response process, relationships to other variables) have been established for a given measure [26].
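For readers who wish to apply these considerations systematically, the criteria above can be expressed as a simple screening routine. The sketch below is one possible illustration only: the class, function, and field names are hypothetical, the example measure and its values are invented for demonstration, and the only thresholds used are those cited above (α ≥ 0.80, missing data no greater than 10%, and the four sources of validity evidence).

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional

# The four sources of validity evidence named in the text above.
VALIDITY_SOURCES = frozenset({
    "content",
    "internal structure",
    "response process",
    "relationships to other variables",
})


@dataclass
class MeasureEvidence:
    """Hypothetical record of the evidence reported for one measure."""
    name: str
    alpha: Optional[float]              # None when reliability was not reported
    missing_data_rate: Optional[float]  # proportion from 0 to 1, None if unreported
    validity_sources: FrozenSet[str] = field(default_factory=frozenset)


def screen(measure: MeasureEvidence) -> dict:
    """Summarize a measure against the reliability, acceptability, and validity criteria."""
    unknown = measure.validity_sources - VALIDITY_SOURCES
    if unknown:
        raise ValueError(f"unrecognized validity sources: {sorted(unknown)}")
    return {
        # None signals "not reported" rather than a failed criterion.
        "reliability_ok": None if measure.alpha is None else measure.alpha >= 0.80,
        "missing_data_ok": (None if measure.missing_data_rate is None
                            else measure.missing_data_rate <= 0.10),
        "validity_sources_established": len(measure.validity_sources),
        "validity_sources_missing": sorted(VALIDITY_SOURCES - measure.validity_sources),
    }


# Invented example values, for illustration only.
example = MeasureEvidence(
    name="Hypothetical EIDM scale",
    alpha=0.85,
    missing_data_rate=0.04,
    validity_sources=frozenset({"content", "internal structure"}),
)
summary = screen(example)
```

Such a summary does not replace judgment about whether the population and setting in which a measure was tested match the intended use; it only makes explicit which criteria have reported evidence and which remain untested.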

Limitations

A limitation of this review relates to the absence of quality assessments of included primary studies. Because traditional quality assessment was not conducted, confidence in study findings may be affected, and results should therefore be interpreted with caution. However, tools previously used to assess the quality of psychometric studies have several limitations [27]. These include the development of quality assessment tools for use only with patient-reported outcome measures [14], the use of a lowest-score ranking method that creates an imbalance in the overall quality score [161], and a lack of validity and reliability testing [27]. Most importantly, existing quality assessment tools employ a traditional approach of assessing construct, content, and criterion validity, rather than a contemporary perspective of viewing validity evidence as a unified concept [26], as used to guide the current review. Given this, to align with other reviews using a similar contemporary approach [17, 29], assessment focused on the categorization of measures according to the number of sources of validity evidence established for scores in related studies. A second limitation pertains to the exclusion of non-English literature: 14 articles identified from full-text screening required translation from seven languages and were excluded from the review. Given the large number of studies included in the final review, it is unlikely that the small number of non-English studies would have a critical impact on results. A third limitation is that with the use of a classification system for assessing validity evidence, the number of studies for a particular measure could influence the strength of the validity argument [29]. A measure which has one or a small number of studies may appear to have strong validity evidence [29] as compared to those measures with more cited studies. Implications of this are most relevant for more established measures, in that more sources of validity evidence may have in fact been established, but only in a small number of studies, which may not be reflected in their final categorization. However, the advantage of using this synthesis process is that it highlights the types of validity evidence that require further testing for a particular measure [29].

Conclusions

There is a diverse collection of measures that assess EIDM competence attributes of knowledge, skills, attitudes/beliefs, and/or behaviours in nurses. Among these measures, there is a concentration on the assessment of single EIDM competence attributes. Review findings determined that three measures addressed all four EIDM attributes, although with some conceptual limitations, highlighting a need for a tool that comprehensively assesses EIDM competence. More rigorous and consistent psychometric testing is also needed for EIDM measures overall, but particularly in community-based and long-term care settings, for which data are limited. A contemporary approach to psychometric assessment of EIDM measures in the future may also provide more robust and comprehensive evidence of their psychometric rigor.