Background

With the quickening pace of industrialization and urbanization, and the resulting improvement in people’s living standards, public demand for healthcare services has increased accordingly. The medical model has shifted from simple disease treatment to a combined mode of prevention, care, treatment and rehabilitation. In China, the stated development goals for healthcare include providing people with a full range of full-cycle healthcare services [1]. Under these circumstances, there is an urgent need to strengthen the team of versatile (compound) medical talent, accelerate the innovative development of medical education, and cultivate more high-level, application-oriented talent for medicine and healthcare.

To meet the requirements of the national medical education strategy, the Chinese government has been committed to reforming postgraduate education over the past decade and has made the development of professional degree postgraduate education a significant national policy. Professional degree postgraduate education is a new form of postgraduate education in China. Compared with academic degree postgraduate education, it focuses more on cultivating advanced applied talent and providing a skilled workforce [2]. As an important reserve of medical staff and researchers, clinical professional degree postgraduates are expected to carry out a variety of clinical work and research in clinical units in the future. They will play a key role in improving the overall level of health services and achieving national health goals [3], which requires them to possess profound professional knowledge and excellent clinical skills, as well as critical thinking and the ability to implement evidence-based practice (EBP) [4, 5].

EBP has gained increasing popularity worldwide. It requires health professionals to use the best available evidence when making medical decisions. Together with the context and preferences of individual patients, evidence from authoritative research contributes to best practice and to optimal patient outcomes [6, 7]. It has become the norm that health professionals are expected to demonstrate evidence-based practice behaviors on a daily basis [8, 9]. Clinical postgraduates, especially professional degree postgraduates, will become the backbone of EBP in clinical units after graduation [3]. They therefore need sufficient knowledge and skills in literature searching and critical appraisal of evidence [10]. Thus, medical educators should evaluate the effectiveness of EBP programs and determine the best method for teaching students the knowledge and skills required for EBP. Currently, many medical schools around the world have tried to incorporate evidence-based medicine teaching programs into their curricula [11]. A crucial aspect of evaluating education programs is choosing an instrument for measuring the effect of teaching [12]. It is therefore necessary to select the instrument that best assesses EBP learning outcomes.

Several existing instruments for evaluating EBP are relevant to medical students in other countries, including the Fresno Test [13], the Berlin Questionnaire [14], the Assessing Competency in Evidence Based Medicine (ACE) Tool [15], the Evidence-Based Practice Knowledge, Attitude and Behavior (KAB) Questionnaire [16] and the Evidence-Based Practice Profile Questionnaire (EBP2Q) [17]. Some instruments have been cross-culturally adapted to measure EBP among nursing students or nurses in Mainland China [18, 19]. However, none of them was developed specifically for clinical postgraduates. Existing domestic instruments for evaluating EBP among clinical postgraduates were mostly self-designed to test attitudes, behaviors or beliefs towards EBP, and had shortcomings in content and psychometric properties. The EBP2Q was developed at the University of South Australia by Maureen McEvoy et al. and was validated on 526 people (481 students and 45 academics/practitioners). Apart from its good psychometric properties, an additional advantage of the EBP2Q is that students, lecturers and practitioners can use it to self-assess the knowledge, skills, attitudes and behaviors required for EBP. It can also be used to assess different aspects of EBP by selecting individual parts (domains) of the questionnaire [17]. Therefore, we cross-culturally adapted the EBP2Q to measure EBP learning outcomes of Chinese clinical postgraduates and subsequently evaluated its psychometric properties.

Methods

Participant and setting

A cross-sectional validation study was conducted with 633 clinical postgraduates (majoring in clinical medicine, stomatology and nursing) recruited by convenience sampling from three university-affiliated hospitals (the First Affiliated Hospital of Xi’an Jiaotong University, the Second Affiliated Hospital of Xi’an Jiaotong University and Hospital of Stomatology Xi’an Jiaotong University) in northwest China. The inclusion criteria were: (a) enrolled in a master’s or doctoral clinical postgraduate degree program; (b) having 3 months or more of clinical practice and having already adjusted to the working environment; (c) willing to participate in the study. All participants gave informed consent after the study aims and procedures had been fully explained. Anonymity and confidentiality were assured, and participants were told that they could withdraw at any point without consequences. Data were collected through an online survey using sojump (an online research survey tool; http://www.sojump.com). Approval was obtained from the Ethics Committees of Xi’an Jiaotong University. All procedures followed were in accordance with the ethical standards of the Declaration of Helsinki.

Instruments

The Evidence-based Practice Profile Questionnaire (EBP2Q)

The original Evidence-based Practice Profile Questionnaire (EBP2Q) was developed in Australia in 2010 with academics and students from health and non-health backgrounds to assess knowledge and skills in EBP. The original instrument comprised five distinct domains: Relevance, Sympathy, Practice, Terminology and Confidence. Relevance (14 items) refers to the value, emphasis and importance placed on EBP; Sympathy (7 items) refers to the individual’s perception of the compatibility of EBP with professional work; Terminology (17 items) refers to the understanding of common research terms; Practice (9 items) refers to the use of EBP in clinical situations; and Confidence (11 items) refers to the individual’s perception of their ability with EBP skills. All items are scored on a 5-point Likert scale, and the items in the Sympathy domain are reverse scored. The 58-item questionnaire demonstrated acceptable internal consistency (Cronbach’s alpha 0.96) and test-retest reliability. When compared with the instrument developed by Upton & Upton [20], the EBP2Q showed good convergent validity in the three comparable factors (Practice 0.66, Confidence 0.80 and Sympathy 0.54). Descriptive statistics and correlation coefficients demonstrated sufficient item facility and discrimination of the original EBP2Q. The original EBP2Q is thus a well-developed instrument with strong psychometric properties.

General Information Questionnaire

Socio-demographic and EBP-related data were obtained through a General Information Questionnaire that we developed. The questionnaire covers age, gender, specialty, educational background, degree type (academic degree or professional degree), English level, clinical practice duration, clinical work experience, research experience in EBP, supervisor’s research experience in EBP, interest in EBP, duration of EBP courses or training, and the perceived necessity of implementing EBP courses or training.

Translation procedure

After obtaining original author approval, translation and cross-cultural adaptation of the EBP2Q were performed according to a clear and user-friendly guideline [21]. The guideline outlines a thorough adaptation process of self-report measures, aiming to maximize the attainment of semantic, idiomatic, experiential, and conceptual equivalence between the source and target questionnaires. The adaptation process can be carried out within the following stages as recommended: translation (Stage I), synthesis (Stage II), back translation (Stage III), expert committee review (Stage IV), pretesting (Stage V), together with further testing of the adapted version and evaluation of the adaptation process (Stage VI). The EBP2Q was adapted to Chinese in strict adherence to the guideline.

The original English version was independently translated into Chinese by three translators working, respectively, in a clinical department, an evidence-based medicine education department and an English language teaching department. The differences among the three translations were then resolved through comprehensive discussion with a fourth translator, yielding a forward-translated version of the questionnaire. Subsequently, the questionnaire was back translated independently by two other translators (i.e., bilingual experts fluent in English and Chinese) who were blind to the original English version. A multidisciplinary consensus committee comprising one methodologist (a member of the research team), one health care professional and five bilingual and bicultural translators was convened to consolidate all the translated and back-translated versions of the questionnaire, resolve any controversial or ambiguous wording, ensure cross-cultural equivalence and develop the pre-final version of the questionnaire for field testing.

The pilot study

The Chinese version of the questionnaire was tested on a sample of 30 postgraduates majoring in clinical medicine, stomatology and nursing, who were recruited through convenience sampling in Northwest China. All volunteers were asked to complete the questionnaire. The pilot study enabled us to detect problems with wording, terminology, instructions and the clarity of options. An interview was conducted to explore participants’ perception and understanding of each item and to collect their advice for improving the questionnaire. The interview was performed uniformly by the researcher and addressed three aspects: the instructions of the questionnaire, the content of the entries, and the domains of the entries. The outline of the interview was as follows: Q1: Do you have any suggestions for the instructions of the questionnaire? Q2: Do you have any suggestions for the domains of the entries? Q3: Which entries do you find difficult to comprehend? Q4: Do you have any recommendations for the wording of the entries? This process ensured that the adapted version retained equivalence with the original and was linguistically appropriate when applied in practice. After this process, the final Chinese version was developed [18, 22]. The same 30 students completed the questionnaire again 2 weeks later to measure test-retest reliability.

Data analysis

The Statistical Package for the Social Sciences (SPSS) version 19.0 was used for data analysis. Demographic variables were described using frequency tables, means and standard deviations (SD). For content validation, seven experts were invited to judge the relevance of each item on the recommended 4-point scale from 1 (very invalid) to 4 (very valid). The content validity index (CVI) was computed for each item (I-CVI) and for the overall scale (S-CVI). Ratings of 3 or 4 on the 4-point relevance scale were taken to indicate that an expert considered the item relevant. An S-CVI above 0.90 together with I-CVIs above 0.78 was taken to indicate content validity [23]. Internal consistency was assessed with Cronbach’s alpha coefficient; a value of 0.7 or greater was considered satisfactory [24]. Split-half reliability was assessed by splitting the questionnaire into odd-numbered and even-numbered items. Test-retest reliability was assessed using the intraclass correlation coefficient (ICC) for the whole questionnaire and for each domain [25]. ICC values of 0.60 to 0.80 were deemed to indicate good reliability, and values above 0.80 excellent reliability [26]. The validity of each item was examined through item analysis. We considered floor or ceiling effects to be present if more than 15% of the individuals achieved the lowest or highest possible score, respectively. Exploratory factor analysis with principal component analysis (PCA) as the extraction method and Direct Oblimin as the rotation method was conducted to determine the factor structure of the questionnaire [27]. The number of factors was determined using the following criteria: (a) the Kaiser criterion (eigenvalue), (b) the “elbow” in the scree plot, and (c) clinical interpretability. Items were deemed relevant if their factor loadings exceeded 0.40, and factors were retained if they achieved an eigenvalue ≥ 1.0 [28]. A confirmatory factor analysis (CFA) was also performed to verify the results. The recommended values of the fit indices were as follows [29]: (a) Chi-squared divided by the degrees of freedom ≤ 3; (b) the root mean squared error of approximation (RMSEA) < 0.08; (c) the comparative fit index (CFI), normed fit index (NFI) and goodness-of-fit index (GFI) > 0.90.
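The two internal-consistency statistics described above can be illustrated with a short sketch. It is not the SPSS procedure used in the study: the function names and the synthetic data are hypothetical, and the Spearman-Brown correction applied to the odd-even split is an assumption, since the text does not state which split-half coefficient was reported.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                               # number of items
    item_var_sum = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


def odd_even_split_half(scores):
    """Odd-even split-half reliability with Spearman-Brown correction (assumed)."""
    scores = np.asarray(scores, dtype=float)
    odd_total = scores[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
    even_total = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...
    r = np.corrcoef(odd_total, even_total)[0, 1]
    return 2 * r / (1 + r)


# Synthetic 5-point Likert data (illustrative only, not study data):
# 200 respondents, 10 items driven by one shared latent trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 1))
demo = np.clip(np.rint(3 + trait + rng.normal(scale=0.8, size=(200, 10))), 1, 5)

print("alpha:", round(cronbach_alpha(demo), 3))
print("split-half:", round(odd_even_split_half(demo), 3))
```

Under the criterion cited in the text, an alpha of 0.7 or greater would be read as satisfactory.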

Results

Translation and adaptation of EBP2Q

The forward and backward translations were repeated three times until an acceptable version was obtained. Following expert consultation, the 5-point Likert response options for items 1 to 8 in the Relevance domain were uniformly revised to “5 = strongly agree”, “4 = agree”, “3 = neutral”, “2 = disagree” and “1 = strongly disagree”. To ensure that respondents could understand item 22, “Formulated a clearly answerable question that defines the client or problem, the intervention and outcome(s) of interest”, a supplementary explanation was added: “construct clinical questions using the principles of PICO”. The words “set standards” in item 55 were changed to “existing standards for reference evaluation” as needed for cross-cultural adaptation.

Pilot study

The pre-final Chinese EBP2Q was tested on a sample of 30 postgraduates majoring in clinical medicine (40%), stomatology (30%) and nursing (30%), who were recruited through convenience sampling in Northwest China. To better fit the Chinese context, participants suggested translating the phrase “develop knowledge” in item 5 as “expand knowledge” and translating the term “client” throughout the questionnaire as “service recipients”. One participant proposed revising item 28, “Read published research reports”, to “Read published research reports related to EBP”, as the scope of “research reports” was too broad and not specific enough to reflect the Practice domain of EBP. The researcher recorded participants’ suggestions during the pilot study and made modifications after discussion with the multidisciplinary consensus committee.

Validation study

Sample characteristics

Demographic data of the validation study are presented in Table 1. The sample comprised 633 clinical postgraduates (clinical medicine 72.8%, stomatology 13.6% and nursing 13.6%). The mean age was 25.18 years (SD = 2.24). Participants included 465 (73.5%) females and 168 (26.5%) males enrolled in master’s (80.7%) and doctoral (19.3%) degree programs. Over half (56.7%) had received EBP courses or training, and 52.3% of participants were very interested in EBP. In total, 49.4% of the participants had research experience, and just over half of them (52.1%) had research experience in EBP. A total of 608 participants (96.1%) strongly agreed that it was necessary to implement EBP in clinical settings.

Table 1 Characteristics and EBP related information of sample (n = 633)

Item analysis

First, we sorted participants into high and low scoring groups according to their total EBP2Q score. The 27% of participants with the highest total scores comprised the high group, and the 27% with the lowest total scores comprised the low group. The mean score of each item was then compared between the two groups using an independent samples t-test, and the critical ratio (CR) of each item was obtained. There was a statistically significant difference in the score of every item between the high group and the low group (p < 0.001), and the CR value of every item was greater than 3, indicating that every item had good discrimination. No floor or ceiling effect was observed. No entries were deleted at this stage.
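A minimal sketch of the extreme-groups item analysis is given below, assuming that respondents are grouped by total score at the 27th and 73rd percentiles. The function name and the synthetic data are hypothetical, and Welch's unequal-variance form of the t statistic is an assumption, since the text only specifies an independent samples t-test.

```python
import numpy as np


def critical_ratios(scores, tail=0.27):
    """Critical ratio (t statistic) of each item between extreme score groups."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    low_cut, high_cut = np.quantile(total, [tail, 1 - tail])
    low = scores[total <= low_cut]     # lowest-scoring respondents
    high = scores[total >= high_cut]   # highest-scoring respondents
    # Ties at the cut points can make each group slightly larger than 27%.
    diff = high.mean(axis=0) - low.mean(axis=0)
    se = np.sqrt(high.var(axis=0, ddof=1) / len(high)
                 + low.var(axis=0, ddof=1) / len(low))
    return diff / se   # Welch t statistic per item (assumed form)


# Synthetic 5-point Likert data (illustrative only, not study data).
rng = np.random.default_rng(0)
trait = rng.normal(size=(300, 1))
demo = np.clip(np.rint(3 + trait + rng.normal(scale=0.5, size=(300, 6))), 1, 5)

cr = critical_ratios(demo)
print(np.round(cr, 2))
print("all items discriminate (CR > 3):", bool((cr > 3).all()))
```

An item whose CR falls below the chosen threshold would be a candidate for deletion; in the study no item met that condition.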

Content validity

The CVI evaluation form was distributed to the seven experts, who were asked to rate content validity. All experts agreed that the EBP2Q was designed to measure EBP learning outcomes. The S-CVI of the questionnaire reached 0.938, which indicated excellent content validity. I-CVIs were above 0.78 except for items 1–4 in the Relevance domain and items 31, 32, 34, 36, 40, 42 and 43 in the Terminology domain. These 11 items were deleted because of low content validity. In addition, the experts suggested merging or deleting some items in the Sympathy domain and the Relevance domain because they expressed similar meanings. For example, items 15, 16, 20 and 21 in the Sympathy domain were basically the same as items 9, 10, 11 and 14 in the Relevance domain, respectively. The major difference between them was the scoring method: the items in the Sympathy domain were negatively worded and had to be reverse scored. According to the experts, the other items in the Sympathy domain (items 17, 18 and 19) depended on long-term clinical work experience, whereas our questionnaire was designed for medical students with limited clinical work experience. Therefore, we deleted the items of the Sympathy domain (items 15–21) after discussion. The final version of the Chinese EBP2Q consists of 40 entries. The results are shown in Table 2.
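The content validity indices can be sketched as follows. The ratings below are invented for illustration, the function name is hypothetical, and the scale-level index is computed as the average of the I-CVIs (S-CVI/Ave), which is an assumption because the text does not state which S-CVI variant was used.

```python
import numpy as np


def content_validity(ratings, relevant_from=3):
    """I-CVI per item and S-CVI/Ave from an items-by-experts rating matrix."""
    ratings = np.asarray(ratings)
    # I-CVI: share of experts rating the item 3 or 4 on the 4-point scale.
    i_cvi = (ratings >= relevant_from).mean(axis=1)
    # S-CVI/Ave: mean of the item-level indices (assumed variant).
    return i_cvi, i_cvi.mean()


# Hypothetical ratings: 3 items rated by 7 experts.
ratings = np.array([
    [4, 4, 3, 3, 4, 4, 3],   # 7 of 7 experts rate it relevant
    [4, 3, 4, 2, 3, 4, 4],   # 6 of 7
    [4, 3, 2, 2, 4, 3, 1],   # 4 of 7
])
i_cvi, s_cvi = content_validity(ratings)
keep = i_cvi > 0.78          # retention rule from the Data analysis section

print(np.round(i_cvi, 3), round(float(s_cvi), 3), keep)
```

With seven experts the I-CVI can only take values in steps of 1/7, so exceeding 0.78 requires at least six of the seven experts to rate an item 3 or 4.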

Table 2 The content validity of each item of the Chinese EBP2Q

Exploratory factor analysis

The data were randomly divided into two parts in this study. The first part, comprising 313 samples, was used for exploratory factor analysis with oblique rotation to account for the relationships among the factors. The Kaiser-Meyer-Olkin measure (0.922) indicated sampling adequacy, and Bartlett’s test (χ2 = 13,882.106, p < 0.001) rejected the hypothesis of zero correlations. The scree plot (Fig. 1) indicated that there were four factors. In addition, based on Kaiser’s criterion of extracting factors with eigenvalues greater than 1, a four-factor structure (Factor 1 = 10.610, Factor 2 = 5.994, Factor 3 = 5.124, Factor 4 = 2.907) that explained 61.586% of the variance of the data was identified from the pattern matrix (see Table 3). Exploratory factor analysis of the 40 items produced factor loadings ranging from 0.565 to 0.872. Factor 1 comprised 11 items (items 48–58), all taken from the Confidence domain; Factor 2 comprised 10 items (items 33, 35, 37–39, 41, 44–47), all taken from the Terminology domain; Factor 3 comprised 10 items (items 5–14), all taken from the Relevance domain; Factor 4 comprised 9 items (items 22–30), all taken from the Practice domain. Combining the scree plot, the Kaiser criterion (eigenvalue) and the meaningfulness of the factors [30], a four-factor structure was identified. Correlation analysis showed weak to moderate correlations between the extracted factors (factor intercorrelations ranged from 0.137 to 0.461), indicating the suitability of the oblique solution [31].
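The reported eigenvalues and explained variance can be checked against each other with a few lines of arithmetic. The sketch assumes that the four values are initial eigenvalues from a PCA of the 40-item correlation matrix, so that the total variance equals the number of items; the text does not state this explicitly.

```python
# Eigenvalues of the four retained factors, as reported in the text.
eigenvalues = [10.610, 5.994, 5.124, 2.907]
n_items = 40   # items in the final Chinese EBP2Q

# Kaiser criterion: keep factors with an eigenvalue greater than 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Share of total variance per factor (assumes total variance = n_items).
explained_pct = [100 * ev / n_items for ev in retained]
cumulative_pct = sum(explained_pct)

for idx, pct in enumerate(explained_pct, start=1):
    print(f"Factor {idx}: {pct:.3f}% of variance")
print(f"Cumulative: {cumulative_pct:.3f}%")
```

Under that assumption the four factors sum to about 61.59% of the variance, which agrees with the reported 61.586% up to rounding of the eigenvalues.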

Fig. 1 Scree plot of the Chinese version of EBP2Q (n = 313)

Table 3 Factor loadings on items of the EBP2Q (n = 313)

Confirmatory factor analysis

The remaining 320 samples were used for confirmatory factor analysis. A four-factor model was established according to the results of the exploratory factor analysis (see Fig. 2 and Table 4). All fit indices of the initial model, except GFI and NFI, met the suggested thresholds for satisfactory model fit. In the modified model, the fit indices were satisfactory: the RMSEA was 0.052, below 0.08, and the GFI (0.902) and NFI (0.901) exceeded the benchmark of 0.90. Thus, the four-factor model fitted the survey data adequately and was appropriate for the population surveyed.

Fig. 2 A schematic diagram of standardized model fitting of the Chinese EBP2Q (n = 320)

Table 4 The fitting indexes of confirmatory factor analysis of the EBP2Q (n = 320)

Reliability

Internal consistency and split-half reliability of the EBP2Q are summarized in Table 5. The Cronbach’s alpha for the overall questionnaire was 0.926, and the values for the four domains were 0.921 (Relevance), 0.894 (Practice), 0.922 (Terminology) and 0.950 (Confidence). The split-half reliability of the whole Chinese version of the EBP2Q was 0.925, and values for the four domains ranged from 0.848 to 0.926. The ICC for test-retest reliability was 0.868 for the overall questionnaire and ranged from 0.719 to 0.805 for the four domains.
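For readers unfamiliar with the ICC, the sketch below computes one common form from test and retest scores. The text does not state which ICC model was used; the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is an assumption here, and the function name and data are hypothetical.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1) for a subjects-by-occasions score matrix (assumed ICC form)."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between occasions
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical total scores of five respondents at test and 2-week retest.
test_retest = np.array([
    [120, 118],
    [135, 139],
    [150, 147],
    [162, 165],
    [171, 170],
])
print("ICC(2,1):", round(icc_2_1(test_retest), 3))
```

Because this form measures absolute agreement, a systematic shift between the two occasions lowers the coefficient even when the ranking of respondents is unchanged.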

Table 5 Reliability analysis of the EBP2Q (n = 633)

Discussion

In this study, we cross-culturally adapted the EBP2Q in Mainland China, providing an effective tool for evaluating the EBP learning outcomes of clinical postgraduates, especially clinical professional degree postgraduates. The Chinese version of the EBP2Q introduced in this study contains 40 items in four domains: Relevance, Practice, Terminology and Confidence. The Relevance domain (10 items) measures students’ attitude towards EBP. The Practice domain (9 items) measures the frequency of applying EBP in clinical situations. The Terminology domain (10 items) measures students’ understanding of EBP-related terms. The Confidence domain (11 items) measures students’ self-confidence in their EBP-related skills.

The items in the Sympathy domain and several items in the Relevance and Terminology domains were deleted according to the content validity results and the experts’ review. On the one hand, some items (items 17–19) of the Sympathy domain depended on accumulated clinical work experience and were therefore not suitable for medical students to answer. On the other hand, the other items (items 15, 16, 20 and 21) of the Sympathy domain expressed meanings similar to those of items in the Relevance domain. Some terms in the Terminology domain, such as “relative risk (RR)”, “absolute risk (AR)” and “number needed to treat (NNT)”, were specialized terms from epidemiological studies. With reference to the CVI results, these items were deleted to improve the generalizability of the questionnaire. Meanwhile, the content validity analysis indicated that the CVIs of four items in the Relevance domain were low. These four items focused on the understanding of the concept of evidence-based practice and were not related to the students’ attitude towards EBP, so they were also deleted.

The EBP2Q was developed and evaluated across a range of professions and showed acceptable psychometric properties. McEvoy et al. [17] found the questionnaire to be a reliable instrument capable of monitoring changes in EBP learning outcomes after instruction or cumulative learning throughout the degree period, for students or healthcare professionals from various healthcare disciplines. Our study examined the reliability and validity of the culturally adapted EBP2Q. The Cronbach’s alpha was 0.926 in 633 Chinese clinical postgraduates, indicating good internal consistency. A similar value was reported for the original version. The test-retest reliability of the adapted questionnaire was good overall and for each of the identified domains, and the ICC at the 2-week retest indicated strong reliability.

The CVI is the main method used to quantify content validity for multi-item instruments. The S-CVI reached 0.938 in this study. I-CVIs were higher than 0.78 except for eleven items in the Relevance and Terminology domains, and these items were removed at this stage. The results suggested that the content validity of the EBP2Q was acceptable. Exploratory factor analysis indicated that the four extracted principal components accounted for 61.586% of the total variance, providing suitable indices for assessing the validity of this instrument. Our findings showed a strong similarity in factor structure to the Polish version, indicating that the condition for theoretical validity is fulfilled. The Polish version of the EBP2Q yielded five factors in exploratory factor analysis, consistent with the structure of the original version, although item 15 from the Sympathy domain loaded on the Relevance domain [32]. This supports the view taken in this study that some items of the Sympathy and Relevance domains express similar content. Ming-Yu Hu et al. [19] conducted an exploratory factor analysis in a sample of Chinese clinical nurses and obtained an eight-factor structure (the eight domains were redefined as basic understanding, intention, attitude, sympathy, clinical related terms, EBP related terms, practice and confidence according to their common characteristics), which was inconsistent with our results. This may result from the different characteristics of clinical postgraduates and nurses. The well-developed EBP curriculum system makes EBP knowledge and skills accessible to clinical postgraduates, but their limited working experience restricts their perception of the compatibility of EBP with clinical work.

As a further contribution of this study, confirmatory factor analysis (CFA) was conducted to investigate the fit of the four factors with the general structure of the EBP2Q. Model fit assessment plays a pivotal role in evaluating CFA models and the validity of psychological assessments. The fixed fit cutoffs used in this study are widely adopted in empirical research to identify potential model misspecification and to select a concise model [33]. As suggested by Marsh HW [34], goodness of fit is best assessed by considering multiple perspectives, and it is typically recommended to examine several indices with well-established properties. Hence, the Chinese version of the EBP2Q was verified using 7 indices: χ2/df, GFI, CFI, RMSEA, NFI, IFI and PCFI. All the indices met the standard. The results of CFA indicated that the modified four-factor model fitted better, suggesting that the revised Chinese version of the EBP2Q had good construct validity. The study of the Norwegian version of the EBP2Q [35] used confirmatory factor analysis to test whether data collected from 347 students majoring in social education and nursing fitted the original five-factor structure. The results of that CFA did not confirm the original five-factor model (CFI = 0.69, RMSEA = 0.09). In addition, there was no statistically significant difference in the Sympathy domain before and after the EBP course.

Based on the results obtained, the Chinese version of the EBP2Q consists of four factors: Relevance, Practice, Terminology and Confidence. Each factor comprises a sufficient number of items. Moreover, the 40-item questionnaire has a high response rate, and the pilot study supported the quality of the adapted version. Overall, the revised questionnaire has good reliability and validity and can satisfy the domestic demand for relevant research and application.

Limitation

Although the results of the cross-cultural adaptation of the EBP2Q are satisfactory, several limitations should be mentioned. First, clinical postgraduates in our study were recruited using convenience sampling from three university-affiliated hospitals in Northwest China, which may limit the generalization and application of the Chinese EBP2Q to some degree. However, the sample covers a broad range of specialties, clinical practice durations and research experience, suggesting that the EBP2Q is understandable and acceptable to the general population of Chinese clinical postgraduates. Second, criterion validity or predictive validity was not directly determined because a gold standard does not exist. A psychometric assessment of the EBP2Q with respect to convergent validity should be considered in a future validation study. Third, while the fixed fit cutoffs employed for model fit assessment in CFA have gained significant recognition, methodological research has highlighted that appropriate cutoff values may vary depending on the characteristics of the data and model being evaluated [36]. Daniel McNeish et al. proposed a simulation-based approach known as dynamic fit index cutoffs, which allows fit index cutoffs in CFA models to be adjusted dynamically to match data and model characteristics. However, the application and validation of this approach are still limited. Dynamic cutoffs require additional computation, which makes them more complex than fixed cutoffs and potentially more prone to user error [33]. Thus, we continued to use fixed fit cutoffs in this study. The possibility of dynamically adjusting fit index cutoffs based on specific model and data characteristics will be considered in future studies.

Conclusion

The Chinese version of the EBP2Q possesses adequate validity, test-retest reliability and internal consistency. The results indicate that the tool is replicable and applicable for evaluating the EBP learning outcomes of clinical postgraduates. The Chinese version of the EBP2Q could be adopted by medical educators in designing their courses and curricula, or by clinical postgraduates for self-assessment of EBP learning outcomes.