Background

Patient harm due to unsafe care is a large and persistent global public health challenge and one of the leading causes of death and disability worldwide [1]. Improving safety in healthcare is central in governmental policies, though progress in delivering this has been modest [2]. Patient safety culture surveys have been the most frequently used approach to measure and monitor perception of safety culture [3]. Safety culture is defined as “the product of individual and group values, attitudes, perceptions, competencies and patterns of behavior that determine the commitment to, and the style and proficiency of, an organization’s health and safety management” [4]. Moreover, safety culture refers to the perceptions, beliefs, values, attitudes, and competencies within an organization pertaining to safety and prevention of harm [5]. The importance of measuring patient safety culture was underlined by the results in a 2023 scoping review, where 76 percent of the included studies observed associations between improved safety culture and reduction of adverse events [6].

To assess patient safety culture in hospitals the US Agency for Healthcare Research and Quality (AHRQ) launched the Hospital Survey on Patient Safety Culture (HSOPSC) version 1.0 in 2004 [7, 8]. Since then, HSOPSC 1.0 has become one of the most used tools to evaluate patient safety culture in hospitals, administered in approximately one hundred countries and translated into 43 languages as of September 2022 [9]. HSOPSC 1.0 has generally been considered to be one of the most robust instruments measuring patient safety culture, and it has adequate psychometric properties [10]. In Norway, the first studies using N-HSOPSC 1.0 concluded that the psychometric properties of the instrument were satisfactory for use in Norwegian hospital settings [11,12,13]. A recent review of literature revealed 20 research articles using the N-HSOPSC 1.0 [14].

Studies of safety culture perceptions in hospitals require valid and psychometrically sound instruments [12, 13, 15]. First, an accurate questionnaire structure should demonstrate a match between the theorized content structure and the actual content structure [16, 17]. Second, the psychometric properties of instruments developed in one context are required to demonstrate appropriateness in other cultures and settings [16, 17]. Further, psychometric concepts need to demonstrate relationships with other related and valid criteria. For example, data on criterion validity can be compared with criteria data collected at the same time (concurrent validity) or with similar data from a later time point (predictive validity) [12, 16, 17]. Finally, researchers need to demonstrate a match between the content theorized to be related to the actual content in empirical data [15]. If these psychometric areas are not taken seriously, this may lead to many pitfalls both for researchers and practitioners [14]. Pitfalls might be imprecise diagnostics of the patient safety level and failure to evaluate the effect of improvement initiatives. Moreover, researchers can easily confirm or reject research hypotheses erroneously when applying invalid and inaccurate measurement tools.

Patient safety cannot be understood as an isolated phenomenon, but is influenced by general job characteristics and the well-being of the individual health care workers. Karsh et al. [18] found that positive staff perceptions of their work environment and low work pressure were significantly related to greater job satisfaction and work commitment. A direct association has also been reported between turnover and work strain, burnout and stress [19]. Zarei et al. [20] showed a significant relationship between patient safety (safety climate) and unit type, job satisfaction, job interest, and stress in hospitals. This study also illustrated a strong relationship between lack of personal accomplishment, job satisfaction, job interest and stress. Also, there was a negative correlation between occupational burnout and safety climate, where a decrease in the latter was associated with an increase in the former. Hence, patient safety researchers should look at healthcare job characteristics in combination with patient safety culture.

Recently, the AHRQ revised the HSOPSC 1.0 to a 2.0 version, to improve the quality and relevance of the instrument. HSOPSC 2.0 is shorter, with 25 items removed or with changes made for response options and ten additional items added. HSOPSC 2.0 was validated during the revision process [21], but the psychometric qualities across cultures, countries and in different settings need further investigation. Consequently, the overall aim of this study was to investigate the psychometric properties of the HSOPSC 2.0 [21] (see supplement 1) in a Norwegian hospital setting. Specifically, the aims were to 1) assess the psychometrics of the Norwegian version (N-HSOPSC 2.0), and 2) assess the criterion validity of the N-HSOPSC 2.0, adding two more outcomes, namely ‘pleasure at work’ and ‘turnover intention’.

Methods

Design

This study had a cross-sectional design, using a web-based survey solution called “Nettskjema” to distribute questionnaires in two Norwegian hospitals. The study adheres to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement guidelines for reporting observational studies [22].

Translation of the HSOPSC 2.0

We conducted a «forward and backward» translation in line with recommendations from Brislin [23]. First, the questionnaires were translated from English to Norwegian by a bilingual researcher. The Norwegian version was then translated back to English by another bilingual researcher. Thereafter, the semantic, idiomatic and conceptual equivalence between the two versions was compared by the research group, consisting of experienced researchers. The face value of the N-HSOPSC 2.0 version was considered to be adequate, and the items lent themselves well to the corresponding latent concepts.

Piloting

The N-HSOPSC 2.0 was pilot-tested with a focus on content and face validity. Six randomly selected healthcare personnel were asked to assess whether the questionnaire was adequate, appropriate, and understandable regarding language, instructions, and scores. In addition, an expert group consisting of senior researchers (n = 4) and healthcare personnel (n = 6) with competence in patient safety culture was asked to assess the same.

The questionnaire

The HSOPSC 2.0 (supplement 1) consists of 32 items using 5-point Likert-like scales of agreement (from 1 = strongly disagree to 5 = strongly agree) or frequency (from 1 = never to 5 = always), as well as an option for “does not apply/do not know”. The 32 items are distributed over ten dimensions. Additionally, two single-item patient safety culture outcome measures and six background information items are included. The patient safety culture single-item outcome measures evaluate the overall ‘patient safety rating’ for the work area, and ‘reporting patient safety events’.

In addition to the N-HSOPSC 2.0, participants were asked to respond to three questions about their ‘pleasure at work’ (measuring whether staff enjoy their work and are pleased with their work, scored from 1 = never to 4 = always) [24], two questions about their ‘intention to quit’ (measuring whether staff are considering quitting their job, scored on a 5-point Likert scale from 1 = strongly agree to 5 = strongly disagree) [25], as well as demographic variables (gender, age, professional background, primary work area, years of work experience).

Participants and procedure

The data collection was conducted in two phases: the first phase (Nov-Dec 2021) at Hospital A and the second phase (Feb-March 2022) at Hospital B. We used a purposive sampling strategy. At Hospital A (two locations), all employees were invited to participate (N = 6648). This included clinical staff, administrators, managers, and technical staff. At Hospital B (three locations), all employees from the anesthesiology, intensive care and operation wards were invited to participate (N = 655).

The questionnaire was distributed by e-mail, including a link to a digital survey solution delivered by the University of Oslo, and gathered and stored on a safe research platform: TSD (services for sensitive data). This is a service with two-factor authentication, allowing data-sharing between the collaborating institutions without having to transfer data between them. The system allows for storage of indirectly identifying data, such as gender, age, profession and years of experience, as well as hospital. Reminders were sent out twice.

Statistical analyses

Data were analyzed using Mplus. Normality was assessed for each item using skewness and kurtosis, where values between -2 and +2 are deemed acceptable for normal distribution [26]. Missing value analysis was conducted using frequencies, to check the percentage of missing responses for each item. Correlations between dimensions were assessed using Spearman’s correlation analysis and are reported together with Cronbach’s alpha.
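The item screening described above can be illustrated with a minimal sketch in Python. It is not the Mplus procedure used in the study. It assumes population (moment-based) skewness and excess kurtosis, which the text does not specify, and the function name `screen_item`, the use of `None` for a missing response, and the example responses are hypothetical.

```python
def screen_item(values, limit=2.0):
    """Screen one questionnaire item: skewness, excess kurtosis, percent missing.

    `values` is a list of numeric responses in which None marks a missing answer
    (an assumption of this sketch). Moment-based (population) estimators are used.
    """
    observed = [v for v in values if v is not None]
    n = len(observed)
    mean = sum(observed) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in observed) / n
    m3 = sum((x - mean) ** 3 for x in observed) / n
    m4 = sum((x - mean) ** 4 for x in observed) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    missing_pct = 100.0 * (len(values) - n) / len(values)
    return {
        "skewness": skewness,
        "kurtosis": excess_kurtosis,
        "missing_pct": missing_pct,
        # Acceptable when both statistics fall between -limit and +limit.
        "acceptable": abs(skewness) <= limit and abs(excess_kurtosis) <= limit,
    }


# Hypothetical responses to one 5-point item, with one missing answer.
example = screen_item([1, 2, 3, 3, 3, 4, 4, 5, None, 3])
```

Statistical packages often apply small-sample corrections to these estimators, so values from other software may differ slightly from this sketch.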

Confirmatory factor analysis (CFA) was conducted to test the ten-dimension structure of the N-HSOPSC 2.0 using Mplus and Mplus Microsoft Excel Macros. The structure was then tested for fitness using Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA) and Standardized Root Mean Square Residual (SRMR) [27]. Table 1 shows the fitness indices and acceptable thresholds.

Table 1 Fitness indices and acceptable thresholds [27]
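As an illustration of how fit indices are compared against thresholds, the sketch below checks the four indices named above. The cut-offs in `ASSUMED_THRESHOLDS` are commonly cited conventions and are an assumption of this sketch; they are not taken from Table 1, and the names `ASSUMED_THRESHOLDS` and `check_fit` are hypothetical.

```python
# Assumed, commonly cited cut-offs (NOT reproduced from Table 1 of this paper):
# higher is better for CFI and TLI, lower is better for RMSEA and SRMR.
ASSUMED_THRESHOLDS = {
    "CFI": (">=", 0.90),
    "TLI": (">=", 0.90),
    "RMSEA": ("<=", 0.08),
    "SRMR": ("<=", 0.08),
}


def check_fit(indices, thresholds=ASSUMED_THRESHOLDS):
    """Return, per index, whether the value meets its assumed threshold."""
    verdict = {}
    for name, value in indices.items():
        direction, cutoff = thresholds[name]
        verdict[name] = value >= cutoff if direction == ">=" else value <= cutoff
    return verdict


# Fit values as reported in the Results section of this paper.
reported = {"CFI": 0.92, "TLI": 0.90, "RMSEA": 0.045, "SRMR": 0.053}
fit_ok = check_fit(reported)
```

Stricter conventions exist (for example, 0.95 for CFI and TLI), so the verdict depends on which thresholds are adopted.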

Reliability of the 10 predicting dimensions was also assessed using composite reliability (CR) values, where 0.7 or above is deemed acceptable for ascertaining internal consistency [25].
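For readers unfamiliar with CR, a minimal sketch of the usual calculation from standardized factor loadings follows. It assumes uncorrelated item errors, so that each error variance equals one minus the squared loading; the function name and the example loadings are hypothetical and are not values from this study.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized factor loadings.

    CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    with error variance = 1 - loading^2 for each standardized item.
    """
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1.0 - loading ** 2 for loading in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical three-item dimension with equal loadings of 0.7.
cr_example = composite_reliability([0.7, 0.7, 0.7])
meets_criterion = cr_example >= 0.7
```

With few items per dimension, moderate loadings can leave CR close to the 0.7 threshold, which is one reason short dimensions tend to score lower.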

Convergent validity was assessed using the Average Variance Extracted (AVE), where a value of at least 0.5 is deemed acceptable [28], indicating that at least 50 percent of the variance is explained by the items in a dimension. Criterion-related validity was tested using linear regression, adding ‘turnover intention’ and ‘pleasure at work’ to the two single-item outcomes of the N-HSOPSC 2.0.
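The AVE criterion can be sketched in the same way. With standardized loadings, AVE is the mean of the squared loadings; the function name and example loadings below are hypothetical and are not taken from Table 4.

```python
def average_variance_extracted(loadings):
    """AVE for one dimension: mean of the squared standardized loadings."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


# Hypothetical dimension: loadings that look reasonable can still miss 0.5.
ave_example = average_variance_extracted([0.8, 0.7, 0.6])
convergent_ok = ave_example >= 0.5
```

Because AVE averages squared loadings, loadings need to be around 0.71 or higher on average for a dimension to reach 0.5, which is why AVE is usually a harder criterion to satisfy than CR.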

Internal consistency and reliability were assessed using Cronbach’s alpha, where values > 0.9 are considered excellent, > 0.8 good, > 0.7 acceptable, > 0.6 questionable, > 0.5 poor and < 0.5 unacceptable [29].
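A minimal sketch of Cronbach’s alpha and the interpretation bands listed above is given below. It assumes complete data (no missing responses) and sample variances; the function names and the example item scores are hypothetical. A value of exactly 0.5 is not covered by the bands in the text and is treated here as unacceptable.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one dimension.

    `items` is a list of item-score lists, one list per item, all of equal
    length and without missing values (an assumption of this sketch).
    """
    k = len(items)
    # Total score per respondent across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1.0 - sum_item_variances / variance(totals))


def label_alpha(alpha):
    """Map an alpha value to the interpretation bands used in the text."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "acceptable"
    if alpha > 0.6:
        return "questionable"
    if alpha > 0.5:
        return "poor"
    return "unacceptable"


# Hypothetical scores for a two-item dimension answered by four respondents.
alpha_example = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
```

For instance, `label_alpha(0.58)` returns "poor" under these bands.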

Ethical considerations

The study was conducted in line with principles for ethical research in the Declaration of Helsinki, and informed consent was obtained from all the participants [30]. Completed and submitted questionnaires were taken as consent to participate. Data privacy protection was reviewed by the respective hospitals’ data privacy authority, and assessed by the Norwegian Center for Research Data (NSD, project number 322965).

Results

Sample

In total, 1002 participants responded to the questionnaire, representing a response rate of 12.6 percent. As seen in Table 2, 83.7% of the respondents worked in Hospital A and the remaining 16.3% in Hospital B. The majority of respondents (75.7%) were female, and 75.9 percent of respondents worked directly with patients.

Table 2 Sample characteristics

The skewness and kurtosis were between + 2 and -2, indicating that the data were normally distributed. All items had less than two percent of missing values, hence no methods for calculating missing values were used.

Correlations

Correlations and Cronbach’s alpha are displayed in Table 3.

Table 3 Correlations and Cronbach’s alpha values (diagonal)

The following dimensions had the highest correlations: ‘teamwork’, ‘staffing and work pace’, ‘organizational learning-continuous improvement’, ‘response to error’, ‘supervisor support for patient safety’, ‘communication about error’ and ‘communication openness’. Only one dimension, ‘teamwork’ (0.58), had a Cronbach’s alpha below 0.7 (acceptable). Hence, most of the dimensions indicated adequate reliability. Higher levels of the 10 safety dimensions correlated positively with patient safety ratings.

Confirmatory Factor Analysis (CFA)

Table 4 shows the results from the CFA. CFA (N = 1002) showed acceptable fitness values [CFI = 0.92, TLI = 0.90, RMSEA = 0.045, SRMR = 0.053] and factor loadings ranged from 0.51 to 0.89 (see Table 1). CR was above the 0.70 criterion on all dimensions except ‘teamwork’ (0.61). AVE was above the 0.50 criterion except on ‘teamwork’ (0.35), ‘staffing and work pace’ (0.44), ‘organizational learning-continuous improvement’ (0.47), ‘response to error’ (0.47), and ‘communication openness’.

Table 4 Confirmatory factor analysis with standardized factor loadings

Criterion validity

Independent dimensions of HSOPSC 2.0 were employed to predict four different criteria: 1) ‘number of reported events’, 2) ‘patient safety rating’, 3) ‘pleasure at work’, and 4) ‘turnover intentions’. The composite measures explained variance of all the outcome variables significantly thereby ascertaining criterion-related validity (Table 5). Regression models explained most variance related to ‘patient safety rating’ (adjusted R2 = 0.38), followed by ‘turnover intention’ (adjusted R2 = 0.22), ‘pleasure at work’ (adjusted R2 = 0.14), and lastly, number of reported events (adjusted R2 = 0.06).
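For reference, adjusted R2 penalizes the ordinary R2 for the number of predictors in the model. The sketch below shows the standard adjustment; the function name and the example inputs are hypothetical and do not reproduce the models in Table 5.

```python
def adjusted_r2(r2, n_observations, n_predictors):
    """Adjusted R-squared for a linear regression with an intercept.

    adjusted = 1 - (1 - R2) * (n - 1) / (n - p - 1)
    """
    n, p = n_observations, n_predictors
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical model: R2 = 0.50 with 101 observations and 10 predictors.
adj_example = adjusted_r2(0.50, 101, 10)
```

With a large sample relative to the number of predictors, the adjustment is small, so adjusted and unadjusted values will be close.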

Table 5 Regression models to assess the criterion-related validity

Discussion

In this study we have investigated the psychometric properties of the N-HSOPSC 2.0. We found the face and content validity of the questionnaire satisfactory. Moreover, the overall statistical results indicate that the model fit based on CFA was acceptable. Five of the N-HSOPSC 2.0 dimensions had AVE scores below the 0.5 criterion, but we consider this to be the strictest criterion employed in the evaluations of the psychometric properties. The CR criterion was met on all dimensions except ‘teamwork’ (0.61). However, ‘teamwork’ was one of the most important and significant predictors of the outcomes. On the positive side, the CFA results support the dimensional structure of N-HSOPSC 2.0, and the regression results indicate a satisfactory explanation of the outcomes. On the more critical side, AVE scores in particular fell below the 0.5 threshold on five dimensions, indicating that the items also carry a certain level of measurement error.

In our study, regression models explained most variance related to ‘patient safety rating’ (R2 = 0.38), followed by ‘turnover intention’ (R2 = 0.22), ‘pleasure at work’ (R2 = 0.14), and lastly, number of reported events (R2 = 0.06). This supports the criterion validity of the independent dimensions of N-HSOPSC 2.0, also when adding ‘turnover intention’ and ‘pleasure at work’. These results confirm previous research on the original N-HSOPSC 1.0 [12, 13]. The current study also found that ‘number of reported events’ was negatively related to safety culture dimensions, which is also similar to the N-HSOPSC 1.0 findings [12, 13].

The current study included more psychometric assessments than the first Norwegian studies using HSOPSC 1.0 [11,12,13]. However, results from the current study still support the overall reliability and validity of N-HSOPSC 2.0 when comparing the results with the first studies using N-HSOPSC 1.0 [11,12,13]. Also, based on theory and expectations, the dimensions predicted ‘pleasure at work’ and ‘overall safety rating’ positively, and ‘turnover intentions’ and ‘number of reported events’ negatively. The directions of the relations thereby support the overall criterion validity. Some of the dimensions do not predict the outcome variables significantly; nonetheless, each criterion related significantly to at least two dimensions on the HSOPSC 2.0. It is also worth noting that ‘teamwork’ was generally one of the most important predictors, even though this dimension had the lowest convergent validity (AVE) in the previous findings [11,12,13], the strict AVE criterion was not met on the teamwork dimension, and CR was also below 0.7. Since the explanatory power of teamwork was satisfactory, this illustrates that the AVE and CR criteria may be too strict.

The sample in the current study consisted of 1002 employees at two different hospital trusts in Norway and across different professions. The gender and ages are representative of Norwegian health care workers. In total 760 workers had direct patient contact, 167 had not, and 74 had patient contact sometimes. We think this mix is interesting, since a system perspective is key to establishing patient safety [31]. The other background variables (work experience, age, primary work area, and gender) indicate a satisfactory spread and mix of personnel in the sample, which is an advantage since the sample then to a large extent represents typical healthcare settings in Norway.

In the current study, N-HSOPSC 2.0 had higher levels of Cronbach’s alpha than in the first N-HSOPSC 1.0 studies [11, 13], but more in line with results from a longitudinal Norwegian study using the N-HSOPSC 1.0 in 2009, 2010 and 2017 respectively [23]. Moreover, the estimates in the current study reveal a higher level of factor loading on the N-HSOPSC 2.0, ranging from 0.51 to 0.89. This is positive since CFA is a key method when assessing construct validity [16, 17, 32].

AVE and CR were not estimated in the first Norwegian HSOPSC 1.0 studies [11, 13]. The results in this study indicate some issues, particularly regarding AVE (convergent validity), since five of the concepts were below the recommended 0.50 threshold [32]. It is also worth noting that all measures in the N-HSOPSC 2.0, except ‘teamwork’ (CR = 0.61), had CR values above 0.70, which is satisfactory. AVE is considered a strict and more conservative measure than CR. The validity of a construct may be adequate even though more than 50% of the variance is due to error [33]. Hence, some AVE values below 0.50 are not considered critical since the overall results are generally satisfactory.

The first estimate of the criterion-related validity of the N-HSOPSC 2.0 using multiple regression indicated that two dimensions were significantly related to ‘number of reported events’, while six dimensions were significantly related to ‘patient safety rating’. The coefficients were negatively related with number of reported events, and positively related with patient safety rating, as expected. In the first Norwegian study on the N-HSOPSC 1.0 [13], five dimensions were significantly related to ‘number of reported events’, and seven dimensions were significantly related to ‘patient safety ratings’. The relations with ‘numbers of events reported’ were then both positive and negative, which is not optimal when assessing criterion validity. Hence, since all significant estimates are in the expected directions, the criterion validity of N-HSOPSC 2.0 has generally improved compared to the previous version.

In the current study we added ‘pleasure at work’ and ‘turnover intention’ to extend the assessment of criterion related validity. The first assessment indicated that ‘teamwork’ had a very substantial and positive influence on ‘pleasure at work’. Moreover, ‘staffing and work pace’ also had a positive influence on ‘pleasure at work’, but none of the other concepts were significant predictors. Hence, the teamwork dimension is key in driving ‘pleasure at work’, then followed by ‘staffing and working pace’. ‘Turnover intentions’ was significantly and negatively related to ‘teamwork’, ‘staffing and working pace’, ‘response to error’ and ‘hospital management support’. Hence, the results indicate these dimensions are key drivers in avoiding turnover intentions among staff in hospitals. A direct association has been reported between turnover and work strain, burnout and stress [19]. Zarei et al. [20] showed a significant relationship between patient safety (safety climate) and unit type, job satisfaction, job interest, and stress in hospitals. This study also illustrated a strong relationship between lack of personal accomplishment, job satisfaction, job interest and stress. Furthermore, a negative correlation between occupational burnout and safety climate was discovered, where a decrease in the latter is associated with an increase in the former [20]. Hence, patient safety researchers should look at health care job characteristics in combination with patient safety culture.

Assessment of psychometrics must consider other issues beyond statistical assessments, such as theoretical considerations and face validity [16, 17]; we believe one of the strengths of the HSOPSC 1.0 is that the instrument was operationalized based on theoretical concepts. This has been a strength, as opposed to other instruments built on EFA and a random selection of items included in the development process. We believe this is also the case in relation to HSOPSC 2.0; the instrument is theoretically based, easy to understand, and most importantly, can function as a tool to improve patient safety in hospitals. Moreover, when assessing the items that belong to the different latent constructs, item-dimension relationships indicate a high face validity.

Forthcoming studies should consider using the N-HSOPSC 2.0 to predict other outcomes, such as mortality, morbidity, length of stay and readmissions.

Limitations

This study was conducted in two Norwegian public hospital trusts, which limits generalizability. The response rate within hospitals was low and therefore we could not benchmark subgroups. However, this was not part of the study objectives. The response rate may have been hampered by the pandemic and the high workload in the hospitals. However, based on the diversity of the sample, we find the study results robust and adequate to explore the psychometric properties of N-HSOPSC 2.0. For the current study, we did not perform sample size calculations. With over 1000 respondents, we consider the sample size adequate to assess psychometric properties. Moreover, the low level of missing responses indicates that N-HSOPSC 2.0 was relevant for the staff included in the study.

There are many alternative ways of exploring psychometric capabilities of instruments. For example, we did not investigate alternative factorial structures, e.g. hierarchical factorial models, or try to reduce the factorial structure, as has been done with N-HSOPSC 1.0 short [34]. Lastly, we did not try to predict patient safety indicators over time using a longitudinal design and other objective patient safety indicators.

Conclusion

The results from this study generally support the validity and reliability of the N-HSOPSC 2.0. Hence, we recommend that the N-HSOPSC 2.0 can be applied without any further adjustments. However, future studies should potentially develop structural models to strengthen the knowledge of the relationships between the factors included in the N-HSOPSC 2.0/HSOPSC 2.0. Both improvement initiatives and future research projects can consider including the ‘pleasure at work’ and ‘turnover intentions’ indicators, since N-HSOPSC 2.0 explains a substantial level of variance relating to these criteria. This result also indicates an overlap between general pleasure at work and patient safety culture, which is important when trying to improve patient safety.