1 Introduction

The psychosocial working environment has been recognised as an important determinant of health (Landsbergis 2010; Smith et al. 2008). In relation to mental health in particular, aspects of work such as work demands, autonomy, ability to use skills, and organisational justice have been associated with various mental health outcomes (Harvey et al. 2017; Stansfeld and Candy 2006). In 2013, to help address the relationships between the psychosocial work environment and mental health, the Mental Health Commission of Canada and the Canadian Standards Association released the National Workplace Psychological Health and Safety Standard (The Standard) (Canadian Standards Association 2013). The objective of The Standard was to outline the steps workplaces could follow to 'develop and sustain a psychologically healthy and safe workplace' (Canadian Standards Association 2013). To do this, The Standard provides guidance on how to identify and reduce (or control) exposures at work that may be associated with mental health risks, and on how to implement practices, structures and workplace cultures that support and promote psychological health and safety in the workplace (Canadian Standards Association 2013).

The Standard proposes 13 dimensions of the work environment which have the potential to impact worker mental health. These are: psychological support; organisational culture; clear leadership and expectations; civility and respect; psychological job demands; growth and development; recognition and reward; involvement and influence; workload management; engagement; work/life balance; psychological protection from violence, bullying, and harassment; and protection of physical safety. In addition, The Standard suggests ‘other chronic stressors as identified by workers’ should also be addressed. To assess these 13 dimensions, The Standard (Canadian Standards Association 2013), along with subsequently published implementation guides (Canadian Standards Association and Mental Health Commission of Canada 2017), recommends that workplaces use a 65-item survey instrument developed by Guarding Minds @ Work (GM@W). Despite the prominent promotion of the GM@W instrument, very little information is available concerning its psychometric properties. In particular, while it appears that 12 of the 13 dimensions of the work environment identified in The Standard were based on the GM@W instrument, there is a paucity of evidence in the peer-reviewed literature that the theoretical relationship between the 13 dimensions and the 65 items in the GM@W questionnaire is supported by data collected from workers. While employers recognise the importance of the psychosocial work environment, the number of dimensions contained in The Standard and the complexity of those dimensions have been identified as barriers for employers to implement The Standard (Kunyk et al. 2016). One Canadian study has noted that while 17% of employers sampled were aware of The Standard, only 2% of these had implemented The Standard in its entirety, with another 20% implementing some elements of The Standard but not others (Sheikh et al. 2018).

Given the challenges associated with the implementation of The Standard, it is important to understand if the instrument promoted and recommended in The Standard can adequately assess the 13 dimensions it proposes to assess. In addition, it is important to document general information about the distribution of scores across items contained within the GM@W instrument, as well as the distribution of scores for each of the 13 dimensions in a sample of workers across different industries. By generating this evidence, workplaces that opt to implement The Standard can be confident that they have an instrument that can adequately distinguish between particular dimensions of the work environment as outlined in The Standard. This will then enable them to identify specific dimensions, among the 13 assessed, which warrant greater focus in their organisation, and to monitor change in these dimensions over time. To address this important information gap in the development of The Standard, the objective of this paper is to assess the psychometric properties of the 65-item GM@W survey in a sample of employed workers in Ontario. Specifically, we assess the reliability of the items contained within each dimension of the GM@W instrument, the relationships between items contained within the standard, and the discriminant validity of the dimensions within The Standard. Discriminant validity in a multi-dimensional construct, such as The Standard, can be defined as the ability of each dimension to account for more variation in the items associated with it, than it does for measurement error or other dimensions contained within The Standard (Farrell 2010).

2 Methods

In February 2020 a survey was conducted in collaboration with EKOS Research Associates to assess the GM@W survey instrument. Respondents were recruited using a pre-existing panel of over 100,000 households maintained by EKOS, where participants have agreed to participate in surveys from “time-to-time”. This sample has been drawn using both landline and cellular telephones, and the distribution of the target sample is meant to mirror the actual population in Canada (based on Census data). The addition of cell phones in the EKOS sample enables the inclusion of more low-income respondents (Blumberg and Luke 2010; Call et al. 2011). To be eligible to participate, respondents had to be employed in a workplace with 5 or more employees and be able to complete the survey online in English. A total of 1,006 respondents completed the survey, which represents a conservative response rate of 11.8% (given not all respondents approached would have been eligible to participate in the study).

2.1 Guarding minds @ work survey

The GM@W Survey is a 65-item instrument, proposed to capture information on the 13 dimensions of the psychosocial work environment as outlined in The Standard. Each item in the GM@W survey is a positively worded statement about the work environment, with a 4-level agreement scale used as response options (strongly disagree, disagree, agree, strongly agree). Higher scores on each item reflect greater agreement, and therefore a more positive assessment of the work environment. Certain dimensions in the GM@W questionnaire are named differently than the names originally provided in The Standard. These are: ‘Psychological Competencies & Requirements’ for ‘Psychological Job Demands’; ‘Psychological Protection’ for ‘Psychological Protection From Violence, Bullying, and Harassment’; and ‘Balance’ for ‘Work/Life balance’. In this paper we have used the names as provided as part of the GM@W instrument. Items are evenly distributed across dimensions, with each dimension in The Standard being reflected by five items in the GM@W questionnaire. Items from the survey were administered in the same order as they are provided on the GM@W website (Guarding Minds @ Work 2016).

2.2 Analysis

Initial analyses examined the distribution of each of the survey items to identify floor or ceiling effects. While having no more than 15% of the sample in the top or bottom of the response scale has been suggested as a criterion for the absence of floor/ceiling effects (McHorney and Tarlov 1995), given the GM@W survey items are positively worded statements on an agreement scale, we instead used a threshold of having no more than 80% of the sample agreeing or disagreeing with a particular statement as evidence of the absence of floor/ceiling effects (Cadarette et al. 2004). We then examined the correlations between items, both within the same dimension, and between items theoretically influenced by different dimensions. We also examined the distribution of scores across each of the 13 dimensions, to allow comparability between our sample and previous data collected by GM@W. Finally, we examined the factor structure of the GM@W scale, using confirmatory factor analysis (CFA). In a multi-dimensional scale, responses to items assigned to a given theoretical dimension should be most strongly influenced by that latent construct/dimension and be less strongly influenced by other latent constructs/dimensions. As part of the CFA, paths between each of the 13 latent constructs/dimensions proposed as part of The Standard and the 5 items identified to measure that dimension were specified. Paths between dimensions and items not measuring those dimensions were set to zero. Correlations were allowed between each of the 13 dimensions in The Standard, and error correlations were included only between items within the same dimension. The degree to which a proposed theoretical model is supported by the data was assessed using various model fit indices. We used three types of model fit indices: absolute fit; incremental fit; and parsimonious fit (Hatcher 1996; Kline 1998).
Absolute fit is focused on the ability of the proposed model to reproduce the data, and is measured in this paper using the chi-square (χ²) statistic, as a measure of the deviation between the proposed model and the actual correlation or covariance matrix. Incremental fit is concerned with comparing two competing models, with the first usually being a model where no relationships are specified and the comparison being the proposed model. Incremental fit was assessed using the comparative fit index (CFI), and the non-normed fit index (NNFI), also known as the Tucker-Lewis Index. Parsimonious fit assesses the tradeoff between the number of parameters estimated and model fit (noting that a better fit can always be obtained by estimating more parameters). Parsimonious fit was assessed using the Root Mean Square Error of Approximation (RMSEA). We used guidelines proposed by Hu and Bentler (1999) to assess goodness of fit. These are a p-value of greater than 0.05 for the chi-square statistic, values of 0.95 and higher for the CFI and NNFI and values of 0.08 and lower for the RMSEA. A final set of analyses examined the reliability, and convergent and divergent validity of each latent construct using estimates from the CFA procedure. For reliability we used coefficient H (Hancock and Mueller 2001), and for convergent and divergent validity we used the Average Variance Extracted (AVE), Maximum Shared Variance (MSV), and inter-factor correlations (Farrell 2010; Fornell and Larcker 1981). For good reliability all coefficient H values should be 0.8 or above (Hancock and Mueller 2001), AVE values of 0.5 and higher indicate convergent validity, and divergent validity is indicated when the AVE is greater than the MSV, and the square root of the AVE value for a dimension (factor) is higher than correlations between that factor and other factors (Farrell 2010; Fornell and Larcker 1981).
Given items are on a four-point scale, we used a Spearman correlation matrix as the input for the CFA, rather than a Pearson correlation matrix, which assumes normally distributed items. Analyses were conducted in SAS Version 9.4 (The SAS Institute 2017). CFA was conducted using PROC CALIS with a maximum likelihood estimation procedure.
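To make the reliability and validity criteria above concrete, the following is a minimal Python sketch of how coefficient H, the AVE and the MSV can be computed from standardised CFA estimates. It is not the SAS code used in this study. The function names and the example loadings and correlations are hypothetical, and the AVE formula assumes fully standardised loadings with uncorrelated errors.

```python
def coefficient_h(loadings):
    """Coefficient H (Hancock and Mueller 2001) from standardised loadings."""
    # Each item contributes l^2 / (1 - l^2); H = s / (1 + s).
    s = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return s / (1 + s)


def average_variance_extracted(loadings):
    """AVE: mean squared standardised loading of a factor's items."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def maximum_shared_variance(correlations_with_other_factors):
    """MSV: largest squared correlation between this factor and any other."""
    return max(r ** 2 for r in correlations_with_other_factors)


# Hypothetical values for one five-item dimension (not study estimates).
example_loadings = [0.7, 0.7, 0.7, 0.7, 0.7]
example_factor_correlations = [0.9, 0.6]

h = coefficient_h(example_loadings)
ave = average_variance_extracted(example_loadings)
msv = maximum_shared_variance(example_factor_correlations)

print(f"H = {h:.3f}, AVE = {ave:.3f}, MSV = {msv:.3f}")
print("reliability acceptable:", h >= 0.8)
print("convergent validity:", ave >= 0.5)
print("AVE exceeds MSV:", ave > msv)
```

In this hypothetical case a dimension can show acceptable reliability while still failing the AVE and MSV criteria, which is why the three quantities are examined separately.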

3 Results

Of the initial sample of 1,006 respondents, 106 (10.5%) were missing information on one or more of the GM@W survey items, leaving a sample of 900 respondents with complete information on all 65 GM@W questions. Of these 106 respondents the majority (N = 77; 73% of the 106) were missing information on only one of the 65 items. Table 1 presents the distribution of the sample by sex, age, employment status, industry and overall psychological job quality. Respondents with missing data were more likely to be male (compared to female). There was a trend for greater missing data (as assessed by a Mantel-Haenszel Chi-Square test) with older age, and better psychological health and safety climate. No differences were observed in missing data across employment status or industry categories.

Table 1 Distribution of missing data on GM@W questions across sample demographic and work characteristics

The distribution of responses across each of the 65 items in the GM@W measure is provided in the Appendix (Table A1). In general, missingness across items was small (less than 1.3% of respondents for all items). A total of 14 of the 65 items had 80% or more of the sample in the agree or strongly agree category, including 4 out of the 5 items in the engagement dimension, and 3 out of the 5 items in the physical protection & safety dimension, indicating potential ceiling effects.
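The 80% rule applied above can be sketched in a few lines of Python. This is an illustration only, not the code used in the analysis. It assumes responses are coded 1 (strongly disagree) to 4 (strongly agree), that missing responses are stored as None and excluded from the denominator, and that the function name and example data are hypothetical.

```python
AGREE_CODES = {3, 4}      # agree, strongly agree (assumed coding)
DISAGREE_CODES = {1, 2}   # strongly disagree, disagree (assumed coding)


def floor_ceiling_check(responses, threshold=0.80):
    """Flag a possible floor or ceiling effect for one survey item."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no valid responses for this item")
    share_agree = sum(r in AGREE_CODES for r in answered) / len(answered)
    share_disagree = sum(r in DISAGREE_CODES for r in answered) / len(answered)
    return {
        "share_agree": share_agree,
        "share_disagree": share_disagree,
        "possible_ceiling": share_agree >= threshold,
        "possible_floor": share_disagree >= threshold,
    }


# Hypothetical item: 100 valid responses and one missing response.
item = [4] * 50 + [3] * 35 + [2] * 10 + [1] * 5 + [None]
result = floor_ceiling_check(item)
print(result)
```

Running the same check over all 65 items and counting the flagged items gives the kind of summary reported in this paragraph.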

Table 2 presents the average Spearman correlations for the five items within each of the 13 dimensions of the GM@W questionnaire, and the average Spearman correlations between the five items within that dimension and the other 60 items outside of that dimension. In general, correlations were slightly higher with items within the same dimension, than between items outside of that dimension. For one dimension (psychological competencies and requirements), the average correlation between items and other items outside of that dimension was slightly higher than the correlation between items within that dimension. In addition, for three other dimensions (recognition and reward, involvement and influence and workload management) correlations within and outside of the dimension were similar. Table 3 provides examples of four pairs of items where the Spearman correlations were noted as high. In each case, the wording of the item seems similar, despite the items theoretically being influenced by different dimensions of The Standard.
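The within-dimension and outside-dimension averages described above can be illustrated with a short, dependency-free Python sketch. It is not the authors' code. The helper names, the toy item labels and the toy responses are hypothetical, and Spearman's coefficient is computed here as a Pearson correlation on average ranks (which handles the ties that a four-point scale produces).

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def within_and_outside(data, dimension_of, dimension):
    """Average correlation among a dimension's items, and with all other items."""
    inside = [item for item in data if dimension_of[item] == dimension]
    outside = [item for item in data if dimension_of[item] != dimension]
    within = mean(spearman(data[a], data[b]) for a, b in combinations(inside, 2))
    between = mean(spearman(data[a], data[b]) for a in inside for b in outside)
    return within, between


# Toy data: two hypothetical dimensions with two items each, four respondents.
data = {
    "a1": [1, 2, 3, 4],
    "a2": [1, 3, 2, 4],
    "b1": [2, 1, 4, 3],
    "b2": [2, 1, 3, 4],
}
dimension_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

within_a, outside_a = within_and_outside(data, dimension_of, "A")
print(f"within = {within_a:.2f}, outside = {outside_a:.2f}")
```

With the full instrument, each dimension contributes 10 within-dimension item pairs and 300 pairs with the 60 items outside the dimension.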

Table 2 Average Spearman correlations between items within the same dimension, and average Spearman correlations between items within a dimension with items outside of that dimension. Employed Ontario respondents working in workplaces with 5 or more employees
Table 3 Examples of highly correlated items, across dimensions

Figure 1 presents the distribution of scores across each of the 13 dimensions and overall, using the four categories suggested by GM@W. Scores between five and nine are labelled ‘Serious Concerns’, scores of 10 through 13 are ‘Significant Concerns’, scores of 14 to 16 are ‘Minimal Concerns’ and scores of 17 through 20 are ‘Relative Strengths’. Dimensions with the most positive profile (greatest percentage with relative strengths) were engagement and protection of physical safety, while the dimensions with the most negative profiles (highest percentage in serious concerns) were organisational culture, clear leadership and expectations, and psychological protection.
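The score bands described above can be expressed as a small lookup. The sketch below is illustrative only. It assumes a dimension score is the unweighted sum of five items coded 1 to 4 (which is what the 5 to 20 range implies), and the function names are hypothetical.

```python
def dimension_score(item_responses):
    """Sum five item responses coded 1-4 into a dimension score (5-20)."""
    if len(item_responses) != 5:
        raise ValueError("each dimension is expected to have five items")
    if any(r not in (1, 2, 3, 4) for r in item_responses):
        raise ValueError("responses must be coded 1 to 4")
    return sum(item_responses)


def gmw_category(score):
    """Map a dimension score to the four categories described in the text."""
    if not 5 <= score <= 20:
        raise ValueError("dimension scores range from 5 to 20")
    if score <= 9:
        return "Serious Concerns"
    if score <= 13:
        return "Significant Concerns"
    if score <= 16:
        return "Minimal Concerns"
    return "Relative Strengths"


# Hypothetical respondent answering one dimension's five items.
score = dimension_score([3, 3, 4, 3, 3])
print(score, gmw_category(score))
```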

Fig. 1 Distribution of scores across each of the 13 dimensions contained within the GM@W questionnaire (N = 900)

Table 4 presents the model fit statistics from two confirmatory factor analysis models. The first is a model with each item influenced by its theoretical dimension, and covariances specified between each of the 13 dimensions (latent constructs). The second model includes two correlated error terms for items within a particular dimension, based on modification indices. Due to these correlated errors the second model has two fewer degrees of freedom. In general, both models had challenges with convergence due to the predicted covariance matrix not being positive-definite. Further investigation suggested this was due to the latent constructs being highly correlated with each other. The degree to which the theoretical model from the GM@W questionnaire was supported by the data was poor. All fit indices were below recommended cut-offs, although the upper bound of the RMSEA was below the 0.08 threshold. It should be noted that the absolute fit indices (chi-square statistic) are also strongly influenced by sample size (Kline 1998). In the CFA model with the two correlated errors (GM@W (mod), the second model referred to above), 18 of the 78 correlations between the 13 latent factors were 0.95 and higher, and only one (between engagement and physical safety) was below 0.60.
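The count of 78 inter-factor correlations, and the two-degree-of-freedom difference between the models, follow from simple parameter counting. The Python sketch below illustrates this arithmetic under one common identification assumption (each factor's variance fixed to 1, so every loading is free); it is not output from PROC CALIS, and Table 4 remains the authoritative source for the degrees of freedom actually reported. The function name is hypothetical.

```python
def cfa_degrees_of_freedom(n_items, n_factors, n_correlated_errors=0):
    """Degrees of freedom for a simple-structure CFA with correlated factors.

    Assumes each item loads on exactly one factor and that factor variances
    are fixed to 1 for identification (an assumption of this sketch).
    """
    unique_moments = n_items * (n_items + 1) // 2
    loadings = n_items
    residual_variances = n_items
    factor_covariances = n_factors * (n_factors - 1) // 2
    free_parameters = (
        loadings + residual_variances + factor_covariances + n_correlated_errors
    )
    return unique_moments - free_parameters


n_factor_correlations = 13 * 12 // 2
df_original = cfa_degrees_of_freedom(65, 13)
df_modified = cfa_degrees_of_freedom(65, 13, n_correlated_errors=2)

print("inter-factor correlations:", n_factor_correlations)
print("df (no correlated errors):", df_original)
print("df (two correlated errors):", df_modified)
```

Each correlated error term is one additional free parameter, so adding two of them reduces the degrees of freedom by two.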

Table 4 Model fit statistics from confirmatory factor analysis of GM@W questionnaire

Table 5 presents estimates for coefficient H reliability, average variance extracted and maximum shared variance for each of the 13 dimensions from the original model, with no correlated errors. Estimates for the model with correlated error terms were not appreciably different so are not presented, but are available from the authors on request. In general, the reliability of items within each dimension exceeded acceptable levels. The AVE, which represents how much of the variance in the items for each construct can be explained by the latent factor, was below 0.5 for three of the 13 dimensions, and above 0.6 for only 2 of the 13 dimensions. While it is recommended that the AVE should be greater than the MSV (which represents the maximum variance that the latent construct can explain in other latent constructs), this was not the case for any of the 13 dimensions.

Table 5 Coefficient H (reliability), Average Variance Extracted and Maximum Shared Variance for dimensions of the GM@W Questionnaire among Employed Ontario respondents working in workplaces with 5 or more employees

Table 6 presents the correlation matrix between the 13 dimensions from the original model, and the square root of the AVE. For discriminant validity the square root of the AVE for a dimension should exceed each of the correlations between that dimension and other dimensions. As demonstrated by the grey shading, this criterion was met in nine of the 78 correlations.
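The square-root-of-AVE criterion can be checked pair by pair, as in the following Python sketch. It is illustrative only. The AVE values and correlations are hypothetical rather than taken from Table 6, the function name is an assumption, and a pair is counted as meeting the criterion only when the square root of the AVE of both dimensions exceeds their correlation (one possible reading of the criterion).

```python
from itertools import combinations
from math import sqrt


def fornell_larcker_pairs(ave, correlations):
    """Return the dimension pairs whose correlation is below both sqrt(AVE)s.

    ave: dict mapping dimension name -> AVE
    correlations: dict mapping frozenset({a, b}) -> inter-factor correlation
    """
    met = []
    for a, b in combinations(sorted(ave), 2):
        r = abs(correlations[frozenset((a, b))])
        if sqrt(ave[a]) > r and sqrt(ave[b]) > r:
            met.append((a, b))
    return met


# Hypothetical estimates for three dimensions (not study results).
ave = {"engagement": 0.55, "balance": 0.50, "culture": 0.60}
correlations = {
    frozenset(("engagement", "balance")): 0.62,
    frozenset(("engagement", "culture")): 0.80,
    frozenset(("balance", "culture")): 0.95,
}

pairs_meeting_criterion = fornell_larcker_pairs(ave, correlations)
print(pairs_meeting_criterion)
print(len(pairs_meeting_criterion), "of", len(correlations), "pairs meet the criterion")
```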

Table 6 Square root of the average variance extracted and inter-factor correlations for dimensions of the GM@W questionnaire among Employed Ontario respondents working in workplaces with 5 or more employees

4 Discussion

Psychological health and safety is gaining increasing attention in workplaces across Canada and in other developed economies globally. The 2013 National Workplace Psychological Health and Safety Standard, produced by the Canadian Standards Association and commissioned by the Mental Health Commission of Canada (Canadian Standards Association 2013), recommends that workplaces looking to assess and address the 13 dimensions of the work environment as outlined in The Standard should use the 65-item GM@W questionnaire. Yet, to date, limited information is available on the psychometric properties of the GM@W questionnaire, and on whether it can measure, and distinguish between, the 13 dimensions of the work environment as outlined in The Standard. In a sample of 900 Ontario workers, from a variety of industries, we observed numerous measurement issues in the GM@W survey, specifically concerning highly correlated items which are theoretically influenced by different dimensions, and an inability of the items within the GM@W survey to isolate particular dimensions from The Standard. In general, dimensions of The Standard measured with the GM@W survey demonstrate poor discriminant validity. That is, the ability of each dimension to explain the variance in the items associated with that dimension is lower than the ability of that dimension to explain variance in other dimensions. These limitations are important, as workplaces using the GM@W survey will not be able to isolate the different dimensions as outlined in The Standard, or assess changes in particular dimensions over time.

There are strengths and limitations which should be taken into account when interpreting our study findings. The response rate to our survey was low (approximately 12% of respondents in the EKOS panel who were approached to complete the survey). Compared to estimates from the employed labour force in Ontario from Statistics Canada’s Labour Force survey in February 2020, our sample had a higher proportion of females (54% versus 50%), fewer respondents under the age of 25 (4% versus 14%), slightly more permanent full-time workers (80% versus 76%), fewer respondents from the construction industry (3% versus 6%) and more respondents from the education (16% versus 9%) and public administration (14% versus 6%) industry groups. However, we were able to recruit workers across a variety of industry groups, age groups, with variation in assessments of the psychosocial work environment (22% of our sample described the psychological health and safety climate in their workplace as healthy or supportive, and 14% described their psychological health and safety climate as poor or toxic). We can compare our results to those previously published from a survey conducted in 2012 commissioned by The Great-West Life Centre for Mental Health in the Workplace. In this survey of 6,624 Canadian workers, using a different household panel, the most positively rated dimensions in The Standard were engagement, followed by protection of physical safety, while the least positively rated dimensions were organisational culture, growth and development, psychological support and psychological protection (Great-West Life Centre for Mental Health in the Workplace 2012). In our sample the same two dimensions were rated most positively, and organisational culture and psychological protection were amongst the least favourable. 
The levels of serious and significant concerns were similar, although slightly higher in our sample, compared to this earlier survey (Great-West Life Centre for Mental Health in the Workplace 2012). It is possible that the level of psychosocial work conditions that someone experiences can be related to their propensity to respond to a survey. However, it is likely this can work in both directions (i.e. people with more negative conditions may be more likely to respond, or people with more positive conditions may be more likely to respond). Given no information on the GM@W standard is available from a representative sample of the Canadian population, it is not possible for us to estimate in which direction this bias might operate in our sample (if at all). If the differences in serious and significant concerns in our sample, compared to the previous Great-West Life Centre sample, are indicative that people with more negative psychosocial work environments were more likely to respond to our survey, the potential ceiling effects in our sample may be underestimated. We used a Spearman correlation matrix to account for the non-normal distribution of responses across items in our CFA. We did compare these results to other potential approaches, such as a polychoric correlation matrix. In general, CFA model fit statistics when using a polychoric correlation matrix were worse than when using the Spearman correlation matrix (results not presented but available from authors on request). In addition, we have not assessed all psychometric attributes of the GM@W survey. We had originally also intended to conduct a test-retest assessment of each of the 65 items as part of the survey. However, the timing of the retest assessment was during the initial workplace closures in Ontario due to COVID-19; as such, we did not have enough respondents for whom the psychosocial work environment had been stable between test and retest.

Comparing our results to previous examinations of the GM@W survey is challenging, given the limited number of peer-reviewed publications on this instrument. We do not think the poor model fit statistics observed in our study are specific to the EKOS panel or the Canadian labour force. For example, previous CFA examinations of the 19 dimensions contained within the Copenhagen Psychosocial Environment Questionnaire (COPSOQ), using the same panel for recruitment, have demonstrated good model fit (Ramkissoon et al. 2019). In addition, the high correlations between dimensions captured in the GM@W survey have been noted in a report from a Portuguese sample (Magalhães and Paul 2017), while the inability of the GM@W survey to reflect the 13 dimensions of The Standard has also been noted among a Canadian sample in a conference abstract (MacLellan et al. 2016).

Understanding the implications of poor model fit and lack of discriminant validity for the GM@W survey instrument for the use of The Standard in general is complex. Although it is implied in The Standard that the 13 dimensions of the psychosocial work environment were informed by other available psychosocial models, in particular the demand-control model (Karasek et al. 1998; Karasek and Theorell 1990), the effort-reward imbalance model (Siegrist 1996), and the model of organisational justice (Elovainio et al. 2002), it is also stated that 12 of the 13 dimensions were those that were part of the GM@W instrument at that time. Further work should examine whether the extreme overlap between the 13 dimensions measured in our study reflects poor measurement development of the items contained within the GM@W survey instrument, or a general lack of conceptual clarity in the development of the 13 dimensions in the actual Standard.

In conclusion, the results of our paper suggest caution is required if using the GM@W survey to assess the 13 dimensions of the psychosocial work environment as outlined in The Standard. Based on our results, the 13 dimensions as measured by the GM@W instrument are highly overlapping, and cannot be distinguished using the 65-item survey. Given the importance of the psychosocial work environment in contemporary labour markets and workplaces, it is important that progressive workplaces looking to address dimensions of the psychosocial work environment are provided with instruments with demonstrated psychometric properties. This is particularly important for the psychosocial work environment as there are a number of dimensions that a workplace could target, and due to resource constraints, workplaces might choose to focus on particular dimensions at different times. The inability of the GM@W instrument to discriminate between dimensions presents a challenge for workplaces in this situation, both in terms of assessing their current psychosocial work environment, and measuring progress in addressing particular dimensions. We would suggest that alternative instruments, where psychometric attributes have already been demonstrated, should be recommended in future versions of The Standard, with appropriate concurrent changes made to the names and number of psychosocial dimensions.