Introduction

Over the past twenty years, the cost of serious work-related musculoskeletal disorders (WMSDs) in Australia has increased by 58%, whilst time lost from the same disorders increased by 40% [1]. Globally, 26.44 million disability-adjusted life years are caused by work-related injuries [2].

Modifiable factors that have been shown to be associated with outcomes of WMSDs include workplace cultural factors [3], social determinants [3, 4], psychosocial factors [4,5,6,10], physical and psychological demands of a job [3, 7], underlying health factors [8, 9], worker expectations [4, 5], self-efficacy [5] and job dissatisfaction [4, 6]. Unfortunately, many of these modifiable factors are not collected in workers’ compensation claims databases, which are the key resources traditionally used for secondary research into outcomes of WMSDs [4, 11,12,13,14]. Details of health plans and clinical interventions often cannot be obtained for analysis [15]. Longitudinal primary data collection is therefore likely to be required to address these gaps [16].

Electronic health records (EHRs) are digital versions of patients’ clinical notes. Because they can collect longitudinal data on patients, they are increasingly being used for research. EHR data have only recently started to be used in WMSD research [10, 17] and provide an opportunity to use real-world data in cohort studies and randomized controlled trials or to build predictive models. Reuse of health care data is seen as valuable to address health care and research needs [18]. EHRs capture many of the same demographic and injury details that are typically collected in claims databases, but additionally provide much richer data on patient care, modalities and interventions, underlying health and psychosocial factors, and other work variables. Deidentified EHR data therefore offer an opportunity to further research into WMSD outcomes.

With demonstrated benefits of musculoskeletal (MSK) therapists working onsite in occupational health settings [19], there is the additional opportunity to collect data earlier, from the first signs of pain or dysfunction, rather than at the time of an injury claim. EHR studies could therefore fill gaps in what is known about WMSDs between the time they occur and the time of claim [4, 10], or analyse what factors lead to claims, since patients are often only included in studies if they have been completely off work for a period of time. Many WMSDs do not lead to a workers’ compensation claim or time off work, yet may still carry large costs through lost productivity and work ability. WMSDs that do not initially involve a workers’ compensation claim can become serious claims if factors affecting recovery are not addressed effectively. EHRs may provide early data on these pre-claim factors.

Due to the variety of MSK practitioners working in diverse settings and a lack of interoperable EHR systems, there are not yet consistent datasets from these practitioners that can be used for research into modelling outcomes of WMSDs. In addition, EHR data are often recorded as free text, which makes extracting meaningful data challenging because detailed chart review is required.

A specific criticism of many predictive models using EHR data is their limited relevance outside the setting in which they were created [20]. Models also have the potential to be biased with respect to demographic groups, particularly minority groups, which can put patient health outcomes at risk [21]. It is therefore important to validate EHR data externally against existing reference datasets [22] to determine whether the EHR dataset is relevant, and to which populations it is generalizable [23, 24], outside the clinics where it was collected.

The data-generating organization (DGO) is a national onsite occupational health service operating in the private sector in Australia that has been collecting structured EHR data on MSK disorders for over 15 years. The DGO employs chiropractors, physiotherapists and osteopaths to treat workers in various industries. The DGO’s clients are workplaces that typically have high job demands or repetitive work, and the DGO operates in a value-based care model in which WMSD outcomes are closely monitored. The EHR collects data on patient care and modalities used, as well as extensive workplace psychosocial and cultural information relevant to workers’ recovery from injuries. The service collects information from the first presentation of an injury and encourages early reporting of issues, which may provide a greater understanding of early injury progression.

The EHR data could be valuable for future research or modelling, such as predicting outcomes of WMSDs, if the dataset is relevant outside its current setting and broadly representative of national WMSD patterns. Hence, the aim of this study is to determine whether the EHR dataset contains a population of workers similar to the Australian workforce and injury characteristics similar to those in workers’ compensation claims data. The EHR dataset has been reused for internal and external reporting of WMSD outcomes for over 15 years. Internally, it provides analytics that help clinicians track outcomes, such as the number of visits needed to resolve shoulder versus elbow injuries; externally, it supports reporting to workplaces on injury trends, such as tracking mechanism of injury across departments or relating workplace cultural factors to the cost of injury. This reporting led to the hypothesis that the EHR dataset contains workforce data similar to the Australian workforce and WMSD characteristics similar to Australian workers’ compensation claims for MSK injuries. Describing the EHR dataset will provide evidence of its usefulness and generalizability for further research into WMSDs and their outcomes.

Approval was obtained from the ethics committee of Central Queensland University, CQUHREC #0000023392.

Methodology

Study Design

Data should be assessed against the purpose for which they are to be reused, which in this instance is determining the relevance of the EHR dataset outside its setting. Determining the relevance of the data is an important first step to ensure that any further research or models built from the data are generalizable. The EHR dataset has not yet been used for research, nor is it linked to existing registry datasets, such as workers’ compensation or occupational health registries. To determine the relevance of the EHR dataset for further research, Kahn’s harmonized terminology and framework for the secondary use of EHR data [22] were applied where appropriate. The framework addresses intrinsic data parameters, including completeness and plausibility, and verifies these parameters against external sources or gold standards. Whilst a full assessment of the EHR dataset against this framework is outside the scope of this study, the relevant parts of the framework were applied: the completeness of each variable was assessed, implausible values were identified, and the relevance of the EHR dataset for further study was verified against the Australian Bureau of Statistics (ABS) and Safe Work Australia (SWA) datasets.

Data Source

The EHR dataset is derived from an EHR used by musculoskeletal practitioners to treat workers with MSK disorders at occupational clinics across Australia. The EHR is proprietary software built specifically to manage WMSDs. The data used for the analysis span July 2014 to September 2021, following significant upgrades to the EHR in 2014. There are 57,570 musculoskeletal disorder records available for analysis from 20,663 unique patients seen across 10 industries and 101 sites; one patient may have suffered multiple WMSDs within the time frame. The data come from seven of the eight Australian states and territories, in both rural and metropolitan settings. There are data from 59 chiropractors, eight physiotherapists and osteopaths. The EHR is highly structured, with minimal free text and many mandatory fields.

The musculoskeletal disorders (MSDs) contained in the dataset are those within the scope of practice of chiropractors, physiotherapists and osteopaths. Specifically, this excludes traumatic injuries requiring hospitalization and surgery, such as compound fractures. Non-musculoskeletal disorders, such as respiratory and infectious diseases, are also not managed at the clinics. The EHR dataset records visits to the onsite health clinic within a workforce and captures data on all MSK disorders presented, both work-related and non-work-related, together with patients’ health conditions.

To protect patient privacy, the EHR dataset was deidentified by the DGO using globally unique identifier protocols and then extracted by the organization from the relational database to a separate, secure location for analysis, as described in the TRANSFoRm Zone Model, which sets out a process for managing data flow, privacy and confidentiality of personal patient data in research datasets [25]. To further protect privacy, any potentially identifying information not required for the analysis, such as free text, was excluded from the dataset prior to extraction. Access to the data required a secure login. Raw data were acquired in .csv format.

To describe the EHR dataset and help determine its feasibility for further understanding workplace injuries treated onsite by musculoskeletal practitioners, two existing datasets were identified as criterion datasets to help determine the relevance of the EHR dataset outside its current setting: the Australian Bureau of Statistics (ABS) Labour Force Survey data and the Safe Work Australia (SWA) workers’ compensation claims data. Both criterion datasets contain aggregated and deidentified data.

The ABS Labour Force Survey data provide information about the labour market for Australian residents over 15 years of age and were used to determine the similarities between the Australian workforce and the workforce represented in the EHR dataset. They were chosen as they form the largest available dataset estimating Australian workforce characteristics. The variables assessed were industry, gender, duration of employment, age and nationality. These are key population characteristics that allow similarities between the datasets to be determined. The ABS and EHR datasets have different purposes: the ABS survey collects data about people in the workforce, whereas the EHR dataset records data on people in the workforce who visit a health clinic for help with an MSK condition. ABS data were accessed through the publicly available database on the ABS website.

Safe Work Australia (SWA) collects data from all workers’ compensation claims lodged across Australia and is therefore the most appropriate dataset for determining similarities to the EHR dataset. SWA data were obtained for 1 July 2014 to 30 June 2020. Because the SWA dataset contains only work-related conditions, non-work-related MSK disorders were removed from the EHR dataset for this analysis. The SWA data, obtained from SWA by request as publicly available data, were used to compare mechanism of injury, diagnosis and body region with the EHR dataset. The comparison with the SWA dataset is important because the EHR dataset might contain only minor injuries that are never serious enough for a worker to take time off work or lodge a workers’ compensation claim, or might capture only a small proportion of injuries within the specialty of the treating practitioners. The setting of the onsite clinics may also influence the types of injuries seen, limiting the usefulness of the dataset for hospital or medical practice settings.

Variable Selection

Potential variables were identified through a literature review of variables associated with outcomes of WMSDs. The EHR subject matter experts were then consulted to determine further potential variables for analysis as outlined by Steyerberg [26]. From these, variables were selected based on the availability of overlapping variables in the EHR dataset and the ABS and SWA datasets. Whilst many other potential variables were present in the EHR dataset, the analysis was limited to those that could be compared to the external datasets.

The variables included were industry, age, gender, duration of employment and region of birth, which were assessed for similarity to the ABS data, and mechanism of injury, body location and diagnosis, which were assessed for similarity to the SWA data. Geographical data were not available in the EHR dataset due to deidentification processes.

Data Standardization

Dates for analysis of the EHR dataset were limited to the date ranges available in the criterion datasets, from 2014 to 2021.

The EHR dataset contains predominantly (99.3%) records of full-time workers; the ABS dataset was therefore limited to full-time workers for the analysis of workplace demographics. The SWA dataset obtained did not contain details of employment status, so all employment statuses were included.

Industry was grouped by the Australian and New Zealand Standard Industrial Classification codes [27] in all three datasets, so no further standardization was required. Industry data were split into manufacturing and non-manufacturing. Age and duration of employment were grouped into categories using the groupings used by the ABS. Duration of employment recorded in the EHR dataset has some known data quality issues: for a period, the EHR system rules set the date of employment to the date of the first appointment by default for patients new to the service. These records were identified and recorded as implausible values.
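As an illustration of the grouping and plausibility rule described above, the following Python sketch bins ages and durations of employment into categories and sets aside durations where the employment start date equals the first appointment date. The bin edges, function names, the choice of the first appointment as the reference date, and the 365.25-day year are all assumptions made for illustration; they are not the exact ABS groupings or the DGO’s implementation, and the study itself used R.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Hypothetical ABS-style bin edges; the study's exact ABS groupings are not reproduced here.
AGE_EDGES = [15, 20, 25, 35, 45, 55, 65]
AGE_LABELS = ["<15", "15-19", "20-24", "25-34", "35-44", "45-54", "55-64", "65+"]

DURATION_EDGES = [1, 2, 5, 10]  # years (hypothetical bands)
DURATION_LABELS = ["<1 yr", "1-<2 yrs", "2-<5 yrs", "5-<10 yrs", "10+ yrs"]


def age_group(age: int) -> str:
    """Return the age band that contains `age` (left-closed bands)."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def duration_is_implausible(employment_start: date, first_appointment: date) -> bool:
    """Flag the known system default: employment start set to the first appointment date."""
    return employment_start == first_appointment


def duration_group(employment_start: date, first_appointment: date) -> Optional[str]:
    """Return the duration-of-employment band, or None when the value is implausible.

    Measuring duration up to the first appointment is an assumption for this sketch.
    """
    if duration_is_implausible(employment_start, first_appointment):
        return None  # excluded from the duration-of-employment comparison only
    years = (first_appointment - employment_start).days / 365.25
    return DURATION_LABELS[bisect_right(DURATION_EDGES, years)]
```

For example, `age_group(34)` returns `"25-34"`, and a record whose employment start and first appointment fall on the same day yields `None`, so it can be dropped from the duration comparison while still contributing to the other variables.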

Data for nationality were aligned between the EHR and ABS datasets, although EHR subject matter experts advised that further standardization was required. First, practitioners at the DGO report that they often record a worker’s country of origin or cultural background in the nationality field, as this information is more clinically relevant than nationality, which leads to some inaccuracies in the nationality data. Second, prior to 2020 the EHR software defaulted nationality to “Australia” unless the practitioner changed it, producing incorrect data and leaving only a limited dataset for analysis. For these reasons, nationality was grouped into the broader ABS category of region of birth.

Mechanism of injury is recorded in the EHR as a mandatory field with several list options. These options did not align completely with those of the SWA dataset, because the SWA dataset records non-MSK and traumatic injuries that are not seen at the DGO’s clinics; mechanism of injury was therefore not expected to be similar between the datasets. Non-MSK disorders were excluded from the SWA dataset, and the WMSDs used for analysis were those with a nature of injury/disease of “Traumatic joint/ligament and muscle/tendon injury” or “Musculoskeletal and connective tissue diseases”. The mechanism of injury in the remaining SWA records was then dichotomized into ‘body stressing’ or ‘non-body stressing’.
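A minimal sketch of this SWA preparation step is shown below, assuming each claim is a dictionary with hypothetical `nature` and `mechanism` keys and that the SWA mechanism label is the string "Body stressing". Claims outside the two MSK nature-of-injury groups are dropped, and the remainder are dichotomized and counted. The example records are illustrative only, not SWA data.

```python
from collections import Counter
from typing import Iterable, Mapping

# The two SWA nature-of-injury/disease groups retained as WMSDs.
MSK_NATURES = {
    "Traumatic joint/ligament and muscle/tendon injury",
    "Musculoskeletal and connective tissue diseases",
}


def dichotomize_mechanisms(claims: Iterable[Mapping[str, str]]) -> Counter:
    """Count MSK claims as 'body stressing' or 'non-body stressing'."""
    counts: Counter = Counter()
    for claim in claims:
        if claim["nature"] not in MSK_NATURES:
            continue  # non-MSK claims are excluded before dichotomizing
        if claim["mechanism"] == "Body stressing":
            counts["body stressing"] += 1
        else:
            counts["non-body stressing"] += 1
    return counts


# Illustrative records with hypothetical field values.
claims = [
    {"nature": "Musculoskeletal and connective tissue diseases", "mechanism": "Body stressing"},
    {"nature": "Traumatic joint/ligament and muscle/tendon injury", "mechanism": "Other"},
    {"nature": "Other (non-MSK)", "mechanism": "Other"},
]
counts = dichotomize_mechanisms(claims)
```

Here the third record is filtered out, leaving one claim in each category; the percentages compared with the EHR dataset would then be computed from these counts.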

Body location data from SWA mapped directly to the EHR data. Both the EHR and SWA datasets have a large list of potential diagnoses, making standardization difficult. SWA uses an Australian coding system based on the International Statistical Classification of Diseases and Related Health Problems (ICD), whereas the EHR diagnosis list is derived from conditions seen in the clinic and is not currently aligned with ICD or other coding systems. Six diagnoses were selected for analysis because they have tighter diagnostic criteria and are more likely to have a pathoanatomical diagnosis and objective findings, rather than being pain-based conditions that may not display pathoanatomical changes and have more subjective diagnostic criteria, such as trigger points or lumbago.

Data Analysis

Completeness of variables was assessed using R statistical software v4.3.2 and reported as missingness. Potential reasons for missing values were summarized after consultation with EHR subject matter experts, who were senior clinical leaders with experience in health informatics. Implausible values analysis was conducted to identify values that are outlying or likely to be incorrect based on the local knowledge of the EHR subject matter experts, usually through distribution analysis and an understanding of potential system flaws and areas of potential clinician misuse of the EHR system.
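The completeness and plausibility checks can be sketched in Python as follows; this is not the authors’ code, which was written in R. The sketch assumes records are dictionaries in which a missing value is `None` or an empty string, and the age range used in the example is a hypothetical bound of the kind subject matter experts might set.

```python
from typing import Dict, List, Mapping, Sequence


def missingness(records: Sequence[Mapping[str, object]], variables: List[str]) -> Dict[str, float]:
    """Percentage of records with a missing (None or empty) value for each variable."""
    if not records:
        raise ValueError("no records supplied")
    result = {}
    for var in variables:
        missing = sum(1 for r in records if r.get(var) in (None, ""))
        result[var] = 100.0 * missing / len(records)
    return result


def out_of_range(records: Sequence[Mapping[str, object]], var: str,
                 low: float, high: float) -> List[int]:
    """Indices of records whose numeric value for `var` lies outside [low, high]."""
    flagged = []
    for i, r in enumerate(records):
        value = r.get(var)
        if isinstance(value, (int, float)) and not low <= value <= high:
            flagged.append(i)
    return flagged


# Hypothetical records for illustration only.
records = [
    {"age": 34, "industry": "Manufacturing"},
    {"age": 52, "industry": ""},
    {"age": 140, "industry": "Manufacturing"},
    {"age": 19, "industry": None},
]
report = missingness(records, ["age", "industry"])
flagged = out_of_range(records, "age", 14, 80)
```

A simple range check like this only catches values outside fixed bounds; rules such as the employment-date default still need the subject matter knowledge described above.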

For each EHR variable, the percentage of records in each category was calculated with confidence intervals [23]. Confidence intervals were used to indicate the precision and stability of the results, and therefore the confidence that can be placed in the hypothesis.
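The source does not state which interval method was used, so the following sketch assumes a 95% Wilson score interval for the percentage of records in a category; a Wald (normal approximation) interval would be a simpler alternative. The function name and the example counts are hypothetical.

```python
from math import sqrt
from typing import Tuple


def percentage_with_ci(count: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (percentage, lower, upper) using the Wilson score interval."""
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("need total > 0 and 0 <= count <= total")
    p = count / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    lower = max(0.0, centre - half_width)  # clamp against rounding below 0
    upper = min(1.0, centre + half_width)  # clamp against rounding above 1
    return 100 * p, 100 * lower, 100 * upper


pct, lo, hi = percentage_with_ci(50, 100)
print(f"{pct:.1f}% (95% CI {lo:.1f}-{hi:.1f}%)")
```

The Wilson interval is used here because it stays within 0% to 100% and behaves better than the Wald interval for small categories, which matters for low-frequency diagnoses.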

The EHR dataset contains multiple WMSDs from a single worker over their working career. The ABS and SWA datasets can also contain the same person multiple times, for example a worker with two workers’ compensation claims. For this reason, it was more relevant to assess the EHR dataset at the WMSD event level rather than the person level.
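The choice of unit matters because workers with repeated WMSDs contribute more than once at the event level. The hypothetical example below, with made-up patient identifiers and body regions, shows how the same records give different shares at the event and person levels.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (patient_id, body_region); hypothetical structure


def event_level_share(events: List[Event], region: str) -> float:
    """Percentage of WMSD events recorded in `region`."""
    return 100.0 * sum(1 for _, r in events if r == region) / len(events)


def person_level_share(events: List[Event], region: str) -> float:
    """Percentage of patients with at least one WMSD event in `region`."""
    patients = {pid for pid, _ in events}
    affected = {pid for pid, r in events if r == region}
    return 100.0 * len(affected) / len(patients)


events = [("p1", "upper limb"), ("p1", "upper limb"), ("p1", "upper limb"), ("p2", "lower limb")]
print(event_level_share(events, "upper limb"))   # 75.0
print(person_level_share(events, "upper limb"))  # 50.0
```

Patient p1’s three upper limb events dominate the event-level share, which is the kind of overrepresentation a person-level sensitivity analysis would reveal.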

The mean difference between the percentages of records for each variable was recorded to illustrate the differences between the datasets. Confidence intervals for the mean differences were not considered appropriate, because differences are expected given that each dataset has a different primary purpose. For reporting, variables were described as similar between the datasets if they differed by less than 10%. However, a difference above or below 10% does not in itself determine whether the EHR dataset should be reused. Models could still be built from the EHR dataset with much wider variability; however, the population to which such a model is generalizable may be reduced.
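The similarity rule can be sketched as below. This assumes that the 10% threshold refers to an absolute difference of 10 percentage points between category percentages, and that a category absent from one dataset counts as 0%; both are interpretations, and the category names and percentages are purely illustrative, not results from this study.

```python
from typing import Dict, Tuple

SIMILARITY_THRESHOLD_PP = 10.0  # assumed to be percentage points


def compare_distributions(ehr: Dict[str, float],
                          reference: Dict[str, float]) -> Dict[str, Tuple[float, bool]]:
    """Map each category to (EHR minus reference difference, similar?)."""
    result = {}
    for category in sorted(set(ehr) | set(reference)):
        diff = ehr.get(category, 0.0) - reference.get(category, 0.0)
        result[category] = (round(diff, 1), abs(diff) < SIMILARITY_THRESHOLD_PP)
    return result


# Illustrative percentages only.
ehr_pct = {"Group A": 60.0, "Group B": 30.0, "Group C": 10.0}
ref_pct = {"Group A": 55.0, "Group B": 15.0, "Group C": 30.0}
comparison = compare_distributions(ehr_pct, ref_pct)
for category, (diff, similar) in comparison.items():
    print(f"{category}: {diff:+.1f} pp, similar={similar}")
```

Keeping the sign of the difference shows the direction of over- or under-representation, which a bare similar/not-similar flag would hide.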

Results

After standardization, most analyses involved 48,434 patient care plans across 10 industries, with records from 101 workplaces from 2014 to 2021.

The database collects information relating to demographics, health history, injury details, examination, diagnosis, care plan, intervention and treatment details. Additionally, workers’ compensation details, advice given to employers, workers and patients, workplace modifications and work accommodations form part of the clinical record.

Physical job demands, as well as workplace/job cultural, psychological and health demand factors, are also collected through workplace assessments and are linked to the job that patients were performing when injured. These fields were not used in this assessment but may be used in future analyses.

Variables describing obstacles to recovery, or patient psychosocial “flags” [6], are recorded in a structured format and have the potential to provide rich information in further studies. These include health factors, psychological factors, work beliefs, and system and environmental factors [10, 28].

Completeness Results

All variables except industry were mandatory fields, and completeness was therefore usually 100%, as shown in Table 1.

Table 1 Completeness of EHR dataset variables

A known quality issue existed with duration of employment: a default setting recorded the injury date as the date of employment unless changed by the practitioner, which is likely to be incorrect in all but the rarest cases. These values were marked as missing, any subsequent care plans from these patients were recorded as implausible, and all were excluded from the comparison with the ABS dataset for duration of employment only.

No other implausible values were identified, likely due to the well-structured EHR with rules about allowed values for each variable. Patients’ ages ranged from 14 to 77 years and duration of employment ranged from 0 days to 49 years.

Of the records included in the analysis, 95.4% were from the manufacturing industry.

Distributions of age, duration of employment, gender and region of birth were similar between the EHR and ABS datasets, as seen in Table 2. There were more females in the manufacturing industry in the EHR dataset than in the Australian workforce population (9.5%). There were also fewer employees with over 10 years’ service in the EHR dataset (9.1–12.4%).

Table 2 Comparison of EHR dataset to ABS labour force survey datasets

In comparing the EHR dataset with the SWA dataset, upper limb WMSDs were more prevalent (12.3–16.4%) and lower limb WMSDs were less prevalent (13–17.3%) in the EHR dataset. Diagnoses were similar between the EHR and SWA data but were limited to low percentages of records due to the selection of diagnoses analysed (14.3–15.7%). A body stressing mechanism of injury was much more prevalent in the EHR dataset (26.2–32.9%) due to the nature of the injuries seen in the clinic (Table 3).

Table 3 Comparison of EHR dataset to SWA workers’ compensation claims dataset

Discussion

The Analysis

Missing data are a common problem in EHR studies and whenever data are used for secondary analysis. The EHR dataset demonstrates high completeness compared with many similar studies [29,30,31,32], as a consequence of mandatory data capture in many fields. This, however, can lead to other types of errors and potential bias [33]; practitioners may, for instance, always select the same list option for every patient. As EHRs become more structured in data collection, assessment of individual practitioners’ data entry will become crucial, rather than simply relying on completeness findings. Further analysis of individual practitioner data entry in this EHR dataset has been described previously [34]. The completeness findings also demonstrate the importance of subject matter expert knowledge in understanding where potential data quality issues exist.

Whilst the confidence intervals are acceptable, there is known variability in WMSDs across industries [35]. Even within a single industry, there can be high variability due to the specifics of each workplace. For example, local populations vary because regional locations have different migrant populations, and job demands can vary due to factors such as automation. Workplace hiring policies affect the makeup of the population at each workplace. Perhaps most importantly, workplace cultural factors and injury management program details all affect who attends onsite health clinics. In addition to these factors external to the DGO, internal organizational factors, such as practitioner training and clear definitions of data input requirements, affect confidence intervals. Confidence intervals will also be affected by the clinical and operational governance processes in the DGO, as well as by many other data quality parameters outlined in frameworks specific to the reuse of EHR data [22, 36]. Within the DGO in this study, the EHR data are already reused for reporting outcomes to workplaces and clinicians; for example, practitioners in the organization have metrics and governance processes for reviewing clinical notes for clinical quality reasons. Failure to report on organizational practices can lead to issues with the external and internal validity of studies and models [37].

The Variables

The EHR dataset was found to contain records predominantly from the manufacturing industry. Manufacturing is usually found to have an increased risk of poor outcomes and long-term disability compared with many other industries [16, 38, 39]. So, whilst the dataset was not shown to be representative of industries across the Australian population, it may still be valuable given the need for investigation into the higher rates of disability and poor outcomes in the manufacturing industry.

The EHR dataset demonstrated a similar distribution to the ABS data for age groups and duration of employment, with more representation in younger age groups. This is likely to be due to the types of workplaces that the EHR dataset represents. The DGO typically works with employers that have high-risk and heavy manual roles. These employers often employ international working holiday visa holders, who are younger and have shorter lengths of employment; these international workers are excluded from the ABS dataset. The findings therefore describe the physical outcomes that workers may experience in workplaces with a high proportion of manual labour roles. In the authors’ experience, older and long-term workers tend to self-select into alternative employment due to the impact of years of hard labour on their bodies, which is consistent with literature finding that older injured workers are less likely to return to work and more likely to suffer long-term disability [38,39,40]. These variations may have implications for further study of the EHR dataset and may affect its generalizability, which may need to be accounted for in models built from the data.

Gender is not completely comparable between the datasets; differences are likely to be partly due to the population studied, the assessment of working hours, or how the data are broken down for analysis. Gender analysis would be improved by specific recording of sex and gender that allows for diversity.

The EHR dataset demonstrates a slightly higher rate of Australian workers (2.2–3.5% mean difference) than the ABS data. This could be expected, as the ABS data exclude international residents, as previously discussed. Many regions of birth are represented in the EHR dataset, which may be useful in further research analysing cultural and genetic physical differences in the types of, and responses to, WMSDs. For example, the average height of a Burmese male is 164.7 cm, compared with 175.6 cm for an Australian male, so analyses of injury outcomes may need to consider work modifications such as bench heights as factors influencing recovery.

The EHR dataset could not be completely compared with the SWA dataset due to differences in reporting categories. Comparison of the ‘body stressing’ category found that the EHR dataset recorded around 92% of complaints as ‘body stressing’, compared with 58.9–66% in the SWA data. The SWA dataset includes many conditions outside the scope of practice of the onsite health service, such as those requiring surgical intervention, which likely explains this difference. Non-traumatic injuries, such as those involving repetitive mechanisms, are often found to be more likely to lead to poor recovery [4, 16, 39, 40]. Repetitive movement mechanisms represent 37% of the injuries within the EHR dataset, providing an opportunity for further analysis of factors leading to poor outcomes within this population.

The EHR dataset contained more upper limb MSDs than the SWA dataset, and fewer lower limb and trunk MSDs. Body region coding is often subjective: clinicians may record either the area of pain or the area related to the underlying cause of the problem. Neck and back injuries are commonly reported as having worse outcomes for long-term disability [38, 41] and are well represented in the EHR dataset.

Six diagnosis categories were analysed, with similar rates of diagnosis in the EHR and SWA datasets, although the percentage of records was low for each diagnosis. Variation is expected, as there are many challenges with diagnosis. Practitioners tend to diagnose MSK disorders either by the pathoanatomical lesion or by the potential cause. The inter-practitioner reliability of diagnostic tests is often questionable [42, 43], and practitioner skill level likely plays a role.

The EHR Opportunities

Lack of employment secondary to health issues has negative consequences for health [44, 45], just as an early return to work has benefits [46]. A strong predictor of long-term disability is the number of days until medical care is received for a work-related injury [38]. Staying at work with coordinated care [4] and appropriate work modifications leads to better return to work, reduced costs and more positive outcomes [47, 48]. EHRs used by onsite clinics can collect data from the time an injury occurs, creating the opportunity to develop early predictors of who is more likely to stay at work, and to intervene early with coordinated care to reduce the risk of poor return to work. The value of early collection has been demonstrated with the single-item Work Ability Index, which can predict the risk of long-term disability [44]. Known psychosocial factors, social determinants, job demands, work beliefs and environmental/system factors need to be recorded in EHRs from the first signs of pain, dysfunction or loss of work ability to enable early intervention, even before a claim is submitted.

This study helps to demonstrate the value of reusing the EHR dataset for the wider population. Whilst the comparison was with Australian datasets, the study is relevant globally, as many countries, such as Canada, the United States, the United Kingdom and India, operate similar workers’ compensation systems. The adoption of EHRs creates an opportunity to develop EHR WMSD registries that allow for better research into WMSDs.

Checking the relevance and generalizability of the data is an important step in understanding potential further uses for the dataset. The predictive performance of a machine learning model typically declines when the model is applied in different settings [49], and checking the relevance of the data may provide a better understanding of whether a model should be used in a different setting. Population bias is a known issue with many EHR machine learning models [21]; for example, models often fail to represent all nationalities accurately due to a lack of diversity in training sets. This research helps provide an understanding of the generalizability of the EHR dataset so that any lack of diversity or data deficiencies can be understood [50] prior to further research, and can guide the development of appropriate research questions.

The study demonstrates that this real-world DGO dataset derived from musculoskeletal practitioners can be used to improve WMSD outcomes, supporting the role of chiropractors, physiotherapists and osteopaths in treating and managing WMSDs with and without workers’ compensation claims. The research is important in setting a benchmark of what is achievable using EHRs, as EHR data from allied health practitioners are currently challenging to collate because practitioners work in multiple disparate settings on different systems.

Strengths and Limitations

A strength of the study is that it provides an important methodological step in assessing the suitability for research of an EHR built to manage WMSDs, with the potential to improve predictive modelling, such as machine learning models built on EHR data. Demonstrating the generalizability of the data reduces the risk of bias in models and increases the chance that models can be used broadly. The EHR also collects structured data on psychosocial and workplace factors, such as workplace culture, which are often not available in workers’ compensation claims databases or registry data.

One EHR is not representative of all EHRs or all practitioners or professions. The study is limited to a single EHR dataset of workers who presented for care to one organization and is not necessarily representative of the entire workforce at a worksite.

Whilst this study demonstrates that the EHR dataset contains data similar to the ABS and SWA datasets, this does not mean that a model produced from the dataset would be valid across all MSK occupational health care organizations, as further data quality analysis is required first. Modelling studies would also need to assess whether the level of analysis (WMSD event level versus person level) leads to overrepresentation of specific types of workers, such as those likely to suffer more WMSDs.

Whilst the study does not report on many of the variables collected within the EHR, it addresses an important first step by determining that the EHR dataset is similar to workforce characteristics and workers’ compensation claims statistics in the industries it serves.

A specific framework for assessing the quality of EHR data for secondary research [22] provided the terminology and framework used in this paper to determine missingness and to externally verify the EHR dataset; however, the paper does not attempt to determine the overall data quality of the EHR. Further studies should conduct a complete data quality assessment in line with recognized frameworks to determine whether the dataset is appropriate for use in future research or for building predictive models of WMSD outcomes.

Future Research

The study was unable to analyse many variables that have been shown to be important in understanding outcomes of WMSDs, such as obstacles to recovery, work modifications, job demands, and psychosocial and health interventions, as no comparable data are available in the national datasets. However, these variables are available within the dataset and will be examined in further data quality studies to determine the reliability of the data for predicting WMSD outcomes. The many important factors contained within this dataset offer potential for further study aimed at improving WMSD outcomes.

Conclusion

The study describes an extensive real-world data collection from chiropractors and occupational musculoskeletal professionals that can potentially be a valuable dataset for further research into WMSDs. The EHR collects many variables known to be predictors in determining outcomes of WMSDs and that are traditionally difficult to collect, such as clinical care details, health and psychosocial factors, job demands and workplace cultural factors. The analysis of the EHR dataset demonstrates that it is similar in many ways to comparative datasets from SWA and ABS. It can be considered to be broadly representative of the Australian workforce, manufacturing industries and Australian workers' compensation claims. The EHR dataset represents a wide range of patient musculoskeletal disorders from many age groups, regions of birth, body regions and diagnoses. The EHR dataset demonstrates high completeness due to structured and mandatory data capture. Overall, the analysis suggests that the EHR will support meaningful research and can contribute to reducing the costs and impact of WMSDs.