Background

Valid data on morbidity at the regional level is essential for planning primary healthcare services that are tailored to the needs and demands of the local population. The peer-reviewed literature describes several avenues of population health surveillance, including national surveys, administrative data and electronic health records. National surveys have commonly been used for surveillance [1, 2], but they have several shortcomings. First, prevalence estimates obtained from national and state level surveys are based on self-reported measures, the accuracy of which is questionable [2]. Second, surveys do not include all aspects of health, and therefore do not provide a full and accurate picture of health [3], resulting in a lack of generalisability to smaller regional populations [4]. Third, different subgroups of the population may demonstrate different response rates to surveys, which affects the generalisability of survey data to the general population [5, 6]. Surveys do, however, provide prevalence rates comparable to those from routinely collected clinical administrative data [13], indicating that clinical administrative data is a potential avenue for surveillance. Information obtained from administrative data, including primary care medical records (data collected during general practitioner (GP) visits), physician billing, specialist visits, pharmacy data (prescription dispensation) and hospital data (inpatient/outpatient information), has been shown to provide reliable and valid prevalence estimates of chronic disease conditions [13, 7, 8].

Clinical administrative data is widely available [1], and systems for its collection and reporting are currently in place at both the primary and tertiary levels of care [9]. Further, administrative data is validated by the clinical judgment of medical practitioners and may be generalisable to smaller/regional populations [10]. Patient data that is entered into electronic medical recording software at the point of contact with primary health care practitioners is often supported by diagnostic testing and clinical examination, and is thus likely to be more valid than self-reported health information [10]. The peer-reviewed literature hence provides a substantial body of current evidence on the effectiveness of utilising administrative and/or primary care data for population health surveillance; however, there is currently a lack of such data-based disease monitoring models in Australia.

The Sentinel Practices Data Sourcing (SPDS) project aimed to implement a sentinel site surveillance system within the Illawarra-Shoalhaven region of the state of New South Wales (NSW) in Australia to obtain a region-specific prevalence of chronic diseases and mental health disorders through the use of patient data obtained during primary care patient interactions. A pre-tested method of data extraction [10] was used, aimed at informing the population health planning within health service catchments of regional Australia.

Methods

The study was conducted in the Illawarra-Shoalhaven region of the state of NSW in Australia (Figure 1). Within NSW there are 15 Local Health Districts, which are responsible for acute, sub-acute and tertiary care service delivery in the state through the public hospital system, and 17 Medicare Locals, which are responsible for primary healthcare planning and delivery for their constituent regions [10]. Unlike other regions, the geographical catchment boundary of the Illawarra-Shoalhaven Local Health District (ISLHD) is the same as that covered by a single Medicare Local, namely the Illawarra-Shoalhaven Medicare Local (ISML), which places the region in a unique and advantageous position in terms of planning and implementing a chronic disease surveillance system. Additionally, the region has a diverse socio-economic profile, with pockets of both higher and lower socio-economic disadvantage, as shown by comparing the Index of Relative Socio-Economic Disadvantage (IRSD) scores for the region with those for Australia as a whole (Figure 2). IRSD is a composite summary measure constructed by the Australian Bureau of Statistics (ABS) for all regions in Australia and is based on income, educational attainment, employment status, occupation type, family structure, dwellings, house ownership, marital status and ethnicity [11].

Figure 1

Illawarra-Shoalhaven region of NSW and its Statistical Local Areas (SLAs) on the map of Australia.

Figure 2

Socioeconomic snapshot of the Illawarra-Shoalhaven region: Index of Relative Socio-Economic Disadvantage (IRSD) scores compared with NSW state and Australian national averages. *Illawarra-Shoalhaven score is a population weighted average of individual SLA scores.

The study undertook secondary analysis of administrative data through extraction of de-identified clinical patient information, and the project was rolled out in four phases:

Phase 1: Practice recruitment

The study aimed to recruit 12–18 practices within the Illawarra-Shoalhaven region based on the Statistical Local Area (SLA) geography and the demographic profile of the catchment. Eligible practices (n = 52) were identified by requesting from the Illawarra-Shoalhaven Medicare Local (ISML) a list of the region’s practices that fulfilled the following inclusion criteria:

✓ location in one of the 7 SLAs that represent the Illawarra-Shoalhaven region;

✓ multiple (more than 1) GPs working at the practice site (solo practitioner sites are likely to have smaller patient numbers);

✓ employment of either more than one full-time GP or more than two part-time GPs (i.e. who work for at least 20 hours a week);

✓ additional criteria:

  • installation of a clinical auditing software package on desktop software or a desire to install and use the Pen Computer Systems (PCS) Clinical Audit Tool™ (PCS CAT) (multiple licences for PCS CAT had been procured by ISML, so the tool was freely available to all general practices within its catchment); and

  • a willingness to provide de-identified practice data extracts to the researchers for surveillance purposes.

Seventeen general practices in the catchment volunteered and consented to participate in the study (response rate = 33%). Only two electronic medical record software packages were being used by participating practices in the study, either Medical Director™ (n = 8) or Best Practice™ (n = 9).

Phase 2: Data cleansing and enhancement of data accuracy

Recruited practices undertook comprehensive “Data Cleansing” training to understand the usage and the various functionalities of PCS CAT, and to update and clean the data stored in their clinical systems. With the consent of the primary GPs and managers within each practice, the “Data Cleansing” training was conducted by the researchers. Cleaning practice records improves searches (in both the practice electronic medical record software program and the PCS CAT) that identify patients with particular conditions, and thus helps to target health research and patient management activities. The data cleaning process allowed all patient information that had been entered during GP consultations to be retrieved when searching for specific variables. The data cleansing phase of the study was conducted using the data maintenance utility tools that are available within both of the GP electronic medical record software programs used in the study. Data cleansing included:

  • encouraging all practice staff to use the ‘drop down box functionality’ of their clinical software to define and code all medical diagnoses and other sections of the patient record;

  • strictly avoiding free text entries in all sections of the patient record;

  • finding all identifiable free text non-coded past medical history items, and either linking them to appropriate coded items or replacing them with the correct coded item; and

  • coding all inactive patients as ‘Inactive’ (an ‘active patient’ is one who has attended the practice three or more times in the past two years, as defined in the Royal Australian College of General Practitioners’ Standards for general practices [12]).
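The ‘active patient’ rule in the last item can be illustrated with a minimal sketch. This is not the logic used by the clinical software or by PCS CAT; the function name, the patient IDs, the visit dates and the approximation of two years as 730 days are all assumptions made for illustration.

```python
from datetime import date, timedelta


def is_active(visit_dates, as_of, min_visits=3, window_days=730):
    """Return True if the patient attended at least `min_visits` times
    in the `window_days` up to and including `as_of`.

    The defaults mirror the "three or more times in the past two years"
    definition; treating two years as 730 days is an assumption.
    """
    window_start = as_of - timedelta(days=window_days)
    recent = [d for d in visit_dates if window_start <= d <= as_of]
    return len(recent) >= min_visits


# Hypothetical visit histories keyed by de-identified patient ID.
visits = {
    "P001": [date(2012, 1, 10), date(2012, 9, 3), date(2013, 6, 21)],
    "P002": [date(2010, 5, 2), date(2013, 2, 14)],
}

as_of = date(2013, 9, 30)
status = {
    pid: ("Active" if is_active(dates, as_of) else "Inactive")
    for pid, dates in visits.items()
}
print(status)
```

Under these assumptions, a patient with three visits inside the window is kept as ‘Active’, while a patient with only one recent visit is coded ‘Inactive’.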

Phase 3: Data collection

Patient data that had been de-identified by practice employees was extracted to a database. Data items extracted from general practice clinical systems included: demography (population by age and sex and population geography including postcodes and suburbs), chronic disease surveillance items (hypertension, type 2 diabetes mellitus, depression, anxiety, COPD, asthma, congestive heart disease, stroke, osteoarthritis, osteoporosis, high Body Mass Index (BMI) – overweight and obese), and Medicare Benefits Schedule (MBS) items uptake relevant to primary care services for GP and other non-referred attendances. A cleaned, de-identified PCS CAT data extract was performed in September 2013 for all recruited practices which included all information obtained from patient interactions in the preceding 24 months for all diagnosed pathologies, clinical variables such as anthropometric measures, and patient demographic information such as age, sex, geographical location of residence (postcodes and suburbs) and indigenous status.

Phase 4: Data collation and analysis

The research team collated all extracted data, cross-matched residential suburb and postcode information with health and clinical information using de-identified unique link ID tags, converted all resultant information into usable database formats, and then analysed the datasets using Microsoft Excel (V2007: Microsoft Corporation, Redmond Washington, USA). The final datasets hence included clinical diagnosis and patient demographic information as entered by GPs within each participating practice. Basic epidemiological measures, including age-specific prevalence and total prevalence were calculated for all major disease conditions. The prevalence figures were compared against comparable indicators reported for the same age groups by the Australian Health Survey (AHS) 2011–12 conducted by the ABS [13].
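As an illustration of the basic measures described above, the following sketch computes age-specific and total (crude) prevalence from counts. The study itself used Microsoft Excel; the function name, the age bands and all counts below are made up for illustration and are not study data.

```python
def prevalence_by_age(cases, population):
    """Return (age-specific prevalence in %, total prevalence in %).

    `cases` and `population` map an age-group label to a count.
    Age groups with a zero denominator are skipped.
    """
    age_specific = {
        group: 100.0 * cases.get(group, 0) / population[group]
        for group in population
        if population[group] > 0
    }
    total = 100.0 * sum(cases.values()) / sum(population.values())
    return age_specific, total


# Made-up counts for three broad age bands (not study data).
population = {"0-24": 4000, "25-64": 5000, "65+": 1000}
cases = {"0-24": 40, "25-64": 500, "65+": 400}

age_specific, total = prevalence_by_age(cases, population)
for group, value in age_specific.items():
    print(f"{group}: {value:.1f}%")
print(f"Total: {total:.1f}%")
```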

The age-specific disease prevalence figures obtained from the study sample and the estimated national prevalence figures reported by the AHS 2011–12 were then age-standardised using the 2011 estimated resident population of Australia [14]. Age-standardised prevalence was compared for all major chronic conditions that the SPDS project is targeting for regular surveillance, namely: obesity, overweight, diabetes mellitus, hypertension, asthma, mental health disorders such as clinically diagnosed depression and anxiety disorder, coronary heart disease, stroke, and chronic bone diseases such as osteoarthritis and osteoporosis. Both Microsoft Excel (V2007: Microsoft Corporation, Redmond Washington, USA) and the PCS CAT tool (v.3.1: pencs.com.au) were used for graphical illustration of demographic data and age-specific disease prevalence.
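The text does not name the standardisation method; the sketch below assumes direct age-standardisation, in which each age-specific prevalence is weighted by that age group's share of a standard population. The function name, age bands, rates and standard population counts are hypothetical.

```python
def age_standardised_prevalence(age_specific, standard_population):
    """Directly standardised prevalence: a weighted mean of age-specific
    prevalence, with weights taken from the standard population."""
    total_standard = sum(standard_population.values())
    weighted_sum = sum(
        age_specific[group] * standard_population[group]
        for group in standard_population
    )
    return weighted_sum / total_standard


# Hypothetical age-specific prevalence (%) and standard population counts.
age_specific = {"0-24": 1.0, "25-64": 10.0, "65+": 40.0}
standard_population = {"0-24": 3000, "25-64": 5000, "65+": 2000}

standardised = age_standardised_prevalence(age_specific, standard_population)
print(f"Age-standardised prevalence: {standardised:.1f}%")
```

Applying the same standard population to both the study sample and the national figures removes differences that arise only from their different age structures, which is what makes the two sets of estimates comparable.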

The study undertook secondary analysis of administrative data through extraction of de-identified clinical patient information. The study was performed with the approval of the Human Research Ethics Committee (Health and Medical) of the University of Wollongong (HE 12/447). Written informed consent was not obtained from individual patients due to the retrospective nature of the study design; however, all data was extracted and de-identified exclusively by trained practice clinical staff.

Results

The number of patients that had visited the 17 general practices within the previous 24 months (September 2011 to September 2013) was 164,435 (152,767 from within the Illawarra-Shoalhaven and 11,668 from outside of the catchment). The Illawarra-Shoalhaven catchment sample of 152,767 included 70,103 men, 82,506 women and 158 without an identified gender.

While 144 patients did not have their age recorded, the median age for the study sample (n = 152,767) was 39 years (IQR = 20–58 years). Adults aged 20–24 years comprised the largest age group at 7% of the total sample, followed by the 40–44 year old age group (6.8%), and 5–9 year old children (6.6%). Older adults aged 65 years and above comprised 18.2% of the sample. The population pyramid of the study sample along with the comparison with the population structure of the 2011 estimated resident population of the Illawarra-Shoalhaven catchment is shown in Figure 3. The proportion of the local residential population of the SLAs that consulted the study practices during the study period is shown in Table 1 [14]. The majority of the study sample (92.9%) were found to reside within the Illawarra-Shoalhaven catchment SLAs.

Figure 3

Population pyramid comparisons: study sample and the 2011 resident population of the Illawarra-Shoalhaven catchment.

Table 1 Proportion of local population that had consulted the 17 general practices during the previous 24 months (September 2011 to September 2013)

The age-specific population and disease counts within the study sample for major chronic conditions and high BMI are shown in Tables 2 and 3 respectively, while the crude and age-standardised prevalence comparisons of the sample and the Australian national estimates [13] are shown in Tables 4 and 5 respectively. Overall, the study sample population exhibits figures higher than Australian averages for the age-standardised prevalence of chronic conditions such as anxiety, cancer, hypertension, obesity and overweight/obesity (Figure 4). An illustration of the age-specific burden of disease (Figure 5) indicates that the (non-age-standardised) prevalence of asthma and mental health conditions (depression and anxiety) is significantly higher amongst younger age groups than amongst older adults.

Table 2 Age-specific population and chronic disease counts within the study sample during the last 24 months (September 2011 to September 2013)
Table 3 Age-specific population and high BMI counts within the study sample during the last 24 months (September 2011 to September 2013)
Table 4 Crude prevalence proportions of chronic conditions in the study sample compared to Australian national averages
Table 5 Age-standardised prevalence of chronic conditions in the study sample compared to Australian national averages
Figure 4

Comparisons of age-standardised chronic disease prevalence between the study sample and Australian national averages. *Includes adults only.

Figure 5

Age-specific prevalence of chronic diseases within the study sample.

Discussion

In Australia, nationally representative data is available through the National Health Surveys (NHS) conducted by the Australian Bureau of Statistics (ABS) [15] and regionally through state surveys such as the annual New South Wales Population Health Survey [16]; however, extrapolation of these data to smaller geographical areas such as Local Government Areas (LGAs) and/or small area geographic regions within LGAs such as SLAs and suburbs is limited [10].

In 2011–12, the percentage of adults who saw a GP in the preceding 12 months varied across Australia ranging from 74% to 86%, with the Illawarra-Shoalhaven catchment recording the highest percentage nationally [17]. This indicates that extraction of patient and GP interactions over a 24 month period has the potential to include health information of almost the entire resident population of catchment regions such as the Illawarra-Shoalhaven; however, not all SLAs were evenly represented in terms of coverage in this study.

All general practice clinical and electronic medical record software packages utilise one of several nationally validated health coding and medical classification systems such as SNOMED-CT, DOCLE, PYEFINCH and ICPC2+ [10]. These medical vocabularies enable recording of nationally/internationally recognised coded disease diagnoses, which also assists in maintaining accurate and consistent primary care clinical data that can be extracted and analysed [10]. Additionally, the recent introduction of the Personally Controlled Electronic Health Record (PCEHR) system in Australia requires general practices to “work towards recording the majority of diagnoses for active patients electronically” [18]. This enables accurate recording and easy identification of medical conditions and hence provides an opportunity for successful public health and chronic disease surveillance. However, Aizpuru et al. [19] suggest that chronic disease data from electronic health records yield a lower prevalence of conditions than health surveys because actions taken by physicians are often not recorded, leading to cases being missed. The limitations of primary care practice-based data collection have also been illustrated in systematic reviews. Common problems include inconsistent diagnostic coding vocabularies across different clinical systems [20] and errors in data entry and recording [7, 21]. Major barriers faced by general practice staff in this regard include difficulties with clinical coding of diagnoses; complexities with software applications; preference for entering free text rather than the pre-coded options; inadequate skills in information technology; time constraints; poor motivation and low prioritisation of data entry compared to other clinical duties; inconsistency of data entry; coding of a condition in order to justify choice of prescribed treatment; the additional burden of including laboratory test results in patient records; and a need to enter a diagnosis, even in the early preclinical stages of the disease (Attard E, Ghosh A, Charlton K: Barriers faced by general practice staff in maintaining clean primary care databases: a systematic review, unpublished).

Studies conducted in Canada [22, 23], Italy [24] and the UK [25] have demonstrated methods to improve the quality and accuracy of practice-based disease surveillance models. Keshavjee et al. [22] trained and employed ‘data managers’ in an attempt to standardise disease coding and de-identify patient information. Similarly, Griever et al. [23] employed a trained data entry clerk to check missing or incorrectly coded records. Cricelli et al. [24] trained GPs, themselves, in data entry and use of standard software. Pearson et al. [25] provided initial training and updates to all doctors and practice staff and carried out validation procedures such as verification of clinical coding, checking for rare diagnoses and those made outside the usual age and sex parameters through random validation visits to all participating practices.

A number of limitations to the study need consideration. The SPDS study identified various common data entry errors, including misspelt suburb names or postcodes that did not match the suburb entry, missing geographic information (postcodes and suburbs), missing values for age and sex, incorrect/mismatched entries within data entry fields such as height inserted in the weight field and/or vice-versa, and missing entries for weight and/or height measurements. While geospatial analysis of disease patterns is highly useful to target services towards areas of need [26], the SPDS data has highlighted difficulties in obtaining consistent information on patient residential postcodes and suburbs, including missing entries and mismatched entries; for example, a record with a suburb in the Kiama (A) SLA and a postcode of 3000, which is the incorrect postcode for this suburb. It was often unclear which variable to change in order to resolve such an inconsistency, and such records were therefore deleted from the analysis. Data quality and accuracy discrepancies required immense post-extraction data cleaning/editing efforts by the researchers, which are vital to improve data linkage quality [27]. It is therefore imperative to undertake further research and technological innovation to improve the utility and interface functionality of practice clinical desktop systems and to create valid and easy-to-use advanced data aggregation systems, which could vastly improve the processes of primary care clinical data extraction and modelling and result in more accurate prevalence estimation.
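The kinds of entry errors listed above could in principle be flagged automatically before analysis. The sketch below is one possible approach, not the procedure used in the SPDS study: the field names, the two-entry suburb lookup and the plausible height and weight ranges are all assumptions chosen for illustration, and a real check would need a complete suburb-to-postcode concordance.

```python
# Illustrative suburb-to-postcode lookup (hypothetical and incomplete).
SUBURB_POSTCODES = {"KIAMA": {"2533"}, "WOLLONGONG": {"2500"}}

# Assumed plausible adult ranges, for illustration only.
HEIGHT_CM_RANGE = (120, 220)
WEIGHT_KG_RANGE = (30, 250)


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def check_record(record):
    """Return a list of data-quality flags for one de-identified record."""
    flags = []

    # Missing demographic or geographic fields.
    for field in ("age", "sex", "suburb", "postcode"):
        if record.get(field) in (None, ""):
            flags.append("missing_" + field)

    # Suburb and postcode must agree with the lookup.
    suburb = str(record.get("suburb") or "").strip().upper()
    postcode = str(record.get("postcode") or "").strip()
    if suburb and postcode:
        if suburb not in SUBURB_POSTCODES:
            flags.append("unknown_suburb")
        elif postcode not in SUBURB_POSTCODES[suburb]:
            flags.append("suburb_postcode_mismatch")

    # Height entered in the weight field and vice versa.
    height, weight = record.get("height_cm"), record.get("weight_kg")
    if height is None or weight is None:
        flags.append("missing_height_or_weight")
    elif (not in_range(height, HEIGHT_CM_RANGE)
          and in_range(height, WEIGHT_KG_RANGE)
          and in_range(weight, HEIGHT_CM_RANGE)):
        flags.append("possible_height_weight_swap")

    return flags


mismatch = check_record({"age": 45, "sex": "F", "suburb": "Kiama",
                         "postcode": "3000", "height_cm": 165, "weight_kg": 70})
swapped = check_record({"age": None, "sex": "M", "suburb": "Wollongong",
                        "postcode": "2500", "height_cm": 82, "weight_kg": 175})
print(mismatch)
print(swapped)
```

A check of this kind only flags a record; as noted above, it cannot determine which of two conflicting fields is the correct one.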

Both international literature and Australian evidence identify a higher reported prevalence of overweight and/or obesity within primary care settings [28, 29]. It has been argued that obese patients are more likely than healthy-weight or underweight patients to visit their physician and also more likely to be weighed and measured by practice staff and clinicians. This results in lower population denominators for obesity and overweight, as also seen in our study (Table 3), and arguably higher prevalence figures. This is another limitation of the proposed method of surveillance.

The seventeen general practices recruited from the major SLAs within the Illawarra-Shoalhaven region include approximately 40% of the resident population of the catchment area but generalisability of the findings to the general population of the Illawarra-Shoalhaven region cannot be assumed. Additionally there was a clear coverage disparity between the 7 Illawarra-Shoalhaven SLAs with high representation of Kiama (A) and Shellharbour (C) residents, moderate representation of the Shoalhaven (C) - Pt A and Shoalhaven (C) - Pt B residents and low proportional share of Wollongong (C) Inner and Wollongong (C) Bal SLAs within the study sample (Table 1). This can be attributed to the recruitment of practices that voluntarily consented to participate rather than routine surveillance as such. Thus, disease prevalence estimates drawn from the study sample may not be representative of the true population disease status for the region.

Another limitation of the study is that it only investigated the interaction between one extraction tool (the PCS CAT) and two general practice electronic medical record (EMR) software systems (Best Practice™ and Medical Director™). Although these are the most commonly used systems in Australia, the findings cannot be extrapolated to other systems. Additionally, the validity of a PCS CAT extract has not been completely investigated. The tool is co-developed by the Royal Australian College of General Practitioners (RACGP), the peak body of general practice in Australia, and is advocated by them as an integrated product aimed at improving the way patient information can be used to better inform decisions in both clinical and business settings [30]; however, to date there has not been any empirical validation of the PCS CAT as a general practice data extraction tool. Further research into validation of the PCS CAT extract and the assessment of its agreement with manual data review/audit is required. A final limitation is that we only included data that could potentially be extracted from the electronic medical record software programs. Although the data cleansing phase of the study focused heavily on avoidance of any free text entered into medical or clinical notes by GPs and practice staff, if a practitioner still made free text entries rather than using the codable sections of the record, then neither the extraction tool nor our manual case record reviews/audits would be able to detect those cases.

Despite these limitations, the SPDS study has significant implications for public health planning, primary health care delivery and epidemiological research. Apart from ongoing chronic disease surveillance, the study methodology and protocol also have the potential to provide evidence-based direction to population health planning strategies aimed at addressing the local health needs of regional areas of Australia. The most recently reported planning documents for the Illawarra-Shoalhaven region of NSW, both from the Local Health District level [31] and the Medicare Local level [32], illustrate disease rates and health status indicators drawn from statistically modelled estimates from the 2006–07 Australian National Health Surveys. These figures are significantly outdated for planning purposes in 2014 and their generalisability for regional and smaller area disease prevalence and health status is questionable [10]. The proposed surveillance system also provides an opportunity for monitoring trends in chronic disease prevalence across regular time intervals and promotes the engagement of general practice staff and clinicians in maintaining primary care clinical data quality and accuracy. The inclusion of a larger number of sentinel sites that are generalisable to the population being served would provide an accurate and region-specific system for the purposes of population health planning at the primary care level in order to improve the overall health of the community.

Conclusion

This study has demonstrated that extraction of patient clinical data from general practice settings is both a feasible and valid method to obtain a region-specific estimate of the prevalence of chronic diseases and mental health disorders within regional NSW, Australia. Only general practices that agreed to participate were included in the study; further work on sampling methodology is required to identify which sentinel sites would provide an accurate and truly representative surveillance system. Technological updates/changes to general practice clinical software systems are recommended to improve functionality and data quality within general practice databases. Drop-down menus with fixed, nationally recognised lists of suburb names, cross-matched with the corresponding postcode and state information, are currently lacking within general practice clinical software systems. Additionally, making age, sex, postcode and suburb mandatory fields for creating a new patient record could eliminate the issue of missing data for these essential socio-demographic variables. Investment in computer skills and data entry training for general practice staff and advancements in data aggregation instruments are essential to improve the quality of clinical data and their collection methods for effective utilisation by researchers and population health planners for surveillance purposes. Chronic disease prevalence figures obtained annually through the surveillance methodology implemented by the SPDS project could provide more up-to-date and granular health information for prompt health service planning.