Background

Despite significant advances internationally in the regulation of health professionals – including requirements to demonstrate ongoing competence, the development of impairment and performance pathways and greater external governance – oversight of fitness of practice remains fundamentally reactive in nature. Practitioner boards rely on reports emanating chiefly from patients, peers, and employers to identify misconduct and impairment. The boards vet those reports, determine which to investigate, and ultimately prosecute a small fraction, with sanctions ranging up to license suspension or revocation [1]. Aspects of this regime are out of step with contemporary approaches to quality assurance and consumer protection [2]. In particular, critics have pointed to the reactive posture of professional boards, their heavy dependence on external actors to observe and report problems, and a paucity of response measures designed to rehabilitate clinicians and prevent future harm [2,3,4,5].

In recent years, there has been growing interest among health regulators in approaches to oversight that are more proactive and data-driven [6, 7]. One such strategy involves using administrative data that regulators routinely gather, including demographic and complaint history information to flag practitioners at high risk of future malpractice claims and complaints [8]. The potential for risk prediction to contribute meaningfully to prevention is highlighted by mounting evidence from Australia and the United States that a relatively small number of doctors who recidivate are responsible for a majority of malpractice claims and disciplinary actions [9,10,11,12].

Using comprehensive national data on complaints against practitioners in 14 health professions in Australia, we extended an earlier effort aimed at predicting the risk of a complaint at the individual doctor level [8]. Our goal was to test the feasibility of an algorithm that regulators could use to reliably identify high risk practitioners. More generally, we aimed to explore how well simple, routinely-collected administrative data performed in predicting complaint risk and the conditions under which prediction was possible.

Methods

Setting

In Australia, all registered health professions are regulated under a national scheme [13]. Ten professions have been regulated by the scheme since its inception in 2010: doctors, nurses and midwives, dental practitioners, psychologists, pharmacists, chiropractors, optometrists, osteopaths, physiotherapists and podiatrists. Four more professions joined in mid-2012: Aboriginal and Torres-Strait Islander (ATSI) health practitioners, Chinese medicine practitioners, medical radiation practitioners and occupational therapists. In 2018, nursing and midwifery were recognised as separate professions and paramedicine joined the scheme.

The Australian Health Practitioner Regulation Agency (AHPRA) works in partnership with practitioner boards that oversee each of the regulated professions. AHPRA manages the registration processes across Australia. In four of the six states and both territories: it receives “notifications” regarding the health, conduct, and performance of registered practitioners; it then refers the notification to the relevant practitioner board for adjudication. (We refer to notifications hereafter as “complaints” for consistency with international terminology.) Two states have slightly different regulatory arrangements. In New South Wales, complaints are managed by the state’s profession-specific boards (e.g. the Medical Council), with support from the Health Professionals Council Authority (HPCA) and the Health Care Complaints Commissioner. In Queensland, since July 2014, the Office of the Health Ombudsman receives complaints about health practitioners and determines whether to manage them internally or transfer them to AHPRA. The divisions of complaint-handling responsibilities among federal and state regulators are described in more detail elsewhere [14].

We conducted the study in accordance with TRIPOD guidelines for developing prediction models [15].

Data and variables

Registration and complaints data

Using administrative data routinely collected by AHPRA, we identified and extracted information on all health practitioners registered to practise in Australia between 1 January 2011 and 31 December 2016. (For the four professions that joined in 2012, the information covered 1 July 2012 to 31 December 2016.) The extracts consisted of variables indicating the period when each practitioner was registered as well as the practitioner’s type of registration, age band, sex, profession, specialty and practice location.

We also identified all complaints about these practitioners lodged with relevant regulators during the same time periods. AHPRA and the HPCA provided practitioner-level complaints data that included the date the complaint was lodged and the primary issue raised by the complainant. We linked the registration data with the complaints data using anonymised, unique identification variables provided by AHPRA and HPCA.

To protect confidentially, APHRA provided practitioners’ birth dates in 5 year bands (e.g., 1970–1974). We recoded these to reflect the practitioner’s age group in 2010. We coded the professions into 16 categories, separating nurses and midwives and distinguishing dentists and dental prosthetists from other allied dental practitioners (dental hygienists, dental therapists and oral health therapists). Within the medical profession we further sub-coded doctors into 10 specialty groups [16] (general practice, surgery, obstetrics and gynaecology, physician, psychiatry, anaesthesia, radiology, emergency and ICU, non-clinical and non-specialist); and sub-coded nurses into registered nurses (who have a three year degree in nursing) or enrolled nurses (who have a two-year diploma or equivalent). If a practitioner had multiple professions (e.g., physiotherapist and nurse) we selected one at random and if they had multiple registration types (e.g., general and specialist) we selected the highest level of registration.

As part of the complaint-handling process, AHPRA and HPCA staff had coded the primary issue associated with each complaint into one of 149 mutually exclusive categories. Using an established coding methodology [1], two reviewers on our team then independently classified these into one of 14 categories. For example, complaints about “bullying, discrimination, disrespect, threats and assault” were re-coded as “interpersonal behaviour”. The independent reviewers’ judgements were then compared and differences resolved by consensus. Finally, the 14 issue categories were arranged into three domains (health, performance and conduct). Further details of the coding process are reported elsewhere [1].

Primary practice location was coded into one of three categories to capture the remoteness of the practice: major city, inner/outer regional area, remote/very remote area. These categories are based on a standard geographic coding system developed by the Australian Bureau of Statistics to indicate proximity to population centres [17].

Clinical hours data

Systematic differences among practitioners in the amount of clinical work they conducted had the potential to bias estimates of risk factors for complaints. To control for such differences, we estimated the average number of clinical hours worked per week by profession, sex, age group and for doctors, specialty. Data for these estimates came from the 2015 National Health Workforce Dataset [18], a summary of information practitioners report when renewing their registration each year. Our calculations produced stratified averages of clinical hours per week, which we then merged with the registration data.

Study dataset

We constructed a person-period dataset in which each row of data represented covariate values for an individual practitioner for each time interval they were at risk of a complaint. The values for a practitioner’s sex, age group, profession and speciality and practice location did not change over time. The other variables were time-varying. New intervals triggered new rows, which began on the date the value of a time-varying variable changed and ended at the next change of any time-varying variable.

Complaints were also specified as time-varying variables. A practitioner’s “exposure” to complaints was defined in two different ways: (1) a variable indicating the total number of previous complaints (0, 1, 2, 3, 4, 5, 6, ≥7) a practitioner had accumulated; and (2) 14 “look-back” variables (one for each issue category) indicating whether the practitioner had experienced a complaint of that type in the past year. Because construction of the look-back variables necessitated at least one prior year of observation time, we excluded from the analysis each practitioner’s first year of data. For most practitioners, observation time therefore began on 1 January 2012. We also excluded time intervals in which a practitioner was unregistered, was registered to an address outside Australia, held non-practising registration status (which indicates no authorisation to deliver direct clinical care) or was over 75 years of age.

Finally, because our study sought both to develop and validate a risk score for health practitioners, we randomly split the data into a training sample (70% of practitioners) and a validation sample (the remaining 30%).

Analyses

We began by calculating the number of complaints and the crude complaint rate (per 1000 person years) for the sample as a whole and by practitioner and complaint characteristics. Construction of the complaint risk calculator proceeded in three steps, as described below. All analyses were performed as complete-case analyses (reliance on administrative data meant there was no missing data on any study variables).

Predicting complaints

We used survival analysis to identify the characteristics of practitioners at risk of one or more complaints. Specifically, we used an Anderson-Gill model [19], which allows each practitioner to accrue multiple complaints over the study period. The underlying distribution for time at risk was modelled using the Weibull distribution. We calculated cluster-adjusted robust standard errors to account for multiple periods of observation per practitioner. Model discrimination was assessed using the C-index derived from fitting the model estimates from the training sample to the validation sample.

Exposure time was defined as the period of time each practitioner was registered with AHPRA (in years) multiplied by weekly clinical hours estimates that corresponded to the practitioner’s profession, sex and age group (and for doctors, specialty). The measure of clinical work time was expressed as a fraction of a 40 h working week and allowed for values greater than 1.

Constructing the PRONE-HP score

We used the results from the survival model to design a scoring system. Each risk factor was assigned points; the number of points assigned was scaled directly from the coefficients in the model. Specifically, we multiplied the log hazard ratios for each predictor by 6·0 and then rounded to the closest integer. This transformation permitted all possible point totals to produce a total “risk score” that ranged from 0 to 100. We refer to the scoring system as PRONE-HP (Predicted Risk of New Event for Health Practitioners). The system permits a simple calculation of a practitioner’s risk score each time a new complaint is lodged against him/her.

Evaluating the performance of the PRONE-HP score

We assessed the performance of the PRONE-HP score in several ways. First, to determine whether precision was lost in transforming the model coefficients from a survival model to a simple additive scoring system, we computed discrimination using the C-index in the validation sample. Second, to assess calibration of the PRONE-HP score (i.e. how closely PRONE-HP scores correspond with out-of-sample complaint risk), we calculated and compared Kaplan-Meier curves for 5 score ranges in the training and validation samples.

Finally, because our aim was to develop a tool for reliably identifying practitioners at high risk of complaints, we defined high risk groups by demarcating practitioners whose risk scores fell above designated “thresholds”. The distribution of scores varied across professions, rendering a single common threshold too crude. Instead we pegged the threshold to a specified percentile (the 90th) within each profession. Using this threshold, we calculated the positive predictive value (PPV), negative predictive value (NPV), sensitivity and specificity of the prediction of a new complaint within 2 years for each profession. These calculations used methods appropriate for censored data [20].

Sample size calculations

We did not calculate formal sample sizes for this study. Over the period for which data were available, we used data on all complaints against all registered health practitioners in Australia in order to maximise the power and generalisability of study results.

Results

Sample characteristics

The sample consisted of 715,415 health practitioners (Table 1). Twenty-two percent were male and 94% were ≤ 65 years. The largest professional group was nurses (55%), followed by doctors (15%), midwives (6%) psychologists (5%) and pharmacists (4%).

Table 1 Characteristics of health professionals and complaints

These practitioners were the subject of 39,575 complaints over the 2,305,763 person-years of follow-up time. Concerns about treatment or other clinical issues were the most common reason for lodging a complaint (42% of complaints), followed by complaints regarding interpersonal behaviour (9%) and other conduct issues (9%).

Factors associated with risk of complaint

In multivariate survival analysis that adjusted for working hours, all variables in the model were associated with risk of complaint (Table 2). Male practitioners’ complaint risk was 1·5 times that of female practitioners. There was no significant difference in complaint risks for practitioners 26–35 years and those aged ≤25 years, but practitioners in the older age groups had 1·5 to 2·1 times higher risk and it generally increased with age. Compared to practitioners working in major cities, those based in regional Australia had 1·1 times higher complaint risk and those in remote areas had 1·3 times the risk.

Table 2 Complaint rates, survival model, and PRONE-HP scoring system

Risks varied widely by profession and medical specialty. Compared to medical radiation practitioners (the profession with the lowest risk and the reference group), the risk of a complaint was substantially higher for all the medical specialties, especially obstetrics and gynaecology (HR = 16·2), psychiatry (HR = 16·1), surgery (HR = 13·3) and general practice (HR = 11·2). Dentists and dental prosthetists were also at very high risk (HR = 11·5). Chiropractors (HR = 6·5), psychologists (HR = 6·0), and pharmacists (HR = 5·6) had elevated risks of complaints, whereas the risks for enrolled nurses (HR = 1·8), registered nurses (HR = 1·8), physiotherapists (HR = 1·6), occupational therapists (HR = 1·5) and midwives (HR = 1·0), were relatively close to those of the reference group.

Complaint risk increased monotonically with the number of prior complaints. Compared to those with no prior complaints, practitioners with 1 prior complaint had 2·6 times higher risk of accruing another complaint; those with 3 prior complaints had 5·1 times the risk and those with ≥7 prior complaints had 12·0 times the risk. The highest risks of additional complaints followed complaints relating to concerns about substance use (HR = 4·2), honesty (HR = 3·6), a practitioner’s mental health (HR = 3·3), sexual boundaries (HR = 2·4) and use and supply of medications (HR = 2·2).

Performance of PRONE-HP

The far right column of Table 2 shows the points assigned to each predictor, based on the hazard ratios estimated in the survival model, to form the PRONE-HP scoring system. PRONE-HP demonstrated good discrimination when applied to the validation dataset (C-index = 0·77). The C-index was virtually identical to that estimated in the underlying model, suggesting that the conversion of model coefficients to the points system did not materially affect discrimination.

To assess the calibration of PRONE-HP, we examined the out-of-sample consistency within risk strata. Figure 1 shows Kaplan-Meier curves plotting the probability of a subsequent complaint over a two-year period 5 different groups of practitioners. The groups are defined by ranges (or bands) of PRONE-HP scores. Within each band, two curves are displayed: one comes from the training dataset and the other from the validation dataset. The pairs of curves in the lowest four ranges show close concordance. In the highest range (PRONE-HP ≥35), there is some divergence in the pair, especially between 6 and 18 months.

Fig. 1
figure 1

Observed probability of complaints based on selected PRONE-HP score ranges, test and validation samples

Figure 1 also shows a high degree of consistency up and down the PRONE-HP scale. The probability of a subsequent complaint ascends monotonically across the ranges. For example, at 24 months, the probability of a further complaint is 0·6% for practitioners in the 0–4 point range, 15% for practitioners in the 20–24 point range, 30–33% for practitioners in the 25–29 point range, 53–54% for scores 30–34, and 88–92% for scores ≥35. In addition, after the first few months, no group’s pair of prediction curves crossed those of the adjacent group.

Thresholds for identifying high risk practitioners

For doctors and dentists and dental prosthetists, the high-risk threshold (90th percentile of risk) was a score of 30 or more points (Table 3). A total of 6245 doctors reached or exceeded this threshold at some period during the study. During those periods, the PRONE-HP had a PPV of 93·1% and a NPV of 90·1% for these practitioners. Sensitivity was low (17·6%) but specificity was high (99·8%). The PRONE -HP demonstrated similar levels of predictive accuracy among dentists and dental prosthetists at out above their high-risk threshold. The high PPVs estimated for doctors and dentists and dental prosthetists, coupled with the relatively small number of practitioners exceeding the threshold, suggests that PRONE-HP may be a useful tool for identifying high risk practitioners in these professions.

Table 3 Diagnostic properties of PRONE-HP: Predicting new complaint within 2 years

By contrast, 10 of the 16 professions we analysed exhibited features that constrained the practical utility of PRONE-HP. The high-risk threshold of these professions fell at a substantially lower scores and PPVs were low. The professions were Aboriginal and Torres Strait Islander health practitioners (PPV = 25·3%), optometrists (PPV = 12·7%), osteopaths (PPV = 8·4%), other dental practitioners (PPV = 7·3%), physiotherapists (PPV = 7·3%), nurses (PPV = 5·7%), medical radiation practitioners (PPV = 5·5%), occupational therapists (PPV = 5·4%), midwives (PPV = 5·2%) and Chinese medicine practitioners (4·7%). In addition, the number of practitioners in the high-risk groups for nurses and midwives was very large.

In between are several professional groups for whom PRONE-HP may be useful for identifying high risk practitioners so long as regulators are willing to accept lower predictive properties (i.e. a greater number of false positives). Chiropractors (PPV = 71·1%), psychologists (PPV = 54·9%), pharmacists (PPV = 39·9%) and podiatrists (PPV = 34·0%) constitute this group.

Discussion

In this national study of complaints to health regulators over a 6-year period, we modelled the risk of complaints to the professional regulator among more than 700,000 practitioners in 14 health professions. Results from the model formed the basis of PRONE-HP, a tool designed to be amenable for use by regulators to estimate the risk practitioners’ will experience complaints in the short to medium term. Tests of PRONE-HP’s predictive accuracy suggested it is a very promising approach for identifying doctors and dentists at high risk of subsequent complaints (few false positives); moderately promising for identifying chiropractors, psychologists, pharmacists and podiatrists at high risk (some false positives); and of limited utility for the other professions examined (many false positives).

We envision the PRONE-HP score being used in a couple of different ways by health profession regulators in the complaint handling process. Because the score can be re-calculated each time a complaint is lodged, it could be useful for flagging cases that warrant deeper review – for example, reviewing previous complaints against a practitioner or other information held by the regulator to ascertain if there are troubling patterns of care or imminent risks to patients. Such signposting may be the most appropriate way to use for the tool. Alternatively, or in addition, the score could guide stepped-level interventions. For example, a low PRONE-HP score may suggest that minimal action is required beyond resolution of the immediate complaint, while a high score may prompt a regulator to consider whether a more active intervention is needed to guard against the risk of future harm.

Using the PRONE-HP in this way has the potential to benefit both patients and practitioners. The tool paves the way for targeted quality-improvement interventions. Patients clearly stand to benefit from such risk remediation. Perhaps less obvious is the fact that practitioners may too. By enabling interventions to occur at an early stage among practitioners who need them, before poor track records of medico-legal actions materialise, the tool could help facilitate an environment in which interventions can be constructive and non-punitive in nature. In other words, the anticipatory nature of the predictive tool creates opportunities to mount interventions designed to help practitioners deliver better care, rather than penalising them for damage done [21]. Ensuring that implementation of an instrument like PRONE-HP strikes the right balance between protecting patients and treating practitioners fairly will be challenging but is crucial.

A more general contribution of our study is to clarify the conditions under which prediction of medico-legal events is possible. PRONE-HP’s performance varied widely across professions. In theory, it should perform well under conditions in which complaints cluster among certain individuals and those individuals have observable traits that differentiate them from their peers. In practice, however, there is a more fundamental requirement: clustering, even among clinicians who are relatively prone to behaviours that prompt complaints, depends on a sufficiently high baseline incidence of complaints [22,23,24]. It is no coincidence that PRONE-HP performed best among professions with the highest complaint rates, and worst among professions with the lowest complaint rates. In sum, the rarity of complaints in certain professions appears to impede robust prediction in those professions.

Our approach differs from previous attempts to predict medico-legal risk (including our own) in several ways. First, we analysed complaint risk in 16 different health profession groups Australia-wide; previous work has focused on samples of doctors, often sampled from particular insurers or states [12, 25,26,27,28,29,30]. Doctors play a central role in the delivery of healthcare, but many other types of clinicians influence the quality and safety of care patients receive.

Second, we used a wider range of predictor variables than previous studies have – most notably, information about issues raised in prior complaints. This enabled us to identify and account for particular types of complaints – for example, those relating to mental health, substance use, sexual boundaries and honesty – that appear to have very strong associations with recurrence. We have previously shown that some of these types of complaints, most notably mental health and substance use problem, are more likely to end in disciplinary action being taken [1]. Our finding here is that these same issues are also risk factors for recurrent complaints further underlines their indicative value.

Third, the model used to create PRONE-HP accounted for systematic differences in degrees of clinical contact (or “exposure” to complaints) among different practitioner groups. Most previous studies rely on head counts of registered practitioners, which tend to overstate attribution of risk of certain characteristics (e.g. male) that are also associated with higher clinical workloads.

Fourth, while many previous studies of medico-legal risk “fix” characteristics at baseline levels, we allowed several predictors to be time-varying. This allows PRONE-HP to take into account aspects of a practitioner’s risk profile that changed over time.

Finally, our previous studies predicting medico-legal risk have been restricted to practitioners with a least one prior complaint or malpractice claim. [8,9,10] The risk measures in this study take into account the entire population of registered practitioners, including those with no previous complaints. This full cohort approach enhances statistical power and generalisability, and supports estimation of risk that is related to an entire health workforce.

Notwithstanding these substantial methodological advances over prior work, our analyses has several limitations. First, we were not able to measure certain practitioner-level variables that are known to be related to complaint risk. These include patient volume, practice type, disciplinary history, and for doctors – performance issues during training and country of training.

Second, the issue-type variables in our analysis were coded from information available at the time the complaint was lodged, and subsequent investigation may have uncovered new or different issues. More refined measurement of this variable may have slightly boosted the model’s predictive power.

Third, while complaints are increasingly recognised as a potential marker of potential problems in care, not all complaints are associated with poor performance or wrongdoing [25]. Fourth, the reliance on administrative data means some practitioner’s practice arrangements are incompletely observed. For instance, we rely on the practitioner’s principle practice location to measure remoteness, but some practitioners do not have a fixed practice location or may work across state/jurisdictional boundaries.

Fifth, our data treats each complaint as a separate and independent, but occasionally a single complaint generates multiple complaints and our data did not allow us to link them. Sixth, we do not have a measure of the severity of harms associated with the complaints in our sample. Those harms will have ranged from minor misunderstandings to preventable deaths, but there is currently no established taxonomy among regulators for measuring and grading harm.

Sixth, while we able to obtain high PPV and NPV values for doctors and dentists, the sensitivity of PRONE-HP for all professions, but especially these two, was low. This result stemmed from a decision to set thresholds that maximised PPV, which is particularly relevant and important for a prediction tool of this kind. Finally, at the request of AHPRA, which has oversight responsibility for all regulated professions, we developed a single algorithm for predicting risk across the workforce as whole. Profession-specific algorithms may well perform better, at least among professions within which risk prediction is feasible.

Conclusions

Some complaints about health practitioners may reflect an isolated instance of dissatisfaction with care; others herald persistent problems that adversely affect the practitioner’s fitness to treat patients [21, 31]. Telling one from the other, without launching time- and resource-intensive investigation, has long been challenge for regulators. The capacity of tools like PRONE-HP to harness routinely-collected information to reliably predict risk of future complaints in some professions opens up new opportunities for regulators of the health professions. Prediction alone is not quality improvement. But it has considerable potential to inform larger quality improvement initiatives and to enable regulators to be more proactive in protecting patients in ways that optimise scarce regulatory resources [6, 7]. Hence, when a prediction capability is evident, as it is for complaint-prone doctors and dentists in Australia, the focus should turn to the nature and effectiveness of the interventions that risk prediction will facilitate.