Background

Clinical practice guidelines and prescribing indicators have become a common feature of many healthcare systems in an attempt to reduce unwarranted physician variation in medical care, improve quality and control drug costs [1]. In particular, improving the quality of drug prescribing in older populations is a priority because potentially inappropriate prescribing (PIP) is associated with increases in patient morbidity, mortality, adverse drug events (ADEs), hospitalisation and healthcare costs [2, 3]. Several sets of PIP indicators have been developed to evaluate prescribing; they consist of drugs to be avoided in older people, either independent of diagnosis or in the context of certain diagnoses [4-6].

There has been little research on the prevalence of PIP in primary care populations or how it varies between both patients and general practitioners (GPs) [7]. The focus to date has mainly been on variation in prescribing volumes and costs which can have low validity with prescribers [8]. Quantifying and understanding the variation in PIP is important for planning interventions and the development of guidelines and incentives to improve prescribing quality in older populations [1]. It can also provide information for performance management purposes by identifying prescribers with particularly high rates of PIP for further investigation [9].

The aim of this study was to examine the variation between GPs in the prevalence of PIP in the national Irish population aged ≥ 70 years in 2007 using the Screening Tool of Older Persons' potentially inappropriate Prescriptions (STOPP) criteria. The main objectives were to: (i) estimate the level of variation in PIP across the distribution of GPs; (ii) estimate how reliably the STOPP criteria could distinguish between GPs in terms of the proportion of PIP prescribed; and (iii) examine how PIP varies between GPs and by patient and GP characteristics associated with PIP in a multilevel regression model.

Methods

Study population

This was a national population study of patients aged ≥ 70 years in Ireland in 2007 who were dispensed medication recorded in the pharmacy claims database of the Primary Care Reimbursement Service of the Health Service Executive in Ireland (HSE-PCRS). The HSE-PCRS provides free health services, including medications, to eligible persons in Ireland. Eligibility is means tested for those aged < 70 years, and the scheme was free to all those aged ≥ 70 years between July 2001 and December 2008. It is estimated that over 97% of older patients nationally avail of the scheme [10]. Prescriptions are coded using the WHO ATC classification system [11]. Prescriber and patient socio-demographic information, defined daily doses (DDD), strength, quantity, method and unit of drug administration, an urban/rural classification and a measure of practice deprivation are available [12]. Patients receiving prescriptions from more than one practitioner were assigned to the practitioner who prescribed for >3 consecutive months. Where more than one practitioner prescribed over a period of >3 months, the patient was assigned to their most recent practitioner.
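The assignment rule for patients with more than one prescriber can be sketched as follows. This is a minimal illustration, not the authors' SAS code: the input layout (pairs of calendar month index and GP identifier), the function name `assign_gp`, and the reading of ">3 consecutive months" as a run of at least four consecutive months are all assumptions. The text does not say what happens when no practitioner meets the threshold, so the sketch returns `None` in that case.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def assign_gp(claims: Iterable[Tuple[int, str]], min_run: int = 4) -> Optional[str]:
    """Assign a patient to one GP from (month_index, gp_id) claim pairs.

    min_run=4 encodes ">3 consecutive months" (an assumed reading).
    """
    months_by_gp = defaultdict(set)
    for month, gp in claims:
        months_by_gp[gp].add(month)

    def longest_run(months):
        # Length of the longest streak of consecutive month indices.
        best = run = 0
        prev = None
        for m in sorted(months):
            run = run + 1 if prev is not None and m == prev + 1 else 1
            best = max(best, run)
            prev = m
        return best

    qualifying = [gp for gp, months in months_by_gp.items()
                  if longest_run(months) >= min_run]
    if not qualifying:
        return None  # case not covered by the text
    # Several qualifying GPs: keep the one with the most recent claim.
    return max(qualifying, key=lambda gp: max(months_by_gp[gp]))


# Hypothetical patient: GP "A" prescribes in months 1-4, GP "B" in months 6-9.
print(assign_gp([(m, "A") for m in range(1, 5)] + [(m, "B") for m in range(6, 10)]))
```

In this hypothetical example both practitioners qualify, so the patient is assigned to the more recent one.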

Measuring PIP

Thirty STOPP criteria were applied to prescription claims data for the study population. Details of the application of the STOPP criteria to the HSE-PCRS pharmacy claims database have been described previously [13]. The thirty criteria were considered applicable to pharmacy claims data without diagnosis information on a consensus basis by an expert panel of five members in geriatric pharmacotherapy, clinical pharmacology, pharmacoepidemiology and academic general practice (Additional file 1). Prescription drugs for the treatment of certain disease conditions were identified and used as proxies for diagnosis where possible, e.g. dementia (ATC, N06D) and Parkinson’s disease (ATC, N04) [13]. The thirty STOPP criteria were also combined in a composite binary indicator defined as whether or not a patient had received any PIP indicator. The number of different repeat drug classes was measured for each patient [13, 14].

Estimating reliability

The reliability of the STOPP indicators in distinguishing between GPs was determined using a previously described methodology [7]. The reliability coefficient measures how confident one can be that the observed differences in PIP between GPs result from real differences in the quality of GP prescribing. Reliability increases as variation between GPs increases and with the number of patients per GP [7]. STOPP indicators may be reliable for GPs with a large number of patients but not for GPs with a small number of patients; to compare GPs, the indicators must have adequate reliability for the majority of GPs. Reliability varies between 0 (completely unreliable) and 1 (completely reliable); values > 0.7 (70%) indicate acceptable reliability, and values of 0.8-0.9 (80-90%) are preferable for clinical governance, such as paying GPs for their performance on certain quality indicators [9].

Data analysis

The overall prevalence of PIP and the prevalence for each individual STOPP criterion were calculated. The proportion of PIP prescribing (STOPP composite indicator) for each GP is also presented. Reliability was measured by estimating the intracluster correlation coefficient (ICC) for each STOPP indicator and the composite indicator in a two level random intercept logistic model with no explanatory variables (“empty model”). Reliability was calculated using the Spearman-Brown prophecy formula based on the median number of patients aged ≥ 70 years per GP in 2007 [7, 9]. The proportions of GPs with reliability >0.7 and >0.8 for each STOPP indicator and the composite were then calculated based on the actual number of patients in each GP practice [7].
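The reliability calculation described above can be sketched as follows. The Spearman-Brown formula itself is standard; the latent-variable ICC (residual patient-level variance fixed at π²/3 on the logit scale), the function names and the numeric inputs are assumptions made for illustration and are not taken from the study.

```python
import math

# Patient-level residual variance of a logistic model on the latent scale (assumed).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def icc_from_variance(var_gp):
    """ICC from the between-GP variance of an empty random intercept logit model."""
    return var_gp / (var_gp + LOGISTIC_RESIDUAL_VARIANCE)


def spearman_brown(icc, n):
    """Reliability of a GP-level rate based on n patients."""
    return n * icc / (1 + (n - 1) * icc)


def patients_needed(icc, target):
    """Number of patients per GP required to reach a target reliability."""
    return target * (1 - icc) / (icc * (1 - target))


def share_reliable(icc, list_sizes, threshold):
    """Proportion of GPs whose reliability exceeds the threshold."""
    sizes = list(list_sizes)
    hits = sum(1 for n in sizes if spearman_brown(icc, n) > threshold)
    return hits / len(sizes)


# Hypothetical indicator with ICC = 0.05 and three GPs with 10, 50 and 200 patients.
print(round(spearman_brown(0.05, 100), 2))           # reliability for 100 patients
print(round(patients_needed(0.05, 0.7), 1))          # patients needed for 0.7
print(round(share_reliable(0.05, [10, 50, 200], 0.7), 2))
```

For a fixed ICC, reliability rises with list size, which is why the share of GPs above each threshold depends on the actual number of patients per practice.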

Multilevel logistic regression investigated how the STOPP composite indicator varied between GPs and by patient and GP characteristics. Multilevel unadjusted odds ratios (OR) with 95% confidence intervals (CI) were estimated in a two level random intercept logistic model for each patient level one explanatory variable (age, gender, number of different repeat drug classes) and each GP level two explanatory variable (gender, urban/rural, deprivation). Three multivariable models were estimated: (i) Model 1, a two level random intercept logistic model with patient level one explanatory variables only; (ii) Model 2, a two level random intercept logistic model with patient level one and GP level two explanatory variables; and (iii) Model 3, a two level random slope logistic model. In Models 1 and 2, the response probability for the STOPP composite indicator was allowed to vary across GPs but the effect of each patient level one explanatory variable was assumed to be the same for each GP. Model 3 allowed both the intercept and the coefficient of the explanatory variable, number of different repeat drug classes, to vary randomly across GPs [15]. Likelihood ratio (LR) tests were used to compare the fit of the three models.
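For orientation, the three models can be written in a generic two level form. The notation is an assumption introduced here and does not appear in the source: patient i is nested in GP j, x_ij are patient level variables, w_j are GP level variables, and z_ij is the number of different repeat drug classes treated as a single numeric covariate.

```latex
\begin{align*}
\text{Model 1:}\quad & \operatorname{logit}(\pi_{ij}) = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_{0j},
  \qquad u_{0j} \sim N(0, \sigma^2_{u0}) \\
\text{Model 2:}\quad & \operatorname{logit}(\pi_{ij}) = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij}
  + \boldsymbol{\gamma}^{\top}\mathbf{w}_{j} + u_{0j} \\
\text{Model 3:}\quad & \operatorname{logit}(\pi_{ij}) = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij}
  + \boldsymbol{\gamma}^{\top}\mathbf{w}_{j} + u_{0j} + u_{1j} z_{ij} \\
& \operatorname{Var}(u_{0j} + u_{1j} z_{ij}) = \sigma^2_{u0} + 2\sigma_{u01} z_{ij} + \sigma^2_{u1} z_{ij}^2
\end{align*}
```

Under this form the between GP variance in a random slope model is a quadratic function of z_ij, while in the random intercept models it is the constant σ²_u0.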

The between GP variance and the variance partition coefficient (VPC) were estimated for all three models [15]. The VPC partitions the variance between the patient and GP levels; it estimates the proportion of the total residual variance of the outcome (PIP) that lies at the GP level and so represents the heterogeneity between GPs. The VPC was initially calculated in an “empty” model including only the random parameter (PIP). Patient level and GP level variables were then introduced into the model and the proportional change in variance was calculated, representing the percentage of variation explained by the variables in the model compared with the empty model. The median odds ratio (MOR) was also calculated; it quantifies the variation between GPs by comparing two patients from two randomly chosen, different GPs [16]. An MOR equal to 1 indicates no differences between GPs in the probability of prescribing a PIP indicator. The variation between GPs after controlling for patient level variables (Model 1) was examined graphically using a funnel plot of the observed versus expected number of patients with a PIP indicator and compared to the unadjusted analysis [17].
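A short sketch of these three summary measures follows. It assumes the commonly used latent-variable formulation, in which the patient level residual variance of a logistic model is π²/3, and the usual MOR formula based on the 75th centile of the standard normal distribution; the function names are illustrative. The example inputs read the between GP variances reported in the Results (9% for the empty model, 6.4% for Model 1 and 5.9% for Model 2) as 0.09, 0.064 and 0.059 on the logit scale, which is also an assumption.

```python
import math
from statistics import NormalDist

# Patient-level residual variance of a logistic model on the latent scale (assumed).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def vpc(var_gp):
    """Share of total residual variance that lies at the GP level."""
    return var_gp / (var_gp + LOGISTIC_RESIDUAL_VARIANCE)


def median_odds_ratio(var_gp):
    """Median OR between two patients with the same covariates at two random GPs."""
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745
    return math.exp(math.sqrt(2 * var_gp) * z75)


def proportional_change_in_variance(var_empty, var_model):
    """Fraction of the empty-model GP variance explained by added covariates."""
    return (var_empty - var_model) / var_empty


print(f"VPC, empty model: {vpc(0.09):.1%}")
print(f"VPC, Model 2:     {vpc(0.059):.2%}")
print(f"MOR, Model 2:     {median_odds_ratio(0.059):.2f}")
print(f"PCV, Model 1:     {proportional_change_in_variance(0.09, 0.064):.1%}")
```

Under these assumptions the first three values come out at roughly 2.7%, 1.76% and 1.26, which agrees with the VPC and MOR figures reported in the Results.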

Initial data analysis and application of the STOPP criteria to the data set was performed using SAS statistical software package version 9.1 (SAS Institute Inc. Cary, NC, USA). Multilevel logistic regression was performed in STATA Version 11.2 (StataCorp, Texas, USA) [18]. Level two residuals were checked graphically for normality, heteroskedasticity and outliers.

Results

Descriptive statistics

A total of 1,938 GPs and 338,725 patients aged ≥ 70 years were identified from the HSE-PCRS pharmacy claims database. The variation between GPs in the overall rate of PIP was considerable, ranging from 13% at the 5th percentile to 65% at the 95th percentile (median 35%, IQR 29.6-40.3%). Figure 1 presents the proportion of PIP (at least one PIP indicator) for each GP. Values within the 2 standard deviation (SD) and 3 SD lines represent common cause variation, which is consistent with random chance; values outside the 3 SD lines represent special cause variation, which is due to systematic influences [19]. GPs outside the 3 SD control limits were therefore statistically significantly different from the average. Ninety-eight percent of GPs (N = 1,906) had at least one patient with PIP.
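The control limits of a funnel plot for proportions can be sketched as follows. This is a minimal illustration using a normal approximation to the binomial distribution, which is an assumption; the source does not state how the limits were computed. The function names, the target proportion of 0.35 and the list size of 100 are hypothetical.

```python
import math


def control_limits(p0, n, z):
    """Lower and upper limits around target proportion p0 for a GP with n patients."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return max(0.0, p0 - z * se), min(1.0, p0 + z * se)


def classify(p_obs, p0, n):
    """Place an observed GP proportion relative to the 2 SD and 3 SD lines."""
    lo3, hi3 = control_limits(p0, n, 3.0)
    lo2, hi2 = control_limits(p0, n, 2.0)
    if p_obs < lo3 or p_obs > hi3:
        return "special cause"
    if p_obs < lo2 or p_obs > hi2:
        return "between 2 SD and 3 SD"
    return "within 2 SD"


# Hypothetical GPs, each with 100 patients, against an average PIP rate of 35%.
for observed in (0.40, 0.47, 0.55):
    print(observed, classify(observed, p0=0.35, n=100))
```

Because the standard error shrinks with list size, the limits narrow for larger practices, which gives the plot its funnel shape.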

Figure 1 The proportion of PIP prescribing (at least one potentially inappropriate indicator) for each GP (N = 1,938)

Reliability of STOPP criteria

All thirty STOPP criteria had adequate reliability (> 0.7) based on a GP with a median-sized catchment. The majority of practitioners had sufficient numbers of patients aged ≥ 70 years to be reliably measured on each of the individual STOPP criteria. Between 82.6% and 99.9% of practitioners had reliability > 0.7 and between 70.1% and 97.2% had reliability > 0.8. The reliability of the composite STOPP indicator for a median-sized GP was 0.80. The proportion of practitioners with sufficient patient numbers to be reliably measured was lower for the composite indicator than for the individual STOPP criteria; 76.6% had reliability > 0.7 and 60% had reliability > 0.8 (Additional file 1).

Multilevel regression model

Table 1 shows the percentage of patients receiving at least one PIP indicator by patient and GP characteristics and the unadjusted ORs with 95% CIs estimated in a two level random intercept logistic model. PIP increased considerably with the number of different repeat drug classes; 3% of patients with no repeat prescriptions received a PIP indicator compared with 70% of those prescribed ≥10 repeat drug classes. PIP also increased with age. PIP varied less by GP level variables with a higher rate in urban based practices compared to rural.

Table 1 Number and percentage of patients receiving at least one potentially inappropriate indicator and multilevel unadjusted odds ratios (95% CIs)

Table 2 presents the adjusted ORs and 95% CIs for the three multivariate models (Models 1, 2 and 3). In the multivariate model including only patient level variables (Model 1) the association between PIP and the number of different repeat drug classes was mainly unchanged compared to the unadjusted ORs (Table 1). The associations with patient gender and age were reversed, with older patients and female patients slightly less likely to receive a PIP indicator. In the two level random intercept logistic model with no explanatory variables (empty model) the between GP variance was 9% (SE 0.004). In Model 1, after adjusting for patient level variables the residual between GP variance in receiving a PIP indicator was 6% (SE 0.003).

Table 2 Multilevel logistic regression adjusted odds ratios (95% CIs) for patients receiving at least one potentially inappropriate indicator

The multivariate model including both patient and GP level variables (Model 2) had similar findings with the association between PIP and number of different repeat drug classes remaining unchanged. The association between PIP and GP area of residence (urban/rural) was no longer significant compared to the unadjusted ORs (Table 1). Adding GP level variables (Model 2) only explained an additional 0.5% of the between GP variance (Model 2 = 5.9% vs. Model 1 = 6.4%) and resulted in minimal change to the adjusted ORs. Model 3 allowed the coefficient of the number of different repeat drug classes to vary randomly across GPs. LR tests indicated that Model 3 was preferred to Models 1 and 2. Model 3 indicated that the between GP variance for PIP was a quadratic function of the number of different repeat drug classes. The lower or higher the number of repeat drug classes the more variability in PIP between GPs.

Figure 2 shows the difference between the observed and expected numbers of patients with a PIP indicator for each GP after adjustment for patient level explanatory variables (Model 1). GPs varied from having 50% less than the expected rates of PIP to 50% to 100% in excess, but the majority of this variation was not significant (within 3 SD). GPs outside the 3 SD control limits were statistically significantly different from the average. The VPC for the “empty” model, which gives the percentage of the variance attributable to the GP level, was relatively small at 2.7%. This indicates that individual patient factors are relatively more important for PIP than GP level factors. The VPC for Model 2 was 1.76%. Thus, after adjusting for the effects of patient and GP characteristics, the remaining variance (1.76%) in the propensity for PIP at the GP level was attributable to unobserved patient and GP characteristics. The GP level variance was also measured by the MOR (MOR 1.26; 95% CI: 1.23 to 1.29) [16].

Figure 2 Observed versus expected number of patients with a potentially inappropriate indicator (N = 1,938)

Discussion

Principal findings

There was a high prevalence of PIP in those aged ≥70 years in Ireland in 2007 and nearly all GPs had at least one patient with PIP. The most prevalent PIP drugs were PPIs at maximum therapeutic dosage for >8 weeks, followed by NSAIDs for >3 months and long-acting benzodiazepines for >1 month. The National Institute for Health and Clinical Excellence (NICE) guidelines recommend regular review of patients to assess their continuing need for PPIs and the use of step-down therapy [20]. Long-term PPI treatment has significant cost consequences [21, 22]. NSAID use is associated with gastrointestinal adverse effects and hospitalisation and long-acting benzodiazepines are associated with an increased risk of falls, fractures, impaired cognition and dependence problems in older populations [23, 24]. Drug duplication on the same prescription claim was also prevalent and concurrent use of NSAIDs has been shown to increase the risk of gastrointestinal toxicity [25].

The individual STOPP indicators can be used with reasonable confidence to identify GPs as having above or below average proportions of PIP (average > 0.8 reliability for 90% of GPs). Although the composite indicator had lower reliability, it is likely in practice that the individual indicators would be used to monitor the quality of prescribing. There is also evidence that the STOPP criteria have predictive validity with an association found between STOPP and ADEs in older hospitalised patients [26].

There was relatively little variation in PIP at the GP level in Ireland; the majority of the variation was at the patient level. While there was evidence of significant variation in PIP between GPs in the unadjusted analysis (Figure 1), after adjustment for patient level variables the majority of the remaining variation was no longer significant (Figure 2). This remaining variation (significant and non-significant) was not explained by adding GP level variables to the model. The characteristics of the GPs did not substantively affect the likelihood of receiving a PIP indicator at the patient level. The multilevel logistic regression model for the STOPP composite indicator found that only the number of different repeat drug classes was strongly associated with the likelihood of receiving a potentially inappropriate indicator. Other patient and GP level variables were found to be significantly associated with PIP but their odds ratios were close to one in the adjusted multilevel models. The association between the number of different repeat drug classes and the likelihood of receiving a potentially inappropriate indicator varied across GPs; the lower or higher the number of repeat drug classes, the more variability in PIP between GPs.

A recent Scottish study investigated the variation in PIP between 315 practices and 139,404 patients defined as vulnerable to ADEs using 15 indicators based on explicit national prescribing safety advice (median 12.5%, IQR, 10.1%, 15.3%) [7]. Unlike the current study, the variation between practices was considerable even after adjusting for patient case mix and practice characteristics. Practices which were statistically different from average varied from having half (-50%) the expected rates of PIP prescribing to having 50%-125% in excess. The MOR was higher than the current study (1.42, 95% CI, 1.37, 1.47) [7]. The study populations and prevalence of PIP were different for the two studies which may explain the differences in variation between prescribers [7]. Both studies did however identify considerable unexplained significant and non-significant variation in PIP between prescribers and found that practice level variables did not account for this variation (additional 0.5% in both studies); only the patient level factor number of drug classes was strongly associated with PIP [7].

This study has identified that reductions in PIP will require improvement across all GPs to reduce the average rate of PIP rather than focusing on a few select GPs (outliers). The number of different repeat medications has consistently been shown to be an independent predictor for PIP in numerous studies [7, 14, 27-29]. The prescription of multiple medications in older adults has also been associated with an increased risk of drug interactions, adherence problems, ADEs and drug costs [27, 30]. There is some evidence that interventions targeting polypharmacy in older people, using pharmaceutical care or computerised decision support, are successful in reducing medication related problems such as PIP. These and other forms of interventions that help the prescriber modify or reduce PIP in older patients should be developed and evaluated in randomised controlled trials [30].

Strengths and limitations

This study has a number of limitations. The lack of diagnostic information in the database limited the applicability of all of the STOPP criteria and the investigation of individual patient factors and differences in drug indication. It is likely that estimates of PIP and comparisons across GPs are conservative [31]. There was a possibility of confounding by indication and patient case mix when comparing PIP rates across different GPs. However, the variable number of different repeat drug classes should account for most of the unmeasured variability in patient co-morbidities between GPs.

There was a small proportion of patients (3.5%) who were assigned to more than one GP in 2007 and these patients were assigned to the GP who prescribed their medication on a consistent basis, or their most recent GP if more than one GP prescribed their medication on a consistent basis. Therefore, a certain proportion of prescribing is unaccounted for in the analysis which may result in a more conservative estimate of PIP. In addition, the database does not include OTC items, although this is not likely to be a significant factor as the scheme provides free medical treatment and patients must pay for OTC items.

The multilevel approach used in this study controlled for confounding by including both patient and GP level predictors of PIP. In general, GP variation in prescribing reflects different therapeutic approaches to health problems in older populations, but the current study found minimal variation in PIP between GPs. In addition, none of the available patient and GP level factors could explain the remaining variation in PIP between GPs. The database had a limited number of GP and patient variables, which limited the ability to explain all of the remaining variance. Further multilevel research is required to investigate and understand which factors influence PIP at the different levels of health care organisation: patient, GP, and practice organisation and culture [7, 32].

Notwithstanding the limitations, this study is one of the first studies to examine how PIP varies between both patients and GPs in a national older population [7]. The application of PIP indicators to prescription databases at the patient, GP and practice level provides useful information for assessing and comparing prescribing at the population level [33].

Policy implications

The development of PIP guidelines and their implementation is expensive and must bring value in terms of improved prescribing quality and patient outcomes. Studies on the effectiveness of clinical guidelines have been conflicting but they are effective if well constructed and implemented consistently. Guidelines also need to be closely monitored and prescribers educated to comply with them [34-36]. Regulatory prescribing guidelines introduced in France were poorly followed because of their volume, a lack of information systems and limited capacity for monitoring [37]. In the UK, by contrast, education on the use of guidelines on prescribing nutritional supplements significantly reduced total prescribing by 15% and inappropriate prescribing from 77% to 59% [38]. The use of computerised clinical decision support, academic detailing and pharmacist intervention has had some success in reducing PIP and further research on their implementation is required [39-42].

There is also evidence that guidelines are effective if accompanied by pay for performance financial incentives [43]. Performance measurements do offer an efficient mechanism to regulate health care providers, increase accountability and encourage quality improvement and care, but they can alienate providers and make them resistant to change [44-46]. Not all PIP measured in prescribing databases may be inappropriate and screening tools will never be substitutes for clinical assessment and judgment. However, they can be used to identify high rates of PIP and to monitor and improve prescribing practices in older populations.

Conclusion

Optimisation of drug prescribing in older patients is becoming an important public health issue worldwide and effective mechanisms and policies are needed to reduce the occurrence of PIP across all GPs and improve the quality of prescribing [31].