Economic incentives and diagnostic coding in a public health care system

We analysed the association between economic incentives and diagnostic coding practice in the Norwegian public health care system. Data included 3,180,578 hospital discharges in Norway covering the period 1999–2008. For reimbursement purposes, all discharges are grouped in diagnosis-related groups (DRGs). We examined pairs of DRGs where the addition of one or more specific diagnoses places the patient in a complicated rather than an uncomplicated group, yielding higher reimbursement. The economic incentive was measured as the potential gain in income by coding a patient as complicated, and we analysed the association between this gain and the share of complicated discharges within the DRG pairs. Using multilevel linear regression modelling, we estimated both differences between hospitals for each DRG pair and changes within hospitals for each DRG pair over time. Over the whole period, a one-DRG-point difference in price was associated with an increased share of complicated discharges of 14.2 (95 % confidence interval [CI] 11.2–17.2) percentage points. However, a one-DRG-point change in prices between years was only associated with a 0.4 (95 % CI −1.1 to 1.8) percentage point change of discharges into the most complicated diagnostic category. Although there was a strong increase in complicated discharges over time, this was not as closely related to price changes as expected.


Introduction
A number of countries have introduced activity-based payment systems for hospital care by linking all or part of the hospital budget to the number of discharged patients while at the same time adjusting for treatment intensity or patient complexity (case mix). The diagnosis-related group (DRG) is one of the most common systems used to account for case mix. DRGs are widely used for both monitoring and payment purposes. The size of the reimbursement differs between patients, reflecting differences in complexity and thus treatment costs. Patients are categorized in different groups based on diagnosis and procedural codes routinely registered in medical records. For some groups, the DRG system makes the distinction between a "complicated" and an "uncomplicated" patient. While the main diagnosis will be the same, complicated patients will have one or more additional "complicating" secondary diagnoses. Within the resulting pair of DRGs, the complicated group will thus have higher predicted costs and a higher reimbursement. Because personnel in hospitals register information about diagnosis, there is the possibility that a patient is consciously coded to a "complicated" DRG. This is often referred to as "upcoding" or "DRG creep", first defined as "a deliberate and systematic shift in a hospital's reported case mix in order to improve reimbursement" (Simborg 1981). It has also been argued that the introduction of activity-based payment systems will increase the importance of accuracy and completeness in coding (Fisher et al. 1992;O'Reilly et al. 2012). The latter view is shared by the Norwegian government body responsible for the Norwegian DRG system, which defines DRG creep as "patients being coded as more complete, resulting in an increase in case mix index" (translated by the authors from Helsedirektoratet (2011)). 
Indeed, evidence from the US Medicare system indicated that the introduction of a prospective payment system in 1983 was followed by an increase in the average case mix (Carter and Ginsburg 1985;Ellis and McGuire 1986;Carter et al. 1990; Stern and Epstein 1985;Rosenberg 2001).
In the past decade, there has been a renewed interest in issues related to DRG creep and upcoding. Examining a policy reform in the financing of US Medicare discharges, Dafny (2005) found a positive association between price differences between complicated and uncomplicated DRGs and the share of discharges in complicated groups. More recently, Barros and Braun (2016) found a positive association between price incentives and upcoding in Portugal.
Responses to price incentives vary between different types of hospitals. In Sweden, the increase in the number of secondary diagnoses registered was larger in hospitals with prospective payment systems than hospitals without prospective payment systems (Serdén et al. 2003). Two studies in the USA found that for-profit hospitals were more likely than nonprofit or government-owned hospitals to upcode (Dafny and Dranove 2009;Silverman and Skinner 2004), and also that hospitals in "economic distress" were more likely to upcode (Silverman and Skinner 2004). However, no difference in upcoding between public and private hospitals was found in Italy (Berta et al. 2010).
In a cross-country comparative study, Steinbusch et al. suggest that health systems combining for-profit hospitals with the use of secondary diagnosis criteria for classification, such as in the USA, were more susceptible to upcoding (Steinbusch et al. 2007). In a systematic review, Palmer et al. argued that the effects seen in other countries are similar to those observed in the US system (Palmer et al. 2014). In a theoretical work, Kuhn and Siciliani suggested that the level of auditing of the financing system will influence the perceived risk related to upcoding, and this can also explain differences in levels of upcoding across health systems (Kuhn and Siciliani 2008).
The purpose of this paper is to add to the relatively small literature on upcoding in systems dominated by public hospitals by providing an analysis of coding behaviour in Norway over a period of 10 years. The Norwegian health care system is tax funded, with universal access to services that are largely free at the point of use. Hospitals are predominantly publicly owned and financed through a combination of global budgets and activity-based funding. Activity-based financing was introduced in 1997 utilizing a Nordic version of the DRG system. In the period covered by this study (1999–2008), the share of activity-based funding fluctuated between 40 and 60 %. The period also encompasses a major ownership reform in 2002, where hospital ownership was transferred from 19 county councils to the state (Magnussen et al. 2007).
Analysing coding behaviour in the Norwegian health care sector allowed us to address three questions. First, in a public health care system, the additional income generated from upcoding remains in the hospital. Thus, it will be used to increase the level of activity beyond what was planned, to increase slack (inefficiencies), or it will be saved to finance future investments. It remains uncertain to what extent actors in this public setting will seek to increase income by upcoding. Second, the substantial changes in the degree of activity-based funding during the period studied allowed us to analyse to what extent public hospitals adjust their coding behaviour in response to changes in financial incentives. Third, using observations over a period of 10 years allowed us to study any underlying trends in coding behaviour, and isolate this from the effects of changes in financial incentives. In all three questions, our main interest was the potential relationship between economic incentives and coding behaviour on an aggregate national level. Although there are numerous micro-level examples of upcoding (Laegreid and Neby 2012;Neby et al. 2015), it is unclear whether these are exceptions to the rule, or whether they represent a general behavioural response to economic incentives.

Data material
Data from all Norwegian somatic hospital discharges for the period 1999–2008 were used. The Norwegian Patient Registry provided the data. Each hospital discharge was grouped in a DRG, and 250 of the total of 913 groups were linked in complicated/uncomplicated pairs (in 2008). Only patients in acute care hospitals grouped within these 125 DRG pairs were included. We excluded DRG pairs not used in all years, DRG pairs with fewer than 1000 annual cases, and five additional DRG pairs that were viewed as problematic. After exclusion criteria were applied, 3,180,578 in-patient discharges remained. They were grouped into 76 different DRG pairs, of which 53 pairs were medical DRGs and 23 pairs were surgical DRGs.
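The first two exclusion criteria can be expressed as a simple filter. The following is a minimal sketch only: the function name and the input layout are hypothetical, and it assumes that the 1000-case threshold applies to the national count of a DRG pair in every single year, which the text does not specify. The five pairs excluded as problematic were a manual judgement and are not covered.

```python
from collections import defaultdict


def eligible_pairs(counts, years, min_annual_cases=1000):
    """Return DRG pairs observed in every year with enough cases each year.

    counts maps (year, drg_pair) -> number of discharges in the pair
    (complicated and uncomplicated groups combined).
    Assumption: the threshold must be met in each individual year.
    """
    per_pair = defaultdict(dict)
    for (year, pair), n in counts.items():
        per_pair[pair][year] = n

    keep = []
    for pair, by_year in per_pair.items():
        # Criterion 1: the pair is used in all study years.
        used_all_years = all(by_year.get(y, 0) > 0 for y in years)
        # Criterion 2: at least min_annual_cases discharges in each year.
        large_enough = all(by_year.get(y, 0) >= min_annual_cases for y in years)
        if used_all_years and large_enough:
            keep.append(pair)
    return sorted(keep)
```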
These pairs amount to about 29 % of the total volume of discharges. See Table 1 for a list of included DRG pairs. Our study included 26 hospitals (including three large publicly funded non-profit private hospitals). Not all hospitals treated patients in all included DRGs.

Dependent variable
The dependent variable (c_tih) was the percentage of complicated discharges in a DRG pair. This was defined as the number of complicated cases divided by the total number of cases in the DRG pair, calculated for year t, DRG pair i and hospital h.
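As an illustration of this definition, the value for a single cell (one year, one DRG pair, one hospital) could be computed as below. This is a sketch with a hypothetical function name; returning None for an empty cell is an assumption made here to reflect that not every hospital treated patients in every DRG pair.

```python
def share_complicated(complicated, uncomplicated):
    """Percentage of complicated discharges in one (year, DRG pair, hospital) cell."""
    total = complicated + uncomplicated
    if total == 0:
        # Assumption: a hospital without cases in this pair yields no observation.
        return None
    return 100.0 * complicated / total
```

For example, a cell with 38 complicated and 62 uncomplicated discharges gives a share of 38 %.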

Potential gain in income from upcoding: the incentive
We measured the potential gain in income from upcoding as the difference in reimbursement (DRG prices) between complicated and uncomplicated groups in each DRG pair, similarly to the spread in weights as defined by Dafny (2005) and Barros and Braun (2016). This spread did not differ across hospitals, as there were no hospital-specific prices. We calculated the difference between prices of complicated and uncomplicated groups within a DRG pair across the years, multiplied by the share of activity-based funding for each specific year. However, we depart from Dafny's approach by calculating the mean across years for each DRG pair and denote this as p_i (Eq. 1). To enable comparison across years, we measured prices normalized in DRG points, not as the monetary value of a DRG point. One DRG point, roughly equalling the treatment cost of the "average patient", was valued at 33,647 NOK (∼3629 EUR) in 2008. The price difference should be interpreted as the incentive in a DRG pair because upcoding, should it take place, increases income without increasing cost.
In Eq. 1, COMPLICATED_it is the DRG weight (relative price) of the complicated group in DRG pair i in year t, UNCOMPLICATED_it is the DRG weight of the uncomplicated group in DRG pair i in year t and ABFSHARE_t is the share of the total budget allocated through activity-based financing (from 0 to 1) in year t. However, the price of each DRG may change from year to year. Such changes are caused by (1) changes in relative reimbursement rates (prices are adjusted annually) for specific DRGs (i.e., COMPLICATED_it and UNCOMPLICATED_it), and (2) variations in the share of activity-based funding between years (ABFSHARE_t). Either of these causes will yield changes in the potential gain in income. In this study, we are not only interested in the level of the incentive (p_i), but also in changes calculated as the annual changes from the average for each DRG pair (Eq. 2).
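Written out from the definitions above, the two incentive measures could take the following form. This is a reconstruction rather than a quotation: it assumes that the mean in Eq. 1 is taken over the N = 10 study years (DRG pairs not used in all years were excluded), and the symbol N is introduced here for illustration only.

```latex
% Eq. 1: level of the incentive for DRG pair i (mean spread over the study years)
p_{i} = \frac{1}{N} \sum_{t=1}^{N}
        \left( \mathrm{COMPLICATED}_{it} - \mathrm{UNCOMPLICATED}_{it} \right)
        \times \mathrm{ABFSHARE}_{t}

% Eq. 2: change in the incentive, the yearly deviation from that mean
p_{it} = \left( \mathrm{COMPLICATED}_{it} - \mathrm{UNCOMPLICATED}_{it} \right)
         \times \mathrm{ABFSHARE}_{t} - p_{i}
```

Under this form, the yearly deviations of a DRG pair sum to zero across the study years.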
By separating p_i and p_it, we separate the effect of the level of the incentive from changes in the incentive on coding behaviour. The level of the incentive is thus the difference between DRG pairs (p_i), while the changes are differences over time within a specific DRG pair (p_it). The spread used by Dafny (2005) and Barros and Braun (2016) is the sum of these between and within effects.
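The separation into a between component and a within component can be sketched for a single DRG pair as follows. The function name, the input layout and the example weights are hypothetical; only the arithmetic follows the description above.

```python
def decompose_incentive(complicated_w, uncomplicated_w, abf_share):
    """Split the yearly price spread of one DRG pair into level and changes.

    Each argument maps year -> value: the DRG weights of the complicated and
    uncomplicated groups, and the share of activity-based funding (0 to 1).
    Returns (level, changes), where level is the mean spread across years and
    changes maps year -> deviation of that year's spread from the mean.
    """
    spread = {
        year: (complicated_w[year] - uncomplicated_w[year]) * abf_share[year]
        for year in complicated_w
    }
    level = sum(spread.values()) / len(spread)
    changes = {year: s - level for year, s in spread.items()}
    return level, changes


# Hypothetical pair: constant weights, funding share rising from 50 % to 60 %.
level, changes = decompose_incentive(
    {1999: 1.5, 2000: 1.5},
    {1999: 1.0, 2000: 1.0},
    {1999: 0.5, 2000: 0.6},
)
```

In this made-up case the weights never change, so all within-pair variation comes from the funding share. Adding the level to a year's change recovers that year's spread, which is the quantity used in the earlier studies.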

Statistical analysis
The clustered and hierarchical nature of the data led us towards a mixed-model approach.
The multivariable analyses were performed using a three-level linear regression model, where hospital discharges were aggregated to 19,250 observations, comprising 10 yearly observations (level 1) of each DRG pair (level 2) within each of the 26 hospitals (level 3). Equation 3 describes our main analytical model.
Our dependent variable, c_tih, is the share of complicated cases in year t in DRG pair i in hospital h. The effects of the level of the upcoding incentive were defined by p_i (Eq. 1), and the change in incentive was defined by p_it (Eq. 2). To capture any general development in coding practice over time, we included a time trend (T_t), which measures years since 1999. This time trend might, however, capture both general improvements in quality of coding and any fraudulent upcoding not captured by the effects of p_i and p_it. We also controlled (by way of a dummy (D) for the years 2002–2008) for the possible effect of the ownership reform in 2002. A statistical interaction of these was included (T_t D).
The a-terms are constants and intercepts at the different levels, while ε_tih is the residual. Other covariates are denoted x_tih in the equation. These included average age and sex in each DRG pair. Elderly patients are likely to be frailer, and therefore have an increased probability of being grouped in complicated groups. For the same reason, we also adjusted for emergency status and length of stay. Emergency admissions are more likely to be complicated than elective procedures (Melnick et al. 1989;Keller et al. 1987). Length of stay may be a proxy for case mix, as the longer the patient remains in the hospital, the more complex the illness is likely to be or the frailer the patient. To better control for co-morbidity and case mix, we constructed a Charlson index for each analytical observation. The index is a measure of co-morbidity that is based upon secondary diagnoses (Charlson et al. 1987), as was our dependent variable. For the calculation of the Charlson index, we excluded those diagnoses that caused a complicated DRG grouping (within each DRG pair), and thus the index does not have an upcoding bias other than what comes from the complicated discharges actually being more complicated.
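A form of the model in Eq. 3 that is consistent with the terms described above can be written as follows. This is a sketch under stated assumptions, not the published equation: the coefficient names (the β and γ symbols) are introduced here for illustration, and the split of the a-terms into an overall constant, a hospital intercept and a DRG-pair-within-hospital intercept is one plausible reading of "constants and intercepts at the different levels".

```latex
% Three-level linear model: years (t) within DRG pairs (i) within hospitals (h)
c_{tih} = a_{0} + a_{h} + a_{ih}
        + \beta_{1}\, p_{i} + \beta_{2}\, p_{it}
        + \beta_{3}\, T_{t} + \beta_{4}\, D + \beta_{5}\, T_{t} D
        + \boldsymbol{\gamma}^{\top} \mathbf{x}_{tih}
        + \varepsilon_{tih}
```

In this notation, the first coefficient captures the between-pair effect of the level of the incentive and the second captures the within-pair effect of changes in the incentive.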
While ownership of hospitals after 2002 was transferred to the state, there was an administrative decentralization to four regional health authorities. The regional health authorities face different challenges, as there are substantial differences in distance to hospital, different degrees of deficits/surpluses and also size of population. We also included dummy variables for these to account for possible regional variances in coding behaviour induced by diverse organizational incentives or structures. The annual number of in-patient treatments at each hospital (measured as case mix-adjusted DRG points) was included as a proxy for hospital size. This measure will be invariant at the DRG pair level. Finally, we performed a stratified analysis of medical and surgical DRGs, because surgical DRGs could arguably have less room for differences in coding behaviour than medical DRGs. Precision was estimated with 95 % confidence intervals (CI).
Even though the dependent variable is a proportion, we assumed normality in the residuals. Robustness tests were performed with a simpler two-level model, and with the actual monetary value as the main independent variable instead of the rather abstract DRG points.

Results
Table 2 presents descriptive statistics. Across the observations (year, DRG pair, hospital), the mean share of complicated discharges was 38 %, ranging from 0 to 100 (see Fig. 1 for distribution). The mean p_i was 0.28 DRG points and ranged from 0.05 to 1.19 (see Fig. 2 for distribution). The mean change (p_it) was zero because this was defined as yearly deviations from p_i. Table 1 lists p_i and the mean absolute p_it for each DRG pair, and Fig. 3 shows the distribution of p_it. Data analysis was performed at an aggregate level, i.e., the mean age of 55.6 was the mean across all observations (year, DRG pair, hospital) and not the mean for all distinct patients. On average, the share of females was 51.2 %, but this varied from 0 to 100 as some DRG pairs were gender specific. The mean length of stay was 4.87 days, but varied across DRG pairs with a maximum of 46 days. Some DRG pairs had a zero length of stay, most likely reflecting patients admitted as in-patients but discharged on the same day. There was a downward trend in length of stay over the period. To control for hospital size, we also calculated the (case mix-adjusted) number of in-patient discharges at each hospital. This was measured annually at the hospital level, and as opposed to the other independent variables, this was DRG pair invariant. Hospital size varied substantially, with a mean of 11,496 discharges, while the largest hospital had 43,540 discharges. Mean hospital size also increased over the period covered by this study, both through reforms and reorganizations/mergers as well as increased budgets. All control variables were centred on their mean in the multivariable analysis.
Table 3 shows the correlations between the variables of interest. The share of complicated discharges (c_tih) was highly correlated with the case mix-related variables: age (Pearson's r correlation coefficient 0.512), length of stay (0.461) and comorbidity (0.510). The share of complicated discharges was also positively correlated with the temporal variables, emergency admissions and medical DRG pairs. At this aggregate level, there was a small yet statistically significant association with p_i (0.091), but not with p_it.

Multivariable analysis
In the multilevel regressions, there was a positive association between p_i and the share of complicated discharges (Table 4). Over the whole period, a one-DRG-point difference in p_i was associated with an increased share of complicated discharges of 14.2 percentage points (95 % CI 11.2–17.2). However, a one-DRG-point change in p_it between years was only associated with an increase of the most complicated group of 0.4 percentage points (95 % CI −1.1 to 1.8).
The temporal variables had large estimated values. There was a large annual increase in the share of complicated discharges of 2.9 percentage points (95 % CI 2.6–3.1) in the period leading up to the reform (1999–2001). After the reform in 2002, there was a shift in the share of complicated discharges of 10.2 percentage points (95 % CI 9.6–10.8). By calculating the combined estimates of T_t, D and T_t D, we find an annual increase of only 0.4 percentage points in the period after 2002.
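The combined annual trend follows from adding the interaction to the baseline trend. The lines below are a worked sketch, with coefficient names introduced here for illustration. The interaction coefficient itself is not quoted in the text, so the value of about −2.5 is only what the two reported slopes imply, up to rounding.

```latex
% Annual change in the share of complicated discharges (percentage points)
\text{before the reform } (D = 0): \quad \beta_{T} = 2.9

\text{after the reform } (D = 1): \quad \beta_{T} + \beta_{TD} = 0.4
\;\Rightarrow\; \beta_{TD} \approx 0.4 - 2.9 = -2.5
```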
The case-mix adjustors had a large impact on the share of complicated discharges. A one-unit increase in the Charlson index, which can be interpreted as one more co-morbidity, was associated with an increase of 12.5 percentage points in the share of complicated discharges. For an increase in mean length of stay of one day, the share of complicated discharges increased 1.3 percentage points (95 % CI 1.2-1.4). We found only a small negative association between share of females and percentage of complicated discharges. There were no substantial differences between the different regional health authorities. Hospital size had a small positive effect, indicating that larger hospitals have a higher share of complicated discharges.
The share of complicated discharges was 8.1 percentage points (95 % CI 6.8–9.4) higher in medical DRG pairs than in surgical DRG pairs. We performed a stratified analysis of medical and surgical DRG pairs. For medical pairs, a one-DRG-point change in p_it was associated with an increase in share of complicated discharges of 5.1 percentage points (95 % CI 2.5–7.6) (Table 4); for the surgical DRG pairs, there was a negative effect from p_it of −2.5 percentage points (95 % CI −4.3 to −0.6). Aside from the effect of p_it, there were no other large differences between the stratified and the non-stratified analyses.
Robustness tests were performed using simpler two-level models (either hospital level or DRG pair level), but the results did not differ much from the results presented in Table 4. We also ran the analysis using potential income gain measures calculated from the monetary refund that the hospitals received instead of DRG points. The refund was calculated using the yearly refund value of a DRG point while deflating the older years to real 2008 prices. The results did not differ much from the presented results. The test showed that for every 1000 NOK (∼109 EUR) in increased potential income (p_i), the share of complicated discharges increased by 0.31 percentage points. As in the main analysis, changes in p_it had no effect. Table 5 shows the different models tested for robustness.

Table note: *** p < 0.01, ** p < 0.05, * p < 0.1. Controlled for regional health authorities (with dummies) and five age splines. Random effects of time trend, otherwise fixed effects.
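To compare the monetary robustness estimate with the estimate in DRG points, the former can be rescaled. The following is a minimal sketch with a hypothetical function name. It assumes that the 2008 value of 33,647 NOK per DRG point applies throughout, whereas the robustness model used yearly values deflated to 2008 prices, so the result is only an approximation.

```python
NOK_PER_DRG_POINT_2008 = 33_647  # value of one DRG point in 2008, as stated in the paper


def per_drg_point(effect_per_1000_nok, nok_per_point=NOK_PER_DRG_POINT_2008):
    """Rescale an effect per 1000 NOK to an effect per DRG point."""
    return effect_per_1000_nok * nok_per_point / 1000.0


print(round(per_drg_point(0.31), 1))  # prints 10.4
```

The rescaled estimate of roughly 10 percentage points per DRG point is of the same order of magnitude as the 14.2 percentage points estimated with DRG points.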

Discussion
Our goal was to examine the association between the potential gain in income from upcoding and the coding behaviour of hospitals. Across DRG pairs, we found a positive association between the gain in income from upcoding and the share of discharges classified as complicated. Thus, DRG pairs in which there was a higher gain in income from upcoding also had a higher share of complicated discharges. However, although we controlled for co-morbidity, age and length of stay, we cannot exclude the possibility that this partly reflects differences in the case mix. Nevertheless, it is not clear why the difference in treatment costs between complicated and uncomplicated discharges should be higher in DRG pairs with a higher share of complicated discharges and therefore our results indicate that coding behaviour is related to the size of the incentive. We found that a difference in price between a complicated and uncomplicated group of one DRG point was related to a difference of 14 percentage points in the share of complicated discharges within a DRG pair. Although this may seem like a large effect, the average potential gain from upcoding was only 0.28 DRG points (see Table 2).
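The size of this association is easier to judge when applied to the observed range of the incentive. The sketch below multiplies the estimated coefficient by the minimum, mean and maximum price gap reported in the descriptive statistics (0.05, 0.28 and 1.19 DRG points). The function name is hypothetical, and the calculation assumes that the linear association holds across the whole range.

```python
def implied_share_difference(price_gap_points, coef=14.2):
    """Percentage-point difference in complicated share implied by a price gap.

    coef is the estimated association per DRG point (14.2 in the main model).
    """
    return coef * price_gap_points


for label, gap in [("minimum", 0.05), ("mean", 0.28), ("maximum", 1.19)]:
    print(f"{label}: {implied_share_difference(gap):.1f} percentage points")
```

At the mean price gap, the implied difference is about 4 percentage points, compared with a mean share of complicated discharges of 38 %.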
We found no association between changes in p_it over time and the share of complicated discharges within a DRG pair. Thus, in a period with frequent changes in the share of activity-based funding, hospitals did not seem to respond by changing their coding behaviour. However, when stratifying the analysis by medical and surgical DRGs, we found a small, positive association for medical DRGs. Because surgical patients are generally more homogeneous (within a DRG) than medical patients, there may have been less opportunity for tactical coding of these patients. Although the size of the estimated association was small, this result indicated that there might be subgroups of patients where the relationship between financial incentives and tactical coding is stronger. This corresponds to earlier results on how Norwegian hospitals respond to price changes (Januleviciute et al. 2016). Melberg et al. have recently shown higher growth in DRG groups with a price increase than in groups with a reduction in reimbursement rates (Melberg et al. 2016).
We found that the share of complicated discharges increased during the ten-year period covered by the study. This may be due to changes in case mix resulting from demographic changes, changes in technology, changes in the quality and completeness of coding and finally changes in the financing system. Recalling the two different definitions of upcoding and DRG creep presented in the introduction, we cannot here distinguish between "deliberate upcoding" and "more complete coding". The increasing trend could indicate both that the quality of coding has improved and that the presence of explicit and implicit incentives is followed by a general increase in the recording of secondary diagnoses. Thus, while we cannot label all upcoding as being completely driven by financial incentives, we argue that such incentives were present and that their consequences are reflected on an aggregate level by the increasing time trend. The introduction of activity-based funding in 1997 was followed by an increased use of secondary diagnoses. Eventually the use of secondary diagnoses will reach a level (or equilibrium) where it might be difficult to justify an additional secondary diagnosis from a medical point of view. Thus, one might suspect that a large part of the potential for increase was exhausted in the period following the hospital reform, explaining the slowing growth in the share of complicated discharges.
This paper decomposed the price incentive into two components, p_i and p_it, to differentiate between the level and changes of the incentive for upcoding. This approach differs from earlier studies but demonstrates that, in Norway, the differences in prices are more important than changes within groups. Hospitals may appear to respond to prices, but the changes in price are probably too small to have a large-scale impact.
We believe that the major strength of this analysis is that we are able to utilize a complete dataset covering all DRG pairs for all patients at all hospitals. Our analyses include a ten-year period in which there have been large and repeated changes in the potential gain in income from upcoding. Thus, any aggregate effects of increased gain in income from upcoding should be detected in this study. By controlling for a time trend and separating within and between effects, we are more confident that any remaining effects are related to upcoding rather than to an increase in the quality of coding.
We have employed a system perspective by pooling all DRG pairs, hospitals and years in the same analysis. This could dilute important findings for specific DRG pairs. Silverman and Skinner (2004) found substantial evidence of upcoding for patients with pneumonia. Their results were robust to different model specifications, but sensitive to the included DRGs. Our stratification showed very different results for the medical and surgical DRG pairs. It is safe to assume that even larger differences will be found on examination of separate DRGs. However, our aim was to detect system-level effects and not effects of singular groups or hospitals. One might also question whether the observed changes in the price incentive were large enough to have an effect. While frequent and potentially substantial, the changes in incentives observed in this study were small compared with some of the larger exogenous shocks described by, for example, Dafny (2005). Therefore, it may have been unrealistic to expect significant results from the observed changes. A change of 20 percentage points in the share of activity-based funding is, however, not trivial and it is interesting that these changes only seem to have led to a marginal change in coding practice.
Upcoding can take place in all systems that incentivize documenting of diagnoses. We have limited our study to upcoding in DRG pairs in Norway. These groups amount to less than one-third of the total volume of treatment. Upcoding is possible for all groups, but the paired structure of complicated/uncomplicated lends itself easily to our research strategy of testing directly whether incentives are associated with upcoding. There are several ways "manipulations" can occur in a DRG system (Neby et al. 2015). In this paper, we have focused solely on upcoding and not touched upon other related strategies: gaming, dumping, skimping and skimming. Further studies should attempt to distinguish upcoding from other manipulations empirically. It is impossible using registry data to determine whether the upcoding has been deliberate. To assess the actual conscious decision to upcode, one must opt for a qualitative approach. This study has not ventured into the auditing of diagnosis and hospital records. Earlier evidence from Norway has indicated that diagnostic accuracy is not very high (Jørgenvåg 2005), and it would be interesting to consider whether the Norwegian auditing scheme could be considered optimal (Kuhn and Siciliani 2008).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.