Key Points for Decision Makers

We illustrate (i) how to apply the fixed-effects and random-effects regression frameworks and (ii) how to determine which regression framework is most appropriate for given data.

We find that both the random-effects and fixed-effects models are more appropriate than a simple logistic model for measuring the volume-outcome effect. For four operations, the random-effects model is sufficient. However, for two operations, the fixed-effects model is more appropriate.

Policy makers who may be considering the centralization of complex operations to improve patient outcomes may falsely conclude that a volume-outcome relation exists, if decisions are based on analysis from simple logistic models.

Implementation of panel-data methods (like the fixed-effects and random-effects frameworks) following the example in this paper may lead to more reliable policy recommendations.

1 Introduction

For years, numerous studies have asserted a positive correlation between hospital or surgeon volume and patient outcome. Halm et al. [1] systematically reviewed over 130 volume-outcome studies, approximately 70 % of which found a significant volume effect. Halm et al.’s review was published in the Annals of Internal Medicine, and the majority of volume-outcome studies have also been published in clinical journals. Based on findings such as these, many researchers and policy groups advocate centralizing procedures at a small number of hospitals, in order to take advantage of the volume effect [2].

The majority of volume-outcome studies to date use simple regression models, such as basic logistic regressions, that ignore omitted-variable bias. This approach potentially leads to spurious conclusions and improper policy recommendations. For example, suppose that certain large-volume hospitals are particularly effective at organizing surgical teams that reduce complications. A regression that does not control for organizational skill will find a positive association between hospital volume and patient outcome. However, it is organizational expertise, not higher hospital volume, that drives improved patient outcomes.
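The direction of this bias can be made explicit in a simplified linear setting. This is an illustrative sketch only: the models in this paper are logistic, and the symbols \( y \) (outcome), \( v \) (volume), \( q \) (unobserved organizational skill), and \( e \) (an error term uncorrelated with \( v \) and \( q \)) are introduced here purely for exposition. Suppose

$$ y = \beta v + \delta q + e . $$

If \( q \) is omitted from the regression, the least-squares coefficient on \( v \) converges to

$$ \beta + \delta \frac{{\text{Cov}}\left( {v, q} \right)}{{{\text{Var}}\left( v \right)}} . $$

When skill improves outcomes (\( \delta > 0 \)) and is more common at large-volume hospitals (\( {\text{Cov}}\left( {v, q} \right) > 0 \)), the estimated volume coefficient overstates \( \beta \) and can differ from zero even when \( \beta = 0 \).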

Panel data sets provide a means to help control for omitted-variable bias, specifically via the fixed-effects and random-effects regression frameworks.Footnote 1 Both frameworks control for time-invariant heterogeneity, i.e. unobserved fixed factors that differ across hospitals. The random-effects framework does this under the assumptions that this heterogeneity is uncorrelated with the other explanatory variables and follows a known distribution. The fixed-effects framework is more general, allowing correlation with other explanatory variables and making no parametric assumption about the distribution.

The tendency to use basic logistic models when estimating the volume effect continues to this day. Searching Google Scholar and PubMed with keywords “volume-outcome relationship” for publications from 2008 to 2013, we located 87 peer-reviewed studies. Only 11 employed panel-data methods, and only three used fixed-effects models. In addition, studies using fixed-effects and random-effects models often fail to validate that the assumptions underlying these frameworks are satisfied by the data. These publication trends are not limited to smaller journals: for example, the New England Journal of Medicine in 2010, 2011, and 2013 published studies on the volume effect that do not explicitly consider fixed-effects models [3–5].

Cancer procedures are of particular interest regarding improved outcome and efficiency. According to the American Cancer Society, cancer is the second-leading cause of death in the USA, “accounting for nearly one of every four deaths” [6]. In addition to the human toll, the National Institutes of Health estimates that cancer cost the US economy approximately US$86.6 billion in direct medical costs in 2009 and US$130 billion in lost productivity due to premature death [6].

Previous research on the volume-outcome relationship in cancer procedures is conflicted. The survey by Halm et al. [1] mentions that nine of ten studies on pancreatic cancer and three of three studies on esophageal cancer find a significant volume effect at the hospital or surgeon level. Several recent clinical studies, dating from 2008 to 2012, report similar findings [4, 7–10]. However, Lapar et al. [11] find no significant association between hospital procedure volume and patient mortality for several surgical procedures, including pancreatic and esophageal resection.

This study addresses three major questions. First, does a volume effect exist in any of six major cancer resection procedures? Second, how does the regression framework used affect this answer? Third, for a given data set, which regression framework is most appropriate? To address these questions, we analyze a 12-year panel data set of hospital-discharge data of patients who underwent one of six cancer procedures. We fit the data for each of these procedures to basic logistic, fixed-effects logistic, and random-effects logistic regressions. We could have taken a different approach, using Monte Carlo simulation to generate samples with and without volume-outcome effects and testing whether each of the three estimation approaches correctly identifies the presence or absence of a relationship between procedure volume and outcomes. We chose instead to focus on an application involving actual clinical data. This approach is more relevant to clinicians and policy makers, who are most likely to shape future decisions on whether or not to centralize complex care.

2 Previous Literature

The vast majority of studies dealing with binary patient outcomes such as mortality employ a simple logistic regression framework. A few studies have used the random-effects model [3, 12–16]. However, most studies using the random-effects model do not check whether their data satisfy the assumptions of the random-effects model, nor do they test alternative model specifications. Moreover, most do not discuss omitted-variable bias as justification for their model choice.

The fixed-effects regression framework uses the variation within a group to exclude omitted-variable bias from time-invariant factors [17]. With a few exceptions [18–20], the fixed-effects model is rarely used in the volume-outcome literature. However, previous research confirms that controlling for unobserved heterogeneity by using a fixed-effects model can yield drastically different results. In a study of child immunization in China, Xie et al. [21] use a fixed-effects model to control for community- or household-level time-invariant characteristics, rendering insignificant what appeared to be a significant effect of wealth. In a study of hip-fracture patients, Hamilton et al. [22] also find that a significant volume effect disappears when using a fixed-effects model.

It is crucial for researchers to correctly measure volume-outcome effects, because there are potential unintended consequences of centralization. Kessler et al. [23] and others have shown that hospital competition tends to enhance patient welfare. Centralization typically reduces the competitiveness of healthcare markets. It is not clear a priori whether the benefit (if any) from a volume effect would outweigh the welfare loss associated with reduced competition.

In their seminal 1979 paper identifying an empirical relation between surgical volume and mortality for 12 different operations, Luft et al. [24] cautioned that the observed volume-outcome association could be attributable to selective referral; more patients may be drawn to hospitals that have better outcomes. More recently, economists have applied instrumental variables analysis to distinguish between volume driving patient outcomes (a learning-by-doing effect) and better outcomes leading to higher volume (selective referral). In these studies, distance to providers or the number of patients and other hospitals within close vicinity of a particular hospital are used as instruments for hospital volume that are unlikely to be confounded by selective referral [25, 26]. These studies find that the proposed instruments are valid predictors of hospital volume. Hypothesis tests also reveal no evidence for selective referral. Another paper conducts hypothesis tests for the exogeneity of hospital volume in explaining patient mortality and finds no evidence that the volume-outcome relation is the result of selective referral [27].

Given that past studies that test for patient selection in the volume-outcome relation find no evidence of selective referral, we chose not to apply instrumental variables analysis in this paper. Instead, we focus on comparing random- and fixed-effects models. The majority of clinicians consult clinical journals when they seek to learn whether a volume-outcome relation exists for a particular operation [1, 28]. And the overwhelming majority of clinical studies apply a simple logistic regression to test for a volume-outcome relation, while most of the rest apply random-effects analysis. Estimation with fixed effects can be readily applied to the same data sets that have been analyzed in these published studies. It is critical for clinicians and policy makers to know whether failing to control for potentially systematic but unobservable differences between high- and low-volume hospitals can yield misleading conclusions regarding the presence of a volume-outcome effect.

3 Methods

3.1 Study Population

We use hospital-discharge data from Florida, New Jersey, and New York for the 12 years 2000–2011, containing all patients discharged during this time.Footnote 2 Based on previous literature [29], we selected patients with an International Classification of Diseases 9th Revision, Clinical Modification (ICD-9-CM) cancer diagnosis and a principal procedure code for colectomy, esophagectomy, pancreatic resection, pneumonectomy, pulmonary lobectomy, or rectal resection. We apply three additional inclusion criteria: the patient was 21 years or older at admission, the patient was not transferred to another hospital (as opposed to being discharged home or to a rehabilitation hospital, or dying during the hospital stay), and the patient was successfully matched with hospital-level data from the American Hospital Association (AHA) Surveys.Footnote 3 Because the state-level discharge data do not provide a persistent patient-specific identifier, we could not control for multiple admissions of the same patient within a year. However, analysis suggests that this problem, if present, is negligible.Footnote 4 Annual hospital-level data were obtained from the AHA Surveys, 2000–2011.

3.2 Variables

Our outcome measure is in-hospital mortality. To define hospital volume, we compute the total number of patients treated by each hospital for each procedure within each year. Volumes are computed before applying inclusion criteria to avoid endogeneity.Footnote 5 Hospital-specific characteristics for each year include total facility expenses, full-time-equivalent physicians, ownership status, teaching status, and urban status. Patient characteristics include admission status, age, cancer stage, Elixhauser co-morbidities, race, and sex.
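The volume definition above amounts to counting discharges per hospital, procedure, and year. A minimal sketch of that count follows, using only the Python standard library; the record layout, identifiers, and values are hypothetical and are not taken from the study data.

```python
from collections import Counter

# Hypothetical discharge records: (hospital_id, procedure, year).
# Identifiers and values are illustrative only.
discharges = [
    ("H1", "colectomy", 2000),
    ("H1", "colectomy", 2000),
    ("H1", "esophagectomy", 2000),
    ("H2", "colectomy", 2000),
    ("H1", "colectomy", 2001),
]


def annual_volume(records):
    """Count patients in each (hospital, procedure, year) cell."""
    return Counter(records)


# Volumes are counted on the full set of records, i.e. before any
# patient-level inclusion criteria are applied.
volume = annual_volume(discharges)
print(volume[("H1", "colectomy", 2000)])
```

Counting on the unfiltered records mirrors the statement that volumes are computed before inclusion criteria are applied; a cell that never occurs simply has a count of zero.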

3.3 Statistical Analysis

For each procedure, we fit three models with cluster-robust standard errors: logistic, conditional fixed-effects logistic,Footnote 6 and random-effects logistic. For each model, the log odds of mortality is regressed on patient characteristics, hospital characteristics, and indicator variables for each subprocedure (if any) and year.

In fitting the fixed-effects model, estimating an indicator variable for each hospital would lead to inconsistent estimates, known as the incidental-parameters problem. To avoid this, we use the conditional logistic distribution suggested by Chamberlain [30]. By conditioning the likelihood function on the sum of the dependent variables, a sufficient statistic, we obtain a conditional likelihood function that does not depend on the hospital indicator variables.Footnote 7 Thus, to fit the fixed-effects model, we do not have to estimate the hospital indicator variables, allowing us to avoid the incidental-parameters problem. In Stata, this approach is effected by the -clogit- command.
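The key property of the conditional likelihood, that it does not depend on the hospital-specific intercept, can be checked numerically. The sketch below is an illustration under stated assumptions, not the algorithm used by -clogit-: it enumerates outcome vectors by brute force for a single hypothetical hospital, and the function names, outcomes, and linear predictors are all invented for the example.

```python
import math
from itertools import combinations


def logistic_prob(y, xb, c):
    """P(Y = y) for one patient in a logit model with hospital intercept c."""
    p = 1.0 / (1.0 + math.exp(-(xb + c)))
    return p if y == 1 else 1.0 - p


def conditional_prob(y, xb, c):
    """P(y | sum(y)) within one hospital, by brute-force enumeration."""
    n, s = len(y), sum(y)

    def joint(outcome):
        return math.prod(logistic_prob(o, v, c) for o, v in zip(outcome, xb))

    # Sum the joint probability over every outcome vector that has the
    # same number of deaths as the observed vector.
    total = 0.0
    for ones in combinations(range(n), s):
        total += joint([1 if i in ones else 0 for i in range(n)])
    return joint(y) / total


y = [1, 0, 0, 1]            # hypothetical outcomes (1 = in-hospital death)
xb = [0.3, -1.2, 0.5, 2.0]  # hypothetical linear predictors x'beta

p_low = conditional_prob(y, xb, c=-3.0)
p_high = conditional_prob(y, xb, c=4.0)
print(abs(p_low - p_high) < 1e-9)
```

The two calls use very different intercepts yet return the same conditional probability up to floating-point error, because the intercept multiplies numerator and denominator by the same factor once the number of deaths is fixed.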

Two aspects of our analysis merit brief discussion. First, we do not include surgeon fixed effects. The cancer procedures analyzed here typically have very low surgeon volume: the median surgeon volume, summed over the 12-year sample period, ranged from one (esophagectomy) to seven (lobectomy). As one might expect with such low volumes, the percentage of surgeons having nontrivial variation in outcome (i.e., at least one patient survives and at least one does not survive the operation during the sample period) does not exceed 15 % in any of the surgical procedures; these surgeons account for no more than 25 % of observations in the respective procedure. Because the fixed-effects model uses only within-group variation, including surgeon fixed effects would have required discarding more than 75 % of the data.Footnote 8

Second, we run each model specification both with and without a surgeon-volume variable.Footnote 9 Most existing studies of the hospital-level volume effect omit surgeon volume, so to facilitate comparison, we focus on results from the regressions without surgeon volume. This specification risks misattributing a surgeon-level volume effect to the hospital level. For our data, the main findings are robust under both specifications.

3.4 Model Specification Tests

After fitting the three models, we perform model-specification tests. To compare the fixed-effects and random-effects models, one typically uses the Hausman test. However, the standard Hausman test is invalid for clustered data [33]: the clustered structure causes the error terms of observations in the same group not to be independently distributed, which in turn implies that the random-effects estimator is not efficient. Several alternative approaches have been proposed; we adopt the approach proposed by Mundlak [34]. We briefly present this approach in the linear setting, for simplicity. (The extension to the nonlinear case is similar [31].) In our study, we apply this approach within the logistic framework.

Consider nested panel data (patients, indexed by i, are clustered within hospitals, indexed by j, which are observed over time t), and the basic linear regression model

$$ Y = X\beta + \varepsilon . $$

We assume that the error term \( \varepsilon_{ijt} \) can be decomposed into

$$ \varepsilon_{ijt} = c_{j} + d_{t} + u_{ijt} $$

where \( c_{j} \) is a constant capturing time-invariant unobserved characteristics of hospital j, \( d_{t} \) is a time fixed effect, and \( u_{ijt} \) is the error term associated with patient i in hospital j at time t. The random-effects model assumes that \( c_{j} \) is uncorrelated with the other regressors \( X_{ijt} \). This assumption allows us to treat \( c_{j} \) as a random variable conditional on \( X_{ijt} \). If the unobserved characteristics \( c_{j} \) are correlated with the other regressors, then the estimates of the random-effects model are inconsistent. The fixed-effects model makes no assumption about the correlation between \( c_{j} \) and the other regressors; in particular, they are allowed to be correlated.
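To see why the fixed-effects framework needs no assumption about the hospital term, one can average the linear model over all observations of hospital j and subtract that average from each observation. This is the standard within transformation, sketched here for the linear case only; bars denote averages over all observations of hospital j and are notation introduced for this illustration.

$$ Y_{ijt} - \bar{Y}_{j} = \left( {X_{ijt} - \bar{X}_{j} } \right)\beta + \left( {d_{t} - \bar{d}_{j} } \right) + \left( {u_{ijt} - \bar{u}_{j} } \right) . $$

The hospital term \( c_{j} \) enters both \( Y_{ijt} \) and \( \bar{Y}_{j} \) and therefore cancels, whatever its correlation with the regressors. Only deviations from hospital means remain, which is why the estimator relies entirely on within-hospital variation.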

Mundlak [34] proposes the following approach. Let Z be the subset of hospital-specific variables in X. For each hospital j, we take the time average of the observed hospital characteristics, \( \bar{Z}_{j} = \frac{1}{T}\sum\nolimits_{t = 1}^{T} Z_{jt} \), and include these averages in the random-effects model, obtaining

$$ Y = X\beta + \bar{Z}\gamma + \varepsilon . \tag{1} $$

Under the assumption that

$$ {\text{E}}\left[ {c_{j} |Z_{j} } \right] = \bar{Z}_{j} \gamma \tag{2} $$

\( \gamma = 0 \) implies that \( c_{j} \) and \( Z_{j} \) are uncorrelated.Footnote 10 Empirically, then, fitting model (1) and performing a significance test on \( \gamma \) gives us a way to compare the fixed-effects and random-effects models: a \( \gamma \) significantly different from zero implies that the noncorrelation assumption underlying the random-effects model fails to hold. In this case, the fixed-effects model is preferred.
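The data-preparation step of this approach, computing each hospital's time average and attaching it to every row for that hospital, can be sketched as follows. The panel layout, the single characteristic z, and the function name are hypothetical; the subsequent steps (refitting the random-effects model with the added column and testing the added coefficients jointly) are not shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hospital-year panel: (hospital_id, year, z), where z is a
# time-varying hospital characteristic. Values are illustrative only.
panel = [
    ("H1", 2000, 10.0), ("H1", 2001, 14.0), ("H1", 2002, 12.0),
    ("H2", 2000, 40.0), ("H2", 2001, 44.0),
]


def add_mundlak_means(rows):
    """Append each hospital's time average of z to each of its rows."""
    by_hospital = defaultdict(list)
    for hospital, _, z in rows:
        by_hospital[hospital].append(z)
    z_bar = {h: mean(values) for h, values in by_hospital.items()}
    return [(h, year, z, z_bar[h]) for h, year, z in rows]


augmented = add_mundlak_means(panel)
for row in augmented:
    print(row)
```

Each hospital's average is taken over the years in which it is observed, so the sketch also handles an unbalanced panel such as the two-year record for the second hospital.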

We compare the fixed-effects and pooled logistic models by fitting a seemingly unrelated regression. Let b and \( \tilde{b} \) denote the coefficients of the fixed-effects and pooled logistic models, respectively. We estimate \( {\text{Cov}}\left( {b, \tilde{b}} \right) \) via a seemingly unrelated regression assuming correlation of the error terms in the two models, then use this result to compute \( {\text{Var}}\left( {b - \tilde{b}} \right) \) via its expansion \( {\text{Var}}\left( b \right) + {\text{Var}}\left( {\tilde{b}} \right) - 2{\text{Cov}}\left( {b, \tilde{b}} \right). \) This variance is used to construct the usual Wald-type test statistic for comparing two models.
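For reference, a Wald-type statistic of this kind usually takes the following textbook form. The symbols \( W \) and \( \hat{V} \) are introduced here for illustration, with \( \hat{V} \) denoting the estimated variance of the coefficient difference, restricted to the coefficients common to both models.

$$ W = \left( {b - \tilde{b}} \right)^{\top} \hat{V}^{ - 1} \left( {b - \tilde{b}} \right), \qquad \hat{V} = \widehat{\text{Var}}\left( {b - \tilde{b}} \right) . $$

Under the null hypothesis that the common coefficients are identical, and assuming \( \hat{V} \) is nonsingular, \( W \) is asymptotically chi-squared with degrees of freedom equal to the number of coefficients compared.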

4 Results

4.1 Descriptive Statistics

After applying the three inclusion criteria in section 3.1, we obtained 164,804 cancer patients hospitalized for colectomy, 4827 for esophagectomy, 14,246 for pancreatic resection, 5043 for pneumonectomy, 54,448 for pulmonary lobectomy, and 36,046 for rectal resection.Footnote 11 Table 1 provides descriptive statistics for these data.

Table 1 Descriptive statistics

The average patient age is between 63 and 71 years. The racial composition is 76–86 % White, 4–11 % Black, 5–9 % Hispanic, and 5–9 % other. Females comprise roughly half of the patient population for all procedures except esophagectomy (19.35 % female) and pneumonectomy (36.51 % female). The percentage of patients with nodal cancer is 16–40 %; the percentage of patients with metastasized cancer is 7–28 %. Counting hospitals in different years as unique (to allow for changes in hospital status), public ownership ranges from 10 to 15 %, and teaching status ranges from 16 to 42 %.

The fixed-effects framework estimates the volume-outcome relationship using only within-hospital variation, ignoring variation across hospitals. If most hospitals experience little volume variation over the sample period, one will obtain noisy estimates. Figures 1 and 2 present measures of within-hospital variation in volume for our data. The coefficients of variation plotted in Fig. 1 show the relative size of standard deviation and mean volume within each hospital. In general, these coefficients of variation are around 0.5; that is, a given hospital tends to have a standard deviation equal to about half its mean. The histograms of absolute deviation from mean hospital volume (computed separately for each hospital) in Fig. 2 show that for all procedures, our data have relatively good coverage of deviations between 0 and the sample mean (i.e., the mean volume over all hospitals). Moreover, for these procedures, we observe a reasonable number of deviations greater than the sample mean. Using Stata’s -xtsum- command, we find that the within-hospital variation is at least 29 % for each of the six procedures.Footnote 12 These results suggest that our data possess sufficient within-hospital variation to allow us to credibly fit the fixed-effects model.

Fig. 1 Coefficient of variation for hospital volume (within-hospital). For each hospital, the coefficient of variation is computed by dividing the standard deviation of volume for the hospital by the hospital’s mean. The values are plotted here as histograms. See Appendix 4 in the Supplemental Material for quantiles of hospital volume for each procedure

Fig. 2 Absolute deviation from mean of hospital volume (within-hospital). For each hospital, for each year the hospital is active (i.e., performs at least one surgery), we compute the absolute value of the difference between the hospital’s yearly volume and its mean over the 12-year period. We plot all results here as histograms. The mean of the hospital mean volumes, computed over all hospitals (counting each hospital once), is plotted in red. For each procedure, the bottom plot zooms in on small values of the vertical axis, offering a better view of the right tail
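The two within-hospital measures described in the captions can be computed as in the sketch below. The annual volumes are hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev

# Hypothetical annual volumes for one procedure at two hospitals.
volumes = {"H1": [8, 12, 10, 14, 6], "H2": [50, 50, 50, 50, 50]}


def coefficient_of_variation(series):
    """Standard deviation of annual volume divided by its mean.

    Assumes the sample standard deviation (n - 1 in the denominator).
    """
    return stdev(series) / mean(series)


def absolute_deviations(series):
    """|yearly volume - hospital mean| for each active year."""
    m = mean(series)
    return [abs(v - m) for v in series]


for hospital, series in volumes.items():
    print(hospital, coefficient_of_variation(series), absolute_deviations(series))
```

A hospital with constant volume, like the second one here, has a coefficient of variation of zero and contributes nothing to a fixed-effects estimate of the volume coefficient.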

4.2 Regression Models and Analysis

Coefficient estimates for the explanatory variables in all three regressions (basic logistic, fixed-effects, and random-effects) for each procedure are in the Supplementary Material. Neither teaching status nor public ownership appears to significantly affect in-hospital mortality. All regression frameworks find the probability of mortality increases with age. The regressions reveal a small number of significant correlations between race and mortality, but no clear trend emerges either within or among procedures. Cancer stage is significantly correlated with an increased probability of mortality for only half of the procedures studied here (metastatic: colectomy, pulmonary lobectomy, rectal resection).

Of particular interest are the coefficients on hospital volume and their discrepancies among the regression frameworks. The estimates for the hospital volume variable are presented in Table 2. For all procedures except esophagectomy (for which all regression models find no significant volume effect), the logistic and random-effects regressions find a volume effect with p < 0.05 or better. The direction of this effect is consistent with previous findings in the literature: higher volume is correlated with lower mortality. In contrast, the fixed-effects regressions find no significant volume effect in any of the procedures.

Table 2 Coefficient and standard error on hospital volume

Given the differences between models, we implement the two specification tests described above. In Table 3, the first test compares the fixed-effects and random-effects models: we run the regression proposed by Mundlak [34], followed by a test of the null hypothesis that all mean variables of hospital characteristics are jointly zero. In this case, rejection of the null hypothesis favors the fixed-effects model. For two of the six procedures—esophagectomy and rectal resection—we reject the null hypothesis with p < 0.0132 and p < 0.0163, respectively. The second test assesses unobserved heterogeneity by comparing the logistic and fixed-effects models: we first run a seemingly unrelated regression using the two models, implemented via Stata’s -suest- command, followed by a test of the null hypothesis that the coefficients of all variables common to both models are identical. For all six procedures, this test strongly rejects the null hypothesis: for colectomy, p < 0.0037, and for the other five procedures, p < 0.0001. These results imply that the hospital-specific indicator variables introduced by the fixed-effects framework capture relevant unobserved heterogeneity.

Table 3 Model specification tests

As shown in Table 2, the standard errors on the hospital-volume variable in the fixed-effects regressions are typically within one to two times the magnitude of those in the random-effects regressions. Note also that the point estimates of the volume coefficient move closer to zero (typically twice as close to zero) in the fixed-effects regression compared to the random-effects regression. This suggests that the absence of significant volume effects in the fixed-effects models is not due to insufficient within-group variation.

We conclude that, of the three regression frameworks analyzed here, the fixed-effects framework is most appropriate for esophagectomy and rectal resection, while the random-effects framework is most appropriate for the other four procedures. In particular, this implies that we find no significant volume effect for esophagectomy and rectal resection. These two procedures have comparable sample sizes to other surgical procedures studied here, so this result does not seem to be a product of insufficient data. Similarly, neither the in-hospital mortality rate nor the magnitude of the volume coefficient in these two procedures differs drastically from those in the other procedures.

4.3 Robustness Checks

We subject our findings to several robustness checks. To investigate the possibility that the volume effect is nonlinear, we fit the regression models using linear splines with four knots (see Appendix 1 in the Supplemental Material). In almost all cases, no statistically significant difference is found between slopes in adjacent regions; in addition, in all cases, the coefficient of the highest-volume region is not significantly different from the coefficient of the lowest-volume region. These results indicate no concavity, suggesting that our linear specification of volume is reasonable.
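A linear spline in volume can be built by splitting volume into one variable per interval, so that each regression coefficient is the slope within that interval. The sketch below shows one common construction; the knot locations are hypothetical (the actual knots are given only in the Supplemental Material), and the function name is invented for the example.

```python
def linear_spline_basis(x, knots):
    """Split x into piecewise-linear segments at the given knots.

    Returns len(knots) + 1 values that sum to x. In a regression on these
    values, the coefficient on segment k is the slope in the k-th interval.
    """
    ks = sorted(knots)
    basis = [min(x, ks[0])]                     # below the first knot
    for lo, hi in zip(ks[:-1], ks[1:]):
        basis.append(min(max(x, lo), hi) - lo)  # between adjacent knots
    basis.append(max(x, ks[-1]) - ks[-1])       # above the last knot
    return basis


hypothetical_knots = [5, 10, 20, 40]  # four knots give five segments
print(linear_spline_basis(25, hypothetical_knots))
```

With this parameterization, testing whether adjacent coefficients are equal is a test for a change in slope at the knot between them, which matches the comparison of slopes in adjacent regions described above.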

Despite this absence of concavity, as an additional check we fit the regression models using the square root of hospital volume, following Seider et al.Footnote 13 The results correspond closely to those of the main analysis in Table 2. In particular, for rectal resection, model specification tests continue to favor the fixed-effects framework, which does not find a volume effect, over the random-effects framework, which does. We also fit the regression models including a surgeon-level volume variable. Again we observe discordant findings for volume among the specifications, and again the fixed-effects regression is preferred in the case of rectal resection, with no significant volume effect. All of these results are presented in Table 4.

Table 4 Robustness checks

5 Discussion

Many studies have measured the relationship between hospital volume and outcome. However, different studies report contradictory findings, and few studies report results from more than one model specification. This study uses a 12-year panel data set to investigate the volume-outcome relationship for six cancer procedures, applying three different regression frameworks, and tests their validity.

For our data, the basic logistic regression model—the most commonly used model in the literature to date—is always strongly rejected in favor of models that control for time-invariant heterogeneity. A study of coronary artery bypass grafts by Huesch [25] finds similar results, always rejecting the basic logistic model. For our data the random-effects model always finds a significant volume effect (except for esophagectomy), whereas the fixed-effects model never does. Whether the fixed-effects or random-effects model is preferred varies across surgical procedures.

We find no significant volume effect for esophagectomy and rectal resection; we find a precisely estimated volume effect for the other four cancer operations. Esophagectomy is the least common procedure in our sample (N = 4827), and rectal resection is the third most common (N = 36,046). Esophagectomy has some of the highest mortality rates in the sample (comparable to pancreatic resection), whereas mortality rates for rectal resection are the lowest. The difference in nature between these two operations suggests that both infrequent and common operations, as well as operations of differing difficulty, should be carefully examined for a volume effect.

While not the focus of this paper, for those procedures that display a significant volume effect, one may ask whether this association results from a practice-makes-perfect mechanism (also known as learning by doing) or a selective-referral mechanism. Under the practice-makes-perfect hypothesis, repeatedly performing a procedure yields experience (to the operating physician, surgical team, etc.), which in turn improves future outcomes. Under the selective-referral mechanism, better outcomes attract more patients. Practice makes perfect supports centralization, whereas selective referral does not.

If one could construct a computationally feasible instrument for hospital volume using exogenous variation, then the direction of causality could be investigated. One commonly used instrument for hospital volume is distance to hospital, e.g., the distance between the patient’s and hospital’s zip codes. For emergent conditions such as a heart attack, distance to hospital is likely to be a significant predictor of hospital volume and is likely uncorrelated with patient outcomes [23, 35].

However, most admissions for cancer surgery are non-urgent. Patients need not go to the nearest hospital and may weigh other factors, such as hospital quality, in their hospital choice. Thus, distance to hospital is unlikely to be a good predictor of hospital volume. Two previous studies involving elective surgery use distance to hospital as an instrument for hospital volume or market concentration (derived from volume measures) and conclude that the instrumental variables analysis yields similar results to analyses without an instrument for volume [26, 36]. Further progress in testing the practice-makes-perfect hypothesis awaits identification of a valid instrument for volume in non-urgent procedures. Future studies should also consider using a measure of specialization (the number of a specific operation performed as a percentage of total hospital admissions) as a determinant of patient outcomes [37].

In addition to providing an analysis of six cancer procedures, this study highlights the benefits of (i) long-time-frame panel data and (ii) the appropriate use of fixed-effects and random-effects frameworks in controlling for unobserved heterogeneity. Application of the most appropriate statistical model improves the chances that referrals to high-volume centers will be made only when doing so improves patient outcomes.