Many research problems involve a response variable which is dichotomous; for example, a patient has a good or a poor outcome following surgical intervention. Such data are often assumed to arise from a binomial distribution and may be modelled using logistic regression. More generally, data may be in the form of a proportion (such as the proportion of GP consultations resulting in a referral to physiotherapy) and may be modelled in a similar manner. This chapter shows how a multilevel logistic regression model is formulated for binomial data clustered within higher-level units. We then introduce the example and the data set used. This is followed by an application within MLwiN. Further details on multilevel modelling and MLwiN are available from the Centre for Multilevel Modelling http://www.bristol.ac.uk/cmm/. The materials have been written for MLwiN v3.01. The teaching version of the software is available from https://www.bristol.ac.uk/cmm/software/mlwin/download/.

## Multilevel Logistic Regression Model

Let yij denote a binary response (0 or 1) for the ith individual in the jth unit, and let πij denote the probability of a ‘success’ (i.e. yij = 1). The binomial distribution is characterised by two parameters: the probability of success πij and the number of ‘trials’ n. So if the outcome were the proportion of GP consultations that resulted in physiotherapy, the denominator n would be the total number of relevant consultations. For a logistic regression model, when each data item refers to an individual response with a dichotomous outcome rather than a proportion, the denominator is always equal to one. This means that we have

$${y}_{ij}\sim \mathrm{Binomial}\left(1,{\pi}_{ij}\right)$$

In a random intercept multilevel logistic regression model, we then model the transformed probability πij as a linear combination of a series of covariates or explanatory variables xpij together with a random effect for each higher-level unit u0j so that we can write

$$\mathrm{logit}\left({\pi}_{ij}\right)=\log \left(\frac{\pi_{ij}}{1-{\pi}_{ij}}\right)={\beta}_0+{\beta}_1{x}_{1 ij}+\cdots +{u}_{0j}$$

As for the multilevel linear regression model, we make an assumption about the distribution of the higher-level residuals u0j

$${u}_{0j}\sim N\left(0,{\sigma}_{u0}^2\right)$$

Alternative link functions to the logit link can be employed for dichotomous outcomes; common alternatives are the probit and complementary log-log links. The logit link has the advantage that the parameter estimates βp can be interpreted as log odds ratios (and so, when exponentiated, they can be interpreted as odds ratios). For further details of link functions, the reader is referred to general works such as that by McCullagh and Nelder (1989).

## Example: Variation in the GP Referral Rate to Physiotherapy

Until recently, patients in the Netherlands (from where the data used in this example are drawn) had to be referred by a GP before they could visit a physiotherapist. GPs are still the major source of referrals to physiotherapists in primary healthcare. Patients are predominantly referred to physiotherapists when they have complaints relating to the locomotive system. Of all patients that present their problem to their GP, a varying proportion is referred to a physiotherapist. The aim of the original study was to explain the variation between GPs in physiotherapy referrals (Uunk et al. 1992).

The authors followed the logic that was explained in Chap. 2. The average referral rate of the GPs in their sample for patients with complaints related to the locomotive system was 24%. This percentage varied between GPs from a low of 11% to a high of 45%. So some GPs referred only one out of ten patients with problems in the locomotive system to a physiotherapist, whilst at the other end of the scale almost half of another GP’s patients were referred. The authors constructed an explanatory model based on social production function (SPF) theory (again see Chap. 2; Lindenberg 1996).

The GPs could either treat the patients themselves, including the use of a ‘wait and see’ policy, or they could refer patients to a physiotherapist. The dependent variable is therefore dichotomous. Following SPF theory it was assumed that GPs have two goals: improving their patients’ health and increasing their own well-being. It was further assumed that both GPs and patients had resources that they could use to reach their goals. The theoretical model is given in Fig. 12.1.

Starting from the right-hand side of Fig. 12.1, the dependent variable is whether a patient is referred to physiotherapy or not. Preceding this are two boundary conditions: firstly, patients have to visit their GP with health complaints for which referral to a physiotherapist is a relevant alternative. The authors restricted the data to patients with complaints of the locomotive system. Hence this condition was fulfilled. The separate diagnoses were used in the analysis to take the case-mix of different GPs into account. The second condition is that there are physiotherapists to whom patients can be referred. That condition is always fulfilled globally, but there is variation in the local availability of physiotherapists within the practice area of the GPs. This variable was therefore used as a control variable.

The assumption was that GPs want to improve their patients’ health. Whether they can realise this goal by referring a patient to physiotherapy might depend on their knowledge of and experience with physiotherapy. As the authors did not have a direct measure of this, they used the number of years of experience that each GP had working as a GP. It is also assumed that GPs want to achieve personal goals: well-being and social approval (from patients, colleagues and physiotherapists). Their workload and the way they were paid (depending on whether a patient was publicly or privately insured) were both assumed to influence well-being. The type of practice was considered a potential influence of sources of social approval: in single-handed practices, GPs depend more on their patients for social approval. The authors had information on whether GPs had physiotherapists in their social network. They interpreted this information in two ways: either this might influence the possibility of acquiring social approval through the referral of patients to physiotherapists, or it might relate to their knowledge of physiotherapy. Finally, it was assumed that patients themselves might want to visit a physiotherapist and that those patients who had achieved a higher educational level would be better able to put their point forward when discussing this issue with their GP. Patient characteristics such as age and sex were used as control variables. In the example dataset, we will use a less extensive set of variables for the sake of simplicity. However, you will still be able to explore the data and test your own ideas.

The data were collected in 1987 as part of a large national survey of general practice (Van der Velden 1999). The starting point was a sample of 100 GP practices in the Netherlands. The following data are relevant to this example:

• GPs in these practices recorded all contacts with their patients over a period of 3 months, including the diagnosis and whether a patient was referred to a physiotherapist.

• GPs filled in a questionnaire.

• All patients on the list of each practice were sent a short questionnaire to collect social and demographic background variables.

The contacts of the same patients for the same health problem were combined into care episodes. This is especially relevant in the case of referrals where patients might first have a consultation, presenting their problem, and their GP might advise them to wait for a couple of weeks and come back if their complaints did not disappear. If we calculated the referral rate using separate contacts instead of the care episodes, we would therefore tend to find much lower referral rates. Consequently, the data have five levels: the practice, the GPs, the patients, the episodes and the contacts. In this example, we only use two levels: GPs and episodes (most GPs were single-handed at that time and the majority of patients only had one episode during the 3-month period). The data therefore form a two-level strict hierarchy of episodes nested within GPs. Patient characteristics, such as age, are simply distributed over episodes. The outcome of interest is a binary indicator of whether the patient was referred to a physiotherapist or not.

## The Data

The data are contained in the MLwiN worksheet ‘fysio.wsz’. When you open the worksheet, you will see the Names window providing an overview of all of the variables. Patients (as previously mentioned, these are not strictly speaking patients but episodes) are identified by PATID and GPs by GPID. Columns 3–8 contain data information relating to the patient. PATAGE is the patient’s age in years, ranging from 18 to 98. This variable is subsequently categorised in PAGEGRP. This variable has been declared as a categorical variable; click on the variable name PAGEGRP in the Names window and then on the View button in the Categories section at the top of the Names window to display the category names. The categories used are 18–34, 35–44, 45–54, 55–64, 65–74, 75–84 and 85–98. PATSEX is also a categorical variable denoting the patient’s sex—1 for male and 2 for female. Similarly, PATINSUR takes the value 1 if the patient is publicly insured and 0 if they are privately insured. The extent of the patient’s education is contained in the variable PATEDU; this variable has four levels (1 for no formal education, 2 for those with only primary education, 3 for secondary and lower/middle vocational education and 4 for higher vocational and university education).

The variable DIAG contains the primary diagnosis resulting from the care episodes. These diagnoses are in 13 mutually exclusive categories:

• diag_1: symptoms/complaints neck

• diag_2: symptoms/complaints back

• diag_3: myalgia/fibrositis

• diag_4: symptoms of multiple muscles

• diag_5: disabilities related to the locomotive system

• diag_6: impediments of the cervical spine

• diag_7: arthrosis cervical spine

• diag_8: lumbago

• diag_9: ischialgia

• diag_10: hernia nuclei pulposi

• diag_11: impediments of the shoulder

• diag_12: epicondylitis lateralis

• diag_13: tendinitis/synovitis

The variables in columns 9–13 relate to the GP. Their experience was measured by the number of years they had worked as a GP; we have rescaled this by dividing by 10 so that GPEXPER, a continuous variable, ranges from 0 to 3.3 indicating that the range of experience was from 0 to 33 years. Also at the level of GP we have workload (GPWORKLOAD), a continuous variable, containing the total number of contacts in the 3-month registration period, measured in thousands of patients, and ranging from 0.277 to 4.649 (i.e. from 277 to 4649 patients). The type of practice, PRACTYPE, is a categorical variable distinguishing between single-handed practices, partnership practices, group practices and health centres. The variable LOCATION differentiates between four categories of practice location: rural, suburban, urban and big city. Finally, the variable GPPHYSIFR indicates whether the GPs have physiotherapists in their social network (taking the value 1 for yes, 0 for no).

REFERRAL is the response variable with 0 indicating that the patient was not referred to a physiotherapist and 1 indicating that they were. (Note the use of 0 and 1 for the responses, not the 1 and 2 used by convention in some other software packages.) Finally, CONS is a column of 1s used to model the intercept in the fixed part of the model; for a random intercept model, this variable will also model the random variation across GPs.

## Model Set-Up

Open the Equations window and the default unspecified model should appear. Declare REFERRAL to be the response, specify a two-level model and set the level 1 and 2 identifiers to be PATID and GPID, respectively. Next click on the N corresponding to the default (normal) distribution for the response and change this to binomial. Accept the default suggestion of a logit link to fit a logistic regression. The window should appear as follows:

In addition to asking for the model specification—the red β0x0 term—MLwiN requests the denominator nij. We can use the binomial distribution to model proportions in which case nij would be the number of ‘attempts’. Since our data refer to individuals, and the response is whether or not an individual patient is referred to physiotherapy, the nij that we require is just another column of 1s. Click on the nij, select CONS from the drop-down list and click on Done.

Now we can specify the fixed part of the model and the level-2 variance component. It is sensible to start with a mean model to estimate the probability of being referred and see how this varies between GPs. Add CONS as an explanatory variable to estimate the mean probability and let this mean vary across GPs at level 2. The window should now appear as follows (you may need to press the + button at the bottom of the Equations window to expand the model that is shown).

The constant β0 will estimate the log odds of referral by the average GP and the GP residuals u0j, which are assumed to be normally distributed, will estimate the GP deviations from the mean log odds. The lowest level variance is a function of πij, the probability of individual i being referred to a physiotherapist by GP j; this is determined by the fact that we are assuming a binomial distribution and we do not estimate this variance explicitly.

## Non-linear Settings

Before estimating the model, we need to specify the settings for non-linear estimation. There are three options that can be set, and this is done by clicking on the Nonlinear button at the bottom of the Equations window. The first option covers the distributional assumption and this relates to whether we wish to assume the variation at level 1 is binomial. For binary data we should assume that this is true rather than testing for over- or under-dispersion (Skrondal and Rabe-Hesketh 2007). The second and third options relate to the estimation procedure used by MLwiN. The estimation procedure is iterative and involves transforming the data and fitting a linear model. The linearisation option relates to the Taylor series expansion, which approximates a linear form for the model, and the options are either a first or second order expansion. The linearising expansion uses predicted values from one iteration to estimate the parameters at the next iteration, and estimation type relates to whether these predicted values are calculated from the fixed part of the model only (MQL) or from both the fixed and random parts of the model (PQL). The simplest estimation procedure (first order MQL) tends to underestimate the random parameters (variances), although it is computationally more robust than second order PQL estimation (Goldstein and Rasbash 1996; Rodríguez and Goldman 1995). A rule of thumb is to start with the simpler estimation procedure and, once a model of interest has been established, switch to second order PQL. To start with we shall use the default settings: a binomial distribution with a first order MQL estimation procedure. This can be selected by clicking on Use Defaults and then Done.

Once these options have been selected, we can estimate the model by clicking on the Start button at the top of the MLwiN window.

## Model Interpretation and Model Building

The mean model should appear as follows:

Taking the antilogit function of the intercept (i.e. exp(β0)/[1 +  exp (β0)]) gives the probability of being referred by the average GP to be 0.203. There is a great deal of variation between GPs and we can use this estimate to calculate a 95% confidence interval for the proportion of patients receiving a referral from their GP, again using the antilogit function. Thus, in 95% of GPs the probability of referral is between antilogit $$\left(-1.366-1.96\sqrt{0.232},-1.366+1.96\sqrt{0.232}\right)=\left(0.090,0.396\right)$$.

In Chap. 6, we considered ways of examining the magnitude of the variance for multilevel logistic regression models. Firstly, the intraclass correlation coefficient can be approximated as

$${\rho}_{\mathrm{I}}=\frac{\sigma_{u0}^2}{\sigma_{u0}^2+3.29}$$

suggesting that 6.6% of the variability in whether a patient is referred to a physiotherapist can be attributed to differences between GPs. Secondly, we can calculate a median odds ratio (MOR) as

$$\mathrm{MOR}=\exp \left(0.954\sqrt{\sigma_{u0}^2}\right)$$

This suggests that the median of all pairwise comparisons between GPs gives an odds ratio of 1.58. There is therefore considerable variation between GPs and we can go on to see how much of this variation can be explained by differences in patient populations. Firstly, add the two control variables, age and sex, as explanatory variables to the current model: PAGEGRP and PATSEX. As reference categories use women in the youngest age group. Then add the diagnoses contained in the variable DIAG, with the first category (symptoms or complaints of the neck) as the reference category. Now estimate this new model to obtain:

Note that MLwiN does not provide an estimate of −2∗loglikelihood for logistic regression models. This is because the estimation procedure used is not maximum likelihood but pseudo-likelihood. There has been a change in the estimate associated with the intercept β0. This is now an estimate of the log odds of referral by the average GP for a patient with the baseline characteristics, in this case a female aged 18–34. All of the covariate estimates are on the log odds scale and thus represent the change in log odds associated with a unit increase in each explanatory variable. By taking the exponential of these estimates, we can obtain estimates of the odds ratio (OR) of referral relative to an appropriate baseline group. The OR for referral for patients aged 35–44, relative to those aged 18–34, is exp(0.055) or 1.06; 95% confidence intervals are given by exp(0.055 ± 1.96 × 0.055) or (0.95, 1.18). The 95% confidence interval contains 1 suggesting that the odds of referral to a physiotherapist are not significantly different between the 18–34 and 35–44 age groups. The parameter estimates suggest a non-linear relationship with age and, relative to the younger patients, older patients (those aged 65 and over) are less likely to be referred to a physiotherapist. Relative to the youngest group, the OR of referral is 0.69 (0.59, 0.81) for those aged 65–74, 0.45 (0.37, 0.56) for those aged 75–84 and 0.31 (0.20, 0.49) for those aged 85 and over. Men are less likely to be referred than women, although this is of borderline significance (OR = 0.92; 95% C.I. 0.85, 1.00) .

After taking account of differences in patient populations and diagnoses, we see that the between GP variation has remained virtually unchanged. This is quite uncommon, as often a large part of the apparent variation between high level units is due to differences between individuals. It is, however, also possible for the variance between higher-level units to increase in multilevel models following the addition of variables at the lower (in this case patient) level. Snijders and Bosker (2012) provide an explanation as to why this is likely to happen in multilevel logistic regression models. In essence, since the variance in a binary outcome yij is constrained to be equal to πij(1 − πij) (see Chap. 6), the addition of a level 1 variable will tend to result in an increase in the level 2 variance so that the proportion of unexplained variation at level 1 will decrease.

We can now check for the effect of the other patient variables; add both PATEDU and PATINSUR to the current model, using the lowest educated and those with private insurance as the reference categories.

These two new covariates offer further insight into the pattern of referrals: there is a steady increase in the probability of referral with increasing educational level of those patients who present with complaints of the locomotive system. Relative to those with no education, those with higher education have more than twice the odds of being referred for physiotherapy (OR = 2.34; 95% C.I. 1.59, 3.45) . The type of insurance (and thus the way GPs are remunerated) does not significantly affect the chance of being referred; those with public insurance show a small and insignificant increase in the odds of referral (OR = 1.08; 95% C.I. 0.98, 1.19). Once again, the addition of these patient characteristics makes no difference to the variance between GPs.

Finally we add the five GP-level variables: GPEXPER, GPWORKLOAD, PRACTYPE (reference: single-handed practices), LOCATION (reference: rural) and GPPHYSIFR (reference: those GPs who do not have friends who are physiotherapists).

We have now built our final model. GPs working in joint practice and those in health centres (which usually include physiotherapists) refer slightly more patients than those in solo practice. The odds of referral are increased among GPs working in one of the big cities (OR = 1.85; 95% C.I. 1.23, 2.76) and GPs who have physiotherapists as friends or acquaintances are also more likely to refer patients (OR = 1.25; 95% C.I. 1.04, 1.50). Neither the experience of the GP nor their workload appears to influence the likelihood of referring patients to physiotherapy.

Altogether the GP characteristics have reduced the variation between GPs from 0.230 in the previous model to 0.196 (a reduction of about 15%). Although we would expect the introduction of variables at the GP level to decrease the variance between GPs, calculation of the intraclass correlation coefficient shows that 5.6% of the unexplained variation in patient referrals is attributable to differences between GPs. The median odds ratio for this model is 1.52.

## A Note on Estimation

The current estimation procedure, first order MQL, is known to produce biased estimates (Goldstein and Rasbash 1996; Rodríguez and Goldman 1995) although it is a reasonable tool for model building. In practice, we recommend that you obtain the final results that you wish to report using second order PQL estimation. (There are alternative methods of estimation available in MLwiN including the parametric bootstrap and Markov chain Monte Carlo or MCMC. Some other packages also include the option of maximum likelihood estimates obtained using numerical integration.) The screenshot below replicates our final model using second order PQL.

These estimates differ markedly from those obtained using first order MQL. The level 2 variance estimated using second order PQL is considerably larger giving an intraclass correlation coefficient of 0.061 and a median odds ratio of 1.56. There are also changes in the fixed part of the model; for example, the estimate of the odds ratio associated with the practice being located in a big city (compared to rural practices) has increased to 1.90 (95% C.I. 1.25, 2.90).

As for a linear multilevel model, we can calculate residuals for multilevel logistic regression models. The residuals from our final model are shown below for the 158 GPs.

These residuals are now on a log odds scale; patients attending the GP with the largest residual (1.046) have an odds ratio of 2.85 (95% C.I. 1.94, 4.17) of being referred to chemotherapy relative to the average GP after taking patient and GP characteristics into account. Note the varying magnitude of the 95% confidence intervals around the GP residuals; those GPs about whom we have more data (i.e. those with more patients) have smaller confidence intervals.

## Further Exercises

Explore the random slope variance for variables such as the insurance status of the patients. It was expected that privately insured patients would be referred less often. We did not find such an effect, but it might still be the case that some GPs are less likely to refer privately insured patients (depending on some measured or unmeasured GP variables).

Look at the GP residuals to check for outliers and explore the effects any outliers may have on the current model.