Diabetes is an important public health issue [1] and an increasing number of clinical trials are being conducted to improve care for patients with diabetes. Increasingly, interventions aimed at improving the quality of care are evaluated using cluster randomised controlled trials (CRCTs) [2–5]. Whilst observations used in the evaluation may still be made at the individual level, randomisation at the cluster level (such as GP surgery) will often be necessary [5–7] and is increasingly being used [8]. In CRCTs, patients within the same cluster tend to be more similar than patients from differing clusters [7, 9]. Thus, the observations within a cluster may not be independent, and the design and analysis of CRCTs should acknowledge this [5, 10–13].

Important outcomes in trials of diabetes include clinical measurements, such as glycosylated haemoglobin (HbA1c) (both as a continuous and dichotomised outcome) [14], body mass index (BMI) [15], cholesterol [16], blood pressure [17], or the incidence of macrovascular and microvascular outcomes [18, 19].

Sample size calculations for an individually randomised controlled trial (RCT) are relatively straightforward, but for a CRCT it is necessary to account for the non-independence of observations within a cluster [10–12]. A design effect can be used to inflate the sample size of an RCT to that required in a CRCT [9, 20]. For a trial with equal cluster sizes, the design effect is calculated as:

$$ 1+\left(m-1\right)\rho . $$

Here m is the cluster size and ρ is the correlation between patients within a cluster [21]. This correlation has important implications for the sample size required [22, 23].
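As a rough illustration of how the design effect inflates a sample size, the following Python sketch evaluates the expression above. The function names and the inputs (an individually randomised sample size of 400, a cluster size of 50, and ρ = 0.05) are hypothetical values chosen for illustration; they are not taken from any trial.

```python
import math


def design_effect(m, rho):
    """Design effect 1 + (m - 1) * rho for a trial with equal cluster size m."""
    return 1 + (m - 1) * rho


def crct_sample_size(n_individual, m, rho):
    """Inflate an individually randomised sample size for clustering.

    Returns the number of clusters (rounded up to a whole cluster) and the
    resulting total number of participants.
    """
    n_inflated = n_individual * design_effect(m, rho)
    n_clusters = math.ceil(n_inflated / m)
    return n_clusters, n_clusters * m


# Hypothetical inputs, for illustration only.
n_clusters, n_total = crct_sample_size(n_individual=400, m=50, rho=0.05)
print(n_clusters, n_total)
```

Rounding up to whole clusters is a design choice of this sketch; a real calculation would also need to consider unequal cluster sizes and attrition.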

The majority of CRCTs have a parallel design. That is to say, clusters are allocated to either intervention or control. However, increasingly, the value of alternative cluster designs is being appreciated. Some alternative designs include the cluster cross-over [24], the stepped wedge [25, 26], and the dog-leg [27, 28]. In these alternative designs repeated cross-sectional samples are taken from each cluster over multiple time periods. It is becoming increasingly recognised that observations from the same cluster and same period are likely to be more highly correlated than observations in the same cluster but at different periods [29–32]. This leads to the notion of a within-period cluster correlation (WPC) and an inter-period cluster correlation (IPC). Unfortunately, there is little or no empirical literature to inform likely values for these parameters at the design stage [28, 29].

For a trial to be powered correctly, an accurate estimate of the correlation of observations within a cluster is required. In the past, many type-2 diabetes trials in primary care have failed to report this correlation, forcing many planned trials to use ad hoc values at the design stage [33]. This leads to inaccurate sample size estimates and (sometimes) to underpowered trials. Typically, this correlation is assumed to be time independent – and a single intra-cluster correlation coefficient (ICC) is used in the sample size calculation. This assumption may not always be valid. For designs with observations taken over multiple time periods, estimates of the WPC and IPC are vital in the sample size calculation [28, 29]. These can be obtained from routinely collected data, in a similar way to ordinary ICCs [34, 35].

Our objective here is to estimate ICCs for typical trial outcomes related to type-2 diabetes using anonymised patient data from The Health Improvement Network database [36]. We additionally report estimates of the WPC and the IPC for a subset of continuous outcomes. Finally, we review previous CRCTs in type-2 diabetes to compare the ICCs estimated in this paper to those previously used.


Correlation of observations in a cluster trial

The quantity ρ in Eq. 1 is defined as the correlation between two randomly selected observations within the same cluster. Typically, an assumption is made that this correlation is independent of the timing of the observations. This property is consistent with a decomposition of the total variance into two independent components representing variation between clusters and between subjects (within clusters). In view of this, the ICC can be defined as the proportion of the variance that is attributable to the between-cluster variance, given as:

$$ \frac{{\sigma_b}^2}{{\sigma_b}^2+{\sigma_w}^2}, $$

where $\sigma_b^2$ and $\sigma_w^2$ represent the between- and within-cluster variance components.

Cluster trials are typically analysed using a multilevel linear model. If the correlation between observations in a cluster is independent of when they are taken, an approach using the ratio of variances is a simple method to estimate the ICC. This approach is taken throughout the paper whenever an estimated ICC is reported.

Time-dependent correlation

In some contexts, a model based on the assumption of time-independent correlations is flawed. An alternative model can be fitted to the data by splitting time into a number of (equal) periods. In this formulation, constant correlations are assumed: (1) for any two observations in the cluster from the same time period (WPC); and (2) for any two observations from the same cluster in different time periods (IPC).

These assumptions are consistent with a variance decomposition into three independent components: between clusters ($\sigma_c^2$); between time periods (within clusters) ($\sigma_t^2$); and between subjects (within time period and cluster) ($\sigma_e^2$).

Now, the WPC is the correlation of observations between two patients in the same cluster from the same time period. This can be calculated as:

$$ \frac{{\sigma_c}^2+{\sigma_t}^2}{{\sigma_c}^2+{\sigma_e}^2+{\sigma_t}^2}. $$

The IPC is the correlation of observations between two patients in the cluster from different time periods, and is calculated as:

$$ \frac{{\sigma_c}^2}{{\sigma_c}^2+{\sigma_e}^2+{\sigma_t}^2}. $$

In this framework, the correlation, ρ, between two randomly selected observations within the same cluster is given by a within-cluster correlation (WCC) defined by:

$$ WCC=IPC+\frac{1}{n_{tp}}\left(WPC-IPC\right). $$

Here $n_{tp}$ is the number of time periods in the study. It is assumed that each time period contains an equal number of observations.
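One way to see where this expression comes from, under the added assumption that clusters are large enough for two randomly selected observations to fall in the same period with probability close to $1/n_{tp}$, is to average the two correlations over the same-period and different-period cases:

$$ WCC\approx \frac{1}{n_{tp}}WPC+\left(1-\frac{1}{n_{tp}}\right)IPC=IPC+\frac{1}{n_{tp}}\left(WPC-IPC\right). $$

With two periods, for example, the WCC lies midway between the IPC and the WPC.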

The ratio of the IPC to the WPC is known as the cluster autocorrelation (CA), which is the correlation between the cluster-level mean outcomes over time [28]. The cluster autocorrelation has been established as key to sample size formulae for studies with a repeated cross-sectional design [37]. We present estimates of the CA alongside the IPC and WPC.

In the absence of period effects, the CA = 1, indicating that the time-dependent model is unnecessary. In this setting, WCC = WPC = IPC. Otherwise it follows from the definitions that WPC > WCC > IPC.
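The relationships between these quantities can be sketched in a few lines of Python. The parameters are named by role (cluster, cluster-by-period, and subject-level variance) and are combined as in the displayed WPC, IPC, and WCC equations; the function name and the variance values are hypothetical and chosen only for illustration.

```python
def period_correlations(var_cluster, var_period, var_subject, n_periods):
    """Return WPC, IPC, CA and WCC from three variance components.

    var_cluster: between-cluster variance
    var_period:  between-period (within-cluster) variance
    var_subject: between-subject (within period and cluster) variance
    """
    total = var_cluster + var_period + var_subject
    wpc = (var_cluster + var_period) / total
    ipc = var_cluster / total
    ca = ipc / wpc  # cluster autocorrelation
    wcc = ipc + (wpc - ipc) / n_periods
    return {"WPC": wpc, "IPC": ipc, "CA": ca, "WCC": wcc}


# Hypothetical variance components with a non-zero period effect.
res = period_correlations(0.02, 0.01, 0.97, n_periods=2)
print(res)

# With no period effect the time-dependent model collapses: CA = 1.
flat = period_correlations(0.02, 0.0, 0.98, n_periods=2)
print(flat)
```

In the first call the ordering WPC > WCC > IPC holds; in the second all three correlations coincide.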

Correlation of binary outcomes

In the context of a clinical trial, data are often dichotomous – recording the presence or absence of a particular clinical outcome. The ICC that appears in the design effect is then defined as the correlation between two binary outcomes from two patients in the same cluster. In such cases, sample size calculations will typically entail a normal approximation to the binomial distribution which describes the number of positive outcomes in a sample of fixed size. Nevertheless the analysis of dichotomous outcomes in cluster trials is often conducted via a multilevel logistic model. In such models the observed binary outcome may be conceptualised as having arisen by dichotomising a continuous latent scale. When these models are fitted in some analysis packages (e.g. Stata) a type of ICC is presented which relates not to the observed binary outcomes but to this unobservable latent scale. It takes the form:

$$ \frac{{\sigma_b}^2}{{\sigma_b}^2+{\pi}^2/3}, $$

where $\sigma_b^2$ is the between-cluster component of variance on the latent scale and the term $\pi^2/3$ is associated with the logistic distribution used to generate the binary model.

Since this version of the ICC refers to the unobservable latent scale, rather than the correlation between the binary outcomes of two patients from within the same cluster, this ICC should not be used directly to compute design effects for sample size calculations. In principle, a latent ICC from a logistic regression model can be converted to a natural ICC on the proportion scale for the raw binary data, taking account of the prevalence of the outcome – see, for example, the table presented by Eldridge et al. [21]. Throughout this paper we maintain the distinction between a natural ICC on the proportion scale and a latent ICC for binary data. It is the natural ICC on the proportion scale that contributes to the calculation of design effects.
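To illustrate why the two versions of the ICC can differ so much, the sketch below compares them under a random-intercept logistic model with normally distributed cluster effects. This is an assumption made for illustration and is not the conversion table of Eldridge et al. The natural ICC is computed as the variance of the cluster-level probabilities divided by p(1 − p), where p is the overall prevalence, using simple numerical integration. The function names, the intercepts, and the latent variance of 0.3 are hypothetical.

```python
import math


def latent_icc(sigma_b2):
    """ICC on the latent logistic scale: sigma_b^2 / (sigma_b^2 + pi^2 / 3)."""
    return sigma_b2 / (sigma_b2 + math.pi ** 2 / 3)


def natural_icc(beta0, sigma_b2, n_grid=4001):
    """Natural ICC on the proportion scale for logit(p_j) = beta0 + u_j,
    with u_j ~ N(0, sigma_b2), by weighted integration over a grid of u."""
    sd = math.sqrt(sigma_b2)
    lo, hi = -8 * sd, 8 * sd
    step = (hi - lo) / (n_grid - 1)
    w_sum = m1 = m2 = 0.0
    for k in range(n_grid):
        u = lo + k * step
        w = math.exp(-0.5 * (u / sd) ** 2)  # unnormalised normal density
        p = 1 / (1 + math.exp(-(beta0 + u)))
        w_sum += w
        m1 += w * p
        m2 += w * p * p
    mean_p = m1 / w_sum
    var_p = m2 / w_sum - mean_p ** 2
    return var_p / (mean_p * (1 - mean_p))


# Hypothetical values: same latent variance, common versus rare outcome.
common = natural_icc(beta0=0.0, sigma_b2=0.3)   # prevalence near 50 %
rare = natural_icc(beta0=-4.6, sigma_b2=0.3)    # prevalence near 1 %
print(latent_icc(0.3), common, rare)
```

For a fixed latent variance the latent ICC is the same in both cases, whereas the natural ICC shrinks as the outcome becomes rarer.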

Outcome variables

The aim was to investigate the correlation of all routinely recorded variables that might be clinically relevant to a trial undertaken in type-2 diabetes. The outcome variables were divided into three categories: clinical measures, medication, and clinical outcomes. Clinical measures included HbA1c, systolic blood pressure, diastolic blood pressure, BMI, total cholesterol level, and high-density lipoprotein (HDL) cholesterol level. Medication measurements involved insulin and other hypoglycaemic medications. The clinical outcomes were a first diagnosis of: atrial fibrillation, chronic kidney disease, chronic obstructive pulmonary disease (COPD), ischaemic heart disease (IHD), peripheral vascular disease, and stroke. Patients who had suffered an event prior to the study were excluded from the analysis for that outcome.

Dichotomisation of continuous outcomes

In practice, many trials use dichotomised values of continuous outcome measures [38, 39], and so we generated dichotomised values for each continuous outcome. A threshold value of 7.5 % was chosen for HbA1c as NICE guidelines state that 7.5 % indicates inadequate control [40], and because this value has been used in previous studies [41]. Multiple recommendations have been made that total cholesterol levels should be below 4.0 mmol/L and HDL cholesterol levels should be above 1.2 mmol/L [42, 43]. Two relevant cut-points were used for both systolic blood pressure and BMI. For systolic blood pressure, a value of 140 mmHg is the upper limit recommended for patients with type-2 diabetes [40]. A lower value of 130 mmHg is the target for patients who suffer from kidney and eye problems, or those who have suffered a stroke [40]. Two cut-points were chosen for BMI to correspond to the categories of overweight (25 kg/m²) and moderately obese (30 kg/m²).

Measurement periods

A cross-sectional sample of measurements taken over a 15-month period was used (1 January 2009 to 31 March 2010), to reflect the NICE quality and outcomes framework (QOF) [44], which monitors measurements taken for patients over a 15-month period. To estimate the IPC and WPC, data from an additional 15 months (1 October 2007 to 31 December 2008) were used, creating two 15-month time periods.

Since the measuring unit of HbA1c changed in 2009 from % to mmol/mol, the consistency in reporting is likely to be poor around this time. In view of this, a slightly different cross-sectional sample was used for HbA1c, consisting of measurements taken over a 12-month period (1 January 2008 to 31 December 2008). An additional 12 months of data (1 January 2007 to 31 December 2007) contributed towards the estimation of the IPC and WPC.

The Health Improvement Network

The retrospective cross-section of patients with type-2 diabetes was formed using data from The Health Improvement Network (THIN) database [36]. Participating general practices contributed anonymised demographics, prescribing information, and clinical data for more than 3.7 million patients throughout the UK. All practices used the Vision computer system.

All patients over 18 years of age were included if a diagnosis of type-2 diabetes, indicated by the appropriate ‘Read codes’, was made before the study index date. Read codes are a coded thesaurus of clinical terms that are used in the recording of patient data in primary care electronic medical records in the UK. The general practices were required to have been using the Vision computer system for a minimum of 1 year prior to the study index date, and to have an acceptable mortality reporting (AMR) date (an indicator of practice quality) [45].

Data summary

The included population was summarised by describing both patient and practice characteristics using appropriate summary statistics. General practice characteristics include the total number of practices, location (country) of the practice, and practice inclusion size (the number of patients from each practice satisfying the entry criteria). Patient characteristics (of the included population) were age (years), gender, location (country of residence), and deprivation quintiles.

We also summarised potential trial outcomes using suitable summary statistics. Outcomes included clinical measures, onset of clinical outcomes, and the prescription of medication. Although the HbA1c variable exhibits skewness, both mean and median values were given as it is assumed to be normally distributed in many trials.

Variation across practices in mean (or median) clinical measures, clinical outcomes, and the prescription of medication, was summarised by reporting the interquartile range (IQR) of the practice mean (or median) values.

Statistical models

Generalised linear mixed models were used to estimate the ICCs with cluster (general practice) modelled as the random effect. Both adjusted and unadjusted ICCs were estimated, with adjustments made for age, sex, location, and deprivation quintiles. All clinical measures were presented in both continuous and dichotomised form.

For continuous outcomes, a mixed-effects linear model was fitted and the ICC was estimated as the ratio of the between-cluster variance (of the outcome) to the total variance of the outcome.

For binary outcomes, a mixed-effects linear model was fitted to estimate the natural ICC on the proportion scale, whilst a mixed-effects logistic regression was fitted to estimate the latent ICC.
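For readers without access to mixed-model software, the ratio-of-variances idea can be sketched with the classical one-way ANOVA estimator of the ICC. This is a swapped-in, simpler technique and not the mixed-model estimation used in this paper; the function name and the toy data are hypothetical. Applied to outcomes coded 0/1, it gives an estimate on the proportion (natural) scale.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC.

    clusters: list of lists, one inner list of outcomes per cluster.
    Negative estimates are truncated at zero.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n = sum(sizes)
    grand_mean = sum(sum(c) for c in clusters) / n
    means = [sum(c) / len(c) for c in clusters]

    # Between- and within-cluster mean squares.
    ssb = sum(s * (mu - grand_mean) ** 2 for s, mu in zip(sizes, means))
    ssw = sum((y - mu) ** 2 for c, mu in zip(clusters, means) for y in c)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)

    # Adjusted average cluster size, allowing for unequal cluster sizes.
    m0 = (n - sum(s * s for s in sizes) / n) / (k - 1)

    denom = msb + (m0 - 1) * msw
    if denom == 0:
        return 0.0
    return max(0.0, (msb - msw) / denom)


# Toy data: all variation lies between clusters, so the estimate is 1.
print(anova_icc([[1, 1, 1], [3, 3, 3]]))
# Toy binary data: identical clusters, so the estimate is truncated to 0.
print(anova_icc([[1, 1, 0, 0], [1, 1, 0, 0]]))
```

Truncating negative estimates at zero is a common convention but is a choice made in this sketch, not something specified in the text.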

To estimate the WPC, IPC, and CA, a generalised linear mixed model was used, with two random effects – one for cluster (general practice) and one for a cluster by period interaction.

All analysis was performed using Stata 13 (StataCorp, College Station, TX, USA). Linear models were fitted using the mixed command, and logistic models fitted using the melogit command. Estimates of the ICC, WPC, and IPC were produced using the estat function.

Search of previous CRCTs

A systematic search of previous CRCTs investigating diabetes in primary care in the UK was carried out in order to compare the results from this analysis to values used in previous CRCTs.

The following sources were used: Medline (1950 to week 2 of May 2013), Medline InProcess (May 2013), and Google Scholar (May 2013). The searches were conducted in May 2013. The following phrases were used: type-II diabetes, type-2 diabetes, diabetes mellitus, diabetes mellitus non-insulin-dependent, adult-onset diabetes mellitus, cluster trial, clustered trial, cluster analysis, cluster analyses, clustering, disease clustering, cluster RCT, and cluster randomised (randomized) controlled trial. The search was limited to the English language.

Studies from all fields of research were included if they described a CRCT that had taken place, or was planned to take place, that used UK general practices as the unit of randomisation. Studies were included if at least one of the trial outcomes was: HbA1c levels, systolic blood pressure, diastolic blood pressure, BMI, total cholesterol, HDL cholesterol, the prescription of insulin, or the onset of microvascular and macrovascular outcomes.

Since the focus is on the ICCs used in the design of a CRCT, all trials in which individuals were the unit of randomisation were excluded from the study. All trials that did not take place in the UK were also excluded since ICC estimates may be affected by the country in which the trial is taking place. All trials with unspecified outcomes were excluded. Trials that aim to prevent the onset of diabetes were also excluded. Any duplicate or follow-on publications from the same trial were included as a single study.

Titles and abstracts retrieved from the search process were screened to obtain relevant trials. Full articles were then read and classified as either included or excluded. All included articles were then used for data extraction. The extracted information consisted of: study authors, outcome used, value of ICC used in the sample size calculation, standard deviation used in the sample size calculation (where appropriate), and the ICC estimated from the trial data (if reported).


Analysis of THIN data

A summary of patient and practice characteristics is given in Table 1. A total of 112,633 patients from 430 practices, covering all areas of the UK, were included in the study. The socioeconomic status was fairly balanced across the categories. The median value of HbA1c (%) (7.05) was lower than the mean value (7.35), highlighting the positive skewness that is exhibited by the variable. Atrial fibrillation was the most common clinical outcome (1.06 %), whilst chronic kidney disease was the least common (0.35 %).

Table 1 Summary of study population (THIN database) by practice and patient-level characteristics

Table 2 summarises the proportion of patients whose clinical measures exceeded the dichotomisation thresholds. Of the participants with a recording for HbA1c, over one third (34.2 %) had an HbA1c exceeding 7.5 %. It was also found that over one half (57.2 %) exceeded the target systolic blood pressure of 130 mmHg whilst approximately one quarter (25.2 %) exceeded 140 mmHg. A large proportion (83.1 %) of the population were categorised as being overweight (>25 kg/m²) (34.8 %), obese (>30 kg/m²) (27.3 %), or morbidly obese (>35 kg/m²) (21.0 %).

Table 2 Summary statistics for clinical measures of included patients from THIN database in binary form

The variation of both the clinical outcomes and clinical measures across practices is given in Table 3. The interquartile range represents the practice mean outcome for the central 50 % of practices. ICC estimates and corresponding standard errors (SE) for clinical measures of continuous nature are given in Table 4 and compared further in Fig. 1. For clinical measurements, in continuous form, the ICCs had a median of 0.026 [IQR 0.020–0.032] and were similar when adjusting for confounding factors (median 0.025, IQR 0.020–0.029). The ICC for HbA1c was estimated to be 0.032 (SE 0.003) when using an unadjusted model and 0.032 (SE 0.003) after adjustment for patient-level factors.

Table 3 Summary of the variation of practice average values from included patients from THIN database
Table 4 Intra-cluster correlation coefficients (ICCs) for continuous outcomes for included patients from THIN database
Fig. 1 Box plot highlighting the median, interquartile range, and range of the intra-cluster correlation coefficients (ICCs) that were estimated for continuous and binary clinical outcomes from both linear and logistic models (n = number of outcomes that had an estimate of the ICC)

After dichotomising, the clinical measures had a median latent ICC of 0.037 [IQR 0.023–0.055] and a median natural ICC on the proportion scale of 0.028 [IQR 0.018–0.039]. Clinical outcomes had a median latent ICC of 0.094 [IQR 0.027–0.136] and a median natural ICC on the proportion scale of 0.003 [IQR 0.001–0.005]. When comparing two clinical outcomes with similar prevalence, it is expected that the outcome with a larger IQR of the practice average would have a larger ICC. This is consistent with the larger natural and latent ICCs (Table 5) that are associated with COPD compared to IHD, both of which have a prevalence of around 1 % (Table 1). Figure 1 further highlights that latent ICCs were larger than natural ICCs on the proportion scale for binary outcomes, and also that the range of latent ICCs was wider than that of natural ICCs.

Table 5 Intra-cluster correlation coefficients (ICCs) for binary outcomes for included patients from THIN database

Estimates of the WPC, IPC, and CA for the two-period study design are given in Table 6. For HbA1c, the correlation between two patients during the same (12-month) time period (WPC) was estimated at 0.035 (SE 0.003). The correlation between two patients at different (12-month) time periods (IPC) was estimated at 0.019 (SE 0.003). There is evidence to suggest that the variance component related to time period is non-zero, and so the correlation of observations seems to decay over time. Excluding HbA1c, in the two-period (each of 15 months) design, the decay of correlation is further highlighted by the median WPC (0.021, IQR 0.021–0.032) and median IPC (0.018, IQR 0.013–0.021).

Table 6 Estimates of the within-period and inter-period correlation for included patients from THIN database from two consecutive periods

The median cluster autocorrelation (excluding HbA1c) is 0.649 [IQR 0.612–0.692], with total cholesterol having the smallest value – indicating that correlation of total cholesterol observations for patients in different time periods is much smaller than the correlation of observations in the same time period. Adjusting for covariates had some impact on correlation estimates. For total cholesterol, the CA in the adjusted model (0.281) was much lower than the unadjusted model (0.486). Conversely, HbA1c had much higher CA in the adjusted model (0.747) than in the unadjusted model (0.612).

Systematic search

Our search strategy found 133 relevant articles. Of these, 70 articles were of irrelevant outcome or trial type (individually randomised design, genetics of diabetes, cross-sectional studies, etc.), 36 were excluded due to the population of the trials (not of UK origin), 7 articles were screening programmes, 6 aimed to prevent diabetes, and 2 articles were excluded as they measured the prevalence of diabetes. Of the 12 trials remaining, 3 duplicates were removed, leaving 9 articles that met the inclusion criteria (see Additional file 1).

One CRCT used the cluster as the unit of randomisation but did not use an ICC when calculating sample size [46]. Of the remaining eight CRCTs, two CRCTs [39, 47] used multiple outcomes and calculated sample sizes for each outcome of relevance. Seven CRCTs [14, 39, 47–51] used HbA1c as an outcome measure, three [38, 39, 47] used systolic blood pressure, and two [39, 47] used cholesterol. However, cholesterol was not used as a sole outcome measure, only as a secondary measure alongside both HbA1c and blood pressure. Of these eight CRCTs, two [38, 39] used a binary outcome, and seven [14, 39, 47–51] used a continuous outcome (one used both a binary and continuous outcome [39]).

The median [IQR] ICC used to power the study for trials in which HbA1c % was the primary outcome was 0.047 [0.047–0.05] (Table 7). The two CRCTs [39, 47] in which total cholesterol (mmol/L) was the main outcome used 0.047 and 0.06 (binary outcome) as the ICC whilst the three CRCTs using blood pressure (mmHg) as the main outcome [38, 39, 47] used ICCs of 0.001 (binary outcome), 0.02 (binary outcome), and 0.035. The standard deviation of HbA1c % used was reported in six trials [14, 39, 47, 49–51], of which the mean value was 1.7. The results of this paper found a similar standard deviation of 1.4 for HbA1c %, whereas the ICC found by this paper was lower (0.032 versus 0.047).

Table 7 Summary of systematic search of intra-cluster correlation coefficients (ICCs) used in previous trials

Only three trials reported ICCs from their analysis [14, 38, 48]. Two trials reported ICCs for HbA1c % [14, 48], with ICCs of 0.0253 and 0.02 (95 % CI 0.00–0.08), and one trial [38] reported an ICC for blood pressure of 0.035. For the two trials that reported an ICC for HbA1c, the reported value was lower than the value used in the initial sample size calculation, whilst for blood pressure the reported value was notably higher. However, for the trial that estimated an ICC for blood pressure [38], it was not clear what method was used to estimate this value.


Using the THIN database, we have estimated ICCs for a variety of outcomes associated with type-2 diabetes. We are the first to report time-dependent correlations, the IPC and WPC, which can be used in the design of cluster cross-over and stepped wedge CRCTs. For binary outcomes, we reported both the latent ICC (an ICC from a logistic model) and the natural ICC on the proportion scale (an ICC from a linear model).

These results are primarily applicable for planned CRCTs aimed at the general practice level in the UK, but in the absence of other estimates, may be useful more widely. We found that the ICC for HbA1c used in the design of trials tended to be larger than that estimated here.
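To give a sense of scale, consider the design effect for a hypothetical cluster size of m = 50 (an assumed value, not taken from any of the reviewed trials), comparing the median ICC of 0.047 used in previous HbA1c trials with the value of 0.032 estimated here:

$$ 1+\left(50-1\right)\times 0.047=3.303,\qquad 1+\left(50-1\right)\times 0.032=2.568. $$

Under this assumption, the required sample size would be roughly 22 % smaller with the lower ICC (2.568/3.303 ≈ 0.78).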

Intra-cluster correlation coefficients

ICCs were calculated for continuous and dichotomous clinical measurements and outcomes, using both adjusted and unadjusted models. Upon adjusting for age, sex, location, and deprivation quintiles, the ICCs were generally similar to the ICCs estimated from the unadjusted models (HbA1c 0.032 versus 0.032). Adjusting for confounding factors also had minimal impact on the standard error of the ICCs (HbA1c 0.003 versus 0.003).

There was a noticeable difference between natural ICCs and latent ICCs for binary outcomes. Latent ICCs estimated for clinical events were much larger than their corresponding natural ICC. Similar results were found by Wu et al. [52], who found that ICCs were smaller when modelled using linear regression than logistic regression.

For binary outcomes it is important to note that natural ICCs (ICCs from a linear model) are smaller when the prevalence is low [35, 53]. Here all clinical outcomes chosen were rare events and consequently had low prevalences. Since the dichotomisation thresholds were chosen to reflect typical values in relation to type-2 diabetes, the prevalences of the dichotomised clinical measures were naturally larger, resulting in larger ICCs.

Due to the importance of the prevalence on the natural ICC, care should be taken to ensure that an appropriate ICC is used. If the prevalence in a planned trial differs greatly from the prevalence used here, sample size calculations using the natural ICC from these results may be inaccurate.

Since latent ICCs for dichotomous outcomes are estimated using logistic regression, they are on a log-odds scale and so are defined on a different scale to a natural ICC [35, 52]. A latent ICC estimated in this manner will refer to an unobservable latent scale, rather than the correlation of observations within a cluster, and so would not be a relevant ICC for use in the design stage of a trial. Eldridge et al. [21] provide a table that allows some ICCs on this logistic scale to be converted into a natural ICC for a selection of prevalences.

Previous trials

Many authors discuss the most appropriate methods and models that should be used to model ICCs in situations in which the outcome is binary [35, 52, 54], and there are numerous cases in which previous authors have correctly estimated ICCs for binary outcomes using linear models for future trialists to use [34, 55–57]. However, there are still some situations where a logistic model is used [58–60]. The differences between the natural ICC and the latent ICC are also considered by Merlo et al. [61], who note that since the natural ICC depends on the prevalence of the outcome, any comparisons made regarding the magnitude of clustering should be made using the latent ICC. We agree that care should be taken when using the natural ICC to describe the extent of clustering in a trial with binary outcomes; however, we cannot recommend that the latent ICC is used directly in the design of future trials.

The number of previous cluster trials involving type-2 diabetes that have reported ICCs from their results is rather small, which leaves future trialists to rely on ad hoc or conservative values. The ICCs found in this paper were smaller than those often used in trials, but more consistent with the ICCs that were reported from the results of previous trials. The ICC for HbA1c %, the most common outcome in a trial involving type-2 diabetes, was found to be 0.032 (SE 0.003). Trials in which the primary outcome is binary should use an ICC from a linear model when estimating a required sample size, and not one obtained from a logistic model, even if the data will be analysed using a logistic model.

Inter-period correlation coefficients

It is emerging that cluster designs require not only estimates of within-cluster correlation measures, but also some measure of how this correlation decays over time [29, 62]. We have attempted in part to address this issue and are the first to provide estimates of the inter-period correlation and the within-period correlation alongside ICCs. However, we have only provided these estimates for continuous outcomes and we have only provided estimates assuming a cross-sectional study design. Clearly, many studies use a cohort design and many studies contain a primary outcome that is dichotomous in nature. However, estimation of correlation coefficients for binary outcomes is more complex due to the change of scale; and adding a cohort structure would increase complexity, as it would also be necessary to allow for within-person correlation.

The IPC and WPC may also be reported as the CA. It has been established that the sample size is directly impacted by the CA [37]. No guidelines exist for reasonable values of the CA, but values of 0.8 and 1.0 have previously been used [28, 63]. Here we have shown that for our study design, the CA may be smaller than these estimates.

Ignoring the IPC and CA in sample size calculations may lead to incorrect estimates of the required number of clusters in a CRCT [29] or to underpowered studies [28]. Studies in which the IPC differs from the WPC should ensure that the estimates of ρ for use in Eq. 1 stem from the WCC estimated via Eq. 3, and not from an ICC estimated by Eq. 2.

Future research

It has been established that the ICC, IPC, and CA are necessary for sample size calculations for CRCTs. However, there is opportunity for future research into the IPC and the impact of time between observations in the model for CRCTs. It is perhaps naïve to assume a fixed correlation between observations in a cluster trial regardless of the time between these. Instead, this correlation should depend on time, and this length of time may be important. It is not known what impact changing the length of time period or the length of the study period would have on the IPC. Additionally, the IPC used to direct a sample size calculation should be calculated from a dataset using a similar time period and study length. The motivating idea behind additional correlation types is repeated cross-sectional designs such as the cluster cross-over design and the stepped wedge design. However, these results may indicate that sample size in parallel CRCTs should also acknowledge that correlation may be time-dependent. Future research is likely to show that recognising the decay in correlation over time in the model would increase power in parallel designs.


There are limitations that may arise from using routine data from general practices. It is not always possible to distinguish follow-up care for a first clinical event (e.g. myocardial infarction) from a second event, as they may have been coded in an identical manner. This means that patients who had suffered an event prior to the study inclusion period had to be excluded from the analysis. There is also the possibility of misclassification as type-2 diabetes rather than type-1 diabetes due to coding errors, which could lead to younger patients being included in the study unintentionally.

Since the THIN dataset consists of data from general practices only, the results can only be adjusted for variables that are recorded by the practice. The quality of service may vary between practices, so clinical measures may be monitored at different intervals which, along with variation in the quality of reporting and recording of measurements, could lead to inconsistency.

Although the reporting of clinical measures during the 15-month cross-section that was chosen as the inclusion period was high, the length of the cross-section may not accurately represent the length of trials in practice.


An estimate of the ICC is vital when calculating the sample size requirement in a pre-trial calculation [21]. We estimated ICCs for a range of clinical outcomes related to type-2 diabetes that would be useful for planning a trial in UK primary care. The primary outcome used in type-2 diabetes trials is often HbA1c, for which we estimated an ICC of 0.032. We have also illustrated how the methodology described here could be extended for other outcomes or disease settings.

For binary outcomes, the results show that careful consideration is needed when estimating the ICC. This is because, in a trial with a dichotomous outcome, the ICC used at the design stage should refer to the variation in the observed data rather than the underlying logistic scale. Although the analysis of binary outcomes is usually conducted via a logistic regression model, the latent ICC obtained from such a model should not be used for sample size calculations. Rather, the ICC used in the design stage of a trial should be estimated from a linear mixed model on the natural scale.

In cluster trials with repeated cross-sections, observations are taken over multiple time periods. It is likely that observations within a cluster within the same time period are more highly correlated than observations from different time periods. The inter-period correlation and within-period correlation provide an estimate of how this correlation decays over time. We are the first to report estimates of the IPC and WPC and we have illustrated how these differ from the ICC. It may be important to acknowledge the decay of correlation over time in repeated cross-sectional studies.