Background

Non-communicable diseases (NCDs) are a growing global epidemic that disproportionately affects low- and middle-income countries (LMICs) [1]. In sub-Saharan Africa, they are now one of the leading causes of death among adults, and to begin addressing this burden, high-quality community-based epidemiological studies from the region are urgently needed [2–5]. Additionally, outcomes-related research, whether through observational cohort studies or randomized controlled trials (RCTs), will be an important component of the public health response moving forward [6].

Nonetheless, many challenges exist in carrying out these studies. Poor infrastructure and a lack of resources in many sub-Saharan African countries limit rigorous studies, in part due to inadequate methodological capabilities. Physical addresses, phonebooks, and reliable census data are often unavailable for many populations in the region, which means that representative community-based samples often require labor-intensive, prospective household surveys. In this context, cluster-designed sampling methods offer an efficient, practical, and cost-effective means of obtaining a representative sample from the population of interest [7, 8].

However, studies that use cluster sampling methods require extra considerations in their design and analyses, and cluster-designed studies in sub-Saharan Africa continue to inadequately address many of these considerations [9]. Because study participants or households are drawn from clusters, which serve as the primary sampling unit, they can demonstrate more homogeneity than would otherwise be expected from a simple random sample. For NCDs, similar lifestyles, environmental risks, economic stress, and genetic backgrounds may all increase homogeneity within clusters. This increased homogeneity within clusters, or intra-cluster correlation (ICC), can significantly affect the precision of population parameter estimates [10, 11]. The ICC is typically quantified by the ICC coefficient, and although the ICC coefficient can be calculated post hoc during the analysis stage, this method may not be preferable or ethical in many sub-Saharan African settings due to cost and limited resources. Accounting for the design effect beforehand allows for more accurate estimations of sample size, budget requirements, and logistical needs; however, for NCD-related research, few ICCs have been reported in the region [9, 10].

The Comprehensive Kidney Disease Assessment for Risk Factors, epidemiology, Knowledge, and Attitudes (CKD-AFRiKA) study is an ongoing project in northern Tanzania with the goal of understanding and addressing the health burden of chronic kidney disease (CKD) and CKD-related NCDs. As part of the study, we conducted a cluster-designed, community-based epidemiologic survey. In the design stage, we were unable to identify any comparable ICCs for health outcomes related to CKD or CKD-related NCDs, and we had to extrapolate them from data derived from high-income settings. To fill this gap, we report here the observed intra-cluster correlations for multiple NCD-related factors from a community-based, sub-Saharan African setting [12].

Methods

Ethics, consent, and permissions

The study protocol was approved by Duke University Institutional Review Board (#Pro00040784), the Kilimanjaro Christian Medical College Ethics Committee (EC#502), and the National Institute for Medical Research in Tanzania. Written informed consent (by signature or thumbprint) was obtained from all participants.

Study setting

We conducted a stratified, cluster-designed cross-sectional household survey between January and June 2014 in the Kilimanjaro Region of Tanzania, which has an adult population of more than 900,000 people [13, 14]. The region comprises seven districts, and our study was conducted in two of these districts, Moshi Urban and Moshi Rural, which served as strata for our sampling scheme. Within these districts, there are 21 and 31 administrative wards respectively that range in size from 1500 to 25,000 people. Each ward is then further sub-divided into neighborhoods (also known as streets). Neighborhoods are the most basic governmental administrative unit in Tanzania, and they range in population size from 500 to 5000 people. The 65 urban neighborhoods have a median population size of 2000 people and a median area of 0.50 km². The 165 rural neighborhoods have a median population size of 2200 people and a median area of 4.00 km². In total, there are 230 neighborhoods/streets in the Moshi Urban and Moshi Rural districts [14].

Sampling methods

We used a three-stage cluster probability sampling method stratified by urban and rural setting. We used a random-number generator to select twenty-nine neighborhoods within the Moshi Urban and Moshi Rural districts. We based the random neighborhood selection on probability proportional to size sampling according to the 2012 national census [14]. From the twenty-nine neighborhoods, we then randomly selected the starting point for each sampling area (37 in total) using geographic coordinates randomly generated by ArcGIS, v10.2.2 (Environmental Systems Research Institute, Redlands, CA). From the randomly selected geographic point, we then chose households based on a coin-flip and die-rolling technique (Appendix 1). All non-pregnant adults (age ≥ 18 years old) living in the selected households were recruited. A neighborhood cluster, therefore, included a group of individuals living in geographically related households within the boundaries of an administrative neighborhood.
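As an illustration of probability proportional to size (PPS) selection, the following is a minimal sketch of systematic PPS sampling, one common variant. It is not necessarily the procedure used in this study. The function name `pps_systematic`, the simulated neighborhood populations, and the seed are hypothetical, and the sketch ignores the urban and rural stratification.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n_select, rng):
    """Return indices of units chosen by systematic PPS sampling.

    sizes    : population size of each unit (e.g. each neighborhood)
    n_select : number of units to select
    rng      : a random.Random instance, passed in for reproducibility
    """
    total = sum(sizes)
    step = total / n_select              # sampling interval on the cumulative scale
    start = rng.random() * step          # random start in [0, step)
    cumulative = list(accumulate(sizes))
    selected = []
    idx = 0
    for j in range(n_select):
        target = start + j * step
        # Advance to the first unit whose cumulative size exceeds the target.
        while cumulative[idx] <= target and idx < len(cumulative) - 1:
            idx += 1
        selected.append(idx)
    return selected


if __name__ == "__main__":
    rng = random.Random(2014)  # arbitrary seed, illustration only
    # Simulated populations for 230 neighborhoods (hypothetical values).
    populations = [rng.randint(500, 5000) for _ in range(230)]
    print(pps_systematic(populations, 29, rng))
```

Under this scheme a unit's chance of selection is proportional to its population. A unit larger than the sampling interval can be selected more than once, which a real design would need to handle explicitly.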

We targeted an enrollment of between 15 and 25 participants per sampling area based on the requirements of the CKD-AFRiKA study. The total sample size was designed to estimate the community prevalence of CKD with a precision of 5 % when accounting for the cluster-design effect, assuming a CKD prevalence up to 20 % and an ICC coefficient of 0.05. To reduce non-response rates, we attempted a minimum of two additional visits during off-hours (evenings and weekends) and multiple phone calls using mobile phone numbers.
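To make the role of the design effect concrete, the sketch below inflates a simple-random-sample size by the standard design effect, DEFF = 1 + (m − 1)ρ. It is an illustration under stated assumptions, not the authors' actual calculation. The "precision of 5 %" is read as an absolute margin of ±5 percentage points at 95 % confidence, the average cluster size m = 20 is assumed as the midpoint of the 15 to 25 target, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def design_effect(cluster_size, icc):
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


def required_sample_size(prevalence, margin, icc, cluster_size, confidence=0.95):
    """Sample size to estimate a prevalence within +/- margin under cluster sampling."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    # Size needed under simple random sampling.
    n_srs = z ** 2 * prevalence * (1 - prevalence) / margin ** 2
    # Inflate by the design effect to account for clustering.
    return math.ceil(n_srs * design_effect(cluster_size, icc))


if __name__ == "__main__":
    # Prevalence 20 %, margin 5 points, ICC 0.05 (from the text); m = 20 is assumed.
    print(required_sample_size(prevalence=0.20, margin=0.05, icc=0.05, cluster_size=20))
```

With these assumed inputs the design effect is 1.95 and the function returns 480, so clustering roughly doubles the sample needed compared with simple random sampling.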

Data collection

Our data collection methods have been previously described in detail [12]. In brief, participants were tested for CKD and CKD-related conditions including diabetes and hypertension, and anthropometric data (including height, weight, and body mass index) were recorded for each participant.

CKD was defined as the presence of albuminuria (≥30 mg/dL; confirmed by repeat assessment) and/or a reduction in the estimated glomerular filtration rate (eGFR) ≤60 ml/min/1.73 m² according to the Modification of Diet in Renal Disease equation without the race factor [15]. Hypertension was defined as a single blood pressure measurement of greater than 160/100 mmHg, a two-time average measurement of greater than 140/90 mmHg, or the ongoing use of anti-hypertensive medications. Glucose impairment was defined as an HbA1C >6.0 % in the presence or absence of ongoing treatment with anti-hyperglycemic medications. Diabetes mellitus was defined as an HbA1C level ≥7.0 % or current known use of anti-hyperglycemic medications for the purpose of treating diabetes. Participants with an HbA1C between 6.0 % and 7.0 % in the absence of treatment with anti-hyperglycemic medications were considered to have pre-diabetes. Overweight was defined as a body mass index (BMI) greater than 25 kg/m² and obesity was defined as a BMI greater than 30 kg/m².

Data analysis

We used STATA version 13 (STATA Corp., College Station, TX) for all data analyses. Continuous variables were summarized by the mean and standard deviation (SD) or median and inter-quartile range (IQR). Categorical variables were summarized using counts and percentages. To address potential non-response bias, mean and prevalence estimates were sample-adjusted using age- and gender-weights based on the 2012 urban and rural district-level census data [14]. To estimate the level of clustering in health outcome variables at the household level, the sampling area level, and the neighborhood cluster level, we first fitted a mixed-effects model with separate random intercepts for neighborhood, sampling area, and household for each of the outcomes of interest. In these models, after accounting for neighborhood, very little clustering (<15 %) remained at the sampling area and household levels, indicating that most of the variation in these outcomes was explained at the individual and neighborhood cluster levels. As such, we estimated the ICC for the neighborhood clusters only.

To estimate the absolute-agreement ICC coefficient for neighborhood clusters (ρ), we used a one-way, random effects analysis of variance (ANOVA) estimator, which has been shown to perform well for both binary and continuous outcomes across a wide range of ρ and cluster sizes [16–19]. These estimations were performed in STATA using the ‘loneway’ command, which uses the F statistic to calculate ρ as described by Hayes and Moulton. Although alternative estimators are available for binary outcomes, given that the ANOVA estimator has been shown to perform well for binary outcomes, we chose to present all estimates based on the common, easily implementable approach as described above [17, 18].
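For readers who want to see the arithmetic, the following is a minimal sketch of the textbook one-way ANOVA estimator, ρ = (MSB − MSW) / (MSB + (m₀ − 1) MSW), where m₀ is the adjusted mean cluster size for unequal clusters. Dividing through by MSW gives the equivalent F-statistic form, ρ = (F − 1) / (F + m₀ − 1). The function name `anova_icc` is hypothetical, the code is not a reimplementation of the ‘loneway’ command, and the truncation of negative estimates at zero follows the reporting convention described in this section.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the intra-cluster correlation coefficient.

    clusters : list of non-empty lists of numeric outcomes
               (use 0/1 values for a binary outcome)
    Returns the estimate truncated at zero.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    if k < 2 or min(sizes) < 1 or n_total <= k:
        raise ValueError("need at least 2 non-empty clusters and more observations than clusters")

    grand_mean = sum(sum(c) for c in clusters) / n_total
    means = [sum(c) / len(c) for c in clusters]

    # Between-cluster and within-cluster sums of squares.
    ssb = sum(m * (mean - grand_mean) ** 2 for m, mean in zip(sizes, means))
    ssw = sum((y - mean) ** 2 for c, mean in zip(clusters, means) for y in c)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)

    # Adjusted mean cluster size, which allows for unequal cluster sizes.
    m0 = (n_total - sum(m * m for m in sizes) / n_total) / (k - 1)

    denominator = msb + (m0 - 1) * msw
    if denominator == 0:
        return 0.0  # no variation at all in the outcome
    rho = (msb - msw) / denominator
    return max(0.0, rho)  # negative estimates are truncated at zero


if __name__ == "__main__":
    # Toy binary outcome in three clusters of unequal size (made-up data).
    print(anova_icc([[1, 1, 1, 0], [0, 0, 0], [1, 0, 0, 0, 0]]))
```

The estimate approaches 1 when all variation lies between clusters and is truncated to 0 when clusters are no more alike internally than a random grouping would be.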

We calculated ρ for the social characteristics, self-reported medical histories, physical and laboratory measurements, and measured health outcomes. Negative values were truncated at zero, and our reporting of ρ is in accordance with the guidelines suggested by Campbell et al. [20].

Variance estimation was based on asymptotic theory, as implemented in the ‘loneway’ command, which accommodates differing cluster sizes. The 95 % confidence intervals for each ICC coefficient were derived from the asymptotic standard error, which has been shown to provide good coverage probabilities for a wide range of parameter combinations including clusters, cluster sizes, and ρ [18, 21, 22]. Confidence intervals with negative values were truncated at zero.

Results

Between January 2014 and June 2014, we enrolled 481 participants from 346 households from a total of 37 sampling areas (30 urban and 7 rural) within 29 neighborhoods (23 urban and 6 rural) (Table 1). These 29 neighborhoods were located within 18 wards (13 urban and 5 rural). The mean age was 46.9 years (SD 15.1). The household non-response rate was 15.0 %. Men (p < 0.001) and adults 18–39 years old (p = 0.001) were more likely to be non-responders. The median neighborhood cluster size was 13.0 participants (IQR 9–21), and neighborhood cluster size ranged from 6 to 49 participants (Appendix 2).

Table 1 Unweighted proportions for demographic, social characteristics, self-reported medical histories, health outcomes, and design parameters stratified by setting; N = 481 (CKD-AFRiKA, 2014)

The majority of participants lived in an urban setting (n = 370; 77.0 %), were women (n = 358; 74.4 %), ethnically Chagga (n = 288; 59.9 %), and had obtained a primary school level of education (n = 349; 72.6 %), and most participants were occupied as farmers or daily wage-earners (n = 199; 41.4 %) (Table 1). Many participants reported an ongoing use of alcohol (n = 198; 41.2 %) and many reported a history of malaria (n = 427; 88.8 %), diabetes (n = 61; 12.7 %), or hypertension (n = 134; 28.0 %). Few reported a history of stroke, heart disease, tuberculosis, hepatitis, HIV/AIDS, COPD/asthma, cancer or kidney disease. From our assessment of NCD-related health outcomes, 149 participants (31.0 %) had hypertension, 138 (28.7 %) were obese, 57 (11.9 %) had CKD, and 129 (26.8 %) had glucose impairment of which 84 (17.5 %) had pre-diabetes and 45 (9.4 %) had diabetes.

Clustering varied across neighborhoods and differed by urban or rural setting. Overall ICC coefficients ranged from 0.000 to 0.125 with a mean value of 0.030 (SD 0.033) (Table 2). In the rural setting, ICC coefficients ranged from 0.000 to 0.331, and in the urban setting, ICC coefficients ranged from 0.000 to 0.109. Ongoing alcohol use exhibited the strongest neighborhood clustering (ρ = 0.125), which was most prominent in rural neighborhoods (ρ = 0.331). Ongoing tobacco use exhibited modest neighborhood clustering in both rural (ρ = 0.022) and urban settings (ρ = 0.042). Neighborhood clustering of self-reported medical histories was most significant for diabetes (ρ = 0.045), hypertension (ρ = 0.100), HIV (ρ = 0.054), and CKD (ρ = 0.020).

Table 2 Population-based intra-cluster correlation coefficients (ρ) for neighborhood clustering; N = 481 (CKD-AFRiKA, 2014)

Among the NCDs, neighborhood clustering varied with ρ ranging from 0.000 to 0.075. Hypertension (ρ = 0.075) exhibited the strongest clustering within neighborhoods followed by CKD (ρ = 0.044), obesity (ρ = 0.040), and glucose impairment (ρ = 0.039) (Fig. 1). Among those with glucose impairment, neighborhood clustering was more significant for pre-diabetes (ρ = 0.031) than for diabetes (ρ = 0.000). Neighborhood clustering for physical and laboratory measurements paralleled the NCD outcomes. Both systolic (ρ = 0.064) and diastolic (ρ = 0.056) blood pressures exhibited strong neighborhood clustering. Clustering for albuminuria was modest (ρ = 0.038), but it accounted for most of the neighborhood clustering observed for CKD when compared to serum creatinine or eGFR measurements. Similar to obesity and glucose impairment, clustering of BMI was more significant in urban neighborhoods (ρ = 0.049) while clustering of HbA1C was more significant in rural neighborhoods (ρ = 0.025).

Fig. 1 Neighborhood clustering of non-communicable diseases in northern Tanzania. Intra-cluster correlation coefficients, presented by prevalence, for CKD, obesity, glucose impairment, and hypertension

Discussion

In northern Tanzania, prevalence of NCDs, including hypertension, CKD, obesity, and glucose impairment, exhibited clustering by neighborhood. This clustering varied across urban and rural settings, and for NCD prevalence, it was most significant for hypertension and CKD. Based on the ICC coefficients that we observed, cluster-designed studies examining NCDs in the region should account for the design effect on precision or variance caused by clustering. In a region where the NCD burden is quickly growing, these results will be valuable in designing such studies, including cluster RCTs [5, 12, 23].

The urban and rural differences in neighborhood clustering of NCDs may highlight important environmental and lifestyle risk factors for the development of hypertension, glucose impairment, obesity, and CKD. The neighborhood clustering for hypertension and glucose impairment was most pronounced in the rural settings, where families tend to remain more environmentally clustered, share meals, and work in similar agricultural jobs, all of which may contribute to the development of NCDs that are known to be highly associated with lifestyle [24–26]. On the other hand, obesity and CKD were most clustered in the urban neighborhoods. For obesity, this urban clustering highlights the importance that urban lifestyles, which may be clustered within neighborhoods on the basis of socioeconomic status, transportation, or occupation, play in the development of obesity. In the context of CKD, living in an urban setting has been shown to be a significant risk factor, yet specific etiologies associated with the urban environment remain unknown [12]. The clustering of CKD within urban neighborhoods that we observed may be important in highlighting causes of CKD, and it further stresses that public health efforts targeting CKD must take a broad approach that includes urban planning with sanitation improvement, safe drinking water, pollution reduction, and infection control.

Among all measured variables, ongoing alcohol use, hypertension, a self-reported history of hypertension, and a self-reported history of HIV were most highly correlated among cluster-sampled individuals, and the latter two variables may reflect an increased awareness and/or prevalence of these conditions within certain neighborhoods. In northern Tanzania, alcohol is commonly homemade and shared among households, which may in part explain the significant clustering that we observed.

To our knowledge, this is the first community-based, household-level survey to report on the neighborhood clustering of NCDs in East Africa. As such, these are the first ICC coefficients reported for hypertension, CKD, obesity, and glucose impairment in the region, and compared to reports of ICC coefficients in high-income countries, there are significant differences in several of the physical and laboratory variables [27–29]. Because we also measured clustering in both urban and rural settings, we were able to demonstrate important differences that may help inform future studies examining the demographic transition of NCDs in sub-Saharan Africa, where rapid urbanization is occurring [30].

Despite these strengths, we also noted a few limitations. Caution must be taken when applying these estimates to other populations and settings. Although the paucity of data currently available for NCD-related measurements and outcomes may make these results useful to researchers more broadly across the region, differences in prevalence and risk factors for NCDs, particularly those that are geographic or environmental-based, mean that even NCDs can cluster at different rates within villages, neighborhoods, or households. Additionally, although we used sample-balancing approaches to address potential non-response bias, the effect of participant non-response upon these estimates is not fully known. Finally, some results, such as self-reported medical history, rely upon the subjective response of individual participants, and as such, they may be prone to recall or response bias.

Conclusion

In conclusion, we have reported on the observed neighborhood clustering for several NCDs from a community-based study in northern Tanzania. The neighborhood clustering, which varied by urban or rural setting, was substantial enough to contribute to a design effect for NCD outcomes including hypertension, CKD, obesity, and glucose impairment, and it may also highlight NCD risk factors that vary by setting. These results may help inform the design of future community-based studies or randomized controlled trials examining NCDs in the region, particularly those that use cluster-sampling methods.