In comparison with the European American population, minority groups in the USA experience a disproportionate burden of type 2 diabetes [1]. This is particularly evident in some Native American tribes, with the Pima Indians presenting one of the highest population prevalences of type 2 diabetes ever reported for any ethnic group [2–4]. A similar trend can be observed among the admixed populations of Mexico, the Caribbean, and Central and South America (including recent immigrants to the United States), who are commonly characterised as Latinos. Indeed, the risk of type 2 diabetes is two- to fivefold higher in Latinos from Puerto Rico, Texas, New Mexico or Colorado [5–9] than in whites, an epidemiological difference that persists after adjusting for other traits such as abdominal obesity [10].

These differences support the notion that diabetes risk factors occur at higher frequency in populations of Native American descent. A genetic component for this bias was suggested in the form of the ‘thrifty gene’ hypothesis and its interaction with the environment first posed by J. Neel over 40 years ago [11]. A similar hypothesis was proposed for Mexican-Americans, based on crude skin pigmentation measures [12] and subsequently supported by analysis of genetic markers [13]. Initial evidence of a genetic contribution to the higher risk of type 2 diabetes in Native Americans was provided by the observation that Pima Indians with type 2 diabetes have significantly more Native American ancestry than their normoglycaemic counterparts [14] and that the fraction of European ancestry was associated with protective metabolic phenotypes in this population [15]; the potential contribution of socioeconomic status to these estimates was not fully assessed. In addition, the recent identification of the ABCA1 R230C variant in Mexican admixed individuals (mestizos) illustrates that exclusive gene variants derived from Native American ancestry can indeed influence type 2 diabetes risk [16, 17].

The advent of comprehensive databases cataloguing genetic variation [18], the development of high-throughput genotyping technologies and the availability of DNA samples from multiple populations make it possible to select a set of single nucleotide polymorphisms (SNPs) that are highly informative for geographic ancestry, commonly termed ancestry-informative markers (AIMs). Thus, SNPs that have widely divergent allele frequencies in ancestral populations can be compiled to make such determinations of ancestral origin. When evenly spaced throughout the genome, approximately 2,000 AIMs can be employed to infer ancestry at each genomic location for admixture mapping; but a much smaller set of randomly distributed AIMs (<100) can also be genotyped in admixed persons to derive a fairly accurate estimate of genetic ancestry, expressed as a proportion of each individual’s genome. Such compilations specific to Latino populations have recently been done by three different groups, including our own [19–21].
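The idea of compiling SNPs with widely divergent allele frequencies can be illustrated with a minimal sketch. It ranks markers by the absolute allele frequency difference between two ancestral populations. The function name, the SNP identifiers, the frequencies and the 0.4 cut-off are all hypothetical and are not taken from the marker panels cited above.

```python
def rank_aims(freqs_pop_a, freqs_pop_b, min_delta=0.4):
    """Rank SNPs by absolute allele frequency difference between two populations.

    freqs_pop_a, freqs_pop_b: dicts mapping SNP id -> reference allele frequency.
    min_delta: hypothetical cut-off for calling a SNP ancestry-informative.
    Returns a list of (snp, delta) pairs, most informative first.
    """
    shared = [snp for snp in freqs_pop_a if snp in freqs_pop_b]
    deltas = [(snp, abs(freqs_pop_a[snp] - freqs_pop_b[snp])) for snp in shared]
    selected = [(snp, delta) for snp, delta in deltas if delta >= min_delta]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Illustrative (made-up) allele frequencies in two ancestral populations.
european = {"snp1": 0.90, "snp2": 0.50, "snp3": 0.10}
native_american = {"snp1": 0.20, "snp2": 0.55, "snp3": 0.75}

aims = rank_aims(european, native_american)
```

In this toy example, `snp2` is dropped because its frequency is nearly identical in both populations and therefore carries little information about ancestry.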

In anticipation of genome-wide admixture mapping, AIMs have been applied to Latino samples with the goal of estimating the genetic contribution to the increased diabetes prevalence in this population. An initial study performed in American Latinos and stratified by neighbourhood detected a strong association between Native American ancestry and socioeconomic status; however, the authors concluded that despite the presence of such confounding, a genetic component to the increased disease prevalence was likely [13]. The association between type 2 diabetes and Native American ancestry was further substantiated by Parra et al. in Mexican-Americans from the San Luis Valley, Colorado; but once again, controlling for income and education abolished the statistical significance of the finding [22]. More recently, a similar study was conducted in a sample of 286 unrelated diabetic patients and 275 controls assembled from users of the Social Security hospital in Mexico City, which is thought to capture a large middle segment of the population devoid of upper- and lower-income outliers. This report found a non-significant increase in Native American ancestry among participants with diabetes, but a much stronger association between higher educational level and both European ancestry and non-diabetic status [23].

The success of admixture mapping in Latino populations is predicated on the ability to dissect these extra-genetic confounders from the genetic association. The correlation of socioeconomic status with ancestry in samples from two distinct US American locations and in an additional sample from Mexico City suggests that this may be a general phenomenon. To replicate and expand these observations, and as a way of investigating its likely impact on ongoing admixture mapping studies, we evaluated the contribution of socioeconomic status to the ancestry–diabetes relationship in two separate, non-US Latino populations from North and South America.



We studied 499 patients with type 2 diabetes and 197 controls from Medellín (Colombia), as well as 163 patients with type 2 diabetes and 72 controls from central Mexico. Demographic characteristics are presented in Table 1. In Colombia, diabetic patients were recruited from diabetes clinics in and around Medellín, with diagnostic criteria including fasting plasma glucose ≥7 mmol/l or 2 h glucose >11.1 mmol/l after a 75 g OGTT. Controls were unaffected participants over 40 years of age who had no history of diabetes among first-degree relatives (and, in some cases, grandparents). Medellín is divided into main sectors (‘comunas’) and the mean socioeconomic status is available for each one. We balanced the collection of patients and controls in Medellín by selecting the diabetes clinics (for the patients) and the centres for the care of the elderly (for the controls) from the same sectors of the city, with the express intention of reflecting a range of socioeconomic status strata.

In Mexico, type 2 diabetic participants included individuals treated at the Diabetes Clinic of the Instituto Nacional de Ciencias Médicas y Nutrición in Mexico City, in whom diabetes was confirmed with a blood glucose sample obtained after a 9 to 12 h fast. Exclusion criteria included secondary causes of diabetes (e.g. diseases of the exocrine pancreas, endocrinopathies, drug- or chemical-induced diabetes), genetic syndromes associated with diabetes and insulin therapy during the first 2 years after diagnosis. Controls were selected among spouses of diabetic participants or unrelated patients who were seeking medical attention at the same Instituto Nacional de Ciencias Médicas y Nutrición for reasons other than diabetes (e.g. primary dyslipidaemia) and were older than 40 years of age, had no history of diabetes among first-degree relatives and had fasting plasma glucose less than 6.1 mmol/l. Individuals identified themselves as Mexican mestizos with both parents and grandparents born in Mexico. By this ascertainment strategy we tried to ensure that our cases and controls were selected from the same geographical area, and had similar socioeconomic status and comparable access to the public health system.

Table 1 Demographic characteristics of genotyped samples

To estimate allele frequencies in the ancestral populations and project ancestry proportions, we also studied several unmixed populations: European Americans from Baltimore and Chicago (n = 77) and from the HapMap Centre d’Etude du Polymorphisme Humain (Utah residents with northern and western European ancestry) collection (n = 60); Spaniards from Valencia (n = 31); West Africans from Ghana (n = 52) and from the HapMap Yoruba in Ibadan, Nigeria collection (n = 60); and Native Americans from the Mazahua (n = 22), Zapotec (n = 60), Mixtec (n = 23) and Mixe (n = 29) populations. All participants gave informed consent and studies were carried out in accordance with the principles of the Declaration of Helsinki as revised in 2000.

Estimation of socioeconomic status

In Colombia, we had access to a government-assigned ‘property band’ based on property valuation of each individual home for the purposes of billing for public utilities and ranging from 1 (lowest) to 6 (highest). An assignment to one of these strata was made for each participant based on the telephone number they provided at the time of their interview. We contrasted this information with other data we collected (such as home or car ownership) and confirmed a close correlation between the governmental water usage rank and these other markers of socioeconomic status.

In Mexico, socioeconomic status was determined by social workers at the National Institute of Medical Sciences and Nutrition using a standardised and validated tool currently applied in all Mexican National Institutes of Health studies [24]. Questionnaires include information on six categories: family monthly income, occupation of the head of the household, percentage of family income spent on food, type and characteristics of residence (owner-occupied, rented or shared with extended family), place of residence and the presence of chronic illnesses in other family members. Points are given for each category and the sum is used to assign participants to one of six socioeconomic status bands (lowest to highest). Supporting documents are required to validate the information. When information was not complete or questionable, socioeconomic status assignments were further explored and confirmed via unscheduled home visits.


Samples from all populations were genotyped using the Sequenom MassARRAY technology [25] at 67 AIMs (Electronic supplementary material [ESM] Table S1). These SNPs are on average 49% different in frequency between Native Americans and European Americans, and are spaced by at least 10 cM (or >10 Mb) on chromosomes 1 to 22 (see Supplementary Table B in Smith et al. [26], from which these markers were selected). The large number of markers and the high degree of informativeness per marker in this set yield precise estimates of the proportion of European ancestry for each individual and also a measure of the precision of each estimate as a standard error. The average standard error for the percentage European ancestry estimate was ±7.2% (in Mexicans) and ±8.0% (in Colombians) respectively. We had complete data for 89% of genotypes; this reduction below 100% reflects the fact that slightly different panels of markers were genotyped on some samples. The missing data are not expected to affect our estimates of ancestry proportion.

Statistical methods

To search for genetic differences associated with ancestry between participants with type 2 diabetes and controls (specifically, under-representation of European ancestry in diabetic participants), we used a mixture-of-binomials model to estimate the ancestry proportions of each Latino individual as previously described [20], and compared the distributions of ancestries in cases vs controls (Table 2). We note that Latino populations represent a three-way admixture of European, Native American and African ancestry. Thus, we had a choice of whether to base our analyses on per cent European ancestry (distinguishing European vs [Native American + African]) or on per cent Native American ancestry (distinguishing Native American vs [European + African]). Because the 67 AIMs used to infer ancestry were relatively less informative for Native American vs African ancestry, the standard errors for inferred per cent European ancestry were smaller than the standard errors for inferred per cent Native American ancestry: therefore, we based our analyses on per cent European ancestry.
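To make the ancestry estimation step more concrete, the following is a minimal sketch of a binomial mixture likelihood under a simplifying two-way assumption (European vs non-European). It is not the model of reference [20], which handles three ancestral populations. Here each genotype (0, 1 or 2 copies of the reference allele) is treated as a binomial draw whose allele frequency is a weighted mix of the two ancestral frequencies, and the European proportion is found by a simple grid search. All function and parameter names are hypothetical.

```python
import math


def log_likelihood(theta, genotypes, freq_eur, freq_other):
    """Log-likelihood of genotypes given European ancestry proportion theta."""
    total = 0.0
    for g, p_eur, p_other in zip(genotypes, freq_eur, freq_other):
        # Expected allele frequency for an individual with ancestry theta.
        p = theta * p_eur + (1.0 - theta) * p_other
        # Clamp to avoid log(0) when an allele is fixed in one population.
        p = min(max(p, 1e-9), 1.0 - 1e-9)
        total += (math.log(math.comb(2, g))
                  + g * math.log(p)
                  + (2 - g) * math.log(1.0 - p))
    return total


def estimate_european_ancestry(genotypes, freq_eur, freq_other, steps=1000):
    """Grid-search maximum-likelihood estimate of theta on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, genotypes, freq_eur, freq_other))
```

A grid search is used only to keep the sketch dependency-free; the curvature of the log-likelihood around its maximum is what would yield a per-individual standard error in a fuller treatment.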

Table 2 Association of non-European ancestry with type 2 diabetes in Latinos

Associations of European ancestry proportion, socioeconomic status and/or BMI with type 2 diabetes status were assessed via logistic regression with (1) European ancestry proportion, (2) socioeconomic status, (3) socioeconomic status and European ancestry proportion or (4) socioeconomic status, European ancestry proportion and BMI as covariates. We included a constant term in each regression analysis. For example, letting π denote disease outcome, the logistic regression model for (4) was

$$ \pi = \frac{e^{c_0 + c_{\text{SES}}\,\text{SES} + c_{\text{anc}}\,\text{anc} + c_{\text{BMI}}\,\text{BMI}}}{1 + e^{c_0 + c_{\text{SES}}\,\text{SES} + c_{\text{anc}}\,\text{anc} + c_{\text{BMI}}\,\text{BMI}}} $$

where $c_{\text{BMI}}$ is the logistic regression coefficient multiplying BMI, anc is genome-wide ancestry, $c_{\text{anc}}$ is the logistic regression coefficient multiplying anc, SES is socioeconomic status, $c_{\text{SES}}$ is the logistic regression coefficient multiplying SES, and $c_0$ is the logistic regression coefficient multiplying the constant term.
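As an illustration of how model (4) maps covariates to a predicted probability of disease, the sketch below evaluates the logistic function for one individual. The coefficient values are invented for the example and are not the fitted estimates from this study; ancestry is assumed to be expressed as a proportion between 0 and 1.

```python
import math


def disease_probability(ses, anc, bmi, c0, c_ses, c_anc, c_bmi):
    """Predicted probability of type 2 diabetes under the logistic model.

    Computes e^x / (1 + e^x) with x = c0 + c_ses*ses + c_anc*anc + c_bmi*bmi,
    written in the algebraically equivalent form 1 / (1 + e^-x).
    """
    linear = c0 + c_ses * ses + c_anc * anc + c_bmi * bmi
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients, chosen only to show the direction of effects.
coeffs = dict(c0=0.0, c_ses=-0.5, c_anc=-1.5, c_bmi=0.05)

low_european = disease_probability(ses=3, anc=0.2, bmi=28.0, **coeffs)
high_european = disease_probability(ses=3, anc=0.8, bmi=28.0, **coeffs)
```

In a model of this form, the exponential of a coefficient is the odds ratio associated with a one-unit increase in the corresponding covariate, holding the others fixed.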

For each regression analysis, we computed effect sizes, p values and pseudo-r² values. We defined pseudo-r² as the reduction in magnitude of the mean square value of disease outcome minus the predicted probability of disease outcome, comparing each of the four logistic regression models to a constant-term only model.
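One way to read this definition is sketched below. It assumes that the reduction is expressed relative to the constant-term only model, which is our interpretation for the purposes of the example rather than something stated explicitly in the text. The constant-only model is taken to predict the overall case fraction for every participant, which is its maximum-likelihood fit.

```python
def mean_square_residual(outcomes, predicted):
    """Mean of (outcome - predicted probability)^2 over all participants."""
    return sum((y - p) ** 2 for y, p in zip(outcomes, predicted)) / len(outcomes)


def pseudo_r2(outcomes, predicted):
    """Relative reduction in mean square residual versus a constant-only model.

    outcomes: 0/1 disease status; predicted: fitted probabilities from the
    model being evaluated. Assumes both cases and controls are present, so
    that the constant-only residual is non-zero.
    """
    base_rate = sum(outcomes) / len(outcomes)
    null_ms = mean_square_residual(outcomes, [base_rate] * len(outcomes))
    model_ms = mean_square_residual(outcomes, predicted)
    return (null_ms - model_ms) / null_ms
```

Under this reading, a model that predicts no better than the case fraction scores 0, and a model that predicts every outcome perfectly scores 1.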


We analysed the relationship between genetic ancestry and type 2 diabetes status in two Latino populations from Mexico and Colombia. Both of these populations inherit European, Native American and African ancestry, but we focused our analysis on the proportion of European ancestry as inferred from 67 AIMs. In the Mexicans, we observed a statistically significant difference (OR [95% CI] 0.06 [0.02–0.21], p = 2 × 10⁻⁵) in genetic ancestry between patients with type 2 diabetes and age- and sex-matched controls from central Mexico (Table 2). A similar phenomenon, although of lesser magnitude, was observed for the Colombians, with a nominally significant difference (OR 0.26 [0.08–0.78], p = 0.02) in genetic ancestry between diabetic participants and non-diabetic controls (Table 2). This ancestry difference was due to participants with type 2 diabetes having less European ancestry in both groups (Fig. 1). Overall, the proportion of European ancestry is estimated to be 33% in diabetic participants vs 46% in controls in Mexicans, and 56% vs 59% respectively in Colombians.

Fig. 1 Histograms of European ancestry proportions in type 2 diabetes participants and controls from Mexico (a) and Colombia (b), using 67 AIMs. Black bars, type 2 diabetes; grey bars, controls

We investigated whether the observed association between genetic ancestry and type 2 diabetes was confounded by socioeconomic status. We determined that socioeconomic status was strongly correlated with European ancestry proportion in Mexicans (33% correlation, p = 4 × 10⁻⁷) and Colombians (34% correlation, p = 2 × 10⁻¹⁹), and therefore could potentially confound associations between genetic ancestry and type 2 diabetes. In the Mexicans, socioeconomic status was strongly predictive of case–control status (p = 2 × 10⁻¹⁰) and when socioeconomic status and ancestry were analysed together, the association between non-European ancestry and type 2 diabetes was significantly attenuated (OR 0.17 [0.04–0.71], p = 0.02; Table 3). In the Colombians, the association between socioeconomic status and type 2 diabetes was weaker, but still strongly significant due to the larger sample size (p = 8 × 10⁻⁷). Inclusion of both socioeconomic status and ancestry in the model abolished the significance of the ancestry–phenotype relationship (OR 0.64 [0.19–2.12], p = 0.46; Table 3). For both populations, inclusion of socioeconomic status in the model reduced the effect size as well as the statistical significance of the association between ancestry and type 2 diabetes (Table 3). However, in each case, the effect size and statistical significance of the association between socioeconomic status and type 2 diabetes were little changed by accounting for ancestry (Table 3).
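The correlation between socioeconomic status band and European ancestry proportion can be computed as in the sketch below. It assumes the reported percentages are Pearson correlation coefficients, which the text does not state explicitly. The data values are invented for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical participants: socioeconomic band (1-6) and European ancestry.
ses_band = [1, 2, 2, 3, 4, 5, 6]
european_ancestry = [0.25, 0.40, 0.30, 0.45, 0.50, 0.48, 0.70]

r = pearson_correlation(ses_band, european_ancestry)
```

A positive `r` here corresponds to the pattern described above, in which higher socioeconomic bands tend to have a larger proportion of European ancestry.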

Table 3 Association of non-European ancestry with type 2 diabetes in Latinos is confounded by socioeconomic status

As an alternative way to investigate the association between non-European ancestry and type 2 diabetes while accounting for socioeconomic status, we conducted a stratified analysis, in which we analysed each socioeconomic status stratum separately. We excluded the eight Colombian samples in socioeconomic status stratum 6 from this analysis, so that five strata were analysed for each population. In Mexicans and Colombians, the association between ancestry and type 2 diabetes did not reach statistical significance in any stratum (Fig. 2a,b); the threshold was a nominal p = 0.01, equivalent to a corrected p = 0.05 after accounting for five statistical tests. These results are consistent with the greatly reduced association between ancestry and type 2 diabetes when accounting for socioeconomic status (Table 3).
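The nominal threshold quoted for the stratified analysis is consistent with a Bonferroni-style correction, in which the overall significance level is divided by the number of tests. Assuming that is the correction applied, with $\alpha$ the overall level and $m$ the number of strata tested:

$$ \alpha_{\text{nominal}} = \frac{\alpha}{m} = \frac{0.05}{5} = 0.01 $$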

Fig. 2 Distribution of ancestry proportions across strata of socioeconomic status (SES) in (a) Mexicans and (b) Colombians. Distribution of mean socioeconomic status scores across strata of increasing proportion of European ancestry in (c) Mexicans and (d) Colombians. Error bars denote SEM. Black bars, type 2 diabetes; grey bars, controls

We also conducted a stratified analysis in which we considered each of five ancestry strata (0–20%, 20–40%, 40–60%, 60–80% or 80–100% European ancestry) and analysed associations between socioeconomic status and type 2 diabetes within each stratum (Fig. 2c,d). For Mexicans, we obtained nominal p values of 0.003, 0.12, 0.00001, 0.11 and 0.01, each with a negative coefficient for socioeconomic status. For Colombians, we excluded ancestry stratum 0–20% (which contained only six samples, each with type 2 diabetes) and obtained nominal p values of 0.004, 0.0006, 0.006 and 0.49, with a negative coefficient for socioeconomic status for each of the three significant p values. These results are consistent with the association between socioeconomic status and type 2 diabetes remaining strong after accounting for ancestry (Table 3).
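The assignment of participants to the five ancestry strata can be sketched as follows. Because the stated ranges share their end points, the treatment of values that fall exactly on a boundary is an assumption of this example (they are placed in the higher stratum, except for 100%, which stays in the top stratum). The function name is hypothetical.

```python
def ancestry_stratum(european_fraction, n_strata=5):
    """Return the 0-based stratum index for a European ancestry proportion.

    With n_strata=5 the strata are 0-20%, 20-40%, 40-60%, 60-80% and 80-100%.
    """
    if not 0.0 <= european_fraction <= 1.0:
        raise ValueError("ancestry proportion must lie between 0 and 1")
    # A proportion of exactly 1.0 would index past the last stratum.
    return min(int(european_fraction * n_strata), n_strata - 1)


def group_by_stratum(european_fractions, n_strata=5):
    """Group participant indices by ancestry stratum."""
    groups = {stratum: [] for stratum in range(n_strata)}
    for index, fraction in enumerate(european_fractions):
        groups[ancestry_stratum(fraction, n_strata)].append(index)
    return groups
```

Each group can then be analysed separately, for example by fitting a logistic regression of type 2 diabetes status on socioeconomic status within that stratum.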

We finally considered whether BMI confounds the observed associations between either socioeconomic status or non-European ancestry and type 2 diabetes. We determined that BMI is negatively correlated with socioeconomic status in Mexicans (−20% correlation, p = 0.002) and Colombians (−7% correlation, p = 0.05), but that correlations between BMI and European ancestry proportion are not statistically significant in Mexicans (−12% correlation, p = 0.06) or in Colombians (−1% correlation, p = 0.72). The significant negative correlation between BMI and socioeconomic status implies that BMI could potentially confound our analyses. However, inclusion of BMI as a covariate in our analyses did not change the effect size or statistical significance of the associations of socioeconomic status and non-European ancestry with type 2 diabetes either in Mexicans or in Colombians, even though BMI itself was associated with type 2 diabetes in both populations (Table 4).

Table 4 Association of socioeconomic status and non-European ancestry with type 2 diabetes is not confounded by BMI


These data demonstrate a genetic association between type 2 diabetes and individual non-European ancestry proportions in Latinos, while also showing that this evidence of association is highly confounded by socioeconomic status. Combining our study with previous results, this pattern has now been observed in North American, Central American and South American populations [13, 22, 23], with similar trends also noted in African-Americans [27].

Our study design has several limitations. First, our strategies for estimating socioeconomic status are necessarily imprecise and differ between the two locations. Second, because type 2 diabetic participants and controls were ascertained separately, some bias may have been introduced despite best efforts to match recruitment procedures. Third, our sample size may have been too small to assess the relative effects of highly correlated, potentially confounding variables. Fourth, given the diversity inherent in Latinos, our findings may not be generalisable to all Latinos or to other populations with admixed Native American ancestry. Nevertheless, we believe we have taken analytical measures to address potential confounders and that the similar results obtained in two different populations strengthen our conclusions.

Our results show that, due to the correlation between socioeconomic status and Native American ancestry, it is difficult to disentangle the relationship between genetics and social factors in the contribution to disease risk. Low socioeconomic status can increase diabetes risk via a variety of mechanisms such as poor access to care, neglect of preventive strategies, a lower ability to exercise or an unhealthy diet. The question of whether the increased susceptibility to type 2 diabetes in Native American populations is caused by genetic or social factors (or a combination of both) is difficult to resolve accurately as long as low socioeconomic status is more prevalent among persons with greater Native American ancestry; answering it appropriately would require that admixed case–control cohorts be carefully matched for socioeconomic status, a much larger sample be used or twin studies discordant for socioeconomic status be undertaken. In some cases (e.g. the ABCA1 R230C polymorphism), the strong association between a genetic variant and type 2 diabetes risk may be largely unaffected when adjusted for different confounders, including educational level as a surrogate of socioeconomic status [16, 17]. On the other hand, due to the strong association between Native American ancestry and socioeconomic status, use of the latter as a covariate in admixture studies might mask a true association signal in populations where low socioeconomic status is highly correlated to ancestral background.

The association between type 2 diabetes and Native American origin remained nominally significant after adjustment for socioeconomic status in the Mexicans, but not in the Colombians. This may be due to a weaker initial signal in the Colombians, differences in socioeconomic status ascertainment between the two populations or a combination of both factors. Because the Mexican sample was enriched for early-onset cases (n = 81), it is possible that genetic susceptibility to type 2 diabetes may have been stronger among the Mexicans. Another potential reason for the stronger p value in Mexicans is the very different distribution of Native American heritage in Mexicans vs Colombians (Fig. 1a, b). Mexicans have a wide range of ancestry proportions, so enrichment of the type 2 diabetes group by individuals with more Native American ancestry could cause a wider separation of ancestry proportions that is easier to measure. Conversely, Colombians have a narrower range, which may be due to more generations of mixing and homogenisation [28]. Just as in African-Americans [26], this phenomenon limits the separation in ancestry proportion between cases and controls, even if a strong effect of ancestry causes oversampling of people who have inherited more of one ancestry. This may explain some of the discrepancies observed in the literature [22]. Thus it is possible that Pima Indians, like the samples in this study from central Mexico, have a wide range of ancestry proportions, making the oversampling of individuals with more extreme values of one ancestry easily detectable, whereas the San Luis Valley samples, like our samples from Colombia, have a narrow spread of ancestries making the effect more subtle. A higher degree of admixture and homogenisation in a society may also lead to decreased health disparities of a social nature. 
Alternatively, the lack of precision inherent in estimations of socioeconomic status (which often rely on information provided by participants who may have secondary motives for reporting a different stratum) may also have contributed to the observed differences. Our findings also illustrate the difficulties in making generalisations about Latino populations, which are often characterised by very diverse ancestral origins and environmental contexts that may affect disease, socioeconomic status and their interactions.

These results also have implications for the prospects of gene mapping to find risk factors for type 2 diabetes in Latinos. Based on the epidemiological observation that type 2 diabetes is more prevalent in Latinos than in populations of European descent, it has been speculated that genetic risk factors for type 2 diabetes must be more common in Native Americans than in Europeans. Such variants could be located by admixture mapping, a technique that scans through the genome of affected individuals of mixed Native American, European and African ancestry, searching for regions with unusually high proportions of one ancestry compared with the genome-wide average. The observation that increased type 2 diabetes prevalence in Latinos is at least partially explained by environmental factors decreases the likelihood that significant genetic risk factors can be easily found through this approach, although it does not rule out the possibility that the method can work. These results do not imply that genome-wide association scans cannot detect variants associated with either increased or decreased Native American ancestry. Indeed, when a variant is much more common in one population than another, it will be easier to achieve genome-wide statistical significance in a genome-wide scan performed in the former than it will be in the latter [29–31].

It is important to recognise that socioeconomic status will not be a confounder in type 2 diabetes admixture mapping in the same way as it is for the present analysis. In admixture disease mapping, each locus in the genome is separately tested for association, using the rest of the genome as a control to assess whether the locus stands out. If an association is observed at any locus, it must signify a real genetic connection of that particular locus to disease (socioeconomic status is not expected to be locus-specific). However, because of the strong effect of socioeconomic status on type 2 diabetes risk in Latinos, rich information on socioeconomic status may be an important covariate that will increase statistical power in scans for genetic risk factors for type 2 diabetes. Tests for interactions between genes and environment in Latinos with type 2 diabetes may offer more power to detect risk factors for the disease than would be afforded if the modulatory effect of socioeconomic status on genetic risk were not taken into account.
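The logic of using the rest of the genome as a control can be illustrated with a deliberately simplified case-only statistic. For each affected individual, the ancestry inferred at one locus is compared with that individual's genome-wide ancestry, and the mean difference across cases is scaled by its standard error. This is a toy sketch with hypothetical names and is not the statistical machinery used by admixture mapping software. A factor such as socioeconomic status, which shifts an individual's genome-wide ancestry rather than a specific locus, is subtracted out by construction.

```python
import math
import statistics


def locus_deviation_z(local_ancestry, genome_wide_ancestry):
    """Z-like score for excess ancestry at one locus among cases.

    local_ancestry: per-case proportion of one ancestry at the locus
                    (0, 0.5 or 1 when both chromosomes are resolved).
    genome_wide_ancestry: per-case genome-wide proportion of that ancestry.
    """
    diffs = [loc - gw for loc, gw in zip(local_ancestry, genome_wide_ancestry)]
    if len(diffs) < 2:
        raise ValueError("at least two cases are needed")
    spread = statistics.stdev(diffs)
    if spread == 0:
        raise ValueError("no variation in local deviation; score is undefined")
    standard_error = spread / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error
```

A large positive score at a locus would suggest that cases carry more of the given ancestry there than expected from their genome-wide average, which is the kind of signal admixture mapping looks for.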