Strong association of socioeconomic status with genetic ancestry in Latinos: implications for admixture studies of type 2 diabetes
Type 2 diabetes is more prevalent in US minority populations of African or Native American descent than in European Americans. However, the proportion of this epidemiological difference that can be ascribed to genetic or environmental factors is unknown. To determine whether genetic ancestry is correlated with diabetes risk in Latinos, we estimated the proportion of European ancestry in case–control samples from Mexico and Colombia in whom socioeconomic status had been carefully ascertained.
We genotyped 67 ancestry-informative markers in 499 participants with type 2 diabetes and 197 controls from Medellín (Colombia), and in 163 participants with type 2 diabetes and 72 controls from central Mexico. Each participant was assigned to a socioeconomic status band using country-specific measures.
Although European ancestry was associated with lower diabetes risk in Mexicans (OR [95% CI] 0.06 [0.02–0.21], p = 2.0 × 10−5) and Colombians (OR 0.26 [0.08–0.78], p = 0.02), adjustment for socioeconomic status eliminated the association in the Colombian sample (OR 0.64 [0.19–2.12], p = 0.46) and significantly attenuated it in the Mexican sample (OR 0.17 [0.04–0.71], p = 0.02). Adjustment for BMI did not change the results.
The proportion of non-European ancestry is associated with both type 2 diabetes and lower socioeconomic status in admixed Latino populations from North and South America. We conclude that ancestry-directed searches for genetic markers associated with type 2 diabetes in Latinos may benefit from incorporating information on social factors, as these factors have a quantitatively important effect on type 2 diabetes risk relative to ancestry effects.
Keywords: Genetic admixture; Genetic association; Latinos; Socioeconomic status; Type 2 diabetes
Abbreviation: SNP, single nucleotide polymorphism
In comparison with the European American population, minority groups in the USA bear a disproportionate burden of type 2 diabetes . This is particularly evident in some Native American tribes; the Pima Indians have one of the highest population prevalences of type 2 diabetes ever reported for any ethnic group [2, 3, 4]. A similar trend is observed among the admixed populations of Mexico, the Caribbean, and Central and South America (including recent immigrants to the USA), who are commonly characterised as Latinos. Indeed, the risk of type 2 diabetes is two- to fivefold higher in Latinos from Puerto Rico, Texas, New Mexico or Colorado [5, 6, 7, 8, 9] than in whites, an epidemiological difference that persists after adjustment for other risk factors such as abdominal obesity .
These differences support the notion that diabetes risk factors occur at higher frequency in populations of Native American descent. A genetic basis for this disparity was suggested by the ‘thrifty gene’ hypothesis, with its proposed interaction between genes and environment, first posed by J. Neel over 40 years ago . A similar hypothesis was proposed for Mexican-Americans on the basis of crude skin pigmentation measures  and was subsequently supported by analysis of genetic markers . Initial evidence of a genetic contribution to the higher risk of type 2 diabetes in Native Americans came from the observation that Pima Indians with type 2 diabetes have significantly more Native American ancestry than their normoglycaemic counterparts  and that the fraction of European ancestry was associated with protective metabolic phenotypes in this population ; however, the potential contribution of socioeconomic status to these estimates was not fully assessed. In addition, the recent identification of the ABCA1 R230C variant in Mexican admixed individuals (mestizos) illustrates that gene variants derived exclusively from Native American ancestry can indeed influence type 2 diabetes risk [16, 17].
The advent of comprehensive databases cataloguing genetic variation , the development of high-throughput genotyping technologies and the availability of DNA samples from multiple populations make it possible to select a set of single nucleotide polymorphisms (SNPs) that are highly informative for geographic ancestry, commonly termed ancestry-informative markers (AIMs). SNPs with widely divergent allele frequencies in the ancestral populations can be compiled for this purpose. When evenly spaced throughout the genome, approximately 2,000 AIMs can be used to infer ancestry at each genomic location for admixture mapping, whereas a much smaller set of randomly distributed AIMs (<100) genotyped in admixed individuals suffices to derive a fairly accurate estimate of genetic ancestry, expressed as a proportion of each individual’s genome. Such panels specific to Latino populations have recently been compiled by three different groups, including our own [19, 20, 21].
In anticipation of genome-wide admixture mapping, AIMs have been applied to Latino samples to estimate the genetic contribution to the increased diabetes prevalence in this population. An initial study of American Latinos, stratified by neighbourhood, detected a strong association between Native American ancestry and socioeconomic status; nevertheless, the authors concluded that, despite this confounding, a genetic component to the increased disease prevalence was likely . The association between type 2 diabetes and Native American ancestry was further substantiated by Parra et al. in Mexican-Americans from the San Luis Valley, Colorado; but once again, controlling for income and education abolished the statistical significance of the finding . More recently, a similar study was conducted in 286 unrelated diabetic patients and 275 controls recruited from users of the Social Security hospital in Mexico City, a population thought to capture a large middle segment of society without upper- and lower-income outliers. This report found a non-significant increase in Native American ancestry among participants with diabetes, but a much stronger association of higher educational level with both European ancestry and non-diabetic status .
The success of admixture mapping in Latino populations is predicated on the ability to separate these non-genetic confounders from the genetic association. The correlation of socioeconomic status with ancestry in samples from two distinct US locations and in an additional sample from Mexico City suggests that this may be a general phenomenon. To replicate and extend these observations, and to investigate the likely impact of this confounding on ongoing admixture mapping studies, we evaluated the contribution of socioeconomic status to the ancestry–diabetes relationship in two separate, non-US Latino populations from North and South America.
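The marker-selection idea above can be illustrated with a short Python sketch. It ranks candidate SNPs by the absolute allele-frequency difference between Native American and European reference samples and keeps a SNP only if it lies at least 10 cM from every marker already chosen on the same chromosome (the spacing reported for the panel in the Methods). The candidate SNPs, their frequencies and the minimum-difference threshold are hypothetical; this is not the procedure used to build any of the published panels.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str       # hypothetical identifier
    chrom: int      # autosome number (1-22)
    pos_cm: float   # genetic map position in centimorgans
    p_na: float     # allele frequency in Native American reference samples
    p_eur: float    # allele frequency in European reference samples

    @property
    def delta(self) -> float:
        """Absolute allele-frequency difference between the two ancestral groups."""
        return abs(self.p_na - self.p_eur)


def select_aims(candidates, min_delta=0.3, min_spacing_cm=10.0):
    """Greedily pick the most informative SNPs subject to a minimum spacing.

    min_delta is a hypothetical threshold; min_spacing_cm mirrors the 10 cM
    spacing described for the study panel.
    """
    chosen = []
    for snp in sorted(candidates, key=lambda s: s.delta, reverse=True):
        if snp.delta < min_delta:
            break  # candidates are sorted, so no later SNP can pass
        too_close = any(
            c.chrom == snp.chrom and abs(c.pos_cm - snp.pos_cm) < min_spacing_cm
            for c in chosen
        )
        if not too_close:
            chosen.append(snp)
    return chosen


candidates = [
    Snp("snpA", 1, 10.0, 0.90, 0.20),
    Snp("snpB", 1, 15.0, 0.85, 0.10),
    Snp("snpC", 1, 40.0, 0.60, 0.15),
    Snp("snpD", 2, 5.0, 0.50, 0.40),
]
aims = select_aims(candidates)
print([s.name for s in aims])  # ['snpB', 'snpC']
```

Because candidates are visited in order of decreasing frequency difference, snpA is dropped despite being highly informative: it lies within 10 cM of the slightly more informative snpB.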
Demographic characteristics of genotyped samples
48.5 ± 12.5 | 56.0 ± 9.9 | 64.1 ± 10.3 | 60.7 ± 9.9
28.1 ± 5.8 | 25.6 ± 3.7 | 26.7 ± 4.6 | 25.4 ± 4.1
To estimate allele frequencies in the ancestral populations and project ancestry proportions, we also studied several unmixed populations: European Americans from Baltimore and Chicago (n = 77) and from the HapMap Centre d’Etude du Polymorphisme Humain (Utah residents with northern and western European ancestry) collection (n = 60); Spaniards from Valencia (n = 31); West Africans from Ghana (n = 52) and from the HapMap Yoruba in Ibadan, Nigeria collection (n = 60); and Native Americans from the Mazahua (n = 22), Zapotec (n = 60), Mixtec (n = 23) and Mixe (n = 29) populations. All participants gave informed consent and studies were carried out in accordance with the principles of the Declaration of Helsinki as revised in 2000.
Estimation of socioeconomic status
In Colombia, we had access to a government-assigned ‘property band’, ranging from 1 (lowest) to 6 (highest), which is based on the valuation of each individual home and is used for billing public utilities. Each participant was assigned to one of these strata on the basis of the telephone number they provided at the time of their interview. We compared this information with other data we collected (such as home or car ownership) and confirmed a close correlation between the government-assigned band and these other markers of socioeconomic status.
In Mexico, socioeconomic status was determined by social workers at the National Institute of Medical Sciences and Nutrition using a standardised and validated tool currently applied in all Mexican National Institutes of Health studies [24]. The questionnaire covers six categories: family monthly income, occupation of the head of the household, percentage of family income spent on food, type and characteristics of residence (owner-occupied, rented or shared with extended family), place of residence and the presence of chronic illnesses in other family members. Points are given for each category, and their sum is used to assign each participant to one of six socioeconomic status bands (lowest to highest). Supporting documents are required to validate the information. When information was incomplete or questionable, socioeconomic status assignments were further explored and confirmed through unscheduled home visits.
Samples from all populations were genotyped at 67 AIMs using the Sequenom MassARRAY technology  (Electronic supplementary material [ESM] Table S1). On average, these SNPs differ in allele frequency by 49% between Native Americans and European Americans, and they are spaced at least 10 cM (or >10 Mb) apart on chromosomes 1 to 22 (see Supplementary Table B in Smith et al. , from which these markers were selected). The large number of markers and the high informativeness of each marker in this set yield precise estimates of the proportion of European ancestry for each individual, together with a standard error for each estimate. The average standard error of the estimated percentage of European ancestry was ±7.2% in Mexicans and ±8.0% in Colombians. Genotype data were 89% complete; the shortfall reflects the fact that slightly different marker panels were genotyped in some samples. The missing data are not expected to affect our estimates of ancestry proportion.
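The points-based banding described above can be sketched as follows. Only the six category names come from the text; the point scale and the band cut-offs are invented for illustration and are not those of the validated instrument. Incomplete questionnaires raise an error instead of being scored, loosely mirroring the follow-up verification described above.

```python
# Hypothetical sketch of a points-based socioeconomic status (SES) banding.
# The six category names follow the text; the point scale and band cut-offs
# below are invented for illustration and are NOT those of the validated tool.
CATEGORIES = (
    "family_monthly_income",
    "head_of_household_occupation",
    "share_of_income_spent_on_food",
    "residence_type",
    "place_of_residence",
    "chronic_illness_in_family",
)

# Hypothetical inclusive upper bounds of the summed score for bands 1-5;
# any higher total falls into band 6 (highest).
BAND_UPPER_BOUNDS = (6, 10, 14, 18, 22)


def ses_band(points: dict) -> int:
    """Sum the points for the six categories and map the total to bands 1-6."""
    missing = [c for c in CATEGORIES if c not in points]
    if missing:
        # Incomplete questionnaires are flagged for follow-up rather than scored.
        raise ValueError(f"incomplete questionnaire, needs verification: {missing}")
    total = sum(points[c] for c in CATEGORIES)
    for band, upper in enumerate(BAND_UPPER_BOUNDS, start=1):
        if total <= upper:
            return band
    return 6


print(ses_band({c: 3 for c in CATEGORIES}))  # total 18 -> band 4
```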
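How a per-individual ancestry proportion and its standard error can be obtained from AIM genotypes is sketched below. This is a simplified two-population maximum-likelihood model (European vs Native American), assumed for illustration; the study also used West African reference samples, and the estimation software and model it used are not described in this section. The allele frequencies and genotypes in the example are simulated, and the 45-point frequency difference is hypothetical.

```python
import math
import random


def estimate_european_ancestry(genotypes, p_eur, p_na, tol=1e-10):
    """Maximum-likelihood estimate of European ancestry proportion q with its SE.

    Two-population sketch: an allele is European-derived with probability q, so the
    expected frequency of the counted allele is p = q * p_eur + (1 - q) * p_na, and
    each genotype (0, 1 or 2 copies) is Binomial(2, p). Genotypes given as None are
    treated as missing and skipped. Frequencies must lie strictly between 0 and 1.
    """
    data = [(g, pe, pn) for g, pe, pn in zip(genotypes, p_eur, p_na) if g is not None]

    def score(q):
        # First derivative of the log-likelihood with respect to q.
        total = 0.0
        for g, pe, pn in data:
            p = q * pe + (1 - q) * pn
            total += (pe - pn) * (g / p - (2 - g) / (1 - p))
        return total

    # The log-likelihood is concave in q, so the score is non-increasing:
    # check the boundaries, then bisect for the root.
    if score(0.0) <= 0:
        q_hat = 0.0
    elif score(1.0) >= 0:
        q_hat = 1.0
    else:
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if score(mid) > 0:
                lo = mid
            else:
                hi = mid
        q_hat = (lo + hi) / 2

    # Standard error from the observed information (less reliable at q_hat = 0 or 1).
    info = 0.0
    for g, pe, pn in data:
        p = q_hat * pe + (1 - q_hat) * pn
        info += (pe - pn) ** 2 * (g / p**2 + (2 - g) / (1 - p) ** 2)
    se = 1 / math.sqrt(info) if info > 0 else math.inf
    return q_hat, se


# Simulated example: 67 markers, true European ancestry 30%.
rng = random.Random(42)
n_markers, true_q = 67, 0.3
p_na = [rng.uniform(0.05, 0.45) for _ in range(n_markers)]
p_eur = [p + 0.45 for p in p_na]  # hypothetical 45-point frequency difference
genos = []
for pe, pn in zip(p_eur, p_na):
    p = true_q * pe + (1 - true_q) * pn
    genos.append(sum(rng.random() < p for _ in range(2)))
genos[0] = None  # a missing genotype is simply skipped
q_hat, se = estimate_european_ancestry(genos, p_eur, p_na)
print(f"{q_hat:.2f} ± {se:.2f}")
```

Bisection on the score is used instead of Newton's method because it always stays within [0, 1], where the model's allele frequencies are valid.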
Association of non-European ancestry with type 2 diabetes in Latinos
Country of origin | European ancestry, cases (%) | European ancestry, controls (%) | Logistic regression coefficient canc | Pseudo-r2 for canc | p value for canc
Mexico | 33 ± 20 | 46 ± 24 | | | 2 × 10−5
Colombia | 56 ± 15 | 59 ± 14 | | |
For each regression analysis, we computed effect sizes, p values and pseudo-r2 values. We defined pseudo-r2 as the reduction in the mean squared difference between the observed disease outcome and the predicted probability of disease, comparing each of the four logistic regression models with a model containing only a constant term.
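The pseudo-r2 definition above can be written out directly. The text does not say whether "reduction" means an absolute difference or a fraction of the constant-only value, so this sketch returns both; the proportional form corresponds to what is often called Efron's pseudo-R2. The outcomes and predicted probabilities in the example are invented.

```python
def pseudo_r2(y, p_model):
    """Compare a fitted logistic model with the constant-term-only model.

    y: observed outcomes (1 = type 2 diabetes, 0 = control).
    p_model: predicted probabilities of disease from the fitted model.
    Returns (absolute_reduction, proportional_reduction) in the mean squared
    difference between outcome and predicted probability.
    """
    n = len(y)
    # The maximum-likelihood constant-only logistic model predicts the case fraction.
    p_const = sum(y) / n
    mse_const = sum((yi - p_const) ** 2 for yi in y) / n
    mse_model = sum((yi - pi) ** 2 for yi, pi in zip(y, p_model)) / n
    return mse_const - mse_model, 1 - mse_model / mse_const


y = [1, 0, 1, 0]
p_hat = [0.9, 0.1, 0.8, 0.2]
print(pseudo_r2(y, p_hat))
```

A model that merely reproduces the case fraction scores 0 on both measures; a model with perfect predictions reaches 1 on the proportional measure.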
Association of non-European ancestry with type 2 diabetes in Latinos is confounded by socioeconomic status
Country of origin
cSES + canc
2 × 10−10
9 × 10−8
8 × 10−7
1 × 10−5
We also conducted a stratified analysis in which we considered each of five ancestry strata (0–20%, 20–40%, 40–60%, 60–80% or 80–100% European ancestry) and analysed associations between socioeconomic status and type 2 diabetes within each stratum (Fig. 2c,d). For Mexicans, we obtained nominal p values of 0.003, 0.12, 0.00001, 0.11 and 0.01, each with a negative coefficient for socioeconomic status. For Colombians, we excluded ancestry stratum 0–20% (which contained only six samples, each with type 2 diabetes) and obtained nominal p values of 0.004, 0.0006, 0.006 and 0.49, with a negative coefficient for socioeconomic status for each of the three significant p values. These results are consistent with the association between socioeconomic status and type 2 diabetes remaining strong after accounting for ancestry (Table 3).
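The stratification step above can be sketched as binning each participant into one of the five 20%-wide European-ancestry strata and discarding strata that contain only cases or only controls, as happened with the Colombian 0–20% stratum. The samples below are hypothetical, and the within-stratum regression of diabetes status on socioeconomic status is not shown.

```python
from collections import defaultdict

LABELS = ("0-20%", "20-40%", "40-60%", "60-80%", "80-100%")


def ancestry_stratum(q: float) -> str:
    """Label of the 20%-wide European-ancestry stratum containing q (0 <= q <= 1)."""
    return LABELS[min(int(q * 5), 4)]  # q = 1.0 is placed in the top stratum


def usable_strata(samples):
    """Group samples by ancestry stratum, keeping strata with both cases and controls.

    samples: iterable of (european_ancestry, ses_band, has_t2d) tuples.
    A stratum made up only of cases (or only of controls) cannot support a
    within-stratum regression of diabetes status on SES, so it is dropped.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[ancestry_stratum(sample[0])].append(sample)
    return {
        label: groups[label]
        for label in LABELS
        if {s[2] for s in groups[label]} == {True, False}
    }


samples = [
    (0.10, 1, True), (0.15, 2, True),    # stratum with cases only -> excluded
    (0.30, 2, True), (0.35, 4, False),
    (0.90, 5, False), (0.95, 3, True),
]
print(list(usable_strata(samples)))  # ['20-40%', '80-100%']
```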
Association of socioeconomic status and non-European ancestry with type 2 diabetes is not confounded by BMI
Country of origin
cSES + canc + cBMI
6 × 10−7
3 × 10−5
These data demonstrate a genetic association between type 2 diabetes and individual non-European ancestry proportions in Latinos, while also showing that this evidence of association is highly confounded by socioeconomic status. Taken together with previous results, our findings show that this pattern has now been observed in North American, Central American and South American populations [13, 22, 23], with similar trends also noted in African-Americans .
Our study design has several limitations. First, our strategies for estimating socioeconomic status are necessarily imprecise and differ between the two locations. Second, because type 2 diabetic participants and controls were ascertained separately, some bias may have been introduced despite best efforts to match recruitment procedures. Third, our sample size may have been too small to assess the relative effects of highly correlated, potentially confounding variables. And fourth, given the diversity inherent in Latinos, our findings may not be generalisable to all Latinos or to other populations with admixed Native American ancestry. Nevertheless, we believe we have taken analytical measures to address potential confounders and that the similar results obtained in two different populations strengthen our conclusions.
Our results show that, because socioeconomic status is correlated with Native American ancestry, it is difficult to disentangle the respective contributions of genetic and social factors to disease risk. Low socioeconomic status can increase diabetes risk through a variety of mechanisms, such as poor access to care, neglect of preventive strategies, a reduced ability to exercise or an unhealthy diet. Whether the increased susceptibility to type 2 diabetes in Native American populations is caused by genetic or social factors (or a combination of both) is difficult to resolve accurately as long as low socioeconomic status is more prevalent among persons with greater Native American ancestry; answering this question appropriately would require admixed case–control cohorts carefully matched for socioeconomic status, much larger samples or studies of twins discordant for socioeconomic status. In some cases (e.g. the ABCA1 R230C polymorphism), a strong association between a genetic variant and type 2 diabetes risk may remain largely unaffected by adjustment for different confounders, including educational level as a surrogate of socioeconomic status [16, 17]. On the other hand, because Native American ancestry and socioeconomic status are strongly associated, using socioeconomic status as a covariate in admixture studies might mask a true association signal in populations where low socioeconomic status is highly correlated with ancestral background.
The association between type 2 diabetes and Native American origin remained nominally significant after adjustment for socioeconomic status in the Mexicans, but not in the Colombians. This may be due to a weaker initial signal in the Colombians, differences in socioeconomic status ascertainment between the two populations or a combination of both factors. Because the Mexican sample was enriched for early-onset cases (n = 81), genetic susceptibility to type 2 diabetes may have been stronger among the Mexicans. Another potential reason for the smaller p value in Mexicans is the very different distribution of Native American heritage in Mexicans vs Colombians (Fig. 1a, b). Mexicans have a wide range of ancestry proportions, so enrichment of the type 2 diabetes group for individuals with more Native American ancestry could produce a wider, more easily measured separation of ancestry proportions. Conversely, Colombians have a narrower range, which may be due to more generations of mixing and homogenisation . Just as in African-Americans , this phenomenon limits the separation in ancestry proportion between cases and controls, even if a strong effect of ancestry causes oversampling of people who have inherited more of one ancestry. This may explain some of the discrepancies observed in the literature . Thus it is possible that Pima Indians, like the samples in this study from central Mexico, have a wide range of ancestry proportions, making the oversampling of individuals with more extreme values of one ancestry easily detectable, whereas the San Luis Valley samples, like our samples from Colombia, have a narrow spread of ancestries, making the effect more subtle. A higher degree of admixture and homogenisation in a society may also lead to decreased health disparities of a social nature.
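The confounding argument above can be illustrated with a small simulation under stated assumptions. In this hypothetical scenario, socioeconomic status is a continuous score that rises with European ancestry q, and diabetes risk depends only on that score, so ancestry has no effect by construction. A logistic regression on ancestry alone then shows an apparently protective effect of European ancestry, which largely disappears once the score is added as a covariate. All parameter values are invented, the logistic fit is a plain Newton-Raphson implementation rather than any particular statistics package, and the opposite situation (a true ancestry effect masked by adjustment) is not simulated.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iterations=10):
    """Logistic regression by Newton-Raphson; each row starts with 1 (intercept)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = 1 / (1 + math.exp(-sum(b * xj for b, xj in zip(beta, x))))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += p * (1 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


# Hypothetical scenario: risk depends ONLY on the SES score (logit = -0.5 - ses),
# and the SES score rises with European ancestry q.
rng = random.Random(2024)
q = [rng.random() for _ in range(4000)]
ses = [2 * qi + rng.gauss(0, 0.5) for qi in q]
y = [1 if rng.random() < 1 / (1 + math.exp(0.5 + s)) else 0 for s in ses]

unadjusted = fit_logistic([[1.0, qi] for qi in q], y)
adjusted = fit_logistic([[1.0, qi, si] for qi, si in zip(q, ses)], y)
print(f"ancestry coefficient, unadjusted:   {unadjusted[1]:.2f} (OR {math.exp(unadjusted[1]):.2f})")
print(f"ancestry coefficient, SES-adjusted: {adjusted[1]:.2f} (OR {math.exp(adjusted[1]):.2f})")
```

Here the odds ratios are per unit of q, that is, for a change from 0% to 100% European ancestry in this sketch's scaling.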
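The argument that a narrow spread of ancestry proportions limits case–control separation can be checked with a toy simulation. Two hypothetical populations share the same per-unit effect of European ancestry on diabetes risk, but one has a broad (uniform) distribution of ancestry and the other a narrow one. The effect size, the intercept and both distributions are assumptions chosen for illustration, not estimates from the study samples.

```python
import math
import random


def case_control_gap(draw_ancestry, beta=-2.0, alpha=1.0, n=20000, seed=1):
    """Mean European ancestry of controls minus that of cases.

    Risk follows logit(P) = alpha + beta * q, where q is European ancestry;
    beta and alpha are hypothetical and identical for every population.
    """
    rng = random.Random(seed)
    cases, controls = [], []
    for _ in range(n):
        q = draw_ancestry(rng)
        p = 1 / (1 + math.exp(-(alpha + beta * q)))
        (cases if rng.random() < p else controls).append(q)
    return sum(controls) / len(controls) - sum(cases) / len(cases)


def wide(rng):
    # Broad spread of ancestry proportions (SD about 0.29).
    return rng.random()


def narrow(rng):
    # Narrow spread (SD about 0.15), truncated to [0, 1].
    return min(1.0, max(0.0, rng.gauss(0.57, 0.15)))


print(f"wide spread:   gap = {case_control_gap(wide):.3f}")
print(f"narrow spread: gap = {case_control_gap(narrow):.3f}")
```

With the same underlying effect, the narrow distribution yields a case–control difference in mean ancestry several times smaller, because the gap scales roughly with the variance of ancestry in the population.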
Alternatively, the lack of precision inherent in estimations of socioeconomic status (which often rely on information provided by participants who may have secondary motives for reporting a different stratum) may also have contributed to the observed differences. Our findings also illustrate the difficulties in making generalisations about Latino populations, which are often characterised by very diverse ancestral origins and environmental contexts that may affect disease, socioeconomic status and their interactions.
These results also have implications for the prospects of gene mapping to find risk factors for type 2 diabetes in Latinos. Based on the epidemiological observation that type 2 diabetes is more prevalent in Latinos than in populations of European descent, it has been speculated that genetic risk factors for type 2 diabetes must be more common in Native Americans than in Europeans. Such variants could be located by admixture mapping, a technique that scans through the genome of affected individuals of mixed Native American, European and African ancestry, searching for regions with unusually high proportions of one ancestry compared with the genome-wide average. The observation that increased type 2 diabetes prevalence in Latinos is at least partially explained by environmental factors decreases the likelihood that significant genetic risk factors can be easily found through this approach, although it does not rule out the possibility that the method can work. These results do not imply that genome-wide association scans cannot detect variants associated with either increased or decreased Native American ancestry. Indeed, when a variant is much more common in one population than another, it will be easier to achieve genome-wide statistical significance in a genome-wide scan performed in the former than it will be in the latter [29, 30, 31].
It is important to recognise that socioeconomic status will not be a confounder in type 2 diabetes admixture mapping in the same way as it is for the present analysis. In admixture disease mapping, each locus in the genome is separately tested for association, using the rest of the genome as a control to assess whether the locus stands out. If an association is observed at any locus, it must signify a real genetic connection of that particular locus to disease (socioeconomic status is not expected to be locus-specific). However, because of the strong effect of socioeconomic status on type 2 diabetes risk in Latinos, rich information on socioeconomic status may be an important covariate that will increase statistical power in scans for genetic risk factors for type 2 diabetes. Tests for interactions between genes and environment in Latinos with type 2 diabetes may offer more power to detect risk factors for the disease than would be afforded if the modulatory effect of socioeconomic status on genetic risk were not taken into account.
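Why a genome-wide factor such as socioeconomic status should not create locus-specific admixture signals can be sketched with a simplified case-only scan. Each case's local ancestry at each locus is compared with that case's own genome-wide average, so anything that shifts an individual's ancestry uniformly across the genome cancels out. This is an illustrative simplification: local ancestry is assumed known here, whereas real admixture mapping infers it from dense markers (typically with hidden Markov models) and uses likelihood-based scores. The simulated cases, the number of loci and the risk locus are hypothetical, and the sketch does not model the power gain from socioeconomic covariates discussed above.

```python
import math
import random


def admixture_scan(local_ancestry):
    """Per-locus z-scores for excess (+) or deficit (-) of European ancestry in cases.

    local_ancestry[i][l] is the European ancestry (0, 0.5 or 1) of case i at locus l,
    assumed known here. Each case's own genome-wide average serves as the control,
    so any factor that shifts a person's ancestry everywhere in the genome (for
    example SES-related ascertainment) cancels out.
    """
    n = len(local_ancestry)
    n_loci = len(local_ancestry[0])
    genome_avg = [sum(row) / n_loci for row in local_ancestry]
    z = []
    for locus in range(n_loci):
        dev = [row[locus] - g for row, g in zip(local_ancestry, genome_avg)]
        mean = sum(dev) / n
        var = sum((d - mean) ** 2 for d in dev) / (n - 1)
        z.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return z


# Hypothetical simulation: 500 cases, 50 unlinked loci, and a risk locus (index 17)
# at which cases carry less European ancestry than their genome-wide average.
rng = random.Random(7)
RISK_LOCUS = 17
cases = []
for _ in range(500):
    q = rng.uniform(0.1, 0.6)  # genome-wide European ancestry of this case
    row = []
    for locus in range(50):
        q_local = 0.7 * q if locus == RISK_LOCUS else q
        row.append(sum(rng.random() < q_local for _ in range(2)) / 2)
    cases.append(row)

z = admixture_scan(cases)
print(z.index(min(z)), round(min(z), 1))
```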
We thank all the participants for their contribution of phenotype and DNA samples, and S. Jiménez-Ramírez for technical assistance. J. C. Florez is supported by a Massachusetts General Hospital Physician Scientist Development Award and a Doris Duke Charitable Foundation Clinical Scientist Development Award. A. L. Price was supported by a Ruth Kirschstein National Research Service Award from the NIH. R. Saxena is supported by a Ruth Kirschstein National Research Service Award from the NIH. D. Reich is supported by a Burroughs Wellcome Career Development Award in the Biomedical Sciences. Support for this project was provided by discretionary funding from Harvard Medical School (to D. Reich), NIH grants NS043538 (to A. Ruiz-Linares) and DK073818 (to D. Reich), Colciencias grants 1115-04-16451 and 1115-04-012986 (to G. Bedoya and A. Villegas), and a Universidad de Antioquia grant (CODI/sostenibilidad 9889-E01321) to G. Bedoya and A. Villegas.
Duality of interest
The authors declare that there is no duality of interest associated with this manuscript.
- 16. Villarreal-Molina MT, Aguilar-Salinas CA, Rodriguez-Cruz M et al (2007) The ATP-binding cassette transporter A1 R230C variant affects HDL cholesterol levels and BMI in the Mexican population: association with obesity and obesity-related comorbidities. Diabetes 56:1881–1887
- 24. Silva Arciniega MR, Brain Calderon ML (2006) Validez y confiabilidad del estudio socioeconómico [Validity and reliability of the socioeconomic study]. Universidad Nacional Autónoma de México and Escuela Nacional de Trabajo Social, Mexico City