Rheumatoid arthritis (RA) is an autoimmune inflammatory rheumatic disease that mainly affects synovial joints, among many other tissues and organs. It affects approximately 1% of the population worldwide1 and, although it can develop at any age, RA affects women more frequently than men and is mainly diagnosed between the ages of 40 and 60 years. In Latin America (LA), the female predominance seems to be even greater, whereas prevalence has been estimated at 0.2–0.5%2,3. In Chile, the overall prevalence of RA based on clinical examination has been reported as 0.46%4.

The etiology of RA is multifactorial and partially unknown because of the complex interactions between genetic and environmental factors. Approximately 50% of RA risk is thought to be genetic, and one-third of this risk is associated with the human leukocyte antigen (HLA) locus5, specifically HLA-DRB1 shared epitope (SE) alleles, which encode a common amino acid sequence6. Since 2007, about 101 RA risk loci have emerged from genome-wide association studies (GWAS) and subsequent GWAS meta-analyses7,8, mostly in individuals from European and/or Asian populations (Supplementary Table 1). In fact, no GWAS pertaining to RA has been performed in LA populations (Supplementary Table 1).

It is generally accepted that many common risk variants are shared between multiethnic populations, but allele frequencies of disease-associated single nucleotide polymorphisms (SNPs) vary significantly among ethnic groups due to genetic drift or selection9. Linkage between causal variants and tag SNPs included in genotyping microarrays might vary depending on population-specific patterns of recombination, which in turn are largely affected by population size, founder effects and admixture processes. In addition, populations with different histories may carry distinct causal mutations even in similar loci. All of these factors can preclude generalization of genetic associations from one population to another, and suggest testing for locus- or haplotype-wise rather than SNP-wise generalization10.

López Herráez et al.11 examined susceptibility loci for RA in LA populations. In that study, a strong association with the HLA region was observed, with three independent effects, probably due to the diverse origin of the patients (Argentina, Mexico, Chile, and Peru). Some of the RA associations previously reported in GWAS were also replicated in the study by López Herráez and coworkers, but with only moderate significance (including the protein tyrosine phosphatase, non-receptor type 22 (lymphoid) [PTPN22] and signal transducer and activator of transcription 4 [STAT4] genes). However, in general, genetic association studies on RA have not been robustly replicated in LA populations. Therefore, the aim of the present study was to carry out high-density SNP genotyping in candidate genes to test their association with susceptibility to RA in the Chilean population, in order to provide insight into the cross-ethnic generalizability of known European and Asian RA risk loci to LA populations.


In the present study, 563 (42.0%) of the included individuals had RA. Supplementary Table 2 shows the characteristics of the RA patients that were used for the analysis. The mean age was 48 and 58 years for cohorts 1 and 2, respectively, and 84.7% and 81.0% of the patients were women. The mean duration of the disease was 8 years. Anti-cyclic citrullinated peptide (CCP) antibodies were determined in a total of 218 patients and were positive in 164 of them (75.23%), whereas rheumatoid factor (RF) was determined in 300 patients and was positive in 264 (88.0%). The RA group did not differ from the control group with regard to any of the clinical parameters included in the study (data not shown).

The present findings do not show replicable association of individual SNPs with RA. Among the 128 SNPs genotyped, 118 passed all quality filters after excluding SNPs with a minor allele frequency < 0.01 or missingness > 0.1 and those that were not in Hardy-Weinberg equilibrium (HWE) (p < 0.001) (Supplementary Table 3). Only two markers (2%) showed significant associations (p ≤ 0.01): rs1635567 and rs2469434 (Table 1), neither of which was confirmed in Cohort 2. When data from both cohorts were combined, rs2469434 was still significant, whereas rs1635567 could not be tested because the assay failed in Cohort 2. However, the combined analysis revealed a new significant association for rs2451258 (combined p = 5 × 10⁻³; p = 0.09 after Bonferroni correction for multiple testing) (Table 1). Eighteen markers exhibited suggestive associations (p < 0.05), whereas the associations of the remaining SNPs included in the study were not significant. The significantly associated SNPs in the peptidyl arginine deiminase, type IV (PADI4), protein tyrosine phosphatase, non-receptor type 22 (PTPN22), signal transducer and activator of transcription 4 (STAT4), cytotoxic T-lymphocyte-associated protein 4 (CTLA4), tumor necrosis factor, alpha-induced protein 3 (TNFAIP3), and chemokine receptor 6 (CCR6) genes, identified in Caucasian and Asian populations, were not replicated in the Chilean population (Supplementary Fig. 1).
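The quality filters described above can be illustrated with a minimal sketch. It assumes genotype counts per SNP are available and uses a 1-degree-of-freedom chi-square test for HWE, which is a simplification; the test actually applied by the genotyping software may differ (for example, an exact test, possibly restricted to controls). The function names `hwe_chi2_p` and `passes_qc` are hypothetical and introduced only for illustration.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              maf_min=0.01, miss_max=0.1, hwe_min=0.001):
    """Apply the three filters: MAF, missingness, and HWE p value."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    missingness = n_missing / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (maf >= maf_min
            and missingness <= miss_max
            and hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_min)
```

Under these assumptions, a SNP with genotype counts 25/50/25 and no missing calls is retained, whereas one with counts 50/0/50 (no heterozygotes) is rejected by the HWE filter.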

Table 1 Association analysis of the replicated SNPs as single markers in cohort 1 and cohort 2 and the joint analysis (only SNPs with P ≤ 0.1 are shown).

We next determined the correlation between the odds ratios (OR) derived from our study and the OR previously reported in GWAS from Caucasian and Asian populations12 (Fig. 1). There was no correlation between the data from Caucasian populations and our data (r = −0.041, p = 0.768), or between the data from Asian populations and our data (r = 0.152, p = 0.302). In addition, the allele frequencies of RA-associated SNPs varied significantly among different ethnic groups (Fig. 2, Supplementary Fig. 2). Allele frequencies were concordant within our study (healthy controls vs. RA cases, p-value < 10⁻¹⁵ and r = 0.98) and with the ChileGenomico dataset (healthy controls vs. ChileGenomico, p-value < 10⁻¹⁵ and r = 0.96). However, the allele frequencies in European, East Asian, Aymara and Mapuche samples showed variability compared to our cohort (r ≤ 0.70).
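As an illustration of the comparison above, the following minimal sketch computes the Pearson correlation between the log(OR) values of two studies for the same set of SNPs. It uses only the standard library, omits the p value, and is not the authors' code; the function names are hypothetical, and the inputs are assumed to be ORs already aligned to the same reference allele in both studies.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def log_or_correlation(published_or, observed_or):
    """Correlate log(OR) from a published study with log(OR) observed here."""
    log_published = [math.log(value) for value in published_or]
    log_observed = [math.log(value) for value in observed_or]
    return pearson_r(log_published, log_observed)
```

Working on the log scale makes risk and protective effects symmetric around zero, so an OR of 2 and an OR of 0.5 contribute equally in magnitude.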

Figure 1

Correlation between log(odds ratio) from data published in GWA studies carried out in Caucasian and Asian populations versus log(odds ratio) reported in this study (7). OR = odds ratio; GWA = genome-wide association. The respective regression lines with the Pearson correlation r values are indicated.

Figure 2

Correlation matrix between allele frequencies of the SNPs analyzed in different populations. MAF = minor allele frequency; AF = allele frequency; AFR = African; EUR = European; EAS = East Asian; AYM = Aymara; MAP = Mapuche; CHG = ChileGenomico.

The sliding window test revealed several SNP blocks that were associated with RA (Table 2). The strongest sliding windows (p values ranging from 9.82 × 10⁻³ to 2.04 × 10⁻³) were located in regions around the STAT4 gene. In addition to the sliding window test, we also performed case-control analyses based on linkage disequilibrium (LD) haplotype block reconstruction, which did not reveal associations with RA. Detailed haplotype block information and the LD plot around the STAT4 gene are shown in Supplementary Fig. 3.
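To make the sliding window design concrete, the sketch below enumerates the windows that such a scan would test over an ordered list of SNPs in one gene. It assumes that windows are contiguous and shifted one SNP at a time, with sizes from 2 to 21 SNPs as stated in the statistical analysis; haplotype phasing and the association test itself are left to dedicated software and are not reproduced here. The function name and the SNP labels in the usage note are hypothetical.

```python
def sliding_windows(snps, min_size=2, max_size=21):
    """List every contiguous window of min_size..max_size SNPs, shifted by one SNP."""
    windows = []
    largest = min(max_size, len(snps))
    for size in range(min_size, largest + 1):
        for start in range(len(snps) - size + 1):
            windows.append(tuple(snps[start:start + size]))
    return windows
```

For example, four ordered markers labelled s1 to s4 scanned with window sizes 2 and 3 yield five windows: three pairs and two triplets. The number of windows grows quickly with the number of SNPs, which is one reason the resulting p values need to be interpreted with multiple testing in mind.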

Table 2 Association analyses of sliding windows of 4–19 single-nucleotide polymorphisms within STAT4 (only p values < 10⁻² are shown), using the chi-square test in PLINK software (15).


The present study aimed to investigate the association between SNP markers in candidate genes and RA in the Chilean population. Our main finding was the limited replication of previously reported genetic associations with RA. Indeed, only 2% of known RA loci from GWAS in populations of European or Asian origin were significantly associated in our LA population, and just 11% showed a suggestive association. This was unexpected because SNPs in well-known RA loci were tested, such as PADI4, PTPN22, STAT4, CTLA4, TNFAIP3, and CCR6, none of which replicated. There are a number of reasons why previously GWAS-significant findings might not replicate in independent cohorts, as reviewed by Kraft et al.13. The small sample size of our study may be responsible for the modest number of SNPs whose associations were validated in our participants. Larger sample sizes than the one used here are needed to reach high confidence levels and strong statistical power. In this regard, the low prevalence of the disease restricted the number of patients that we were able to recruit for our study. A long-term effort to progressively collect numerous patient samples from biobanks might allow better-powered genetic studies and tests of the generalizability of genetic associations. Similarly, we believe that the small sample size is a main reason for the lack of differences we found between endophenotypes. Our study did not reach sufficient statistical power for one-third of the SNPs analyzed, which might explain, at least in part, the lack of replication of the results in the Chilean population. However, if lack of power were the only explanation, the OR values would be expected, overall, to follow the same trend in Chilean patients as in other populations. Instead, ORs in Chile show no correlation with estimates from studies in Europeans and only a very weak positive correlation with those from Asians (Fig. 1). This suggests that genetic divergence between populations at these loci may be one of the reasons for the lack of generalization of SNP associations.

Differences in LD patterns between populations may preclude replication of association. Such differences can be caused by multiple factors, such as different demographic histories including population-specific bottlenecks, genetic drift, selection, and recent admixture, among others14. Large diversity in LD among populations from different continents, including the Americas, is well documented15. Furthermore, RA is a trait associated with loci responsible for the immune response, which in turn is highly associated with local adaptations and disease resistance. In support of the above interpretation of our results, although we did not find any significant SNP-wise association of STAT4 with RA, we did find an association for this locus when testing haplotypes instead of genotypes. The sliding window test revealed several haplotype associations with RA, suggesting the possible existence of untested (potentially functional) genetic variation within STAT4 in the Chilean population, a result that other studies with different populations might have failed to detect or in which it might not have shown the strongest signal. Further investigations are required to confirm these findings. The strongest association was observed for the SNP rs2451258, located upstream of the T-cell activation RhoGTPase activating protein (TAGAP) gene, although the p-value was > 0.05 after Bonferroni correction for multiple testing. This variant is not located within any protein-coding sequence, nor does it disrupt a non-coding functional motif, but TAGAP would be a promising biological candidate gene12. The TAGAP gene encodes a member of the Rho GTPase-activator protein superfamily, but little is known about its role in the immune system. Additional investigations, with a higher density of variants in the region, are required to confirm this hypothesis.

Polygenic risk scores could be the next great stride in genomic medicine, and their use in complex phenotypes is generating considerable debate16. Recently, Khera et al. proposed that it is time to contemplate the incorporation of polygenic risk prediction in clinical care17, projecting these scores across a wide variety of diseases. The risk scores have been generated and tested mainly in individuals of primarily European ancestry. In the present study, the significance of the previously detected SNP-wise associations was moderate, and better generalizability was found when testing the association between phenotype and haplotypes rather than SNPs. Moreover, allele frequencies vary between populations of different ancestries. These results suggest the existence of genomic patterns in Chilean, and probably other LA populations, that differentiate them from Europeans with regard to loci that are relevant for RA. This can be caused by different demographic histories (e.g., past population bottlenecks and migration events, or ancestries18,19,20). Haplotype-based associations may capture the interacting effects among two or more potential causal variants within a certain genomic region, which single-variant approaches cannot detect. Therefore, haplotype-based approaches show greater power to map susceptibility genes in complex traits than single-marker methods21,22. These results support the need for GWAS in LA populations, including Chileans, to discover potentially novel loci accounting for genetic risk for RA, to investigate the contribution of genetic ancestry, and to improve the performance of polygenic prediction models in these populations.


Study participants

A total of 1,340 individuals were studied as two distinct cohorts. Cohort 1 comprised 313 patients with RA and 487 healthy control subjects; cohort 2 included 250 RA patients and 290 healthy controls. The patients with RA were diagnosed following the 2010 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) classification criteria23. The study was approved by the Ethical Committee of the “Servicio de Salud del Maule” (registration number 04/2014), Chile, and all individuals gave their written informed consent prior to enrolling in the study. All methods were performed in accordance with the relevant guidelines and regulations.

SNP selection and genotyping

A total of 128 SNPs from 73 genes were chosen for genotyping from previous GWAS in populations of diverse ethnic background7,11. Supplementary Table 3 shows the SNPs selected for our analysis. Some of them were selected as haplotype-tag SNPs (ht-SNPs) based on LD patterns within our candidate genes (PADI4, PTPN22, STAT4, CTLA4, TNFAIP3 and CCR6), using the HapMap dataset24. The ht-SNPs were selected using the Tagger tool of Haploview25 under the following criteria: minor allele frequency ≥ 0.01 and r² > 0.8, based on the HapMap populations (CEU, CEU + TSI and MEX). Some of the identified associations were validated by genotyping 23 SNPs in cohort 2. The SNPs were genotyped using the OpenArray® TaqMan platform (Applied Biosystems Inc.) in the test (Cohort 1) and replication (Cohort 2) samples. The genotyping assays were performed at the Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research (GENYO), Granada, Spain (Cohort 1), and at the Centro Nacional de Genotipado, Santiago de Compostela node, Spain (Cohort 2).

Genotyping data from reference populations

In order to assess ethnic differences in allelic frequencies for the SNPs evaluated in this work, we obtained genotypes for 108 AFR, 99 EUR, and 103 EAS unrelated individuals from the 1000 Genomes Project Phase 3 dataset. For Amerindian ancestry, we obtained genotypes for 85 individuals of Aymara ancestry (AYM), 54 individuals of Mapuche ancestry (MAP), and 348 of Chilean ancestry (CLG) from the ChileGenomico Project. AYM, MAP, and CLG individuals were genotyped using the Axiom LAT1 Array (Affymetrix, Inc., Santa Clara, California, U.S.) and imputed using the 1000 Genomes Project phase 326.

Statistical analysis

Power calculations were done with the GAS Power Calculator tool, assuming a multiplicative model with OR = 1.5, a significance level of 0.05 and an RA prevalence of 0.5%. Only SNPs that met the quality criteria of a minor allele frequency (MAF) > 0.01, missingness < 0.1, and HWE P > 0.001 were considered for inclusion in the association analyses (Supplementary Table 3). Allele frequencies were compared between RA patients and control populations by chi-square test, and OR with 95% confidence intervals (95% CI) were calculated using PLINK software (v1.07)27. Haplotype analysis was performed using Haploview software (v4.2)25. In addition, haplotypes based on 1-bp sliding windows of 2 to 21 SNPs each were also constructed. Association analyses were done with the chi-square test using PLINK. Pearson’s correlations and linear regression were used to evaluate differences between genetic backgrounds. The LocusZoom web-based resource was used to generate plots of association results by genomic region28.
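The allelic association test described above can be sketched as follows. This is an illustrative standard-library implementation of a 2 × 2 allelic chi-square test with an OR and a log-scale (Woolf) 95% CI; it is not the PLINK code, and whether it matches the software output to the last digit is an assumption. The function name `allelic_association` and the counts in the usage note are hypothetical.

```python
import math


def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic chi-square test (1 df), odds ratio and 95% CI from allele counts.

    case_a / ctrl_a: counts of the tested allele in cases / controls.
    case_b / ctrl_b: counts of the other allele in cases / controls.
    """
    cells = (case_a, case_b, ctrl_a, ctrl_b)
    if min(cells) <= 0:
        raise ValueError("all four allele counts must be positive")
    n = sum(cells)
    cross_diff = case_a * ctrl_b - case_b * ctrl_a
    chi2 = n * cross_diff ** 2 / (
        (case_a + case_b) * (ctrl_a + ctrl_b)
        * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    # Survival function of a chi-square with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    # Standard error of log(OR), then a normal-approximation 95% interval
    se_log_or = math.sqrt(sum(1 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, p_value, odds_ratio, (lower, upper)
```

With hypothetical counts of 60 and 40 alleles in cases versus 40 and 60 in controls, the sketch gives a chi-square of 8 and an OR of 2.25, with a p value of about 0.005 and a confidence interval that excludes 1.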