Introduction

Rheumatoid arthritis (RA) is a systemic autoimmune disease that is characterized by chronic, destructive, and debilitating arthritis. The most common inflammatory arthritis, it affects approximately 1% of the population [1]. The etiology of RA is unknown, but it is presumed that environmental factors trigger its development in the genetically predisposed. The strongest genetic risk factor, which is responsible for approximately 30% of the genetic contribution to the development of RA, is human leukocyte antigen (HLA) type [2, 3]. The HLA-shared epitope (SE) is the strongest known genetic risk factor for RA, with two copies associated with a relative risk of 5 to 6 in Caucasians [4–6]. However, RA, like many complex human diseases, is polygenic in origin. It is likely that many other genes, both inside and outside the HLA region, contribute to disease predisposition [5, 7–10]. RA genetic studies have reported polymorphisms PTPN22 (R620W, rs2476601) [9, 11–16], CTLA-4 (CT60, rs3087243) [17, 18], and PADI-4 (PADI4_94, rs2240340) [8] to be associated with increased risk for RA. More recently, polymorphisms in or near the genes encoding STAT4 [19], TRAF1-C5 [20], TNF-AIP2 [16], and IL2RA [16] have been reported in whole-genome scans. The non-HLA genetic polymorphism that has been most strongly replicated across multiple independent studies is PTPN22 [9, 11–14, 16, 21–24]. This missense allele (C→T) has been associated in past studies with an odds ratio (OR) for RA of approximately 1.8 [9], and it appears to carry greater risk for autoantibody-positive RA [12, 24, 25]. Homozygotes for the variant T allele are at greatest risk for RA (OR = 4.6) [12]. This polymorphism has also been associated with increased risk for type I diabetes [26, 27] and systemic lupus erythematosus [11, 13]. The associations of CTLA-4 and PADI-4 polymorphisms with RA risk have been less well replicated [17, 18, 28]. In a large pooled replication study, CTLA4 CT60 polymorphism was associated with anti-cyclic citrullinated peptide antibody (anti-CCP)-positive RA [25].

Twin studies conducted in the UK and Finland have estimated that 50% to 60% of the variation in RA susceptibility is accounted for by genetic factors [29], leaving 40% to 50% probably due to environmental exposures. Cigarette smoking is the best established environmental risk factor for RA, with risk increasing in proportion to duration and intensity of exposure [30–35]. Case-control studies conducted in Sweden, Holland, and North America have identified an interaction between presence of the HLA-SE alleles and cigarette smoking in determining RA risk, in particular that of anti-CCP-positive RA [36–38]. Female reproductive factors such as early age at menarche, irregular menses, and use of postmenopausal hormones have also been related to increased RA risk, and prolonged duration of breast-feeding was found to be protective against development of RA in the Nurses' Health Study (NHS) [39–41].

We aimed to validate previous findings of increased risk for RA associated with polymorphisms in the PTPN22, PADI-4 and CTLA-4 genes, and to assess whether behavioral and reproductive factors that are known to be associated with RA risk influence these findings. We also investigated potential additive and multiplicative interactions between each of these polymorphisms and the presence of the HLA-SE. To do this, we conducted a case-control study nested within the NHS and NHSII; those studies include two large cohorts of women, who were followed closely over many years for behavioral and reproductive factors before the onset of disease.

Materials and methods

Study population

The NHS includes a prospective cohort of 121,700 female nurses, aged 30 to 55 years in 1976 when the study began. The NHSII was established in 1989, when 116,608 female nurses aged 25 to 42 years completed a baseline questionnaire about their medical histories and lifestyles. Ninety-four per cent of the NHS participants from 1976 to 2002, and 95% of NHSII participants from 1989 to 2003 have remained in active follow up (5% to 6% no longer respond to questionnaires and have not been confirmed as dead). All aspects of this study were approved by the Partners' HealthCare Institutional Review Board.

Identification of rheumatoid arthritis

As previously described [35], we employed a two-stage procedure in which all nurses who self-reported any connective tissue disease received a screening questionnaire for connective tissue disease symptoms [42], and – if positive – a detailed medical record review for American College of Rheumatology (ACR) classification criteria for RA [43], in order to identify and validate incident cases of RA. The presence or absence of rheumatoid factor (RF) and other features of RA was based on medical record review. Those in whom four of the seven ACR criteria were documented in the medical record were considered to have definite RA. For this nested case-control study, we also included a small number of women (n = 14) with three documented ACR criteria for RA, a diagnosis of RA by their physician, and agreement by two rheumatologists on the diagnosis of RA.

Population for analysis

We excluded prevalent RA cases diagnosed before the cohort was assembled, nonresponders, and women who reported any connective tissue disease that was not subsequently confirmed to be RA by medical record review. Women were censored when they failed to respond to any subsequent biennial questionnaires. Among the women in each cohort who had provided a sample for genetic analyses, each participant with confirmed incident RA was matched by year of birth, menopausal status, and postmenopausal hormone use to a healthy woman in the same cohort without RA. To minimize population stratification, and given that most cohort participants are Caucasian, we limited the analyses to Caucasian matched pairs of women. In 1992 (NHS) and 1989 (NHSII), all participants were asked to provide data concerning their own racial backgrounds in more detailed categories. Of the Caucasian women in NHS and NHSII included in this analysis, 2% reported pure Scandinavian heritage, 15% reported pure Southern European, and 83% reported other or mixed Caucasian backgrounds. There were no significant differences in the distributions of these ethnicities between cases and controls (χ2 with two degrees of freedom, P = 0.30).

Blood sampling

From 1989 to 1990, 32,826 (27%) NHS participants aged 43 to 70 years agreed to provide blood samples for future NHS studies. Between 1996 and 1999, 29,613 (25%) of the women included in the NHSII cohort (aged 32 to 52 years at that time) also agreed to have their blood drawn for future investigations. All samples were collected in heparinized tubes and sent to us by overnight courier in chilled containers. On receipt, the blood samples were centrifuged, aliquoted, and stored in liquid nitrogen freezers at -70°F (-57°C). The demographic and exposure characteristics of the NHS and NHSII participants who provided blood samples were found to be very similar to those of the overall cohorts [44, 45].

DNA extraction from blood

DNA was extracted from buffy coats from 96 samples in 3 to 4 hours. A volume of 50 μl of buffy coat was diluted with 150 μl phosphate-buffered saline and processed using the QIAamp™ (QIAGEN Inc., Chatsworth, CA, USA) 96-spin blood kit protocol. The protocol entails adding protease, the sample, and lysis buffer to 96-well plates. The plates are then mixed and incubated at 158°F (70°C), before adding ethanol and transferring the samples to columned plates. The columned plates are then centrifuged and washed with buffer. Adding elution buffer and centrifuging elutes the DNA. The average yield from 50 μl of buffy coat (based on 1,000 samples) is 5.5 μg with a standard deviation of 2.2 (range 2.0 to 16.4). These methods are semiautomated using a Qiagen 8000 robot to increase throughput and decrease manual pipetting errors.

Buccal cell collection method and DNA extraction in NHS

Forty thousand women in NHS who did not give blood in 1989 to 1990 were asked to give a buccal cell sample in 2002. To date we have collected an additional 21,733 buccal cell samples (18% of the NHS cohort). A collection kit was sent to participants, consisting of instructions for the buccal cell collection and the necessary supplies (a small bottle of mouthwash, a plastic cup with a screwtop cap, a ziplock plastic bag and absorbent sheet, and a stamped, self-addressed bubble envelope), as well as an informed consent form. Participants were instructed to fill the cup with mouthwash, swish the mouthwash in their mouth vigorously, and then spit back into the cup. Returned samples were processed using the PureGene DNA Isolation Kit (Gentra Systems, Minneapolis, MN, USA) to extract genomic DNA from human cheek cells. The extracted DNA was archived in liquid nitrogen freezers using specific tracking software. The average DNA recovery from these specimens measured using PicoGreen was 59 ng/μl.

Whole-genome amplification

For all genomic DNA samples, an aliquot was put through a whole-genome amplification protocol using the GenomiPhi DNA amplification kit (GE Healthcare, Piscataway, NJ, USA) to yield high-quality DNA sufficient for single nucleotide polymorphism (SNP) genotyping.

Single nucleotide polymorphism genotyping

DNA was genotyped using Taqman SNP allelic discrimination on the ABI 7900 HT (Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404 USA) using published primers [8, 9, 18, 46]. We studied only the CTLA-4 CT60 (rs3087243) allele. We chose the PADI4_94 allele (rs2240340) of the haplotype first described by Suzuki and coworkers [8], because it had the strongest association in a Japanese population and was replicated in a large meta-analysis [25]. Using the same methods, we also genotyped the lactase gene (rs4988235), which is known to exhibit substantial variation in allele frequency from Northern to Southern Europe, in order to test for population stratification in this nested case-control study [47, 48].

HLA-DRB1 shared epitope determination

Low-resolution HLA-DRB1 genotyping was performed by polymerase chain reaction with sequence-specific primers using OLERUP SSP kits (QIAGEN, West Chester, PA, USA). We used primers to amplify DNA samples that contained sequences for HLA-DRB1*04, *01, *10, and *14, along with consensus primers and appropriate positive and negative control samples. For samples with positive two-digit HLA signals, sequence-specific primers were used for high-resolution four-digit shared epitope allele detection of DRB1*0401, *0404, *0405, *0101, *0102, *1402, and *1001. OLERUP SSP computer software (QIAGEN) was used to determine four-digit HLA types. Quality control split samples were included, randomly interspersed with study samples.

Covariate information

Information was collected from the women in both cohorts via biennial questionnaires regarding diseases, lifestyle, and health practices. Age was updated in each cycle. Reproductive covariates were chosen based on our past findings of associations between reproductive factors and risk for developing RA in this cohort [41]. Data on parity, total duration of breast-feeding, menopausal status, and postmenopausal hormone use were selected from the questionnaire cycle before the date of RA diagnosis (or index date in controls). Self-reported menopausal status and age at menopause are highly reproducible in our cohorts; in a validation study of a subsample of NHS participants, 82% of naturally postmenopausal women reported the same age at menopause (within 1 year) on two questionnaires mailed 2 years apart [49].

Participants in both cohorts were asked at baseline whether they were a current smoker or had ever smoked in the past and the age at which they began to smoke. Current smokers were asked for the number of cigarettes typically smoked per day and former smokers reported the age at which they stopped smoking and the number of cigarettes smoked per day before quitting. On each subsequent questionnaire, participants reported whether they currently smoked and the number of cigarettes smoked per day. From these reports, we calculated pack years of smoking (product of years of smoking and packs of cigarettes per day).
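The pack-year arithmetic described above can be illustrated with a minimal Python sketch. The function names and the conversion of 20 cigarettes per pack are assumptions made for illustration; they are not taken from the study's own analysis code.

```python
CIGARETTES_PER_PACK = 20  # assumed conventional pack size; not stated in the text


def pack_years(years_smoked: float, cigarettes_per_day: float) -> float:
    """Return pack-years: years of smoking multiplied by packs smoked per day."""
    if years_smoked < 0 or cigarettes_per_day < 0:
        raise ValueError("years and cigarettes per day must be non-negative")
    packs_per_day = cigarettes_per_day / CIGARETTES_PER_PACK
    return years_smoked * packs_per_day


def cumulative_pack_years(intervals):
    """Sum pack-years over (years, cigarettes_per_day) reporting intervals.

    Each tuple stands for one stretch of follow-up (for example, one
    2-year questionnaire cycle) with the smoking level reported for it.
    """
    return sum(pack_years(years, cigs) for years, cigs in intervals)
```

Under these assumptions, a hypothetical participant who smoked 10 cigarettes per day for 20 years would accumulate `pack_years(20, 10)`, that is, 10 pack-years.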

Other potential confounders examined included body mass index, which was computed for each 2-year time interval using the most recent weight (in kilograms) divided by height (in meters squared), as reported at baseline. Alcohol intake was reported at least every 4 years and categorized in grams per day. Husband's educational level was assessed in 1992 in NHS and 1999 in NHSII, and was included as a proxy for socioeconomic level.

Statistical analyses

We verified Hardy-Weinberg equilibrium for each of the genotypes among controls in each of the datasets (NHS blood, NHS cheek cells, and NHSII blood). We employed conditional logistic regression analyses, conditioned on matching factors, and adjusted for potential confounders, including cigarette smoking and reproductive factors assessed before diagnosis of RA. All analyses were first conducted separately in each cohort and then on data pooled from the two cohorts. Because the P value for heterogeneity was significant for the CTLA-4 genotype, we also meta-analytically pooled results from the two cohorts using a DerSimonian and Laird random effects model [50]. In analyses stratified by the presence of RF among the RA cases, we employed unconditional logistic regression analyses, adjusting for each of the matching factors, in addition to the covariates above. For analyses of PTPN22, we employed a dominant model because the minor allele frequencies were low (9% in controls and 14% in cases). In analyses involving CTLA-4 and PADI-4, we assessed the risk for RA in dominant, additive, and recessive models.
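For readers unfamiliar with the random effects pooling mentioned above, the following Python sketch shows the standard DerSimonian and Laird moment estimator applied to cohort-specific log odds ratios. It is an illustration only: the analyses in this study were run in SAS, and the function names, the use of a normal-approximation 95% CI, and the helper that recovers a standard error from a reported CI are assumptions.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def se_from_ci(ci_low: float, ci_high: float) -> float:
    """Approximate the standard error of a log OR from its reported 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)


def dersimonian_laird(log_ors, std_errs):
    """Pool log odds ratios with the DerSimonian-Laird random effects estimator.

    Returns (pooled_or, ci_low, ci_high, tau2), where tau2 is the
    estimated between-study variance on the log OR scale.
    """
    if len(log_ors) != len(std_errs) or len(log_ors) < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errs]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum_w

    # Cochran's Q and the moment estimate of between-study variance.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))

    return (
        math.exp(pooled),
        math.exp(pooled - Z_95 * se_pooled),
        math.exp(pooled + Z_95 * se_pooled),
        tau2,
    )
```

When the cohort estimates agree closely, the estimated between-study variance is zero and the result coincides with the fixed-effect pooled estimate; when they disagree, the random effects weights become more nearly equal and the confidence interval widens.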

Gene-environment and gene-gene interactions

We conducted assessments for gene-environment interactions by testing for both additive and multiplicative interactions. For additive interactions, we calculated the attributable proportion due to interaction using a 2 × 2 factorial design to analyze the data [51–53]. (There is evidence of interaction when the attributable proportion is not equal to 0.) Ninety-five per cent confidence intervals (CIs) were calculated using the delta method as described by Hosmer and Lemeshow [54]. We tested for multiplicative interaction using an interaction variable (for example, gene × smoking) in the conditional logistic regression models. The significance of the interaction was determined using the Wald χ2 test of the interaction variable. In the combined NHS-NHSII nested case-control study dataset, we assessed for interactions between the presence of each polymorphism and cigarette smoking categorized both as ever/never, and then dichotomized as <10 or ≥10 pack-years of smoking, because this is the threshold we previously identified to be associated with increased risk for RA [35]. Using similar methods, we tested for gene-gene interaction between PTPN22 and HLA-SE in influencing RA susceptibility in analyses limited to NHS and NHSII blood samples. SAS version 9.1 (SAS Institute, Cary, NC, USA) was used for all analyses.
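To make the two interaction scales concrete, the sketch below computes the attributable proportion due to interaction from the three odds ratios of a 2 × 2 factorial layout (reference category: neither factor present), together with the ratio that corresponds to a multiplicative interaction term. It assumes the commonly used definition AP = RERI / OR11 with RERI = OR11 − OR10 − OR01 + 1; the function names are illustrative, and the delta-method confidence interval used in the study is not reproduced here.

```python
def attributable_proportion(or_both: float,
                            or_gene_only: float,
                            or_exposure_only: float) -> float:
    """Attributable proportion due to interaction (additive scale).

    All odds ratios are relative to subjects with neither the risk
    genotype nor the exposure. A value of 0 indicates no additive
    interaction; values above 0 indicate more joint risk than the sum
    of the separate effects would predict.
    """
    if min(or_both, or_gene_only, or_exposure_only) <= 0:
        raise ValueError("odds ratios must be positive")
    # Relative excess risk due to interaction (RERI).
    reri = or_both - or_gene_only - or_exposure_only + 1.0
    return reri / or_both


def multiplicative_interaction_ratio(or_both: float,
                                     or_gene_only: float,
                                     or_exposure_only: float) -> float:
    """Ratio of the joint OR to the product of the single-factor ORs.

    A value of 1 indicates no multiplicative interaction; it is the
    exponentiated coefficient of a gene x exposure product term in a
    logistic model containing both main effects.
    """
    if min(or_both, or_gene_only, or_exposure_only) <= 0:
        raise ValueError("odds ratios must be positive")
    return or_both / (or_gene_only * or_exposure_only)
```

With hypothetical odds ratios of 4.0 for both factors and 1.5 for each factor alone, the attributable proportion is 0.5, meaning that half of the risk in the doubly exposed group would be attributed to the interaction under this definition.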

Results

A total of 437 pairs of Caucasian women, each containing one woman with incident RA and her matched control, were included in these analyses, after removing 18 women because of missing data for all genotypes examined. The characteristics of the RA cases at diagnosis in each of the two cohorts are shown in Table 1. The cases in the NHS had a mean (± standard deviation) age of 57 years (± 9), as compared with 43 (± 5) in the younger NHSII cohort, because of the different ages targeted for enrollment in the two cohorts. Otherwise, the cases were similar in terms of the prevalence of RF, erosions, nodules, and proportion diagnosed by a member of the ACR. All cases and controls in these analyses were Caucasian, and the mean (± standard deviation) number of ACR criteria for the classification of RA was 5 (± 1) [43].

Table 2 shows the characteristics of the RA cases and matched controls at the time of RA diagnosis (or index date for the controls). A higher proportion of RA cases and controls were postmenopausal at RA diagnosis in the NHS than in the NHSII cohort, but the proportions of premenopausal and postmenopausal women among cases and controls were similar in each of the cohorts, as were the proportions currently receiving postmenopausal hormones. In NHSII a slightly higher percentage of women with RA were parous as compared with their matched controls (94% and 86%), but this was not true in the NHS cohort (91% of RA cases and 95% of controls). Among women in the NHSII with RA, a higher proportion had husbands who were college educated as compared with their matched controls (39% compared to 18%), but this was not true in the NHS cohort (20% in each group). No significant differences in allele frequencies of the lactase gene (rs4988235) in cases compared with controls were found. This argues strongly against any significant population stratification in our samples.

Table 1 Characteristics of RA cases at diagnosis of RA
Table 2 Characteristics of RA cases and matched controls: Caucasian matched pairs

The genotype and allele frequencies of the RA cases and controls for the three candidate genotypes are shown in Table 3. None of the PTPN22, CTLA-4, or PADI-4 genotype distributions deviated from Hardy-Weinberg equilibrium, either in each cohort or in the combined dataset. Overall, genotyping call rates were 97.5% for PTPN22, 96.4% for CTLA-4, 97.9% for PADI-4, and 98.7% for HLA-SE. The frequency of the T allele of the PTPN22 polymorphism was significantly higher among RA cases than among controls (χ2 with one degree of freedom, P = 0.001 for pooled NHS and NHSII cohorts). The mutant alleles were not statistically associated with RA case status for the other two genotypes, namely PADI-4 and CTLA-4. As expected, HLA-SE alleles were highly significantly associated with risk for RA. (A slightly higher frequency of NHS cheek cell DNA samples could not be HLA genotyped: 3% of cases and controls, as compared with 0% to 2% of NHS and NHSII case and control DNA samples from blood.)
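The Hardy-Weinberg check reported above can be sketched as a one-degree-of-freedom χ2 goodness-of-fit test on genotype counts among controls. This is a generic Python illustration under assumed names; the text does not state which specific test (χ2 or exact) was applied, so it should not be read as the authors' procedure.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic SNP.

    Arguments are observed counts of the two homozygous genotypes and
    the heterozygous genotype. Returns (chi2, p_value) with 1 degree
    of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n <= 0 or min(n_aa, n_ab, n_bb) < 0:
        raise ValueError("genotype counts must be non-negative and not all zero")

    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper tail of a chi-square distribution with 1 df:
    # P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A large P value is consistent with Hardy-Weinberg proportions, whereas a small one can indicate genotyping error or population substructure in the control group.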

Table 3 Genotype and allele frequencies of the RA cases and controls for the three candidate genotypes (PTPN22, CTLA-4, and PADI-4) and genotype frequencies of HLA-SE

Table 4 includes the results of conditional logistic regression analyses of risk for RA associated with each of the genotypes, performed separately in each cohort, and then on pooled data. The final multivariable model includes pack-years of cigarette smoking, age at menarche, regularity of menses, parity, and total duration of breast-feeding. Further adjustment for body mass index, alcohol intake, husband's educational level, and oral contraceptive use did not affect risk estimates, and these variables were not included in the final models. As is evident from comparison with the results of a model taking only the matching factors into account, adjustment for potential confounders did not significantly influence results. The risk for RA associated with the PTPN22 variant T allele was elevated (OR = 1.46 [95% CI = 1.02 to 2.08] in a NHS and NHSII pooled multivariable dominant model). These results may have been influenced by the high OR observed in the smaller NHSII cohort (OR = 8.77). The CTLA-4 variant G allele was associated with an increased RA risk among women in the NHS cohort (multivariable dominant model OR = 1.92 [95% CI = 1.10 to 3.35]), but not in the NHSII cohort or pooled results (multivariable dominant model OR = 1.27 [95% CI = 0.88 to 1.84]; the OR was similar [1.29, 95% CI = 0.54 to 3.08] in a random effects meta-analytically pooled dominant model). The PADI-4 genotype was not associated with risk for RA in the NHS and NHSII cohorts in any of the models.

Table 4 Effect of PTPN22, CTLA-4, and PADI-4 genotypes and the risk of RA in NHS, NHSII and pooled Caucasian matched pairs

To pursue potential associations of these polymorphisms with different RA phenotypes, we conducted analyses stratified by RF positivity, because many risk factors, including HLA-SE and cigarette smoking, have been shown to be more strongly associated with RF-seropositive RA [35, 36]. Results of these analyses are shown in Table 5. The effect of the PTPN22 polymorphism was seen primarily for the development of RF-seropositive RA (OR = 1.75 [95% CI = 1.18 to 2.59]).

Cigarette smoking is a strong environmental risk factor for the development of RA, in particular RF-positive RA, and amount and duration are associated with increased risk [35]. We thus investigated potential interactions between the three polymorphisms of interest and the amount and duration of cigarette smoking at the time of RA diagnosis. Table 6 presents the results of analyses in which we tested for both multiplicative and additive interactions between smoking, categorized as ever/never smoking and then dichotomized as <10 or ≥10 pack-years of smoking, for each of the genotypes. Among those with the CC genotype of PTPN22, a modest effect of heavy smoking was observed (OR = 1.22 [95% CI = 0.81 to 1.83]). However, among those with the PTPN22 T risk allele, the effect of heavy smoking was much more pronounced (OR = 2.50 [95% CI = 1.25 to 5.00]). We observed significant additive and multiplicative gene-environment interactions between heavy cigarette smoking and the presence of the PTPN22 T allele (additive interaction: P = 0.0006; multiplicative interaction: P = 0.04). When smoking was dichotomized as never/ever, there was marginal evidence for additive but not multiplicative interaction. We also tested for genotype-smoking interactions in RF-positive and RF-negative RA cases separately. In stratified analyses, we found significant additive but not multiplicative interactions between the PTPN22 risk allele and heavy smoking for both seropositive and seronegative RA. We did not observe similar gene-smoking interactions for CTLA-4 or PADI-4, for the overall risk for RA, or for RF-positive or RF-negative RA separately. No additive or multiplicative interactions were observed between PTPN22 and HLA-SE (Table 7). (Given potential difficulties with HLA-SE genotyping NHS cheek cell DNA samples, we performed sensitivity analyses with these samples excluded, and the interaction analyses yielded similar and nonsignificant findings.)

Table 5 Stratified analyses of genotype and RA risk in the pooled NHS/NHSII samples
Table 6 PTPN22 genotype and smoking interactions according to RF status in NHS/NHSII pooled samples
Table 7 PTPN22 genotype and HLA-SE interactions according to RF status in NHS/NHSII pooled samples

Discussion

In these two cohorts of women followed prospectively for the development of RA and for multiple potential environmental exposures, we have confirmed that the R620W polymorphism in the PTPN22 gene is associated with increased risk for RA. We did not confirm that the PADI-4 (rs2240340) or the CTLA-4 (rs3087243) polymorphism was associated with increased risk for RA or for RF-positive RA in this population. We did not find that cigarette smoking, parity, total duration of breastfeeding, age at menarche, regularity of menses, menopausal status, or postmenopausal hormone use – all associated with risk for RA in past studies – were important confounders of the relationships between these genotypes and RA. However, we did uncover a significant multiplicative gene-environment interaction between heavy smoking and PTPN22 in determining RA risk.

The C→T polymorphism at position 1858 of the PTPN22 gene interferes with the function of the PTPN22/Csk complex, which is an important inhibitor of T-cell signaling, hindering its ability to suppress T-cell activation [9, 26, 55]. Similar to past reports, we have found the elevated risk to be primarily for RF-positive disease [9, 25, 56–58]. Several reports and a meta-analysis have suggested that those with the PTPN22 risk allele have more severe disease [25, 57]. We have confirmed that a significant association exists after adjustment for potential confounders, including smoking and reproductive factors. We also found a significant multiplicative interaction between heavy cigarette smoking (≥10 pack-years) and the presence of the PTPN22 risk allele, with a threefold elevated odds of developing RA in the presence of both factors.

Kallberg and colleagues [38] recently explored potential gene-environment and gene-gene interactions in RA susceptibility, combining data from three large RA cohort studies. The results of their study are slightly different from ours, in that they did not find a significant interaction between the presence of the PTPN22 polymorphism and smoking in determining RA risk. Their gene-smoking interaction analyses used data from the Swedish Environmental Investigations in RA incident RA cohort, in which participants were asked to recall past smoking and were classified as ever or never smokers. Using the detailed prospective data regarding smoking amount and duration available for NHS and NHSII participants, we demonstrated a multiplicative interaction between the presence of the PTPN22 risk allele and heavy cigarette smoking of ≥10 pack-years in this female cohort. In past studies, we have found that the risk for RA was significantly elevated with ≥10 pack-years [35]. Our results now suggest that it may be necessary to exceed a threshold of heavy smoking to trigger a biologic pathway in RA pathogenesis involving the PTPN22 gene. Both HLA-SE and PTPN22 primarily affect the risk for RF-positive and anti-CCP-positive RA [25, 36, 59–61].

In the case of HLA-SE, it is hypothesized that cigarette smoking leads to inflammation and citrullination of certain peptides, which – when presented within the context of HLA-DR4 molecules – are specifically recognized, contributing to anti-citrulline autoimmunity [37]. The newly described interaction between PTPN22 and heavy cigarette smoking suggests that the smoking/citrullination/T-cell recognition and activation pathway in RA pathogenesis may be influenced by both PTPN22 and HLA-SE.

The CTLA-4 gene is an attractive candidate gene for RA susceptibility, given the role played by CTLA-4 (cytotoxic T-lymphocyte associated 4) in T-cell activation and that a CTLA-4-IgG1 fusion protein is very effective in treating RA [62]. The CT60 polymorphism was associated with a modest increase in RA risk in the NHS cohort alone, and not in the NHSII cohort or pooled results, possibly because of a lack of sufficient power to detect a small elevation in risk (with OR in the order of 1.2) reported in other studies [25]. In a post hoc power calculation, for this CTLA-4 genotype with a risk allele frequency of 0.56 among controls and a two-sided type I error rate of 0.05, we had 71% power to detect an effect of 40% or greater (OR = 1.4).

The enzyme peptidylarginine deiminase-4, responsible for the citrullination of peptides to which anti-CCP antibodies are formed, is encoded by the PADI-4 gene. The PADI4_94 SNP we have investigated was associated with RA in Japanese subjects (OR = 1.97 [95% CI = 1.44 to 2.69]) [8]. The effect sizes observed in previous replication studies in Caucasians have been small (pooled OR = 1.1 [95% CI = 1.0 to 1.2]) [25, 64]. We were unable to detect an effect of this polymorphism on the risk for RA in these cohorts of women, and this could reflect inadequate power to detect a risk estimate of that magnitude. Given that the allele frequencies in the controls were similar in each of the cohorts to that reported in the literature, the significant P value for heterogeneity across the cohorts we observed was probably due to small sample size.

Limitations of this study that should be noted include the fact that, in the NHS and NHSII cohorts, the presence or absence of RF in the blood among RA cases was confirmed by medical record review at diagnosis, and thus not assayed at the same laboratory, and was not assayed in controls. Rheumatoid nodules and radiographic erosions are likewise documented at the time of diagnosis from thorough medical record review, but cohort participants have not been followed longitudinally for RA disease activity or complications. Similarly, we have limited data in the medical record on antibodies to CCP among the cases, which is important in the subphenotyping of RA [64], because the dates of diagnosis for most of the RA cases in this cohort preceded the clinical use of anti-CCP. Further analysis by anti-CCP status could be potentially informative.

Although all participants included in this analysis were of self-reported Caucasian ancestry, potential population stratification, or confounding by ethnicity, still exists, in particular if the inclusion of individuals of Northern compared with Southern European origin varied between cases and controls [48, 65, 66]. We assessed the potential for this bias in two ways. First, we examined and did not find significant differences in the more precise racial backgrounds reported by the Caucasian women included as cases or controls in these analyses. Second, we genotyped the lactase gene, which is known to exhibit substantial variation in allele frequency from Northern to Southern Europe [47, 48], and found no significant differences in allele frequencies between cases and controls. A recent whole-genome association study investigating breast cancer risk alleles [67] found no evidence of population stratification among self-reported Caucasian women in the NHS cohort.

This study is unique in that the participants were followed for many years, in great detail, before the onset of their RA, and environmental and reproductive risk factors for RA have been well studied in this cohort [35, 41]. This has allowed the investigation of possible gene-environment interactions between each of these recently described polymorphisms and known and suspected RA risk factors assessed prospectively, such as cigarette smoking and menopausal status.

Conclusion

Our data confirm that the PTPN22 R620W polymorphism is a strong risk factor for RF-positive RA, and that presence of this polymorphism interacts with heavy cigarette smoking in a multiplicative manner. These findings contribute to the growing understanding of how genetic and environmental factors interact in RA pathogenesis, and suggest that heavy cigarette smoking and PTPN22 may be acting in a similar mechanistic pathway.