High variability of peptidylarginine deiminase 4 (PADI4) in a healthy white population: characterization of six new variants of PADI4 exons 2–4 by a novel haplotype-specific sequencing-based approach
- Cite this article as:
- Hoppe, B., Heymann, G.A., Tolou, F. et al. J Mol Med (2004) 82: 762. doi:10.1007/s00109-004-0584-6
Seven single nucleotide polymorphisms (SNPs) of the peptidylarginine deiminase 4 (PADI4) gene have recently been reported to be strongly associated with rheumatoid arthritis in Japanese individuals. These SNPs are located in or close to exons 2–4 of PADI4 and are organized in at least four different haplotypes. However, a detailed sequencing-based characterization of the PADI4 gene in other populations is still lacking. We therefore analyzed exons 2–4 of the PADI4 gene in 102 healthy white German individuals by DNA sequencing and characterized new variants and haplotypes by a novel haplotype-specific sequencing-based approach. Haplotype 2/3 (padi4_89*G, padi4_90*T, padi4_92*G, padi4_94*T, padi4_104*T, padi4_95*C, padi4_96*C) and haplotype 4 (padi4_89*G, padi4_90*T, padi4_92*G, padi4_94*T, padi4_104*C, padi4_95*G, padi4_96*T), both of which confer susceptibility to rheumatoid arthritis, were detected at frequencies of 30.9% and 7.8%, respectively. In addition, three novel coding SNPs in exons 2, 3, and 4, and three SNPs in introns 2 and 3 located near the exon-intron boundaries were identified in 11 individuals (10.8%). The so-called nonsusceptibility haplotype 1 (padi4_89*A, padi4_90*C, padi4_92*C, padi4_94*C, padi4_104*C, padi4_95*G, padi4_96*T) occurred at a frequency of 58.3%. Additionally, we identified a closely related novel haplotype, haplotype 1B (2.9%), that differs from haplotype 1 only by padi4_92*G/padi4_96*C. This haplotype was not described in the Japanese population. Our results indicate that the PADI4 gene exhibits remarkable variability and a rather complex haplotypic organization. Further studies on disease association of PADI4 should be performed by haplotype-specific sequencing-based approaches to identify the exact genotype of the PADI4 fragment of interest.
Keywords: PADI4, Haplotype, Sequencing, Single nucleotide polymorphism, Rheumatoid arthritis
Abbreviations: PCR, polymerase chain reaction; SNP, single nucleotide polymorphism
The peptidylarginine deiminases (PADs, EC 3.5.3.15) are enzymes involved in the posttranslational deimination of protein-bound arginine to citrulline. Five different types of PADs encoded by the genes PADI1–4 and PADI6 are currently known. The exact functional significance of these enzymes is unknown. However, evidence suggests that at least PADI4 might have an immunomodulatory function, and that it leads to breakage of tolerance under certain circumstances. Posttranslational deimination of proteins is a phenomenon that occurs under physiological and pathological conditions. Citrulline is found in structural proteins such as filaggrin and in some keratins in terminally differentiating keratinocytes. Recent reports describe the stimulation-dependent citrullinization of histones in granulocytes and suggest a possible role of this modification in chromatin remodeling. Moreover, citrulline-modified proteins are thought to be targets of the autoimmune reaction in some autoimmune diseases. For example, enhanced T-cell responsiveness to citrullinated myelin basic protein has been observed in multiple sclerosis.
The presence of citrulline-modified target epitopes for autoantibodies is a well-known phenomenon in rheumatoid arthritis (RA) [5, 6]. PADs were recently implicated in the generation of anti-cyclic citrullinated peptide antibodies detectable in early stages of RA [5, 6, 7]. The process resulting in anti-cyclic citrullinated peptide antibody formation is thought to play a pivotal role in early stages of disease progression, since these antibodies are detectable several years before the onset of symptoms in patients with RA. There is evidence that the deimination of arginine at those peptide side-chain positions that interact with the so-called shared epitope of some major histocompatibility complex class II molecules (e.g., HLA-DRB1*0401 or HLA-DRB1*0404) results in the generation of high-affinity peptides, thus inducing a strong in vitro T-cell activation. Using gene-based linkage disequilibrium mapping approaches, a Japanese research group identified in 1p36 a genomic region containing the genes PADI1–4, which seemed to be associated with susceptibility to RA. The gene responsible for the association with RA was identified as PADI4, which has four main haplotypes that differ at four exonic single nucleotide polymorphisms (SNPs), with three subsequent amino acid substitutions. While the so-called susceptibility haplotypes (sPADI) 2, 3, and 4 were found to be significantly more frequent in Japanese individuals suffering from RA, the nonsusceptibility haplotype (nPADI) 1 predominated in healthy individuals. However, another group studying the association between PADI4 and RA in the United Kingdom did not find a difference in PADI4 haplotype distribution between RA patients and healthy individuals. Thus the relevance of PADI4 variability for susceptibility to RA is still unclear.
PADI4 variability has so far been tested by SNP screening using techniques such as TaqMan 5′ allelic discrimination or Invader assays, and the corresponding haplotypes were calculated by the expectation-maximization algorithm [10, 11]. In addition, the identification of SNPs by sequencing-based approaches was limited to screening for heterozygous positions. In other words, techniques that allow an in-depth analysis of PADI4 to determine the exact cis/trans linkage of different SNPs and to identify additional novel variants in their exact haplotypic context are still lacking. Consequently, we devised a method for sequencing-based characterization of exons 2–4 of the PADI4 gene in a healthy white German population using a novel long-range (5.3 kb) haplotype-specific amplification technique.
Material and methods
Genomic DNA was extracted from whole blood using GenoPrep cartridges B and the GenoM-6 system (GenoVision) following the manufacturer’s instructions. Blood samples were withdrawn from healthy, unrelated blood donors who gave their informed consent. The mean age of the 102 individuals studied was 40.6 years (range 19–64 years); 57% of the subjects were women. Cycle sequencing of DNA samples was carried out in two ways (numbering of nucleotides was based on the respective position in sequence NT_034376.1).
Table 1 Primers used for amplification and sequencing of exons 2, 3, and 4 of PADI4. The specified locations of the 3′-terminal ends of the primers are based on the numbering of sequence NT_034376.1 (F forward primer, R reverse primer). Column heading: Sequence 5′→3′ (location of 3′-terminal end)
The second sequencing approach was designed to provide information on the exact haplotypic organization of novel variants and haplotypes of exons 2–4 of PADI4. It was therefore necessary to amplify large DNA fragments (5.3 kb) in a haplotype-specific manner. We designed allele-specific primers for the SNPs padi4_89*A/G and padi4_96*T/C (Table 1) and performed long-range PCR. Briefly, we performed four amplification reactions using Platinum PCR SuperMix High Fidelity (Invitrogen) and one of the following haplotype-specific primer pairs (Table 1; final concentrations are indicated in parentheses): padi4_89_F01A (forward primer, 300 nM)/padi4_96_R01T (reverse primer, 300 nM), padi4_89_F01A (forward primer, 200 nM)/padi4_96_R01C (reverse primer, 200 nM), padi4_89_F01G (forward primer, 200 nM)/padi4_96_R01T (reverse primer, 200 nM), and padi4_89_F01G (forward primer, 200 nM)/padi4_96_R01C (reverse primer, 200 nM). The thermal cycle profile for long-range PCR was as follows: denaturation (94°C, 2 min), 15 cycles of (94°C, 30 s; 65°C, 30 s; 68°C, 5.5 min), 15 cycles of (94°C, 30 s; 60°C, 30 s; 68°C, 5.5 min), and 10 cycles of (94°C, 30 s; 55°C, 30 s; 68°C, 5.5 min). The specificity of these primer pairs for the distinct PADI4 haplotypes was tested on the haplotypes calculated using the expectation-maximization algorithm (EH program, available at ftp://linkage.rockefeller.edu/software/eh) based on the results of the sequencing of single exons described above. The reactions resulting in an amplification product were digested using ExoSAP-IT (Amersham Biosciences) following the manufacturer’s instructions, and the PCR products were sequenced on a thermal cycler (25 cycles of 96°C, 10 s; 50°C, 10 s; 60°C, 4 min) using BigDye terminators v. 1.1 and one of the following sequencing primers with a final concentration of 2.5 nM (Table 1): PADI4ex02_1− (exon 2, reverse primer), PADI4ex03_1+ (exon 3, forward primer), PADI4ex03_1− (exon 3, reverse primer), or PADI4ex04_1+ (exon 4, forward primer). The designations of the PADI4 haplotypes are in accordance with Suzuki et al.
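The logic of the four haplotype-specific reactions can be sketched in a few lines of Python. This is only an illustration, not the authors' software: it assumes perfectly allele-specific priming, and the function names `primer_pair` and `expected_products` are hypothetical. The alleles at padi4_89 and padi4_96 are taken from the haplotype definitions given in the abstract (for haplotype 1B they are derived from the statement that it differs from haplotype 1 only at padi4_92 and padi4_96).

```python
from itertools import product

# Alleles at the two SNPs targeted by the allele-specific primers,
# taken from the haplotype definitions in the abstract.
HAPLOTYPES = {
    "1":   {"padi4_89": "A", "padi4_96": "T"},
    "1B":  {"padi4_89": "A", "padi4_96": "C"},
    "2/3": {"padi4_89": "G", "padi4_96": "C"},
    "4":   {"padi4_89": "G", "padi4_96": "T"},
}


def primer_pair(allele_89, allele_96):
    """Build the primer names used in the text for one allele combination."""
    return (f"padi4_89_F01{allele_89}", f"padi4_96_R01{allele_96}")


def expected_products(haplotype_names):
    """Map each of the four primer pairs to the haplotypes it should amplify.

    Assumes ideal primer specificity (an assumption of this sketch).
    """
    result = {}
    for a89, a96 in product("AG", "TC"):
        result[primer_pair(a89, a96)] = [
            name
            for name in haplotype_names
            if HAPLOTYPES[name]["padi4_89"] == a89
            and HAPLOTYPES[name]["padi4_96"] == a96
        ]
    return result


# An individual carrying haplotype 1 and haplotype 2/3.
reactions = expected_products(["1", "2/3"])
```

Under these assumptions, each of the four haplotypes is amplified by exactly one primer pair, so every amplification product can be sequenced as a single, phase-defined template.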
PADI4 haplotype frequencies (exons 2–4)
Haplotype frequencies of PADI4 (exons 2–4) in a white population (n=102). Haplotypic organization of exons 2–4 of PADI4 and the haplotype frequencies (in parentheses); PADI4 haplotype designations are based on those of Suzuki et al.
Distribution of PADI4 haplotype combinations
PADI4 (exons 2–4) haplotype combinations in a white population (n=102): numbers of different PADI4 haplotype combinations
Localization and characterization of six novel PADI4 variants
The positions of the intronic SNPs identified in 5 individuals (4.9%) are given relative to sequence NT_034376.1 (Fig. 1). SNP 390194C→T (PADI4h02in02/01, accession number AJ715938), linked to haplotype 2/3, was found in three individuals and is located 38 nucleotides downstream of the boundary of exon 2 and intron 2. SNP 393030A→G (PADI4h02in03/01, accession number AJ715936), which was identified in intron 3 of PADI4 haplotype 2/3 (n=1), is located 14 nucleotides downstream of exon 3. SNP 392864C→T (PADI4h01in02/01, accession number AJ715932), linked to haplotype 1, was found 85 nucleotides upstream of exon 3 (n=1).
The unambiguous determination of the cis/trans linkage of SNPs 390194C→T and 392G→C by the expectation-maximization algorithm was not possible because both SNPs were identified in individuals presenting uniformly with PADI4 haplotype 1 combined with haplotype 2/3. In these cases haplotype-specific sequencing was necessary to assign the exact haplotypic context.
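The ambiguity described above can be made concrete with a minimal Python sketch. It is an illustration under simplifying assumptions, not the EH program or the authors' procedure: the background haplotype pair is treated as already known, and the helper names `candidate_phasings` and `resolve` are hypothetical.

```python
def candidate_phasings(background_pair, novel_alleles):
    """List both ways a heterozygous novel SNP can sit on two known haplotypes.

    background_pair: names of the two haplotypes carried by the individual.
    novel_alleles:   (reference allele, variant allele) at the novel SNP.
    """
    first, second = background_pair
    ref, alt = novel_alleles
    return [
        {first: ref, second: alt},  # variant in cis with the second haplotype
        {first: alt, second: ref},  # variant in cis with the first haplotype
    ]


def resolve(phasings, haplotype, observed_allele):
    """Keep only phasings consistent with one haplotype-specific sequence read."""
    return [p for p in phasings if p[haplotype] == observed_allele]


# Carrier of 390194C→T with haplotype 1 combined with haplotype 2/3:
# genotype data alone leave two possible phasings.
phasings = candidate_phasings(("1", "2/3"), ("C", "T"))

# Sequencing the haplotype 2/3-specific amplicon and reading T there
# leaves a single phasing.
resolved = resolve(phasings, "2/3", "T")
```

Because every carrier in the sample had the same haplotype 1 plus haplotype 2/3 background, unphased genotypes do not separate the two candidates; one read from a haplotype-specific amplicon does.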
The mechanism by which PADI4 variability affects the breakage of tolerance is still unknown. Initial studies demonstrated different half-lives of mRNA transcribed from sPADI4 and nPADI4 [9, 10]. It was argued that these differences in mRNA stability can result in higher enzymatic activity in cases in which sPADI4 is present, leading to the generation of larger amounts of citrullinated peptides. This could ultimately promote an autoimmunization process. However, we believe that differences in substrate specificities between sPADI4- and nPADI4-encoded enzymes, which could result in the formation of specific sPADI4-dependent, citrullinated auto-antigens triggering autoimmunization, should be considered as well. Similar to the specific binding and presentation of distinct peptide repertoires by different MHC molecules, the gene product of sPADI4 could bind and modify peptide motifs that are not compatible with nPADI4-encoded proteins. To verify this hypothesis, the PADI4 gene should be characterized using techniques capable of identifying the cis/trans linkage of SNPs directly, thus allowing one to determine the exact haplotypic organization of PADI4, including the detection and characterization of novel polymorphisms and their haplotypic linkage.
The main PADI4 haplotypes in our white German population exhibited a distribution similar to those in Japanese and British studies [10, 11]: the most prevalent forms were haplotype 1 (padi4_89*A, padi4_90*C, padi4_92*C, padi4_94*C, padi4_104*C, padi4_95*G, padi4_96*T) and haplotype 2/3 (padi4_89*G, padi4_90*T, padi4_92*G, padi4_94*T, padi4_104*T, padi4_95*C, padi4_96*C; Germany 58%/31%; Japan 60%/29%; United Kingdom 56%/32%). We did not discriminate between PADI4 haplotypes 2 and 3 because SNP padi4_102, which differentiates between haplotypes 2 and 3, is located more than 11 kb downstream of the region of interest. Haplotype 4 (padi4_89*G, padi4_90*T, padi4_92*G, padi4_94*T, padi4_104*C, padi4_95*G, padi4_96*T) was about twice as frequent in Germany and the United Kingdom as in Japan (Germany 8%; Japan 4%; United Kingdom 9%).
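As a plausibility check on the German frequencies, the percentages given in the abstract (58.3%, 2.9%, 30.9%, and 7.8%) can be converted back into chromosome counts. The sketch below assumes that each of the 102 individuals contributes two chromosomes; the resulting counts are back-calculated for illustration and are not reported in the text.

```python
N_INDIVIDUALS = 102
N_CHROMOSOMES = 2 * N_INDIVIDUALS  # assumes two PADI4 copies per individual

# Haplotype frequencies (percent) as given in the abstract.
reported_pct = {"1": 58.3, "1B": 2.9, "2/3": 30.9, "4": 7.8}


def implied_count(pct, total=N_CHROMOSOMES):
    """Nearest whole number of chromosomes matching a reported percentage."""
    return round(pct / 100 * total)


# Back-calculated counts (an inference of this sketch, not source data).
counts = {hap: implied_count(pct) for hap, pct in reported_pct.items()}

# Round-trip: recompute the percentages from the inferred counts.
recomputed_pct = {
    hap: round(100 * n / N_CHROMOSOMES, 1) for hap, n in counts.items()
}
```

The inferred counts add up to 204 chromosomes and reproduce the reported percentages to one decimal place, which suggests that the four haplotypes account for all chromosomes in the sample.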
An exact comparison of the frequency of the haplotype 1B identified in this study with that of previously published studies was not possible. No SNP constellation comparable to haplotype 1B was described in the Japanese population. In the British study only the SNPs padi4_89, padi4_90, padi4_92, and padi4_104 were determined. However, when considering the constellation padi4_89*A, padi4_90*C, padi4_92*G, and padi4_104*C, which is characteristic of haplotype 1B, the frequencies reported in the UK (2.2%) and in the present study (2.9%) are largely similar.
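The comparison with the British data rests on projecting the seven-SNP haplotypes onto the four SNPs typed there. A minimal Python sketch of that projection follows; the allele strings are transcribed from the haplotype definitions in this article (with haplotype 1B derived from haplotype 1 by changing padi4_92 to G and padi4_96 to C), and the helper names are hypothetical.

```python
# SNP order used for the allele strings below.
SNP_ORDER = [
    "padi4_89", "padi4_90", "padi4_92", "padi4_94",
    "padi4_104", "padi4_95", "padi4_96",
]

# One allele per SNP, in SNP_ORDER, as defined in this article.
HAPLOTYPES = {
    "1":   "ACCCCGT",
    "1B":  "ACGCCGC",
    "2/3": "GTGTTCC",
    "4":   "GTGTCGT",
}

# SNPs determined in the British study, according to the text.
UK_SNPS = ["padi4_89", "padi4_90", "padi4_92", "padi4_104"]


def project(alleles, snps):
    """Reduce a seven-SNP haplotype to the alleles at the given SNPs."""
    return "".join(alleles[SNP_ORDER.index(snp)] for snp in snps)


def matching_haplotypes(constellation, snps=UK_SNPS):
    """Names of haplotypes whose projection equals the given constellation."""
    return [
        name
        for name, alleles in HAPLOTYPES.items()
        if project(alleles, snps) == constellation
    ]


# Constellation padi4_89*A, padi4_90*C, padi4_92*G, padi4_104*C.
hits = matching_haplotypes("ACGC")
```

Among the four haplotypes observed here, only haplotype 1B carries this four-SNP constellation, which is why the UK figure can serve as an approximate comparator even though padi4_96 was not typed there.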
The most remarkable finding in our study was the large number of additional novel variants identified (Fig. 1). More than 10% of the individuals studied presented with previously unknown polymorphisms. All of the exonic variations result in amino acid substitutions (265G→A, D89N; 304C→A, P102T; 392G→C, R131T) that alter the charge of the respective amino acids (D89N, R131T), or that may affect the steric arrangement of the neighboring amino acids (P102T). Because the novel intronic variations are located near the exon-intron boundaries (390194C→T, 392864C→T, and 393030A→G are located 38 bp downstream of exon 2, 85 bp upstream of exon 3, and 14 bp downstream of exon 3, respectively), one may speculate that these variations affect the process of splicing. However, intronic variants located more distantly from intron-exon boundaries may also affect the results of disease association studies. Further studies should address the questions of the functional relevance of the described amino acid substitutions and of the influence of intronic variants on PADI4 splicing. Studies focusing on the structural analysis of PADI4 by X-ray crystallographic analysis are under way. A complete structural analysis of PADI4 will help to understand the way in which PADI4 interacts with the respective substrates and how variations of PADI4 could modify substrate specificity.
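The relation between the nucleotide positions and the amino acid positions of the three exonic variants can be checked with simple arithmetic. The sketch below assumes that the nucleotide numbers (265, 304, 392) count coding-sequence positions starting at the first base of the start codon, which the text does not state explicitly; the codon lists are the standard genetic code, the charges are the usual side-chain charges at physiological pH, and the function names are hypothetical.

```python
# Standard-code codons for the amino acids involved (DNA alphabet).
CODONS = {
    "D": ["GAT", "GAC"],
    "N": ["AAT", "AAC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}

# Approximate side-chain charge at physiological pH.
CHARGE = {"D": -1, "N": 0, "P": 0, "T": 0, "R": 1}

# (nucleotide position, reference base, variant base, ref aa, codon no., variant aa)
VARIANTS = [
    (265, "G", "A", "D", 89, "N"),
    (304, "C", "A", "P", 102, "T"),
    (392, "G", "C", "R", 131, "T"),
]


def codon_position(nt):
    """Return (codon number, position within codon), both 1-based."""
    return (nt - 1) // 3 + 1, (nt - 1) % 3 + 1


def compatible_codons(ref_aa, alt_aa, pos, ref_nt, alt_nt):
    """Reference codons for which the base change yields the variant residue."""
    hits = []
    for codon in CODONS[ref_aa]:
        if codon[pos - 1] != ref_nt:
            continue
        mutated = codon[:pos - 1] + alt_nt + codon[pos:]
        if mutated in CODONS[alt_aa]:
            hits.append(codon)
    return hits


summary = {}
for nt, ref_nt, alt_nt, ref_aa, codon_no, alt_aa in VARIANTS:
    codon, pos = codon_position(nt)
    summary[f"{ref_aa}{codon_no}{alt_aa}"] = {
        "codon_matches": codon == codon_no,
        "position_in_codon": pos,
        "possible_ref_codons": compatible_codons(ref_aa, alt_aa, pos, ref_nt, alt_nt),
        "charge_change": CHARGE[alt_aa] - CHARGE[ref_aa],
    }
```

Under this numbering assumption, all three nucleotide positions fall in the stated codons, each base change is compatible with the stated substitution, and only D89N and R131T change the side-chain charge, in line with the description above.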
A further interesting observation is that four out of six novel variants present in 8 out of 11 individuals were found to be in cis linkage with the susceptibility haplotype 2/3. This finding is all the more interesting when one considers that haplotype 2/3 is about half as frequent as haplotype 1. The linkage of the newly described PADI4 variants with the susceptibility haplotype 2/3 raises the question of whether the phenomenon of association with RA is affected, not only by the SNP constellation characterizing the so-called susceptibility PADI4 haplotypes 2/3 and 4 themselves but also by additional variants predominantly found in linkage with these susceptibility haplotypes. Such additional variants cannot be identified by simple SNP diagnostic procedures such as amplification refractory mutation system, TaqMan 5′ allelic discrimination assays or Invader assays. We therefore emphasize that further studies on disease association of PADI4 should be performed using sequencing-based approaches that allow the identification of novel variants and the characterization of their exact haplotypic context.
In view of the variability of PADI4 and the need for a correct attribution of novel variants to the respective PADI4 haplotypes, we feel it is necessary to establish a PADI4 nomenclature allowing a clear-cut description of PADI4 variants.
We thank Gisela Diederich for excellent technical assistance.