Introduction

Systemic lupus erythematosus (SLE) is a systemic autoimmune disease that predominantly affects women of child-bearing age (15–40 years). The prevalence of SLE in China has been estimated at 31–70 cases per 100,000 people [1], with a female-to-male ratio of 9:1. It is well known that both genetic and environmental factors contribute to disease susceptibility [2–6]. Numerous variants at autosomal loci have been found to be associated with SLE in multiple ethnic groups through candidate gene and genome-wide association studies (GWASs) [7]. Despite the advances in genetic studies over recent years, the pathogenesis of SLE remains poorly understood.

Because of the marked sex difference in disease prevalence, involvement of genetic variants on the X chromosome has long been suspected. In recent years, genetic variants of several genes on the X chromosome, such as MECP2, IRAK1, TLR7, and PRPS2, have been confirmed to be associated with SLE [8–11]. In particular, the single nucleotide polymorphism (SNP) rs3853839 in the 3′ untranslated region (UTR) of TLR7 was shown to be associated with SLE, especially in Chinese and Japanese male subjects compared with females [10], and a fine mapping study by Kaufman et al. [11] in four different ancestral groups suggested that the nonsynonymous SNP rs1059702 (S196F) within IRAK1 might be a causal risk variant for SLE. More recently, we performed a meta-analysis of GWASs in Chinese Han populations and followed up the top findings in four additional Asian cohorts [12]. Besides confirming the previously reported associations within the IRAK1-MECP2 (rs1059702) and L1CAM-MECP2 loci, we also identified a genetic variant (rs7062536) in PRPS2 on Xp22.3 as a novel susceptibility locus and novel independent associations within the NAA10 (rs2070028) and TMEM187 (rs17422) loci.

In this study, aiming to discover additional X-linked genetic risk variants for SLE, we performed a follow-up study of our previously published GWAS dataset [13] by improving the coverage of genetic variation through imputation and validating the top findings in three additional independent Chinese Han sample collections. We discovered a novel susceptibility locus, LINC01420 on Xp11.21, associated with SLE.

Methods

Subjects

SLE cases and controls were all female and were recruited from multiple hospitals in three geographic regions of China (central, southern, and northern China). All subjects were of self-reported Chinese Han origin. Samples in the GWAS discovery stage (1017 SLE cases and 539 controls) were recruited from central China [13]. Samples in the replication studies were recruited from multiple regions in China, mainly from central (replication: 1156 cases and 2330 controls), southern (replication: 1012 cases and 335 controls), and northern (replication: 274 cases and 133 controls) China. All patients were diagnosed by at least two experienced physicians using the American College of Rheumatology (ACR) criteria revised in 1997 [14]. Controls were geographically and ethnically matched to cases and were clinically evaluated to be free of SLE, other autoimmune disorders, and any family history of autoimmune disease. Clinical information for all patients and controls was collected through a structured questionnaire. Written informed consent was obtained from all participants. This study was approved by the Institutional Ethical Committees of The First Affiliated Hospital of Anhui Medical University, China–Japan Friendship Hospital, Jiangmen Central Hospital, and The Third Affiliated Hospital of Sun Yat-Sen University, in accordance with the principles of the Declaration of Helsinki. The information for all subjects is summarized in Table 1.

Table 1 Summary of samples used in GWASs and replication studies

Genotyping

Genotyping in the discovery stage for the central China cohort was performed using the Illumina Human610-Quad BeadChip array (Illumina, Inc., San Diego, CA, USA). Genomic DNA was isolated from peripheral blood mononuclear cells (PBMCs) by standard procedures using FlexiGene DNA kits (QIAGEN GmbH, Hilden, Germany) and was diluted to working concentrations of 50 ng/μl for genome-wide genotyping and 15–20 ng/μl for the validation study. The X chromosome SNPs in the validation stage were genotyped using the Sequenom MassArray iPlex Gold platform (Sequenom, Inc., San Diego, CA, USA).

Statistical analyses

Quality control criteria were applied to genotyped SNPs, and those with minor allele frequency (MAF) <5% in cases and controls were excluded. SNPs with a genotype missing rate >10% or Hardy–Weinberg equilibrium (HWE) P < 3.14 × 10⁻⁶ in controls were also excluded. Association analysis was performed in PLINK v1.07 [15] using the logistic regression test. We selected 12 SNPs within novel or unpublished loci with P < 1.00 × 10⁻² for further validation in 2442 cases and 2798 controls (SNP missing rate <10% and HWE for female controls with P > 1.00 × 10⁻²).
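The SNP-level filters described above can be illustrated with a minimal Python sketch. It is not the PLINK procedure used in the study: the function names are hypothetical, HWE is tested with a 1-df chi-square approximation rather than an exact test, and the MAF threshold is read as excluding a SNP when MAF falls below 5% in either cases or controls, which is one possible interpretation of the text. Genotype counts are given as (AA, AB, BB) for female samples.

```python
import math

MAF_MIN = 0.05        # minor allele frequency threshold from the text
MISSING_MAX = 0.10    # genotype missing-rate threshold from the text
HWE_P_MIN = 3.14e-6   # HWE P-value threshold applied to controls


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from diploid genotype counts."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """HWE P value from a 1-df chi-square test (assumption: not an exact test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(case_counts, control_counts, missing_rate):
    """Return True if a SNP survives the MAF, missingness, and HWE filters."""
    if maf(*case_counts) < MAF_MIN or maf(*control_counts) < MAF_MIN:
        return False
    if missing_rate > MISSING_MAX:
        return False
    return hwe_chi2_p(*control_counts) >= HWE_P_MIN
```

For example, `passes_qc((25, 50, 25), (25, 50, 25), 0.02)` keeps the SNP, whereas control counts of `(50, 0, 50)` (no heterozygotes) would fail the HWE filter.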

To limit the impact of population stratification in the validation and combined analyses, cases and controls in each independent validation sample were matched by ethnic and geographic origin. Fixed-effects meta-analysis of the four independent studies (the discovery GWAS and the three validation cohorts: central, southern, and northern) was performed using the inverse-variance-weighted effect size method in Metasoft version 2.0.0 [16].

We performed the combined analysis of the central region cohort (discovery and central validation), the southern validation cohort, and the northern validation cohort using fixed-effects meta-analysis. Heterogeneity across studies was assessed with the I² statistic, with I² < 50% and P het > 0.05 taken to indicate no significant heterogeneity (Table 2).
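As an illustration of the calculations named above, the following Python sketch pools per-study log odds ratios with inverse-variance weights and derives Cochran's Q, I², and the heterogeneity P value. It is a simplified stand-in for Metasoft, not its implementation; the function names are hypothetical and the input values in the usage line are made up for demonstration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(half ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis of log odds ratios."""
    if len(betas) < 2 or len(betas) != len(ses):
        raise ValueError("need matching estimates from at least two studies")
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided P value
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": pooled, "se": pooled_se, "or": math.exp(pooled),
            "p": p, "q": q, "i2": i2, "p_het": chi2_sf(q, df)}


# Hypothetical inputs: two studies with log ORs 0.2 and 0.4, each with SE 0.1.
result = fixed_effects_meta([0.2, 0.4], [0.1, 0.1])
```

Because the weights are the reciprocals of the squared standard errors, larger cohorts dominate the pooled estimate, which is why a small cohort with an opposite effect direction can raise I² without changing the pooled odds ratio much.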

Table 2 Validation of top SNP rs5914778 in southern and northern regions of China

Imputation

Imputation of the X chromosome SNPs was performed on the discovery dataset for female individuals using X chromosome nonpseudoautosomal region data from the 1000 Genomes project (phase 1 integrated version 3) as reference [17]. As part of the quality control, SNPs with accuracy score <0.8, missing rate >10%, MAF <5% in cases and controls, or HWE P < 2.89 × 10⁻⁷ in controls were excluded. Association testing was carried out by logistic regression. The imputed SNPs did not yield substantially stronger association signals than the genotyped SNPs (Fig. 1), and none showed a P value that would warrant validation beyond the genotyped SNPs. Therefore, we proceeded with the validation of the selected genotyped SNPs that resided in novel regions.

Fig. 1

Manhattan plot of the X chromosome association analysis on SLE. Association results (−log₁₀(P value)) are plotted against the physical location of SNPs and include both imputed and genotyped SNPs. Positions and genes are based on National Center for Biotechnology Information build 37 (http://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/). The color of each gene label indicates whether the locus is known, suggestive, or novel. Known loci are defined as loci published previously, whereas novel loci are defined as loci validated in the current study at genome-wide significance (P < 5 × 10⁻⁸)

Results

X chromosome discovery and first-stage study

We conducted X chromosome association tests of SLE in the GWAS dataset which consists of 1017 cases and 539 controls, after stringent quality control filtering (see Statistical analyses). The discovery analysis revealed strong evidence of association for all previously identified susceptibility loci on the X chromosome and suggested additional novel risk loci (Additional file 1: Table S1).

To further investigate the observed associations, we imputed the genotypes of additional SNPs that were not genotyped using IMPUTE v2.0 (University of Oxford, Oxford, UK). After stringent quality control filtering (see Imputation), no imputed SNPs showed better P values that would warrant further validation on top of the genotyped SNPs. Therefore, to validate the findings from the discovery analysis, we selected the top SNPs from 14 independent new loci with suggestive association with SLE (P < 10⁻²) for a follow-up analysis in an additional 1156 cases and 2330 controls of Chinese Han descent from central China. Of the 12 successfully genotyped SNPs, two showed association at P < 0.05 in the validation samples and six showed consistent effects between the discovery and validation samples. The meta-analysis results for the 12 SNPs in the combined discovery and central validation dataset, totaling 2173 cases and 2869 controls, using fixed-effects and random-effects models, are presented in Table 3. The combined analysis identified a novel association at rs5914778 within LINC01420 with SLE at genome-wide significance (P = 1.00 × 10⁻⁸; odds ratio (OR) = 1.32).

Table 3 Validation of selected SNPs on the X chromosome in the central region

Further replication of selected SNPs and the heterogeneity test

We performed further replication analysis of rs5914778 in two additional independent samples of Chinese Han descent from the southern and northern regions of China. The replication in the southern Chinese sample cohort, consisting of a total of 1012 cases and 335 controls, provided strong supporting evidence for the association of rs5914778 with SLE (P = 5.31 × 10⁻⁵; OR = 1.51). The meta-analysis of the samples from the central and southern regions, totaling 3185 cases and 3204 controls, provided robust evidence for the association of rs5914778 (P = 5.26 × 10⁻¹²; OR = 1.35). In addition, the strength of the association is very consistent without any evidence of heterogeneity (P het = 0.46, I² = 0) (Table 2).
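The combined central and southern result can be checked approximately from the summary statistics quoted in the text. The Python sketch below back-calculates a standard error for each log odds ratio from its reported OR and two-sided P value, assuming a Wald-type test, and then pools the two estimates with inverse-variance weights. This is only a rough consistency check under those assumptions: it uses rounded published values and two summary estimates rather than the per-cohort inputs analyzed in Metasoft, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def log_or_and_se(odds_ratio, p_value):
    """Recover (log OR, SE) from an OR and its two-sided Wald P value."""
    beta = math.log(odds_ratio)
    z = -NormalDist().inv_cdf(p_value / 2.0)  # magnitude of the Wald z-score
    return beta, abs(beta) / z


def pool(estimates):
    """Inverse-variance-weighted pooling of (log OR, SE) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    z = beta * math.sqrt(sum(weights))
    return math.exp(beta), math.erfc(abs(z) / math.sqrt(2.0))


# Values quoted in the text for rs5914778.
central = log_or_and_se(1.32, 1.00e-8)    # discovery + central validation
southern = log_or_and_se(1.51, 5.31e-5)   # southern replication
pooled_or, pooled_p = pool([central, southern])
print(f"pooled OR = {pooled_or:.2f}, pooled P = {pooled_p:.2e}")
```

Under these assumptions the pooled estimate should land near the reported OR of 1.35 and within the same order of magnitude as the reported P value; exact agreement is not expected because of rounding and the different input granularity.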

Intriguingly, this SNP did not show association with SLE in the northern sample with a total of 274 cases and 133 controls (P = 0.33, OR = 0.85), and the SNP actually showed an opposite effect in the northern sample as compared with the central and southern samples (Table 2). This could be because of the very small sample size of the northern replication cohort. Further studies are needed to confirm the heterogeneity of this association between the northern and central/southern Chinese populations.

Lastly, we performed a joint analysis of all of the discovery, central, southern, and northern validation samples, totaling 3459 cases and 3337 controls, using a fixed-effects meta-analysis. The association at rs5914778 (LINC01420) on Xp11.21 surpassed genome-wide significance (P = 1.22 × 10⁻¹⁰; OR = 1.31), but moderate heterogeneity of association was observed among the samples (P het = 0.034, I² = 65.3) (Table 2 and Fig. 2).

Fig. 2

a Regional association plot of the new locus (Xp11.21, LINC01420). The association results (−log₁₀(P value)) of SNPs from the discovery analysis are shown against their map positions (NCBI build 37). Validated SNP rs5914778 is labeled purple. b Regional association plot after conditioning on rs5914778. The association results (−log₁₀(P value)) of SNPs from the discovery analysis are shown after conditioning on rs5914778. All map coordinates are based on NCBI build 37. chrX X chromosome, NCBI National Center for Biotechnology Information

Discussion

Through the discovery and validation analyses in two independent female samples from the central region of China, we have discovered a novel SLE susceptibility locus at rs5914778 (LINC01420) on Xp11.21 at genome-wide significance. Further replication analysis in the independent southern Chinese sample provided strong supporting evidence for the association. The analysis of the independent northern Chinese sample failed to replicate the association, but the sample size of the northern cohort is very small.

rs5914778 is located within a long intronic region between the first and second exons of LINC01420. LINC01420 is a long noncoding RNA with enhancers marked by histone modifications in human umbilical vein endothelial cells (HUVEC) and human skeletal muscle myoblasts (HSMM) based on HaploReg annotation [18] (Additional file 2: Table S2). LINC01420 was found to have sex-specific DNase I hypersensitivity patterns, with H3K4me3 histone enrichment and strong expression in females only [19]. According to the regulatory annotation information from the ENCODE project [20], this SNP is within a DNase I hypersensitive site that was detected in a lymphoblastoid cell line. LINC01420 may help maintain X inactivation, which avoids X-linked gene overexpression through dosage compensation in females [21]. Long noncoding RNAs have been shown in recent years to be associated with many complex diseases such as psoriasis, breast cancer, gastric cancer, colorectal cancer, osteosarcoma, adrenocortical cancer, and cardiovascular diseases [22–28]. Some noncoding RNAs also play a role in the pathogenesis and progression of hepatocellular carcinoma, and may act as therapeutic targets for hepatocellular carcinoma [29]. To determine whether LINC01420 expression differs between females and males, we performed gene expression analyses using the gene expression data from CD4+ T cells and monocytes from 461 healthy individuals [30] and the gene expression data from PBMCs of 82 controls [31] in GEO datasets. However, no expression values for LINC01420 were available in these datasets, suggesting that LINC01420 may be expressed at levels too low to be detected in blood cells from healthy individuals. Hence, more work will be needed to elucidate the biological mechanism through which LINC01420 influences SLE pathogenesis.

We also observed another SNP in this locus, rs5913992, in perfect linkage disequilibrium with our top SNP rs5914778 (r² = 1), that was predicted to be functional by RegulomeDB (LSJU, Stanford, CA, USA) with a score of 2b (likely to affect binding of motifs, transcription factors, and enhancer histone marks) [32]. rs5913992 also lies within overlapping binding sites for six transcription factors (RELA, CTCF, CEBPB, RAD21, ZNF143, and SMC3) that were detected by ChIP-seq analysis in lymphoblastoid, epithelial, endothelial, breast cancer, and myeloid leukemia cell lines (Fig. 3). The RegulomeDB prediction indicates that this SNP overlaps a potential consensus EWSR1-FLI1 binding motif within these six transcription factor binding sites (Additional file 3: Table S3).

Fig. 3

Transcription factor binding sites at SNP rs5913992 (ENCODE). The transcription factor track from the UCSC (University of California, Santa Cruz) genome browser shows regions where transcription factors responsible for modulating gene transcription bind to DNA, as assayed by ChIP-seq

We observed the same risk effect at rs5914778 in the central and southern validation results, while the opposite effect was observed in the northern validation results. The association in the northern cohort was significantly heterogeneous relative to the central and southern cohorts (P het = 0.034, I² = 65.3). Several previous studies have demonstrated differences in disease risk between northern and southern Chinese [33–35], and further studies in more northern Chinese samples will be needed to confirm the genetic heterogeneity of this susceptibility locus among the central, southern, and northern Chinese populations.

Conclusions

We performed a three-stage X chromosome association analysis of SLE in the Chinese Han population and discovered a novel susceptibility locus on Xp11.21. Although further studies will be required to understand how the locus influences the etiology of SLE, the discovery of this novel locus has further expanded the role of the X chromosome in the development of SLE in the Chinese Han population.