Background

Atopic dermatitis (AD) is a chronic, relapsing, inflammatory skin disorder characterized by eczematous lesions and dry, itchy skin. AD seems to be caused by a combination of hereditary and environmental factors. Although AD has features of a multigenic syndrome, it tends to run in families and commonly begins to manifest in childhood [1]. In a large cohort study of family history, AD was determined to be inherited in an autosomal dominant fashion [2]. There are strong genetic heritable components in many other common and complex diseases [3].

AD occurs at a rate as high as 20% in children [4]. Understanding the genetic background, early discovery, and best therapies for AD is important. Thus, identification of causal variants associated with a common complex trait like AD is needed for early detection.

Whole-exome sequencing (WES) is a technique that involves the sequencing of all protein-coding genes, known as the exome, which comprises about 3 × 10^7 bases. Although the exome constitutes less than 2% of the human genome, mutations in the exome can have more severe consequences than do mutations in the other 98% of the human genome [5]. The purpose of WES is to identify variations by filtering big data collected from all protein-coding regions. These data include disease-causing mutations inherited in a Mendelian pattern or disease-predisposing single-nucleotide polymorphisms (SNPs) found in both common and complex disorders [6]. To identify familial causative variants of early-onset AD, we recruited three pedigrees from families with a history of AD and severe clinical phenotypes. We then performed WES on all involved individuals. Alleles were compared to identify causal variants and subsequently validated in a population-based case-control study using Sanger sequencing.

A considerable number of rare and common variants were found, and 14 overlapping genes were detected in the three families. The common disease-common variant (CD-CV) hypothesis can best be tested in genome-wide association studies (GWASs). However, the hypothesis that individually gathered rare variants have severe effects arose from issues of missing heritability in the CD-CV hypothesis [7]. Common variants of common diseases were recently reported to be shared among different races [8]. The importance of common and rare variants is still uncertain [9]. Thus, we aimed to identify not only rare variants, but also common variants.

Linkage analysis is used to determine the rough positions of causal genes relative to known genetic markers. An AD linkage analysis found that chromosome region 1q21 contains an epidermal differentiation complex (EDC). The EDC includes various AD-related genes (e.g., loricrin, involucrin, filaggrin, and the S100 family) [10]. We used AD-related loci to confirm the association of the 14 overlapping genes with AD [11].

Here, we report the WES results from families with early-onset AD, and we suggest that new variants, COL6A6 polymorphisms, may serve as novel candidates for the detection of early-onset AD.

Methods

Patients

Peripheral blood samples were obtained from three families with a history of AD. Each family consists of 2 affected and 2 unaffected individuals (Additional file 1: Figure S1). We attempted to eliminate environmental factors as much as possible by recruiting early-onset cases. This study was reviewed and approved by the Chung-Ang University Hospital Institutional Review Board. Each family member was diagnosed with AD by a dermatologist. All patients and children developed AD before 2 years of age and were selected based on high IgE level (>1000) and SCORAD score (>50). Additionally, 112 AD patients and 61 control subjects under 2 years 9 months old were enrolled to validate the association between the candidate variants and atopic dermatitis (Additional file 1: Table S1).

Whole-exome sequencing

Genomic DNA was isolated from the peripheral blood of the members of the three families using a QIAamp DNA Mini Kit (Qiagen Inc, Valencia, CA, USA). The DNA quality and quantity were assessed with a Nanodrop spectrometer (Nanodrop Technologies, Wilmington, DE, USA) and a Qubit fluorometer (Life Technologies, Grand Island, NY, USA). WES was performed using SureSelect Human All Exon V4 + UTR 71 Mb (Agilent, CA, USA), following the manufacturer’s standard protocol. Genomic DNA was sheared using Covaris (Covaris, Woburn, MA, USA). A paired-end DNA sequencing library was prepared through shearing, end-repair, A-tailing, peak detection, PE adaptor ligation, and amplification. After the library was hybridized with bait sequences for 24 h, it was purified and amplified with an index barcode tag, and the library quality and quantity were determined. The exome library was sequenced with the 100-bp paired-end mode of the HiSeq SBS kit.

Whole-exome sequencing processing and alignment

Sequence reads in FASTQ format were mapped to the human assembly UCSC hg19 using the Burrows-Wheeler Aligner (BWA, v0.7.7) [12] with “mem” and seed value parameters “-k 45” to create SAM files with correct mate pair information. The read group tag included the sample name. Picard (v1.92) was then used to convert the SAM files to compressed BAM files and then to sort the BAM files by chromosome coordinate. The Genome Analysis Toolkit (v2.3.9Lite) [13] was used to locally realign the BAM files at intervals corresponding to potential insertion/deletion (indel) alignment errors. Insertions and deletions were identified with Mutect [14] and a GATK Somatic Indel Detector, respectively. Single-nucleotide variants and indels were annotated using snpEff (v3.6c) [15] to classify variants as synonymous, non-synonymous, missense, frameshift point mutations, or frameshift indels.
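The alignment step described above can be sketched as a small helper that assembles the `bwa mem` command line. This is a minimal illustration, not the authors' actual script: the file names, the sample identifier, and the read group fields other than the sample name (ID, PL, LB) are placeholders assumed for the example. Only the `-k 45` seed length and the presence of the sample name in the read group come from the text.

```python
import shlex


def bwa_mem_command(reference, fastq_r1, fastq_r2, sample, seed_length=45):
    """Build a `bwa mem` command line for one paired-end sample."""
    # Read group header line: SM carries the sample name, as stated in the text.
    # The ID, PL and LB values are placeholders assumed for this sketch.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA\\tLB:{sample}"
    return [
        "bwa", "mem",
        "-k", str(seed_length),  # minimum seed length
        "-R", read_group,        # read group header line
        reference, fastq_r1, fastq_r2,
    ]


# Hypothetical file names; replace with real paths before running bwa.
cmd = bwa_mem_command("hg19.fa", "A1_R1.fastq.gz", "A1_R2.fastq.gz", "A1")
print(shlex.join(cmd))
```

The command is returned as a list so that it can be passed directly to `subprocess.run` with standard output redirected to a SAM file.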

Annotation

Filter 1: SnpEff (http://snpeff.sourceforge.net/index.html) is a type of variant annotation and an effect prediction tool. It annotates and predicts the effects of variants, such as amino acid changes, on genes. Variants produce effects of different “types” (e.g., non-synonymous, stop-gained, insertion, deletion).

Filter 2: Impact prediction by SnpEff shows results of putative variants, making it easier to quickly categorize and prioritize variants (High: Splice_Site_Acceptor, Splice_Site_Donor, Start_Lost, Frame_Shift, Stop_Gained; Moderate: Non_Synonymous_coding, Codon_Change, Codon_Insertion and Deletion, etc.).

Filter 3: SIFT and Polyphen2 of dbNSFP

The SIFT score predicts whether an amino acid substitution affects protein function. SIFT prediction is based on the degree of amino acid conservation in aligned segments derived from closely related sequences, as collected through PSI-BLAST. The range was 0 to 1; substitutions with scores lower than 0.05 were predicted to be damaging (a lower score signified a greater detrimental effect), whereas substitutions with higher scores were predicted to be tolerable.

The Polyphen2 HDIV score was based on HumDiv, i.e., hdiv_prob. The score ranged from 0 to 1, and a higher score suggested a greater degree of predicted damage. A prediction of “probably damaging” corresponded to scores between 0.957 and 1, “possibly damaging” for scores ranging from 0.453 to 0.956, and “benign” for those between 0 and 0.452.
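The score cutoffs quoted above for SIFT and Polyphen2 HDIV can be written as two small classification functions. This is an illustrative sketch only: the function names and the handling of missing scores are assumptions, and the boundaries assume scores reported to three decimal places, as in the ranges given in the text.

```python
def classify_sift(score):
    """Classify a SIFT score in [0, 1]; None means no score is available."""
    if score is None:
        return "unknown"
    # Scores lower than 0.05 are predicted to be damaging.
    return "damaging" if score < 0.05 else "tolerated"


def classify_polyphen2_hdiv(score):
    """Classify a Polyphen2 HDIV score in [0, 1]; None means no score."""
    if score is None:
        return "unknown"
    if score >= 0.957:
        return "probably damaging"
    if score >= 0.453:
        return "possibly damaging"
    return "benign"


print(classify_sift(0.01), "/", classify_polyphen2_hdiv(0.98))
print(classify_sift(0.40), "/", classify_polyphen2_hdiv(0.20))
```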

Filter 4: The PhyloP of dbNSFP detects sites under negative or positive selection while allowing for changes in evolution rate over the branches of the phylogenetic tree. A higher PhyloP score indicates a better conserved site (http://varianttools.sourceforge.net/Annotation/DbNSFP).

Filter 5: PhastCons measures the strength of purifying selection acting on a DNA sequence. A high PhastCons score (0.2) is strong evidence that a genomic region is functionally important.

Filter 6: The 1000 Genome allele frequency of dbNSFP selects variants with frequencies of less than 0.01 or those with unknown frequencies.

Filter 7: The in-house Korean database at the Theragen Etex Bio Institute selects variants with minor allele frequencies (MAFs, less than 0.02 or unknown).
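The two allele frequency filters (filters 6 and 7) amount to keeping a variant when its frequency is below the cutoff or unknown in both reference sets. A minimal sketch follows, assuming each variant is a dictionary with hypothetical keys `kg_af` (1000 Genomes frequency) and `korean_maf` (in-house Korean frequency), with `None` standing for an unknown frequency. The variant records are invented for illustration.

```python
def rare_or_unknown(freq, cutoff):
    """True when a frequency is missing or below the cutoff."""
    return freq is None or freq < cutoff


def passes_frequency_filters(kg_af, korean_maf, kg_cutoff=0.01, korean_cutoff=0.02):
    """Apply the 1000 Genomes cutoff (0.01) and the Korean cutoff (0.02)."""
    return rare_or_unknown(kg_af, kg_cutoff) and rare_or_unknown(korean_maf, korean_cutoff)


# Invented example records, not data from this study.
variants = [
    {"id": "var1", "kg_af": 0.004, "korean_maf": 0.017},
    {"id": "var2", "kg_af": 0.12, "korean_maf": 0.18},
    {"id": "var3", "kg_af": None, "korean_maf": None},
]

rare = [v["id"] for v in variants if passes_frequency_filters(v["kg_af"], v["korean_maf"])]
print(rare)
```

Treating unknown frequencies as passing keeps novel variants in the candidate list, which matches the "or unknown" wording of both filters.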

Sanger sequencing

Three SNPs were selected for Sanger validation. PCR amplification of all SNPs was performed at 95 °C for 10 min, followed by 35 cycles at 95 °C for 30 s, 55–58 °C for 30 s and 72 °C for 40 s, with a final extension at 72 °C for 1 min 30 s. The PCR mixtures (total volume 50 μL) contained 25 μL of 2X EF-Taq premix (SolGent, Seoul, South Korea), 2.5 μL of each oligonucleotide primer (10 pmol/μL), 18 μL of distilled water, and 2 μL of template containing 20 ng genomic DNA. The PCR products were purified with a PCR purification kit (Favorgen, Pingtung, Taiwan) and sequenced on an Applied Biosystems 3500 DNA sequencer (Foster City, CA, USA) according to the manufacturer’s instructions.

Results

Whole-exome sequencing in three families with atopic dermatitis

WES was performed on three families with AD. To limit our study to genetic factors, we recruited three pedigrees from families with severe clinical AD phenotypes and attempted to minimize environmental factors by selecting early-onset cases. The WES analysis showed that all affected individuals were heterozygous for the identified variants, while unaffected individuals were homozygous wild-type (Table 1).

Table 1 Genotypes of overlapping common and rare variants in three families

A large amount of WES data from each individual exome were filtered in a stepwise fashion to isolate candidate variants related to AD (Additional file 1: Figure S2). The variants collected from the three families were counted after each filtering process. We obtained an average of 176 common variants per family after filter 4 and an average of 48 rare variants following the 1000 Genome filter 5. An average of 44 rare variants were detected after the Korean filter (Table 2).

Table 2 Number of variants by filter in three families using a dominant genetic model

Amino acid changes and variant types (non-synonymous SNP, stop-gained, insertion, deletion) were determined using the variant annotation from the step 1 filter. A non-synonymous SNP is a single-nucleotide change that results in a codon for a different amino acid. Considerable numbers of non-synonymous SNPs were observed in the three families (data not shown).

The variants identified by filter 5 not only indicated common variants (MAF greater than 1%), but also possible functional variants predicted to impair protein function as well as sequences that are highly conserved among 100 vertebrate species.

To find more critical variants among the many filtered genes, we confirmed the overlapping genes of common variants from filter 5 in the three families (Table 1 and Additional file 1: Figure S3). There were 14 overlapping genes in filter 5, four of which reached filter 7 and could be called “rare variants.” Risk alleles were identified in AD patients by comparison with healthy controls (Table 1). The number of overlapping genes is also depicted as a Venn diagram (Additional file 1: Figure S3). Variants of COL6A6 appeared in all three families (Tables 3, 4 and 5), and two COL6A6 SNPs were detected in Family A (Table 3).

Table 3 Family-specific common and rare variants in Family A
Table 4 Family-specific common and rare variants in Family B
Table 5 Family-specific common and rare variants in Family C
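The search for overlapping genes can be sketched as a count of how many families carry a filtered variant in each gene. This is only an illustration: the threshold of at least two families is an assumption (the text does not state the exact overlap criterion), and the gene sets below are placeholders apart from COL6A6, which the text reports in all three families.

```python
from collections import Counter


def shared_genes(genes_by_family, min_families=2):
    """Return genes carrying filtered variants in at least `min_families` families."""
    counts = Counter(
        gene for genes in genes_by_family.values() for gene in set(genes)
    )
    return sorted(gene for gene, n in counts.items() if n >= min_families)


# Placeholder gene sets; GENE_X, GENE_Y and GENE_Z are hypothetical names.
genes_by_family = {
    "A": {"COL6A6", "GENE_X", "GENE_Y"},
    "B": {"COL6A6", "GENE_X"},
    "C": {"COL6A6", "GENE_Z"},
}

print(shared_genes(genes_by_family))     # genes seen in two or more families
print(shared_genes(genes_by_family, 3))  # genes seen in all three families
```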

The genotypes of the 14 genes were further assessed in an exome analysis. Affected fathers and children were heterozygous, while unaffected mothers and children were homozygous wild type for all 14 candidate genes (Table 1).
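The segregation pattern described here (affected members heterozygous, unaffected members homozygous wild type) can be expressed as a simple check under a dominant model. The sketch below assumes VCF-style genotype strings ("0/0", "0/1", "1/1") and hypothetical member labels. It is not the authors' analysis code.

```python
def normalize_genotype(gt):
    """Turn phased or unordered genotypes such as '1|0' into '0/1'."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def segregates_dominant(genotypes, affected):
    """True if every affected member is 0/1 and every unaffected member is 0/0."""
    for member, gt in genotypes.items():
        expected = "0/1" if affected[member] else "0/0"
        if normalize_genotype(gt) != expected:
            return False
    return True


# Hypothetical family with 2 affected and 2 unaffected members.
genotypes = {"father": "0/1", "mother": "0/0", "child1": "1|0", "child2": "0/0"}
affected = {"father": True, "mother": False, "child1": True, "child2": False}
print(segregates_dominant(genotypes, affected))
```

A variant that is heterozygous in an unaffected member, or absent in an affected member, fails the check and would be dropped from the candidate list.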

The chromosome position, amino acid change, and functional prediction score for each of the 14 overlapping genes are presented for each family. The 14 candidate genes were deemed functionally interesting, as supported by the SIFT scores (probability of being damaging/deleterious) and the results of the PhyloP analysis (highly conserved among 100 vertebrate species). The low SIFT scores and high PhyloP and PhastCons scores of the COL6A6 variants found in all three families suggest a deleterious effect on protein function. Furthermore, when the Korean filter was applied to COL6A6, all three families showed a genetic variant with an MAF range of 1.7–18%, as measured in 800 Korean subjects (Tables 3, 4 and 5).

To reduce errors from WES, COL6A6 was analyzed by Sanger sequencing to detect SNPs in all three families. The variants we identified were in positions consistent with those of the WES results. In Family A, the missense mutations c.5216G > A, p.Arg1739Glu (common variant), and c.2716C > T, p.Arg906Cys (rare variant) were detected in the affected family members, but not in unaffected family members (Fig. 1a and b). In Family B, the missense mutation c.1666C > T, p.Pro555Ser (common variant) was only found in the affected members (Fig. 1c). In Family C, the missense mutation c.2716C > T, p.Arg906Cys (rare variant) was also only detected in the affected members (Fig. 1d).

Fig. 1 Single-nucleotide polymorphisms (SNPs) in coding regions of COL6A6 in three families. The missense mutation was detected by Sanger sequencing analysis in each family (a-d)

Additionally, to investigate the possibility of COL6A6 variants (rs16830494, rs59021909, and rs200963433) as candidate risk factors for AD, Sanger sequencing was also performed in a case-control study involving 112 patients with AD and 61 control subjects. The allele and genotype frequencies were counted (Table 6). Odds ratios (OR) and 95% confidence intervals (CI) were estimated for all risk factors (Additional file 1: Table S2). No significant associations were found for these three variants under either dominant or recessive models. However, the minor allele (A) of the common variant rs16830494 showed a tendency toward lower frequency. Homozygosity for the rs16830494 minor allele (AA) and for the rs59021909 minor allele (TT) was more frequent in AD cases than in controls. rs200963433 is a rare variant; the 3.5% heterozygous frequency (CT) of rs200963433 seen in the AD cases was roughly double the 1.6% frequency observed in healthy controls (Table 6). The frequency of rs200963433 in Sanger sequencing was also compared with its frequency in 1000 global, 286 East Asian, and 800 Korean subjects. The 1.7% MAF of rs200963433 in the 800 Koreans surveyed nearly matched that of the 61 healthy controls in Sanger sequencing, and the MAF was elevated in AD cases (Tables 3 and 6).
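For reference, an odds ratio with a 95% confidence interval can be computed from a 2 × 2 table as sketched below. The normal approximation on the log scale (Woolf's method) and the 0.5 continuity correction for empty cells are assumptions of this sketch, since the text does not state which method was used. The example counts (4 of 112 cases and 1 of 61 controls carrying CT) are back-calculated from the reported 3.5% and 1.6% frequencies and are not taken from Table 6.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio and Woolf 95% CI from the four cells of a 2 x 2 table."""
    cells = [case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers]
    if min(cells) == 0:
        # Continuity correction so that the ratio and its variance are defined.
        cells = [x + 0.5 for x in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts inferred from the reported percentages (an assumption of this sketch).
or_value, ci_low, ci_high = odds_ratio_ci(4, 108, 1, 60)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

With so few carriers the interval is wide and spans 1, which is consistent with the absence of a significant association for the rare variant.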

Table 6 Allele and genotype frequencies of COL6A6 polymorphisms in 112 Korean AD patients and 61 controls

We compared the genetic loci between AD candidate variants identified via WES and in AD-linked chromosomal regions. In previous studies, AD-related chromosome loci were detected using AD linkage analysis. The CDX1, ANKRD35, and TUFT1 genes were present at positions 5q31-33 and 1q21. COL6A6 was in close proximity to the 3q21 locus, which is known to be linked to AD (Table 7).

Table 7 Five candidate genes within AD susceptibility loci identified through genetic linkage analysis

We also detected SNPs in filaggrin (FLG) and FLG2 in these three families. FLG polymorphisms at the 1q21 locus were observed in each of the three families (Additional file 1: Table S3).

Discussion

Early-onset AD is a phenotype that may be associated with a higher risk of multiple allergies and asthma [16]. The identification of specific genes predictive of early-onset AD may lead to AD prevention and better management.

To identify familial candidate genes related to early-onset AD, we recruited three families with this phenotype. A family-based design has the advantages of being cost-effective and of being able to uncover rare variants not detectable in a population study [17]. De novo gene mutations capable of influencing AD susceptibility can also be identified in affected individuals by comparison with unaffected individuals in a family.

Family-specific candidate AD genes were detected using WES. Considerable numbers of common and rare variants were identified in each of the three families. To identify highly critical genes for AD, we confirmed 14 overlapping genes in these families. Variants of the overlapping genes were predicted to be deleterious through functional prediction algorithms. The results of previous AD association studies and the functions of candidate genes were also examined.

Common and rare variants of COL6A6 were found in all three families. The COL6A6 gene encodes collagen type VI alpha 6, the α6-chain of an extracellular matrix protein that is widely expressed in human skin and maintains skeletal muscle and skin integrity [18]. COL6A6 is in close proximity to 3q21. In previous studies, AD-associated loci were confirmed by whole-genome linkage scans. The 3q21 locus has been identified as an AD susceptibility region [11]. Another genome-wide linkage study found highly significant evidence of linkage to chromosome 3q21. Moreover, significant evidence has linked this locus with allergic sensitization, presumably by paternal imprinting, further supporting the presence of an atopy-related gene in this region [19]. CD80 and CD86 are major candidate genes located on 3q21 that are expressed by antigen-presenting cells and are essential to T cell activation [20]. COL6A5, another collagen VI alpha chain, is also linked to AD. A lack of COL6A5 expression affects epidermal integrity and function [21]. Early-onset AD has a prevalence of 15–20% in industrialized countries [4]. The MAF (1.7–18%) of COL6A6 variants in a large sample of the Korean population was similar to the incidence rate of early-onset AD. Our findings suggest that variants of COL6A6 may be novel candidates for early-onset AD in Koreans.

No significant association with AD was identified for the three COL6A6 variants based on the odds ratios estimated in the case-control study. However, the homozygous alternate genotype of rs16830494 and rs59021909 was more frequent in AD cases than in controls. It was difficult to obtain p-values for rare variants using odds ratios, a common limitation [22]. The similar frequencies of rs200963433 in the 800 Korean subjects and the 61 healthy controls support the credibility of the data despite the small study size.

A rare variant of the caudal type homeobox 1 (CDX1) gene was detected in two families. CDX1 is located in a candidate AD-linkage region, 5q31-33 [23]. CDX1 inhibits β-catenin/T-cell factor transcriptional activity. β-catenin regulates cell-cell adhesion, and Wnt/β-catenin signaling is associated with skin development [24, 25]. A common variant of the ultraviolet (UV)-stimulated scaffold protein A (UVSSA) gene was also identified. UVSSA functions in the repair of DNA damaged by UV rays. For the remaining candidate genes, no function suggesting an association with AD was found.

Missense mutations of FLG and FLG2 were detected in each of the three families. The effects of the FLG polymorphism on AD are not well characterized. However, Seon-Young et al. recently reported that the FLG (rs11584340) polymorphism is associated with a higher AD risk in the Korean population, and that it affects free fatty acids in the serum of AD patients [26]. Loss-of-function mutations involving FLG strongly predispose the carrier to early-onset AD, but not to late-onset AD [27]. This study suggests that genetic screening is crucial for identifying risk variants of early-onset AD.

Individually gathered rare variants have severe effects and play important roles in complex human disorders [28]. Thus, our data will help to expand genetic studies of AD.

Conclusions

Identifying family-specific COL6A6 polymorphisms and genetic variants of other candidate genes associated with AD using WES is a novel approach. Our study suggests that COL6A6 variants may constitute candidate risk factors for AD development, as identified via family-based WES and a non-familial case-control study of 173 subjects. This study provides a genetic basis for early-onset AD diagnosis in Korean patients and the development of new therapies.