Background

Vitiligo is an acquired hypomelanotic disorder characterized by circumscribed depigmented macules in the skin resulting from the loss of melanocytes. Autoimmunity has been identified as the major etiological factor in vitiligo, although many other factors, including infections, stress, neural abnormalities, aberrant melanocyte function, and genetic susceptibility, have been implicated [1]. The Smyth line (SL) chicken is the only animal model for autoimmune vitiligo that spontaneously displays all clinical and biological manifestations of the human disorder [2, 3]. Like other autoimmune diseases, SL vitiligo (SLV) is multifactorial and involves the interplay of genetic, immune system, and environmental factors. SLV susceptibility is manifested in part as an inherent melanocyte defect, and melanocyte loss is due to melanocyte-specific cell-mediated and humoral immune activities [2, 3].

Recent genome-wide association studies (GWAS) in humans, aimed at understanding the role of genetic components in a variety of autoimmune diseases including vitiligo, have identified hundreds of loci harboring risk alleles [4]. Several GWAS identified vitiligo susceptibility loci in human populations [5–10]. However, most susceptibility loci identified by GWAS lie in regulatory regions of gene expression. These associations are therefore not sufficient to identify the causal gene or to deduce the alterations caused by risk variants, which generally do not induce profound changes to genes (e.g. coding sequence changes, deletions, or duplications). The Encyclopedia of DNA Elements (ENCODE) projects for mammalian species suggested that ~90% of disease-associated genetic variants in humans lie in noncoding regions, while only ~10% lie in coding regions, where they can act as causative mutations [11, 12]. Nevertheless, identifying potential coding mutations that alter protein function is a prerequisite for understanding disease etiology. Moreover, functional study of candidate genetic risk factors is almost impossible without appropriate model systems.

The SL chicken is an excellent model for functional verification of candidate genes underlying genetic susceptibility to vitiligo because of its tractable, well-defined phenotype, the high vitiligo incidence in the population (80–90%), the feasibility of in vivo characterization, and the relatively short generation time. A recent microarray analysis of global gene expression during SLV development provided comprehensive transcriptome-level information that supported the multifactorial etiology of vitiligo [13]. In this study, whole-genome resequencing on an Illumina platform was performed to investigate the genetic aspects of SLV expression more deeply, in comparison with the parental Brown line (BL) of chickens from which the SL was originally derived. BL chickens retain vitiligo susceptibility but with a very low (0–2%) incidence of vitiligo [2, 3]; none of the BL chickens used in this study had vitiligo. Millions of single nucleotide polymorphisms (SNPs) were identified by genome resequencing, and this study focused only on potentially causal genes containing non-synonymous mutations that can induce amino acid changes in proteins.

Results and discussion

Genome resequencing for BL and SL chickens

Genome sequencing of pooled DNA from 10 non-vitiliginous BL chickens and 10 SL chickens with confirmed SLV produced ~63 and ~89 million sequence reads, respectively, from 2 × 100 bp paired-end sequencing (Table 1). About 80% of the reads aligned to the reference genome, and the remaining ~20% did not align. The resulting coverage of the Red Jungle Fowl chicken reference genome was 5.1x for BL and 7.0x for SL. The total number of SNPs was 4.8 million for BL and 5.5 million for SL (~0.5% of the reference genome). These SNP counts included all positions covered by at least 2 reads (read depth, i.e. the number of reads covering a given nucleotide position). Most SNPs were found on the larger chromosomes (Chr), including Chr 1 through 5 and Z (sex chromosome) (data not shown). To identify genetic biomarkers that may underlie the high incidence of SLV, SNPs unique to SL were selected by removing SNPs that were also found in the parental BL. Mutations with SNP rates ≥75% were then retained as reliable marker SNPs. Because the objective was to identify SNPs found uniquely in SL relative to the parental BL, filtering did not use a typical quality-score-based SNP calling and filtering method (Q call column in Additional file 1: Table S1) [14]; instead, it relied on removing SNPs shared by BL and SL and applying a fixed SNP rate threshold (≥75%), as described in Methods. This yielded a total of ~1 million unique SNPs across the SL chicken genome (Figure 1). Chr 1, 2, 3, and Z each contained over 100,000 unique SNPs, whereas Chr 32 contained no SNPs unique to SL. When SNPs were grouped by chromosomal feature type, ~50% were in intergenic regions and 13,710 were in CDS (protein coding) sequences (Figure 2).
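The coverage figures follow from simple arithmetic: aligned bases divided by reference genome length. The sketch below assumes that each counted read contributes one 100 bp mate (Methods states 2 × 100 bp paired-end reads) and that the chicken reference is roughly 1.0 Gb; the genome size is an illustrative assumption, and `mean_coverage` is a hypothetical helper, not part of the authors' pipeline.

```python
def mean_coverage(n_reads, read_length_bp, aligned_fraction, genome_size_bp):
    """Expected mean depth: total aligned bases divided by reference length."""
    return n_reads * read_length_bp * aligned_fraction / genome_size_bp


# Assumption for illustration only: ~1.0 Gb chicken reference genome.
GENOME_SIZE_BP = 1.0e9

for line, n_reads in [("BL", 63e6), ("SL", 89e6)]:
    cov = mean_coverage(n_reads, read_length_bp=100, aligned_fraction=0.80,
                        genome_size_bp=GENOME_SIZE_BP)
    print(f"{line}: ~{cov:.1f}x")
```

With these rounded inputs the estimates land near 5x and 7x, close to the 5.1x and 7.0x reported in Table 1; the small gaps reflect the approximate read counts, alignment rate, and genome size.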
Most genes identified by human GWAS as containing SNPs in regulatory (non-CDS) regions also contained unique SNPs in the current SL study (data not shown). Around 60% of SL SNPs in CDS regions were synonymous mutations that did not change the encoded amino acid. To identify potentially causal mutations that alter protein coding, the analysis focused on SNPs that change amino acid sequences. This approach identified a total of 3518 SNPs that could cause non-synonymous, frameshift, nonsense, or no-start changes in the CDS (Figure 2 and Additional file 1: Table S1), suggesting that these 3518 SNPs are part of the genetic components in functional protein coding regions that may drive the high incidence of SLV. Of these 3518 candidate SNPs, those with read depths ≥10 (considered more reliable candidate genetic markers) were chosen for further analysis, leaving 195 SNPs (data not shown). To reduce false positives arising from possible errors in the assembly process, each of the 195 SNP positions was re-examined using the SeqMan Pro viewer program. This process yielded 156 more reliable SNPs, which were chosen as candidate marker SNPs for further analysis (Table 2).
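Whether a coding SNP is synonymous, non-synonymous, nonsense, or no-start follows from translating the reference and alternate codons. The following is a minimal sketch for single-base substitutions, assuming an in-frame CDS string and the standard genetic code (NCBI translation table 1, which applies to chicken nuclear genes). Frameshifts arise from insertions/deletions and are outside this sketch; `classify_substitution` and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(cds, pos, alt):
    """Classify a single-base substitution at 0-based position `pos` of an in-frame CDS."""
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if start == 0 and ref_codon == "ATG" and alt_codon != "ATG":
        return "no-start"          # initiation codon destroyed
    if ref_aa == alt_aa:
        return "synonymous"        # same amino acid encoded
    if alt_aa == "*":
        return "nonsense"          # premature stop codon
    return "non-synonymous"        # amino acid substitution


cds = "ATGAAACTGTAA"  # hypothetical: Met-Lys-Leu-stop
print(classify_substitution(cds, 5, "G"))  # synonymous (AAA -> AAG, Lys)
print(classify_substitution(cds, 4, "G"))  # non-synonymous (AAA -> AGA, Lys -> Arg)
print(classify_substitution(cds, 3, "T"))  # nonsense (AAA -> TAA)
print(classify_substitution(cds, 2, "A"))  # no-start (ATG -> ATA)
```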

Table 1 Results of Illumina sequencing and assembly
Figure 1

Number of unique SNPs per chromosome found in vitiliginous SL chickens compared to non-vitiliginous BL chickens. Numbers are shown for bars that are not clearly visible.

Figure 2

Summary of SNPs in SLV. A) Number of SNPs categorized by chromosomal region in SL chickens. B) Number of SNPs categorized by type of amino acid sequence changes.

Table 2 The 156 reliable marker SNPs that induced amino acid changes, with read depth ≥10

SNP validation using PCR and Sanger sequencing

Because pooled DNA samples of 10 chickens per line were used for genome sequencing, individual SNPs were verified in larger bird populations. For this, 14 SNPs were randomly chosen from the 156 candidate marker SNPs (read depth ≥10) and genotyped by PCR and Sanger sequencing in 20 non-vitiliginous BL chickens and 70 SL chickens exhibiting vitiligo. The results clearly showed different nucleotide frequencies at the 14 SNP positions between BL and SL chickens (Table 3). Thus, the 156 SNPs predicted to induce amino acid changes are potential genetic biomarkers for vitiligo in SL chickens.
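Comparing lines at a validated SNP amounts to comparing allele frequencies. A minimal sketch, assuming each bird's Sanger trace is called as a diploid genotype such as "AG"; the genotype counts below are hypothetical and are not the Table 3 data, and `allele_frequencies` is an illustrative helper.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotype calls such as "AA", "AG", "GG"."""
    counts = Counter(base for gt in genotypes for base in gt)
    total = sum(counts.values())
    return {base: n / total for base, n in sorted(counts.items())}


# Hypothetical genotypes at one SNP position (not the published data).
bl = ["AA"] * 16 + ["AG"] * 4    # 20 BL birds
sl = ["GG"] * 60 + ["AG"] * 10   # 70 SL birds

print(allele_frequencies(bl))  # {'A': 0.9, 'G': 0.1}
print(allele_frequencies(sl))
```

Counting alleles rather than birds keeps the two lines comparable despite the unequal sample sizes (40 vs. 140 alleles).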

Table 3 Verification of 14 SNPs using PCR and Sanger sequencing in larger numbers of non-vitiliginous parental BL (20) vs. vitiliginous SL (70) chickens

Bioinformatic analyses of genes containing amino acid change SNPs

Amino acid changes may have functional implications for vitiligo induction in SL chickens. The Ingenuity Pathway Analysis (IPA) program was used to generate bioinformatic data sets, including functional groups (gene ontology; GO) and gene networks, for genes containing amino acid changes in SL chickens. The 156 SNPs were found in 139 genes, encompassing genes of known and unknown function, chromosomal open reading frames, and hypothetical proteins (Additional file 2: Table S2).

Functional roles

Genes were categorized into 76 functional groups (Additional file 3: Table S3). Of these, six functional groups are of particular interest for autoimmune vitiligo development: dermatological diseases/conditions, inflammatory response, inflammatory disease, immunological disease, immune cell trafficking, and infectious disease (Table 4). The dermatological diseases/conditions group contained the following genes: ADAMTS13 (ADAM metallopeptidase with thrombospondin type 1 motif 13); ASPM [asp (abnormal spindle) homolog, microcephaly associated (Drosophila)]; ATP6V0A2 (ATPase, H+ transporting, lysosomal V0 subunit A2); BRCA2 (breast cancer 2, early onset); COL12A1 (collagen, type XII, alpha 1); GRM5 (glutamate receptor, metabotropic 5); LRP2 (low density lipoprotein receptor-related protein 2); MKI67 (marker of proliferation Ki-67); OBSCN (obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF); PLAU (plasminogen activator, urokinase); RNF168 (ring finger protein 168, E3 ubiquitin protein ligase); STAB2 [stabilin 2 or FEEL2 (fasciclin EGF-like, laminin-type EGF-like, and link domain-containing scavenger receptor 2)]; and XIRP1 (xin actin-binding repeat containing 1). General and dermatological disease-related functions of these genes are summarized in Table 5.

Table 4 Vitiligo related functions of candidate genes
Table 5 Function of candidate genes containing SNPs in CDS region related to dermatological diseases/conditions

Interestingly, Nikolaev et al. (2012) [17] reported, based on exome sequencing, that amino acid changes in the ASPM, LRP2, STAB2, and XIRP1 proteins were associated with human melanoma. Melanocytes in vitiligo also exhibit morphological and biological defects/alterations compared to melanocytes from individuals with normal pigmentation [29]. Although these alterations may differ from those observed in melanoma (e.g. slower growth and higher sensitivity to oxidative stress of cultured melanocytes [30, 31]), amino acid alterations at different residues of homologous proteins may result in opposite dermatological phenotypes [32]. In addition, BRCA2, GRM5, MKI67, and OBSCN, which carry amino acid changes in SLV chickens, are also known to be associated with human melanoma. The relationship between the candidate genes and other dermatological diseases, including melanoma, is summarized in Table 5.

Gene networks

Gene network analysis, which represents intermolecular connections among interacting genes based on curated functional knowledge, was performed with the IPA program on genes carrying amino acid changes in SLV chickens. To summarize the intermolecular connections, the analysis used the simplest setting of 35 focus molecules per network (Table 6 and Figures 3–9). The 7 gene networks are discussed below, and gene information for the focus molecules in each network is listed in Additional file 4: Table S4.

Table 6 Associated network functions of candidate genes
Figure 3

Gene network #1. Molecular interactions among important focus molecules are displayed. Gray symbols denote genes in the SNP gene list, whereas white symbols denote neighboring genes that are functionally associated with, but not included in, the SNP gene list. Symbols for each molecule indicate molecular function and type of interaction.

Figure 4

Gene network #2. Molecular interactions and symbols are as described for Figure 3.

Figure 5

Gene network #3. Molecular interactions and symbols are as described for Figure 3.

Figure 6

Gene network #4. Molecular interactions and symbols are as described for Figure 3.

Figure 7

Gene network #5. Molecular interactions and symbols are as described for Figure 3.

Figure 8

Gene network #6. Molecular interactions and symbols are as described for Figure 3.

Figure 9

Gene network #7. Molecular interactions and symbols are as described for Figure 3.

Candidate genes in Network #1 are associated with the mitogen-activated protein kinase (MAPK; also ERK1/2) and protein kinase C (PKC) signaling pathways, which are connected to VEGF (vascular endothelial growth factor) and PDGF (platelet-derived growth factor), with PLAU at the center (Figure 3). The top functions related to Network #1 are cardiovascular disease, hematological disease, and cardiac infarction. Interestingly, molecules in Network #1, including LRP2, PLAU, ADAMTS13, and GRM5, were also identified as functional factors in melanoma, as described above [17]. Additionally, amino acid sequence mutations in LRP2 and altered MAP2K1 and MAP2K2 function caused by genetic mutations have been reported in melanoma patients [17], as have GRM5 mutations in mouse melanoma models [20]. The connections in Network #1 therefore suggest that mutations causing amino acid changes in LRP2, PLAU, ADAMTS13, and GRM5 may influence dermatological diseases, including vitiligo, through MAPK (ERK1/2) signaling pathways.

The top functions of Network #2 include Cell Cycle; DNA Replication, Recombination and Repair; and Developmental Disorder (Figure 4), and Network #3 is involved in Developmental Disorder, Endocrine System Disorders, and Hereditary Disorder (Figure 5). Most molecules in Networks #2 and #3 bind directly to UBC (ubiquitin C). Ubiquitinylation, the covalent attachment of ubiquitin to proteins, regulates numerous cellular processes such as protein degradation and signal transduction. Many ubiquitinylated proteins and their lysine ubiquitinylation sites have recently been identified in mammalian species using proteomic technologies [33–37]. In SLV, amino acid changes at lysine residues were identified in CEP192 (centrosomal protein 192 kDa; p.K169R) and API5 (apoptosis inhibitor 5; p.K1219Q) (Table 2), suggesting that altered ubiquitinylation of proteins, affecting cellular functions including protein degradation, may play a significant role in the induction of vitiligo.
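Flagging substitutions that remove a lysine, and therefore a potential ubiquitinylation site, can be done directly from one-letter protein-change annotations. A minimal sketch, assuming annotations of the form "p.K169R" (or "pK169R"); the `removes_lysine` helper and the "HYPOTHETICAL1" entry are illustrative, and a lost lysine is only a candidate site, not a confirmed one.

```python
import re

# One-letter protein change: reference residue, position, new residue (or * for stop).
PROTEIN_CHANGE = re.compile(r"^p\.?([A-Z])(\d+)([A-Z*])$")


def parse_change(change):
    m = PROTEIN_CHANGE.match(change)
    if m is None:
        raise ValueError(f"unrecognized protein change: {change!r}")
    ref, pos, alt = m.groups()
    return ref, int(pos), alt


def removes_lysine(change):
    """True if a lysine (K), i.e. a potential ubiquitinylation site, is replaced."""
    ref, _, alt = parse_change(change)
    return ref == "K" and alt != "K"


changes = {"CEP192": "p.K169R", "API5": "p.K1219Q", "HYPOTHETICAL1": "p.R45K"}
print([gene for gene, c in changes.items() if removes_lysine(c)])  # ['CEP192', 'API5']
```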

Molecules in Network #4 are involved in Cell Morphology, Cellular Function and Maintenance, and Hair and Skin Development and Function (Figure 6). They interact mainly with IFNG (interferon gamma), IL4 (interleukin 4), MAPK8, and calcium signaling pathways, in addition to protein phosphatase functions (PPP1CA; protein phosphatase 1 catalytic subunit alpha isozyme). MYO16 (myosin XVI), KIF18A (kinesin family member 18A), and CASC1 (cancer susceptibility candidate 1) are known to bind directly to PPP1CA [38–40], suggesting that amino acid changes in these proteins may alter (increase or decrease) protein phosphorylation states in SL chickens, contributing to vitiligo development. The molecules interacting with IFNG, IL4, MAPK8, and calcium signaling pathways were related to each other only indirectly, which makes it difficult to explain how amino acid changes in these molecules affect vitiligo induction in SL chickens.

Like those in Networks #2 and #3, molecules in Network #5 bind mainly to UBC, and their functions include Lipid Metabolism, Small Molecule Biochemistry, and Digestive System Development and Function (Figure 7).

Network #6 contains molecules involved in Hereditary Disorder, Ophthalmic Disease, and Neurological Disease (Figure 8). In this network, ZC2HC1A (zinc finger, C2HC-type containing 1A) and RUFY3 (RUN and FYVE domain containing 3) bind directly to APP [amyloid beta (A4) precursor protein]. APP is the precursor of beta-amyloid, the main constituent of amyloid plaques in the brains of Alzheimer disease patients [41]. RUFY3 (also known as single axon-related; Singar1), a brain-specific protein, regulates neuronal polarity by suppressing the formation of surplus axons [42]. Although binding of ZC2HC1A and RUFY3 to APP was found during the progression of Alzheimer disease [41], the functional role of this binding in disease progression has not been characterized. Similarly, the amino acid changes found in ZC2HC1A and RUFY3 may be implicated in SLV development, possibly through altered APP binding properties. USH2A [Usher syndrome 2A (autosomal recessive, mild)] was also included in Network #6. Various mutations in USH2A have been identified in patients with Usher syndrome type II, which is characterized by moderate to severe sensorineural hearing loss and progressive retinitis pigmentosa [43]. Vitiliginous SL chickens may also develop severe visual impairment and blindness due to autoimmune activity directed against choroidal melanocytes and subsequent damage to the retinal pigment epithelium [44, 45]. Taken together, the amino acid change in USH2A may also affect vitiligo progression and retinal depigmentation.

Molecules in Network #7 function mainly in Cellular Response to Therapeutics; Cellular Assembly and Organization; and DNA Replication, Recombination, and Repair. PRKDC (protein kinase, DNA-activated, catalytic polypeptide) and its interacting protein kinases dominate this network (Figure 9). Besides knock-out and inactivating mutations, single amino acid changes that alter the autophosphorylation capability of PRKDC are known to influence the rejoining of DNA double-strand breaks in mammalian cells [46], suggesting that vitiligo development may be affected by aberrant PRKDC kinase activity resulting from the observed amino acid change.

Vitiligo susceptibility loci identified in human populations by several GWAS [5–10] also included variants causing amino acid changes in various proteins, such as STRN3 (striatin, calmodulin binding protein 3), DNAH5 (dynein, axonemal, heavy chain 5), KIAA1005 (immunoglobulin heavy variable 3), TYR (tyrosinase), OCA2 (oculocutaneous albinism II), PTPN22 [protein tyrosine phosphatase, non-receptor type 22 (lymphoid)], IFIH1 (interferon induced with helicase C domain 1), SLA (Src-like adaptor), CD44, MC1R [melanocortin 1 receptor (alpha melanocyte stimulating hormone receptor)], UBASH3A (ubiquitin associated and SH3 domain containing A), C1QTNF6 (C1q and tumor necrosis factor related protein 6), CASP7 (caspase 7), and GZMB (granzyme B). A mutation similar to one reported in the human UBASH3A coding region [6] was also found in the SLV chicken model (Table 2). In addition, when the full list of 3518 candidate amino acid-altering SNPs (including those with read depth <10) was considered, several genes, including IFIH1 (interferon-induced helicase C domain-containing protein 1), CD44 antigen, and DNAH5, matched those identified by human GWAS [Additional file 1: Table S1 and [5, 6]], although the amino acid positions and alterations did not match. UBASH3A is one of the two members of the T cell ubiquitin ligand (TULA) family and can negatively regulate T cell signaling [47]. Together with UBC, discussed above, the functions of UBASH3A in ubiquitinylation and T cell signaling may be important for SLV development. IFIH1 encodes an interferon-induced RNA helicase involved in antiviral innate immune responses and is associated with type 1 diabetes, Graves’ disease, multiple sclerosis, psoriasis, and possibly lupus [48–53]. CD44 encodes a cell surface glycoprotein with various functions, including a role in T cell development [54], and is associated with lupus [55].
Mutations in DNAH5 are found in patients with primary ciliary dyskinesia (PCD) [56], a rare autosomal recessive disease characterized by recurrent airway infections due to abnormal ciliary structure and function. Primary defects in the structure and function of sensory and motile cilia result in multiple ciliopathies [57].
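The cross-species comparison above is a gene-level set intersection. A minimal sketch using the human GWAS genes named in the text; the SL sets are small illustrative subsets assembled from genes mentioned in this paper, not the full gene lists of Table 2 or Table S1, and `shared_genes` is a hypothetical helper. A gene-level match says nothing about whether the same residue is altered.

```python
# Genes with amino acid-changing vitiligo variants named in human GWAS [5-10].
HUMAN_GWAS_CODING = {
    "STRN3", "DNAH5", "KIAA1005", "TYR", "OCA2", "PTPN22", "IFIH1",
    "SLA", "CD44", "MC1R", "UBASH3A", "C1QTNF6", "CASP7", "GZMB",
}

# Illustrative subsets only (not the complete SL candidate lists).
sl_reliable = {"ADAMTS13", "ASPM", "LRP2", "PLAU", "PRKDC", "UBASH3A", "USH2A"}
sl_all_candidates = sl_reliable | {"IFIH1", "CD44", "DNAH5"}


def shared_genes(candidates, reference):
    """Gene-level overlap, sorted for stable output."""
    return sorted(candidates & reference)


print(shared_genes(sl_reliable, HUMAN_GWAS_CODING))        # ['UBASH3A']
print(shared_genes(sl_all_candidates, HUMAN_GWAS_CODING))  # ['CD44', 'DNAH5', 'IFIH1', 'UBASH3A']
```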

Conclusions

In this study, genome resequencing identified various potential genetic markers causing amino acid changes in the SLV model. Interpreted functionally, vitiligo development appeared to be associated with interactions among cytoskeletal factors (OBSCN, ASPM, XIRP1, ADAMTS13), protein kinases (MAPK, ERK1/2, PKC, PRKDC), a phosphatase (PPP1CA), ubiquitinylation (UBC), and amyloid (APP) production. Further functional validation studies, such as allele-specific expression of the candidate genes carrying candidate SNPs in the target tissues involved in SLV development, will be carried out using the SL chicken model of spontaneous autoimmune vitiligo.

Methods

Animals and Illumina sequencing

Adult SL chickens with vitiligo and parental non-vitiliginous BL chickens, maintained by G. Erf at the University of Arkansas (Fayetteville, AR), were selected from the breeder populations. Blood (3 ml) was collected from 12 birds per line following an animal use protocol approved by the University of Arkansas Institutional Animal Care and Use Committee (IACUC; approval number: 11019). Genomic DNA was isolated from each whole blood sample using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. DNA quality was assessed by agarose gel electrophoresis, and the 10 highest-quality samples in each line were pooled to represent that line. Library preparation and Illumina genome sequencing of the pooled DNA samples were performed by the National Center for Genome Resources (NCGR; Santa Fe, NM) on an Illumina HiSeq system using 2 × 100 bp paired-end reads.

Genome sequence assembly and data analysis

Illumina sequencing data received from NCGR were aligned to the Red Jungle Fowl chicken reference genome sequence (GBK 4.0), retrieved from NCBI. Reference-based alignment was performed with the NGen genome sequence assembly program of the Lasergene software package (DNASTAR, Madison, WI) using the following assembly parameters: file format, BAM; mer size, 21; mer skip query, 2; minimum match percentage, 93; maximum gap size, 6; minimum aligned length, 35; match score, 10; mismatch penalty, 20; gap penalty, 30; SNP calculation method, diploid Bayesian; minimum SNP percentage, 5; SNP confidence threshold, 10; minimum SNP count, 2; minimum base quality score, 5. After assembly, the SeqMan Pro program of the Lasergene package was used for further analyses, including analysis of SNP data.
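To make the alignment thresholds concrete, the sketch below applies the listed minimum match percentage, maximum gap size, and minimum aligned length to a summarized read alignment, and computes a linear score from the listed match, mismatch, and gap values. How NGen applies these numbers internally is not described here; in particular, charging the gap penalty per gap position is an assumption, and `Alignment` is a hypothetical class.

```python
from dataclasses import dataclass

# Values copied from the NGen parameter list above.
MIN_MATCH_PERCENT = 93
MAX_GAP_SIZE = 6
MIN_ALIGNED_LENGTH = 35
MATCH_SCORE, MISMATCH_PENALTY, GAP_PENALTY = 10, 20, 30


@dataclass
class Alignment:
    matches: int
    mismatches: int
    gap_bases: int    # total gap positions in the alignment
    longest_gap: int  # length of the longest single gap

    @property
    def aligned_length(self):
        return self.matches + self.mismatches + self.gap_bases

    @property
    def match_percent(self):
        return 100 * self.matches / self.aligned_length

    def score(self):
        # Assumed linear scoring: reward matches, penalize each mismatch and gap position.
        return (MATCH_SCORE * self.matches
                - MISMATCH_PENALTY * self.mismatches
                - GAP_PENALTY * self.gap_bases)

    def passes_filters(self):
        return (self.aligned_length >= MIN_ALIGNED_LENGTH
                and self.match_percent >= MIN_MATCH_PERCENT
                and self.longest_gap <= MAX_GAP_SIZE)


read = Alignment(matches=96, mismatches=3, gap_bases=1, longest_gap=1)
print(read.aligned_length, read.match_percent, read.score(), read.passes_filters())
# 100 96.0 870 True
```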

SNP detection and analysis

The JMP Genomics program (SAS Institute, Inc., Cary, NC) was used to filter SNPs unique to vitiliginous SL chickens. SNPs occurring in both the SL and BL lines were removed, leaving the SNPs unique to each line. To identify highly fixed, homozygous SNPs, the remaining SNPs were filtered by SNP percentage (SNP%), the percentage of reads at a position that carry the SNP allele; SNPs with SNP% ≥75 (for example, 3 SNP reads at a read depth of 4) were retained. The 75% cutoff was chosen to allow for potential sequencing errors generated by massively parallel sequencing. Potentially causal SL SNPs that induce non-synonymous changes in CDS regions were selected for further analysis. Because the read depth of many SL SNPs was low, only unique SNPs with read depth ≥10 were considered reliable. Reliable, potentially causal SNPs chosen by these criteria were then confirmed by re-examining the raw assembly data in the alignment view to reduce false positives.
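The filtering cascade above can be expressed as successive list filters. A minimal sketch, not the JMP Genomics workflow: it assumes a SNP counts as shared when chromosome, position, and alternate allele all match, and that effect annotations are already attached to each call. `SnpCall`, `filter_sl_snps`, and the example calls are hypothetical; the final manual SeqMan Pro re-check is not modeled.

```python
from dataclasses import dataclass

# Effect classes treated as amino acid-changing, mirroring the categories named in Results.
AA_CHANGING = {"non-synonymous", "nonsense", "frameshift", "no-start"}


@dataclass(frozen=True)
class SnpCall:
    chrom: str
    pos: int
    alt: str
    alt_reads: int  # reads carrying the SNP allele
    depth: int      # all reads covering the position (read depth)
    effect: str     # annotated effect, e.g. "synonymous" or "non-synonymous"

    @property
    def snp_percent(self):
        # SNP% as defined in Methods: 3 SNP reads at depth 4 gives 75%.
        return 100.0 * self.alt_reads / self.depth


def filter_sl_snps(sl_calls, bl_calls, min_snp_percent=75.0, min_depth=10):
    """Return the SNP list surviving each filtering step, in order."""
    bl_keys = {(c.chrom, c.pos, c.alt) for c in bl_calls}
    unique = [c for c in sl_calls if (c.chrom, c.pos, c.alt) not in bl_keys]
    fixed = [c for c in unique if c.snp_percent >= min_snp_percent]
    aa_changing = [c for c in fixed if c.effect in AA_CHANGING]
    reliable = [c for c in aa_changing if c.depth >= min_depth]
    return unique, fixed, aa_changing, reliable


sl = [
    SnpCall("1", 1000, "A", 3, 4, "non-synonymous"),    # 75%, but depth 4
    SnpCall("1", 2000, "G", 12, 12, "non-synonymous"),  # passes every filter
    SnpCall("2", 500, "T", 10, 11, "synonymous"),       # no amino acid change
    SnpCall("Z", 42, "C", 5, 10, "nonsense"),           # only 50%
    SnpCall("3", 77, "A", 20, 20, "non-synonymous"),    # also present in BL
]
bl = [SnpCall("3", 77, "A", 8, 8, "non-synonymous")]

steps = filter_sl_snps(sl, bl)
print([len(s) for s in steps])  # [4, 3, 2, 1]
```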

SNP validation using PCR and Sanger sequencing

Fourteen randomly chosen SNPs that induce amino acid changes in the CDS region were validated by PCR and Sanger sequencing in larger numbers of SL and BL chickens. Blood samples were taken from 20 BL and 70 SL chickens that were phenotypically verified to be non-vitiliginous and vitiliginous, respectively. Approximately 100 μL of blood was collected from each bird by wing vein puncture into tubes containing citrate as anticoagulant. Genomic DNA was isolated from whole blood using the Wizard SV 96 Genomic DNA Purification System (Promega; Madison, WI) following the manufacturer’s instructions with modifications. Isolated DNA was quantified using a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA), and all samples were diluted to 1 ng/μL in 96-well PCR format. Forward and reverse PCR primers were designed based on the RJF genome sequence (GenBank assembly ID: GCA_000002315.2) using the Primer3 online software (Table 7). The sequencing primers were designed to anneal at least 50 bp upstream of the SNP position, and the forward/reverse primers were placed in the regions flanking the sequencing primer and the SNP position. All primers were commercially synthesized by Integrated DNA Technologies (Ames, IA). PCR was carried out in 25 μL reaction volumes in 96-well plates with the following cycling conditions: denaturation at 95°C for 1 min; 40 cycles of amplification (95°C for 30 s, 55–63°C for 1 min, 72°C for 1 min); and final extension at 72°C for 10 min. PCR products were verified by 1% agarose gel electrophoresis and purified using the Wizard SV 96 PCR Clean-Up System (Promega; Madison, WI) following the manufacturer’s instructions. Briefly, four plates (four different PCR products) were pooled into one plate and subjected to PCR clean-up.
Cross-specificity of the sequencing primers with the four pooled PCR products was examined using BLAST (NCBI), and only products that were not cross-specific with the other sequencing primers were pooled. Purified PCR products were Sanger sequenced by the University of Arkansas DNA Resources Center (Fayetteville, AR). Results were analyzed using ABI Sequence Scanner software (Life Technologies, Carlsbad, CA), and the ratios of bases occurring at SNP locations were recorded.
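The pooling rule (a sequencing primer must not prime on any other amplicon in its pool) can be illustrated with an exact-match screen on both strands. This is a crude stand-in for the BLAST search the authors used, not a reimplementation of it; the 15-base 3′ window, the helper names, and the sequences are hypothetical.

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def primer_hits(primer, amplicon, min_3prime=15):
    """Does the primer's 3' end (last `min_3prime` bases) occur on either amplicon strand?"""
    core = primer[-min_3prime:].upper()
    amp = amplicon.upper()
    return core in amp or reverse_complement(core) in amp


def poolable(seq_primers, amplicons):
    """Each sequencing primer may match only its own amplicon within the pool."""
    for name, primer in seq_primers.items():
        for other, amplicon in amplicons.items():
            if other != name and primer_hits(primer, amplicon):
                return False
    return True


# Hypothetical sequences for illustration.
amp_a = "TTGACCGATCGGATCCAAGTCTTAGGCATCGATGCTAGCTAACG"
amp_b = "CCATGGTTCAAGCTTGGCACTGGCCGTCGTTTTACAACG"
primers = {"SNP_A": "GACCGATCGGATCCAAGTC", "SNP_B": "GGTTCAAGCTTGGCACTGG"}

print(poolable(primers, {"SNP_A": amp_a, "SNP_B": amp_b}))  # True
amp_c = "AAAA" + reverse_complement(primers["SNP_A"]) + "TTTT"
print(poolable(primers, {"SNP_A": amp_a, "SNP_B": amp_b, "SNP_C": amp_c}))  # False
```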

Table 7 Primers used for PCR and Sanger sequencing

Bioinformatics

Functional interpretation of the 139 genes carrying SNPs with SNP% ≥75, read depth ≥10, and non-synonymous changes was performed in the context of gene ontology and molecular networks using Ingenuity Pathways Analysis (IPA; Ingenuity Systems®; http://www.ingenuity.com). Because IPA is based on human and mouse bioinformatics, the functions of the candidate chicken genes were interpreted primarily on the basis of mammalian biological mechanisms. The maximum number of molecules per network was set to 35, retaining only the most important molecules based on the number of connections of each focus gene (a subset of uploaded significant genes that interact directly with other genes in the database) to other significant genes [58].
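IPA's network-generation algorithm is proprietary, so the following is only a simplified illustration of the stated idea: rank focus genes by their number of distinct direct connections to other focus genes and keep at most 35. The edge list and gene names are hypothetical placeholders, not IPA output, and `top_connected` is an illustrative helper.

```python
from collections import defaultdict


def top_connected(edges, focus_genes, limit=35):
    """Rank focus genes by distinct direct partners among focus genes; keep the top `limit`."""
    degree = defaultdict(int)
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    for edge in unique_edges:
        a, b = tuple(edge)
        if a in focus_genes and b in focus_genes:
            degree[a] += 1
            degree[b] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(focus_genes, key=lambda g: (-degree[g], g))[:limit]


# Hypothetical interaction list with placeholder gene names.
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_D"),
         ("GENE_B", "GENE_C"), ("GENE_B", "GENE_A"), ("GENE_D", "OTHER_X")]
focus = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
print(top_connected(edges, focus, limit=2))  # ['GENE_A', 'GENE_B']
```

Deduplicating edges with `frozenset` keeps a reciprocal pair such as (GENE_A, GENE_B) and (GENE_B, GENE_A) from being counted twice.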

Availability of supporting data

All sequence reads described in the manuscript are available under BioProject accession PRJNA256208. Illumina sequence reads have been deposited in NCBI’s SRA archive under the following accession numbers (SL: Sample: SRS670088, Experiment: SRX665272, Reads: SRR1531502; BL: Sample: SRS670098, Experiment: SRX665286, Reads: SRR1531503).