Background

Crustaceans make up around 10% of the world’s aquaculture production with average growth in production of 15% per year from 1970 reaching 5 million tonnes in 2008 [1]. Rapid growth during the period 2001–2008 was due to increased production of Litopenaeus vannamei in China, Thailand, and Indonesia. Production of P. chinensis has been reduced, and no significant change in the production of P. monodon has occurred over the last 13 years, mainly because of difficulties due to disease with white spot syndrome virus in these species and the increased availability of genetically improved specific pathogen free L. vannamei post-larvae. More than 80% of shrimp exports from India are derived from aquaculture production.

One of the major worldwide problems limiting the culture of shrimp is viral disease. White spot syndrome virus (family Nimaviridae, genus Whispovirus, WSSV) is a lethal pathogen that can cause up to 100% mortality within 7–10 days on shrimp farms, and has devastated shrimp farming industries across the world (reviews by: [24]). Selective breeding has been suggested by many as a highly effective long term strategy to combat the threat of disease [5]. However, resistance to WSSV has low heritability in L. vannamei[611], and limited evidence has been found for genetic variation in resistance to WSSV in P. monodon[12, 13], especially because of the difficulty with developing a standardized challenge protocol for WSSV. Shrimp exposed to WSSV have a rapid mortality rate and cannibalism can cause secondary waves of infection. Oral infection of individual shrimp with a controlled dose of the virus, although technically difficult and labour intensive, is recommended [8]. Where genetic resistance has been detected, it has been found to be strongly negatively correlated with growth rate [10].

Shrimp have a very limited adaptive immune response [14] and lack diverse immune related molecules such as immunoglobulin, T cell receptor and major histocompatibility complex. The innate immune response of shrimp has been shown to be triggered almost instantaneously in response to peptidoglycan stimulation [15] and is believed to be the primary defence mechanism against infection in this group of species. A number of potential antimicrobial peptide coding genes have been isolated from penaeid shrimp and some such as penaeidins and crustins have been found to be differentially expressed over the time course of infection [1618]. The susceptibility of P. monodon to white spot disease has been shown to increase when penaeidin class 5 expression is suppressed by interference mediated gene silencing [19]. Shrimp surviving 84 hours post-infection have higher expression of lysozyme, C-type lectin, penaeidins, prophenoloxidase-1 and prophenoloxidase-2 in haemocytes than those dying less than 60 hours post infection [18]. Heat shock protein 21 is down regulated after infection to WSSV [20]. Shrimp lysozyme has been shown to be effective in blocking infection by WSSV in blue shrimp (Litopenaeus stylirostris) [21].

As yet there are no vaccines or other treatments available with proven efficacy against WSSV, although a number of studies have revealed promising leads. The WSSV binding proteins isolated from viral particles in the haemolymph of shrimp infected with WSSV, have been shown to inhibit the binding of this virus to haemolymph cells and improve survival of shrimp [22]. Injection of shrimp with recombinant fortilin after infection with WSSV, results in 80-100% survival and low levels of WSSV are detected, suggesting that fortilin inhibits viral replication [23]. Fortilin is highly upregulated in haemolymph during the early phase of white spot infection [24]. Injection with recombinant ferritin or lysozyme also results in protection to challenge with WSSV [21, 25]. Inoculation in feed with bacterially expressed double stranded RNA VP28 (encoding for an envelope protein found in WSSV) and vaccination with VP28 and recombinant VP292 [2629], as well as exposure to probiotics and beta-1,3/1,6-glucans [30], have been shown to provide improved survivability. Shrimp immunity to WSSV was shown to be enhanced by intramuscular injection of oligodeoxynucleotides with Cytosine-Guanine motifs and Vibrio harveyi DNA [31]. In addition, double stranded RNA of any type has been found to induce antiviral protection in shrimp [32]. Interestingly, a gene designated as PmAV was isolated using differential display from viral resistant shrimp and was shown to have antiviral activity [33].

Resistance to WSSV is a strong candidate trait for marker-assisted or genomic selection since it appears to have low heritability and has a negative correlation with another selected trait (growth). The lack of reported quantitative trait loci associated with this trait may not be due to the lack of segregating genes for resistance, but could instead be due to the highly virulent nature of WSSV, challenge testing methods that do not deliver accurate resistant phenotypes and because marker resources do not sufficiently cover the genome.

Another important factor in shrimp cultivation is sex determination. Female penaeid shrimps grow more rapidly than males and so mono-sex production of females would be advantageous for production [34]. This could also be used to provide a level of genetic protection, hindering the replication of genetically superior stock. In penaeid shrimps, females are known to be heterogametic with sex determined by a WZ-ZZ chromosomal system [3537]. However, more detailed mapping studies are needed to find closely linked markers and genes associated with sex determination. If homogametic females can be easily identified there is potential to use them as parents to yield completely sexually uniform heterogametic female offspring [38]. Although some markers associated with sex determination have been identified [38], little is known about candidate genes, mechanism or map regions associated with the sex of crustaceans.

Here we undertake the first comprehensive genome scan for QTL associated with resistance to WSSV and for the sex-determining locus in P. monodon. A new WSSV challenge testing protocol that aims to deliver more accurate disease resistant phenotypes is devised and utilised. A set of 3959 linkage mapped transcribed gene SNPs are used to genotype 1038 sexed individuals derived from 7 full-sibling families challenged-tested for WSSV.

Results

Challenge tests

Shrimp survived on average 57.2 ± 12.0 SD hrs post challenge and a spread of hours of survival was observed within families (eg. ranging between 30 and 90 hours for the upper and lower 40 percentiles genotyped from families B, F and G Figure 1). No mortality was observed in the control group injected with saline buffer.

Figure 1
figure 1

Plot of hours of survival among progeny genotyped from 7 full-sibling families (A-G).

Genetic parameters associated with white spot syndrome virus resistance

Neither sex nor time of challenge (family) had significant effects on time to death in the model (Table 1, all 95% confidence intervals overlap zero).

Table 1 Summary of MCMCglmm analysis under an animal model of days survival after WSSV experimental challenge

Linkage disequilibrium

Mean and median values of LD, measured as r2 between adjacent markers for the 3961 genome-wide distributed SNPs used in this study, were 0.35 and 0.30, respectively.

QTL for WSSV resistance – GWAS and interval mapping analysis

The quality control steps excluded all markers with non-Mendelian inheritance and all individuals unassigned with parentage analysis, leaving 3959 markers and 1038 individuals for analysis. For the FASTA and GRAMMAS GWAS analysis the additional quality control steps excluded 135 markers and 17 individuals with a call rate of less than 95%, 5 individuals with high autosomal heterozygosity (FDR <1%) and 4 individuals with identity by descent ≥0.95. No markers or individuals with a call rate less than 0.1 and minor allele frequency <0.24% were detected. After the quality control, 3824 markers and 1019 individuals were selected for the FASTA and GRAMMAS analysis.

Ten significant QTL for WSSV resistance (hours of survival post-WSSV infection, P < 0.05 after Bonferroni correction) were detected on linkage groups 2, 3, 5, 6, 17, 18, 19, 22, 27 and 43 (Table 2, Figures 2 and 3). Eight SNPs (51997_2402, 41442_21, 45605_1545, 29124_228, 44821_270, 50096_1789, 18472_352, 27976_64 on linkage groups 2, 3, 5, 6, 18, 19, 22, and 43 respectively) showed significant genome wide associations, and three regions (between SNPs 50756_3741 and 46539 on LG6, SNPs 25133_74 and 36717_243 on LG17 and SNPs 18687_338 and 3729_523 on LG27) showed significant linkage with hours of survival. These SNPs occurred in transcripts for genes encoding runt protein 1a, flagellar hook-length control protein, ubiquitin domain-containing protein ubfd1, paired-like homeodomain transcription factor 3, ankyrinn repeat and many unannotated genes. Box plots of hours of survival post-WSSV infection for individuals with alternative genotypes for two informative SNPs in the vicinity of the QTL detected on linkage group 17, and for GWAS significant SNPs on linkage groups 18 and 22 (Figure 4), show patterns indicating additive gene effects for these QTL.

Table 2 Suggestive and significant QTL for trait hours of survival after WSSV challenge detected using PLINK (QFAM total) and GenAbel (FASTA and GRAMMAS) analyses in 7 P. monodon families
Figure 2
figure 2

Manhattan (A, C and E) and QQ plots (B, D and F) for GWAS analyses showing corrected P -values with 1 degrees of freedom after permutation testing for SNPs across the 44 linkage groups for trait hours of survival for tests QFAM (A and B), FASTA (C and D) and GRAMMAS (E and F). Linkage positions are shown in centimorgons (cM) on the horizontal axis. Upper and lower dotted lines mark significance thresholds after Bonferroni correction of P < 0.01 and P < 0.05 respectively.

Figure 3
figure 3

GridQTL interval mapping F-test statistic plots for trait hours of survival across all linkage groups (A) and across LG17 (B). Upper and lower dotted lines mark significance thresholds after permutation testing of P < 0.01 (genome-wide significance after Bonferroni correction) and P < 0.05 (chromosome-wide significance) respectively (plot B). Chromosome-wide significance was detected on linkage groups 5, 6 and 27 while genome-wide significance was detected on LG17.

Figure 4
figure 4

Box plot showing hours of survival post-WSSV infection for genotypes detected at SNP loci 51874_459 and 52129_570 positioned at 51 and 50 cM respectively on linkage group 17 (mapping closely to the predicted QTL location at 54 cM) and across families at GWAS significant ( P< 0.01) SNP loci 44821_270 on linkage group 18 and 18472_352 on linkage group 22. Data is presented for family D in which both parents were heterozygous for the QTL on LG17.

Some of the SNPs associated with QTL were found to map within or close to genes with putative immune functions of interest (Tables 2 and 3, Additional file 1). For example, the SNP marking a QTL at position 61.8 cM on linkage group 2 (51997_2402, P < 0.05 after Bonferroni correction for the QFAM test), occurred in a transcript that shared high homology to a gene encoding runt protein 1a in the signal crayfish Pacifastacus leniusculus. SNP 24034_664 at 47.3 cM on LG 2 in a transcript with homology to the proteasome (macropain) 26 s gene maps in the middle of a broad 41 cM region containing several SNPs in transcripts showing suggestive and significant associations with hours of survival after WSSV infection (including SNP 51997_2402, P < 0.05 after Bonferroni correction, at 61.8 cM). Variation at SNP 45605_1545 in a transcript with homology to a gene encoding ubiquitin domain-containing protein ubfd1 on LG 5 was associated with hours of survival (FASTA P < 0.05 after Bonferroni correction). The SNP 40050_2030 occurs in a transcript with homology to a gene encoding 26 s proteasome subunit s9 and maps 4.4 cM from SNP 33044_1018 (suggestive association on LG6) and 0.6 cM from a predicted QTL (GridQTL, position 39 cM, P < 0.05 chromosome-wide significance). The SNP 49912_5110 which occurs in a transcript with homology to the mitogen activated protein kinase gene, mapped 2.3 cM from the QTL position detected by GridQTL analysis on LG17 (P < 0.01 genome-wide significance). The SNP 52376_14757 occurs in a transcript that is homologous to the hect e3 ubiquitin gene and maps 2.4 and 3.9 cM from SNPs 51029_2543 (suggestive association) and 50096_1789 (significant association P < 0.05 after Bonferroni for the QFAM test) at 70.9 and 82.4 cM respectively along LG19. SNPs 48349_91, which occurs in a transcript with homology to proteasome (macropain) 26 s non- 2 gene, and 38683_977, which occurs in a transcript with homology to a gene encoding ubiquitin conjugating enzyme 7 interacting protein, map 12.8 and 0.6 cM respectively from SNP 50096_1789 (P < 0.05 after Bonferroni correction, test QFAM) at 82.4 cM on LG19. Three genes encoding proteins with putative immune function map near to SNP 18472_352 (P < 0.01 after Bonferroni correction, QFAM test) at 27.9 cM on LG22, SNP 52279_11861 which also maps to 27.9 cM on LG 22 and which occurs in a transcript with homology to the serine-threonine protein kinase gene, SNP 42578_2554 which occurs in a transcript with homology to a gene encoding mitogen-activated protein kinase-binding protein 1 which is 1.9 cM distant and SNP 50961_705 which occurs in a transcript showing homology to a gene encoding IGF2 mRNA binding protein and is 2.4 cM distant.

Table 3 SNPs with homology to genes of putative immune function mapping near to QTL regions

Association with sex on LG30

In all, 15 SNP markers were significantly associated with sex, (5 at P < 0.01 and 10 at P < 0.001 significance levels after Bonferroni correction, Additional file 2, Figure 5A and B). All significant associations mapped to a broad 43 cM interval of LG30 between positions 21.7 and 64.7 cM. The three markers with the strongest association mapped to an interval of 0.8 cM (positions 43.5 - 44.3 cM, SNPs 49245_2916, 49087_997 and 49482_526). Most significant was SNP 49245_2916 (P = 1.9E-49) which occurs in a gene encoding G7-c-like protein and von Williebrand factor A domain-containing protein 7 (Additional file 2). The sex locus was predicted to map to 45 cM on LG30 by the GridQTL interval mapping analysis (P < 0.001 genome-wide significance, Figure 5B).

Figure 5
figure 5

GridQTL interval mapping F-statistic plots over all linkage groups (A) and on LG30 (B) for the trait sex , and genotype frequency differences between male (C) and female (D) P. monodon for SNP 49245_2916 located at 43.5 cM on LG 30 which was found to be significantly associated with sex ( P <0.001 after Bonferroni correction).

The pattern of segregation of this locus to male and female offspring fits what would be expected for a locus associated with sex determination, assuming that female P. monodon are the heterogametic sex (Figure 5C and D). Eighty-seven percent of males (out of 483 genotyped) were homozygous AA for SNP 49245_2916 across the families (the allele frequency of A and G alleles was 0.93 and 0.07 respectively, n = 966) whereas ninety percent of females (541 genotyped) were heterozygous AG. Of the males that were not AA, 13% were AG, and less than 1% were GG genotypes. The GG males were only detected in one family (3/74 individuals in family 4 which also contained a high proportion, 30/74, of AG males). Most other families contained a low proportion of AG males, except two families (2 and 6) where all males were AA genotypes. Of the females that were not AG, 5% were AA and 5% were GG. The GG females were only detected in one particular full-sibling family (family 4 with 26/64 female genotypes recorded as GG). All families contained low numbers of AA females, except family 5 in which 56/56 females were AG. Also mapping to this region (at 44.3 cM) is SNP 43522_2279 which occurs in a transcript with homology to the feminisation-1 gene (fem-1 homolog c) in the nematode Caenorhabditis elegans (69% homology, contig length 3820 bases, Additional file 1 and Additional file 2).

Discussion

Invertebrates rely on innate immune systems to recognise and respond to foreign agents. Resistance to disease is a complex quantitative trait that is likely to be regulated by the additive effects of many genes, epigenetics and by the environment. In contrast, sex, which is measured as a binomial qualitative trait, is likely to be determined by the action of a few genes mapping to a specific area of one linkage group. Variation affecting disease resistance or sex could act by changing the regulation of gene expression or by leading to modifications of the protein product and consequent function. The SNPs developed for this study were detected among shrimp sourced from the east coast of India and Andaman Islands [39]. In developing SNPs we included RNA from three individuals that had survived a severe WSSV outbreak on a farm in Bapatla. These surviving shrimp represented only 0.2% of the total post-larvae that were stocked for culture. They were later transferred to secure tank facilities where they lived for more than four months. These shrimp were found to be positive for WSSV using a nested PCR test. These survivors were included in the present study to improve the chance of detecting SNP variants that are associated with resistance to WSSV. All the SNPs used in the study occur in transcribed genes (ie. cSNPs).

The challenge test experiment used in this study which lodged shrimp in individual baskets was designed so that all shrimp could be collected and sampled within 1 hour of death and to prevent secondary infection (transmitted with cannibalism). Although the time from infection with WSSV to death is rapid, a controlled route of infection and dosage was chosen to prolong the overall time frame of the experiment as much as possible and to give a spread of hours of survival. Large full-sibling families (146 offspring per family on average) and frequent observation were also employed to give a strong power for detecting QTL.

Both the linkage and GWAS analyses detected significant QTL associated with hours of survival after WSSV infection. For three of the four QTL detected by linkage analysis, closely mapping SNPs with suggestive associations were detected by GWAS analysis (on linkage groups 5, 6 and 27, Table 2). Fewer QTL were detected using linkage analysis than using GWAS. While linkage analysis relies on the segregation of alleles within families, GWAS correlates the occurrence of SNP alleles with phenotypes across the population. Comparison of linkage analysis and GWAS has shown that GWAS, where all SNPs are fitted simultaneously as random effects, has greater power to discriminate linked QTL [40], especially those of limited or modest sized effects [41]. The sensitivity of linkage analysis is affected by the number of parents that are segregating for the QTL and neighbouring SNP loci and by the extent of linkage among SNPs mapping in the vicinity of the QTL. The sensitivity of GWAS depends on the existence of linkage disequilibrium between the QTL and single SNP loci (which, to some extent, is dependent on the number of SNPs tested) and on the existence of SNPs sharing a similar allele conformation to that of the QTL. It has been found by other studies that the two types of analyses generally yield inconsistent results, but can agree if the differences between the two methods (caused by differences in the precision for mapping QTL location, ability to account for multiple linked QTL and due to over estimation of what are sometimes modelled as fixed SNP effects), are accounted for [40].

For the GWAS analyses, the GRAMMAS and FASTA results were often in agreement, while the results of QFAM analysis were less often in agreement with GRAMMAS or FASTA. For instance, SNP 18472_352 on LG22 was found to be associated with hours survival by the QFAM test (P < 0.01 after Bonferroni correction), but was found to be suggestively associated with the trait by the GRAMMAS and FASTA tests. Similarly, SNP 51997_2402 on LG2 was associated with hours survival for the QFAM test (P < 0.05 after Bonferroni) and a closely mapping SNP was suggestively associated using the FASTA test. No agreement for the significant association detected by QFAM at position 82.4 cM was found by GRAMMAS or FASTA tests across LG19. Whereas, significant associations detected on linkage groups 3, 5, 18 and 43 by GRAMMAS or FASTA were supported by corresponding suggestive or significant associations by FASTA or GRAMMAS respectively for the same SNP. FASTA and GRAMMAS, which use genomic control to infer genetic relations from genomic data, and thereby account for the true genealogy (population structure and all levels of relationships), are thought to be superior to methods such as QFAM, which makes use of the observed genealogy (observed parent-offspring relationships in our study) [42].

Candidate genes mapping to QTL regions

Several of the SNPs directly associated, or closely linked to WSSV resistance QTL, were found to occur in transcripts that share homology to genes with putative immune functions. Some of the genes, such as heat shock protein 21, c-type lectin and serine-threonine specific protein kinase, have been implicated in affecting the WSSV resistance of crustaceans in other studies [18, 20, 4345]. Some are components of gene pathways, such as the ubiquitination pathway, which have been found to affect the pathogenesis of WSSV [46, 47].

The ubiquitin proteasome pathway

The ubiquitin proteasome pathway has been shown to play an important role in immune defence and more specifically proteasome I is presumed to be involved in intracellular antibody-mediated proteolysis of antibody-bound viruses [48]. Six SNPs in transcripts with homology to proteasome encoding genes of interest were either directly or closely mapped to QTL for WSSV resistance (Table 3), including SNP 24034_664 in a transcript with homology to the proteasome (macropain) gene which was 14.1 cM from SNP 51997_2402 (P < 0.05 after Bonferroni correction, LG2), SNP 23272_344 in a transcript with homology to the 26 s protease regulatory subunit gene (suggestive association), SNP 40050_2030 in a transcript with homology to the 26 s proteasome subunit s9 gene which maps 0.6 cM from a QTL position predicted by linkage analysis (P < 0.05 chromosome-wide significance on LG6), SNP 17687_140 in a transcript with homology to the proteasome subunit alpha type-7 gene which was 0.5 cM from SNP 32667_1134 (suggestive association with hours of survival on LG15), SNP 48349_91 in a transcript with homology to the proteasome (macropain) 26 s non-2 gene which maps 12.7 cM from SNP 50096_1789 (P < 0.05 after Bonferroni correction, LG19) and SNP 49666_3836 in a transcript with homology to the 26 s proteasome non-atpase regulatory subunit 11-like gene which maps to the same position as SNP 43412_2186 (suggestive association at 44 cM on LG29).

Modulation of the host ubiquitin proteasome pathways by viral proteins is thought to affect viral pathogenesis, and four proteins have been identified in the WSSV (WSSV199, WSSV222, WSSV249 and WSSV403) [4952] which interact with the P. monodon ubiquitination pathway (eg. with conjugating enzyme (E2) in shrimp) and act as viral E3 ubiquitin protein ligases to inhibit apoptosis and affect viral pathogenesis [46, 47]. Injection of recombinant Fenneropenaeus chinensis ubiquitin-conjugating enzyme E2 has been shown to reduce the mortality of shrimp challenged with WSSV, inhibit replication of WSSV and bind to (and ubiquitinate) WSSV RING domain-containing proteins [53], and ubiquitin C expression is up-regulated when F. chinensis are challenged by WSSV [54]. It follows that variation in the structure or expression of E3 ubiquitin-protein ligase, ubiquitin conjugating enzyme (E2) or other enzymes involved in the ubiquitin proteasome pathway, could be important in affecting the resistance or susceptibility of P. monodon to WSSV. Variation in a SNP in a transcript with homology to the ubiquitin domain-containing protein ubfd1 gene (45605_1545) mapping to 21.2 cM along linkage group 5 was found to be associated with WSSV resistance in this study (P < 0.05 after Bonferroni correction for the FASTA test). The SNPs in nine other transcripts with homology to genes involved in the ubiquitin proteasome pathway (two forms of e3 ubiquitin-protein ligase, two forms of hect e3 ubiquitin, ubiquitin carboxyl-terminal hydrolase 47, ubiquitin-conjugating enzyme e2 c, ubiquitin-conjugating factor e4, ubiquitin-conjugating enzyme 7 interacting protein and ubiquitin protein ligase, Table 3) were all found to show suggestive associations or to map closely to other SNPs significantly or suggestively associated with hours of survival after WSSV challenge in this study.

Lymphocyte function and heat shock proteins

Interleukin enhancer-binding factor 2 is a transcription factor required for expression of the interleukin 2 gene which regulates the activity of lymphocytes responsible for immunity [55]. A SNP in a transcript with homology to the gene coding for this factor was found to map 3 cM from a SNP (3927_708) with suggestive association to hours of survival on LG17 (Table 3).

Heat shock proteins act as intercellular signalling molecules for the regulation of the immune response of many organisms, particularly with regard to lymphocyte mediated responses [56]. The Hsp70-Hsp90 organizing protein (Hop, SNP 45405_1355 at 26.7 on LG17, Table 3) is a co-chaperone that reversibly links HSP70 and HSP90, moderating chaperone activity. The expression of HSP70 and HSP90 increases in hemocyte and lymphoid organs when crustaceans (Marsupenaeus japonicus and Procambarus clarkii) are challenged with WSSV [43, 44]. HSP21 is normally highly expressed in P. monodon tissues, but is down-regulated following infection with WSSV [20].

The TOLL pathway

Nuclear factor kappa-light-chain-enhancer of activated B cells (NF-kB, SNP 51361_1388 at 4.7 cM on LG 25, Table 3) is a rapid acting primary transcription factor which regulates the innate and adaptive immune cellular response to viral and other forms of infection. When pattern recognition toll-like receptors in T- or B-cells are activated, NF-kB enters the nucleus and up-regulates genes involved in development, maturation and proliferation (eg. type I interferon response genes). Large precursor molecules of NF-kB (p105 and p100) are processed by the ubiquitin/proteasome pathway which involves the degradation of ankyrin repeat c-terminal regions.

“Inappropriate” activation of NFKB has been linked to AIDS, whereas inhibition has been linked to disorders in immune cell development. The stimulation of activator protein 1 activity by mitogen-activated protein kinases is thought to elicit stress responses and promote cell survival and death in response to viral infection [57].

Mitogen activated protein kinases

Protein kinase signal transduction pathways, including mitogen-activated protein kinases, have been shown to have important roles in the regulation of cytokine gene expression [5860], particularly interleukin-1, which is a potent inflammatory cytokine regulating host defence and immune responses [61]. Mitogen activated protein kinases (MAP kinases) are involved in directing cellular responses to a range of stimuli including viral infection. Extracellular signal-regulated kinase is a type of serine-threonine specific protein kinase that is activated by WSSV in the early stage of infection, and when silenced or inhibited, reduces WSSV proliferation, and delays viral early gene transcription, in L. vannamei[45]. The SNPs in transcripts with homology to mitogen activated protein kinase, mitogen-activated protein kinase organising factor 1, map kinase-activated protein kinase 2-like isoform, serine-threonine protein kinase, interleukin enhancer binding factor and mitogen-activated protein kinase-binding protein 1 were found to map near to SNPs showing suggestive and significant (LG17 GridQTL P < 0.01 genome-wide significance) associations with days survival on linkage groups 11, 16, 17, 22 and 28 (Table 3 and Figures 2 and 3).

The mRNA binding proteins, such as IGF2 mRNA binding protein (gene mapping 2.4 cM from SNP 18472_352, P < 0.01 after Bonferroni for the QFAM test, Table 3), play an important role in stabilizing mRNAs during cellular stress [62].

Lectin

Lectins are non-self-recognition factors thought to be involved in immune recognition and microorganism phagocytosis through opsonisation in crustaceans [63]. A SNP in a transcript with homology to C-type lectin (45153_220) maps 1.8 cM from SNP 27976_64 on LG43 (P < 0.05 after Bonferroni correction for the GRAMMAS test, Table 3). Tiger shrimp surviving more than 84 hrs post WSSV infection have been observed to have higher haemocyte expression of c-type lectin [18]. WSSV infected L. vannamei that are pre-challenged with WSSV shower higher haemocyte expression of c-type lectin than previously naïve individuals [17]. Lectin is also more highly expressed in the hepatopancreas of resistant L. vannamei[64], and in the haemocytes and hepatopancreas of resistant M. japonicus[65, 66], than more susceptible individuals. C-type lectin-like domains have been detected in other genes such as PmAV, which are believed to be involved in conferring viral resistance in P. monodon[33].

Runt protein

The runt protein is up-regulated prior to haemocyte release and is known to be involved in haematopoiesis [67]. The RUNT-related transcription factors (eg. RUNX3/p33) play important roles in the development and differentiation of the immune system [68] and mutations in this gene are known to be associated with greater susceptibility to autoimmune disorders [69]. The expression of RUNT domain protein is 40% lower in Norwegian lobsters (Nephrops norvegicus) that are immunologically suppressed by high levels of manganese [70]. A SNP associated with WSSV resistance on LG2 (51997_2402, P < 0.05 after Bonferroni correction for the QFAM test, Table 2), occurred in a transcript which shared high homology to runt protein 1a in the signal crayfish Pacifastacus leniusculus.

Detection of markers associated with sex

Although sex determination is a simply inherited binary trait in most organisms, the precise genetic processes affecting sex determination have been found to be complex and diverse. SNP 43522_2279 occurs in a transcript for a gene that shares homology with Feminization-1 (Fem-1), a known signal transducing regulator affecting sex determination in the nematode Caenorhabditis elegans[71, 72]. This gene maps to the same position (at 44.3 cM on LG30) as SNP 49482_526, is 0.7 cM from the position of the sex determining locus predicted by GridQTL and is sandwiched 0.5 cM from SNPs 48571_1638 and 46782_1391, and 0.8 cM from SNPs 49245_2916 and 49087_997, all of which are SNPS found to be significantly associated with sex (P < 0.001 after Bonferroni correction, Additional file 2, Figure 5B). FEM-1, FEM-2 and FEM-3 form a CUL-2-based ubiquitin ligase complex which promotes proteolysis of the male-repressing transcription factor TRA-1, which is a regulator of sex determination by ubiquitin-mediated proteolysis [73, 74]. FEM-1 is the substrate recognition subunit in the complex, while FEM-2 and FEM-3 act as cofactors [74]. Maternal FEM-1 transcripts have been shown to prevent epigenetic silencing of FEM-1, which is believed to help protect the identity and integrity of the germ line [75]. Comparative mapping was unable to verify whether this is the same region containing the AFLP marker developed by [38] for sexing P. monodon.

For SNP marker 49245_2916, which showed the strongest association with sex, most males were AA genotypes while most females were AG genotypes. However, for family 4 there was a high proportion of AG male offspring (30/74) and high proportion of GG female offspring (26/74). In this instance the male parent had marker genotype AG (but sex locus genotype ZZ) such that ZZ males were either genotypes AA or AG, and WZ females were either genotypes AG or GG, at the marker locus. Possible explanations for other discrepancies (eg. the low frequency occurrence of AA and GG females in other families) are that recombination between the marker and sex determining locus occurred, that more than one gene in this linkage group effects sex determination, that environmental conditions during development are also influencing sex determination and/or that some homogametic females naturally occur. These discrepancies highlight that use of a single SNP marker locus for identifying sex will not be possible until the causative mutations for sex determination are identified.

In summary, indications are that the markers identified by this study, could be useful for the purpose of identifying homogametic females. Detailed studies of mutations and phenotypes in candidate genes mapping in this region of linkage group 30, could lead us to a better understanding of the genetic mechanisms affecting sexual dimorphism in P. monodon. In other invertebrates such as C. elegans there are a diversity of molecules and control networks involved in sex determination [71]. The models for sex determination developed for C. elegans and other invertebrates such as Drosophila melanogaster will be informative.

Application to marker assisted or genomic selection

Further research is needed to predict the most effective means of using the markers identified here to assist the genetic improvement of WSSV resistance. Consideration needs to be given to the overall goals of the breeding programs to which marker information is applied. In 2001, Meuwissen et al.[76] devised a method for the prediction of total genetic value using genome-wide dense marker maps, without phenotypic information, which has otherwise become known as genomic selection (GS). With the development of new low-cost fully-automated genotyping technologies, use of genome wide dense marker information is becoming more feasible for many species, especially for traits where direct measurement of the performance of individuals is problematic, such as disease resistance. GS uses information about genome-wide marker associations to estimate the breeding value of candidate individuals.

Validation using a population of tested and genotyped training individuals is necessary to estimate the effects at each genomic interval for GS. Effects estimated at numerous evenly spaced loci across the genome, including the QTL marker loci identified in this study, could be used to calculate genomic estimated breeding values for genomic selection. The weighting placed on each marker in the overall breeding value would depend on the relative allele substitution effect, and standard error, for each QTL (as shown in the effect column of Table 2) and on the emphasis placed on marker and/or phenotypic information for other traits included in the selection index. Estimation of these allele substitution effects differs, depending on the method and training populations used for their calculation. Linkage analysis within families tended to estimate higher allele substitution effects than GWAS across families (Table 2). Over-estimation of the size of the QTL effect was expected [77], particularly as selective genotyping was used in this study. Selective genotyping using sparse markers has been predicted to be effective for GS [78]. Higher emphasis for GS might be given to individuals inheriting favourable alleles at SNP marker loci such as 25133_74 where the estimate of the allele substitution effect is relatively large.

Conclusions

From evidence in the available literature, genes affecting the action of the ubiquitin-proteasome pathway, lymphocyte-cell function, heat shock protein function, the TOLL pathway, protein kinase signal transduction pathways, mRNA-binding proteins, lectins and the development and differentiation of the immune system (eg. RUNT protein 1A), which were found in this study to closely map to SNPs on linkage groups 1, 2, 5, 6, 9, 11, 15, 17, 19, 21, 22, 24, 25, 28, 29, 32, and 43 suggestively or significantly associated with QTL affecting WSSV resistance in P. monodon, are all candidate genes that could be involved in controlling the immune response to this viral disease in this species. Sex is associated with the segregation of a number of SNPs mapping to linkage group 30. The strongest association with sex occurred for 3 SNPs mapping to a 0.8 cM stretch between positions 43.5 and 44.3 cM where the feminisation gene (FEM-1 in C. elegans) was positioned (44.3 cM). Interval mapping predicted that the QTL was positioned at 45 cM. The feminisation gene is known to be an important component of the CUL-2-based ubiquitin ligase complex and this complex is known to be involved in the control of sex determination in nematodes by promoting proteolysis of the male-repressing transcription factor TRA-1. Future efforts to identify the causative genes affecting these traits should focus on the fine mapping of genes in these regions and mutation experiments to elucidate function. This has been an effective strategy for livestock such as dairy cattle where genes affecting musculature [79] and milk composition [80, 81] have been identified. In the meantime, markers found to be associated with WSSV resistance could be applied to supplement genetic evaluations made by selective breeding programs for P. monodon (eg. run by Moana in Hawaii) and the efficacy of marker assisted selection for improving resistance to WSSV should be further evaluated in this and closely related species such as L. vannamei.

Methods

Shrimp sourced for challenge test experiments

Adult males and non-gravid female tiger shrimp from the wild were procured from the East coast of India and kept in the quarantine facility of the Muttukadu Experimental Station (MES) of Central Institute of Brackishwater Aquaculture, 35 km south of Chennai. These shrimp were checked for the presence of WSSV using a simple method to isolate the virus [82] and a WSSV detection kit (Bangalore Genei). The adults that were clear of WSSV, were eye-ring tagged and shifted to the maturation facility of the Crustacean Culture Division of MES for breeding trials. Two females and a male were placed together for mating in one tonne fibre re-inforced plastic (FRP) tanks. The shrimp were fed on a diet consisting of squid and polychaete worms which facilitates maturation. From maturation trials, seven full-sib families were produced. The shrimp from these families were cultured in separate hapas in a pond to an injectable size of about 3 to 5 g in order to retain family identity. At this stage, approximately 200 juveniles were randomly collected from the hapas and transferred to the challenge test facility where they were introduced into a 4 t concrete cement tank. The shrimp were allowed to de-stress for a couple of days to overcome the transportation stress. From each lot of 200 shrimps, a sample of ten shrimp were collected at random and tested using the WSSV detection kit.

WSSV challenge experiment

A custom-made experimental facility, for preventing cannibalism, was fabricated for challenge studies to achieve recovery of all challenged shrimp. This facility consisted of multiple plastic baskets that were anchored to a support and lodged side-by-side at the same depth (just below the water surface) in a cement tank. Only one shrimp was housed in each basket during the experiment. Each basket had a lid for ease of placing or removing shrimp. The base of each basket had plastic wire mesh stitched to the sides such that feed pellets could be retained and faecal matter could easily pass through.

The muscle tissue from juvenile shrimp that were fed with WSSV-infected shrimp meat were used for extraction of WSSV virus following the protocol of [82]. The virus stock concentration was established as 1.04 X 106 copies per μl in a real-time standard curve experiment. Trials were undertaken to compare intramuscular and oral routes of challenge and it was observed that intramuscular injection gave consistent results compared to the venocatch method. Consequent to this finding, all the experimental shrimp were challenged with the WSSV virus following the intramuscular method. The shrimp were injected intramuscularly with 100 μl of 10-5 dilution of virus stock using 1 mL tuberculin syringe. The virus was injected into the muscle tissue between the third and fourth abdominal segments on the lateral side. Extra care was exercised to avoid physical injury to the intestine and aorta running along the dorsal side and nerve cord running along the ventral side of the abdomen. After injection, the shrimp were retained in a 4 tonne cement tank for 6 hours to de-stress and to observe any mortality due to physical injury. De-stressed shrimp were then placed in individual baskets and monitored at hourly intervals for mortality. Simultaneously, twenty juvenile shrimp were injected with 100 μl of TNE (Tris–HCl-NaCl-EDTA) buffer solution and kept in a 100 L FRP tank. Care was taken to inject these shrimp first before challenging the test animals to avoid contamination. These shrimp served as a control and were kept under constant observation until the actual challenge experiment was completed. Each family was challenged on separate occasions. Care was taken to maintain uniform conditions for all individuals and families that were challenged. The salinity of the water, the weight of shrimp, the viral dose and the distribution of shrimp in baskets were similar for all the families.

Continuous aeration was provided for the experimental and control tanks. The animals were checked for mortality on an hourly basis. Water temperature was recorded on an hourly basis and pH and salinity was recorded once every morning. The water in the experimental and control tanks were exchanged daily (at 50%) when faecal matter and unused feed at the bottom of the tank was siphoned out in the process. Fresh seawater was provided after removing the debris at the bottom. The cleaning process was carried out daily until the last shrimp died.

When the challenged shrimp started dying, survival data (time to death) along with sex and wet weight of each shrimp were recorded. The dead shrimp were removed and stored at -80°C for DNA extraction.

SNP markers and genotyping

Parents, along with the most susceptible and resistant 40 percentiles of progeny (based on hours of survival post-WSSV infection), were selected from each family for genotyping to find QTL. In all, 1024 offspring belonging to 7 full-sibling families that were challenge tested as described above, were successfully genotyped. Genomic DNA was extracted from the challenged shrimps using the Phenol Chloroform method as described by [83] with slight modifications. The quality of extracted DNA was checked on 2% agarose gel in 1X TBE buffer after electrophoresing at 50 V for an hour. The purity of DNA was checked using OD values at 260 and 280 nm. Quantification was achieved using OD value at 260 nm in Nanodrop 2000C (Thermo Scientific). The DNA of the experimental shrimp was extracted, dissolved in TE (Tris-EDTA) buffer, stored carefully in eppendorf tubes and transported in dry ice to Nofima, Norway for genotyping.

Genotyping was performed with 6 K custom developed Illumina Infinium iSelect Beadchips containing 6 K SNPs from P. monodon transcribed genes [39]. The SNPs were identified by two numbers separated by an underscore, where the first number identified the contig containing the SNP, and the second number was the SNP position in base numbers along the contig length. The same set of SNP genotypes and families used to detect QTL in this paper were previously used to construct a linkage map for P. monodon[39]. The sex averaged map consisted of 3961 informative SNPs which were assigned to 44 linkage groups. We used the map distances for the SNPs on the sex averaged map for the QTL analysis described below. The parentage of the challenge tested animals was checked when the linkage map was created [39].

Genetic parameters, significance of fixed effects and correlation of traits

An animal model was applied to estimate genetic parameters (without accounting for SNP genotype). The animal model decomposed the phenotypic variance into additive genetic and environmental components. Our main interest was whether sex and/or time of challenge (family) should be included as fixed effects in the QTL analysis and whether weight should be included as a covariate. A Markov chain Monte Carlo (MCMC) method using a multi-trait generalised linear mixed effect model (glmm) in a Bayesian estimation framework, with animal breeding value and ID fitted as a random effects, was used for the analysis (R Package, MCMCglmm, [84], http://www.cran.r-project.org). The ID was the same as the animal factor, but was used by MCMCglmm to dissociate individual records from the pedigree and give an indication of between individual variance [85]. The model fitted was,

y=mu+sex+family+animal+ID

where y was time to death, sex and family were fitted as fixed effects, animal and ID were random animal effects and mu represented unknown random residual effects. A bivariate model (similar to the above) was used to obtain covariance components, and the genetic correlation between weight and time to death was estimated as,

r g = σ A 1 A 2 σ A 1 2 σ A 2 2

where σA1A2 is the estimated additive genetic covariance component between the two traits.

The model was run using 300,000 iterations as burn-in, 1 million iterations for sampling and a thinning interval of 500. A “plausible” prior assuming weak genetic control (additive genetic variance, permanent environmental variance and residual variance accounting for 0.2, 0.1 and 0.7) was used with the smallest possible degree of belief parameter (n = 1).

Linkage disequilibrium

Linkage disequilibrium measured by r2 was calculated for all adjacent SNP pairs with the PLINK software package (Purcell et al., 2007).

QTL for WSSV resistance – linkage analysis

Data were analysed using a regression-interval mapping method available through the web-based software GridQTL [86]. The sib-pair model was utilised in order to take advantage of the full-sib nature of the animal pedigree. Sex was included as a fixed effect, and weight included as a covariate in the model. P-values were calculated for all trait-by-LG combinations with the significance of the peak F-statistic (putative QTL) estimated after 10,000 chromosome-wide permutation tests. A QTL was found to be genome-wide significant if the chromosome-wide significance level was smaller than 0.0011 (0.05/44), a Bonferroni correction based on the number of linkage groups in P. monodon. This correction was equivalent to a Benjamini Hochberg [87] false discovery rate of >95% (q-value of 0.98), such that it was expected that more than 95% of the significant results actually were false positives. QTL were denoted as “suggestive” when P < 0.01 (before Bonferroni correction).

QTL for WSSV resistance - GWAS

QTL GWAS analyses were performed in several ways. First we determined which markers and individuals should be excluded from the GWAS analysis using the check.marker function in GenABEL (http://www.genabel.org). This function was used to exclude individuals or markers with call rate <95%, markers with minor allele frequency <0.24%, individuals with high autosomal heterozygosity (FDR <1%) and individuals with identity by state ≥0.95. Genomic kingship was computed between all pairs of individuals. We performed a pedigree based association analysis where the pedigree is a confounder (where the heritable trait is more similar between close relatives and therefore some degree of association is expected between any genetic marker and any heritable trait). The effect of the confounding pedigree is expected to inflate the resulting null distribution of the chi square test statistic by a certain constant, lambda. Lambda is a function of the traits heritability and pedigree structure (expressed as a kinship matrix). Two fast tests for genome wide association were applied, Family-based Score Test for Association (FASTA, [88]) and Genome-wide Rapid Analysis using Mixed Models And Score test (GRAMMAS, [42]) using the R package GenABEL. A mixed polygenic model of inheritance was assumed in order to study association in our genetically homogeneous families where hours of survival (y) was modelled as

y=µ+G+e

where μ was the intercept, G describes the polygenetic effect (contribution from multiple independently segregating genes all having a small additive effect on the trait) and e describes the random residual effects. The joint distribution of residuals in the pedigree was modelled using a multivariate normal distribution with variance-covariance matrix proportional to the identity matrix. A genomic kingship matrix, generated by calculating the average identity-by-state between individuals in the pedigree (ibs in GenABEL), was used as the relationship matrix for FASTA and GRAMMAS. Both FASTA and GRAMMAS exploit maximum likelihood estimates of the intercept from the polygenic model. One thousand permutations were used to estimate genome wide significance for both the FASTA and GRAMMAS tests. The P-value for the 1 degrees of freedom test was corrected for the inflation factor. Genomic control was applied by dividing the observed test statistic (P-value for the 1 degrees of freedom test) by the genomic inflation factor λ (where λ is the regression coefficient of the observed χ2 test statistic onto the expected χ2 test statistic). Genomic control is believed by some authors to circumvent the need for Bonferroni correction for multiple testing [89].

The QFAM analysis module in PLINK (http://pngu.mgh.harvard.edu/purcell/plink/[90]) was used to perform a linear regression of phenotype on genotype. In this case the module used an adaptive permutation procedure to correct for family structure. Association testing was performed across the total data. Data from a total of 1024 offspring and 14 parents (7 nuclear families) were used with a genotyping success rate of 99%. Minimum number of permutations per SNP was 5, maximum 1 million, alpha level threshold 0, confidence interval on empirical p-value 0.0001 and intercept and slope of the pruning interval 1 and 0.001 respectively. GWAS associations with significance at P < 0.001, P < 0.01 and P < 0.05 levels after Bonferroni correction based on the number of linkage groups (which was 44 for P. monodon) were noted for all tests. GWAS associations were denoted as “suggestive” when P < 0.01 (before Bonferroni correction). As explained for the linkage analysis, the Bonferroni correction was equivalent to a Benjamini Hochberg [87] false discovery rate of >95% (q-value of 0.98).

Mapping the sex-determining locus

SNPs significantly associated with sex were detected using a simple χ2 test of observed and expected allele frequencies in male and female offspring across families under the null hypothesis that the segregation of alleles would be independent of sex. Associations were treated as significant when P < 0.01 after Bonferroni correction based on the number of linkage groups. Regression interval mapping using the sib-pair module was also carried out in GridQTL as described for the WSSV analysis using sex as a phenotype.

Availability of supporting data

The supporting high density P. monodon linkage map and SNP characterisations can be found in [39]. Annotated transcriptome sequence data is available through the Transcriptome Shotgun Assembly Database of NCBI (accession numbers JR196815 – JR235449, http://www.ncbi.nlm.nih.gov/Genbank). Other supporting data (map position and annotation for linkage mapped transcripts, tests for association with sex) are included in the additional files section.