The genome of the soybean cyst nematode (Heterodera glycines) reveals complex patterns of duplications involved in the evolution of parasitism genes
Heterodera glycines, commonly referred to as the soybean cyst nematode (SCN), is an obligatory and sedentary plant parasite that causes over a billion-dollar yield loss to soybean production annually. Although there are genetic determinants that render soybean plants resistant to certain nematode genotypes, resistant soybean cultivars are increasingly ineffective because their multi-year usage has selected for virulent H. glycines populations. The parasitic success of H. glycines relies on the comprehensive re-engineering of an infection site into a syncytium, as well as the long-term suppression of host defense to ensure syncytial viability. At the forefront of these complex molecular interactions are effectors, the proteins secreted by H. glycines into host root tissues. The mechanisms of effector acquisition, diversification, and selection need to be understood before effective control strategies can be developed, but the lack of an annotated genome has been a major roadblock.
Here, we use PacBio long-read technology to assemble a H. glycines genome of 738 contigs into 123 Mb with annotations for 29,769 genes. The genome contains significant numbers of repeats (34%), tandem duplicates (18.7 Mb), and horizontal gene transfer events (151 genes). A large number of putative effectors (431 genes) were identified in the genome, many of which were found in transposons.
This advance provides a glimpse into the host and parasite interplay by revealing a diversity of mechanisms that give rise to virulence genes in the soybean cyst nematode, including: tandem duplications containing over a fifth of the total gene count, virulence genes hitchhiking in transposons, and 107 horizontal gene transfers not reported in other plant parasitic nematodes thus far. Through extensive characterization of the H. glycines genome, we provide new insights into H. glycines biology and shed light onto the mystery underlying complex host-parasite interactions. This genome sequence is an important prerequisite to enable work towards generating new resistance or control measures against H. glycines.
Keywords: Heterodera glycines; SCN; Soybean cyst nematode; Genome; Tandem duplication; Effector; Evolution
BTB: BR-C, ttk, and bab domain containing
DOG: Dorsal expressed gene
EST: Expressed sequence tag
HGT: Horizontal gene transfer
LINE: Long interspersed nuclear element
LTR: Long terminal repeat
NR: Non-redundant protein database
PCA: Principal components analysis
POZ: Pox virus and zinc finger domain
RAN: RAs-related Nuclear protein
SCN: Soybean cyst nematode
SINE: Short interspersed nuclear element
SNP: Single nucleotide polymorphism
SPRYSEC: Secreted SPRY domain-containing protein
TIR: Terminal inverted repeat.
The soybean cyst nematode (SCN) Heterodera glycines is considered the most damaging pest of soybean and poses a serious threat to a sustainable soybean industry. H. glycines management relies on crop rotations, nematode-resistant crop varieties, and a panel of biological and chemical seed treatments. However, cyst nematodes withstand adverse conditions and remain dormant for extended periods of time, and are therefore difficult to control. Furthermore, the overuse of resistant soybean varieties has stimulated the proliferation of virulent nematode populations that can infect these varieties. Hence, there continues to be a strong need to identify, develop, and implement novel sources of nematode resistance and management strategies.
H. glycines nematodes are obligate endoparasites of soybean roots. Once they emerge from eggs in the soil, they find nearby soybean roots and penetrate the plant tissue, where they migrate in search of a suitable feeding location near the vascular cylinder. The now sedentary H. glycines convert adjacent root cells into specialized, fused cells that form the feeding site, termed the syncytium. The parasitic success of H. glycines depends on the formation and long-term maintenance of the syncytium, which serves as the sole source of nutrition for the remainder of its life cycle. Host finding, root penetration, syncytium induction, and the long-term successful suppression of host defenses are all examples of adaptation to a parasitic lifestyle. At the base of these adaptations lies a group of nematode proteins that are secreted into plant cells to modify host processes. Intense research is focused on identifying these proteins, called effectors, and elucidating their complex functions. To date, over 80 H. glycines effectors have been identified and confirmed [5, 6], although many more remain to be discovered. Characterization of some known effectors has provided critical insights into the parasitic strategies of H. glycines. For example, these studies revealed that effectors are involved in a suite of functions, including defense suppression, plant hormone signaling alteration, cytoskeletal modification, and metabolic manipulation (reviewed by [7, 8, 9, 10]). However, research has yet to provide a basic understanding of the molecular basis of virulence, i.e., the ability of some nematode populations to infect soybean plants with resistance genes, while other nematode populations are controlled by these resistance genes.
H. glycines populations are categorized into Hg types based on their virulence to a panel of soybean cultivars with differing resistance genetics [2, 11]. Based on the Hg type designation, growers can make informed decisions on soybean cultivar choice. To date, the Hg type designation can only be ascertained through time-consuming and expensive greenhouse experiments. However, once the genetic basis for virulence phenotypes has been explored, it is conceivable that molecular tests can be developed to make Hg type identification fast and reliable.
Resistant soybean cultivars are becoming less effective, as H. glycines populations alter their Hg type designation as a function of the soybean resistance genes to which the nematode population is exposed. In other words, when challenged with a resistant soybean cultivar for an extended duration, the surviving nematodes of an otherwise largely non-virulent H. glycines population will eventually shift towards a new Hg type that is virulent on resistant soybean cultivars. It is unknown if this phenomenon solely relies on the selection of virulent genotypes already present within a given nematode population, or if H. glycines wields the power to diversify an existing effector portfolio to quickly infect resistant soybean cultivars. In addition, such genetic shifts appear to be distinct across populations with the same pathotype, indicating populations can independently acquire the ability to overcome host resistance. Answering these and other questions about the molecular basis of H. glycines virulence is critical for sustainable soybean production in a time when virulent nematodes are becoming more prevalent.
Scientists can finally start answering such questions, as we present a near-complete genome assembly and extensive effector annotation of H. glycines, along with single-nucleotide polymorphisms (SNPs) associated with fifteen H. glycines populations of differing virulence phenotypes. PacBio long-reads were assembled and annotated into 738 contigs of 123 Mb containing 29,769 genes. The H. glycines genome has significant numbers of repeats (34% of the genome), tandem duplications (18.7 Mb), and horizontal gene transfer events (151 genes). Using this genome, we explored potential mechanisms for how effectors originate, duplicate, and diversify. Specifically, we found that effectors are frequently associated with tandem duplications, DNA transposons, and LTR retrotransposons. Additionally, we have leveraged RNA-seq data from pre-parasitic and parasitic nematodes and DNA sequencing across 15 H. glycines populations to further characterize effector expression and diversity.
Genome assembly, annotation, and completeness
Gene annotations were performed using Braker on an unmasked assembly, as multiple known effector alignments were absent from predicted genes when the genome was masked (Additional file 1: Figure S7). While all known effectors are present in the assembly, the resulting gene count of 29,769 also includes many expressed repetitive elements (12,357). A variety of transcriptional sequencing was used as input for gene annotations, including 230 million RNA-seq reads from both pre-parasitic and parasitic J2 H. glycines nematodes, 34,041 iso-seq reads from early, middle and late life stages of both a virulent and an avirulent strain, and the entirety of the H. glycines ESTs in NCBI (35,796).
Effector gene identification
Given that DOG boxes are only present in some effector promoters, to identify a comprehensive repertoire of effectors we combined several methods and criteria. First, we aligned the 80 known H. glycines effector sequences to the genome using GMAP, identifying 121 putative effector genes. Second, the same 80 known effectors were subjected to motif discovery with MEME, identifying 24 motifs in 60/80 effectors (Additional file 1: Figure S8). One motif (motif 1) was a known signal peptide found in 10/60 effectors. In addition, motifs 8, 12, and 18 were also consistently found at the N terminus in 7/60, 16/60, and 17/60 effectors, respectively. Because genes containing these motifs may also be effectors, the 24 motifs (Additional file 2: Data S1) were queried against the H. glycines predicted proteome using FIMO, revealing a set of 292 proteins with at least one effector-like motif. All three effector gene predictions were merged to produce a unique set of 431 effector-like genes. This gene set was used in downstream analyses exploring effector evolution. Of the 431 effector-like loci, 216 are predicted to encode a secretion signaling peptide and lack a transmembrane domain. While the remaining 215 effector-like loci may contain non-effectors, they were retained for downstream evolutionary analyses because they may represent genes with non-canonical secretion signals, “progenitor” housekeeping genes that gave rise to effectors (e.g. GS-like effectors, SPRY-SECs, etc.), or an effector graveyard.
Genomic insights into the mechanisms of effector duplication and diversification
The tandem duplication (TD) of genes in pathogen genomes is a common evolutionary response to the arms race between pathogen and host, as a means to avoid or overcome host resistance. To identify the role tandem duplications play in the duplication of virulence genes, we used RedTandem to survey the H. glycines genome. We determined that a total of 18.7 Mb of the genome is duplicated, comprising 20,577 duplications. Most individual duplications were small, with an average tandem duplication size of 909 bp. We verified that tandem duplications were not assembly artifacts by aligning the PacBio preads to the genome and confirmed that the larger than average tandem duplications (4410/4241) were spanned by PacBio preads across > 90% of tandem duplication length. The density of genes in the tandemly duplicated regions is higher than in non-duplicated regions of the genome: 6730/18.7 Mb (~ 360 genes/Mb) vs 23,039/105.2 Mb (~ 219 genes/Mb), and tandemly duplicated regions thus contribute over one fifth of the total gene count in the H. glycines genome. The largest groups of orthologous genes found in tandem duplications (881/3940 genes) were annotated with BLAST to the NCBI non-redundant (NR) database, revealing that the 38 largest clusters of duplicated genes were frequently transposable element genes, effector/gland-expressed genes, or BTB/POZ domain-containing genes (Additional file 1: Figure S9). Both effector-like loci (136/431; 32%) and HGT genes (38/151; 25%) were duplicated in the tandem duplications. Among effectors with orthologs in tandem duplications, Hgg-20 (144), 4D06 (11) and 2D01 (11) were the most frequent, while RAN-binding proteins formed the largest cluster of HGT genes (Additional file 1: Figure S9).
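The density comparison above is simple arithmetic on figures quoted in the text. The following minimal Python sketch recomputes it; the variable and function names are illustrative and not taken from the authors' scripts.

```python
# Figures quoted in the text; names are illustrative only.
GENES_IN_TD = 6730          # genes inside tandem duplications
TD_SPAN_MB = 18.7           # tandemly duplicated sequence, in Mb
GENES_OUTSIDE_TD = 23039    # genes outside tandem duplications
NON_TD_SPAN_MB = 105.2      # non-duplicated sequence, in Mb
N_DUPLICATIONS = 20577      # number of tandem duplications


def genes_per_mb(gene_count, span_mb):
    """Gene density expressed as genes per megabase."""
    return gene_count / span_mb


td_density = genes_per_mb(GENES_IN_TD, TD_SPAN_MB)
non_td_density = genes_per_mb(GENES_OUTSIDE_TD, NON_TD_SPAN_MB)
td_gene_fraction = GENES_IN_TD / (GENES_IN_TD + GENES_OUTSIDE_TD)
mean_td_size_bp = TD_SPAN_MB * 1e6 / N_DUPLICATIONS

print(f"TD regions:      {td_density:.0f} genes/Mb")
print(f"Non-TD regions:  {non_td_density:.0f} genes/Mb")
print(f"Genes in TDs:    {td_gene_fraction:.1%} of all genes")
print(f"Mean TD size:    {mean_td_size_bp:.0f} bp")
```

The fraction of genes in tandem duplications comes out slightly above one fifth, which matches the "over a fifth" wording used elsewhere in the text.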
Genomic structures associated with gene expression change in H. glycines
To assess the importance of genes affected by duplications, repeat-association, and SNP density, we utilized gene expression from second-stage juveniles of H. glycines population PA3 before and after root infection of a resistant and susceptible soybean cultivar (SRP122521). Genes differentially up- and downregulated after infection were identified using DESeq2 with a q-value cutoff of 1e-8, revealing 1211 and 568 genes with significant up- and downregulation, respectively. To associate differential expression with effectors and other gene categories, significant associations were identified using the GeneOverlap R package (Fig. 5, Additional file 5: Table S5). As expected, many of the predicted effectors were significantly upregulated upon infection, a trend that continued with putative effectors found in DNA transposons and tandem duplications. In contrast, the only significantly upregulated gene categories not directly associated with predicted effectors were secreted genes and genes associated with an effector-associated repeat (Family-976 repeat).
However, since virulence genes have a limited span of use before host immunity is developed, the expression of a recognized effector may hinder survival; finding effectors with reduced gene expression is therefore not surprising. Generally, genes associated with tandem duplications, HGT, and transposons had similar distributions of expression as genes that were non-associated, yet effectors found in tandem duplications and DNA transposons were significantly enriched for genes with high and low expression (Fig. 5). This high and low expression trend in effectors was also apparent in secreted genes at a higher significance, indicating that many potential effectors remain elusive to detection.
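To illustrate how an association between two gene lists can be scored, the sketch below implements a one-sided hypergeometric (Fisher's exact) test using only the Python standard library. GeneOverlap is understood to be built around Fisher's exact test, but this is not the authors' code. The gene universe (29,769), effector set size (431), and upregulated set size (1211) come from the text, while the overlap of 60 genes is a purely hypothetical value chosen for illustration.

```python
from math import comb


def overlap_p_value(universe, set_a, set_b, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe: total number of genes considered
    set_a:    size of the first gene list (e.g. predicted effectors)
    set_b:    size of the second gene list (e.g. upregulated genes)
    overlap:  number of genes shared by both lists
    """
    total = comb(universe, set_b)
    largest_possible = min(set_a, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, largest_possible + 1)
    )
    return tail / total


# Sizes from the text; the overlap of 60 is HYPOTHETICAL.
p_effectors_up = overlap_p_value(universe=29769, set_a=431, set_b=1211, overlap=60)
expected_by_chance = 431 * 1211 / 29769

print(f"expected overlap by chance: {expected_by_chance:.1f} genes")
print(f"p-value for an overlap of 60 (hypothetical): {p_effectors_up:.3g}")
```

Exact integer arithmetic via `math.comb` avoids the numerical underflow that a naive floating-point product would hit with a universe of this size.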
To overcome the expected assembly problems associated with high levels of repetitive DNA and to reveal the evolutionary means behind the rapid evolution and population shifts in H. glycines, we used long-read technology to assemble a genome from a heterogeneous population of individuals. Several analyses confirmed a high level of genome completeness with ~ 88% of the RNA-Seq aligning, 93% of preads aligning, and zero contaminating scaffolds (Additional file 1: Table S1, Figure S1). While percentages of missing BUSCO genes were high, BUSCO genes were 72% complete, ranking H. glycines the best among sequenced genomes in the cyst and knot-nematode clades (Fig. 1, Additional file 1: Table S2). Some level of artifactual duplication may be present in the genome, with BUSCO gene duplication being highest among the species analyzed. However, only 79/349 duplicated BUSCO genes are found in tandem duplications, indicating that duplication or heterozygous contigs may be present elsewhere in the genome. With a goal-oriented approach of capturing all genic variation in the genome, we sequenced a population of multiple individuals. We therefore assembled a chimera of individuals, with some duplicated genes originating from single variants in the population. However, even when considering that nearly nine thousand genes could be attributed to repetitive elements and tandem duplications, the gene frequency (20,830) and exon statistics of H. glycines are elevated in relation to sister Tylenchida species.
Because plant parasitism has independently arisen three times in Nematoda, and because it is thought that HGT plays a crucial role in the nematodes’ adaptation to this lifestyle [25, 33], we investigated the potential role HGT may have in H. glycines. Almost all previously identified HGT in plant-parasitic nematodes were also found in H. glycines (n = 82) (Additional file 1: Table S3). Genes with strong AI (> 30) were mainly hydrolases, transferases, oxidoreductases or transporters (Additional file 3: Data S2). Of interest were genes originating from bacteria or fungi, but lacking BLAST hits to Metazoan species (highlighted blue in Additional file 4: Data S3). Among these is a gene coding for an Inosine-uridine preferring nucleoside hydrolase (Hetgly.000009703; AI = 101.2), an enzyme essential for parasitism in many plant-pathogenic bacteria and trypanosomes. A candidate oomycete RxLR effector was also identified in the genome (Hetgly.000002962, Hetgly.000002964 and Hetgly.000002966; AI up to 42.2). Besides being necessary for successful infection, RxLR effectors are also avirulence genes in some species, including the soybean pathogen Phytophthora sojae. The H. glycines genome is also host to a putative HGT gene (Hetgly.000001822 and Hetgly.000022293; AI up to 55.3) that has been characterized as a G. pallida effector (Gp-FAR-1) involved in plant defense evasion by binding plant defense compounds. Thus, horizontal gene transfer appears to contribute to the evolution of H. glycines virulence as well as to the ancestral development of parasitism in plant-parasitic nematodes [33, 39, 40, 41].
Although HGT is more common among nematodes and arthropods than other animals, there are many documented cases of gene duplication leading to evolutionary novelty and phenotypic adaptation across metazoans [43, 44]. With over a fifth of the genes in the H. glycines genome found in tandem duplications, characterizing the largest clusters of orthologous gene families in tandem duplications provides relevant information for identifying genes related to parasitism, adaptation, and virulence. A functional assessment of the 38 largest clusters of tandemly duplicated orthologues showed that they largely encode transposon-associated proteins or proteins related to effectors, indicating that transposons have a role in duplicating effector genes (Additional file 1: Figures S9, S10). Because many of the LTRs and TIRs were nested, the frequent rearrangements of nested clusters of transposons could contribute to effector exon shuffling. Genes in duplicated regions of the genome were significantly associated with high SNP density (Fig. 5a), but putative effectors were not. Although genes in duplicated regions are known to pave the way for evolutionary novelty [43, 44], the lack of high SNP density for effectors in duplicated regions may reflect low sequencing depth or the recent duplication of these loci. Even though significant effector mutations could not be found in these regions, these effectors were among the most highly upregulated and downregulated genes upon infection (Fig. 5b).
The H. glycines genome assembly and annotation provide a glimpse into host and parasite interplay through the characterization of known and predicted effector genes. This relationship is further unraveled through the characterization of tandem duplications, horizontal gene transfers, transposon hitchhiking, promoter regulatory element identification, alternative splicing, SNP density, and gene expression. The generation of these genomic resources will facilitate a greater understanding of the host-parasite relationship by revealing genes involved in creating and maintaining a functional feeding site. Thus, the analysis of the H. glycines genome is an important advance in the pathway to generating new forms of resistance and control measures against H. glycines.
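The text reports AI scores without defining them here. Assuming AI denotes the Alien Index commonly used in HGT screens, it can be sketched as the difference between the log of the best metazoan BLAST e-value and the log of the best non-metazoan e-value, each offset by 1e-200 to avoid taking the log of zero. The formula, the function name, and the example e-values below are assumptions for illustration and are not taken from the authors' pipeline.

```python
from math import log

# Small constant that keeps log() defined when an e-value is reported as 0.
EPSILON = 1e-200


def alien_index(best_metazoan_evalue, best_nonmetazoan_evalue):
    """Alien Index (assumed definition).

    Positive values mean the best non-metazoan hit is stronger than the
    best metazoan hit. When a gene has no metazoan hit at all, an e-value
    of 1.0 is conventionally substituted.
    """
    return log(best_metazoan_evalue + EPSILON) - log(best_nonmetazoan_evalue + EPSILON)


# HYPOTHETICAL e-values, chosen only to show the behaviour of the score.
no_metazoan_hit = alien_index(1.0, 1e-50)        # strongly "alien"
equal_support = alien_index(1e-30, 1e-30)        # no evidence either way
metazoan_favoured = alien_index(1e-80, 1e-10)    # looks like a native gene

for label, score in [
    ("no metazoan hit", no_metazoan_hit),
    ("equal support", equal_support),
    ("metazoan favoured", metazoan_favoured),
]:
    print(f"{label:>18}: AI = {score:.1f}, strong candidate = {score > 30}")
```

Under this definition, the AI > 30 threshold quoted in the text corresponds to a non-metazoan hit whose e-value is at least about thirteen orders of magnitude smaller than the best metazoan hit.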
Nematode culture and DNA/RNA isolation
H. glycines inbred population TN10, Hg type 188.8.131.52, was grown on susceptible soybean cultivar Williams 82 in a greenhouse at Iowa State University. A starting culture of approximately 10,000 eggs from Dr. Kris Lambert, University of Illinois, was bulked for four generations on Williams 82 soybeans grown in a 2:1 mixture of steam pasteurized sand:field soil in 8″ clay pots, with approximately 16 h daylight at 27 °C. Genomic DNA was extracted from approximately 100,000 eggs in a subset of third generation cysts. Egg extraction was performed with standard nematological protocols , eggs washed 3 times in sterile 10 mM MES buffered water, and pelleted before flash freezing in liquid nitrogen.
Genomic DNA was isolated using the MasterPure Complete DNA Purification Kit (Epicentre) with the following modifications: frozen nematode eggs were resuspended in 300 µl of tissue and cell lysis solution, and immediately placed in a small precooled mortar, where the nematode solution refroze and was finely ground. The mortar was then placed in a 50 °C water bath for 30 min, and the lysate was transferred to 500 µl PCR tubes with 1 µl of proteinase K and incubated at 65 °C for 15 min, inverting every 5 min. Genomic DNA was resuspended in 30 µl of RNase/DNase-free water, quantified via nanodrop, and inspected with a 0.8% agarose gel at 40 V for 1 h. Two 20 kb insert libraries were generated and sequenced on 20 PacBio flow cells at the National Center for Genome Resources in Santa Fe, NM (SRR5397387 – SRR5397406).
Fifteen H. glycines populations were chosen based on Hg-type diversity and were biotyped to ensure identity (TN22, TN8, TN7, TN15, TN1, TN21, TN19, LY1, OP50, OP20, OP25, TN16, PA3, G3). Information on the selection and Hg-types of these lines is available in Additional file 6: Table S6. Genomic DNA from approximately 100,000 eggs for each population was extracted as described previously, and 500 bp libraries were sequenced on an Illumina HiSeq 2500 at 100PE (SRR5422809 – SRR5422824).
Six life stages were isolated for both PA3 and TN19 H. glycines populations: eggs, pre-parasitic second-stage juveniles (J2), parasitic J2, third-stage juveniles (J3), fourth stage juveniles (J4) and adult females. Parasitic J2 were isolated, followed by isolations of J3, J4, and adult females at 3, 8, 15, and 24 days post-infection via a combination of root maceration, sieving and sucrose floatation, using standard nematological methods . Total RNA was extracted with the Exiqon miRCURY RNA Isolation Kit (Catalog #300112). RNA was combined to form three pools for each population, corresponding to early (egg and pre-parasitic J2), middle (parasitic J2 and J3) and late (J4 females and early adult females) developmental stages. The IsoSeq data were used to improve the annotation (see below) (SAMN08541516-SAMN08541521).
A PacBio subread assembly was generated with Falcon, which corrects subreads into consensus preads (error-corrected reads) before contig assembly. An alternative approach using only transcript-containing preads was helpful in resolving problems caused by heterozygosity and population-level variation. Transcripts were aligned to preads using GMAP under default parameters, and a pool of preads for each unique transcript was assembled using CAP3 under default parameters. The longest assembled contigs and all unassembled preads were retained, and read/contig redundancy was removed with sort and uniq. New FASTA headers were generated using nanocorrect-preprocess.pl [https://github.com/jts/nanocorrect/blob/master/nanocorrect-preprocess.pl], and sequences were then assembled with Falcon with default settings into 2692 contigs (supp file H. glycines.cfg). Falcon output was converted to Fastg with Falcon2Fastg [https://github.com/md5sam/Falcon2Fastg], and longer scaffolds were created with Bandage using the following criteria: 1) the longest path was chosen and ended with an absence of edges; 2) if the orientation of an interior contig was disputed, one set of edges was deleted to extend the scaffold; 3) the shortest path through difficult repetitive subgraphs was chosen.
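The redundancy-removal step is described only as "sort and uniq". As a hedged illustration of the same idea, the Python sketch below drops records whose sequences are exact duplicates, keeping the first occurrence. How the authors laid out their records before running sort and uniq is not stated, so the FASTA handling and the example sequences here are assumptions.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# HYPOTHETICAL input: contig_7 carries the same sequence as pread_1.
example = """>pread_1
ACGTACGT
>contig_7
ACGTAC
GT
>pread_2
TTGACA
"""

unique_records = drop_exact_duplicates(parse_fasta(example))
print([header for header, _ in unique_records])
```

Unlike `sort | uniq`, this keeps the original record order, which can make downstream bookkeeping easier; it detects only exact duplicates, not reverse complements or near-identical reads.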
Intragenomic synteny was used to remove clonal haplotigs [51, 52] (synteny as below). When synteny was identified between two contigs/scaffolds, if a longer 3′ or 5′ fragment could be made, then the ends of each contig/scaffold were exchanged at the syntenic/non-syntenic juncture. All remaining duplicate scaffolds retaining synteny were truncated or removed from the assembly, and followed by a BWA  self-alignment to remove redundant repetitive scaffolds at a 90% identity threshold.
Genome quality control
Multiple measures were taken to assess genome assembly quality, including a default BLASR alignment of PacBio subreads, preads, and ccsreads, resulting in alignment percentages of 88.7%, 93.3%, and 90.1%, respectively (Additional file 1: Table S1). Using default settings, GMAP and Hisat2 (2.0.3) mapped 86.4% of a transcriptome assembly and ~ 88% of the five RNA-seq libraries, respectively (Additional file 1: Table S1). Genome completeness was assessed with BUSCO at 71.9%. An absence of contamination was found with Blobtools (4.8.2) using MegaBLAST (2.2.30+) to the NCBI nt database, accessed 02/02/17, at an e-value of 1e-5. See Additional file 7 for more detail.
To account for the high proportion of noncanonical splicing in nematodes, Braker was used to predict genes using Hisat2 (2.0.3) raw RNA-Seq alignments of ~ 230 million 100 bp PE RNA-Seq reads, GMAP alignments of IsoSeq reads, and all EST sequences from NCBI. Because gene models were greatly influenced by repeat masking, three differentially repeat-masked genomes were used for gene prediction: unmasked, all masked, and all except simple repeats masked (see supp table RNASEQ mapping in excel). All protein isoforms were annotated with Interproscan [59, 60] in Blast2GO, and with BLAST to Swiss-prot and Uniref at e-value 1e-5.
Repetitive elements in the H. glycines genome were classified into families with five rounds of RepeatModeler (1.0.8)  at default settings, followed by genome masking with RepeatMasker  at default settings. Inverted Repeat Finder (3.07) and LTR Finder (1.0.5) were used at default settings to define the border of a TE only when overlapping RepeatModeler repeats were present. Supplemental helitron prediction was done with HelitronScanner  under default settings.
To determine to what extent cyst nematodes use common mechanisms for dorsal gland effector regulation, we screened sequences previously associated with the DOG box in other genera against the H. glycines genome. The G. rostochiensis DOG-effectors were used as queries in BLASTp to identify DOG-effector-like loci in the predicted proteome of H. glycines. The most similar sequence was retrieved if it satisfied two criteria: an e-value < 1e-10 and a predicted signal peptide for secretion (78 unique H. glycines loci). Using the same approach, 94 genes similar to other published dorsal gland expressed effectors (58) were identified and combined with the DOG-effector-like list to a non-redundant 128 loci. Given the nature of these two criteria, not all sequences in this list will be effectors and not all effectors will be in this list; nevertheless, it will contain a sufficient number to determine whether the DOG box is conserved in H. glycines. A 500 bp region 5′ of the ATG start codon, termed the promoter region, was extracted from these 128 loci and used for motif enrichment analysis using HOMER, as previously described. DOG-box positional enrichment was calculated using the FIMO web server and predictive power was calculated using custom Python scripts.
At default settings, GMAP was used to align 80 previously identified effectors to the genome [5, 6, 70, 71]. Conserved protein motifs in effectors were identified with MEME: -nmotifs 24, -minsites 5, -minw 7, -maxw 300, and zoops (zero or one occurrence per sequence). These motifs were used as FIMO queries to search the inferred H. glycines proteome.
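Extracting the 500 bp promoter region must account for gene strand: for a minus-strand gene, the region 5′ of the ATG lies at higher forward-strand coordinates and has to be reverse-complemented. The sketch below shows one way to do this in Python. The function names, the half-open 0-based coordinate convention, and the toy sequences are assumptions made for illustration and do not reproduce the authors' scripts.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_region(contig_seq, codon_start, codon_end, strand, length=500):
    """Return up to `length` bp immediately 5' of a start codon.

    codon_start, codon_end: 0-based, half-open forward-strand coordinates
    of the start codon (ATG on '+', seen as CAT on the forward strand for '-').
    The result is returned in the orientation of the gene and is shorter
    than `length` when the contig ends first.
    """
    if strand == "+":
        return contig_seq[max(0, codon_start - length):codon_start]
    if strand == "-":
        return reverse_complement(contig_seq[codon_end:codon_end + length])
    raise ValueError(f"unknown strand: {strand!r}")


# HYPOTHETICAL toy contigs.
plus_contig = "TTTTGGGGATGCCC"     # ATG occupies positions 8..10
minus_contig = "AAACATGGACTT"      # CAT (reverse of ATG) occupies positions 3..5

plus_promoter = promoter_region(plus_contig, 8, 11, "+", length=4)
minus_promoter = promoter_region(minus_contig, 3, 6, "-", length=4)
truncated = promoter_region(plus_contig, 8, 11, "+")  # contig shorter than 500 bp

print(plus_promoter, minus_promoter, truncated)
```

Returning a truncated sequence near contig ends, rather than raising an error, keeps genes close to contig boundaries in the analysis; whether that is desirable for motif enrichment is a design choice.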
The genome, gff, and peptide sequences for C. elegans (WBcel235), G. pallida , and M. hapla  were downloaded from WormBase . The genome and gff of G. rostochiensis  was downloaded from NCBI. The G. ellingtonae genome was also downloaded from NCBI , but gene models were unavailable, thus gene models for G. ellingtonae were called with Braker using RNA-seq reads from SRR3162514, as described earlier.
Fastp and global alignments with Opscan (0.1) were used to calculate orthologous gene families between H. glycines and C. elegans, G. pallida, G. ellingtonae, G. rostochiensis, M. hapla, and M. incognita. All alternatively spliced variants and all possible multi-family genes were considered (-C, -b, -Q).
To infer synteny, iAdHoRe 3.0.01  was used with prob_cutoff = 0.001, level 2 multiplicons only, gap_size = 15, cluster_gap = 20, q_value = 0.9, and a minimum of 3 anchor points. Syntenic regions are displayed using Circos (0.69.2) .
Predicted protein sequences from the aforementioned nematode genomes (excluding C. elegans) were scanned with BUSCO 2.0  for 982 proteins conserved in nematoda_odb9. 651 proteins were found in at least 3 species and aligned with Prank  in Guidance  at default parameters. Maximum likelihood gene trees were computed using RAxML  with 1000 bootstraps and PROTGAMMAAUTO for model selection. Astral  at default settings was used to prepare a coalescent-based species tree.
With default settings, ReDtandem.pl was used to identify tandem duplications in the genome. Tandem duplicate orthologous genes were identified using a self-BLASTp to predicted proteins with 50% query length and 90% identity. To annotate clusters of orthologous genes, groups of highly connected nodes or entire clusters were concatenated and queried with BLASTp to the NCBI NR database.
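The self-BLASTp thresholds can be applied as a filter over tabular BLAST output. The Python sketch below assumes the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and interprets "50% query length" as the aligned query span covering at least half of the query protein. That interpretation, the function name, and the example hits are assumptions.

```python
def duplicated_gene_pairs(hit_lines, protein_lengths,
                          min_identity=90.0, min_query_fraction=0.5):
    """Return unordered gene pairs passing identity and query-coverage cutoffs."""
    pairs = set()
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        if query == subject:
            continue  # ignore self-hits from the self-BLASTp
        identity = float(fields[2])
        q_start, q_end = int(fields[6]), int(fields[7])
        aligned_fraction = (q_end - q_start + 1) / protein_lengths[query]
        if identity >= min_identity and aligned_fraction >= min_query_fraction:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


# HYPOTHETICAL protein lengths and hits.
lengths = {"geneA": 200, "geneB": 160, "geneD": 180}
hits = [
    "geneA\tgeneA\t100.0\t200\t0\t0\t1\t200\t1\t200\t0.0\t400",    # self-hit
    "geneA\tgeneB\t95.0\t150\t7\t0\t1\t150\t1\t150\t1e-90\t300",   # passes both cutoffs
    "geneA\tgeneC\t99.0\t60\t1\t0\t1\t60\t1\t60\t1e-20\t100",      # too short
    "geneB\tgeneA\t95.0\t150\t7\t0\t1\t150\t1\t150\t1e-90\t300",   # reciprocal of hit 2
    "geneD\tgeneB\t80.0\t160\t32\t0\t1\t160\t1\t160\t1e-60\t250",  # identity too low
]

pairs = duplicated_gene_pairs(hits, lengths)
print(sorted(pairs))
```

Storing each pair as a sorted tuple collapses reciprocal hits into a single edge, which is convenient when the pairs are later clustered into groups of orthologous genes.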
SNP density and PCA analysis of fifteen H. glycines populations
Raw sequences from fifteen populations of H. glycines nematodes were quality checked with FastQC. Virulence phenotypes for each H. glycines population are available in Additional file 6: Table S6. Reads were aligned to the H. glycines genome using default parameters in BWA-MEM. The BAM files were sorted, cleaned, and marked for duplicates, read groups were added, and SNP/indel realignment was performed prior to calling SNPs and indels with GATK. Custom Bash scripts were used to convert the vcf file into a gff for use with Bedtools (2.2.6) to identify SNP and exon overlap. The density of SNPs was calculated by dividing the number of SNPs by the CDS length (bp). SNPs were phased and imputed with Beagle 4.1 [90, 91], followed by a PCA of SNPs vs Hg-type virulence using SNPRelate (1.12.2).
RNA-seq reads were obtained from NCBI SRA accession SRP122521. Briefly, SCN inbred population PA3 was grown on soybean cultivar Williams 82 or EXF63. Pre-parasitic second-stage juveniles and parasitic second-stage juveniles were isolated from roots of resistant and susceptible cultivars at 5 days post-inoculation. Paired-end 100 bp reads were aligned to the genome using default settings with HISAT2. Read counts were calculated using default settings with featureCounts from the Subread package, followed by DESeq2 at default settings to determine log-fold change between the pre-parasitic samples (2 × ppJ2_PA3) and parasitic J2 samples (2 × pJ2_s63, pJ2_race3_Forrest).
Global and effector-specific changes in the alternative splicing landscape were assessed following a recent de novo transcriptomics analysis of H. glycines effectors. The transcriptome annotation was constructed using 230 million RNA-Seq reads from both pre-parasitic and parasitic J2 H. glycines, 34,041 iso-seq reads from three life stages of both a virulent and an avirulent strain, and H. glycines ESTs in NCBI (35,796). Specifically, using a standard alternative splicing analysis pipeline, 230 million reads from both pre-parasitic and parasitic J2 H. glycines were preprocessed with Trimmomatic, aligned with Tophat 2.1.1, and quantified with Cufflinks 2.2.1, followed by conversion of FPKM to TPM and pattern assessment with IsoformSwitchAnalyzeR. For the 80 previously identified effectors [6, 70, 71], the changes in functional domain architecture between specific alternatively spliced isoforms were determined using the InterPro domain annotation server with a focus on Pfam domains.
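The per-gene SNP density described above can be sketched in a few lines of Python. The authors used Bash scripts and Bedtools for this step; the function below only illustrates the arithmetic, and its name, the 1-based inclusive coordinates, and the example intervals are assumptions.

```python
def snp_density(cds_intervals, snp_positions):
    """SNPs per bp of coding sequence for one gene.

    cds_intervals: non-overlapping (start, end) pairs, 1-based inclusive,
                   as in GFF files
    snp_positions: 1-based SNP coordinates on the same contig
    """
    cds_length = sum(end - start + 1 for start, end in cds_intervals)
    if cds_length == 0:
        raise ValueError("gene has no CDS bases")
    snps_in_cds = sum(
        1
        for pos in snp_positions
        if any(start <= pos <= end for start, end in cds_intervals)
    )
    return snps_in_cds / cds_length


# HYPOTHETICAL gene with two CDS exons of 100 bp each.
example_cds = [(101, 200), (301, 400)]
example_snps = [150, 250, 301, 400, 401]  # 250 is intronic, 401 lies past the CDS

density = snp_density(example_cds, example_snps)
print(f"{density:.3f} SNPs per bp of CDS")
```

For genome-scale data an interval tree or a sorted sweep would replace the nested scan, but the quantity computed is the same.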
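The FPKM to TPM conversion mentioned above rescales each transcript's FPKM by the sum of all FPKM values in the sample, so that the values in every sample sum to one million. A minimal Python sketch follows; the function name and the example values are hypothetical.

```python
def fpkm_to_tpm(fpkm_by_transcript):
    """Convert FPKM values for one sample to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6
    """
    total = sum(fpkm_by_transcript.values())
    if total == 0:
        return {name: 0.0 for name in fpkm_by_transcript}
    return {
        name: value / total * 1e6
        for name, value in fpkm_by_transcript.items()
    }


# HYPOTHETICAL FPKM values for three isoforms in one sample.
sample_fpkm = {"isoform_1": 1.0, "isoform_2": 3.0, "isoform_3": 0.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)

for name, tpm in sample_tpm.items():
    print(f"{name}: {tpm:.0f} TPM")
```

Because TPM values sum to the same total in every sample, isoform proportions are more directly comparable across samples than raw FPKM values.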
We thank the National Center for Genome Resources in Santa Fe, NM for performing PacBio sequencing, and Iowa State DNA Facility in Ames, IA for Illumina sequencing of fifteen populations. We also thank Levi Baber for IT support in visualizing genomics data with JBrowse.
RM, TRM, PSJ, MGM, MH, AJS and TJB would like to acknowledge the critical support of the North Central Soybean Research Program. Work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. SEvdA is supported by Biotechnology and Biological Sciences Research Council grant BB/R011311/1. DK and NTJ acknowledge support by National Science Foundation (DBI-1458267 to DK). This work used the Extreme Science and Engineering Discovery Environment (XSEDE) , which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system , which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC). BM and EL are supported by Genome Canada, Genome Quebec and partners listed on soyagen.ca. PacBio sequencing was obtained using funds from the National Science Foundation I/UCRC, the Center for Arthropod Management Technologies under Grant No. IIP-1338775 and by industry partners.
Availability of data and materials
Datasets generated during the current study are available under GenBank accessions SRR6782833–SRR6782842, SRR5397387–SRR5397406, and SRR5422809–SRR5422824. BioProject address: https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA381081. Scripts used for the alternative splicing analysis can be found at https://github.com/bioinfonerd/SCN_AS_RNA_Seq. Scripts used for the promoter analysis can be found at https://github.com/sebastianevda/Fimo_parse/tree/master. All other scripts and bioinformatic analyses can be found at https://github.com/remkv6/SCN_Genome_Paper.
RM, TRM, PSJ, MGM, MH, AJS, UM, JS, AS, and TJB conceived and designed the experiment. TRM isolated and acquired the data. RM and AJS performed the assembly. SEvdA performed and wrote the promoter analysis. DK and NTJ performed and wrote the alternative splicing analysis. BM and EL performed and wrote the horizontal gene transfer analysis. RM performed all other comparative analyses. All authors made substantial contributions to the final text. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 1. Koenning SR, Wrather JA. Suppression of soybean yield potential in the continental United States from plant diseases estimated from 2006 to 2009. Plant Health Prog. 2010. https://doi.org/10.1094/PHP-2010-1122-01-RS.
- 3. Endo BY. Penetration and development of Heterodera glycines in soybean roots and related anatomical changes. Phytopath. 1964;54:79–88.
- 6. Noon JB, Hewezi TAF, Maier TR, Simmons C, Wei J-Z, Wu G, Llaca V, Deschamps S, Davis E, Mitchum M. Eighteen new candidate effectors of the phytonematode Heterodera glycines produced specifically in the secretory esophageal gland cells during parasitism. Phytopathology. 2015; (ja).
- 9. Juvale PS, Baum TJ. “Cyst-ained” research into Heterodera parasitism. PLoS Pathog. 2018;14(2):e1006791.
- 12. Eves-van den Akker S, Laetsch DR, Thorpe P, Lilley CJ, Danchin EG, Da Rocha M, Rancurel C, Holroyd NE, Cotton JA, Szitenberg A. The genome of the yellow potato cyst nematode, Globodera rostochiensis, reveals insights into the basis of parasitism and virulence. Genome Biol. 2016;17(1):124.
- 16. Triantaphyllou A. An advance treatise on Meloidogyne vol. 1. Raleigh, USA: North Carolina State University Graphics; 1985.
- 20. Mei Y, Thorpe P, Guzha A, Haegeman A, Blok VC, MacKenzie K, Gheysen G, Jones JT, Mantelin S. Only a small subset of the SPRY domain gene family in Globodera pallida is likely to encode effectors, two of which suppress host defences induced by the potato resistance gene Gpa2. Nematology. 2015;17(4):409–24.
- 26. Mitreva M, Smant G, Helder J. Role of horizontal gene transfer in the evolution of plant parasitism among nematodes. Horizontal Gene Transfer. Humana Press; 2009. p. 517–535.
- 27. Smant G, Stokkermans JP, Yan Y, De Boer JM, Baum TJ, Wang X, Hussey RS, Gommers FJ, Henrissat B, Davis EL. Endogenous cellulases in animals: isolation of β-1, 4-endoglucanase genes from two species of plant-parasitic cyst nematodes. Proc Natl Acad Sci. 1998;95(9):4906–11.
- 38. Prior A, Jones JT, Blok VC, Beauchamp J, McDermott L, Cooper A, Kennedy MW. A surface-associated retinol-and fatty acid-binding protein (Gp-FAR-1) from the potato cyst nematode Globodera pallida: lipid binding activities, structural analysis and expression pattern. Biochem J. 2001;356(Pt 2):387.
- 39. Danchin EG, Perfus-Barbeoch L, Rancurel C, Thorpe P, Da Rocha M, Bajew S, Neilson R, Sokolova E, Da Silva C, Guy J. The transcriptomes of Xiphinema index and Longidorus elongatus suggest independent acquisition of some plant parasitism genes by horizontal gene transfer in early-branching nematodes. Genes. 2017;8(10):287.
- 44. Kondrashov FA. Gene duplication as a mechanism of genomic adaptation to a changing environment. In: Proc R Soc B: 2012: The Royal Society; 2012. p. 5048–57.
- 46. Vanholme B, Kast P, Haegeman A, Jacob J, Grunewald W, Gheysen G. Structural and functional investigation of a secreted chorismate mutase from the plant-parasitic nematode Heterodera schachtii in the context of related enzymes from diverse origins. Mol Plant Pathol. 2009;10(2):189–200.
- 51. Seo J-S, Rhie A, Kim J, Lee S, Sohn M-H, Kim C-U, Hastie A, Cao H, Yun J-Y, Kim J. De novo assembly and phasing of a Korean human genome. Nature. 2016.
- 52. Makoff AJ, Flomen RH. Detailed analysis of 15q11-q14 sequence corrects errors and gaps in the public access sequence to fully reveal large segmental duplications at breakpoints for Prader-Willi, Angelman, and inv dup (15) syndromes. Genome Biol. 2007;8(6):R114.
- 65. Smit AFA, Hubley R. RepeatModeler Open-1.0 (2008–2015). http://www.repeatmasker.org.
- 66. Smit A, Hubley R, Green P. RepeatMasker Open-4.0. 2013–2015. Institute for Systems Biology. http://www.repeatmasker.org. 2015.
- 68. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38(4):576–89.
- 73. Cotton JA, Lilley CJ, Jones LM, Kikuchi T, Reid AJ, Thorpe P, Tsai IJ, Beasley H, Blok V, Cock PJ. The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode. Genome Biol. 2014;15(3):R43.
- 75. Howe KL, Bolt BJ, Cain S, Chan J, Chen WJ, Davis P, Done J, Down T, Gao S, Grove C. WormBase 2016: expanding to enable helminth genomic research. Nucleic Acids Res. 2015; gkv1217.
- 87. Coordinators NR. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2016;44(Database issue):D7.
- 88. Andrews S. FastQC: a quality control tool for high throughput sequence data; 2010.
- 89. Quinlan AR. BEDTools: the Swiss-army tool for genome feature analysis. In: Current protocols in bioinformatics; 2014. 11.12. 11–11.12. 34.
- 95. Merino GA, Conesa A, Fernández EA. A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies. Brief Bioinform. 2017.
- 99. Pachter L. Models for transcript quantification from RNA-Seq. arXiv preprint arXiv:1104.3889. 2011.
- 103. Nystrom NA, Levine MJ, Roskies RZ, Scott J. Bridges: a uniquely flexible HPC resource for new communities and data analytics. In: Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure: 2015: ACM; 2015. p. 30.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.