DNA sequence polymorphism of the Rhg4 candidate gene conferring resistance to soybean cyst nematode in Chinese domesticated and wild soybeans

Rhg4 is one of the major genes conferring resistance to soybean cyst nematode races 1, 3 and 4. To better understand its sequence diversity among different Chinese soybean populations and the impact of human activities on it, we designed 5 primer sets based on its sequence deposited in GenBank (accession No. AF506518), obtained the Rhg4 sequence from 104 Chinese cultivated and wild soybean genotypes, and analyzed the DNA sequence polymorphism in the different populations. The alignment of Rhg4 sequences comprised 5,216 nucleotide base pairs. A total of 67 single nucleotide polymorphisms (SNPs), including 59 single base changes and 8 DNA insertion-deletions (InDels), were identified, giving a SNP frequency of 1/78. Excluding a 14-base InDel, there were 29 SNPs in coding regions, of which 13 were non-synonymous (9 in functional domains: 1 in a leucine-rich repeat region, 2 in a transmembrane region and 6 in a Ser/Thr kinase domain). The probability of substitution was not uniform across sites: there were two hot spots, one in the 5′-untranslated region between positions 124 and 804 and the other in the region between positions 2520 and 3733. Sequence diversity analysis of the 104 soybean genotypes gave π = 0.00102 and θ = 0.00218 for Rhg4. A domestication bottleneck was indicated by lower sequence diversity and a 58% loss of unique SNPs in landraces compared with Glycine soja. Intensive selection increased the sequence diversity of cultivars, which had higher diversity and more unique SNPs than landraces.


Introduction
Soybean cyst nematode (SCN) (Heterodera glycines Ichinohe) is one of the most devastating pathogens in soybean production worldwide and causes substantial yield losses (Li et al. 2011; Wrather et al. 2001; Wrather and Koenning 2009) by feeding on soybean roots, damaging root systems, and reducing the plant's ability to absorb water and nutrients. Resistant cultivars are considered the best means of controlling SCN. Many research programs have sought to identify sources of resistance (Arelli and Wilcox 1997; Arelli et al. 2000; Coordinative group of evaluation of SCN 1993; Lai et al. 2005; Young 1990; Zhang and Dai 1992), to identify the genes involved (Caldwell et al. 1960; Lu et al. 2006; Matson and Williams 1965; Rao-Arelli et al. 1992; Vuong et al. 2010; Wang et al. 2001; Winter et al. 2007) or to produce resistant varieties (Anand et al. 2004; Diers et al. 2006; Du et al. 2006; Hao et al. 2003; Mengistu et al. 2005; Qiu and Wang 2007; Shannon et al. 2009; Wang et al. 2007). Classical inheritance studies identified five SCN resistance genes in soybean. Three recessive genes, designated rhg1, rhg2 and rhg3, were first reported in 'Peking' (Caldwell et al. 1960). The dominant gene Rhg4 was also identified in 'Peking' and was linked to the 'i' locus controlling seed coat color (Matson and Williams 1965). An additional dominant gene, Rhg5, was reported in PI 88788 (Rao-Arelli et al. 1992).
Almost 20 years of genetic mapping studies have described more than 70 SCN resistance quantitative trait loci (QTLs) (Concibido et al. 2004; Guo et al. 2006; Vuong et al. 2010; Winter et al. 2007; Wu et al. 2009; Yuan et al. 2006). Despite inconsistencies among QTL mapping studies, it was concluded that rhg1 and Rhg4 are the two major SCN resistance genes. The rhg1 locus has repeatedly been mapped to linkage group (LG) G [chromosome (Chr) 18] in many resistant soybean genotypes (Chang et al. 1997; Concibido et al. 1994, 1997; Guo et al. 2006; Prabhu et al. 1999; Webb et al. 1995; Yue et al. 2001) and provides the greatest level of resistance. Ruben et al. (2006) summarized the construction of integrated physical and genetic maps of a 0.2 cM interval encompassing the rhg1 locus, and characterized the candidate gene as well as the encoded protein, RHG1, a receptor-like kinase. Li et al. (2009) developed 6 SNP markers based on the variation in rhg1 and reported that they significantly improved the efficiency of marker-assisted selection (MAS) when combined with the microsatellite marker BACR-Satt309, although Melito et al. (2010) reported no significant impact of the LRR-kinase gene on SCN resistance.
Rhg4, meanwhile, was located on LG A2 (Chr 8) (Chang et al. 1997; Concibido et al. 1994; Guo et al. 2006; Heer et al. 1998; Mahalingam and Skorupska 1995; Webb et al. 1995), 0.35 cM from the I locus (Matson and Williams 1965). Several genes associated with stress or defense responses, such as those encoding chalcone synthase, glucosyl-transferase, heat-shock transcription factor, protein kinase and G10-like protein, as well as the restriction fragment length polymorphism marker pBLT65, were close to the I and Rhg4 loci (Heer et al. 1998; Lewers et al. 2002; Matthews et al. 2001; Todd and Vodkin 1996; Webb et al. 1995; Weismann et al. 1992). Two separate research groups isolated the receptor-like kinase candidate gene Rhg4 from the soybean variety 'Forrest' by positional cloning (Hauge et al. 2001; Lightfoot and Meksem 2002), and its DNA and protein sequences were deposited in GenBank in 2002 (GenBank accessions AF506518 and AAM44275.1). However, the candidate gene has been little studied apart from the work of Jang et al. (2004), who reported 3 SNPs and 7 InDels within two regions of Rhg4 totalling 901 bp by direct sequencing with 2 primer sets.
Like other important crops, soybean has undergone human selection involving domestication, intensive breeding, and probable founding events (Gyuhwa and Ram 2008; Hyten et al. 2006). These selection activities are likely to decrease genetic diversity (Tenaillon et al. 2001; Zhu et al. 2007), change allelic frequencies (Hyten et al. 2006) and eliminate rare alleles (Hyten et al. 2006; Tenaillon et al. 2001). Cultivated soybean (G. max) was domesticated from wild soybean (G. soja) in China (Hymowitz and Newell 1981), and domestication gave rise directly to G. max landraces (Hyten et al. 2007). Subsequent intensive selection imposed on landraces by soybean breeding created elite soybean cultivars. Hyten et al. (2006) and Yuan et al. (2008) both detected effects of domestication bottlenecks in soybean, but their results on the effects of intensive selection were inconsistent. Hyten et al. (2006) showed that modern soybean breeding had only minimal effects on the allelic structure of the soybean genome, whereas Yuan et al. (2008) reported an intensive selection bottleneck at GmHs1pro-1. In investigating the founding effects of the introduction of soybean to North America, Hyten et al. (2006) found evidence for only minor and non-significant bottlenecks. Nevertheless, the cumulative effects of the two genetic bottlenecks caused by founding events and intensive selection led to significant reductions in genetic diversity among North American elite cultivars in comparison with Asian landraces (Hyten et al. 2006).
In the present work we quantified DNA sequence polymorphism of Rhg4 in Chinese domesticated and wild soybeans by investigating an almost complete Rhg4 gene sequence in order to better understand its sequence diversity among different Chinese soybean populations and the impact of human activities on the candidate resistance gene. The resulting information may help the development of SNP markers for use in MAS in breeding programs.

Plant materials
The plant materials were selected from 27 provinces (autonomous regions or municipalities) of China (MOESM1) and represent three populations (G. soja, landraces and cultivars). The G. soja population consisted of 28 accessions from 14 provinces (autonomous regions), and landraces were represented by 51 accessions from 22 provinces (autonomous regions or municipalities). The cultivars came from 7 provinces (municipalities) and included 8 soybean genotypes from our core collection resistant to SCN (Ma et al. 2006), 3 awarded varieties, 4 parental lines (varieties) of our soybean genetic populations, and 10 elite cultivars. Genomic DNA was extracted from seedlings of wild accessions and from seed of domesticated genotypes as described by Yuan et al. (2008).

Primer design and polymerase chain reaction (PCR) amplification
Five pairs of primers (MOESM2) were designed from the sequence of the Glycine max receptor-like kinase Rhg4 gene (GenBank accession AF506518), with overlaps of 137-218 bp between adjacent PCR regions. PCR was carried out in a total volume of 20 μl containing 60 ng DNA, 1× TaKaRa PCR buffer, 0.15 μM of each primer, 0.15 mM dNTP and 0.5 U TaKaRa ExTaq polymerase. PCR amplification conditions were as described by Yuan et al. (2008).

Sequencing
PCR products were separated on 1.0% agarose gels stained with ethidium bromide. The PCR primer set 3561U22 and 4782L26 produced 3 amplicons differing in length by about 90 nucleotides. A preliminary experiment determined, by blastn against AF506518, that the smallest amplicon was the most similar to AF506518 and was located on LG A2 (Chr 8) (data not shown), so we chose the smallest amplicon as the target fragment. The fragment was purified using a DNA fragment purification kit (Biotech), cloned into the pMD18-T vector (TaKaRa) and then sequenced with the primers M13F and M13R. The other 4 PCR primer sets each produced a single amplicon, and these PCR products were collected, purified and sequenced directly with the PCR primers. When necessary, additional sequencing primers (MOESM2) were designed from the sequence information to ensure accurate sequence determination.

Sequence analysis
Sequences were assembled with the SeqMan tool of the DNAStar software, and the sequence of cv. Huipizhiheidou (HPZhHD) was used as a query in BLAST searches of GenBank to confirm its identity. Sequence alignment was performed using ClustalX 1.8 with manual refinement. Single base changes and single or multiple base InDels are collectively referred to as SNPs. Only informative SNP sites were selected to build haplotypes. Differential regions of the DNA sequence were predicted and located following the method of Tang and Lewontin (1999). Two measures of nucleotide diversity (π and θ) and haplotype diversity were calculated with DnaSP v.5.10.01. A neighbor-joining phylogenetic tree was constructed using MEGA 5.0 with the Kimura 2-parameter model and 1,000 bootstrap replications.
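For readers unfamiliar with the two diversity measures, the following minimal Python sketch shows how π (mean pairwise differences per site) and Watterson's θ (segregating sites scaled by the harmonic number of the sample size) can be computed from an alignment. It is an illustration only, not the DnaSP implementation: the function names and the toy alignment are hypothetical, and it assumes that columns containing gaps are simply excluded.

```python
from itertools import combinations


def ungapped_columns(seqs):
    """Return alignment columns that contain no gap character.

    Assumption for this sketch: gapped columns are dropped entirely.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [col for col in zip(*seqs) if "-" not in col]


def nucleotide_diversity(seqs):
    """pi: average number of pairwise differences per site."""
    cols = ungapped_columns(seqs)
    n = len(seqs)
    n_pairs = n * (n - 1) / 2
    diffs = sum(
        1
        for a, b in combinations(range(n), 2)
        for col in cols
        if col[a] != col[b]
    )
    return diffs / n_pairs / len(cols)


def watterson_theta(seqs):
    """theta: segregating sites / harmonic number a1, per site."""
    cols = ungapped_columns(seqs)
    n = len(seqs)
    segregating = sum(1 for col in cols if len(set(col)) > 1)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating / a1 / len(cols)


# Hypothetical toy alignment (not data from this study).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "ACGTACGT"]
pi = nucleotide_diversity(toy_alignment)
theta = watterson_theta(toy_alignment)
print(f"pi = {pi:.5f}, theta = {theta:.5f}")
```

Because θ depends only on the number of segregating sites while π weights each site by allele frequency, an excess of rare variants tends to make θ larger than π.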

Sequence comparison of Rhg4 with AF506518
Sequence assembly with the SeqMan program generated a continuous sequence of 5,216 bp, and a BLAST search showed that it had 98% identity at the nucleotide level with the receptor-like kinase Rhg4 gene (AF506518); the sequence was therefore presumed to be Rhg4. Alignment with AF506518 revealed 49 base changes, 2 single-base inserts and one three-base insert (MOESM3). Both single-base inserts occurred in the 5′-untranslated region (UTR), whereas the three-base insert was in the second exon and did not cause a frame shift. The predicted protein therefore had one more amino acid than the receptor-like kinase RHG4 (AAM44275.1).

DNA variants of Rhg4
We obtained the 5,216 bp DNA sequence of Rhg4 from 104 soybean genotypes representing the three distinct populations, viz. G. soja, landraces and cultivars. Surprisingly high sequence polymorphism was found in Rhg4: a total of 67 SNPs, including 59 single base changes and 8 DNA InDels, were identified, giving an SNP frequency of 1/78 (Table 1). Five InDel loci had 3 alleles each, and a 14-base gap occurred in the coding region of one wild soybean genotype, presumably leading to a frame-shift mutation. Excluding the 14-base InDel, there were 29 SNPs in coding regions, of which 13 were non-synonymous and 16 were synonymous. Of the 59 single base changes, 40 were transitions and 19 were transversions, giving a transition:transversion ratio of 2.1:1.
The probability of substitution was not the same at every site in Rhg4; there were two hot spots with higher probabilities of substitution. One hot spot occurred in the 5′-UTR between positions 124 and 804 and the other in the region between positions 2,520 and 3,733.
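The two summary figures in this paragraph follow directly from the reported counts. The short Python sketch below reproduces the arithmetic and shows how a single base change can be classified as a transition or a transversion; the helper names are hypothetical and chosen only for illustration.

```python
# Nucleotide classes used to separate transitions from transversions.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(base_a: str, base_b: str) -> bool:
    """Return True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {base_a.upper(), base_b.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(changes):
    """Transition:transversion ratio for an iterable of (base, base) pairs.

    Raises ZeroDivisionError if the input contains no transversions.
    """
    changes = list(changes)
    transitions = sum(1 for a, b in changes if is_transition(a, b))
    transversions = len(changes) - transitions
    return transitions / transversions


# Counts reported in the text.
ALIGNED_LENGTH_BP = 5216
TOTAL_SNPS = 67
TRANSITIONS = 40
TRANSVERSIONS = 19

bp_per_snp = ALIGNED_LENGTH_BP / TOTAL_SNPS        # about one SNP per 78 bp
reported_ratio = TRANSITIONS / TRANSVERSIONS       # about 2.1

print(f"one SNP per {bp_per_snp:.1f} bp")
print(f"transition:transversion = {reported_ratio:.2f}")
```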
DNA polymorphism of Rhg4 among the three soybean populations
Unique and shared SNPs among the three soybean populations were investigated (Fig. 1). A total of 41 SNPs were detected in G. soja, of which 26 were unique and not found in the two G. max populations. Landraces contained 27 SNPs, of which 11 were unique. Cultivars had 26 SNPs, 13 of which were unique. In the coding region of Rhg4, cultivars had the largest number of sequence variants with 19 SNPs, 11 of which were unique, whereas G. soja and landraces had 13 and 12 SNPs, of which only 7 and 4, respectively, were unique.
Fig. 1 Number of shared and unique SNPs in the three soybean populations
Nucleotide diversity analysis of the 104 soybean genotypes gave π = 0.00102 and θ = 0.00218 for Rhg4 (Table 2). Among the three populations, G. soja had the highest diversity with π = 0.00114 and θ = 0.00164, followed by cultivars, and landraces had the lowest with π = 0.00090 and θ = 0.00098. Sequence diversity differed markedly among regions of the gene: the highest diversity occurred at synonymous sites of the coding region, followed by the intron, the 5′-UTR and non-synonymous sites of the coding region, with the lowest diversity in the 3′-UTR. Cultivars had the highest sequence diversity in the coding region among the three populations. Three haplotypes of Rhg4 were shared by all three soybean populations, whereas the other 23 haplotypes occurred in only one or two of the populations. Of these 23 haplotypes, 14 were detected only in G. soja, 5 only in cultivars and 3 only in landraces, and 1 was detected in both cultivars and landraces. G. soja had the highest haplotype diversity at 0.97, whereas cultivars and landraces had lower values (0.89 and 0.80, respectively).
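Haplotype diversity values such as those quoted above are commonly obtained with the unbiased estimator of Nei (1987), H = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of the i-th haplotype in a sample of n sequences. The Python sketch below assumes that estimator; the function name and the toy sample are hypothetical and are not data from this study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).

    `haplotypes` is a sequence with one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy sample: six individuals carrying three haplotypes.
toy_sample = ["H1", "H1", "H1", "H2", "H2", "H3"]
print(f"H = {haplotype_diversity(toy_sample):.3f}")
```

The estimator is 0 when every individual carries the same haplotype and 1 when every individual carries a different one, so a value of 0.97 indicates that two randomly drawn sequences almost always differ.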

Discussion
The SNP frequency in Rhg4 among the 104 soybean accessions was 1/78, clearly higher than earlier estimates in soybean of 1/106 (Yuan et al. 2008), 1/107 (Hyten et al. 2006), 1/273 (Zhu et al. 2003) and 1/343 (Van et al. 2005). Compared with other plants, the SNP frequency was lower than those reported in maize (Ching et al. 2002; Tenaillon et al. 2001; Yamasaki et al. 2005) and chickpea (Rajesh and Muehlbauer 2008), and higher than those in bread wheat (Ravel et al. 2007) and Arabidopsis thaliana (Schmid et al. 2003; Clark et al. 2007). The differences in SNP frequency may be due to the different samples and genomic regions examined. Although the SNP frequency was higher than in other reports for soybean, sequence diversity in the 3 soybean populations was clearly lower than that reported for the corresponding soybean populations (Hyten et al. 2006). The probable reason is that the soybean resources we used came only from China and therefore had somewhat narrower genetic variation.
As for the type of nucleotide mutation, there is a clear transition bias, probably due to the high spontaneous rate of deamination of 5-methylcytosine to thymine in CpG dinucleotides (Vignal et al. 2002); that is, GC content is probably linked to the ratio of transitions to transversions. In our study the ratio was 2.1, similar to the 2.12 reported in soybean by Van et al. (2005) and the value of 2 reported in humans (Wang et al. 1998) and mouse (Lindblad-Toh et al. 2000), somewhat greater than the 1.3 reported in soybean by Yuan et al. (2008), and in contrast to the 0.93 reported in soybean by Zhu et al. (2003). The differences among species, and even among samples of the same species, are probably related to the GC content of the genomic region observed. Other factors leading to DNA mutation may of course also influence the ratio.
Domestication represents the first result of human selection in soybean. Hyten et al. (2006) reported that landraces retained only 66% (π) and 49% (θ) of the nucleotide diversity found in G. soja and had lost 81% of the rare alleles in G. soja, thus representing a domestication bottleneck. Considering the overall sequence region of Rhg4, we also found a domestication bottleneck: first, the sequence diversity within landraces was clearly low (π = 0.00090 and θ = 0.00098) compared with G. soja (π = 0.00114 and θ = 0.00164) (Table 2); second, landraces lacked 58% of the unique sequence variants present in G. soja, despite the appearance of 2 novel SNPs (Fig. 1). However, the domestication bottleneck was weaker than that reported by Hyten et al. (2006). This may be attributable to frequent exchange among farming communities since the Shang Dynasty or earlier and to the existence of several domestication centers in China (Hymowitz 1970; Hymowitz and Newell 1981).
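The comparison with Hyten et al. (2006) can be made concrete by expressing the landrace diversity at Rhg4 as a percentage of the G. soja diversity, using the π and θ values quoted in this paragraph. The Python sketch below does only that arithmetic; the dictionary layout and function name are hypothetical.

```python
# Diversity estimates for Rhg4 quoted in the text (Table 2 of the source).
RHG4_DIVERSITY = {
    "G. soja": {"pi": 0.00114, "theta": 0.00164},
    "landraces": {"pi": 0.00090, "theta": 0.00098},
}

# Genome-wide retention reported by Hyten et al. (2006), as cited in the text.
HYTEN_RETAINED_PERCENT = {"pi": 66, "theta": 49}


def retained_percent(measure: str) -> float:
    """Landrace diversity as a percentage of G. soja diversity."""
    wild = RHG4_DIVERSITY["G. soja"][measure]
    landrace = RHG4_DIVERSITY["landraces"][measure]
    return 100.0 * landrace / wild


for measure in ("pi", "theta"):
    here = retained_percent(measure)
    hyten = HYTEN_RETAINED_PERCENT[measure]
    print(f"{measure}: {here:.0f}% retained at Rhg4 vs {hyten}% in Hyten et al.")
```

On these figures landraces retain roughly 79% (π) and 60% (θ) of the G. soja diversity at Rhg4, both above the corresponding values cited from Hyten et al. (2006), which is consistent with a weaker bottleneck at this locus.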
Intensive selection imposed in modern soybean breeding programs is generally thought to reduce the genetic diversity of elite soybean cultivars (Gizlice and Burton 1994; Miranda et al. 2001; Concibido et al. 2004; Xiong et al. 2008), but Hyten et al. (2006) failed to find large effects of intensive selection on DNA sequence diversity. In the present study, sequence diversity was clearly higher within cultivars (π = 0.00107 and θ = 0.00122) than in landraces (π = 0.00090 and θ = 0.00098). The difference was most pronounced in the coding region, where sequence diversity in cultivars exceeded that in landraces by 0.00036 for π and 0.00089 for θ (Table 2). In regard to unique SNPs, cultivars had two more than landraces in the overall sequence region, but seven more in the coding region (Fig. 1). This suggests that intensive selection has increased the sequence diversity of Rhg4 in cultivars. The probable reason is selection in breeding programs for SCN resistance or for other traits, such as oil content (Oil 1-1), protein content (Prot 17-4) and seed weight (Sd wt 4-5) associated with yield, whose loci are close to Rhg4 (http://soybase.org/Marker DB/MapFeatureSearch.php?OutPutType=HTML&map set=GmComposite2003&MapName=A2&FeatureType =All_Types&FeatureStart=0&FeatureStop=9999). Studies on the association of Rhg4 alleles with SCN resistance, and association mapping of SCN resistance and the yield and seed quality traits mentioned above on LG A2 (Chr 8), may help to explain the effect of intensive selection on Rhg4 sequence diversity. In addition, the incorporation of exotic germplasm from the USA, France and Japan into breeding programs (MOESM1) may also have contributed to the high sequence diversity in cultivars.
Although there were pedigree relationships among some elite soybean varieties or lines in the cultivar population (MOESM1), which might be expected to decrease sequence diversity, DNA variation was detected among them (MOESM4), indicating possible differences in allele origin or the occurrence of recombination. For example, the four cultivars Hefeng 25, Hefeng 23, Jilin 47 and Suinong 14 were related by pedigree, but they were grouped into two clusters. A similar pattern was observed among Kangxianchong 1, Kangxianchong 2 and Kangxianchong 3.