Background

Hop is an important cash crop in the Pacific Northwest of the USA as well as in several European countries, China, Australia, South Africa and other minor production regions. It is primarily used as a flavoring and bittering additive in beer brewing, but alternative uses have become increasingly important [1, 2]. Hop is a dioecious perennial plant species propagated via rhizomatous cuttings. The female inflorescence (or hop “cone”) is the harvested product. While male hop plants are required for breeding purposes, female hop plants will produce cones without pollination [3]. Male hop plants disperse pollen via air and, if present near production hop yards, can pollinate female hop varieties, which then set seed. Seedlings from these crosses can supersede the previous genotype if they possess superior fitness. Furthermore, when new varieties are planted on yards that previously produced a different variety, escapes of the earlier variety can persist and continue production. Both of these scenarios can be compounded over the life of a new hop variety planting, with the result that a yard becomes contaminated. If rhizome cuttings are subsequently sold from this yard, the recipient grower could end up with either a partially or fully contaminated yard. Hop sales from that field are then rejected due to unexpected flavors or bittering capacity. In addition, farms of historical importance located across the USA have requested help in identifying feral hops growing on their property ([4]; Personal Observation). In these cases, the goal would be to rule out the possibility that the unknown line is a currently available hop variety.

Regardless of the scenario, the hop industry does not currently have an efficient, accurate and widely available method for marker-based genotyping of hop accessions. The method currently used by the National Clean Plant Network for genotyping hop is based upon 25 microsatellites, which provide relatively accurate genotyping but cannot always differentiate sister lines (Dr. Ken Eastwell, Personal Communication 2015). Patzak and Matoušek [5] reported on the use of expressed sequence tag-derived simple sequence repeat (EST-SSR) markers as a means of differentiating hop varieties. The reported PCR-based method utilized 30 EST-SSR markers to differentiate 11 hop genotypes representing a wide genetic pool. Unfortunately, no broad-based evaluation of related and unrelated genotypes was reported. In addition, a large number of PCR steps (30) is required to utilize this method. Koelling et al. [6] reported on the identification of 952 new SSR markers from expressed sequence tag data sets deposited with the National Center for Biotechnology Information (NCBI: http://www.ncbi.nlm.nih.gov/). These 952 markers were tested across 8 cultivars to determine their differentiation power. The combination of all 952 markers was successful in differentiating among the 8 cultivars. Again, no minimal number of SSR markers was identified in this study. Howard et al. [7] reported on the genotyping capabilities of diversity array technology (DArT) markers in hop. While Howard et al. [7] demonstrated that DArT markers have sufficient capability to resolve closely related hop genotypes, their cost and dependence upon a single service provider (Diversity Array Technology Inc.; http://www.diversityarrays.com/) limit availability. What is needed is a simple, widely available methodology that utilizes a minimal number of markers to differentiate between both related and unrelated hop genotypes.

Single nucleotide polymorphism (SNP) markers represent the most abundant source of variation that can be utilized to differentiate among genotypes, especially as they are found in both coding [8] and non-coding regions [9]. Recent genome sequencing work (data not published) shows the presence of a SNP every 346 bp on average in hop. Matthews et al. [10] were the first group to identify and report on next generation sequencing-derived SNP markers, having identified 17,128 SNPs. This group utilized SNP markers to genotype hop varieties and concluded that a highly filtered group of 3068 SNP markers resulted in a dendrogram that did not significantly differ from dendrograms obtained using the lower stringency filtered set of 16,106 SNP markers. However, the minimum number of markers required to differentiate among all genotypes was not identified or reported.

The minimal number of markers needed for DNA fingerprinting of cultivars has been examined in numerous crops (see [11] for review), and computer programs have been written to address this application [12] across any plant species. In essence, the approach ranks markers by their effectiveness at describing population variation and iteratively includes more markers until all genotypes in the population can be genetically differentiated. This process was utilized here to identify a small set of SNP markers that could, upon validation, be utilized to differentiate among genetically diverse hop accessions and be widely adaptable and available to genetic laboratories worldwide.

Results and discussion

A total of 374,829 SNP markers were identified using the TASSEL-UNEAK Ver 3.0 pipeline [13] across a population of 121 individual varieties and germplasm accessions. Filtering of SNP sites, as well as filtering out individuals with poor sequencing results, was accomplished using TASSEL ver 4.3.4 [14] resulting in a set of 32,206 high quality SNP markers across 116 genotypes (Table 1). SNP marker filtration settings were set to require presence in 80 % of all genotypes for acceptance into the data set. Presence of greater than 50 % of all 32,206 SNP markers was set as cut-off for inclusion of a variety into the final data set. Some genetic lines with higher than 50 % missing gamete calls were kept in the study due to their importance in hop production (Hallertau Mittelfrueh, Wye Zenith, etc., Table 1). Cut-off specifications did not differ significantly from those utilized by Matthews et al. [10].
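
The two filtering thresholds described above can be illustrated with a minimal sketch. It is not the TASSEL implementation: the function names, the toy genotype table and the `keep_anyway` argument (standing in for lines retained because of their importance in hop production) are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Toy genotype table (hypothetical): genotype name -> call per SNP site,
# where None marks a missing call.
Calls = Dict[str, List[Optional[str]]]

def site_call_rate(calls: Calls, site: int) -> float:
    """Fraction of genotypes with a non-missing call at one site."""
    present = sum(1 for row in calls.values() if row[site] is not None)
    return present / len(calls)

def filter_sites(calls: Calls, min_site_rate: float = 0.80) -> List[int]:
    """Indices of sites called in at least `min_site_rate` of genotypes."""
    n_sites = len(next(iter(calls.values())))
    return [s for s in range(n_sites)
            if site_call_rate(calls, s) >= min_site_rate]

def filter_genotypes(calls: Calls, kept_sites: Sequence[int],
                     min_rate: float = 0.50,
                     keep_anyway: Sequence[str] = ()) -> List[str]:
    """Genotypes called at more than `min_rate` of the kept sites.

    Names listed in `keep_anyway` are retained regardless of call rate.
    """
    kept = []
    for name, row in calls.items():
        rate = sum(1 for s in kept_sites if row[s] is not None) / len(kept_sites)
        if rate > min_rate or name in keep_anyway:
            kept.append(name)
    return kept

toy_calls: Calls = {
    "A": ["CC", "TT", "GG", None],
    "B": ["CC", "TT", None, None],
    "C": ["CC", None, "GG", None],
    "D": ["CT", "TT", "GG", "AA"],
    "E": [None, None, "GG", None],
}
toy_sites = filter_sites(toy_calls)                  # sites 0 and 2 pass
toy_kept = filter_genotypes(toy_calls, toy_sites)    # B and E fall at 50 %
toy_kept_forced = filter_genotypes(toy_calls, toy_sites, keep_anyway=("E",))
```

Filtering sites first and genotypes second mirrors the order of the description above; applying the two filters in the opposite order can give a different final data set.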

Table 1 Summary genotypic information of the results of partial sequencing for 116 varieties and experimental lines

Genotype summaries using all 32,206 SNP markers were obtained using TASSEL. Included in TASSEL’s genotype summary were estimates of the minor allele frequency (MAF). MAF values are important statistics used to filter out markers with high error potential (MAF <0.05) or to identify markers that provide the best discrimination power between genotypes [15]. Ranking of MAF-values from highest to lowest identified numerous markers with MAF <0.5. SNP markers that were heterozygous across all genotypes were discarded from consideration. Using an iterative process that added one marker at a time in order of decreasing MAF, we identified a set of six SNPs that were capable of differentiating among all 116 genotypes in the study.
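
As a rough illustration of the ranking step, the sketch below computes MAF for biallelic SNPs from diploid calls, discards markers that are heterozygous in every genotype, and sorts the remainder from highest to lowest MAF. It assumes calls are encoded as two-letter strings such as "CT" with None for missing data; that encoding, the function names and the toy markers are assumptions made for illustration and are not TASSEL's format or API.

```python
from collections import Counter
from typing import Dict, List, Optional

def minor_allele_frequency(calls: List[Optional[str]]) -> float:
    """MAF of one biallelic SNP from diploid calls such as 'CC', 'CT', 'TT'."""
    counts: Counter = Counter()
    for call in calls:
        if call is not None:
            counts.update(call)  # count both alleles of the diploid call
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or monomorphic site
    # Frequency of the second most common allele.
    return sorted(counts.values(), reverse=True)[1] / total

def all_heterozygous(calls: List[Optional[str]]) -> bool:
    """True if every non-missing call carries two different alleles."""
    called = [c for c in calls if c is not None]
    return bool(called) and all(c[0] != c[1] for c in called)

def rank_markers(markers: Dict[str, List[Optional[str]]]) -> List[str]:
    """Marker names sorted by decreasing MAF, all-heterozygous markers removed."""
    usable = {m: c for m, c in markers.items() if not all_heterozygous(c)}
    return sorted(usable, key=lambda m: minor_allele_frequency(usable[m]),
                  reverse=True)

toy_markers = {
    "snp_a": ["CC", "CT", "TT", "TT", None],   # C: 3, T: 5 -> MAF 0.375
    "snp_b": ["AG", "AG", "AG", "AG", "AG"],   # heterozygous everywhere
    "snp_c": ["GG", "GG", "GA", "GG", None],   # G: 7, A: 1 -> MAF 0.125
}
toy_ranked = rank_markers(toy_markers)
```

A marker that is heterozygous in every genotype has a MAF of exactly 0.5 yet cannot separate any two genotypes, which is why such markers must be removed before ranking.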

The dendrogram resulting from the use of these six SNP markers did not match up well with dendrograms developed from the use of a complete set of SNP markers (data not shown). As a result, we continued to include additional markers with high-MAF values to the minimal set of markers and then compared the resulting genetic diversity matrices to a matrix calculated from a complete set of 1006 markers (no missing markers from data set) (Table 2). It was determined that the seven SNP markers (Table 3; Fig. 1) with highest MAF-values were required to both differentiate all 116 genotypes and define statistically similar dendrograms (approximate Mantel T test; t = −15.7471, p = 0.00001) as compared to a complete set of 1006 SNP markers (Fig. 2).
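
For readers unfamiliar with matrix comparison tests, the following is a minimal sketch of a simple two-matrix Mantel permutation test. It is deliberately not the 3-way Mantel test run in NTSYSpc for this study (there is no third, constant matrix), and the function names and the two toy distance matrices are hypothetical.

```python
import random
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]

def upper(m: Matrix) -> List[float]:
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def mantel(x: Matrix, y: Matrix, permutations: int = 999,
           seed: int = 0) -> Tuple[float, float]:
    """Matrix correlation and one-sided permutation p-value.

    Rows and columns of `x` are permuted together, so each permuted
    matrix is still a valid distance matrix for relabelled genotypes.
    """
    rng = random.Random(seed)
    n = len(x)
    observed = pearson(upper(x), upper(y))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[x[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper(permuted), upper(y)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical distances among four genotypes from a full and a reduced marker set.
d_full = [[0.0, 0.2, 0.6, 0.7], [0.2, 0.0, 0.5, 0.8],
          [0.6, 0.5, 0.0, 0.3], [0.7, 0.8, 0.3, 0.0]]
d_min = [[0.0, 0.25, 0.5, 0.75], [0.25, 0.0, 0.5, 0.75],
         [0.5, 0.5, 0.0, 0.25], [0.75, 0.75, 0.25, 0.0]]
r_toy, p_toy = mantel(d_min, d_full)
```

With only four genotypes there are just 24 distinct relabellings, so the p-value from this toy example is coarse; the sketch is meant to show the mechanics, not to reproduce the statistics reported in Table 2.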

Table 2 3-way Mantel’s t test [23] for cophenetic comparisons among genetic distance matrices comparing genetic distances calculated via 6-, 7-, 8-, 10-, and 12-markers (X-matrix, no missing markers) to that of a matrix calculated with 1006 SNP markers (Y-matrix, no missing markers) using a Z-matrix calculated from 32,217 markers (20 % missing marker data allowed)
Table 3 List of seven SNP sequences (SNP shown in parentheses) differentiating all 116 hop accessions
Fig. 1 Dendrogram of the 116 hop varieties and germplasm resources as determined using the seven SNPs proposed as the minimal number of markers to genetically differentiate hop accessions

Fig. 2 Dendrogram of the 116 hop varieties and germplasm resources as determined by use of 1006 SNP markers with no missing data out of the pool of 32,206 SNPs utilized for this study

PCR-based methodology to screen SNP markers ranges from simple approaches (single strand conformational polymorphism, SSCP; [4]) to resequencing using next generation sequencing. This study identified a set of SNP markers that could potentially be used to differentiate hop genotypes. We propose the use of high-resolution melting (HRM) curve analyses as a simple and rapid means to perform genetic fingerprinting on hop genotypes. Utilizing a draft hop genome, we aligned the raw reads for informative SNP markers to extend the 64-bp reads by 100 bp on either side, giving a total length of 264 bp. Primer3 software identified optimum primer sequences that can be used to develop amplicons for HRM analysis (Table 4).

Table 4 Suggested primer pair sets and amplicon specifications for high resolution melting curve analysis of seven SNP markers differentiating among 116 hop genotypes

Several of the accessions used in this study are thought to be clonal selections from other lines contained in this study. As an example, Savinja Golding is thought to be a clonal selection from Fuggle (see “Slovenian Styrian Goldings”: https://bsgcraftbrewing.com/slovenian-styrian-goldings), as are Fuggle H and Fuggle N (A. Haunold, Personal Communication, 2014). In addition, Hersbrucker 6 and 8 are thought to be clonal selections from the original German ‘Hersbrucker’ landrace (see: USDA ACCESSION No. 21514; http://www.ars.usda.gov/SP2UserFiles/person/2450/hopcultivars/21514.html). All these “clonal selections” show sufficient phenotypic differences from the related lines as well as parent lines to suggest genetic differences between them, although differences are expected to be minor. Clonal selections were included to determine whether a sufficiently robust method could be devised to differentiate among such lines.

Previous work in hop has focused upon the identification of male plants from a population of offspring [16] or upon genetic diversity and DNA fingerprinting using older marker technologies such as STS, SSR, AFLP, RAPD and DArT [7, 17–19]. In all publications, differentiation of accessions required the full complement of markers used for defining genetic diversity in hop populations. In several reports, a few hop varieties were not differentiated from one another and complete validation was not possible given the marker technology used. Furthermore, none of the published reports identified a subset of markers that could be used independently to fingerprint hop varieties.

In this study, both the full complement of 1006 SNP markers found in all cultivars (Fig. 2) and the minimal set of seven SNPs (Fig. 1) completely differentiated all female lines contained in this study. The hop lines chosen represent a broad spectrum of hop lines from around the world. Some of the varieties evaluated in this study were not adequately differentiated using older marker technologies such as AFLPs or SSRs. Thus, these older technologies have significant limitations in their usefulness for variety validation or identification. Partial sequencing through next generation sequencing technology allows for the identification of thousands of SNP markers from across the genome. Unlike SSR and DArT markers [16, 20], these SNP markers are not limited to clustered regions and are therefore more representative of the genome. Because of their distribution throughout the genome, SNP markers offer a greater likelihood of differentiating among accessions.

The 7 SNPs identified in this study were the minimum number of markers required to differentiate all the hop accessions in this study. They have not yet been tested using high resolution melting (HRM) or other SNP detection methods. Furthermore, the use of these 7 SNPs as a discriminating tool for samples consisting of mixtures of different cultivars has not been tested and may have limited applicability given the small number of markers used. The primers for use in HRM are reported for implementation by other projects (Table 4). If one or two of these SNPs prove unsuitable for use in HRM or other PCR techniques, there are additional SNP markers that can be utilized (Supplementary Data).

Conclusions

This note reports on the identification of a minimal number of markers (7 SNPs) required to differentiate among 116 widely divergent hop accessions, including clonal selections and sister hop lines. As such, it is the first publication outlining a simple, widely available protocol for the identification of, and discrimination among, hop varieties. The SNPs and associated primer sequences for HRM analysis are provided, and supplementary data are provided to help genetic laboratories establish their own set of markers that can be used for differentiation among hop lines.

Methods

Plant material consisted of 121 genotypes (varieties and experimental germplasm) contained in the USDA-ARS hop genetics and breeding program located at Corvallis, OR. Due to poor DNA quality of a few of the lines, the final sample number used was 116. DNA was extracted using DNeasy Kits (Qiagen Inc) with the exception that the amount of RNase A was doubled and the QIAshredder spin column was not used. Library preparation and sequencing were as reported by Elshire et al. [21]. Because hop does not currently have a reference genome, SNP identification and production of hapmap files were accomplished using the TASSEL-UNEAK pipeline (http://www.maizegenetics.net/tassel/docs/TasselPipelineUNEAK.pdf). The resulting hapmap was analyzed with TASSEL 5.2.1 [14]. Marker and genotype summaries were exported as csv-format files, which were imported into Microsoft® Excel® for Mac 2011. Minor allele frequencies (MAF) were calculated in TASSEL and subsequently sorted from highest to lowest. Initially, the two markers with the highest MAF values were chosen for data analysis. These two markers were filtered into a separate data file in TASSEL v 5.0 using the “filter sites” option, and genetic diversity values were estimated from this filtered data. The resulting genetic diversity matrix was scanned for genetic diversity estimates equal to zero. If any were present, the process was repeated after adding the marker with the next highest MAF value. These steps were iterated until all genetic diversity estimates were greater than zero (matrix with the six SNP markers having the highest MAF values). Additional high-MAF SNP markers were added to this set of six SNPs to form additional genetic distance matrices (from 7, 8, 10 and 12 markers) for comparison to a complete set of polymorphic markers with no missing data (1006 SNP markers).
NTSYSpc V2.21c [22] was used to estimate correlations between genetic matrices for minimal marker sets (6-, 7-, 8-, 10-, 12-markers) and the complete data set using 3-way Mantel’s t test [23] and a matrix calculated (constant or “Z-matrix”) from the original set of 32,206 SNP markers.
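
The iterative selection described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the TASSEL workflow used in the study: distance is taken as the simple proportion of mismatching calls, missing data are ignored, and the genotype names, marker names and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

def distance(a: List[str], b: List[str]) -> float:
    """Proportion of markers at which two genotypes carry different calls."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def minimal_marker_set(genotypes: Dict[str, Dict[str, str]],
                       ranked: List[str], start: int = 2) -> List[str]:
    """Add markers in ranked (highest MAF first) order until no pair of
    genotypes has a distance of zero."""
    for n in range(start, len(ranked) + 1):
        chosen = ranked[:n]
        profiles = {g: [calls[m] for m in chosen]
                    for g, calls in genotypes.items()}
        if all(distance(profiles[a], profiles[b]) > 0
               for a, b in combinations(profiles, 2)):
            return chosen
    raise ValueError("ranked markers cannot separate all genotypes")

# Hypothetical genotypes scored at four markers already ranked by MAF.
toy_genotypes = {
    "G1": {"m1": "CC", "m2": "AA", "m3": "GG", "m4": "TT"},
    "G2": {"m1": "CC", "m2": "AA", "m3": "GT", "m4": "TT"},
    "G3": {"m1": "CT", "m2": "AA", "m3": "GG", "m4": "TT"},
    "G4": {"m1": "CT", "m2": "AG", "m3": "GG", "m4": "TT"},
}
toy_selection = minimal_marker_set(toy_genotypes, ["m1", "m2", "m3", "m4"])
```

In the toy data G1 and G2 are identical at the first two markers, so a third marker is added before every pairwise distance is greater than zero. Because markers are added strictly in rank order, the result is the shortest prefix of the ranking that separates all genotypes, which is not necessarily the smallest possible subset of markers.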

The 64-bp reads representing minimal marker data sets were aligned with a USDA-ARS/OSU draft hop genome (http://hopbase.cgrb.oregonstate.edu/app_dev.php/) to extend reads by 100 bp on either side of the 64-bp read using Geneious Pro ver 5.5.9 (http://www.geneious.com; [24]) (Table 3). As an aid to interested parties, we developed primer pairs (Table 4) that are appropriate for high-resolution melting curve analyses [25] using Primer3 [26]. Default settings were used and product size was limited to a range of 70 to 115 bp. Other PCR-based SNP assays are available and can be designed using the information in Table 3.
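
The length arithmetic behind the extended reads and the product size window can be checked with a short sketch. It does not call Geneious or Primer3; the function names, the 0-based coordinates and the repeated-motif "genome" are hypothetical and serve only to show how 64 bp plus two 100-bp flanks gives 264 bp and how a candidate amplicon is tested against the 70 to 115 bp range.

```python
def extend_read(genome: str, read_start: int, read_len: int = 64,
                flank: int = 100) -> str:
    """Return the read plus up to `flank` bases on either side.

    `read_start` is a 0-based position; flanks are truncated at the
    ends of the sequence.
    """
    left = max(0, read_start - flank)
    right = min(len(genome), read_start + read_len + flank)
    return genome[left:right]

def amplicon_ok(forward_start: int, reverse_end: int,
                min_size: int = 70, max_size: int = 115) -> bool:
    """True if the product spanning [forward_start, reverse_end) fits the window."""
    return min_size <= reverse_end - forward_start <= max_size

toy_genome = "ACGT" * 100                     # 400-bp placeholder sequence
toy_template = extend_read(toy_genome, 150)   # 100 + 64 + 100 bases
toy_edge = extend_read(toy_genome, 20)        # left flank truncated to 20 bases
```

A read that aligns near the end of a scaffold yields a shorter template, as in the second call, which may leave less room for primer placement on that side of the SNP.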

Availability of supporting data

The data set supporting the results of this article is included within the article, while the hapmap file from which this study derives is included as a supplementary file (Additional file 1).