Introduction

Venison products are expensive, premium-quality goods and are therefore a frequent target of adulteration. In most European countries, food safety aspects are strictly regulated under EU legislation; however, composition requirements and standard control methods are generally undefined. Maintaining customer confidence plays a key role in the future of venison products, as customers are often wary of quality issues in wild game-derived products. Because meat and venison production is susceptible to adulteration, the control of identity and origin is the most important factor in quality assurance and traceability. Incorrect labelling is a commercial issue, as fraudulent producers may gain a competitive advantage over fair producers (Fajardo et al. 2007; Ballin and Lametsch 2008), and incorrectly identified products can cause health problems for consumers who are sensitive to various allergens (Pascal and Mahé 2001).

Wild boar (Sus scrofa) is one of the most widely distributed mammals in the world. A significant and continuous increase in their population numbers has been reported by several studies in recent decades (Massei and Genov 2004; Bieber and Ruf 2005; Csányi 2014; Massei et al. 2015). Due to the efforts to control population size, the number of harvested wild boars tends to increase throughout Europe (Massei et al. 2015) and, as a consequence, the total amount and the market share of wild boar venison are expanding within the food industry.

The application of DNA-based methods for food authentication has gained much attention because of food safety and quality concerns. DNA-based methods have effectively boosted the traceability of many processed food products (Fajardo et al. 2007; Wilkinson et al. 2012), as DNA is more stable under conditions typically associated with food preparation procedures (Arslan et al. 2006).

Some types of DNA markers, such as short tandem repeats (STRs) and single-nucleotide polymorphisms (SNPs), have been used in pigs for breed assignment purposes (Wilkinson et al. 2012; Zsolnai et al. 2013; Lin et al. 2014). Some of these are also capable of detecting wild boar (Caratti et al. 2010; Wilkinson et al. 2012; Lin et al. 2014). However, both STR- and SNP-based methods have disadvantages and limitations. The complex and heterogeneous mutation pattern of STRs (Ellegren 2004) introduces ambiguities, and stutter bands or technical artefacts (allelic dropouts, null alleles, size homoplasy) can cause genotyping errors (Pompanon et al. 2005). With SNPs, a large number of loci are needed to obtain sufficient information (Syvänen 2001; Wilkinson et al. 2012); moreover, both genotyping methods are relatively expensive at small or medium scale and require special equipment (Wilkinson et al. 2012; Lin et al. 2014).

Real-time PCR amplification is one of the most popular recent methods in food composition studies owing to its accuracy, reliability and potential for multiplex measurements (Fajardo et al. 2008). However, methods routinely used in research environments may have high costs and low throughput. The need for expensive instruments and skilled personnel makes most of them unsuitable for routine measurements in quality control laboratories (Lockley and Bardsley 2000).

Short, biallelic molecular markers represent a good alternative for simple species identification assay designs. Insertion/deletion (InDel) markers offer the potential to develop high-throughput, cost-effective rapid tests for routine monitoring of meat samples. InDel markers are based on insertion/deletion polymorphisms in certain regions of the genome sequence and can be easily analysed as length differences of PCR products by simple agarose gel electrophoresis. InDel markers have recently been used for selection in animal breeding (Ren et al. 2017; Crespo-Piazuelo et al. 2019), genetic mapping (Väli et al. 2008), diagnostics of genes associated with human diseases such as Alzheimer’s disease (Bhattramakki et al. 2002; Lehmann et al. 2005) and identification of humans for forensic purposes (Pereira et al. 2009; Fondevila et al. 2012). Longer insertions/deletions have potential for species identification, and their favourable characteristics, including analysis in short amplicon size ranges, high multiplexing capability and low mutation rates, make them ideal for the examination of degraded DNA (Väli et al. 2008; Fondevila et al. 2012; Crespo-Piazuelo et al. 2019). In addition, as length-based markers, InDels can be amplified with fluorescent dye-labelled PCR primers, and the products can then be separated and detected by capillary electrophoresis in the same way as standard forensic STR markers.

The development of next-generation sequencing (NGS) technologies has changed many fields of life sciences and improved the detection of genomic variants for genetic marker development. Insertions and deletions have been characterised to a lesser extent than SNPs or STRs, although whole-genome sequence data provide the possibility to find these specific regions (Zhang et al. 2017; Crespo-Piazuelo et al. 2019). For the purpose of this study, the genomes of three Mangalica breeds were sequenced and analysed previously using the abovementioned method (Molnár et al. 2014).

In this study, the main objectives were (1) to identify wild boar-specific insertions/deletions with analytical potential, (2) to develop and validate markers in these regions and (3) to design a rapid, reliable and simple multiplex method suitable for use in food quality control. Our effort to strengthen the control of origin may support the competitiveness of fair producers by maintaining traceability. Moreover, it may also help to prevent intentional crossbreeding that aims to increase productivity but has deleterious effects on the genetic structure of wild boar populations.

Materials and Methods

Sampling and DNA Extraction

A total of 209 different muscle, blood and hair samples were collected for analysis (Table 1). Domestic pig samples, collected on farms and at abattoirs, represented ten different pig breeds. Mangalica samples from the Hungarian Pig Tissue collection of the MANGFOOD research consortium were used (Zsolnai et al. 2013). The dataset includes samples of 65 wild boar individuals; muscle samples were collected at various hunting events and venison processing plants in different regions of Hungary. Blood samples were collected from living animals into EDTA-coated sampling tubes by a trained veterinarian according to standard veterinary medical practice. Muscle tissue samples were collected from slaughtered animals in sampling tubes containing 96% ethanol. Hair samples were collected from living animals in sampling bags. All samples were stored at −20 °C until processed in the laboratory.

Table 1 Number of individuals with known pedigrees used in the reference set grouped by breeds and the added samples involved in the reliability cross-check and validation of the W markers

Total genomic DNA from blood samples was extracted using the Duplicα Prep Automated DNA/RNA Extraction System (EuroClone, Italy); genomic DNA was obtained from muscle tissue using the Genomic DNA Mini Kit (Geneaid, USA) and from hair samples using the QIAamp DNA Investigator Kit (QIAGEN, Germany) according to the manufacturers’ instructions. DNA samples were checked and quantified in a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, USA) and stored at −20 °C until processing.

Identification of InDels

A total of 59 different suid whole genomes (Groenen et al. 2012; Molnár et al. 2014) were downloaded from the public SRA database of NCBI and were compared to identify sequence differences between wild boars and individuals of other breeds. The whole-genome sequencing (WGS) data of 13 wild boars, 4 Mangalicas and 42 other breeds (dominantly Duroc, landrace, large white, Meishan, Pietrain and some less widespread breeds) were obtained. Paired-end sequence reads were mapped against the indexed pig reference genome assembly (Sscrofa10.2) using the Burrows-Wheeler Alignment (BWA) tool (Li and Durbin 2009). Unmapped and non-unique reads were filtered out, and the obtained alignment result files (in bam format) were sorted and indexed using SAMtools (Li et al. 2009). Variant calling was performed with SAMtools mpileup. Species-specific sequence differences were extracted from variant callings using BCFtools (Li et al. 2009) and filtered on the basis of specificity and size. Selected loci were visually checked using integrative genomics viewer (IGV) software (Robinson et al. 2011) on available sequences. Planned primer sequences were verified using UCSC in silico PCR (http://genome.ucsc.edu/cgi-bin/hgPcr).
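The final filtering step (specificity and size) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the record layout, the function name `select_candidates` and the carrier-fraction thresholds are assumptions made for illustration, and only the 100–1000 bp size window comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndelCall:
    """One insertion/deletion call (hypothetical, simplified record)."""
    chrom: str
    pos: int
    ref_len: int          # length of the reference allele in bp
    alt_len: int          # length of the alternate allele in bp
    carriers: frozenset   # names of samples carrying the alternate allele

    @property
    def size(self) -> int:
        # Length difference between the two alleles
        return abs(self.alt_len - self.ref_len)


def select_candidates(calls, wild, domestic, min_size=100, max_size=1000,
                      min_wild_fraction=0.9, max_domestic_fraction=0.1):
    """Keep calls in the size window that are common in wild boars and
    rare in domestic pigs. The two fraction thresholds are assumptions."""
    wild, domestic = set(wild), set(domestic)
    selected = []
    for call in calls:
        if not (min_size <= call.size <= max_size):
            continue
        wild_frac = len(call.carriers & wild) / len(wild)
        dom_frac = len(call.carriers & domestic) / len(domestic)
        if wild_frac >= min_wild_fraction and dom_frac <= max_domestic_fraction:
            selected.append(call)
    return selected
```

In practice, candidates passing such a filter would still need the visual inspection and in silico PCR checks described above.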

Marker Development and Genotyping

Wild boar species-specific insertions or deletions within the approximate size of 100–1000 bp were targeted, as this range has the potential of easy identification with agarose gel electrophoresis (AGE). Primer3 (Rozen and Skaletsky 2000) was used to design primers for the identified regions. Selected primers were individually tested on wild boar and domestic pig samples and optimised for multiplex PCR to achieve efficient amplification and fidelity. Two sets of primers were designed on the same regions: one unlabelled for the simple AGE method and a fluorescent-labelled set feasible for rapid and high-throughput screening of a bigger number of samples by capillary electrophoresis.

Unlabelled W Marker Set

Multiplex PCR amplifications were performed in a total volume of 15 μL containing 60 ng of template DNA, 0.5 μL of each primer and 1 × QIAGEN Multiplex PCR Master Mix (QIAGEN, Germany). Amplification was conducted in a LifeECO thermal cycler (BIOER, China) using the following cycling conditions: initial activation at 95 °C for 15 min, followed by 40 cycles of denaturation at 94 °C for 40 s, annealing at 60 °C for 40 s and extension at 72 °C for 30 s, with a final extension step at 72 °C for 5 min (detailed primer concentrations and cycling conditions are found in supplementary material Table S1). PCR products were initially separated by AGE on a 1.2% agarose gel. GeneRuler 1 kb DNA Ladder (Thermo Fisher Scientific, USA) was used as a size standard.

Labelled W Marker Set

The unlabelled primers were redesigned to yield products of at most 500 bp for the fragment-analysis method. The amplifications were performed in a total volume of 25 μL containing 45 ng of template DNA, the optimum concentration of each primer and 1 × QIAGEN Multiplex PCR Master Mix (QIAGEN, Germany) (detailed primer concentrations and cycling conditions are found in supplementary material Table S1). PCR conditions were the same as those used for the unlabelled set.

Fluorescently labelled PCR products were separated on an ABI 3130xl Genetic Analyzer (16 capillaries, 50 cm length) using BigDye Terminator v3.1 Cycle Sequencing Kit and POP-7 polymer for the separation following 100× dilution. LIZ 500 (Applied Biosystems) was used as an internal standard. Allele designation was made with Peak Scanner v1.0 software (Applied Biosystems).

Genotyping of individuals was done based on gel images using GeneRuler 100 bp+ DNA Ladder (Thermo Fisher Scientific, USA) as size standard; individual genotypes were stored and analysed as allele size tables. Thirteen tetrameric STR markers (Lin et al. 2014) were used to make a comparison with the InDel genetic composition results of the collected samples. Multiplex PCR conditions were partially modified for better amplification; genotyping was done as described by Lin et al. (2014) (adapted primer concentrations and cycling conditions are found in supplementary material Table S2).
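As an illustration of how fragment sizes can be turned into an allele table, the following Python sketch calls a biallelic genotype for each marker from the observed band or peak sizes. The marker names, allele sizes and the 10 bp tolerance are hypothetical placeholders, not the values in Table 2; the sketch also assumes that fragments have already been assigned to their marker (for example by dye colour or non-overlapping size ranges).

```python
# Hypothetical (W allele size, nW allele size) in bp for each marker.
# These are placeholders for illustration, NOT the values from Table 2.
MARKERS = {"W1": (250, 450), "W2": (180, 320), "W3": (390, 520)}


def call_genotype(observed_sizes, w_size, nw_size, tolerance=10):
    """Return 'W/W', 'W/nW', 'nW/nW', or None if no expected band is seen.

    The tolerance should stay well below half the distance between the
    two allele sizes, otherwise a band could match both alleles.
    """
    has_w = any(abs(s - w_size) <= tolerance for s in observed_sizes)
    has_nw = any(abs(s - nw_size) <= tolerance for s in observed_sizes)
    if has_w and has_nw:
        return "W/nW"
    if has_w:
        return "W/W"
    if has_nw:
        return "nW/nW"
    return None


def call_profile(bands_by_marker, markers=MARKERS, tolerance=10):
    """Call all markers for one sample; a missing marker yields None."""
    return {
        name: call_genotype(bands_by_marker.get(name, []), w, nw, tolerance)
        for name, (w, nw) in markers.items()
    }
```

A sample with no expected band at a marker is reported as `None` rather than guessed, so failed amplifications remain visible in the allele table.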

Breed Assignments and Statistical Analysis

Allele frequencies, heterozygosity values and genetic indices were calculated using the software GenAlEx (Peakall and Smouse 2012) and custom scripts. Parameters to evaluate the efficiency, such as random match probabilities (RMPs) and diagnostic effectiveness and predictive values for each locus and profile (i.e. accumulated values for the whole marker set), were calculated using well-established formulas following Pereira et al. (2009) and Fondevila et al. (2012).
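The per-locus quantities can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GenAlEx implementation or the custom scripts used here: expected heterozygosity is computed without a sample-size correction, and the random match probability is taken as the sum of squared observed genotype frequencies, with the profile value obtained as the product over loci (which assumes independent loci).

```python
from collections import Counter
from math import prod


def locus_stats(genotypes):
    """genotypes: list of 2-tuples of allele labels, e.g. ('W', 'nW')."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Observed heterozygosity: share of individuals with two different alleles
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity under Hardy-Weinberg (uncorrected estimator)
    h_exp = 1.0 - sum(p * p for p in freqs.values())
    # Random match probability: sum of squared genotype frequencies
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    rmp = sum((c / n) ** 2 for c in geno_counts.values())
    return {"freqs": freqs, "h_obs": h_obs, "h_exp": h_exp, "rmp": rmp}


def combined_rmp(per_locus_rmp):
    """Profile RMP as the product of per-locus values (independence assumed)."""
    return prod(per_locus_rmp)
```

For a biallelic locus the RMP cannot fall below 0.375 at Hardy-Weinberg proportions, which is why single markers of this type are only moderately informative and the combined values matter.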

Several different approaches were used to assess the applicability of the markers for breed assignment purposes. Firstly, the probability of an animal belonging to a certain breed was calculated with the partial Bayesian approach of Rannala and Mountain (1997) implemented in GeneClass2 (Piry et al. 2004), with 10,000 simulated multi-locus genotypes and a threshold for the individual exclusion of 0.01. The Bayesian model-based clustering algorithm of the STRUCTURE software (Pritchard et al. 2000) was also applied, using the admixture model and correlated allele frequencies with no prior breed indicated; ten independent runs for each K from one to ten were performed with 750,000 iterations after a 250,000-iteration burn-in period. To determine the number of genetic clusters, we used the method of Evanno et al. (2005) based on the second-order rate of change in log probability between successive K values as implemented by the program Structure Harvester (Earl and vonHoldt 2012). All breed assignment analyses were made with the InDel and STR marker results separately and then with the entire dataset combined.
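The ΔK statistic of Evanno et al. (2005) can be illustrated with a short Python sketch. It takes the log probabilities reported by replicate STRUCTURE runs for each K and divides the absolute second-order difference of the replicate means by the standard deviation across replicates at that K. The input layout and the function name are assumptions for illustration; this is not the Structure Harvester code.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """lnp_by_k: {K: [ln P(D) of each replicate run]} -> {K: delta K}.

    Delta K is only defined for K values with both neighbours (K-1, K+1)
    present, so the smallest and largest K are never scored.
    """
    ks = sorted(lnp_by_k)
    mean_l = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # undefined when replicates are identical
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        delta[k] = second_diff / sd
    return delta
```

The K with the largest ΔK is taken as the most likely number of clusters. Because the lowest K tested cannot be scored, this criterion cannot by itself support K = 1.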

Results and Discussion

InDel Detection and Marker Testing

Whole-genome sequencing data of pigs and wild boars was used for InDel detection. The bioinformatic pipeline identified 39,716 loci with specific differences between the genomes of wild boar and other pigs. After initial filtering, five wild boar-specific InDels were selected for further testing in the adequate 100–1000-bp size range. Primer design was successful on all five loci, and the primers designed were tested on different pig and wild boar samples individually. Three of the markers amplified well, with analytical potential, based on AGE images (Fig. 1). These three markers were applicable for easy individual genotyping of samples. Chromosome locations, fragment lengths and primer sequences of the markers used in the multiplex genotyping are shown in Table 2.

Fig. 1 Agarose gel electrophoresis (AGE) image of the PCR products: a with the labelled marker set (ladder: GeneRuler 100 bp+), b with the unlabelled marker set (ladder: 1-kb DNA ladder). W, wild boar; nW, non-wild boar allele; allele lengths of different genotypes are marked on the right

Table 2 The chromosome locations, primer sequences, labelling dye codes and PCR product lengths of the markers designed

The three InDel markers (referred to as W markers hereinafter) were tested on different pig and wild boar samples individually and then combined in a multiplex protocol for efficient routine use. Different fragment sizes were well separated in gel electrophoresis; thus, different genotypes were easy to identify. Combining the three markers in one reaction gives a rapid result and makes sample identification more cost-effective. AGE images of PCR product sizes and allele lengths for the different genotypes are shown in Fig. 1a and b.

The AGE method alone proved to be feasible for the separation of PCR products, making the genotyping of the DNA samples quite simple. However, fluorescent gel electrophoresis adaptation was implemented for better resolution by labelling the PCR products with fluorescent dyes. Three different fluorescent dyes were used to tag one of the primers of each of the three W markers (Table 2).

Genotyping and Breed Assignment

The pig and wild boar reference set used for characterising the W markers comprised 120 pigs from different breeds and 65 wild boar individuals (Table 1). Samples added for validation (column 3 of Table 1) were not included in the reference genotyping. In order to prove species specificity, genomic samples of other non-suid species occurring in meat products were tested, including cattle (Bos taurus), sheep (Ovis aries), horse (Equus caballus) and two poultry species (Anser anser, Gallus gallus). Interspecies tests gave no PCR products in the expected size range. In silico PCR predicted non-specific products as well when tested on the genomes of turkey (Meleagris gallopavo) and rat (Rattus rattus) (http://genome.ucsc.edu/cgi-bin/hgPcr).

All three InDels amplified efficiently and reliably with detectable products, and all markers proved to be biallelic on the reference samples. The deletion alleles of the selected InDels were not exclusively wild boar specific, nor were all the insertions pig specific. Nevertheless, allele frequencies differed between wild boars and pigs; thus, the markers are capable of differentiating wild boars. The predictive values of the markers for detecting wild boars were moderate when tested individually, as expected for biallelic markers. However, the combined predictive value was much higher (0.996), meaning that the three W markers used together can detect wild boars with a probability over 99%, based on our reference sample set. Reference allele frequencies, expected and observed heterozygosity of the individual markers with the individual and combined random match probability, diagnostic effectiveness and predictive values are shown in Table 3. Expected and observed heterozygosity values varied within a wide range (0.117–0.615) and were similar to those described by Lin et al. (2014), but were lower than those reported by Velickovic et al. (2014) for domestic pig as well as wild boar (Table 3).

Table 3 Allele frequencies expected and observed heterozygosity of the individual markers with the individual and combined random match probability, diagnostic effectiveness and predictive values

Bayesian Cluster Analysis in STRUCTURE

Breed assignment was performed using the program STRUCTURE; the Q plot output was used to visualise the clustering results. In the case of W markers, both likelihood scores and the second-order rate of change in log probability indicate the presence of two groups in the sample set (Fig. 2a). This is consistent with our expectation that the W markers are able to distinguish between wild boar and domestic pig samples.

Fig. 2 Q plot output of STRUCTURE clustering results: a using the three wild boar-specific markers designed, b using the STR marker set, c using the combined W and STR markers. K, number of clusters; P, Pietrain; H, Hampshire; LW, large white; H39, H39 × large white; L, landrace; D, Duroc; MxD, Mangalica × Duroc; BM, blond Mangalica; SM, swallow-belly Mangalica; RM, red Mangalica; WB, wild boar; WB2, additional wild boar samples used in validation; D2, additional Duroc samples used in validation

Two groups in the population had the highest statistical probability in the case of STR markers as well; however, the classification of the different breeds was not consistent, and these markers were therefore not reliable in separating wild boar from other breeds (Fig. 2b). Additionally, we found that in most cases, STRs classified the indigenous Mangalica breeds as wild boar. A possible reason is that Mangalica is genetically much closer to wild boar than other breeds are, and that crossbreeding may have occurred. The extensive management techniques associated with Mangalica breeds (i.e. free herding) allowed crossbreeding with wild boars (Manunza et al. 2016; Frank et al. 2017).

Combining the W and STR marker datasets gave the most reliable distinction between wild boar and domestic pig. In this case, the optimal classification had four groups, and the results were highly consistent in identifying wild boars, domestic breeds and crossbreeds/hybrids (Fig. 2c). However, the reliability of W markers alone did not differ significantly from the combined W plus STR results.

Bayesian Cluster Analysis with GeneClass2

With the partial Bayesian approach implemented in GeneClass2, the W markers alone correctly identified 115 (95.8%) of the 120 domestic pig reference samples as domestic and 61 (93.8%) of the 65 wild boars as wild boar (assignment score > 75) (Table 4). The remaining four wild boar and five domestic pig samples had assignment scores that did not clearly favour either group (assignment score < 75), and these samples were classified as wild-domestic crossbreeds (Table 4). The clustering results again indicate that the W markers alone were able to distinguish wild boar and domestic pig samples with fairly high reliability (Fig. 3a).
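How an assignment score and the 75 threshold lead to a classification can be sketched as follows. Note the substitution: the sketch uses a plain frequency-based genotype likelihood under Hardy-Weinberg proportions, not the partial Bayesian criterion of Rannala and Mountain (1997) applied in the study, and the allele frequencies in the usage example are hypothetical. The score is the likelihood for one group expressed as a percentage of the summed likelihoods over all groups.

```python
def genotype_likelihood(profile, freqs):
    """profile: {marker: (allele, allele)}; freqs: {marker: {allele: freq}}.

    Simplified frequency-based likelihood assuming Hardy-Weinberg
    proportions and independent loci (an illustrative stand-in).
    """
    likelihood = 1.0
    for marker, (a, b) in profile.items():
        p, q = freqs[marker][a], freqs[marker][b]
        likelihood *= p * p if a == b else 2 * p * q
    return likelihood


def assignment_scores(profile, freqs_by_group):
    """Relative score (0-100) of each reference group for one individual."""
    likes = {g: genotype_likelihood(profile, f)
             for g, f in freqs_by_group.items()}
    total = sum(likes.values())
    if total == 0:
        return {g: 0.0 for g in likes}
    return {g: 100.0 * like / total for g, like in likes.items()}


def classify(scores, threshold=75.0):
    """Assign to the best group only if its score exceeds the threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "putative crossbreed"


# Usage with hypothetical allele frequencies (not the values of Table 3)
REFERENCE = {
    "wild boar": {m: {"W": 0.9, "nW": 0.1} for m in ("W1", "W2", "W3")},
    "domestic": {m: {"W": 0.1, "nW": 0.9} for m in ("W1", "W2", "W3")},
}
sample = {m: ("W", "W") for m in ("W1", "W2", "W3")}
print(classify(assignment_scores(sample, REFERENCE)))
```

With these placeholder frequencies, a sample homozygous for the W allele at all three loci scores close to 100 for wild boar, whereas a sample heterozygous at all three loci scores 50 for each group and is flagged as a putative crossbreed.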

Table 4 Bayesian clustering of the reference samples using the InDel markers with GeneClass2

Fig. 3 Bayesian clustering results: a using the three wild boar-specific markers designed, b using the STR marker set, c using the combined W and STR markers. K, number of clusters; P, Pietrain; H, Hampshire; LW, large white; H39, H39 × large white; L, landrace; D, Duroc; MxD, Mangalica × Duroc; BM, blond Mangalica; SM, swallow-belly Mangalica; RM, red Mangalica; WB, wild boar; WB2, additional wild boar samples used in validation; D2, additional Duroc samples used in validation

Using the STR marker set for assignment test gave a slightly different result (Fig. 3b). In this case, all wild boar samples were identified as wild, but some of the Mangalica and crossbreeds were also identified as wild boar or as a crossbreed. This can be a consequence of the known genetic connectivity between Mangalica breeds and wild boars, as mentioned before.

Combining the dataset of the W and STR markers gave the best separation, similar to STRUCTURE results, although in this case, all domestic pig samples were identified as domestic and every wild boar as wild (Fig. 3c). However, again as in the case of STRUCTURE, the reliability of W markers alone did not differ significantly from that of the combined W and STR dataset.

In the validation process, a trial test was made with an additional 12 Duroc samples with known pedigree and 12 wild boar samples to test the reliability and consistency of the markers. In the case of the W marker set, both clustering methods classified Duroc samples as domestic and the others as wild boar, as expected (Figs. 2a and 3a). Breed assignment tests confirm the expectation that the designed InDel markers are suitable for the identification of wild boar in meat samples. The identification of hybrids with the multiplex W marker set alone was possible as well. For samples identified as hybrids, further, higher resolution methods such as SNPs, microsatellites and partial or whole-genome sequencing can provide more information about the genetic background.

The STR-based method designed by Lin et al. (2014) has a much higher error rate, as it could not discriminate between Mangalica and wild boar. The use of microsatellites in the study of Velickovic et al. (2014) gave a better resolution in the investigated populations than that described by Lin et al. (2014), but the use of capillary electrophoresis increases the cost and decreases the throughput compared with the InDel-based method described here using AGE separation. Although the W marker set described above characterises only three short but specific regions of the whole genome, its reliability seems much higher than that of methods based on colour genes like MC1R (Fajardo et al. 2008; Fernández et al. 2004) or on mtDNA (Fang and Andersson 2006). Conyers et al. (2012) designed a method based on microsatellite markers, namely 20 simple sequence repeats (SSRs), for the differentiation of wild boar from domestic pig in meat samples. It proved to be feasible for deciding whether or not a meat sample contained wild boar, but unsuitable for accurate quantification in mixtures; the same can be concluded for the W marker set. The reliability of the method based on 20 SSRs was very close to our findings, although it seems difficult to use in routine tests, as it is quite complex, results cannot be obtained in a simple, single-step analysis, and evaluation demands knowledge of various statistical tools (Conyers et al. 2012). With the W marker set, sample preparation, PCR amplification and AGE separation can be done easily, and the evaluation does not require complex statistical analysis. This could represent the main benefit of the W marker set designed. Even a higher number of microsatellite markers may have lower potential in discriminating closely related subspecies (as it seemed when using the STR markers on Mangalica and wild boar samples in our study) if the polymorphism of the involved genomic regions is not sufficiently high (Conyers et al. 2012). In our case, the combination of only three specific biallelic InDel markers was sufficient to provide a predictive value higher than 99%, comparable with the methods based on 13 or 20 STRs.

SNP chips can be very accurate and specific in identification; their disadvantages are the cost of the method and the complexity of the evaluation (Ramos et al. 2011). Isothermal ‘on-site’ amplification with test strip detection of products can be an optional pre-filtering method in the traceability control of products. Szántó-Egész et al. (2016) successfully designed a recombinase polymerase amplification (RPA)-based breed-specific method for the rapid detection of Mangalica meat in food. The W plex InDel marker set would be feasible for the development of a similar rapid field test. This can be a further perspective of our study, as the approach has not yet been adapted for the detection of wild boar in meat. For DNA quantification in meat products, Floren et al. (2015) designed a workflow with real-time PCR as a first step followed by a one-step or two-step droplet digital PCR. This ensures that undesired admixtures can be detected even at very low concentrations in a product. Our recommendation would be to include a simpler and fairly rapid InDel-based test as an initial step and to implement quantitative PCR only if this first test is positive.

Conclusions

Identification of wild boar-specific insertions/deletions with bioinformatic tools using whole-genome sequences proved to be successful. We were able to design specific primers for the amplification of three of the five identified InDels. The primers designed could be used successfully in simple multiplex reactions; PCR products were of suitable size for AGE identification. Validation showed that the designed W marker set is a sensitive tool for distinguishing wild boar from domesticated breeds in meat samples. The autosomal STR markers separated the groups with less confidence than the W markers. A combination of the two methods resulted in the most reliable identification; however, W markers alone also enabled reliable separation. Thus, the newly developed markers have the potential for rapid, sensitive and reliable identification of wild boar meat content of food products in routine laboratory practice. The use of the reported method in food quality control can offer a simple and cost-effective way to maintain consumer confidence and to support the competitiveness of fair producers.