High-throughput retrotransposon-based genetic diversity of maize germplasm assessment and analysis

Maize is one of the world’s most important crops and a model for grass genome research. Long terminal repeat (LTR) retrotransposons comprise most of the maize genome; their ability to produce new copies makes them efficient high-throughput genetic markers. Inter-retrotransposon-amplified polymorphisms (IRAPs) were used to study the genetic diversity of maize germplasm. Five LTR retrotransposons (Huck, Tekay, Opie, Ji, and Grande) were chosen, based on their large number of copies in the maize genome, and polymerase chain reaction (PCR) primers were designed based on consensus LTR sequences. The LTR primers produced high-quality, reproducible DNA fingerprints, with a total of 677 bands including 392 polymorphic bands, showing 58% polymorphism between maize hybrid lines. These markers were used to identify genetic similarities among all lines of maize. Genetic similarity was analyzed based on polymorphic amplicon profiles and phylogenetic analysis. This diversity was expected to display ecogeographical patterns of variation and local adaptation. The clustering method showed that the varieties were grouped into three clusters differing in ecogeographical origin. Each of these clusters comprised divergent hybrids with convergent characters. The clusters reflected the differences among maize hybrids and were in accordance with their pedigree. The IRAP technique is an efficient high-throughput genetic marker-generating method.


Introduction
The long terminal repeat (LTR) retrotransposons (RLX) [1,2] are a large class of transposable elements that propagate in the genome by a "copy-and-paste" mechanism that is essentially identical to the intracellular phase of retrovirus replication [1][2][3][4][5], in contrast to the "cut-and-paste" mobility of DNA transposons. The RLX lifecycle involves transcription of an integrated copy, reverse transcription of the transcript into cDNA, and integration of the new copy. Because the RLX mother copy remains part of the chromosome and the daughter copies integrate at new loci, the precise insertion points for the daughter are unlikely to be identical in lines diverging by descent. Complete understanding of the genome and the relationship between genotype and phenotype requires knowledge of both the role and function of the genes as well as of the repetitive component, particularly regarding RLX dynamics [3]. Most eukaryotic genomes comprise over 70% repetitive DNA, with gene numbers, ranging from 10,000 to 50,000, showing much less variation at the monoploid level [4][5][6][7]. Particularly in higher plants, RLXs compose more than half of the repetitive DNA; they not only facilitate homologous recombination, but also can undergo intra- and inter-RLX recombination that is part of their dynamism [4,8,9,10]. The RLXs are generally dispersed throughout the genome, displaying relatively high structural diversity [11][12][13][14][15]. Retroelements have been suggested as an important creative force in genome evolution, driving processes such as mutation, recombination, genome expansion, and adaptation of an organism to changing environmental conditions [3,14,16].
These retrotransposon-based high-throughput genetic DNA fingerprinting methods are both highly informative and polymorphic, even in areas of chromosomes showing low levels of inter-genic recombination and therefore haplotypes with few genic single-nucleotide polymorphisms (SNPs); RLX markers are consistent with geographical and morphological data. The stability of retrotransposon integration sites and recombination events allows them to be used as molecular genetic markers in genetic map construction [28][29][30][31][32]. Retrotransposon markers have also been widely used to assess genetic diversity in many species [33][34][35][36][37][38][39]. Given that plant retrotransposons are stress-activated [15,40], their role in generating ecogeographical patterns of genomic diversity is of particular interest. Retrotransposon markers have been applied successfully to the analysis of genetic diversity in various genera and species, such as apple, rice, sunflower, grapevine, flax, and alfalfa [38,41,42].
Maize (Zea mays L.) is a very special species among the cereal crops, because of its high phenotypic and genomic diversity [43]. Maize is important worldwide as a food and feed crop, and also as an energy crop, due to its high biomass potential. Moreover, it has long been used as a model organism for plant biology. Maize was the first eukaryote in which transposable elements (TEs) were discovered, during the mid-twentieth century, by Barbara McClintock, work that earned her the Nobel Prize [44]. TEs and, particularly, retrotransposons, comprise most of the maize genome; 95% of maize TEs are RLXs [43,45,46]. Through their copy number variation, rearrangements, and polymorphic loci, the TEs contribute most of the genome variation between maize lines, where amplification of a few retrotransposon families is the major cause of "genomic obesity" [47,48]. High-throughput sequencing methodologies have demonstrated that some families of TEs show considerable transcriptional activity.
In the present study, we developed and applied a high-throughput IRAP technique for five RLX families to detect genetic polymorphisms among maize germplasm. These families (Opie, Ji, Cinful, Huck, and Grande) comprise a large fraction of the maize genome [49][50][51][52][53][54], up to 25% of the total. The main goal was to find efficient and high-throughput retrotransposon markers for diversity analyses and to assess the polymorphism of these markers among maize genotypes originating from different ecogeographical origins. The IRAP genetic markers developed were used to compare the genetic variability among maize cultivars and breeding lines differing in ecogeographical origin to detect correlations between phenotypic characters and retrotransposon markers.

Plant material and DNA extraction
Grains of maize lines and hybrids were kindly provided by the Maize Research Section, Agricultural Research Center (ARC) and U.S. Department of Agriculture (USDA). The names of these hybrids are listed in Table 1 and Supplemental Data 1. Further data on the genotypes can be found on the National Germplasm Resources Laboratory homepage (https://npgsweb.ars-grin.gov/gringlobal/taxonomygenus.aspx?id=13020).

TE sequence sources and PCR primer design
Thirty LTR primers were designed, based on the most abundant RLX groups in maize (Cinful1 (AC231746), Huck1 (AC230001), Ji (DQ002406), Opie (AY664413), Grande (AY664416.1:70909-83340), and Tekay (AF050455)). The RLX sequences were obtained from the TRansposable Elements Platform (TREP) database (https://botserv2.uzh.ch/kelldata/trep-db/), and analysis of homologous sequences was performed on the output of the National Center for Biotechnology Information (NCBI) search results. For a given family of retrotransposons, the LTRs showed sequence variability, but certain regions were relatively conserved. For each family, the sequence accessions were aligned and conservation assessed with the multiple alignment procedure of MULTALIN [56]. The conserved segments of the LTR of the retrotransposons were used for the design of PCR primers, which was carried out with the program FastPCR (https://primerdigital.com/fastpcr.html) [57][58][59]. Several inverted primers were designed at both ends of the LTRs of each retrotransposon to compare the efficiency and reproducibility of amplification. The sequences of the primers are shown in Table 2. None of the primers chosen formed self-dimers, and all showed high PCR efficiency for IRAP fingerprinting. The chosen primers matched the motifs sufficiently conserved in the retrotransposons to allow amplification of the great majority of targets in the genome.
The PCR products were separated by electrophoresis at 70 V for 8-10 h in a 1.4% agarose gel (Wide Range; SERVA Electrophoresis GmbH, Heidelberg, Germany) with 1 × Tris-HEPES-EDTA (THE) electrophoresis buffer [18]. The Thermo Scientific GeneRuler DNA Ladder Mix, 100-10,000 base pairs (bp), #SM0332, was used as a standard. The gels were stained with ethidium bromide (EtBr) and scanned, using an FLA-5100 imaging system (Fuji Photo Film GmbH; now FUJIFILM Europe GmbH, Heidelberg, Germany) with a resolution of 50 µm.

Data analysis
From the IRAP fingerprint profiles, all clear bands were scored at each band position for each primer in all samples. Polymorphic bands (PBs) of the same size were assumed to represent a single locus. The presence or absence of a fragment of a given length was recorded in binary code. In total, 677 bands were scored across the samples. Based on the primary data, the level of genetic diversity as defined by Nei was determined, using Arlequin software [60]. The method applied was based on cluster analysis expressing the relationships of the hybrids examined as a distance percentage in a cluster tree and similarity matrix. The data were analyzed, using NTSYSpc software (Numerical Taxonomy and Multivariate Analysis System) version 2.11 (https://www.exetersoftware.com/cat/ntsyspc/ntsyspc.html).
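The binary scoring and pairwise similarity described above can be illustrated with a minimal Python sketch. It assumes the Jaccard coefficient (the coefficient named later in this section for the FAMD calculation); the profiles and line names are hypothetical toy data, not bands from this study, and the sketch does not reproduce the NTSYSpc workflow.

```python
from itertools import combinations

def jaccard_similarity(a, b):
    """Jaccard coefficient: bands shared by both profiles / bands present in either."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0

# Hypothetical toy profiles (1 = band present, 0 = band absent).
profiles = {
    "line_A": [1, 1, 0, 1, 0, 1],
    "line_B": [1, 1, 0, 1, 1, 1],
    "line_C": [0, 1, 1, 0, 1, 0],
}

# Pairwise similarity for every pair of lines.
similarity = {
    (g1, g2): jaccard_similarity(profiles[g1], profiles[g2])
    for g1, g2 in combinations(profiles, 2)
}

for pair, value in similarity.items():
    print(pair, round(value, 2))
```

Shared absences (0/0) are ignored by the Jaccard coefficient, which is a common choice for dominant markers because the absence of a band can have several causes.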
The primary genetic data were bootstrapped with SEQBOOT (https://csbf.stanford.edu/phylip/seqboot.html), after which the pairwise genetic distances were calculated, using the Genetic Distance Matrix Program (GENDIST) (https://www.bablokb.de/gendist/). Both programs are from the PHYLIP (Phylogeny Inference Package) software package (https://evolution.gs.washington.edu/phylip.html). The ability of IRAP markers to reveal genetic relationships among all the maize accessions was evaluated phylogenetically by neighbor-joining (NJ), with the tree constructed using PAUP software [61]. Support for the tree was determined by performing 1000 bootstrap operations on the dataset generated by distance analysis. To determine the partitioning of the IRAP genetic variation into inter- and intrapopulation variance components, analysis of molecular variance (AMOVA) was conducted with the program Genetic Analysis in Excel (GenAlEx) 6.5 [62].
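The idea behind bootstrapping a binary marker matrix can be sketched as follows. This is only an illustration of resampling loci (columns) with replacement; it is not the SEQBOOT implementation, and the function name, seed, and toy matrix are assumptions made for the example.

```python
import random

def bootstrap_replicates(matrix, n_replicates, seed=0):
    """Resample loci (columns) with replacement; rows (genotypes) are kept intact."""
    rng = random.Random(seed)
    n_loci = len(matrix[0])
    replicates = []
    for _ in range(n_replicates):
        # Draw n_loci column indices with replacement.
        cols = [rng.randrange(n_loci) for _ in range(n_loci)]
        replicates.append([[row[c] for c in cols] for row in matrix])
    return replicates

# Hypothetical toy matrix: 3 genotypes x 5 loci.
toy_matrix = [
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
]

replicates = bootstrap_replicates(toy_matrix, 1000)
print(len(replicates), "replicates of shape",
      len(replicates[0]), "x", len(replicates[0][0]))
```

Each replicate would then be passed to the distance and tree-building steps, and the fraction of replicate trees containing a given grouping gives its bootstrap support.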
Summary statistics related to the number of bands generated by each genotype (NTI) and for each group only, including the number of PBs, percentage of polymorphic loci (PPL%), number of private bands (NPB), Shannon's Information Index (I), genetic differentiation index (PhiPT) among populations, Nei's genetic distance (D), and Nei's genetic identity (IN), were calculated using GenAlEx 6.5. Genetic distance, using minimum Jaccard coefficients, was calculated with Fingerprint Analysis with Missing Data (FAMD) 1.31. A dendrogram for the studied genotypes was constructed, based on the maximum likelihood method [63], using MEGA X software [64].
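As an illustration of one of these summary statistics, the sketch below computes Shannon's Information Index for binary band data. It assumes the commonly used per-locus form I = -(p ln p + q ln q), where p is the band frequency and q = 1 - p, averaged over loci; whether this matches the exact GenAlEx settings used here is an assumption, and the toy matrix is hypothetical.

```python
import math

def shannon_index(band_frequency):
    """Per-locus Shannon index for a binary marker: -(p ln p + q ln q)."""
    p = band_frequency
    q = 1.0 - p
    # Terms with zero frequency contribute nothing (limit of x ln x as x -> 0).
    return -sum(x * math.log(x) for x in (p, q) if x > 0)

def mean_shannon(matrix):
    """Average the per-locus index over all loci of a genotype x locus matrix."""
    n_genotypes = len(matrix)
    n_loci = len(matrix[0])
    freqs = [sum(row[j] for row in matrix) / n_genotypes for j in range(n_loci)]
    return sum(shannon_index(p) for p in freqs) / n_loci

# Hypothetical toy matrix: 4 genotypes x 3 loci.
toy_matrix = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
print(round(mean_shannon(toy_matrix), 3))
```

A monomorphic locus (band present in all or in none of the genotypes) contributes zero, and a locus with a band frequency of 0.5 contributes the maximum of ln 2.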

In silico PCR analysis of the maize genome
We performed in silico IRAP analysis, using FastPCR software for the maize (B73 RefGen_v4) (https://www.ncbi.nlm.nih.gov/genome/12) and sorghum (Sorghum bicolor (L.) Moench) (Sorghum_bicolor_NCBIv3) (https://www.ncbi.nlm.nih.gov/genome/108) genomes, using a single LTR primer corresponding to a sequence highly conserved in the RLXs examined. For in silico IRAP analysis, we applied the default options, with the length of potential PCR products ranging from 50 to 3000 bp and a single mismatch allowed within the 3′-terminus of the LTR primer. The results of the in silico IRAP analysis are presented in Table 3. As expected, no amplicons were predicted for the Sorghum bicolor genome, due to high divergence of the RLX sequences in the maize genome from those in Sorghum bicolor.
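The principle of single-primer in silico IRAP can be sketched in a few lines of Python. This is a simplified illustration, not the FastPCR algorithm: it uses exact matching only (the analysis above allowed one mismatch near the 3′ end), and the primer and sequences are hypothetical. A product is predicted when the primer matches the top strand and its reverse complement occurs downstream within the 50 to 3000 bp size window.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def find_all(seq, motif):
    """Return start positions of all (possibly overlapping) exact matches."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits

def in_silico_irap(genome, primer, min_len=50, max_len=3000):
    """Predict amplicon sizes for one primer annealing in opposite orientations."""
    forward_sites = find_all(genome, primer)
    reverse_sites = find_all(genome, reverse_complement(primer))
    sizes = []
    for f in forward_sites:
        for r in reverse_sites:
            length = r + len(primer) - f  # product spans both primer sites
            if r > f and min_len <= length <= max_len:
                sizes.append(length)
    return sorted(sizes)

# Hypothetical primer and toy sequences (not from the maize genome).
primer = "GATTACAGG"
near = primer + "A" * 100 + reverse_complement(primer)
far = primer + "A" * 5000 + reverse_complement(primer)
print(in_silico_irap(near, primer))  # one product inside the size window
print(in_silico_irap(far, primer))   # sites too far apart, no product
```

The size window mirrors the 50 to 3000 bp range stated above; a primer whose target sequence is absent or highly diverged in a genome yields no forward or reverse sites and therefore no predicted amplicons.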

Diversity assay among Egyptian maize hybrids
The maize genome is composed of a diverse group of RLXs that are major sources of genetic variations. The selection of effective LTR primers must consider the abundance and distribution of the RLX family. In this study, the LTR primers used were designed for five different LTR retrotransposon families belonging to the high-copy classes gypsy and copia. LTR primers are usually designed to complement areas as close as possible to the 5′ or 3′ ends of the LTR, which is the most conserved part, and to minimize the amplification of long LTR fragments. The effectiveness of IRAP amplification was directly dependent not only on the total number of copies of the element, but also on the degree of LTR regional conservation. Screening for single primers resulted in the selection of 30 LTR primers for IRAP (Fig. 1). All selected LTR primers yielded 20-60 scorable bands and showed high-quality reproducible DNA fingerprints. Overall, 677 amplicons were scored as effective, using all LTR primers from the five RLX families, of which 392 were polymorphic, with a mean polymorphism of 58%. A total of 40 unique IRAP amplicons were generated, including 22 positive and 18 negative unique bands, using 20 primers. Primers 4319 (Opie) and 4324 (Grande) produced the highest number of unique bands (7 and 6, respectively). The Grande RLX showed the highest number of unique bands (13), while the Huck LTR primers produced the smallest number (3). These results were expected and corresponded to the result of in silico IRAP analysis. The maize genome is relatively large (2182.61 Mb), and consists of about 50% RLX sequences. The size of the genome, with the main part comprising RLXs, increased the efficiency of the RLX-based methods and resulted in a high percentage of polymorphism, using a single primer. Due to the ability of retrotransposons to integrate into a multitude of loci in the genome, they constituted informative molecular markers for the plant species.
The pattern obtained was related to the copy number and the size of the RLX family. The PCR products and the polymorphism resulted from the amplification of hundreds to many thousands of target sites in the genome. These polymorphisms functioned as means of identification, in detecting genetic erosion, and in revealing genetic relationships.
The LTR primers revealed different levels of polymorphism among the maize lines examined. Tables 4 and 5 show that the LTR primers applied to 16 maize hybrids produced 677 bands; 392 of these were PBs and 285 were monomorphic bands. LTR primer 4304 (Huck) produced the highest number of bands (58) and primer 4325 (Tekay) the lowest number (19). The highest percentage of PBs was produced by primer 4310 (93%) for the Ji RLX and the lowest percentage by Opie LTR primer 4318 (26%). The number of polymorphic amplicons per primer ranged from 8 with 4318 (Opie) to 33 with 4306 (Huck). On average, each primer produced 34 amplicons across the 16 genotypes, of which 20 were polymorphic. Polymorphism levels also differed among TE families. The primers based on the Grande elements showed the highest percentage of polymorphism (75%), compared with those based on the Ji, Huck, Opie, and Tekay elements, which showed total polymorphism of 55%, 53%, 48%, and 63%, respectively.
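The percentages above follow directly from the band counts, as the short Python check below illustrates. The per-family totals for Ji, Huck, Opie, and Grande are the counts reported in the family-specific subsections; the Tekay counts are not stated in the text and are derived here by subtraction from the overall totals, which is an assumption of this sketch.

```python
# (total bands, polymorphic bands) per family, as reported in the text.
family_counts = {
    "Ji": (149, 82),
    "Huck": (181, 96),
    "Opie": (126, 61),
    "Grande": (112, 84),
}
TOTAL_BANDS, TOTAL_POLYMORPHIC = 677, 392
N_PRIMERS = 20

def percent_polymorphism(total, polymorphic):
    """Polymorphic bands as a percentage of all scored bands."""
    return 100.0 * polymorphic / total

# Assumption: Tekay accounts for the remainder of the overall totals.
tekay_total = TOTAL_BANDS - sum(t for t, _ in family_counts.values())
tekay_poly = TOTAL_POLYMORPHIC - sum(p for _, p in family_counts.values())
family_counts["Tekay"] = (tekay_total, tekay_poly)

for family, (total, poly) in family_counts.items():
    print(f"{family}: {poly}/{total} = {percent_polymorphism(total, poly):.0f}%")

print(f"Overall: {percent_polymorphism(TOTAL_BANDS, TOTAL_POLYMORPHIC):.0f}%")
print(f"Mean amplicons per primer: {TOTAL_BANDS / N_PRIMERS:.0f}")
print(f"Mean polymorphic amplicons per primer: {TOTAL_POLYMORPHIC / N_PRIMERS:.0f}")
```

The derived Tekay remainder (69 of 109 bands) gives 63%, which agrees with the value reported for that family.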

IRAP markers based on the Ji family
Five LTR primers for the Ji RLX were tested in 16 maize hybrids, four of which (4309-4312) revealed variability among maize hybrids (Table 4). The bands ranged from 100 to 2000 bp. The total number of bands generated from the Ji element-based primers was 149; 82 were PBs with a mean polymorphism of 55%. Unique bands, either positive or negative (present or absent), characterized the maize hybrids. Five positive unique bands of 680, 2550, 1120, 410, and 500 bp characterized hybrids H-10, H-7, and H-16, whereas only one negative band of 298 bp, resulting from primer 4309, characterized H-11. Each band distinguished its respective hybrid and could be used as a fingerprint.
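The notion of positive and negative unique bands can be made concrete with a small Python sketch. It assumes that a positive unique band is one present in exactly one genotype and a negative unique band is one absent from exactly one genotype; the genotype names and profiles are hypothetical toy data.

```python
def unique_bands(profiles):
    """Return (positive, negative) unique bands as lists of (locus, genotype)."""
    names = list(profiles)
    n_loci = len(profiles[names[0]])
    positive, negative = [], []
    for locus in range(n_loci):
        present = [g for g in names if profiles[g][locus] == 1]
        absent = [g for g in names if profiles[g][locus] == 0]
        if len(present) == 1:
            # Band seen in a single genotype only.
            positive.append((locus, present[0]))
        elif len(absent) == 1:
            # Band missing from a single genotype only.
            negative.append((locus, absent[0]))
    return positive, negative

# Hypothetical toy profiles: 4 genotypes x 4 loci (1 = present, 0 = absent).
profiles = {
    "g1": [1, 1, 0, 1],
    "g2": [0, 1, 0, 1],
    "g3": [0, 1, 1, 1],
    "g4": [0, 0, 0, 1],
}
positive, negative = unique_bands(profiles)
print("positive unique bands:", positive)
print("negative unique bands:", negative)
```

Monomorphic loci (the last column in the toy data) yield neither kind of unique band, which is why only polymorphic bands are informative for fingerprinting individual hybrids.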

IRAP markers based on the Huck family
Five LTR primers for the Huck RLX were tested, four of which (4303-4307) revealed variability among the maize hybrids (Table 4). A total of 181 bands were generated by the Huck-based primers, ranging from 160 to 5100 bp; 96 PBs and 85 monomorphic bands were generated, with a mean polymorphism of 53%. The Huck-based LTR primers characterized H-15 and H-11 with two unique bands of 1301 and 1600 bp, respectively, while a single unique negative band of 2400 bp was generated from primer 4303 and characterized H-5 (Table 5). These bands could be used to fingerprint their respective genotypes.

IRAP markers based on the Opie family
Seven LTR primers for the Opie RLX were tested in maize hybrids, of which primers 4315-4319 were suitable for scoring amplicons. A total of 126 bands were evaluated, of which 61 were PBs with a mean polymorphism of 48% (Table 4). The Opie LTR primers detected five positive unique bands with molecular sizes of 560, 720, 730, 1031, and 2050 bp for hybrids H-16, H-7, H-14, H-16, and H-13, respectively. LTR primer 4317 detected one unique negative band of 720 bp for hybrid H-9, whereas primer 4319 distinguished H-13 with five unique negative bands with molecular sizes of 1080, 1090, 1400, 1510, and 1800 bp.

IRAP markers based on the Grande element
Five LTR primers for the Grande RLX were tested in the maize hybrids, four of which (4320-4324) enabled molecular genetic evaluation. In all, 112 bands were detected, using Grande LTR primers, of which 84 were PBs with a mean polymorphism of 75%; the bands ranged from 250 to 3100 bp (Table 4) (Table 5).
The IRAP analysis used in this study succeeded in demonstrating positive and negative unique markers that aided in genotype discrimination. In all, 16 of the 20 primers used revealed 40 unique IRAP markers; the remaining four (4311, 4306, 4307, 4318) did not produce any unique bands.

Genetic relationships among maize hybrids
Understanding the relationships among genotypes within and between species has valuable applications in crop improvement programs. For this task, we selected two standard maize hybrids (A619 × A632 and B73 × Mo17) and their parental inbred lines (A619, A632, B73, Mo17) to evaluate the effectiveness of the IRAP method. Only single-seed-derived DNA samples were used in this analysis. The bands shared between a hybrid and its parent inbred lines (Fig. 2) are clearly visible, and their decreased brightness also reflects the allelic dosage in the hybrid compared with the parent. For example, primer 4317 (Opie) yielded about 45 bands that could be detected well for all these maize lines, of which 25 were polymorphic (45%). Similarly, for primer 4320 (Grande), about 32 bands were well detected for all these maize lines, of which 16 were polymorphic (50%). The IRAP banding profiles, which displayed from 21 to 95% polymorphism, were used to identify genetic similarity in the tested maize hybrid lines (Table 5). The highest similarity value (95%) was observed between the two white triple-cross hybrids (H-11 and H-12), which possess a common ancestor and share two parents, as seen in their pedigree (Table 1). In contrast, the lowest genetic similarity value (21%) was detected between the single-cross hybrid (H-4) and the triple-cross white maize (H-11), indicating that these two hybrids were the most divergent genotypes. This dissimilarity can be attributed to the two genotypes inheriting their genetic makeup from different ancestors (Table 1). Genotypes that have low genetic similarity are of great interest for maize breeders. Weising et al. [65] noted that the parents chosen must be genetically divergent enough to exhibit sufficient numbers of polymorphisms, but not so distant as to cause sterility of the progeny.
Estimation of genetic similarity based on molecular data is dependent on several factors, such as the number of markers analyzed, their distribution throughout the genome, and the quality of marker scoring. It is difficult to compare genetic distance between different studies, due to the difference in materials, number of genotypes analyzed, the number of alleles detected per marker, and the genetic diversity of the markers.
The genetic similarity coefficient determined for the maize hybrids was employed to develop a dendrogram based on IRAP data, as shown in Fig. 3. Cluster analysis resolved the 16 maize hybrids into two main clusters (A and B). The first cluster (A) was divided into two subclusters (C and D). The first subcluster (C) contained hybrid H-6 as a separate group, while the other subcluster (D) contained two groups; the first of which included H-4 and H-5, while the other contained H-9 (white single-cross) as a separate group. All yellow single-cross maize hybrids were clearly grouped in the main cluster A. In contrast, cluster B was divided into two subclusters (E and F). Subcluster E contained all white triple-cross maize hybrids (H-11, H-12, H-10, H-13, H-14, and H-15), whereas the yellow triple-cross (H-16) was separated into a different group. Subcluster F contained all white single-cross maize hybrids in a separate group, while H-7 branched out into a different group.
Fig. 3 Dendrogram of maize genotypes generated by the IRAP primers

Conclusion
Here, we developed and applied a high-throughput IRAP technique for five LTR retrotransposon families to detect genetic polymorphisms among maize germplasm. These RLX families included Opie, Ji, Cinful, Huck, and Grande, which together comprise a large fraction of the maize genome. The RLX polymorphism captures the record of integration events, which are driven by retrotransposon activation and replication, that have been fixed in the germ line and inherited, and their subsequent fate in plant populations. The main goal of this study was to find efficient and high-throughput LTR retrotransposon markers for diversity analyses and to assess the polymorphism of these markers among maize genotypes originating from differing ecogeographical origins.
We successfully characterized the maize genotypes in worldwide and Egyptian collections, using high-throughput IRAP fingerprinting DNA markers. The DNA analysis of the lines of maize germplasm showed that even single LTR primers can be successfully used in the assessment of genetic differences at the line level and display several advantages, such as robustness, informativeness, and efficiency in breeding selection. We demonstrated here that the IRAP marker system provides a useful and simple electrophoretic technique for studying genetic diversity in maize, as it has in other plant species. The LTR primers used yielded multilocus fingerprints, displaying sufficiently high levels of polymorphism to differentiate between maize accessions and to group them according to their cross level and kernel color. The markers are informative, reliable, and inexpensive for maize breeders and researchers. The number of differences between maize lines was sufficient to easily identify them as separate genotypes, correlated with their phenotypic differences.