Abstract
Microsatellites, also known as simple sequence repeats (SSRs), are the preferred type of marker for many genetic applications. Alongside the ongoing development of next-generation sequencing, several bioinformatic tools have been developed for identifying SSRs from genomic or transcriptomic sequences. Although these tools are useful for generating polymorphic SSRs, their application almost always depends on an existing reference genome or on self-assembly of a reference genome. With this in mind, we propose a pipeline for developing polymorphic SSRs that can be applied to species without reference genomes. Applied to a species without a reference genome (black Amur bream; Megalobrama terminalis Richardson, 1846), our pipeline effectively discovered polymorphic SSRs. Under four different settings of the R parameter of a reference-free single nucleotide polymorphism (SNP) caller (ebwt2InDel), a total of 258, 208, 102, and 11 polymorphic SSRs were mined, respectively. To quantify the accuracy of the polymorphic SSRs detected by our pipeline, we tested 25 SSRs in PCR experiments. All primer pairs amplified successfully, and most SSRs (23 of 25, 92%) were polymorphic. Across 36 black Amur bream individuals, we observed an average of 3.36 alleles per locus, ranging from 1 to 11. This demonstrates the effectiveness of our pipeline in identifying polymorphic SSRs and designing primers for SSR genotyping. Ultimately, our pipeline can effectively mine polymorphic SSRs for species without reference genomes, complementing SSR mining approaches based on reference genomes and helping to resolve biological issues that accompany these methods.
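To make the notion of a polymorphic SSR concrete, the following is a minimal Python sketch, not the authors' pipeline (which relies on ebwt2InDel and is provided in the supplementary material). It finds perfect di- to tetranucleotide repeats with a regular expression and flags a locus as polymorphic when the copy number differs between sequences. The function names, the minimum-repeat thresholds, and the toy sequences are all assumptions made for illustration.

```python
import re

# Hypothetical thresholds: minimum tandem copies required per motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. not 'ACAC' or 'AA')."""
    n = len(motif)
    return not any(n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    hits = []
    for motif_len, min_copies in MIN_REPEATS.items():
        # Group 2 captures the motif; the backreference requires further identical copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return sorted(hits)


def is_polymorphic(seqs):
    """True if the first SSR found in each sequence differs in copy number (toy criterion)."""
    counts = set()
    for s in seqs:
        ssrs = find_ssrs(s)
        if ssrs:
            counts.add(ssrs[0][2])
    return len(counts) > 1


# Toy alleles of one locus from two hypothetical individuals.
ind_a = "TTG" + "AC" * 6 + "GGT"
ind_b = "TTG" + "AC" * 8 + "GGT"
print(find_ssrs(ind_a), find_ssrs(ind_b), is_polymorphic([ind_a, ind_b]))
```

In practice, alleles are compared at loci anchored by matching flanking sequence, and candidate loci are then validated by PCR, as was done for the 25 SSRs tested here.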
Availability of data and materials
The genome sequence data supporting this study's findings are available in GenBank at NCBI [https://www.ncbi.nlm.nih.gov] under accession numbers OM982470 to OM982494. The associated BioProject number is PRJNA813998. Other sequence read archives were downloaded under the BioProject numbers PRJNA756243, PRJNA640946, and PRJNA688781. The reference genome of Acanthopagrus latus was downloaded under the assembly number GCF_904848185.
Code availability
All code related to the current pipeline is included as supplementary material.
Funding
This work was supported by the China Agriculture Research System under Grant CARS-45-39 and the Science and Technology Innovation Program of Hangzhou Academy of Agricultural Sciences under Grant 2019HNCT-01.
Author information
Contributions
KL and NX conducted the experiments; KL analyzed the data and wrote the manuscript; all authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest in the publication.
Ethics approval
Approval from the Science and Technology Bureau of China and the Department of Wildlife Administration is not required for the experiments conducted in this paper when the fish in question are neither rare nor near extinction (first- or second-class state protection level). According to the Measures of Zhejiang Province on Administration of Laboratory Animals, ethical approval was not required, because such approval is necessary only when researchers use mammals.
Consent to participate
The participant has consented to participate in the manuscript.
Consent for publication
The participant has consented to the submission of the manuscript to the journal.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, K., Xie, N. Pipeline for developing polymorphic microsatellites in species without reference genomes. 3 Biotech 12, 248 (2022). https://doi.org/10.1007/s13205-022-03313-0
DOI: https://doi.org/10.1007/s13205-022-03313-0
Keywords
- Polymorphic
- Microsatellites
- Simple sequence repeats
- Nucleotide insertions and deletions
- Reference genome