Abstract
Short tandem repeats (STRs) are contiguous repetitions of short motifs (1 to 6 nt long) in a DNA sequence and are considered important genetic markers. A computational method that identifies STRs accurately and efficiently across whole-genome sequences is therefore useful. We observe that atomic motifs, which we refer to as seeds, are the fundamental building blocks of STRs. We develop an algorithm that determines whether a motif is atomic as it moves through a sequence, and examines the motif's non-cyclic-redundancy and non-enclosing properties to determine whether it can continue to repeat as a tandem pattern. We call the method STR Seed Selection (3S), since its goal is to locate the seeds of STRs and track their sustainability. The approach extracts all non-redundant STRs in linear time, using only a single scan of the sequence. Experiments show that 3S outperforms state-of-the-art exhaustive approaches in extracting STRs from genome-wide sequences.
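The notions of an atomic motif and of cyclic redundancy can be illustrated with a small Python sketch. This is not the 3S algorithm: it is a naive multi-pass scan written only to make the definitions concrete, and the function names (`is_atomic`, `canonical_rotation`, `find_strs`) and the `min_repeats` threshold are assumptions introduced here, not taken from the paper. A motif is treated as atomic when it is not itself a repetition of a shorter motif, and two motifs are treated as cyclically redundant when one is a rotation of the other.

```python
def is_atomic(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter motif (e.g. ACAC is not atomic)."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return n > 0


def canonical_rotation(motif: str) -> str:
    """Lexicographically smallest rotation, so ACG, CGA and GAC share one label."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def find_strs(seq: str, max_motif_len: int = 6, min_repeats: int = 3):
    """Naive illustration only: one pass per motif length, not a single linear scan.

    Returns (start, motif, copies) tuples for runs of an atomic motif
    repeated at least min_repeats times (min_repeats is an assumed threshold).
    """
    hits = []
    n = len(seq)
    for k in range(1, max_motif_len + 1):
        i = 0
        while i + k <= n:
            motif = seq[i:i + k]
            if not is_atomic(motif):
                i += 1
                continue
            # Extend the run while the next k characters repeat the motif.
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            copies = (j - i) // k
            if copies >= min_repeats:
                hits.append((i, motif, copies))
                i = j  # skip the run so rotations of the same repeat are not reported
            else:
                i += 1
    return hits


if __name__ == "__main__":
    print(find_strs("TTACGACGACGTT", max_motif_len=3))
    print(canonical_rotation("GAC"))
```

In this sketch, skipping non-atomic motifs is what prevents a run such as AAAAAA from being reported a second time as a repeat of AA, and jumping past a detected run is a crude stand-in for the cyclic-redundancy check described in the abstract.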
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mitra, U., Ghosh, S., Gupta, S. (2024). 3S: A Fast and Exhaustive STR Search Algorithm. In: Tavares, J.M.R.S., Rodrigues, J.J.P.C., Misra, D., Bhattacherjee, D. (eds) Data Science and Communication. ICTDsC 2023. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-99-5435-3_37
DOI: https://doi.org/10.1007/978-981-99-5435-3_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5434-6
Online ISBN: 978-981-99-5435-3
eBook Packages: Intelligent Technologies and Robotics (R0)