3S: A Fast and Exhaustive STR Search Algorithm

  • Conference paper
Data Science and Communication (ICTDsC 2023)

Abstract

Short tandem repeats (STRs) are contiguous repetitions of short motifs (1 to 6 nt long) in a DNA sequence and are considered important genetic markers. A computational method that identifies STRs accurately and efficiently across whole-genome sequences would therefore be useful. We observe that atomic motifs, which we refer to as seeds, are the fundamental building blocks of STRs. We develop an algorithm that, as it moves through a sequence, determines whether a motif is atomic and examines its non-cyclic-redundancy and non-enclosing properties to decide whether it can continue to repeat as a tandem pattern. We call the method STR Seed Selection (3S), since its goal is to locate the seeds of STRs and track their sustainability. The approach extracts all non-redundant STRs in linear time with a single scan of the sequence. Experiments show that 3S outperforms state-of-the-art exhaustive approaches in extracting STRs from genome-wide sequences.
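
To make the notions of atomic motif and cyclic redundancy concrete, the following is a minimal Python sketch. It is not the 3S algorithm itself: it makes one pass per motif length instead of a single scan, and the function names, the `min_length` threshold of 12 nt, and the choice of the lexicographically smallest rotation as the canonical form are all assumptions made for illustration. A motif is treated as atomic when it is not a repetition of a shorter unit (for example, `AT` is atomic but `ATAT` is not), and rotations such as `ATG`, `TGA`, and `GAT` are collapsed to one representative.

```python
def is_atomic(motif: str) -> bool:
    """Return True if motif is not a repetition of a shorter unit.

    A string is a power of a shorter string exactly when it reappears
    inside its own doubling before position len(motif).
    """
    return (motif + motif).find(motif, 1) == len(motif)


def canonical_rotation(motif: str) -> str:
    """Collapse cyclic rotations (ATG, TGA, GAT) to one representative."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def find_strs(seq: str, min_length: int = 12, max_motif: int = 6):
    """Report maximal perfect tandem repeats as (start, end, motif) tuples.

    Illustrative only: one pass per motif length k, keeping runs whose
    first k characters form an atomic motif. `end` is exclusive.
    """
    hits = []
    n = len(seq)
    for k in range(1, max_motif + 1):
        run_start = 0  # start of the current stretch with period k
        for i in range(k, n + 1):
            if i < n and seq[i] == seq[i - k]:
                continue  # period k still holds, keep extending
            # Period broken at i (or end of sequence): close the stretch.
            length = i - run_start
            motif = seq[run_start:run_start + k]
            if length >= max(min_length, 2 * k) and is_atomic(motif):
                hits.append((run_start, i, canonical_rotation(motif)))
            # Positions i-k and i disagree, so the next stretch
            # can start no earlier than i-k+1.
            run_start = i - k + 1
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 6 + "CGG"
    for start, end, motif in find_strs(example):
        print(start, end, motif)
```

In this sketch the atomicity check is what suppresses redundant reports: the same `AT` run also has periods 4 and 6, but `ATAT` and `ATATAT` are not atomic, so only the `AT` seed is reported.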

Author information

Corresponding author

Correspondence to Uddalak Mitra.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mitra, U., Ghosh, S., Gupta, S. (2024). 3S: A Fast and Exhaustive STR Search Algorithm. In: Tavares, J.M.R.S., Rodrigues, J.J.P.C., Misra, D., Bhattacherjee, D. (eds) Data Science and Communication. ICTDsC 2023. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-99-5435-3_37
