Profiling Short Tandem Repeats from Short Reads

Deep Sequencing Data Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1038))

Abstract

Short tandem repeats (STRs), also known as microsatellites, have a wide range of applications, including medical genetics, forensics, and population genetics. High-throughput sequencing has the potential to profile large numbers of STRs, but cumbersome gapped alignment and STR-specific noise patterns hamper this task. We recently developed an algorithm, called lobSTR, to overcome these challenges and to accurately profile STRs from short reads. Here we describe how to use lobSTR to call STR variations from high-throughput sequencing datasets and to diagnose the quality of the calls.

Acknowledgements

Y.E. is an Andria and Paul Heafy Family Fellow. This publication was supported by the National Defense Science and Engineering Graduate Fellowship (M.G.). We thank Dina Esposito for useful comments.

Copyright information

© 2013 Springer Science+Business Media New York

About this protocol

Cite this protocol

Gymrek, M., Erlich, Y. (2013). Profiling Short Tandem Repeats from Short Reads. In: Shomron, N. (ed.) Deep Sequencing Data Analysis. Methods in Molecular Biology, vol 1038. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-514-9_7

  • DOI: https://doi.org/10.1007/978-1-62703-514-9_7

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-513-2

  • Online ISBN: 978-1-62703-514-9

  • eBook Packages: Springer Protocols
